Tuesday, November 02, 2010

Always On - Aster Data example

On June 29th, 2010, Google's AdWords stopped serving ads at around 1:40pm PST, and the outage lasted about 3 hours. The estimated cost was about $7.8 million. For Amazon or eBay, even if some shoppers come back later, they still lose the impulse buyers, which can amount to millions of dollars per hour.

Zero-downtime practices are now widely deployed for data migration, but for databases and data warehouses it is still a challenging problem. In general, system downtime can be classified as planned or unplanned. As 24x7 availability becomes more and more critical for data warehouse systems, the system is expected to stay on through both planned and unplanned downtime events.

According to Aster Data, they built their solution on Recovery-Oriented Computing (ROC) to achieve this goal. The basic functionalities include:

- In-cluster replication and transparent fail-over
Data replicas are placed across the cluster, and when a server fails, its work is transparently transferred to replicas within the cluster.

- Self diagnostics
On a permanent failure, new replicas are created on existing or new servers without downtime. On a transient failure, the server is resynchronized after it recovers.

- Network aggregation
Multiple pieces of network hardware are aggregated to provide parallelism and redundancy.

- Separation of duty
Dedicated servers handle data loading/exporting and backup/restore.

- Workload prediction
Policy-driven tools manage workload priorities and dynamically assign resources.
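The replication, failover, and self-diagnostics items above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not Aster Data's implementation: the `Cluster` class, its method names, the replication factor of 2, and the least-loaded placement rule are all hypothetical. It shows queries routed to the first live replica (transparent fail-over), writes missed by a down replica tracked for a later resync (transient failure), and a lost replica re-created on another server (permanent failure).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical toy model of in-cluster replication (not Aster Data's code)."""

    servers: set[str]
    replication_factor: int = 2                                    # assumed value
    placement: dict[str, list[str]] = field(default_factory=dict)  # partition -> replica servers
    down: set[str] = field(default_factory=set)                    # transiently failed servers
    missed: dict[str, set[str]] = field(default_factory=dict)      # down server -> partitions to resync

    def _load(self, server: str) -> int:
        # Number of partitions this server currently holds a copy of.
        return sum(server in replicas for replicas in self.placement.values())

    def _candidates(self, exclude: list[str]) -> list[str]:
        # Live servers not already holding a copy, least loaded first (name breaks ties).
        live = [s for s in self.servers if s not in self.down and s not in exclude]
        return sorted(live, key=lambda s: (self._load(s), s))

    def place(self, partition: str) -> list[str]:
        # Spread the copies of a partition over distinct servers.
        chosen = self._candidates(exclude=[])[: self.replication_factor]
        if len(chosen) < self.replication_factor:
            raise RuntimeError("not enough live servers for the replication factor")
        self.placement[partition] = chosen
        return chosen

    def route(self, partition: str) -> str:
        # Transparent fail-over: a query goes to the first replica that is up.
        for server in self.placement[partition]:
            if server not in self.down:
                return server
        raise RuntimeError(f"no live replica for {partition}")

    def write(self, partition: str) -> None:
        # Any replica that is down misses this write and must be resynced later.
        for server in self.placement[partition]:
            if server in self.down:
                self.missed.setdefault(server, set()).add(partition)

    def transient_failure(self, server: str) -> None:
        self.down.add(server)

    def recover(self, server: str) -> set[str]:
        # Transient failure: bring the server back and resync only what it missed.
        self.down.discard(server)
        return self.missed.pop(server, set())

    def permanent_failure(self, server: str) -> dict[str, str]:
        # Permanent failure: drop the server and re-create its replicas elsewhere.
        self.servers.discard(server)
        self.down.discard(server)
        self.missed.pop(server, None)
        new_homes: dict[str, str] = {}
        for partition, replicas in self.placement.items():
            if server in replicas:
                replicas.remove(server)
                candidates = self._candidates(exclude=replicas)
                if candidates:
                    replicas.append(candidates[0])
                    new_homes[partition] = candidates[0]
        return new_homes


if __name__ == "__main__":
    demo = Cluster(servers={"s1", "s2", "s3"})
    print(demo.place("p1"))               # ['s1', 's2']
    demo.transient_failure("s1")
    print(demo.route("p1"))               # s2 (fail-over)
    demo.write("p1")                      # s1 misses this write
    print(demo.recover("s1"))             # {'p1'} needs resync
    print(demo.permanent_failure("s2"))   # {'p1': 's3'} new replica
```

Because the sketch resyncs only the partitions a recovered server actually missed, a short outage costs a small catch-up instead of a full rebuild, which mirrors the transient/permanent distinction in the self-diagnostics item.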
