Qing (Matt) Zhang's technical blog: Failure Protection in Teradata

Let's first look at data allocation in Teradata. Basically, the OS recognizes logical units (LUNs), each composed of slices (UNIX) or partitions (Windows/Linux) from each of the disk drives in a disk rank. The PDE then translates each LUN into one or more pdisks, and the pdisks are assigned to AMPs. All the logical disk space an AMP manages is called its vdisk. In general, all pdisks from a rank are assigned to the same AMP.

Failure protection in Teradata operates at several different levels:

Disk drive level: RAID
RAID: Redundant Array of Independent (or Inexpensive) Disks.
The various RAID designs pursue two key goals: increasing data reliability and increasing input/output performance. There are six different designs, RAID 1 to RAID 6, that provide fault tolerance. (There is also the so-called RAID 0, which has no fault tolerance, and RAID 10, TBD.)

Teradata supports RAID 1 and RAID 5.
- RAID 1 (mirroring without parity): Data is fully replicated on the mirror disk(s). Blocks are read from the first available disk. Besides failure protection, it also provides a significant read-performance benefit.

- RAID 5 (block-level striping with distributed parity): Data is striped across a rank of disks one segment at a time. Parity is also striped across all disk drives, interleaved with the data. When a disk fails, its data is reconstructed on the fly from the remaining data and parity.
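The on-the-fly reconstruction in RAID 5 relies on XOR parity. Here is a minimal sketch of the idea, assuming one stripe across a hypothetical rank of four drives (three data segments plus one parity segment). The helper name `xor_blocks` and the sample data are made up for illustration; a real array controller does this in hardware or firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data segments (illustrative values) and their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1]:
# XOR of the surviving data segments and the parity gives back the lost segment.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same XOR that builds the parity also rebuilds the missing segment, which is why every read on a degraded rank has to touch all the surviving drives.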

RAID 1 is faster than RAID 5, since the two (or more) mirrored disks can be read in parallel and no parity computation is needed.

AMP level: Fallback tables
Fallback stores a second copy of each row of a table on a different AMP in the same cluster. It is specified at table creation. Fallback doubles the I/O on data modifications.
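To make the placement rule concrete, here is a rough sketch of picking a primary AMP and a fallback AMP within the same cluster. The CRC32 hash, the "next AMP in the cluster" rule, and the function name `place_row` are all assumptions for illustration; they are not Teradata's actual hashing or fallback algorithm.

```python
import zlib


def place_row(primary_index_value, clusters):
    """Return (primary_amp, fallback_amp) for a row.

    clusters is a list of lists of AMP ids. The hash function and the
    choice of fallback AMP are illustrative assumptions only.
    """
    amps = [amp for cluster in clusters for amp in cluster]
    row_hash = zlib.crc32(str(primary_index_value).encode())
    primary = amps[row_hash % len(amps)]

    # The fallback copy must live on a different AMP of the same cluster.
    cluster = next(c for c in clusters if primary in c)
    if len(cluster) < 2:
        raise ValueError("fallback needs at least two AMPs per cluster")
    fallback = cluster[(cluster.index(primary) + 1) % len(cluster)]
    return primary, fallback


# Two hypothetical clusters of four AMPs each.
clusters = [[0, 1, 2, 3], [4, 5, 6, 7]]
for key in ("cust-1001", "cust-1002", "cust-1003"):
    primary, fallback = place_row(key, clusters)
    print(key, "primary AMP", primary, "fallback AMP", fallback)
```

Because every write has to reach both AMPs, each modification costs two row writes, which is where the doubled I/O comes from.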

Obviously the highest level of protection is RAID 1 with Fallback protection.

Component/Process Level: Journal
Journals are used for specific types of data or process recovery.
Recovery Journals: maintained automatically by the system. There are two types:
- transient journal: keeps the "before image" of changed rows so the data can be restored to its previous state if a transaction is interrupted. It is maintained on each AMP.
- down AMP recovery journal: while an AMP is down, the other AMPs in its cluster log the write changes destined for it. The logged changes are then applied to the AMP once it recovers.
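The transient journal idea above can be sketched in a few lines. This is a toy model, assuming a single AMP holding rows in a dictionary; the class name `Amp` and its methods are hypothetical and only show how before images allow an interrupted transaction to be undone.

```python
_MISSING = object()  # marks "row did not exist before the change"


class Amp:
    """Toy AMP that keeps before images in a transient journal."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.transient_journal = []  # (row_id, before_image), oldest first

    def update(self, row_id, new_value):
        # Save the before image first, then change the row.
        self.transient_journal.append((row_id, self.rows.get(row_id, _MISSING)))
        self.rows[row_id] = new_value

    def commit(self):
        # The transaction finished, so the before images are no longer needed.
        self.transient_journal.clear()

    def rollback(self):
        # Undo the changes in reverse order using the before images.
        for row_id, before in reversed(self.transient_journal):
            if before is _MISSING:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = before
        self.transient_journal.clear()


amp = Amp({1: "alice", 2: "bob"})
amp.update(1, "ALICE")   # change an existing row
amp.update(3, "carol")   # insert a new row
amp.rollback()           # transaction interrupted: restore previous state
print(amp.rows)
```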

Permanent Journals: optional. The user specifies them at the table level, and they can store before images or after images to provide full-table recovery to a specific point in time.

Database Object Level: Locks
Applied at 3 different levels: Database/Table/Row Hash
4 types:
- Exclusive: applied at the database/table level only, used for DDL; blocks all other locks
- Write: ensures data consistency while writing; allows only access locks from other users
- Read: ensures data consistency while reading; allows other read and access locks
- Access: lets a user read data even while it is being modified (so the result may be inconsistent); conflicts only with exclusive locks
Local deadlocks are checked at AMP level, and global deadlocks are coordinated by PE on a timed basis.
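The rules for the four lock types listed above can be summarized as a compatibility table. Below is a small sketch of that table; the names `COMPATIBLE` and `can_grant` are made up for illustration, and the sketch ignores lock levels, lock queues, and deadlock detection.

```python
# For each requested lock type: the held lock types it can coexist with.
COMPATIBLE = {
    "access": {"access", "read", "write"},
    "read": {"access", "read"},
    "write": {"access"},
    "exclusive": set(),
}


def can_grant(requested, held):
    """Return True if `requested` is compatible with every lock in `held`."""
    return all(lock in COMPATIBLE[requested] for lock in held)


print(can_grant("access", ["write"]))      # reader with an access lock vs. a writer
print(can_grant("read", ["write"]))        # read request vs. a writer
print(can_grant("exclusive", ["access"]))  # DDL vs. an access-lock reader
```

A request that cannot be granted would wait until the conflicting locks are released, which is how deadlocks can arise when two transactions wait on each other.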

Labels: database, security


Monday, March 14, 2011
