Column-oriented Database System, and query execution
Column database is evolving very fast these days, accompanied with the requirement in storing and querying "big data". I've made some notes here based on Daniel Abadi's doctor dissertation, which wins the Sigmod Jim Gray Doctoral Dissertation Award. It also contains materials from the VLDB 09' tutorial on columnar dbms.
In a column store db, queries only read in relevant attributes, but tuple writes need multiple seeks. So generally it's good for read-intensive data warehouses, which is characterized by lots of column scans and batch writes.
Simplest store solution is to use tuple ID: each column is a separate table, with table header and tuple id for each value. Optimization is very expensive representing in this way. We can remove the tuple id and store data in its original order, and table header is also stored in separate column. For each query, we first join all accessed attributes into traditional row tuples, then apply row optimization.
- first compute the predicates on the selected columns,
- then join the result bit array or bloom filter,
- then do projection on the selection attribute,
- finally do aggregation on the projection.
0 Comments:
Post a Comment
<< Home