[e]
[e]
[e]

Data Warehousing with MySql

Data Warehousing is the idea of setting up a separate database structure for purely reporting purposes, so you can stop compromising between reporting and transactional demands.
MySql is a database that is fast, easy to use, and (generally) free.

> Aggregation
> Benchmarks
> ClickstreamData
> DatesAndTimes
> Home
> Indexing
> KPIs
> Links
> MathML
> Money
> MysqlCompared
> PhpPlugins
> Recipe
> Regression
> Scripting
> Sudoku
> SummaryData
> TableTypes
> TuningMysql
New Page:

search:

>
edit css
show debugging
contact: tom (.) cunningham (at) gmail (dot) com
[e]
[e]

Indexing

MySql Indexes. The biggest negative for MySql as a data-warehousing DB is its lack of support for aggregating indexes: especially bitmap indexes.

  • MySql supports: B-Tree, Hash, R-Tree
  • Note: MySql5 supports index merging, which is halfway there.

    Unique vs. Non-Unique Indexes.

    • For single-column indexes, unique and non-unique indexes seem to take about the same time.
    • For multiple-column indexes (composite indexes), a unique index can be much slower (see test script). Both types of indexes seem to use B-trees, so I'm not sure why this is so.
    • Note that primary keys are unique indexes (and must be non-nullable); this means it may be useful to have an index which covers the primary-key columns.

    Composite Indexes. (manual)

    • An index can be on a set of columns, instead of a single column.
    • The speeds of getting results from a single-column key, and the first part of a composite key, seem about the same.
    • Note: a composite index cannot support range lookups on subsets on the index. I.e., for queries of this type: "SELECT ... FROM t WHERE key_part_1=X and key_part_2 between Y and Z", the composite index will only be used for the first criteria, and not the second one. There is a useful workaround for some cases: make a lookup table for all values of key_part_2, and join to that table, then use the inequality on the lookup table. This way both parts of the composite key can be used.

    Bitmap Indexes.

    Partial Indexes.

    Packed Keys.

    B-Tree, R-Tree, Hashes.

    Queries on Periodic-Snapshot Fact Tables

    • A periodic-snapshot fact table. This kind of table has a separate row for each combination of date and item: so it has at least two keys pointing to dimension tables, but probably more. You can have a uniqueness constraint on the date-item combination.
    • Order of joins. When you query this table -- joining it to the dimension tables -- the order in which the tables are joined can be important. The general issue is this: either you can generate every possible combination of dimensions which meet the criteria, and then look up each combination in the fact table; or you can find every entry in the fact table, then look up the dimension tables for each entry to check that it meets all the criteria. Generally you want this order: (unique dimensions), (fact table), (other dimensions); eg., if your fact table has a unique index on date and customer_id, then the join order should be: date_dimension, customer_dimension, facts, other_dimensions. Unfortunately Mysql often cannot tell that this is the best join order, so needs to be given hints.
    • Always join to the date_dimension. Most queries will only want to look at a certain specific date-range. If you are storing dates with the mysql "DATE" type you can generally write the query without joining to the date dimension. However, you will have this problem: no matter what indexes you have defined, mysql will not be able to use an index on any other data apart from the date. This is because when the dates are defined as a range (using "BETWEEN" or inequalities) mysql can use a "range" lookup on only a single part of an index for any table. By joining to the date_dimension, and making the range apply to that table, the lookups on the fact table can use all the parts of a composite index: eg., both item_key and date_key. (Note: Mysql can use a "range" with a "const" lookup, but it can't use a "range" with a "ref" lookup. All this is true of v4.1, I think it's also true of v5, manual).


  • Total Time: 0.06s
    REQUEST:Array
    (
        [debug] => 1
        [PHPSESSID] => b9a289e2959005ab5c102380940272d1
    )
    SESSION:Array
    (
        [body/body] => {incl:Indexing}
    )
    MESSAGES:PLUGINS: