[e]
[e]
[e]

Data Warehousing with MySql

Data Warehousing is the idea of setting up a separate database structure for purely reporting purposes, so you can stop compromising between reporting and transactional demands.
MySql is a database that is fast, easy to use, and (generally) free.

> Aggregation
> Benchmarks
> ClickstreamData
> DatesAndTimes
> Home
> Indexing
> KPIs
> Links
> MathML
> Money
> MysqlCompared
> PhpPlugins
> Recipe
> Regression
> Scripting
> Sudoku
> SummaryData
> TableTypes
> TuningMysql
New Page:

search:

>
edit css
show debugging
contact: tom (.) cunningham (at) gmail (dot) com
[e]
[e]

Dates & Times




Dilemma: date as start-point or end-point


When building a by-date aggregate fact table, should the data indexed under a given date be (a) all the events that happened up until the start of that day; or (b) all the events that happened in the 24 hours of that day.

 Reason for date as end-point: Some fact columns will be instantaneous snapshots of data -- eg., account-balance -- which have to be interpreted as being true at some instant, not over a period of time. Then it's natural that for the row with date "2005-01-05", if the balance=£50, then the balance was £50 at "2005-01-05 00:00:00", not at "2005-01-06 00:00:00". And if this is true, then it makes sense to treat the other columns -- which sum activity over the day -- to sum up until the same instant at which the balance is taken.

 Reason for date as start-point: For the fact columns that are aggregations of activity over the day, as transactions=50 for "2005-01-05", the natural interpretation is that the activity took place in the 24 hours subsequent to that point in day.

Personally, I use the data as the start-point, and haven't had too many problems with it so far.

Date Dimension Table


This SQL will create a dimension table with 3000 dates in it, & columns for restricting dates.




###### small-numbers table
 DROP TABLE IF EXISTS numbers_small;
CREATE TABLE numbers_small (number INT);
INSERT INTO numbers_small VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

###### main numbers table
 DROP TABLE IF EXISTS numbers;
CREATE TABLE numbers (number BIGINT);
INSERT INTO numbers
SELECT thousands.number * 1000 + hundreds.number * 100 + tens.number * 10 + ones.number
 FROM numbers_small thousands, numbers_small hundreds, numbers_small tens, numbers_small ones
LIMIT 1000000;

###### date table
 DROP TABLE IF EXISTS dates;
CREATE TABLE dates (
      date_id          BIGINT PRIMARY KEY,
      date             DATE NOT NULL,
      timestamp        BIGINT NOT NULL,
      uk_holiday       CHAR(11) NOT NULL DEFAULT "Not Holiday",
      next_business_day date NOT NULL default '0000-00-00',
      weekend          CHAR(10) NOT NULL DEFAULT "Weekday",
      day_of_week      CHAR(10) NOT NULL,
      month            CHAR(10) NOT NULL,
      month_day        INT NOT NULL,
      year             INT NOT NULL,
      week_starting_monday CHAR(2) NOT NULL,
     UNIQUE KEY `date` (`date`),
     KEY `year_week` (`year`,`week_starting_monday`)
);

###### populate it with days
INSERT INTO dates (date_id, date)
SELECT number, DATE_ADD( '2000-01-01', INTERVAL number DAY )
 FROM numbers
WHERE DATE_ADD( '2000-01-01', INTERVAL number DAY ) BETWEEN '2000-01-01' AND '2010-01-01'
ORDER BY number;

###### fill in other rows
UPDATE dates SET
      timestamp=   UNIX_TIMESTAMP(date),
      day_of_week= DATE_FORMAT( date, "%W" ),
      weekend=     IF( DATE_FORMAT( date, "%W" ) IN ('Saturday','Sunday'), 'Weekend', 'Weekday'),
      month=       DATE_FORMAT( date, "%M"),
      year =       DATE_FORMAT( date, "%Y" ),
      month_day =  DATE_FORMAT( date, "%d" );

UPDATE dates SET week_starting_monday=DATE_FORMAT(date,'%v');


UK Holidays 2005-2008 & Next-Business-Day



UPDATE dates
  SET holiday='Holiday'
WHERE date IN
('2005-01-03', '2006-01-02', '2007-01-01', '2008-01-01', # New year's day
'2005-03-25', '2006-04-14', '2007-04-06', '2008-03-21', # Good Friday
'2005-03-28', '2006-04-17', '2007-04-09', '2008-03-24', # Easter Monday
'2005-05-02', '2006-05-01', '2007-05-07', '2008-05-05', # Early May Bank Holiday
'2005-05-30', '2006-05-29', '2007-05-28', '2008-05-26', # Spring Bank Holiday
'2005-08-29', '2006-08-28', '2007-08-27', '2008-08-25', # Summer Bank Holiday
'2005-12-27', '2006-12-25', '2007-12-25', '2008-12-25', # Christmas Day
'2005-12-26', '2006-12-26', '2007-12-26', '2008-12-26'  # Boxing Day
);


# calculate the next_business_day:

CREATE TEMPORARY TABLE dates_copy LIKE dates;
INSERT INTO dates_copy SELECT * FROM dates;

UPDATE dates
  SET next_business_day= (SELECT MIN(date)
                            FROM dates_copy d2
                           WHERE d2.date>dates.date
                             AND weekend='Weekday'
                             AND uk_holiday='Not Holiday');


Links


  • Every page and book seems to say the best site is "holidayfestival.com", but that site is now defunct.
  • Algorithms for generating US holidays [smart.net]
  • Calendar FAQ [tondering.dk]
  • UK Bank Holidays [dti.gov.uk]


  • Total Time: 0.06s
    REQUEST:Array
    (
        [link_body/body] => {incl:DatesAndTimes}
        [PHPSESSID] => b9a289e2959005ab5c102380940272d1
    )
    SESSION:Array
    (
        [body/body] => {incl:DatesAndTimes}
    )
    MESSAGES:PLUGINS: