[e][e]Dates & Times
Dilemma: date as start-point or end-point
When building a by-date aggregate fact table, should the data indexed under a given date be (a) all the events that happened up until the start of that day; or (b) all the events that happened in the 24 hours of that day.
Reason for date as end-point: Some fact columns will be instantaneous snapshots of data -- eg., account-balance -- which have to be interpreted as being true at some instant, not over a period of time. Then it's natural that for the row with date "2005-01-05", if the balance=£50, then the balance was £50 at "2005-01-05 00:00:00", not at "2005-01-06 00:00:00". And if
this is true, then it makes sense to treat the other columns -- which sum activity over the day -- to sum up until the same instant at which the balance is taken.
Reason for date as start-point: For the fact columns that are aggregations of activity over the day, as transactions=50 for "2005-01-05", the natural interpretation is that the activity took place in the 24 hours subsequent to that point in day.
Personally, I use the data as the start-point, and haven't had too many problems with it so far.
Date Dimension Table
This SQL will create a dimension table with 3000 dates in it, & columns for restricting dates.
###### small-numbers table
DROP TABLE IF EXISTS numbers_small;
CREATE TABLE numbers_small (number INT);
INSERT INTO numbers_small VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
###### main numbers table
DROP TABLE IF EXISTS numbers;
CREATE TABLE numbers (number BIGINT);
INSERT INTO numbers
SELECT thousands.number * 1000 + hundreds.number * 100 + tens.number * 10 + ones.number
FROM numbers_small thousands, numbers_small hundreds, numbers_small tens, numbers_small ones
LIMIT 1000000;
###### date table
DROP TABLE IF EXISTS dates;
CREATE TABLE dates (
date_id BIGINT PRIMARY KEY,
date DATE NOT NULL,
timestamp BIGINT NOT NULL,
uk_holiday CHAR(11) NOT NULL DEFAULT "Not Holiday",
next_business_day date NOT NULL default '0000-00-00',
weekend CHAR(10) NOT NULL DEFAULT "Weekday",
day_of_week CHAR(10) NOT NULL,
month CHAR(10) NOT NULL,
month_day INT NOT NULL,
year INT NOT NULL,
week_starting_monday CHAR(2) NOT NULL,
UNIQUE KEY `date` (`date`),
KEY `year_week` (`year`,`week_starting_monday`)
);
###### populate it with days
INSERT INTO dates (date_id, date)
SELECT number, DATE_ADD( '2000-01-01', INTERVAL number DAY )
FROM numbers
WHERE DATE_ADD( '2000-01-01', INTERVAL number DAY ) BETWEEN '2000-01-01' AND '2010-01-01'
ORDER BY number;
###### fill in other rows
UPDATE dates SET
timestamp= UNIX_TIMESTAMP(date),
day_of_week= DATE_FORMAT( date, "%W" ),
weekend= IF( DATE_FORMAT( date, "%W" ) IN ('Saturday','Sunday'), 'Weekend', 'Weekday'),
month= DATE_FORMAT( date, "%M"),
year = DATE_FORMAT( date, "%Y" ),
month_day = DATE_FORMAT( date, "%d" );
UPDATE dates SET week_starting_monday=DATE_FORMAT(date,'%v');
UK Holidays 2005-2008 & Next-Business-Day
UPDATE dates
SET holiday='Holiday'
WHERE date IN
('2005-01-03', '2006-01-02', '2007-01-01', '2008-01-01', # New year's day
'2005-03-25', '2006-04-14', '2007-04-06', '2008-03-21', # Good Friday
'2005-03-28', '2006-04-17', '2007-04-09', '2008-03-24', # Easter Monday
'2005-05-02', '2006-05-01', '2007-05-07', '2008-05-05', # Early May Bank Holiday
'2005-05-30', '2006-05-29', '2007-05-28', '2008-05-26', # Spring Bank Holiday
'2005-08-29', '2006-08-28', '2007-08-27', '2008-08-25', # Summer Bank Holiday
'2005-12-27', '2006-12-25', '2007-12-25', '2008-12-25', # Christmas Day
'2005-12-26', '2006-12-26', '2007-12-26', '2008-12-26' # Boxing Day
);
# calculate the next_business_day:
CREATE TEMPORARY TABLE dates_copy LIKE dates;
INSERT INTO dates_copy SELECT * FROM dates;
UPDATE dates
SET next_business_day= (SELECT MIN(date)
FROM dates_copy d2
WHERE d2.date>dates.date
AND weekend='Weekday'
AND uk_holiday='Not Holiday');
Links
Every page and book seems to say the best site is "holidayfestival.com", but that site is now defunct.
Algorithms for generating US holidays [smart.net]
Calendar FAQ [tondering.dk]
UK Bank Holidays [dti.gov.uk]