You have probably already heard about SCD if you are playing with Data Warehouses. If not, let me explain you what it is.
While building a Data Warehouse you will encounter the evolution of dimensions values through the time. Let’s introduce to different SCD types.
What is a Slowly Changing Dimension ?
If you are playing with data warhouses and data models, you already heard about SCD or Slowly Changing Dimensions. A SCD is a dimension with more or less static data which can change in an unpredictable way. To manage that behavior we can put in place different type of solutions.
Type 1
If you update the value in the dimension table. You will not have access to what were the data before the update. This is the Type 1 SCD.
Example :
The employee Robert Dunon is currently living in France (FR) but he planned to move in Belgium (BE)
Employee Id | Employee First Name | Employee Last Name | Employee Country |
1 | Hans | Deblijf | BE |
2 | Robert | Dunon | FR |
After the update, the record will contain BE for the country where Robert lives :
Employee Id | Employee First Name | Employee Last Name | Employee Country |
1 | Hans | Deblijf | BE |
2 | Robert | Dunon | BE |
The problem that can generates that dimension type is once updated, all the fact data linked to the modified dimension will have the new values, no history will be kept.
Type 2
But I want to have access to data history !
To have a view on what were the data you can implement a type 2 SCD. To do that, you need 3 extra columns :
- A start date
- An end date
- A IsCurrent column (is it the last version ?)
Employee Id | Employee First Name | Employee Last Name | Employee Country | Employee Start Date | Employee End Date | Employee IsCurrent |
1 | Hans | Deblijf | BE | 1901-01-01 | NULL | 1 |
2 | Robert | Dunon | FR | 1901-01-01 | 2022-08-03 | 0 |
3 | Robert | Dunon | BE | 2022-08-04 | NULL | 1 |
That way when an information about a record will change:
- An end date will be put in the column associated
- The IsCurrent flag will be updated to 0
- A record with the new informations will be added
Using a type 2 SCD, the data in the fact table linked to the old version of the record will stay the same and you will be able to access the history !
Type 3
Type 3 SCD is basically storing two versions of a targeted column in the same record by adding a new column (you can also add a “modified date” column to track when the modification happened)
Employee Id | Employee First Name | Employee Last Name | Current Employee Country | Original Employee Country | Creation date | Modified date |
1 | Hans | Deblijf | BE | BE | 2022-08-03 | 2022-08-03 |
2 | Robert | Dunon | FR | FR | 2022-08-03 | 2022-08-03 |
After the update:
Employee Id | Employee First Name | Employee Last Name | Current Employee Country | Original Employee Country | Creation date | Modified date |
1 | Hans | Deblijf | BE | BE | 2022-08-03 | 2022-08-03 |
2 | Robert | Dunon | BE | FR | 2022-08-03 | 2022-08-04 |
The information is updated in the “Current Employee Country” but the original value remains in the “Original Employee Country” and the “Modified Date” is updated !
Type 6
Type 6 SCD combines the type 1, 2 & 3 (6 = 1+2+3 ☺). You can take the structure of a type 2 SCD (Start Date, End Date and IsCurrent flag) and add the “Original value” column from the type 3 SCD.
Employee Id | Employee First Name | Employee Last Name | Current Employee Country | Original Employee Country | Employee Start Date | Employee End Date | Employee IsCurrent |
1 | Hans | Deblijf | BE | BE | 1901-01-01 | NULL | 1 |
2 | Robert | Dunon | FR | FR | 1901-01-01 | NULL | 1 |
After that Robert moved to BE :
Employee Id | Employee First Name | Employee Last Name | Current Employee Country | Original Employee Country | Employee Start Date | Employee End Date | Employee IsCurrent |
1 | Hans | Deblijf | BE | BE | 1901-01-01 | NULL | 1 |
2 | Robert | Dunon | BE | FR | 1901-01-01 | 2022-08-03 | 0 |
3 | Robert | Dunon | BE | BE | 2022-08-04 | NULL | 1 |
That way you can track modifications that happened and have an history of the record changes.
Conclusion
Depending of if you need data history or not, tracking changes in your dimensions, … you will be more likely to use a type instead of another one.
Those are the most commons SCD types. Others exists but are way less met. If you want more informations about them you can check the well-explained Wikipedia page about SCD :
https://en.wikipedia.org/wiki/Slowly_changing_dimension
And if you want to have more informations about Data Warehouses in general you can take a look at the famous “The Data Warehouse Toolkit” book from Ralph Kimball !
If you need more informations or that you don’t know how to implement SCD, leave a comment and I will be happy to help you ! ☺