Slowly Changing Dimensions – Yes but, which one ?

You have probably already heard about SCD if you are playing with Data Warehouses. If not, let me explain you what it is.

While building a Data Warehouse you will encounter the evolution of dimensions values through the time. Let’s introduce to different SCD types.

What is a Slowly Changing Dimension ?

If you are playing with data warhouses and data models, you already heard about SCD or Slowly Changing Dimensions. A SCD is a dimension with more or less static data which can change in an unpredictable way. To manage that behavior we can put in place different type of solutions.

Type 1

If you update the value in the dimension table. You will not have access to what were the data before the update. This is the Type 1 SCD.

Example :

The employee Robert Dunon is currently living in France (FR) but he planned to move in Belgium (BE)

Employee IdEmployee First NameEmployee Last NameEmployee Country
1HansDeblijfBE
2RobertDunonFR

After the update, the record will contain BE for the country where Robert lives :

Employee IdEmployee First NameEmployee Last NameEmployee Country
1HansDeblijfBE
2RobertDunonBE

The problem that can generates that dimension type is once updated, all the fact data linked to the modified dimension will have the new values, no history will be kept.

Type 2

But I want to have access to data history !

To have a view on what were the data you can implement a type 2 SCD. To do that, you need 3 extra columns :

  • A start date
  • An end date
  • A IsCurrent column (is it the last version ?)

Employee IdEmployee First NameEmployee Last NameEmployee CountryEmployee Start DateEmployee End DateEmployee IsCurrent
1HansDeblijfBE1901-01-01NULL1
2RobertDunonFR1901-01-012022-08-030
3RobertDunonBE2022-08-04NULL1

That way when an information about a record will change:

  • An end date will be put in the column associated
  • The IsCurrent flag will be updated to 0
  • A record with the new informations will be added

Using a type 2 SCD, the data in the fact table linked to the old version of the record will stay the same and you will be able to access the history !

Type 3

Type 3 SCD is basically storing two versions of a targeted column in the same record by adding a new column (you can also add a “modified date” column to track when the modification happened)

Employee IdEmployee First NameEmployee Last NameCurrent Employee CountryOriginal Employee Country Creation date Modified date
1HansDeblijfBEBE2022-08-032022-08-03
2RobertDunonFRFR2022-08-032022-08-03

After the update:

Employee IdEmployee First NameEmployee Last NameCurrent Employee CountryOriginal Employee CountryCreation dateModified date
1HansDeblijfBEBE2022-08-032022-08-03
2RobertDunonBEFR2022-08-032022-08-04

The information is updated in the “Current Employee Country” but the original value remains in the “Original Employee Country” and the “Modified Date” is updated !

Type 6

Type 6 SCD combines the type 1, 2 & 3 (6 = 1+2+3 ☺). You can take the structure of a type 2 SCD (Start Date, End Date and IsCurrent flag) and add the “Original value” column from the type 3 SCD.

Employee IdEmployee First NameEmployee Last NameCurrent Employee CountryOriginal Employee CountryEmployee Start DateEmployee End DateEmployee IsCurrent
1HansDeblijfBEBE1901-01-01NULL1
2RobertDunonFRFR1901-01-01NULL1

After that Robert moved to BE :

Employee IdEmployee First NameEmployee Last NameCurrent Employee CountryOriginal Employee CountryEmployee Start DateEmployee End DateEmployee IsCurrent
1HansDeblijfBEBE1901-01-01NULL1
2RobertDunonBEFR1901-01-012022-08-030
3RobertDunonBEBE2022-08-04NULL1

That way you can track modifications that happened and have an history of the record changes.

Conclusion

Depending of if you need data history or not, tracking changes in your dimensions, … you will be more likely to use a type instead of another one.

Those are the most commons SCD types. Others exists but are way less met. If you want more informations about them you can check the well-explained Wikipedia page about SCD :

https://en.wikipedia.org/wiki/Slowly_changing_dimension

And if you want to have more informations about Data Warehouses in general you can take a look at the famous “The Data Warehouse Toolkit” book from Ralph Kimball !

https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/

If you need more informations or that you don’t know how to implement SCD, leave a comment and I will be happy to help you ! ☺

Share