12. Slowly Changing Dimensions
Support primary role of data warehouse to describe
the past accurately
Maintain historical context as new or changed data is
loaded into dimension tables
Slowly Changing Dimension (SCD) types
Type 1: Overwrite the existing dimension record
Type 2: Insert a new ‘versioned’ dimension record
Type 3: Track limited history with attributes
The concept of Slowly Changing Dimensions was
introduced by Ralph Kimball
Dimensions reflect the business processes (functional structure), and measures reflect the numeric data flow. A dimensional model is made up of a central fact table (or tables) and its associated dimensions. The dimensional model is also called a star schema because it looks like a star, with the fact table in the middle and the dimensions serving as the points on the star. From a relational data modeling perspective, the dimensional model consists of a normalized fact table with denormalized dimension tables.
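As a rough illustration of the star shape described above, the following sketch builds a tiny in-memory schema with Python's standard `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and sample rows are all hypothetical and chosen only for this example; a real warehouse would have many more dimensions and attributes.

```python
import sqlite3


def build_star_schema():
    """Create a hypothetical star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key   INTEGER PRIMARY KEY,  -- surrogate key
            product_name  TEXT,
            product_group TEXT
        );
        CREATE TABLE dim_date (
            date_key       INTEGER PRIMARY KEY,
            calendar_month TEXT
        );
        CREATE TABLE fact_sales (
            product_key  INTEGER REFERENCES dim_product(product_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            sales_amount REAL,
            units_sold   INTEGER
        );
        INSERT INTO dim_product VALUES
            (1, 'Corn Flakes Mini-Pak', 'Cereal'),
            (2, 'Corn Flakes Jumbo',    'Cereal'),
            (3, 'Orange Juice',         'Beverage');
        INSERT INTO dim_date VALUES
            (20240101, '2024-01'),
            (20240201, '2024-02');
        INSERT INTO fact_sales VALUES
            (1, 20240101, 10.0, 5),
            (2, 20240101, 30.0, 6),
            (3, 20240101,  8.0, 4),
            (1, 20240201, 12.0, 6);
    """)
    return conn


def sales_by_group(conn):
    """Join the fact table to a dimension and sum an additive fact."""
    rows = conn.execute("""
        SELECT p.product_group, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.product_group
        ORDER BY p.product_group
    """).fetchall()
    return dict(rows)
```

The query constrains and groups by a dimension attribute (`product_group`) while summing a fact, which is the typical access pattern for a dimensional model.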
Think of dimensions as tables in a database, because that is how they are implemented. Each table contains a list of homogeneous entities: products in a manufacturing company, patients in a hospital, vehicles on auto insurance policies, or customers in just about every organization. Usually, a dimension includes all instances of its entity, for example all the products the company sells. There is only one active row for each particular instance in the table at any time, and each row has a set of attributes that identify, describe, define, and classify the instance. A product will have a certain size and a standard weight, and belong to a product group. These sizes and groups have descriptions; for example, a food product might come in Mini-Pak or Jumbo size. A vehicle is painted a certain color, like white, and has a certain option package, such as the Jungle Jim sports utility package (which includes side impact air bags, six-disc CD player, DVD system, and simulated leopard skin seats).
Most facts are numeric, and each fact value can vary widely depending on the business process being measured. Most facts are additive (such as dollar or unit sales), meaning they can be summed across all dimensions. Additivity is important because DW/BI applications seldom retrieve a single fact table record. User queries generally select hundreds or thousands of records at a time and add them up. Other facts are semi-additive (such as market share or account balance), and still others are non-additive (such as unit price).
Not all numeric data are facts. Exceptions include discrete descriptive information like package size or weight (which describes a product) or customer age (which describes a customer). Generally, these less volatile numeric values end up as descriptive attributes in dimension tables. Such descriptive information is more naturally used for constraining a query than for being summed in a computation. This distinction is helpful when deciding whether a data element is part of a dimension or a fact.
Some business processes track events without any real measures. If the event happens, we get an entry in the source system; if not, there is no row. Common examples of this kind of event include employment activities, such as hiring and firing, and event attendance, such as when a student attends a class. The fact tables that track these events typically do not have any actual fact measurements, so they are called factless fact tables. In practice, we usually add a column called something like EventCount that contains the number 1. This gives users an easy way to count the number of events by summing the EventCount fact.
Some facts are derived or computed from other facts, just as a Net Sale number is calculated from Gross Sales minus Sales Tax. Some semi-additive facts can be handled using a derived column that is based on the context of the query. Month End Balance would add up across accounts, but not across dates, for example. The non-additive Unit Price example could be avoided by defining it as a computation done in the query, which is Total Amount divided by Total Quantity. There are several options for dealing with these derived or computed facts. You can calculate them as part of the ETL process and store them in the fact table, you can put them in the fact table view definition, or you can include them in the definition of the Analysis Services database. The only approach we find unacceptable is to leave the calculation to the user.
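The difference between semi-additive, non-additive, and factless facts can be sketched in a few lines of Python. All row layouts, sample values, and function names below are hypothetical and exist only to illustrate the ideas in the notes above.

```python
from collections import defaultdict

# Hypothetical balance fact rows: (account, month, month_end_balance)
balances = [
    ("A", "2024-01", 100.0),
    ("B", "2024-01", 50.0),
    ("A", "2024-02", 120.0),
    ("B", "2024-02", 70.0),
]

# Hypothetical sales fact rows: (product, total_amount, total_quantity)
sales = [
    ("widget", 20.0, 4),   # row-level unit price 5.0
    ("widget", 40.0, 16),  # row-level unit price 2.5
    ("gadget", 45.0, 3),
]

# Hypothetical factless fact rows: each attendance event carries EventCount = 1
attendance = [
    {"student": "s1", "class": "math", "EventCount": 1},
    {"student": "s2", "class": "math", "EventCount": 1},
    {"student": "s1", "class": "art", "EventCount": 1},
]


def balance_by_month(rows):
    """Semi-additive: balances may be summed across accounts within a month,
    but the monthly totals must not be added together across dates."""
    totals = defaultdict(float)
    for _account, month, balance in rows:
        totals[month] += balance
    return dict(totals)


def unit_price(rows, product):
    """Non-additive ratio: sum the additive components first, then divide
    (Total Amount / Total Quantity) instead of summing stored prices."""
    amount = sum(a for p, a, _q in rows if p == product)
    quantity = sum(q for p, _a, q in rows if p == product)
    return amount / quantity


def event_count(rows):
    """Factless fact table: summing EventCount counts the events."""
    return sum(row["EventCount"] for row in rows)
```

Note that `unit_price(sales, "widget")` divides the summed amount by the summed quantity; averaging the two row-level prices would give a different and misleading number.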
A surrogate key is a unique value, usually an integer, assigned to each row in the dimension. This surrogate key becomes the primary key of the dimension table and is used to join the dimension to the associated foreign key field in the fact table. Surrogate keys protect the DW/BI system from changes in the source system. Surrogate keys allow the DW/BI system to integrate data from multiple source systems. Different source systems might keep data on the same customers or products, but with different keys. Surrogate keys enable you to add rows to dimensions that do not exist in the source system. Surrogate keys provide the means for tracking changes in dimension attributes over time.
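A minimal sketch of surrogate key assignment follows, assuming a simple in-memory lookup from (source system, natural key) to an integer. The class name `SurrogateKeyGenerator` and the source system names are hypothetical; production ETL tools usually rely on database sequences or identity columns instead.

```python
class SurrogateKeyGenerator:
    """Assigns a stable integer surrogate key to each (source, natural key) pair."""

    def __init__(self):
        self._next_key = 1
        self._lookup = {}  # (source_system, natural_key) -> surrogate key

    def key_for(self, source_system, natural_key):
        """Return the existing surrogate key, or allocate the next integer."""
        business_key = (source_system, natural_key)
        if business_key not in self._lookup:
            self._lookup[business_key] = self._next_key
            self._next_key += 1
        return self._lookup[business_key]


generator = SurrogateKeyGenerator()
# The same natural key "1001" in two hypothetical source systems
# receives two distinct surrogate keys in the warehouse.
crm_key = generator.key_for("crm", "1001")
billing_key = generator.key_for("billing", "1001")
# A row that exists in no source system (for example an "Unknown" member)
# can still be given a key of its own.
unknown_key = generator.key_for("warehouse", "UNKNOWN")
```

Because the fact table stores only the surrogate key, a change to the source system's key format affects the lookup, not the fact rows.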
Slowly changing dimensions (SCDs) are dimensions whose attribute values can change over time. There are three types of SCD. Type 1 overwrites the existing attribute value with the new value; a Type 1 change does not preserve the attribute value that was in place at the time a historical transaction occurred. Type 2 change tracking is a powerful technique for capturing the attribute values that were in effect at a point in time and relating them to the business events in which they participated. When a change to a Type 2 attribute occurs, the ETL process creates a new row in the dimension table to capture the new values of the changed item. Type 3 keeps separate columns for the old and the new attribute values. Type 3 is less common because it involves changing the physical tables and is not very scalable.
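The three SCD types can be contrasted with a small sketch that treats a dimension as a list of dictionaries. The column names (`surrogate_key`, `natural_key`, `valid_from`, `valid_to`, `is_current`) and the `previous_` prefix are assumptions made for this example, not a prescribed layout.

```python
from datetime import date


def apply_type1(dimension, natural_key, attribute, new_value):
    """Type 1: overwrite in place; the previous value is lost."""
    for row in dimension:
        if row["natural_key"] == natural_key:
            row[attribute] = new_value


def apply_type2(dimension, natural_key, attribute, new_value, change_date):
    """Type 2: expire the current row and append a new versioned row."""
    current = next(
        row for row in dimension
        if row["natural_key"] == natural_key and row["is_current"]
    )
    current["is_current"] = False
    current["valid_to"] = change_date
    # Copy the old row, then apply the changed attribute and a new key.
    new_row = dict(current, **{attribute: new_value})
    new_row.update(
        surrogate_key=max(row["surrogate_key"] for row in dimension) + 1,
        valid_from=change_date,
        valid_to=None,
        is_current=True,
    )
    dimension.append(new_row)
    return new_row["surrogate_key"]


def apply_type3(dimension, natural_key, attribute, new_value):
    """Type 3: keep the prior value in a separate 'previous_<attribute>' column."""
    for row in dimension:
        if row["natural_key"] == natural_key:
            row["previous_" + attribute] = row[attribute]
            row[attribute] = new_value


def sample_dimension():
    """Hypothetical one-row customer dimension used to try the functions."""
    return [{
        "surrogate_key": 1, "natural_key": "C1", "city": "Oslo",
        "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True,
    }]
```

After `apply_type2`, fact rows loaded before the change still point at surrogate key 1 (the old city), while new fact rows use the returned key, which is how Type 2 preserves historical context.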