Saturday, 26 May 2012

Slowly Changing Dimensions in odi

Slowly Changing Dimensions
Type 2 Slowly Changing Dimension (SCD) is a strategy used for loading data warehouses. It is often used for loading dimension tables, in order to keep track of changes on specific columns. A typical slowly changing dimension table would contain the flowing columns:
  • A surrogate key. This is usually a numeric column containing an automatically-generated number (using an identity column, a rank function or a sequence).
  • A natural key. This is the list of columns that represent the primary key of the operational system.
  • Columns that one must overwritten on change.
  • Columns that require to add row on change.
  • A starting timestamp column indicating when the record was created in the data warehouse
  • An ending timestamp column indicating when the record became obsolete (closing date)
  • A current record flag indicating whether the record is the actual one (1) or an old one (0)
The following example illustrate the Slowly Changing Dimension behavior.
In the operational system, a product is defined by its ID that acts as a primary key. Every product has a name, a size, a supplier and a family. In the Data Warehouse a new version of this product is stored whenever the supplier or the family is updated in the operational system.
Figure 6-1 Type 2 Slow Changing Dimensions Example
Description of Figure 6-1 follows
Description of "Figure 6-1 Type 2 Slow Changing Dimensions Example"
In this example, the product dimension is first initialized in the Data Warehouse on March 12, 2006. All the records are inserted and are assigned a calculated surrogate key as well as a fake ending date set to January 1, 2400. As these records represent the current state of the operational system, their current record flag is set to 1. After the first load, the following changes happen in the operational system:
  1. The supplier is updated for product P1
  2. The family is updated for product P2
  3. The name is updated for product P3
  4. Product P5 is added
These updates have the following impact on the data warehouse dimension:
  • The update of the supplier of P1 is translated into the creation of a new current record (Surrogate Key 5) and the closing of the previous record (Surrogate Key 1)
  • The update of the family of P2 is translated into the creation of a new current record (Surrogate Key 6) and the closing of the previous record (Surrogate Key 2)
  • The update of the name of P3 simply updates the target record with Surrogate Key 3
  • The new product P5 is translated into the creation of a new current record (Surrogate Key 7).
To create a Knowledge Module that implements this behavior, it is necessary to know which columns act as a surrogate key, a natural key, a start date etc. Oracle Data Integrator stores this information in Slowly Changing Dimension Behavior field in the Description tab for every column in the model.
When populating such a datastore in an interface, the IKM has access to this metadata using the SCD_xx selectors on the getColList() substitution method.
The way Oracle Data Integrator implements Type 2 Slowly Changing Dimensions is described below:
  1. Drop (if it exists) and create the integration table in the staging area.
  2. Insert the flow data in the integration table using only mappings that apply to the natural key, overwrite on change and add row on change columns. Set the starting timestamp to the current date and the ending timestamp to a constant.
  3. Recycle previous rejected records
  4. Call the CKM to perform a data quality check on the flow
  5. Flag the records in the integration table to 'U' when the natural key and the add row on change columns have not changed compared to the current records of the target.
  6. Update the target with the columns flagged overwrite on change by using the integration table content filtered on the 'U' flag.
  7. Close old records - those for which the natural key exists in the integration table, and set their current record flag to 0 and their ending timestamp to the current date
  8. Insert the new changing records with their current record flag set to 1
  9. Drop the integration table.
Again, this approach can be adapted. There may be some cases where the SQL produced requires further tuning and optimization.

No comments:

Post a Comment