Saturday, 26 May 2012

Check Knowledge Modules (CKM)

The CKM checks that the records of a data set are consistent with defined constraints. It is used to maintain data integrity and contributes to the overall data quality initiative. The CKM can be used in two ways:
  • To check the consistency of existing data. This can be done on any datastore, or within interfaces by setting the STATIC_CONTROL option to "Yes". When a datastore is checked directly, the data checked is the data currently in that datastore. When the check runs within an interface, the data in the target datastore is checked after it is loaded.
  • To check the consistency of incoming data before the records are loaded into a target datastore. This is done by using the FLOW_CONTROL option. In this case, the CKM simulates the constraints of the target datastore on the resulting flow prior to writing to the target.
In summary: the CKM can check either an existing table or the temporary "I$" table created by an IKM.
The CKM accepts a set of constraints and the name of the table to check. It creates an "E$" error table to which it writes all rejected records. The CKM can also remove the erroneous records from the checked result set.
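To make that contract concrete, here is a minimal sketch in plain Python rather than real CKM code. The `Constraint` type, the `run_check` function, and the `ERR_CONSTRAINT` field are hypothetical names chosen for illustration. The sketch only mirrors the idea of taking constraints plus a data set, collecting rejects in an error list that stands in for the "E$" table, and optionally removing them from the checked set.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record, keyed by column name


@dataclass
class Constraint:
    """A named rule (hypothetical type); is_valid returns True when a row passes."""
    name: str
    is_valid: Callable[[Row], bool]


def run_check(rows, constraints, remove_rejected=False):
    """Return (checked_rows, error_rows); error_rows stands in for the "E$" table."""
    error_rows = []
    kept = []
    for row in rows:
        violated = [c.name for c in constraints if not c.is_valid(row)]
        # Write one error record per violated constraint.
        for name in violated:
            error_rows.append({**row, "ERR_CONSTRAINT": name})
        # Rejected rows stay in the checked set unless removal was requested.
        if not violated or not remove_rejected:
            kept.append(row)
    return kept, error_rows


customers = [
    {"ID": 1, "NAME": "Ann", "AGE": 34},
    {"ID": 2, "NAME": None, "AGE": 51},
    {"ID": 3, "NAME": "Bob", "AGE": -5},
]
constraints = [
    Constraint("NAME_NOT_NULL", lambda r: r["NAME"] is not None),
    Constraint("AGE_POSITIVE", lambda r: r["AGE"] >= 0),
]

kept, errors = run_check(customers, constraints, remove_rejected=True)
print([r["ID"] for r in kept])                            # [1]
print([(e["ID"], e["ERR_CONSTRAINT"]) for e in errors])   # [(2, 'NAME_NOT_NULL'), (3, 'AGE_POSITIVE')]
```

With `remove_rejected=False`, the same call would report the same two errors but leave all three records in the checked set.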
The following figures show how a CKM operates in both STATIC_CONTROL and FLOW_CONTROL modes.
Figure 1-2 Check Knowledge Module (STATIC_CONTROL)
In STATIC_CONTROL mode, the CKM reads the constraints of the table and checks them against the data of the table. Records that don't match the constraints are written to the "E$" error table in the staging area.
Figure 1-3 Check Knowledge Module (FLOW_CONTROL)
In FLOW_CONTROL mode, the CKM reads the constraints of the target table of the Interface. It checks these constraints against the data contained in the "I$" flow table of the staging area. Records that violate these constraints are written to the "E$" table of the staging area.
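The flow check can be sketched with Python's built-in sqlite3 module standing in for the staging area. This is an illustration under assumptions, not the SQL a real CKM generates: the ORDERS table, its "AMOUNT > 0" condition, and the ERR_MESS column are invented for the example. It also assumes the rejected rows are removed from the flow table before the load, so only clean records reach the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Target table, flow table and error table (all names are illustrative).
cur.execute('CREATE TABLE "ORDERS" (ORDER_ID INTEGER, AMOUNT REAL)')
cur.execute('CREATE TABLE "I$_ORDERS" (ORDER_ID INTEGER, AMOUNT REAL)')
cur.execute('CREATE TABLE "E$_ORDERS" (ERR_MESS TEXT, ORDER_ID INTEGER, AMOUNT REAL)')

# Rows staged in the flow table, as an IKM would do before the check.
cur.executemany('INSERT INTO "I$_ORDERS" VALUES (?, ?)',
                [(10, 99.5), (11, -4.0), (12, 20.0)])

# Assumed condition declared on the target: AMOUNT > 0.
violates = "NOT (AMOUNT > 0)"

# Copy violating flow rows into the error table.
cur.execute(
    'INSERT INTO "E$_ORDERS" '
    f'SELECT ?, ORDER_ID, AMOUNT FROM "I$_ORDERS" WHERE {violates}',
    ("Condition AMOUNT > 0 violated",),
)
# Remove them from the flow so they never reach the target.
cur.execute(f'DELETE FROM "I$_ORDERS" WHERE {violates}')

# Load the remaining, clean flow into the target.
cur.execute('INSERT INTO "ORDERS" SELECT ORDER_ID, AMOUNT FROM "I$_ORDERS"')
conn.commit()

loaded = cur.execute('SELECT ORDER_ID FROM "ORDERS" ORDER BY ORDER_ID').fetchall()
rejected = cur.execute('SELECT ORDER_ID, ERR_MESS FROM "E$_ORDERS"').fetchall()
print(loaded)    # [(10,), (12,)]
print(rejected)  # [(11, 'Condition AMOUNT > 0 violated')]
```

The target table is never touched by the check itself; the constraint is evaluated entirely against the flow table.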
In both cases, a CKM usually performs the following tasks:
  1. Create the "E$" error table on the staging area. The error table should contain the same columns as the datastore, plus additional columns to trace the error message, the check origin, the check date, and so on.
  2. Isolate the erroneous records in the "E$" table for each primary key, alternate key, foreign key, condition, and mandatory column that needs to be checked.
  3. If required, remove erroneous records from the table that has been checked.
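The three tasks above can be sketched as follows, again using sqlite3 as a stand-in for the staging area. The CUSTOMER table, the tracing column names (ERR_MESS, ORIGIN, CHECK_DATE, SRC_ROWID), and the two checks are assumptions made for this example; the statements a real CKM issues depend on the technology and on the constraints declared in the model.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Datastore to check (illustrative data: one missing NAME, one duplicated key).
cur.execute("CREATE TABLE CUSTOMER (CUST_ID INTEGER, NAME TEXT)")
cur.executemany("INSERT INTO CUSTOMER VALUES (?, ?)",
                [(1, "Ann"), (2, None), (3, "Cy"), (3, "Di")])

# Task 1: error table = tracing columns + the datastore's own columns.
cur.execute('''CREATE TABLE "E$_CUSTOMER" (
    ERR_MESS TEXT, ORIGIN TEXT, CHECK_DATE TEXT, SRC_ROWID INTEGER,
    CUST_ID INTEGER, NAME TEXT)''')

# Task 2: one INSERT ... SELECT per constraint to isolate its violations.
checks = {
    "Mandatory column NAME is null": "NAME IS NULL",
    "Primary key CUST_ID is duplicated":
        "CUST_ID IN (SELECT CUST_ID FROM CUSTOMER "
        "GROUP BY CUST_ID HAVING COUNT(*) > 1)",
}
today = date.today().isoformat()
for message, predicate in checks.items():
    cur.execute(
        'INSERT INTO "E$_CUSTOMER" '
        f"SELECT ?, ?, ?, rowid, CUST_ID, NAME FROM CUSTOMER WHERE {predicate}",
        (message, "CUSTOMER check", today),
    )

# Task 3 (optional): remove the isolated records from the checked table.
remove_errors = True
if remove_errors:
    cur.execute('DELETE FROM CUSTOMER WHERE rowid IN '
                '(SELECT SRC_ROWID FROM "E$_CUSTOMER")')
conn.commit()

error_count = cur.execute('SELECT COUNT(*) FROM "E$_CUSTOMER"').fetchone()[0]
remaining = cur.execute("SELECT CUST_ID, NAME FROM CUSTOMER").fetchall()
print(error_count)  # 3
print(remaining)    # [(1, 'Ann')]
```

Recording the source row identifier in the error table is a design choice of this sketch: it lets the cleanup step delete exactly the rows that were isolated, even when a rejected record contains NULLs or shares its key with another record.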
