How the system tracks data lineage

As data flows from input clinical data models to successive target models, the system stores information that links each data item both to the data items that contributed to it in "upstream" input and target models and to the data items it contributes to in "downstream" target models. Data reviewers can see a data item's source and target lineage in the Listings pages.

Maintaining this context, or data lineage, is required to pass discrepancies back and forth between DMW and its data sources and to recognize a discrepancy as the same discrepancy in all models.

To track data lineage, the system uses the following:

  • Mappings: The system uses the table and column mappings you define as part of a transformation to generate record-level data mappings during transformation and validation check execution.

  • Generated Surrogate Keys: The system generates a surrogate key value for each record by concatenating a generated table identifier and the values in the primary key columns in the order specified in the primary key constraint, separated by tildes (~). For example, table_ID~subject~visit~crf~test.

  • Generated Columns to Store Surrogate Keys:

    • When a clinical data model is installed, the system adds one auxiliary column named CDR$SKEY to each table to store the surrogate key value for each record in the table.

    • When a transformation program is installed, the system adds one auxiliary column to each target table for each of the program's source tables, to store the surrogate key values of the source records.
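
The surrogate key format and the auxiliary columns described above can be illustrated with a minimal Python sketch. It is not the system's implementation, which runs inside the database. The table identifiers (T101, T205), the sample records, and the source-key column name SRC_T101_SKEY are hypothetical and chosen only for illustration; only the CDR$SKEY column name and the tilde-separated key format come from the description above.

```python
SEPARATOR = "~"  # surrogate key parts are separated by tildes


def build_surrogate_key(table_id, record, pk_columns):
    """Join the table identifier and the primary key values, in the
    order given by the primary key constraint, with tildes."""
    parts = [str(table_id)] + [str(record[col]) for col in pk_columns]
    return SEPARATOR.join(parts)


# Hypothetical source table with a four-column primary key.
source_pk = ["subject", "visit", "crf", "test"]
source_record = {"subject": "1001", "visit": "V2", "crf": "LAB",
                 "test": "ALT", "value": 42}
source_record["CDR$SKEY"] = build_surrogate_key("T101", source_record, source_pk)

# Hypothetical target table populated by a transformation.
target_pk = ["usubjid", "visitnum", "lbtestcd"]
target_record = {"usubjid": "1001", "visitnum": "2",
                 "lbtestcd": "ALT", "lbstresn": 42}

# The target record's own surrogate key.
target_record["CDR$SKEY"] = build_surrogate_key("T205", target_record, target_pk)

# One auxiliary column per source table holds the source record's key.
# The column name below is an assumption made for this sketch.
target_record["SRC_T101_SKEY"] = source_record["CDR$SKEY"]

# Tracing lineage upstream: look up the source record by its stored key.
source_index = {source_record["CDR$SKEY"]: source_record}
upstream = source_index[target_record["SRC_T101_SKEY"]]

print(source_record["CDR$SKEY"])
print(target_record["CDR$SKEY"])
print(upstream["value"])
```

Because each target record carries the surrogate key of every source record that contributed to it, lineage can be followed in either direction by matching on these key values, which is what allows a discrepancy raised in one model to be recognized as the same discrepancy in the others.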