Supporting duplicate primary key values in a load

In rare cases you may need to allow a single load of source data to contain records with duplicate primary key values—for example, when loading data from a small lab that may not be able to guarantee uniqueness. In those cases you can define a composite key, but it may not be sufficient to ensure uniqueness.

If you need to support duplicate primary key values within a single data load, check Supports Duplicate when you define the primary key for the table in the input data model. Selecting this option ensures that all records are loaded and not deleted but requires careful checking of the data.

When you select Supports Duplicate, the system adds the column CDR$DUP_NUM to the target table. During each data load, the system detects whether multiple incoming records have an identical primary key value and:

  • The system inserts a value of 1 to the CDR$DUP_NUM column for each record with the first occurrence of a set of primary key values.

  • If another record with the same primary key values is loaded in the same data load, the system inserts a value of 2 into its CDR$DUP_NUM column, and 3 for the third record with the same primary key values, and so on.

  • During subsequent data loads, the system assumes that the first record with a particular set of primary key values is the same as the existing one with the CDR$DUP_NUM value 1, the second is the same as 2, and so on. If two records exist with values 1 and 2, and the next load contains three records with the same values, the system gives the third record a CDR$DUP_NUM value of 3.

    Note:

    For this system to work, the lab must always reload all records in the same order, adding new records to the end of the file.

The CDR$DUP_NUM value becomes part of the surrogate key as well as the primary key and is used for data lineage tracing; see How the system tracks data lineage.