Importing data to a sink with an IDENTITY column

Learn how to import data to a sink that includes an IDENTITY column.

You can import the data from a valid source to a sink table (on-premises/Cloud Services) that includes an IDENTITY column. An IDENTITY column as either GENERATED ALWAYS AS IDENTITY or GENERATED BY DEFAULT AS IDENTITY. For more information about creating tables with an IDENTITY column, see Creating Tables With an IDENTITY Column in the SQL Reference Guide.

Note:

Using an IDENTITY column in any table does not force uniqueness. If your application requires unique values for every row of an IDENTITY column, as a recommended practice, you must create the column as GENERATED ALWAYS AS IDENTITY, and never permit any use of the CYCLE attribute for sequence generator.

Sink table with IDENTITY column as GENERATED ALWAYS AS IDENTITY

A sink table with the primary key as an IDENTITY column created as GENERATED ALWAYS AS IDENTITY is especially useful when migrating source data where the primary key is intended to be system-generated. During migration, the sequence generator continues producing new identity values for the primary key column even when the table already contains rows. This helps avoid primary key conflicts and potential constraint-violation errors.

Additionally, during parallel execution, the generated identity values remain distinct across concurrent writes. That is, each migrated row gets a different identity value even when multiple threads insert at the same time.

For example, you want to import data from a JSON file source to the Oracle NoSQL Database table as the sink.

Consider the following schema for the sink table:

CREATE TABLE IF NOT EXISTS migrateID(ID INTEGER GENERATED ALWAYS AS IDENTITY, name STRING, course STRING, PRIMARY KEY(ID))

The Migrator utility handles the data migration as described in the following cases:

Source condition User action Migration outcome

CASE 1: Source data does not supply a value for the IDENTITY field of the sink table.

Example: JSON source file sample_noID.json

{"name":"John", "course":"Computer Science"}
{"name":"Jane", "course":"BioTechnology"}
{"name":"Tony", "course":"Electronics"}

Create/generate the configuration file.

Data migration is successful. IDENTITY column values are autogenerated. Migrated data in Oracle NoSQL Database sink table migrateID:

{"ID":1001,"name":"Jane","course":"BioTechnology"}
{"ID":1003,"name":"John","course":"Computer Science"}
{"ID":1002,"name":"Tony","course":"Electronics"}

CASE 2: Source data supplies values for the IDENTITY field of the sink table.

Example: JSON source file sampleID.json

{"ID":1, "name":"John", "course":"Computer Science"}
{"ID":2, "name":"Jane", "course":"BioTechnology"}
{"ID":3, "name":"Tony", "course":"Electronics"}

Create/generate the configuration file.

Data migration is successful. Because the primary key column is GENERATED ALWAYS AS IDENTITY, user-supplied values are ignored and the sink table's IDENTITY column values are autogenerated.

Migrated data in Oracle NoSQL Database sink table migrateID:
{"ID":2003,"name":"John","course":"Computer Science"}
{"ID":2002,"name":"Tony","course":"Electronics"}
{"ID":2001,"name":"Jane","course":"BioTechnology"}

In this example, notice that the user-supplied ID=1..3, but the sink table generates the ID=2001..2003. These ID values are determined by the attributes of the underlying sequence generator.

If you delete the migrated rows and reimport the same source file, the IDs will change. The sequence generator will continue from the last generated value, producing new IDs.

Note:

The order of generated IDs may not match the source order.
As the primary key column is defined as GENERATED ALWAYS AS IDENTITY, the sink table ignores user-supplied values to that column and generates its own values. To migrate the user-supplied ID values as-is, follow these steps:
  1. Drop the IDENTITY property on the primary key column so the sink can insert user-supplied ID values:
    ALTER TABLE migrateID (MODIFY ID DROP IDENTITY);
  2. Create/generate the configuration file and run the migration.

Note:

In the sink configuration template, you can set overwrite=true if you want to overwrite existing rows when duplicate primary key values are encountered.
Data migration is successful. The user-supplied ID values from the source are copied into the ID column of the sink table migrateID.
{"ID":2,"name":"Jane","course":"BioTechnology"}
{"ID":3,"name":"Tony","course":"Electronics"}
{"ID":1,"name":"John","course":"Computer Science"}
You can now change the schema of the sink table back to using an IDENTITY column as follows:
  1. To avoid conflicts between newly generated IDs and the migrated IDs, set the sequence generator's START WITH attribute to a value greater than the current maximum ID.You can identify the maximum value of the ID field using the following query:
    SELECT ID FROM migrateID ORDER BY ID DESC LIMIT 1
    Sample output:
    {"ID":3}
    In this example, the maximum value of the ID column is 3. Therefore, set the START WITH to any number greater than 3.
  2. Use the ALTER TABLE statement to modify the primary key column (ID) to be an IDENTITY column and set the sequence generator's START WITH value to 4 as follows:
    ALTER Table migrateID (MODIFY ID GENERATED ALWAYS AS IDENTITY (START WITH 4))
    After this, the sequence generator will generate ID values starting from 4 while inserting a new row.

Sink table with IDENTITY column as GENERATED BY DEFAULT AS IDENTITY

For a sink table with the IDENTITY column created as GENERATED BY DEFAULT AS IDENTITY, the data import is dependent on whether or not the source supplies the values to the IDENTITY column and ignoreFields transformation parameter.

For example, you want to import data from a JSON file source to the Oracle NoSQL Database table as the sink.

Consider the following schema for the sink table:

CREATE TABLE IF NOT EXISTS migrateID(ID INTEGER GENERATED BY DEFAULT AS IDENTITY, name STRING, course STRING, PRIMARY KEY
      (ID))
The Migrator utility handles the data migration as described in the following cases:
Source condition User action Migration outcome

CASE 1: Source data does not supply a value for the IDENTITY field of the sink table.

Example: JSON source file sample_noID.json

{"name":"John", "course":"Computer Science"}
{"name":"Jane", "course":"BioTechnology"}
{"name":"Tony", "course":"Electronics"}

Create/generate the configuration file.

Data migration is successful. IDENTITY column values are auto-generated. Migrated data in Oracle NoSQL Database sink table migrateID:
{"ID":1,"name":"John","course":"Computer Science"}
{"ID":2,"name":"Jane","course":"BioTechnology"}
{"ID":3,"name":"Tony","course":"Electronics"}

CASE 2: Source data supplies values for the IDENTITY field of the sink table and it is a Primary Key field.

Example: JSON source file sampleID.json

{"ID":1, "name":"John", "course":"Computer Science"}
{"ID":2, "name":"Jane", "course":"BioTechnology"}
{"ID":3, "name":"Tony", "course":"Electronics"}

Create/generate the configuration file. You provide an ignoreFields transformation for the ID column in the sink configuration template (Recommended).

"transforms" : { "ignoreFields" : ["ID"] }

Data migration is successful. The supplied ID values are skipped and the IDENTITY column values are auto-generated.

Migrated data in Oracle NoSQL Database sink table migrateID:
{"ID":1002,"name":"John","course":"Computer Science"}
{"ID":1001,"name":"Jane","course":"BioTechnology"}
{"ID":1003,"name":"Tony","course":"Electronics"}

You create/generate the configuration file without the ignoreFields transformation for the IDENTITY column.

Data migration is successful. The supplied ID values from the source are copied into the ID column in the sink table.

When you try to insert an additional row to the table without supplying an ID value, the sequence generator tries to auto-generate the ID value. The sequence generator's starting value is 1. As a result, the generated ID value can potentially duplicate one of the existing ID values in the sink table. Since this is a violation of the primary key constraint, an error is returned and the row does not get inserted.

See Sequence Generator for additional information.

To avoid the primary key constraint violation, the sequence generator must start the sequence with a value that does not conflict with existing ID values in the sink table. To use the START WITH attribute to make this modification, see the example below:

Example: Migrated data in Oracle NoSQL Database sink table migrateID:
{"ID":1,"name":"John","course":"Computer Science"}
{"ID":2,"name":"Jane","course":"BioTechnology"}
{"ID":3,"name":"Tony","course":"Electronics"}
To find the appropriate value for the sequence generator to insert in the ID column, fetch the maximum value of the ID field using the following query:
SELECT ID FROM migrateID ORDER BY ID DESC LIMIT 1
Output:
{"ID":3}
The maximum value of the ID column in the sink table is 3. You want the sequence generator to start generating the ID values beyond 3 to avoid duplication. You update the sequence generator's START WITH attribute to 4 using the following statement:
ALTER Table migrateID (MODIFY ID GENERATED BY DEFAULT AS IDENTITY (START WITH 4))

This will start the sequence at 4.

Now when you insert rows to the sink table without supplying the ID values, the sequence generator auto-generates the ID values from 4 onwards averting the duplication of the IDs.

For more details on the transformation configuration parameters, see the topic Transformation Configuration Templates.