Workflow for Oracle NoSQL Database Migrator

Learn about the various steps involved in using the Oracle NoSQL Database Migrator utility for migrating your NoSQL data.

The high-level flow of tasks involved in using the NoSQL Database Migrator is described in the following sections.

Download the NoSQL Database Migrator Utility

The Oracle NoSQL Database Migrator utility is available for download from the Oracle NoSQL Downloads page. Once you download and unzip it on your machine, you can access the runMigrator command from the command line interface.

Note:

The Oracle NoSQL Database Migrator utility requires Java 11 or a later version to run.
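
For example, a typical setup might look like the following. The archive name and extraction path are placeholders; use the actual file name from the Downloads page.

# Verify that Java 11 or a later version is installed
$ java -version

# Extract the downloaded Migrator package (archive name is a placeholder)
$ unzip nosql-migrator.zip -d ~/nosql-migrator

# The runMigrator command is available in the extracted directory;
# running it without a configuration file prompts you to generate one
$ cd ~/nosql-migrator
$ ./runMigrator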

Identify the Source and Sink

Before using the migrator, you must identify the data source and sink. For instance, if you want to migrate a NoSQL table from Oracle NoSQL Database on-premises to a JSON formatted file, your source will be Oracle NoSQL Database and your sink will be a JSON file. Ensure that the identified source and sink are supported by the Oracle NoSQL Database Migrator by referring to Supported Sources and Sinks. This is also an appropriate phase to decide the schema for your NoSQL table in the target or sink, and to create it.
  • Identify Sink Table Schema: If the sink is Oracle NoSQL Database on-premises or cloud, you must identify the schema for the sink table and ensure that the source data matches the target schema. If required, use transformations to map the source data to the sink table.
    • Default Schema: NoSQL Database Migrator provides an option to create a table with the default schema without the need to predefine the schema for the table. This is useful primarily when loading JSON source files into Oracle NoSQL Database.
      If the source is a MongoDB-formatted JSON file, the default schema for the table will be as follows:
      CREATE TABLE IF NOT EXISTS <tablename>(ID STRING, DOCUMENT JSON, PRIMARY KEY(SHARD(ID)))

      Where:

      — tablename = value provided for the table attribute in the configuration.

      — ID = _id value from each document of the MongoDB exported JSON source file.

      — DOCUMENT = For each document in the MongoDB exported file, the contents excluding the _id field are aggregated into the DOCUMENT column.

      If the source is a DynamoDB-formatted JSON file, the default schema for the table will be as follows:
      CREATE TABLE IF NOT EXISTS <TABLE_NAME>(DDBPartitionKey_name DDBPartitionKey_type, 
      [DDBSortKey_name DDBSortKey_type],DOCUMENT JSON,
      PRIMARY KEY(SHARD(DDBPartitionKey_name),[DDBSortKey_name]))

      Where:

      — TABLE_NAME = value provided for the sink table in the configuration

      — DDBPartitionKey_name = value provided for the partition key in the configuration

      — DDBPartitionKey_type = value provided for the data type of the partition key in the configuration

      — DDBSortKey_name = value provided for the sort key in the configuration, if any

      — DDBSortKey_type = value provided for the data type of the sort key in the configuration, if any

      — DOCUMENT = All attributes except the partition and sort keys of a DynamoDB table item aggregated into a NoSQL JSON column

      If the source format is a CSV file, a default schema is not supported for the target table. You must create a schema file with a table definition containing the same number of columns and data types as the source CSV file. For more details on schema file creation, see Providing Table Schema.

      For all the other sources, the default schema will be as follows:
      CREATE TABLE IF NOT EXISTS <tablename> (ID LONG GENERATED ALWAYS AS IDENTITY, DOCUMENT JSON, PRIMARY KEY(ID))

      Where:

      — tablename = value provided for the table attribute in the configuration.

      — ID = An auto-generated LONG value.

      — DOCUMENT = The JSON record provided by the source is aggregated into the DOCUMENT column.

      Note:

      If the _id value is not provided as a string in the MongoDB-formatted JSON file, NoSQL Database Migrator converts it into a string before inserting it into the default schema.
  • Providing Table Schema: NoSQL Database Migrator allows the source to provide schema definitions for the table data using the schemaInfo attribute. The schemaInfo attribute is available in all the data sources that do not have an implicit schema already defined. Sink data stores can choose any one of the following options.
    • Use the default schema defined by the NoSQL Database Migrator.
    • Use the source-provided schema.
    • Override the source-provided schema by defining its own schema. For example, if you want to transform the data from the source schema to another schema, you need to override the source-provided schema and use the transformation capability of the NoSQL Database Migrator tool.


    The table schema file, for example mytable_schema.ddl, can include table DDL statements. The NoSQL Database Migrator tool executes this table schema file before starting the migration. The migrator tool supports no more than one DDL statement per line in the schema file. For example,
    CREATE TABLE IF NOT EXISTS mytable(id INTEGER, name STRING, age INTEGER, PRIMARY KEY(SHARD(id)))

    Note:

    Migration will fail if the table already exists at the sink and its schema differs from the DDL provided in schemaPath.
  • Create Sink Table: Once you identify the sink table schema, create the sink table either through the Admin CLI or by using the schemaInfo attribute of the sink configuration file. See Sink Configuration Templates.

    Note:

    If the source is a CSV file, create a file with the DDL commands for the schema of the target table. Provide the file path in the schemaInfo.schemaPath parameter of the sink configuration file, as illustrated in the sample configuration sketch that follows this list.
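
For illustration, the following sink configuration fragment is a minimal sketch of the schemaInfo options described above. The store name, helper host, table name, and schema file path are placeholder values, and the exact attribute names should be verified against the Sink Configuration Templates topic.

"sink" : {
  "type" : "nosqldb",
  "storeName" : "kvstore",
  "helperHosts" : ["localhost:5000"],
  "table" : "mytable",
  "schemaInfo" : {
    // Option 1: let the migrator create the table with the default schema
    "defaultSchema" : true
    // Option 2: instead, supply your own DDL file (one DDL statement per line)
    // "schemaPath" : "/home/user/mytable_schema.ddl"
  }
}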

Migrating TTL Metadata for Table Rows

You can choose to include the TTL metadata for table rows along with the actual data when migrating NoSQL tables. The NoSQL Database Migrator provides the includeTTL configuration parameter to support the export and import of table row TTL metadata. Additionally, the tool provides an option to select the relative expiration time for table rows during the import operation.

Note:

The support for migrating TTL metadata for table rows is only available for Oracle NoSQL Database and Oracle NoSQL Database Cloud Service.

Exporting TTL metadata

When a table is exported, TTL data is exported for the table rows that have a valid expiration time. If a row does not expire, it is not included explicitly in the exported data because its expiration value is always 0. TTL information is contained in the _metadata JSON object for each exported row. The NoSQL Database Migrator exports the expiration time for each row as the number of milliseconds since the UNIX epoch (January 1, 1970). For example,
//Row 1
{
    "id" : 1,
    "name" : "xyz",
    "age" : 45,
    "_metadata" : {
        "expiration" : 1629709200000   //Row Expiration time in milliseconds
    }
}

//Row 2
{
    "id" : 2,
    "name" : "abc",
    "age" : 52,
    "_metadata" : {
        "expiration" : 1629709400000   //Row Expiration time in milliseconds
    }
}

//Row 3: No metadata for the row below because it does not expire
{
    "id" : 3,
    "name" : "def",
    "age" : 15
}

Importing TTL metadata

You can optionally import TTL metadata using the includeTTL configuration parameter. The import operation handles the following use cases when migrating table rows containing TTL metadata. These use cases are applicable only when the includeTTL configuration parameter is specified.

In use cases 2 and 3, the default Reference Time for the import operation is the current time in milliseconds, obtained from System.currentTimeMillis() on the machine where the NoSQL Database Migrator tool is running. You can also set a custom Reference Time using the ttlRelativeDate configuration parameter if you want to extend the expiration time and import rows that would otherwise expire immediately. A sample configuration fragment illustrating these parameters appears after the use cases below.
  • Use-case 1: No TTL metadata information is present in the table row being imported.

    When you import a JSON source file produced from an external source or exported using an earlier version of the migrator, the row being imported does not have TTL information. But since the includeTTL configuration parameter is set to true, the migrator sets TTL=0 for the table row, which means the imported table row never expires.

  • Use-case 2: TTL value of the source table row is expired relative to the Reference Time when the table row gets imported.

    When you export a table row to a file and try to import it after the expiration time of the table row, the table row is ignored and is not written into the store.

  • Use-case 3: TTL value of the source table row is not expired relative to the Reference Time when the table row gets imported.

    When you export a table row to a file and try to import it before the expiration time of the table row, the table row gets imported with a TTL value. However, the new TTL value for the table row may not be equal to the exported TTL value because of the integer hour and day window constraints in the TimeToLive class. For example,

    Exported table row
    {
      "id" : 8,
      "name" : "xyz",
      "_metadata" : {
        "expiration" : 1629709200000 //Monday, August 23, 2021 9:00:00 AM in UTC
      }
    }

    The reference time while importing is 1629707962582, which is Monday, August 23, 2021 8:39:22.582 AM.

    Imported table row
    {
      "id" : 8,
      "name" : "xyz",
      "_metadata" : {
        "ttl" :  1629712800000 //Monday, August 23, 2021 10:00:00 AM UTC
      }
    }
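
As an illustration, the fragment below shows how these parameters might appear when importing into an Oracle NoSQL Database sink. This is a sketch only; the placement of includeTTL and ttlRelativeDate, and the expected date format, should be verified against the Sink Configuration Templates topic.

"sink" : {
  "type" : "nosqldb",
  ...
  // Import the TTL metadata contained in the exported rows
  "includeTTL" : true,
  // Assumed custom Reference Time (UTC date-time shown for illustration);
  // rows whose expiration falls before this time are skipped on import
  "ttlRelativeDate" : "2021-08-22 00:00:00"
}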

Importing data to a sink with an IDENTITY column

You can import the data from a valid source to a sink table (On-premises/Cloud Services) with an IDENTITY column. You create the IDENTITY column as either GENERATED ALWAYS AS IDENTITY or GENERATED BY DEFAULT AS IDENTITY. For more information on table creation with an IDENTITY column, see Creating Tables With an IDENTITY Column in the SQL Reference Guide.

Before importing the data, make sure that the Oracle NoSQL Database table at the sink is empty if it exists. If there is pre-existing data in the sink table, migration can lead to issues such as overwriting existing data in the sink table or skipping source data during the import.

Sink table with IDENTITY column as GENERATED ALWAYS AS IDENTITY

Consider a sink table with the IDENTITY column created as GENERATED ALWAYS AS IDENTITY. The data import depends on whether the source data supplies values for the IDENTITY column and whether the ignoreFields transformation parameter is specified in the configuration file.

For example, you want to import data from a JSON file source to the Oracle NoSQL Database table as the sink. The schema of the sink table is:

CREATE TABLE IF NOT EXISTS migrateID(ID INTEGER GENERATED ALWAYS AS IDENTITY, name STRING, course STRING, PRIMARY KEY(ID))
The Migrator utility handles the data migration as described in the following cases:
CASE 1: The source data does not supply a value for the IDENTITY field of the sink table.

Example: JSON source file sample_noID.json

{"name":"John", "course":"Computer Science"}
{"name":"Jane", "course":"BioTechnology"}
{"name":"Tony", "course":"Electronics"}

User action: Create/generate the configuration file.

Migration outcome: Data migration is successful. IDENTITY column values are auto-generated. Migrated data in the Oracle NoSQL Database sink table migrateID:

{"ID":1001,"name":"Jane","course":"BioTechnology"}
{"ID":1003,"name":"John","course":"Computer Science"}
{"ID":1002,"name":"Tony","course":"Electronics"}

CASE 2: The source data supplies values for the IDENTITY field of the sink table.

Example: JSON source file sampleID.json

{"ID":1, "name":"John", "course":"Computer Science"}
{"ID":2, "name":"Jane", "course":"BioTechnology"}
{"ID":3, "name":"Tony", "course":"Electronics"}

User action: Create/generate the configuration file and provide an ignoreFields transformation for the ID column in the sink configuration template.

"transforms" : { "ignoreFields" : ["ID"] }

Migration outcome: Data migration is successful. The supplied ID values are skipped and the IDENTITY column values are auto-generated.

Migrated data in the Oracle NoSQL Database sink table migrateID:

{"ID":2003,"name":"John","course":"Computer Science"}
{"ID":2002,"name":"Tony","course":"Electronics"}
{"ID":2001,"name":"Jane","course":"BioTechnology"}

User action: Create/generate the configuration file without the ignoreFields transformation for the IDENTITY column.

Migration outcome: Data migration fails with the following error message:

"Cannot set value for a generated always identity column".

For more details on the transformation configuration parameters, see the topic Transformation Configuration Templates.
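
Putting CASE 2 together, the relevant portion of a manually created configuration file could look like the following sketch. The top-level placement of the transforms attribute shown here is an assumption; confirm it against the Transformation Configuration Templates topic. Source and sink details are elided.

{
  "source" : { ... },
  "sink" : { ... },
  // Skip the ID values supplied by the source so that the
  // GENERATED ALWAYS AS IDENTITY column is auto-generated at the sink
  "transforms" : {
    "ignoreFields" : ["ID"]
  }
}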

Sink table with IDENTITY column as GENERATED BY DEFAULT AS IDENTITY

Consider a sink table with the IDENTITY column created as GENERATED BY DEFAULT AS IDENTITY. The data import depends on whether the source data supplies values for the IDENTITY column and whether the ignoreFields transformation parameter is specified in the configuration file.

For example, you want to import data from a JSON file source to the Oracle NoSQL Database table as the sink. The schema of the sink table is:

CREATE TABLE IF NOT EXISTS migrateID(ID INTEGER GENERATED BY DEFAULT AS IDENTITY, name STRING, course STRING, PRIMARY KEY(ID))
The Migrator utility handles the data migration as described in the following cases:
CASE 1: The source data does not supply a value for the IDENTITY field of the sink table.

Example: JSON source file sample_noID.json

{"name":"John", "course":"Computer Science"}
{"name":"Jane", "course":"BioTechnology"}
{"name":"Tony", "course":"Electronics"}

User action: Create/generate the configuration file.

Migration outcome: Data migration is successful. IDENTITY column values are auto-generated. Migrated data in the Oracle NoSQL Database sink table migrateID:

{"ID":1,"name":"John","course":"Computer Science"}
{"ID":2,"name":"Jane","course":"BioTechnology"}
{"ID":3,"name":"Tony","course":"Electronics"}

CASE 2: The source data supplies values for the IDENTITY field of the sink table, and it is a Primary Key field.

Example: JSON source file sampleID.json

{"ID":1, "name":"John", "course":"Computer Science"}
{"ID":2, "name":"Jane", "course":"BioTechnology"}
{"ID":3, "name":"Tony", "course":"Electronics"}

User action (recommended): Create/generate the configuration file and provide an ignoreFields transformation for the ID column in the sink configuration template.

"transforms" : { "ignoreFields" : ["ID"] }

Migration outcome: Data migration is successful. The supplied ID values are skipped and the IDENTITY column values are auto-generated.

Migrated data in the Oracle NoSQL Database sink table migrateID:

{"ID":1002,"name":"John","course":"Computer Science"}
{"ID":1001,"name":"Jane","course":"BioTechnology"}
{"ID":1003,"name":"Tony","course":"Electronics"}

User action: Create/generate the configuration file without the ignoreFields transformation for the IDENTITY column.

Migration outcome: Data migration is successful. The supplied ID values from the source are copied into the ID column in the sink table.

When you try to insert an additional row into the table without supplying an ID value, the sequence generator tries to auto-generate the ID value. The sequence generator's starting value is 1. As a result, the generated ID value can potentially duplicate one of the existing ID values in the sink table. Since this is a violation of the primary key constraint, an error is returned and the row is not inserted.

See Sequence Generator for additional information.

To avoid the primary key constraint violation, the sequence generator must start the sequence with a value that does not conflict with existing ID values in the sink table. Use the START WITH attribute to make this modification, as shown in the example below:

Example: Migrated data in Oracle NoSQL Database sink table migrateID:
{"ID":1,"name":"John","course":"Computer Science"}
{"ID":2,"name":"Jane","course":"BioTechnology"}
{"ID":3,"name":"Tony","course":"Electronics"}
To find the appropriate value for the sequence generator to insert in the ID column, fetch the maximum value of the ID field using the following query:
SELECT max(ID) FROM migrateID
Output:
{"Column_1":3}
The maximum value of the ID column in the sink table is 3. You want the sequence generator to start generating the ID values beyond 3 to avoid duplication. You update the sequence generator's START WITH attribute to 4 using the following statement:
ALTER TABLE migrateID (MODIFY ID GENERATED BY DEFAULT AS IDENTITY (START WITH 4))

This will start the sequence at 4.

Now when you insert rows into the sink table without supplying ID values, the sequence generator auto-generates ID values from 4 onwards, averting duplication of the IDs.

For more details on the transformation configuration parameters, see the topic Transformation Configuration Templates.

Run the runMigrator command

The runMigrator executable file is available in the extracted NoSQL Database Migrator files. You must have Java 11 or a later version and bash installed on your system to successfully run the runMigrator command.

You can run the runMigrator command in two ways:
  1. By creating the configuration file using the runtime options of the runMigrator command as shown below.
    [~]$ ./runMigrator
    configuration file is not provided. Do you want to generate configuration? (y/n)                                                                              
     
    [n]: y
    ...
    ...
    • When you invoke the runMigrator utility, it provides a series of runtime options and creates the configuration file based on your choices for each option.
    • After the utility creates the configuration file, you have a choice to either proceed with the migration activity in the same run or save the configuration file for a future migration.
    • Irrespective of your decision to proceed or defer the migration activity with the generated configuration file, the file will be available for edits or customization to meet your future requirements. You can use the customized configuration file for migration later.
  2. By passing a manually created configuration file (in JSON format) as a runtime parameter using the -c or --config option. You must create the configuration file manually before running the runMigrator command with the -c or --config option. For help with the source and sink configuration parameters, see Sources and Sinks. A minimal example configuration file is sketched after this list.
    [~]$ ./runMigrator -c </path/to/the/configuration/json/file>
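
For reference, a minimal manually created configuration file for loading a JSON file into an on-premises Oracle NoSQL Database table might look like the sketch below. The paths, store name, and helper host are placeholder values, and the attribute names should be confirmed against the source and sink configuration templates in Sources and Sinks.

{
  "source" : {
    "type" : "file",
    "format" : "json",
    // Placeholder path to the JSON source file
    "dataPath" : "/path/to/sample.json"
  },
  "sink" : {
    "type" : "nosqldb",
    "storeName" : "kvstore",
    "helperHosts" : ["localhost:5000"],
    "table" : "mytable",
    // Create the sink table with the default schema if it does not exist
    "schemaInfo" : {
      "defaultSchema" : true
    }
  },
  "abortOnError" : true,
  "migratorVersion" : "1.0.0"
}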

Logging Migrator Progress

The NoSQL Database Migrator tool provides options that enable trace, debug, and progress messages to be printed to standard output or to a file. These options can be useful in tracking the progress of the migration operation, particularly for very large tables or data sets.

  • Log Levels

    To control the logging behavior of the NoSQL Database Migrator tool, pass the --log-level or -l runtime parameter to the runMigrator command. You can specify the amount of log information to write by passing the appropriate log level value.

    $./runMigrator --log-level <loglevel>
    Example:
    $./runMigrator --log-level debug

    Table 5-1 Supported Log Levels for NoSQL Database Migrator

    Log Level        Description
    warning          Prints errors and warnings.
    info (default)   Prints the progress status of data migration, such as validating the source, validating the sink, creating tables, and the count of data records migrated.
    debug            Prints additional debug information.
    all              Prints everything. This level turns on all levels of logging.
  • Log File:
    You can specify the name of the log file using the --log-file or -f parameter. If --log-file is passed as a runtime parameter to the runMigrator command, the NoSQL Database Migrator writes all log messages to the specified file; otherwise, it writes them to standard output. See the combined example after this list.
    $./runMigrator --log-file <log file name>
    Example:
    $./runMigrator --log-file nosql_migrator.log
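
For example, you can pass the logging options together with a configuration file in a single invocation (the file names shown are illustrative):

$./runMigrator --config migrator-config.json --log-level debug --log-file nosql_migrator.log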