Migrate from CSV file to Oracle NoSQL Database

This example shows the usage of Oracle NoSQL Database Migrator to copy data from a CSV file to Oracle NoSQL Database.

Example

After evaluating multiple options, an organization finalizes Oracle NoSQL Database as its NoSQL Database platform. As its source contents are in CSV file format, they are looking for a way to migrate them to Oracle NoSQL Database.

In this example, you will learn to migrate the data from a CSV file called course.csv, which contains information about various courses offered by a university. You generate the configuration file from the runMigrator utility.

You can also prepare the configuration file with the identified source and sink details. See Sources and Sinks.

Prerequisites
  • Identify the source and sink for the migration.
    • Source: CSV file

      In this example, the source file is course.csv

      
      cat [~/nosql-migrator-1.5.0]/course.csv
      1,"Computer Science", "San Francisco", "2500"
      2,"Bio-Technology", "Los Angeles", "1200"
      3,"Journalism", "Las Vegas", "1500"
      4,"Telecommunication", "San Francisco", "2500"
      
    • Sink: Oracle NoSQL Database
  • The CSV file must conform to the RFC4180 format.
  • Create a file containing the DDL commands for the schema of the target table, course. The table definition must match the CSV data file concerning the number of columns and their types.

    In this example, the DDL file is mytable_schema.ddl

    
    cat [~/nosql-migrator-1.5.0]/mytable_schema.ddl
    create table course (id INTEGER, name STRING, location STRING, fees INTEGER, PRIMARY KEY(id));
    
Procedure
To migrate the CSV file data from course.csv to Oracle NoSQL Database Service, perform the following steps:
  1. Open the command prompt and navigate to the directory where you extracted the Oracle NoSQL Database Migrator utility.
  2. To generate the configuration file using Oracle NoSQL Database Migrator, execute the runMigrator command without any runtime parameters.
    [~/nosql-migrator-1.5.0]$./runMigrator
  3. As you did not provide the configuration file as a runtime parameter, the utility prompts if you want to generate the configuration now. Type y.
    You can choose a location for the configuration file or retain the default location by pressing the Enter key.
    
    Configuration file is not provided. Do you want to generate
    configuration? (y/n) [n]: y
    Generating a configuration file interactively.
    
    Enter a location for your config [./migrator-config.json]: 
    ./migrator-config.json already exist. Do you want to overwrite?(y/n) [n]: y
    
  4. Based on the prompts from the utility, choose your options for the Source configuration.
    
    Select the source: 
    1) nosqldb
    2) nosqldb_cloud
    3) file
    4) object_storage_oci
    5) aws_s3
    #? 3
    
    Configuration for source type=file
    Select the source file format: 
    1) json
    2) mongodb_json
    3) dynamodb_json
    4) csv
    #? 4
    
  5. Provide the path to the source CSV file. Further, based on the prompts from the utility, you can choose to reorder the column names, select the encoding method, and trim the tailing spaces from the target table.
    
    Enter path to a file or directory containing csv data: [~/nosql-migrator-1.5.0]/course.csv
    Does the CSV file contain a headerLine? (y/n) [n]: n
    Do you want to reorder the column names of NoSQL table with respect to
    CSV file columns? (y/n) [n]: n
    Provide the CSV file encoding. The supported encodings are:
    UTF-8,UTF-16,US-ASCII,ISO-8859-1. [UTF-8]: 
    Do you want to trim the tailing spaces? (y/n) [n]: n
    
  6. Based on the prompts from the utility, choose your options for the Sink configuration.
    
    Select the sink:
    1) nosqldb
    2) nosqldb_cloud
    #? 1
    Configuration for sink type=nosqldb
    Enter store name of the Oracle NoSQL Database: mystore
    Enter comma separated list of host:port of Oracle NoSQL Database: <hostname>:5000
    
  7. Based on the prompts from the utility, provide the name of the target table.
    
    Enter fully qualified table name: course
    
  8. Enter your choice to set the TTL value. The default value is n.
    
    Include TTL data? If you select 'yes' TTL value provided by the
    source will be set on imported rows. (y/n) [n]: n
    
  9. Based on the prompts from the utility, specify whether or not the target table must be created through the Oracle NoSQL Database Migrator tool. If the table is already created, it is suggested to provide n. If the table is not created, the utility will request the path for the file containing the DDL commands for the schema of the target table.
    
    Would you like to create table as part of migration process?
    Use this option if you want to create table through the migration tool.
    If you select yes, you will be asked to provide a file that contains
    table DDL or to use schema provided by the source or default schema.
    (y/n) [n]: y
    Enter path to a file containing table DDL: [~/nosql-migrator-1.5.0]/mytable_schema.ddl
    Is the store secured? (y/n) [y]: n
    would you like to overwrite records which are already present?
    If you select 'no' records with same primary key will be skipped [y/n] [y]: y
    Enter store operation timeout in milliseconds. [5000]:
    Would you like to add transformations to source data? (y/n) [n]: n
    
  10. Enter your choice to determine whether to proceed with the migration in case any record fails to migrate.
    
    Would you like to continue migration if any data fails to be migrated? 
    (y/n) [n]: n
    
  11. The utility displays the generated configuration on the screen.
    
    Generated configuration is:
    {
      "source" : {
        "type" : "file",
        "format" : "csv",
        "dataPath" : "[~/nosql-migrator-1.5.0]/course.csv",
        "hasHeader" : false,
        "csvOptions" : {
          "encoding" : "UTF-8",
          "trim" : false
        }
      },
      "sink" : {
        "type" : "nosqldb",
        "storeName" : "mystore",
        "helperHosts" : ["<hostname>:5000"],
        "table" : "migrated_table",
        "query" : "",
        "includeTTL" : false,
        "schemaInfo" : {
          "schemaPath" : "[~/nosql-migrator-1.5.0]/mytable_schema.ddl"
        },
        "overwrite" : true,
        "requestTimeoutMs" : 5000
      },
      "abortOnError" : true,
      "migratorVersion" : "1.5.0"
    }
    
  12. Finally, the utility prompts you to specify whether or not to proceed with the migration using the generated configuration file. The default option is y.
    Note: If you select n, you can use the generated configuration file to perform the migration. Specify the ./runMigrator -c or the ./runMigrator --config option.
    
    Would you like to run the migration with above configuration?
    If you select no, you can use the generated configuration file to
    run the migration using:
    ./runMigrator --config ./migrator-config.json
    (y/n) [y]: y
    
    
  13. The NoSQL Database Migrator copies your data from the CSV file to Oracle NoSQL Database.
    
    creating source from given configuration:
    source creation completed
    creating sink from given configuration:
    sink creation completed
    creating migrator pipeline
    migration started
    [nosqldb sink] : start loading DDLs
    [nosqldb sink] : executing DDL: create table course (id INTEGER, name STRING, location STRING, fees INTEGER, PRIMARY KEY(id))
    [nosqldb sink] : completed loading DDLs
    [nosqldb sink] : start loading records
    [csv file source] : start parsing CSV records from file: course.csv
    migration completed. Records provided by source=4, Records written to sink=4, Records failed=0,Records skipped=0.
    Elapsed time: 0min 0sec 559ms
    Migration completed.
    
Validation
Start the SQL prompt in your KVStore.
 java -jar lib/sql.jar -helper-hosts localhost:5000 -store kvstore
Verify that the new table is created with the source data:

sql-> select * from course;
{"id":4,"name":"Telecommunication","location":"San Francisco","fees":2500}
{"id":1,"name":"Computer Science","location":"San Francisco","fees":2500}
{"id":2,"name":"Bio-Technology","location":"Los Angeles","fees":1200}
{"id":3,"name":"Journalism","location":"Las Vegas","fees":1500}
 
4 rows returned