Running the Oracle NoSQL Database Analytics Integrator
Steps to run the Oracle NoSQL Database Analytics Integrator.
Create a configuration file for the integrator
Before you can run the Oracle NoSQL Database Analytics Integrator, you must create a configuration file, which you pass to the utility when you invoke it. The entries in the file must be in JSON format, as shown in the two sample configurations below. Not all of the parameters used in the samples are required; the tables that follow the examples explain each parameter and indicate whether it is optional or required.
Example 1: You execute the utility from an Oracle Cloud Compute Instance and you wish to authenticate using an Instance Principal.
{
"nosqlstore": {
"type" : "nosqldb_cloud",
"endpoint" : "us-ashburn-1",
"useInstancePrincipal" : true,
"compartment" : <ocid.of.compartment.containing.nosql.tables>,
"table" : <tableName1,tableName2,tableName3>,
"readUnitsPercent" : "90,90,90",
"requestTimeoutMs" : "5000"
},
"objectstore" : {
"type" : "object_storage_oci",
"endpoint" : "us-ashburn-1",
"useInstancePrincipal" : true,
"compartment" : <ocid.of.compartment.containing.bucket>,
"bucket" : <bucket-name-objectstorage>,
"compression" : "snappy"
},
"database": {
"type" : "database_cloud",
"endpoint" : "us-ashburn-1",
"credentials" : "/home/opc/.oci/config",
"credentialsProfile" : <profile-for-adw-auth>,
"databaseName" : <database-name>,
"databaseUser" : "ADMIN",
"databaseWallet"" : <path-where-wallet-unzipped>
}
}
Example 2: You prefer to authenticate using your own user credentials, or you are executing from outside of the Oracle Cloud and thus Instance Principal authentication is not available.
{
"nosqlstore": {
"type" : "nosqldb_cloud",
"endpoint" : "us-ashburn-1",
"credentials" : "/home/opc/.oci/config",
"credentialsProfile" : <nosqldb-user-credentials>,
"table" : <tableName1,tableName2,tableName3>,
"readUnitsPercent" : "90,90,90",
"requestTimeoutMs" : "5000"
},
"objectstore" : {
"type" : "object_storage_oci",
"endpoint" : "us-ashburn-1",
"credentials" : "/home/opc/.oci/config",
"credentialsProfile" : <objectstorage-user-credentials>,
"bucket" : <bucket-name-objectstorage>,
"compression" : "snappy"
},
"database": {
"type" : "database_cloud",
"endpoint" : "us-ashburn-1",
"credentials" : "/home/opc/.oci/config",
"credentialsProfile" : <adw-user-credentials>,
"databaseName" : <database-name>,
"databaseUser" : "ADMIN",
"databaseWallet" : <path-where-wallet-unzipped>
},
"abortOnError" : false
}
The configuration is divided into three sections - nosqlstore, objectstore, and database - whose entries are used to specify how the utility interacts with each respective cloud service: the NoSQL Cloud Service, Oracle Object Storage, and Oracle Autonomous AI Lakehouse.
Some parameters are common to all three sections.
Table - Common Parameters for all sections
| Parameter name | Details of the parameter |
|---|---|
| type | Currently, this parameter can take one of the three values: nosqldb_cloud (for the nosqlstore section), object_storage_oci (for the objectstore section), and database_cloud (for the database section). |
| endpoint | The value of this entry must be set to the region in which the associated resource is located. The value specified for this entry can be either the region’s API endpoint or the Region identifier for the resource. For example, if each resource is located in the US East (Ashburn) region, then the endpoint entry in each section can be specified using either the region’s identifier (“us-ashburn-1”) or the region’s API endpoint for the desired service. |
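As a quick sanity check on the common parameters, the following Python sketch confirms that each of the three sections carries the expected type value and a non-empty endpoint. It is an illustration only and is not part of the utility; the helper name check_common_parameters and the trimmed-down sample configuration are assumptions made for this example.

```python
import json

# Expected "type" value for each section, taken from the table above.
EXPECTED_TYPES = {
    "nosqlstore": "nosqldb_cloud",
    "objectstore": "object_storage_oci",
    "database": "database_cloud",
}


def check_common_parameters(config):
    """Return a list of problems found in the common parameters (hypothetical helper)."""
    problems = []
    for section, expected_type in EXPECTED_TYPES.items():
        entries = config.get(section)
        if not isinstance(entries, dict):
            problems.append(f"missing section: {section}")
            continue
        if entries.get("type") != expected_type:
            problems.append(f"{section}.type should be {expected_type}")
        if not entries.get("endpoint"):
            problems.append(f"{section}.endpoint is not set")
    return problems


# Trimmed-down sample: the database section deliberately omits its endpoint.
sample = json.loads("""
{
  "nosqlstore":  {"type": "nosqldb_cloud", "endpoint": "us-ashburn-1"},
  "objectstore": {"type": "object_storage_oci", "endpoint": "us-ashburn-1"},
  "database":    {"type": "database_cloud"}
}
""")

print(check_common_parameters(sample))  # ['database.endpoint is not set']
```

A check like this can be run before invoking the utility so that a misspelled section name or type value is caught early.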
Table - Parameters in the configuration file
| Parameter name | Specified Section | Details of the parameter |
|---|---|---|
| useInstancePrincipal | nosqlstore(Optional) objectstore(Optional) |
The useInstancePrincipal entry can be specified as the boolean value true if the following conditions are satisfied:
If true is specified for the useInstancePrincipal entry and the credentials entry is also specified, then the credentials entry takes precedence, and the user credentials referenced in that entry's value will be used to interact with the associated resource. Note: User credentials must be specified in the database section because the Autonomous AI Database hosted in Oracle Autonomous AI Lakehouse requires it. |
| compartment | nosqlstore(Optional) objectstore(Optional) |
|
| credentials | nosqlstore(Optional) objectstore(Optional) database(Required) |
The credentials entry is required in the database section under all circumstances. It is required in the nosqlstore and objectstore sections in one or more of the following circumstances:
The value specified for this entry must reference a file on the local file system that specifies user credentials that can be used to securely interact with the associated resource. |
| credentialsProfile | nosqlstore(Optional) objectstore(Optional) database(Optional) |
The credentialsProfile entry is optional in each section, and even if specified, applies only when a corresponding credentials entry is also specified. |
| table | nosqlstore(Required) | The table entry is required and must be specified in the nosqlstore section. The value of this entry is a string consisting of a comma-separated list of names, where each name references a table in the NoSQL Database Cloud Service whose contents should be retrieved and copied to the Oracle Autonomous AI Lakehouse. |
| readUnitsPercent | nosqlstore(Optional) | The readUnitsPercent entry is optional and is applicable only in the nosqlstore section. The value of this entry is a string consisting of a comma-separated list of integers between 1 and 100, each representing the percentage of read units that can be consumed when retrieving data from the corresponding table. This entry allows you to specify a different read unit percentage for each of the tables referenced in the table entry: the first percentage in the list corresponds to the first table in the list of tables, the second percentage corresponds to the second table, and so on. The number of percentages in this list does not have to equal the number of tables. A default value of 90 percent is assigned to any table that does not have a corresponding percentage in this list. For example, suppose four table names are specified in the table entry, but the readUnitsPercent entry is set to the value "50,80". In this case, data from the first table is retrieved using 50 percent of the available read units, and data from the second table is retrieved using 80 percent. For the remaining two tables, 90 percent of the read units (the default) is used. |
| requestTimeoutMs | nosqlstore(Optional) | The requestTimeoutMs entry is optional and is applicable only in the nosqlstore section. The value of this entry is a string consisting of a comma-separated list of positive integers, where each integer represents the number of milliseconds allowed for each data retrieval request to complete for the corresponding table. This entry allows you to specify different timeout values for each of the tables referenced in the table entry. If this entry is not specified, or if it specifies a timeout for only a subset of the tables, then the default value of 5000 milliseconds is assigned to the remaining tables. |
| bucket | objectstore(Required) | The bucket entry is required and must be specified in the objectstore section. The value of this entry is a string representing the name of the OCI Object Storage bucket, into which the utility copies the data retrieved from the NoSQL tables. |
| compression | objectstore(Optional) | The compression entry is optional and is applicable in only the objectstore section. The value specified for this entry is a string representing how the data is retrieved from the table(s) specified in the nosqlstore. If this is set, then the table data is compressed when being copied to object storage. The value specified for this entry must be one of the following:
Note: If the compression entry is not specified, then snappy compression will be performed. |
| databaseName | database(Required) | The databaseName entry is required and must be specified in the database section. This entry is a string whose value is the name of the database created in the Oracle Autonomous AI Lakehouse Cloud Service. |
| databaseUser | database(Optional) | The databaseUser entry is optional and, when used, is specified in the database section. This entry is a string whose value is the name of the user account in the Autonomous AI Database specified in the databaseName entry. If this entry is not specified, then you will be prompted on the command line to provide the value. |
| databaseWallet | database(Required) | The databaseWallet entry is required and must be specified in the database section. This entry is a string whose value is the filesystem path to the directory containing the contents of the Oracle Wallet downloaded from the Autonomous AI Database user account specified in the databaseUser entry in the configuration file. |
| abortOnError | Optional | Specifies the action to be taken on facing an error. The default value is true. |
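The positional pairing described for readUnitsPercent and requestTimeoutMs can be sketched in Python as follows. This is a minimal illustration of the rule stated in the table (values are matched to tables by position, and missing values fall back to the default of 90 percent or 5000 milliseconds), not the utility's own code. The helper name per_table_values and the table names t1 through t4 are hypothetical, and the sketch simply ignores surplus values, because the behavior of the utility in that case is not described here.

```python
def per_table_values(tables, values, default):
    """Pair each table name with its value, padding with the default (hypothetical helper).

    tables: comma-separated table names, as in the "table" entry.
    values: comma-separated integers, or None when the entry is absent.
    """
    names = [name.strip() for name in tables.split(",") if name.strip()]
    given = [int(v) for v in values.split(",") if v.strip()] if values else []
    return {
        name: (given[i] if i < len(given) else default)
        for i, name in enumerate(names)
    }


tables = "t1,t2,t3,t4"

# Four tables but only two percentages: the last two tables get the default of 90.
print(per_table_values(tables, "50,80", default=90))
# {'t1': 50, 't2': 80, 't3': 90, 't4': 90}

# requestTimeoutMs not specified at all: every table gets the default of 5000.
print(per_table_values(tables, None, default=5000))
# {'t1': 5000, 't2': 5000, 't3': 5000, 't4': 5000}
```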
Note: Each entry in the configuration file can be overridden on the command line by setting a system property whose name has the form section.entry; for example, -Dnosqlstore.table=tableName1,tableName3. If an entry is not located within a section, then the name of the property is simply the name of the entry itself; for example, -DabortOnError=false. This feature may be useful when testing or writing scripts that run the utility at regular intervals.
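For scripts that run the utility at regular intervals, the override arguments can be generated rather than typed by hand. The following Python sketch builds the -D arguments from a dictionary keyed by section.entry (or by the bare entry name for entries outside a section). The helper name override_flags is hypothetical; only the -Dname=value form comes from the note above.

```python
def override_flags(overrides):
    """Turn {"section.entry": value} pairs into -D system property arguments."""
    flags = []
    for name, value in overrides.items():
        if isinstance(value, bool):
            # Write booleans the way they appear in the JSON configuration.
            value = "true" if value else "false"
        flags.append(f"-D{name}={value}")
    return flags


print(override_flags({
    "nosqlstore.table": "tableName1,tableName3",
    "abortOnError": False,
}))
# ['-Dnosqlstore.table=tableName1,tableName3', '-DabortOnError=false']
```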
Specifying config information in the credentials file:
Oracle Cloud Infrastructure requires basic configuration information, such as user credentials and the tenancy OCID, which can be specified in the config file. By default, this config file is located in the ~/.oci directory (the examples above reference /home/opc/.oci/config). You can specify multiple sets of user credentials in this config file, each under its own profile.
A sample credentials file is shown below.
[DEFAULT]
user=<ocid.of.default.user>
fingerprint=<fingerprint.of.default.user>
key_file=<path.to.default.user.oci.api.private.key.file.pem>
tenancy=<ocid.of.default.user.tenancy>
region=us-ashburn-1
compartment=<ocid.of.default.compartment>
[nosqldb-user-credentials]
user=<ocid.of.nosqldb.user>
fingerprint=<fingerprint.of.nosqldb.user>
key_file=<path.to.nosqldb.user.oci.api.private.key.file.pem>
tenancy=<ocid.of.nosqldb.user.tenancy>
region=us-ashburn-1
compartment=<ocid.of.nosqldb.compartment>
[objectstorage-user-credentials]
user=<ocid.of.objectstorage.user>
fingerprint=<fingerprint.of.objectstorage.user>
key_file=<path.to.objectstorage.user.oci.api.private.key.file.pem>
tenancy=<ocid.of.objectstorage.user.tenancy>
region=us-ashburn-1
compartment=<ocid.of.objectstorage.compartment>
[adw-user-credentials]
user=<ocid.of.adw.user>
fingerprint=<fingerprint.of.adw.user>
key_file=<path.to.adw.user.oci.api.private.key.file.pem>
tenancy=<ocid.of.adw.user.tenancy>
region=us-ashburn-1
compartment=<ocid.of.adw.compartment>
dbmsOcid=<ocid.of.autonomous.database.in.adw>
dbmsCredentialName=<OCI$RESOURCE_PRINCIPAL or NOSQLADWDB_OBJ_STORE_CREDENTIAL>
Note: The credentials file above defines three separate profiles in addition to DEFAULT: nosqldb-user-credentials, objectstorage-user-credentials, and adw-user-credentials. This is not mandatory, and a config file can contain only the DEFAULT profile. However, keeping separate profiles is better practice than combining all parameters in the DEFAULT profile.
| Parameter Name | Details of the parameter |
|---|---|
| user | The OCID of the user |
| fingerprint | A short sequence of bytes used to identify a longer public key for the default user |
| key_file | The path/filename to the file which contains the private key for the default user |
| tenancy | The OCID of the tenancy |
| region | The region, given as a region identifier such as us-ashburn-1 |
| compartment | The compartment name or OCID of the compartment of the default user |
| dbmsOcid | OCID of the Autonomous AI Database |
| dbmsCredentialName | The name of the credential that the Oracle Autonomous AI Lakehouse database uses to authenticate with Object Storage. This is either the name OCI$RESOURCE_PRINCIPAL (if you choose to employ Resource Principal authentication), or the name of the AUTH_TOKEN credential that is created when the DBMS_CLOUD.CREATE_CREDENTIAL procedure is executed by either the user or the system administrator (for example, NOSQLADWDB_OBJ_STORE_CREDENTIAL). |
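Because the credentials file uses an INI-style layout with one profile per bracketed heading, it can be inspected with Python's standard configparser module, for example to confirm that a profile named in credentialsProfile exists before running the utility. The sketch below is an illustration under stated assumptions: the helper name load_profile and the shortened sample values are made up for this example. Note also that configparser fills keys missing from a profile with the values from DEFAULT; whether the utility applies the same fallback is not stated in this document.

```python
import configparser

# Shortened sample with placeholder values (not real OCIDs).
SAMPLE = """\
[DEFAULT]
user=ocid1.user.oc1..exampledefault
region=us-ashburn-1

[adw-user-credentials]
user=ocid1.user.oc1..exampleadw
dbmsCredentialName=OCI$RESOURCE_PRINCIPAL
"""


def load_profile(text, profile="DEFAULT"):
    """Return the key/value pairs of one profile as a dict (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. dbmsOcid, dbmsCredentialName
    parser.read_string(text)
    if profile != "DEFAULT" and not parser.has_section(profile):
        raise KeyError(f"profile not found: {profile}")
    return dict(parser[profile])


adw = load_profile(SAMPLE, "adw-user-credentials")
print(adw["user"])                # ocid1.user.oc1..exampleadw
print(adw["dbmsCredentialName"])  # OCI$RESOURCE_PRINCIPAL
print(adw["region"])              # us-ashburn-1 (inherited from DEFAULT by configparser)
```

Setting optionxform to str matters here because configparser lowercases key names by default, which would turn dbmsOcid into dbmsocid.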
Running the tool
After all the requirements for using the necessary Oracle Cloud services (NoSQL Database, Object Storage, and Oracle Autonomous AI Lakehouse) have been completed and a valid configuration file has been created, the Oracle NoSQL Database Analytics Integrator can be executed by simply typing a command on the command line.
- Navigate to the directory nosqlanalytics under the installation directory (/home/opc/nosqlanalytics-<version>).
  cd /home/opc/nosqlanalytics-1.0.1/nosqlanalytics
- Invoke the utility using the following command. The configuration file oci-nosqlanalytics-config.json is present under the .oci directory inside the home directory.
  java -Djava.util.logging.config.file=./src/main/resources/logging/java-util-logging.properties -Dlog4j.configurationFile=file:./src/main/resources/logging/log4j2-analytics.properties -jar ./lib/nosqlanalytics-1.0.1.jar -config ~/.oci/oci-nosqlanalytics-config.json
Note: The system properties that configure the loggers used during execution are optional. If those system properties are not specified, then the utility will produce no logging output.
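When the utility is launched from a script, the command can be assembled as an argument list so that the optional logging properties are included only when wanted. The following Python sketch mirrors the command shown in the steps above; the helper name build_command and its parameter names are hypothetical. The ~ in the configuration path is expanded explicitly because no shell is involved when an argument list is passed to subprocess.

```python
import os


def build_command(jar, config, jul_properties=None, log4j_properties=None):
    """Assemble the java invocation as an argument list (hypothetical helper)."""
    cmd = ["java"]
    # The logging system properties are optional; without them the utility
    # produces no logging output.
    if jul_properties:
        cmd.append(f"-Djava.util.logging.config.file={jul_properties}")
    if log4j_properties:
        cmd.append(f"-Dlog4j.configurationFile=file:{log4j_properties}")
    cmd += ["-jar", jar, "-config", os.path.expanduser(config)]
    return cmd


cmd = build_command(
    "./lib/nosqlanalytics-1.0.1.jar",
    "~/.oci/oci-nosqlanalytics-config.json",
    jul_properties="./src/main/resources/logging/java-util-logging.properties",
    log4j_properties="./src/main/resources/logging/log4j2-analytics.properties",
)
print(cmd)
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the nosqlanalytics directory, since the jar and logging paths are relative to it.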
Logging
The Oracle NoSQL Database Analytics Integrator executes software from multiple third-party libraries, where each library defines its own set of loggers with different namespaces. For convenience, the Oracle NoSQL Database Analytics Integrator provides two logging configuration files as part of the release: one to configure logging mechanisms based on java.util.logging, and one for loggers based on Log4j2.
Note: By default, the logger configuration files provided with the utility are designed to produce minimal output as the utility executes. To see verbose output from the various components that the utility employs, increase the logging levels of the specific loggers whose behavior you wish to analyze.