DBMS_CLOUD Package Format Options

The format argument in DBMS_CLOUD specifies the format of source files.

The two ways to specify the format argument are:

format => '{"format_option" : "format_value"}'

And:

format => json_object('format_option' value 'format_value')

Examples:

format => json_object('type' VALUE 'CSV')

To specify multiple format options, separate the option/value pairs with a comma (",").

For example:

format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true', 
                           'dateformat' value 'YYYY-MM-DD-HH24-MI-SS', 'blankasnull' value 'true', 'logretention' value 7)
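The format argument is typically passed inline in a DBMS_CLOUD procedure call. The following is a minimal sketch of such a call; the table name, credential name, and file URI are illustrative placeholders:

BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'SALES_DATA',      -- illustrative target table
    credential_name => 'DEF_CRED_NAME',   -- illustrative credential
    file_uri_list   => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace/b/bucket/o/sales.csv',  -- illustrative URI
    format          => json_object('type' value 'CSV', 'skipheaders' value 1)
  );
END;
/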

Note:

For Avro, ORC, or Parquet format options, see DBMS_CLOUD Package Format Options for Avro, ORC, or Parquet.

As noted in the Format Option column, a limited set of format options are valid with DBMS_CLOUD.COPY_COLLECTION or with DBMS_CLOUD.COPY_DATA when the format type is JSON.

Each format option below is listed with its description, syntax, and default value.

access_protocol

Specifies the type of Apache Iceberg table, such as AWS or OCI Object Storage, and what information is used to create the external table, for example information from a data catalog or from a direct metadata URI.

See CREATE_EXTERNAL_TABLE Procedure for Apache Iceberg for details on the access_protocol syntax.

blankasnull

When set to true, loads fields consisting of spaces as null.
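For example:

format => json_object('blankasnull' value 'true')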

blankasnull : true

Default value: False

characterset

Valid with format JSON and COPY_DATA

Specifies the characterset of source files.
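For example, using the AL32UTF8 character set (an illustrative choice):

format => json_object('type' value 'json', 'characterset' value 'AL32UTF8')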

characterset: string

Default value: Database characterset

columnpath

Only use with format JSON and DBMS_CLOUD.COPY_DATA.

Array of JSON path expressions that correspond to the fields that need to be extracted from the JSON records. Each of the JSON path expressions in the array should follow the rules described in SQL/JSON Path Expressions.

columnpath: JSON array of JSON path expressions, expressed in string format. For example: 'columnpath' value '["$.WEATHER_STATION_ID", "$.WEATHER_STATION_NAME"]'

compression

Option valid with JSON data

Specifies the compression type of the source file.

ZIP archiving format is not supported.

Specifying the value auto checks for the compression types: gzip, zlib, bzip2.
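For example:

format => json_object('type' value 'json', 'compression' value 'gzip')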

compression: auto|gzip|zlib|bzip2

Default value: Null value meaning no compression.

conversionerrors

Specifies how rows with data type conversion errors are handled: the affected columns are stored as null (store_null) or the row is rejected (reject_record).
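For example:

format => json_object('conversionerrors' value 'store_null')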

conversionerrors : reject_record | store_null

Default value: reject_record

dateformat

Specifies the date format in the source file. The format option AUTO searches for the following formats:

J 
MM-DD-YYYYBC 
MM-DD-YYYY 
YYYYMMDD HHMISS 
YYMMDD HHMISS 
YYYY.DDD 
YYYY-MM-DD
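For example, to load dates such as 2023-12-31:

format => json_object('dateformat' value 'YYYY-MM-DD')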

dateformat : string

Default value: Database date format

delimiter

Specifies the field delimiter.

To use a special character as the delimiter, specify the HEX value of the ASCII code of the character. For example, the following specifies the TAB character as the delimiter:

format => json_object('delimiter' value 'X''9''')

delimiter : character

Default value: | (pipe character)

detectfieldorder

Specifies that the fields in the external data files are in a different order than the columns in the table. When set to true, the order of fields is detected using the first row of each external data file and mapped to the columns of the table. The field names in the external data files are compared in a case-insensitive manner with the names of the table columns.

This format option is applicable for the following procedures:

  • DBMS_CLOUD.COPY_DATA

  • DBMS_CLOUD.CREATE_EXTERNAL_TABLE

  • DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE

  • DBMS_CLOUD.CREATE_HYBRID_PART_TABLE

Restrictions for detectfieldorder:

  • Field names in the data file must appear in the first record line and must not contain any white spaces between the field names.

  • The field delimiter in the field names record must be the same as the field delimiter for the data in the file.

  • Quoted field names are not supported.

  • Embedded field delimiters are not allowed in the field names.

  • The number of columns in the table must match the number of fields in the data files.

  • This format option is not applicable for Bigdata or Oracle Data Pump formats, as those formats have precise column metadata information in the binary file format.

    The text formats, CSV, JSON, Parquet, or XML can benefit from this automatic field order detection when the first line contains the field names.

See FIELD NAMES and the description for ALL FILES for more information.
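For example:

format => json_object('detectfieldorder' value 'true')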

detectfieldorder: true

Default value: false

enablelogs

The format option enablelogs is used with the following DBMS_CLOUD procedures:

  • COPY_DATA

  • COPY_COLLECTION

  • EXPORT_DATA

enablelogs specifies a boolean value: when set to TRUE, logs are generated; when set to FALSE, logs are not generated.

For example:

format => JSON_OBJECT('enablelogs' value FALSE)

enablelogs: false

Default value: true

encryption

The format option encryption specifies the encryption and decryption options to export and import data to and from the Object Store.

Use encryption to specify the following parameters to encrypt and decrypt:

  • user_defined_function: Specifies a fully qualified user defined function to decrypt or encrypt the specified BLOB (binary large object). It returns a decrypted or encrypted BLOB. user_defined_function is mutually exclusive with other parameters for encryption.

    For example, ADMIN.DECRYPTION_CALLBACK.

  • type: Specifies the DBMS_CRYPTO encryption algorithm to decrypt or encrypt.

    type accepts values in the Block Cipher Algorithms + Block Cipher Chaining Modifiers + Block Cipher Padding Modifiers format.

    Supported Block Cipher Algorithms are:

    • DBMS_CRYPTO.ENCRYPT_AES256

    Supported Block Cipher Chaining Modifiers are:

    • DBMS_CRYPTO.CHAIN_CBC

    • DBMS_CRYPTO.CHAIN_CFB

    • DBMS_CRYPTO.CHAIN_ECB

    • DBMS_CRYPTO.CHAIN_OFB

    Supported Block Cipher Padding Modifiers are:

    • DBMS_CRYPTO.PAD_PKCS5

    • DBMS_CRYPTO.PAD_NONE

    • DBMS_CRYPTO.PAD_ZERO

    • DBMS_CRYPTO.PAD_ORCL

  • credential_name: Specifies the credential used to store the encryption key.

If you do not specify values for these parameters, the Block Cipher Chaining Modifier and Block Cipher Padding Modifier values default to DBMS_CRYPTO.CHAIN_CBC and DBMS_CRYPTO.PAD_PKCS5, respectively.

The format option encryption is used with the following DBMS_CLOUD procedures:
  • Used to pass parameters to decrypt for these procedures:

    • DBMS_CLOUD.COPY_DATA

    • DBMS_CLOUD.CREATE_EXTERNAL_TABLE

    • DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE

    • DBMS_CLOUD.CREATE_HYBRID_PART_TABLE

      For DBMS_CLOUD.CREATE_HYBRID_PART_TABLE this option is only applicable to the Object Storage files.

    • DBMS_CLOUD.COPY_COLLECTION

  • Used to pass parameters to encrypt for this procedure:

    • DBMS_CLOUD.EXPORT_DATA

For example:

format => JSON_OBJECT('encryption' value json_object('type' value DBMS_CRYPTO.ENCRYPT_AES256 + DBMS_CRYPTO.CHAIN_CBC + DBMS_CRYPTO.PAD_PKCS5, 'credential_name' value 'ENCRYPTION_CRED'))

encryption: value

Where value is a JSON string that provides additional parameters for encryption:

type: value

Specifies the encryption type.

credential_name: value

Specifies the credential used to store the encryption key.

user_defined_function: value

Specifies a fully qualified user-defined function to decrypt or encrypt the specified BLOB (binary large object).

endquote

Data can be enclosed between two delimiters, specified with quote and endquote. The quote and endquote characters are removed during loading when specified.

For example:

format => JSON_OBJECT('quote' value '(', 'endquote' value ')')

endquote:character

Default value: Null, meaning no endquote.

escape

The character "\" is used as the escape character when specified.

escape : true

Default value: false

ignoreblanklines

Option valid with JSON data

Blank lines are ignored when set to true.
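For example:

format => json_object('ignoreblanklines' value 'true')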

ignoreblanklines : true

Default value: False

ignoremissingcolumns

If there are more columns in the field_list than there are in the source files, the extra columns are stored as null.
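For example:

format => json_object('ignoremissingcolumns' value 'true')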

ignoremissingcolumns : true

Default value: False

jsonpath

Only use with COPY_COLLECTION

JSON path to identify the document to load.

This option is valid only for JSON collection data with DBMS_CLOUD.COPY_COLLECTION.
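For example, assuming the documents to load reside under an items field (an illustrative path):

format => json_object('jsonpath' value '$.items')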

jsonpath: string

Default value: Null

keyassignment

Only use with COPY_COLLECTION

Specifies whether a new collection is created as a mongo-compatible collection or as a SODA collection.

When the value is set to embedded_oid, a new collection is created as a mongo-compatible collection.

By default this parameter is not set, meaning a new collection is created as a SODA collection.

keyassignment: embedded_oid

Default: keyassignment is not set

keypath

Only use with COPY_COLLECTION

Specifies an attribute in the data to be loaded as the '_id' value.

If keypath is specified then you must also specify the keyassignment value as embedded_oid.

Set the value to a path, for example, '$.mykey', to pick the value of the path as '_id' value.

This parameter is optional and is only valid for loading into mongo-compatible collections.

If not specified, Oracle generates a 12-byte unique system ID and populates that as the '_id' attribute, if an '_id' attribute is not already present in the data being loaded.
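For example, to pick the value of a mykey attribute in the input data as the '_id' value (the path is illustrative):

format => json_object('keyassignment' value 'embedded_oid', 'keypath' value '$.mykey')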

keypath: string

Default: keypath is not set.

When keypath is set, the default string value is NULL.

language

Specifies a language name (for example, FRENCH), from which locale-sensitive information can be derived.

language: string

Default value: Null

See Locale Data in Oracle Database Globalization Support Guide for a listing of Oracle-supported languages.

logdir

Specifies a string value that determines the directory object name where the logfile_table or badfile_table files are saved.

By default, the logdir is not case-sensitive, but the case is preserved when the specified value is enclosed in double-quotes.

For example:

format => JSON_OBJECT ('logdir' value 'test_log')

The logdir format option specified in the above example saves the logfile_table or badfile_table files in the TEST_LOG directory object.

format => JSON_OBJECT ('logdir' value '"test_log"')

The logdir format option specified in the above example saves the logfile_table or badfile_table files in the test_log directory object.

logdir: string

Default value: DATA_PUMP_DIR

logprefix

Specifies a string value that determines the prefix for the logfile_table and badfile_table files.

The log table name format is: logprefix$operation_id

By default, the logprefix is in upper case, but the case is preserved when the specified value is enclosed in double-quotes.

For example:

format => JSON_OBJECT ('logprefix' value 'TEST')

Log files then use the TEST prefix, such as: TEST$2_LOG and TEST$2_BAD.

logprefix: string

Default value: COPY

logretention

Specifies a positive integer duration, in days, for which the logfile_table and badfile_table files are retained.

Valid values: 0 to 99999

For example:

format => JSON_OBJECT ('logretention' value 7)

logretention: number

Default value: 2

maxdocsize

This option is valid only with JSON data

Maximum size of JSON documents.

maxdocsize: number

Default value: 1 Megabyte

Maximum allowed value: 2 Gigabytes

numericcharacters

Specifies the characters to use as the group separator and decimal character.

decimal_character: The decimal character separates the integer portion of a number from the decimal portion.

group_separator: The group separator separates integer groups (that is, thousands, millions, billions, and so on).
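For example, to read numbers written with a comma as the decimal character and a period as the group separator (as in many European locales):

format => json_object('numericcharacters' value ',.')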

numericcharacters: 'decimal_character group_separator'

Default value: ".,"

See NLS_NUMERIC_CHARACTERS in Oracle Database Globalization Support Guide for more information.

numberformat

Specifies the number format model. Number format models cause the number to be rounded to the specified number of significant digits. A number format model is composed of one or more number format elements.

This is used in combination with numericcharacters.
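For example, using an illustrative format model that rounds values to two decimal places:

format => json_object('numberformat' value '9999.99')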

numberformat: number_format_model

Default value: derived from the setting of the NLS_TERRITORY parameter

See Number Format Models in SQL Language Reference for more information.

partition_columns

The format option partition_columns is used with DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE to specify the column names and data types of partition columns when the partition columns are derived from the file path, depending on the type of data file, structured or unstructured:

  • When the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE includes the column_list parameter and the data file is unstructured, such as with CSV text files, partition_columns does not include the data type. For example, use a format such as the following for this type of partition_columns specification:

    '"partition_columns":["state","zipcode"]'

    The data type is not required because it is specified in the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE column_list parameter.

  • When the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE does not include the column_list parameter and the data files are structured, such as Avro, ORC, or Parquet files, the partition_columns option includes the data type. For example, the following shows a partition_columns specification:

    '"partition_columns":[
                   {"name":"country", "type":"varchar2(10)"},
                   {"name":"year", "type":"number"},
                   {"name":"month", "type":"varchar2(10)"}]'

If the data files are unstructured and the type sub-clause is specified with partition_columns, the type sub-clause is ignored.

For object names that are not based on hive format, the order of the partition_columns specified columns must match the order as they appear in the object name in the file_uri_list.

 

quote

Specifies the quote character for the fields. The quote characters are removed during loading when specified.
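For example, when fields are enclosed in double quotation marks:

format => json_object('quote' value '"')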

quote: character

Default value: Null meaning no quote

recorddelimiter

Option valid with JSON data

Specifies the record delimiter.

By default, DBMS_CLOUD tries to automatically find the correct newline character as the delimiter. It first searches the file for the Windows newline character "\r\n". If it finds the Windows newline character, this is used as the record delimiter for all files in the procedure. If a Windows newline character is not found, it searches for the UNIX/Linux newline character "\n" and if it finds one it uses "\n" as the record delimiter for all files in the procedure.

Specify this argument explicitly if you want to override the default behavior, for example:

format => json_object('recorddelimiter' VALUE '''\r\n''')

To indicate that there is no record delimiter you can specify a recorddelimiter that does not occur in the input file. For example, to indicate that there is no delimiter, specify the control character 0x01 (SOH) as a value for the recorddelimiter and set the recorddelimiter value to "0x''01''" (this character does not occur in JSON text). For example:

format => '{"recorddelimiter" : "0x''01''"}'

The recorddelimiter is set once per procedure call. If you are using the default value, detected newline, then all files use the same record delimiter, if one is detected.

recorddelimiter: character

Default value: detected newline

regexuri

The format option regexuri is used with the following DBMS_CLOUD procedures:

  • COPY_COLLECTION

  • COPY_DATA

  • CREATE_EXTERNAL_TABLE

  • CREATE_EXTERNAL_PART_TABLE

  • CREATE_HYBRID_PART_TABLE

When the value of regexuri is set to TRUE, you can use wildcards as well as regular expressions in the file names in Cloud source file URIs.

The characters "*" and "?" are considered wildcard characters when the regexuri parameter is set to FALSE. When the regexuri parameter is set to TRUE the characters "*" and "?" are part of the specified regular expression pattern.

Regular expression patterns are only supported for the file name or subfolder path in your URIs and the pattern matching is identical to that performed by the REGEXP_LIKE function. Regular expression patterns are not supported for directory names.

For external tables, this option is only supported with the tables that are created on a file in the Object Storage.

For example:

format => JSON_OBJECT('regexuri' value TRUE)

See REGEXP_LIKE Condition for more information on REGEXP_LIKE condition.

regexuri: True

Default value: False

rejectlimit

The operation errors out after the specified number of rows are rejected.
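For example, to allow up to 25 rejected rows before the operation fails (the limit is illustrative):

format => json_object('rejectlimit' value 25)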

rejectlimit: number

Default value: 0

removequotes

Removes any quotes that are around any field in the source file.
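For example:

format => json_object('removequotes' value 'true')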

removequotes: true

Default value: False

skipheaders

Specifies how many rows should be skipped from the start of the file.
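For example, to skip a single header row:

format => json_object('skipheaders' value 1)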

skipheaders: number

Default value: 0 if not specified, 1 if specified without a value

territory

Specifies a territory name to further determine input data characteristics.

territory: string

Default value: Null

See Locale Data in Oracle Database Globalization Support Guide for a listing of Oracle-supported territories.

timestampformat

Specifies the timestamp format in the source file. The format option AUTO searches for the following formats:

YYYY-MM-DD HH:MI:SS.FF 
YYYY-MM-DD HH:MI:SS.FF3
YYYY-MM-DD HH24:MI:SS.FF3
MM/DD/YYYY HH:MI:SS.FF3
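For example:

format => json_object('timestampformat' value 'YYYY-MM-DD HH24:MI:SS.FF3')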

The string can contain wildcard characters such as "$".

timestampformat: string

Default value: Database timestamp format

timestampltzformat

Specifies the timestamp with local timezone format in the source file. The format option AUTO searches for the following formats:

DD Mon YYYY HH:MI:SS.FF TZR 
MM/DD/YYYY HH:MI:SS.FF TZR 
YYYY-MM-DD HH:MI:SS+/-TZR 
YYYY-MM-DD HH:MI:SS.FF3 
DD.MM.YYYY HH:MI:SS TZR

timestampltzformat : string

Default value: Database timestamp with local timezone format

timestamptzformat

Specifies the timestamp with timezone format in the source file. The format option AUTO searches for the following formats:

DD Mon YYYY HH:MI:SS.FF TZR 
MM/DD/YYYY HH:MI:SS.FF TZR 
YYYY-MM-DD HH:MI:SS+/-TZR 
YYYY-MM-DD HH:MI:SS.FF3 
DD.MM.YYYY HH:MI:SS TZR

timestamptzformat: string

Default value: Database timestamp with timezone format

trimspaces

Specifies how the leading and trailing spaces of the fields are trimmed.

See the description of trim_spec.
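For example, to trim both leading and trailing spaces:

format => json_object('trimspaces' value 'lrtrim')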

trimspaces: rtrim|ltrim|notrim|lrtrim|ldrtrim

Default value: notrim

truncatecol

If the data in the file is too long for a field, then this option will truncate the value of the field rather than reject the row.
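For example:

format => json_object('truncatecol' value 'true')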

truncatecol:true

Default value: False

type

Specifies the source file type.

See the description of CSV in field_definitions Clause.

If the type is datapump, then the only other valid format option is rejectlimit.

If the type is datapump, then the only Object Stores supported are Oracle Cloud Infrastructure Object Storage and Oracle Cloud Infrastructure Object Storage Classic.

See DBMS_CLOUD Package Format Options for Avro, ORC, or Parquet for type values avro, orc, or parquet.

For JSON data with DBMS_CLOUD.COPY_COLLECTION type has two valid values: json (default) and ejson. For DBMS_CLOUD.COPY_COLLECTION these values both specify that the input is JSON data. The value ejson causes extended objects in the textual JSON input data to be translated to scalar JSON values in the native binary JSON collection. The value json does not perform this transformation and all objects in the input data are converted to binary JSON format.

For JSON data with DBMS_CLOUD.COPY_DATA type has one valid value: json. This value specifies that the input is JSON data.
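For example, to load textual JSON into a collection with extended objects translated (using DBMS_CLOUD.COPY_COLLECTION):

format => json_object('type' value 'ejson')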

type: csv|csv with embedded|csv without embedded|avro|datapump|orc|parquet

Note:

Not all DBMS_CLOUD procedures support all of these types.

csv is the same as csv without embedded.

Default value: Null

For JSON data there are two valid type values for use with DBMS_CLOUD.COPY_COLLECTION: json|ejson. In this case the default value is json. For JSON data with DBMS_CLOUD.COPY_DATA, only json is valid.

unpackarrays

Only use with COPY_COLLECTION

When set to true, if a loaded document is an array, then the contents of the array are loaded as documents rather than the array itself. This only applies to the top-level array.

When set to false, the entire array is inserted as a single document.

This option is valid only for JSON collection data with DBMS_CLOUD.COPY_COLLECTION.
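For example:

format => json_object('unpackarrays' value 'true')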

unpackarrays: true

Default value: False