DBMS_CLOUD Package Format Options

The format argument in DBMS_CLOUD specifies the format of source files.

The two ways to specify the format argument are:

format => '{"format_option" : “format_value” }'  

And:

format => json_object('format_option' value 'format_value'))

Examples:

format => json_object('type' VALUE 'CSV')

To specify multiple format options, separate the values with a ",".

For example:

format => json_object('ignoremissingcolumns' value 'true', 'removequotes' value 'true', 
                           'dateformat' value 'YYYY-MM-DD-HH24-MI-SS', 'blankasnull' value 'true')

Note:

For Avro, ORC, or Parquet format options, see DBMS_CLOUD Package Format Options for Avro, ORC, or Parquet.

As noted in the Format Option column, a limited set of format options are valid with DBMS_CLOUD.COPY_COLLECTION or with DBMS_CLOUD.COPY_DATA when the format type is JSON.

Format Option Description Syntax

blankasnull

When set to true, loads fields consisting of spaces as null.

blankasnull : true

Default value: False

characterset

Valid with format JSON and COPY_DATA

Specifies the characterset of source files

characterset: string

Default value: Database characterset

columnpath

Only use with format JSON and COPY_DATA

Array of JSON path expressions that correspond to the fields that need to be extracted from the JSON records. Each of the JSON path expressions in the array should follow the rules described in SQL/JSON Path Expressions .

Only use with format JSON and DBMS_CLOUD.COPY_DATA.

JSON Array of json path expressions expressed in string format. For example: 'columnpath' value '["$.WEATHER_STATION_ID", "$.WEATHER_STATION_NAME"]'

compression

Option valid with JSON data

Specifies the compression type of the source file.

ZIP archiving format is not supported.

Specifying the value auto checks for the compression types: gzip, zlib, bzip2.

compression: auto|gzip|zlib|bzip2

Default value: Null value meaning no compression.

conversionerrors

If a row is rejected because of data type conversion errors, the related columns are stored as null or the row is rejected.

conversionerrors : reject_record | store_null

Default value: reject_record

dateformat

Specifies the date format in the source file. The format option AUTO searches for the following formats:

J 
MM-DD-YYYYBC 
MM-DD-YYYY 
YYYYMMDD HHMISS 
YYMMDD HHMISS 
YYYY.DDD 
YYYY-MM-DD

dateformat : string

Default value: Database date format

delimiter

Specifies the field delimiter.

To use a special character as the delimiter, specify the HEX value of the ASCII code of the character. For example, the following specifies the TAB character as the delimiter:

format => json_object('delimiter' value 'X''9''')

delimiter : character

Default value | (pipe character)

endquote

Data can be enclosed between two delimiters, specified with quote and endquote. The quote and endquote characters are removed during loading when specified.

For example:

format => JSON_OBJECT(‘quote’ value ‘(’, ‘endquote’ value ‘)’)

endquote:character

Default value: Null, meaning no endquote.

escape

The character "\" is used as the escape character when specified.

escape : true

Default value: false

ignoreblanklines

Option valid with JSON data

Blank lines are ignored when set to true.

ignoreblanklines : true

Default value: False

ignoremissingcolumns

If there are more columns in the field_list than there are in the source files, the extra columns are stored as null.

ignoremissingcolumns : true

Default value False

jsonpath

Only use with COPY_COLLECTION

JSON path to identify the document to load.

This option is valid only for JSON collection data with DBMS_CLOUD.COPY_COLLECTION.

jsonpath: string

Default value: Null

language

Specifies a language name (for example, FRENCH), from which locale-sensitive information can be derived.

language: string

Default value: Null

See Locale Data in Oracle Database Globalization Support Guide for a listing of Oracle-supported languages.

maxdocsize

This option is valid only with JSON data

Maximum size of JSON documents.

maxdocsize: number

Default value: 1 Megabyte

Maximum allowed value: 2 Gigabytes

numericcharacters

Specifies the characters to use as the group separator and decimal character.

decimal_character: The decimal separates the integer portion of a number from the decimal portion.

group_separator: The group separator separates integer groups (that is, thousands, millions, billions, and so on).

numericcharacters: 'decimal_character group_separator'

Default value: ".,"

See NLS_NUMERIC_CHARACTERS in Oracle Database Globalization Support Guide for more information.

numberformat

Specifies the number format model. Number format models cause the number to be rounded to the specified number of significant digits. A number format model is composed of one or more number format elements.

This is used in combination with numericcharacters.

numberformat: number_format_model

Default value: is derived from the setting of the NLS_TERRITORY parameter

See Number Format Models in SQL Language Reference for more information.

partition_columns

The format option partition_columns is used with DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE to specify the column names and data types of partition columns when the partition columns are derived from the file path, depending on the type of data file, structured or unstructured:

  • When the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE includes the column_list parameter and the data file is unstructured, such as with CSV text files, partition_columns does not include the data type. For example, use a format such as the following for this type of partition_columns specification:

    '"partition_columns":["state","zipcode"]'

    The data type is not required because it is specified in the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE column_list parameter.

  • When the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE does not include the column_list parameter and the data files are structured, such as Avro, ORC, or Parquet files, the partition_columns option includes the data type. For example, the following shows a partition_columns specification:

    '"partition_columns":[
                   {"name":"country", "type":"varchar2(10)"},
                   {"name":"year", "type":"number"},
                   {"name":"month", "type":"varchar2(10)"}]'

If the data files are unstructured and the type sub-clause is specified with partition_columns, the type sub-clause is ignored.

For object names that are not based on hive format, the order of the partition_columns specified columns must match the order as they appear in the object name in the file_url_list.

 

quote

Specifies the quote character for the fields, the quote characters are removed during loading when specified.

quote: character

Default value: Null meaning no quote

recorddelimiter

Option valid with JSON data

Specifies the record delimiter.

By default, DBMS_CLOUD tries to automatically find the correct newline character as the delimiter. It first searches the file for the Windows newline character "\r\n". If it finds the Windows newline character, this is used as the record delimiter for all files in the procedure. If a Windows newline character is not found, it searches for the UNIX/Linux newline character "\n" and if it finds one it uses "\n" as the record delimiter for all files in the procedure.

Specify this argument explicitly if you want to override the default behavior, for example:

format => json_object('recorddelimiter' VALUE '''\r\n''')

To indicate that there is no record delimiter you can specify a recorddelimiter that does not occur in the input file. For example, to indicate that there is no delimiter, specify the control character 0x01 (SOH) as a value for the recorddelimiter and set the recorddelimiter value to "0x''01''" (this character does not occur in JSON text). For example:

format => '{"recorddelimiter" : "0x''01''"}'

The recorddelimiter is set once per procedure call. If you are using the default value, detected newline, then all files use the same record delimiter, if one is detected.

recorddelimiter: character

Default value: detected newline

rejectlimit

Option valid with JSON data

The operation will error out after specified number of rows are rejected.

rejectlimit: number

Default value: 0

removequotes

Removes any quotes that are around any field in the source file.

removequotes: true

Default value: False

skipheaders

Specifies how many rows should be skipped from the start of the file.

skipheaders: number

Default value: 0 if not specified, 1 if specified without a value

territory

Specifies a territory name to further determine input data characteristics.

territory: string

Default value: Null

See Locale Data in Oracle Database Globalization Support Guide for a listing of Oracle-supported territories.

timestampformat

Specifies the timestamp format in the source file. The format option AUTO searches for the following formats:

YYYY-MM-DD HH:MI:SS.FF 
YYYY-MM-DD HH:MI:SS.FF3 
MM/DD/YYYY HH:MI:SS.FF3

timestampformat : string

Default value: Database timestamp format

The string can contain wildcard characters such as "$".

timestampltzformat

Specifies the timestamp with local timezone format in the source file. The format option AUTO searches for the following formats:

DD Mon YYYY HH:MI:SS.FF TZR 
MM/DD/YYYY HH:MI:SS.FF TZR 
YYYY-MM-DD HH:MI:SS+/-TZR 
YYYY-MM-DD HH:MI:SS.FF3 
DD.MM.YYYY HH:MI:SS TZR

timestampltzformat : string

Default value: Database timestamp with local timezone format

timestamptzformat

Specifies the timestamp with timezone format in the source file. The format option AUTO searches for the following formats:

DD Mon YYYY HH:MI:SS.FF TZR 
MM/DD/YYYY HH:MI:SS.FF TZR 
YYYY-MM-DD HH:MI:SS+/-TZR 
YYYY-MM-DD HH:MI:SS.FF3 
DD.MM.YYYY HH:MI:SS TZR

timestamptzformat: string

Default value: Database timestamp with timezone format

trimspaces

Specifies how the leading and trailing spaces of the fields are trimmed.

See the description of trim_spec.

trimspaces: rtrim| ltrim| notrim| lrtrim| ldrtrim

Default value: notrim

truncatecol

If the data in the file is too long for a field, then this option will truncate the value of the field rather than reject the row.

truncatecol: true

Default value: False

type

Specifies the source file type.

See the description of CSV in field_definitions Clause

If the type is datapump, then the only other valid format option is rejectlimit.

If the type is datapump, then the only Object Stores supported are Oracle Cloud Infrastructure Object Storage and Oracle Cloud Infrastructure Object Storage Classic.

See DBMS_CLOUD Package Format Options for Avro, ORC, or Parquet for type values avro, orc, or parquet.

For JSON data with DBMS_CLOUD.COPY_COLLECTION type has two valid values: json (default) and ejson. For DBMS_CLOUD.COPY_COLLECTION these values both specify that the input is JSON data. The value ejson causes extended objects in the textual JSON input data to be translated to scalar JSON values in the native binary JSON collection. The value json does not perform this transformation and all objects in the input data are converted to binary JSON format.

For JSON data with DBMS_CLOUD.COPY_DATA type has one valid value: json. This value specifies that the input is JSON data.

type: csv|csv with embedded|csv without embedded | datapump

csv is the same as csv without embedded.

Default value: Null

For JSON data there are two valid type values for use with DBMS_CLOUD.COPY_COLLECTION: json|ejson In this case the default value is json. For JSON data with DBMS_CLOUD.COPY_DATA, only json is valid.

unpackarrays

Only use with COPY_COLLECTION

When set to true, if a loaded document is an array, then the contents of the array are loaded as documents rather than the array itself. This only applies to the top-level array.

When set to true, the entire array is inserted as a single document.

This option is valid only for JSON collection data with DBMS_CLOUD.COPY_COLLECTION.

unpackarrays: true

Default value: False