JSON Configuration Fields Specifying Migrator Parameters

21.4 JSON Configuration Fields Specifying Migrator Parameters

You configure JSON-to-duality migration by passing a migrator configuration object as argument to PL/SQL DBMS_JSON_DUALITY subprograms infer_schema, infer_and_generate_schema, and import_all. The supported fields of such an object are described.

Note:

You might want to skim this topic on a first reading, and refer back to it later. The information is presented here to give you an idea of what's available.

Procedure import_all is the only subprogram that uses configuration fields errorLog, errorLogSchema, and rejectLimit. Functions infer_schema and infer_and_generate_schema are the only subprograms that use configuration fields hints, ingestLimit, minFieldFrequency, minTypeFrequency, normalize, outputFormat, softnessThreshold, tablespace, updatability, and useFlexFields.

The configuration fields actually used by the various DBMS_JSON_DUALITY migrator subprograms are thus different, but there is some overlap. You can pass any of the configuration fields to any of these subprograms; fields that aren't used are ignored. In particular, this means that you can pass a common configuration document to any of these subprograms.

Instead of accepting a JSON configuration object, PL/SQL subprograms import, validate_schema_report, and validate_import_report accept specific non-JSON arguments that act the same as, or similarly to, the use of some of the configuration fields. The parameter names are similar to the field names, and the field descriptions here generally apply to the corresponding parameters as well. For example, parameter table_name of function validate_import_report corresponds to configuration field tableName.

These are the migrator configuration fields. All of them except tableNames are optional. The use of any fields other than those listed raises an error.

errorLog (Optional) — A string that names the single error log to use, or an array of strings that name the error logs to use, one for each duality view.

Field errorLog is used only for procedure import_all.
errorLogSchema (Optional) — A string that names the database schema (user) that owns the error log(s). If you don't specify an error-log owner in errorLogSchema, then the name of the currently connected user is used.

Field errorLogSchema is used only for procedure import_all.
hints (Optional) — A JSON array with elements that are JSON objects whose fields specify overrides for the behavior of the converter in generating a relational schema. The name "hint" is a bit of a misnomer, as these are imperatives, not mere suggestions: if a hint can't be respected for some reason then an error is raised; a hint is never ignored. An error is also raised if a hint is specified incorrectly.

A hint object must have these fields (otherwise, an error is raised):
- type — The value is one of these strings, specifying the type of migrator-behavior override:
  - "datatype" — Mandates the SQL data type to use for a column in a table underlying a duality view definition. The column corresponds to the document field targeted by path. Field value is a string naming a scalar SQL data type (including "json" and "vector", for types JSON and VECTOR). The data-type name is interpreted case-insensitively, and it can be any column type accepted by CREATE TABLE.
  - "key" — Mandates the identifying column or columns for the table underlying the document object (or array of objects) targeted by field path. Field value is either a string naming a scalar JSON field whose column is an identifying column, or it is an array of such field-name strings, each of whose column is an identifying column for the table (that is, together these columns uniquely identify a row; for an example, see: Car-Racing Example, Tables in JSON-Relational Duality Developer's Guide ).
  - "normalize" — Mandates (if field value is false) that the JSON data targeted in documents by field path is not to be shared.
    
    This applies to all documents in the input document sets acted on by infer_schema and infer_and_generate_schema (field hints is used only by those functions). The targeting of document data is not specific to any particular kind of document. Any document, of any kind, with data that comes from columns in table and is targeted by the same path value, is affected.
    
    Field path must target an object, which can be at any level (including the root object of the document, which is targeted by path $). An error is raised if path targets a scalar or array value.
    
    In effect, the given table is locked/dedicated to data at the specified path, and vice versa. The table is not mapped to any other data than that targeted by path, anywhere in the input document sets. The data targeted by the given path (in any document) is thus not shared anywhere at a different path, either within the same document or in different documents. It is shared in all documents at the locations specified by path.
    
    Only the table underlying the targeted object is locked; sharing of data underlying a subobject within the targeted object is controlled by its own underlying table.
    
    If field value is true then this hint has no effect, since trying to normalize input JSON data is the default behavior.
- table — A string naming an input table whose document-set data is used to define a duality view.
- path — A SQL/JSON path expression string that targets data in input JSON documents. An error is raised if the path targets no data.
- value — Information specific to the particular type, providing detail that defines the hint. (This is the only hint field whose value is not necessarily a string. For normalize it is a Boolean.)
Field hints is used only for functions infer_schema and infer_schema_and_generate.
ingestLimit (Optional) — The maximum number of documents to be analyzed in each document set. No error is raised if the limit is exceeded; the additional documents are simply not examined.

The default value is 100,000.

Field ingestLimit is used only for functions infer_schema and infer_schema_and_generate.
minFieldFrequency (Optional) — The minimum frequency for a field not to be considered an outlier (high-entropy).

A field is an occurrence outlier for a given document set if it occurs in less than minFieldFrequency percent of the documents. A value of zero (0) percent means that no fields are considered as outliers.

For example, in the input course documents, if a value of 25 is used for minFieldFrequency then field Notes is an occurrence outlier because it occurs in less than 25% of the documents in the course document set.

The default minFieldFrequency value is 5, meaning that a field that occurs in less than 5% of an input document-set's documents is considered high-entropy.

The converter does not map an occurrence-outlier field to any underlying column. When there are flex columns, the importer puts all fields (such as occurrence-outlier fields) that are not mapped to columns into the flex columns corresponding to the field locations.

Note:

In the student-teacher-course examples presented in this documentation, which involve very few documents in each document set, we use 25 as the minFieldFrequency value, in order to demonstrate the determination and handling of occurrence outliers.
minTypeFrequency (Optional) — The minimum frequency for the type of a field's value not to be considered an outlier (high-entropy).

A field is an type-occurrence outlier, or type outlier, for a given document set if any of its values occurs with a given type in less than minTypeFrequency percent of the documents. A value of zero (0) percent means that no fields are considered as outliers.

For example, in the input course documents, if a value of 15 is used for minTypeFrequency then student field age is a type outlier because it has a string value in 10% (less than 15%) of the documents. (It has a number value in the other documents.)

The default minTypeFrequency value is 5, meaning that a field has a given type in less than 5% of an input document-set's documents is considered an outlier.

The importer tries to convert a value of rare type to the common type for the field. For example, if the common type for a length field is number then a length occurrence with a value of "42" is converted to the number 42. If the conversion attempt fails then an error is logged for that occurrence.

Note:

In the examples presented here, which involve very few documents in each document set, we use 15 as the minTypeFrequency value, in order to demonstrate the determination and handling of type outliers.
normalize (Optional) — A Boolean value (true/false) that indicates whether the converter should try to normalize (share) the relational tables it infers. A false value means that each object in a document supported by a duality view has its own underlying table, that is, a table that's not shared with any other duality view.

The default value is true.

Field normalize is used only for functions infer_schema and infer_schema_and_generate.

Note that this top-level configuration field normalize applies to the general converter behavior, for all tables and duality views being generated. On the other hand, field normalize in a hints field's object provides a more fine-grained prevention of sharing, the sharing of a table that underlies a particular document object.
outputFormat (Optional) — A string whose value defines the format of the output data definition language (DDL) script.

The default value is "executable", which means you can execute the DDL script directly: it uses PL/SQL EXECUTE IMMEDIATE. The other possible value is "standalone", which means you can use the DDL script in a SQL script that you run separately.

Field outputFormat is used only for functions infer_schema and infer_schema_and_generate.

If the generated DDL is larger than 32K bytes then you must use "standalone; otherwise, an error is raised when EXECUTE IMMEDIATE is invoked. An "executable" DDL script can be too large if the input data sets are themselves very large or if they have many levels of nested values.
rejectLimit (Optional) — The maximum number of errors that can be logged. If this limit is exceeded then the import operation is canceled (fails) and is rolled back, so no error logs are available. By default there is no limit.

Field rejectLimit is used only for procedure import_all.
softnessThreshold (Optional) — The minimum cleanliness level allowed for input data. The default value is 99, meaning that at least 99% of the input documents must not have missing or incorrect information.

Field softnessThreshold is used only for functions infer_schema and infer_schema_and_generate.
sourceSchema (Optional) — A string whose value is the name of the database schema (user) that owns the input tables (tableNames).

If not provided then the database schema used to identify the input tables is the one that's current when the DDL is generated (not when it is executed).
tableNames (Required) — An array of strings naming the Oracle Database transfer tables that correspond to the original external document sets. Each table must have a JSON-type column (it need not be named data), which stores the documents of a given document set.

If field viewNames is provided then its array length must be the same as that of field tableNames; otherwise, an error is raised (not logged).
tablespace (Optional) — A string whose value is the name of the tablespace to use for all of the tables underlying the duality views.

If not provided then no tablespace is specified in the output DDL. This means that the tablespace used is the one that's current at the time the DDL code is executed (not when it is generated).

Field tablespace is used only for functions infer_schema and infer_schema_and_generate.
targetSchema (Optional) — A string whose value is the name of the database schema (user) that will own the output database views (viewNames).

If not provided then no database schema is specified in the output DDL; the names of the database objects to be created are unqualified. This means that the schema used is the one that's current at the time the DDL code is executed (not when it is generated).
updatability (Optional) — A Boolean value determining whether the duality views to be generated are to be updatable (true) or not (false). When true, annotations (for an example, see Annotations (NO)UPDATE, (NO)INSERT, (NO)DELETE, To Allow/Disallow Updating Operations) are set for maximum updatability of each view. When false all of the views created are read-only.

The default value is true.

Field updatability is used only for functions infer_schema and infer_schema_and_generate.
useFlexFields (Optional) — A Boolean value determining whether flex columns are to be added to the tables underlying the duality views. Flex columns are used at application runtime to store unrecognized fields in an incoming document to be inserted or updated.

If useFlexFields is true, then for each duality view <view-name>, a flex column named ora$<view-name>_flex is added to each table that directly underlies the top-level fields of an object in the supported documents. (The fields stored in a given flex column are unnested to that object.)

The default value is true.

Field useFlexFields is used only by converter functions infer_schema and infer_schema_and_generate.

The importer doesn't use field useFlexFields. But when flex columns have been created by the converter, the importer puts all fields that are not mapped to columns into flex columns corresponding to the field locations. For example, occurrence-outlier fields are handled this way. If there are no flex columns then the importer reports an error for an unmapped field.
viewNames (Optional) — An array of strings naming the duality views to be created, one for each document set.

If not provided then the tableNames with _duality appended are used as the view names. For example the name of the view corresponding to the documents in table foo defaults to foo_duality.

If field viewNames is provided then its array length must be the same as that of field tableNames; otherwise, an error is raised (not logged).

Parent topic: Migrating From JSON To Duality