21.4 JSON Configuration Fields Specifying Migrator Parameters
You configure JSON-to-duality migration by passing a migrator
configuration object as argument to PL/SQL DBMS_JSON_DUALITY
subprograms infer_schema
, infer_and_generate_schema
, and
import_all
. The supported fields of such an object are
described.
Note:
You might want to skim this topic on a first reading, and refer back to it later. The information is presented here to give you an idea of what's available.
Procedure import_all
is the only subprogram that uses
configuration fields errorLog
, errorLogSchema
, and
rejectLimit
. Functions infer_schema
and
infer_and_generate_schema
are the only subprograms that use
configuration fields hints
, ingestLimit
,
minFieldFrequency
, minTypeFrequency
,
normalize
, outputFormat
,
softnessThreshold
, tablespace
,
updatability
, and useFlexFields
.
The configuration fields actually used by the various
DBMS_JSON_DUALITY
migrator subprograms are thus different, but
there is some overlap. You can pass any of the configuration fields to any of these
subprograms; fields that aren't used are ignored. In particular, this means that you can
pass a common configuration document to any of these subprograms.
Instead of accepting a JSON configuration object, PL/SQL subprograms
import
, validate_schema_report
, and
validate_import_report
accept specific non-JSON arguments
that act the same as, or similarly to, the use of some of the configuration fields. The
parameter names are similar to the field names, and the field descriptions here
generally apply to the corresponding parameters as well. For example, parameter
table_name
of function validate_import_report
corresponds to configuration field tableName
.
These are the migrator configuration fields. All of them except
tableNames
are optional. The use of any fields other than those
listed raises an error.
-
errorLog
(Optional) — A string that names the single error log to use, or an array of strings that name the error logs to use, one for each duality view.Field
errorLog
is used only for procedureimport_all
. -
errorLogSchema
(Optional) — A string that names the database schema (user) that owns the error log(s). If you don't specify an error-log owner inerrorLogSchema
, then the name of the currently connected user is used.Field
errorLogSchema
is used only for procedureimport_all
. -
hints
(Optional) — A JSON array with elements that are JSON objects whose fields specify overrides for the behavior of the converter in generating a relational schema. The name "hint" is a bit of a misnomer, as these are imperatives, not mere suggestions: if a hint can't be respected for some reason then an error is raised; a hint is never ignored. An error is also raised if a hint is specified incorrectly.A hint object must have these fields (otherwise, an error is raised):
-
type
— The value is one of these strings, specifying the type of migrator-behavior override:-
"datatype"
— Mandates the SQL data type to use for a column in a table underlying a duality view definition. The column corresponds to the document field targeted bypath
. Fieldvalue
is a string naming a scalar SQL data type (including"json"
and"vector"
, for typesJSON
andVECTOR
). The data-type name is interpreted case-insensitively, and it can be any column type accepted byCREATE TABLE
. -
"key"
— Mandates the identifying column or columns for the table underlying the document object (or array of objects) targeted by fieldpath
. Fieldvalue
is either a string naming a scalar JSON field whose column is an identifying column, or it is an array of such field-name strings, each of whose column is an identifying column for the table (that is, together these columns uniquely identify a row; for an example, see: Car-Racing Example, Tables in JSON-Relational Duality Developer's Guide ). -
"normalize"
— Mandates (if fieldvalue
isfalse
) that the JSON data targeted in documents by fieldpath
is not to be shared.This applies to all documents in the input document sets acted on by
infer_schema
andinfer_and_generate_schema
(fieldhints
is used only by those functions). The targeting of document data is not specific to any particular kind of document. Any document, of any kind, with data that comes from columns intable
and is targeted by the samepath
value, is affected.Field
path
must target an object, which can be at any level (including the root object of the document, which is targeted by path$
). An error is raised ifpath
targets a scalar or array value.In effect, the given table is locked/dedicated to data at the specified path, and vice versa. The table is not mapped to any other data than that targeted by
path
, anywhere in the input document sets. The data targeted by the given path (in any document) is thus not shared anywhere at a different path, either within the same document or in different documents. It is shared in all documents at the locations specified by path.Only the table underlying the targeted object is locked; sharing of data underlying a subobject within the targeted object is controlled by its own underlying table.
If field
value
istrue
then this hint has no effect, since trying to normalize input JSON data is the default behavior.
-
-
table
— A string naming an input table whose document-set data is used to define a duality view. -
path
— A SQL/JSON path expression string that targets data in input JSON documents. An error is raised if the path targets no data. -
value
— Information specific to the particulartype
, providing detail that defines the hint. (This is the only hint field whose value is not necessarily a string. Fornormalize
it is a Boolean.)
Field
hints
is used only for functionsinfer_schema
andinfer_schema_and_generate
. -
-
ingestLimit
(Optional) — The maximum number of documents to be analyzed in each document set. No error is raised if the limit is exceeded; the additional documents are simply not examined.The default value is 100,000.
Field
ingestLimit
is used only for functionsinfer_schema
andinfer_schema_and_generate
. -
minFieldFrequency
(Optional) — The minimum frequency for a field not to be considered an outlier (high-entropy).A field is an occurrence outlier for a given document set if it occurs in less than
minFieldFrequency
percent of the documents. A value of zero (0
) percent means that no fields are considered as outliers.For example, in the input course documents, if a value of
25
is used forminFieldFrequency
then fieldNotes
is an occurrence outlier because it occurs in less than 25% of the documents in the course document set.The default
minFieldFrequency
value is5
, meaning that a field that occurs in less than 5% of an input document-set's documents is considered high-entropy.The converter does not map an occurrence-outlier field to any underlying column. When there are flex columns, the importer puts all fields (such as occurrence-outlier fields) that are not mapped to columns into the flex columns corresponding to the field locations.
Note:
In the student-teacher-course examples presented in this documentation, which involve very few documents in each document set, we use
25
as theminFieldFrequency
value, in order to demonstrate the determination and handling of occurrence outliers. -
minTypeFrequency
(Optional) — The minimum frequency for the type of a field's value not to be considered an outlier (high-entropy).A field is an type-occurrence outlier, or type outlier, for a given document set if any of its values occurs with a given type in less than
minTypeFrequency
percent of the documents. A value of zero (0) percent means that no fields are considered as outliers.For example, in the input course documents, if a value of
15
is used forminTypeFrequency
then student fieldage
is a type outlier because it has a string value in 10% (less than 15%) of the documents. (It has a number value in the other documents.)The default
minTypeFrequency
value is5
, meaning that a field has a given type in less than 5% of an input document-set's documents is considered an outlier.The importer tries to convert a value of rare type to the common type for the field. For example, if the common type for a
length
field isnumber
then alength
occurrence with a value of"42"
is converted to the number42
. If the conversion attempt fails then an error is logged for that occurrence.Note:
In the examples presented here, which involve very few documents in each document set, we use
15
as theminTypeFrequency
value, in order to demonstrate the determination and handling of type outliers. -
normalize
(Optional) — A Boolean value (true
/false
) that indicates whether the converter should try to normalize (share) the relational tables it infers. Afalse
value means that each object in a document supported by a duality view has its own underlying table, that is, a table that's not shared with any other duality view.The default value is
true
.Field
normalize
is used only for functionsinfer_schema
andinfer_schema_and_generate
.Note that this top-level configuration field
normalize
applies to the general converter behavior, for all tables and duality views being generated. On the other hand, fieldnormalize
in ahints
field's object provides a more fine-grained prevention of sharing, the sharing of a table that underlies a particular document object. -
outputFormat
(Optional) — A string whose value defines the format of the output data definition language (DDL) script.The default value is
"executable"
, which means you can execute the DDL script directly: it uses PL/SQLEXECUTE IMMEDIATE
. The other possible value is"standalone"
, which means you can use the DDL script in a SQL script that you run separately.Field
outputFormat
is used only for functionsinfer_schema
andinfer_schema_and_generate
.If the generated DDL is larger than 32K bytes then you must use
"standalone
; otherwise, an error is raised whenEXECUTE IMMEDIATE
is invoked. An"executable"
DDL script can be too large if the input data sets are themselves very large or if they have many levels of nested values. -
rejectLimit
(Optional) — The maximum number of errors that can be logged. If this limit is exceeded then the import operation is canceled (fails) and is rolled back, so no error logs are available. By default there is no limit.Field
rejectLimit
is used only for procedureimport_all
. -
softnessThreshold
(Optional) — The minimum cleanliness level allowed for input data. The default value is99
, meaning that at least 99% of the input documents must not have missing or incorrect information.Field
softnessThreshold
is used only for functionsinfer_schema
andinfer_schema_and_generate
. -
sourceSchema
(Optional) — A string whose value is the name of the database schema (user) that owns the input tables (tableNames
).If not provided then the database schema used to identify the input tables is the one that's current when the DDL is generated (not when it is executed).
-
tableNames
(Required) — An array of strings naming the Oracle Database transfer tables that correspond to the original external document sets. Each table must have aJSON
-type column (it need not be nameddata
), which stores the documents of a given document set.If field
viewNames
is provided then its array length must be the same as that of fieldtableNames
; otherwise, an error is raised (not logged). -
tablespace
(Optional) — A string whose value is the name of the tablespace to use for all of the tables underlying the duality views.If not provided then no tablespace is specified in the output DDL. This means that the tablespace used is the one that's current at the time the DDL code is executed (not when it is generated).
Field
tablespace
is used only for functionsinfer_schema
andinfer_schema_and_generate
. -
targetSchema
(Optional) — A string whose value is the name of the database schema (user) that will own the output database views (viewNames
).If not provided then no database schema is specified in the output DDL; the names of the database objects to be created are unqualified. This means that the schema used is the one that's current at the time the DDL code is executed (not when it is generated).
-
updatability
(Optional) — A Boolean value determining whether the duality views to be generated are to be updatable (true
) or not (false
). Whentrue
, annotations (for an example, see Annotations (NO)UPDATE, (NO)INSERT, (NO)DELETE, To Allow/Disallow Updating Operations) are set for maximum updatability of each view. Whenfalse
all of the views created are read-only.The default value is
true
.Field
updatability
is used only for functionsinfer_schema
andinfer_schema_and_generate
. -
useFlexFields
(Optional) — A Boolean value determining whether flex columns are to be added to the tables underlying the duality views. Flex columns are used at application runtime to store unrecognized fields in an incoming document to be inserted or updated.If
useFlexFields
istrue
, then for each duality view<view-name>
, a flex column namedora$<view-name>_flex
is added to each table that directly underlies the top-level fields of an object in the supported documents. (The fields stored in a given flex column are unnested to that object.)The default value is
true
.Field
useFlexFields
is used only by converter functionsinfer_schema
andinfer_schema_and_generate
.The importer doesn't use field
useFlexFields
. But when flex columns have been created by the converter, the importer puts all fields that are not mapped to columns into flex columns corresponding to the field locations. For example, occurrence-outlier fields are handled this way. If there are no flex columns then the importer reports an error for an unmapped field. -
viewNames
(Optional) — An array of strings naming the duality views to be created, one for each document set.If not provided then the
tableNames
with_duality
appended are used as the view names. For example the name of the view corresponding to the documents in tablefoo
defaults tofoo_duality
.If field
viewNames
is provided then its array length must be the same as that of fieldtableNames
; otherwise, an error is raised (not logged).
Parent topic: Migrating From JSON To Duality