21.7 Before Using the Converter (2): Optionally Create Data-Guide JSON Schemas
A data-guide JSON schema provides frequency information about the fields in a document set, in addition to structure and type information. You can use such schemas to get an idea how migration might proceed, and you can compare them with other JSON schemas as a shortcut for comparing document sets.
A JSON data guide created using keyword
FORMAT_SCHEMA
is a special kind of JSON
schema that includes not only the usual document structure and type
information but also statistical information about the specific content; in particular,
for each field, in what percentage of documents it occurs, in what percentage
of documents it has values of which types, and the range of values for each
type.
Creating data-guide JSON schemas for your input document sets is optionalFoot 1, and you can create them at any time (as long as you still have the transfer tables of input documents). But it's a good idea to create them before starting to convert your input document sets, in particular because they can help guide how you configure the converter.
Example 21-5 Create JSON Data Guides For Input Document Sets
This example uses Oracle SQL function json_dataguide
to
create data guides for the input student, teacher, and course document sets. These
are JSON schemas that can be used to validate their documents.
Parameter DBMS_JSON.FORMAT_SCHEMA
ensures that
the data guide is usable for validating. Parameter
DBMS_JSON.PRETTY
pretty-prints the result. Parameter
DBMS_JSON.GATHER_STATS
provides the data guide with
statistical fields such as o:frequency
, which specifies the
percentage of documents in which a given field occurs or has a given type of
value.
SELECT json_dataguide(data,
DBMS_JSON.FORMAT_SCHEMA,
DBMS_JSON.PRETTY+DBMS_JSON.GATHER_STATS)
FROM student_tab;
SELECT json_dataguide(data,
DBMS_JSON.FORMAT_SCHEMA,
DBMS_JSON.PRETTY+DBMS_JSON.GATHER_STATS)
FROM teacher_tab;
SELECT json_dataguide(data,
DBMS_JSON.FORMAT_SCHEMA,
DBMS_JSON.PRETTY+DBMS_JSON.GATHER_STATS)
FROM course_tab;
See Also:
-
JSON_DATAGUIDE in Oracle Database SQL Language Reference
-
DBMS_JSON Constants in Oracle Database PL/SQL Packages and Types Reference for information about constants
DBMS_JSON.FORMAT_SCHEMA
,DBMS_JSON.GATHER_STATS
, andDBMS_JSON.PRETTY
The resulting data guides for input student document set, input teacher
document set, and input course documentation set are shown in the JSON data guide
examples that follow in this topic, "JSON Data Guide for Input Student Document
Set", "JSON Data Guide for Input Teacher Document Set", and "JSON Data
Guide for Input Course Document Set", which describe the documents in
JSON
-type column data
of tables
student_tab
, teacher_tab
, and
course_tab
, respectively.
Comparing JSON schemas can serve as a useful proxy for comparing entire document sets — in particular, migration input document sets versus output document collections supported by duality views (proposed during migration or fully created).
-
As the first conversion step, PL/SQL function
DBMS_JSON_DUALITY.infer_schema
produces a JSON schema that describes the entire proposed relational schema for a migration, that is, the proposed duality views plus their underlying tables.The JSON schema produced by function
infer_schema
is not a data-guide JSON schema — there are no duality views yet, so there are no supported document collections from which statistical information can be gathered. But it does specify the structure and typing of the document sets that could result from a migration.You can use the view parts of this JSON schema to compare against JSON schemas for input document sets (in a transfer table).
For example, instead of comparing the individual input documents in table
course_tab
with the individual documents to be supported by the (inferred)course
duality view, you can compare the data-guide JSON schema from "JSON Data Guide for Input Course Document Set" in this topic with theCOURSE
duality-view part of the JSON schema inferred byinfer_schema
— see the section "JSON Schema from INFER_SCHEMA for Duality Vies with No Outliers" in Migrating To Duality, Simplified Recipe. When you do that, you can ignore fields that are relevant to only one or the other kind of JSON schema--for example, fields named with prefix "o:" (for Oracle) and fields named with prefix "db" (for database). -
Similarly, comparing a JSON schema for an input document set against a JSON schema for the created and populated duality view that replaces it after migration can highlight differences. For example, you can compare the data-guide JSON schema for the course input table in the example "JSON Data Guide For Import Course Document Set" against a data-guide JSON schema for the course duality view. A data-guide schema serves as a shortcut (proxy) for comparing the documents supported by a duality view with the corresponding input documents.
Example 21-6 JSON Data Guide For Input Student Document Set
This data guide summarizes the input set of student documents stored in
transfer table student_tab
.
{
"type" : "object",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"properties" :
{
"age" :
{
"oneOf" :
[
{
"type" : "number",
"o:preferred_column_name" : "age",
"o:frequency" : 90,
"o:low_value" : 19,
"o:high_value" : 21,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"maximum" : 21,
"minimum" : 19
},
{
"type" : "string",
"o:length" : 8,
"o:preferred_column_name" : "age",
"o:frequency" : 10,
"o:low_value" : "Nineteen",
"o:high_value" : "Nineteen",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"maxLength" : 8,
"minLength" : 8
}
]
},
"name" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "name",
"o:frequency" : 100,
"o:low_value" : "Donald P.",
"o:high_value" : "Ming L.",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"maxLength" : 10,
"minLength" : 6
},
"courses" :
{
"type" : "array",
"o:preferred_column_name" : "courses",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"items" :
{
"properties" :
{
"name" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "name",
"o:frequency" : 100,
"o:low_value" : "Advanced Algebra",
"o:high_value" : "Data Structures",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"maxLength" : 16,
"minLength" : 7
},
"avgGrade" :
{
"oneOf" :
[
{
"type" : "number",
"o:preferred_column_name" : "avgGrade",
"o:frequency" : 100,
"o:low_value" : 75,
"o:high_value" : 95,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"maximum" : 95,
"minimum" : 75
},
{
"type" : "string",
"o:length" : 4,
"o:preferred_column_name" : "avgGrade",
"o:frequency" : 50,
"o:low_value" : "TBD",
"o:high_value" : "TBD",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"maxLength" : 3,
"minLength" : 3
}
]
},
"courseNumber" :
{
"type" : "string",
"o:length" : 8,
"o:preferred_column_name" : "courseNumber",
"o:frequency" : 100,
"o:low_value" : "CS101",
"o:high_value" : "MATH103",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"maxLength" : 7,
"minLength" : 5
}
}
}
},
"advisorId" :
{
"type" : "number",
"o:preferred_column_name" : "advisorId",
"o:frequency" : 100,
"o:low_value" : 101,
"o:high_value" : 103,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"maximum" : 103,
"minimum" : 101
},
"dormitory" :
{
"type" : "object",
"o:preferred_column_name" : "dormitory",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"properties" :
{
"dormId" :
{
"type" : "number",
"o:preferred_column_name" : "dormId",
"o:frequency" : 100,
"o:low_value" : 201,
"o:high_value" : 205,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"maximum" : 205,
"minimum" : 201
},
"dormName" :
{
"type" : "string",
"o:length" : 4,
"o:preferred_column_name" : "dormName",
"o:frequency" : 100,
"o:low_value" : "ABC",
"o:high_value" : "XYZ",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"maxLength" : 3,
"minLength" : 3
}
}
},
"studentId" :
{
"type" : "number",
"o:preferred_column_name" : "studentId",
"o:frequency" : 100,
"o:low_value" : 1,
"o:high_value" : 10,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 10,
"required" : true,
"maximum" : 10,
"minimum" : 1
}
}
}
Field age
has a type
that is
either (1) a number
, with o:frequency
90
, or (2) a string
, with
o:frequency
10
. This means that a numeric age appears in 90% of
the documents, and a string grade appears in 10% of the documents. Field
age
is thus a mixed-type field.
Similarly, field avgGrade
is a mixed-type field,
with a numeric grade in 100% of the documents, and a string grade in
50% of the documents.
Converter configuration fields minFieldFrequency
and minTypeFrequency
test the percentage of documents where
a field, or a field of a given type, respectively, is present
across the document set.
All of the fields in the student documents are present in 100% of the
documents, so none of them can be occurrence outliers, regardless of the value of
minFieldFrequency
.
If the converter is used with a value of 15
for
configuration field minTypeFrequency
then field
age
will be considered a type-occurence outlier, because
it occurs with a string value in only 10% of the student documents (10 < 15).
Field avgGrade
will not be considered a type-occurrence outlier,
because neither of its types is used in less than 15% of the student documents.
As a type-occurrence outlier, field age
would be mapped
by the converter to a column with SQL type NUMBER
(JSON number type
being dominant for the field). Then the importer would try, and fail, to convert the
string value "Nineteen"
to a number, and would log an error for the
document where age
is "Nineteen"
.
A field that doesn't occur rarely but has a type that occurs rarely is not removed from the data.
Because there is no SQL data type of number-or-string, non-outlier
mixed-type field avgGrade
will be mapped by the converter to a
JSON
-type column, and it will apply a JSON schema to
that column as a validating check constraint, to require the value to
always be either a string or a number.
Example 21-7 JSON Data Guide For Input Teacher Document Set
This data guide summarizes the input set of teacher documents stored in
transfer table teacher_tab
.
{
"type" : "object",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"required" : true,
"properties" :
{
"_id" :
{
"type" : "number",
"o:preferred_column_name" : "_id",
"o:frequency" : 100,
"o:low_value" : 101,
"o:high_value" : 104,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"required" : true,
"maximum" : 104,
"minimum" : 101
},
"name" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "name",
"o:frequency" : 100,
"o:low_value" : "Abdul J.",
"o:high_value" : "Natalie C.",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"required" : true,
"maxLength" : 10,
"minLength" : 8
},
"salary" :
{
"type" : "number",
"o:preferred_column_name" : "salary",
"o:frequency" : 100,
"o:low_value" : 180000,
"o:high_value" : 300000,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"required" : true,
"maximum" : 300000,
"minimum" : 180000
},
"department" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "department",
"o:frequency" : 100,
"o:low_value" : "Computer Science",
"o:high_value" : "Mathematics",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"required" : true,
"maxLength" : 16,
"minLength" : 11
},
"phoneNumber" :
{
"oneOf" :
[
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "phoneNumber",
"o:frequency" : 50,
"o:low_value" : "222-555-022",
"o:high_value" : "222-555-044",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"maxLength" : 11,
"minLength" : 11
},
{
"type" : "array",
"o:preferred_column_name" : "phoneNumber",
"o:frequency" : 50,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"items" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "scalar_string",
"o:frequency" : 50,
"o:low_value" : "222-555-011",
"o:high_value" : "222-555-023",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"maxLength" : 11,
"minLength" : 11
}
}
]
},
"coursesTaught" :
{
"type" : "array",
"o:preferred_column_name" : "coursesTaught",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"required" : true,
"items" :
{
"properties" :
{
"name" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "name",
"o:frequency" : 75,
"o:low_value" : "Advanced Algebra",
"o:high_value" : "Data Structures",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"maxLength" : 16,
"minLength" : 7
},
"courseId" :
{
"type" : "string",
"o:length" : 8,
"o:preferred_column_name" : "courseId",
"o:frequency" : 75,
"o:low_value" : "CS101",
"o:high_value" : "MATH103",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"maxLength" : 7,
"minLength" : 5
},
"classType" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "classType",
"o:frequency" : 75,
"o:low_value" : "In-person",
"o:high_value" : "Online",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"maxLength" : 9,
"minLength" : 6
}
}
}
},
"studentsAdvised" :
{
"type" : "array",
"o:preferred_column_name" : "studentsAdvised",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"required" : true,
"items" :
{
"properties" :
{
"name" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "name",
"o:frequency" : 75,
"o:low_value" : "Donald P.",
"o:high_value" : "Ming L.",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"maxLength" : 10,
"minLength" : 6
},
"dormId" :
{
"type" : "number",
"o:preferred_column_name" : "dormId",
"o:frequency" : 75,
"o:low_value" : 201,
"o:high_value" : 205,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"maximum" : 205,
"minimum" : 201
},
"studentId" :
{
"type" : "number",
"o:preferred_column_name" : "studentId",
"o:frequency" : 75,
"o:low_value" : 1,
"o:high_value" : 10,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 4,
"maximum" : 10,
"minimum" : 1
}
}
}
}
}
}
Field phoneNumber
has a type
that is
either (1) a string
or (2) an array
of
string
s. Each of those types occurs in 50%
(o:frequency
= 50
) of the teacher documents.
It is thus a mixed-type field. If the converter is used with a
minTypeFrequency
value of 15
then it will not
be considered a type-occurrence outlier.
Because there is no SQL data type of string-or-array-of-strings,
(non-outlier) mixed-type field phoneNumber
will be mapped by the
converter to a JSON
-type column, and it will apply a JSON
schema to that column as a validating check constraint, to require the
value to always be either a string or an array of strings.
Example 21-8 JSON Data Guide For Input Course Document Set
This data guide summarizes the input set of course documents stored in
transfer table course_tab
.
{
"type" : "object",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"properties" :
{
"name" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "name",
"o:frequency" : 100,
"o:low_value" : "Advanced Algebra",
"o:high_value" : "Data Structures",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"maxLength" : 16,
"minLength" : 7
},
"Notes" :
{
"type" : "string",
"o:length" : 64,
"o:preferred_column_name" : "Notes",
"o:frequency" : 20,
"o:low_value" : "Prerequisite for Advanced Algebra",
"o:high_value" : "Prerequisite for Advanced Algebra",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"maxLength" : 33,
"minLength" : 33
},
"teacher" :
{
"type" : "object",
"o:preferred_column_name" : "teacher",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"properties" :
{
"name" :
{
"type" : "string",
"o:length" : 8,
"o:preferred_column_name" : "name",
"o:frequency" : 100,
"o:low_value" : "Abdul J.",
"o:high_value" : "Colin J.",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"maxLength" : 8,
"minLength" : 8
},
"teacherId" :
{
"type" : "number",
"o:preferred_column_name" : "teacherId",
"o:frequency" : 100,
"o:low_value" : 101,
"o:high_value" : 103,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"maximum" : 103,
"minimum" : 101
}
}
},
"courseId" :
{
"type" : "string",
"o:length" : 8,
"o:preferred_column_name" : "courseId",
"o:frequency" : 100,
"o:low_value" : "CS101",
"o:high_value" : "MATH103",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"maxLength" : 7,
"minLength" : 5
},
"students" :
{
"type" : "array",
"o:preferred_column_name" : "students",
"o:frequency" : 100,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"items" :
{
"properties" :
{
"name" :
{
"type" : "string",
"o:length" : 16,
"o:preferred_column_name" : "name",
"o:frequency" : 100,
"o:low_value" : "Donald P.",
"o:high_value" : "Ming L.",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"maxLength" : 10,
"minLength" : 6
},
"studentId" :
{
"type" : "number",
"o:preferred_column_name" : "studentId",
"o:frequency" : 100,
"o:low_value" : 1,
"o:high_value" : 10,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"required" : true,
"maximum" : 10,
"minimum" : 1
}
}
}
},
"creditHours" :
{
"oneOf" :
[
{
"type" : "number",
"o:preferred_column_name" : "creditHours",
"o:frequency" : 80,
"o:low_value" : 3,
"o:high_value" : 5,
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"maximum" : 5,
"minimum" : 3
},
{
"type" : "string",
"o:length" : 1,
"o:preferred_column_name" : "creditHours",
"o:frequency" : 20,
"o:low_value" : "3",
"o:high_value" : "3",
"o:num_nulls" : 0,
"o:last_analyzed" : "2024-12-30T18:12:41",
"o:sample_size" : 5,
"maxLength" : 1,
"minLength" : 1
}
]
}
}
}
Field Notes
occurs in only 20% of the documents (field
o:frequency
is 20
). If the converter is used
with a value of 25
for configuration field
minFieldFrequency
then field Notes
is
considered an occurrence outlier, and the converter will not map it to any
column.
However, if configuration field useFlexFields
is
true
(the default) then the converter creates flex columns, and
when flex columns exist the importer places all unmapped fields into flex
columns. In that default (useFlexFields
=
true
) case, field Notes
will therefore be
supported (by a flex column) after migration. If the converter is used with
useFlexFields
= false
then the importer will
log an error for rare field Notes
.
Field creditHours
has a type
that is
either (1) a number
, with o:frequency
=
80
, or (2) a string
, with
o:frequency
= 20
. It is thus a mixed-type
field. If the converter is used with a minTypeFrequency
value of
15
then it will not be considered a type-occurrence
outlier.
Because there is no SQL data type of number-or-string, (non-outlier)
mixed-type field creditHours
will be mapped by the converter to a
JSON
-type column, and it will apply a JSON schema to
that column as a validating check constraint, to require the value to
always be either a string or a number.
See Also:
-
Validating JSON Documents with a JSON Schema in Oracle Database JSON Developer’s Guide for information about using JSON schemas to constrain or validate JSON data
-
JSON Data Guide in Oracle Database JSON Developer’s Guide
-
JSON_DATAGUIDE in Oracle Database SQL Language Reference
-
json-schema.org for information about JSON Schema
Parent topic: Migrating From JSON To Duality
Footnote Legend
Footnote 1: Transfer tables for your input document sets are all you need, to use the JSON-To-Duality converter.