21.5 School Administration Example, Migrator Input Documents
Existing student, teacher, and course document sets comprise the JSON-to-duality migrator input for the school-administration example. In a typical migration scenario each might be received in the form of a JSON dump file from another database.
Note:
The document sets in the examples here are very small. In order
to demonstrate the handling of outlier (high-entropy) fields, in examples here we
use large values for migrator configuration fields
minFieldFrequency
(value 25
) and
minTypeFrequency
(value 15
), instead of the
default value of 5
.
A field is an occurrence outlier for a given
document set if it occurs in less than minFieldFrequency
percent of
the documents.
A field is a type outlier for a given document
set if any of its values occurs with a given type in less than
minTypeFrequency
percent of the documents.
-
An occurrence-outlier field (a field that occurs rarely) is not mapped by the converter to any underlying column. If the converter produces flex columns (configuration field
useFlexFields
=true
, the default value), then the importer places an unmapped field in a flex column of a table underlying the duality view. If there are no flex columns then the importer reports an unmapped field in an import error log, and the field is not supported in the duality view. -
A type-outlier field (a field whose value is rarely of a different type than usual) is handled differently. Import tries to convert any values of a rare type to the expected type for the field. Unsuccessful conversion is reported in an import error log, and the field is not used in the duality view.
Example 21-1 Student Document Set (Migrator Input)
These are the student documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator.
{"studentId" : 1,
"name" : "Donald P.",
"age" : 20,
"advisorId" : 102,
"courses" : [ {"courseNumber" : "CS101",
"name" : "Algorithms",
"avgGrade" : 75},
{"courseNumber" : "CS102",
"name" : "Data Structures",
"avgGrade" : "TBD"},
{"courseNumber" : "MATH101",
"name" : "Algebra",
"avgGrade" : 90} ],
"dormitory" : {"dormId" : 201, "dormName" : "ABC"}}
{"studentId" : 2,
"name" : "Elena H.",
"age" : 21,
"advisorId" : 103,
"courses" : [ {"courseNumber" : "CS101",
"name" : "Algorithms",
"avgGrade" : 75},
{"courseNumber" : "CS102",
"name" : "Data Structures",
"avgGrade" : "TBD"},
{"courseNumber" : "MATH102",
"name" : "Calculus",
"avgGrade" : 95} ],
"dormitory" : {"dormId" : 202, "dormName" : "XYZ"}}
{"studentId" : 3,
"name" : "Francis K.",
"age" : 20,
"advisorId" : 103,
"courses" : [ {"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"avgGrade" : 82} ],
"dormitory" : {"dormId" : 204, "dormName" : "QWE"}}
{"studentId" : 4,
"name" : "Georgia D.",
"age" : 19,
"advisorId" : 101,
"courses" : [ {"courseNumber" : "CS101",
"name" : "Algorithms",
"avgGrade" : 75},
{"courseNumber" : "MATH102",
"name" : "Calculus",
"avgGrade" : 95},
{"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"avgGrade" : 82} ],
"dormitory" : {"dormId" : 203, "dormName" : "LMN"}}
{"studentId" : 5,
"name" : "Hye E.",
"age" : 21,
"advisorId" : 103,
"courses" : [ {"courseNumber" : "CS102",
"name" : "Data Structures",
"avgGrade" : "TBD"},
{"courseNumber" : "MATH101",
"name" : "Algebra",
"avgGrade" : 90} ],
"dormitory" : {"dormId" : 201, "dormName" : "ABC"}}
{"studentId" : 6,
"name" : "Ileana D.",
"age" : 21,
"advisorId" : 102,
"courses" : [ {"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"avgGrade" : 82} ],
"dormitory" : {"dormId" : 205, "dormName" : "GHI"}}
{"studentId" : 7,
"name" : "Jatin S.",
"age" : 20,
"advisorId" : 101,
"courses" : [ {"courseNumber" : "CS101",
"name" : "Algorithms",
"avgGrade" : 75},
{"courseNumber" : "CS102",
"name" : "Data Structures",
"avgGrade" : "TBD"} ],
"dormitory" : {"dormId" : 204, "dormName" : "QWE"}}
{"studentId" : 8,
"name" : "Katie H.",
"age" : 21,
"advisorId" : 102,
"courses" : [ {"courseNumber" : "CS102",
"name" : "Data Structures",
"avgGrade" : "TBD"},
{"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"avgGrade" : 82} ],
"dormitory" : {"dormId" : 205, "dormName" : "GHI"}}
{"studentId" : 9,
"name" : "Luis F.",
"age" : "Nineteen",
"advisorId" : 101,
"courses" : [ {"courseNumber" : "CS101",
"name" : "Algorithms",
"avgGrade" : 75},
{"courseNumber" : "MATH102",
"name" : "Calculus",
"avgGrade" : 95},
{"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"avgGrade" : 82} ],
"dormitory" : {"dormId" : 201, "dormName" : "ABC"}}
{"studentId" : 10,
"name" : "Ming L.",
"age" : 20,
"advisorId" : 101,
"courses" : [ {"courseNumber" : "MATH102",
"name" : "Calculus",
"avgGrade" : 95} ],
"dormitory" : {"dormId" : 202, "dormName" : "XYZ"}}
Notice these two fields, in particular:
-
Field
age
is of a mixed type: number and string. In one of the ten documents (10%) its value is a string ("Nineteen"
); in the others (90%) the value is a number. -
Field
avgGrade
is of a mixed type: number and string. In all ten documents (100%) at least one of its occurrences has a number value. In five of the ten documents (50%) at least one of its occurrences has a string value ("TBD"
).
You might want to consider field age
to be a type
outlier, because you consider that you normally expect its value to be a
number but the field occurs rarely with a string value. The migrator lets you decide
the occurrence frequencies to consider "rare", and thus handle such fields specially
(with configuration fields minFieldFrequency
and
minTypeFrequency
).
Example 21-2 Teacher Document Set (Migrator Input)
These are the teacher documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator.
{"_id" : 101,
"name" : "Abdul J.",
"phoneNumber" : [ "222-555-011", "222-555-012" ],
"salary" : 200000,
"department" : "Mathematics",
"coursesTaught" : [ {"courseId" : "MATH101",
"name" : "Algebra",
"classType" : "Online"},
{"courseId" : "MATH102",
"name" : "Calculus",
"classType" : "In-person"} ],
"studentsAdvised" : [ {"studentId" : 4, "name" : "Georgia D.", "dormId" : 203},
{"studentId" : 7, "name" : "Jatin S.", "dormId" : 204},
{"studentId" : 9, "name" : "Luis F.", "dormId" : 201},
{"studentId" : 10, "name" : "Ming L.", "dormId" : 202} ]}
{"_id" : 102,
"name" : "Betty Z.",
"phoneNumber" : "222-555-022",
"salary" : 300000,
"department" : "Computer Science",
"coursesTaught" : [ {"courseId" : "CS101",
"name" : "Algorithms",
"classType" : "Online"},
{"courseId" : "CS102",
"name" : "Data Structures",
"classType" : "In-person"} ],
"studentsAdvised" : [ {"studentId" : 1, "name" : "Donald P.", "dormId" : 201},
{"studentId" : 6, "name" : "Ileana D.", "dormId" : 205},
{"studentId" : 8, "name" : "Katie H.", "dormId" : 205} ]}
{"_id" : 103,
"name" : "Colin J.",
"phoneNumber" : [ "222-555-023" ],
"salary" : 220000,
"department" : "Mathematics",
"coursesTaught" : [ {"courseId" : "MATH103",
"name" : "Advanced Algebra",
"classType" : "Online"} ],
"studentsAdvised" : [ {"studentId" : 2, "name" : "Elena H.", "dormId" : 202},
{"studentId" : 3, "name" : "Francis K.", "dormId" : 204},
{"studentId" : 5, "name" : "Hye E.", "dormId" : 201} ]}
{"_id" : 104,
"name" : "Natalie C.",
"phoneNumber" : "222-555-044",
"salary" : 180000,
"department" : "Computer Science",
"coursesTaught" : [],
"studentsAdvised" : []}
Field phoneNumber
is of a mixed type: string and array (array of
strings). In two of the four documents (50%) its value is a string; in the other two
the value is an array of strings.
(Fields coursesTaught
and studentsAdvised
each have
one occurrence whose value is the empty array.)
Example 21-3 Course Document Set (Migrator Input)
These are the course documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator.
{"courseId" : "MATH101",
"name" : "Algebra",
"creditHours" : 3,
"students" : [ {"studentId" : 1, "name" : "Donald P."},
{"studentId" : 5, "name" : "Hye E."} ],
"teacher" : {"teacherId" : 104, "name" : "Abdul J."},
"Notes" : "Prerequisite for Advanced Algebra"}
{"courseId" : "MATH102",
"name" : "Calculus",
"creditHours" : 4,
"students" : [ {"studentId" : 2, "name" : "Elena H."},
{"studentId" : 4, "name" : "Georgia D."},
{"studentId" : 9, "name" : "Luis F."},
{"studentId" : 10, "name" : "Ming L."} ],
"teacher" : {"teacherId" : 101, "name" : "Abdul J."}}
{"courseId" : "CS101",
"name" : "Algorithms",
"creditHours" : 5,
"students" : [ {"studentId" : 1, "name" : "Donald P."},
{"studentId" : 2, "name" : "Elena H."},
{"studentId" : 4, "name" : "Georgia D."},
{"studentId" : 7, "name" : "Jatin S."},
{"studentId" : 9, "name" : "Luis F."} ],
"teacher" : {"teacherId" : 102, "name" : "Betty Z."}}
{"courseId" : "CS102",
"name" : "Data Structures",
"creditHours" : 3,
"students" : [ {"studentId" : 1, "name" : "Donald P."},
{"studentId" : 2, "name" : "Elena H."},
{"studentId" : 5, "name" : "Hye E."},
{"studentId" : 7, "name" : "Jatin S."},
{"studentId" : 8, "name" : "Katie H."} ],
"teacher" : {"teacherId" : 102, "name" : "Betty Z."}}
{"courseId" : "MATH103",
"name" : "Advanced Algebra",
"creditHours" : "3",
"students" : [ {"studentId" : 3, "name" : "Francis K."},
{"studentId" : 4, "name" : "Georgia D."},
{"studentId" : 6, "name" : "Ileana D."},
{"studentId" : 8, "name" : "Katie H."},
{"studentId" : 9, "name" : "Luis F."} ],
"teacher" : {"teacherId" : 103, "name" : "Colin J."}}
Notice these two fields, in particular:
-
Field
Notes
occurs in only one course document (one out of five, 20%). -
Field
creditHours
is of a mixed type: number and string. In one of the five documents (20%) its value is a string; in the others (80%) the value is a number.
You might want to consider field Notes
to be an
occurrence outlier, because you consider 20% occurrence to be rare,
and you might want to consider field creditHours
to be a type
outlier, because it occurs rarely (20%) with a string value. The migrator
lets you decide the occurrence frequencies to consider "rare", and thus handle such
fields specially (with configuration fields minFieldFrequency
and
minTypeFrequency
).
Parent topic: Migrating From JSON To Duality