21.5 School Administration Example, Migrator Input Documents

Existing student, teacher, and course document sets comprise the JSON-to-duality migrator input for the school-administration example. In a typical migration scenario each might be received in the form of a JSON dump file from another database.

Note:

The document sets in the examples here are very small. In order to demonstrate the handling of outlier (high-entropy) fields, in examples here we use large values for migrator configuration fields minFieldFrequency (value 25) and minTypeFrequency (value 15), instead of the default value of 5.

A field is an occurrence outlier for a given document set if it occurs in less than minFieldFrequency percent of the documents.

A field is a type outlier for a given document set if any of its values occurs with a given type in less than minTypeFrequency percent of the documents.

  • An occurrence-outlier field (a field that occurs rarely) is not mapped by the converter to any underlying column. If the converter produces flex columns (configuration field useFlexFields = true, the default value), then the importer places an unmapped field in a flex column of a table underlying the duality view. If there are no flex columns then the importer reports an unmapped field in an import error log, and the field is not supported in the duality view.

  • A type-outlier field (a field whose value is rarely of a different type than usual) is handled differently. Import tries to convert any values of a rare type to the expected type for the field. Unsuccessful conversion is reported in an import error log, and the field is not used in the duality view.

Example 21-1 Student Document Set (Migrator Input)

These are the student documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator.

{"studentId" : 1,
 "name"      : "Donald P.",
 "age"       : 20,
 "advisorId" : 102,
 "courses"   : [ {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "avgGrade"     : 75},
                 {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "avgGrade"     : "TBD"},
                 {"courseNumber" : "MATH101",
                  "name"         : "Algebra",
                  "avgGrade"     : 90} ],
 "dormitory" : {"dormId" : 201, "dormName" : "ABC"}}

{"studentId" : 2,
 "name"      : "Elena H.",
 "age"       : 21,
 "advisorId" : 103,
 "courses"   : [ {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "avgGrade"     : 75},
                 {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "avgGrade"     : "TBD"},
                 {"courseNumber" : "MATH102",
                  "name"         : "Calculus",
                  "avgGrade"     : 95} ],
 "dormitory" : {"dormId" : 202, "dormName" : "XYZ"}}

{"studentId" : 3,
 "name"      : "Francis K.",
 "age"       : 20,
 "advisorId" : 103,
 "courses"   : [ {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "avgGrade"     : 82} ],
 "dormitory" : {"dormId" : 204, "dormName" : "QWE"}}

{"studentId" : 4,
 "name"      : "Georgia D.",
 "age"       : 19,
 "advisorId" : 101,
 "courses"   : [ {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "avgGrade"     : 75},
                 {"courseNumber" : "MATH102",
                  "name"         : "Calculus",
                  "avgGrade"     : 95},
                 {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "avgGrade"     : 82} ],
 "dormitory" : {"dormId" : 203, "dormName" : "LMN"}}

{"studentId" : 5,
 "name"      : "Hye E.",
 "age"       : 21,
 "advisorId" : 103,
 "courses"   : [ {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "avgGrade"     : "TBD"},
                 {"courseNumber" : "MATH101",
                  "name"         : "Algebra",
                  "avgGrade"     : 90} ],
 "dormitory" : {"dormId" : 201, "dormName" : "ABC"}}

{"studentId" : 6,
 "name"      : "Ileana D.",
 "age"       : 21,
 "advisorId" : 102,
 "courses"   : [ {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "avgGrade"     : 82} ],
 "dormitory" : {"dormId" : 205, "dormName" : "GHI"}}

{"studentId" : 7,
 "name"      : "Jatin S.",
 "age"       : 20,
 "advisorId" : 101,
 "courses"   : [ {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "avgGrade"     : 75},
                 {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "avgGrade"     : "TBD"} ],
 "dormitory" : {"dormId" : 204, "dormName" : "QWE"}}

{"studentId" : 8,
 "name"      : "Katie H.",
 "age"       : 21,
 "advisorId" : 102,
 "courses"   : [ {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "avgGrade"     : "TBD"},
                 {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "avgGrade"     : 82} ],
 "dormitory" : {"dormId" : 205, "dormName" : "GHI"}}

{"studentId" : 9,
 "name"      : "Luis F.",
 "age"       : "Nineteen",
 "advisorId" : 101,
 "courses"   : [ {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "avgGrade"     : 75},
                 {"courseNumber" : "MATH102",
                  "name"         : "Calculus",
                  "avgGrade"     : 95},
                 {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "avgGrade"     : 82} ],
 "dormitory" : {"dormId" : 201, "dormName" : "ABC"}}

{"studentId" : 10,
 "name"      : "Ming L.",
 "age"       : 20,
 "advisorId" : 101,
 "courses"   : [ {"courseNumber" : "MATH102",
                  "name"         : "Calculus",
                  "avgGrade"     : 95} ],
 "dormitory" : {"dormId" : 202, "dormName" : "XYZ"}}

Notice these two fields, in particular:

  • Field age is of a mixed type: number and string. In one of the ten documents (10%) its value is a string ("Nineteen"); in the others (90%) the value is a number.

  • Field avgGrade is of a mixed type: number and string. In all ten documents (100%) at least one of its occurrences has a number value. In five of the ten documents (50%) at least one of its occurrences has a string value ("TBD").

You might want to consider field age to be a type outlier, because you consider that you normally expect its value to be a number but the field occurs rarely with a string value. The migrator lets you decide the occurrence frequencies to consider "rare", and thus handle such fields specially (with configuration fields minFieldFrequency and minTypeFrequency).

Example 21-2 Teacher Document Set (Migrator Input)

These are the teacher documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator.

{"_id"             : 101,
 "name"            : "Abdul J.",
 "phoneNumber"     : [ "222-555-011", "222-555-012" ],
 "salary"          : 200000,
 "department"      : "Mathematics",
 "coursesTaught"   : [ {"courseId"  : "MATH101",
                        "name"      : "Algebra",
                        "classType" : "Online"},
                       {"courseId"  : "MATH102",
                        "name"      : "Calculus",
                        "classType" : "In-person"} ],
 "studentsAdvised" : [ {"studentId" : 4,  "name"  : "Georgia D.", "dormId" : 203},
                       {"studentId" : 7,  "name"  : "Jatin S.",   "dormId" : 204},
                       {"studentId" : 9,  "name"  : "Luis F.",    "dormId" : 201},
                       {"studentId" : 10, "name"  : "Ming L.",    "dormId" : 202} ]}

{"_id"            : 102,
 "name"           : "Betty Z.",
 "phoneNumber"    : "222-555-022",
 "salary"         : 300000,
 "department"     : "Computer Science",
 "coursesTaught"  : [ {"courseId"  : "CS101",
                       "name"      : "Algorithms",
                       "classType" : "Online"},
                      {"courseId"  : "CS102",
                       "name"      : "Data Structures",
                       "classType" : "In-person"} ],
 "studentsAdvised" : [ {"studentId" : 1, "name" : "Donald P.", "dormId" : 201},
                       {"studentId" : 6, "name" : "Ileana D.", "dormId" : 205},
                       {"studentId" : 8, "name" : "Katie H.",  "dormId" : 205} ]}

{"_id"             : 103,
 "name"            : "Colin J.",
 "phoneNumber"     : [ "222-555-023" ],
 "salary"          : 220000,
 "department"      : "Mathematics",
 "coursesTaught"   : [ {"courseId"  : "MATH103",
                        "name"      : "Advanced Algebra",
                        "classType" : "Online"} ],
 "studentsAdvised" : [ {"studentId" : 2, "name" : "Elena H.",   "dormId" : 202},
                       {"studentId" : 3, "name" : "Francis K.", "dormId" : 204},
                       {"studentId" : 5, "name" : "Hye E.",     "dormId" : 201} ]}

{"_id"             : 104,
 "name"            : "Natalie C.",
 "phoneNumber"     : "222-555-044",
 "salary"          : 180000,
 "department"      : "Computer Science",
 "coursesTaught"   : [],
 "studentsAdvised" : []}

Field phoneNumber is of a mixed type: string and array (array of strings). In two of the four documents (50%) its value is a string; in the other two the value is an array of strings.

(Fields coursesTaught and studentsAdvised each have one occurrence whose value is the empty array.)

Example 21-3 Course Document Set (Migrator Input)

These are the course documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator.

{"courseId"         : "MATH101",
 "name"             : "Algebra",
 "creditHours"      : 3,
 "students"         : [ {"studentId" : 1, "name" : "Donald P."},
                        {"studentId" : 5, "name" : "Hye E."} ],
 "teacher"          : {"teacherId" : 104, "name" : "Abdul J."},
 "Notes"            : "Prerequisite for Advanced Algebra"}
{"courseId"         : "MATH102",
 "name"             : "Calculus",
 "creditHours"      : 4,
 "students"         : [ {"studentId" : 2,  "name" : "Elena H."},
                        {"studentId" : 4,  "name" : "Georgia D."},
                        {"studentId" : 9,  "name" : "Luis F."},
                        {"studentId" : 10, "name" : "Ming L."} ],
 "teacher"          : {"teacherId" : 101,  "name" : "Abdul J."}}

{"courseId"         : "CS101",
 "name"             : "Algorithms",
 "creditHours"      : 5,
 "students"         : [ {"studentId" : 1, "name" : "Donald P."},
                        {"studentId" : 2, "name" : "Elena H."},
                        {"studentId" : 4, "name" : "Georgia D."},
                        {"studentId" : 7, "name" : "Jatin S."},
                        {"studentId" : 9, "name" : "Luis F."} ],
 "teacher"          : {"teacherId" : 102, "name" : "Betty Z."}}

{"courseId"         : "CS102",
 "name"             : "Data Structures",
 "creditHours"      : 3,
 "students"         : [ {"studentId" : 1, "name" : "Donald P."},
                        {"studentId" : 2, "name" : "Elena H."},
                        {"studentId" : 5, "name" : "Hye E."},
                        {"studentId" : 7, "name" : "Jatin S."},
                        {"studentId" : 8, "name" : "Katie H."} ],
 "teacher"          : {"teacherId" : 102, "name" : "Betty Z."}}

{"courseId"         : "MATH103",
 "name"             : "Advanced Algebra",
 "creditHours"      : "3",
 "students"         : [ {"studentId" : 3, "name" : "Francis K."},
                        {"studentId" : 4, "name" : "Georgia D."},
                        {"studentId" : 6, "name" : "Ileana D."},
                        {"studentId" : 8, "name" : "Katie H."},
                        {"studentId" : 9, "name" : "Luis F."} ],
 "teacher"          : {"teacherId" : 103, "name" : "Colin J."}}

Notice these two fields, in particular:

  • Field Notes occurs in only one course document (one out of five, 20%).

  • Field creditHours is of a mixed type: number and string. In one of the five documents (20%) its value is a string; in the others (80%) the value is a number.

You might want to consider field Notes to be an occurrence outlier, because you consider 20% occurrence to be rare, and you might want to consider field creditHours to be a type outlier, because it occurs rarely (20%) with a string value. The migrator lets you decide the occurrence frequencies to consider "rare", and thus handle such fields specially (with configuration fields minFieldFrequency and minTypeFrequency).