Review: Secondary Indexes on JSON Document Content

How to index, for Full Text Search, content from JSON documents stored in an Oracle NoSQL Database table is presented in the next section. But to help you better understand the material in that section, you should first review the material in Indexing JSON in the SQL Reference Guide. It describes how to store values in a field of a NoSQL table when those values consist of strings in valid JSON format; that is, when those values are JSON documents.

When reviewing those materials, it is important to not confuse creating a Secondary Index on JSON content with creating a Text Index. Creating a Text Index on a field containing JSON documents is presented in the next section of this document.

When JSON is stored in an Oracle NoSQL Database table, the data can be any valid JSON, stored as a string; referred to as a JSON document. Each such document stored in a field (or column) of a NoSQL table consists of elements that are referred to as either the fields or the attributes of the document. Thus, when discussing the elements of a given JSON document in the sections below, the term field and the term attribute can be used interchangeably; where the context should distinguish the field (or column) of an Oracle NoSQL table from the field (or attribute) of a JSON document stored in the table.

Although you can create a Secondary Index on the attributes of a JSON document stored in a given table, there are numerous restrictions on such indexes; restrictions which may make a Text Index more attractive. First, when creating a Secondary Index, you can only index the scalar attributes of the document. That is, the attributes cannot be nested JSON objects. Additionally, only integer, long, double, number, string, and boolean are supported Oracle NoSQL data types for JSON Secondary Indexes. Finally, you cannot perform Full Text Search on such an index.

For example, consider the following JSON document whose content specifies information related to a given member of the United States senate. For each senator (both current and former), a JSON document like that shown here is created and the Oracle NoSQL Table API can be used to store each such document in a column of a given table. Note that throughout this section and the following section, the example JSON document shown here will be referenced numerous times to demonstrate how such a JSON document can be indexed; in either a Secondary Index or a Text Index.

{
  "description": "Senior Senator for Ohio", 
  "party": "Democrat",
  "congress_numbers": [223,224,225], 
  "state": "OH",
  "startdate": "2010-01—03T05:04:09.456",
  "enddate": "2020-11-12T03:01:02.567812359",
  "seniority": 37, 
  "current": true,
  "duties": {
    "committee": ["Ways and Means","Judiciary","Steering"],
    "caucus": ["Automotive","Human Rights","SteelIndustry"]
  },
  "personal": {
    "firstname":"Sherrod",
    "lastname":"Brown",
    "birthday":"1952-11-09",
    "social_media": {
      "website":"https://www.brown.senate.gov",
      "rss_url":"http://www.brown.senate.gov/rss/feeds",
      "twittered":"SenSherrodBrown"
    },
    "address": {
      "home": {
        "number":"9115-ext",
        "street":"Vaughan",
        "apt":null,
        "city":"Columbus",
        "state":"OH",
        "zipcode":43221,
        "phone": "614-742-8331"
      },
      "work": {
        "number":"Hart Senate Office Building",
        "street":"Second Street NE",
        "apt":713,
        "city":"Washington",
        "state":"DC",
        "zipcode":20001
        "phone": "202-553-5132"
      }
    }, 
    "cspanid": 57884
  }, 
  "contrib": 2571354.93
}

The example JSON document above consists of a variety of JSON attributes of different types. Some attributes are scalar fields in "name":"value" form, whereas others are either nested objects, or arrays of scalar values. An attribute that is a nested object is a structure, encapsulated by curly braces {...}, that contains a set of valid JSON field types; scalars, arrays of scalars, and/or JSON objects (named or unnamed). An array type is an ordered, comma-separated list of elements, encapsulated by square brackets [...], where each element must be the same scalar type; string, date, or numerical type (integer, double, number, and so on).

The value of a scalar field nested within an object is dereferenced using JSON path notation. For example, the scalar field containing each senator's date of birth is nested in the object named personal. Each senator's birthday can then be specified in a search query using the JSON path, jsonFieldName.personal.birthday; where the value of the jsonFieldName component is the name specified for the column of the table in which each JSON document is written. Similarly, a search on each senator's home city can be expressed using the path, jsonFieldName.personal.address.home.city.

Note that in Elasticsearch, array fields are handled in a way that may be unexpected. When querying arrays in Elasticsearch, you cannot refer to the "first element", the "last element", the "element at index 3", etc. Arrays are handled as a "bag of values of the same type". For the example document above, if you wanted to search the committees on which each senator serves, you would not construct your query using a path like, jsonFieldName.duties.committee[0]. Such a path is not allowed. Instead, you would specify the path to the array itself, along with the values you wish to search for that may be elements of the array; for example, "jsonFieldName.duties.committee":"Judiciary Steering".

As discussed previously, each attribute in a JSON document has a type; where the type is implied by the structure of the attribute, or the value associated with the attribute. An attribute in a JSON document whose content is encapsulated by curly braces implies that the attribute is a JSON object type. With respect to scalar fields, the implied data type of the value associated with such a field is dependent on the value of the field itself. This is true whether the index is a Secondary Index or a Text Index. For example, the scalar attributes named description and seniority from the JSON document shown above will be handled as string and integer types respectively.

Compare this with a value such as that specified for the JSON document's contrib attribute (2571354.93). Such a scalar value will be handled as a NoSQL double data type when creating a Secondary Index; and as either an Elasticsearch float or double type when creating a Text Index for Full Text Search in an Elasticsearch cluster. Similarly, for attributes that contain information representing date and time (example the startdate, enddate, and birthday attributes), the value of such fields can only be handled as an Oracle NoSQL string type when creating a Secondary Index, but may be handled as either an Elasticsearch string or date type when creating a Text Index.

Finally, although an attribute containing a comma-separated list of scalars encapsulated by square brackets implies a JSON array type, the data type of the array's elements (that is, the array's type) is implied by the values of the elements in the same way as was previously described for scalar attributes.

Suppose then that you wish to create a table named jsonTable consisting of an id field containing the table's Primary Key, and a field named jsonField that will contain values consisting of JSON documents like the example document presented previously. To create such a table, and examine its resulting structure, one would execute a command like the following from the Admin CLI:

kv-> execute 'CREATE TABLE jsonTable
         (id INTEGER, jsonField JSON, PRIMARY KEY (id))';

kv-> execute 'DESCRIBE AS JSON TABLE jsonTable';
{
  "json_version" " 1,
  "type" : "table",
  "name" : "jsonTable",
  "shardKey" : [ "id" ],
  "primaryKey" : [ "id" ],
  "fields" : [ {
    "name" : "id",
    "type" : "INTEGER",
    "nullable" : false,
    "default" : null
  }, {
    "name": "jsonField",
    "type" : "JSON",
    "nullable" : true,
    "default" : null
  } ]
}

To populate the table with JSON documents like the example document presented above, you could execute code like the following:

final KVStore store = KVStoreFactory.getStore
   (new KVStoreConfig(<storeName>, <host> + ":" + <port>));
final tableAPI = store.getTableAPI();
final table = tableAPI.getTable("tsTable");
final List<String> listOfJsonDocs = {...}; 
for (int i = 0; i < listOfJsonDocs.size(); i++) {
    final Row row = table.createRow();
    row.put(id, i);
    row.putJson("jsonField", listOfJsonDocs.get(i));
    tableAPI.putIfAbsent(row, null, null); 
}

After populating the table with the necessary JSON documents (using the method row.putJson from the Table API), a Secondary Index on selected attributes of each document stored in the table's jsonField field can be created by executing a command like:

kv-> execute 'CREATE INDEX jsonSecIndex ON jsonTable
         (jsonField.party AS STRING,
          jsonField.current AS BOOLEAN,
          jsonField.contrib AS DOUBLE,
          jsonField.seniority AS INTEGER)';

In this case, queries can be performed based on various combinations of each senator's party affiliation, seniority, total amount of money contributed to the senator's campaign, and whether or not the senator is a currently sitting senator. For example, to find all current democratic senators with contributions totaling between 1 million and 20 million dollars, a command like the following could be executed from the Admin CLI:

kv-> GET TABLE –name jsonTable 
  –index jsonSecIndex 
  -field jsonField.party –value "Democrat" 
  -field jsonField.current –value true 
  -field jsonField.contrib –start 1000000.00 –end 20000000