Creating Text Indexes on JSON Document Content

Using the example presented previously, this section describes how to create a Text Index on the contents of a JSON document stored in a NoSQL table, and then perform various Full Text Search queries on the resulting index in Elasticsearch.

Unlike Oracle NoSQL Database Secondary Indexes, where the type of each value stored in a field of a given table is inferred from the table schema, for Text Indexes, the type of each attribute to be indexed cannot be inferred from the schema; and thus, must be specified in the CREATE FULLTEXT INDEX command. Although the table's schema tells Oracle NoSQL that the values in a given field (column) of a table is a JSON document, it tells Oracle NoSQL nothing about the internal structure of the document itself, other than each element is JSON formatted content. Since Oracle NoSQL knows neither the attributes within the JSON document to be indexed, nor the data types that should be used when indexing those attributes, that information must be explicitly given to Oracle NoSQL via the CREATE FULLTEXT INDEX command.

Thus, to create a Text Index on a column containing JSON documents, in addition to specifying the attributes to index, in JSON path notation, you must also always provide a mapping specification. This tells Oracle NoSQL the attributes within the document to index, as well as the data type to tell Elasticsearch to use when indexing each such attribute.

For example, in the previous section a Secondary Index was created and queried to find all current democratic senators with contributions totaling between 1 million and 20 million dollars. But suppose you want to refine that search, to find all current democratic senators with contributions totaling between 1 million and 20 millions dollars, who also serve on either the Judiciary or Appropriations committee (or both). For such a search, a Text Index should be created instead of a Secondary Index; not only because the committee information is contained in a nested array of strings, but also so that a Full Text Search can be performed.

To do this, first create the desired Text Index by executing the following command from the Admin CLI:

kv-> execute 'CREATE FULLTEXT INDEX jsonTxtIndex ON 
    jsonTable (
      jsonField.current{"type":"boolean"},
      jsonField.party{"type":"string","analyzer":"standard"},
      jsonField.duties.committe{"type":"string"},
      jsonField.contrib{"type":"double"})';

Rather than creating a Secondary Index on the ts column of the table named jsonTable, like you did in the previous section's example, the command above instead creates a Text Index consisting of specific attributes of the documents stored in that column. Although the previous example index allowed you to find all current democratic senators with contributions totaling between 1 million and 20 million dollars, the Text Index created above allows the search to be refined. With the Text Index, you can search for all current democratic senators with contributions totaling between 1 million and 20 millions dollars, who also serve on either the Judiciary or Appropriations committee, or both.

After creating the Text Index using the command above, you can then query Elasticsearch for the documents that satisfy the desired search criteria by executing a curl command from a node that has network access to the Elasticsearch cluster with which the Oracle NoSQL store is registered. For example, from the node named esHost,

curl –X GET 
'http://esHost:9200/ondb.kvstore.jsontable.jsontxtindex/_search?pretty' 
'-d {query":{"bool":{
"must":{"match":{"jsonField.party":"Democrat"}},
"must":{"match":"jsonField.current":"true"}},
"must":{"range":{"jsonField.contrib":{
        "gte":"1000000.00","lte":20000000.00"}}},
"must":"match":{"jsonField.duties.committe":
                  "Judiciary Apropriations"}}}}}'

As previously explained, ondb.kvstore.jsontable.jsontxtindex in the query above is the name of the index that Oracle NoSQL creates in Elasticsearch; where kvstore is the name of the Oracle NoSQL store, jsontable corresponds to the table (jsonTable) in that store that contains the JSON documents being indexed, and jsontxtindex corresponds to the Text Index metadata maintained by the store.

The output produced by the Elasticsearch query above (with some re-formatting for readability) should look something like:

{
  ....
  "hits" : {
    "total" : 31,
    "max_score" : 1.4695805,

    "hits" : [ {
      "_index" : "ondb.kvstore.jsontable.jsontindex",
      "_type" : "text_index_mapping",
      "_id" : "/w/0001",
      "_score" : 1.4695805,

      "_source":{"_pkey":{"_table":"jsontable","id":"1"},  
        "jsonField":"{"description":
                          "Senior Senator for Ohio"},
        "jsonField"{"current":"true"},
        "jsonField":{"congress_numbers":[223,224,225]},
        "jsonField":{"party":"Democrat"},
        "jsonField":{"seniority":37},
        "jsonFeld":{"personal":{"birthday":1952-11-09"}},
        "jsonField":{"personal":{"lastname":"Brown"}},
        "jsonField":{"contrib":257134.93},
        "jsonField":{"duties":{"committee":["Ways and 
            Means","Judiciary","Democratic Steering"]}}, 
        "jsonField":{"duties":{"caucus":["Congressional 
            Automotive","Human Rights","Steel Industry"]}},
        "jsonField":{"personal":{"address":{"home":{
                                   "state":"OH"}}}},
        "jsonField":{":"personal":{"address":{"home":{
                                   "city":"Columbus"}}}}}
    } ],
    ....
  }
}

It is important to understand that unlike the query against the Secondary Index presented in the previous section, this query is executed against the Elasticsearch cluster rather than the Oracle NoSQL store. Additionally, the Text Index created here allows one to perform a Full Text Search on the values in the nested array jsonField.duties.committee; something that cannot be done with Secondary Indexes.