Creating a Full Text Index

Review the concepts of Oracle NoSQL Database tables and indexes for better understanding of this section. See Indexes in the SQL Reference Guide. That chapter describes the main type of index you can create on the fields of a given Oracle NoSQL Database table.

This section introduces a second type of index that can be created on a given table. This second index category – separate from the Secondary Indexes described in the SQL Reference Guide – is referred to as a Full Text Index or, simply, a Text Index on the associated table.

As with any index, a Text Index as defined here, allows one to search for rows of an Oracle NoSQL Database table having fields that share some common value or characteristic. The difference between the two types of indexes is that an Oracle NoSQL Database Secondary Index is created, maintained and queried all within the Oracle NoSQL Database store; using the Oracle NoSQL Database Table API. On the other hand, the creation of a Text Index is only initiated via the Table API. Although the store maintains information about the Text Indexes that are created, such indexes are actually created, maintained, and queried in the Elasticsearch cluster with which the store is registered (using the Elasticsearch API).

It is important to understand that when the first type of index is created, data from the indexed fields of the associated table are written to the store itself; whereas when a Text Index is created, that data is streamed to the Elasticsearch cluster with which the store is registered, which stores (indexes) the data so that the Elasticsearch API can be used to execute full text searches against that data. Whenever new data is written to, or existing data is deleted from the table, the corresponding Text Index located in the cluster is updated accordingly.

To index one or more fields of an Oracle NoSQL table for Full Text Search in Elasticsearch, you can use the store's Admin CLI to execute a command with the following format:

execute 'CREATE FULLTEXT INDEX 
  [IF NOT EXISTS] 
  <index> ON <table> 
  (<field> {<mapping-spec>},<field> {<mapping-spec>}, ...) 
  [ES_SHARDS=<n>] 
  [ES_REPLICAS=<n>] 
  [OVERRIDE] 
  [COMMENT "<comment>"]';

Each argument, flag, and directive is described as follows, where any item encapsulated by square brackets [...] is optional, and the items encapsulated by curly braces {...} are required only when the field's value is a JSON document, but is optional otherwise:

index - The name of the Text Index to create.
table - The name of the table containing the fields to index.
field - A comma-separated list of each field to index, encapsulated by open parentheses.
Each field to index can optionally be associated with a mapping specification that specifies how Elasticsearch should handle the corresponding field. For example, whether Elasticsearch should treat the field's value as a text, number, date type, and so on; as well as what analyzer should be employed when indexing a text value. As explained in the sections below, the mapping specification for a given field must be expressed in valid JSON format.
If the command above is executed and a Text Index with the specified name already exists, then unless the optional directive IF NOT EXISTS is specified, or the optional directive OVERRIDE is specified, the command will fail, displaying an error message. Specifying IF NOT EXISTS when the named index already exists will result in a no-op. If OVERRIDE is specified for an existing index, then the existing index will be deleted from Elasticsearch and a new index will be created with the same name.
If the optional ES_SHARDS argument is specified, along with a corresponding integer value, then the setting for the number of primary shards Elasticsearch will use for the new index will be changed to the given value. Note that the default value for this setting is 5, and this setting cannot be changed after the index has been created.
If the optional ES_REPLICAS argument is specified, along with a corresponding integer value, then the setting for the number of copies of the indexed value each primary shard should have will be changed to the given value. Note that the default value for this setting is 1, and this setting can be changed on a live index at any time.

For more information on how the value of the ES_SHARDS and ES_REPLICAS properties are used, refer to the Elasticsearch settings named number_of_shards and number_of_replicas described in the Elasticsearch documentation. See Elasticsearch Index Settings.

When CREATE FULLTEXT INDEX executes successfully, the Text Index name provided in the command (along with metadata associated with that name) is stored and maintained in the Oracle NoSQL store. Additionally, a corresponding text searchable index – the index that is actually queried – is also created in the Elasticsearch cluster with which the store is registered. Whereas the name associated with the index in Oracle NoSQL is the simple index name specified in the CREATE FULLTEXT INDEX command, the name of the corresponding index in Elasticsearch takes the following dot-separated form:

ondb.<store>.<table>.<index>

Each of the coordinates of the Elasticsearch index name will always be lowercase; even if their counterpart in Oracle NoSQL was specified as mixed or upper case. The first coordinate (or prefix) of the name will always be ondb; which distinguishes the indexes in Elasticsearch that were created by the Oracle NoSQL CREATE FULLTEXT INDEX command from other indexes created externally, via the Elasticsearch API. The store coordinate of the Elasticsearch index name is the name of the Oracle NoSQL Database store that asked Elasticsearch to create the index. And the table and index coordinates are the values specified for the corresponding arguments in CREATE FULLTEXT INDEX; that is, the name of the Oracle NoSQL table from which the values to index are taken, and the name of the Oracle NoSQL Text Index the store should maintain. Using the coordinates of any such index name in Elasticsearch, one should always be able to determine the origin of the data stored in the index.

Once you have executed the CREATE FULLTEXT INDEX command described above, you can verify that the Text Index has been successfully created in Oracle NoSQL by executing the show indexes command from the Admin CLI; for example,

kv-> show indexes –table mytestTable

Indexes on table mytestTable mytestIndex (...), type: TEXT

You can also verify that the corresponding full text searchable index has been created in Elasticsearch. To do this you can execute a curl command from the command line of a host with network connectivity to one of the nodes in the Elasticsearch cluster; for example,

curl –X GET 'http://esHost:9200/_cat/indices'

yellow open ondb.kvstore._checkpoint ...
yellow open ondb.kvstore.mytesttable.mytestindex ...

Notice the entry that references the ondb.kvstore._checkpoint index. This index will be automatically created upon the creation of the first Oracle NoSQL Text Index. Unless it is manually deleted from the Elasticsearch cluster, it will always appear in the output of the indices command. This so-called _checkpoint index contains internal information written by Oracle NoSQL to support recovery operations when Oracle NoSQL is restarted. In general, this index should never be removed or modified.

Note:

Throughout this document, the curl utility program is used to demonstrate how to issue and display the results of HTTP requests to the Elasticsearch cluster. The curl program is supported on most operating systems (linux, Mac OS X, Microsoft Windows, and so on). It is used here because it is easy to install and can be run from the command line. Other options you can explore for sending queries to Elasticsearch are:

The elasticsearch-head tool; which is a web front end for browsing, querying, and interacting with an Elasticsearch cluster. See elasticsearch-head.
The Elasticsearch Java API; which can be used to query Elasticsearch from within program control. See Elasticsearch Java API.

In addition to executing show indexes from the Oracle NoSQL Admin CLI, you can also execute the show table command; which, in addition to the table structure, will also list all indexes (both secondary and text) created for that table. For example,

kv-> show table –table mytestTable
 
{
  "json_version" : 1,
  "type" : "table",
  "name" : "mytestTable",
  "shardKey" : [ "id" ],
  "primaryKey" : [ "id" ],
  "fields" : [
  {
    "name" : "id", 
    "type" : "INTEGER", 
    "nullable" : false, 
    "default" : null
   },
  { 
    "name" : "category", 
    "type" : "STRING", 
    "nullable" : true,
     "default" : null 
  },
  { 
    "name" : "txt",
     "type" : "STRING", 
    "nullable" : true, 
    "default" : null 
  } ], 
  "indexes" : [
    {
       "name" : "mytestIndex",
      "table" : "mytestTable",
      "type" " "text",
      "fields" : [ "category", "txt" ],
      "annotations" : {
        "category" : "{\"type\" : \"string"\",
                       \"analyzer\" : \"standard\"}",
        "txt" : "{\"type\" : \"string\",
                  \"analyzer\" : \"english\"}"
      }
    } ]
}

Note:

You cannot evolve a Text Index created in Elasticsearch via the CREATE FULLTEXT INDEX mechanism. If you want to change the index definition, for example, add more columns to the index, you must first delete the existing index using the DROP INDEX command and then use CREATE FULLTEXT INDEX to create a new Text Index satisfying the desired definition.