Mapping a Full Text Index Field to an Elasticsearch Field

Unlike the command used to create a secondary index on data stored in an Oracle NoSQL table, the CREATE FULLTEXT INDEX command allows you to specify finer control over how Elasticsearch treats the fields to be indexed. For each field that you want Elasticsearch to handle in a non-default fashion, you can specify how you want Elasticsearch to treat that field's values by including a mapping specification with each such field when executing the CREATE FULLTEXT INDEX command.

If no mapping specification is provided for a given field, and if that field contains any indexable Oracle NoSQL data type – except JSON data – then Oracle NoSQL will use that data type to determine the appropriate type with which to map the field's values to the Elasticsearch type system. This means that for fields containing non-JSON data, the mapping specification can be used to enforce and/or override the data type Elasticsearch should use when indexing the field's contents.

For example, if a field of a given table contains values stored as the Oracle NoSQL Database string type, then the default mapping supplied to Elasticsearch will declare that values from that field should be indexed as the Elasticsearch string type. But if you want Elasticsearch to treat the values of that field as the Elasticsearch integer type, then you would provide a mapping specification for the field including an explicit type declaration; that is,

{"type":"integer"}

But care must be taken when mapping incompatible data types. For the example just described, Elasticsearch will encounter errors if any of the string values being indexed contain non-numeric characters. See Elasticsearch Mapping.

For the case where the field to be indexed has values that are JSON documents, a mapping specification must always be provided in the CREATE FULLTEXT INDEX command; otherwise an error will occur. A mapping specification is necessary for such fields because, as explained later, it is not the document itself that is indexed, but a subset of the document's fields. When a JSON document is stored in an Oracle NoSQL Database table, Oracle NoSQL knows only that a value of type JSON was stored. It does not know the type intended for any of the fields (attributes) within the document. Thus, for each of the document's fields that will be indexed, the user must provide a corresponding mapping specification that specifies the type that Elasticsearch should use when indexing the field's value.

In addition to specifying the data type of a given field's content, the mapping specification can also be used to further refine how Elasticsearch processes the data being indexed. This is accomplished by including an additional set of parameters in the mapping specification. For example, suppose you want Elasticsearch to apply an analyzer different than the default analyzer when indexing a field with content of type string. In this case, you would specify a mapping specification of the form:

{"type":"string", "analyzer":"<analyzer-name>"}

To see the mapping generated by Oracle NoSQL Database for a given index created in Elasticsearch, you can execute a command like the following from the command line of a host with network connectivity to one of the nodes in the Elasticsearch cluster (example: esHost):

curl –X GET 'http://esHost:9200/ondb.<store>.<table>.<index>/_mapping?pretty'

For details on the sort of additional mapping parameters you can supply to Elasticsearch via the mapping specification, see Elasticsearch Mapping Parameters.

As a concrete example, suppose you have a table named jokeTbl in a store named kvstore, where the table consists of a field named category with values representing the categories under which jokes can fall, along with a field named txt that contains a string consisting of a joke that falls under the associated category. Suppose that when indexing the values stored under the category field, you want to index each word that makes up the category; but when indexing each joke, you want the word stems (or word roots) to be stored rather than the whole words. For example, if a joke contains the word "solipsistic", the stem of the word - "solipsist" – would actually be indexed (stored) rather than the whole word.

Since the Elasticsearch "standard" analyzer breaks up text into whole words, and the "english" analyzer stems words into their root form, you would use the "standard" analyzer for the category field and the "english" analyzer for the txt field (assuming the jokes are written in English rather than some other language). Specifically, to create the Text Index, you would execute a command like the following from the Admin CLI:

kv-> execute 'CREATE FULLTEXT INDEX jokeIndx ON jokeTbl (
       category{"type":"string","analyzer":"standard"}, 
       txt{"type":"string","analyzer":"english"})';

Once the Text Index is created, you can then query the index by executing a curl command from the command line of a host with network connectivity to one of the nodes in the Elasticsearch cluster. For example,

curl –X GET 'http://<esHost>:9200/ondb.kvstore.jokeTbl.jokeIndx/_search?pretty'

To see the mapping generated by Oracle NoSQL Database for the jokeIndx in the example above, you can execute a curl command like the following:

curl –X GET 'http://<esHost>:9200/ondb.kvstore.jokeTbl.jokeIndx/_mapping?pretty'

Note:

Text indexed fields can include non-scalar types (such as map and array), which are specified in the same way, and with the same limitations, as those for Oracle NoSQL Secondary Indexes.