Mapping Oracle NoSQL TIMESTAMP to Elasticsearch date Type

Mapping Oracle NoSQL `TIMESTAMP` to Elasticsearch `date` Type

When a value representing a date and time is written to a field of an Oracle NoSQL table, the value is stored in the table as an instance of java.sql.Timestamp; which corresponds to the Oracle NoSQL timestamp enum type. See Atomic Data Types in the SQL Reference Guide.

When creating a table, the keyword timestamp is then used to specify such a field in the table. Along with the timestamp keyword, an integer parameter representing the precision to apply when storing the value must also be specified, employing a declaration with the following form:

TIMESTAMP(<precision>)

The value input for precision must be one of ten possible integer values, from 0 to 9. In general, the timestamp data type defined by Oracle NoSQL Database allows finer-grained time precisions to be stored in a table; up to nanosecond granularity. A value of 0 input for precision specifies the least precise representation of a timestamp value; which corresponds to a format of, yyyy-MM-dd'T'HH:mm:ss, with 0 decimal places in the value's seconds component. A value of 9 specifies the finest granularity - or most precise - representation, which includes an instant during the given day that is accurate to the nanosecond. timestamp values with nanosecond precision correspond to a format of yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS, with 9 decimal places in the seconds component. All other precisions (1–8) represent a day and time granularity falling somewhere between the least precise (0 decimal places) and the most precise (9 decimal places).

As another concrete example, suppose you wish to create a table named tsTable consisting of an id field containing the table's Primary Key, and a field named ts that will contain values representing a date and a time-of-day in which the seconds component is represented with 6 decimal point accuracy (example: date = 1998-10-26, time-of-day = 08:33:59.735978). To create such a table, one can execute the following command from the Admin CLI:

kv-> execute 'CREATE TABLE tsTable (id INTEGER, ts TIMESTAMP(6), PRIMARY KEY (id))';

Suppose then that you wish to store the following values in the ts field:

tsVal[0] = 1996-12-31T23:01:43.987654321
tsVal[1] = 2005-03-20T14:10:25.258
tsVal[2] = 1998-10-26T08:33:59.735978
tsVal[3] = 2001-09-15T23:01:43.55566677
tsVal[4] = 2002-04-06T17:07:38.7653459

To store those values, you could execute code like the following:

final KVStore store = KVStoreFactory.getStore 
    (new KVStoreConfig(<storeName>, <host> + ":" + <port>));
final TableAPI tableAPI = store.getTableAPI();
final Tabletable = tableAPI.getTable("tsTable");
for (int i = 0; i < 5; i++) {
    final Row row = table.createRow();
    row.put(id, i);
    row.put(ts, TimestampUtils.parseString(tsVal[i]));
    tableAPI.putIfAbsent(row, null, null); 
}

Because the ts field of the table was created with precision 6, each value will be stored with 6 decimal places in the seconds component of the value. Specifically, if the value being stored contains more than 6 decimal places, then Oracle NoSQL will store the value with the decimal part of the seconds component rounded to 6 decimal places. For example, tsVal[4] from the list above will be stored as, 2002-04-06T17:07:38.765346.

Similarly, if the value being stored contains fewer than 6 decimal places, then Oracle NoSQL will pad the decimal part of the seconds component with zeros. For example, tsVal[1] from the list above will be stored as, 2005-03-20T14:10:25.258000.

When creating a Text Index on a table's field containing timestamp values, it is important to understand how the Oracle NoSQL Database Table API handles fields such as those described above. It is important because Elasticsearch stores values representing date and time using the Elasticsearch date type; which does not map directly to the java.sql.Timestamp type stored by Oracle NoSQL Database.

When indexing a timestamp field for Full Text Search, the Elasticsearch date type must be specified in the CREATE FULLTEXT INDEX command; otherwise Elasticsearch will handle the field's values as a string type. For example, the simplest way to index (for full text search) the ts field from the tsTable in the example above, would be to execute the following command:

kv-> execute 'CREATE FULLTEXT INDEX tsIndex ON tsTable (ts{"type":"date"})';

In this case, a default mapping specification will be generated that will tell Elasticsearch to handle the broadest range of date type formats when handling the values being indexed.

When indexing values that represent date and time in Elasticsearch, whenever you specify the date type for those values, you can also specify a format to which each indexed value must adhere; where an error will occur if a given value does not satisfy the specified format. See Elasticsearch Date. In a fashion similar to how one specifies an "analyzer" for a "string" value, the Elasticsearch API defines a format parameter that can be used to specify – via the mapping specification – the format Elasticsearch should expect when indexing a given value of type date. Specifically,

<fieldname>{"type":"date","format":"<format>"}

where the value input for the format token can be an explicit format such as, yyyy-MM-dd'T'HH:mm:ss, or can be a combination one or more of the Elasticsearch pre-defined values (macros). See Elasticsearch Built In Formats.

Using the Elasticsearch API (not Oracle NoSQL), a typical Elasticsearch mapping specification for a date type might then specify an explicit format along with one or more values from the set of Elasticsearch built in formats; for example,

{"type":"date","format":"yyyy-MM-dd'T'HH:mm:ss.SSS||yyyy-MM-dd||epoch_millis"}

A format like that shown tells Elasticsearch to expect values in a form such as, 1997-11-17T08:33:59.735, or 1997-11-17, or even as a number of milliseconds since the epoch. If a value has any other format, an error will occur and Elasticsearch will not index (store) the value.

Rather than employing an explicit format such as that shown in the example above, you can also specify formats using some combination of only the macros from the table; for example,

{"type":"date","format":"strict_date_optional_time||epoch_millis"}

This tells Elasticsearch that although acceptable date values must include the date (strict_date=yyyy-MM-dd), Elasticsearch should accept any values with or without a time component (optional_time). Additionally, if the value represents the number of milliseconds since the epoch, then such values should also be accepted by Elasticsearch.

With respect to using the CREATE FULLTEXT INDEX command to index a timestamp value for Full Text Search, although it is possible to specify the Elasticsearch format parameter for a date field in a way similar to the Elasticsearch API examples shown above, it may not be very practical. First, the number of valid combinations of macros from the set of Elasticsearch built in formats is very large, and may pose a significant burden for users.

Next, unlike other mapping parameters defined by Elasticsearch (for example the "analyzer" parameter for "string" types), if the user specifies a valid format for an Elasticsearch date field, but one or more of the values to be indexed do not satisfy that format, then an error will occur (in Elasticsearch) and those values will not be indexed. For example, if the user specifies a "french" analyzer for a string field but the value is actually in English, although unexpected search output may result, no error will occur. On the other hand, if the user specifies a format of yyyy-MM-dd'T'HH:mm:ss.SSS for a date field, but the value(s) being indexed contains more than 3 decimal places in the seconds component, although the index will be created, format errors will occur and the non-conformant values will not be indexed;.

To provide a more convenient mechanism for specifying the format for date values, as well as to minimize the opportunity for the sort of format errors just described, a special "name":"value" parameter is defined for the CREATE FULLTEXT INDEX command. When indexing Oracle NoSQL timestamp values as date values in Elasticsearch, rather than using the Elasticsearch format parameter (and its valid values), the specially defined precision parameter should be used instead. Although the precision parameter is optional, when it is included with a "type":"date" specification in the CREATE FULLTEXT INDEX command, the value of that parameter can be either millis or nanos. Specifically, when the CREATE FULLTEXT INDEX command is used to index NoSQL timestamp values as date values in Elasticsearch, one of the following parameter mappings must be specified in that command:

{"type":"date"}
{"type":"date","precision":"millis"}
{"type":"date","precision":"nanos"}

Note that the default precision (that is, no precision), as well as the nanos precision, both map - and index - the broadest range of timestamp formats as valid date types in Elasticsearch without error; whereas the millis precision indexes only timestamp values defined with precision 3 or less. As a result, the precision you use should be based on the following criteria:

If you know for sure that all values from the table field to be indexed have only precision 3 (milliseconds) or less, and you want to index the values using 3 decimal places in all cases, then specify millis precision.
If the field you wish to index consists of timestamp values of varying precisions and you want to index only those values with precision 3 or less, then specify millis precision; so that values with greater than milliseconds precision will not be indexed.
In all other cases, use either nanos precision or the default precision.

In summary, the special precision parameter not only minimizes the number of possible values the user can specify for the date type, it also reduces the occurrence of format errors by providing a way to map such values to the broadest range of possible formats; as well as allow the user to enforce milliseconds precision in the index.

Note:

As described above, a precision of nanos specified for a date type is currently identical to specifying no precision, which translates to the default date format. Although this may seem redundant, the nanos option is defined for two reasons. First, it is intended to be symmetric with the millis option; so that if a user knows the timestamp field being indexed consists of values with greater than millisecond precision, the user can simply specify nanos and the right thing will be done when constructing the mapping specification that will be registered with Elasticsearch.

The second reason for defining the nanos option is related to the fact that Elasticsearch currently supports formats with precisions no greater than milliseconds. (Notice that the Elasticsearch built in formats include macros associated with nothing finer than millis). If a version of Elasticsearch is released in the future that supports formats including nanoseconds precision, then a fairly straightforward change can be made in Oracle NoSQL Database to map the current nanos option to the new format defined by Elasticsearch; requiring no change in the public api, and no change to user applications.