SODA Index Specifications (Reference)

6 SODA Index Specifications (Reference)

You can index the data in JSON documents using index specifications. A detailed definition of SODA index specifications is presented.

Note:

To create a B-tree index you need Oracle Database Release 12c (12.2.0.1) or later. To create a B-tree index that indexes a DATE or a TIMESTAMP value you need Oracle Database Release 18c (18.1) or later.
To create a spatial index or a search index you need Oracle Database Release 12c (12.2.0.1) or later.

An index specification is a JSON object that specifies a particular kind of database index, which is used for operations on JSON documents. You can specify these kinds of index:

B-tree: Used to index scalar JSON values. It is identified by the presence of field fields. (Only a B-tree index has this field.)
Spatial: Used to index GeoJSON geographic data. It is identified by the presence of field spatial. (Only a spatial index has this field.)
Search: Used for one or both of the following:
- Ad hoc structural queries or full-text searches
- JSON data guide
A search index specification is identified by the lack of fields fields and spatial.

Each kind of index specification requires a name field, which is a string that names the index.

B-Tree Index Specifications

A SODA B-tree index specification specifies a B-tree function-based index on SQL/JSON function json_value, which is used by SODA to query JSON documents for scalar values. A B-tree index specification can have the following fields. Field fields is required for a B-tree index specification. The other fields are optional.

fields — Array of objects, each of which targets a field in the indexed documents that has a scalar JSON value. When the array has more than one element the index specification results in the creation of a composite B-tree index.

The order of the elements in array fields specifies the primary order of indexing, that is, the order among the targeted fields. The field of the first array element has the highest priority; the field of the last element has the lowest priority.

Each object in the array can have the following fields:
- path — String specifying the path to the targeted field, whose value is expected to be a scalar. Required.
  
  If there are any array steps in the path then only the first element of each such array is used for indexing. In your documents, only scalar values for the targeted field are handled by the index — any non-scalar values for the field are ignored by the index.
- datatype — String naming the data type of the targeted-field value, for indexing purposes. Optional. Possible values (all are interpreted case-insensitively): "varchar2" (default), "number", "date", "timestamp", and the "varchar2" synonyms "string" and "varchar".
  
  An index can be used to improve performance when evaluating a QBE filter criterion if the effective type of the input data matched by QBE filter criteria matches the index datatype value.
  
  For an index to be picked up, to evaluate a given QBE, it is sufficient that the scalar JSON value targeted by the QBE be interpreted as being of the same SQL data type as the value of index-specification field datatype. This is the case for a JSON number value or string value and an index datatype of "number" or "varchar2" (or a "varchar2" synonym or no datatype), respectively.
  
  For other datatype values there is no directly corresponding JSON scalar data type, so for a QBE to pick up the index it needs to use an item-method operator, to transform the JSON value to a SQL value of the appropriate data type.
  
  For example, in a QBE such as {"dateField" : {"$date" : "2017-07-25"}} the input string value "2017-07-25" (which has one of the supported ISO 8601 date formats) is converted by QBE item-method operator $date to data type "date". An index specified with a datatype value of "date" can be picked up to evaluate the QBE.
  
  A QBE that does not explicitly use item-method operator $number or $string can pick up an index whose datatype is "number" or "varchar2" (or one of its synonyms), respectively, because of the direct correspondence between JSON and SQL data types for such values. For example:
  - Using QBE {"numField" : 20}, like using {"numField" : {"$number" : 20}}, can pick up an index created with datatype value "number".
  - Using QBE {"stringField" : "my string"}, like using {"stringField" : {"$string" : "my string"}}, can pick up an index created with datatype value "varchar2" (or one of its synonyms).
- maxlength — Number specifying the maximum length of the value to index. Optional. Ignored if the datatype is one (such as number) that has no length. If maxlength is not specified then the length of the value indexed is 4000 divided by the number of string fields that are indexed.
- order — Index sorting order. Optional. The value of field order can be the string "asc" or the number 1, meaning ascending order, or the string "desc" or the number -1, meaning descending order. Default: ascending order.
unique — Boolean. Optional. Whether the index is unique. Default: nonunique (false).
indexNulls — Boolean. Optional. Whether to index NULL values for the selected columns (by appending the numeric value 1 to the list of columns to index). Default: do not index NULL values (false).

Note:

You must specify a true value for indexNulls, for the index to be picked up for the orderby clause of a QBE.

The default error-handling behavior is that the targeted field need not be present, but if it is present then its value must be a a JSON scalar that is convertible to data type datatype. An error is raised at query time if, for any document, that is not the case. In addition, if such an index exists, and you try to write a document where that is not the case, then an error is raised for the write operation.^Foot 1

A B-tree index specification can specify an alternative error-handling behavior from the default by including field scalarRequired with a true value. That requires that the targeted field be present and have a value convertible to data type datatype. If, for any document to be indexed, that is not the case then an error is raised at indexing time. In addition, if such an index exists, and you try to write a document where that is not the case, then raise an error for the write operation.^Foot 2

Note:

A JSON null value in your data is always convertible to the data type specified for the index. That data is simply not indexed. (This is true regardless of the value of scalarRequired.)

Spatial Index Specifications

A SODA spatial index specification specifies an Oracle Spatial and Graph index, which indexes GeoJSON data. A spatial index specification has a spatial field, whose value is a string specifying the path to the JSON field to be indexed. The value of that targeted JSON field is expected to be a single GeoJSON geometry object, that is, a JSON scalar that is also a GeoJSON geometry object.

The default error-handling behavior is that the targeted field need not be present, but if it is present then its value must be a single GeoJSON geometry object. An error is raised at indexing time if, for any document to be indexed, that is not the case. In addition, if such an index exists, and you try to write a document where that is not the case, then an error is raised for the write operation.^Foot 3

A spatial index specification can specify an alternative error-handling behavior from the default by including one of the following Boolean fields with a true value. (Only one of these error-handling fields can be specified as true; otherwise, a syntax error is raised at index-creation time.)

scalarRequired — Boolean. Optional. The targeted field must be present and have a GeoJSON geometry object as its value. Raise an error at indexing time if, for any document to be indexed, that is not the case. In addition, if such an index exists, and you try to write a document where that is not the case, then raise an error for the write operation.^Foot 4
lax — Boolean. Optional. The targeted field need not be present or have a GeoJSON geometry object as its value. Do not raise an error at indexing time for any document to be indexed that lacks the field or for which the field value is not geometry. In addition, if such an index exists, and you try to write a document where that is the case, do not raise an error for the write operation.^Foot 5

Note:

If you have created a SODA spatial index for a field whose value is a GeoJSON geometry object, and if you use a QBE that targets that field, then the index can be picked up for the QBE only if both index and QBE specify the same error-handling behavior for that field. Both must specify the same one of these:

scalarRequired : true
lax : true
Neither scalarRequired : true nor lax : true

Search Index Specifications

A SODA search index specification specifies a JSON search index, which indexes the textual context of your JSON documents in a general way. A search index can improve the performance of both (1) ad hoc structural queries, that is, queries that you might not anticipate or use regularly, and (2) queries that make use of full-text search. It is an Oracle Text index that is designed specifically for use with JSON data.

A JSON search index can also accumulate and update aggregate information about your documents. In this it provides a JSON data guide, which is a summary of the structural and type information contained in a set of JSON documents. It records metadata about the fields used in those documents.

You can use data-guide information to:

Generate a JSON Schema document that describes the set of JSON documents.
Create database views that you can use to perform SQL operations on the data in the documents.
Automatically add or update virtual database columns that correspond to added or changed fields in the documents.

The data-guide information contained in a JSON search index is updated automatically as new JSON content is added.

By default, a search index specification creates an index that provides both of these features: a general index and a data guide. These features are specified by fields search_on (string) and dataguide (string), respectively.

If field search_on is present with value "none" then the index provides only the data-guide functionality (no general search index). If field dataguide is present with value "off" then only the general search-index functionality is provided (no data-guide support). (A dataguide value of "on", or no field dataguide, specifies data-guide support).

Besides none, field search_on can also have value "text" or "text_value". Both of these support full-text queries, which use QBE operator $contains, and they both support ad hoc queries that make of other QBE operators, such as $eq, $ne, and $gt.

In addition, search_on value "text_value" indexes numeric ranges. This is a separate value because it has an added performance cost. If you do not need range indexing then you can save some index maintenance time and some disk space by specifying value text instead of text_value. The default value of search_on is text_value.