Classification of Indexes

Indexes can be classified based on fields, schema, entries, or a combination of them. Each one of these is described below.

Fields
- Single Field Index
- Composite Index
Schema
- Fixed Schema Index
- Schema-less Index (JSON Index)
Entries
- Simple Index
- Multikey Index

The following classification is made based on the number of fields that are provided while creating an index.

Single Field Index: An index is called a single field index if it is created on only one field of a table.
Composite Index: An index is called a composite index if it is created on more than one field of a table.

The following classification is made based on the schema type of the fields that are indexed.

Fixed Schema Index: An index is called a fixed schema index if all the fields that are indexed are strongly typed data. For more information on strongly typed data, see Wildcard Data Types section.
Schema-less Index (JSON Index): An index is called a JSON index if at least one of the fields is JSON data or fields inside JSON data.

The following classification is made based on the number of index entries created for each row of data in the table when evaluating an index.

Simple Index: An index is called a simple index if for each row of data in the table, there is one entry created in the index.
Multikey Index: An index is called a multikey index if for each row of data in the table, there are multiple entries created in the index.

Figure 8-1 Index Classification

Single Field Index

An index is called a single field index if it is created on only one field of a table.

Example 8-2 Single Field Index

The following is an example of a single field index. The index is created on the city field present in the address record in the info JSON field in the UserInfo table.

CREATE INDEX singlefieldindex1 ON UserInfo (
    info.address.city AS ANYATOMIC
);

Composite Index

An index is called a composite index if it is created on more than one field of a table.

Example 8-3 Composite Index

The following is an example of a composite index. The index is created on the state and city fields present in the address record in the info JSON field in the UserInfo table.

CREATE INDEX compositeindex1 ON UserInfo (
    info.address.state AS ANYATOMIC,
    info.address.city AS ANYATOMIC
);

Fixed Schema Index

An index can be created on a field with fixed schema data.

Example 8-4 Fixed Schema Index

The following is an example of a fixed schema index. The index is created on the uname field having integer data type in the UsersInfo table.

CREATE INDEX fixedschemaindex1 ON UserInfo (uname);

JSON Index

An index is called a JSON index if at least one of the fields is inside JSON data.

As JSON is schema-less, the data type of an indexed JSON field may be different across rows. When creating an index on JSON fields, if you are unsure what data type to expect for the JSON field, you may use the anyAtomic data type. Alternatively, you can specify one of the Oracle NoSQL Database atomic data types. You do that by declaring a data type using the AS keyword next to every index path into the JSON field.

Example 8-5 JSON Index

The following is an example of a JSON index. The index is created on the income field present in the info JSON field in the UserInfo table. Notice that you provide a data type for the income field while creating the index.

CREATE INDEX jsonindex1 ON UserInfo (
    info.income AS INTEGER
);

The creation of a JSON index will fail if the associated table contains any rows with data that violate the declared data type. Similarly, after creating a JSON index, an insert/update operation will fail if the new row does not conform to the declared data type in the JSON index.

For example, the jsonindex1 index will be created only if the income field in all the rows of the UserInfo table is of integer data type, if not the jsonindex1 index creation will fail. Similarly, after creating the jsonindex1 index on the UserInfo table, you can insert only rows in which the income field is of integer data type. For example, If you try inserting a row in which the income field is of string data type, the insert statement will fail.

Declaring a JSON index path as anyAtomic has the advantage of allowing the indexed JSON field to have values of various data types. The index entries are sorted in ascending order. When these values are stored in the index, they are sorted as follows:

Numbers
String
boolean

However, this advantage is offset by space and CPU costs. It is because numeric values of any kind in the indexed field will be cast to Number before being stored in the index. This cast takes CPU time, and the resulting storage for the number will be larger than the original storage for the number.

Example 8-6 JSON Index Example

The following is an example of a JSON index. The index is created on the street field present in the address field in the info JSON document in the UserInfo table. Notice that you provide anyAtomic data type for the street field while creating the index.

CREATE INDEX jsonindex2 ON UserInfo (
    info.address.street AS ANYATOMIC
);

Simple Index

An index is called a simple index if, for each row of data in the table, there is one entry created in the index. The index will return a single value that is of atomic data type or any special value (SQL NULL, JSON NULL, EMPTY). Essentially, the index paths of a simple index must not have .keys(), or .values(), or [] steps.

Example 8-7 Simple Index

The following is an example of a simple index. The index is created on the income and age fields present in the info JSON field in the UserInfo table.

CREATE INDEX simpleindex1 ON UserInfo (
    info.income AS ANYATOMIC,
    info.age AS ANYATOMIC
);

A simple index path must not include an array since arrays will render multiple index values. For example, info.connections[] returns the complete set of elements in the info.connections array. It is not a simple index.

Similarly, a simple index path must not have a map with .keys() or .values(). For example, info.expenses.keys() returns the complete set of keys in the expenses map. It is not a simple index path. However, info.expenses.books is a simple index path. Because even though info.expenses is a map, info.expenses.books return an atomic value.

Example 8-8 Simple Index Example

The following is an example of a simple index created on a JSON document in a JSON field. The index is created on the books item of the expenses JSON document in the info JSON field in the UserInfo table.

CREATE INDEX simpleindex2 ON UserInfo (
    info.expenses.books AS ANYATOMIC
);

If the evaluation of a simple index path returns an empty result, the special value EMPTY is used as an index entry. In the above example, If there is no books entry in the expenses JSON document, or if there is no expenses JSON document, then the special value EMPTY is indexed.

Multikey Index

An index is called a multikey index if for each row of data in the table, there are multiple entries created in the index. In a multikey index there is at least one index path that uses .keys(), .values(), or [] steps. Any such index path will be called a multikey index path.

In a multikey index, for each table row, index entries are created on all the elements in arrays or entries in maps that are being indexed. If the evaluation returns an empty result, the special value EMPTY is used as the index entry. Any duplicate index entries are then eliminated.

Example 8-9 Multikey Index

The following is an example of a multikey index. The index is created on the connections[] array in the UserInfo table. Here, all the elements in the connections[] array in each row of the UserInfo table will be indexed.

CREATE INDEX multikeyindex1 ON UserInfo (
    info.connections[] AS ANYATOMIC
);

Nested Multikey Index

An index is a nested multikey index if it is created on a field that is present inside an array which in turn is present inside another array.

Example 8-10 Nested Multikey Index

The following is an example of a nested multikey index where the field is present in an array that is present inside another array. The index is created on the issuedby field in the vpass array in the vehicles array in the info JSON of the UserInfo table.

CREATE INDEX multikeyindex2 ON UserInfo (
    info.vehicles[].vpass[].issuedby AS ANYATOMIC
);

Composite Multikey Index

An index is called a composite multikey index if it is created on more than one field, and at least one of those fields is multikey.

A composite multikey index may have a combination of multikey index paths and simple index paths.

Example 8-11 Composite Multikey Index

The following is an example of a composite multikey index having one multikey index path and one simple index path. The index is created on the income field and area field in the info JSON column of the UserInfo table.

CREATE INDEX multikeyindex3 ON UserInfo (
    info.income AS ANYATOMIC,
    info.address.phones[].area AS ANYATOMIC
);

A composite multikey index may have more than one multikey index path. Some of the possibilities of composite multikey indexes are given below.

You can use both the .keys() and .values() steps of a map or JSON together.

The following is an example of a composite multikey index in which both the .keys() and .values() steps of a JSON document are used together. The index is created on the keys and values of the expenses JSON document in the UserInfo table.
```
CREATE INDEX multikeyindex4 ON UserInfo (
    info.expenses.keys(),
    info.expenses.values() as ANYATOMIC
);
```
You can use multiple fields of an array/record/maps-viewed-as-records together. However, the restriction here is that a field cannot be treated as both array and map within a single CREATE INDEX statement.

The following is an example of a composite multikey index in which multiple fields of an array are used together. The index is created on the passid and issuedby fields in the vpass array in the UserInfo table.
```
CREATE INDEX nestedindex1 ON UserInfo (
    info.vehicles[].vpass[].passid AS ANYATOMIC,
    info.vehicles[].vpass[].issuedby AS ANYATOMIC
);
```
Figure 8-2 Composite Multikey Index
You can use fields of an array/record/maps-viewed-as-records, as well as the fields present inside the inner arrays of those array/record/maps-viewed-as-records together. However, the restriction here is that the immediate parent array of all such fields should be the same.

The following is an example of a composite multikey index in which fields of an array and fields of an inner array are used together. The index is created on the vid field in the vehicles array and the passid field in the vpass array which is in the vehicles array in the UserInfo table.
```
CREATE INDEX nestedindex2 ON UserInfo (
    info.vehicles[].vid AS ANYATOMIC,
    info.vehicles[].vpass[].passid AS ANYATOMIC
);
```
Figure 8-3 Composite Multikey Index

The following is an example of a composite multikey index in which fields of an array and fields of an inner array are used together, however, notice that the immediate parent array of the two fields are not the same. The following is an invalid index creation statement. The index is being created on the passid and serviceid fields in the vehicles array in the UserInfo table. Note that within the vehicles array, passid is inside the vpass array, and serviceid is inside the vservice array.

CREATE INDEX invalidindex1 ON UserInfo (
    info.vehicles[].vpass[].passid AS ANYATOMIC,
    info.vehicles[].vservice[].serviceid AS ANYATOMIC
);

Figure 8-4 Composite Multikey Index (invalid)

Specifications & Restrictions on Multikey Indexes

The following specifications & restrictions apply to multikey index paths:

You cannot provide predicate or boundary expressions for .keys(), .values(), and [] steps. For more information on predicate and boundary expressions, see the Map-Filter Step Expressions and Array-Slice Step Expressions sections respectively.
When a multikey index path is evaluated on a table row, it must return zero or more atomic values. If no value is returned, then the special value EMPTY is used.
No data type declaration is allowed for .keys() step, as the keys() are by default string data type.
You can provide .keys() steps only as the last step in an index path.
For Example, info.expenses.keys() is a valid index path whereas info.expenses.keys().books is an invalid index path.
You can provide .keys() and .values() only after the fields that are maps in strongly typed data.
You can provide .keys() and .values() steps after a field that you expect to be atomic values or JSON documents inside JSON data.
- If the fields are atomic values, then the special value EMPTY will be returned.
- If the fields are json documents, then the corresponding keys or values will be returned.
You should provide [] steps after those fields that are arrays in strongly typed data.
You should provide [] steps after those fields that you expect to be arrays inside JSON data.
If you do not provide [] steps after those fields that you expect to be arrays inside JSON data, then those fields cannot be an array.
- If the rows in the existing table contain any array inside the JSON data, then the index creation will fail.
- However, if there are no rows in the existing table that contain an array inside the JSON data, then the index creation will be successful. However, you will not be able to insert a new row with an array in that JSON data in the existing table.
- For example, in the following badindex1, the [] step is not used with the vehicles array. There can be two scenarios here:
```
CREATE INDEX badindex1 ON UserInfo(info.vehicles.vid AS ANYATOMIC);
```
  - If there are existing rows of data in the UserInfo table where the vehicles field happens to be an array, the index creation will fail.
  - If there are no existing rows of data with the vehicles array in the UserInfo table, the index will be created successfully. However, if you attempt to insert a new row of data where the vehicles attribute is an array, the row insertion will fail.
You cannot provide [] steps after the non-array fields in strongly typed data.
You can provide [] steps after the atomic values and JSON documents in JSON data. If you do so, then that atomic value or JSON document will be used.
If an array contains another array directly, then the corresponding number of []steps are required to index the elements of the inner arrays. For example, if there are two inner arrays inside vehicles, then info.vehicles[][].vid should be used.

Indexes on Functions

You can create indexes on the values of one or more SQL built-in functions.

List of functions that can be indexed:

The following subset of the Built-in Functions can be indexed.

Functions on Timestamps:

year
month
day
hour
minute
second
millisecond
microsecond
nanosecond
week

Functions on Strings:

length
replace
reverse
substring
trim
ltrim
rtrim
lower
upper

Functions on Rows:

modification_time
expiration_time
expiration_time_millis
row_storage_size

See Built-in Functions for more details on what a built-in function is and how to use these functions.

Examples of creating indexes on functions

You can create indexes on the values of one or more SQL built-in functions.

Create Index on row-property functions

A row-property function expects a row variable as its sole argument in a DML statement. You can create an index on these row property functions.

modification_time
expiration_time
expiration_time_millis
row_storage_size

In a CREATE INDEX statement, you must provide these functions without any argument. The row you are indexing is implicitly considered as the input to the function.

Example:

Create an index which indexes the rows of the Users table by its latest modification time:

CREATE INDEX idx_modtime ON users(modification_time())

This index will be used in a query, which has modification_time as the filter condition.

SELECT * FROM Users $u WHERE 
modification_time($u) > "2022-08-01T10:45:00"

This query returns all the rows whose most recent modification time is after 2022-08-01T10:45:00. It uses the idx_modtime index defined above. You can verify this by viewing the query plan using the show query command.

More examples of creating indexes on functions

If you are creating an index on a built-in function that is not a row property function, then the first argument of the function must be an index path. This is followed by a path type if the path points to JSON data as shown in the example below. Some string functions need more than one argument and in that case, all the arguments other than index path and path type should be constant literal only.

Note:

The path type is not needed if it is a non-JSON as the datatype is the same as the definition in the table. For JSON data, the type ANYATOMIC can be used to cover all valid types in JSON.

Characteristics of the built-in functions that can be indexed:

All of the built-in functions that can be indexed expect at most one item as their first argument.
The time-related functions such as YEAR, MONTH etc will throw an exception if their input is a sequence with more than 1 item, and the string related functions such as length, reverse and so on will return NULL in this case.
If the input index path has a multi-key, then the function will be evaluated separately for each value returned by the multi-key path. The resulting sequence of items will be combined with the values of the other index paths (if any) in the index definition and index entries will be created.

Download the script acctstream_loaddata.sql and run it as shown below. This script creates the stream_acct table used in the example below and loads data into the table.

Start your KVSTORE or KVLite and open the SQL.shell.

java -jar lib/kvstore.jar kvlite -secure-config disable
 java -jar lib/sql.jar -helper-hosts localhost:5000 -store kvstore

Using the load command, run the script.

load -file acctstream_loaddata.sql

Example 1: Create an index in the stream_acct table to index users by the first three letters of their country of residence.

CREATE INDEX idx_country ON stream_acct(substring(acct_data.country as string, 0,3))

In the example above,you declare thecountry field of the acct_data JSON as a string.
The substring function is used to extract the first three letters of the country and use it as the index key.

Example 2: Using a multi-key index path

In the following example, you index the users in the stream_acct table by the id of the shows they watch and the year and month of the dates when the show was watched.

CREATE INDEX idx_showid_year_month ON 
stream_acct(acct_data.contentStreamed[].showId AS INTEGER,
substring(acct_data.contentStreamed[].seriesInfo[].episodes[].date AS STRING,0, 4),
substring(acct_data.contentStreamed[].seriesInfo[].episodes[].date AS STRING,5, 2))

An example of a query using this index is shown below. The query counts the number of users who watched any episode of show 16 in the year 2022.

SELECT count(*) FROM stream_acct s1 WHERE EXISTS 
s1.acct_data.contentStreamed[$element.showId = 16].seriesInfo.
episodes[substring($element.date, 0, 4) = "2022"]

This query will use the index idx_showid_year_month. You can verify this by viewing the query plan using the show query command.

show query SELECT count(*) FROM stream_acct s1 WHERE EXISTS
> s1.acct_data.contentStreamed[$element.showId = 16].seriesInfo.episodes[substring($element.date, 0, 4) = "2022"]

{
  "iterator kind" : "GROUP",
  "input variable" : "$gb-1",
  "input iterator" :
  {
    "iterator kind" : "RECEIVE",
    "distribution kind" : "ALL_SHARDS",
    "distinct by fields at positions" : [ 1 ],
    "input iterator" :
    {
      "iterator kind" : "SELECT",
      "FROM" :
      {
        "iterator kind" : "TABLE",
        "target table" : "stream_acct",
        "row variable" : "$$s1",
        "index used" : "idx_showid_year_month",
        "covering index" : true,
        "index row variable" : "$$s1_idx",
        "index scans" : [
          {
            "equality conditions" : {"acct_data.contentStreamed[].showId":16,"substring#acct_data.contentStreamed[].seriesInfo[].episodes[].date@,0,4":"2022"},
            "range conditions" : {}
          }
        ]
      },
      "FROM variable" : "$$s1_idx",
      "SELECT expressions" : [
        {
          "field name" : "Column_1",
          "field expression" :
          {
            "iterator kind" : "CONST",
            "value" : 1
          }
        },
        {
          "field name" : "acct_id_gen",
          "field expression" :
          {
            "iterator kind" : "FIELD_STEP",
            "field name" : "#acct_id",
            "input iterator" :
            {
              "iterator kind" : "VAR_REF",
              "variable" : "$$s1_idx"
            }
          }
        }
      ]
    }
  },
  "grouping expressions" : [

  ],
  "aggregate functions" : [
    {
      "iterator kind" : "FUNC_COUNT_STAR"
    }
  ]
}