Using Indexes in NoSQL Database

n Oracle NoSQL Database, the query processor can identify which of the available indexes are beneficial for a query and rewrite the query to make use of such an index.

Using an index means scanning a contiguous subrange of its entries, potentially applying further filtering conditions on the entries within this subrange, and using the primary keys stored in the index entries to extract and return the associated table rows. The subrange of the index entries to scan is determined by the conditions appearing in the WHERE clause, some of which may be converted to search conditions for the index. Given that only a (hopefully small) subset of the index entries will satisfy the search conditions, the query can be evaluated without accessing each individual table row, thus saving a potentially large number of disk accesses.

In an Oracle NoSQL Database, a primary-key index is always created by default. This index maps the primary key columns of a table to the physical location of the table rows. Furthermore, if no other index is available, the primary index will be used. In other words, there is no pure table scan mechanism; a table scan is equivalent to a scan via the primary-key index. When it comes to indexes and queries, the query processor must answer two questions:

Is an index applicable to a query? That is, will accessing the table via this index be more efficient than doing a full table scan (via the primary index)?
Among the applicable indexes, which index or combination of indexes is the best to use?

There are no statistics on the number and distribution of values in a table column. As a result, the query processor has to rely on some simple heuristics in choosing among the applicable indexes. In addition, SQL for Oracle NoSQL Database allows for the inclusion of index hints in the queries. You can use index hints to force the use of a particular index in queries. You can use a query execution plan to understand what indexes are being used in the query. For more information on how a query is executed, see Query execution plan.

Secondary Index

There will be cases where you will want to use a secondary index to support some of your read requirements. Each secondary index that you add to a table will incur some overhead for writes as each index will need to be maintained. The good news with Oracle NoSQL is that secondary index partitions live on the same shard as the primary data, so the updates to the secondary index are limited on a per-shard basis. Index updates in Oracle NoSQL are also atomic, so your application can be guaranteed that updates to records in the shard are consistent with updates to the secondary index and these structures will never be out of sync. Another factor for consideration is that Oracle NoSQL Database nodes will keep the non-leaf index nodes in the cache, and will never cache the leaf portion (i.e. the data record). This gives the indexed scan an enormous performance benefit (for systems using spinning disk) over the non-indexed scan.

There are several things that you should think about when deciding on using a secondary index in Oracle NoSQL Database:

Filtering data close to the source – In Oracle NoSQL Database, secondary indexes are the primary mechanism for you to utilize when your query needs a filter and that filter needs to be executed as close as possible to the data. To fully understand why you may need a secondary index to filter your data for querying, let’s consider your options for scanning the data in a table:
- Unordered parallel table scan with no full shard key – The shard key is a table column or multiple columns used to control how the rows of that table are distributed. The main purpose of shard keys is to distribute data across the Oracle NoSQL Database Cloud cluster for scalability, and to position records that share the same shard key locally for easy reference and access. When you write a query using filters as columns that are part of the shard key but also include other columns, then you end up doing a parallel table scan. Each shard is scanned in parallel and the data is returned to your application. This will return every record in the table across all shards in the NoSQL Database.
- Ordered or unordered parallel index scan – The B-tree index at each shard is scanned in parallel. If an ordered scan is requested, the results are merged and presented.
Each option for scanning a table has its own costs and benefits and you should carefully weigh these tradeoffs and use what you know about the application requirements and expected workload to help guide your modeling decision.
- Efficient range scans – Will it be common for your queries to restrict the value ranges? For example, if your application needs to answer queries like “find all records between a range of dates” then using secondary indexes in Oracle NoSQL Database will be the easiest and most efficient way for your application to answer these types of queries.
- Workload and index maintenance update – Is it acceptable for writes to incur some extra overhead for index maintenance? Does your workload exhibit heavy read activity where latency for reads is more important than incurring extra write overhead?

See Tuning and Optimizing SQL queries for more guidelines on using indexes in queries.