Parallel Scans

The reads discussed so far in this chapter are single-threaded: they are performed one shard at a time, in sequence, until all the desired records are retrieved. This can be slow if you are retrieving a large number of records that span multiple shards. You can speed up such reads by using parallel scans.

For example, suppose you have a keyspace that looks like this:

/trades/<timestamp>/<symbol>/-/: <price>;<qty>

If you want to locate all trades for ORCL of more than 10,000 shares, you have to scan all the records under /trades (this part can be done with a key prefix restriction) and examine each record. You would use the storeIterator() call to perform this search. The single-threaded storeIterator() retrieves records from each shard consecutively (that is, all records from shard 1, then all records from shard 2, and so on).
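The per-record examination described above can be sketched as follows. This is only an illustration under stated assumptions: the TradeFilter class and its methods are hypothetical helpers, not part of the Oracle NoSQL Database API, and keys and values are represented as plain strings in the /trades/<timestamp>/<symbol>/-/ and <price>;<qty> layouts shown earlier rather than as Key and Value objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of oracle.kv): filters string-encoded trade records. */
final class TradeFilter {

    /** Returns the symbol from a key shaped like /trades/<timestamp>/<symbol>/-/ */
    static String symbolOf(String key) {
        // "/trades/1001/ORCL/-/".split("/") yields ["", "trades", "1001", "ORCL", "-"]
        String[] parts = key.split("/");
        return parts.length > 3 ? parts[3] : "";
    }

    /** Returns the quantity from a value shaped like <price>;<qty> */
    static long quantityOf(String value) {
        int sep = value.indexOf(';');
        return Long.parseLong(value.substring(sep + 1).trim());
    }

    /** Returns the keys of all trades for the given symbol with quantity above minQty. */
    static List<String> largeTrades(Map<String, String> records,
                                    String symbol,
                                    long minQty) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : records.entrySet()) {
            if (symbol.equals(symbolOf(e.getKey()))
                    && quantityOf(e.getValue()) > minQty) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }
}
```

In a real application the same test would run inside the loop that walks the iterator returned by storeIterator(), so every record under /trades is still read; only the matching ones are kept.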

Parallel Scan retrieves the records from each shard in parallel and allows the client to receive and process them in parallel. You can specify how many threads to use to perform the retrieval. Specifying more client-side threads generally improves retrieval performance, until processor or network resources are saturated.
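The general idea of reading shards concurrently with a bounded number of client threads can be illustrated with plain java.util.concurrent classes. This is a conceptual sketch only, not how the Oracle NoSQL Database client is implemented: ShardScanSketch is a hypothetical class, and each shard is simulated as an in-memory list of strings. For simplicity it gathers results in shard order, whereas a real parallel scan returns records in no particular order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: shards are simulated as in-memory lists. */
final class ShardScanSketch {

    /** Reads all shards using at most maxThreads concurrent readers. */
    static List<String> scanInParallel(List<List<String>> shards, int maxThreads) {
        // The pool size plays the role of the "maximum number of threads".
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            // Submit one read task per shard; tasks run concurrently.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (List<String> shard : shards) {
                Callable<List<String>> readShard = () -> new ArrayList<>(shard);
                pending.add(pool.submit(readShard));
            }
            // Collect each shard's records as its task completes.
            List<String> results = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                try {
                    results.addAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("scan interrupted", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("shard read failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With three shards and maxThreads set to 2, two shards are read at once and the third is read as soon as a thread becomes free, which is why adding threads helps only while processor and network capacity remain available.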

To specify that a parallel scan is to be performed, you use StoreIteratorConfig to identify the maximum number of client-side threads to be used for the scan. You can also set the number of results per request, and the maximum number of result batches that the Oracle NoSQL Database client can hold before the scan pauses. You provide this information to a StoreIteratorConfig instance, and then pass that instance to the overloaded form of KVStore.storeIterator() which accepts it. This creates a ParallelScanIterator instance which you use to perform the parallel scan.

For example, to retrieve all of the records in the store using 5 threads in parallel, you would do this:

package kvstore.basicExample;

...

import oracle.kv.Consistency;
import oracle.kv.Direction;
import oracle.kv.KVStore;
import oracle.kv.KeyValueVersion;
import oracle.kv.ParallelScanIterator;
import oracle.kv.StoreIteratorConfig;

...

    /* 
     * Use multi-threading for this store iteration and limit the number 
     * of threads (degree of parallelism) to 5. 
     */
    final StoreIteratorConfig sc = new StoreIteratorConfig().
        setMaxConcurrentRequests(5);
    ParallelScanIterator<KeyValueVersion> iter = kvstore.storeIterator
        (Direction.UNORDERED, 
         0,
         null /* parentKey */,
         null /* subRange */,
         null /* depth */,
         Consistency.NONE_REQUIRED,
         0 /* timeout */,
         null /* timeoutUnit */,
         sc /* New Arg: StoreIteratorConfig */);

    try {
        while (iter.hasNext()) {
            KeyValueVersion kvv = iter.next();
            ...
        }
    } finally {
        if (iter != null) {
            iter.close();
        }
    }