22 Querying Data In a Cache
This chapter includes the following sections:
- Query Overview
  Coherence provides the ability to search for cache entries that meet a given set of criteria.
- Query Concepts
  The concept of querying is based on the ValueExtractor interface.
- Performing Queries
  Coherence includes many pre-built filters located in the com.tangosol.util.filter package.
- Efficient Processing of Filter Results
  You can query large data sets in batches to guard against running out of heap space.
- Using Query Indexes
  Query indexes allow values (or attributes of those values) and corresponding keys to be correlated within a QueryMap to increase query performance.
- Performing Batch Queries
  To preserve memory on the client issuing a query, various techniques can be used to retrieve query results in batches.
- Performing Queries on Multi-Value Attributes
  Coherence supports indexing and querying of multi-value attributes, including collections and arrays.
- Using Chained Extractors
  The ChainedExtractor implementation allows chained invocation of zero-argument (accessor) methods.
- Options to Skip Query Result Consistency Check
  By default, Coherence ensures that query results contain only entries that match the provided filter. However, the consistency check can cause repeated query re-evaluations if the targeted partitions are modified concurrently.
- Evaluating Query Cost and Effectiveness
  You can create query explain plan records and query trace records to view the estimated cost and actual effectiveness of each filter in a query, respectively.
Parent topic: Performing Data Grid Operations
Query Overview
Queries currently apply only to cached data (and do not use the CacheLoader interface to retrieve additional data that may satisfy the query). Thus, the data set should be loaded entirely into cache before queries are performed. In cases where the data set is too large to fit into available memory, it may be possible to restrict the cache contents along a specific dimension (for example, "date") and manually switch between cache queries and database queries based on the structure of the query. For maintainability, this is usually best implemented inside a cache-aware data access object (DAO).
Indexing requires the ability to extract attributes on each Partitioned cache node; for dedicated cache server instances, this implies (usually) that application classes must be installed in the cache server's classpath.
For Local and Replicated caches, queries can be evaluated locally against unindexed or indexed data. For Partitioned caches, queries are typically performed in parallel across the cluster and use indexes. Access to unindexed attributes requires object deserialization (though indexing on other attributes can reduce the number of objects that must be evaluated). Lastly, Coherence includes a Cost-Based Optimizer (CBO) and also provides support for trace and explain reports that help ensure the efficiency of a query.
Note:
All classes that implement the Filter interface must explicitly implement the hashCode() and equals() methods in a way that is based solely on the object's serializable state. This is particularly important when using filters in ObservableMap.addMapListener and other similar places where the filter gets serialized, transported over the network, and eventually used as a map key. The same is valid for ValueExtractor implementations.
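The equals()/hashCode() requirement can be sketched with a plain Java class. Note that this is a self-contained illustration: the hypothetical AgeEqualsFilter below does not implement the actual com.tangosol.util.Filter interface, it only demonstrates equality based solely on serializable state, so that two deserialized copies of the same filter compare equal when used as a map key.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for a Coherence Filter; a real filter would
// implement com.tangosol.util.Filter and its evaluate(Object) method.
class AgeEqualsFilter implements Serializable {
    private final int age; // the only (serializable) state

    AgeEqualsFilter(int age) {
        this.age = age;
    }

    // Matches any candidate age equal to this filter's age
    boolean evaluate(int candidateAge) {
        return candidateAge == age;
    }

    // equals() based solely on serializable state, never on identity
    @Override
    public boolean equals(Object o) {
        return o instanceof AgeEqualsFilter && ((AgeEqualsFilter) o).age == age;
    }

    // hashCode() derived from the same state used by equals()
    @Override
    public int hashCode() {
        return Objects.hash(age);
    }

    public static void main(String[] args) {
        // Two independent instances with the same state must be equal
        System.out.println(new AgeEqualsFilter(18).equals(new AgeEqualsFilter(18)));
    }
}
```

Because equality ignores object identity, a filter that round-trips through serialization still locates its original registration when used as a map key.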
Parent topic: Querying Data In a Cache
Query Concepts
The concept of querying is based on the ValueExtractor interface. A value extractor is used to extract an attribute from a given object for querying (and, similarly, indexing). Most developers need only the ReflectionExtractor implementation of this interface. The implementation uses reflection to extract an attribute from a value object by referring to a method name, which is typically a getter method. For example:
ValueExtractor extractor = new ReflectionExtractor("getName");
Any void argument method can be used, including Object methods like toString() (useful for prototyping/debugging). Indexes may be either traditional field indexes (indexing fields of objects) or function-based indexes (indexing virtual object attributes). For example, if a class has field accessors getFirstName and getLastName, the class may define a function getFullName which concatenates those names, and this function may be indexed. See Using Query Indexes.
To query a cache that contains objects with getName attributes, a Filter must be used. A filter has a single method which determines whether a given object meets a criterion.
Filter filter = new EqualsFilter(extractor, "Bob Smith");
Note that the filters also have convenience constructors that accept a method name and internally construct a ReflectionExtractor:
Filter filter = new EqualsFilter("getName", "Bob Smith");
The following example shows a routine to select the entries of a cache that satisfy a particular filter:
for (Iterator iter = cache.entrySet(filter).iterator(); iter.hasNext(); ) {
    Map.Entry entry = (Map.Entry) iter.next();
    Integer key = (Integer) entry.getKey();
    Person person = (Person) entry.getValue();
    System.out.println("key=" + key + " person=" + person);
}
The following example uses a filter to select and sort cache entries:
// entrySet(Filter filter, Comparator comparator)
Iterator iter = cache.entrySet(filter, null).iterator();
The additional null argument specifies that the result set should be sorted using the "natural ordering" of Comparable objects within the cache. The client may explicitly specify the ordering of the result set by providing an implementation of Comparator. Note that sorting places significant restrictions on the optimizations that Coherence can apply, as sorting requires that the entire result set be available before sorting.
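The distinction between natural ordering (the null argument) and an explicit Comparator can be illustrated with plain Java collections; this sketch is a self-contained analogy, not the Coherence entrySet(filter, comparator) implementation itself:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class ComparatorSketch {
    // Mirrors the sorting contract of entrySet(filter, comparator):
    // a null comparator means natural ordering; otherwise the supplied
    // comparator defines the order of the result set.
    static List<String> sortNames(List<String> names, Comparator<String> comparator) {
        List<String> sorted = new ArrayList<>(names);
        if (comparator == null) {
            Collections.sort(sorted); // natural (Comparable) ordering
        } else {
            sorted.sort(comparator);  // explicit client-specified ordering
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Smith", "adams", "Jones");
        // Natural (case-sensitive) order vs. an explicit case-insensitive order
        System.out.println(sortNames(names, null));
        System.out.println(sortNames(names, String.CASE_INSENSITIVE_ORDER));
    }
}
```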
Parent topic: Querying Data In a Cache
Performing Queries
Coherence includes many pre-built filters located in the com.tangosol.util.filter package. Example 22-1 demonstrates how to create a query and uses the GreaterEqualsFilter filter.
Example 22-1 Querying the Cache with a Filter
Filter filter = new GreaterEqualsFilter("getAge", 18);

for (Iterator iter = cache.entrySet(filter).iterator(); iter.hasNext(); ) {
    Map.Entry entry = (Map.Entry) iter.next();
    Integer key = (Integer) entry.getKey();
    Person person = (Person) entry.getValue();
    System.out.println("key=" + key + " person=" + person);
}
Note:
Although queries can be executed through a near cache, the query does not use the front portion of a near cache. If using a near cache with queries, the best approach is to use the following sequence:
Set setKeys = cache.keySet(filter);
Map mapResult = cache.getAll(setKeys);
Parent topic: Querying Data In a Cache
Efficient Processing of Filter Results
You can query large data sets in batches to guard against running out of heap space. Example 22-2 demonstrates a pattern for processing query results in batches, where BUFFER_SIZE (in this case, 100) entries are retrieved from the cache at a time.
Note:
The LimitFilter API can process results in parts, similar to the example below. However, LimitFilter is meant for scenarios where the results are paged, such as in a user interface. It is not an efficient means to process all data in a query result.
Example 22-2 Processing Query Results in Batches
public static void performQuery() {
    NamedCache c = CacheFactory.getCache("test");

    // Search for entries that start with 'c'
    Filter query = new LikeFilter(IdentityExtractor.INSTANCE, "c%", '\\', true);

    // Perform query, return keys of entries that match
    Set keys = c.keySet(query);

    // The amount of objects to process at a time
    final int BUFFER_SIZE = 100;

    // Object buffer
    Set buffer = new HashSet(BUFFER_SIZE);

    for (Iterator i = keys.iterator(); i.hasNext();) {
        buffer.add(i.next());

        if (buffer.size() >= BUFFER_SIZE) {
            // Bulk load BUFFER_SIZE number of objects from cache
            Map entries = c.getAll(buffer);

            // Process each entry
            process(entries);

            // Done processing these keys, clear buffer
            buffer.clear();
        }
    }

    // Handle the last partial chunk (if any)
    if (!buffer.isEmpty()) {
        process(c.getAll(buffer));
    }
}

public static void process(Map map) {
    for (Iterator ie = map.entrySet().iterator(); ie.hasNext();) {
        Map.Entry e = (Map.Entry) ie.next();
        out("key: " + e.getKey() + ", value: " + e.getValue());
    }
}
Parent topic: Querying Data In a Cache
Using Query Indexes
Query indexes allow values (or attributes of those values) and corresponding keys to be correlated within a QueryMap to increase query performance. This section includes the following topics:

- Creating an Index
- Creating User-Defined Indexes
Creating an Index
The addIndex method of the QueryMap class is used to create indexes. Any attribute able to be queried may be indexed using this method. The method includes three parameters:
addIndex(ValueExtractor extractor, boolean fOrdered, Comparator comparator)
Example 22-3 demonstrates how to create an index:
Example 22-3 Sample Code to Create an Index
NamedCache cache = CacheFactory.getCache("MyCache");
ValueExtractor extractor = new ReflectionExtractor("getAttribute");
cache.addIndex(extractor, true, null);
The fOrdered argument specifies whether the index structure is sorted. Sorted indexes are useful for range queries, such as "select all entries that fall between two dates" or "select all employees whose family name begins with 'S'". For "equality" queries, an unordered index may be used, which may have better efficiency in terms of space and time.
The comparator argument can provide a custom java.util.Comparator for ordering the index.
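Conceptually, a sorted index behaves like a sorted map from an extracted attribute to the set of cache keys whose entries hold that value, which is why ordered indexes help range and prefix queries. The following self-contained sketch illustrates that idea with a TreeMap; it is an analogy only, not Coherence's actual index implementation:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

class OrderedIndexSketch {
    // attribute value -> set of cache keys whose entry has that value,
    // kept sorted so range queries can scan only the relevant slice
    private final TreeMap<String, Set<Integer>> index = new TreeMap<>();

    void insert(Integer key, String attribute) {
        index.computeIfAbsent(attribute, a -> new HashSet<>()).add(key);
    }

    // "select all keys whose attribute starts with the given prefix":
    // subMap visits only entries in [prefix, prefix + MAX_VALUE)
    Set<Integer> prefixQuery(String prefix) {
        Set<Integer> result = new HashSet<>();
        for (Set<Integer> keys : index.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            result.addAll(keys);
        }
        return result;
    }

    public static void main(String[] args) {
        OrderedIndexSketch idx = new OrderedIndexSketch();
        idx.insert(1, "Smith");
        idx.insert(2, "Sanders");
        idx.insert(3, "Jones");
        System.out.println(idx.prefixQuery("S")); // keys of the two S* names
    }
}
```

An unordered (hash-based) structure would answer equality lookups just as well, often with less space, which is the trade-off the fOrdered flag expresses.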
The addIndex method is only intended as a hint to the cache implementation and, as such, it may be ignored by the cache if indexes are not supported or if the desired index (or a similar index) already exists. It is expected that an application calls this method to suggest an index even if the index may exist, just so that the application is certain that the index has been suggested. For example, in a distributed environment, each server likely suggests the same set of indexes when it starts, and there is no downside to the application blindly requesting those indexes regardless of whether another server has requested the same indexes.
Note that queries can be combined by Coherence if necessary, and also that Coherence includes a cost-based optimizer (CBO) to prioritize the usage of indexes. To take advantage of an index, queries must use extractors that are equal (Object.equals()) to the one used to create the index.
A list of applied indexes can be retrieved from the StorageManagerMBean. See StorageManagerMBean in Managing Oracle Coherence.
Parent topic: Using Query Indexes
Creating User-Defined Indexes
Applications can choose to create user-defined indexes to control which entries are added to the index. User-defined indexes are typically used to reduce the memory and processing overhead required to maintain an index. To create a user-defined index, an application must implement the MapIndex and IndexAwareExtractor interfaces. This section also describes the ConditionalIndex and ConditionalExtractor classes, which provide an implementation of the interfaces to create a conditional index that uses an associated filter to evaluate whether an entry should be indexed.
This section includes the following topics:
- Implementing the MapIndex Interface
- Implementing the IndexAwareExtractor Interface
- Using a Conditional Index
Parent topic: Using Query Indexes
Implementing the MapIndex Interface
The MapIndex interface is used to correlate values stored in an indexed Map (or attributes of those values) to the corresponding keys in the indexed Map. Applications implement this interface to supply a custom index.
The following example implementation defines an index that only adds entries with non-null values. This would be useful in the case where there is a cache with a large number of entries and only a small subset have meaningful, non-null, values.
public class CustomMapIndex implements MapIndex {
    public void insert(Map.Entry entry) {
        if (entry.getValue() != null) {
            ...
        }
    }
    ...
}
In the above example, the value of the entry is checked for null before extraction, but it could be done after. If the value of the entry is null, then nothing is inserted into the index. A similar check for null would also be required for the MapIndex update method. The rest of the MapIndex methods must be implemented appropriately as well.
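The insert/update/delete bookkeeping behind such an index can be sketched with a plain map-backed structure that skips null values. This is a self-contained illustration of the pattern, not the real MapIndex contract:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NonNullIndexSketch {
    // extracted value -> keys; entries with null values are simply absent
    private final Map<Object, Set<Object>> contents = new HashMap<>();
    // reverse mapping so update/delete can find the old indexed value
    private final Map<Object, Object> keyToValue = new HashMap<>();

    void insert(Object key, Object value) {
        if (value == null) {
            return; // same null check the custom MapIndex performs
        }
        contents.computeIfAbsent(value, v -> new HashSet<>()).add(key);
        keyToValue.put(key, value);
    }

    void update(Object key, Object newValue) {
        delete(key);           // remove the old mapping, if any
        insert(key, newValue); // re-apply the null check on the new value
    }

    void delete(Object key) {
        Object old = keyToValue.remove(key);
        if (old != null) {
            Set<Object> keys = contents.get(old);
            keys.remove(key);
            if (keys.isEmpty()) {
                contents.remove(old); // drop empty buckets
            }
        }
    }

    Set<Object> keysFor(Object value) {
        return contents.getOrDefault(value, Collections.emptySet());
    }
}
```

Because null-valued entries are never inserted, the index stays proportional to the "meaningful" subset of the cache, which is exactly the memory saving a user-defined index is after.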
Parent topic: Creating User-Defined Indexes
Implementing the IndexAwareExtractor Interface
The IndexAwareExtractor interface is an extension to the ValueExtractor interface that supports the creation and destruction of a MapIndex index. Instances of this interface are intended to be used with the QueryMap API to support the creation of custom indexes. The following example demonstrates how to implement this interface and is for the example CustomMapIndex class that was created above:
public class CustomIndexAwareExtractor
        implements IndexAwareExtractor, ExternalizableLite, PortableObject {

    public CustomIndexAwareExtractor(ValueExtractor extractor) {
        m_extractor = extractor;
    }

    public MapIndex createIndex(boolean fOrdered, Comparator comparator, Map mapIndex) {
        ValueExtractor extractor = m_extractor;
        MapIndex index = (MapIndex) mapIndex.get(extractor);

        if (index != null) {
            throw new IllegalArgumentException(
                "Repetitive addIndex call for " + this);
        }

        index = new CustomMapIndex(extractor, fOrdered, comparator);
        mapIndex.put(extractor, index);
        return index;
    }

    public MapIndex destroyIndex(Map mapIndex) {
        return (MapIndex) mapIndex.remove(m_extractor);
    }
    ...
}
In the above example, an underlying extractor is actually used to create the index and ultimately extracts the values from the cache entries. The IndexAwareExtractor implementation is used to manage the creation and destruction of a custom MapIndex implementation while preserving the existing QueryMap interfaces.
The IndexAwareExtractor is passed into the QueryMap.addIndex and QueryMap.removeIndex calls. Coherence, in turn, calls createIndex and destroyIndex on the IndexAwareExtractor. Also note that it is the responsibility of the IndexAwareExtractor to maintain the Map of extractor-to-index associations that is passed into createIndex and destroyIndex.
Parent topic: Creating User-Defined Indexes
Using a Conditional Index
A conditional index is a custom index that implements both the MapIndex and IndexAwareExtractor interfaces as described above and uses an associated filter to evaluate whether an entry should be indexed. An entry's extracted value is only added to the index if the filter evaluates to true. The implemented classes are ConditionalIndex and ConditionalExtractor, respectively.
The ConditionalIndex is created by a ConditionalExtractor. The filter and extractor used by the ConditionalIndex are set on the ConditionalExtractor and passed to the ConditionalIndex constructor during the QueryMap.addIndex call.
The ConditionalExtractor is an IndexAwareExtractor implementation that is only used to create a ConditionalIndex. The underlying ValueExtractor is used for value extraction during index creation and is the extractor that is associated with the created ConditionalIndex in the given index map. Using the ConditionalExtractor to extract values is not supported. For example:
ValueExtractor extractor = new ReflectionExtractor("getLastName");
Filter filter = new NotEqualsFilter("getId", null);
ValueExtractor condExtractor = new ConditionalExtractor(filter, extractor, true);

// add the conditional index which should only contain the last name values for the
// entries with non-null Ids
cache.addIndex(condExtractor, true, null);
Parent topic: Creating User-Defined Indexes
Performing Batch Queries
Using the keySet form of the queries, combined with getAll(), reduces memory consumption because the entire entry set is not deserialized on the client simultaneously. It also takes advantage of near caching. For example:
// keySet(Filter filter)
Set setKeys = cache.keySet(filter);
Set setPageKeys = new HashSet();
int PAGE_SIZE = 100;
for (Iterator iter = setKeys.iterator(); iter.hasNext();) {
    setPageKeys.add(iter.next());
    if (setPageKeys.size() == PAGE_SIZE || !iter.hasNext()) {
        // get a block of values
        Map mapResult = cache.getAll(setPageKeys);

        // process the block
        // ...

        setPageKeys.clear();
    }
}
A LimitFilter may be used to limit the amount of data sent to the client, and also to provide paging. The use of LimitFilter makes two assumptions for it to function correctly:

- There are no concurrent modifications to the data set in question.
- Data is evenly distributed across all storage nodes of the cluster, such that each node has a fair sample of the entire set of data.
Note:
In the case of redistribution, data is not evenly distributed across all storage nodes and results in the wrong result set for any incoming query.
int pageSize = 25;
Filter filter = new GreaterEqualsFilter("getAge", 18);

// get entries 1-25
LimitFilter limitFilter = new LimitFilter(filter, pageSize);
Set entries = cache.entrySet(limitFilter);

// get entries 26-50
limitFilter.nextPage();
entries = cache.entrySet(limitFilter);
When using a distributed/partitioned cache, queries can be targeted to partitions and cache servers using a PartitionedFilter. This is the most efficient way of batching query results, as each query request is targeted to a single cache server, thus reducing the number of servers that must respond to a request and making the most efficient use of the network.
Note:
Use of PartitionedFilter is limited to cluster members; it cannot be used by Coherence*Extend clients. Coherence*Extend clients may use the two techniques described above, or these queries can be implemented as an Invocable and executed remotely by a Coherence*Extend client.
To execute a query partition by partition:
DistributedCacheService service =
    (DistributedCacheService) cache.getCacheService();
int cPartitions = service.getPartitionCount();

PartitionSet parts = new PartitionSet(cPartitions);
for (int iPartition = 0; iPartition < cPartitions; iPartition++) {
    parts.add(iPartition);
    Filter filterPart = new PartitionedFilter(filter, parts);
    Set setEntriesPart = cache.entrySet(filterPart);

    // process the entries ...

    parts.remove(iPartition);
}
Queries can also be executed on a server by server basis:
DistributedCacheService service =
    (DistributedCacheService) cache.getCacheService();
int cPartitions = service.getPartitionCount();

PartitionSet partsProcessed = new PartitionSet(cPartitions);
for (Iterator iter = service.getStorageEnabledMembers().iterator(); iter.hasNext();) {
    Member member = (Member) iter.next();
    PartitionSet partsMember = service.getOwnedPartitions(member);

    // due to a redistribution some partitions may have been processed
    partsMember.remove(partsProcessed);
    Filter filterPart = new PartitionedFilter(filter, partsMember);
    Set setEntriesPart = cache.entrySet(filterPart);

    // process the entries ...

    partsProcessed.add(partsMember);
}

// due to a possible redistribution, some partitions may have been skipped
if (!partsProcessed.isFull()) {
    partsProcessed.invert();
    Filter filterRemaining = new PartitionedFilter(filter, partsProcessed);

    // process the remaining entries ...
}
Parent topic: Querying Data In a Cache
Performing Queries on Multi-Value Attributes
Coherence supports indexing and querying of multi-value attributes, including collections and arrays. The ContainsAllFilter, ContainsAnyFilter, and ContainsFilter are used to query against these collections. For example:
Set searchTerms = new HashSet();
searchTerms.add("java");
searchTerms.add("clustering");
searchTerms.add("books");

// The cache contains instances of a class "Document" which has a method
// "getWords" which returns a Collection<String> containing the set of
// words that appear in the document.
Filter filter = new ContainsAllFilter("getWords", searchTerms);

Set entrySet = cache.entrySet(filter);

// iterate through the search results
// ...
Parent topic: Querying Data In a Cache
Using Chained Extractors
The ChainedExtractor implementation allows chained invocation of zero-argument (accessor) methods. In the following example, the extractor first uses reflection to call getName() on each cached Person object, and then uses reflection to call the length method on the returned String.
ValueExtractor extractor = new ChainedExtractor("getName.length");
This extractor could be passed into a query, allowing queries (for example) to select all people with names not exceeding 10 letters. Method invocations may be chained indefinitely, for example getName.trim.length.
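Internally, a chained extractor amounts to invoking each zero-argument method in turn via reflection. The following self-contained approximation shows the mechanism (it is an illustration of the idea, not the Coherence ChainedExtractor implementation):

```java
import java.lang.reflect.Method;

class ChainedReflectionSketch {
    // Apply a dot-separated chain of zero-argument methods to a target,
    // e.g. extract(person, "getName.length")
    static Object extract(Object target, String chain) {
        try {
            Object value = target;
            for (String name : chain.split("\\.")) {
                Method m = value.getClass().getMethod(name);
                value = m.invoke(value); // result becomes the next target
            }
            return value;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Minimal Person with the accessor used in the chapter's example
    static class Person {
        private final String name;
        Person(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static void main(String[] args) {
        // Chains each call: getName() -> String, then length() -> Integer
        System.out.println(extract(new Person("Bob Smith"), "getName.length"));
    }
}
```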
POF extractors and POF updaters offer the same functionality as ChainedExtractor through the use of the SimplePofPath class. See Using POF Extractors and POF Updaters.
Parent topic: Querying Data In a Cache
Options to Skip Query Result Consistency Check
By default, Coherence ensures that query results contain only entries that match the provided filter. However, the consistency check can cause repeated query re-evaluations if the targeted partitions are modified concurrently.
The following options allow you to decide if Coherence should relax the consistency check for a faster response:
- At the aggregator level, by overriding the characteristics() method of the individual aggregator. For example:
@Override
public int characteristics() {
    return super.characteristics() | ALLOW_INCONSISTENCIES;
}
- At the JVM level, via a JVM argument: pass the system property -Dcoherence.query.retry=0.
For queries with the aggregate method, you can use both of these options for a quick response. For queries using filters (for example, entrySet), you can use the JVM option to skip the query result re-evaluation.
Parent topic: Querying Data In a Cache
Evaluating Query Cost and Effectiveness
This section includes the following topics:

- Creating Query Records
- Interpreting Query Records
- Running The Query Record Example
Parent topic: Querying Data In a Cache
Creating Query Records
The com.tangosol.util.aggregator.QueryRecorder class produces an explain or trace record for a given filter. The class is an implementation of a parallel aggregator that is capable of querying all nodes in a cluster and aggregating the results. The class supports two record types: an EXPLAIN record for showing the estimated cost of the filters in a query, and a TRACE record for showing the actual effectiveness of each filter in a query.
To create a query record, create a new QueryRecorder instance that specifies a RecordType parameter. Include the instance and the filter to be tested as parameters of the aggregate method. The following example creates an explain record:
NamedCache cache = CacheFactory.getCache("mycache");
cache.addIndex(new ReflectionExtractor("getAge"), true, null);

AllFilter filter = new AllFilter(new Filter[] {
    new OrFilter(
        new EqualsFilter(new ReflectionExtractor("getAge"), 16),
        new EqualsFilter(new ReflectionExtractor("getAge"), 19)),
    new EqualsFilter(new ReflectionExtractor("getLastName"), "Smith"),
    new EqualsFilter(new ReflectionExtractor("getFirstName"), "Bob"),
});

QueryRecorder agent = new QueryRecorder(RecordType.EXPLAIN);
Object resultsExplain = cache.aggregate(filter, agent);
System.out.println("\n" + resultsExplain + "\n");
To create a trace record, change the RecordType parameter to TRACE:
QueryRecorder agent = new QueryRecorder(RecordType.TRACE);
Parent topic: Evaluating Query Cost and Effectiveness
Interpreting Query Records
Query records are used to evaluate the filters and indexes that make up a query. Explain plan records are used to evaluate the estimated cost associated with applying a filter. Trace records are used to evaluate how effective a filter is at reducing a key set.
This section provides a sample explain plan record and a sample trace record and discusses how to read and interpret the records. The records are based on an example query of 1500 entries that were located on a cluster of 4 storage-enabled nodes. The query consists of a filter that finds any people that are either age 16 or 19 with the first name Bob and the last name Smith. Lastly, an index is added for getAge. See Running The Query Record Example.
NamedCache cache = CacheFactory.getCache("mycache");
cache.addIndex(new ReflectionExtractor("getAge"), true, null);

AllFilter filter = new AllFilter(new Filter[] {
    new OrFilter(
        new EqualsFilter(new ReflectionExtractor("getAge"), 16),
        new EqualsFilter(new ReflectionExtractor("getAge"), 19)),
    new EqualsFilter(new ReflectionExtractor("getLastName"), "Smith"),
    new EqualsFilter(new ReflectionExtractor("getFirstName"), "Bob"),
});
This section includes the following topics:
Query Explain Plan Record
A query explain record provides the estimated cost of evaluating a filter as part of a query operation. The cost takes into account whether or not an index can be used by a filter. The cost evaluation is used to determine the order in which filters are applied when a query is performed. Filters that use an index have the lowest cost and get applied first.
Example 22-4 shows a typical query explain plan record. The record includes an Explain Plan table for evaluating each filter in the query and an Index Lookups table that lists each index that can be used by the filter. The columns are described as follows:
- Name – This column shows the name of each filter in the query. Composite filters show information for each of the filters within the composite filter.
- Index – This column shows whether or not an index can be used with the given filter. If an index is found, the number shown corresponds to the index number on the Index Lookups table. In the example, an ordered simple map index (0) was found for getAge().
- Cost – This column shows an estimated cost of applying the filter. If an index can be used, the cost is given as 1. The value of 1 is used since the operation of applying the index requires just a single access to the index content. In the example, there are 4 storage-enabled cluster members and thus the cost reflects accessing the index on all four members. If no index exists, the cost is calculated as EVAL_COST * number of keys. The EVAL_COST value is a constant value of 1000. This is intended to show the relative cost of doing a full scan to reduce the key set using the filter. In the example, there are 1500 cache entries which need to be evaluated. Querying indexed entries is always relatively inexpensive as compared to non-indexed entries but does not necessarily guarantee effectiveness.
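The Cost column in Example 22-4 can be reproduced directly from these rules: the unindexed filters cost EVAL_COST * number of keys = 1000 * 1500 = 1,500,000, while the indexed getAge() filters cost one index access per storage-enabled member, 4 in this example. A minimal sketch of that arithmetic:

```java
class QueryCostSketch {
    static final int EVAL_COST = 1000; // constant used by the explain plan

    // Estimated cost of a filter with no index: full scan of all keys
    static long unindexedCost(int numberOfKeys) {
        return (long) EVAL_COST * numberOfKeys;
    }

    // Estimated cost with an index: one index access per storage member
    static long indexedCost(int storageMembers) {
        return storageMembers;
    }

    public static void main(String[] args) {
        // Matches the 1500000 and 4 values shown in Example 22-4
        System.out.println(unindexedCost(1500));
        System.out.println(indexedCost(4));
    }
}
```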
The record in Example 22-4 shows that the EqualsFilter for getAge() has a low cost because it has an associated index, so it would be applied before getLastName() and getFirstName(). However, the getAge() filter, while inexpensive, may not be very effective if all entries were either 16 or 19 and only a few entries matched Bob and Smith. In this case, it is more effective to add an index for getLastName() and getFirstName(). Moreover, the cost (mainly memory consumption) associated with creating an index is wasted if the index does a poor job of reducing the key set.
Example 22-4 Sample Query Explain Plan Record
Explain Plan
Name                                    Index   Cost
==================================================================================
com.tangosol.util.filter.AllFilter    | ----  | 0
  com.tangosol.util.filter.OrFilter   | ----  | 0
    EqualsFilter(.getAge(), 16)       | 0     | 4
    EqualsFilter(.getAge(), 19)       | 0     | 4
  EqualsFilter(.getLastName(), Smit   | 1     | 1500000
  EqualsFilter(.getFirstName(), Bob   | 2     | 1500000

Index Lookups
Index   Description                                Extractor         Ordered
==================================================================================
0       SimpleMapIndex: Extractor=.getAge(), Ord   .getAge()         true
1       No index found                             .getLastName()    false
2       No index found                             .getFirstName()   false
Parent topic: Interpreting Query Records
Query Trace Record
A query trace record provides the actual cost of evaluating a filter as part of a query operation. The cost takes into account whether or not an index can be used by a filter. The query is actually performed and the effectiveness of each filter at reducing the key set is shown.
Example 22-5 shows a typical query trace record. The record includes a Trace table that shows the effectiveness of each filter in the query and an Index Lookups table that lists each index that can be used by the filter. The columns are described as follows:
- Name – This column shows the name of each filter in the query. Composite filters show information for each of the filters within the composite filter.
- Index – This column shows whether or not an index can be used with the given filter. If an index is found, the number shown corresponds to the index number on the Index Lookups table. In the example, an ordered simple map index (0) was found for getAge().
- Effectiveness – This column shows the amount a key set was actually reduced as a result of each filter. The value is given as prefilter_key_set_size | postfilter_key_set_size and is also presented as a percentage. The prefilter_key_set_size value represents the key set size prior to evaluating the filter or applying an index. The postfilter_key_set_size value represents the size of the key set remaining after evaluating the filter or applying an index. For a composite filter entry, the value is the overall result for its contained filters. Once a key set size can no longer be reduced based on an index, the resulting key set is deserialized and any non-index filters are applied.
- Duration – This column shows the number of milliseconds spent evaluating the filter or applying an index. A value of 0 indicates that the time registered was below the reporting threshold. In the example, the 63 milliseconds is the result of having to deserialize the key set, which is incurred on the first filter getLastName() only.
The record in Example 22-5 shows that it took approximately 63 milliseconds to reduce 1500 entries to find 100 entries with the first name Bob, last name Smith, and an age of 16 or 19. The key set of 1500 entries was initially reduced to 300 using the index for getAge(). The resulting 300 entries (because they could not be further reduced using an index) were then deserialized and reduced to 150 entries based on getLastName(), and then reduced to 100 using getFirstName(). The example shows that an index on getAge() is well worth the resources because it effectively reduced the key set by 1200 entries. An index on getLastName and getFirstName would increase the performance of the overall query but may not be worth the additional resources required to create the index.
Example 22-5 Sample Query Trace Record
Trace
Name                                    Index   Effectiveness   Duration
==================================================================================
com.tangosol.util.filter.AllFilter    | ----  | 1500|300(80%) | 0
  com.tangosol.util.filter.OrFilter   | ----  | 1500|300(80%) | 0
    EqualsFilter(.getAge(), 16)       | 0     | 1500|150(90%) | 0
    EqualsFilter(.getAge(), 19)       | 0     | 1350|150(88%) | 0
  EqualsFilter(.getLastName(), Smit   | 1     | 300|300(0%)   | 0
  EqualsFilter(.getFirstName(), Bob   | 2     | 300|300(0%)   | 0
com.tangosol.util.filter.AllFilter    | ----  | 300|100(66%)  | 63
  EqualsFilter(.getLastName(), Smit   | ----  | 300|150(50%)  | 63
  EqualsFilter(.getFirstName(), Bob   | ----  | 150|100(33%)  | 0

Index Lookups
Index   Description                                Extractor         Ordered
==================================================================================
0       SimpleMapIndex: Extractor=.getAge(), Ord   .getAge()         true
1       No index found                             .getLastName()    false
2       No index found                             .getFirstName()   false
Parent topic: Interpreting Query Records
Running The Query Record Example
The following example is a simple class that demonstrates creating query records. The class loads a distributed cache (mycache) with 1500 Person objects, creates an index on an attribute, performs a query, and creates both a query explain plan record and a query trace record, which are emitted to the console before the class exits.
Example 22-6 A Query Record Example
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.Filter;
import com.tangosol.util.aggregator.QueryRecorder;
import static com.tangosol.util.aggregator.QueryRecorder.RecordType;
import com.tangosol.util.extractor.ReflectionExtractor;
import com.tangosol.util.filter.AllFilter;
import com.tangosol.util.filter.EqualsFilter;
import com.tangosol.util.filter.OrFilter;
import java.io.Serializable;

public class QueryRecordExample {
    public static void main(String[] args) {
        testExplain();
        testTrace();
    }

    public static void testExplain() {
        NamedCache cache = CacheFactory.getCache("mycache");
        cache.addIndex(new ReflectionExtractor("getAge"), true, null);
        populateCache(cache);

        AllFilter filter = new AllFilter(new Filter[] {
            new OrFilter(
                new EqualsFilter(new ReflectionExtractor("getAge"), 16),
                new EqualsFilter(new ReflectionExtractor("getAge"), 19)),
            new EqualsFilter(new ReflectionExtractor("getLastName"), "Smith"),
            new EqualsFilter(new ReflectionExtractor("getFirstName"), "Bob"),
        });

        QueryRecorder agent = new QueryRecorder(RecordType.EXPLAIN);
        Object resultsExplain = cache.aggregate(filter, agent);
        System.out.println("\nExplain Plan=\n" + resultsExplain + "\n");
    }

    public static void testTrace() {
        NamedCache cache = CacheFactory.getCache("hello-example");
        cache.addIndex(new ReflectionExtractor("getAge"), true, null);
        populateCache(cache);

        AllFilter filter = new AllFilter(new Filter[] {
            new OrFilter(
                new EqualsFilter(new ReflectionExtractor("getAge"), 16),
                new EqualsFilter(new ReflectionExtractor("getAge"), 19)),
            new EqualsFilter(new ReflectionExtractor("getLastName"), "Smith"),
            new EqualsFilter(new ReflectionExtractor("getFirstName"), "Bob"),
        });

        QueryRecorder agent = new QueryRecorder(RecordType.TRACE);
        Object resultsExplain = cache.aggregate(filter, agent);
        System.out.println("\nTrace =\n" + resultsExplain + "\n");
    }

    private static void populateCache(NamedCache cache) {
        for (int i = 0; i < 1500; ++i) {
            Person person = new Person(i % 3 == 0 ? "Joe" : "Bob",
                i % 2 == 0 ? "Smith" : "Jones", 15 + i % 10);
            cache.put("key" + i, person);
        }
    }

    public static class Person implements Serializable {
        public Person(String sFirstName, String sLastName, int nAge) {
            m_sFirstName = sFirstName;
            m_sLastName = sLastName;
            m_nAge = nAge;
        }

        public String getFirstName() {
            return m_sFirstName;
        }

        public String getLastName() {
            return m_sLastName;
        }

        public int getAge() {
            return m_nAge;
        }

        public String toString() {
            return "Person( " + m_sFirstName + " " + m_sLastName + " : " + m_nAge + ")";
        }

        private String m_sFirstName;
        private String m_sLastName;
        private int m_nAge;
    }
}
Parent topic: Evaluating Query Cost and Effectiveness