5 Data Affinity

Data affinity is the practice of ensuring that a group of related cache entries is contained within a single cache partition. This ensures that all relevant data is managed by a single primary cache node (without compromising fault tolerance).

Affinity may span multiple caches (if they are managed by the same cache service, which will generally be the case). For example, in a master-detail pattern such as an "Order-LineItem", the Order object may be co-located with the entire collection of LineItem objects that are associated with it.

The benefit is two-fold. First, only a single cache node is required to manage queries and transactions against a set of related items. Second, all concurrency operations can be managed locally, avoiding the need for clustered synchronization.

Several standard Coherence operations can benefit from affinity, including cache queries, InvocableMap operations, and the getAll, putAll, and removeAll methods.

Note:

Data affinity is specified in terms of entry keys (not values). As a result, the association information must be present in the key class. Similarly, the association logic applies to the key class, not the value class.

5.1 Specifying Affinity

Affinity is specified in terms of a relationship to a partitioned key. In the Order-LineItem example above, the Order objects would be partitioned normally, and the LineItem objects would be associated with the appropriate Order object.

The association does not need to be directly tied to the actual parent key; it only needs to be a functional mapping of the parent key. It could be a single field of the parent key (even a non-unique one) or an integer hash of the parent key. All that matters is that every child key returns the same associated key for a given parent; the associated key does not have to be an actual cache key (it is simply a "group id"). This can minimize the size impact on child key classes that do not already contain the parent key information: because the associated key is derived data, its size may be chosen explicitly, and it does not affect the behavior of the key. Note, however, that making the association too general (having too many keys map to the same "group id") can cause a "lumpy" distribution; in the extreme case where all child keys return the same association key regardless of the parent key, all child entries are assigned to a single partition and are not spread across the cluster.

There are two ways to ensure that a set of cache entries is co-located, described in the following sections. Note that the association is based on the cache key, not the value (otherwise, updating a cache entry could cause it to change partitions). Also note that while the Order is co-located with its child LineItems, Coherence does not currently support composite operations that span multiple caches (for example, updating the Order and its collection of LineItems within a single com.tangosol.util.InvocableMap.EntryProcessor invocation).

5.2 Specifying Data Affinity with a KeyAssociation

For application-defined keys, the key class may implement com.tangosol.net.cache.KeyAssociation as follows:

Example 5-1 Creating a Key Association

import com.tangosol.net.cache.KeyAssociation;

public class LineItemId implements KeyAssociation
    {
    // ...

    // every LineItemId that returns the same order identifier is assigned
    // to the same partition as the corresponding Order entry
    public Object getAssociatedKey()
        {
        return getOrderId();
        }

    // ...
    }
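
The associated key does not have to be the parent key itself; as described in Specifying Affinity, any consistent function of the parent key is sufficient. The following is a minimal sketch of an alternative getAssociatedKey() implementation that returns a compact, derived "group id" instead of the order identifier; it assumes that getOrderId() returns an OrderId object with a deterministic hashCode() (an illustrative choice, not a product requirement):

import com.tangosol.net.cache.KeyAssociation;

public class LineItemId implements KeyAssociation
    {
    // ...

    public Object getAssociatedKey()
        {
        // return a derived "group id" rather than the parent key itself;
        // every LineItemId for the same order must return the same value
        return Integer.valueOf(getOrderId().hashCode());
        }

    // ...
    }

Because the group id is computed on demand, it adds no stored state to the LineItemId class.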

5.3 Specifying Data Affinity with a KeyAssociator

Applications may also provide a custom KeyAssociator. This approach is useful when the key classes cannot be modified to implement KeyAssociation (for example, when the keys are simple types such as Integer or String):

Example 5-2 A Custom KeyAssociator

import com.tangosol.net.PartitionedService;
import com.tangosol.net.partition.KeyAssociator;

public class LineItemAssociator implements KeyAssociator
    {
    public Object getAssociatedKey(Object oKey)
        {
        if (oKey instanceof LineItemId)
            {
            // a line item is associated with its parent order
            return ((LineItemId) oKey).getOrderId();
            }
        else if (oKey instanceof OrderId)
            {
            // an order is associated with itself
            return oKey;
            }
        else
            {
            // no association for any other key type
            return null;
            }
        }

    public void init(PartitionedService service)
        {
        // no initialization is required
        }
    }

The key associator may be configured for a NamedCache in the associated <distributed-scheme> element:

Example 5-3 Configuring a Key Associator

<distributed-scheme>
    <!-- ... -->
    <key-associator>
        <class-name>LineItemAssociator</class-name>
    </key-associator>
</distributed-scheme>
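
Because affinity applies only to entries managed by the same cache service, both the order cache and the line item cache must be mapped to that scheme (or to schemes that share the same service name). The following is a minimal configuration sketch; the cache names orders and line-items, the scheme name order-affinity-scheme, and the service name OrderLineItemService are illustrative assumptions:

<cache-config>
    <caching-scheme-mapping>
        <cache-mapping>
            <cache-name>orders</cache-name>
            <scheme-name>order-affinity-scheme</scheme-name>
        </cache-mapping>
        <cache-mapping>
            <cache-name>line-items</cache-name>
            <scheme-name>order-affinity-scheme</scheme-name>
        </cache-mapping>
    </caching-scheme-mapping>

    <caching-schemes>
        <distributed-scheme>
            <scheme-name>order-affinity-scheme</scheme-name>
            <service-name>OrderLineItemService</service-name>
            <key-associator>
                <class-name>LineItemAssociator</class-name>
            </key-associator>
            <backing-map-scheme>
                <local-scheme/>
            </backing-map-scheme>
            <autostart>true</autostart>
        </distributed-scheme>
    </caching-schemes>
</cache-config>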

5.4 Example of Using Affinity

Example 5-4 illustrates how to use affinity to create a more efficient query (NamedCache.keySet(Filter)) and cache access (NamedCache.getAll(Collection)).

Example 5-4 Using Affinity for a More Efficient Query

import java.util.Map;
import java.util.Set;

import com.tangosol.util.Filter;
import com.tangosol.util.filter.EqualsFilter;
import com.tangosol.util.filter.KeyAssociatedFilter;

// cacheLineItems is the NamedCache that holds the LineItem entries
OrderId orderId = new OrderId(1234);

// this Filter will be applied to all LineItem objects to fetch those
// for which getOrderId() returns the specified order identifier
// "select * from LineItem where OrderId = :orderId"
Filter filterEq = new EqualsFilter("getOrderId", orderId);

// this Filter will direct the query to the cluster node that currently owns
// the Order object with the given identifier
Filter filterAsc = new KeyAssociatedFilter(filterEq, orderId);

// run the key-association-optimized query to get the LineItem keys
Set setLineItemKeys = cacheLineItems.keySet(filterAsc);

// get all the LineItem objects in a single operation
Map mapLineItems = cacheLineItems.getAll(setLineItemKeys);

// or, remove all the matching entries in a single operation
cacheLineItems.keySet().removeAll(setLineItemKeys);
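
Affinity provides the same benefit to InvocableMap operations. Because all of the line items for the order reside in a single partition, an entry processor invoked against the key set executes on the one member that owns that partition. The following is a minimal sketch; the MarkShippedProcessor class, the LineItem value class, and its setStatus method are illustrative assumptions, not part of the product API:

import java.io.Serializable;

import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

public class MarkShippedProcessor extends AbstractProcessor implements Serializable
    {
    public Object process(InvocableMap.Entry entry)
        {
        // runs on the storage member that owns the partition containing
        // all of the order's line items
        LineItem item = (LineItem) entry.getValue();   // LineItem is assumed
        item.setStatus("SHIPPED");                     // hypothetical mutator
        entry.setValue(item);
        return null;
        }
    }

// process every line item for the order in a single, partition-local request
cacheLineItems.invokeAll(setLineItemKeys, new MarkShippedProcessor());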