21 Pre-Loading a Cache

This chapter describes how to pre-load data into a cache using the bulk loading and distributed loading patterns.

This chapter includes the following sections:

Bulk Loading Data Into a Cache
Performing Distributed Bulk Loading

21.1 Bulk Loading Data Into a Cache

A common scenario when using Coherence is to pre-populate a cache before an application uses the data. Example 21-1 demonstrates loading data using the put method. This technique works, but each call to put may result in network traffic, especially for partitioned and replicated caches. Additionally, each call to put returns the object it just replaced in the cache (as defined in the java.util.Map interface) which adds more unnecessary overhead.

Example 21-1 Pre-Loading a Cache

public static void bulkLoad(NamedCache cache, Connection conn)
    {
    Statement s;
    ResultSet rs;
    
    try
        {
        s = conn.createStatement();
        rs = s.executeQuery("select key, value from table");
        while (rs.next())
            {
            Integer key   = new Integer(rs.getInt(1));
            String  value = rs.getString(2);
            cache.put(key, value);
            }
        ...
        }
    catch (SQLException e)
        {...}
    }

Loading the cache can be made much more efficient by using the ConcurrentMap.putAll method instead. This is illustrated in Example 21-2:

Example 21-2 Pre-Loading a Cache Using ConcurrentMap.putAll

public static void bulkLoad(NamedCache cache, Connection conn)
    {
    Statement s;
    ResultSet rs;
    Map       buffer = new HashMap();

    try
        {
        int count = 0;
        s = conn.createStatement();
        rs = s.executeQuery("select key, value from table");
        while (rs.next())
            {
            Integer key   = new Integer(rs.getInt(1));
            String  value = rs.getString(2);
            buffer.put(key, value);

            // this loads 1000 items at a time into the cache
            if ((count++ % 1000) == 0)
                {
                cache.putAll(buffer);
                buffer.clear();
                }
            }
        if (!buffer.isEmpty())
            {
            cache.putAll(buffer);
            }
        ...
        }
    catch (SQLException e)
        {...}
    }

21.2 Performing Distributed Bulk Loading

When pre-populating a Coherence partitioned cache with a large data set, it may be more efficient to distribute the work to Coherence cluster members. Distributed loading allows for higher data throughput rates to the cache by leveraging the aggregate network bandwidth and CPU power of the cluster. When performing a distributed load, the application must decide on the following:

which cluster members performs the load
how to divide the data set among the members

The application should consider the load that is placed on the underlying data source (such as a database or file system) when selecting members and dividing work. For example, a single database can easily be overwhelmed if too many members execute queries concurrently.

21.2.1 A Distributed Bulk Loading Example

This section outlines the general steps to perform a simple distributed load. The example assumes that the data is stored in files and is distributed to all storage-enabled members of a cluster.

Retrieve the set of storage-enabled members. For example, the following method uses the getStorageEnabledMembers method to retrieve the storage-enabled members of a distributed cache.
Example 21-3 Retrieving Storage-Enabled Members of the Cache
```
protected Set getStorageMembers(NamedCache cache)
        {
        return ((PartitionedService) cache.getCacheService())
           .getOwnershipEnabledMembers();
        }
```

Divide the work among the storage enabled cluster members. For example, the following routine returns a map, keyed by member, containing a list of files assigned to that member.

Example 21-4 Routine to Get a List of Files Assigned to a Cache Member

protected Map<Member, List<String>> divideWork(Set members, List<String> fileNames)
        {
        Iterator i = members.iterator();
        Map<Member, List<String>> mapWork = new HashMap(members.size());
        for (String sFileName : fileNames)
            {
            Member member = (Member) i.next();
            List<String> memberFileNames = mapWork.get(member);
            if (memberFileNames == null)
                {
                memberFileNames = new ArrayList();
                mapWork.put(member, memberFileNames);
                }
            memberFileNames.add(sFileName);

            // recycle through the members
            if (!i.hasNext())
                {
                i = members.iterator();
                }
            }
        return mapWork;
        }

Launch a task that performs the load on each member. For example, use Coherence's InvocationService to launch the task. In this case, the implementation of LoaderInvocable must iterate through memberFileNames and process each file, loading its contents into the cache. The cache operations normally performed on the client must execute through the LoaderInvocable.

Example 21-5 Class to Load Each Member of the Cache

public void load()
        {
        NamedCache cache = getCache();

        Set members = getStorageMembers(cache);

        List<String> fileNames = getFileNames();

        Map<Member, List<String>> mapWork = divideWork(members, fileNames);

        InvocationService service = (InvocationService)
                CacheFactory.getService("InvocationService");        

        for (Map.Entry<Member, List<String>> entry : mapWork.entrySet())
            {
            Member member = entry.getKey();
            List<String> memberFileNames = entry.getValue();

            LoaderInvocable task = new LoaderInvocable(memberFileNames, cache.getCacheName());
            service.execute(task, Collections.singleton(member), this);
            }
        }