20 キャッシュの事前ロード

この章では、キャッシュの事前ロードに使用可能な複数のパターンについて説明します。このパターンには、バルク・ロードや分散ロードなどがあります。

この章の内容は次のとおりです。

バルク・ロードおよび処理の実行
分散バルク・ロードの実行

20.1 バルク・ロードおよび処理の実行

例20-5、PagedQuery.javaは、Coherenceキャッシュのアイテムのバルク・ロードと処理を効率的に実行するテクニックを示しています。

20.1.1 キャッシュへのバルク書込み

Coherenceを使用する際の一般的なシナリオでは、アプリケーションでの使用前にキャッシュに事前移入します。例20-1は、これをJavaコードで簡単に実行する方法を示しています。

例20-1 キャッシュの事前ロード

public static void bulkLoad(NamedCache cache, Connection conn)
    {
    Statement s;
    ResultSet rs;
    
    try
        {
        s = conn.createStatement();
        rs = s.executeQuery("select key, value from table");
        while (rs.next())
            {
            Integer key   = new Integer(rs.getInt(1));
            String  value = rs.getString(2);
            cache.put(key, value);
            }
        ...
        }
    catch (SQLException e)
        {...}
    }

このテクニックは有益ですが、特にパーティション・キャッシュおよびレプリケーション・キャッシュでは、putをコールするたびにネットワーク・トラフィックが発生する場合があります。また、putをコールするたびにキャッシュで置換されたばかりのオブジェクトが返され(java.util.Mapインタフェースによる)、不要なオーバーヘッドが追加されます。かわりに、ConcurrentMap.putAllメソッドを使用すると、キャッシュのロードを大幅に効率化できます。これを例20-2に示します。

例20-2 ConcurrentMap.putAllを使用したキャッシュの事前ロード

public static void bulkLoad(NamedCache cache, Connection conn)
    {
    Statement s;
    ResultSet rs;
    Map       buffer = new HashMap();

    try
        {
        int count = 0;
        s = conn.createStatement();
        rs = s.executeQuery("select key, value from table");
        while (rs.next())
            {
            Integer key   = new Integer(rs.getInt(1));
            String  value = rs.getString(2);
            buffer.put(key, value);

            // this loads 1000 items at a time into the cache
            if ((count++ % 1000) == 0)
                {
                cache.putAll(buffer);
                buffer.clear();
                }
            }
        if (!buffer.isEmpty())
            {
            cache.putAll(buffer);
            }
        ...
        }
    catch (SQLException e)
        {...}
    }

20.1.2 フィルタ結果の効率的な処理

Coherenceでは、Filter APIを使用することによって、基準に基づいてキャッシュを問い合せることができます。次にその例を示します(エントリではキーに整数、値に文字列を指定)。

例20-3 フィルタを使用したキャッシュの問合せ

NamedCache c = CacheFactory.getCache("test");

// Search for entries that start with 'c'
Filter query = new LikeFilter(IdentityExtractor.INSTANCE, "c%", '\\', true);

// Perform query, return all entries that match
Set results = c.entrySet(query);
for (Iterator i = results.iterator(); i.hasNext();)
    {
    Map.Entry e = (Map.Entry) i.next();
    out("key: "+e.getKey() + ", value: "+e.getValue());
    }

この例は、小さなデータセットには便利ですが、データセットが大きすぎる場合は、ヒープ領域不足などの問題が発生することがあります。例20-4は、問合せ結果をバッチで処理してこの問題を回避するパターンを示しています。

例20-4 問合せ結果のバッチ処理

public static void performQuery()
    {
    NamedCache c = CacheFactory.getCache("test");

    // Search for entries that start with 'c'
    Filter query = new LikeFilter(IdentityExtractor.INSTANCE, "c%", '\\', true);

    // Perform query, return keys of entries that match
    Set keys = c.keySet(query);

    // The amount of objects to process at a time
    final int BUFFER_SIZE = 100;

    // Object buffer
    Set buffer = new HashSet(BUFFER_SIZE);

    for (Iterator i = keys.iterator(); i.hasNext();)
        {
        buffer.add(i.next());

        if (buffer.size() >= BUFFER_SIZE)
            {
            // Bulk load BUFFER_SIZE number of objects from cache
            Map entries = c.getAll(buffer);

            // Process each entry
            process(entries);

            // Done processing these keys, clear buffer
            buffer.clear();
            }
        }
        // Handle the last partial chunk (if any)
        if (!buffer.isEmpty())
            {
            process(c.getAll(buffer));
            }
    }

public static void process(Map map)
    {
    for (Iterator ie = map.entrySet().iterator(); ie.hasNext();)
        {

        Map.Entry e = (Map.Entry) ie.next();
        out("key: "+e.getKey() + ", value: "+e.getValue());
        }
    }

この例では、フィルタに一致するエントリのすべてのキーが返されますが、一度にキャッシュから取得されるエントリ数はBUFFER_SIZE(この例では100)のみです。

LimitFilterは、上記の例と同様に、結果を別々に処理できます。ただし、LimitFilterは、ユーザー・インタフェースのように結果がページングされるシナリオに適しています。問合せ結果のデータをすべて処理する場合には効率的な方法ではありません。

20.1.3 バルク・ロードと処理の例

例20-5は、PagedQuery.javaを示しています。これは前述の概念を示すサンプル・プログラムです。

このサンプルを実行する手順は次のとおりです。

次のJavaファイルをcom/tangosol/examples/PagedQuery.javaという名前で保存します。
クラスパスをCoherenceのライブラリと現在のディレクトリに指定します。
サンプルをコンパイルして実行します。

例20-5 バルク・ロード・プログラムのサンプル

package com.tangosol.examples;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.NearCache;
import com.tangosol.util.Base;
import com.tangosol.util.Filter;
import com.tangosol.util.filter.LikeFilter;

import java.io.Serializable;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.HashSet;


/**
* This sample application demonstrates the following:
* <ul>
* <li>
* <b>Obtaining a back cache from a near cache for populating a cache.</b>
* Since the near cache holds a limited subset of the data in a cache it is
* more efficient to bulk load data directly into the back cache instead of
* the near cache.
* </li>
* <li>
* <b>Populating a cache in bulk using <tt>putAll</tt>.</b>
* This is more efficient than <tt>put</tt> for a large amount of entries.
* </li>
* <li>
* <b>Executing a filter against a cache and processing the results in bulk.</b>
* This sample issues a query against the cache using a filter.  The result is
* a set of keys that represent the query results.  Instead of iterating
* through the keys and loading each item individually with a <tt>get</tt>,
* this sample loads entries from the cache in bulk using <tt>getAll</tt> which
* is more efficient.
* </li>
*
* @author cp
*/
public class PagedQuery
        extends Base
    {
    /**
    * Command line execution entry point.
    */
    public static void main(String[] asArg)
        {
        NamedCache cacheContacts = CacheFactory.getCache("contacts",
                Contact.class.getClassLoader());

        populateCache(cacheContacts);

        executeFilter(cacheContacts);

        CacheFactory.shutdown();
        }

    // ----- populate the cache ---------------------------------------------
    
    /**
    * Populate the cache with test data.  This example shows how to populate
    * the cache a chunk at a time using {@link NamedCache#putAll} which is more
    * efficient than {@link NamedCache#put}.
    *
    * @param cacheDirect  the cache to populate. Note that this should <b>not</b>
    *                     be a near cache since that thrashes the cache
    *                     if the load size exceeds the near cache max size.  
    */
    public static void populateCache(NamedCache cacheDirect)
        {
        if (cacheDirect.isEmpty())
            {
            Map mapBuffer = new HashMap();
            for (int i = 0; i < 100000; ++i)
                {
                // some fake data
                Contact contact = new Contact();
                contact.setName(getRandomName() + ' ' + getRandomName());
                contact.setPhone(getRandomPhone());
                mapBuffer.put(new Integer(i), contact);

                // this loads 1000 items at a time into the cache
                if ((i % 1000) == 0)
                    {
                    out("Adding "+mapBuffer.size()+" entries to cache");
                    cacheDirect.putAll(mapBuffer);
                    mapBuffer.clear();
                    }
                }
            if (!mapBuffer.isEmpty())
                {
                cacheDirect.putAll(mapBuffer);
                }
            }
        }

    /**
    * Creates a random name.
    *
    * @return  a random string between 4 to 11 chars long
    */
    public static String getRandomName()
        {
        Random rnd = getRandom();
        int    cch = 4 + rnd.nextInt(7);
        char[] ach = new char[cch];
        ach[0] = (char) ('A' + rnd.nextInt(26));
        for (int of = 1; of < cch; ++of)
            {
            ach[of] = (char) ('a' + rnd.nextInt(26));
            }
        return new String(ach);
        }

    /**
    * Creates a random phone number
    *
    * @return  a random string of integers 10 chars long
    */
    public static String getRandomPhone()
        {
        Random rnd = getRandom();
        return "("
            + toDecString(100 + rnd.nextInt(900), 3)
            + ") "
            + toDecString(100 + rnd.nextInt(900), 3)
            + "-"
            + toDecString(10000, 4);
        }

    // ----- process the cache ----------------------------------------------

    /**
    * Query the cache and process the results in batches.  This example
    * shows how to load a chunk at a time using {@link NamedCache#getAll}
    * which is more efficient than {@link NamedCache#get}.
    *
    * @param cacheDirect  the cache to issue the query against
    */
    private static void executeFilter(NamedCache cacheDirect)
        {
        Filter query = new LikeFilter("getName", "C%");

        // to process 100 entries at a time
        final int CHUNK_COUNT = 100;

        // Start by querying for all the keys that match
        Set setKeys = cacheDirect.keySet(query);

        // Create a collection to hold the "current" chunk of keys
        Set setBuffer = new HashSet();

        // Iterate through the keys
        for (Iterator iter = setKeys.iterator(); iter.hasNext(); )
            {
            // Collect the keys into the current chunk
            setBuffer.add(iter.next());

            // handle the current chunk when it gets big enough
            if (setBuffer.size() >= CHUNK_COUNT)
                {
                // Instead of retrieving each object with a get,
                // retrieve a chunk of objects at a time with a getAll.
                processContacts(cacheDirect.getAll(setBuffer));
                setBuffer.clear();
                }
            }

        // Handle the last partial chunk (if any)
        if (!setBuffer.isEmpty())
            {
            processContacts(cacheDirect.getAll(setBuffer));
            }
        }

    /**
    * Process the map of contacts. In a real application some sort of
    * processing for each map entry would occur. In this example each
    * entry is logged to output. 
    *
    * @param map  the map of contacts to be processed
    */
    public static void processContacts(Map map)
        {
        out("processing chunk of " + map.size() + " contacts:");
        for (Iterator iter = map.entrySet().iterator(); iter.hasNext(); )
            {
            Map.Entry entry = (Map.Entry) iter.next();
            out("  [" + entry.getKey() + "]=" + entry.getValue());
            }
        }

    // ----- inner classes --------------------------------------------------

    /**
    * Sample object used to populate cache
    */
    public static class Contact
            extends Base
            implements Serializable
        {
        public Contact() {}

        public String getName()
            {
            return m_sName;
            }
        public void setName(String sName)
            {
            m_sName = sName;
            }

        public String getPhone()
            {
            return m_sPhone;
            }
        public void setPhone(String sPhone)
            {
            m_sPhone = sPhone;
            }

        public String toString()
            {
            return "Contact{"
                    + "Name=" + getName()
                    + ", Phone=" + getPhone()
                    + "}";
            }

        public boolean equals(Object o)
            {
            if (o instanceof Contact)
                {
                Contact that = (Contact) o;
                return equals(this.getName(), that.getName())
                    && equals(this.getPhone(), that.getPhone());
                }
            return false;
            }

        public int hashCode()
            {
            int result;
            result = (m_sName != null ? m_sName.hashCode() : 0);
            result = 31 * result + (m_sPhone != null ? m_sPhone.hashCode() : 0);
            return result;
            }

        private String m_sName;
        private String m_sPhone;
        }
    }

例20-6は、サンプルのコンパイルおよび実行後のCoherenceからの端末出力を示しています。

例20-6 バルク・ロード・プログラムからの端末出力

$ export COHERENCE_HOME=[**Coherence install directory**]

$ export CLASSPATH=$COHERENCE_HOME/lib/coherence.jar:.

$ javac com/tangosol/examples/PagedQuery.java

$ java com.tangosol.examples.PagedQuery

2008-09-15 12:19:44.156 Oracle Coherence 3.4/405 <Info> (thread=main, member=n/a): Loaded operational configuration from
 resource "jar:file:/C:/coherence/lib/coherence.jar!/tangosol-coherence.xml"
2008-09-15 12:19:44.171 Oracle Coherence 3.4/405 <Info> (thread=main, member=n/a): Loaded operational overrides from
resource "jar:file:/C:/coherence/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2008-09-15 12:19:44.171 Oracle Coherence 3.4/405 <D5> (thread=main, member=n/a): Optional configuration override
"/tangosol-coherence-override.xml" is not specified

Oracle Coherence Version 3.4/405
 Grid Edition: Development mode
Copyright (c) 2000-2008 Oracle. All rights reserved.

2008-09-15 12:19:44.812 Oracle Coherence GE 3.4/405 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster
with senior service member n/a
2008-09-15 12:19:48.062 Oracle Coherence GE 3.4/405 <Info> (thread=Cluster, member=n/a): Created a new cluster with
Member(Id=1, Timestamp=2008-09-15 12:19:44.609, Address=xxx.xxx.x.xxx:8088, MachineId=26828, Edition=Grid Edition,
Mode=Development, CpuCount=2, SocketCount=1) UID=0xC0A800CC00000112B9BC9B6168CC1F98
Adding 1024 entries to cache
Adding 1024 entries to cache

...repeated many times...

Adding 1024 entries to cache
Adding 1024 entries to cache
Adding 1024 entries to cache
processing chunk of 100 contacts:
  [25827]=Contact{Name=Cgkyleass Kmknztk, Phone=(285) 452-0000}
  [4847]=Contact{Name=Cyedlujlc Ruexrtgla, Phone=(255) 296-0000}
...repeated many times
  [33516]=Contact{Name=Cjfwlxa Wsfhrj, Phone=(683) 968-0000}
  [71832]=Contact{Name=Clfsyk Dwncpr, Phone=(551) 957-0000}
processing chunk of 100 contacts:
  [38789]=Contact{Name=Cezmcxaokf Kwztt, Phone=(725) 575-0000}
  [87654]=Contact{Name=Cuxcwtkl Tqxmw, Phone=(244) 521-0000}
...repeated many times
  [96164]=Contact{Name=Cfpmbvq Qaxty, Phone=(596) 381-0000}
  [29502]=Contact{Name=Cofcdfgzp Nczpdg, Phone=(563) 983-0000}
...
processing chunk of 80 contacts:
  [49179]=Contact{Name=Czbjokh Nrinuphmsv, Phone=(140) 353-0000}
  [84463]=Contact{Name=Cyidbd Rnria, Phone=(571) 681-0000}
...
  [2530]=Contact{Name=Ciazkpbos Awndvrvcd, Phone=(676) 700-0000}
  [9371]=Contact{Name=Cpqo Rmdw, Phone=(977) 729-0000}

20.2 分散バルク・ロードの実行

Coherenceのパーティション・キャッシュに大量のデータセットを事前移入する際は、Coherenceのクラスタ・メンバーに作業を分散すると効率が向上する可能性があります。分散ロードを使用すると、クラスタのネットワーク帯域幅およびCPU処理能力の集積を活用することによって、データ・スループット率を高めることができます。分散ロードの実行時には、アプリケーションで次の事項を決定する必要があります。

ロードを実行するクラスタのメンバー
メンバー間でのデータセットの分割方法

メンバーの選択および作業の分割時は、基礎となるデータソース(データベースやファイル・システムなど)に対する負荷をアプリケーションで考慮する必要があります。たとえば、問合せを同時に実行するメンバーが多すぎると、1つのデータベースでは対応しきれなくなる場合があります。

20.2.1 分散バルク・ロードの例

この項では、簡単な分散ロードを実行する一般的な手順の概要について説明します。この例では、データがファイルに保存されていて、クラスタ内で記憶域が有効なすべてのメンバーに分散されることを前提としています。

記憶域が有効なメンバーのセットを取得します。たとえば、次の方法ではgetStorageEnabledMembersメソッドを使用して、分散キャッシュ内で記憶域が有効なメンバーを取得します。
例20-7 キャッシュ内で記憶域が有効なメンバーの取得
```
protected Set getStorageMembers(NamedCache cache)
        {
        return ((PartitionedService) cache.getCacheService())
           .getOwnershipEnabledMembers();
        }
```

記憶域が有効なクラスタのメンバー間で作業を分割します。たとえば、次のルーチンでは、メンバーに割り当てられたファイルの一覧が含まれるマップが、メンバーをキーとして返されます。

例20-8 キャッシュのメンバーに割り当てられたファイルの一覧を取得するルーチン

protected Map<Member, List<String>> divideWork(Set members, List<String> fileNames)
        {
        Iterator i = members.iterator();
        Map<Member, List<String>> mapWork = new HashMap(members.size());
        for (String sFileName : fileNames)
            {
            Member member = (Member) i.next();
            List<String> memberFileNames = mapWork.get(member);
            if (memberFileNames == null)
                {
                memberFileNames = new ArrayList();
                mapWork.put(member, memberFileNames);
                }
            memberFileNames.add(sFileName);

            // recycle through the members
            if (!i.hasNext())
                {
                i = members.iterator();
                }
            }
        return mapWork;
        }

各メンバーへのロードを実行するタスクを起動します。たとえば、タスクの起動にはCoherenceのInvocationServiceなどを使用します。この場合、LoaderInvocableの実装では、memberFileNamesを反復して各ファイルを処理し、その内容をキャッシュにロードする必要があります。通常、クライアント上で実行されるキャッシュ処理は、LoaderInvocableを使用して実行する必要があります。

例20-9 キャッシュの各メンバーをロードするクラス

public void load()
        {
        NamedCache cache = getCache();

        Set members = getStorageMembers(cache);

        List<String> fileNames = getFileNames();

        Map<Member, List<String>> mapWork = divideWork(members, fileNames);

        InvocationService service = (InvocationService)
                CacheFactory.getService("InvocationService");        

        for (Map.Entry<Member, List<String>> entry : mapWork.entrySet())
            {
            Member member = entry.getKey();
            List<String> memberFileNames = entry.getValue();

            LoaderInvocable task = new LoaderInvocable(memberFileNames, cache.getCacheName());
            service.execute(task, Collections.singleton(member), this);
            }
        }