28 inCache for Page Caching

When WebCenter Sites is installed or upgraded, inCache for page caching is enabled by default. inCache for page caching overrides the legacy method of page caching. Two configuration files, cs-cache.xml and ss-cache.xml, are provided with WebCenter Sites for configuring local caches and peer-to-peer communication.

28.1 Overview

inCache supports disk striping and affects RealTime publishing by deactivating the donotregenerate flag. The option to enable page regeneration during RealTime publishing requires populating the FW_RegenCriteria table, on the delivery system, with the URLs of pages to be crawled and regenerated. Page propagation is also an option, used to ensure that all nodes host the same pages without each node having to regenerate the pages. In addition, remote Satellite Server can be configured to continue serving stale pages for a short duration while their replacements are being regenerated.

Note:

You can return to the legacy page caching method, as necessary, by:
  1. Setting the VM argument on all WebCenter Sites and Satellite Server nodes as follows: -Dcs.useEhcache=false

  2. Reconfiguring all WebCenter Sites and Satellite Server nodes to use the database and shared file system.

  3. Purging the legacy page cache before resuming operations, because it is likely to have become outdated since it was disabled.

In general, switching between caching methods requires you to reconfigure the WebCenter Sites and Satellite Server nodes to use the same cache, and to purge the cache before its use, as recommended by best practices.

28.2 Configuring Your System for inCache Page Caching

The configuration process consists of a set of required steps and optional optimization steps, shown below. For example, you will enable multiple WebCenter Sites nodes to communicate with each other, and set storage properties for each local cache on the WebCenter Sites and remote Satellite Servers.

To configure your system for inCache page caching

  1. Ensure that inCache for page caching is enabled. Look for the following VM argument: -Dcs.useEhcache and verify that it is either set to true or not set (in which case its value is assumed to be true).

  2. Configure each WebCenter Sites and Satellite Server node to retain cache content on system restart. Pass the following VM argument:

    -Dnet.sf.ehcache.enableShutdownHook=true
    
  3. Optional optimization. Set the diskStore path for each WebCenter Sites and Satellite Server node. If multiple nodes are on the same machine (vertical cluster), ensure that each node has a unique diskStore path. In the cs-cache.xml and ss-cache.xml files, set the property diskStore path="<path to disk store>". Both XML files are stored on each WebCenter Sites node: cs-cache.xml for WebCenter Sites and ss-cache.xml for co-resident Satellite Server. Only the ss-cache.xml file is stored on remote Satellite Server. The files are located under the WEB-INF/classes folder.

    For more information about the diskStore property, refer to the documentation on the Ehcache web site (at the time of this writing, the URL is http://ehcache.org/documentation/configuration.html).

    Note:

    inCache exists on WebCenter Sites, remote Satellite Servers, and co-resident Satellite Server. Architecturally and functionally inCache is identical on co-resident and remote Satellite Servers. However, the following recommendations are applied to co-resident Satellite Server:
    1. Disk persistence is turned off.

    2. The size of each cache is kept smaller than the size of WebCenter Sites' cache. This is to prevent overloading memory. We recommend keeping the sizes small except under special circumstances, such as the requirement for full double-buffered caching on co-resident Satellite Server.

  4. Required for multiple nodes. Configure automatic node detection for WebCenter Sites clusters (Satellite Server nodes cannot be clustered). As an example, this step uses three WebCenter Sites nodes named CS1, CS2, and CS3.

    1. On each WebCenter Sites node, ensure that cs-cache.xml and ss-cache.xml (in WEB-INF/classes) specify a cacheManagerPeerProviderFactory, which will be used to create a CacheManagerPeerProvider. The provider detects other nodes in the cluster. Configure automatic detection as shown in the following example:

      <cacheManagerPeerProviderFactory
         class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
         properties="peerDiscovery = automatic, 
         multicastGroupAddress = 230.0.0.1, 
         multicastGroupPort = 4444, timeToLive = 32"/>
      

      Note:

      The multicastGroupPort property must be set identically across cluster members. For example, if CS1 specifies multicastGroupPort=4444 in its cs-cache.xml file, then CS2 and CS3 must specify the same setting in their files.

      The timeToLive property specifies the number of hops allowed to a packet, which determines how far the packet propagates. Valid values range between 0 and 255 with the following restrictions:

      • 0 - same host

      • 1 - same subnet

      • 32 - same site

      • 64 - same region

      • 128 - same continent

      • 255 - unrestricted

    2. On all Satellite Servers, co-resident and remote, do one of the following:

      • Set the multicastGroupPort property in each ss-cache.xml file to a unique value.

      • To support cache replication, set the multicastGroupPort property in each ss-cache.xml file to the same value.

        Note:

        The value(s) that you set in the ss-cache.xml files must be different from the value for WebCenter Sites' multicastGroupPort property in the cs-cache.xml files.

      When inCache is configured, the system will start using the configurations specified in cs-cache.xml and ss-cache.xml. Caches are initialized upon the first call to any page in WebCenter Sites, cached or not.

  5. Optional optimization. Configure the local caches as necessary, using Table 28-1, "Cache Configuration Properties" as a property reference. The local cache for each WebCenter Sites and remote Satellite Server node is partitioned. Each part is defined in its own <cache> tag in the cs-cache.xml and ss-cache.xml files. The parts are named as follows:

    • pageByQry: This is the cache for the page data itself, keyed by the query url.

    • dependencyRepository: This is the cache of dependencies on which the pages are built. When pages are added to the pageByQry cache, entries are automatically created in the dependencyRepository cache. Each entry in this cache is an asset id or unknowndep or unknowndep-<type>. Therefore, this cache contains at most the number of assets in the system and a handful of items for different variations of unknowndeps.

      When a page with no dependencies is cached, that page logs a dependency on _NODEP_ (a single item in the dependencyRepository cache used to identify pages with no dependencies). The _NODEP_ item remains associated with the page until that page either expires or is manually flushed. On remote Satellite Servers, you can disable caching of pages without dependencies by setting the JVM option:

      -Dignore_nodep_pages=true (set on each Satellite Server).

    • notifier: This cache is used by active WebCenter Sites cluster members to notify other WebCenter Sites cluster members of changes to content.

  6. Restart all configured WebCenter Sites and Satellite Server nodes.

  7. Verify that all active cluster members are in the caching network and recognize each other. Use the Cluster Info diagnostic tool, which lists all WebCenter Sites members that have a notifier cache. To invoke Cluster Info:

    1. Bootstrap inCache by rendering a cacheable page.

    2. Log in to the WebCenter Sites Admin interface as a general administrator (fwadmin/xceladmin, by default).

    3. Verify that WebCenter Sites cluster members and co-resident Satellite Servers are communicating with each other. Open the Admin tab, go to System Tools, expand Cache Management, expand Sites Cache, and then double-click Cluster Info.

      Figure 28-1 Cluster Info Form


      Various types of page caching statistics are available in the Cache Configuration tool. For more information, see Chapter 30, "System Tools."

  8. If you wish to stripe disks, enable page regeneration during RealTime publishing, or enable page propagation, see Section 28.3, "Tuning Options."
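The multicast port rules from step 4 can be checked mechanically before the nodes are restarted. The following is a minimal sketch rather than a supplied tool: it assumes each file contains a single cacheManagerPeerProviderFactory element directly under the root, and the helper names (multicast_port, check_ports) and the sample configuration are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real files live in WEB-INF/classes on each node.
SAMPLE_PEER_CONFIG = """<ehcache>
  <cacheManagerPeerProviderFactory
     class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
     properties="peerDiscovery=automatic, multicastGroupAddress=230.0.0.1,
                 multicastGroupPort={port}, timeToLive=32"/>
</ehcache>"""


def multicast_port(xml_text):
    """Return multicastGroupPort from the peer provider factory properties."""
    factory = ET.fromstring(xml_text).find("cacheManagerPeerProviderFactory")
    props = {}
    for pair in factory.get("properties").split(","):
        key, _, value = pair.partition("=")
        props[key.strip()] = value.strip()
    return int(props["multicastGroupPort"])


def check_ports(cs_configs, ss_configs):
    """Return a list of problems; an empty list means the rules hold.

    cs_configs and ss_configs map a node name to the text of that node's
    cs-cache.xml or ss-cache.xml file.
    """
    problems = []
    cs_ports = {name: multicast_port(text) for name, text in cs_configs.items()}
    ss_ports = {name: multicast_port(text) for name, text in ss_configs.items()}
    # Rule 1: all WebCenter Sites cluster members share one port.
    if len(set(cs_ports.values())) > 1:
        problems.append(f"cs-cache.xml ports differ across nodes: {cs_ports}")
    # Rule 2: no ss-cache.xml port may equal the WebCenter Sites port.
    clash = set(cs_ports.values()) & set(ss_ports.values())
    if clash:
        problems.append(f"ss-cache.xml reuses WebCenter Sites port(s): {sorted(clash)}")
    return problems


if __name__ == "__main__":
    cs = {n: SAMPLE_PEER_CONFIG.format(port=4444) for n in ("CS1", "CS2", "CS3")}
    ss = {"SS1": SAMPLE_PEER_CONFIG.format(port=4445)}
    print(check_ports(cs, ss) or "multicast ports look consistent")
```

The sketch does not decide whether Satellite Server ports should be unique or shared; that choice depends on whether cache replication between Satellite Servers is wanted, as described in step 4.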

    Table 28-1 Cache Configuration Properties

    | Property | Required? | Description |
    |---|---|---|
    | name | Y | Specifies the name of the cache. Legal values: pageByQry, dependencyRepository, notifier. For descriptions, see step 5. |
    | diskPersistent | N | Specifies whether to persist data on disk between restarts of the JVM. Default / recommended value: true. Note: The diskPersistent setting must be consistent across the pageByQry, dependencyRepository, and notifier caches to prevent potential conflicts in generation count. |
    | maxElementsInMemory | Y | Specifies the maximum number of objects to store in memory. Default value: 200000 |
    | maxElementsOnDisk | Y | Specifies the maximum number of objects to store on disk. Disks can be striped. For information, see Section 28.3.1, "Striping the Disk Cache." Default value: 1000000 |
    | eternal | Y | Specifies whether cache will be cleared by WebCenter Sites (it is never cleared by inCache). Default value: true. Do not change the value of this property. |
    | overflowToDisk | Y | Specifies whether the memory cache is allowed to overflow to disk-based cache. Default value: true. Do not change the value of this property. |
    | diskSpoolBufferSizeMB | N | Specifies the size of the disk buffer, in megabytes. If you expect many disk I/Os (that is, disk cache is much larger than memory cache), set the buffer to 20 MB. Default value: 5 |
    | memoryStoreEvictionPolicy | N | Specifies how to remove items from cache. LRU (Least Recently Used) is also a legal value. Default / recommended value: LFU (Least Frequently Used) |
    | clearOnFlush | N | Specifies whether to clear memory as it is flushed to disk. Default value: false. Do not change the value of this property. |
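The constraints in Table 28-1 lend themselves to a quick sanity check of a cache configuration file. The sketch below is illustrative only: the function name check_caches and the sample configuration are hypothetical, the cache elements are assumed to sit directly under the root element, and an absent diskPersistent attribute is treated as its own distinct setting rather than as the default.

```python
import xml.etree.ElementTree as ET

# Properties marked "Y" in the Required? column of Table 28-1.
REQUIRED = ("name", "maxElementsInMemory", "maxElementsOnDisk", "eternal", "overflowToDisk")
CACHE_NAMES = ("pageByQry", "dependencyRepository", "notifier")
# Properties the table says not to change, with their default values.
FIXED = {"eternal": "true", "overflowToDisk": "true", "clearOnFlush": "false"}

# Hypothetical sample with a deliberate diskPersistent mismatch on notifier.
SAMPLE_CACHE_CONFIG = """<ehcache>
  <cache name="pageByQry" diskPersistent="true" maxElementsInMemory="200000"
         maxElementsOnDisk="1000000" eternal="true" overflowToDisk="true"/>
  <cache name="dependencyRepository" diskPersistent="true" maxElementsInMemory="200000"
         maxElementsOnDisk="1000000" eternal="true" overflowToDisk="true"/>
  <cache name="notifier" diskPersistent="false" maxElementsInMemory="200000"
         maxElementsOnDisk="1000000" eternal="true" overflowToDisk="true"/>
</ehcache>"""


def check_caches(xml_text):
    """Return a list of problems found in the <cache> elements."""
    problems = []
    persistence = {}
    for cache in ET.fromstring(xml_text).findall("cache"):
        name = cache.get("name", "<unnamed>")
        missing = [attr for attr in REQUIRED if cache.get(attr) is None]
        if missing:
            problems.append(f"{name}: missing required attribute(s) {missing}")
        for attr, expected in FIXED.items():
            value = cache.get(attr)
            if value is not None and value != expected:
                problems.append(f"{name}: {attr} should stay {expected}, found {value}")
        if name in CACHE_NAMES:
            persistence[name] = cache.get("diskPersistent")
    absent = [n for n in CACHE_NAMES if n not in persistence]
    if absent:
        problems.append(f"no <cache> element for {absent}")
    # diskPersistent must be consistent across the three caches.
    if len(set(persistence.values())) > 1:
        problems.append(f"diskPersistent differs across caches: {persistence}")
    return problems


if __name__ == "__main__":
    for problem in check_caches(SAMPLE_CACHE_CONFIG):
        print(problem)
```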


28.3 Tuning Options

28.3.1 Striping the Disk Cache

The inCache framework supports striping of the pageByQry cache to reduce the contention that occurs when a large portion of the cache must be written to disk.

Complete the following steps on WebCenter Sites nodes and remote Satellite Servers in the inCache framework:

To stripe the pageByQry cache

  1. To enable striping, add the following VM argument:

    -DnumOfDiskStores=X
    

    where X is the number of stripes. Set the number of stripes to the number of unique spindles that are available to stripe over (for instance if you have 5 drives, then set X to 5).

    Note:

    Drives used for striping the disk-based cache should not be used for any other purpose.

    The size of each DiskStore is the size specified by the property maxElementsOnDisk in the xml configuration file (see Table 28-1). For example, if you are using 5 stripes and maxElementsOnDisk is set to 100000 items, a total of 500000 items can be stored.

  2. Create a symbolic link or else mount the drive physically in the correct location so that the stripes are properly distributed.

    For each defined cache, the system creates a directory under the diskStore path (configured in step 3 in Section 28.2, "Configuring Your System for inCache Page Caching"). Under each directory it also creates a group of numbered directories, starting at 0. Each directory points to a different drive.

    For example, items would be stored as follows in a disk-based cache on WebCenter Sites for -DnumOfDiskStores=5:

    <custom_path>/cs-cache: 
       Directory: 0
       Directory: 1
       Directory: 2
       Directory: 3 
       Directory: 4
       File: dependencyRepository.data
       File: dependencyRepository.index
       File: notifier.data
       File: notifier.index
    <custom_path>/cs-cache/0:
       File: pageByQry.data
       File: pageByQry.index
    <custom_path>/cs-cache/1: 
       File: pageByQry.data
       File: pageByQry.index
    <custom_path>/cs-cache/2: 
       File: pageByQry.data
       File: pageByQry.index
    <custom_path>/cs-cache/3: 
       File: pageByQry.data
       File: pageByQry.index
    <custom_path>/cs-cache/4: 
       File: pageByQry.data
       File: pageByQry.index
    

In this example, the root cache contains dependency and notifier caches. Spread among the directories 0–4 are the stripes of the pageByQry cache. The directories 0–4 in this example were placed on five separate drives by the use of symbolic links.
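The capacity arithmetic and the symbolic-link layout described above can be sketched as follows. This is an illustration under assumptions, not a supplied utility: the function names and the drive mount paths are hypothetical, and this section does not state whether the numbered directories must exist before the node first starts, so verify the timing on a test system.

```python
import os
import tempfile


def total_disk_capacity(num_stripes, max_elements_on_disk):
    """Each stripe holds maxElementsOnDisk items, so capacity scales linearly."""
    return num_stripes * max_elements_on_disk


def link_stripes(cache_dir, drive_mounts):
    """Create cache_dir/0 .. cache_dir/N-1 as symbolic links, one per drive."""
    os.makedirs(cache_dir, exist_ok=True)
    links = []
    for index, mount in enumerate(drive_mounts):
        link = os.path.join(cache_dir, str(index))
        if not os.path.islink(link):
            os.symlink(mount, link, target_is_directory=True)
        links.append(link)
    return links


if __name__ == "__main__":
    # Demonstration in a throwaway directory; the "drives" are plain folders here.
    with tempfile.TemporaryDirectory() as tmp:
        mounts = []
        for i in range(5):
            mount = os.path.join(tmp, f"drive{i}")
            os.makedirs(mount)
            mounts.append(mount)
        for link in link_stripes(os.path.join(tmp, "cs-cache"), mounts):
            print(link, "->", os.readlink(link))
        print("capacity:", total_disk_capacity(5, 100000))
```

With 5 stripes and maxElementsOnDisk set to 100000, total_disk_capacity returns 500000, matching the example in the note in step 1.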

28.3.2 Configuring for Page Regeneration During RealTime Publishing

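By default, pages are regenerated only when they are requested. The steps below configure the delivery system to also regenerate pages during RealTime publishing.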

To configure page regeneration during RealTime publishing

  1. By default, the PageCacheUpdater is set by the WebCenter Sites installer to use ParallelRegeneratorEh:

    Open the AdvPub.xml file in the WEB-INF/classes folder on the delivery WebCenter Sites system and verify that the PageCacheUpdater section has the lines shown below:

    Note:

    Do not change any of the values in the PageCacheUpdater section except for:
    • numThreadsPerServer, to specify the number of simultaneous threads for crawling

    • regenServers, to point to the WebCenter Sites nodes by <address> and <port of server where pages will be regenerated>

    <bean id="PageCacheUpdater" class="com.fatwire.realtime.regen.ParallelRegeneratorEh" singleton="false">
       <property name="id" value="CacheFlusher" />
       <property name="numThreadsPerServer" value="3" />
       <property name="regenServers">
         <list>
            <value>http://<address>:<port of server where pages will be regenerated>/servlet/ContentServer</value>
         </list>
       </property>
    </bean>
    
  2. Restart the delivery system.

  3. RealTime publish to the delivery WebCenter Sites to create the FW_RegenCriteria table on that system.

  4. Open the FW_RegenCriteria table, using Sites Explorer. Specify the pages to be regenerated and the depth of links to be followed. The ft_ss parameter can be included in the URLs to specify whether page requests will be handled by WebCenter Sites directly or by remote Satellite Server.

    Note:

    If a URL assembler is used, specify the internal URLs of the pages to crawl. The regenerator does not recognize URL-assembler URLs.

    Example:

    pagename=SiteName/HomePage&ft_ss=true
    Level=1
    

    WebCenter Sites will regenerate HomePage and the pages that are linked from HomePage. Given that ft_ss=true, the requests are treated as if they are generated from Satellite Server.

  5. RealTime publish to the delivery WebCenter Sites.

    During the publishing session, the delivery system crawls all of the pages specified in the FW_RegenCriteria table, but regenerates only pages for which component assets were invalidated (it also generates the uncached pages).

    For example:

    1. Updated assets are published to the delivery system.

    2. The delivery system invalidates the existing dependency information for the re-published assets by marking their asset identifiers as invalid in the dependencyRepository cache and incrementing the dependency generation counter. Because the invalidated assets are no longer available to pages that reference the assets, the pages are invalidated.

    3. The page regenerator crawls pages with the URLs that are specified in the FW_RegenCriteria table and regenerates the invalidated pages. It also generates the uncached pages.
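The invalidate-then-crawl sequence in the example above can be modeled in a few lines. This is a toy model for reasoning about the behavior, not the product's implementation: the class and function names are hypothetical, pages are looked up by plain keys, and the Level (link depth) setting is ignored.

```python
class DeliveryCache:
    """Toy stand-in for the pageByQry and dependencyRepository caches."""

    def __init__(self):
        self.pages = {}       # url -> (html, {asset_id: generation seen at render time})
        self.generation = {}  # asset_id -> current generation count

    def cache_page(self, url, html, asset_ids):
        deps = {a: self.generation.setdefault(a, 0) for a in asset_ids}
        self.pages[url] = (html, deps)

    def publish(self, asset_ids):
        """Publishing an asset bumps its generation, invalidating dependent pages."""
        for a in asset_ids:
            self.generation[a] = self.generation.get(a, 0) + 1

    def is_valid(self, url):
        if url not in self.pages:
            return False  # uncached pages also need to be generated
        _, deps = self.pages[url]
        return all(self.generation.get(a) == gen for a, gen in deps.items())


def regenerate(cache, criteria_urls, render):
    """Crawl every listed URL, but render only invalidated or uncached pages."""
    regenerated = []
    for url in criteria_urls:
        if not cache.is_valid(url):
            html, asset_ids = render(url)
            cache.cache_page(url, html, asset_ids)
            regenerated.append(url)
    return regenerated


if __name__ == "__main__":
    site = {"home": ["a1", "a2"], "news": ["a3"], "about": ["a4"]}
    render = lambda url: (f"<html>{url}</html>", site[url])
    cache = DeliveryCache()
    for url in ("home", "news"):
        cache.cache_page(url, *render(url))
    cache.publish(["a2"])  # an asset used by "home" is re-published
    print(regenerate(cache, ["home", "news", "about"], render))
```

In the demonstration, only home (invalidated) and about (never cached) are rendered; news is crawled but left untouched.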

28.3.3 Setting Up Page Propagation

Page propagation enables all nodes in a WebCenter Sites cluster (including the Satellite Server nodes) to host the same pages without each node having to regenerate the pages.

When configured to use inCache, each WebCenter Sites node runs in a separate JVM and thus maintains a separate, local cache. If one node generates and caches a new page, none of the other nodes have the newly cached page, nor are they informed of it. Upon receiving a request for the same page, each node must generate the page by referring to the database and shared file system, thus putting extra load on both components. Page propagation prevents different nodes from regenerating the same page by propagating the page across the cluster.

Page propagation is triggered when pages are loaded into a node's local cache. It works as illustrated in the scenario below, starting with basic inCache functionality.

  1. inCache Page Caching for Newly Generated Pages:

    1. Node A, a WebCenter Sites node, receives a request for a new page.

    2. Node A generates the requested page.

    3. Node A caches the new page with complete dependency information (identifiers of component assets) into its local page and dependency caches.

    4. When page propagation is enabled, step 3 follows.

  2. inCache Page Caching for Regenerated Pages:

    1. Node B, also a WebCenter Sites node, receives a request for a page that has been invalidated because a component asset was modified. (The asset is marked as invalid in the node's local dependency cache and in the dependency caches of all other nodes containing the asset.)

    2. Node B regenerates the requested page.

    3. Node B caches the regenerated page and updates its dependency (in step 2a) by incrementing the generation counter.

    4. When page propagation is enabled, step 3 follows.

  3. inCache Page Caching with Page Propagation:

    When the node caches the (re)generated page, it also propagates the page to all other nodes. Propagated information consists of:

    • The page's complete dependency information. For example, if the page has x dependencies (asset identifiers), all x dependencies are propagated from the local dependency cache to the dependency caches of other WebCenter Sites nodes.

    • The page itself. The page is propagated from the local page cache to the page caches of other WebCenter Sites nodes.

    If the page already exists on a receiving node, the node ignores the propagation.

The following list summarizes page propagation events and conditions:

  • When a page is cached on a WebCenter Sites node, its complete page information (described in step 3, above) is propagated to all other WebCenter Sites nodes.

  • When a page is cached on a remote Satellite Server node, its complete page information (described in step 3, above) is propagated to other remote Satellite Servers over Java Remote Method Invocation (RMI).

  • When a page is propagated:

    • The page's last updated time stamp is preserved on all WebCenter Sites and Satellite Server nodes.

    • The generation count on a given node remains independent of generation count on all other nodes. For example, Node A has a generation count of 10, but other nodes use a generation count of 10 for a different type of dependency. Even though the same object is propagated across WebCenter Sites or Satellite Servers, it may be assigned different generation counts by each WebCenter Sites or Satellite Server node. The object and its last updated/modified time remain the same across nodes.

    • The propagation is ignored by nodes on which the page already exists.

  • If any of a cached page's dependencies fail to propagate to a node where the page does not exist, the page is not cached on that node until it is requested and generated on that node.

  • Because blobs are still stored in the Satellite Server caches, they are replicated across Satellite Servers.
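The propagation rules summarized above can be illustrated with a small in-memory model. It is a sketch under assumptions, not the product's mechanism: the Node class is hypothetical, peers are called directly instead of over the network, and dependency propagation failures are not modeled.

```python
class Node:
    """Toy cluster member with a local page cache and local generation counts."""

    def __init__(self, name):
        self.name = name
        self.pages = {}       # url -> (html, last_updated)
        self.generation = {}  # asset_id -> generation count local to this node
        self.peers = []

    def cache_page(self, url, html, asset_ids, last_updated, propagate=True):
        # Register dependencies without disturbing counts this node already holds.
        for asset_id in asset_ids:
            self.generation.setdefault(asset_id, 0)
        self.pages[url] = (html, last_updated)
        if propagate:
            for peer in self.peers:
                peer.receive(url, html, asset_ids, last_updated)

    def receive(self, url, html, asset_ids, last_updated):
        """Accept a propagated page unless this node already has it."""
        if url in self.pages:
            return False
        # The last-updated time stamp is preserved as sent.
        self.cache_page(url, html, asset_ids, last_updated, propagate=False)
        return True


if __name__ == "__main__":
    a, b, c = Node("A"), Node("B"), Node("C")
    a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
    b.generation["a1"] = 7                # B has its own history for asset a1
    c.pages["/home"] = ("<old>", 50)      # C already holds the page
    a.cache_page("/home", "<new>", ["a1"], last_updated=100)
    print(b.pages["/home"], c.pages["/home"], a.generation, b.generation)
```

After the call, Node B holds the page with time stamp 100, Node C keeps its existing copy, and the generation counts for a1 remain 0 on Node A and 7 on Node B.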

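This section contains the following topics:

  • Section 28.3.3.1, "Enabling Page Propagation"

  • Section 28.3.3.2, "Setting Up Page Propagation on Restart"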

28.3.3.1 Enabling Page Propagation

Start with a system that is configured to use inCache.

To enable page propagation

  1. On all WebCenter Sites nodes:

    1. Set propagatecache=true in the futuretense.ini property file.

    2. In cs-cache.xml, verify that multicastGroupPort is the same for all WebCenter Sites nodes in the cluster to ensure they can communicate with each other. (The file is located in WEB-INF/classes.)

  2. On all remote Satellite Servers:

    1. Set propagatecache=true in the satellite.properties file.

    2. In ss-cache.xml, ensure multicastGroupPort is identical for all Satellite Servers intended for cache propagation, but different from multicastGroupPort for the WebCenter Sites nodes.

  3. On all nodes, initialize page propagation by rendering any page and verify that the system responds as described below:

    • On WebCenter Sites nodes, caching of the page triggers its propagation from the local cache to the caches of other WebCenter Sites nodes (as described in step 3 in Section 28.3.3, "Setting Up Page Propagation") while preserving the page's last update/modified time stamp across all WebCenter Sites. Propagation is ignored by WebCenter Sites nodes on which the page already exists.

    • The response on Satellite Servers proceeds as on WebCenter Sites nodes, but over Java RMI.
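Before rendering the first page, it can help to confirm that propagatecache is actually set on every node. The sketch below reads property text in simple key=value form; the function names are hypothetical, and line continuations or other advanced property-file syntax are not handled.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def propagation_enabled(text):
    """True when propagatecache is set to true in the given property text."""
    return read_properties(text).get("propagatecache", "").lower() == "true"


if __name__ == "__main__":
    # Hypothetical excerpt of futuretense.ini or satellite.properties.
    sample = "# cache settings\npropagatecache=true\n"
    print(propagation_enabled(sample))
```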

28.3.3.2 Setting Up Page Propagation on Restart

If a node enabled for page propagation is restarted, its local caches must be re-initialized in order for the node to recognize page propagations. To re-initialize a local cache, render a cacheable page on the node; caching triggers the node to propagate the page to other nodes and recognize pages that are propagated by other nodes. When a node is restarted, the propagations it missed during its period of inactivity are not reproduced on the node, even though its local caches are re-initialized.

28.3.4 Configuring for Pagelet Regeneration in Background

Remote Satellite Server can be configured to serve invalidated pagelets while they are being regenerated by a background process. To enable serving of invalidated pagelets, add serveStale=true to remote Satellite Server's satellite.properties file. If a pagelet is then invalidated, it will be regenerated in one of the following ways:

  • If a browser requests the pagelet within the next 30 minutes, remote Satellite Server starts a background process that sends a request to WebCenter Sites to regenerate the pagelet. While the background process is running, the browser is served the invalidated pagelet from remote Satellite Server's cache. All subsequent requests are served the invalidated pagelet until the background process has regenerated it.

  • If a browser requests the pagelet after 30 minutes, remote Satellite Server uses its normal process for regenerating pagelets; that is, it sends a request to WebCenter Sites to regenerate the pagelet. The requesting browser must wait until the pagelet is regenerated.
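The two cases above amount to a simple decision on the age of the invalidation. The following sketch models only that decision; the function name and return labels are hypothetical, and the treatment of a request arriving at exactly 30 minutes is an assumption, since the text does not specify the boundary.

```python
STALE_WINDOW_SECONDS = 30 * 60  # the 30-minute window described above


def handle_request(invalidated_at, now, serve_stale=True):
    """Return (content served, regeneration mode) for an invalidated pagelet.

    invalidated_at and now are times in seconds on the same clock.
    """
    if serve_stale and now - invalidated_at <= STALE_WINDOW_SECONDS:
        # The browser gets the invalidated copy while regeneration runs in the background.
        return ("stale", "background")
    # Normal process: the browser waits for WebCenter Sites to regenerate the pagelet.
    return ("fresh", "blocking")


if __name__ == "__main__":
    print(handle_request(invalidated_at=0, now=10 * 60))
    print(handle_request(invalidated_at=0, now=60 * 60))
```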

To obtain information about the background process, set the com.fatwire.logging.cs.cache.ehcache logger to DEBUG in remote Satellite Server's commons-logging.properties file. DEBUG produces the following message at the start of the background process:

"Data for <cache key> is found to have been invalidated. A new request has started processing new data in the background."

At the end of the process, the message reads:

"Background process for request <cache key> has completed successfully, data in cache is updated."