7 Starting and Stopping Cluster Members

You can start and stop the cache servers and cache clients that are part of a cluster.

Starting Cache Servers

Cache servers are cluster members that are responsible for storing cached data. A cluster can include many cache servers. Each cache server runs in its own JVM.

Overview of the DefaultCacheServer Class

The com.tangosol.net.DefaultCacheServer class is used to start a cache server. A cache server can be started from the command line or can be started programmatically. The following arguments are used when starting a cache server:

  • The name of a cache configuration file that is found on the classpath or the path to a Grid ARchive (GAR). If both are provided, the GAR takes precedence. A GAR includes the artifacts that comprise a Coherence application and adheres to a specific directory structure. A GAR can be left as a directory or can be archived with a .gar extension. See Building a Coherence GAR Module in Administering Oracle Coherence.

  • An optional application name for the GAR. If no name is provided, the archive name is used (the directory name or the file name without the .gar extension). The name provides an application scope that is used to separate applications on a cluster.

  • The number of seconds between checks for stopped services. Stopped services are only automatically started if they are set to be automatically started (as configured by an <autostart> element in the cache configuration file). The default value if no argument is provided is 5 seconds.

Starting Cache Servers From the Command Line

Cache servers are typically started from the command line. Use the Java -cp option to indicate the location of the coherence.jar file and the location where the tangosol-coherence-override.xml and coherence-cache-config.xml files are located. The location of the configuration files must precede the coherence.jar file on the classpath; otherwise, the default configuration files that are located in the coherence.jar file are used to start the cache server instance. See Understanding Configuration.

The following example starts a cache server member, uses any configuration files that are placed in the COHERENCE_HOME\config directory, and checks for service restarts every two seconds.

java -server -Xms512m -Xmx512m -cp COHERENCE_HOME\config;COHERENCE_HOME\lib\coherence.jar com.tangosol.net.DefaultCacheServer 2
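
The classpath ordering rule above can also be expressed in code. The following is a minimal sketch, not part of Coherence: the LaunchCommand class and its methods are hypothetical names used for illustration, and the code only assembles the argument list (it does not start a process). It places the configuration directory ahead of coherence.jar and uses the platform path separator.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Coherence API) that assembles a cache server command line. */
final class LaunchCommand {
    private LaunchCommand() {
    }

    /** Builds the classpath with the configuration directory ahead of coherence.jar. */
    static String classpath(String configDir, String coherenceJar) {
        // The configuration directory comes first so that its override and cache
        // configuration files are found before the defaults inside coherence.jar.
        return configDir + File.pathSeparator + coherenceJar;
    }

    /** Builds the full argument list; checkSeconds is the interval between service checks. */
    static List<String> cacheServer(String configDir, String coherenceJar, int checkSeconds) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-server");
        cmd.add("-Xms512m");
        cmd.add("-Xmx512m");
        cmd.add("-cp");
        cmd.add(classpath(configDir, coherenceJar));
        cmd.add("com.tangosol.net.DefaultCacheServer");
        cmd.add(Integer.toString(checkSeconds));
        return cmd;
    }
}
```

The returned list can be passed to java.lang.ProcessBuilder if the cache server is to be launched from a Java-based tool.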

The following example starts a cache server member and uses the Coherence application artifacts that are packaged in the MyGAR.gar file. The default name (MyGAR) is used as the application name.

java -server -Xms512m -Xmx512m -cp COHERENCE_HOME\config;COHERENCE_HOME\lib\coherence.jar com.tangosol.net.DefaultCacheServer D:\example\MyGAR.gar

Note:

The cache configuration file that is packaged in a GAR file takes precedence over a cache configuration file that is located on the classpath.

The COHERENCE_HOME\bin\cache-server script is provided as a convenience and can start a cache server instance. The script is available for both Windows (cache-server.cmd) and UNIX-based platforms (cache-server.sh). The script sets up a basic environment and then runs the DefaultCacheServer class. The scripts are typically modified as required for a particular cluster.

Tip:

During testing, it is sometimes useful to create multiple scripts with different names that uniquely identify each cache server. For example: cache-server-a, cache-server-b, and so on.

Lastly, a cache server can be started on the command line by using the java -jar command with the coherence.jar library. Cache servers are typically started this way for testing and demonstration purposes. For example:

java -jar COHERENCE_HOME\lib\coherence.jar

Starting Cache Servers Programmatically

An application can use or extend the DefaultCacheServer class as required when starting a cache server. For example, an application may want to do some application-specific setup or processing before starting a cache server and its services.

The following example starts a cache server using the main method:

String[] args = new String[]{"my-cache-config.xml", "5"};
DefaultCacheServer.main(args);

The DefaultCacheServer(ConfigurableCacheFactory) constructor uses a factory class to create a cache server instance that uses a specified cache configuration file. The following example uses the ExtensibleConfigurableCacheFactory implementation and creates a DefaultCacheServer instance and also uses the startAndMonitor(long) method to start a cache server as in the previous example:

ExtensibleConfigurableCacheFactory.Dependencies deps =
   ExtensibleConfigurableCacheFactory.DependenciesHelper.newInstance("my-cache-config.xml");
 
ExtensibleConfigurableCacheFactory factory;
factory = new ExtensibleConfigurableCacheFactory(deps);
DefaultCacheServer dcs = new DefaultCacheServer(factory);
dcs.startAndMonitor(5000);

The static startDaemon() method starts a cache server on a dedicated daemon thread and is intended for use within managed containers.

Two additional static start methods (start() and start(ConfigurableCacheFactory)) are also available to start a cache server and return control. However, the cache factory class is typically used instead of these methods, which remain for backward compatibility.

Applications that require even more fine-grained control can subclass the DefaultCacheServer class and override its methods to perform any custom processing as required.

Starting Cache Clients

Cache clients are cluster members that join the cluster to interact with the cluster's services. Cache clients can be as simple as an application that gets and puts data in a cache or can be as complex as a data grid compute application that processes data that is in a cache. The main difference between a cache client and a cache server is that cache clients are generally not responsible for cluster storage.

Disabling Local Storage

Cache clients that use the partitioned cache service (distributed caches) should not maintain any partitioned data. Cache clients that have storage disabled perform better and use fewer resources. Partitioned data should only be distributed among cache server instances.

Local storage is disabled on a per-process basis using the coherence.distributed.localstorage system property. This allows cache clients and servers to use the same configuration descriptors. For example:

java -cp COHERENCE_HOME\config;COHERENCE_HOME\lib\coherence.jar -Dcoherence.distributed.localstorage=false com.MyApp
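
When the command line cannot be changed, one option is to set the same property from application code. The following is a minimal sketch under the assumption that the property is set before any Coherence class is used, so that it is visible when the member is configured. The StorageDisabledClient class is a hypothetical name, and a value already supplied with -D is left untouched.

```java
/** Hypothetical client entry point; only the system property name comes from the text above. */
final class StorageDisabledClient {
    static final String LOCAL_STORAGE = "coherence.distributed.localstorage";

    private StorageDisabledClient() {
    }

    /** Sets the property to false unless a value was already supplied, and returns the effective value. */
    static String disableLocalStorage() {
        if (System.getProperty(LOCAL_STORAGE) == null) {
            System.setProperty(LOCAL_STORAGE, "false");
        }
        return System.getProperty(LOCAL_STORAGE);
    }

    public static void main(String[] args) {
        // Assumption: this runs before the application touches any Coherence class.
        String effective = disableLocalStorage();
        System.out.println(LOCAL_STORAGE + "=" + effective);
        // Application code that obtains caches would follow here.
    }
}
```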

Using the CacheFactory Class to Start a Cache Client

Applications that use the com.tangosol.net.CacheFactory class to get an instance of a cache become cluster members and are considered cache clients. The following example demonstrates the most common way of starting a cache client:

CacheFactory.ensureCluster();
NamedCache cache = CacheFactory.getCache("cache_name");

When starting an application that is a cache client, use the Java -cp option to indicate the location of the coherence.jar file and the location where the tangosol-coherence-override.xml and coherence-cache-config.xml files are located. The location of the configuration files must precede the coherence.jar file on the classpath; otherwise, the default configuration files that are located in the coherence.jar file are used to start the cache client. See Understanding Configuration.

The following example starts an application that is a cache client, uses any configuration files that are placed in the COHERENCE_HOME\config directory, and disables storage on the member.

java -cp COHERENCE_HOME\config;COHERENCE_HOME\lib\coherence.jar -Dcoherence.distributed.localstorage=false com.MyApp

The COHERENCE_HOME\bin\coherence script is provided for testing purposes and can start a cache client instance. The script is available for both Windows (coherence.cmd) and UNIX-based platforms (coherence.sh). The script sets up a basic environment, sets storage to be disabled, and then runs the CacheFactory class, which returns a prompt. The prompt is used to enter commands for interacting with a cache and a cluster. The scripts are typically modified as required for a particular cluster. The class can also be started directly from the command line instead of using the script. For example:

java -cp COHERENCE_HOME\config;COHERENCE_HOME\lib\coherence.jar -Dcoherence.distributed.localstorage=false com.tangosol.net.CacheFactory

If a Coherence application is packaged as a GAR, the GAR can be loaded by the CacheFactory instance using the server command at the prompt after the client member starts.

server [<path-to-gar>] [<app-name>]

The following example loads the Coherence application artifacts that are packaged in the MyGAR.gar file. The default name (MyGAR) is used as the application name.

Map (?) server D:\example\MyGAR.gar

Stopping Cluster Members

You can stop cluster members from the command line or programmatically.

Prerequisites for Stopping All Cluster Members

To ensure there is no data loss during a controlled shutdown, you can leverage the service suspend feature. A service is considered suspended only after all the data is fully written, including active persistence mode, asynchronous persistence tasks, entries in the write-behind queue of a read-write backing map, and other asynchronous operations. Outstanding operations are completed and no new operations are allowed against the suspended services.

Thus, for a controlled complete shutdown of a cluster, Oracle recommends executing the Coherence ClusterMBean operation suspendService("Cluster"), which shuts down all services gracefully, before shutting down the cluster members.

Stopping Cluster Members From the Command Line

Cluster members are most often shut down using the kill command on the UNIX platform and Ctrl+C on the Windows platform. These commands initiate the standard JVM shutdown hook, which is invoked upon normal JVM termination.

Note:

Issuing the kill -9 command triggers an abnormal JVM termination and the shutdown hook does not run. However, a graceful shutdown is generally not required if a service is known to be node-safe (as seen using JMX management) before termination.

The action a cluster member takes when receiving a shutdown command is configured in the operational override file within the <shutdown-listener> element. The following options are available:

  • none - Perform no explicit shutdown actions.

  • force - Perform a hard-stop on the node by calling Cluster.stop().

  • graceful - (Default) Perform a normal shutdown by calling Cluster.shutdown().

  • true - Same as force.

  • false - Same as none.
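
The option list above, including the true and false aliases, can be modeled as a small lookup. The following sketch is illustrative only and is not a Coherence API: the ShutdownAction enum and its parse method are hypothetical, values are matched exactly after trimming (an assumption), and an unset value maps to the documented default of graceful.

```java
/** Hypothetical model (not a Coherence API) of the shutdown-listener values listed above. */
enum ShutdownAction {
    NONE, FORCE, GRACEFUL;

    /** Maps a configured value to its action; "true" and "false" are aliases. */
    static ShutdownAction parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return GRACEFUL; // documented default when nothing is configured
        }
        switch (value.trim()) {
            case "none":
            case "false":
                return NONE;
            case "force":
            case "true":
                return FORCE;
            case "graceful":
                return GRACEFUL;
            default:
                throw new IllegalArgumentException("Unknown shutdown-listener value: " + value);
        }
    }
}
```

A deployment script or test could use such a check to reject a mistyped -Dcoherence.shutdownhook value before a member is started.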

The coherence.shutdown.timeout system property configures the duration to wait for shutdown to complete before timing out. The default value is 2 minutes. If a Coherence application uses persistence, write-behind cache store, or any other asynchronous operation that takes longer than 2 minutes, then it is necessary to configure the timeout to be longer to allow the pending asynchronous operations to complete. The system property value is configured as a time duration such as "3s" for 3 seconds, "5m" for 5 minutes, or "1hr" for 1 hour.

When the time duration to complete graceful shutdown exceeds the coherence.shutdown.timeout time, the JVM process is considered hung and the JVM process terminates abruptly using halt. There exists a chance that not all outstanding asynchronous operations completed when the shutdown times out. Therefore, it is important to ensure that coherence.shutdown.timeout is configured to a time duration that is sufficiently long for the number of outstanding asynchronous operations an application environment may have.
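
As a rough worked example of sizing the timeout, the sketch below estimates how long pending asynchronous operations take to drain and never returns less than the 2 minute default. It is not part of Coherence: the ShutdownTimeoutEstimate class, the drain-rate model (pending operations divided by a sustained rate), and the safety factor are all assumptions made for illustration. Only the property name and the seconds suffix come from the text above.

```java
import java.time.Duration;

/** Hypothetical sizing helper; not part of Coherence. */
final class ShutdownTimeoutEstimate {
    /** Default coherence.shutdown.timeout stated in the text above. */
    static final Duration DEFAULT_TIMEOUT = Duration.ofMinutes(2);

    private ShutdownTimeoutEstimate() {
    }

    /**
     * Estimates the time needed to drain pending asynchronous operations,
     * padded by a safety factor, and never below the documented default.
     */
    static Duration recommend(long pendingOps, double opsPerSecond, double safetyFactor) {
        if (pendingOps < 0 || opsPerSecond <= 0 || safetyFactor < 1.0) {
            throw new IllegalArgumentException("invalid sizing input");
        }
        long seconds = (long) Math.ceil(pendingOps / opsPerSecond * safetyFactor);
        Duration estimate = Duration.ofSeconds(seconds);
        return estimate.compareTo(DEFAULT_TIMEOUT) > 0 ? estimate : DEFAULT_TIMEOUT;
    }

    /** Formats the duration in whole seconds as a JVM argument. */
    static String asProperty(Duration timeout) {
        return "-Dcoherence.shutdown.timeout=" + timeout.getSeconds() + "s";
    }
}
```

For instance, 100,000 queued write-behind entries drained at 500 entries per second with a safety factor of 1.5 gives 300 seconds, so the member would be started with a timeout of 300s rather than the default.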

The following example sets the shutdown hook to none.

<?xml version='1.0'?>

<coherence xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config"
   xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-operational-config
   coherence-operational-config.xsd">
   <cluster-config>
      <shutdown-listener>
         <enabled system-property="coherence.shutdownhook">none</enabled>
      </shutdown-listener>
   </cluster-config>
</coherence>

The coherence.shutdownhook system property is used to specify the shutdown hook behavior instead of using the operational override file. For example:

-Dcoherence.shutdownhook=none

Stopping Cache Servers Programmatically

The DefaultCacheServer class provides two methods that are used to shut down a cache server:

Note:

The shutdown() method is intended to be called in a standalone application, where it shuts down the instance that the DefaultCacheServer class itself maintains as a static member.

  • shutdown() – This is a static method that is used to shut down a cache server that was started on a different thread using the DefaultCacheServer.main() or DefaultCacheServer.start() methods.

  • shutdownServer() – This method is called on a DefaultCacheServer instance which an application keeps hold of.

Using the Bootstrap API

Coherence has a simple bootstrap API that enables you to configure and start a Coherence application by building a com.tangosol.net.Coherence instance and starting this instance.

The Coherence instance provides access to one or more com.tangosol.net.Session instances. A com.tangosol.net.Session gives access to Coherence clustered resources, such as NamedMap, NamedCache, NamedTopic, and so on. Sessions can be of different types. For example, a session can be:
  • related to a ConfigurableCacheFactory.
  • configured from a configuration file.
  • a client-side gRPC session.

Example 7-1 Application Bootstrap Code

import com.tangosol.net.Coherence;
import com.tangosol.net.CoherenceConfiguration;
import com.tangosol.net.SessionConfiguration;

public class Main
    {
    public static void main(String[] args)
        {
        // Create a session configuration                                
        SessionConfiguration session = SessionConfiguration.builder()
                .named("Carts")
                .withConfigUri("cache-config.xml")
                .build();

        // Create a Coherence instance configuration                     
        CoherenceConfiguration cfg = CoherenceConfiguration.builder()
                .withSession(SessionConfiguration.defaultSession())
                .withSession(session)
                .build();

        // Create the Coherence instance from the configuration          
        Coherence coherence = Coherence.clusterMember(cfg);

        // Start Coherence                                               
        coherence.start();
        }
    }

In this example:
  • The 'Create a session configuration' part creates SessionConfiguration from the cache-config.xml configuration file with the Session name Carts.
  • The 'Create a Coherence instance configuration' part creates CoherenceConfiguration to configure the Coherence instance. This configuration contains the Carts session configuration.
  • The 'Create the Coherence instance from the configuration' part creates the Coherence cluster member instance from CoherenceConfiguration.
  • The 'Start Coherence' part starts the Coherence instance.

Running a Coherence Server

The com.tangosol.net.Coherence class contains a main method that you can use to run a Coherence server. This method is a more powerful alternative to DefaultCacheServer.

$ java -cp coherence.jar com.tangosol.net.Coherence

Without any other configuration, the default Coherence instance started using the above command runs a server that is identical to the server that is run using DefaultCacheServer.

Session Configurations

You can create a session configuration, which is an instance of SessionConfiguration, by using the SessionConfiguration builder. See the example in Using the Bootstrap API.

About the Default Session

When running Coherence, if you have not specified any configuration, the default configuration file is used to configure Coherence. This behavior continues to apply to the bootstrap API.

If you start a Coherence instance without specifying any session configurations, it creates a single default Session. This default Session wraps the cache factory interface ConfigurableCacheFactory that is created from the default configuration file. See Interface ConfigurableCacheFactory. The default file name of the default configuration file is coherence-cache-config.xml unless the name is overridden with the coherence.cacheconfig system property.

When creating a CoherenceConfiguration instance, you can add the default session by using the SessionConfiguration.defaultSession() helper method. This method returns a SessionConfiguration that is configured to create the default Session.

In the following example, the default session configuration is specifically added to CoherenceConfiguration:
CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .withSession(SessionConfiguration.defaultSession())
        .build();

Naming a Session

All sessions have a name that must be unique within the application. If you have not specified a name when the SessionConfiguration is built, the default name of $Default$ will be used. A Coherence instance fails to start if duplicate Session names exist.

Examples:

  • This configuration will have the default name ($Default$):
    SessionConfiguration session = SessionConfiguration.builder()
            .build();
  • This configuration will have the name Test:
    SessionConfiguration session = SessionConfiguration.builder()
            .named("Test")
            .build();

Specifying the Session Configuration URI

The most common type of session is a wrapper around ConfigurableCacheFactory. When using the SessionConfiguration builder, you can specify the configuration file URI by using the withConfigUri() method. This method accepts a string value for specifying the location of the configuration file.

The following example uses the configuration file named cache-config.xml:
SessionConfiguration session = SessionConfiguration.builder()
        .withConfigUri("cache-config.xml")
        .build();

If you do not specify a configuration URI, the default value will be used. The default value is coherence-cache-config.xml unless this value is overridden with the coherence.cacheconfig system property.

Adding Session Event Interceptors

Coherence provides many types of events, such as life-cycle events for Coherence itself, cache life-cycle events, cache entry events, and partition events. You can listen to these events by implementing an EventInterceptor that receives specific types of events. Event interceptors can be registered with a Session as part of its configuration.

For example, suppose the application has an interceptor class called CacheInterceptor that listens to the CacheLifecycleEvent event when caches get created or destroyed. You can add this interceptor to the session as shown below:
SessionConfiguration session = SessionConfiguration.builder()
        .withInterceptor(new CacheInterceptor())
        .build();

The interceptor will receive cache life-cycle events for all caches that are created using the session.

Scoping a Session Configuration

Scope is a concept that helps you scope services and isolate them from other services with the same name. For example, you can have multiple ConfigurableCacheFactory instances loaded from the same XML configuration file but with different scope names. Scoping ensures that each ConfigurableCacheFactory instance has its own services in the cluster.

Unless you require multiple sessions, a scope will not generally be used in a configuration.

You can configure a scope for a session using the configuration’s withScopeName() method, as shown in the following example:
SessionConfiguration session = SessionConfiguration.builder()
        .withScopeName("Test")
        .build();

The session (and any ConfigurableCacheFactory it wraps) created from the above configuration will have a scope name of Test.

You can also set a scope name in the <defaults> section of the XML configuration file. An example of scoped-configuration.xml file is shown below:
<?xml version="1.0"?>
<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
              xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config coherence-cache-config.xsd">

  <defaults>
    <scope-name>Test</scope-name>
  </defaults>
</cache-config>

A ConfigurableCacheFactory instance created from this XML file, and any Session that wraps it, will have a scope name of Test.

Note:

When using the bootstrap API, any scope name specifically configured in SessionConfiguration (that is not the default scope name) overrides the scope name in the XML file.

For example, the following scenarios show the different ways of defining the scope name using the scoped-configuration.xml file (see the above example):

  • In this case, the scope name will be Foo because the scope name has been explicitly set in SessionConfiguration:
    SessionConfiguration session = SessionConfiguration.builder()
            .withConfigUri("scoped-configuration.xml")
            .withScopeName("Foo")
            .build();
  • In this case, the scope name will be Foo. Although no scope name has been explicitly set in SessionConfiguration, the session name has been set to Foo, so the scope name defaults to the session name.
    SessionConfiguration session = SessionConfiguration.builder()
            .named("Foo")
            .withConfigUri("scoped-configuration.xml")
            .build();
  • In this case, the scope name will be Test as no scope name or session name has been explicitly set in SessionConfiguration. Therefore, the scope name of Test will be used from the XML configuration.
    SessionConfiguration session = SessionConfiguration.builder()
            .withConfigUri("scoped-configuration.xml")
            .build();
  • In this case, the scope name will be Test. Although the session name has been set to Foo, the scope name has been explicitly set to the default scope name using the constant Coherence.DEFAULT_SCOPE. Therefore, the scope name of Test will be used from the XML configuration.
    SessionConfiguration session = SessionConfiguration.builder()
            .named("Foo")
            .withScopeName(Coherence.DEFAULT_SCOPE)
            .withConfigUri("scoped-configuration.xml")
            .build();
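
The four scenarios above follow a simple precedence, which the sketch below models in plain Java. It is a hypothetical model written from the rules described here, not Coherence code: the ScopeResolution class is an invented name, null stands for "not set", and the empty string used for the default scope is a stand-in whose real value may differ.

```java
/** Hypothetical model of the scope-name rules described above; not Coherence code. */
final class ScopeResolution {
    // Stand-ins for the defaults; the default scope value used by Coherence may differ.
    static final String DEFAULT_SESSION_NAME = "$Default$";
    static final String DEFAULT_SCOPE = "";

    private ScopeResolution() {
    }

    /**
     * @param sessionName   name passed to named(), or null if not set
     * @param explicitScope value passed to withScopeName(), or null if not set
     * @param xmlScope      scope-name from the XML defaults element, or null if absent
     * @return the scope name that the session ends up with under the described rules
     */
    static String resolve(String sessionName, String explicitScope, String xmlScope) {
        String fallback = xmlScope != null ? xmlScope : DEFAULT_SCOPE;
        if (explicitScope != null) {
            // An explicit non-default scope wins; an explicit default defers to the XML file.
            return explicitScope.equals(DEFAULT_SCOPE) ? fallback : explicitScope;
        }
        if (sessionName != null && !sessionName.equals(DEFAULT_SESSION_NAME)) {
            // No scope set: a non-default session name becomes the scope name.
            return sessionName;
        }
        return fallback;
    }
}
```

Applied to the scenarios in order, the model yields Foo, Foo, Test, and Test when the XML file declares the scope name Test.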

Configuring a Coherence Instance

You can start a Coherence application by creating a Coherence instance from CoherenceConfiguration. An instance of CoherenceConfiguration is created using the builder. For example:
CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .build();

Adding Sessions

A Coherence instance manages one or more Session instances. You can add these session instances to CoherenceConfiguration by adding the SessionConfiguration instances to the builder.

If you do not add any sessions to the builder, the Coherence instance runs a single Session that uses the default configuration file.
CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .build();

The above configuration configures a Coherence instance with the default name and with a single Session that uses the default configuration file.

You can also explicitly add the default session to CoherenceConfiguration:
CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .withSession(SessionConfiguration.defaultSession())
        .build();
You may also add other session configurations to CoherenceConfiguration:
SessionConfiguration session = SessionConfiguration.builder()
        .named("Carts")
        .withConfigUri("cache-config.xml")
        .build();

CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .withSession(session)
        .build();

While there is no limit to the number of sessions you can configure, the majority of applications require only a single session, which is most likely the default session.

Configuring a Session for Auto-Discovery

You can configure a CoherenceConfiguration to automatically discover the SessionConfiguration instances. These instances are discovered using the Java ServiceLoader. Any instances of SessionConfiguration or SessionConfiguration.Provider that are configured as services in the META-INF/services/ files are loaded.

This feature is useful if you are building modular applications where you want to include the functionality in a separate application module that uses its own Session. The SessionConfiguration instance for the module is made discoverable by the Java ServiceLoader. When the module’s jar file is on the classpath, the Session gets created and the module’s functionality becomes available to the application.

For example:
CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .discoverSessions() 
        .build();

In this example, the call to discoverSessions() loads the SessionConfiguration instances that have been discovered by the Java ServiceLoader.

Naming Coherence Instances

Each Coherence instance must be uniquely named. You can specify a name by using the named() method on the builder. If you do not specify a name, then the default name of $Default$ will be used.

In the majority of use-cases, an application requires only a single Coherence instance. Therefore, there will be no requirement to specify a name.

In the following code, the configuration creates a Coherence instance with the name Carts.
CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .named("Carts")
        .build();

Adding Global Event Interceptors

You can add event interceptors to a SessionConfiguration instance to receive events for a session. See Adding Session Event Interceptors. Event interceptors can also be added to the Coherence instance to receive events for all Session instances managed by that Coherence instance.

For example, if you reuse the CacheInterceptor class for caches in all sessions, then the interceptor receives events for both the default session and the Carts session:
SessionConfiguration cartsSession = SessionConfiguration.builder()
         .named("Carts")
         .withConfigUri("cache-config.xml")
         .build();

CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .withSession(SessionConfiguration.defaultSession())
        .withSession(cartsSession)
        .withInterceptor(new CacheInterceptor())
        .build();

Creating a Coherence Instance

You can use the CoherenceConfiguration instance to create a Coherence instance.

You can create a Coherence instance in one of the following two modes:
  • cluster member
  • client

The mode chosen affects how some types of Session are created and whether auto-start services are started.

As the name suggests, a "cluster member" is a Coherence instance that expects to start or join a Coherence cluster. In a cluster member, any Session that wraps a ConfigurableCacheFactory has its services auto-started and monitored (this is the same behavior as seen when you use the DefaultCacheServer to start a server).

A "client" Coherence instance is typically not a cluster member; it is a Coherence*Extend or a gRPC client. As such, Session instances that wrap a ConfigurableCacheFactory are not auto-started. Instead, they start on demand as resources such as maps, caches, or topics are requested from them.

The com.tangosol.net.Coherence class has static factory methods to create Coherence instances in different modes.

Examples:
  • To create a Coherence instance that is a cluster member, use the Coherence.clusterMember method, as shown below:
    CoherenceConfiguration cfg = CoherenceConfiguration.builder()
            .build();
    
    Coherence coherence = Coherence.clusterMember(cfg);
  • To create a Coherence instance that is a client, use the Coherence.client method, as shown below:
    CoherenceConfiguration cfg = CoherenceConfiguration.builder()
            .build();
    
    Coherence coherence = Coherence.client(cfg);
    

Creating a Default Coherence Instance
You can create a Coherence instance in either mode without specifying any configuration. The following two lines are alternatives; use one or the other:
Coherence coherence = Coherence.clusterMember();
Coherence coherence = Coherence.client();

In the above examples, the Coherence instance will have the default Session and any discovered sessions.

Starting Coherence

A Coherence instance must be started to start all the sessions that the Coherence instance is managing. You can start the Coherence instance by calling the start() method, as shown below:
Coherence coherence = Coherence.clusterMember(cfg);
coherence.start();

Obtaining a Coherence Instance

To avoid having to pass around the instance of Coherence that was used to bootstrap an application, the Coherence class has some static methods that make it simple to retrieve an instance.

If only a single instance of Coherence is being used in an application (which is true for most use-cases), use the getInstance() method:
Coherence coherence = Coherence.getInstance();
You can also retrieve an instance by name by configuring it as shown below:
CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .named("Carts")
        .build();

Coherence.create(cfg);

And then retrieve the instance:
Coherence coherence = Coherence.getInstance("Carts");

Ensuring Coherence Has Started

If application code needs to ensure that a Coherence instance has started before doing work, use the whenStarted() method. This method obtains a CompletableFuture that will be completed when the Coherence instance starts.

For example:
Coherence               coherence = Coherence.getInstance("Carts");
CompletableFuture<Void> future    = coherence.whenStarted();

future.join();

There is also a corresponding whenStopped() method that returns a future that will be completed when the Coherence instance stops.
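
Calling join() on the future blocks indefinitely if the instance never starts. Where a bounded wait is preferable, the standard CompletableFuture API allows a timeout. The following is a minimal sketch using only JDK classes; the StartupGate class and the awaitStarted method are hypothetical names, and the future passed in is assumed to be the one returned by whenStarted().

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; the future would typically come from coherence.whenStarted(). */
final class StartupGate {
    private StartupGate() {
    }

    /** Returns true if the future completed normally within the timeout, false otherwise. */
    static boolean awaitStarted(CompletableFuture<?> started, Duration timeout) {
        try {
            started.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // not started within the allowed time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } catch (ExecutionException e) {
            // The future completed exceptionally, so startup itself failed.
            throw new IllegalStateException("startup failed", e.getCause());
        }
    }
}
```

An application might call StartupGate.awaitStarted(coherence.whenStarted(), Duration.ofSeconds(30)) and exit with an error if the result is false.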

Adding Coherence Lifecycle Interceptors

Besides using the future methods as described in Ensuring Coherence Has Started, you can add an EventInterceptor to the configuration of a Coherence instance to receive life-cycle events.

Following is an example interceptor that implements Coherence.LifecycleListener.
public class MyInterceptor implements Coherence.LifecycleListener {
    public void onEvent(CoherenceLifecycleEvent event) {
        // process event
    }
}
You can add the interceptor to the following configuration:
CoherenceConfiguration cfg = CoherenceConfiguration.builder()
        .withSession(SessionConfiguration.defaultSession())
        .withInterceptor(new MyInterceptor())
        .build();

When you start or stop a Coherence instance created from this configuration, the MyInterceptor instance starts receiving the life-cycle events.

Performing a Rolling Restart

A rolling restart is a technique for restarting cache servers in a cluster that ensures no data is lost during the restart. A rolling restart allows the data on a cache server to be redistributed to other cache servers on the cluster while the cache server is restarted. Each cache server in the cluster can be restarted in turn to effectively restart the whole cluster.

Rolling restarts are commonly performed when a cache server or its host computer must be updated or when upgrading a cache server to a new patch set release or patch set update release. However, the technique can also be used whenever you want to restart a cache server that is currently managing a portion of cached data.

Note:

When upgrading a cluster, a rolling restart can only be used to upgrade patch set releases or patch set update releases, but not major or minor releases. See Release Number Format in Administering Oracle Fusion Middleware.

This section includes the following topics:

Prerequisites for a Rolling Restart

A rolling restart requires initial considerations and setup before you restart a cluster. A rolling restart cannot be performed on a cluster that does not meet the following prerequisites:

  • The cache servers in a cluster must provide enough capacity to handle the shutdown of a single cache server (n minus 1 where n is the number of cache servers in the cluster). An out-of-memory exception or data eviction can occur during a redistribution of data if the cache servers are running at capacity. See Cache Size Calculation Recommendations in Administering Oracle Coherence.

  • Remote JMX management must be enabled on all cache servers and at least two cache servers must contain an operational MBean server. Ensure that you can connect to the MBean servers using an MBean browser such as JConsole. See Using JMX to Manage Oracle Coherence in Managing Oracle Coherence.

  • Evaluate whether your environment requires the lambda serialization mode to be set to dynamic for the rolling upgrade to succeed. See Considerations for a Rolling Upgrade. To override the Coherence production mode default and configure dynamic lambda serialization mode, see Configuring Lambda Serialization Mode.

    Note:

    As of Oracle Coherence release 14.1.1.2206, Coherence production mode defaults to static lambda serialization. Before this release, Oracle Coherence 12.2.1.0.0 and later ran solely in dynamic lambda serialization mode.
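The capacity prerequisite (n minus 1) in the list above can be checked with simple arithmetic. The following is a rough sketch, not an official sizing formula: it assumes that data is evenly distributed, that the per-server "used" figure already includes primary and backup copies, and that the "usable" figure is the portion of the heap you are willing to fill. The class and method names are hypothetical. For real sizing, follow Cache Size Calculation Recommendations.

```java
// Hypothetical helper: a rough n-minus-1 capacity check for a rolling restart.
final class RollingRestartCapacity {
    /**
     * Returns true if the data currently held by all servers still fits
     * after one server is removed, that is, if
     * servers * used <= (servers - 1) * usable.
     */
    static boolean canLoseOneServer(int servers, long usedBytesPerServer, long usableBytesPerServer) {
        if (servers < 2) {
            return false; // no other server is left to take over the data
        }
        long totalUsed = Math.multiplyExact((long) servers, usedBytesPerServer);
        long remainingCapacity = Math.multiplyExact((long) (servers - 1), usableBytesPerServer);
        return totalUsed <= remainingCapacity;
    }
}
```

For example, four servers that each hold 6 GB with 8 GB usable pass the check (24 GB of data against 24 GB of remaining capacity), while the same servers holding 7 GB each do not (28 GB against 24 GB).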

If a cache service is configured to use asynchronous backups, use the shutdown method to perform an orderly shutdown instead of using the stop method or kill -9. Otherwise, a member may shut down before asynchronous backups are complete. The shutdown method guarantees that all updates are complete.

If you are using persistence, a rolling restart is agnostic to the format of the data stored on disk. However, there are scenarios in which you should take precautionary steps to ensure that there is a 'savepoint' to which the cluster can be rolled back after a catastrophic event. For example, on-disk persisted files may become unreadable when going from a later to an earlier version.

Therefore, Oracle recommends that you create a persistent snapshot of the relevant services before the roll. See Using Snapshots to Persist a Cache Service. Depending on the tolerance of the application, you can create the snapshot with or without suspending the service first; suspending the service ensures global consistency. The snapshot offers a 'savepoint' to which the cluster can be rolled back if an issue occurs during the roll.

Note:

Coherence may change the format of the data saved on disk. When rolling from a version that includes a change in the storage format, Oracle strongly recommends creating a snapshot before the roll or upgrade.

Restarting Cache Servers for a Rolling Restart

Use these instructions to restart a cache server. If you are restarting the host computer, then make sure all cache server processes are shut down before shutting down the computer.

Note:

The instructions below assume that none of the cache servers share the same Coherence JAR or ORACLE_HOME. If some of the cache servers share the same Coherence JAR or ORACLE_HOME, then treat the servers as a logical unit and perform the steps on each of the servers at the same time.

To restart a cache server:

  1. Connect to a Coherence MBean server using an MBean browser. Ensure that the MBean server is not hosted on the cache server that is being restarted.
  2. From the Coherence Service MBean, select a cluster service that corresponds to a cache that is configured in the cache configuration file.
  3. Check the StatusHA attribute for any cluster member to ensure that the attribute's value is MACHINE-SAFE. The MACHINE-SAFE state indicates that all the cache servers running on any given computer could be stopped without data loss. If the attribute is not MACHINE-SAFE, then additional cache servers, possibly on different computers, must be started before performing a restart. See ServiceMBean in Managing Oracle Coherence.
  4. Shut down the cache server.
  5. From the MBean browser, recheck the StatusHA attribute and wait for the state to return to MACHINE-SAFE.
  6. Restart the cache server.
  7. From the MBean browser, recheck the StatusHA attribute and wait for the state to return to MACHINE-SAFE.
  8. Repeat steps 4 to 7 for additional cache servers that are to be restarted.
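The StatusHA checks in steps 3, 5, and 7 can be scripted instead of being performed by hand in an MBean browser. The following is a minimal sketch that uses only the JDK's javax.management API. The StatusHaWaiter class and its methods are hypothetical names introduced for illustration, and the match is an exact string comparison against the target state, as in the steps above.

```java
import java.time.Duration;
import java.util.function.Supplier;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper: polls an MBean attribute until it reaches a target value.
final class StatusHaWaiter {
    /** Builds a reader for one MBean attribute; a failed read yields null. */
    static Supplier<String> attributeReader(MBeanServerConnection conn, ObjectName name, String attribute) {
        return () -> {
            try {
                return String.valueOf(conn.getAttribute(name, attribute));
            } catch (Exception e) {
                return null; // treat a failed read as "not yet safe"
            }
        };
    }

    /** Polls the reader until it returns the target value or the timeout elapses. */
    static boolean awaitStatus(Supplier<String> reader, String target, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (target.equals(reader.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before reaching the target state
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A script might call awaitStatus(attributeReader(conn, serviceName, "StatusHA"), "MACHINE-SAFE", ...) before step 4 and again after steps 4 and 6. Here, conn is a connection to an MBean server that is not hosted on the cache server being restarted, and serviceName is the ObjectName of the Service MBean. An object name of the form Coherence:type=Service,name=<service>,nodeId=<id> is an assumption; confirm the exact name in your MBean browser.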

Persistence Roll Back After or During a Rolling Restart

If an issue is observed during the rolling restart, it is advisable to roll back the entire cluster. You can roll back the cluster by rolling the nodes that run the new version of the application (or of Coherence, or both) back to their prior versions or, in more severe cases, by completely restarting the cluster.

If the new version being migrated to includes a Coherence patch that modifies the storage format (as highlighted in the Oracle Coherence Release Notes), the persistent stores could be in a mixed state of the old and new formats.

During a full cluster restart, the mixed state can result in a failure to recover, which causes the failed stores to be moved into the trash. At this point, you can use the forceRecovery operation of the PersistenceManagerMBean (in JConsole, for example, Coherence/Persistence/<servicename>/PersistenceCoordinator) to force Coherence to reinstate empty versions of the partitions that could not be recovered. If you have created a snapshot of the relevant services, you can recover the snapshot to revert the state of those services.