6 Persisting Caches
This chapter includes the following sections:
- Overview of Persistence
  Coherence persistence is a set of tools and technologies that manage the persistence and recovery of Coherence distributed caches.
- Persistence Dependencies
  Persistence is only available for distributed caches and requires the use of a centralized partition assignment strategy.
- Persisting Caches on Demand
  Caches can be persisted to disk at any point in time and recovered as required.
- Actively Persisting Caches
  Caches can be automatically persisted to disk and automatically recovered when a cluster is restarted.
- Using Snapshots to Persist a Cache Service
  Snapshots are a backup of the contents of a cache service that must be manually managed using the PersistenceManagerMBean MBean.
- Archiving Snapshots
  Snapshots can be archived to a central location and then later retrieved and restored. Archiving snapshots requires defining the directory where archives are stored and configuring cache services to use an archive directory.
- Using Active Persistence Mode
  You can enable and configure active persistence mode to have the contents of a cache automatically persisted and recovered.
- Using Asynchronous Persistence Mode
  You can enable and configure asynchronous persistence mode to have storage servers persist data asynchronously.
- Modifying the Pre-Defined Persistence Environments
  Persistence uses a set of directories for storage. You can choose to use the default storage directories or change the directories as required.
- Creating Persistence Environments
  You can choose to define and use multiple persistence environments to support different cache scenarios.
- Using Quorum for Persistence Recovery
  Coherence includes a quorum policy that enables Coherence to defer recovery until a suitable point. Suitability is based on the availability of all partitions and sufficient capacity to initiate recovery across storage members.
- Subscribing to Persistence JMX Notifications
  The PersistenceManagerMBean MBean includes a set of notification types that applications can use to monitor persistence operations.
- Managing Persistence
  Persistence should be managed to ensure there is enough disk space and to ensure persistence operations do not add significant latency to cache operations.
- Configuring Caches as Transient
  Caches that do not require persistence can be configured as transient. Caches that are transient are not recovered during persistence recovery operations.
Parent topic: Advanced Administration
Overview of Persistence
This section includes the following topics:
- Persistence Modes
- Disk-Based Persistence Storage
- Persistence Configuration
- Management and Monitoring
Parent topic: Persisting Caches
Persistence Modes
Persistence can operate in two modes:
- On-Demand persistence mode – a cache service is manually persisted and recovered upon request using the persistence coordinator. The persistence coordinator is exposed as an MBean interface that provides operations for creating, archiving, and recovering snapshots of a cache service.
- Active persistence mode – cache contents are automatically persisted on all mutations and are automatically recovered on cluster/service startup. The persistence coordinator can still be used in active persistence mode to perform on-demand snapshots.
Parent topic: Overview of Persistence
Disk-Based Persistence Storage
Persistence uses a database for the persistence store. The database is used to store the backing map partitions of a partitioned service. The locations of the database files can be stored on the local disk of each cache server or on a shared disk: Storage Area Network (SAN) or Network File System (NFS). See Plan for SAN/NFS Persistence Storage.
Note:
Database files should never be manually edited. Editing the database files can lead to persistence errors.
The local disk option allows each cluster member to access persisted data for the service partitions that it owns. Persistence is coordinated across all storage members using a list of cache server host addresses. The address list ensures that all persisted partitions are discovered during recovery. Local disk storage provides a high-throughput and low-latency storage mechanism; however, a partitioned service must still rely on in-memory backup (a backup-count value greater than zero) to remain machine safe.
The shared disk option, together with active persistence mode, allows each cluster member to access persisted data for all service partitions. An advantage to using a shared disk is that partitioned services do not require in-memory backup (the backup-count value can be equal to zero) to remain machine safe, because all storage-enabled members can recover partitions from the shared storage. Disabling in-memory backup increases the cache capacity of the cluster at the cost of higher-latency recovery during node failure. In general, the use of a shared disk can potentially affect throughput and latencies and should be tested and monitored accordingly.
Note:
The service StatusHA statistic shows an ENDANGERED status when the backup count is set to zero, even if persistence is being used to replace in-memory backup.
Both the local disk and shared disk approaches can rely on a quorum policy that controls how many cluster members must be present before persistence operations can be performed and before recovery can begin. Quorum policies allow time for a cluster to start before data recovery begins.
Parent topic: Overview of Persistence
Persistence Configuration
Persistence is declaratively configured using Coherence configuration files and requires no changes to application code. An operational override file is used to configure the underlying persistence implementation if the default settings are not acceptable. A cache configuration file is used to set persistence properties on a distributed cache.
Parent topic: Overview of Persistence
Management and Monitoring
Persistence can be monitored and managed using MBean attributes and operations. Persistence operations such as creating and archiving snapshots are performed using the PersistenceManagerMBean MBean. Persistence attributes are included as part of the attributes of a service and can be viewed using the ServiceMBean MBean.
Persistence attributes and statistics are aggregated in the persistence and persistence-details reports. Persistence statistics are also aggregated in the VisualVM plug-in. Both tools can help troubleshoot possible resource and performance issues.
Parent topic: Overview of Persistence
Persistence Dependencies
Persistence is only available for distributed caches and requires the use of a centralized partition assignment strategy.
Note:
Transactional caches do not support persistence.
Distributed caches use a centralized partition assignment strategy by default. Although uncommon, it is possible that an autonomous or a custom partition assignment strategy is being used. Check the StrategyName attribute on the PartitionAssignment MBean to verify the strategy that is currently configured for a distributed cache. See Changing the Partition Distribution Strategy in Developing Applications with Oracle Coherence.
Parent topic: Persisting Caches
Persisting Caches on Demand
To persist caches on demand:
- Use the persistence coordinator to create, recover, and remove snapshots. See Using Snapshots to Persist a Cache Service.
- Optionally, change the location where persistence files are written to disk. See Changing the Pre-Defined Persistence Directory.
- Optionally, configure the number of storage members that are required to perform recovery. See Using Quorum for Persistence Recovery.
Parent topic: Persisting Caches
Actively Persisting Caches
To actively persist caches:
- Enable active persistence. See Enabling Active Persistence Mode.
- Optionally, change the location where persistence files are written to disk. See Changing the Pre-Defined Persistence Directory.
- Optionally, change how a service responds to possible failures during active persistence. See Changing the Active Persistence Failure Response.
- Optionally, configure the number of storage members that are required to perform recovery. See Using Quorum for Persistence Recovery.
Parent topic: Persisting Caches
Using Snapshots to Persist a Cache Service
Snapshots are a backup of the contents of a cache service that must be manually managed using the PersistenceManagerMBean MBean.
Note:
The instructions in this section were created using the VisualVM-MBeans plug-in for the Coherence VisualVM tool. The Coherence VisualVM plug-in can also be used to perform snapshot operations.
This section includes the following topics:
- Create a Snapshot
- Recover a Snapshot
- Remove a Snapshot
Create a Snapshot
Creating snapshots writes the contents of a cache service to the snapshot directory that is specified within the persistence environment definition in the operational override configuration file. A snapshot can be created either on a running service (a service that is accepting and processing requests) or on a suspended service. The former provides consistency at a partition level while the latter provides global consistency.
By default, the snapshot operation assumes partition level consistency. To achieve global consistency, the service must be explicitly suspended which causes any requests into the service to be blocked.
Snapshot with Partition Consistency
To create a snapshot with partition consistency:
- From the list of MBeans, select and expand the Persistence node.
- Expand a service for which you want to create a snapshot and select PersistenceCoordinator.
- From the Operations tab, enter a name for the snapshot in the field for the createSnapshot operation.
- Click createSnapshot.
Snapshot with Global Consistency
To create a snapshot with global consistency:
- From the list of MBeans, select the ClusterMBean node.
- From the Operations tab, enter the name of the service that you want to suspend in the field for the suspendService operation.
- Click suspendService.
- From the list of MBeans, select and expand the Persistence node.
- Expand the service (now suspended) for which you want to create a snapshot and select PersistenceCoordinator.
- From the Operations tab, enter a name for the snapshot in the field for the createSnapshot operation.
- Click createSnapshot.
Note:
Applications can be notified when the operation completes by subscribing to the snapshot JMX notifications. See Subscribing to Persistence JMX Notifications.
- From the list of MBeans, select the ClusterMBean node.
- From the Operations tab, enter the name of the service that you want to resume in the field for the resumeService operation.
- Click resumeService.
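These snapshot and suspend/resume operations can also be scripted over standard JMX instead of being performed in VisualVM. The following Java sketch is a minimal example under stated assumptions: the JMX service URL is hypothetical, and the ObjectName patterns for the Cluster MBean and the PersistenceCoordinator MBean should be verified against the names shown in your MBean browser.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GlobalSnapshotExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint of a management-enabled cluster member.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9000/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection server = connector.getMBeanServerConnection();

            // ObjectName patterns are assumptions; verify them in your MBean browser.
            ObjectName cluster = new ObjectName("Coherence:type=Cluster");
            ObjectName coordinator = new ObjectName(
                    "Coherence:type=Persistence,service=Service1,responsibility=PersistenceCoordinator");

            String[] sig = {"java.lang.String"};

            // Suspend the service so the snapshot is globally consistent.
            server.invoke(cluster, "suspendService", new Object[] {"Service1"}, sig);
            try {
                // createSnapshot is asynchronous; subscribe to persistence JMX
                // notifications to be told when the snapshot completes.
                server.invoke(coordinator, "createSnapshot", new Object[] {"nightly"}, sig);
            } finally {
                // Resume the service so that it can process requests again.
                server.invoke(cluster, "resumeService", new Object[] {"Service1"}, sig);
            }
        }
    }
}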
Parent topic: Using Snapshots to Persist a Cache Service
Recover a Snapshot
Recovering snapshots restores the contents of a cache service from a snapshot.
Note:
A Coherence service recovered from a persistent snapshot is not propagated to federated clusters. The data on the originating cluster is recovered but the cache data on the destination cluster remains unaffected and may still contain the data that was present prior to the recovery. To propagate the snapshot data, a federation ReplicateAll operation is required after the snapshot recovery is completed. The ReplicateAll operation is available on the FederationManagerMBean MBean. See FederationManagerMBean Operations in Managing Oracle Coherence.
To recover a snapshot:
- From the list of MBeans, select and expand the Persistence node.
- Expand a service for which you want to recover a snapshot and select PersistenceCoordinator.
- From the Operations tab, enter the name of a snapshot in the field for the recoverSnapshot operation.
- Click recoverSnapshot. The contents of the cache service are restored from the snapshot.
Parent topic: Using Snapshots to Persist a Cache Service
Remove a Snapshot
Removing a snapshot deletes the snapshot from the snapshot directory. The cache service remains unchanged.
To remove a snapshot:
- From the list of MBeans, select and expand the Persistence node.
- Expand a service for which you want to remove a snapshot and select PersistenceCoordinator.
- From the Operations tab, enter the name of a snapshot in the field for the removeSnapshot operation.
- Click removeSnapshot.
Parent topic: Using Snapshots to Persist a Cache Service
Archiving Snapshots
Snapshots can be archived to a central location and then later retrieved and restored. Snapshot archives are manually managed using the PersistenceManagerMBean MBean. An archive is slower to create than a snapshot but, unlike a snapshot, the archive is portable.
This section includes the following topics:
- Defining a Snapshot Archive Directory
- Specifying a Directory Snapshot Archiver
- Performing Snapshot Archiving Operations
- Creating a Custom Snapshot Archiver
Parent topic: Persisting Caches
Defining a Snapshot Archive Directory
The directory where snapshots are archived is defined in the operational override file using a directory snapshot archiver definition. Multiple definitions can be created as required.
Note:
The archive directory location and name must be the same across all members. The archive directory location must be a shared directory and must be accessible to all members.
To define a snapshot archiver directory, include the <directory-archiver> element within the <snapshot-archivers> element. Use the <archive-directory> element to enter the directory where snapshot archives are stored. Use the id attribute to provide a unique name for the definition. For example:
<snapshot-archivers>
   <directory-archiver id="archiver1">
      <archive-directory>/mydirectory</archive-directory>
   </directory-archiver>
</snapshot-archivers>
Parent topic: Archiving Snapshots
Specifying a Directory Snapshot Archiver
To specify a directory snapshot archiver, edit the persistence definition within a distributed scheme and include the name of a directory snapshot archiver that is defined in the operational override configuration file. For example:
<distributed-scheme>
   <scheme-name>distributed</scheme-name>
   <service-name>Service1</service-name>
   <backing-map-scheme>
      <local-scheme/>
   </backing-map-scheme>
   <persistence>
      <archiver>archiver1</archiver>
   </persistence>
   <autostart>true</autostart>
</distributed-scheme>
Parent topic: Archiving Snapshots
Performing Snapshot Archiving Operations
Snapshot archiving is manually managed using the PersistenceManagerMBean MBean. The MBean includes asynchronous operations to archive and retrieve snapshot archives and also includes operations to list and remove archives.
This section includes the following topics:
- Archiving a Snapshot
- Retrieving Archived Snapshots
- Removing Archived Snapshots
- Listing Archived Snapshots
- Listing Archived Snapshot Stores
Parent topic: Archiving Snapshots
Archiving a Snapshot
To archive a snapshot:
- From the list of MBeans, select and expand the Persistence node.
- Expand a service for which you want to archive a snapshot and select PersistenceCoordinator.
- From the Operations tab, enter the name of a snapshot in the field for the archiveSnapshot operation.
- Click archiveSnapshot. The snapshot is archived to the location that is specified in the directory snapshot archiver definition.
Parent topic: Performing Snapshot Archiving Operations
Retrieving Archived Snapshots
To retrieve an archived snapshot:
- From the list of MBeans, select and expand the Persistence node.
- Expand a service for which you want to retrieve an archived snapshot and select PersistenceCoordinator.
- From the Operations tab, enter the name of an archived snapshot in the field for the retrieveArchivedSnapshot operation.
- Click retrieveArchivedSnapshot. The archived snapshot is copied from the directory archiver location to the snapshot directory and is available to be recovered to the service backing map. See Recover a Snapshot.
Parent topic: Performing Snapshot Archiving Operations
Removing Archived Snapshots
To remove an archived snapshot:
- From the list of MBeans, select and expand the Persistence node.
- Expand a service for which you want to purge an archived snapshot and select PersistenceCoordinator.
- From the Operations tab, enter the name of an archived snapshot in the field for the removeArchivedSnapshot operation.
- Click removeArchivedSnapshot. The archived snapshot is removed from the archive directory.
Parent topic: Performing Snapshot Archiving Operations
Listing Archived Snapshots
To get a list of the current archived snapshots:
- From the list of MBeans, select and expand the Persistence node.
- Expand a service for which you want to list archived snapshots and select PersistenceCoordinator.
- From the Operations tab, click the listArchivedSnapshots operation. A list of archived snapshots is returned.
Parent topic: Performing Snapshot Archiving Operations
Listing Archived Snapshot Stores
To list the individual stores, or parts, of an archived snapshot:
- From the list of MBeans, select and expand the Persistence node.
- Expand a service for which you want to list archived snapshot stores and select PersistenceCoordinator.
- From the Operations tab, enter the name of an archived snapshot in the field for the listArchivedSnapshotStores operation.
- Click listArchivedSnapshotStores. A list of stores for the archived snapshots is returned.
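The archive operations shown in the preceding sections can also be invoked programmatically over standard JMX. The following Java sketch is a minimal example under stated assumptions: the JMX service URL is hypothetical, the ObjectName pattern should be verified in your MBean browser, and the listArchivedSnapshots operation is assumed to return a String[].
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ArchiveOperationsExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint; adjust to your environment.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9000/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection server = connector.getMBeanServerConnection();
            ObjectName coordinator = new ObjectName(
                    "Coherence:type=Persistence,service=Service1,responsibility=PersistenceCoordinator");

            // List the archived snapshots (assumed to return String[]).
            String[] archived = (String[]) server.invoke(coordinator,
                    "listArchivedSnapshots", new Object[0], new String[0]);
            for (String name : archived) {
                System.out.println("Archived snapshot: " + name);
            }

            // Copy an archive back into the snapshot directory so that it can
            // then be recovered with the recoverSnapshot operation.
            server.invoke(coordinator, "retrieveArchivedSnapshot",
                    new Object[] {"nightly"}, new String[] {"java.lang.String"});
        }
    }
}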
Parent topic: Performing Snapshot Archiving Operations
Creating a Custom Snapshot Archiver
Custom snapshot archiver implementations can be created as required to store archives using a technique other than the default directory snapshot archiver implementation. For example, you may want to persist archives to an external database, use a web service to store archives to a storage area network, or store archives in a content repository.
This section includes the following topics:
- Create a Custom Snapshot Archiver Implementation
- Create a Custom Snapshot Archiver Definition
- Specifying a Custom Snapshot Archiver
Parent topic: Archiving Snapshots
Create a Custom Snapshot Archiver Implementation
To create a custom snapshot archiver implementation, create a class that extends the AbstractSnapshotArchiver class.
Parent topic: Creating a Custom Snapshot Archiver
Create a Custom Snapshot Archiver Definition
To create a custom snapshot archiver definition, include the <custom-archiver> element within the <snapshot-archivers> element and use the id attribute to provide a unique name for the definition. Add the <class-name> element within the <custom-archiver> element that contains the fully qualified name of the implementation class. The following example creates a definition for a custom implementation called MyCustomArchiver:
<snapshot-archivers>
   <custom-archiver id="custom1">
      <class-name>package.MyCustomArchiver</class-name>
   </custom-archiver>
</snapshot-archivers>
Use the <class-factory-name> element if your implementation uses a factory class that is responsible for creating archiver instances. Use the <method-name> element to specify the static factory method on the factory class that performs object instantiation. The following example gets a snapshot archiver instance using the getArchiver method on the MyArchiverFactory class.
<snapshot-archivers>
   <custom-archiver id="custom1">
      <class-factory-name>package.MyArchiverFactory</class-factory-name>
      <method-name>getArchiver</method-name>
   </custom-archiver>
</snapshot-archivers>
Any initialization parameters that are required for an implementation can be specified using the <init-params> element. The following example sets the UserName parameter to Admin.
<snapshot-archivers>
   <custom-archiver id="custom1">
      <class-name>package.MyCustomArchiver</class-name>
      <init-params>
         <init-param>
            <param-name>UserName</param-name>
            <param-value>Admin</param-value>
         </init-param>
      </init-params>
   </custom-archiver>
</snapshot-archivers>
Parent topic: Creating a Custom Snapshot Archiver
Specifying a Custom Snapshot Archiver
To specify a custom snapshot archiver, edit the persistence definition within a distributed scheme and include the name of a custom snapshot archiver that is defined in the operational override configuration file. For example:
<distributed-scheme>
   <scheme-name>distributed</scheme-name>
   <service-name>Service1</service-name>
   <backing-map-scheme>
      <local-scheme/>
   </backing-map-scheme>
   <persistence>
      <archiver>custom1</archiver>
   </persistence>
   <autostart>true</autostart>
</distributed-scheme>
Parent topic: Creating a Custom Snapshot Archiver
Using Active Persistence Mode
This section includes the following topics:
- Enabling Active Persistence Mode
- Changing the Active Persistence Failure Response
- Changing the Partition Count When Using Active Persistence
Parent topic: Persisting Caches
Enabling Active Persistence Mode
Active persistence can be enabled for all services or for specific services. To enable active persistence for all services, set the coherence.distributed.persistence.mode system property to active. For example:
-Dcoherence.distributed.persistence.mode=active
The default value if no value is specified is on-demand, which enables on-demand persistence. The persistence coordinator can still be used in active persistence mode to take snapshots of a cache.
To enable active persistence for a specific service, modify a distributed scheme definition and include the <environment> element within the <persistence> element. Set the value of the <environment> element to default-active. For example:
<distributed-scheme>
   <scheme-name>distributed</scheme-name>
   <service-name>Service1</service-name>
   <backing-map-scheme>
      <local-scheme/>
   </backing-map-scheme>
   <persistence>
      <environment>default-active</environment>
   </persistence>
   <autostart>true</autostart>
</distributed-scheme>
The default value if no value is specified is default-on-demand, which enables on-demand persistence for the service.
Parent topic: Using Active Persistence Mode
Changing the Active Persistence Failure Response
You can change the way a partitioned cache service responds to possible persistence failures during active persistence operations. The default response is to immediately stop the service. This behavior is ideal if persistence is critical for the service (for example, a cache depends on persistence for data backup). However, if persistence is not critical, you can choose to let the service continue servicing requests.
To change the active persistence failure response for a service, edit the distributed scheme definition and include the <active-failure-mode> element within the <persistence> element and set the value to stop-persistence. If no value is specified, then the default value (stop-service) is automatically used. The following example changes the active persistence failure response to stop-persistence.
<distributed-scheme>
<scheme-name>distributed</scheme-name>
<service-name>Service1</service-name>
<backing-map-scheme>
<local-scheme/>
</backing-map-scheme>
<persistence>
<active-failure-mode>stop-persistence</active-failure-mode>
</persistence>
<autostart>true</autostart>
</distributed-scheme>
Parent topic: Using Active Persistence Mode
Changing the Partition Count When Using Active Persistence
The partition count cannot be changed when using active persistence. If you change a service's partition count, then on restart of the services all active data is moved to the persistence trash and must be recovered after the original partition count is restored. Data that is persisted can only be recovered to services that are running with the same partition count, or you can select one of the available workarounds. See Workarounds to Migrate a Persistent Service to a Different Partition Count.
Ensure that the partition count is not modified if active persistence is being used. If the partition count is changed, then a message similar to the following is displayed when the services are started:
<Warning> (thread=DistributedCache:DistributedCachePersistence, member=1): Failed to recover partition 0 from SafeBerkeleyDBStore(...); partition-count mismatch 501(persisted) != 277(service); reinstate persistent store from trash once validation errors have been resolved
The message indicates that the change in the partition-count is not supported and the current active data has been copied to the trash directory. To recover the data:
- Shut down the entire cluster.
- Remove the current active directory contents for the affected cluster and service on each cluster member.
- Copy (recursively) the contents of the trash directory for each service to the active directory.
- Restore the partition count to the original value.
- Restart the cluster.
Parent topic: Using Active Persistence Mode
Workarounds to Migrate a Persistent Service to a Different Partition Count
There are two options for migrating a persistent service to a different partition count:
- Using Coherence federation as a means to replicate data to a service with a different partition count.
- Defining a new persistent service and transferring the data manually.
For instructions to use the two options, see Using Federation and Using a New Service. Oracle recommends using federation because it ensures that data is migrated and available as quickly as possible.
Using Federation
If the existing persistent cache service is federated, migration is trivial, as illustrated in the following steps:
- Stop one of the destination clusters in the federation.
- Move the persistence directory to a backup storage.
- Create a new cache configuration that differs only in the partition count for the related service.
- Invoke the MBean operation replicateAll from the newly started cluster with the different partition count.
Now, all the data is persisted to the cluster with the target partition count.
Oracle recommends that you use the federated cache scheme to facilitate any future upgrade. However, if desired, you can also go back to the distributed scheme with the new partition count. Simply point the active persistence directory to the one that is created by BOSTON.
Using a New Service
If you do not want to use a federated service, you can define a new persistent service with the target partition count and transfer the data to it manually.
Using Asynchronous Persistence Mode
You can enable and configure asynchronous persistence mode to have storage servers persist data asynchronously.
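For example, assuming your Coherence version supports the asynchronous variant of active persistence, the same system property that selects active mode would select it (the active-async value is an assumption to verify against the documentation for your version):
-Dcoherence.distributed.persistence.mode=active-async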
Parent topic: Persisting Caches
Modifying the Pre-Defined Persistence Environments
Persistence uses a set of directories for storage. You can choose to use the default storage directories or change the directories as required.
This section includes the following topics:
- Overview of the Pre-Defined Persistence Environment
- Changing the Pre-Defined Persistence Directory
Parent topic: Persisting Caches
Overview of the Pre-Defined Persistence Environment
The operational deployment descriptor includes two pre-defined persistence environment definitions:
- default-active – used when active persistence is enabled.
- default-on-demand – used when on-demand persistence is enabled.
The operational override file or system properties are used to override the default settings of the pre-defined persistence environments. The pre-defined persistence environments have the following configuration:
<persistence-environments>
<persistence-environment id="default-active">
<persistence-mode>active</persistence-mode>
<active-directory
system-property="coherence.distributed.persistence.active.dir">
</active-directory>
<snapshot-directory
system-property="coherence.distributed.persistence.snapshot.dir">
</snapshot-directory>
<trash-directory
system-property="coherence.distributed.persistence.trash.dir">
</trash-directory>
</persistence-environment>
<persistence-environment id="default-on-demand">
<persistence-mode>on-demand</persistence-mode>
<active-directory
system-property="coherence.distributed.persistence.active.dir">
</active-directory>
<snapshot-directory
system-property="coherence.distributed.persistence.snapshot.dir">
</snapshot-directory>
<trash-directory
system-property="coherence.distributed.persistence.trash.dir">
</trash-directory>
</persistence-environment>
</persistence-environments>
Parent topic: Modifying the Pre-Defined Persistence Environments
Changing the Pre-Defined Persistence Directory
The pre-defined persistence environments use a base directory called coherence within the USER_HOME directory to save persistence files. The location includes directories for active persistence files, snapshot persistence files, and trash files. The locations can be changed to a different local directory or a shared directory on the network.
Note:
Persistence directories and files (including the meta.properties files) should never be manually edited. Editing the directories and files can lead to persistence errors.
To change the pre-defined location of persistence files, include the <active-directory>, <snapshot-directory>, and <trash-directory> elements that are each set to the respective directories where persistence files are saved. The following example modifies the pre-defined on-demand persistence environment and changes the location of all directories to the /persistence directory:
<persistence-environments>
   <persistence-environment id="default-on-demand">
      <active-directory system-property="coherence.distributed.persistence.active.dir">/persistence/active</active-directory>
      <snapshot-directory system-property="coherence.distributed.persistence.snapshot.dir">/persistence/snapshot</snapshot-directory>
      <trash-directory system-property="coherence.distributed.persistence.trash.dir">/persistence/trash</trash-directory>
   </persistence-environment>
</persistence-environments>
The following system properties are used to change the pre-defined location of the persistence files instead of using the operational override file:
-Dcoherence.distributed.persistence.active.dir=/persistence/active
-Dcoherence.distributed.persistence.snapshot.dir=/persistence/snapshot
-Dcoherence.distributed.persistence.trash.dir=/persistence/trash
Use the coherence.distributed.persistence.base.dir system property to change the default directory off the USER_HOME directory:
-Dcoherence.distributed.persistence.base.dir=persistence
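Putting the properties together, a cache server might be launched with persistence settings on the command line. This is an illustrative sketch: the directory path is hypothetical, while the main class is the standard com.tangosol.net.DefaultCacheServer:
java -Dcoherence.distributed.persistence.mode=active -Dcoherence.distributed.persistence.base.dir=/data/persistence -cp coherence.jar com.tangosol.net.DefaultCacheServer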
Parent topic: Modifying the Pre-Defined Persistence Environments
Creating Persistence Environments
You can choose to define and use multiple persistence environments to support different cache scenarios. Persistence environments are defined in the operational override configuration file and are referred to from a distributed scheme or paged-topic-scheme definition in the cache configuration file.
This section includes the following topics:
- Define a Persistence Environment
- Configure a Persistence Mode
- Configure Persistence Directories
- Configure a Cache Service to Use a Persistence Environment
Parent topic: Persisting Caches
Define a Persistence Environment
To define a persistence environment, include the <persistence-environments> element that contains a <persistence-environment> element. The <persistence-environment> element includes the configuration for a persistence environment. Use the id attribute to name the environment. The id attribute is used to refer to the persistence environment from a distributed scheme definition. The following example creates a persistence environment with the name environment1:
<persistence-environments>
<persistence-environment id="environment1">
<persistence-mode></persistence-mode>
<active-directory></active-directory>
<snapshot-directory></snapshot-directory>
<trash-directory></trash-directory>
</persistence-environment>
</persistence-environments>
Parent topic: Creating Persistence Environments
Configure a Persistence Mode
A persistence environment supports two persistence modes: on-demand and active. On-demand persistence requires the use of the persistence coordinator to persist and recover cache services. Active persistence automatically persists and recovers cache services. You can still use the persistence coordinator in active persistence mode to periodically persist a cache service.
To configure the persistence mode, include the <persistence-mode> element set to either on-demand or active. The default value if no value is specified is on-demand. The following example configures active persistence.
<persistence-environments>
   <persistence-environment id="environment1">
      <persistence-mode>active</persistence-mode>
      <active-directory></active-directory>
      <snapshot-directory></snapshot-directory>
      <trash-directory></trash-directory>
   </persistence-environment>
</persistence-environments>
Parent topic: Creating Persistence Environments
Configure Persistence Directories
A persistence environment saves cache service data to disk. The location can be configured as required and can be either on a local drive or on a shared network drive. When configuring a local drive, only the partitions that are owned by a cache server are persisted to the respective local disk. When configuring a shared network drive, all partitions are persisted to the same shared disk.
Note:
- Persistence directories and files (including the meta.properties files) should never be manually edited. Editing the directories and files can lead to persistence errors.
- If persistence is configured to use an NFS mounted file system, then the NFS mount should be configured to use synchronous IO and not asynchronous IO, which is the default on many operating systems. The use of asynchronous IO can lead to data loss if the file system becomes unresponsive due to an outage. For details on configuration, refer to the mount documentation for your operating system.
Different directories are used for active, snapshot, and trash files and are named accordingly. Only the top-level directory must be specified. To configure persistence directories, include the <active-directory>, <snapshot-directory>, and <trash-directory> elements that are each set to a directory path where persistence files are saved. The default value if no value is specified is the USER_HOME directory. The following example configures the /env1 directory for all persistence files:
<persistence-environments>
   <persistence-environment id="environment1">
      <persistence-mode>on-demand</persistence-mode>
      <active-directory>/env1</active-directory>
      <snapshot-directory>/env1</snapshot-directory>
      <trash-directory>/env1</trash-directory>
   </persistence-environment>
</persistence-environments>
Parent topic: Creating Persistence Environments
Configure a Cache Service to Use a Persistence Environment
To change the persistence environment used by a cache service, modify the distributed scheme definition and include the <environment> element within the <persistence> element. Set the value of the <environment> element to the name of a persistence environment that is defined in the operational override configuration file. For example:
<distributed-scheme>
   <scheme-name>distributed</scheme-name>
   <service-name>Service1</service-name>
   <backing-map-scheme>
      <local-scheme/>
   </backing-map-scheme>
   <persistence>
      <environment>environment1</environment>
   </persistence>
   <autostart>true</autostart>
</distributed-scheme>
Parent topic: Creating Persistence Environments
Using Quorum for Persistence Recovery
Coherence includes a quorum policy that enables Coherence to defer recovery until a suitable point. Suitability is based on availability of all partitions and sufficient capacity to initiate recovery across storage members.
This section includes the following topics:
- Overview of Persistence Recovery Quorum
- Using the Dynamic Recovery Quorum Policy
- Explicit Persistence Quorum Configuration
Parent topic: Persisting Caches
Overview of Persistence Recovery Quorum
The partitioned cache recover quorum uses two inputs to determine whether the requirements of partition availability and storage capacity are met to commence persistence recovery. The partition availability requirement is met through a list of recovery host names (machines that contain the persistent stores), while the storage capacity requirement is met by specifying the number of storage nodes. With the dynamic recovery quorum policy, both of these inputs are inferred from the 'last known good' state of the service; this is the recommended approach for configuring a recovery quorum. If the rules used by the dynamic recovery quorum policy are insufficient, you can explicitly specify the recovery-hosts list and the recover-quorum. The use of the quorum allows time for a cluster to start and ensures that partitions are recovered gracefully without overloading too few storage members or inadvertently deleting orphaned partitions.
If the recover quorum is not satisfied, then persistence recovery does not proceed and the service or cluster may appear to be blocked. To check for this scenario, view the QuorumPolicy attribute in the ServiceMBean MBean to see if recover is included in the list of actions. If data has not been recovered after cluster startup, the following log message is emitted (each time a new service member starts up) to indicate that the quorum has not been satisfied:
<Warning> (thread=DistributedCache:DistributedCachePersistence, member=1): Action recover disallowed; all-disallowed-actions: recover(4)
After the quorum is satisfied, the following message is emitted:
<Warning> (thread=DistributedCache:DistributedCachePersistence, member=1): All actions allowed
For active persistence, the recover quorum is enabled by default and automatically uses the dynamic recovery quorum policy. See Using the Dynamic Recovery Quorum Policy.
For general details about partitioned cache quorums, see Using the Partitioned Cache Quorums in Developing Applications with Oracle Coherence.
Parent topic: Using Quorum for Persistence Recovery
Using the Dynamic Recovery Quorum Policy
The dynamic recovery quorum policy is used with active persistence and automatically configures the persistence recovery quorum based on a predefined algorithm. The dynamic recovery quorum policy is the default quorum policy for active persistence mode and does not need to be explicitly enabled. The policy is automatically used if either the <recover-quorum> value is not specified or if the value is set to 0. The following example explicitly enables the dynamic recovery quorum policy and is provided here for clarity.
<distributed-scheme>
   <scheme-name>distributed</scheme-name>
   <service-name>Service1</service-name>
   <backing-map-scheme>
      <local-scheme/>
   </backing-map-scheme>
   <partitioned-quorum-policy-scheme>
      <recover-quorum>0</recover-quorum>
   </partitioned-quorum-policy-scheme>
   <autostart>true</autostart>
</distributed-scheme>
Note:
When using the dynamic recovery quorum policy, the <recovery-hosts> element should not be used within the <partitioned-quorum-policy-scheme> element. All other quorum policies (for example, the read quorum policy) are still valid.
Understanding the Dynamic Recovery Algorithm
The dynamic recovery quorum policy works by recording cluster membership information each time a member joins the cluster and partition distribution stabilizes. Membership is only recorded if the service is not suspended and all other partitioned cache actions (such as read, write, restore, and distribute) are allowed by the policy. JMX notifications are sent to subscribers of the PersistenceManagerMBean MBean every time the cluster membership changes.
During recovery scenarios, a service only recovers data if the following conditions are satisfied:
- The persistent image of all partitions is accessible by the cluster members.
- The number of storage-enabled nodes is at least 2/3 of the last recorded membership.
- If the persistent data is being stored to a local disk (not shared and visible by all hosts), then there should be at least 2/3 of the number of members for each host as there was when the last membership was recorded.
The partitioned cache service blocks any client-side requests if any of the conditions are not satisfied. However, if an administrator determines that full recovery is impossible due to missing partitions, or that starting the number of servers that is expected by the quorum is unnecessary, then the recovery can be forced by invoking the forceRecovery operation on the PersistenceManagerMBean MBean.
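The forceRecovery operation can be invoked over standard JMX. The following Java sketch is a minimal example under stated assumptions: the JMX service URL is hypothetical, the ObjectName pattern should be verified in your MBean browser, and the operation is assumed to take no arguments.
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceRecoveryExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint; adjust to your environment.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9000/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection server = connector.getMBeanServerConnection();
            ObjectName coordinator = new ObjectName(
                    "Coherence:type=Persistence,service=Service1,responsibility=PersistenceCoordinator");
            // Force recovery to proceed even though the recover quorum is not met.
            server.invoke(coordinator, "forceRecovery", new Object[0], new String[0]);
        }
    }
}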
The recovery algorithm can be overridden by using a custom quorum policy class that extends the com.tangosol.net.ConfigurableQuorumPolicy.PartitionedCacheQuorumPolicy class. To change the hard-coded 2/3 ratio, override the PartitionedCacheQuorumPolicy.calculateMinThreshold method. See Using Custom Action Policies in Developing Applications with Oracle Coherence.
Parent topic: Using Quorum for Persistence Recovery
Explicit Persistence Quorum Configuration
To configure the recover quorum for persistence, modify a distributed scheme definition and include the <recover-quorum> element within the <partitioned-quorum-policy-scheme> element. Set the <recover-quorum> element value to the number of storage members that must be available before recovery starts. For example:
<distributed-scheme>
   <scheme-name>distributed</scheme-name>
   <service-name>Service1</service-name>
   <backing-map-scheme>
      <local-scheme/>
   </backing-map-scheme>
   <partitioned-quorum-policy-scheme>
      <recover-quorum>2</recover-quorum>
   </partitioned-quorum-policy-scheme>
   <autostart>true</autostart>
</distributed-scheme>
Note:
In active persistence mode, setting the <recover-quorum> element to 0 enables the dynamic recovery quorum policy. See Using the Dynamic Recovery Quorum Policy.
In shared disk scenarios, all partitions are persisted and recovered from a single location. For local-disk scenarios, each storage member recovers its partitions from a local disk. However, if you use a non-dynamic recovery quorum with local-disk based storage, you must define a list of storage-enabled hosts in the cluster that are required to recover orphaned partitions from the persistent storage; otherwise, empty partitions will be assigned.
Note:
Recovery hosts must be specified to ensure that recovery does not commence prior to all persisted state being available.
To define a list of addresses, edit the operational override configuration file and include the <address-provider> element that contains a list of addresses each defined using an <address> element. Use the id attribute to name the address provider list. The id attribute is used to refer to the list from a distributed scheme definition. The following example creates an address provider list that contains two member addresses and is named persistence_hosts:
<address-providers>
   <address-provider id="persistence_hosts">
      <address>HOST_NAME1</address>
      <address>HOST_NAME2</address>
   </address-provider>
</address-providers>
To refer to the address provider list, modify a distributed scheme definition and include the <recovery-hosts> element within the <partitioned-quorum-policy-scheme> element and set the value to the name of an address provider list. For example:
<distributed-scheme>
<scheme-name>distributed</scheme-name>
<service-name>Service1</service-name>
<backing-map-scheme>
<local-scheme/>
</backing-map-scheme>
<partitioned-quorum-policy-scheme>
<recover-quorum>2</recover-quorum>
<recovery-hosts>persistence_hosts</recovery-hosts>
</partitioned-quorum-policy-scheme>
<autostart>true</autostart>
</distributed-scheme>
Parent topic: Using Quorum for Persistence Recovery
Subscribing to Persistence JMX Notifications
The PersistenceManagerMBean MBean includes a set of notification types that applications can use to monitor persistence operations. See PersistenceManagerMBean in Managing Oracle Coherence.
To subscribe to persistence JMX notifications, implement the JMX NotificationListener interface and register the listener. The following code snippet demonstrates registering a notification listener. Refer to the Coherence examples for the complete example, which includes a sample listener implementation.
...
MBeanServer server = MBeanHelper.findMBeanServer();
Registry registry = cluster.getManagement();
try
    {
    for (String sServiceName : setServices)
        {
        logHeader("Registering listener for " + sServiceName);
        String sMBeanName = getMBeanName(sServiceName);
        ObjectName oBeanName = new ObjectName(sMBeanName);
        NotificationListener listener = new PersistenceNotificationListener(sServiceName);
        server.addNotificationListener(oBeanName, listener, null, null);
        }
...
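The sample listener implementation itself is not included in this excerpt. The following Java sketch shows the general shape such a listener might take; the class name PersistenceNotificationListener matches the snippet above, but the body is an illustrative assumption rather than the actual Coherence example.
import javax.management.Notification;
import javax.management.NotificationListener;

// Hypothetical listener; the real Coherence example may differ.
public class PersistenceNotificationListener implements NotificationListener {
    private final String serviceName;

    public PersistenceNotificationListener(String serviceName) {
        this.serviceName = serviceName;
    }

    @Override
    public void handleNotification(Notification notification, Object handback) {
        // Each persistence operation emits typed notifications; log them.
        System.out.println(serviceName + ": type=" + notification.getType()
                + ", message=" + notification.getMessage());
    }
}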
Parent topic: Persisting Caches
Managing Persistence
This section includes the following topics:
- Plan for Persistence Storage
- Plan for Persistence Memory Overhead
- Monitor Persistence Storage Usage
- Monitoring Persistence Latencies
Parent topic: Persisting Caches
Plan for Persistence Storage
An adequate amount of disk space is required to persist data. Ensure enough space is provisioned to persist the expected amount of cached data. The following guidelines should be used when sizing disks for persistence:
- The approximate overhead for active persistence data storage is an extra 10-30% per partition. The actual overhead may vary depending upon data access patterns, the size of keys and values, and other factors such as block sizes and heavy system load.
- Use the Coherence VisualVM plug-in and persistence reports to monitor space availability and usage. See Monitor Persistence Storage Usage. Specifically, use the PersistenceActiveSpaceUsed attribute on the ServiceMBean MBean to monitor the actual persistence space used for each service and node.
- Persistence configurations that use a shared disk for storage should plan for the potential maximum size of the cache because all partitions are persisted to the same location. For example, if the maximum capacity of a cache is 8GB, then the shared disk must be able to accommodate at least 8GB of persisted data plus overhead.
- Persistence configurations that use a local disk for storage should plan for the potential maximum cache capacity of the cache server because only the partitions owned by a cache server are persisted to the local disk. For example, if the maximum cache capacity of a cache server is 2GB, then the local disk must be able to accommodate at least 2GB of persisted data plus overhead.
- Plan additional space when creating snapshots in either active or on-demand mode. Each snapshot of a cache duplicates the size of the persistence files on disk.
- Plan additional space for snapshot archives. Each archive of a snapshot is slightly less than the size of the snapshot files on disk.
- The larger the partition count, the more memory overhead a single member may observe. Each partition consumes at a minimum about 80KB of heap for the underlying Berkeley DB (BDB) implementation to maintain its context. For example, a partition count of 24,000 works out to roughly 24,000 × 80KB ≈ 1.9GB, which can easily consume the entire heap of a JVM configured for 2GB for the senior member in dynamic quorum mode.
  To alleviate this issue, a recovery quorum of at least 75% of the machines may be used, so that recovery only takes place when enough members are running and ready to be balanced. In this case, the partitions are evenly spread across all of the storage-enabled members, as opposed to being concentrated on a single member and only then distributed.
  Another way to alleviate the issue is to reduce the partition count whenever possible.
Note:
The underlying Berkeley DB (BDB) storage that is used for persistence is append-only. Insertions, deletions, and updates are always added at the end of the current file. The first file is named 00000000.jdb. When this file grows to a certain size (10MB, by default), a new file named 00000001.jdb becomes the current file, and so on. As the files reach the 10MB limit and roll over to a new file, background tasks run automatically to clean up, compress, and remove the unused files. As a result, you will have at least 10MB of storage overhead per service per partition. This overhead is taken into account in the 10-30% figure quoted above.
Parent topic: Managing Persistence
Plan for Persistence Memory Overhead
In addition to affecting disk usage, active persistence requires additional data structures and memory within each JVM's heap for managing persistence. The amount of memory required varies based on the partition-count and data usage patterns but as a guide, you should allocate an additional 20-35% of memory to each JVM that is running active persistence.
Parent topic: Managing Persistence
Monitor Persistence Storage Usage
Monitor persistence storage to ensure that there is enough space available on the file system to persist cached data.
Coherence VisualVM Plug-in
Use the Persistence tab in the Coherence VisualVM plug-in to view the amount of space being used by a service for active persistence. The space is reported in both Bytes and Megabytes. The tab also reports the current number of snapshots available for a service. The snapshot number can be used to estimate the additional space usage and to determine whether snapshots should be deleted to free up space.
Coherence Reports
Use the persistence detail report (persistence-detail.txt) to view the amount of space being used by a service for both active persistence and for persistence snapshots. The amount of available disk space is also reported and allows you to monitor if a disk is reaching capacity.
Coherence MBeans
Use the persistence attributes on the ServiceMBean MBean to view all the persistence storage statistics for a service. The MBean includes statistics for both active persistence and persistence snapshots.
Parent topic: Managing Persistence
Monitoring Persistence Latencies
Monitor persistence latencies when using active persistence to ensure that persistence operations are not adversely affecting cache operations. High latencies can be a sign that network issues are delaying writing persistence files to a shared disk or delaying coordination between local disks.
Coherence VisualVM Plug-In
Use the Persistence tab in the Coherence VisualVM plug-in to view the amount of latency that persistence operations are adding to cache operations. The time is reported in milliseconds. Statistics are reported for each service and provide the average latency of all persistence operations and for the highest recorded latency.
Coherence Reports
Use the persistence detail report (persistence-detail.txt) to view the amount of latency that persistence operations are adding to cache operations. The time is reported in milliseconds. Statistics are provided for the average latency of all persistence operations and for the highest recorded latency on each cluster node of a service. The statistics can be used to determine if some nodes are experiencing higher latencies than other nodes.
Coherence MBeans
Use the persistence attributes on the ServiceMBean MBean to view the amount of latency that persistence operations are adding to cache operations. The time is reported in milliseconds. Statistics are provided for the average latency of all persistence operations and for the highest recorded latency on each cluster node of a service. The statistics can be used to determine if some nodes are experiencing higher latencies than other nodes.
Parent topic: Managing Persistence
Configuring Caches as Transient
Caches that do not require persistence can be configured as transient. Caches that are transient are not recovered during persistence recovery operations.
Note:
During persistence recovery operations, the entire cache service is recovered from the persisted state and any caches that are configured as transient are reset.
Caches are configured as transient using the <transient> element within the <backing-map-scheme> element of a distributed scheme definition. However, because persistence is always enabled on a service, a parameter macro is used to configure the transient setting for each cache. For example:
<caching-scheme-mapping>
   <cache-mapping>
      <cache-name>nonPersistedCache</cache-name>
      <scheme-name>distributed</scheme-name>
      <init-params>
         <init-param>
            <param-name>transient</param-name>
            <param-value>true</param-value>
         </init-param>
      </init-params>
   </cache-mapping>
   <cache-mapping>
      <cache-name>persistedCache</cache-name>
      <scheme-name>distributed</scheme-name>
   </cache-mapping>
</caching-scheme-mapping>

<distributed-scheme>
   <scheme-name>distributed</scheme-name>
   <service-name>DistributedService</service-name>
   <backing-map-scheme>
      <transient>{transient false}</transient>
      <local-scheme/>
   </backing-map-scheme>
   <autostart>true</autostart>
</distributed-scheme>
Note:
The default value of the <transient> element is false and indicates that cache data is persisted.
Parent topic: Persisting Caches