17 Using Quorum

Coherence provides quorum policies that control when specific service actions are allowed in a cluster in order to ensure that a cluster is adequately provisioned.

This chapter includes the following sections:

  • Overview of Using Quorum

  • Using the Cluster Quorum

  • Using the Partitioned Cache Quorums

  • Using the Proxy Quorum

  • Using Custom Action Policies

Overview of Using Quorum

A quorum, in Coherence, refers to the minimum number of service members that are required in a cluster before a service action is allowed or disallowed. Quorums are beneficial because they automatically provide assurances that a cluster behaves in an expected way when member thresholds are reached. For example, a partitioned cache backup quorum might require at least 5 storage-enabled members before the partitioned cache service is allowed to back up partitions.

Quorums are service-specific and defined within a quorum policy; there is a cluster quorum policy for the Cluster service, a partitioned quorum policy for the Partitioned Cache service, and a proxy quorum policy for the Proxy service. Quorum thresholds are set on the policy using a cache configuration file; the cluster quorum is the exception and is set in an operational override file.

Each quorum provides benefits for its particular service. However, in general, quorums:

  • control service behavior at different service member levels

  • mandate the minimum service member levels that are required for service operations

  • ensure an optimal cluster and cache environment for a particular application or solution

Using the Cluster Quorum

The cluster quorum policy ensures a cluster always contains a configured number of cluster members.

This section includes the following topics:

  • Overview of the Cluster Quorum Policy

  • Configuring the Cluster Quorum Policy

Overview of the Cluster Quorum Policy

The cluster quorum policy defines quorums for the Cluster service that mandate when a cluster can terminate suspect members. A member is considered suspect if it has not responded to network communications and is in imminent danger of being disconnected from the cluster. These quorums can be specified generically across all cluster members or constrained to members that have a specific role in the cluster, such as client or server members (see the <member-identity> element).

  • Timeout Site Quorum – The timeout site quorum mandates the minimum number of sites that must remain in the cluster when the cluster service is terminating suspect members.

  • Timeout Machine Quorum – The timeout machine quorum mandates the minimum number of cluster machines that must remain in the cluster when the cluster service is terminating suspect members.

  • Timeout Survivor Quorum – The timeout survivor quorum mandates the minimum number of cluster members that must remain in the cluster when the cluster service is terminating suspect members.

These quorums are typically used in environments where network performance varies. For example, intermittent network outages may cause a high number of cluster members to be removed from the cluster. Using these quorums, a certain number of members are maintained during the outage and are available when the network recovers. This behavior also minimizes the manual intervention required to restart machines or members. Naturally, requests that require cooperation from the non-responding members cannot complete and are either blocked for the duration of the outage or time out.

Configuring the Cluster Quorum Policy

Cluster quorums are configured in an operational override file within the <cluster-quorum-policy> element. Each quorum can optionally use a role attribute. The following example configures all three quorums and ensures that at least 2 sites, 3 machines, and 5 cluster members are always kept in the cluster while suspect members are removed. In addition, the quorums apply only to members that have the server role.

<cluster-config>
   <member-identity>
      <role-name>server</role-name>
   </member-identity>
   <cluster-quorum-policy>
      <timeout-site-quorum role="server">2</timeout-site-quorum>
      <timeout-machine-quorum role="server">3</timeout-machine-quorum>
      <timeout-survivor-quorum role="server">5</timeout-survivor-quorum>
   </cluster-quorum-policy>
</cluster-config>

Using the Partitioned Cache Quorums

The partitioned cache quorum policy ensures that a partitioned cache service contains a configured number of members to perform service operations.

This section includes the following topics:

  • Overview of the Partitioned Cache Quorum Policy

  • Configuring the Partitioned Cache Quorum Policy

Overview of the Partitioned Cache Quorum Policy

The partitioned cache quorum policy defines quorums for the partitioned cache service (DistributedCache) that mandate how many service members are required before different partitioned cache service operations can be performed:

  • Distribution Quorum – This quorum mandates the minimum number of storage-enabled members of a partitioned cache service that must be present before the partitioned cache service is allowed to perform partition distribution.

  • Restore Quorum – This quorum mandates the minimum number of storage-enabled members of a partitioned cache service that must be present before the partitioned cache service is allowed to restore lost primary partitions from backup.

  • Read Quorum – This quorum mandates the minimum number of storage-enabled members of a partitioned cache service that must be present to process read requests. A read request is any request that does not mutate the state or contents of a cache.

  • Write Quorum – This quorum mandates the minimum number of storage-enabled members of a partitioned cache service that must be present to process write requests. A write request is any request that may mutate the state or contents of a cache.

  • Recover Quorum – This quorum mandates the minimum number of storage-enabled members of a partitioned cache service that must be present to recover orphaned partitions from the persistent storage, or assign empty partitions if the persistent storage is unavailable or lost. The members must be defined in a host-address list that can be referenced as part of the quorum definition.

    Note:

    The Recover Quorum supports a value of zero. A value of zero enables the dynamic recovery policy, which ensures availability of all persisted state and uses the last good cluster membership information to determine how many members must be present for the recovery. See Using Quorum for Persistence Recovery in Administering Oracle Coherence.

These quorums are typically used to indicate at what service member levels different service operations are best performed given the intended use and requirements of a distributed cache. For example, a small distributed cache may only require three storage-enabled members to adequately store data and handle projected request volumes, while a large distributed cache may require 10 or more storage-enabled members for the same purpose. Optimal member levels are tested during development and then set accordingly to ensure that the minimum service member levels are provisioned in a production environment.

If the number of storage-enabled nodes running the service drops below the configured read or write quorum, the corresponding client operations are rejected and a com.tangosol.net.RequestPolicyException is thrown. If the number of storage-enabled nodes drops below the configured distribution quorum, some data may become "endangered" (have no backup) until the quorum is reached. Dropping below the restore quorum may cause some operations to be blocked until the quorum is reached or to be timed out.
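Client code can anticipate this behavior by handling the exception. The following is a minimal sketch of a client that catches RequestPolicyException when a write request is rejected; the cache name and the recovery strategy are illustrative assumptions, not part of the quorum feature itself.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.RequestPolicyException;

public class QuorumAwareClient
   {
   public static void main(String[] asArgs)
      {
      // The cache name is an assumption; it must map to the quorum-protected service
      NamedCache cache = CacheFactory.getCache("example");
      try
         {
         // A write request; rejected if the write quorum is not currently met
         cache.put("key", "value");
         }
      catch (RequestPolicyException e)
         {
         // The quorum policy vetoed the request; back off and retry later
         System.err.println("Write rejected by quorum policy: " + e.getMessage());
         }
      }
   }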

Configuring the Partitioned Cache Quorum Policy

Partitioned cache quorums are configured in a cache configuration file within the <partitioned-quorum-policy-scheme> element. The element must be used within a <distributed-scheme> element. The following example demonstrates configuring thresholds for the partitioned cache quorums. Ideally, the threshold values would indicate the minimum number of service members that are required to perform each operation.

<distributed-scheme>
   <scheme-name>partitioned-cache-with-quorum</scheme-name>
   <service-name>PartitionedCacheWithQuorum</service-name>
   <backing-map-scheme>
      <local-scheme/>
   </backing-map-scheme>
   <partitioned-quorum-policy-scheme>
      <distribution-quorum>4</distribution-quorum>
      <restore-quorum>3</restore-quorum>
      <read-quorum>3</read-quorum>
      <write-quorum>5</write-quorum>
      <recover-quorum>2</recover-quorum>
      <recovery-hosts>persistence-host-list</recovery-hosts>
   </partitioned-quorum-policy-scheme>
   <autostart>true</autostart>
</distributed-scheme>

The <partitioned-quorum-policy-scheme> element does not support scheme references. However, you can reuse the policy by referencing the <distributed-scheme> that contains it, as shown in the following example:

<?xml version="1.0"?>
<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
              xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config coherence-cache-config.xsd">
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>*</cache-name>
      <scheme-name>dist-example</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <distributed-scheme>
      <scheme-name>dist-example</scheme-name>
      <scheme-ref>partitioned-cache-with-quorum</scheme-ref>
      <service-name>DistExample</service-name>
      <backing-map-scheme>
        <local-scheme>
          <high-units>100M</high-units>
        </local-scheme>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
    <distributed-scheme>
      <scheme-name>partitioned-cache-with-quorum</scheme-name>
      <service-name>PartitionedCacheWithQuorum</service-name>
      <partitioned-quorum-policy-scheme>
        <scheme-name>partitioned-cache-quorum</scheme-name>
        <distribution-quorum>4</distribution-quorum>
        <restore-quorum>3</restore-quorum>
        <read-quorum>3</read-quorum>
        <write-quorum>5</write-quorum>
      </partitioned-quorum-policy-scheme>
    </distributed-scheme>
  </caching-schemes>
</cache-config>

Using the Proxy Quorum

The proxy quorum policy ensures a cluster always contains enough proxies to service client requests.

This section includes the following topics:

  • Overview of the Proxy Quorum Policy

  • Configuring the Proxy Quorum Policy

Overview of the Proxy Quorum Policy

The proxy quorum policy defines a single quorum (the connection quorum) for the proxy service. The connection quorum mandates the minimum number of proxy service members that must be available before the proxy service can allow client connections.

This quorum is typically used to ensure enough proxy service members are available to optimally support a given set of TCP clients. For example, a small number of clients may efficiently connect to a cluster using two proxy services, while a large number of clients may require three or more proxy services to connect efficiently. Optimal levels are tested during development and then set accordingly to ensure that the minimum service member levels are provisioned in a production environment.

Configuring the Proxy Quorum Policy

The connection quorum threshold is configured in a cache configuration file within the <proxy-quorum-policy-scheme> element. The element must be used within a <proxy-scheme> element. The following example demonstrates configuring the connection quorum threshold to ensure that 3 proxy service members are present in the cluster before the proxy service is allowed to accept TCP client connections:

<proxy-scheme>
   <scheme-name>proxy-with-quorum</scheme-name>
   <service-name>TcpProxyService</service-name>
   <acceptor-config>
      <tcp-acceptor>
         <local-address>
            <address>localhost</address>
            <port>32000</port>
         </local-address>
      </tcp-acceptor>
   </acceptor-config>
   <proxy-quorum-policy-scheme>
      <connect-quorum>3</connect-quorum>
   </proxy-quorum-policy-scheme>
   <autostart>true</autostart>
</proxy-scheme>

The <proxy-quorum-policy-scheme> element also supports the use of scheme references. In the following example, a <proxy-quorum-policy-scheme> with the name proxy-quorum is referenced from within the <proxy-scheme> element:

<proxy-scheme>
   <scheme-name>proxy-with-quorum</scheme-name>
   <service-name>TcpProxyService</service-name>
   ...
   <proxy-quorum-policy-scheme>
      <scheme-ref>proxy-quorum</scheme-ref>
   </proxy-quorum-policy-scheme>
   <autostart>true</autostart>
</proxy-scheme>
<proxy-scheme>
   <scheme-name>proxy-example</scheme-name>
   <service-name>TcpProxyService</service-name>
   ...
   <proxy-quorum-policy-scheme>
      <scheme-name>proxy-quorum</scheme-name>
      <connect-quorum>3</connect-quorum>
   </proxy-quorum-policy-scheme>
   <autostart>true</autostart>
</proxy-scheme>

Using Custom Action Policies

Custom action policies can be used instead of the default quorum policies for the Cluster service, Partitioned Cache service, and Proxy service. Custom action policies must implement the com.tangosol.net.ActionPolicy interface.

This section includes the following topics:

  • Enabling Custom Action Policies

  • Enabling the Custom Failover Access Policy

Enabling Custom Action Policies

To enable a custom policy, add a <class-name> element within a quorum policy scheme element that contains the fully qualified name of the implementation class. The following example adds a custom action policy to the partitioned quorum policy for a distributed cache scheme definition:

<distributed-scheme>
   <scheme-name>partitioned-cache-with-quorum</scheme-name>
   <service-name>PartitionedCacheWithQuorum</service-name>
   <backing-map-scheme>
     <local-scheme/>
   </backing-map-scheme>
   <partitioned-quorum-policy-scheme>
      <class-name>package.MyCustomAction</class-name>
   </partitioned-quorum-policy-scheme>
   <autostart>true</autostart>
</distributed-scheme>
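For illustration, the following is a minimal sketch of what an implementation such as package.MyCustomAction might look like; the package name, the member-count check, and the threshold of three members are assumptions made for this example, not requirements of the ActionPolicy interface.

package com.example.policy; // hypothetical package for the class configured above

import com.tangosol.net.Action;
import com.tangosol.net.ActionPolicy;
import com.tangosol.net.Service;

// A sketch of a custom action policy that permits service actions only
// while the service has a minimum number of members
public class MyCustomAction
      implements ActionPolicy
   {
   @Override
   public void init(Service service)
      {
      // Called once when the policy is associated with the service;
      // no initialization is required for this sketch
      }

   @Override
   public boolean isAllowed(Service service, Action action)
      {
      // Allow the requested action only when at least three members
      // (an illustrative threshold) are running the service
      return service.getInfo().getServiceMembers().size() >= 3;
      }
   }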

As an alternative, a factory class can create custom action policy instances. To define a factory class, use the <class-factory-name> element to enter the fully qualified class name and the <method-name> element to specify the name of a static factory method on the factory class that performs object instantiation. For example:

<distributed-scheme>
   <scheme-name>partitioned-cache-with-quorum</scheme-name>
   <service-name>PartitionedCacheWithQuorum</service-name>
   <backing-map-scheme>
     <local-scheme/>
   </backing-map-scheme>
   <partitioned-quorum-policy-scheme>
   <class-factory-name>package.MyFactory</class-factory-name>
      <method-name>createPolicy</method-name>
   </partitioned-quorum-policy-scheme>
   <autostart>true</autostart>
</distributed-scheme>
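A matching factory is also straightforward. The following sketch simply mirrors the <class-factory-name> and <method-name> values configured above; MyCustomAction refers to the illustrative policy sketched earlier, and the package name is again a hypothetical placeholder.

package com.example.policy; // hypothetical package for the factory configured above

import com.tangosol.net.ActionPolicy;

// A sketch of a factory class whose static createPolicy method, named by the
// <method-name> element, returns custom policy instances
public class MyFactory
   {
   public static ActionPolicy createPolicy()
      {
      // Instantiate the illustrative policy sketched earlier
      return new MyCustomAction();
      }
   }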

Enabling the Custom Failover Access Policy

Coherence provides a pre-defined custom action policy that moderates client request load during a failover event in order to allow cache servers adequate opportunity to re-establish partition backups. Use this policy in situations where a heavy load of high-latency requests may prevent, or significantly delay, cache servers from successfully acquiring exclusive access to partitions needing to be transferred or backed up.

To enable the custom failover access policy, add a <class-name> element within a <partitioned-quorum-policy-scheme> element that contains the fully qualified name of the failover access policy (com.tangosol.net.partition.FailoverAccessPolicy). The policy accepts the following parameters:

  • cThresholdMillis – Specifies the delay before the policy should start holding requests (after becoming endangered). The default value is 5000 milliseconds.

  • cLimitMillis – Specifies the delay before the policy makes a maximal effort to hold requests (after becoming endangered). The default value is 60000 milliseconds.

  • cMaxDelayMillis – Specifies the maximum amount of time to hold a request. The default value is 5000 milliseconds.

The following example enables the custom failover access policy and sets each of the parameters:

<distributed-scheme>
   <scheme-name>partitioned-cache-with-quorum</scheme-name>
   <service-name>PartitionedCacheWithQuorum</service-name>
   <backing-map-scheme>
     <local-scheme/>
   </backing-map-scheme>
   <partitioned-quorum-policy-scheme>
   <class-name>com.tangosol.net.partition.FailoverAccessPolicy</class-name>
      <init-params>
         <init-param>
            <param-name>cThresholdMillis</param-name>
            <param-value>7000</param-value>
         </init-param>
         <init-param>
            <param-name>cLimitMillis</param-name>
            <param-value>30000</param-value>
         </init-param>
         <init-param>
            <param-name>cMaxDelayMillis</param-name>
            <param-value>2000</param-value>
         </init-param>
      </init-params>
   </partitioned-quorum-policy-scheme>
   <autostart>true</autostart>
</distributed-scheme>