Group Management Service

The Group Management Service (GMS) is an infrastructure component that is enabled for the instances in a cluster. When GMS is enabled and a clustered instance fails, the cluster and the Domain Administration Server (DAS) are aware of the failure and can take action. Many features of GlassFish Server depend upon GMS. For example, GMS is used by the in-memory session replication, transaction service, and timer service features.

If server instances in a cluster are located on different machines, ensure that all the server instance machines and the DAS machine are on the same subnet and that multicast is enabled for the network. To test whether multicast is enabled, use the validate-multicast(1) subcommand.

GMS is a core service of the Shoal framework. For more information about Shoal, visit the Project Shoal home page.

The following topics are addressed here:

GMS Configuration Settings
Dotted Names for GMS Settings
To Preconfigure Nondefault GMS Configuration Settings
To Change GMS Settings After Cluster Creation
To Check the Health of Instances in a Cluster
To Validate That Multicast Transport Is Available for a Cluster
Using the Multi-Homing Feature With GMS

GMS Configuration Settings

GlassFish Server has two types of GMS settings:

GMS cluster settings — These are determined during cluster creation. For more information about these settings, see To Create a Cluster.
GMS configuration settings — These are determined during configuration creation and are explained here.

The following GMS configuration settings are used in GMS for group discovery and failure detection:

group-discovery-timeout-in-millis

Indicates the amount of time (in milliseconds) an instance's GMS module will wait during instance startup for discovering other members of the group.

The group-discovery-timeout-in-millis timeout value should be set to the default or higher. The default is 5000.

max-missed-heartbeats

Indicates the maximum number of missed heartbeats that the health monitor counts before the instance can be marked as a suspected failure. GMS also tries to make a peer-to-peer connection with the suspected member. If the maximum number of missed heartbeats is exceeded and peer-to-peer connection fails, the member is marked as a suspected failure. The default is 3.

heartbeat-frequency-in-millis

Indicates the frequency (in milliseconds) at which a heartbeat is sent by each server instance to the cluster.

The failure detection interval is the max-missed-heartbeats multiplied by the heartbeat-frequency-in-millis. Therefore, the combination of defaults, 3 multiplied by 2000 milliseconds, results in a failure detection interval of 6 seconds.

Lowering the value of heartbeat-frequency-in-millis below the default results in more frequent heartbeat messages being sent out from each member. This could potentially result in more heartbeat messages in the network than a system needs for triggering failure detection protocols. Whether this cost is acceptable depends on how quickly the deployment environment needs failures to be detected: a shorter heartbeat interval, like a lower number of missed heartbeats, shortens the failure detection interval.

However, lowering this value could result in false positives: a member could be detected as failed when, in fact, its heartbeats are only delayed by network load from other parts of the server. Conversely, a higher value results in fewer heartbeats in the system because the time interval between heartbeats is longer. As a result, failure detection takes longer. In addition, if a failed member restarts during this time, the result is a new join notification but no failure notification, because failure detection and verification were not completed.

The default is 2000.
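
The arithmetic behind the failure detection interval can be sketched in a few lines of shell. This is only an illustration of the multiplication described above; the function name failure_detection_interval_ms is hypothetical and is not part of asadmin or GMS.

```shell
#!/bin/sh
# Hypothetical helper (not an asadmin subcommand): multiply
# max-missed-heartbeats by heartbeat-frequency-in-millis.
failure_detection_interval_ms() {
    max_missed_heartbeats=$1
    heartbeat_frequency_ms=$2
    echo $(( max_missed_heartbeats * heartbeat_frequency_ms ))
}

# Defaults from this section: 3 missed heartbeats, 2000 ms per heartbeat.
failure_detection_interval_ms 3 2000   # prints 6000, that is, 6 seconds

# Illustrative nondefault values: 5 missed heartbeats, 1000 ms per heartbeat.
failure_detection_interval_ms 5 1000   # prints 5000
```

Comparing the result for candidate values against the default of 6000 milliseconds gives a quick sense of whether a change makes detection faster or slower.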

verify-failure-waittime-in-millis

Indicates the verify suspect protocol's timeout used by the health monitor. After a member is marked as suspect based on missed heartbeats and a failed peer-to-peer connection check, the verify suspect protocol is activated and waits for the specified timeout to check for any further health state messages received in that time, and to see if a peer-to-peer connection can be made with the suspect member. If not, then the member is marked as failed and a failure notification is sent. The default is 1500.

verify-failure-connect-timeout-in-millis

Indicates the time it takes for the GMS to detect a hardware or network failure of a server instance. Be careful not to set this value too low. The smaller this timeout value is, the greater the chance of detecting false failures. That is, the instance has not failed but doesn't respond within the short window of time. The default is 10000.

The heartbeat frequency, maximum missed heartbeats, peer-to-peer connection-based failure detection, and the verify timeouts are all needed to ensure that failure detection is robust and reliable in GlassFish Server.

For the dotted names for each of these GMS configuration settings, see Dotted Names for GMS Settings. For the steps to specify these settings, see To Preconfigure Nondefault GMS Configuration Settings.

Dotted Names for GMS Settings

Below are sample get(1) subcommands to get all the GMS configuration settings (attributes associated with the referenced mycfg configuration) and GMS cluster settings (attributes and properties associated with a cluster named mycluster).

asadmin> get "configs.config.mycfg.group-management-service.*"
configs.config.mycfg.group-management-service.failure-detection.heartbeat-frequency-in-millis=2000
configs.config.mycfg.group-management-service.failure-detection.max-missed-heartbeats=3
configs.config.mycfg.group-management-service.failure-detection.verify-failure-connect-timeout-in-millis=10000
configs.config.mycfg.group-management-service.failure-detection.verify-failure-waittime-in-millis=1500
configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=5000

asadmin> get clusters.cluster.mycluster
clusters.cluster.mycluster.config-ref=mycfg
clusters.cluster.mycluster.gms-bind-interface-address=${GMS-BIND-INTERFACE-ADDRESS-mycluster}
clusters.cluster.mycluster.gms-enabled=true
clusters.cluster.mycluster.gms-multicast-address=228.9.245.47
clusters.cluster.mycluster.gms-multicast-port=9833
clusters.cluster.mycluster.name=mycluster

asadmin> get "clusters.cluster.mycluster.property.*"
clusters.cluster.mycluster.property.GMS_LISTENER_PORT=${GMS_LISTENER_PORT-mycluster}
clusters.cluster.mycluster.property.GMS_MULTICAST_TIME_TO_LIVE=4
clusters.cluster.mycluster.property.GMS_LOOPBACK=false
clusters.cluster.mycluster.property.GMS_TCPSTARTPORT=9090
clusters.cluster.mycluster.property.GMS_TCPENDPORT=9200

The last get subcommand displays only the properties that have been explicitly set.

For the steps to specify these settings, see To Preconfigure Nondefault GMS Configuration Settings and To Change GMS Settings After Cluster Creation.
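
As the sample output shows, the four failure detection settings sit under a failure-detection element, while group-discovery-timeout-in-millis sits directly under group-management-service. A small shell helper can capture that layout so that scripts do not have to hard-code the full names. The function name gms_dotted_name is hypothetical; only the dotted names it prints come from the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the dotted name of a GMS configuration setting
# for a named configuration, following the layout shown by the get output.
gms_dotted_name() {
    config=$1
    setting=$2
    base="configs.config.${config}.group-management-service"
    case "$setting" in
        group-discovery-timeout-in-millis)
            echo "${base}.${setting}" ;;
        heartbeat-frequency-in-millis|max-missed-heartbeats|\
        verify-failure-connect-timeout-in-millis|verify-failure-waittime-in-millis)
            echo "${base}.failure-detection.${setting}" ;;
        *)
            echo "unknown GMS configuration setting: ${setting}" >&2
            return 1 ;;
    esac
}

gms_dotted_name mycfg max-missed-heartbeats
# prints configs.config.mycfg.group-management-service.failure-detection.max-missed-heartbeats
```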

To Preconfigure Nondefault GMS Configuration Settings

You can preconfigure GMS with values different from the defaults without requiring a restart of the DAS and the cluster.

Create a configuration using the copy-config(1) subcommand.
For example:
```
asadmin> copy-config default-config mycfg
```
For more information, see To Create a Named Configuration.

Set the values for the new configuration's GMS configuration settings.

For example:

asadmin> set configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=8000
asadmin> set configs.config.mycfg.group-management-service.failure-detection.max-missed-heartbeats=5

For a complete list of the dotted names for these settings, see Dotted Names for GMS Settings.

Create the cluster so it uses the previously created configuration.
For example:
```
asadmin> create-cluster --config mycfg mycluster
```
You can also set GMS cluster settings during this step. For more information, see To Create a Cluster.

Create server instances for the cluster.

For example:

asadmin> create-instance --node localhost --cluster mycluster instance01

asadmin> create-instance --node localhost --cluster mycluster instance02

Start the cluster.
For example:
```
asadmin> start-cluster mycluster
```
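
The steps above can also be collected into one script that is run from the operating system shell rather than from the asadmin prompt. The following is a minimal sketch, not a supported tool. The function name preconfigure_gms_cluster and the ASADMIN variable are assumptions made for illustration. By default the script performs a dry run that only prints each command; setting ASADMIN=asadmin would run the commands against a running DAS.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set ASADMIN=asadmin in the environment to execute against a running DAS.
ASADMIN=${ASADMIN:-echo asadmin}

# Hypothetical wrapper around the procedure above. It stops at the first
# command that fails.
preconfigure_gms_cluster() {
    config=$1
    cluster=$2
    gms="configs.config.${config}.group-management-service"

    $ASADMIN copy-config default-config "$config" &&
    $ASADMIN set "${gms}.group-discovery-timeout-in-millis=8000" &&
    $ASADMIN set "${gms}.failure-detection.max-missed-heartbeats=5" &&
    $ASADMIN create-cluster --config "$config" "$cluster" &&
    $ASADMIN create-instance --node localhost --cluster "$cluster" instance01 &&
    $ASADMIN create-instance --node localhost --cluster "$cluster" instance02 &&
    $ASADMIN start-cluster "$cluster"
}

preconfigure_gms_cluster mycfg mycluster
```

The order matters: the configuration is copied and modified before the cluster is created, which is what avoids the restart.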

See Also

You can also view the full syntax and options of a subcommand by typing asadmin help subcommand at the command line.

To Change GMS Settings After Cluster Creation

To avoid the need to restart the DAS and the cluster, configure GMS configuration settings before cluster creation as explained in To Preconfigure Nondefault GMS Configuration Settings.

To avoid the need to restart the DAS and the cluster, configure the GMS cluster settings during cluster creation as explained in To Create a Cluster.

Changing any GMS settings using the set subcommand after cluster creation requires a domain administration server (DAS) and cluster restart as explained here.

Ensure that the DAS and cluster are running.
Remote subcommands require a running server.

Use the get(1) subcommand to determine the settings to change.

For example:

asadmin> get "configs.config.mycfg.group-management-service.*"
configs.config.mycfg.group-management-service.failure-detection.heartbeat-frequency-in-millis=2000
configs.config.mycfg.group-management-service.failure-detection.max-missed-heartbeats=3
configs.config.mycfg.group-management-service.failure-detection.verify-failure-connect-timeout-in-millis=10000
configs.config.mycfg.group-management-service.failure-detection.verify-failure-waittime-in-millis=1500
configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=5000

For a complete list of the dotted names for these settings, see Dotted Names for GMS Settings.

Use the set(1) subcommand to change the settings.

For example:

asadmin> set configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=6000

Use the get subcommand again to confirm that the changes were made.

For example:

asadmin> get configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis

Restart the DAS.

For example:

asadmin> stop-domain domain1

asadmin> start-domain domain1

Restart the cluster.

For example:

asadmin> stop-cluster mycluster

asadmin> start-cluster mycluster
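
The same sequence can be sketched as a shell function for use outside the asadmin prompt. The function name change_gms_setting and the ASADMIN variable are assumptions made for illustration. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set ASADMIN=asadmin in the environment to execute against a running DAS.
ASADMIN=${ASADMIN:-echo asadmin}

# Hypothetical wrapper: change one GMS setting, confirm it, then restart
# the DAS and the cluster in the order given in the procedure above.
change_gms_setting() {
    dotted_name=$1
    value=$2
    domain=$3
    cluster=$4

    $ASADMIN set "${dotted_name}=${value}" &&
    $ASADMIN get "$dotted_name" &&
    $ASADMIN stop-domain "$domain" &&
    $ASADMIN start-domain "$domain" &&
    $ASADMIN stop-cluster "$cluster" &&
    $ASADMIN start-cluster "$cluster"
}

change_gms_setting \
    configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis \
    6000 domain1 mycluster
```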

See Also

You can also view the full syntax and options of a subcommand by typing asadmin help subcommand at the command line.

To Check the Health of Instances in a Cluster

The get-health subcommand works only when GMS is enabled. It is the quickest way to evaluate the health of a cluster and to detect whether the cluster is operating properly; that is, whether all members of the cluster are running and visible to the DAS.

If multicast is not enabled for the network, all instances could be running (as shown by the list-instances(1) subcommand), yet isolated from each other. The get-health subcommand does not show the instances if they are running but cannot discover each other due to multicast not being configured properly. See To Validate That Multicast Transport Is Available for a Cluster.

Ensure that the DAS and cluster are running.
Remote subcommands require a running server.
Check whether server instances in a cluster are running by using the get-health(1) subcommand.

Example 4-1 Checking the Health of Instances in a Cluster

This example checks the health of a cluster named cluster1.

asadmin> get-health cluster1
instance1 started since Wed Sep 29 16:32:46 EDT 2010
instance2 started since Wed Sep 29 16:32:45 EDT 2010
Command get-health executed successfully.
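
For scripted checks, output in the format shown above can be filtered with standard tools. The following sketch assumes that each instance line has the form "name state since date", as in this example; the function name started_instances is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read get-health output on standard input and print
# the names of instances whose reported state is "started".
started_instances() {
    awk '$2 == "started" { print $1 }'
}

# Sample input copied from the example above.
started_instances <<'EOF'
instance1 started since Wed Sep 29 16:32:46 EDT 2010
instance2 started since Wed Sep 29 16:32:45 EDT 2010
Command get-health executed successfully.
EOF
# prints:
# instance1
# instance2
```

Against a live domain, the filter would be fed from the subcommand itself, for example with asadmin get-health cluster1 piped into started_instances. An instance that is expected in the cluster but missing from the list deserves a closer look.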

See Also

You can also view the full syntax and options of the subcommand by typing asadmin help get-health at the command line.

To Validate That Multicast Transport Is Available for a Cluster

Before You Begin

To test a specific multicast address, multicast port, or bind interface address, get this information beforehand using the get subcommand. Use the following subcommand to get the multicast address and port for a cluster named c1:

asadmin> get clusters.cluster.c1
clusters.cluster.c1.config-ref=mycfg
clusters.cluster.c1.gms-bind-interface-address=${GMS-BIND-INTERFACE-ADDRESS-c1}
clusters.cluster.c1.gms-enabled=true
clusters.cluster.c1.gms-multicast-address=228.9.174.162
clusters.cluster.c1.gms-multicast-port=5383
clusters.cluster.c1.name=c1

Use the following subcommand to get the bind interface address of a server instance named i1 that belongs to a cluster named c1, if this system property has been set:

asadmin> get servers.server.i1.system-property.GMS-BIND-INTERFACE-ADDRESS-c1
servers.server.i1.system-property.GMS-BIND-INTERFACE-ADDRESS-c1.name=GMS-BIND-INTERFACE-ADDRESS-c1
servers.server.i1.system-property.GMS-BIND-INTERFACE-ADDRESS-c1.value=10.12.152.30

For information on how to set this system property, see Using the Multi-Homing Feature With GMS.

Note - Do not run the validate-multicast subcommand using the DAS and cluster's multicast address and port values while the DAS and cluster are running. Doing so results in an error.

The validate-multicast subcommand must be run at the same time on two or more machines to validate whether multicast messages are being received between the machines.

Check whether multicast transport is available for a cluster by using the validate-multicast(1) subcommand.

Example 4-2 Validating That Multicast Transport Is Available for a Cluster

This example checks whether multicast transport is available for a cluster named c1.

Run from host sr1:

asadmin> validate-multicast
Will use port 2048
Will use address 228.9.3.1
Will use bind interface null
Will use wait period 2,000 (in milliseconds)

Listening for data...
Sending message with content "sr1" every 2,000 milliseconds
Received data from sr1 (loopback)
Received data from sr2
Exiting after 20 seconds. To change this timeout, use the --timeout command line option.
Command validate-multicast executed successfully.

Run from host sr2:

asadmin> validate-multicast
Will use port 2048
Will use address 228.9.3.1
Will use bind interface null
Will use wait period 2,000 (in milliseconds)

Listening for data...
Sending message with content "sr2" every 2,000 milliseconds
Received data from sr2 (loopback)
Received data from sr1
Exiting after 20 seconds. To change this timeout, use the --timeout command line option.
Command validate-multicast executed successfully.
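
When the subcommand is run on many machines, it can help to extract only the remote senders from each transcript. The following sketch assumes the "Received data from host" line format shown in this example; the function name remote_multicast_senders is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read validate-multicast output on standard input and
# print each remote host that was heard from, skipping the loopback line.
remote_multicast_senders() {
    awk '/^Received data from / && $NF != "(loopback)" { print $4 }' | sort -u
}

# Sample input copied from the run on host sr1 above.
remote_multicast_senders <<'EOF'
Listening for data...
Sending message with content "sr1" every 2,000 milliseconds
Received data from sr1 (loopback)
Received data from sr2
Exiting after 20 seconds. To change this timeout, use the --timeout command line option.
Command validate-multicast executed successfully.
EOF
# prints sr2
```

If a machine that is running validate-multicast at the same time does not appear in the list, multicast messages from that machine are not arriving.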

Next Steps

As long as all machines see each other, multicast is validated to be working properly across the machines. If the machines are not seeing each other, set the --bindaddress option explicitly to ensure that all machines are using an interface on the same subnet, or increase the --timetolive option from the default of 4. If these changes fail to resolve the multicast issues, ask the network administrator to verify that the network is configured so the multicast messages can be seen between all the machines used to run the cluster.

See Also

You can also view the full syntax and options of the subcommand by typing asadmin help validate-multicast at the command line.

Using the Multi-Homing Feature With GMS

Multi-homing enables GlassFish Server clusters to be used in an environment that uses multiple Network Interface Cards (NICs). A multi-homed host has multiple network connections, which may or may not be on the same network. Multi-homing provides the following benefits:

Provides redundant network connections within the same subnet. Having multiple NICs ensures that one or more network connections are available for communication.
Supports communication across two or more different subnets. The DAS and all server instances in the same cluster must be on the same subnet for GMS communication, however.
Binds to a specific IPv4 address and receives GMS messages in a system that has multiple IP addresses configured. The responses for GMS messages received on a particular interface will also go out through that interface.
Supports separation of external and internal traffic.

Traffic Separation Using Multi-Homing

You can separate the internal traffic resulting from GMS from the external traffic. Traffic separation enables you to plan a network better and augment certain parts of the network, as required.

Consider a simple cluster, c1, with three instances, i101, i102, and i103. Each instance runs on a different machine. To separate the traffic, each multi-homed machine should have at least two IP addresses that belong to different networks: the first serves as the external IP address and the second as the internal IP address. The objective is to expose the external IP address to user requests, so that all traffic from user requests goes through it. The internal IP address is used only by the cluster instances for internal communication through GMS. The following procedure describes how to set up traffic separation.

To configure multi-homed machines for GMS without traffic separation, skip the steps or commands that configure the EXTERNAL-ADDR system property, but perform the others.

To avoid having to restart the DAS or cluster, perform the following steps in the specified order.

To Set Up Traffic Separation

Create the system properties EXTERNAL-ADDR and GMS-BIND-INTERFACE-ADDRESS-c1 for the DAS.
- asadmin create-system-properties --target server EXTERNAL-ADDR=192.155.35.4
- asadmin create-system-properties --target server GMS-BIND-INTERFACE-ADDRESS-c1=10.12.152.20
Create the cluster with the default settings.
Use the following command:
```
asadmin create-cluster c1
```
A reference to a system property for GMS traffic is already set up by default in the gms-bind-interface-address cluster setting. The default value of this setting is ${GMS-BIND-INTERFACE-ADDRESS-cluster-name}.
When creating the clustered instances, configure the external and GMS IP addresses.
Use the following commands:
- asadmin create-instance --node localhost --cluster c1 --systemproperties EXTERNAL-ADDR=192.155.35.5:GMS-BIND-INTERFACE-ADDRESS-c1=10.12.152.30 i101
- asadmin create-instance --node localhost --cluster c1 --systemproperties EXTERNAL-ADDR=192.155.35.6:GMS-BIND-INTERFACE-ADDRESS-c1=10.12.152.40 i102
- asadmin create-instance --node localhost --cluster c1 --systemproperties EXTERNAL-ADDR=192.155.35.7:GMS-BIND-INTERFACE-ADDRESS-c1=10.12.152.50 i103

Set the address attribute of HTTP listeners to refer to the EXTERNAL-ADDR system properties.

Use the following commands:

asadmin set c1-config.network-config.network-listeners.network-listener.http-1.address=\${EXTERNAL-ADDR}
asadmin set c1-config.network-config.network-listeners.network-listener.http-2.address=\${EXTERNAL-ADDR}
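
The create-instance commands in this procedure differ only in the instance name and the two addresses, so they can be generated from a small table. The following is a minimal sketch under that assumption. The function name create_separated_instance and the ASADMIN variable are hypothetical, and by default the script only prints the commands.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set ASADMIN=asadmin in the environment to execute against a running DAS.
ASADMIN=${ASADMIN:-echo asadmin}

# Hypothetical helper: create one clustered instance with its external
# address and its GMS bind interface address set as system properties.
create_separated_instance() {
    cluster=$1
    instance=$2
    external_ip=$3
    gms_ip=$4
    # Redirect stdin so the command cannot consume the table read by the loop.
    $ASADMIN create-instance --node localhost --cluster "$cluster" \
        --systemproperties "EXTERNAL-ADDR=${external_ip}:GMS-BIND-INTERFACE-ADDRESS-${cluster}=${gms_ip}" \
        "$instance" </dev/null
}

# Columns: instance name, external IP address, GMS (internal) IP address.
while read -r name ext_ip int_ip; do
    create_separated_instance c1 "$name" "$ext_ip" "$int_ip"
done <<'EOF'
i101 192.155.35.5 10.12.152.30
i102 192.155.35.6 10.12.152.40
i103 192.155.35.7 10.12.152.50
EOF
```

Because the property name is built from the cluster name, it matches the default ${GMS-BIND-INTERFACE-ADDRESS-cluster-name} reference in the gms-bind-interface-address cluster setting.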

	Oracle GlassFish Server 3.1-3.1.1 High Availability Administration Guide