Oracle GlassFish Server 3.1-3.1.1 High Availability Administration Guide |
4. Administering GlassFish Server Clusters
About GlassFish Server Clusters
To Preconfigure Nondefault GMS Configuration Settings
To Change GMS Settings After Cluster Creation
To Check the Health of Instances in a Cluster
To Validate That Multicast Transport Is Available for a Cluster
Creating, Listing, and Deleting Clusters
To List All Clusters in a Domain
The Group Management Service (GMS) is an infrastructure component that is enabled for the instances in a cluster. When GMS is enabled, if a clustered instance fails, the cluster and the Domain Administration Server (DAS) are aware of the failure and can take action when failure occurs. Many features of GlassFish Server depend upon GMS. For example, GMS is used by the in-memory session replication, transaction service, and timer service features.
If server instances in a cluster are located on different machines, ensure that all the server instance machines and the DAS machine are on the same subnet and that multicast is enabled for the network. To test whether multicast is enabled, use the validate-multicast(1) subcommand.
GMS is a core service of the Shoal framework. For more information about Shoal, visit the Project Shoal home page.
The following topics are addressed here:
GlassFish Server has two types of GMS settings:
GMS cluster settings — These are determined during cluster creation. For more information about these settings, see To Create a Cluster.
GMS configuration settings — These are determined during configuration creation and are explained here.
The following GMS configuration settings are used in GMS for group discovery and failure detection:
group-discovery-timeout-in-millis
Indicates the amount of time (in milliseconds) that an instance's GMS module waits during instance startup to discover other members of the group.
The group-discovery-timeout-in-millis timeout value should be set to the default or higher. The default is 5000.
max-missed-heartbeats
Indicates the maximum number of missed heartbeats that the health monitor counts before the instance can be marked as a suspected failure. GMS also tries to make a peer-to-peer connection with the suspected member. If the maximum number of missed heartbeats is exceeded and the peer-to-peer connection fails, the member is marked as a suspected failure. The default is 3.
heartbeat-frequency-in-millis
Indicates the frequency (in milliseconds) at which a heartbeat is sent by each server instance to the cluster.
The failure detection interval is the max-missed-heartbeats multiplied by the heartbeat-frequency-in-millis. Therefore, the combination of defaults, 3 multiplied by 2000 milliseconds, results in a failure detection interval of 6 seconds.
Lowering the value of heartbeat-frequency-in-millis below the default would result in more frequent heartbeat messages being sent out from each member. This could potentially result in more heartbeat messages in the network than a system needs for triggering failure detection protocols. The effect of this varies depending on how quickly the deployment environment needs to have failure detection performed. That is, the (lower) number of retries with a lower heartbeat interval would make it quicker to detect failures.
However, lowering this value could result in false positives because you could potentially detect a member as failed when, in fact, the member's heartbeat is reflecting the network load from other parts of the server. Conversely, a higher timeout interval results in fewer heartbeats in the system because the time interval between heartbeats is longer. As a result, failure detection would take longer. In addition, a startup by a failed member during this time results in a new join notification but no failure notification, because failure detection and verification were not completed.
The default is 2000.
verify-failure-waittime-in-millis
Indicates the verify suspect protocol's timeout used by the health monitor. After a member is marked as suspect based on missed heartbeats and a failed peer-to-peer connection check, the verify suspect protocol is activated and waits for the specified timeout to check for any further health state messages received in that time, and to see if a peer-to-peer connection can be made with the suspect member. If not, then the member is marked as failed and a failure notification is sent. The default is 1500.
verify-failure-connect-timeout-in-millis
Indicates the time it takes for the GMS to detect a hardware or network failure of a server instance. Be careful not to set this value too low. The smaller this timeout value is, the greater the chance of detecting false failures. That is, the instance has not failed but doesn't respond within the short window of time. The default is 10000.
The heartbeat frequency, maximum missed heartbeats, peer-to-peer connection-based failure detection, and the verify timeouts are all needed to ensure that failure detection is robust and reliable in GlassFish Server.
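The arithmetic behind the failure detection interval can be sketched as a short shell calculation using the defaults listed above. The variable names are illustrative only. The second figure assumes that the verify suspect wait time is simply added on top of the detection interval before a failure notification is sent, which is a reading of the descriptions above rather than a documented formula.

```shell
#!/bin/sh
# Default values taken from the settings described above.
max_missed_heartbeats=3
heartbeat_frequency_ms=2000
verify_failure_waittime_ms=1500

# Failure detection interval = max-missed-heartbeats x heartbeat-frequency-in-millis.
detection_interval_ms=$((max_missed_heartbeats * heartbeat_frequency_ms))

# Assumption: the verify suspect wait is added to the detection interval
# before a failure notification is sent.
estimated_notification_ms=$((detection_interval_ms + verify_failure_waittime_ms))

echo "failure detection interval: ${detection_interval_ms} ms"
echo "estimated time to failure notification: ${estimated_notification_ms} ms"
```

With the defaults, this prints 6000 ms for the detection interval and 7500 ms for the estimate. Substituting the values you plan to set shows how a change to either heartbeat setting shifts the interval.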
For the dotted names for each of these GMS configuration settings, see Dotted Names for GMS Settings. For the steps to specify these settings, see To Preconfigure Nondefault GMS Configuration Settings.
Below are sample get(1) subcommands to get all the GMS configuration settings (attributes associated with the referenced mycfg configuration) and GMS cluster settings (attributes and properties associated with a cluster named mycluster).
asadmin> get "configs.config.mycfg.group-management-service.*"
configs.config.mycfg.group-management-service.failure-detection.heartbeat-frequency-in-millis=2000
configs.config.mycfg.group-management-service.failure-detection.max-missed-heartbeats=3
configs.config.mycfg.group-management-service.failure-detection.verify-failure-connect-timeout-in-millis=10000
configs.config.mycfg.group-management-service.failure-detection.verify-failure-waittime-in-millis=1500
configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=5000
asadmin> get clusters.cluster.mycluster
clusters.cluster.mycluster.config-ref=mycfg
clusters.cluster.mycluster.gms-bind-interface-address=${GMS-BIND-INTERFACE-ADDRESS-mycluster}
clusters.cluster.mycluster.gms-enabled=true
clusters.cluster.mycluster.gms-multicast-address=228.9.245.47
clusters.cluster.mycluster.gms-multicast-port=9833
clusters.cluster.mycluster.name=mycluster
asadmin> get "clusters.cluster.mycluster.property.*"
clusters.cluster.mycluster.property.GMS_LISTENER_PORT=${GMS_LISTENER_PORT-mycluster}
clusters.cluster.mycluster.property.GMS_MULTICAST_TIME_TO_LIVE=4
clusters.cluster.mycluster.property.GMS_LOOPBACK=false
clusters.cluster.mycluster.property.GMS_TCPSTARTPORT=9090
clusters.cluster.mycluster.property.GMS_TCPENDPORT=9200
The last get subcommand displays only the properties that have been explicitly set.
For the steps to specify these settings, see To Preconfigure Nondefault GMS Configuration Settings and To Change GMS Settings After Cluster Creation.
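When the output of the get subcommand is consumed by a script, a small filter can pull out a single value. The following is a minimal sketch, not part of GlassFish Server: gms_setting is a hypothetical helper name, and the sample text is copied from the output shown above so that the sketch can be tried without a running DAS.

```shell
#!/bin/sh
# Print the value of the first "name=value" line whose dotted name ends
# with the suffix given as $1. Reads asadmin get output on standard input.
# gms_setting is a hypothetical helper, not an asadmin feature.
gms_setting() {
    awk -F= -v suffix="$1" '
        { n = length($1) - length(suffix) }
        n >= 0 && substr($1, n + 1) == suffix {
            # Print everything after the first "=" so values containing "=" survive.
            print substr($0, length($1) + 2)
            exit
        }
    '
}

# Sample output copied from the listing above.
sample_output='configs.config.mycfg.group-management-service.failure-detection.heartbeat-frequency-in-millis=2000
configs.config.mycfg.group-management-service.failure-detection.max-missed-heartbeats=3
configs.config.mycfg.group-management-service.failure-detection.verify-failure-connect-timeout-in-millis=10000
configs.config.mycfg.group-management-service.failure-detection.verify-failure-waittime-in-millis=1500
configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=5000'

printf '%s\n' "$sample_output" | gms_setting max-missed-heartbeats
```

Against a running DAS, the same filter could be fed directly, for example by piping asadmin get "configs.config.mycfg.group-management-service.*" into gms_setting heartbeat-frequency-in-millis.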
You can preconfigure GMS with values different from the defaults without requiring a restart of the DAS and the cluster.
For example:
asadmin> copy-config default-config mycfg
For more information, see To Create a Named Configuration.
For example:
asadmin> set configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=8000
asadmin> set configs.config.mycfg.group-management-service.failure-detection.max-missed-heartbeats=5
For a complete list of the dotted names for these settings, see Dotted Names for GMS Settings.
For example:
asadmin> create-cluster --config mycfg mycluster
You can also set GMS cluster settings during this step. For more information, see To Create a Cluster.
For example:
asadmin> create-instance --node localhost --cluster mycluster instance01
asadmin> create-instance --node localhost --cluster mycluster instance02
For example:
asadmin> start-cluster mycluster
See Also
You can also view the full syntax and options of a subcommand by typing asadmin help subcommand at the command line.
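The example commands in this procedure can be collected into one script. The sketch below is an illustration under stated assumptions: the run wrapper, the DRY_RUN variable, and the preconfigure_gms function are hypothetical names, and the script defaults to printing the commands rather than executing them. The subcommands and values themselves are the ones used in the examples above.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command.
# Set DRY_RUN=0 to execute against a running DAS.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print or execute one asadmin subcommand.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "asadmin $*"
    else
        asadmin "$@" || exit 1
    fi
}

# Order matters: the configuration is created and changed before the
# cluster that references it exists, so no restart is needed.
preconfigure_gms() {
    cfg="$1"
    cluster="$2"
    prefix="configs.config.${cfg}.group-management-service"
    run copy-config default-config "$cfg"
    run set "${prefix}.group-discovery-timeout-in-millis=8000"
    run set "${prefix}.failure-detection.max-missed-heartbeats=5"
    run create-cluster --config "$cfg" "$cluster"
    run create-instance --node localhost --cluster "$cluster" instance01
    run create-instance --node localhost --cluster "$cluster" instance02
    run start-cluster "$cluster"
}

preconfigure_gms mycfg mycluster
```

Running the script in its default dry-run mode is a quick way to review the exact commands before applying them to a domain.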
To avoid the need to restart the DAS and the cluster, configure GMS configuration settings before cluster creation as explained in To Preconfigure Nondefault GMS Configuration Settings.
To avoid the need to restart the DAS and the cluster, configure the GMS cluster settings during cluster creation as explained in To Create a Cluster.
Changing any GMS settings using the set subcommand after cluster creation requires a domain administration server (DAS) and cluster restart as explained here.
Remote subcommands require a running server.
For example:
asadmin> get "configs.config.mycfg.group-management-service.*"
configs.config.mycfg.group-management-service.failure-detection.heartbeat-frequency-in-millis=2000
configs.config.mycfg.group-management-service.failure-detection.max-missed-heartbeats=3
configs.config.mycfg.group-management-service.failure-detection.verify-failure-connect-timeout-in-millis=10000
configs.config.mycfg.group-management-service.failure-detection.verify-failure-waittime-in-millis=1500
configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=5000
For a complete list of the dotted names for these settings, see Dotted Names for GMS Settings.
For example:
asadmin> set configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis=6000
For example:
asadmin> get configs.config.mycfg.group-management-service.group-discovery-timeout-in-millis
For example:
asadmin> stop-domain domain1
asadmin> start-domain domain1
For example:
asadmin> stop-cluster mycluster
asadmin> start-cluster mycluster
See Also
You can also view the full syntax and options of a subcommand by typing asadmin help subcommand at the command line.
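Because a change made after cluster creation takes effect only after both restarts, it can help to keep the whole sequence in one place. The following sketch assumes hypothetical names (run, DRY_RUN, change_gms_setting) and prints the commands by default. The subcommands and their order follow the examples in this procedure.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print or execute one asadmin subcommand.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "asadmin $*"
    else
        asadmin "$@" || exit 1
    fi
}

# Set one GMS configuration attribute, read it back, then restart the
# DAS and the cluster so that the new value is used.
change_gms_setting() {
    cfg="$1"
    attribute="$2"
    value="$3"
    domain="$4"
    cluster="$5"
    name="configs.config.${cfg}.group-management-service.${attribute}"
    run set "${name}=${value}"
    run get "$name"
    run stop-domain "$domain"
    run start-domain "$domain"
    run stop-cluster "$cluster"
    run start-cluster "$cluster"
}

change_gms_setting mycfg group-discovery-timeout-in-millis 6000 domain1 mycluster
```

For a failure detection attribute, the second argument would carry the extra path element, for example failure-detection.max-missed-heartbeats.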
The get-health subcommand works only when GMS is enabled. This is the quickest way to evaluate the health of a cluster and to detect whether the cluster is operating properly; that is, whether all members of the cluster are running and visible to the DAS.
If multicast is not enabled for the network, all instances could be running (as shown by the list-instances(1) subcommand), yet isolated from each other. The get-health subcommand does not show the instances if they are running but cannot discover each other due to multicast not being configured properly. See To Validate That Multicast Transport Is Available for a Cluster.
Remote subcommands require a running server.
Example 4-1 Checking the Health of Instances in a Cluster
This example checks the health of a cluster named cluster1.
asadmin> get-health cluster1
instance1 started since Wed Sep 29 16:32:46 EDT 2010
instance2 started since Wed Sep 29 16:32:45 EDT 2010
Command get-health executed successfully.
See Also
You can also view the full syntax and options of the subcommand by typing asadmin help get-health at the command line.
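For monitoring scripts, the get-health output can be filtered for instances that are not reported as started. This is a minimal sketch: unhealthy_instances is a hypothetical helper, and it assumes only the line layout shown in Example 4-1 (instance name first, state second). Any other state wording is treated as unhealthy without being interpreted.

```shell
#!/bin/sh
# Print the names of instances whose get-health line does not read
# "<name> started since ...". Reads get-health output on standard input.
# unhealthy_instances is a hypothetical helper, not an asadmin feature.
unhealthy_instances() {
    awk '
        /^Command / { next }        # skip the trailing status line
        NF < 2      { next }        # skip blank or one-word lines
        $2 != "started" { print $1 }
    '
}

# Sample output copied from Example 4-1.
sample_health='instance1 started since Wed Sep 29 16:32:46 EDT 2010
instance2 started since Wed Sep 29 16:32:45 EDT 2010
Command get-health executed successfully.'

printf '%s\n' "$sample_health" | unhealthy_instances
```

For the sample above, the filter prints nothing because both instances are started. Keep in mind that an instance isolated by a multicast problem does not appear in the get-health output at all, so an empty result should be compared against the list-instances output.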
Before You Begin
To test a specific multicast address, multicast port, or bind interface address, get this information beforehand using the get subcommand. Use the following subcommand to get the multicast address and port for a cluster named c1:
asadmin> get clusters.cluster.c1
clusters.cluster.c1.config-ref=mycfg
clusters.cluster.c1.gms-bind-interface-address=${GMS-BIND-INTERFACE-ADDRESS-c1}
clusters.cluster.c1.gms-enabled=true
clusters.cluster.c1.gms-multicast-address=228.9.174.162
clusters.cluster.c1.gms-multicast-port=5383
clusters.cluster.c1.name=c1
Use the following subcommand to get the bind interface address of a server instance named i1 that belongs to a cluster named c1, if this system property has been set:
asadmin> get servers.server.i1.system-property.GMS-BIND-INTERFACE-ADDRESS-c1
servers.server.i1.system-property.GMS-BIND-INTERFACE-ADDRESS-c1.name=GMS-BIND-INTERFACE-ADDRESS-c1
servers.server.i1.system-property.GMS-BIND-INTERFACE-ADDRESS-c1.value=10.12.152.30
For information on how to set this system property, see Using the Multi-Homing Feature With GMS.
Note - Do not run the validate-multicast subcommand using the DAS and cluster's multicast address and port values while the DAS and cluster are running. Doing so results in an error.
The validate-multicast subcommand must be run at the same time on two or more machines to validate whether multicast messages are being received between the machines.
Example 4-2 Validating That Multicast Transport Is Available for a Cluster
This example checks whether multicast transport is available for a cluster named c1.
Run from host sr1:
asadmin> validate-multicast
Will use port 2048
Will use address 228.9.3.1
Will use bind interface null
Will use wait period 2,000 (in milliseconds)
Listening for data...
Sending message with content "sr1" every 2,000 milliseconds
Received data from sr1 (loopback)
Received data from sr2
Exiting after 20 seconds. To change this timeout, use the --timeout command line option.
Command validate-multicast executed successfully.
Run from host sr2:
asadmin> validate-multicast
Will use port 2048
Will use address 228.9.3.1
Will use bind interface null
Will use wait period 2,000 (in milliseconds)
Listening for data...
Sending message with content "sr2" every 2,000 milliseconds
Received data from sr2 (loopback)
Received data from sr1
Exiting after 20 seconds. To change this timeout, use the --timeout command line option.
Command validate-multicast executed successfully.
Next Steps
As long as all machines see each other, multicast is validated to be working properly across the machines. If the machines are not seeing each other, set the --bindaddress option explicitly to ensure that all machines are using an interface on the same subnet, or increase the --timetolive option from the default of 4. If these changes fail to resolve the multicast issues, ask the network administrator to verify that the network is configured so that multicast messages can be seen between all the machines used to run the cluster.
See Also
You can also view the full syntax and options of the subcommand by typing asadmin help validate-multicast at the command line.
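To test the specific multicast address and port of a cluster, the values returned by the get subcommand can be turned into a validate-multicast command line. The sketch below is illustrative: multicast_check_command is a hypothetical helper, and the --multicastaddress and --multicastport option names are given as recalled from the validate-multicast(1) help page, so confirm them with asadmin help validate-multicast. The sample text is copied from the output for cluster c1 shown earlier. As noted above, run the resulting command only while the DAS and the cluster are stopped.

```shell
#!/bin/sh
# Read "asadmin get clusters.cluster.NAME" output on standard input and
# print a matching validate-multicast command line. Exits with status 1
# if either value is missing.
# multicast_check_command is a hypothetical helper, not an asadmin feature.
multicast_check_command() {
    awk -F= '
        $1 ~ /\.gms-multicast-address$/ { addr = $2 }
        $1 ~ /\.gms-multicast-port$/    { port = $2 }
        END {
            if (addr == "" || port == "") exit 1
            printf "asadmin validate-multicast --multicastaddress %s --multicastport %s\n", addr, port
        }
    '
}

# Sample output copied from the get subcommand for cluster c1.
sample_cluster='clusters.cluster.c1.config-ref=mycfg
clusters.cluster.c1.gms-bind-interface-address=${GMS-BIND-INTERFACE-ADDRESS-c1}
clusters.cluster.c1.gms-enabled=true
clusters.cluster.c1.gms-multicast-address=228.9.174.162
clusters.cluster.c1.gms-multicast-port=5383
clusters.cluster.c1.name=c1'

printf '%s\n' "$sample_cluster" | multicast_check_command
```

The printed command would then be run at the same time on two or more of the machines, optionally with --bindaddress or --timetolive appended as described under Next Steps.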
Multi-homing enables GlassFish Server clusters to be used in an environment that uses multiple Network Interface Cards (NICs). A multi-homed host has multiple network connections, which may or may not be on the same network. Multi-homing provides the following benefits:
Provides redundant network connections within the same subnet. Having multiple NICs ensures that one or more network connections are available for communication.
Supports communication across two or more different subnets. The DAS and all server instances in the same cluster must be on the same subnet for GMS communication, however.
Binds to a specific IPv4 address and receives GMS messages in a system that has multiple IP addresses configured. The responses for GMS messages received on a particular interface will also go out through that interface.
Supports separation of external and internal traffic.
You can separate the internal traffic resulting from GMS from the external traffic. Traffic separation enables you to plan a network better and to augment certain parts of the network, as required.
Consider a simple cluster, c1, with three instances, i101, i102, and i103. Each instance runs on a different machine. To separate the traffic, each multi-homed machine should have at least two IP addresses that belong to different networks. The first IP address serves as the external IP and the second as the internal IP. The objective is to expose the external IP to user requests, so that all the traffic from user requests goes through it. The internal IP is used only by the cluster instances for internal communication through GMS. The following procedure describes how to set up traffic separation.
To configure multi-homed machines for GMS without traffic separation, skip the steps or commands that configure the EXTERNAL-ADDR system property, but perform the others.
To avoid having to restart the DAS or cluster, perform the following steps in the specified order.
asadmin create-system-properties --target server EXTERNAL-ADDR=192.155.35.4
asadmin create-system-properties --target server GMS-BIND-INTERFACE-ADDRESS-c1=10.12.152.20
Use the following command:
asadmin create-cluster c1
A reference to a system property for GMS traffic is already set up by default in the gms-bind-interface-address cluster setting. The default value of this setting is ${GMS-BIND-INTERFACE-ADDRESS-cluster-name}.
Use the following commands:
asadmin create-instance --node localhost --cluster c1 --systemproperties EXTERNAL-ADDR=192.155.35.5:GMS-BIND-INTERFACE-ADDRESS-c1=10.12.152.30 i101
asadmin create-instance --node localhost --cluster c1 --systemproperties EXTERNAL-ADDR=192.155.35.6:GMS-BIND-INTERFACE-ADDRESS-c1=10.12.152.40 i102
asadmin create-instance --node localhost --cluster c1 --systemproperties EXTERNAL-ADDR=192.155.35.7:GMS-BIND-INTERFACE-ADDRESS-c1=10.12.152.50 i103
Use the following commands:
asadmin set c1-config.network-config.network-listeners.network-listener.http-1.address=\${EXTERNAL-ADDR}
asadmin set c1-config.network-config.network-listeners.network-listener.http-2.address=\${EXTERNAL-ADDR}
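When a cluster has more than a few instances, the per-instance create-instance commands shown earlier in this procedure can be generated from a small table of addresses. This sketch only prints the commands. The create_instance_commands function and the table layout are hypothetical, and the addresses are the example values used above.

```shell
#!/bin/sh
# One line per instance: name, external address, internal GMS address.
# Values are the example addresses used in this procedure.
instances='i101 192.155.35.5 10.12.152.30
i102 192.155.35.6 10.12.152.40
i103 192.155.35.7 10.12.152.50'

# Print one create-instance command per table line read from standard input.
# create_instance_commands is a hypothetical helper, not an asadmin feature.
create_instance_commands() {
    cluster="$1"
    while read -r name external internal; do
        [ -n "$name" ] || continue
        echo "asadmin create-instance --node localhost --cluster ${cluster}" \
             "--systemproperties EXTERNAL-ADDR=${external}:GMS-BIND-INTERFACE-ADDRESS-${cluster}=${internal} ${name}"
    done
}

printf '%s\n' "$instances" | create_instance_commands c1
```

To configure multi-homed machines without traffic separation, the EXTERNAL-ADDR part of the --systemproperties value would be left out, in line with the note at the start of this procedure.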