Checking the Cluster Nodes

The Foundation Services product is delivered with tools to check different aspects of a cluster, including the status of cluster nodes, the network connection between nodes, and the IP addresses of nodes.

To Check the Status of the Cluster Nodes

You can check the nodes of your cluster with the nhcmmstat command.

Check the nodes by using the nhcmmstat command.

# nhcmmstat -c all
Executed Command: all
------------------------------
node_id     = 10   [This is the current node]
domain_id   = 250
name        = node10
role        = MASTER
qualified   = YES
synchro.    = READY
frozen      = NO
excluded    = NO
eligible    = YES
incarn.     = 1038420771 (27/11/2002 - 19:12:51)
swload_id   = 1
CGTP @      = 10.250.3.10
------------------------------
------------------------------
node_id     = 30
domain_id   = 250
name        = node30
role        = IN
qualified   = YES
synchro.    = READY
frozen      = NO
excluded    = NO
eligible    = NO
incarn.     = 1038422116 (27/11/2002 - 19:35:16)
swload_id   = 1
CGTP @      = 10.250.3.30
------------------------------
------------------------------
node_id     = 20
domain_id   = 250
name        = node20
role        = VICE-MASTER
qualified   = YES
synchro.    = READY
frozen      = NO
excluded    = NO
eligible    = YES
incarn.     = 1038420945 (27/11/2002 - 19:15:45)
swload_id   = 1
CGTP @      = 10.250.3.20
------------------------------

In the preceding example, the output from the nhcmmstat command displays information about all the peer nodes in the console window. This information includes the role of each node. The peer nodes must include the master and vice-master nodes.

For more information on nhcmmstat, see the nhcmmstat(1M) man page.
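When a cluster has many peer nodes, the full output can be long. The following is a minimal sketch, not part of the product, that condenses the output to one line per node and checks that exactly one master and one vice-master are present. The function names nh_roles and nh_has_master_pair are hypothetical, and the sketch assumes the "key = value" layout shown in the preceding example and a POSIX awk (nawk on the Solaris OS).

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the Foundation Services).
# Both read "nhcmmstat -c all" output on standard input and assume the
# "key = value" record layout shown in the sample output.

# Print one "name role" line per node.
nh_roles() {
    awk '
        $1 == "name" && $2 == "=" { name = $3 }
        $1 == "role" && $2 == "=" { print name, $3 }
    '
}

# Succeed only if exactly one MASTER and one VICE-MASTER are listed.
nh_has_master_pair() {
    awk '
        $1 == "role" && $2 == "=" && $3 == "MASTER"      { m++ }
        $1 == "role" && $2 == "=" && $3 == "VICE-MASTER" { v++ }
        END { exit !(m == 1 && v == 1) }
    '
}
```

For example, `nhcmmstat -c all | nh_roles` would print lines such as `node10 MASTER` for the sample cluster.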

To Check the Network Connection Between Nodes

You can check that the cluster network is functioning correctly with the nhadm command.

Verify that the nodes in the cluster are communicating through a network.

On the Solaris OS:
# /opt/SUNWcgha/sbin/nhadm check
On Linux:
# /opt/sun/sbin/nhadm check
If any peer node is not accessible from any other peer node, the nhadm command displays an error message in the console window.

For more information, see the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide.

To Check Node Addresses

Each node has an IP address assigned to the NIC0, NIC1, and cgtp0 network interfaces. To identify and ping each network interface of a node, follow this procedure.

Type the ifconfig command.

# ifconfig -a

The ifconfig command displays configuration information about the network interfaces in the console window. Sample output for the ifconfig command on a peer node is as follows:

hme0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 1
        inet 10.250.1.30 netmask ffffff00 broadcast 10.250.1.255
        ether 8:0:20:f9:b4:b0 
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 2
        inet 127.0.0.1 netmask ff000000 
hme1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.250.2.30 netmask ffffff00 broadcast 10.250.2.255
        ether 8:0:20:f9:b4:b1 
cgtp0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 10.250.3.30 netmask ffffff00 broadcast 10.250.3.255
        ether 0:0:0:0:0:0

Each peer node has at least three network interfaces configured. If a node has external access configured or if the node is the master, more network interfaces are displayed by the ifconfig command.

Retrieve the cluster ID, that is, the domainid, by using the output from the ifconfig command.

The domainid in this example is 250, which appears as the second octet of each interface address.

Retrieve the node ID, that is, the nodeid, by using the output from the ifconfig command.

The nodeid in this example is 30, which appears as the last octet of each interface address.

Retrieve the network interface names and corresponding IP addresses by using the output from the ifconfig command.

The network interfaces NIC0 and NIC1 in this example are the physical interfaces hme0 and hme1, respectively. The third interface, cgtp0, is a virtual interface.

The IP addresses for the three network interfaces in this example are as follows:

hme0    10.250.1.30
hme1    10.250.2.30
cgtp0   10.250.3.30

The Ethernet addresses for NIC0 and NIC1 in this example are as follows:

hme0    8:0:20:f9:b4:b0
hme1    8:0:20:f9:b4:b1

Ping each network interface address of node 30.

# ping 10.250.1.30
# ping 10.250.2.30
# ping 10.250.3.30
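
In this example the three addresses share the pattern 10.domainid.network.nodeid, with network values 1, 2, and 3 for NIC0, NIC1, and cgtp0. The following sketch relies on that pattern, which is read off the sample output and might not match a cluster that uses a different addressing scheme. The function names nh_ids_from_addr and nh_node_addrs are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Both assume the 10.<domainid>.<network>.<nodeid>
# layout seen in the sample ifconfig output.

# Print "domainid nodeid" for a dotted IPv4 address.
nh_ids_from_addr() {
    echo "$1" | awk -F. '{ print $2, $4 }'
}

# Print the NIC0, NIC1, and cgtp0 addresses for a domainid ($1) and a
# nodeid ($2), using network octets 1, 2, and 3 as in the example.
nh_node_addrs() {
    for net in 1 2 3; do
        echo "10.$1.$net.$2"
    done
}
```

With these definitions, `nh_node_addrs 250 30` lists the three addresses pinged in the last step. Note that ping on the Solaris OS reports whether the host is alive and returns, whereas ping on Linux typically keeps sending until interrupted unless you pass a count such as `-c 1`.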

Managing Switchovers and Failovers

You can trigger a switchover to swap the master and vice-master roles of the master-eligible nodes. A switchover is useful when you plan to take the master node down for maintenance. To trigger a switchover, see To Trigger a Switchover.

However, if there is a problem on the master node, the master role fails over automatically to the vice-master node. In this case, the master and vice-master roles are also swapped, but because the cause is an unplanned problem, the swap is called a failover. To cause a failover, see To Reboot the Master Node Causing a Failover.

To Trigger a Switchover

Identify the master node.

On the Solaris OS:
# /opt/SUNWcgha/sbin/nhcmmstat -c all
On Linux:
# /opt/sun/sbin/nhcmmstat -c all
The nhcmmstat command prints information on each peer node to the console window.

Trigger a switchover.

On the Solaris OS:
# /opt/SUNWcgha/sbin/nhcmmstat -c so
On Linux:
# /opt/sun/sbin/nhcmmstat -c so
If there is a vice-master node qualified to become master in the cluster, this node is elected master. The old master node becomes the vice-master node. If there is no potential master, nhcmmstat does not perform a switchover.

After the switchover is complete, verify that the roles of the master and vice-master nodes have been switched.

On the Solaris OS:
# /opt/SUNWcgha/sbin/nhcmmstat -c vice
On Linux:
# /opt/sun/sbin/nhcmmstat -c vice
If the switchover is successful, the current node is the vice-master node. This command also verifies that the current node is synchronized with the new master node.

Verify the cluster configuration.

On the Solaris OS:
# /opt/SUNWcgha/sbin/nhadm check
On Linux:
# /opt/sun/sbin/nhadm check
For more information on nhcmmstat, see the nhcmmstat(1M) man page.
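
Each step in this procedure differs between the two operating systems only in the installation directory. If you script the steps, a small helper can select the directory once. The following is a sketch only: the function name nh_sbin_dir is hypothetical, and it assumes that `uname -s` prints SunOS on the Solaris OS and Linux on Linux.

```shell
#!/bin/sh
# Hypothetical helper: map an operating system name, as printed by
# "uname -s", to the directory that holds nhcmmstat and nhadm.
nh_sbin_dir() {
    case "$1" in
        SunOS) echo /opt/SUNWcgha/sbin ;;
        Linux) echo /opt/sun/sbin ;;
        *)     echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}
```

A script might then begin with `NHBIN=$(nh_sbin_dir "$(uname -s)") || exit 1` and call `"$NHBIN/nhcmmstat" -c so` followed by `"$NHBIN/nhadm" check`.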

To Reboot the Master Node Causing a Failover

If you reboot the master node, you trigger a failover.

Run the nhcmmstat command to identify the master node.

On the Solaris OS:
# /opt/SUNWcgha/sbin/nhcmmstat -c all
On Linux:
# /opt/sun/sbin/nhcmmstat -c all

Shut down the master node.

Note - For detailed information about shutting down the node on the operating system version in use at your site, refer to the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide.

The vice-master node becomes the master. Because one of the two master-eligible nodes in the cluster is shut down, you lose the redundancy of the cluster. To recover redundancy, restart the stopped node.

Verify that the vice-master node became the master node when the old master node was shut down.

# nhcmmstat -c master
Executed Command: master
------------------------------
node_id     = 20   [This is the current node]
domain_id   = 250
name        = node20
role        = MASTER
qualified   = YES
synchro.    = NEEDED !!!
frozen      = NO
excluded    = NO
eligible    = YES
incarn.     = 1038481013 (28/11/2002 - 11:56:53)
swload_id   = 1
CGTP @      = 10.250.3.20
------------------------------

The output shows that the vice-master node is now the master node. In addition, the new master node displays a requirement for synchronizing its state with the vice-master node.

Restart the old master node, which you shut down earlier in this procedure.

This node now automatically becomes the vice-master node.

Run the nhcmmstat command to verify that the current node is the vice-master node.

# nhcmmstat -c all
Executed Command: all
------------------------------
node_id     = 30
domain_id   = 250
name        = node30
role        = IN
qualified   = YES
synchro.    = READY
frozen      = NO
excluded    = NO
eligible    = NO
incarn.     = 1038422116 (27/11/2002 - 19:35:16)
swload_id   = 1
CGTP @      = 10.250.3.30
------------------------------
------------------------------
node_id     = 20 
domain_id   = 250
name        = node20
role        = MASTER
qualified   = YES
synchro.    = READY
frozen      = NO
excluded    = NO
eligible    = YES
incarn.     = 1038481013 (28/11/2002 - 11:56:53)
swload_id   = 1
CGTP @      = 10.250.3.20
------------------------------
------------------------------
node_id     = 10   [This is the current node]
domain_id   = 250
name        = node10
role        = VICE-MASTER
qualified   = YES
synchro.    = READY
frozen      = NO
excluded    = NO
eligible    = YES
incarn.     = 1038481383 (28/11/2002 - 12:03:03)
swload_id   = 1
CGTP @      = 10.250.3.10
------------------------------

Verify that the node has started correctly.

On the Solaris OS:
# /opt/SUNWcgha/sbin/nhadm check
On Linux:
# /opt/sun/sbin/nhadm check
For more information on the tests run by nhadm check, see the nhadm(1M) man page.
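
To confirm the result of a failover without reading every record, you can extract the role and synchronization state of the current node from the nhcmmstat output. The following sketch assumes the record layout shown in the preceding examples, including the "[This is the current node]" marker on the node_id line. The function name nh_current_state is hypothetical, and a POSIX awk (nawk on the Solaris OS) is assumed.

```shell
#!/bin/sh
# Hypothetical helper: read nhcmmstat output on standard input and print
# "role synchro" for the record marked as the current node.
nh_current_state() {
    awk '
        # Each record starts with a node_id line; note whether it is ours.
        $1 == "node_id" { current = ($0 ~ /This is the current node/) }
        current && $1 == "role"     && $2 == "=" { role = $3 }
        current && $1 == "synchro." && $2 == "=" { sync = $3 }
        END { print role, sync }
    '
}
```

Run on the restarted node as `nhcmmstat -c all | nh_current_state`, the sample output above would yield `VICE-MASTER READY`.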
