CHAPTER 4

Running Administration Tasks on the Cluster

After you have installed the software on the cluster, run some administration tasks to check that the cluster is functioning correctly and to further evaluate the product.

To check your cluster, perform the administration tasks described in the following sections: Checking the Cluster Nodes, and Managing Switchovers and Failovers.


Checking the Cluster Nodes

The Foundation Services product is delivered with tools to check different aspects of a cluster, including the status of cluster nodes, the network connection between nodes, and the IP addresses of nodes.

To Check the Status of the Cluster Nodes

You can check the nodes of your cluster with the nhcmmstat command.

  1. Log in to a master-eligible node as superuser.

  2. Check the nodes by using the nhcmmstat command.


    # nhcmmstat -c all
    Executed Command: all
    ------------------------------
    node_id     = 10   [This is the current node]
    domain_id   = 250
    name        = node10
    role        = MASTER
    qualified   = YES
    synchro.    = READY
    frozen      = NO
    excluded    = NO
    eligible    = YES
    incarn.     = 1038420771 (27/11/2002 - 19:12:51)
    swload_id   = 1
    CGTP @      = 10.250.3.10
    ------------------------------
    ------------------------------
    node_id     = 30
    domain_id   = 250
    name        = node30
    role        = IN
    qualified   = YES
    synchro.    = READY
    frozen      = NO
    excluded    = NO
    eligible    = NO
    incarn.     = 1038422116 (27/11/2002 - 19:35:16)
    swload_id   = 1
    CGTP @      = 10.250.3.30
    ------------------------------
    ------------------------------
    node_id     = 20
    domain_id   = 250
    name        = node20
    role        = VICE-MASTER
    qualified   = YES
    synchro.    = READY
    frozen      = NO
    excluded    = NO
    eligible    = YES
    incarn.     = 1038420945 (27/11/2002 - 19:15:45)
    swload_id   = 1
    CGTP @      = 10.250.3.20
    ------------------------------
    

    In the preceding example, the nhcmmstat command displays information about all the peer nodes, including the role of each node. The peer nodes must include a master node and a vice-master node.

    For more information on nhcmmstat, see the nhcmmstat(1M) man page.
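
When a cluster has many nodes, the full nhcmmstat listing can be long. The following is a minimal sketch, not part of the Foundation Services product, that condenses the listing to one line per node and tests for the presence of one master and one vice-master. The function names nh_roles and nh_has_redundancy are hypothetical, and the sketch assumes a POSIX awk and the field layout shown in the preceding example.

```shell
#!/bin/sh
# Hypothetical helper: read "nhcmmstat -c all" output from stdin and
# print one "name role" line per node.
nh_roles() {
    awk -F= '
        /^[ \t]*name[ \t]*=/ { gsub(/[ \t]/, "", $2); name = $2 }
        /^[ \t]*role[ \t]*=/ { gsub(/[ \t]/, "", $2); print name, $2 }
    '
}

# Hypothetical helper: succeed only if the listing on stdin contains
# exactly one MASTER and exactly one VICE-MASTER.
nh_has_redundancy() {
    nh_roles | awk '
        $2 == "MASTER"      { m++ }
        $2 == "VICE-MASTER" { v++ }
        END { exit !(m == 1 && v == 1) }
    '
}
```

For example, nhcmmstat -c all | nh_roles prints lines such as "node10 MASTER", and nhcmmstat -c all | nh_has_redundancy returns a nonzero exit status if either role is missing.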

To Check the Network Connection Between Nodes

You can check that the cluster network is functioning correctly with the nhadm command.

  1. Log in to a peer node as superuser.

  2. Verify that the nodes in the cluster are communicating through a network.

    On the Solaris OS:


    # /opt/SUNWcgha/sbin/nhadm check
    

    On Linux:


    # /opt/sun/sbin/nhadm check
    

    If any peer node is not accessible from any other peer node, the nhadm command displays an error message in the console window.

    For more information, see the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide.
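
Because the tools are installed in a different directory on each operating system, a script that runs on both can select the directory first. The following is a minimal sketch under that assumption. The function name nh_sbin_dir is hypothetical. The two directories are the ones shown in this procedure, and the operating system names are the values that uname -s reports.

```shell
#!/bin/sh
# Hypothetical helper: print the directory that holds the Foundation
# Services commands for the given OS name (default: this machine's OS).
nh_sbin_dir() {
    nh_os=${1:-$(uname -s)}
    case "$nh_os" in
        SunOS) echo /opt/SUNWcgha/sbin ;;   # Solaris OS
        Linux) echo /opt/sun/sbin ;;
        *)     echo "unsupported OS: $nh_os" >&2; return 1 ;;
    esac
}
```

With this helper, the check in Step 2 can be written as "$(nh_sbin_dir)/nhadm" check on either operating system.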

To Check Node Addresses

Each node has an IP address assigned to each of its NIC0, NIC1, and cgtp0 network interfaces. To identify and ping each network interface of a node, follow this procedure.

  1. Log in to the node that you want to examine.

  2. Type the ifconfig command.


    # ifconfig -a
    

    The ifconfig command displays configuration information about the network interfaces in the console window. Sample output for the ifconfig command on a peer node is as follows:


    hme0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 \
    index 1
            inet 10.250.1.30 netmask ffffff00 broadcast 10.250.1.255
            ether 8:0:20:f9:b4:b0 
    lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 2
            inet 127.0.0.1 netmask ff000000 
    hme1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
            inet 10.250.2.30 netmask ffffff00 broadcast 10.250.2.255
            ether 8:0:20:f9:b4:b1 
    cgtp0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
            inet 10.250.3.30 netmask ffffff00 broadcast 10.250.3.255
            ether 0:0:0:0:0:0 
    

    Each peer node has at least three network interfaces configured. If a node has external access configured or if the node is the master, more network interfaces are displayed by the ifconfig command.

  3. Retrieve the cluster ID, that is, the domainid, by using the output from the ifconfig command.

    The domainid in this example is 250, which is the second octet of each interface address (10.250.x.30).

  4. Retrieve the node ID, that is, the nodeid, by using the output from the ifconfig command.

    The nodeid in this example is 30, which is the last octet of each interface address.

  5. Retrieve the network interface names and corresponding IP addresses by using the output from the ifconfig command.

    The network interfaces NIC0 and NIC1 in this example are the physical interfaces hme0 and hme1, respectively. The third interface is the virtual physical interface, cgtp0.

    The IP addresses for the three network interfaces in this example are as follows:


    hme0 10.250.1.30
    hme1 10.250.2.30
    cgtp0 10.250.3.30

    The Ethernet addresses for NIC0 and NIC1 in this example are as follows:


    hme0 8:0:20:f9:b4:b0
    hme1 8:0:20:f9:b4:b1

  6. Log in to another peer node as superuser.

  7. Ping each network interface address of node 30.


    # ping 10.250.1.30
    # ping 10.250.2.30
    # ping 10.250.3.30
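
Steps 3 through 5 read the same values from the ifconfig output by eye. The following is a minimal sketch of how that reading could be automated. The function name nh_if_table is hypothetical. The sketch assumes the Solaris ifconfig output format shown in Step 2 and assumes, as in this example, that the domainid is the second octet and the nodeid is the last octet of each address.

```shell
#!/bin/sh
# Hypothetical helper: read Solaris-style "ifconfig -a" output from stdin
# and print "interface address domainid nodeid" for every interface
# except the loopback interface lo0.
nh_if_table() {
    awk '
        # An interface header such as "hme0:" starts a new block.
        $1 ~ /^[a-z][a-z0-9:]*:$/ { ifname = $1; sub(/:$/, "", ifname) }
        # The "inet" line carries the IPv4 address of the current block.
        $1 == "inet" && ifname != "lo0" {
            split($2, o, ".")
            print ifname, $2, o[2], o[4]
        }
    '
}
```

Running ifconfig -a | nh_if_table on the node in this example would list hme0, hme1, and cgtp0 with their addresses. The addresses in the second column are the ones to ping from another peer node in Step 7.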
    


Managing Switchovers and Failovers

You can trigger a switchover to swap the master and vice-master roles of the master-eligible nodes. A switchover is useful when you plan to take the master node down for maintenance. To trigger a switchover, see To Trigger a Switchover.

However, if there is a problem on the master node, the master role fails over automatically to the vice-master node. In this case, the master and vice-master roles are also swapped, but because the cause is an unplanned problem, the swap is called a failover. To cause a failover, see To Reboot the Master Node Causing a Failover.

To Trigger a Switchover

  1. Log in to a peer node as superuser.

  2. Identify the master node.

    On the Solaris OS:


    # /opt/SUNWcgha/sbin/nhcmmstat -c all
    

    On Linux:


    # /opt/sun/sbin/nhcmmstat -c all
    

    The nhcmmstat command prints information on each peer node to the console window.

  3. Log in to the master node as superuser.

  4. Trigger a switchover.

    On the Solaris OS:


    # /opt/SUNWcgha/sbin/nhcmmstat -c so
    

    On Linux:


    # /opt/sun/sbin/nhcmmstat -c so
    

    If there is a vice-master node qualified to become master in the cluster, this node is elected master. The old master node becomes the vice-master node. If there is no potential master, nhcmmstat does not perform a switchover.

  5. After the switchover is complete, verify that the roles of the master and vice-master nodes have been switched.

    On the Solaris OS:


    # /opt/SUNWcgha/sbin/nhcmmstat -c vice
    

    On Linux:


    # /opt/sun/sbin/nhcmmstat -c vice
    

    If the switchover is successful, the current node is the vice-master node. This command also verifies that the current node is synchronized with the new master node.

  6. Verify the cluster configuration.

    On the Solaris OS:


    # /opt/SUNWcgha/sbin/nhadm check
    

    On Linux:


    # /opt/sun/sbin/nhadm check
    

    For more information on nhcmmstat, see the nhcmmstat(1M) man page.
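
The outcome of a switchover can also be checked by comparing the node roles before and after Step 4. The following is a minimal sketch of such a comparison, not a product feature. The function names nh_node_with_role and nh_roles_swapped are hypothetical, and the sketch assumes that the output of nhcmmstat -c all was saved to one file before the switchover and to another file afterward.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the node that holds role $1
# (for example MASTER) in "nhcmmstat -c all" output read from stdin.
nh_node_with_role() {
    awk -F= -v want="$1" '
        /^[ \t]*name[ \t]*=/ { gsub(/[ \t]/, "", $2); name = $2 }
        /^[ \t]*role[ \t]*=/ { gsub(/[ \t]/, "", $2); if ($2 == want) print name }
    '
}

# Hypothetical helper: succeed if the master and vice-master recorded in
# the "before" file ($1) have exchanged roles in the "after" file ($2).
nh_roles_swapped() {
    old_m=$(nh_node_with_role MASTER < "$1")
    old_v=$(nh_node_with_role VICE-MASTER < "$1")
    new_m=$(nh_node_with_role MASTER < "$2")
    new_v=$(nh_node_with_role VICE-MASTER < "$2")
    [ -n "$old_m" ] && [ "$old_m" = "$new_v" ] && [ "$old_v" = "$new_m" ]
}
```

For example, after saving the two listings to files named before.txt and after.txt (names chosen here for illustration), the command nh_roles_swapped before.txt after.txt returns a zero exit status only if the two roles were exchanged.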

To Reboot the Master Node Causing a Failover

If you reboot the master node, you trigger a failover.

  1. Log in to a peer node as superuser.

  2. Run the nhcmmstat command to identify the master node.

    On the Solaris OS:


    # /opt/SUNWcgha/sbin/nhcmmstat -c all
    

    On Linux:


    # /opt/sun/sbin/nhcmmstat -c all
    

  3. Log in to the master node as superuser.

  4. Shut down the master node.



    Note - For detailed information about shutting down the node on the operating system version in use at your site, refer to the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide.



    The vice-master node becomes the master. Because one of the two master-eligible nodes in the cluster is shut down, you lose the redundancy of the cluster. To recover redundancy, restart the stopped node.

  5. Log in to a peer node as superuser.

  6. Verify that the vice-master node became the master node when the old master node was shut down.


    # nhcmmstat -c master
    Executed Command: master
    ------------------------------
    node_id     = 20   [This is the current node]
    domain_id   = 250
    name        = node20
    role        = MASTER
    qualified   = YES
    synchro.    = NEEDED !!!
    frozen      = NO
    excluded    = NO
    eligible    = YES
    incarn.     = 1038481013 (28/11/2002 - 11:56:53)
    swload_id   = 1
    CGTP @      = 10.250.3.20
    ------------------------------
    

    The output shows that the former vice-master node, node20, is now the master node. In addition, the synchro. field value NEEDED indicates that the new master node requires synchronization with a vice-master node.

  7. Restart the old master node, which you shut down in Step 4.

    This node now automatically becomes the vice-master node.

  8. Run the nhcmmstat command to verify that the current node is the vice-master node.


    # nhcmmstat -c all
    Executed Command: all
    ------------------------------
    node_id     = 30
    domain_id   = 250
    name        = node30
    role        = IN
    qualified   = YES
    synchro.    = READY
    frozen      = NO
    excluded    = NO
    eligible    = NO
    incarn.     = 1038422116 (27/11/2002 - 19:35:16)
    swload_id   = 1
    CGTP @      = 10.250.3.30
    ------------------------------
    ------------------------------
    node_id     = 20 
    domain_id   = 250
    name        = node20
    role        = MASTER
    qualified   = YES
    synchro.    = READY
    frozen      = NO
    excluded    = NO
    eligible    = YES
    incarn.     = 1038481013 (28/11/2002 - 11:56:53)
    swload_id   = 1
    CGTP @      = 10.250.3.20
    ------------------------------
    ------------------------------
    node_id     = 10   [This is the current node]
    domain_id   = 250
    name        = node10
    role        = VICE-MASTER
    qualified   = YES
    synchro.    = READY
    frozen      = NO
    excluded    = NO
    eligible    = YES
    incarn.     = 1038481383 (28/11/2002 - 12:03:03)
    swload_id   = 1
    CGTP @      = 10.250.3.10
    ------------------------------
    

  9. Log in to the new vice-master node as superuser.

  10. Verify that the node has started correctly.

    On the Solaris OS:


    # /opt/SUNWcgha/sbin/nhadm check
    

    On Linux:


    # /opt/sun/sbin/nhadm check
    

    For more information on the tests run by nhadm check, see the nhadm(1M) man page.
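
In Step 6 the synchro. field of the new master reads NEEDED, and in Step 8 every node reads READY again. The following is a minimal sketch for spotting nodes that have not yet returned to READY. The function name nh_unsynced is hypothetical, and the sketch assumes a POSIX awk and the field layout shown in the preceding examples.

```shell
#!/bin/sh
# Hypothetical helper: read nhcmmstat output from stdin and print
# "name: value" for every node whose synchro. field is not READY.
nh_unsynced() {
    awk -F= '
        /^[ \t]*name[ \t]*=/      { gsub(/[ \t]/, "", $2); name = $2 }
        /^[ \t]*synchro\.[ \t]*=/ {
            # Trim surrounding blanks but keep the value text intact.
            sub(/^[ \t]+/, "", $2); sub(/[ \t]+$/, "", $2)
            if ($2 != "READY") print name ": " $2
        }
    '
}
```

For example, nhcmmstat -c all | nh_unsynced prints nothing when every listed node is READY, which is the state shown in Step 8.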