CHAPTER 6
This chapter describes how to stop and start the Netra HA Suite software, a node, or a cluster. This chapter contains the following sections:
Maintenance on a peer node can disrupt communication between this node and services and applications running on other peer nodes. During maintenance, you must isolate a node from the cluster by starting the node without the Foundation Services. After maintenance, reintegrate the node into the cluster by restarting the Foundation Services.
Log in as superuser to the node on which you want to stop the Netra HA Suite software.
Create the not_configured file on the node.
On Solaris systems:
# touch /etc/opt/SUNWcgha/not_configured
On Linux systems:
# touch /etc/opt/sun/nhas/not_configured
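As a sketch only (the two flag-file paths are taken from the commands above), a small helper can create the flag under whichever configuration directory exists on the node. The base-directory argument is purely illustrative so the sketch can be tried safely; on a real node it defaults to /etc/opt.

```shell
# Illustrative helper: create the not_configured flag under whichever
# Netra HA Suite configuration directory exists below the given base
# directory (SUNWcgha on Solaris, sun/nhas on Linux).
create_not_configured() {
    base=${1:-/etc/opt}
    for dir in "$base/SUNWcgha" "$base/sun/nhas"; do
        if [ -d "$dir" ]; then
            touch "$dir/not_configured"
            echo "created $dir/not_configured"
        fi
    done
}
```

On a real node, running `create_not_configured` with no argument creates the file in place.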
Reboot the node as described in To Perform a Clean Reboot of a Solaris OS Node or To Perform a Clean Reboot of a Linux Node.
The node restarts without the Foundation Services running. If the node is the master node, this procedure causes a failover.
Verify that the Foundation Services are not running:
# pgrep -x nhcmmd
If the Foundation Services have been stopped, no process identifier should appear for the nhcmmd daemon.
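Because pgrep exits with a non-zero status when no process matches, the check can also be scripted; this is a minimal sketch:

```shell
# pgrep -x matches the exact process name and returns a non-zero exit
# status when no matching process exists, so the exit status alone
# tells whether the daemon is down.
if pgrep -x nhcmmd > /dev/null 2>&1; then
    echo "nhcmmd is still running"
else
    echo "nhcmmd is not running"
fi
```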
To Stop and Restart the Foundation Services Without Stopping the Solaris OS
Use this procedure to restart the Foundation Services when the Solaris OS does not need to come down (to apply a new patch, for example).
To Stop and Restart the Foundation Services Without Stopping Linux
Use this procedure to restart the Foundation Services when Linux does not need to come down (to apply a new patch, for example).
Use this procedure to restart the Foundation Services on a node after performing the procedure in To Start a Node Without the Foundation Services.
Log in as superuser to the node on which you want to restart the Foundation Services.
Check that the not_configured file is not present.
The file is located at /etc/opt/SUNWcgha/not_configured on Solaris systems, and /etc/opt/sun/nhas/not_configured on Linux systems. If that file is present, delete it.
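A sketch of this check-and-delete step, with both paths taken from the paragraph above; the base-directory argument is illustrative only, so the sketch can be exercised outside a real node.

```shell
# Illustrative helper: remove the not_configured flag from whichever
# configuration directory contains it. The base argument defaults to
# /etc/opt; passing another base exists only for safe experimentation.
clear_not_configured() {
    base=${1:-/etc/opt}
    for f in "$base/SUNWcgha/not_configured" "$base/sun/nhas/not_configured"; do
        if [ -f "$f" ]; then
            rm "$f"
            echo "removed $f"
        fi
    done
}
```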
Reboot the node as described in To Perform a Clean Reboot of a Solaris OS Node or in To Perform a Clean Reboot of a Linux Node, depending on the OS your system uses.
Verify the configuration of the node:
# nhadm check configuration
If the node is configured correctly, the nhadm command does not encounter any errors.
For information about the nhadm command, see the nhadm(1M) man page.
Verify that the services have started correctly:
# nhadm check starting
If the Foundation Services have started correctly, the nhadm command does not encounter any errors.
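The two nhadm checks above can be chained so that a script stops at the first failure. In this sketch, the NHADM override is not part of the product; it exists only so the logic can be exercised where the real nhadm command is absent.

```shell
# Run "nhadm check configuration" and "nhadm check starting" in order,
# stopping at the first failure. NHADM is an illustrative override so
# the sketch can run without the real nhadm command.
verify_node() {
    nhadm=${NHADM:-nhadm}
    for stage in configuration starting; do
        if ! $nhadm check "$stage"; then
            echo "nhadm check $stage failed" >&2
            return 1
        fi
    done
    echo "node checks passed"
}
```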
Sometimes you need to stop Daemon Monitoring to investigate why a monitored daemon has failed. This section describes how to stop and restart Daemon Monitoring.
For information about the causes of daemon failure at startup and runtime, see the Netra High Availability Suite 3.0 1/08 Foundation Services Troubleshooting Guide.
This procedure stops Daemon Monitoring. On reboot, Daemon Monitoring is not automatically restarted.
Log in as superuser to the node on which you want to stop the monitoring daemon.
If the node is running the Solaris OS:
# touch /etc/opt/SUNWcgha/not_under_pmd_control
If the node is running Linux:
# touch /etc/opt/sun/nhas/not_under_pmd_control
Reboot the node as described in To Perform a Clean Reboot of a Solaris OS Node or in To Perform a Clean Reboot of a Linux Node, depending on the OS your system uses.
The Foundation Services start, and the OS and Netra HA Suite daemons that were monitored are no longer monitored.
If Daemon Monitoring was stopped using To Stop Daemon Monitoring, restart Daemon Monitoring as follows:
Log in as superuser to the node on which you want to restart Daemon Monitoring.
If the node is running the Solaris OS:
# rm /etc/opt/SUNWcgha/not_under_pmd_control
If the node is running Linux:
# rm /etc/opt/sun/nhas/not_under_pmd_control
Reboot the node as described in To Perform a Clean Reboot of a Solaris OS Node or in To Perform a Clean Reboot of a Linux Node, depending on the OS your system uses.
The Foundation Services start and are monitored by the Daemon Monitor.
This section describes how to shut down and restart a node. The consequences of stopping a node depend on the role of the node. If you shut down a master-eligible node, you no longer have a redundant cluster.
This section describes how to shut down a master node, a vice-master node, a diskless node, and a dataless node.
Before shutting down the master node, perform a switchover as described in To Trigger a Switchover With nhcmmstat. The vice-master node becomes the new master node, and the old master node becomes the new vice-master node. Then, shut down the new vice-master node as described in To Shut Down the Vice-Master Node.
To shut down the master node without first performing a switchover, do the following:
Shut down the master node as described in To Perform a Clean Power off of a Solaris Node or To Perform a Clean Power off of a Linux Node, depending on the OS your system uses.
The vice-master node becomes the master node. Because there are only two master-eligible nodes in the cluster and one is shut down, your cluster is not highly available. To restore high availability, restart the stopped node.
Shut down the vice-master node as described in To Perform a Clean Power off of a Solaris Node or To Perform a Clean Power off of a Linux Node, depending on the OS your system uses.
Because there are only two master-eligible nodes in the cluster and one is shut down, your cluster is not highly available. To restore high availability, restart the stopped node.
Shut down the node as described in To Perform a Clean Power off of a Solaris Node or To Perform a Clean Power off of a Linux Node, depending on the OS your system uses.
When a diskless node or dataless node is shut down, there is no impact on the roles of the other peer nodes.
This section describes how to restart a node that has been stopped by one of the procedures in Shutting Down a Node.
Note - For x64 platforms, refer to the hardware documentation for information about performing tasks that reference OBP commands and that, therefore, apply only to the SPARC architecture.
This section describes how to shut down and restart a cluster.
Identify the role of each peer node:
# nhcmmstat -c all
Shut down each diskless and dataless node as described in To Perform a Clean Power off of a Solaris Node or To Perform a Clean Power off of a Linux Node, depending on the OS your system uses.
Verify that the vice-master node is synchronized with the master node (not applicable for shared disk configurations):
For versions of the Solaris OS earlier than version 10:
# /usr/opt/SUNWesm/sbin/scmadm -S -M
For the Solaris 10 OS and later:
# /usr/sbin/dsstat 1
For Linux:
# drbdadm cstate all
Shut down the vice-master node by logging in to the vice-master node and following the steps provided in To Perform a Clean Power off of a Solaris Node or in To Perform a Clean Power off of a Linux Node, depending on the OS your system uses.
Shut down the master node by logging in to the master node and following the steps provided in To Perform a Clean Power off of a Solaris Node or in To Perform a Clean Power off of a Linux Node, depending on the OS your system uses.
For further information about the init command, see the init(1M) man page.
This procedure describes how to restart a cluster that has been shut down as described in To Shut Down a Cluster.
Access the master node’s system console and type the following:
ok> boot
Note - For x64 platforms, refer to the hardware documentation for information about performing tasks that reference OpenBoot™ PROM (OBP) commands and, therefore, apply only to the SPARC architecture.
When the node has finished booting, verify that the master node is correctly configured:
# nhadm check configuration
Access the vice-master node’s system console and type the following:
ok> boot
When the node has finished booting, verify that the vice-master node is correctly configured:
# nhadm check configuration
Access the system consoles of each diskless or dataless node and type the following:
ok> boot
When the nodes have finished booting, verify that each node is correctly configured:
# nhadm check configuration
From any node in the cluster, verify that the cluster has started up successfully:
# nhadm check starting
Confirm that each node has the same role it had before it was shut down.
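One way to make the role comparison mechanical is to capture the nhcmmstat output before shutdown and diff it after restart. In this sketch, the NHCMMSTAT override is illustrative only; it lets the helper run where the real command is absent.

```shell
# Save the cluster role table ("nhcmmstat -c all") to a file so that
# the output captured before shutdown can be diffed against the output
# captured after restart. NHCMMSTAT is an illustrative override.
snapshot_roles() {
    ${NHCMMSTAT:-nhcmmstat -c all} > "$1"
}

# Before shutdown:  snapshot_roles /var/tmp/roles.before
# After restart:    snapshot_roles /var/tmp/roles.after
#                   diff /var/tmp/roles.before /var/tmp/roles.after
```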
Before you perform a switchover, verify that the master and vice-master disks are synchronized, as described in To Verify That the Master Node and Vice-Master Node Are Synchronized. To trigger a switchover, perform the following procedure.
If the master node and the vice-master node both act as master nodes, this error is called split brain. For information about how to recover from split brain at startup and at runtime, see the Netra High Availability Suite 3.0 1/08 Foundation Services Troubleshooting Guide.
The following procedure is specific to IP-replicated clusters because a split brain error is unlikely to occur with a shared disk configuration. For shared disk configurations, verify that the configuration is normal, and then reboot.
Stop all of the nodes in the cluster as described in To Perform a Clean Power off of a Solaris Node.
Boot both of the master-eligible nodes in single-user mode.
ok> boot -s
Note - For x64 platforms, refer to the hardware documentation for information about performing tasks that reference OBP commands and, therefore, apply only to the SPARC architecture.
Confirm that the master-eligible nodes are configured correctly.
Boot the nodes in the following order:
Boot first the master-eligible node that has the most up-to-date set of data.
Caution - The node that becomes the vice-master node will have its recent file system data erased.
Confirm that the first master-eligible node has become the master node.
Confirm that the second master-eligible node has become the vice-master node.
Wait until the master node and vice-master node are synchronized.
Stop all peer nodes in the cluster as described in To Perform a Clean Power off of a Linux Node.
Restart both of the master-eligible nodes with Netra HA Suite software disabled. Note which node is master and which node is vice-master before restarting the nodes.
# touch /etc/opt/sun/nhas/not_configured
# reboot -n -f
Confirm that the master-eligible nodes are configured correctly.
Reset the DRBD replication configuration on each master-eligible node, according to the role the node held before the restart.
On the node that was the vice-master:
# drbdadm secondary all
On the node that was the master:
# drbdadm primary all
# drbdadm invalidate_remote all
This will trigger a full re-synchronization from the master node to the vice-master node.
Caution - The vice-master node will have the recent file system data erased.
Wait until the master node and vice-master node are synchronized. This is a full re-synchronization and might take some time.
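The wait can be automated by polling the replication state. This sketch assumes DRBD, where a cstate of SyncSource or SyncTarget indicates a resynchronization in progress; the DRBD_CSTATE override exists only so the sketch can run without DRBD installed.

```shell
# Poll "drbdadm cstate all" until no resource reports a Sync* state
# (SyncSource/SyncTarget indicate resynchronization in progress on
# DRBD). DRBD_CSTATE is an illustrative override for the sketch.
wait_for_sync() {
    while ${DRBD_CSTATE:-drbdadm cstate all} | grep -q 'Sync'; do
        echo "still synchronizing..."
        sleep 10
    done
    echo "synchronized"
}
```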
Remove the not_configured file on both the master and vice-master node:
# rm /etc/opt/sun/nhas/not_configured
Copyright © 2008, Sun Microsystems, Inc. All rights reserved.