Sun Cluster 3.1 System Administration Guide

Chapter 2 Shutting Down and Booting a Cluster

This chapter provides the procedures for shutting down and booting a cluster and individual cluster nodes.

For a high-level description of the related procedures in this chapter, see Table 2–1 and Table 2–2.

Shutting Down and Booting a Cluster Overview

The Sun Cluster scshutdown(1M) command stops cluster services in an orderly fashion and cleanly shuts down the entire cluster. You might use the scshutdown command when moving the location of a cluster. You can also use the command to shut down the cluster if you have data corruption caused by an application error.


Note –

Use scshutdown instead of the shutdown or halt commands to ensure proper shutdown of the entire cluster. The Solaris shutdown command is used with the scswitch(1M) command to shut down individual nodes. See How to Shut Down a Cluster or Shutting Down and Booting a Single Cluster Node for more information.


The scshutdown command stops all nodes in a cluster by:

  1. Taking all running resource groups offline.

  2. Unmounting all cluster file systems.

  3. Shutting down active device services.

  4. Running init 0 and bringing all nodes to the OBP ok prompt.
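
For example, a complete cluster shutdown that warns logged-in users rather than stopping immediately might look like the following sketch. The 60-second grace period and the message text are illustrative; scshutdown accepts a grace period with the -g option and an optional broadcast message operand.


# scshutdown -y -g 60 "The cluster is shutting down for maintenance"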


Note –

If necessary, you can boot a node in non-cluster mode so that the node does not participate in cluster membership. Non-cluster mode is useful when installing cluster software or for performing certain administrative procedures. See How to Boot a Cluster Node in Non-Cluster Mode for more information.


Table 2–1 Task List: Shutting Down and Booting a Cluster

Task: Stop the cluster

    - Use scshutdown(1M)

    For instructions, see How to Shut Down a Cluster.

Task: Start the cluster by booting all nodes

    The nodes must have a working connection to the cluster interconnect to attain cluster membership.

    For instructions, see How to Boot a Cluster.

Task: Reboot the cluster

    - Use scshutdown, then boot each node individually with the boot(1M) command at the ok prompt.

    The nodes must have a working connection to the cluster interconnect to attain cluster membership.

    For instructions, see How to Reboot a Cluster.

How to Shut Down a Cluster


Caution –

Do not use send brk on a cluster console to shut down a cluster node. The command is not supported within a cluster. If you use send brk and then enter go at the ok prompt to reboot, the node panics.


  1. If your cluster is running Oracle® Parallel Server or Real Application Clusters, shut down all instances of the database.

    Refer to the Oracle Parallel Server/Real Application Clusters product documentation for shutdown procedures.

  2. Become superuser on any node in the cluster.

  3. Shut down the cluster immediately to OBP.

    From a single node in the cluster, type the following command.


    # scshutdown -g0 -y
    

  4. Verify that all nodes have reached the ok prompt.

    Do not power off any nodes until all cluster nodes are at the ok prompt.

  5. If necessary, power off the nodes.

Example—Shutting Down a Cluster

The following example shows the console output when stopping normal cluster operation and bringing down all nodes to the ok prompt. The -g 0 option sets the shutdown grace period to zero, and -y provides an automatic yes response to the confirmation question. Shutdown messages also appear on the consoles of the other nodes in the cluster.


# scshutdown -g0 -y
May 2 10:08:46 phys-schost-1 cl_runtime: WARNING: CMM monitoring disabled.
phys-schost-1# 
INIT: New run level: 0
The system is coming down.  Please wait.
System services are now being stopped.
/etc/rc0.d/K05initrgm: Calling scswitch -S (evacuate)
The system is down.
syncing file systems... done
Program terminated
ok 

Where to Go From Here

See How to Boot a Cluster to restart a cluster that has been shut down.

How to Boot a Cluster

  1. To start a cluster whose nodes have been shut down and are at the ok prompt, boot each node with the boot(1M) command.

    If you make configuration changes between shutdowns, start the node with the most current configuration first. Except in this situation, the boot order of the nodes does not matter.


    ok boot
    

    Messages are displayed on the booted nodes' consoles as cluster components are activated.


    Note –

    Cluster nodes must have a working connection to the cluster interconnect to attain cluster membership.


  2. Verify that the nodes booted without error and are online.

    The scstat(1M) command reports the nodes' status.


    # scstat -n
    
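    The output resembles the following sketch. The node names and the number of nodes shown reflect your own configuration.


    -- Cluster Nodes --
                           Node name           Status
                           ---------           ------
        Cluster node:      phys-schost-1       Online
        Cluster node:      phys-schost-2       Online
        Cluster node:      phys-schost-3       Online
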


    Note –

    If a cluster node's /var file system fills up, Sun Cluster might not be able to restart on that node. If this problem arises, see How to Repair a Full /var File System.


Example—Booting a Cluster

The following example shows the console output when booting node phys-schost-1 into the cluster. Similar messages appear on the consoles of the other nodes in the cluster.


ok boot
Rebooting with command: boot 
...
Hostname: phys-schost-1
Booting as part of a cluster
NOTICE: Node 1 with votecount = 1 added.
NOTICE: Node 2 with votecount = 1 added.
NOTICE: Node 3 with votecount = 1 added.
...
NOTICE: Node 1: attempting to join cluster
...
NOTICE: Node 2 (incarnation # 937690106) has become reachable.
NOTICE: Node 3 (incarnation # 937690290) has become reachable.
NOTICE: cluster has reached quorum.
NOTICE: node 1 is up; new incarnation number = 937846227.
NOTICE: node 2 is up; new incarnation number = 937690106.
NOTICE: node 3 is up; new incarnation number = 937690290.
NOTICE: Cluster members:   1  2  3
...

How to Reboot a Cluster

Run the scshutdown(1M) command to shut down the cluster, then boot the cluster with the boot(1M) command on each node.

  1. (Optional) For a cluster that is running Oracle Parallel Server/Real Application Clusters, shut down all instances of the database.

    Refer to the Oracle Parallel Server/Real Application Clusters product documentation for shutdown procedures.

  2. Become superuser on any node in the cluster.

  3. Shut down the cluster to OBP.

    From a single node in the cluster, type the following command.


    # scshutdown -g0 -y 
    

    Each node is shut down to the ok prompt.


    Note –

    Cluster nodes must have a working connection to the cluster interconnect to attain cluster membership.


  4. Boot each node.

    The order in which the nodes are booted does not matter unless you made configuration changes between shutdowns. In that case, start the node that has the most current configuration first.


    ok boot
    

    Messages appear on the booted nodes' consoles as cluster components are activated.

  5. Verify that the nodes booted without error and are online.

    The scstat command reports the nodes' status.


    # scstat -n
    


    Note –

    If a cluster node's /var file system fills up, Sun Cluster might not be able to restart on that node. If this problem arises, see How to Repair a Full /var File System.


Example—Rebooting a Cluster

The following example shows the console output when stopping normal cluster operation, bringing down all nodes to the ok prompt, then restarting the cluster. The -g 0 option sets the grace period to zero, and -y provides an automatic yes response to the confirmation question. Shutdown messages also appear on the consoles of other nodes in the cluster.


# scshutdown -g0 -y
May 2 10:08:46 phys-schost-1 cl_runtime: WARNING: CMM monitoring disabled.
phys-schost-1# 
INIT: New run level: 0
The system is coming down.  Please wait.
...
The system is down.
syncing file systems... done
Program terminated
ok boot
Rebooting with command: boot 
...
Hostname: phys-schost-1
Booting as part of a cluster
...
NOTICE: Node 1: attempting to join cluster
...
NOTICE: Node 2 (incarnation # 937690106) has become reachable.
NOTICE: Node 3 (incarnation # 937690290) has become reachable.
NOTICE: cluster has reached quorum.
...
NOTICE: Cluster members:   1  2  3
...
NOTICE: Node 1: joined cluster
...
The system is coming up.  Please wait.
checking ufs filesystems
...
reservation program successfully exiting
Print services started.
volume management starting.
The system is ready.
phys-schost-1 console login:

Shutting Down and Booting a Single Cluster Node


Note –

Use the scswitch(1M) command in conjunction with the Solaris shutdown(1M) command to shut down an individual node. Use the scshutdown command only when shutting down an entire cluster.


Table 2–2 Task Map: Shutting Down and Booting a Cluster Node

Task: Stop a cluster node

    - Use scswitch(1M) and shutdown(1M)

    For instructions, see How to Shut Down a Cluster Node.

Task: Start a node

    The node must have a working connection to the cluster interconnect to attain cluster membership.

    For instructions, see How to Boot a Cluster Node.

Task: Stop and restart (reboot) a cluster node

    - Use scswitch and shutdown

    The node must have a working connection to the cluster interconnect to attain cluster membership.

    For instructions, see How to Reboot a Cluster Node.

Task: Boot a node so that the node does not participate in cluster membership

    - Use scswitch and shutdown, then boot -x

    For instructions, see How to Boot a Cluster Node in Non-Cluster Mode.

How to Shut Down a Cluster Node


Caution –

Do not use send brk on a cluster console to shut down a cluster node. Using send brk and entering go at the ok prompt to reboot causes a node to panic. This functionality is not supported within a cluster.


  1. If you are running Oracle Parallel Server/Real Application Clusters, shut down all instances of the database.

    Refer to the Oracle Parallel Server/Real Application Clusters product documentation for shutdown procedures.

  2. Become superuser on the cluster node to be shut down.

  3. Switch all resource groups, resources, and device groups from the node being shut down to other cluster members.

    On the node to be shut down, type the following command.


    # scswitch -S -h node
    

    -S

    Evacuates all device services and resource groups from the specified node.

    -h node

    Specifies the node from which you are switching resource groups and device groups.

  4. Shut down the cluster node to OBP.

    On the node to be shut down, type the following command.


    # shutdown -g0 -y -i0
    

  5. Verify that the cluster node has reached the ok prompt.

  6. If necessary, power off the node.

Example—Shutting Down a Cluster Node

The following example shows the console output when shutting down node phys-schost-1. The -g0 option sets the grace period to zero, -y provides an automatic yes response to the confirmation question, and -i0 invokes run level 0 (zero). Shutdown messages for this node appear on the consoles of other nodes in the cluster.


# scswitch -S -h phys-schost-1
# shutdown -g0 -y -i0
May 2 10:08:46 phys-schost-1 cl_runtime: WARNING: CMM monitoring disabled.
phys-schost-1# 
INIT: New run level: 0
The system is coming down.  Please wait.
Notice: rgmd is being stopped.
Notice: rpc.pmfd is being stopped.
Notice: rpc.fed is being stopped.
umount: /global/.devices/node@1 busy
umount: /global/phys-schost-1 busy
The system is down.
syncing file systems... done
Program terminated
ok 

Where to Go From Here

See How to Boot a Cluster Node to restart a cluster node that has been shut down.

How to Boot a Cluster Node


Note –

Starting a cluster node can be affected by the quorum configuration. In a two-node cluster, you must have a quorum device configured so that the total quorum count for the cluster is three: one quorum count for each node and one quorum count for the quorum device. In this situation, if the first node is shut down, the second node still has quorum and runs as the sole cluster member. For the first node to come back into the cluster as a cluster node, the second node must be up and running so that the required cluster quorum count (two) is present.

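Before booting a node back into the cluster, you can check the quorum configuration and current vote counts from a running cluster member with the scstat -q command. The summary below is an illustrative sketch; your vote totals depend on how many nodes and quorum devices are configured.


# scstat -q

-- Quorum Summary --

  Quorum votes possible:      3
  Quorum votes needed:        2
  Quorum votes present:       2
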

  1. To start a cluster node that has been shut down, boot the node.


    ok boot
    

    Messages are displayed on all node consoles as cluster components are activated.


    Note –

    A cluster node must have a working connection to the cluster interconnect to attain cluster membership.


  2. Verify that the node has booted without error and is online.

    The scstat(1M) command reports the status of a node.


    # scstat -n
    


    Note –

    If a cluster node's /var file system fills up, Sun Cluster might not be able to restart on that node. If this problem arises, see How to Repair a Full /var File System.


Example—Booting a Cluster Node

The following example shows the console output when booting node phys-schost-1 into the cluster.


ok boot
Rebooting with command: boot 
...
Hostname: phys-schost-1
Booting as part of a cluster
...
NOTICE: Node 1: attempting to join cluster
...
NOTICE: Node 1: joined cluster
...
The system is coming up.  Please wait.
checking ufs filesystems
...
reservation program successfully exiting
Print services started.
volume management starting.
The system is ready.
phys-schost-1 console login:

How to Reboot a Cluster Node

  1. If the cluster node is running Oracle Parallel Server/Real Application Clusters, shut down all instances of the database.

    Refer to the Oracle Parallel Server/Real Application Clusters product documentation for shutdown procedures.

  2. Become superuser on the cluster node to be shut down.

  3. Shut down the cluster node by using the scswitch and shutdown commands.

    Enter these commands on the node to be shut down. The -i 6 option with the shutdown command causes the node to reboot after the node shuts down to the ok prompt.


    # scswitch -S -h node
    # shutdown -g0 -y -i6
    

    Note –

    Cluster nodes must have a working connection to the cluster interconnect to attain cluster membership.


  4. Verify that the node has booted without error and is online.


    # scstat -n
    

Example—Rebooting a Cluster Node

The following example shows the console output when rebooting node phys-schost-1. Messages for this node, such as shutdown and startup notification, appear on the consoles of other nodes in the cluster.


# scswitch -S -h phys-schost-1
# shutdown -g0 -y -i6
May 2 10:08:46 phys-schost-1 cl_runtime: WARNING: CMM monitoring disabled.
phys-schost-1# 
INIT: New run level: 6
The system is coming down.  Please wait.
System services are now being stopped.
Notice: rgmd is being stopped.
Notice: rpc.pmfd is being stopped.
Notice: rpc.fed is being stopped.
umount: /global/.devices/node@1 busy
umount: /global/phys-schost-1 busy
The system is down.
syncing file systems... done
rebooting...
Resetting ... 
...
Sun Ultra 1 SBus (UltraSPARC 143MHz), No Keyboard
OpenBoot 3.11, 128 MB memory installed, Serial #5932401.
Ethernet address 8:8:20:99:ab:77, Host ID: 8899ab77.
...
Rebooting with command: boot
...
Hostname: phys-schost-1
Booting as part of a cluster
...
NOTICE: Node 1: attempting to join cluster
...
NOTICE: Node 1: joined cluster
...
The system is coming up.  Please wait.
The system is ready.
phys-schost-1 console login: 

How to Boot a Cluster Node in Non-Cluster Mode

You can boot a node so that the node does not participate in the cluster membership, that is, in non-cluster mode. Non-cluster mode is useful when installing the cluster software or performing certain administrative procedures, such as patching a node.

  1. Become superuser on the cluster node to be started in non-cluster mode.

  2. Shut down the node by using the scswitch and shutdown commands.


    # scswitch -S -h node
    # shutdown -g0 -y -i0
    

  3. Verify that the node is at the ok prompt.

  4. Boot the node in non-cluster mode by using the boot(1M) command with the -x option.


    ok boot -x
    

    Messages appear on the node's console stating that the node is not part of the cluster.
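
    To confirm from another cluster member that the node stayed out of the cluster, you can check node status with the scstat -n command. The output below is an illustrative sketch; a node booted with -x is reported as offline.


    # scstat -n

    -- Cluster Nodes --
                           Node name           Status
                           ---------           ------
        Cluster node:      phys-schost-1       Offline
        Cluster node:      phys-schost-2       Online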

Example—Booting a Cluster Node in Non-Cluster Mode

The following example shows the console output when shutting down node phys-schost-1 then restarting the node in non-cluster mode. The -g0 option sets the grace period to zero, -y provides an automatic yes response to the confirmation question, and -i0 invokes run level 0 (zero). Shutdown messages for this node appear on the consoles of other nodes in the cluster.


# scswitch -S -h phys-schost-1
# shutdown -g0 -y -i0
May 2 10:08:46 phys-schost-1 cl_runtime: WARNING: CMM monitoring disabled.
phys-schost-1# 
...
rg_name = schost-sa-1 ...
offline node = phys-schost-2 ...
num of node = 0 ...
phys-schost-1# 
INIT: New run level: 0
The system is coming down.  Please wait.
System services are now being stopped.
Print services stopped.
syslogd: going down on signal 15
...
The system is down.
syncing file systems... done
WARNING: node 1 is being shut down.
Program terminated

ok boot -x
...
Not booting as part of cluster
...
The system is ready.
phys-schost-1 console login:

Repairing a Full /var File System

Both Solaris and Sun Cluster software write error messages to the /var/adm/messages file, which over time can fill the /var file system. If a cluster node's /var file system fills up, Sun Cluster might not be able to restart on that node. Additionally, you might not be able to log in to the node.

How to Repair a Full /var File System

If a node reports a full /var file system and continues to run Sun Cluster services, use this procedure to clear the full file system. Refer to “Viewing System Messages” in System Administration Guide: Advanced Administration for more information.

  1. Become superuser on the cluster node with the full /var file system.

  2. Clear the full file system.

    For example, delete nonessential files that are contained in the file system.
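
    For example, a minimal cleanup session might look like the following sketch. The paths shown are typical culprits rather than an exhaustive list. Truncate logs in place instead of removing them so that syslogd keeps a valid file handle.


    # df -k /var                        # see how full /var is
    # du -sk /var/adm /var/crash        # find the largest consumers
    # cp /dev/null /var/adm/messages    # truncate the messages log in place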