Sun Cluster System Administration Guide for Solaris OS

Shutting Down and Booting a Cluster Overview

The Sun Cluster scshutdown(1M) command stops cluster services in an orderly fashion and cleanly shuts down the entire cluster. You might use the scshutdown command when moving a cluster to a new location. You can also use the command to shut down the cluster if an application error has caused data corruption.


Note –

Use the scshutdown command instead of the shutdown or halt commands to ensure proper shutdown of the entire cluster. The Solaris shutdown command is used with the scswitch(1M) command to shut down individual nodes. See How to Shut Down a Cluster or Shutting Down and Booting a Single Cluster Node for more information.


The scshutdown command stops all nodes in a cluster by:

  1. Taking all running resource groups offline.

  2. Unmounting all cluster file systems.

  3. Shutting down active device services.

  4. Running init 0 and bringing all nodes to the OpenBoot™ PROM ok prompt on a SPARC based system or to a boot subsystem on an x86 based system. For more information about boot subsystems, see Boot Subsystems in System Administration Guide: Basic Administration.


Note –

If necessary, you can boot a node in non-cluster mode so that the node does not participate in cluster membership. Non-cluster mode is useful when you install cluster software or perform certain administrative procedures. See How to Boot a Cluster Node in Non-Cluster Mode for more information.


Table 3–1 Task List: Shutting Down and Booting a Cluster

Task                                             For Instructions
-----------------------------------------------  ------------------------------
Stop the cluster.                                See How to Shut Down a Cluster
  - Use scshutdown(1M).

Start the cluster by booting all nodes.          See How to Boot a Cluster
  The nodes must have a working connection to
  the cluster interconnect to attain cluster
  membership.

Reboot the cluster.                              See How to Reboot a Cluster
  - Use scshutdown.
  - At the ok prompt or at the Select (b)oot or
    (i)nterpreter prompt on the Current Boot
    Parameters screen, boot each node
    individually with the boot(1M) or the b
    command.
  The nodes must have a working connection to
  the cluster interconnect to attain cluster
  membership.

How to Shut Down a Cluster


Caution –

Do not use send brk on a cluster console to shut down a cluster node. The command is not supported within a cluster.


Steps
  1. SPARC: If your cluster is running Oracle Parallel Server or Oracle Real Application Clusters, shut down all instances of the database.

    Refer to the Oracle Parallel Server or Oracle Real Application Clusters product documentation for shutdown procedures.

  2. Become superuser on any node in the cluster.

  3. Shut down the cluster immediately.

    From a single node in the cluster, type the following command.


    # scshutdown -g0 -y
    
  4. Verify that all nodes are at the ok prompt on a SPARC based system or in a boot subsystem on an x86 based system.

    Do not power off any nodes until all cluster nodes are at the ok prompt on a SPARC based system or in a boot subsystem on an x86 based system.

  5. If necessary, power off the nodes.
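Steps 2 and 3 can be wrapped in a small script so that the shutdown command is issued only by superuser. The following is a minimal sketch, not a Sun Cluster utility: the function name cluster_shutdown and the DRY_RUN variable are hypothetical, and the script assumes ksh or another POSIX shell. It checks for superuser by parsing the output of id rather than using id -u, which the default Solaris id command may not accept. Steps 1, 4, and 5 still require manual action.

```shell
#!/bin/ksh
# Hypothetical wrapper for steps 2 and 3 of "How to Shut Down a Cluster".
# Not part of Sun Cluster. Run it on ONE node only: scshutdown brings
# down every node in the cluster.
# Set DRY_RUN=1 (hypothetical variable) to print the command instead of
# running it.

cluster_shutdown() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # A dry run only prints the command, so no privilege check is needed.
        echo "scshutdown -g0 -y"
        return 0
    fi
    # Step 2: require superuser. id prints "uid=0(root) ..." for root.
    case "$(id)" in
        "uid=0("*) ;;
        *) echo "cluster_shutdown: become superuser first" >&2; return 1 ;;
    esac
    # Step 3: grace period of zero (-g0), automatic yes to the prompt (-y).
    scshutdown -g0 -y
}
```

For example, run DRY_RUN=1 cluster_shutdown first to confirm the command that would be issued, then run cluster_shutdown as superuser from a single node.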


Example 3–1 SPARC: Shutting Down a Cluster

The following example shows the console output when stopping normal cluster operation and bringing down all nodes so that the ok prompt is shown. The -g0 option sets the shutdown grace period to zero, and the -y option provides an automatic yes response to the confirmation question. Shutdown messages also appear on the consoles of the other nodes in the cluster.


# scshutdown -g0 -y
Wed Mar 10 13:47:32 phys-schost-1 cl_runtime: 
WARNING: CMM monitoring disabled.
phys-schost-1# 
INIT: New run level: 0
The system is coming down.  Please wait.
System services are now being stopped.
/etc/rc0.d/K05initrgm: Calling scswitch -S (evacuate)
The system is down.
syncing file systems... done
Program terminated
ok 


Example 3–2 x86: Shutting Down a Cluster

The following example shows the console output when stopping normal cluster operation and bringing down all nodes. The -g0 option sets the shutdown grace period to zero, and the -y option provides an automatic yes response to the confirmation question. Shutdown messages also appear on the consoles of the other nodes in the cluster.


# scshutdown -g0 -y
May  2 10:32:57 phys-schost-1 cl_runtime: 
WARNING: CMM: Monitoring disabled.  
root@phys-schost-1#
INIT: New run level: 0
The system is coming down.  Please wait.
System services are now being stopped.
/etc/rc0.d/K05initrgm: Calling scswitch -S (evacuate)
failfasts already disabled on node 1
Print services already stopped.
May  2 10:33:13 phys-schost-1 syslogd: going down on signal 15
The system is down.
syncing file systems... done
Type any key to continue 

See Also

See How to Boot a Cluster to restart a cluster that has been shut down.

How to Boot a Cluster

Steps
  1. To start a cluster whose nodes have been shut down and are at the ok prompt or at the Select (b)oot or (i)nterpreter prompt on the Current Boot Parameters screen, boot each node as described in boot(1M).

    If you make configuration changes between shutdowns, start the node with the most current configuration first. Except in this situation, the boot order of the nodes does not matter.

    • SPARC:


      ok boot
      
    • x86:


                            <<< Current Boot Parameters >>>
      Boot path: /pci@0,0/pci8086,2545@3/pci8086,1460@1d/pci8086,341a@7,1/
      sd@0,0:a
      Boot args:
      
      Type    b [file-name] [boot-flags] <ENTER>  to boot with options
      or      i <ENTER>                           to enter boot interpreter
      or      <ENTER>                             to boot with defaults
      
                        <<< timeout in 5 seconds >>>
      Select (b)oot or (i)nterpreter: b
      

    Messages are displayed on the booted nodes' consoles as cluster components are activated.


    Note –

    Cluster nodes must have a working connection to the cluster interconnect to attain cluster membership.


  2. Verify that the nodes booted without error and are online.

    The scstat(1M) command reports the nodes' status.


    # scstat -n
    

    Note –

    If a cluster node's /var file system fills up, Sun Cluster might not be able to restart on that node. If this problem arises, see How to Repair a Full /var File System.
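Checking the output of step 2 by eye can be error-prone on larger clusters. The following sketch filters scstat -n output and lists any node that is not Online. It assumes that scstat -n prints one line per node of the form "Cluster node:  <name>  <status>", with the status in the last field; the function name report_offline_nodes is hypothetical. On Solaris, a POSIX awk (nawk or /usr/xpg4/bin/awk) might be needed in place of /usr/bin/awk.

```shell
#!/bin/ksh
# Hypothetical helper for "Verify that the nodes booted without error and
# are online." Not part of Sun Cluster.
#
# ASSUMPTION: scstat -n prints one line per node such as
#   Cluster node:     phys-schost-1       Online
# with the node name in the next-to-last field and the status in the last.
#
# Reads scstat -n output on standard input and prints each node whose
# status is not Online. Returns 0 if every node is Online, 1 if any node
# is not, and 2 if no "Cluster node:" lines were found.

report_offline_nodes() {
    awk '
        BEGIN { seen = 0; bad = 0 }
        /Cluster node:/ {
            seen = 1
            if ($NF != "Online") { print $(NF - 1); bad = 1 }
        }
        END {
            if (!seen) exit 2
            exit bad
        }'
}
```

For example, run scstat -n | report_offline_nodes on any cluster node. A nonzero return status means that at least one node should be investigated before you continue.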



Example 3–3 SPARC: Booting a Cluster

The following example shows the console output when booting node phys-schost-1 into the cluster. Similar messages appear on the consoles of the other nodes in the cluster.


ok boot
Rebooting with command: boot 
...
Hostname: phys-schost-1
Booting as part of a cluster
NOTICE: Node phys-schost-1 with votecount = 1 added.
NOTICE: Node phys-schost-2 with votecount = 1 added.
NOTICE: Node phys-schost-3 with votecount = 1 added.
...
NOTICE: Node phys-schost-1: attempting to join cluster
...
NOTICE: Node phys-schost-2 (incarnation # 937690106) has become reachable.
NOTICE: Node phys-schost-3 (incarnation # 937690290) has become reachable.
NOTICE: cluster has reached quorum.
NOTICE: node phys-schost-1 is up; new incarnation number = 937846227.
NOTICE: node phys-schost-2 is up; new incarnation number = 937690106.
NOTICE: node phys-schost-3 is up; new incarnation number = 937690290.
NOTICE: Cluster members: phys-schost-1 phys-schost-2 phys-schost-3.
...


Example 3–4 x86: Booting a Cluster

The following example shows the console output when booting node phys-schost-1 into the cluster. Similar messages appear on the consoles of the other nodes in the cluster.


ATI RAGE SDRAM BIOS P/N GR-xlint.007-4.330
*                                        BIOS Lan-Console 2.0
Copyright (C) 1999-2001  Intel Corporation
MAC ADDR: 00 02 47 31 38 3C
AMIBIOS (C)1985-2002 American Megatrends Inc.,
Copyright 1996-2002 Intel Corporation
SCB20.86B.1064.P18.0208191106
SCB2 Production BIOS Version 2.08
BIOS Build 1064
2 X Intel(R) Pentium(R) III CPU family      1400MHz
Testing system memory, memory size=2048MB
2048MB Extended Memory Passed
512K L2 Cache SRAM Passed
ATAPI CD-ROM SAMSUNG CD-ROM SN-124

Press <F2> to enter SETUP, <F12> Network

Adaptec AIC-7899 SCSI BIOS v2.57S4
(c) 2000 Adaptec, Inc. All Rights Reserved.
    Press <Ctrl><A> for SCSISelect(TM) Utility!

Ch B,  SCSI ID: 0 SEAGATE  ST336605LC        160
       SCSI ID: 1 SEAGATE  ST336605LC        160
       SCSI ID: 6 ESG-SHV  SCA HSBP M18      ASYN
Ch A,  SCSI ID: 2 SUN      StorEdge 3310     160
       SCSI ID: 3 SUN      StorEdge 3310     160

AMIBIOS (C)1985-2002 American Megatrends Inc.,
Copyright 1996-2002 Intel Corporation
SCB20.86B.1064.P18.0208191106
SCB2 Production BIOS Version 2.08
BIOS Build 1064

2 X Intel(R) Pentium(R) III CPU family      1400MHz
Testing system memory, memory size=2048MB
2048MB Extended Memory Passed
512K L2 Cache SRAM Passed
ATAPI CD-ROM SAMSUNG CD-ROM SN-124    

SunOS - Intel Platform Edition             Primary Boot Subsystem, vsn 2.0

                        Current Disk Partition Information

                 Part#   Status    Type      Start       Length
                ================================================
                   1     Active   X86 BOOT     2428       21852
                   2              SOLARIS     24280     71662420
                   3              <unused> 
                   4              <unused>
              Please select the partition you wish to boot: *       *

Solaris DCB

			       loading /solaris/boot.bin

SunOS Secondary Boot version 3.00

                  Solaris Intel Platform Edition Booting System

Autobooting from bootpath: /pci@0,0/pci8086,2545@3/pci8086,1460@1d/
pci8086,341a@7,1/sd@0,0:a

If the system hardware has changed, or to boot from a different
device, interrupt the autoboot process by pressing ESC.
Press ESCape to interrupt autoboot in 2 seconds.
Initializing system
Please wait...
Warning: Resource Conflict - both devices are added

NON-ACPI device: ISY0050
     Port: 3F0-3F5, 3F7; IRQ: 6; DMA: 2
ACPI device: ISY0050
     Port: 3F2-3F3, 3F4-3F5, 3F7; IRQ: 6; DMA: 2

                     <<< Current Boot Parameters >>>
Boot path: /pci@0,0/pci8086,2545@3/pci8086,1460@1d/pci8086,341a@7,1/
sd@0,0:a
Boot args: 

Type    b [file-name] [boot-flags] <ENTER>  to boot with options
or      i <ENTER>                           to enter boot interpreter
or      <ENTER>                             to boot with defaults

                  <<< timeout in 5 seconds >>>

Select (b)oot or (i)nterpreter: 
Size: 275683 + 22092 + 150244 Bytes
/platform/i86pc/kernel/unix loaded - 0xac000 bytes used
SunOS Release 5.9 Version Generic_112234-07 32-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
configuring IPv4 interfaces: e1000g2.
Hostname: phys-schost-1
Booting as part of a cluster
NOTICE: CMM: Node phys-schost-1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node phys-schost-2 (nodeid = 2) with votecount = 1 added.
NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d1s2) added; votecount = 1, bitmask
of nodes with configured paths = 0x3.
NOTICE: clcomm: Adapter e1000g3 constructed
NOTICE: clcomm: Path phys-schost-1:e1000g3 - phys-schost-2:e1000g3 being constructed
NOTICE: clcomm: Path phys-schost-1:e1000g3 - phys-schost-2:e1000g3 being initiated
NOTICE: clcomm: Path phys-schost-1:e1000g3 - phys-schost-2:e1000g3 online
NOTICE: clcomm: Adapter e1000g0 constructed
NOTICE: clcomm: Path phys-schost-1:e1000g0 - phys-schost-2:e1000g0 being constructed
NOTICE: CMM: Node phys-schost-1: attempting to join cluster.
NOTICE: clcomm: Path phys-schost-1:e1000g0 - phys-schost-2:e1000g0 being initiated
NOTICE: CMM: Quorum device /dev/did/rdsk/d1s2: owner set to node 1.
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node phys-schost-1 (nodeid = 1) is up; new incarnation number = 1068496374.
NOTICE: CMM: Node phys-schost-2 (nodeid = 2) is up; new incarnation number = 1068496374.
NOTICE: CMM: Cluster members: phys-schost-1 phys-schost-2.
NOTICE: CMM: node reconfiguration #1 completed.
NOTICE: CMM: Node phys-schost-1: joined cluster.

How to Reboot a Cluster

Run the scshutdown(1M) command to shut down the cluster, then boot the cluster with the boot(1M) command on each node.

Steps
  1. SPARC: If your cluster is running Oracle Parallel Server or Oracle Real Application Clusters, shut down all instances of the database.

    Refer to the Oracle Parallel Server or Oracle Real Application Clusters product documentation for shutdown procedures.

  2. Become superuser on any node in the cluster.

  3. Shut down the cluster.

    From a single node in the cluster, type the following command.


    # scshutdown -g0 -y 
    

    Each node is shut down.


    Note –

    Cluster nodes must have a working connection to the cluster interconnect to attain cluster membership.


  4. Boot each node.

    The order in which the nodes are booted does not matter unless you make configuration changes between shutdowns. If you make configuration changes between shutdowns, start the node with the most current configuration first.

    • SPARC:


      ok boot
      
    • x86:


                            <<< Current Boot Parameters >>>
      Boot path: /pci@0,0/pci8086,2545@3/pci8086,1460@1d/pci8086,341a@7,1/
      sd@0,0:a
      Boot args:
      
      Type    b [file-name] [boot-flags] <ENTER>  to boot with options
      or      i <ENTER>                           to enter boot interpreter
      or      <ENTER>                             to boot with defaults
      
                        <<< timeout in 5 seconds >>>
      Select (b)oot or (i)nterpreter: b
      

    Messages appear on the booted nodes' consoles as cluster components are activated.

  5. Verify that the nodes booted without error and are online.

    The scstat command reports the nodes' status.


    # scstat -n
    

    Note –

    If a cluster node's /var file system fills up, Sun Cluster might not be able to restart on that node. If this problem arises, see How to Repair a Full /var File System.
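After a reboot, nodes can join the cluster at different times, so step 5 might need to be repeated until every node is online. The following sketch polls the node status until all nodes report Online or a timeout expires. It assumes that scstat -n prints one "Cluster node:" line per node with the status in the last field. The function names all_nodes_online and wait_for_cluster, the SCSTAT override variable, and the default timeout and poll interval are hypothetical. On Solaris, a POSIX awk (nawk or /usr/xpg4/bin/awk) might be needed in place of /usr/bin/awk.

```shell
#!/bin/ksh
# Hypothetical helper for "Verify that the nodes booted without error and
# are online" after a reboot. Not part of Sun Cluster.
#
# ASSUMPTION: scstat -n prints one "Cluster node:" line per node with the
# status in the last field. SCSTAT names the status command to run and
# defaults to "scstat -n"; override it only for testing.

all_nodes_online() {
    # Succeeds only if at least one node line is found and every node
    # line reports Online.
    ${SCSTAT:-scstat -n} | awk '
        BEGIN { seen = 0; bad = 0 }
        /Cluster node:/ { seen = 1; if ($NF != "Online") bad = 1 }
        END { exit !(seen && !bad) }'
}

wait_for_cluster() {
    # $1: timeout in seconds (default 600); $2: poll interval in seconds
    # (default 10). Both defaults are arbitrary.
    limit=${1:-600}
    interval=${2:-10}
    elapsed=0
    until all_nodes_online; do
        if [ "$elapsed" -ge "$limit" ]; then
            echo "wait_for_cluster: not all nodes Online after ${elapsed}s" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "All cluster nodes are Online."
}
```

For example, after step 4, running wait_for_cluster 900 15 on one node waits up to about 15 minutes, checking every 15 seconds. The timeout is approximate because it counts only the time spent sleeping, not the time scstat itself takes to run.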



Example 3–5 SPARC: Rebooting a Cluster

The following example shows the console output when stopping normal cluster operation, bringing down all nodes to the ok prompt, and then restarting the cluster. The -g0 option sets the grace period to zero, and the -y option provides an automatic yes response to the confirmation question. Shutdown messages also appear on the consoles of other nodes in the cluster.


# scshutdown -g0 -y
Wed Mar 10 13:47:32 phys-schost-1 cl_runtime: 
WARNING: CMM monitoring disabled.
phys-schost-1# 
INIT: New run level: 0
The system is coming down.  Please wait.
...
The system is down.
syncing file systems... done
Program terminated
ok boot
Rebooting with command: boot 
...
Hostname: phys-schost-1
Booting as part of a cluster
...
NOTICE: Node phys-schost-1: attempting to join cluster
...
NOTICE: Node phys-schost-2 (incarnation # 937690106) has become reachable.
NOTICE: Node phys-schost-3 (incarnation # 937690290) has become reachable.
NOTICE: cluster has reached quorum.
...
NOTICE: Cluster members: phys-schost-1 phys-schost-2 phys-schost-3.
...
NOTICE: Node phys-schost-1: joined cluster
...
The system is coming up.  Please wait.
checking ufs filesystems
...
reservation program successfully exiting
Print services started.
volume management starting.
The system is ready.
phys-schost-1 console login:


Example 3–6 x86: Rebooting a Cluster

The following example shows the console output when stopping normal cluster operation, bringing down all nodes, and then restarting the cluster. The -g0 option sets the grace period to zero, and the -y option provides an automatic yes response to the confirmation question. Shutdown messages also appear on the consoles of other nodes in the cluster.


# scshutdown -g0 -y
May  2 10:32:57 phys-schost-1 cl_runtime: 
WARNING: CMM: Monitoring disabled.  
root@phys-schost-1#
INIT: New run level: 0
The system is coming down.  Please wait.
System services are now being stopped.
/etc/rc0.d/K05initrgm: Calling scswitch -S (evacuate)
failfasts already disabled on node 1
Print services already stopped.
May  2 10:33:13 phys-schost-1 syslogd: going down on signal 15
The system is down.
syncing file systems... done
Type any key to continue

ATI RAGE SDRAM BIOS P/N GR-xlint.007-4.330
*                                        BIOS Lan-Console 2.0
Copyright (C) 1999-2001  Intel Corporation
MAC ADDR: 00 02 47 31 38 3C
AMIBIOS (C)1985-2002 American Megatrends Inc.,
Copyright 1996-2002 Intel Corporation
SCB20.86B.1064.P18.0208191106
SCB2 Production BIOS Version 2.08
BIOS Build 1064
2 X Intel(R) Pentium(R) III CPU family      1400MHz
Testing system memory, memory size=2048MB
2048MB Extended Memory Passed
512K L2 Cache SRAM Passed
ATAPI CD-ROM SAMSUNG CD-ROM SN-124

Press <F2> to enter SETUP, <F12> Network

Adaptec AIC-7899 SCSI BIOS v2.57S4
(c) 2000 Adaptec, Inc. All Rights Reserved.
    Press <Ctrl><A> for SCSISelect(TM) Utility!

Ch B,  SCSI ID: 0 SEAGATE  ST336605LC        160
       SCSI ID: 1 SEAGATE  ST336605LC        160
       SCSI ID: 6 ESG-SHV  SCA HSBP M18      ASYN
Ch A,  SCSI ID: 2 SUN      StorEdge 3310     160
       SCSI ID: 3 SUN      StorEdge 3310     160

AMIBIOS (C)1985-2002 American Megatrends Inc.,
Copyright 1996-2002 Intel Corporation
SCB20.86B.1064.P18.0208191106
SCB2 Production BIOS Version 2.08
BIOS Build 1064

2 X Intel(R) Pentium(R) III CPU family      1400MHz
Testing system memory, memory size=2048MB
2048MB Extended Memory Passed
512K L2 Cache SRAM Passed
ATAPI CD-ROM SAMSUNG CD-ROM SN-124    

SunOS - Intel Platform Edition             Primary Boot Subsystem, vsn 2.0

                        Current Disk Partition Information

                 Part#   Status    Type      Start       Length
                ================================================
                   1     Active   X86 BOOT     2428       21852
                   2              SOLARIS     24280     71662420
                   3              <unused> 
                   4              <unused>
              Please select the partition you wish to boot: *       *

Solaris DCB

			       loading /solaris/boot.bin

SunOS Secondary Boot version 3.00

                  Solaris Intel Platform Edition Booting System

Autobooting from bootpath: /pci@0,0/pci8086,2545@3/pci8086,1460@1d/
pci8086,341a@7,1/sd@0,0:a

If the system hardware has changed, or to boot from a different
device, interrupt the autoboot process by pressing ESC.
Press ESCape to interrupt autoboot in 2 seconds.
Initializing system
Please wait...
Warning: Resource Conflict - both devices are added

NON-ACPI device: ISY0050
     Port: 3F0-3F5, 3F7; IRQ: 6; DMA: 2
ACPI device: ISY0050
     Port: 3F2-3F3, 3F4-3F5, 3F7; IRQ: 6; DMA: 2

                     <<< Current Boot Parameters >>>
Boot path: /pci@0,0/pci8086,2545@3/pci8086,1460@1d/pci8086,341a@7,1/
sd@0,0:a
Boot args: 

Type    b [file-name] [boot-flags] <ENTER>  to boot with options
or      i <ENTER>                           to enter boot interpreter
or      <ENTER>                             to boot with defaults

                  <<< timeout in 5 seconds >>>

Select (b)oot or (i)nterpreter: b
Size: 275683 + 22092 + 150244 Bytes
/platform/i86pc/kernel/unix loaded - 0xac000 bytes used
SunOS Release 5.9 Version Generic_112234-07 32-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
configuring IPv4 interfaces: e1000g2.
Hostname: phys-schost-1
Booting as part of a cluster
NOTICE: CMM: Node phys-schost-1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node phys-schost-2 (nodeid = 2) with votecount = 1 added.
NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d1s2) added; votecount = 1, bitmask
of nodes with configured paths = 0x3.
NOTICE: clcomm: Adapter e1000g3 constructed
NOTICE: clcomm: Path phys-schost-1:e1000g3 - phys-schost-2:e1000g3 being constructed
NOTICE: clcomm: Path phys-schost-1:e1000g3 - phys-schost-2:e1000g3 being initiated
NOTICE: clcomm: Path phys-schost-1:e1000g3 - phys-schost-2:e1000g3 online
NOTICE: clcomm: Adapter e1000g0 constructed
NOTICE: clcomm: Path phys-schost-1:e1000g0 - phys-schost-2:e1000g0 being constructed
NOTICE: CMM: Node phys-schost-1: attempting to join cluster.
NOTICE: clcomm: Path phys-schost-1:e1000g0 - phys-schost-2:e1000g0 being initiated
NOTICE: CMM: Quorum device /dev/did/rdsk/d1s2: owner set to node 1.
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node phys-schost-1 (nodeid = 1) is up; new incarnation number = 1068496374.
NOTICE: CMM: Node phys-schost-2 (nodeid = 2) is up; new incarnation number = 1068496374.
NOTICE: CMM: Cluster members: phys-schost-1 phys-schost-2.
NOTICE: CMM: node reconfiguration #1 completed.
NOTICE: CMM: Node phys-schost-1: joined cluster.
WARNING: mod_installdrv: no major number for rsmrdt
ip: joining multicasts failed (18) on clprivnet0 - will use link layer
broadcasts for multicast
The system is coming up.  Please wait.
checking ufs filesystems
/dev/rdsk/c1t0d0s5: is clean.
NOTICE: clcomm: Path phys-schost-1:e1000g0 - phys-schost-2:e1000g0 online
NIS domain name is dev.eng.mycompany.com
starting rpc services: rpcbind keyserv ypbind done.
Setting netmask of e1000g2 to 192.168.255.0
Setting netmask of e1000g3 to 192.168.255.128
Setting netmask of e1000g0 to 192.168.255.128
Setting netmask of clprivnet0 to 192.168.255.0
Setting default IPv4 interface for multicast: add net 224.0/4: gateway phys-schost-1
syslog service starting.
obtaining access to all attached disks


*****************************************************************************
*
* The X-server can not be started on display :0...
*
*****************************************************************************
volume management starting.
Starting Fault Injection Server...
The system is ready.

phys-schost-1 console login: