Sun Cluster 3.0 12/01 System Administration Guide

Chapter 6 Administering the Cluster

This chapter provides the procedures for administering items that affect the entire cluster.

The procedures in this chapter are listed in the task maps in Table 6-1, Table 6-2, and Table 6-3.

6.1 Administering the Cluster Overview

Table 6-1 Task List: Administering the Cluster

Change the name of the cluster
    "6.1.1 How to Change the Cluster Name"

List node IDs and their corresponding node names
    "6.1.2 How to Map Node ID to Node Name"

Permit or deny new nodes to add themselves to the cluster
    "6.1.3 How to Work With New Cluster Node Authentication"

Change the time for a cluster using the Network Time Protocol (NTP)
    "6.1.4 How to Reset the Time of Day in a Cluster"

Bring down a node and enter the OpenBoot(TM) PROM
    "6.1.5 How to Enter the OpenBoot PROM (OBP) on a Node"

Change the private hostname
    "6.1.6 How to Change the Private Hostname"

Put a cluster node in maintenance state
    "6.1.7 How to Put a Node Into Maintenance State"

Bring a cluster node out of maintenance state
    "6.1.8 How to Bring a Node Out of Maintenance State"

Add a node to a cluster
    "6.2.1 How to Add a Cluster Node to the Authorized Node List"

Remove a node from a cluster
    "6.2.2 How to Remove a Node From the Cluster Software Configuration"

6.1.1 How to Change the Cluster Name

If necessary, you can change the cluster name after initial installation.

  1. Become superuser on any node in the cluster.

  2. Enter the scsetup(1M) utility.


    # scsetup
    

    The Main Menu is displayed.

  3. To change the cluster name, type 7 (Other cluster properties).

    The Other Cluster Properties menu is displayed.

  4. Make your selection from the menu and follow the onscreen instructions.

6.1.1.1 Example--Changing the Cluster Name

The following example shows the scconf(1M) command generated from the scsetup utility to change to the new cluster name, dromedary.


# scconf -c -C cluster=dromedary

6.1.2 How to Map Node ID to Node Name

During Sun Cluster installation, each node is automatically assigned a unique node ID number. The node ID number is assigned to a node in the order in which it joins the cluster for the first time; once assigned, the number cannot be changed. The node ID number is often used in error messages to identify which cluster node the message concerns. Use this procedure to determine the mapping between node IDs and node names.

You do not need to be superuser to list configuration information.

  1. Use the scconf(1M) command to list the cluster configuration information.


    % scconf -pv | grep "Node ID"
    

6.1.2.1 Example--Mapping the Node ID to the Node Name

The following example shows the node ID assignments.


% scconf -pv | grep "Node ID"
  (phys-schost-1) Node ID:                          1
  (phys-schost-2) Node ID:                          2
  (phys-schost-3) Node ID:                          3

6.1.3 How to Work With New Cluster Node Authentication

Sun Cluster enables you to determine if new nodes can add themselves to the cluster and with what type of authentication. You can permit any new node to join the cluster over the public network, deny new nodes from joining the cluster, or indicate a specific node that can join the cluster. New nodes can be authenticated by using either standard UNIX or Diffie-Hellman (DES) authentication. If you select DES authentication, you must also configure all necessary encryption keys before a node can join. See the keyserv(1M) and publickey(4) man pages for more information.
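
If you want to review the current authorization and authentication settings before changing them, you can display the cluster configuration from any node. This is only a sketch; the exact labels in the scconf output can vary, which is why a broad, case-insensitive match is used here.

# scconf -pv | grep -i auth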

  1. Become superuser on any node in the cluster.

  2. Enter the scsetup(1M) utility.


    # scsetup
    

    The Main Menu is displayed.

  3. To work with cluster authentication, type 6 (New nodes).

    The New Nodes menu is displayed.

  4. Make your selection from the menu and follow the onscreen instructions.

6.1.3.1 Examples--Preventing New Machines From Being Added to the Cluster

The following example shows the scconf(1M) command generated from the scsetup utility that would prevent new machines from being added to the cluster.


# scconf -a -T node=.

6.1.3.2 Examples--Permitting All New Machines to Be Added to the Cluster

The following example shows the scconf command generated from the scsetup utility that would enable all new machines to be added to the cluster.


# scconf -r -T all

6.1.3.3 Examples--Specifying a New Machine to Be Added to the Cluster

The following example shows the scconf command generated from the scsetup utility to enable a single new machine to be added to the cluster.


# scconf -a -T node=phys-schost-4

6.1.3.4 Examples--Setting the Authentication to Standard UNIX

The following example shows the scconf command generated from the scsetup utility to reset to standard UNIX authentication for new nodes joining the cluster.


# scconf -c -T authtype=unix

6.1.3.5 Examples--Setting the Authentication to DES

The following example shows the scconf command generated from the scsetup utility to use DES authentication for new nodes joining the cluster.


# scconf -c -T authtype=des

Note -

When using DES authentication, you need to also configure all necessary encryption keys before a node can join the cluster. See the keyserv(1M) and publickey(4) man pages for more information.


6.1.4 How to Reset the Time of Day in a Cluster

Sun Cluster uses the Network Time Protocol (NTP) to maintain time synchronization between cluster nodes. Adjustments in the cluster occur automatically as needed when nodes synchronize their time. See the Sun Cluster 3.0 12/01 Concepts document and the Network Time Protocol User's Guide for more information.
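
To confirm that NTP is running and that the cluster nodes are synchronizing with their peers, you can query the NTP daemon from any node. This is a read-only check that does not adjust the time; it assumes the standard Solaris NTP query utility (ntpq) is installed.

# ntpq -p

The output should list the NTP peers, typically including the clusternode<nodeid>-priv private hostnames of the other cluster nodes.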


Caution -

When using NTP, do not attempt to adjust the cluster time while the cluster is up and running. This includes using the date(1), rdate(1M), or ntpdate(1M) commands interactively or within cron(1M) scripts.


  1. Become superuser on any node in the cluster.

  2. Shut down the cluster to the OBP prompt.


    # scshutdown -g0 -y
    

  3. Boot each node into non-cluster mode.


    ok boot -x
    

  4. On a single node, set the time of day by running the date(1) command.


    # date HHMM.SS
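
    For example, to set the time to 2:30:00 p.m. (a hypothetical value):

    # date 1430.00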
    

  5. On the other machines, synchronize the time to that node by running the rdate(1M) command.


    # rdate hostname
    

  6. Boot each node to restart the cluster.


    # reboot
    

  7. Verify that the change took place on all cluster nodes.

    On each node, run the date(1) command.


    # date
    

6.1.5 How to Enter the OpenBoot PROM (OBP) on a Node

Use this procedure if you need to configure or change OpenBoot PROM settings.

  1. Connect to the terminal concentrator port.


    # telnet tc_name tc_port_number
    

    tc_name

    Specifies the name of the terminal concentrator.

    tc_port_number

    Specifies the port number on the terminal concentrator. Port numbers are configuration dependent. Typically, ports 2 and 3 (5002 and 5003) are used for the first cluster installed at a site.
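
    For example, to connect to port 2 of a terminal concentrator (the concentrator name tc-schost and port number 5002 are hypothetical; substitute the values for your site):

    # telnet tc-schost 5002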

  2. Shut down the cluster node gracefully by using the scswitch(1M) command to evacuate any resource or disk device groups and then shutdown(1M) to bring the node to the OBP prompt.


    # scswitch -S -h nodelist
    # shutdown -g0 -y -i0
    


    Caution -

    Do not use send brk on a cluster console to shut down a cluster node. If you use send brk and then type go at the OBP prompt to reboot, the node will panic. This functionality is not supported within a cluster.


  3. Execute the OBP commands.
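
    For example, you might review or change boot-related settings before booting the node back into the cluster. Which commands you run depends on what you need to configure; the following is only a sketch.

    ok printenv boot-device
    ok setenv auto-boot? true
    ok boot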

6.1.6 How to Change the Private Hostname

Use this procedure to change the private hostname of a cluster node after installation has been completed.

Default private hostnames are assigned during initial cluster installation. The default private hostname takes the form clusternode<nodeid>-priv, for example: clusternode3-priv. You should only change a private hostname if the name is already in use in the domain.
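
One way to determine whether a default private hostname conflicts with a name already defined in your domain is to look it up in your naming service. The following sketch assumes the domain uses DNS and checks clusternode2-priv (the name that is changed in the example at the end of this procedure); adapt the lookup for NIS or NIS+ as appropriate.

# nslookup clusternode2-priv

If the lookup returns an entry from the domain, the default private hostname is already in use and should be changed.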


Caution -

Do not attempt to assign IP addresses to new private hostnames. The clustering software assigns them.


  1. Disable, on all nodes in the cluster, any Data Service resources or other applications that might cache private hostnames.


    # scswitch -n -j resource1,resource2
    

    Include the following among the applications that you disable.

    • HA-DNS and HA-NFS services, if configured.

    • Any application which has been custom configured to use the private hostname.

    • Any application which is being used by clients over the private interconnect.

    See the scswitch(1M) man page and the Sun Cluster 3.0 12/01 Data Services Installation and Configuration Guide for information about using the scswitch command.
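
    If you are not sure which resources are currently configured and online, you can review resource group and resource status from any node before disabling anything.

    # scstat -g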

  2. Bring down the Network Time Protocol (NTP) daemon on each node of the cluster.

    See the xntpd(1M) man page for more information about the NTP daemon.


    # /etc/init.d/xntpd stop
    

  3. Determine the name of the node on which you are changing the private hostname.


    # scconf -p | grep node
    

  4. Run the scsetup utility to change the private hostname.

    It is only necessary to do this from one of the nodes in the cluster.


    Note -

    When selecting a new private hostname, be sure the name is unique to the cluster node.


  5. Select 5, Private Hostnames, from the Main Menu.

  6. Select 1, Change a Private Hostname, from the Private Hostnames Menu.

    Answer the questions when prompted. You will be asked the name of the node whose private hostname is being changed (clusternode<nodeid>-priv), and the new private hostname.

  7. Flush the name service cache.

    Do this on each node in the cluster. This prevents the cluster applications and data services from trying to access the old private hostname.


    # nscd -i hosts
    

  8. Edit the ntp.conf file on each node to change the private hostname to the new one.

    Use whatever editing tool you prefer.

    If this is done at install time, also remember to remove the names of any nodes that are not configured; the default template comes pre-configured with eight nodes. Typically, the ntp.conf file will be identical on each cluster node.
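
    For example, in a three-node cluster where clusternode2-priv has been changed to clusternode4-priv, the peer entries would read as follows (this mirrors the ntp.conf shown in the example at the end of this procedure):

    peer clusternode1-priv
    peer clusternode4-priv
    peer clusternode3-priv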

  9. Verify that you can successfully ping the new private hostname from all cluster nodes.

  10. Restart the NTP daemon.

    Do this on each node of the cluster.


    # /etc/init.d/xntpd start
    

  11. Enable all Data Service resources and other applications that were disabled in Step 1.


    # scswitch -e -j resource1,resource2
    

    See the scswitch(1M) man page and the Sun Cluster 3.0 12/01 Data Services Installation and Configuration Guide for information about using the scswitch command.

6.1.6.1 Example--Changing the Private Hostname

The following example changes the private hostname from clusternode2-priv to clusternode4-priv on node phys-schost-2.


[Disable all applications and data services as necessary.]
phys-schost-1# /etc/init.d/xntpd stop
phys-schost-1# scconf -p | grep node
 ...
 Cluster nodes:                                phys-schost-1 phys-schost-2 phys-schost-3
 Cluster node name:                                 phys-schost-1
  Node private hostname:                           clusternode1-priv
 Cluster node name:                                 phys-schost-2
  Node private hostname:                           clusternode2-priv
 Cluster node name:                                 phys-schost-3
  Node private hostname:                           clusternode3-priv
 ...
phys-schost-1# scsetup
phys-schost-1# nscd -i hosts
phys-schost-1# vi /etc/inet/ntp.conf
 ...
 peer clusternode1-priv
 peer clusternode4-priv
 peer clusternode3-priv
phys-schost-1# ping clusternode4-priv
phys-schost-1# /etc/init.d/xntpd start
[Enable all applications and data services disabled at the beginning of the procedure.]

6.1.7 How to Put a Node Into Maintenance State

Put a cluster node into maintenance state when taking the node out of service for an extended period of time. This way, the node does not contribute to the quorum count while it is being serviced. To put a cluster node into maintenance state, the node must be brought down using scswitch(1M) and shutdown(1M).


Note -

Use the Solaris shutdown command to shut down a single node. The scshutdown command should be used only when shutting down an entire cluster.


When a cluster node is brought down and put into maintenance state, all quorum devices that are configured with ports to the node have their quorum vote counts decremented by one. The node and quorum device vote counts are incremented by one when the node is taken out of maintenance mode and brought back online.
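
For example, in the three-node configuration shown in "6.1.7.1 Example--Putting a Cluster Node Into Maintenance State", phys-schost-1 has ports to two of the three quorum devices. Placing phys-schost-1 in maintenance state removes its node vote and one vote from each of those two devices, so the possible quorum votes drop from 6 (3 node votes plus 3 device votes) to 3 (2 node votes plus 1 device vote).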

You need to use the scconf(1M) command to put a cluster node into maintenance state. The scsetup utility does not include the functionality for putting a cluster node into maintenance state.

  1. Become superuser on the node to be put into maintenance state.

  2. Evacuate any resource groups and disk device groups from the node.


    # scswitch -S -h nodelist
    

    -S

    Evacuates all device services and resource groups from the specified node.

    -h nodelist

    Specifies the node from which you are switching resource groups and device groups.

  3. Bring the node you evacuated down to the OBP prompt and out of the cluster.


    # shutdown -g0 -y -i0
    

  4. Become superuser on another node in the cluster and put the node brought down in Step 3 into maintenance state.


    # scconf -c -q node=node,maintstate
    

    -c

    Specifies the change form of the scconf command.

    -q

    Manages the quorum options.

    node=node

    Specifies the node name or node ID of the node to change.

    maintstate

    Puts the node into maintenance state.

  5. Verify that the cluster node is now in maintenance state.


    # scstat -q
    

    The node you put into maintenance state should have a status of offline and 0 (zero) for Present and Possible quorum votes.

6.1.7.1 Example--Putting a Cluster Node Into Maintenance State

The following example moves a cluster node into maintenance state and verifies the results. The scstat -q output shows the Node votes for phys-schost-1 to be 0 (zero) and the status to be offline. The Quorum Summary should also show reduced vote counts. Depending on your configuration, the Quorum Votes by Device output might indicate that some quorum disk devices are offline as well.


[On the node to be put into maintenance state:]
phys-schost-1# scswitch -S -h phys-schost-1
phys-schost-1# shutdown -g0 -y -i0

[On another node in the cluster:]
phys-schost-2# scconf -c -q node=phys-schost-1,maintstate
phys-schost-2# scstat -q

-- Quorum Summary --
  Quorum votes possible:      3
  Quorum votes needed:        2
  Quorum votes present:       3

-- Quorum Votes by Node --
                    Node Name           Present Possible Status
                    ---------           ------- -------- ------
  Node votes:       phys-schost-1       0        0       Offline
  Node votes:       phys-schost-2       1        1       Online
  Node votes:       phys-schost-3       1        1       Online

-- Quorum Votes by Device --
                    Device Name         Present Possible Status
                    -----------         ------- -------- ------
  Device votes:     /dev/did/rdsk/d3s2  0        0       Offline
  Device votes:     /dev/did/rdsk/d17s2 0        0       Offline
  Device votes:     /dev/did/rdsk/d31s2 1        1       Online

6.1.7.2 Where to Go From Here

To bring a node back online, see "6.1.8 How to Bring a Node Out of Maintenance State".

6.1.8 How to Bring a Node Out of Maintenance State

Use the following procedure to bring a node back online and reset the quorum vote count to the default. For cluster nodes, the default quorum count is one. For quorum devices, the default quorum count is N-1, where N is the number of nodes with non-zero vote counts that have ports to the quorum device.
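
For example, a quorum device that is ported to two nodes has a default vote count of 2 - 1 = 1. In a three-node cluster with three such devices, the cluster has 6 possible quorum votes (3 node votes plus 3 device votes) once all nodes are out of maintenance state, which matches the scstat -q output in the example at the end of this procedure.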

When a node has been put into maintenance state, the node's quorum vote count is decremented by one. All quorum devices that are configured with ports to the node will also have their quorum vote counts decremented. When the quorum vote count is reset and a node is brought back out of maintenance state, both the node's quorum vote count and the quorum device vote count are incremented by one.

Run this procedure any time a node has been put into maintenance state and you are bringing it out of maintenance state.


Caution -

If you do not specify either the globaldev or node options, the quorum count is reset for the entire cluster.


  1. Become superuser on any node of the cluster, other than the one in maintenance state.

  2. If using quorum, reset the cluster quorum count from a node other than the one in maintenance state.

    You must reset the quorum count from a node other than the node in maintenance state before rebooting the node, or it might hang waiting for quorum.


    # scconf -c -q node=node,reset
    

    -c

    Specifies the change form of the scconf command.

    -q

    Manages the quorum options.

    node=node

    Specifies the name of the node to be reset, for example, phys-schost-1.

    reset

    The change flag that resets quorum.

  3. Boot the node that you want to bring out of maintenance state.

  4. Verify the quorum vote count.


    # scstat -q
    

    The node you brought out of maintenance state should have a status of online and show the appropriate vote count for Present and Possible quorum votes.

6.1.8.1 Example--Bringing a Cluster Node Out of Maintenance State and Resetting the Quorum Vote Count

The following example resets the quorum count for a cluster node and its quorum devices to their defaults and verifies the result. The scstat -q output shows the Node votes for phys-schost-1 to be 1 and the status to be online. The Quorum Summary should also show an increase in vote counts.


phys-schost-2# scconf -c -q node=phys-schost-1,reset

[On phys-schost-1:]
ok boot

phys-schost-1# scstat -q

-- Quorum Summary --

  Quorum votes possible:      6
  Quorum votes needed:        4
  Quorum votes present:       6

-- Quorum Votes by Node --

                    Node Name           Present Possible Status
                    ---------           ------- -------- ------
  Node votes:       phys-schost-1       1        1       Online
  Node votes:       phys-schost-2       1        1       Online
  Node votes:       phys-schost-3       1        1       Online

-- Quorum Votes by Device --

                    Device Name         Present Possible Status
                    -----------         ------- -------- ------
  Device votes:     /dev/did/rdsk/d3s2  1        1       Online
  Device votes:     /dev/did/rdsk/d17s2 1        1       Online
  Device votes:     /dev/did/rdsk/d31s2 1        1       Online

6.2 Adding and Removing a Cluster Node

The following table lists the tasks to perform when adding a node to an existing cluster. To complete the procedure correctly, these tasks must be performed in the order shown.

Table 6-2 Task Map: Adding a Cluster Node to an Existing Cluster

Install the host adapter on the node and verify that the existing cluster interconnects can support the new node
    Sun Cluster 3.0 12/01 Hardware Guide

Add shared storage
    Sun Cluster 3.0 12/01 Hardware Guide

Add the node to the authorized node list
    - Use scsetup.
    "6.2.1 How to Add a Cluster Node to the Authorized Node List"

Install and configure the software on the new cluster node
    - Install the Solaris Operating Environment and Sun Cluster software.
    - Configure the node as part of the cluster.
    Sun Cluster 3.0 12/01 Software Installation Guide: see the section on installing and configuring Sun Cluster software.

The following table lists the tasks to perform when removing a node from an existing cluster. To complete the procedure correctly, the tasks must be performed in the order shown.

Table 6-3 Task Map: Removing a Cluster Node

Place node being removed into maintenance state
    - Use shutdown and scconf.
    "6.1.7 How to Put a Node Into Maintenance State"

Remove node from all resource groups
    - Use scrgadm.
    Sun Cluster 3.0 12/01 Data Services Installation and Configuration Guide: see the procedure for how to remove a node from an existing resource group.

Remove node from all device groups of which the node is a member
    - Use volume manager commands.
    "3.3.4 How to Remove a Node From a Disk Device Group (Solstice DiskSuite)" or "3.3.15 How to Remove a Node From a Disk Device Group (VERITAS Volume Manager)"

Remove all logical transport connections to the node being removed
    - Use scsetup.
    "5.1.4 How to Remove Cluster Transport Cables, Transport Adapters, and Transport Junctions"
    To remove the physical hardware from the node, see the Sun Cluster 3.0 12/01 Hardware Guide section on installing and maintaining cluster interconnect and public network hardware.

Remove all quorum devices shared with the node being removed
    - Use scsetup.
    "4.1.3 How to Remove a Quorum Device"

Remove node from the cluster software configuration
    - Use scconf.
    "6.2.2 How to Remove a Node From the Cluster Software Configuration"

Remove required shared storage from the node and cluster
    - Follow the procedures in your volume manager documentation and hardware guide.
    SDS or VxVM administration guide
    Sun Cluster 3.0 12/01 Hardware Guide

6.2.1 How to Add a Cluster Node to the Authorized Node List

Before adding a machine to an existing cluster, be sure the node has all of the necessary hardware correctly installed and configured, including a good physical connection to the private cluster interconnect.

For hardware installation information, refer to the Sun Cluster 3.0 12/01 Hardware Guide or the hardware documentation that shipped with your server.

This procedure permits a machine to install itself into a cluster by adding its node name to the list of authorized nodes for that cluster.

You must be superuser on a current cluster member node to complete this procedure.

  1. Be sure you have correctly completed all prerequisite hardware installation and configuration tasks listed in the task map for "6.2 Adding and Removing a Cluster Node".

  2. Execute the scsetup(1M) utility.


    # scsetup
    

    The Main Menu is displayed.

  3. To access the New Nodes Menu, type 6 at the Main Menu.

    The New Nodes Menu is displayed.

  4. To modify the authorized list, type 3 (Specify the name of a machine which may add itself) at the New Nodes Menu.

    Follow the prompts to add the node's name to the cluster. You will be asked for the name of the node to be added.

  5. Verify that the task has been performed successfully.

    The scsetup utility prints a "Command completed successfully" message if it completes the task without error.

  6. To prevent any new machines from being added to the cluster, type 1 at the New Nodes Menu.

    Follow the scsetup prompts. This option tells the cluster to ignore all requests coming in over the public network from any new machine trying to add itself to the cluster.

  7. Quit the scsetup utility.

  8. Install and configure the software on the new cluster node.

    Use either scinstall or JumpStart(TM) to complete the installation and configuration of the new node, as described in the Sun Cluster 3.0 12/01 Software Installation Guide.

6.2.1.1 Example--Adding a Cluster Node to the Authorized Node List

The following example shows how to add a node named phys-schost-3 to the authorized node list in an existing cluster.


[Become superuser and execute the scsetup utility.]
# scsetup
Select New nodes>Specify the name of a machine which may add itself.
Answer the questions when prompted.
Verify that the scconf command completed successfully.
 
scconf -a -T node=phys-schost-3
 
    Command completed successfully.
Select Prevent any new machines from being added to the cluster.
Quit the scsetup New Nodes Menu and Main Menu.
[Install the cluster software.]

6.2.1.2 Where to Go From Here

For an overall list of tasks for adding a cluster node, see Table 6-2, "Task Map: Adding a Cluster Node."

To add a node to an existing resource group, see the Sun Cluster 3.0 12/01 Data Services Installation and Configuration Guide.

6.2.2 How to Remove a Node From the Cluster Software Configuration

This is the last software configuration procedure that needs to be accomplished in the process for removing a node from a cluster. You must be superuser on a node in the cluster to perform this procedure.

  1. Be sure you have correctly completed all prerequisite tasks listed in the task map for "6.2 Adding and Removing a Cluster Node".


    Note -

    Be sure you have placed the node in maintenance state and removed it from all resource groups, device groups, and quorum device configurations before continuing with this procedure.


  2. Determine the local disks in the cluster and their associated raw disk device group names, for example, dsk/d4.


    # scconf -pvv | grep Local_Disk	
    

  3. Identify which local disks and raw disk device groups in the cluster are connected to the node being removed.


    # scconf -pvv | grep node-name | grep Device	
    

  4. Disable the localonly property for each local disk identified in Step 3.

    See the scconf_dg_rawdisk(1M) man page for more information about the localonly property.


    # scconf -c -D name=rawdisk-device-group,localonly=false
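
    To confirm that the property is now disabled, you can re-check the device group configuration. The grep pattern below is an assumption about the output label; adjust it if your scconf output uses different wording.

    # scconf -pvv | grep -i localonly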
    

  5. Remove the node from all raw disk device groups of which the node is a member.

    This step must be completed for each raw disk device group that is connected to the node being removed.


    # scconf -r -D name=rawdisk-device-group,nodelist=node
    

  6. Remove the node from the cluster.


    # scconf -r -h node=node
    

  7. Verify the node removal using scstat.


    # scstat -n
    

  8. To physically remove the node from the cluster, remove the hardware connections as described in the Sun Cluster 3.0 12/01 Hardware Guide.


Note -

After the node has been removed from the cluster, you must reinstall the Solaris operating environment on the removed host before it can be placed back into service in any capacity.


6.2.2.1 Example--Removing a Node From the Cluster Software Configuration

This example shows how to remove a node (phys-schost-2) from a cluster.


[Become superuser on any node and identify all local disks and their raw disk device group names:]
# scconf -pvv | grep Local_Disk
	(dsk/d4) Device group type:          Local_Disk
	(dsk/d8) Device group type:          Local_Disk
[Identify the local disks and raw disk device groups connected to the node being removed:]
# scconf -pvv | grep phys-schost-2 | grep Device	
	(dsk/d4) Device group node list:  phys-schost-2
	(dsk/d2) Device group node list:  phys-schost-1, phys-schost-2
	(dsk/d1) Device group node list:  phys-schost-1, phys-schost-2
[Remove the localonly flag for each local disk on the node:]
# scconf -c -D name=dsk/d4,localonly=false
[Remove the node from all raw disk device groups:]
# scconf -r -D name=dsk/d4,nodelist=phys-schost-2
# scconf -r -D name=dsk/d2,nodelist=phys-schost-2
# scconf -r -D name=dsk/d1,nodelist=phys-schost-2
[Remove the node from the cluster:]
# scconf -r -h node=phys-schost-2
[Verify node removal:]
# scstat -n
-- Cluster Nodes --
                    Node name           Status
                    ---------           ------
  Cluster node:     phys-schost-1       Online

6.2.2.2 Where to Go From Here

For hardware procedures, see the Sun Cluster 3.0 12/01 Hardware Guide.

For an overall list of tasks for removing a cluster node, see Table 6-3, "Task Map: Removing a Cluster Node."

To add a node to an existing cluster, see "6.2.1 How to Add a Cluster Node to the Authorized Node List".