This chapter provides instructions for the following procedures.
The scadmin startcluster command is used to make a node the first member of the cluster. This node becomes node 0 of the cluster. The other Sun Cluster nodes are started by a single command, scadmin startnode. This command starts the programs required for multinode synchronization, and coordinates integration of the other nodes with the first node (if the Sun Cluster software is already running on the first node). You can remove nodes from the cluster by using the scadmin command with the stopnode option on the node that you are removing from the cluster.
Make the local node the first member node in the cluster. This node must be a configured node of the cluster in order to run the scadmin startcluster command successfully. This command must complete successfully before any other nodes can join the cluster. If the local node aborts for any reason while the subsequent nodes are joining the cluster, the result might be a corrupted CCD. If this scenario occurs, restore the CCD using the procedure "4.11.3 How to Restore the CCD".
To make the local node a configured node of the cluster, see "3.1 Adding and Removing Cluster Nodes".
It is important that no other nodes are running the cluster software at this time. If this node detects that another cluster node is active, the local node aborts.
Start the first node of the cluster by using the scadmin(1M) command.
# scadmin startcluster localnode clustername
The startcluster option does not run if localnode does not match the name of the node on which the command runs. See the scadmin(1M) man page for details.
For example:
phys-hahost1# scadmin startcluster phys-hahost1 haclust
Node specified is phys-hahost1
Cluster specified is haclust
=========================== WARNING ===========================
=                   Creating a new cluster                    =
===============================================================
You are attempting to start up the cluster node 'phys-hahost1'
as the only node in a new cluster. It is important that no other
cluster nodes be active at this time. If this node hears from
other cluster nodes, this node will abort. Other nodes may only
join after this command has completed successfully. Data
corruption may occur if more than one cluster is active.
Do you want to continue [y,n,?] y
If you receive a reconfig.4013 error message, then either there is already a node in a cluster, or another node is still in the process of going down. Run the get_node_status(1M) command on the node that might be up to determine that node's status.
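For example, you might run the following check on the suspect node; the prompt and output shown here are illustrative (the output format matches the get_node_status example later in this chapter). If the node reports that it is included in a running cluster, wait for it to finish leaving, or join that cluster instead of starting a new one.

phys-hahost2# get_node_status
sc: included in running cluster
node id: 1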
Add all other nodes to the cluster.
Run the following command on all other nodes. This command can be run on multiple nodes at the same time.
# scadmin startnode
If you receive the following reconfig.4015 error message, there might be no existing cluster. Restart the cluster by using the scadmin startcluster localnode clustername command.
SUNWcluster.clustd.reconf.4015 "Aborting--no existing or intact cluster to join."
Alternatively, there might be a partition or node failure. (For example, a third node might be attempting to join a two-node cluster when one of the two nodes fails.) If this happens, wait until the failure processing completes, fix any problems, and then attempt to rejoin the cluster.
If any required software packages are missing, the command fails and the console displays a message similar to the following:
Assuming a default cluster name of haclust
Error: Required SC package `SUNWccm' not installed!
Aborting cluster startup.
For information on installing the Sun Cluster software packages, refer to the Sun Cluster 2.2 Software Installation Guide.
Putting a node in any mode other than multiuser, or halting or rebooting the node, requires stopping the Sun Cluster membership monitor. Then your site's preferred method can be used for further node maintenance.
Stopping the cluster requires stopping the membership monitor on all cluster nodes by running the scadmin stopnode command on all nodes simultaneously.
You can stop the membership monitor only when no logical hosts are owned by the local Sun Cluster node.
To stop the membership monitor on one node, switch over the logical host(s) to another node by using the haswitch(1M) command, and then stop the membership monitor by typing the following commands:
phys-hahost1# haswitch ...
phys-hahost1# scadmin stopnode
If a logical host is owned by the node when the scadmin stopnode command is run, ownership will be transferred to another node that can master the logical host before the membership monitor is stopped. If the other possible master of the logical host is down, the scadmin stopnode command will shut down the data services in addition to stopping the membership monitor.
After the scadmin stopnode command runs, Sun Cluster will remain stopped, even across system reboots, until the scadmin startnode command is run.
The scadmin stopnode command removes the node from the cluster. In the absence of other simultaneous failures, you may shut down as many nodes as you choose without losing quorum among the remaining nodes. (If quorum is lost, the entire cluster shuts down.)
If you shut down a node for disk maintenance, you also must prepare the boot disk or data disk using the procedures described in Chapter 10, Administering Sun Cluster Local Disks for boot disks, or those described in your volume manager documentation for data disks.
You might have to shut down one or more Sun Cluster nodes to perform hardware maintenance procedures such as adding or removing SBus cards. The following sections describe the procedure for shutting down a single node or the entire cluster.
If it is not necessary to have the data remain available, place the logical hosts (disk groups) into maintenance mode.
phys-hahost2# haswitch -m logicalhost
Refer to the haswitch(1M) man page for details.
It is possible to halt a Sun Cluster node by using the halt(1M) command, allowing a failover to restore the logical host services on the backup node. However, the halt(1M) operation might cause the node to panic. The haswitch(1M) command offers a more reliable method of switching ownership of the logical hosts.
Stop Sun Cluster on one node without stopping services running on the other nodes in the cluster.
phys-hahost1# scadmin stopnode
Halt the node.
phys-hahost1# halt
The node is now ready for maintenance work.
You might want to shut down all nodes in a Sun Cluster configuration if a hazardous environmental condition exists, such as a cooling failure or a severe lightning storm.
Stop the membership monitor on all nodes simultaneously by using the scadmin(1M) command.
You can do this in one step using the Cluster Console.
phys-hahost1# scadmin stopnode
...
Halt all nodes using halt(1M).
phys-hahost1# halt
...
Shut down any Sun Cluster node by using the halt(1M) command or the uadmin(1M) command.
If the membership monitor is running when a node is shut down, the node will most likely take a "Failfast timeout" and display the following message:
panic[cpu9]/thread=0x50f939e0: Failfast timeout - unit
You can avoid this by stopping the membership monitor before shutting down the node. Refer to the procedure, "4.2.2 How to Stop Sun Cluster on All Nodes", for additional information.
Database server instances can run on a node only after you have invoked the startnode option and the node has successfully joined the cluster. All database instances should be shut down before the stopnode option is invoked.
If you are running Oracle7 Parallel Server, Oracle8 Parallel Server, or Informix XPS, refer to your product documentation for shutdown procedures.
If the stopnode command is executed while the Oracle7 or Oracle8 instance is still running on the node, stopnode will hang and the following message is displayed on the console:
ID[vxclust]: stop: waiting for applications to end
The Oracle7 or Oracle8 instance must be shut down for the stopnode command to terminate successfully.
If the stopnode command is executed while the Informix-Online XPS instance is still running on the node, the database hangs and becomes unusable.
The haswitch(1M) command is used to switch over the specified logical hosts (and associated disk groups, data services, and logical IP addresses) to the node specified by the destination host. For example, the following command switches over logical hosts hahost1 and hahost2 to both be mastered by phys-hahost1.
# haswitch phys-hahost1 hahost1 hahost2
If the logical host has more than one data service configured on it, you cannot selectively switch over just one data service, or a subset of the data services. Your only option is to switch over all the data services on the logical host.
Both the destination host and the current master of the logical host must be in the cluster membership; otherwise, the command fails.
In clusters providing HA data services, automatic switchover can be configured for the case in which a node fails, the logical hosts it mastered are switched over to another node, and the failed node later returns to the cluster. Logical hosts are automatically remastered by their default master, unless you configure them to remain mastered by the host to which they were switched.
If you do not want a logical host to be automatically switched back to its original master, use the -m option of the scconf(1M) command. Refer to the scconf(1M) man page for details.
To disable automatic switchover for a logical host, you need only run the scconf(1M) command on a single node that is an active member of the cluster.
# scconf clustername -L logicalhost -n node1,node2 -g dg1 -i qe0,qe0,logaddr1 -m
Maintenance mode is useful for some administration tasks on file systems and disk groups. To put the disk groups of a logical host into maintenance mode, use the -m option to the haswitch(1M) command.
Unlike other types of ownership of a logical host, maintenance mode persists across node reboots.
For example, this command puts logical host hahost1 in maintenance mode.
phys-hahost2# haswitch -m hahost1
This command stops the data services associated with hahost1 on the Sun Cluster node that currently owns the disk group, and also halts the fault monitoring programs associated with hahost1 on all Sun Cluster nodes. The command also executes a umount(1M) of any Sun Cluster file systems on the logical host. The associated disk group ownership is released.
This command runs on any host, regardless of current ownership of the logical host and disk group.
You can remove a logical host from maintenance mode by performing a switchover specifying the physical host that is to own the disk group. For example, you could use the following command to remove hahost1 from maintenance mode:
phys-hahost1# haswitch phys-hahost1 hahost1
Multiple failures (including network partitions) might result in subsets of cluster members attempting to remain in the cluster. Usually, these subsets have lost partial or total communication with each other. In such cases, the software attempts to ensure that there is only one resultant valid cluster. To achieve this, the software might cause some or all nodes to abort. The following discussion explains the criterion used to make these decisions.
The quorum criterion is defined as a subset with at least half the members of the original set of cluster nodes (not only the configured nodes). If a subset does not meet the quorum criterion, the nodes in the subset abort themselves and a reconfig.4014 error message is displayed. Failure to meet the quorum criterion could be due to a network partition or to a simultaneous failure of more than half of the nodes.
Valid clusters only contain nodes that can communicate with each other over private networks.
Consider a four-node cluster that partitions itself into two subsets: one subset consists of one node, while the other subset consists of three nodes. Each subset attempts to meet the quorum criterion. The first subset has only one node (out of the original four) and does not meet the quorum criterion. Hence, the node in the first subset shuts down. The second subset has three nodes (out of the original four), meets the quorum criterion, and therefore stays up.
Alternatively, consider a two-node cluster with a quorum device. If there is a partition in such a configuration, then one node and the quorum device meet the quorum criterion and the cluster stays up.
A split-brain partition occurs if a subset has exactly half the cluster members. (The split-brain partition does not include the scenario of a two-node cluster with a quorum device.) During initial installation of Sun Cluster, you were prompted to choose your preferred type of recovery from a split-brain scenario. Your choices were ask and select. If you chose ask, then if a split-brain partition occurs, the system asks you for a decision about which nodes should stay up. If you chose select, the system automatically selects for you which cluster members should stay up.
If you chose an automatic selection policy to deal with split-brain situations, your options were Lowest Nodeid or Highest Nodeid. If you chose Lowest Nodeid, then the subset containing the node with the lowest ID value becomes the new cluster. If you chose Highest Nodeid, then the subset containing the node with the highest ID value becomes the new cluster. For more details, see the section on installation procedures in the Sun Cluster 2.2 Software Installation Guide.
In either case, you must manually abort the nodes in all other subsets.
If you did not choose an automatic selection policy or if the system prompts you for input at the time of the partition, then the system displays the following error message.
SUNWcluster.clustd.reconf.3010 "*** ISSUE ABORTPARTITION OR CONTINUEPARTITION *** Proposed cluster: xxx Unreachable nodes: yyy"
Additionally, a message similar to the following is displayed on the console every ten seconds:
*** ISSUE ABORTPARTITION OR CONTINUEPARTITION ***
If the unreachable nodes have formed a cluster, issue ABORTPARTITION.
(scadmin abortpartition <localnode> <clustername>)
You may allow the proposed cluster to form by issuing CONTINUEPARTITION.
(scadmin continuepartition <localnode> <clustername>)
Proposed cluster partition: 0    Unreachable nodes: 1
If you did not choose an automatic selection policy, use the procedure "4.6.2 How to Choose a New Cluster" to choose the new cluster.
To restart the cluster after a split-brain failure, you must wait for the stopped node to come up entirely (it might undergo automatic reconfiguration or reboot) before you bring it back into the cluster using the scadmin startnode command.
Determine which subset should form the new cluster. Run the following command on one node in the subset that should abort.
# scadmin abortpartition
When the abortpartition command is issued on one node, the Cluster Membership Monitor (CMM) propagates that command to all the nodes in that partition. Therefore, if all nodes in that partition receive the command, they all abort. However, if some of the nodes in the partition cannot be contacted by the CMM, then they have to be manually aborted. Run the scadmin abortpartition command on any remaining nodes that do not abort.
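For example, using the syntax shown in the console message above and the host and cluster names from earlier examples, the command on a node of the aborting partition might look like the following:

phys-hahost2# scadmin abortpartition phys-hahost2 haclust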
Run the following command on one node in the subset that should stay up.
# scadmin continuepartition
A further reconfiguration occurs if there has been another failure within the new cluster. At all times, only one cluster is active.
Because Solaris and Sun Cluster software error messages are written to the /var/adm/messages file, the /var file system can become full. If the /var file system becomes full while the node is running, the node will continue to run, but you probably will not be able to log into the node with the full /var file system. If the node goes down, Sun Cluster will not start and a login will not be possible. If this happens, you must reboot in single-user mode (boot -s).
If the node reports a full /var file system and continues to run Sun Cluster services, follow the steps outlined in the following procedure.
In this example, phys-hahost1 has a full /var file system.
Perform a switchover.
Move all logical hosts off the node experiencing the problem.
phys-hahost2# haswitch phys-hahost2 hahost1 hahost2
Remove the node from the cluster membership.
If you have an active login to phys-hahost1, enter the following:
phys-hahost1# scadmin stopnode
If you do not have an active login to phys-hahost1, halt the node.
Reboot the node in single-user mode.
(0) ok boot -s
INIT: SINGLE USER MODE

Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance): root_password

Entering System Maintenance Mode

Sun Microsystems Inc.   SunOS 5.6   Generic August 1997
Perform the steps you would normally take to clear the full file system.
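For example, if /var/adm/messages is consuming the space, one common approach (shown here as an illustrative sketch) is to save a copy and truncate the file in place rather than remove it, because syslogd keeps the file open:

# cd /var/adm
# cp messages messages.old
# cat /dev/null > messages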
After the file system is cleared, enter multiuser mode.
# exit
Use the scadmin startnode command to cause the node to rejoin the configuration.
# scadmin startnode
We recommend that you use the Network Time Protocol (NTP) to maintain time synchronization between cluster nodes, if NTP is included with your Solaris operating environment.
An administrator cannot adjust the time of the nodes in a Sun Cluster configuration. Never attempt to perform a time change using the date(1), rdate(1M), or ntpdate(1M) commands.
In the Sun Cluster environment, the cluster nodes can run as NTP clients. You must have an NTP server set up and configured outside the cluster to use NTP; the cluster nodes cannot be configured to be NTP servers. Refer to the xntpd(1M) man page for information about NTP clients and servers.
If you are running cluster nodes as NTP clients, make sure that there are no crontab(1) entries that call ntpdate(1M). It is safer to run xntpd(1M) on the clients because that keeps the clocks in sync without making large jumps forward or backward.
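For example, you might verify the client setup on each node with checks similar to the following sketch (exact output varies by configuration); the first command should return no ntpdate entries, and the second should show the xntpd daemon running:

# crontab -l | grep ntpdate
# ps -ef | grep xntpd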
Complete the following steps when one node has a hardware failure and needs to be replaced with a new node.
This procedure assumes the root disk of the failed node is still operational and can be used. If your failed root disk is not mirrored, contact your local Sun Enterprise Service representative or your local authorized service provider for assistance.
If the failed node is not operational, start at Step 5.
If you have a parallel database configuration, stop the database.
Refer to the appropriate documentation for your data services. All HA applications are automatically shut down with the scadmin stopnode command.
Use the Cluster Console to open a terminal window.
As root, enter the following command in the terminal window.
This command removes the node from the cluster, stops the Sun Cluster software, and disables the volume manager on that node.
# scadmin stopnode
Halt the operating system on the node.
Refer to the Solaris System Administrator's Guide.
Power off the node.
Refer to your hardware service manual for more information.
Do not disconnect any cables from the failed node at this time.
Remove the boot disk from the failed node.
Refer to your hardware service manual for more information.
Place the boot disk in the identical slot in the new node.
The root disk should be accessible at the same address as before. Refer to your hardware service manual for more information.
Be sure that the new node has the same IP address as the failed system. You may need to modify the boot servers or arp servers to remap the IP address to the new Ethernet address. For more information, refer to the NIS+ and DNS Setup and Configuration Guide.
Power on the new node.
Refer to your hardware service manual for more information.
If the node automatically boots, shut down the operating system and take the system to the OpenBoot PROM monitor.
For more information, refer to the shutdown(1M) man page.
Refer to your hardware planning and installation guide to ensure that every scsi-initiator-id is set correctly.
Power off the new node.
Refer to your hardware service manual for more information.
On the surviving node that shares the multihost disks with the failed node, detach all of the disks in one disk expansion unit attached to the failed node.
Refer to your hardware service manual for more information.
Power off the disk expansion unit.
Refer to your hardware service manual for more information.
As you replace the failed node, messages similar to the following might appear on the system console. Disregard these messages, because they might not indicate a problem.
Nov 3 17:44:00 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:00 updb10a unix: SCSI transport failed: reason 'incomplete': retrying command
Nov 3 17:44:03 updb10a unix: WARNING: /sbus@1f,0/SUNW,fas@0,8800000/sd@2,0 (sd17):
Nov 3 17:44:03 updb10a unix: disk not responding to selection
Detach the SCSI cable from the failed node and attach it to the corresponding slot on the new node.
Refer to your hardware service manual for more information.
Power on the disk expansion unit.
Refer to your hardware service manual for more information.
Reattach all of the disks you detached in Step 12.
Refer to your hardware service manual for more information.
Wait for volume recovery to complete on all the volumes in the disk expansion unit before detaching the corresponding mirror disk expansion unit.
Use your volume manager software to determine when volume recovery has occurred.
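For example, if you are running SSVM or CVM, a command similar to the following (the disk group name is illustrative) shows the volume states; wait until the volumes in the disk group return to the ENABLED/ACTIVE state before proceeding:

# vxprint -g dg1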
Repeat Step 12 through Step 17 for all of the remaining disk expansion units.
Power on the replaced (new) node.
Refer to your hardware service manual for more information.
Reboot the node and wait for the system to come up.
<#0> boot
Determine the Ethernet address on the replaced (new) node.
# /usr/sbin/arp nodename
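The output maps the node's IP address to its Ethernet (MAC) address and looks similar to the following sketch (host name and addresses are illustrative):

# /usr/sbin/arp phys-hahost1
phys-hahost1 (192.9.200.1) at 8:0:20:1e:56:7d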
Determine the node ID of the replaced node.
By the process of elimination, you can determine which node is not in the cluster. The node IDs should be numbered consecutively starting with node 0.
# get_node_status
sc: included in running cluster
node id: 0
membership: 0
interconnect0: unknown
interconnect1: unknown
vm_type: cvm
vm_on_node: master
vm: up
db: down
Inform the cluster system of the new Ethernet address (of the replaced node) by entering the following command on all the cluster nodes.
# scconf clustername -N node-id ethernet-address-of-host
Continuing with the example in Step 22, the node ID is 1:
# scconf clustername -N 1 ethernet-address-of-host
Start up the replaced node.
# scadmin startnode
In a parallel database configuration, restart the database.
Refer to the appropriate documentation for your data services. All HA applications are automatically started with the scadmin startcluster and scadmin startnode commands.
The Terminal Concentrator need not be operational for the cluster to stay up. If the Terminal Concentrator fails, the cluster itself does not fail.
You can replace a failed Terminal Concentrator without affecting the cluster. If the new Terminal Concentrator has retained the same name, IP address, and password as the original, then no cluster commands are required. Simply plug in the new Terminal Concentrator and it will work as expected.
If the replacement Terminal Concentrator has a new name, IP address, or password, use the scconf(1M) command as described in "3.12 Changing TC/SSP Information", to change this information in the cluster database. This can be done with the cluster running without affecting cluster operations.
The ccdadm(1M) command is used to perform administrative procedures on the Cluster Configuration Database (CCD). Refer to the ccdadm(1M) man page for additional information.
As root, you can run the ccdadm(1M) command from any active node. This command updates all the nodes in your cluster.
It is good practice to checkpoint the CCD by using the -c (checkpoint) option to ccdadm(1M) each time the cluster configuration is updated. The CCD is used extensively by the Sun Cluster framework to store configuration data related to logical hosts and HA data services, as well as the network adapter configuration data used by PNM. After any change to the HA or PNM configuration of the cluster, we strongly recommend that you capture a current, valid snapshot of the CCD by using the -c option, as insurance against problems that can occur under future fault scenarios. This is no different from requiring database or system administrators to back up their data frequently to avoid catastrophic loss from unforeseen circumstances.
Run the -v option whenever there may be a problem with the Dynamic CCD.
This option compares the consistency record of each CCD copy on all the cluster nodes, enabling you to verify that the database is consistent across all the nodes. CCD queries are disabled while the verification is in progress.
# ccdadm clustername -v
Run the -c option once a week or whenever you back up the CCD.
This option makes a backup copy of the Dynamic CCD. The backup copy subsequently can be used to restore the Dynamic CCD by using the -r option. See "4.11.3 How to Restore the CCD" for more information.
When backing up the CCD, put all logical hosts in maintenance mode before running the ccdadm -c command. Because the logical hosts must also be in maintenance mode when you restore the CCD database, taking the backup in a state that matches the restore state prevents unnecessary errors or problems.
# ccdadm clustername -c checkpoint-filename
In this command, checkpoint-filename is the name of your backup copy.
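For example, a weekly backup might look like the following sketch, using the host, logical host, and cluster names from earlier examples and an illustrative checkpoint file name; put the logical hosts in maintenance mode, take the checkpoint, and then return the logical hosts to their master:

# haswitch -m hahost1 hahost2
# ccdadm haclust -c /var/opt/SUNWcluster/ccd/ccd.checkpoint
# haswitch phys-hahost1 hahost1 hahost2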
Run ccdadm(1M) with the -r option whenever the CCD has been corrupted. This option discards the current copy of the Dynamic CCD and restores it with the contents of the restore file you supply. Use this command to initialize or restore the Dynamic CCD after the ccdd(1M) reconfiguration algorithm failed to elect a valid CCD copy upon cluster restart. The CCD is then marked valid.
If necessary, disable the quorum.
See "4.11.4 How to Enable or Disable the CCD Quorum" for more information.
# ccdadm clustername -q off
Put the logical hosts in maintenance mode.
# haswitch -m logicalhosts
Restore the CCD.
In this command, restore-filename is the name of the file you are restoring.
# ccdadm clustername -r restore-filename
If necessary, turn the CCD quorum back on.
# ccdadm clustername -q on
Bring the logical hosts back online.
For example:
# haswitch phys-host1 logicalhost1
# haswitch phys-host2 logicalhost2
Typically, the cluster software requires a quorum before updating the CCD. The -q option enables you to disable this restriction and to update the CCD with any number of nodes.
Run this option to enable or disable the quorum requirement when updating or restoring the Dynamic CCD. The quorum_flag is a toggle: on enables the quorum, off disables it. By default, the quorum is enabled.
For example, suppose you have three physical nodes; you need at least two nodes to perform updates. If a hardware failure leaves only one node that can be brought up, the cluster software does not allow you to update the CCD. By running the ccdadm -q command, however, you can toggle off this restriction and update the CCD.
# ccdadm clustername -q on|off
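For example, in the single-surviving-node scenario just described, the sequence might look like the following sketch (the cluster name is taken from earlier examples); disable the quorum, perform the update or restore, and then re-enable the quorum:

# ccdadm haclust -q off
(perform the CCD update or restore)
# ccdadm haclust -q on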
The -p option enables you to purify (verify the contents and check the syntax of) the CCD database file. Run this option whenever there is a syntax error in the CCD database file.
# ccdadm -p CCD-filename
The -p option reports any format errors in the candidate file and writes a corrected version to the file CCD-filename.pure. You can then restore this "pure" file as the new CCD database. See "4.11.3 How to Restore the CCD" for more information.
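For example, assuming the CCD database file is located at /etc/opt/SUNWcluster/conf/ccd.database (this path is an assumption; substitute the location of your CCD file), a purify-then-restore sequence might look like the following sketch. Apply the .pure file by following the full procedure in "4.11.3 How to Restore the CCD", including placing the logical hosts in maintenance mode.

# ccdadm -p /etc/opt/SUNWcluster/conf/ccd.database
# ccdadm clustername -r /etc/opt/SUNWcluster/conf/ccd.database.pure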
In some situations, you might want to disable a shared CCD. Such situations might include troubleshooting scenarios, or conversion of a two-node cluster to a three-node cluster such that you no longer need a shared CCD.
Stop one node in the cluster.
The following command stops phys-hahost2:
phys-hahost2# scadmin stopnode
Back up the shared CCD to a safe location.
Turn off the shared CCD by using the scconf(1M) command.
Run this command on all nodes.
phys-hahost1# scconf -S none
Copy the CCD from the shared diskset to both nodes.
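The exact source and destination paths depend on your configuration. For example, if the shared CCD volume is mounted at /etc/opt/SUNWcluster/conf/ccdssa as shown in the next step, a sketch of the copy might look like the following (the file name shown is an assumption); repeat the copy, or use rcp, so that both nodes receive the file:

phys-hahost1# cp /etc/opt/SUNWcluster/conf/ccdssa/ccd.database /etc/opt/SUNWcluster/conf/ccd.database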
Unmount the shared CCD volume.
phys-hahost1# umount /etc/opt/SUNWcluster/conf/ccdssa
Deport the disk group on which the shared CCD resided.
phys-hahost1# vxdg deport sc_dg
Restart the stopped node.
phys-hahost2# scadmin startnode
The private CCDs on each node should now be identical. To reinstate the shared CCD, follow the steps described in the appendix on configuring SSVM in the Sun Cluster 2.2 Software Installation Guide.
The system logs errors in the CCD to the /var/opt/SUNWcluster/ccd/ccd.log file. Critical error messages are also passed to the Cluster Console. Additionally, in the rare case of a crash, the software creates a core file under /var/opt/SUNWcluster/ccd.
The following is an example of the ccd.log file.
lpc204# cat ccd.log
Apr 16 14:54:05 lpc204 ID[SUNWcluster.ccd.ccdd.1005]: (info) starting `START' transition with time-out 10000
Apr 16 14:54:05 lpc204 ID[SUNWcluster.ccd.ccdd.1005]: (info) completed `START' transition with status 0
Apr 16 14:54:06 lpc204 ID[SUNWcluster.ccd.ccdd.1005]: (info) starting `STEP1' transition with time-out 20000
Apr 16 14:54:06 lpc204 ID[SUNWcluster.ccd.ccdd.1000]: (info) Nodeid = 0 Up = 0 Gennum = 0 Date = Feb 14 10h30m00 1997 Restore = 4
Apr 16 14:54:06 lpc204 ID[SUNWcluster.ccd.ccdd.1002]: (info) start reconfiguration elected CCD from Nodeid = 0
Apr 16 14:54:06 lpc204 ID[SUNWcluster.ccd.ccdd.1004]: (info) the init CCD database is consistent
Apr 16 14:54:06 lpc204 ID[SUNWcluster.ccd.ccdd.1001]: (info) Node is up as a one-node cluster after scadmin startcluster; skipping ccd quorum test
Apr 16 14:54:06 lpc204 ID[SUNWcluster.ccd.ccdd.1005]: (info) completed `STEP1' transition with status 0
The following table lists the most common error messages with suggestions for resolving the problem. Refer to the Sun Cluster 2.2 Error Messages Manual for the complete list of error messages.
Table 4-1 Common Error Messages for the Cluster Configuration Database
The list of disks maintained by the volume manager is used as the set of devices for failure fencing. If no disk groups are present on a system, there are no devices to fence (effectively, there is no data to protect). However, when new shared disk groups are imported while one or more nodes are not in the cluster, the cluster must be informed that an extra set of devices needs failure fencing.
This is accomplished by running the scadmin resdisks command from a node that can access the new disk group(s).
# scadmin resdisks
This command reserves all the devices connected to a node if no other node that has connectivity to the same set of devices is in the cluster membership. That is, the reservations take effect only if exactly one node, out of all the nodes that have direct physical connectivity to the devices, is in the cluster membership. If this condition is not met, the scadmin resdisks command has no effect. The command also fails if a cluster reconfiguration is in progress. Reservations on shared devices are released automatically when this one node is shut down, or when other nodes with direct connectivity to the shared devices join the cluster membership.
It is unnecessary to run the scadmin resdisks command if shared disk groups are imported while all nodes are in the cluster. Reservations and failure fencing are not relevant if full cluster membership is present.
However, if a shared disk group is deported, the reservations on the shared devices in the deported disk group are not released. These reservations are not released until either the node that does the reservations is shut down, or the other node, with which it shares devices, joins the cluster.
To enable the set of disks belonging to the deported disk group to be used immediately, enter the following two commands in succession on all cluster nodes, after deporting the shared disk group:
# scadmin reldisks
# scadmin resdisks
The first command releases reservations on all shared devices. The second command effectively redoes the reservations based on the currently imported set of disk groups, and automatically excludes the set of disks associated with deported disk groups.