Overview of Administering the Cluster
How to Change the Cluster Name
How to Map Node ID to Node Name
How to Work With New Cluster Node Authentication
How to Reset the Time of Day in a Cluster
SPARC: How to Display the OpenBoot PROM (OBP) on a Node
How to Change the Node Private Hostname
How to Add a Private Hostname for a Non-Voting Node on a Global Cluster
How to Change the Private Hostname on a Non-Voting Node on a Global Cluster
How to Delete the Private Hostname for a Non-Voting Node on a Global Cluster
How to Put a Node Into Maintenance State
Performing Zone Cluster Administrative Tasks
How to Remove a File System From a Zone Cluster
How to Remove a Storage Device From a Zone Cluster
How to Uninstall Oracle Solaris Cluster Software From a Cluster Node
Troubleshooting a Node Uninstallation
Unremoved Cluster File System Entries
Unremoved Listing in Device Groups
Creating, Setting Up, and Managing the Oracle Solaris Cluster SNMP Event MIB
How to Enable an SNMP Event MIB
How to Disable an SNMP Event MIB
How to Change an SNMP Event MIB
How to Enable an SNMP Host to Receive SNMP Traps on a Node
How to Disable an SNMP Host From Receiving SNMP Traps on a Node
How to Add an SNMP User on a Node
How to Remove an SNMP User From a Node
Running an Application Outside the Global Cluster
How to Take a Solaris Volume Manager Metaset From Nodes Booted in Noncluster Mode
How to Save the Solaris Volume Manager Software Configuration
How to Purge the Corrupted Diskset
How to Recreate the Solaris Volume Manager Software Configuration
This section describes how to perform administrative tasks for the entire global cluster or zone cluster. The following table lists these administrative tasks and the associated procedures. You generally perform cluster administrative tasks in the global zone. To administer a zone cluster, at least one machine that will host the zone cluster must be up in cluster mode. Not all zone-cluster nodes need to be up and running; Oracle Solaris Cluster replays any configuration changes when a node that is currently out of the cluster rejoins it.
Note - By default, power management is disabled so that it does not interfere with the cluster. If you enable power management for a single-node cluster, the cluster is still running but it can become unavailable for a few seconds. The power management feature attempts to shut down the node, but it does not succeed.
In this chapter, phys-schost# reflects a global-cluster prompt. The clzonecluster interactive shell prompt is clzc:schost>.
Table 9-1 Task List: Administering the Cluster
If necessary, you can change the cluster name after initial installation.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
phys-schost# clsetup
The Main Menu is displayed.
The Other Cluster Properties menu is displayed.
phys-schost# stclient -x
phys-schost# stclient -d -i service_tag_instance_number
phys-schost# reboot
Example 9-1 Changing the Cluster Name
The following example shows the cluster(1CL) command generated from the clsetup(1CL) utility to change to the new cluster name, dromedary.
phys-schost# cluster -c dromedary
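To confirm the change, you can display the cluster name from any cluster node; this is a minimal check, and the output shown assumes the dromedary example above.

phys-schost# cluster list
dromedary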
During Oracle Solaris Cluster installation, each node is automatically assigned a unique node ID number. The node ID number is assigned to a node in the order in which it joins the cluster for the first time. After the node ID number is assigned, the number cannot be changed. The node ID number is often used in error messages to identify which cluster node the message concerns. Use this procedure to determine the mapping between node IDs and node names.
You do not need to be superuser to list configuration information for a global cluster or a zone cluster. One step in this procedure is performed from a node of the global cluster. The other step is performed from a zone-cluster node.
phys-schost# clnode show | grep Node
phys-schost# zlogin sczone clnode -v | grep Node
Example 9-2 Mapping the Node ID to the Node Name
The following example shows the node ID assignments for a global cluster.
phys-schost# clnode show | grep Node
=== Cluster Nodes ===
Node Name:                                      phys-schost1
  Node ID:                                      1
Node Name:                                      phys-schost2
  Node ID:                                      2
Node Name:                                      phys-schost3
  Node ID:                                      3
Oracle Solaris Cluster enables you to determine whether new nodes can add themselves to the global cluster and which type of authentication to use. You can permit any new node to join the cluster over the public network, prevent new nodes from joining the cluster, or specify a particular node that can join. New nodes can be authenticated by using either standard UNIX or Diffie-Hellman (DES) authentication. If you select DES authentication, you must also configure all necessary encryption keys before a node can join. See the keyserv(1M) and publickey(4) man pages for more information.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
phys-schost# clsetup
The Main Menu is displayed.
The New Nodes menu is displayed.
Example 9-3 Preventing a New Machine From Being Added to the Global Cluster
The clsetup utility generates the claccess command. The following example shows the claccess command that prevents new machines from being added to the cluster.
phys-schost# claccess deny -h hostname
Example 9-4 Permitting All New Machines to Be Added to the Global Cluster
The clsetup utility generates the claccess command. The following example shows the claccess command that enables all new machines to be added to the cluster.
phys-schost# claccess allow-all
Example 9-5 Specifying a New Machine to Be Added to the Global Cluster
The clsetup utility generates the claccess command. The following example shows the claccess command that enables a single new machine to be added to the cluster.
phys-schost# claccess allow -h hostname
Example 9-6 Setting the Authentication to Standard UNIX
The clsetup utility generates the claccess command. The following example shows the claccess command that resets to standard UNIX authentication for new nodes that are joining the cluster.
phys-schost# claccess set -p protocol=sys
Example 9-7 Setting the Authentication to DES
The clsetup utility generates the claccess command. The following example shows the claccess command that uses DES authentication for new nodes that are joining the cluster.
phys-schost# claccess set -p protocol=des
When using DES authentication, you must also configure all necessary encryption keys before a node can join the cluster. For more information, see the keyserv(1M) and publickey(4) man pages.
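As a sketch of that key setup, assuming a hypothetical new node named phys-schost-4 and a publickey(4) database that you administer from this node, you might create the host's DES credentials and restart the keyserver so that the new keys take effect:

# newkey -h phys-schost-4
# svcadm restart svc:/network/rpc/keyserv

The exact steps depend on your name service configuration; see the newkey(1M) man page for details.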
Oracle Solaris Cluster software uses the Network Time Protocol (NTP) to maintain time synchronization between cluster nodes. Adjustments in the global cluster occur automatically as needed when nodes synchronize their time. For more information, see the Oracle Solaris Cluster Concepts Guide and the Network Time Protocol User's Guide.
Caution - When using NTP, do not attempt to adjust the cluster time while the cluster is up and running. Do not adjust the time by using the date(1), rdate(1M), xntpd(1M), or svcadm(1M) commands interactively or within cron(1M) scripts.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
phys-schost# cluster shutdown -g0 -y -i 0
On SPARC based systems, run the following command.
ok boot -x
On x86 based systems, run the following commands.
# shutdown -g0 -y -i0

Press any key to continue
The GRUB menu appears similar to the following:
GNU GRUB version 0.95 (631K lower / 2095488K upper memory)
+-------------------------------------------------------------------------+
| Solaris 10 /sol_10_x86                                                  |
| Solaris failsafe                                                        |
|                                                                         |
+-------------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, 'e' to edit the
commands before booting, or 'c' for a command-line.
For more information about GRUB based booting, see Booting an x86 Based System by Using GRUB (Task Map) in System Administration Guide: Basic Administration.
The GRUB boot parameters screen appears similar to the following:
GNU GRUB version 0.95 (615K lower / 2095552K upper memory)
+----------------------------------------------------------------------+
| root (hd0,0,a)                                                       |
| kernel /platform/i86pc/multiboot                                     |
| module /platform/i86pc/boot_archive                                  |
+----------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press 'b' to boot, 'e' to edit the selected command in the
boot sequence, 'c' for a command-line, 'o' to open a new line
after ('O' for before) the selected line, 'd' to remove the
selected line, or escape to go back to the main menu.
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ESC at any time exits. ]

grub edit> kernel /platform/i86pc/multiboot -x
The screen displays the edited command.
GNU GRUB version 0.95 (615K lower / 2095552K upper memory)
+----------------------------------------------------------------------+
| root (hd0,0,a)                                                       |
| kernel /platform/i86pc/multiboot -x                                  |
| module /platform/i86pc/boot_archive                                  |
+----------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press 'b' to boot, 'e' to edit the selected command in the
boot sequence, 'c' for a command-line, 'o' to open a new line
after ('O' for before) the selected line, 'd' to remove the
selected line, or escape to go back to the main menu.
Note - This change to the kernel boot parameter command does not persist over the system boot. The next time you reboot the node, it will boot into cluster mode. To boot into noncluster mode instead, perform these steps again to add the -x option to the kernel boot parameter command.
phys-schost# date HHMM.SS
phys-schost# rdate hostname
phys-schost# reboot
On each node, run the date command.
phys-schost# date
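For example, with hypothetical node names and time values, you might set the time of day on one node, synchronize a second node to it, and then verify that the nodes agree:

phys-schost-1# date 1120.45
phys-schost-2# rdate phys-schost-1
phys-schost-2# date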
Use this procedure if you need to configure or change OpenBoot™ PROM settings.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
# telnet tc_name tc_port_number
tc_name - Specifies the name of the terminal concentrator.
tc_port_number - Specifies the port number on the terminal concentrator. Port numbers are configuration dependent. Typically, ports 2 and 3 (5002 and 5003) are used for the first cluster installed at a site.
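For example, to reach the node that is attached to port 2 of a terminal concentrator named tc0 (both values are hypothetical), you might type:

# telnet tc0 5002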
phys-schost# clnode evacuate node
# shutdown -g0 -y
Caution - Do not use send brk on a cluster console to shut down a cluster node.
Use this procedure to change the private hostname of a cluster node after installation has been completed.
Default private hostnames are assigned during initial cluster installation. The default private hostname takes the form clusternode<nodeid>-priv, for example: clusternode3-priv. Change a private hostname only if the name is already in use in the domain.
Caution - Do not attempt to assign IP addresses to new private hostnames. The clustering software assigns them.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
phys-schost# clresource disable resource[,...]
Include the following in the applications that you disable:
HA-DNS and HA-NFS services, if configured
Any application that has been custom-configured to use the private hostname
Any application that is being used by clients over the private interconnect
For information about using the clresource command, see the clresource(1CL) man page and the Oracle Solaris Cluster Data Services Planning and Administration Guide.
Use the svcadm command to shut down the Network Time Protocol (NTP) daemon. See the svcadm(1M) man page for more information about the NTP daemon.
phys-schost# svcadm disable ntp
Run the utility from only one of the nodes in the cluster.
Note - When selecting a new private hostname, ensure that the name is unique to the cluster node.
Answer the questions when prompted. You are asked for the name of the node whose private hostname you are changing (clusternode<nodeid>-priv) and for the new private hostname.
Perform this step on each node in the cluster. Flushing prevents the cluster applications and data services from trying to access the old private hostname.
phys-schost# nscd -i hosts
If you performed this step at installation, also remember to remove names for nodes that are not configured; the default template is preconfigured with 16 nodes. Typically, the ntp.conf.cluster file is identical on each cluster node.
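For example, if you changed clusternode2-priv to clusternode4-priv, the peer entries in the file would read as follows; entries for node IDs that are not configured in your cluster can be removed:

peer clusternode1-priv
peer clusternode4-priv
peer clusternode3-priv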
Perform this step on each node of the cluster.
Use the svcadm command to restart the NTP daemon.
# svcadm enable ntp
phys-schost# clresource enable resource[,...]
For information about using the clresource command, see the clresource(1CL) man page and the Oracle Solaris Cluster Data Services Planning and Administration Guide.
Example 9-8 Changing the Private Hostname
The following example changes the private hostname from clusternode2-priv to clusternode4-priv on node phys-schost-2.
[Disable all applications and data services as necessary.]
phys-schost-1# /etc/init.d/xntpd stop
phys-schost-1# clnode show | grep node
  ...
  private hostname:                           clusternode1-priv
  private hostname:                           clusternode2-priv
  private hostname:                           clusternode3-priv
  ...
phys-schost-1# clsetup
phys-schost-1# nscd -i hosts
phys-schost-1# vi /etc/inet/ntp.conf
  ...
  peer clusternode1-priv
  peer clusternode4-priv
  peer clusternode3-priv
phys-schost-1# ping clusternode4-priv
phys-schost-1# /etc/init.d/xntpd start
[Enable all applications and data services disabled at the beginning of the procedure.]
Use this procedure to add a private hostname for a non-voting node on a global cluster after installation has been completed. In the procedures in this chapter, phys-schost# reflects a global-cluster prompt. Perform this procedure only on a global cluster.
phys-schost# clsetup
Answer the questions when prompted. There is no default for a global-cluster non-voting node private hostname. You will need to provide a hostname.
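Under the covers, the clsetup utility runs a clnode command. A sketch of what the equivalent direct command might look like, assuming a hypothetical global-cluster node phys-schost-1, a non-global zone myzone, and a chosen hostname swift-priv; the zprivatehostname property name and operand syntax are assumptions to verify against the clnode(1CL) man page:

phys-schost# clnode set -p zprivatehostname=swift-priv phys-schost-1:myzone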
Use this procedure to change the private hostname of a non-voting node after installation has been completed.
Private hostnames are assigned during initial cluster installation. The private hostname takes the form clusternode<nodeid>-priv, for example: clusternode3-priv. Change a private hostname only if the name is already in use in the domain.
Caution - Do not attempt to assign IP addresses to new private hostnames. The clustering software assigns them.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
phys-schost# clresource disable resource1, resource2
Include the following in the applications that you disable:
HA-DNS and HA-NFS services, if configured
Any application that has been custom-configured to use the private hostname
Any application that is being used by clients over the private interconnect
For information about using the clresource command, see the clresource(1CL) man page and the Oracle Solaris Cluster Data Services Planning and Administration Guide.
phys-schost# clsetup
You need to perform this step only from one of the nodes in the cluster.
Note - When selecting a new private hostname, ensure that the name is unique to the cluster.
No default exists for the private hostname of a non-voting node of a global cluster. You must provide a hostname.
Answer the questions when prompted. You are asked for the name of the non-voting node whose private hostname is being changed (clusternode<nodeid>-priv), and the new private hostname.
Perform this step on each node in the cluster. Flushing prevents the cluster applications and data services from trying to access the old private hostname.
phys-schost# nscd -i hosts
Use this procedure to delete a private hostname for a non-voting node on a global cluster. Perform this procedure only on a global cluster.
You can change the name of a node that is part of an Oracle Solaris Cluster configuration. You must rename the Oracle Solaris hostname before you can rename the node. Use the clnode rename command to rename the node.
The following instructions apply to any application that is running in a global cluster.
ok boot -x
# clnode rename -n newnodename oldnodename
# sync;sync;sync;/etc/reboot
# clnode status -v
You can choose to change the logical hostname resource's hostnamelist property either before or after you rename the node by following the steps in How to Rename a Node. This step is optional.
The following steps show how to configure the apache-lh-res resource to work with the new logical hostname, and must be executed in cluster mode.
# clrg offline apache-rg
# clrs disable apache-lh-res
# clrs set -p HostnameList=test-2 apache-lh-res
# clrs enable apache-lh-res
# clrg online apache-rg
# clrs status apache-rs
Put a global-cluster node into maintenance state when taking the node out of service for an extended period of time. This way, the node does not contribute to the quorum count while it is being serviced. To put a node into maintenance state, shut down the node with the clnode(1CL) evacuate and shutdown(1M) commands.
Note - Use the Oracle Solaris shutdown command to shut down a single node. Use the cluster shutdown command only when shutting down an entire cluster.
When a cluster node is shut down and put in maintenance state, all quorum devices that are configured with ports to the node have their quorum vote counts decremented by one. The node and quorum device vote counts are incremented by one when the node is removed from maintenance mode and brought back online.
Use the clquorum(1CL) disable command to put a cluster node into maintenance state.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
phys-schost# clnode evacuate node
phys-schost# shutdown -g0 -y -i 0
phys-schost# clquorum disable node
node - Specifies the name of a node that you want to put into maintenance state.
phys-schost# clquorum status node
The node that you put into maintenance state should have a Status of offline and 0 (zero) for Present and Possible quorum votes.
Example 9-9 Putting a Global-Cluster Node Into Maintenance State
The following example puts a cluster node into maintenance state and verifies the results. The clnode status output shows the Node votes for phys-schost-1 to be 0 (zero) and the status to be Offline. The Quorum Summary should also show reduced vote counts. Depending on your configuration, the Quorum Votes by Device output might indicate that some quorum disk devices are offline as well.
[On the node to be put into maintenance state:]
phys-schost-1# clnode evacuate phys-schost-1
phys-schost-1# shutdown -g0 -y -i0

[On another node in the cluster:]
phys-schost-2# clquorum disable phys-schost-1
phys-schost-2# clquorum status phys-schost-1

-- Quorum Votes by Node --

Node Name           Present       Possible       Status
---------           -------       --------       ------
phys-schost-1       0             0              Offline
phys-schost-2       1             1              Online
phys-schost-3       1             1              Online
See Also
To bring a node back online, see How to Bring a Node Out of Maintenance State.
Use the following procedure to bring a global-cluster node back online and reset the quorum vote count to the default. For cluster nodes, the default quorum count is one. For quorum devices, the default quorum count is N-1, where N is the number of nodes with nonzero vote counts that have ports to the quorum device.
When a node has been put in maintenance state, the node's quorum vote count is decremented by one. All quorum devices that are configured with ports to the node also have their quorum vote counts decremented. When the quorum vote count is reset and the node is removed from maintenance state, both the node's quorum vote count and the quorum device vote counts are incremented by one.
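For example, in a hypothetical three-node cluster with one quorum device that is ported to all three nodes, the device's default vote count is N-1 = 2. Placing one node in maintenance state reduces that node's vote count to 0 and the device's vote count to 1; resetting the quorum count when the node rejoins restores them to 1 and 2, respectively.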
Run this procedure any time a global-cluster node has been put in maintenance state and you are removing it from maintenance state.
Caution - If you do not specify either the globaldev or node options, the quorum count is reset for the entire cluster.
The phys-schost# prompt reflects a global-cluster prompt. Perform this procedure on a global cluster.
This procedure provides the long forms of the Oracle Solaris Cluster commands. Most commands also have short forms. Except for the long and short forms of the command names, the commands are identical.
You must reset the quorum count from a node other than the node in maintenance state before rebooting the node, or the node might hang while waiting for quorum.
phys-schost# clquorum reset
reset - The change flag that resets quorum.
phys-schost# clquorum status
The node that you removed from maintenance state should have a status of online and show the appropriate vote count for Present and Possible quorum votes.
Example 9-10 Removing a Cluster Node From Maintenance State and Resetting the Quorum Vote Count
The following example resets the quorum count for a cluster node and its quorum devices to their defaults and verifies the result. The cluster status output shows the Node votes for phys-schost-1 to be 1 and the status to be online. The Quorum Summary should also show an increase in vote counts.
phys-schost-2# clquorum reset
On SPARC based systems, run the following command.
ok boot
On x86 based systems, run the following commands.
When the GRUB menu is displayed, select the appropriate Oracle Solaris entry and press Enter. The GRUB menu appears similar to the following:
GNU GRUB version 0.95 (631K lower / 2095488K upper memory)
+-------------------------------------------------------------------------+
| Solaris 10 /sol_10_x86                                                  |
| Solaris failsafe                                                        |
|                                                                         |
+-------------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, 'e' to edit the
commands before booting, or 'c' for a command-line.
phys-schost-1# clquorum status

--- Quorum Votes Summary ---

            Needed   Present   Possible
            ------   -------   --------
            4        6         6

--- Quorum Votes by Node ---

Node Name           Present       Possible       Status
---------           -------       --------       ------
phys-schost-2       1             1              Online
phys-schost-3       1             1              Online

--- Quorum Votes by Device ---

Device Name                Present      Possible      Status
-----------                -------      --------      ------
/dev/did/rdsk/d3s2         1            1             Online
/dev/did/rdsk/d17s2        0            1             Online
/dev/did/rdsk/d31s2        1            1             Online
You can enable the automatic distribution of resource group load across nodes or zones by setting load limits. You can configure a set of load limits for each cluster node. You assign load factors to resource groups, and the load factors correspond to the defined load limits of the nodes. The default behavior is to distribute resource group load evenly across all the available nodes in the resource group's node list.
The Resource Group Manager (RGM) starts resource groups on nodes from the resource group's node list so that a node's load limits are not exceeded. As the RGM assigns resource groups to nodes, it sums the load factors of the resource groups on each node to produce a total load. The total load is then compared against that node's load limits.
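For example, with hypothetical values, if resource group rg1 carries a load factor of mem_load@50 and rg2 carries mem_load@30, hosting both groups on the same node produces a total mem_load of 80 on that node, which the RGM compares against the node's mem_load soft and hard limits.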
A load limit consists of the following items:
A user-assigned name.
A soft limit value – You can temporarily exceed a soft load limit.
A hard limit value – Hard load limits can never be exceeded and are strictly enforced.
You can set both the hard limit and the soft limit in a single command. If one of the limits is not explicitly set, the default value is used. Hard and soft load limits for each node are created and modified with the clnode create-loadlimit, clnode set-loadlimit, and clnode delete-loadlimit commands. See the clnode(1CL) man page for more information.
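For example, a sketch of raising an existing limit on one node follows; the limit name mem_load and the node name node1 are illustrative, and the property syntax mirrors the create-loadlimit example later in this section (confirm it against the clnode(1CL) man page):

# clnode set-loadlimit -p limitname=mem_load -p softlimit=15 -p hardlimit=30 node1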
You can configure a resource group to have a higher priority so that it is less likely to be displaced from a specific node. You can also set a preemption_mode property to determine whether a resource group will be preempted from a node by a higher-priority resource group because of node overload. A concentrate_load property also allows you to concentrate the resource group load onto as few nodes as possible. The default value of the concentrate_load property is FALSE.
Note - You can configure load limits on nodes in a global cluster or a zone cluster. You can use the command line, the clsetup utility, or the Oracle Solaris Cluster Manager interface to configure load limits. The following procedure illustrates how to configure load limits using the command line.
# clnode create-loadlimit -p limitname=mem_load -Z zc1 -p softlimit=11 -p hardlimit=20 node1 node2 node3
In this example, the zone cluster name is zc1. The sample property is called mem_load and has a soft load limit of 11 and a hard load limit of 20. Hard and soft limits are optional arguments and default to unlimited if you do not specifically define them. See the clnode(1CL) man page for more information.
# clresourcegroup set -p load_factors=mem_load@50,factor2@1 rg1 rg2
In this example, the load factors are set on the two resource groups, rg1 and rg2. The load factor settings correspond to the defined load limits of the nodes. You can also perform this step during the creation of the resource group with the clresourcegroup create command. See the clresourcegroup(1CL) man page for more information.
# clresourcegroup remaster rg1 rg2
This command can move resource groups off their current master to other nodes to achieve uniform load distribution.
# clresourcegroup set -p priority=600 rg1
The default priority is 500. Resource groups with higher priority values get precedence in node assignment over resource groups with lower priorities.
# clresourcegroup set -p Preemption_mode=No_cost rg1
See the clresourcegroup(1CL) man page for more information on the HAS_COST, NO_COST, and NEVER options.
# cluster set -p Concentrate_load=TRUE
A strong positive or negative affinity takes precedence over load distribution. A strong affinity can never be violated, nor can a hard load limit. If you set both strong affinities and hard load limits, some resource groups might be forced to remain offline if both constraints cannot be satisfied.
The following example specifies a strong positive affinity between resource group rg1 in zone cluster zc1 and resource group rg2 in zone cluster zc2.
# clresourcegroup set -p RG_affinities=++zc2:rg2 zc1:rg1
# clnode status -Z all -v
The output includes any load limit settings that are defined on the node or on its non-global zones.