The Switch Management Agent (SMA) is a cluster module that maintains communication channels over the cluster private interconnect. It monitors the private interconnect and invokes a failover to a backup network if it detects a failure.
Note the following limitations before beginning the procedure:
On SC2000/SS1000 nodes, do not install more than one SCI card on a single system board. More than one SCI card can cause spurious link resets on the SCI interconnect.
On E10000 nodes, an SCI card should not be the only card on an SBus.
On Sun StorEdge A3000 configurations, do not install SCI adapters and other A3000 host adapters on the same SBus.
See also Appendix B in the Sun Cluster 2.2 Hardware Site Preparation, Planning, and Installation Guide.
Use this procedure to add switches and SCI cards to cluster nodes. See the sm_config(1M) man page for details.
Edit the sm_config template file to include the configuration changes.
Normally, the template file is located in /opt/SUNWsma/bin/Examples.
Configure the SCI SBus cards by running the sm_config(1M) command from one of the nodes (an example command sequence follows these steps).
Run the command a second time to verify that SCI node IDs and IP addresses are assigned correctly to the cluster nodes. Incorrect assignments can cause miscommunication between the nodes.
Reboot the new nodes.
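For illustration only, the sequence might look like the following when run as root from one cluster node. The Examples directory comes from this procedure, but the working copy name /etc/sma_template and the -f option are assumptions; confirm the exact template file name and invocation in the sm_config(1M) man page.

# cp /opt/SUNWsma/bin/Examples/template-file /etc/sma_template
# vi /etc/sma_template
# /opt/SUNWsma/bin/sm_config -f /etc/sma_template
# /opt/SUNWsma/bin/sm_config -f /etc/sma_template
# reboot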
If a problem occurs with the SCI software, verify that the following are true:
The sm_config(1M) template file matches the hardware configuration (SCI link and switch) and cluster topology.
The sm_config(1M) command can be run successfully from one of the cluster nodes.
Any reconfigured nodes were rebooted after the sm_config(1M) command was executed.
Also note the following problems and solutions:
Some applications, such as Oracle Parallel Server (OPS), require an unusually high shared memory minimum to be specified in the /etc/system file. If the field shmsys:shminfo_shmmin in the /etc/system file is set to a value greater than 200 bytes, the sm_config(1M) command cannot acquire shared memory because it requests fewer bytes than the minimum the system will allocate. As a result, the system call made by the sm_config(1M) command fails and the command aborts.
To work around this problem, edit the /etc/system file and set the value of shmsys:shminfo_shmmin to less than 200. Then reboot the machine for the new value to take effect.
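For example, the /etc/system entry might read as follows (100 is an arbitrary value below the 200-byte limit, not a recommendation):

set shmsys:shminfo_shmmin=100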
If you encounter semsys warnings and core dumps, check whether the semaphore values in the semsys:seminfo_* fields of the /etc/system file match the actual physical limits of the machine.
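These semaphore tunables appear in /etc/system as entries like the following; the values shown here are placeholders for illustration only, not recommendations for any particular machine.

set semsys:seminfo_semmni=64
set semsys:seminfo_semmns=1024
set semsys:seminfo_semmsl=100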
For more information about SCI components, see Appendix B in the Sun Cluster 2.2 Hardware Site Preparation, Planning, and Installation Guide.
There are two ways to verify the connectivity between nodes: by running get_ci_status(1M) or by running ping(1).
Run the get_ci_status(1M) command on all cluster nodes.
Example output for get_ci_status(1M) is shown below.
# /opt/SUNWsma/bin/get_ci_status
sma: sci #0: sbus_slot# 1; adapter_id 8 (0x08); ip_address 1; switch_id# 0;
port_id# 0; Adapter Status - UP; Link Status - UP
sma: sci #1: sbus_slot# 2; adapter_id 12 (0x0c); ip_address 17; switch_id# 1;
port_id# 0; Adapter Status - UP; Link Status - UP
sma: Switch_id# 0
sma: port_id# 1: host_name = interconn2; adapter_id = 72; active | operational
sma: port_id# 2: host_name = interconn3; adapter_id = 136; active | operational
sma: port_id# 3: host_name = interconn4; adapter_id = 200; active | operational
sma: Switch_id# 1
sma: port_id# 1: host_name = interconn2; adapter_id = 76; active | operational
sma: port_id# 2: host_name = interconn3; adapter_id = 140; active | operational
sma: port_id# 3: host_name = interconn4; adapter_id = 204; active | operational
#
The first four lines indicate the status of the local node (in this case, interconn1). It is communicating with both switch_id# 0 and switch_id# 1 (Link Status - UP).
sma: sci #0: sbus_slot# 1; adapter_id 8 (0x08); ip_address 1; switch_id# 0;
port_id# 0; Adapter Status - UP; Link Status - UP
sma: sci #1: sbus_slot# 2; adapter_id 12 (0x0c); ip_address 17; switch_id# 1;
port_id# 0; Adapter Status - UP; Link Status - UP
The rest of the output indicates the global status of the other nodes in the cluster. All the ports on the two switches are communicating with their nodes. If there is a problem with the hardware, inactive is displayed (instead of active). If there is a problem with the software, inoperational is displayed (instead of operational).
sma: Switch_id# 0
sma: port_id# 1: host_name = interconn2; adapter_id = 72; active | operational
sma: port_id# 2: host_name = interconn3; adapter_id = 136; active | operational
sma: port_id# 3: host_name = interconn4; adapter_id = 200; active | operational
sma: Switch_id# 1
sma: port_id# 1: host_name = interconn2; adapter_id = 76; active | operational
sma: port_id# 2: host_name = interconn3; adapter_id = 140; active | operational
sma: port_id# 3: host_name = interconn4; adapter_id = 204; active | operational
#
Run the ping(1) command on all the IP addresses of remote nodes.
Example output for ping(1) is shown below.
# ping IP-address
The IP addresses are found in the /etc/sma.ip file. Be sure to run the ping(1) command for each node in the cluster.
The ping(1) command returns an "alive" message indicating that the two ends are communicating without a problem. Otherwise, an error message is displayed.
For example,
# ping 204.152.65.2
204.152.65.2 is alive
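To check every remote address in one pass, a small loop over /etc/sma.ip can be used. This sketch assumes the file lists one IP address in the first field of each line; adjust the awk expression to the actual file layout.

# for addr in `awk '{print $1}' /etc/sma.ip`; do ping $addr; done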
Run the ifconfig -a command to verify that all SCI interfaces are up and that the cluster nodes have the correct IP addresses.
The last 8 bits of the IP address should match the IP field value in the /etc/sma.config file.
# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
        inet 127.0.0.1 netmask ff000000
hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 129.146.238.55 netmask ffffff00 broadcast 129.146.238.255
        ether 8:0:20:7b:fa:0
scid0: flags=80c1<UP,RUNNING,NOARP,PRIVATE> mtu 16321
        inet 204.152.65.1 netmask fffffff0
scid1: flags=80c1<UP,RUNNING,NOARP,PRIVATE> mtu 16321
        inet 204.152.65.17 netmask fffffff0
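In this example, scid0 is configured with 204.152.65.1 and scid1 with 204.152.65.17, so the final octets (1 and 17) should match the IP field entries for those adapters in /etc/sma.config; they also correspond to the ip_address values 1 and 17 reported for the same adapters by get_ci_status(1M) above.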