Sun Cluster 2.2 System Administration Guide

Administering the Switch Management Agent

The Switch Management Agent (SMA) is a cluster module that maintains communication channels over the cluster private interconnect. It monitors the private interconnect and invokes a failover to a backup network if it detects a failure.

Before you begin the procedure, review the limitations described in Appendix B of the Sun Cluster 2.2 Hardware Site Preparation, Planning, and Installation Guide.

How to Add Switches and SCI Cards

Use this procedure to add switches and SCI cards to cluster nodes. See the sm_config(1M) man page for details.

  1. Edit the sm_config template file to include the configuration changes.

    Normally, the template file is located in /opt/SUNWsma/bin/Examples.
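    A minimal sketch of this step, assuming the template is copied from the Examples directory to a working file before editing (the names template and /tmp/sma.template are illustrative):

    # cp /opt/SUNWsma/bin/Examples/template /tmp/sma.template
    # vi /tmp/sma.template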

  2. Configure the SCI SBus cards by running the sm_config(1M) command from one of the nodes.

    Rerun the command a second time to ensure that SCI node IDs and IP addresses are assigned correctly to the cluster nodes. Incorrect assignments can cause miscommunication between the nodes.
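    A sketch of the invocation, assuming the edited template was saved as /tmp/sma.template; the -f option shown here is an assumption, so confirm the exact syntax in sm_config(1M):

    # /opt/SUNWsma/bin/sm_config -f /tmp/sma.template
    # /opt/SUNWsma/bin/sm_config -f /tmp/sma.template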

  3. Reboot the new nodes.
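    For example, the following standard Solaris command reboots a node; run it on each new node:

    # /usr/sbin/shutdown -y -g0 -i6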

SCI Software Troubleshooting

If a problem occurs with the SCI software, first verify that the SCI hardware and software are installed and configured correctly.

Also note the following known problem: the SMA software can fail when the value of shmsys:shminfo_shmmin in the /etc/system file is set to 200 or greater.

To work around this problem, edit the /etc/system file and set the value of shmsys:shminfo_shmmin to less than 200. Then reboot the machine for the new value to take effect.
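For example, the following /etc/system entry sets the parameter below the 200 threshold (the value 100 is illustrative):

    set shmsys:shminfo_shmmin=100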

For more information about SCI components, see Appendix B in the Sun Cluster 2.2 Hardware Site Preparation, Planning, and Installation Guide.

How to Verify Connectivity Between Nodes

There are two ways to verify the connectivity between nodes: by running get_ci_status(1M) or by running ping(1).

  1. Run the get_ci_status(1M) command on all cluster nodes.

    Example output for get_ci_status(1M) is shown below.


    # /opt/SUNWsma/bin/get_ci_status
    sma: sci #0: sbus_slot# 1; adapter_id 8 (0x08); ip_address 1; switch_id# 0; port_id# 0; Adapter Status - UP; Link Status - UP
    sma: sci #1: sbus_slot# 2; adapter_id 12 (0x0c); ip_address 17; switch_id# 1; port_id# 0; Adapter Status - UP; Link Status - UP
    sma: Switch_id# 0
    sma: port_id# 1: host_name = interconn2; adapter_id = 72; active | operational
    sma: port_id# 2: host_name = interconn3; adapter_id = 136; active | operational
    sma: port_id# 3: host_name = interconn4; adapter_id = 200; active | operational
    sma: Switch_id# 1
    sma: port_id# 1: host_name = interconn2; adapter_id = 76; active | operational
    sma: port_id# 2: host_name = interconn3; adapter_id = 140; active | operational
    sma: port_id# 3: host_name = interconn4; adapter_id = 204; active | operational
    # 

    The first two lines of output indicate the status of the local node (in this case, interconn1). It is communicating with both switch_id# 0 and switch_id# 1 (Link Status - UP).


    sma: sci #0: sbus_slot# 1; adapter_id 8 (0x08); ip_address 1; switch_id# 0; port_id# 0; Adapter Status - UP; Link Status - UP
    sma: sci #1: sbus_slot# 2; adapter_id 12 (0x0c); ip_address 17; switch_id# 1; port_id# 0; Adapter Status - UP; Link Status - UP

    The rest of the output indicates the global status of the other nodes in the cluster. All the ports on the two switches are communicating with their nodes. If there is a problem with the hardware, inactive is displayed (instead of active). If there is a problem with the software, inoperational is displayed (instead of operational).


    sma: Switch_id# 0
    sma: port_id# 1: host_name = interconn2; adapter_id = 72; active | operational
    sma: port_id# 2: host_name = interconn3; adapter_id = 136; active | operational
    sma: port_id# 3: host_name = interconn4; adapter_id = 200; active | operational
    sma: Switch_id# 1
    sma: port_id# 1: host_name = interconn2; adapter_id = 76; active | operational
    sma: port_id# 2: host_name = interconn3; adapter_id = 140; active | operational
    sma: port_id# 3: host_name = interconn4; adapter_id = 204; active | operational
    #

  2. Run the ping(1) command on all the IP addresses of remote nodes.

    Example output for ping(1) is shown below.


    # ping IP-address
    

    The IP addresses are found in the /etc/sma.ip file. Be sure to run the ping(1) command for each node in the cluster.

    The ping(1) command returns an "alive" message indicating that the two ends are communicating without a problem. Otherwise, an error message is displayed.

    For example,


    # ping 204.152.65.2
    204.152.65.2 is alive
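    To check every address in one pass, you can use a Bourne shell loop such as the following, assuming /etc/sma.ip lists one IP address per line (adjust the address extraction if the file carries additional fields):

    # for addr in `cat /etc/sma.ip`; do ping $addr; done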

How to Verify the SCI Interface Configuration
  1. Run the ifconfig -a command to verify that all SCI interfaces are up and that the cluster nodes have the correct IP addresses.

    The last 8 bits of the IP address should match the IP field value in the /etc/sma.config file.


    # ifconfig -a
    lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
    	inet 127.0.0.1 netmask ff000000
    hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
    	inet 129.146.238.55 netmask ffffff00 broadcast 129.146.238.255
    	ether 8:0:20:7b:fa:0
    scid0: flags=80c1<UP,RUNNING,NOARP,PRIVATE> mtu 16321
    	inet 204.152.65.1 netmask fffffff0
    scid1: flags=80c1<UP,RUNNING,NOARP,PRIVATE> mtu 16321
    	inet 204.152.65.17 netmask fffffff0
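    In this example, scid0 has address 204.152.65.1 and scid1 has address 204.152.65.17, so the last 8 bits evaluate to 1 and 17. These values should match the corresponding IP field entries, and they agree with the ip_address values 1 and 17 reported by get_ci_status(1M) earlier in this section.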