If a problem occurs with the SCI software, verify that the following are true:
The sm_config(1M) template file matches the hardware configuration (SCI link and switch) and cluster topology.
The sm_config(1M) command can be run successfully from one of the cluster nodes.
Any reconfigured nodes were rebooted after the sm_config(1M) command was executed.
Also note the following problems and solutions:
Some applications, such as Oracle Parallel Server (OPS), require an unusually high shared memory minimum to be specified in the /etc/system file. If the field shmsys:shminfo_shmmin in the /etc/system file is set to a value greater than 200 bytes, the sm_config(1M) command cannot acquire shared memory, because it requests fewer bytes than the minimum the system is allowed to allocate. The shared memory system call made by sm_config(1M) therefore fails, and the command aborts.
To work around this problem, edit the /etc/system file, set the value of shmsys:shminfo_shmmin to less than 200, and then reboot the machine so the new value takes effect.
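For example, the workaround entry in /etc/system might look like the following (the value 100 is an illustrative choice; any value below 200 satisfies the requirement):

```
* Shared memory minimum lowered so sm_config(1M) can allocate its segment
set shmsys:shminfo_shmmin=100
```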
If you encounter semsys warnings and core dumps, check to see whether the semaphore values contained in the semsys:seminfo_* fields in the /etc/system file match the actual physical limits of the machine.
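A quick way to gather the semaphore tunables for review is to filter them out of /etc/system. The helper below is a sketch; the file path and the semsys:seminfo_* field names are as documented above, but the helper name itself is made up:

```shell
# List the semsys:seminfo_* tunables set in a system file so their values
# can be compared against the machine's physical limits.
seminfo_settings() {
  grep 'semsys:seminfo' "$1"
}

# Only run against /etc/system where it exists (Solaris).
if [ -r /etc/system ]; then
  seminfo_settings /etc/system
fi
```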
For more information about SCI components, see Appendix B in the Sun Cluster 2.2 Hardware Site Preparation, Planning, and Installation Guide.
There are two ways to verify the connectivity between nodes: by running get_ci_status(1M) or by running ping(1).
Run the get_ci_status(1M) command on all cluster nodes.
Example output for get_ci_status(1M) is shown below.
# /opt/SUNWsma/bin/get_ci_status
sma: sci #0: sbus_slot# 1; adapter_id 8 (0x08); ip_address 1; switch_id# 0; port_id# 0; Adapter Status - UP; Link Status - UP
sma: sci #1: sbus_slot# 2; adapter_id 12 (0x0c); ip_address 17; switch_id# 1; port_id# 0; Adapter Status - UP; Link Status - UP
sma: Switch_id# 0
sma: port_id# 1: host_name = interconn2; adapter_id = 72; active | operational
sma: port_id# 2: host_name = interconn3; adapter_id = 136; active | operational
sma: port_id# 3: host_name = interconn4; adapter_id = 200; active | operational
sma: Switch_id# 1
sma: port_id# 1: host_name = interconn2; adapter_id = 76; active | operational
sma: port_id# 2: host_name = interconn3; adapter_id = 140; active | operational
sma: port_id# 3: host_name = interconn4; adapter_id = 204; active | operational
#
The sma: sci lines at the start of the output indicate the status of the local node (in this case, interconn1), which is communicating with both switch_id# 0 and switch_id# 1 (Link Status - UP).
sma: sci #0: sbus_slot# 1; adapter_id 8 (0x08); ip_address 1; switch_id# 0; port_id# 0; Adapter Status - UP; Link Status - UP
sma: sci #1: sbus_slot# 2; adapter_id 12 (0x0c); ip_address 17; switch_id# 1; port_id# 0; Adapter Status - UP; Link Status - UP
The rest of the output indicates the global status of the other nodes in the cluster. All the ports on the two switches are communicating with their nodes. If there is a problem with the hardware, inactive is displayed (instead of active). If there is a problem with the software, inoperational is displayed (instead of operational).
sma: Switch_id# 0
sma: port_id# 1: host_name = interconn2; adapter_id = 72; active | operational
sma: port_id# 2: host_name = interconn3; adapter_id = 136; active | operational
sma: port_id# 3: host_name = interconn4; adapter_id = 200; active | operational
sma: Switch_id# 1
sma: port_id# 1: host_name = interconn2; adapter_id = 76; active | operational
sma: port_id# 2: host_name = interconn3; adapter_id = 140; active | operational
sma: port_id# 3: host_name = interconn4; adapter_id = 204; active | operational
#
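When checking several nodes, it can help to scan saved get_ci_status(1M) output for the failure strings just described. The filter below is a sketch (the helper name is made up; "inactive" and "inoperational" are the strings this guide documents):

```shell
# Flag ports that report "inactive" (hardware problem) or
# "inoperational" (software problem) in saved get_ci_status output.
check_ports() {
  if grep -E 'inactive|inoperational' "$1"; then
    echo "problem ports found above"
  else
    echo "all ports active | operational"
  fi
}
```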
Run the ping(1) command on all the IP addresses of remote nodes.
Example output for ping(1) is shown below.
# ping IP-address
The IP addresses are found in the /etc/sma.ip file. Be sure to run the ping(1) command for each node in the cluster.
The ping(1) command returns an "alive" message indicating that the two ends are communicating without a problem. Otherwise, an error message is displayed.
For example,
# ping 204.152.65.2
204.152.65.2 is alive
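The per-node pings can be scripted. The sketch below assumes /etc/sma.ip lists one address per line in its last whitespace-separated field; verify the file's layout on your system before relying on it:

```shell
# Read the SCI IP addresses from an sma.ip-style file (the last field
# on each line is assumed to be the address).
sma_addrs() {
  awk '{print $NF}' "$1"
}

# Ping every address listed in /etc/sma.ip, where the file exists.
if [ -r /etc/sma.ip ]; then
  for addr in $(sma_addrs /etc/sma.ip); do
    ping "$addr"
  done
fi
```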
Run the ifconfig -a command to verify that all SCI interfaces are up and that the cluster nodes have the correct IP addresses.
The last 8 bits of the IP address should match the IP field value in the /etc/sma.config file.
# ifconfig -a
lo0: flags=849&lt;UP,LOOPBACK,RUNNING,MULTICAST&gt; mtu 8232
        inet 127.0.0.1 netmask ff000000
hme0: flags=863&lt;UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST&gt; mtu 1500
        inet 129.146.238.55 netmask ffffff00 broadcast 129.146.238.255
        ether 8:0:20:7b:fa:0
scid0: flags=80c1&lt;UP,RUNNING,NOARP,PRIVATE&gt; mtu 16321
        inet 204.152.65.1 netmask fffffff0
scid1: flags=80c1&lt;UP,RUNNING,NOARP,PRIVATE&gt; mtu 16321
        inet 204.152.65.17 netmask fffffff0
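To compare an interface address against the IP field recorded in /etc/sma.config, the final octet can be extracted as follows (a sketch; the helper name is made up):

```shell
# Extract the final octet of a dotted-quad address; per this guide, it
# should match the IP field value in /etc/sma.config.
last_octet() {
  echo "$1" | awk -F. '{print $4}'
}

last_octet 204.152.65.17   # prints 17, matching scid1 in the example above
```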