Sun Cluster 2.2 System Administration Guide

Chapter 6 Administering Network Interfaces

This chapter describes the Public Network Management (PNM) feature of Sun Cluster and provides instructions for adding or replacing network interface components.

Public Network Management Overview

The PNM feature of Sun Cluster uses fault monitoring and failover to prevent loss of node availability due to single network adapter or cable failure. PNM fault monitoring runs in local-node or cluster-wide mode to check the status of nodes, network adapters, cables, and network traffic. PNM failover uses sets of network adapters called backup groups to provide redundant connections between a cluster node and the public network. The fault monitoring and failover capabilities work together to ensure availability of services.

If your configuration includes HA data services, you must enable PNM; HA data services are dependent on PNM fault monitoring. When an HA data service experiences availability problems, it queries PNM through the cluster framework to see whether the problem is related to the public network connections. If it is, the data services wait until PNM has resolved the problem. If the problem is not with the public network, the data services invoke their own failover mechanism.

The PNM package, SUNWpnm, is installed during initial Sun Cluster software installation. The commands associated with PNM include pnmset(1M), pnmstat(1M), pnmptor(1M), pnmrtop(1M), and pnmd(1M).

See the associated man pages for details.

PNM Fault Monitoring and Failover

PNM monitors the state of the public network and the network adapters associated with each node in the cluster, and reports suspect or failed states. When PNM detects a lack of response from a primary adapter (the adapter currently carrying network traffic to and from the node), it fails over the network service to another working adapter in the adapter backup group for that node. PNM then performs additional checks to determine whether the fault lies with the adapter or with the network.

If the adapter is faulty, PNM sends error messages to syslog(3), which are in turn detected by the Cluster Manager and displayed to the user through a GUI. After a failed adapter is fixed, it is automatically tested and reinstated in the backup group at the next cluster reconfiguration. If the entire adapter backup group is down, then the Sun Cluster framework invokes a failover of the node to retain availability. If an error occurs outside of PNM's control, such as the failure of a whole subnet, then a normal failover and cluster reconfiguration will occur.

PNM monitoring runs in two modes, cluster-aware and cluster-unaware. PNM runs in cluster-aware mode when the cluster is operational. It uses the Cluster Configuration Database (CCD) to monitor status of the network (for more information on the CCD, see the overview chapter in the Sun Cluster 2.2 Software Installation Guide). PNM uses the CCD to distinguish between public network failure and local adapter failure. See Appendix B, Sun Cluster Fault Detection for more information on logical host failover initiated by public network failure.

PNM runs in cluster-unaware mode when the cluster is not operational. In this mode, PNM is unable to use the CCD and therefore cannot distinguish between adapter and network failure. In cluster-unaware mode, PNM simply detects a problem with the local network connection.

You can check the status of the public network and adapters with the PNM monitoring command, pnmstat(1M). See the man page for details.
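For example, the following command (used again later in this chapter) summarizes the status of all backup groups on the local node:


# /opt/SUNWpnm/bin/pnmstat -l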

Backup Groups

Backup groups are sets of network adapters that provide redundant connections between a single cluster node and the public network. You configure backup groups during initial installation by using the scinstall(1M) command, or after initial installation by using the pnmset(1M) command. PNM allows you to configure as many redundant adapters as you want on a single host.

To configure backup groups initially, you run pnmset(1M) as root before the cluster is started. The command runs as an interactive script to configure and verify backup groups. It also selects one adapter to be used as the primary, or active, adapter. The pnmset(1M) command names backup groups nafoN, where N is an integer you assign. The command stores backup group information in the /etc/pnmconfig file.
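As an illustration only (verify the exact layout against the file that pnmset(1M) generates on your own system), an /etc/pnmconfig entry for a backup group named nafo0 containing adapters hme1 and hme2 might resemble the following:


nafo0 hme1 hme2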

To change an existing PNM configuration on a cluster node, you must remove the node from the cluster and then run the pnmset(1M) command. PNM monitors and incorporates changes in backup group membership dynamically.


Note -

The /etc/pnmconfig file is not removed even if the SUNWpnm package is removed, for example, during a software upgrade. That is, the backup group membership information is preserved during software upgrades and you are not required to run the pnmset(1M) utility again, unless you want to modify backup group membership.


Updates to nsswitch.conf

When configuring PNM with a backup network adapter, the /etc/nsswitch.conf file must contain one of the following netmasks entries, depending on the name service in use.

Table 6-1 Name Service Entry Choices for the /etc/nsswitch.conf File

Name Service Used        netmasks Entry
None                     netmasks: files
nis                      netmasks: files [NOTFOUND=return] nis
nisplus                  netmasks: files [NOTFOUND=return] nisplus

These settings ensure that the netmasks entry is resolved from the local /etc/netmasks file rather than from an NIS or NIS+ lookup table. This is important because, if the failed adapter is the primary public network adapter, the name service would not be reachable to provide the requested information. If the netmasks entry is not set in the prescribed manner, failover to the backup adapter will not succeed.


Caution -

The preceding changes cause the local files (/etc/netmasks and /etc/group) to be used as lookup tables. The NIS/NIS+ services are used only when the local files are unavailable. Therefore, these files must be kept up to date with their NIS/NIS+ versions. If they are not kept current, the expected values are unavailable on the cluster nodes.
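For reference, entries in the local /etc/netmasks file use the standard Solaris format of network number followed by netmask; the subnet shown here is only an example:


129.146.75.0    255.255.255.0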


Setting Up and Administering Public Network Management

This section provides procedures for setting up PNM and configuring backup groups.

How to Set Up PNM

These are the high-level steps to set up PNM:

  - Set up multiple network adapters on each node, all on the same subnet
  - Install the Sun Cluster node software packages, including SUNWpnm
  - Register the default network interface on each node
  - Establish PNM backup groups with the pnmset(1M) command
  - Start the cluster and verify the configuration with the pnmstat(1M) command

These are the detailed steps to set up PNM.

  1. Set up the node hardware so that you have multiple network adapters on a single node using the same subnet.

    Refer to your Sun Cluster hardware documentation to set up your network adapters.

  2. If the Sun Cluster node software packages have not been installed already, install them by using the scinstall(1M) command.

    The scinstall(1M) command runs interactively to install the package set you select. The PNM package, SUNWpnm, is part of the node package set. See the Sun Cluster 2.2 Software Installation Guide for the detailed cluster installation procedure.

  3. Register the default network interface on each node, if you did not do so already.

    You must register one default network interface per node in the interface database associated with each node, and verify that the interface is plumbed and functioning correctly.

    1. Create an interface database on each node and register the primary public network interfaces.

      Create a file in the /etc directory on each node to use as the interface database. Name the file hostname.interface, where interface is your interface type, such as qfe, hme, etc. Then add one line containing the host name for that node. For example, on node phys-hahost1 with a default interface qfe1, create a file /etc/hostname.qfe1 containing the following line:


      phys-hahost1

    2. In the /etc/hosts file on each node, associate the primary public network interface name with an IP address.

      In this example, the primary physical host name is phys-hahost1:


      129.146.75.200 phys-hahost1-qfe1

      If your system uses a naming mechanism other than /etc/hosts, refer to the appropriate section in the TCP/IP and Data Communications Administration Guide to perform the equivalent function.
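      After the node boots with these files in place, you can confirm that the interface is plumbed and answering by using standard Solaris commands; the interface and host names below are taken from the preceding example:


      phys-hahost1# ifconfig qfe1
      phys-hahost1# ping phys-hahost1-qfe1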

  4. Establish PNM backup groups by using the pnmset(1M) command.

    Run the pnmset(1M) interactive script to set up backup groups.


    Caution -

    If you have configured logical hosts and data services already, you must stop the HA data services before changing the backup group membership with pnmset(1M). If you do not stop the data services before running the pnmset(1M) command, serious problems and data service failures can result.


    1. Run the pnmset(1M) command.


      phys-hahost1# /opt/SUNWpnm/bin/pnmset
      

    2. Enter the total number of backup groups you want to configure.

      Normally this number corresponds with the number of public subnets.


      In the following dialog, you will be prompted to configure public network management.
      
      do you want to continue ... [y/n]: y
      
      How many NAFO backup groups on the host [1]: 2
      

    3. Assign backup group numbers.

      At the prompt, supply an integer from 0 to 255. The pnmset(1M) command appends this number to the string nafo to form the backup group name.


      Enter backup group number [0]: 0
      

    4. Assign adapters to backup groups.


      Please enter all network adapters under nafo0:
      qe0 qe1
      ...

      Continue by assigning backup group numbers and adapters for all other backup groups in the configuration.

    5. Allow the pnmset(1M) command to test your adapter configuration.

      The pnmset(1M) command tests the correctness of your adapter configuration. In this example, backup group nafo1 contains one active adapter and two redundant adapters.


      The following test will evaluate the correctness of the customer NAFO configuration...
      name duplication test passed
      
      
      Check nafo0... < 20 seconds
      qe0 is active
      remote address = 192.168.142.1
      nafo0 test passed 
      
      
      Check nafo1... < 20 seconds
      qe3 is active
      remote address = 192.168.143.1
      test qe4 wait... 
      test qe2 wait... 
      nafo1 test passed 
      phys-hahost1#

    Once the configuration is verified, the PNM daemon pnmd(1M) automatically notes the configuration changes and starts monitoring the interfaces.


    Note -

    Only one adapter within a backup group should be plumbed and have an entry in the /etc/hostname.adapter file. Do not assign IP addresses to the backup adapters; they should not be plumbed.



    Note -

    PNM uses broadcast ping(1M) to monitor networks, which in turn uses broadcast ICMP (Internet Control Message Protocol) packets to communicate with other remote hosts. Some routers do not forward broadcast ICMP packets; consequently, PNM's fault detection behavior is affected. See the Sun Cluster 2.2 Release Notes for a workaround to this problem.


  5. Start the cluster by using the scadmin(1M) command.

    Run the following command on one node:


    # scadmin startcluster physical-hostname sc-cluster
    

    Then add all other nodes to the cluster by running the following command from all other nodes:


    # scadmin startnode
    

  6. Verify the PNM configuration by using the pnmstat(1M) command.


    phys-hahost1# /opt/SUNWpnm/bin/pnmstat -l
    bkggrp  r_adp   status  fo_time live_adp
    nafo0   hme0    OK      NEVER   hme0
    phys-hahost1# 

    You have now completed the initial setup of PNM.

How to Reconfigure PNM

Use this procedure to reconfigure an existing PNM configuration by adding or removing network adapters. Follow these steps to administer one node at a time, so that Sun Cluster services remain available during the procedure.

  1. Stop the Sun Cluster software on the node to be reconfigured.


    phys-hahost1# scadmin stopnode
    

  2. Add or remove the network adapters.

    Use the procedures described in "Adding and Removing Network Interfaces".

  3. Run the pnmset(1M) command to reconfigure backup groups.

    Use the pnmset(1M) command to reconfigure backup groups as described in Step 4 of the procedure "How to Set Up PNM".


    phys-hahost1# pnmset
    

  4. Restart the Sun Cluster software on the node.

    Restart the cluster software by running the following command on the node:


    phys-hahost1# scadmin startnode
    

  5. Repeat Step 1 through Step 4 for each node you want to reconfigure.

How to Check the Status of Backup Groups

You can use the pnmptor(1M) and pnmrtop(1M) commands to check the status of local backup groups only, and the pnmstat(1M) command to check the status of local or remote backup groups.

  1. Run the pnmptor(1M) command to identify the active adapter in a given backup group.

    The pnmptor(1M) command maps a pseudo adapter name that you supply to a real adapter name. In this example, the system output shows that pseudo adapter name nafo0 is associated with the active adapter hme2:


    phys-hahost1# pnmptor nafo0
    hme2

  2. Run the pnmrtop(1M) command to find the backup group to which a given adapter belongs.

    In this example, the system output shows that adapter hme1 belongs to backup group nafo0:


    phys-hahost1# pnmrtop hme1
    nafo0

  3. Run the pnmstat(1M) command to determine the status of a backup group.

    Use the -c option to determine the status of a backup group on the local host:


    phys-hahost1# pnmstat -c nafo0
    OK
    NEVER
    hme2

    Use the following syntax to determine the status of a backup group on a remote host:


    phys-hahost1# pnmstat -sh remotehost -c nafo1
    OK
    NEVER
    qe1


    Note -

    It is important to use the -s and -h options together. The -s option forces pnmstat(1M) to communicate over the private interconnect. If the -s option is omitted, pnmstat(1M) queries over the public network. Both remotehost and the host on which you run pnmstat(1M) must be cluster members.


    Whether checking the local or remote host, the pnmstat(1M) command reports the status, history, and current active adapter. See the man page for more details.

PNM Configurable Parameters

The following table describes the PNM parameters that are user-configurable. Configure these parameters after you have installed PNM, but before you bring up the cluster, by manually editing the configuration file /opt/SUNWcluster/conf/TEMPLATE.cdb on all nodes in the cluster. You can edit the file on one node and copy the file to all other nodes, or use the Cluster Console to modify the file on all nodes simultaneously. You can display the current PNM configuration with pnmd -t. See the pnmd(1M) man page for details.

Table 6-2 PNM Configurable Parameters

pnmd.inactive_time

The time, in seconds, between fault probes. The default interval is 5 seconds. 

pnmd.ping_timeout

The time, in seconds, after which a fault probe will time out. The default timeout value is 4 seconds. 

pnmd.repeat_test

The number of times that PNM will retry a failed probe before deciding there is a problem. The default repeat quantity is 3.  

pnmd.slow_network

The latency, in seconds, between the listening phase and actively probing phase of a fault probe. The default latency period is 2 seconds. If your network is slow, causing PNM to initiate spurious takeovers, consider increasing this latency period. 
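The following sketch shows how these parameters might appear in the configuration file. The key names come from the table above, but the surrounding syntax is an assumption; verify the exact .cdb format against the installed TEMPLATE.cdb file and the pnmd(1M) man page before editing:


pnmd.inactive_time : 5
pnmd.ping_timeout : 4
pnmd.repeat_test : 3
pnmd.slow_network : 2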

Troubleshooting PNM Errors

The following errors are those most commonly returned by PNM.


PNM rpc svc failed

This error indicates that the PNM daemon has not been started. Restart the PNM daemon with the following command. The node-id is the value returned by the /opt/SUNWcluster/bin/get_node_status command.


# /opt/SUNWpnm/bin/pnmd -s -c cluster-name -l node-id


PNM not started

This message indicates that no backup groups have been configured. Use the pnmset(1M) command to create backup groups.


No nafoXX

This message indicates that you have specified an illegal backup group name. Use the pnmrtop(1M) command to determine the backup group names associated with a given adapter. Rerun the command and supply it with a valid backup group name.


PNM configure error

This message indicates that either the PNM daemon was unable to configure an adapter, or that there is a formatting error in the configuration file, /etc/pnmconfig. Check the syslog messages and take the actions specified by Sun Cluster Manager. For more information on Sun Cluster Manager, see Chapter 2, Sun Cluster Administration Tools.


Program error

This message indicates that the PNM daemon was unable to execute a system call. Check the syslog messages and take the actions specified by Sun Cluster Manager. For more information on Sun Cluster Manager, see Chapter 2, Sun Cluster Administration Tools.

Adding and Removing Network Interfaces

The procedures in this section can be used to add or remove public network interface cards within a cluster configuration.

To add or remove a network interface to or from the control of a logical host, you must modify each logical host configured to use that interface. You change a logical host's configuration by completely removing the logical host from the cluster, then adding it again with the required changes. You can reconfigure a logical host with either the scconf(1M) or scinstall(1M) command. The examples in this section use the scconf(1M) command. Refer to "Adding and Removing Logical Hosts" for the logical host configuration steps using the scinstall(1M) command.

Adding a Network Interface

Adding a network interface requires unconfiguring and reconfiguring all logical hosts associated with the interface. Note that all data services will be inaccessible for a short period of time during the procedure.

How to Add a Network Interface

On each node that will receive a new network interface card, perform the following steps.

  1. Stop the cluster software.


    phys-hahost# scadmin stopnode
    

  2. Add the new interface card, using the instructions included with the card.

  3. Configure the new network interface on each node.

    This step is necessary only if the new interface will be part of a logical host. Skip this step if your configuration does not include logical hosts.


    phys-hahost# pnmset
    

    For Ethernet, create a new /etc/hostname.if file for each new interface on each node, and run the ifconfig(1M) command as you normally would in a non-cluster environment.
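    For example, if a hypothetical new adapter qfe2 is to be the plumbed, active adapter (skip this if the adapter will join a backup group as an unplumbed backup adapter), the standard Solaris commands would resemble the following; the address shown is a placeholder:


    phys-hahost# ifconfig qfe2 plumb
    phys-hahost# ifconfig qfe2 129.146.75.201 netmask + broadcast + up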


    Note -

    When you configure a set of network interfaces to be used by different logical hosts within a cluster, you must connect all interfaces in the set to the same subnet.


  4. Start the cluster software.

    If all nodes have been stopped, run the scadmin startcluster command on the first node and then run the scadmin startnode command on all other nodes. If the cluster software is still running on at least one node, run the scadmin startnode command on the remaining nodes.


    phys-hahost# scadmin startnode
    

    If the new interfaces are being added to already existing backup groups, the procedure is complete.

    If you modified the backup group configuration, you must bring the cluster back into normal operation and reconfigure each logical host that will be using the new set of network controllers. You will unconfigure and reconfigure each logical host, so run the scconf -p command to print out the current configuration before starting these steps. You can run the scconf -p command on any node that is an active cluster member; it does not need to be run on all cluster nodes.
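    For example, assuming the cluster name used earlier in this chapter (sc-cluster), the command would resemble the following:


    phys-hahost# scconf sc-cluster -p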

    To unconfigure and reconfigure the logical host, you can use either the scconf(1M) command as shown in these examples, or the scinstall(1M) command as described in "Adding and Removing Cluster Nodes".

  5. Notify users that data services on the affected logical hosts will be unavailable for a short period.

  6. Save copies of the /etc/opt/SUNWcluster/conf/ccd.database files on each node, in case you need to restore the original configuration.

  7. Turn off the data services.


    phys-hahost# hareg -n dataservice
    

  8. Unregister the data services.


    phys-hahost# hareg -u dataservice
    

  9. Remove the logical host from the cluster.

    Run this command on any node that is an active cluster member. You do not need to run this command on all cluster nodes.


    phys-hahost# scconf clustername -L logicalhost -r
    

  10. Reconfigure the logical host to include the new interface.

    Run this command on any node that is an active cluster member. You do not need to run this command on all cluster nodes.


    phys-hahost# scconf clustername -L logicalhost -n nodelist -g dglist -i logaddrinfo
    

    The logaddrinfo field is where you define the new interface name. Refer to the listing taken from the scconf -p command output to reconstruct each logical host.
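    As a purely hypothetical example (the cluster, logical host, disk group, and interface names are placeholders, and the interface-list syntax should be confirmed against scconf(1M) before use), re-creating a logical host on two nodes with new qfe1 interfaces might resemble:


    phys-hahost# scconf sc-cluster -L hahost1 -n phys-hahost1,phys-hahost2 -g dg1 -i qfe1,qfe1,hahost1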

  11. Register the data services.


    phys-hahost# hareg [-s] -r dataservice
    

  12. Turn on the data services.


    phys-hahost# hareg -y dataservice
    

  13. Check access to the data services.

  14. Notify users that the data services are once again available.

This completes the process of adding a network interface.

Removing a Network Interface

Use the following procedure to remove a public network interface from a cluster.

How to Remove a Network Interface

While all nodes are participating in the cluster, perform the following steps on one node only.

  1. Identify which logical hosts must be reconfigured to exclude the network interface.

    All of these logical hosts will need to be unconfigured then reconfigured. Run the scconf -p command to print out a list of logical hosts in the current configuration; save this list for later use. You do not need to run the scconf -p command on all cluster nodes. You can run it on any node that is an active cluster member.

  2. Run the pnmset(1M) command to display the current PNM configuration.

  3. Remove the controller from a backup group, if necessary.

    If the controller to be removed is part of a backup group, remove the controller from all logical hosts, then run the pnmset(1M) command to remove the controller from the backup group.

  4. Notify users that any data services on the affected logical hosts will be unavailable for a short period.

  5. Turn off the data services.


    phys-hahost# hareg -n dataservice
    

  6. Unregister the data services.


    phys-hahost# hareg -u dataservice
    

  7. Remove the logical host from the cluster.


    Note -

    To unconfigure and reconfigure the logical host (Step 7 and Step 8), you can either run the scconf(1M) command as shown, or run the scinstall(1M) command as described in "Adding and Removing Cluster Nodes".


    You can run this command on any node that is an active cluster member. You do not need to run it on all cluster nodes.


    phys-hahost# scconf clustername -L logicalhost -r
    

  8. Reconfigure the logical host with the new interface configuration.

    You can run this command on any node that is an active cluster member. You do not need to run it on all cluster nodes.


    phys-hahost# scconf clustername -L logicalhost -n nodelist -g dglist -i logaddrinfo
    

    The logaddrinfo field is where you define the new interface name. Refer to the listing taken from the scconf -p command output to reconstruct each logical host.

  9. If the controller being removed was part of a backup group, rerun the pnmset(1M) command.

    Rerun the pnmset(1M) command and exclude the controller being removed.

  10. (Optional) If you are removing the network adapter from the nodes, perform the following steps on each affected node:

    1. Stop the cluster software.


      phys-hahost# scadmin stopnode
      

    2. Halt the node and remove the interface card.

    3. Boot the node.

    4. Perform the Solaris system administration tasks you would normally perform to remove a network interface (remove hostname.if file, update /etc/hosts, etc).

    5. Restart the cluster software. If all nodes were brought down, start the first node using the scadmin startcluster command. If at least one node is still running the cluster software, restart the other nodes.


      phys-hahost# scadmin startnode
      

  11. Register the data services.


    phys-hahost# hareg -r dataservice
    

  12. Turn on the data services.


    phys-hahost# hareg -y dataservice
    

  13. Check access to the data services.

  14. Notify users that the data services are once again available.

Administering the Switch Management Agent

The Switch Management Agent (SMA) is a cluster module that maintains communication channels over the cluster private interconnect. It monitors the private interconnect and invokes a failover to a backup network if it detects a failure.

Note the following limitations before beginning the procedure:

See also Appendix B in the Sun Cluster 2.2 Hardware Site Preparation, Planning, and Installation Guide.

How to Add Switches and SCI Cards

Use this procedure to add switches and SCI cards to cluster nodes. See the sm_config(1M) man page for details.

  1. Edit the sm_config template file to include the configuration changes.

    Normally, the template file is located in /opt/SUNWsma/bin/Examples.

  2. Configure the SCI SBus cards by running the sm_config(1M) command from one of the nodes.

    Rerun the command a second time to ensure that SCI node IDs and IP addresses are assigned correctly to the cluster nodes. Incorrect assignments can cause miscommunication between the nodes.
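    A hypothetical invocation is shown below; the -f option and template file name are assumptions, so confirm the exact usage in the sm_config(1M) man page:


    # /opt/SUNWsma/bin/sm_config -f /opt/SUNWsma/bin/Examples/template_file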

  3. Reboot the new nodes.

SCI Software Troubleshooting

If a problem occurs with the SCI software, verify that the following are true:

Also note the following problems and solutions:

To work around this problem, edit the /etc/system file and set the value of shmsys:shminfo_shmmin to less than 200. Then reboot the machine for the new values to take effect.
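For example, a hypothetical /etc/system entry that sets the value below the 200 threshold (the specific value is only an illustration) would be:


set shmsys:shminfo_shmmin=100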

For more information about SCI components, see Appendix B in the Sun Cluster 2.2 Hardware Site Preparation, Planning, and Installation Guide.

How to Verify Connectivity Between Nodes

There are two ways to verify the connectivity between nodes: by running get_ci_status(1M) or by running ping(1).

  1. Run the get_ci_status(1M) command on all cluster nodes.

    Example output for get_ci_status(1M) is shown below.


    # /opt/SUNWsma/bin/get_ci_status
    sma: sci #0: sbus_slot# 1; adapter_id 8 (0x08); ip_address 1; switch_id# 0; port_id# 0; Adapter Status - UP; Link Status - UP
    sma: sci #1: sbus_slot# 2; adapter_id 12 (0x0c); ip_address 17; switch_id# 1; port_id# 0; Adapter Status - UP; Link Status - UP
    sma: Switch_id# 0
    sma: port_id# 1: host_name = interconn2; adapter_id = 72; active | operational
    sma: port_id# 2: host_name = interconn3; adapter_id = 136; active | operational
    sma: port_id# 3: host_name = interconn4; adapter_id = 200; active | operational
    sma: Switch_id# 1
    sma: port_id# 1: host_name = interconn2; adapter_id = 76; active | operational
    sma: port_id# 2: host_name = interconn3; adapter_id = 140; active | operational
    sma: port_id# 3: host_name = interconn4; adapter_id = 204; active | operational
    # 

    The first two lines (the sma: sci entries) indicate the status of the local node (in this case, interconn1). It is communicating with both switch_id# 0 and switch_id# 1 (Link Status - UP).


    sma: sci #0: sbus_slot# 1; adapter_id 8 (0x08); ip_address 1; switch_id# 0; port_id# 0; Adapter Status - UP; Link Status - UP
    sma: sci #1: sbus_slot# 2; adapter_id 12 (0x0c); ip_address 17; switch_id# 1; port_id# 0; Adapter Status - UP; Link Status - UP

    The rest of the output indicates the global status of the other nodes in the cluster. All the ports on the two switches are communicating with their nodes. If there is a problem with the hardware, inactive is displayed (instead of active). If there is a problem with the software, inoperational is displayed (instead of operational).


    sma: Switch_id# 0
    sma: port_id# 1: host_name = interconn2; adapter_id = 72; active | operational
    sma: port_id# 2: host_name = interconn3; adapter_id = 136; active | operational
    sma: port_id# 3: host_name = interconn4; adapter_id = 200; active | operational
    sma: Switch_id# 1
    sma: port_id# 1: host_name = interconn2; adapter_id = 76; active | operational
    sma: port_id# 2: host_name = interconn3; adapter_id = 140; active | operational
    sma: port_id# 3: host_name = interconn4; adapter_id = 204; active | operational
    #

  2. Run the ping(1) command on all the IP addresses of remote nodes.

    Example output for ping(1) is shown below.


    # ping IP-address
    

    The IP addresses are found in the /etc/sma.ip file. Be sure to run the ping(1) command for each node in the cluster.

    The ping(1) command returns an "alive" message indicating that the two ends are communicating without a problem. Otherwise, an error message is displayed.

    For example,


    # ping 204.152.65.2
    204.152.65.2 is alive

How to Verify the SCI Interface Configuration
  1. Run the ifconfig -a command to verify that all SCI interfaces are up and that the cluster nodes have the correct IP addresses.

    The last 8 bits of the IP address should match the IP field value in the /etc/sma.config file.


    # ifconfig -a
    lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
    	inet 127.0.0.1 netmask ff000000
    hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
    	inet 129.146.238.55 netmask ffffff00 broadcast 129.146.238.255
    	ether 8:0:20:7b:fa:0
    scid0: flags=80c1<UP,RUNNING,NOARP,PRIVATE> mtu 16321
    	inet 204.152.65.1 netmask fffffff0
    scid1: flags=80c1<UP,RUNNING,NOARP,PRIVATE> mtu 16321
    	inet 204.152.65.17 netmask fffffff0