Sun Cluster 2.2 System Administration Guide

6.1 Public Network Management Overview

The Public Network Management (PNM) feature of Sun Cluster uses fault monitoring and failover to prevent loss of node availability due to a single network adapter or cable failure. PNM fault monitoring runs in local-node or cluster-wide mode to check the status of nodes, network adapters, cables, and network traffic. PNM failover uses sets of network adapters called backup groups to provide redundant connections between a cluster node and the public network. The fault monitoring and failover capabilities work together to ensure availability of services.

If your configuration includes HA data services, you must enable PNM; HA data services are dependent on PNM fault monitoring. When an HA data service experiences availability problems, it queries PNM through the cluster framework to see whether the problem is related to the public network connections. If it is, the data service waits until PNM has resolved the problem. If the problem is not with the public network, the data service invokes its own failover mechanism.
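The decision an HA data service makes can be sketched in a few lines of Python. This is only an illustrative model: the names Action and choose_action are hypothetical, and the boolean argument stands in for the answer PNM returns through the cluster framework, whose real interface is not described here.

```python
from enum import Enum


class Action(Enum):
    WAIT_FOR_PNM = "wait until PNM has resolved the problem"
    OWN_FAILOVER = "invoke the data service's own failover mechanism"


def choose_action(public_network_problem: bool) -> Action:
    """Pick the next step after a data service sees an availability problem.

    public_network_problem is a stand-in (an assumption of this sketch) for
    the result of querying PNM through the cluster framework.
    """
    if public_network_problem:
        # PNM owns public network faults, so the data service waits.
        return Action.WAIT_FOR_PNM
    # Anything else is handled by the data service itself.
    return Action.OWN_FAILOVER
```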

The PNM package, SUNWpnm, is installed during initial Sun Cluster software installation. The commands associated with PNM include pnmset(1M) and pnmstat(1M), both of which are described later in this section. See the associated man pages for details.

6.1.1 PNM Fault Monitoring and Failover

PNM monitors the state of the public network and the network adapters associated with each node in the cluster, and reports dubious or errored states. When PNM detects a lack of response from a primary adapter (the adapter currently carrying network traffic to and from the node), it fails over the network service to another working adapter in the adapter backup group for that node. PNM then performs some checks to determine whether the fault is with the adapter or the network.

If the adapter is faulty, PNM sends error messages to syslog(3), which are in turn detected by the Cluster Manager and displayed to the user through a GUI. After a failed adapter is fixed, it is automatically tested and reinstated in the backup group at the next cluster reconfiguration. If the entire adapter backup group is down, then the Sun Cluster framework invokes a failover of the node to retain availability. If an error occurs outside of PNM's control, such as the failure of a whole subnet, then a normal failover and cluster reconfiguration will occur.
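The failover behavior described above can be modeled roughly as follows. This is a simplified sketch, not PNM's actual implementation: the classes Adapter and BackupGroup, the function handle_primary_failure, and the adapter names used in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Adapter:
    name: str            # hypothetical adapter name
    working: bool = True


@dataclass
class BackupGroup:
    name: str                                   # backup group name
    adapters: List[Adapter] = field(default_factory=list)
    primary: Optional[str] = None               # adapter carrying traffic

    def fail_over(self) -> Optional[str]:
        """Give the primary role to another working adapter in the group.

        Returns the new primary's name, or None if no adapter is working.
        """
        for adapter in self.adapters:
            if adapter.working and adapter.name != self.primary:
                self.primary = adapter.name
                return adapter.name
        return None


def handle_primary_failure(group: BackupGroup) -> str:
    """Mark the current primary as failed and report the resulting action."""
    for adapter in group.adapters:
        if adapter.name == group.primary:
            adapter.working = False
    new_primary = group.fail_over()
    if new_primary is None:
        # Entire backup group is down: the framework fails over the node.
        return "node failover"
    return f"adapter failover to {new_primary}"
```

For example, with a group holding two working adapters named a0 and a1 and a0 as primary, the first call to handle_primary_failure moves the primary role to a1; a second call leaves no working adapter, which corresponds to the case where the framework fails over the node.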

PNM monitoring runs in two modes: cluster-aware and cluster-unaware. PNM runs in cluster-aware mode when the cluster is operational. In this mode, it uses the Cluster Configuration Database (CCD) to monitor the status of the network and to distinguish between public network failure and local adapter failure. For more information on the CCD, see the overview chapter in the Sun Cluster 2.2 Software Installation Guide. See "C.3 Sun Cluster Fault Probes" for more information on logical host failover initiated by public network failure.

PNM runs in cluster-unaware mode when the cluster is not operational. In this mode, PNM is unable to use the CCD and therefore cannot distinguish between adapter and network failure. In cluster-unaware mode, PNM simply detects a problem with the local network connection.
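The difference between the two modes can be illustrated with a small classifier. The text does not say how PNM uses the CCD to tell the two kinds of failure apart, so the peers_reachable input below is purely an assumption made for illustration (a summary of whether other nodes still see the same network), as is the function name classify_fault.

```python
from typing import Optional


def classify_fault(cluster_aware: bool, peers_reachable: Optional[bool]) -> str:
    """Classify a public network fault seen on the local node.

    cluster_aware   -- True when the cluster is operational and the CCD
                       can be consulted.
    peers_reachable -- hypothetical input: whether other nodes still see
                       the network. Ignored in cluster-unaware mode.
    """
    if not cluster_aware or peers_reachable is None:
        # Without CCD information, only a local problem can be reported.
        return "local connection problem (cause unknown)"
    if peers_reachable:
        # Assumed rule: the network works for others, so blame the adapter.
        return "local adapter failure"
    return "public network failure"
```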

You can check the status of the public network and adapters with the PNM monitoring command, pnmstat(1M). See the man page for details.

6.1.2 Backup Groups

Backup groups are sets of network adapters that provide redundant connections between a single cluster node and the public network. You configure backup groups during initial installation by using the scinstall(1M) command, or after initial installation by using the pnmset(1M) command. PNM allows you to configure as many redundant adapters as you want on a single host.

To configure backup groups initially, you run pnmset(1M) as root before the cluster is started. The command runs as an interactive script to configure and verify backup groups. It also selects one adapter to be used as the primary, or active, adapter. The pnmset(1M) command names backup groups nafoN, where N is an integer you assign (for example, nafo0). The command stores backup group information in the /etc/pnmconfig file.
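Scripts that handle backup group names may want to validate them. The helper below is a minimal sketch that assumes a name is the literal prefix nafo followed by a non-negative decimal integer; the function name backup_group_number is hypothetical and is not part of PNM.

```python
import re

# Assumed naming pattern: "nafo" followed by one or more decimal digits.
_NAFO_NAME = re.compile(r"nafo([0-9]+)")


def backup_group_number(name: str) -> int:
    """Return the integer part of a backup group name such as "nafo0".

    Raises ValueError if the name does not follow the assumed pattern.
    """
    match = _NAFO_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a backup group name: {name!r}")
    return int(match.group(1))
```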

To change an existing PNM configuration on a cluster node, you must remove the node from the cluster and then run the pnmset(1M) command. PNM monitors and incorporates changes in backup group membership dynamically.


Note - The /etc/pnmconfig file is not removed even if the SUNWpnm package is removed, for example, during a software upgrade. That is, the backup group membership information is preserved during software upgrades and you are not required to run the pnmset(1M) utility again, unless you want to modify backup group membership.


6.1.3 Updates to nsswitch.conf

When you configure PNM with a backup network adapter, the netmasks entry in the /etc/nsswitch.conf file should be one of the following, depending on the name service in use.

Table 6-1 Name Service Entry Choices for the /etc/nsswitch.conf File

Name Service Used    netmasks Entry
-----------------    --------------
None                 netmasks: files
nis                  netmasks: files [NOTFOUND=return] nis
nisplus              netmasks: files [NOTFOUND=return] nisplus

These settings ensure that netmasks are taken from the local files and are not looked up in an NIS or NIS+ table. This is important when the failed adapter is the primary public network adapter, because the name service would then be unreachable and could not provide the requested information. If the netmasks entry is not set as prescribed, failover to the backup adapter will not succeed.
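A quick way to confirm that a node's netmasks entry matches one of the entries in Table 6-1 is to parse the file and compare tokens. The following is a minimal sketch, not a Sun-supplied tool: the function names and the ALLOWED mapping are hypothetical, and the comparison is deliberately exact (case-sensitive).

```python
from typing import List, Optional

# Acceptable netmasks entries from Table 6-1, keyed by name service used.
ALLOWED = {
    "none": ["files"],
    "nis": ["files", "[NOTFOUND=return]", "nis"],
    "nisplus": ["files", "[NOTFOUND=return]", "nisplus"],
}


def netmasks_sources(nsswitch_text: str) -> Optional[List[str]]:
    """Return the tokens of the netmasks entry, or None if it is absent."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop trailing comments
        if not line:
            continue
        database, sep, sources = line.partition(":")
        if sep and database.strip() == "netmasks":
            return sources.split()
    return None


def netmasks_entry_ok(nsswitch_text: str, name_service: str) -> bool:
    """Check the netmasks entry against the table row for name_service."""
    return netmasks_sources(nsswitch_text) == ALLOWED[name_service]
```

To check a live node, read the file first, for example with open("/etc/nsswitch.conf").read(), and pass the text together with "none", "nis", or "nisplus".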


Caution - The preceding changes cause the local files (/etc/netmasks and /etc/groups) to be used as lookup tables. The NIS/NIS+ services are used only when the local files are unavailable. Therefore, you must keep these files synchronized with their NIS/NIS+ versions. If you do not, the cluster nodes will not have access to the expected values.