Sun Cluster Concepts Guide for Solaris OS

Previous: Chapter 3 Key Concepts for System Administrators and Application Developers

Chapter 4 Frequently Asked Questions

This chapter includes answers to the most frequently asked questions about the Sun Cluster product.

The questions are organized by topic as follows:

High Availability FAQs

Question:

What exactly is a highly available system?

Answer:

The Sun Cluster software defines high availability (HA) as the ability of a cluster to keep an application running. The application runs even when a failure occurs that would normally make a host system unavailable.

Question:

What is the process by which the cluster provides high availability?

Answer:

Through a process known as failover, the cluster framework provides a highly available environment. Failover is a series of steps that are performed by the cluster to migrate data service resources from a failing node to another operational node in the cluster.
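
As a rough illustration of the migration step, the following Python sketch models a cluster as a mapping from data service resources to the node that hosts them. The function name fail_over, the resource names, and the rule of moving everything to the first surviving node are assumptions made for this example. They are not part of the Sun Cluster software, which chooses the target node itself.

```python
def fail_over(placement, failed_node, cluster_nodes):
    """Return a new placement with every resource moved off failed_node.

    placement maps a resource name to the node that currently hosts it.
    Hypothetical rule: resources move to the first surviving node in
    cluster_nodes. The real cluster framework picks the target itself.
    """
    survivors = [node for node in cluster_nodes if node != failed_node]
    if not survivors:
        raise RuntimeError("no operational node is left in the cluster")
    target = survivors[0]
    return {
        resource: (target if node == failed_node else node)
        for resource, node in placement.items()
    }


# Hypothetical resources: one hosted on node1, one on node2.
placement = {"nfs-resource": "node1", "web-resource": "node2"}
after = fail_over(placement, "node1", ["node1", "node2", "node3"])
print(after)
```

In this sketch only the resource that was hosted on the failed node changes its host. Resources on healthy nodes stay where they are.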

Question:

What is the difference between a failover and scalable data service?

Answer:

There are two types of highly available data services:

Failover
Scalable

A failover data service runs an application on only one primary node in the cluster at a time. Other nodes might run other applications, but each application runs on only a single node. If a primary node fails, the applications that are running on the failed node fail over to another node and continue running.

A scalable data service spreads an application across multiple nodes to create a single, logical service. Scalable services leverage the number of nodes and processors in the entire cluster on which they run.

For each application, one node hosts the physical interface to the cluster. This node is called a Global Interface (GIF) node. Multiple GIF nodes can exist in the cluster. Each GIF node hosts one or more logical interfaces that can be used by scalable services. These logical interfaces are called global interfaces. One GIF node hosts a global interface for all requests for a particular application and dispatches them to multiple nodes on which the application server is running. If the GIF node fails, the global interface fails over to a surviving node.

If any node on which the application is running fails, the application continues to run on other nodes with some performance degradation. This process continues until the failed node returns to the cluster.
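
The behavior of a scalable service can be pictured with a small Python model. Everything in it is an assumption made for illustration: the class name, the round-robin dispatch order, and the rule that the global interface moves to the first surviving node. The source text does not specify how requests are distributed or which node takes over the global interface.

```python
class ScalableService:
    """Toy model of one scalable service behind a single GIF node."""

    def __init__(self, cluster_nodes, gif_node, app_nodes):
        self.up_nodes = list(cluster_nodes)   # nodes currently in the cluster
        self.gif_node = gif_node              # node hosting the global interface
        self.app_nodes = list(app_nodes)      # nodes running the application
        self._counter = 0

    def dispatch(self):
        """Return (gif_node, app_node) for the next request.

        Round-robin is a stand-in policy chosen for this sketch.
        """
        if not self.app_nodes:
            raise RuntimeError("no application instance is running")
        node = self.app_nodes[self._counter % len(self.app_nodes)]
        self._counter += 1
        return self.gif_node, node

    def fail_node(self, node):
        """Remove a failed node and move the global interface if needed."""
        self.up_nodes.remove(node)
        if node in self.app_nodes:
            # The service keeps running on the remaining application nodes.
            self.app_nodes.remove(node)
        if node == self.gif_node:
            if not self.up_nodes:
                raise RuntimeError("no surviving node can host the interface")
            # Hypothetical choice: the first surviving node takes over.
            self.gif_node = self.up_nodes[0]


service = ScalableService(["node1", "node2", "node3"], "node1", ["node2", "node3"])
```

Calling service.fail_node("node3") leaves all requests going to node2, which corresponds to the reduced capacity described above. Calling service.fail_node("node1") moves the global interface without stopping the application.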

File Systems FAQs

Question:

Can I run one or more of the Solaris hosts in the cluster as highly available NFS servers with other Solaris hosts as clients?

Answer:

No. Do not do a loopback mount, that is, do not have a cluster host mount an NFS file system that the cluster itself exports.

Question:

Can I use a cluster file system for applications that are not under Resource Group Manager control?

Answer:

Yes. However, without RGM control, the applications need to be restarted manually after the failure of the node on which they are running.

Question:

Must all cluster file systems have a mount point under the /global directory?

Answer:

No. However, placing cluster file systems under the same mount point, such as /global, enables better organization and management of these file systems.

Question:

What are the differences between using the cluster file system and exporting NFS file systems?

Answer:

Several differences exist:

The cluster file system supports global devices. NFS does not support remote access to devices.
The cluster file system has a global namespace. Only one mount command is required. With NFS, you must mount the file system on each host.
The cluster file system caches files in more cases than does NFS. For example, the cluster file system caches files when a file is being accessed from multiple nodes for read, write, file locks, and asynchronous I/O.
The cluster file system is built to exploit future fast cluster interconnects that provide remote DMA and zero-copy functions.
If you change the attributes on a file (using chmod, for example) in a cluster file system, the change is reflected immediately on all nodes. With an exported NFS file system, this change can take much longer.

Question:

The file system /global/.devices/node@nodeID appears on my cluster nodes. Can I use this file system to store data that I want to be highly available and global?

Answer:

These file systems store the global device namespace. These file systems are not intended for general use. While they are global, these file systems are never accessed in a global manner. Each node only accesses its own global device namespace. If a node is down, other nodes cannot access this namespace for the node that is down. These file systems are not highly available. These file systems should not be used to store data that needs to be globally accessible or highly available.

Volume Management FAQs

Question:

Do I need to mirror all disk devices?

Answer:

For a disk device to be considered highly available, it must be mirrored, or use RAID-5 hardware. All data services should use either highly available disk devices, or cluster file systems mounted on highly available disk devices. Such configurations can tolerate single disk failures.

Question:

Can I use one volume manager for the local disks (boot disk) and a different volume manager for the multihost disks?

Answer:

This configuration is supported with the Solaris Volume Manager software managing the local disks and Veritas Volume Manager managing the multihost disks. No other combination is supported.

Data Services FAQs

Question:

Which Sun Cluster data services are available?

Answer:

The list of supported data services is included in the Sun Cluster Release Notes.

Question:

Which application versions are supported by Sun Cluster data services?

Answer:

The list of supported application versions is included in the Sun Cluster Release Notes.

Question:

Can I write my own data service?

Answer:

Yes. See Chapter 11, DSDL API Functions, in Sun Cluster Data Services Developer’s Guide for Solaris OS for more information.

Question:

When creating network resources, should I specify numeric IP addresses or host names?

Answer:

The preferred method for specifying network resources is to use the UNIX host name rather than the numeric IP address.

Question:

When creating network resources, what is the difference between using a logical host name (a LogicalHostname resource) or a shared address (a SharedAddress resource)?

Answer:

Except in the case of Sun Cluster HA for NFS, wherever the documentation recommends the use of a LogicalHostname resource in a Failover mode resource group, a SharedAddress resource or LogicalHostname resource can be used interchangeably. The use of a SharedAddress resource incurs some additional overhead because the cluster networking software is configured for a SharedAddress but not for a LogicalHostname.

The advantage to using a SharedAddress resource is demonstrated when you configure both scalable and failover data services, and want clients to be able to access both services by using the same host name. In this case, the SharedAddress resources along with the failover application resource are contained in one resource group. The scalable service resource is contained in a separate resource group and configured to use the SharedAddress resource. Both the scalable and failover services can then use the same set of host names and addresses that are configured in the SharedAddress resource.

Public Network FAQs

Question:

Which public network adapters does the Sun Cluster software support?

Answer:

Currently, the Sun Cluster software supports Ethernet (10/100BASE-T and 1000BASE-SX Gb) public network adapters. Because new interfaces might be supported in the future, check with your Sun sales representative for the most current information.

Question:

What is the role of the MAC address in failover?

Answer:

When a failover occurs, new Address Resolution Protocol (ARP) packets are generated and broadcast on the network. These ARP packets contain the new MAC address (of the new physical adapter to which the host failed over) and the old IP address. When another machine on the network receives one of these packets, it flushes the old MAC-IP mapping from its ARP cache and uses the new one.
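
The effect on a neighboring machine can be sketched in Python. The class and function names are hypothetical, and the addresses come from ranges reserved for documentation. The sketch models only the cache update, not the ARP packet format.

```python
class ArpCache:
    """Toy ARP cache: maps an IP address to the MAC address last learned."""

    def __init__(self):
        self.entries = {}

    def receive_arp(self, ip, mac):
        # A received ARP packet replaces any older mapping for the same IP.
        self.entries[ip] = mac


def announce_failover(ip, new_mac, caches):
    """Deliver the post-failover ARP broadcast to every listening cache."""
    for cache in caches:
        cache.receive_arp(ip, new_mac)


client = ArpCache()
# Mapping learned before the failover: the IP address points at the old adapter.
client.receive_arp("192.0.2.10", "00:00:5e:00:53:01")
# After the failover the same IP address is announced with the new adapter's MAC.
announce_failover("192.0.2.10", "00:00:5e:00:53:02", [client])
```

After the announcement, the client's cache holds the same IP address paired with the new MAC address, so its traffic reaches the adapter that now hosts the address.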

Question:

Does the Sun Cluster software support setting local-mac-address?=true?

Answer:

Yes. In fact, IP Network Multipathing requires that local-mac-address? be set to true.

You can set local-mac-address? with the eeprom command, at the OpenBoot PROM ok prompt in a SPARC based cluster. See the eeprom(1M) man page. You can also set the MAC address with the SCSI utility that you optionally run after the BIOS boots in an x86 based cluster.

Question:

How much delay can I expect when IP network multipathing performs a switchover between adapters?

Answer:

The delay could be several minutes. When an IP network multipathing switchover is performed, the operation sends a gratuitous ARP broadcast. However, you cannot be sure that the router between the client and the cluster uses the gratuitous ARP. So, until the ARP cache entry for this IP address on the router times out, the entry can use the stale MAC address.
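
The following Python sketch shows why the delay depends on the router. It models a router ARP cache whose entries expire after a fixed timeout and that might ignore gratuitous ARP. The class name, the timeout value, and the addresses are assumptions made for illustration. Real routers differ in both their timeout and their handling of gratuitous ARP.

```python
class RouterArpCache:
    """Toy router ARP cache with entry expiry."""

    def __init__(self, timeout_s, accepts_gratuitous_arp):
        self.timeout_s = timeout_s
        self.accepts_gratuitous_arp = accepts_gratuitous_arp
        self._entries = {}  # ip -> (mac, time the entry was learned)

    def learn(self, ip, mac, now):
        self._entries[ip] = (mac, now)

    def on_gratuitous_arp(self, ip, mac, now):
        # A router that ignores gratuitous ARP keeps its old entry.
        if self.accepts_gratuitous_arp:
            self.learn(ip, mac, now)

    def lookup(self, ip, now):
        """Return the cached MAC, or None if there is no unexpired entry."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at >= self.timeout_s:
            # The stale entry is gone; the router must resolve the IP again.
            del self._entries[ip]
            return None
        return mac


OLD_MAC, NEW_MAC = "00:00:5e:00:53:01", "00:00:5e:00:53:02"
router = RouterArpCache(timeout_s=1200, accepts_gratuitous_arp=False)
router.learn("192.0.2.10", OLD_MAC, now=0)
router.on_gratuitous_arp("192.0.2.10", NEW_MAC, now=100)  # switchover at t=100
```

In this model the router keeps answering with the old MAC address until its entry expires, which is the window during which clients behind that router cannot reach the service.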

Question:

How fast are failures of a network adapter detected?

Answer:

The default failure detection time is 10 seconds. The algorithm tries to meet the failure detection time, but the actual time depends on the network load.

Cluster Member FAQs

Question:

Do all cluster members need to have the same root password?

Answer:

You are not required to have the same root password on each cluster member. However, you can simplify administration of the cluster by using the same root password on all nodes.

Question:

Is the order in which nodes are booted significant?

Answer:

In most cases, no. However, the boot order is important to prevent amnesia. For example, if node two is the owner of the quorum device while node one is down, and you then bring node two down, you must bring up node two before you bring back node one. This order prevents you from accidentally bringing up a node with outdated cluster configuration information.
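
The reasoning can be sketched in Python by giving each node a number that records how recent its copy of the cluster configuration is. The generation numbers and the function name are hypothetical and serve only to illustrate the idea. They do not describe how the cluster actually stores or compares its configuration.

```python
def nodes_safe_to_boot_first(config_generation):
    """Return the nodes that hold the most recent configuration.

    config_generation maps a node name to a hypothetical counter that
    grows each time the cluster configuration changes while that node
    is a cluster member.
    """
    if not config_generation:
        return []
    latest = max(config_generation.values())
    return sorted(
        node for node, generation in config_generation.items()
        if generation == latest
    )


# node1 went down at generation 4. node2 kept running, saw more changes,
# and was shut down later at generation 7.
generations = {"node1": 4, "node2": 7}
first = nodes_safe_to_boot_first(generations)
```

With these values only node2 qualifies, which matches the example in the answer: the node that was shut down last must come up first.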

Question:

Do I need to mirror local disks in a cluster node?

Answer:

Yes. Although this mirroring is not a requirement, mirroring the cluster node's disks prevents the failure of a nonmirrored disk from taking down the node. The downside to mirroring a cluster node's local disks is more system administration overhead.

Question:

What are the cluster member backup issues?

Answer:

You can use several backup methods for a cluster. One method is to have a host as the backup node with a tape drive or library attached. Then use the cluster file system to back up the data. Do not connect this host to the shared disks.

See Chapter 12, Backing Up and Restoring a Cluster, in Sun Cluster System Administration Guide for Solaris OS for additional information about how to back up and restore data.

Question:

When is a node healthy enough to be used as a secondary node?

Answer:

Solaris 9 OS:

After a reboot, a node is healthy enough to be a secondary node when the node displays the login prompt.

Solaris 10 OS:

A node is healthy enough to be a secondary node if the multi-user-server milestone is running.

# svcs -a | grep multi-user-server:default
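
If you want to script this check, the following Python sketch parses text in the three-column layout that svcs prints (state, start time, and service FMRI). It assumes that "running" corresponds to the SMF state online. The function name and the sample text are invented for illustration. The sample is not captured output from a real host.

```python
MILESTONE = "svc:/milestone/multi-user-server:default"


def milestone_is_online(svcs_output, fmri=MILESTONE):
    """Return True if svcs_output lists fmri in the 'online' state."""
    for line in svcs_output.splitlines():
        fields = line.split()
        # Expected columns: STATE, STIME, FMRI.
        if len(fields) == 3 and fields[2] == fmri:
            return fields[0] == "online"
    return False


# Illustrative sample text only.
sample = """\
STATE          STIME    FMRI
online         10:15:02 svc:/milestone/multi-user:default
online         10:15:09 svc:/milestone/multi-user-server:default
"""
ready = milestone_is_online(sample)
```

The function returns False both when the milestone is listed in another state and when it does not appear in the text at all, so a missing line is never mistaken for a healthy node.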

Cluster Storage FAQs

Question:

What makes multihost storage highly available?

Answer:

Multihost storage is highly available because mirroring (or a hardware-based RAID-5 controller) enables it to survive the loss of a single disk. Because a multihost storage device has more than one host connection, it can also withstand the loss of a single Solaris host to which it is connected. In addition, redundant paths from each host to the attached storage provide tolerance for the failure of a host bus adapter, cable, or disk controller.

Cluster Interconnect FAQs

Question:

Which cluster interconnects does the Sun Cluster software support?

Answer:

Currently, the Sun Cluster software supports the following cluster interconnects:

Ethernet (100BASE-T Fast Ethernet and 1000BASE-SX Gb) in both SPARC based and x86 based clusters
Infiniband in both SPARC based and x86 based clusters
SCI in SPARC based clusters only

Question:

What is the difference between a “cable” and a transport “path”?

Answer:

Cluster transport cables are configured by using transport adapters and switches. Cables join adapters and switches on a component-to-component basis. The cluster topology manager uses available cables to build end-to-end transport paths between hosts. A cable does not map directly to a transport path.

Cables are statically “enabled” and “disabled” by an administrator. Cables have a “state” (enabled or disabled), but not a “status.” If a cable is disabled, it is as if it were unconfigured. Cables that are disabled cannot be used as transport paths. These cables are not probed and therefore their state is unknown. You can obtain the state of a cable by using the cluster status command.

Transport paths are dynamically established by the cluster topology manager. The “status” of a transport path is determined by the topology manager. A path can have a status of “online” or “offline.” You can obtain the status of a transport path by using the clinterconnect status command. See the clinterconnect(1CL) man page.

Consider the following example of a two-host cluster with four cables.

node1:adapter0      to switch1, port0
node1:adapter1      to switch2, port0
node2:adapter0      to switch1, port1
node2:adapter1      to switch2, port1

Two possible transport paths can be formed from these four cables.

node1:adapter0      to node2:adapter0
node1:adapter1      to node2:adapter1
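
The relationship between cables and paths can be sketched in Python. The sketch assumes that a path exists between two adapters on different hosts when both are cabled to the same switch and both cables are enabled. The data layout and the function name are hypothetical, and the sketch does not reproduce the topology manager's actual algorithm.

```python
from itertools import combinations

# Each cable: (host, adapter, switch, enabled). The switch port is omitted
# because it does not affect which paths can be formed in this model.
CABLES = [
    ("node1", "adapter0", "switch1", True),
    ("node1", "adapter1", "switch2", True),
    ("node2", "adapter0", "switch1", True),
    ("node2", "adapter1", "switch2", True),
]


def transport_paths(cables):
    """Return end-to-end paths as pairs of (host, adapter) endpoints."""
    endpoints_by_switch = {}
    for host, adapter, switch, enabled in cables:
        if enabled:  # disabled cables cannot be used as transport paths
            endpoints_by_switch.setdefault(switch, []).append((host, adapter))

    paths = []
    for switch in sorted(endpoints_by_switch):
        endpoints = sorted(endpoints_by_switch[switch])
        for first, second in combinations(endpoints, 2):
            if first[0] != second[0]:  # a path joins two different hosts
                paths.append((first, second))
    return paths


paths = transport_paths(CABLES)
```

For the four cables in the example, the function yields one path through each switch. Disabling any one cable removes the path that depends on it, which shows why a cable does not map directly to a transport path.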

Client Systems FAQs

Question:

Do I need to consider any special client needs or restrictions for use with a cluster?

Answer:

Client systems connect to the cluster as they would to any other server. In some instances, depending on the data service application, you might need to install client-side software or perform other configuration changes so that the client can connect to the data service application. See Chapter 1, Planning for Sun Cluster Data Services, in Sun Cluster Data Services Planning and Administration Guide for Solaris OS for more information about client-side configuration requirements.

Administrative Console FAQs

Question:

Does the Sun Cluster software require an administrative console?

Answer:

Yes.

Question:

Does the administrative console have to be dedicated to the cluster, or can it be used for other tasks?

Answer:

The Sun Cluster software does not require a dedicated administrative console, but using one provides these benefits:

Enables centralized cluster management by grouping console and management tools on the same machine
Provides potentially quicker problem resolution by your hardware service provider

Question:

Does the administrative console need to be located “close” to the cluster, for example, in the same room?

Answer:

Check with your hardware service provider. The provider might require that the console be located in close proximity to the cluster. No technical reason exists for the console to be located in the same room.

Question:

Can an administrative console serve more than one cluster, if any distance requirements are also first met?

Answer:

Yes. You can control multiple clusters from a single administrative console. You can also share a single terminal concentrator between clusters.

Terminal Concentrator and System Service Processor FAQs

Question:

Does the Sun Cluster software require a terminal concentrator?

Answer:

Starting with Sun Cluster 3.0, the Sun Cluster software does not require a terminal concentrator. Sun Cluster 2.2 required a terminal concentrator for fencing, but Sun Cluster 3.0, Sun Cluster 3.1, and Sun Cluster 3.2 do not.

Question:

I see that most Sun Cluster servers use a terminal concentrator, but the Sun Enterprise E10000 server does not. Why not?

Answer:

The terminal concentrator is effectively a serial-to-Ethernet converter for most servers. The terminal concentrator's console port is a serial port. The Sun Enterprise E10000 server does not have a serial console. The System Service Processor (SSP) is the console, either through an Ethernet or JTAG port. For the Sun Enterprise E10000 server, you always use the SSP for consoles.

Question:

What are the benefits of using a terminal concentrator?

Answer:

Using a terminal concentrator provides console-level access to each Solaris host from a remote machine anywhere on the network. This access is provided even when the host is at the OpenBoot PROM (OBP) on a SPARC based host or a boot subsystem on an x86 based host.

Question:

If I use a terminal concentrator that Sun does not support, what do I need to know to qualify the one that I want to use?

Answer:

The main difference between the terminal concentrator that Sun supports and other console devices is that the Sun terminal concentrator has special firmware. This firmware prevents the terminal concentrator from sending a break to the console when it boots. If you have a console device that can send a break, or a signal that might be interpreted as a break to the console, the break shuts down the host.

Question:

Can I free a locked port on the terminal concentrator that Sun supports without rebooting it?

Answer:

Yes. Note the port number that needs to be reset and type the following commands:

telnet tc
Enter Annex port name or number: cli
annex: su -
annex# admin
admin : reset port-number
admin : quit
annex# hangup
#

Refer to the following manuals for more information about how to configure and administer the terminal concentrator that Sun supports.

Question:

What if the terminal concentrator itself fails? Must I have another one standing by?

Answer:

No. You do not lose any cluster availability if the terminal concentrator fails. You do lose the ability to connect to the host consoles until the concentrator is back in service.

Question:

If I do use a terminal concentrator, what about security?

Answer:

Generally, the terminal concentrator is attached to a small network that system administrators use, not a network that is used for other client access. You can control security by limiting access to that particular network.

Question:

SPARC: How do I use dynamic reconfiguration with a tape or disk drive?

Answer:

Perform the following steps:

Determine whether the disk or tape drive is part of an active device group. If the drive is not part of an active device group, you can perform the DR remove operation on it.
If the DR remove-board operation would affect an active disk or tape drive, the system rejects the operation and identifies the drives that would be affected by the operation. If the drive is part of an active device group, go to SPARC: DR Clustering Considerations for Disk and Tape Drives.
Determine whether the drive is a component of the primary node or the secondary node. If the drive is a component of the secondary node, you can perform the DR remove operation on it.
If the drive is a component of the primary node, you must switch the primary and secondary nodes before performing the DR remove operation on the device.

Caution –

If the current primary node fails while you are performing the DR operation on a secondary node, cluster availability is impacted. The primary node has no place to fail over until a new secondary node is provided.
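
The decision in the steps above can be summarized as a small Python function. The function name, parameter names, and returned strings are hypothetical. The function encodes only the rules stated in the steps and does not call any cluster or DR command.

```python
def dr_remove_action(in_active_device_group, on_primary_node):
    """Return the action to take before a DR remove operation on a drive.

    in_active_device_group: the drive belongs to an active device group.
    on_primary_node: the drive is a component of the primary node.
    """
    if not in_active_device_group:
        # Drives outside an active device group can be removed directly.
        return "perform DR remove"
    if on_primary_node:
        # The primary must hand over its role before the drive is removed.
        return "switch primary and secondary nodes, then perform DR remove"
    # A drive on the secondary node can be removed without a switch.
    return "perform DR remove"


action = dr_remove_action(in_active_device_group=True, on_primary_node=True)
```

Only the combination of an active device group and the primary node requires the extra switchover step. The caution about losing the secondary node during the operation still applies in every case that involves an active device group.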
