Sun Cluster Concepts Guide for Solaris OS

Chapter 4 Frequently Asked Questions

This chapter includes answers to the most frequently asked questions about the Sun Cluster system. The questions are organized by topic.

High Availability FAQs

Question:

What exactly is a highly available system?

Answer:

The Sun Cluster system defines high availability (HA) as the ability of a cluster to keep an application running. The application runs even when a failure occurs that would normally make a server system unavailable.

Question:

What is the process by which the cluster provides high availability?

Answer:

Through a process known as failover, the cluster framework provides a highly available environment. Failover is a series of steps performed by the cluster to migrate data service resources from a failing node to another operational node in the cluster.
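
You can also trigger this migration manually, for example before planned maintenance on a node. As a minimal sketch, the following scswitch command switches a failover resource group to another node; the resource group name nfs-rg and the node name phys-schost-2 are hypothetical.


# scswitch -z -g nfs-rg -h phys-schost-2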

Question:

What is the difference between a failover and scalable data service?

Answer:

There are two types of highly available data services:

A failover data service runs an application on only one primary node in the cluster at a time. Other nodes might run other applications, but each application runs on only a single node. If a primary node fails, the applications that are running on the failed node fail over to another node and continue running.

A scalable service spreads an application across multiple nodes to create a single, logical service. Scalable services leverage the number of nodes and processors in the entire cluster on which they run.

For each application, one node hosts the physical interface to the cluster. This node is called a Global Interface (GIF) Node. Multiple GIF nodes can exist in the cluster. Each GIF node hosts one or more logical interfaces that can be used by scalable services. These logical interfaces are called global interfaces. One GIF node hosts a global interface for all requests for a particular application and dispatches them to multiple nodes on which the application server is running. If the GIF node fails, the global interface fails over to a surviving node.

If any of the nodes on which the application is running fails, the application continues to run on the other nodes with some performance degradation. This process continues until the failed node returns to the cluster.

File Systems FAQs

Question:

Can I run one or more of the cluster nodes as highly available NFS servers with other cluster nodes as clients?

Answer:

No, do not do a loopback mount.

Question:

Can I use a cluster file system for applications that are not under Resource Group Manager control?

Answer:

Yes. However, without RGM control, the applications need to be restarted manually after the failure of the node on which they are running.

Question:

Must all cluster file systems have a mount point under the /global directory?

Answer:

No. However, placing cluster file systems under the same mount point, such as /global, enables better organization and management of these file systems.
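
A cluster file system is typically mounted through an /etc/vfstab entry that includes the global mount option, as in the following sketch. The metadevice names and the /global/nfs mount point are hypothetical.


/dev/md/nfsset/dsk/d100  /dev/md/nfsset/rdsk/d100  /global/nfs  ufs  2  yes  global,logging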

Question:

What are the differences between using the cluster file system and exporting NFS file systems?

Answer:

Several differences exist:

  1. The cluster file system supports global devices. NFS does not support remote access to devices.

  2. The cluster file system has a global namespace. Only one mount command is required. With NFS, you must mount the file system on each node.

  3. The cluster file system caches files in more cases than does NFS. For example, the cluster file system caches files when a file is being accessed from multiple nodes for read, write, file locks, and asynchronous I/O.

  4. The cluster file system is built to exploit future fast cluster interconnects that provide remote DMA and zero-copy functions.

  5. If you change the attributes on a file (using chmod(1), for example) in a cluster file system, the change is reflected immediately on all nodes. With an exported NFS file system, this change can take much longer.

Question:

The file system /global/.devices/node@nodeID appears on my cluster nodes. Can I use this file system to store data that I want to be highly available and global?

Answer:

These file systems store the global device namespace. These file systems are not intended for general use. While they are global, they are never accessed in a global manner—each node accesses only its own global device namespace. If a node is down, other nodes cannot access this namespace for the node that is down. These file systems are not highly available. They should not be used to store data that needs to be globally accessible or highly available.

Volume Management FAQs

Question:

Do I need to mirror all disk devices?

Answer:

For a disk device to be considered highly available, it must be mirrored, or use RAID-5 hardware. All data services should use either highly available disk devices, or cluster file systems mounted on highly available disk devices. Such configurations can tolerate single disk failures.
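
As an illustration, the following Solaris Volume Manager commands build a two-way mirror inside a disk set. The disk set name nfsset, the metadevice names, and the DID device names are hypothetical.


# metainit -s nfsset d11 1 1 /dev/did/rdsk/d3s0
# metainit -s nfsset d12 1 1 /dev/did/rdsk/d4s0
# metainit -s nfsset d10 -m d11
# metattach -s nfsset d10 d12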

Question:

Can I use one volume manager for the local disks (boot disk) and a different volume manager for the multihost disks?

Answer:

SPARC: This configuration is supported with the Solaris Volume Manager software managing the local disks and VERITAS Volume Manager managing the multihost disks. No other combination is supported.

x86: No, this configuration is not supported, as only Solaris Volume Manager is supported in x86 based clusters.

Data Services FAQs

Question:

Which Sun Cluster data services are available?

Answer:

The list of supported data services is included in Supported Products in Sun Cluster 3.1 4/05 Release Notes for Solaris OS.

Question:

Which application versions are supported by Sun Cluster data services?

Answer:

The list of supported application versions is included in Supported Products in Sun Cluster 3.1 4/05 Release Notes for Solaris OS.

Question:

Can I write my own data service?

Answer:

Yes. See Chapter 11, DSDL API Functions, in Sun Cluster Data Services Developer’s Guide for Solaris OS for more information.

Question:

When creating network resources, should I specify numeric IP addresses or hostnames?

Answer:

The preferred method for specifying network resources is to use the UNIX hostname rather than the numeric IP address.
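
For example, when you add a LogicalHostname resource with the scrgadm command, you would supply the hostname rather than its IP address. The resource group name nfs-rg and the hostname schost-1 in this sketch are hypothetical.


# scrgadm -a -L -g nfs-rg -l schost-1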

Question:

When creating network resources, what is the difference between using a logical hostname (a LogicalHostname resource) or a shared address (a SharedAddress resource)?

Answer:

Except in the case of Sun Cluster HA for NFS, wherever the documentation recommends the use of a LogicalHostname resource in a Failover mode resource group, a SharedAddress resource or LogicalHostname resource can be used interchangeably. The use of a SharedAddress resource incurs some additional overhead because the cluster networking software is configured for a SharedAddress but not for a LogicalHostname.

The advantage to using a SharedAddress resource is demonstrated when you configure both scalable and failover data services, and want clients to be able to access both services by using the same hostname. In this case, the SharedAddress resources along with the failover application resource are contained in one resource group. The scalable service resource is contained in a separate resource group and configured to use the SharedAddress resource. Both the scalable and failover services can then use the same set of hostnames/addresses that are configured in the SharedAddress resource.
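
The following sketch outlines such a configuration with the scrgadm command. The resource group names, node names, and hostname are hypothetical, and the scalable application resource itself would still be added to web-rg with the resource type that is appropriate for your data service.


# scrgadm -a -g sa-rg -h phys-schost-1,phys-schost-2
# scrgadm -a -S -g sa-rg -l web-host
# scrgadm -a -g web-rg -y Maximum_primaries=2 -y Desired_primaries=2 \
  -y RG_dependencies=sa-rg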

Public Network FAQs

Question:

Which public network adapters does the Sun Cluster system support?

Answer:

Currently, the Sun Cluster system supports Ethernet (10/100BASE-T and 1000BASE-SX Gb) public network adapters. Because new interfaces might be supported in the future, check with your Sun sales representative for the most current information.

Question:

What is the role of the MAC address in failover?

Answer:

When a failover occurs, new Address Resolution Protocol (ARP) packets are generated and broadcast to the world. These ARP packets contain the new MAC address (of the new physical adapter to which the node failed over) and the old IP address. When another machine on the network receives one of these packets, it flushes the old MAC-IP mapping from its ARP cache and uses the new one.

Question:

Does the Sun Cluster system support setting local-mac-address?=true?

Answer:

Yes. In fact, IP Network Multipathing requires that local-mac-address? be set to true.

You can set local-mac-address? with eeprom(1M) at the OpenBoot PROM ok prompt in a SPARC based cluster. You can also set the MAC address with the SCSI utility that you optionally run after the BIOS boots in an x86 based cluster.
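
For example, from the Solaris command line you can check and set the variable with eeprom(1M); the change takes effect at the next boot, and the value shown here is illustrative. Quoting prevents the shell from interpreting the question mark.


# eeprom "local-mac-address?"
local-mac-address?=false
# eeprom "local-mac-address?=true"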

Question:

How much delay can I expect when Internet Protocol (IP) Network Multipathing performs a switchover between adapters?

Answer:

The delay could be several minutes. When an Internet Protocol (IP) Network Multipathing switchover is performed, the operation sends a gratuitous ARP. However, you cannot be sure that the router between the client and the cluster will honor the gratuitous ARP. Until the ARP cache entry for this IP address on the router times out, the router can continue to use the stale MAC address.

Question:

How fast are failures of a network adapter detected?

Answer:

The default failure detection time is 10 seconds. The algorithm tries to meet the failure detection time, but the actual time depends on the network load.
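
IP Network Multipathing reads this value, in milliseconds, from the /etc/default/mpathd file. A minimal sketch of the relevant entry:


# Failure detection time for IP Network Multipathing, in milliseconds
FAILURE_DETECTION_TIME=10000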

Cluster Member FAQs

Question:

Do all cluster members need to have the same root password?

Answer:

You are not required to have the same root password on each cluster member. However, you can simplify administration of the cluster by using the same root password on all nodes.

Question:

Is the order in which nodes are booted significant?

Answer:

In most cases, no. However, the boot order is important to prevent amnesia. For example, if node two was the owner of the quorum device and node one is down, and then you bring node two down, you must bring up node two before bringing back node one. This order prevents you from accidentally bringing up a node with outdated cluster configuration information. Refer to About Failure Fencing for details about amnesia.
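
Before you bring a node down, you can check which nodes and quorum devices currently hold quorum votes by running the scstat command:


# scstat -q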

Question:

Do I need to mirror local disks in a cluster node?

Answer:

Yes. Though this mirroring is not a requirement, mirroring the cluster node's disks prevents a nonmirrored disk failure from taking down the node. The downside to mirroring a cluster node's local disks is more system administration overhead.
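
As an illustration, the following Solaris Volume Manager sketch mirrors the root (/) slice of the boot disk. The metadevice names and disk slices are hypothetical, and the node must be rebooted after metaroot runs and before the second submirror is attached.


# metainit -f d11 1 1 c0t0d0s0
# metainit d12 1 1 c0t1d0s0
# metainit d10 -m d11
# metaroot d10
(reboot the node)
# metattach d10 d12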

Question:

What are the cluster member backup issues?

Answer:

You can use several backup methods for a cluster. One method is to dedicate a node as the backup node, with a tape drive or library attached. Then use the cluster file system to back up the data. Do not connect this node to the shared disks.

See Chapter 9, Backing Up and Restoring a Cluster, in Sun Cluster System Administration Guide for Solaris OS for additional information about how to back up and restore data.
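
As a minimal sketch, the backup node could use ufsdump(1M) to write a cluster file system to its locally attached tape drive. The tape device and the /global/nfs mount point are hypothetical.


# ufsdump 0ucf /dev/rmt/0 /global/nfs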

Question:

When is a node healthy enough to be used as a secondary node?

Answer:

Solaris 8 and Solaris 9:

After a reboot, a node is healthy enough to be a secondary node when the node displays the login prompt.

Solaris 10:

A node is healthy enough to be a secondary node if the multi-user-server milestone is running.


# svcs -a | grep multi-user-server:default
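
On a healthy node, this command reports the milestone as online, as in the following illustrative output.


online         14:27:26       svc:/milestone/multi-user-server:default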

Cluster Storage FAQs

Question:

What makes multihost storage highly available?

Answer:

Multihost storage is highly available because it can survive the loss of a single disk, because of mirroring (or because of hardware-based RAID-5 controllers). Because a multihost storage device has more than one host connection, it can also withstand the loss of a single node to which it is connected. In addition, redundant paths from each node to the attached storage provide tolerance for the failure of a host bus adapter, cable, or disk controller.

Cluster Interconnect FAQs

Question:

Which cluster interconnects does the Sun Cluster system support?

Answer:

Currently, the Sun Cluster system supports the following cluster interconnects:

Question:

What is the difference between a “cable” and a transport “path”?

Answer:

Cluster transport cables are configured by using transport adapters and switches. Cables join adapters and switches on a component-to-component basis. The cluster topology manager uses available cables to build end-to-end transport paths between nodes. A cable does not map directly to a transport path.

Cables are statically “enabled” and “disabled” by an administrator. Cables have a “state” (enabled or disabled), but not a “status.” If a cable is disabled, it is as if it were unconfigured. Cables that are disabled cannot be used as transport paths. These cables are not probed and therefore their state is unknown. You can obtain the state of a cable by using the scconf -p command.

Transport paths are dynamically established by the cluster topology manager. The “status” of a transport path is determined by the topology manager. A path can have a status of “online” or “offline.” You can obtain the status of a transport path by using the scstat(1M) command.

Consider the following example of a two-node cluster with four cables.


node1:adapter0      to switch1, port0
node1:adapter1      to switch2, port0
node2:adapter0      to switch1, port1
node2:adapter1      to switch2, port1

Two possible transport paths can be formed from these four cables.


node1:adapter0    to node2:adapter0
node1:adapter1    to node2:adapter1
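
You can list the configured cables and their states with the scconf command, and check the status of the transport paths that were built from them with the scstat command:


# scconf -p
# scstat -W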

Client Systems FAQs

Question:

Do I need to consider any special client needs or restrictions for use with a cluster?

Answer:

Client systems connect to the cluster as they would to any other server. In some instances, depending on the data service application, you might need to install client-side software or perform other configuration changes so that the client can connect to the data service application. See Chapter 1, Planning for Sun Cluster Data Services, in Sun Cluster Data Services Planning and Administration Guide for Solaris OS for more information about client-side configuration requirements.

Administrative Console FAQs

Question:

Does the Sun Cluster system require an administrative console?

Answer:

Yes.

Question:

Does the administrative console have to be dedicated to the cluster, or can it be used for other tasks?

Answer:

The Sun Cluster system does not require a dedicated administrative console, but using one provides these benefits:

Question:

Does the administrative console need to be located “close” to the cluster, for example, in the same room?

Answer:

Check with your hardware service provider. The provider might require that the console be located in close proximity to the cluster. No technical reason exists for the console to be located in the same room.

Question:

Can an administrative console serve more than one cluster, provided that any distance requirements are also met?

Answer:

Yes. You can control multiple clusters from a single administrative console. You can also share a single terminal concentrator between clusters.
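
For example, the cconsole utility in the Cluster Control Panel accepts a cluster name, so from one administrative console you can open console windows to the nodes of whichever cluster you name. The cluster names in this sketch are hypothetical.


# cconsole planets
# cconsole oceans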

Terminal Concentrator and System Service Processor FAQs

Question:

Does the Sun Cluster system require a terminal concentrator?

Answer:

No. Software releases starting with Sun Cluster 3.0 do not require a terminal concentrator. Unlike the Sun Cluster 2.2 product, which required a terminal concentrator for failure fencing, later products do not depend on the terminal concentrator.

Question:

I see that most Sun Cluster servers use a terminal concentrator, but the Sun Enterprise E10000 server does not. Why not?

Answer:

The terminal concentrator is effectively a serial-to-Ethernet converter for most servers. The terminal concentrator's console port is a serial port. The Sun Enterprise E10000 server does not have a serial console. The System Service Processor (SSP) is the console, either through an Ethernet or a JTAG port. For the Sun Enterprise E10000 server, you always use the SSP for consoles.

Question:

What are the benefits of using a terminal concentrator?

Answer:

Using a terminal concentrator provides console-level access to each node from a remote workstation anywhere on the network. This access is provided even when the node is at the OpenBoot PROM (OBP) on a SPARC based node or a boot subsystem on an x86 based node.
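
For example, with the terminal concentrator that Sun supports you can typically reach the console that is attached to a given serial port by connecting to TCP port 5000 plus the port number. The terminal concentrator name and port number in this sketch are hypothetical.


# telnet tc-name 5002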

Question:

If I use a terminal concentrator that Sun does not support, what do I need to know to qualify the one that I want to use?

Answer:

The main difference between the terminal concentrator that Sun supports and other console devices is that the Sun terminal concentrator has special firmware. This firmware prevents the terminal concentrator from sending a break to the console when it boots. If you have a console device that can send a break, or a signal that might be interpreted as a break to the console, the break shuts down the node.

Question:

Can I free a locked port on the terminal concentrator that Sun supports without rebooting it?

Answer:

Yes. Note the port number that needs to be reset and type the following commands:


telnet tc
Enter Annex port name or number: cli
annex: su -
annex# admin
admin : reset port_number
admin : quit
annex# hangup
#

Refer to the following manuals for more information about how to configure and administer the terminal concentrator that Sun supports.

Question:

What if the terminal concentrator itself fails? Must I have another one standing by?

Answer:

No. You do not lose any cluster availability if the terminal concentrator fails. You do lose the ability to connect to the node consoles until the concentrator is back in service.

Question:

If I do use a terminal concentrator, what about security?

Answer:

Generally, the terminal concentrator is attached to a small network that system administrators use, not a network that is used for other client access. You can control security by limiting access to that particular network.

Question:

SPARC: How do I use dynamic reconfiguration with a tape or disk drive?

Answer:

Perform the following steps:


Caution –

If the current primary node fails while you are performing the DR operation on a secondary node, cluster availability is impacted. The primary node has no place to fail over until a new secondary node is provided.
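
For instance, if the device group that contains the drive currently has its primary on the node where you plan to perform the DR operation, you can first switch the primary to another node with the scswitch command. The device group name nfs-dg and the node name phys-schost-2 in this sketch are hypothetical.


# scswitch -z -D nfs-dg -h phys-schost-2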