This chapter includes answers to the most frequently asked questions about the SunPlex system. The questions are organized by topic.
What exactly is a highly available system?
The SunPlex system defines high availability (HA) as the ability of a cluster to keep an application up and running, even though a failure has occurred that would normally make a server system unavailable.
What is the process by which the cluster provides high availability?
Through a process known as failover, the cluster framework provides a highly available environment. Failover is a series of steps performed by the cluster to migrate data service resources from a failing node to another operational node in the cluster.
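An administrator can also perform the same migration manually, which is often called a switchover. A minimal sketch, using hypothetical names for the resource group (nfs-rg) and the target node (phys-schost-2):

# scswitch -z -g nfs-rg -h phys-schost-2

During an actual failover, the cluster framework performs the equivalent migration automatically when it detects the node failure.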
What is the difference between a failover and scalable data service?
There are two types of highly available data services, failover and scalable.
A failover data service runs an application on only one primary node in the cluster at a time. Other nodes might run other applications, but each application runs on only a single node. If a primary node fails, the applications running on the failed node fail over to another node and continue running.
A scalable service spreads an application across multiple nodes to create a single, logical service. Scalable services leverage the number of nodes and processors in the entire cluster on which they run.
For each application, one node hosts the physical interface to the cluster. This node is called a Global Interface (GIF) Node. There can be multiple GIF nodes in the cluster. Each GIF node hosts one or more logical interfaces that can be used by scalable services. These logical interfaces are called global interfaces. One GIF node hosts a global interface for all requests for a particular application and dispatches them to multiple nodes on which the application server is running. If the GIF node fails, the global interface fails over to a surviving node.
If any of the nodes on which the application is running fails, the application continues to run on the other nodes with some performance degradation until the failed node returns to the cluster.
Can I run one or more of the cluster nodes as highly available NFS server(s) with other cluster nodes as clients?
No. Do not do a loopback mount; a cluster node cannot be an NFS client of a file system that is exported from within the same cluster.
Can I use a cluster file system for applications that are not under Resource Group Manager control?
Yes. However, without RGM control, the applications need to be restarted manually after the failure of the node on which they are running.
Must all cluster file systems have a mount point under the /global directory?
No. However, placing cluster file systems under the same mount point, such as /global, enables better organization and management of these file systems.
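For example, a cluster file system is typically mounted through an /etc/vfstab entry that carries the global mount option, and its mount point can be placed under /global purely as an organizational convention. A minimal sketch, assuming a hypothetical Solaris Volume Manager metadevice in a disk set named oracle:

/dev/md/oracle/dsk/d1 /dev/md/oracle/rdsk/d1 /global/oracle ufs 2 yes global,logging

It is the global mount option, not the /global path, that makes the file system a cluster file system.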
What are the differences between using the cluster file system and exporting NFS file systems?
There are several differences:
The cluster file system supports global devices. NFS does not support remote access to devices.
The cluster file system has a global namespace. Only one mount command is required (see the example after this list). With NFS, you must mount the file system on each node.
The cluster file system caches files in more cases than NFS does. For example, the cluster file system caches a file when it is being accessed from multiple nodes for reads, writes, file locks, or asynchronous I/O.
The cluster file system is built to exploit future fast cluster interconnects that provide remote DMA and zero-copy functions.
If you change the attributes of a file (using chmod(1), for example) in a cluster file system, the change is reflected immediately on all nodes. With an exported NFS file system, the change can take much longer to appear.
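To illustrate the single-mount difference noted in the list, the following sketch uses hypothetical device and path names. A cluster file system is mounted once and becomes visible on every node, whereas an NFS file system must be mounted separately on each node:

# mount -o global,logging /dev/global/dsk/d11s0 /global/data
# mount -F nfs server:/export/data /mnt/data

The first command is run on one node only; the second must be repeated on every node that needs access.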
The file system /global/.devices/node@<nodeID> appears on my cluster nodes. Can I use this file system to store data that I want to be highly available and global?
These file systems store the global device namespace. They are not intended for general use. Although they are global, these file systems are never accessed in a global manner; each node accesses only its own global device namespace. If a node is down, other nodes cannot access the namespace of the node that is down. These file systems are not highly available and should not be used to store data that must be globally accessible or highly available.
Do I need to mirror all disk devices?
For a disk device to be considered highly available, it must be mirrored, or use RAID-5 hardware. All data services should use either highly available disk devices, or cluster file systems mounted on highly available disk devices. Such configurations can tolerate single disk failures.
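As an illustration of host-based mirroring, a minimal Solaris Volume Manager sketch (assuming hypothetical disks c1t0d0 and c2t0d0 that sit on separate controllers) might look like the following:

# metainit d11 1 1 c1t0d0s0
# metainit d12 1 1 c2t0d0s0
# metainit d10 -m d11
# metattach d10 d12

The first two commands create the submirrors, the third creates the mirror d10 from one submirror, and the last attaches the second submirror. A hardware RAID-5 array provides equivalent protection without host-based mirroring.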
Can I use one volume manager for the local disks (boot disk) and a different volume manager for the multihost disks?
SPARC: This configuration is supported with the Solaris Volume Manager software managing the local disks and VERITAS Volume Manager managing the multihost disks. No other combination is supported.
x86: No, this configuration is not supported, as only Solaris Volume Manager is supported in x86 based clusters.
What SunPlex data services are available?
The list of supported data services is included in “Supported Products” in Sun Cluster 3.1 9/04 Release Notes for Solaris OS.
What application versions are supported by SunPlex data services?
The list of supported application versions is included in “Supported Products” in Sun Cluster 3.1 9/04 Release Notes for Solaris OS.
Can I write my own data service?
Yes. See the “Data Service Development Library Reference” in Sun Cluster Data Services Developer's Guide for Solaris OS for more information.
When creating network resources, should I specify numeric IP addresses or hostnames?
The preferred method for specifying network resources is to use the UNIX hostname rather than the numeric IP address.
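For example, when you add a LogicalHostname resource you supply the hostname and let the system resolve it to an IP address. A minimal sketch, assuming a hypothetical resource group nfs-rg and hostname schost-1:

# scrgadm -a -L -g nfs-rg -l schost-1

The hostname must be resolvable (for example, through /etc/hosts) on all cluster nodes.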
When creating network resources, what is the difference between using a logical hostname (a LogicalHostname resource) or a shared address (a SharedAddress resource)?
Except in the case of Sun Cluster HA for NFS, wherever the documentation calls for the use of a LogicalHostname resource in a Failover mode resource group, a SharedAddress resource or LogicalHostname resource may be used interchangeably. The use of a SharedAddress resource incurs some additional overhead because the cluster networking software is configured for a SharedAddress but not for a LogicalHostname.
The advantage to using a SharedAddress is the case where you are configuring both scalable and failover data services, and want clients to be able to access both services using the same hostname. In this case, the SharedAddress resource(s) along with the failover application resource are contained in one resource group, while the scalable service resource is contained in a separate resource group and configured to use the SharedAddress. Both the scalable and failover services may then use the same set of hostnames/addresses which are configured in the SharedAddress resource.
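A sketch of such a configuration follows, using hypothetical names: a failover resource group sa-rg that holds the SharedAddress resource, a scalable resource group web-rg, and the hostname schost-web.

# scrgadm -a -g sa-rg
# scrgadm -a -S -g sa-rg -l schost-web
# scrgadm -a -g web-rg -y Maximum_primaries=4 -y Desired_primaries=4 -y RG_dependencies=sa-rg

The scalable data service resource created in web-rg would then reference schost-web through its Network_resources_used property, so both the failover and scalable services answer on the same address.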
What public network adapters does the SunPlex system support?
Currently, the SunPlex system supports Ethernet (10/100BASE-T and 1000BASE-SX Gb) public network adapters. Because new interfaces might be supported in the future, check with your Sun sales representative for the most current information.
What is the role of the MAC address in failover?
When a failover occurs, new Address Resolution Protocol (ARP) packets are generated and broadcast on the network. These ARP packets contain the new MAC address (of the new physical adapter to which the node failed over) and the old IP address. When another machine on the network receives one of these packets, it flushes the old MAC-to-IP mapping from its ARP cache and uses the new one.
Does the SunPlex system support setting local-mac-address?=true?
Yes. In fact, IP Network Multipathing requires that local-mac-address? be set to true.
You can set local-mac-address? with eeprom(1M), at the OpenBoot PROM ok prompt in a SPARC based cluster, or with the SCSI utility that you optionally run after the BIOS boots in an x86 based cluster.
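For example, on a SPARC based node you can check and set the variable from the running operating system (a minimal sketch; the same setting can be made with setenv at the OpenBoot PROM ok prompt):

# eeprom local-mac-address?
local-mac-address?=false
# eeprom local-mac-address?=true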
How much delay can I expect when IP Network Multipathing performs a switchover between adapters?
The delay could be several minutes. When an IP Network Multipathing switchover is performed, a gratuitous ARP is sent out. However, there is no guarantee that the router between the client and the cluster will honor the gratuitous ARP, so until the ARP cache entry for this IP address on the router times out, the router might continue to use the stale MAC address.
How fast are failures of a network adapter detected?
The default failure detection time is 10 seconds. The algorithm tries to meet the failure detection time, but the actual time depends on the network load.
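The failure detection time is tunable through the in.mpathd configuration file. A minimal sketch of the relevant line in /etc/default/mpathd, where the value is in milliseconds and 10000 is the shipped default:

FAILURE_DETECTION_TIME=10000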
Do all cluster members need to have the same root password?
You are not required to have the same root password on each cluster member. However, you can simplify administration of the cluster by using the same root password on all nodes.
Is the order in which nodes are booted significant?
In most cases, no. However, the boot order is important to prevent amnesia (refer to About Failure Fencing for details on amnesia). For example, if node two was the owner of the quorum device and node one is down, and you then bring node two down, you must bring node two back up before bringing node one back. This order prevents you from accidentally bringing up a node with out-of-date cluster configuration information.
Do I need to mirror local disks in a cluster node?
Yes. Although this mirroring is not a requirement, mirroring the cluster node's local disks prevents a non-mirrored disk failure from taking down the node. The downside to mirroring a cluster node's local disks is increased system administration overhead.
What are the cluster member backup issues?
You can use several backup methods for a cluster. One method is to dedicate a node as the backup node, with a tape drive or library attached, and then use the cluster file system to back up the data. Do not connect this node to the shared disks.
See “Backing Up and Restoring a Cluster” in the Sun Cluster System Administration Guide for Solaris OS for additional information about how to back up and restore data.
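For example, a level 0 dump of a cluster file system taken from the backup node might look like the following sketch, with a hypothetical tape device and mount point:

# ufsdump 0ucf /dev/rmt/0 /global/data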
When is a node healthy enough to be used as a secondary node?
After a reboot, a node is healthy enough to be a secondary node when the node displays the login prompt.
What makes multihost storage highly available?
Multihost storage is highly available because it can survive the loss of a single disk, due to mirroring (or due to hardware-based RAID-5 controllers). Because a multihost storage device has more than one host connection, it can also withstand the loss of a single node to which it is connected. In addition, redundant paths from each node to the attached storage provide tolerance for the failure of a host bus adapter, cable, or disk controller.
What cluster interconnects does the SunPlex system support?
Currently, the SunPlex system supports Ethernet (100BASE-T Fast Ethernet and 1000BASE-SX Gb) cluster interconnects in both SPARC based and x86 based clusters. The SunPlex system supports the SCI network interface cluster interconnect in SPARC based clusters only.
What is the difference between a “cable” and a transport “path?”
Cluster transport cables are configured using transport adapters and switches. Cables join adapters and switches on a component-to-component basis. The cluster topology manager uses available cables to build end-to-end transport paths between nodes. A cable does not map directly to a transport path.
Cables are statically “enabled” and “disabled” by an administrator. Cables have a “state,” (enabled or disabled) but not a “status.” If a cable is disabled, it is as if it were unconfigured. Cables that are disabled cannot be used as transport paths. They are not probed and therefore, it is not possible to know their status. The state of a cable can be viewed using scconf -p.
Transport paths are dynamically established by the cluster topology manager. The “status” of a transport path is determined by the topology manager. A path can have a status of “online” or “offline.” The status of a transport path can be viewed using scstat(1M).
Consider the following example of a two-node cluster with four cables.
node1:adapter0 to switch1, port0
node1:adapter1 to switch2, port0
node2:adapter0 to switch1, port1
node2:adapter1 to switch2, port1
There are two possible transport paths that can be formed from these four cables.
node1:adapter0 to node2:adapter0
node1:adapter1 to node2:adapter1
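You can compare the two views of the interconnect with the commands mentioned above: scconf -p lists each cable with its enabled or disabled state, and scstat -W reports the online or offline status of each transport path. For example:

# scconf -p | grep -i cable
# scstat -W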
Do I need to consider any special client needs or restrictions for use with a cluster?
Client systems connect to the cluster as they would any other server. In some instances, depending on the data service application, you might need to install client-side software or perform other configuration changes so that the client can connect to the data service application. See individual chapters in Sun Cluster Data Services Planning and Administration Guide for more information on client-side configuration requirements.
Does the SunPlex system require an administrative console?
Yes.
Does the administrative console have to be dedicated to the cluster, or can it be used for other tasks?
The SunPlex system does not require a dedicated administrative console, but using one provides these benefits:
Enables centralized cluster management by grouping console and management tools on the same machine
Provides potentially quicker problem resolution by your hardware service provider
Does the administrative console need to be located “close” to the cluster itself, for example, in the same room?
Check with your hardware service provider. The provider might require that the console be located in close proximity to the cluster itself. No technical reason exists for the console to be located in the same room.
Can an administrative console serve more than one cluster, as long as any distance requirements are also first met?
Yes. You can control multiple clusters from a single administrative console. You can also share a single terminal concentrator between clusters.
Does the SunPlex system require a terminal concentrator?
No. Software releases starting with Sun Cluster 3.0 do not require a terminal concentrator to run. Unlike the Sun Cluster 2.2 product, which required a terminal concentrator for failure fencing, later products do not depend on the terminal concentrator.
I see that most SunPlex servers use a terminal concentrator, but the Sun Enterprise E10000 server does not. Why is that?
The terminal concentrator is effectively a serial-to-Ethernet converter for most servers. Its console port is a serial port. The Sun Enterprise E10000 server does not have a serial console. The System Service Processor (SSP) is the console, accessed either through an Ethernet or the JTAG port. For the Sun Enterprise E10000 server, you always use the SSP for consoles.
What are the benefits of using a terminal concentrator?
Using a terminal concentrator provides console-level access to each node from a remote workstation anywhere on the network, including when the node is at the OpenBoot PROM (OBP) on a SPARC based node or a boot subsystem on an x86 based node.
If I use a terminal concentrator not supported by Sun, what do I need to know to qualify the one that I want to use?
The main difference between the terminal concentrator supported by Sun and other console devices is that the Sun terminal concentrator has special firmware that prevents it from sending a break to the console when it boots. Note that if you have a console device that can send a break, or a signal that might be interpreted as a break, to the console, the break shuts down the node.
Can I free a locked port on the terminal concentrator supported by Sun without rebooting it?
Yes. Note the port number that needs to be reset and type the following commands:
telnet tc
Enter Annex port name or number: cli
annex: su -
annex# admin
admin : reset port_number
admin : quit
annex# hangup
#
Refer to the following manuals for more information about how to configure and administer the terminal concentrator supported by Sun.
What if the terminal concentrator itself fails? Must I have another one standing by?
No. You do not lose any cluster availability if the terminal concentrator fails. You do lose the ability to connect to the node consoles until the concentrator is back in service.
If I do use a terminal concentrator, what about security?
Generally, the terminal concentrator is attached to a small network used by system administrators, not a network that is used for other client access. You can control security by limiting access to that particular network.
SPARC: How do I use dynamic reconfiguration with a tape or disk drive?
Determine whether the disk or tape drive is part of an active device group. If the drive is not part of an active device group, you can perform the DR remove operation on it.
If the DR remove-board operation would affect an active disk or tape drive, the system rejects the operation and identifies the affected drives. If the drive is part of an active device group, go to SPARC: DR Clustering Considerations for Disk and Tape Drives.
Determine whether the drive is a component of the primary node or the secondary node. If the drive is a component of the secondary node, you can perform the DR remove operation on it.
If the drive is a component of the primary node, you must switch the primary and secondary nodes before performing the DR remove operation on the device.
If the current primary node fails while you are performing the DR operation on a secondary node, cluster availability is impacted. The primary node has no place to fail over until a new secondary node is provided.
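As a sketch of those checks, with hypothetical device group and node names: scstat -D reports each device group along with its primary and secondary nodes, and scswitch can move the primary role to another node before you perform the DR remove operation.

# scstat -D
# scswitch -z -D oracle-dg -h phys-schost-2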