This chapter includes answers to the most frequently asked questions about Sun Cluster. The questions are organized by topic.
What exactly is a highly available system?
Sun Cluster defines high availability (HA) as the ability of a cluster to keep an application up and running, even though a failure has occurred that would normally make a server system unavailable.
What is the process by which the cluster provides high availability?
Through a process known as failover, the cluster framework provides a highly available environment. Failover is a series of steps performed by the cluster to migrate an application from a failing node to another operational node in the cluster.
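The migration described above can be sketched in a few lines of Python. This is a conceptual model only, with hypothetical names; it is not the actual Resource Group Manager implementation.

```python
# Conceptual sketch of failover, not the actual RGM implementation:
# a resource group migrates from its failed primary to the next
# healthy node in its configured node list.

def fail_over(resource_group, node_list, healthy):
    """Return the new primary node, or None if no healthy node is left."""
    for node in node_list:
        if node != resource_group["primary"] and healthy[node]:
            # A real failover runs a series of steps here: stop the
            # application, release storage and the logical IP address
            # on the old node, then acquire them and restart the
            # application on the new node.
            resource_group["primary"] = node
            return node
    return None

rg = {"name": "nfs-rg", "primary": "node1"}
new_primary = fail_over(rg, ["node1", "node2"], {"node1": False, "node2": True})
```

Here node1 has failed, so the resource group moves to node2, the next operational node in its list.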
What is the difference between an HA and scalable service?
An HA service means that an application runs on only one primary node in the cluster at a time. Other nodes might run other applications, but each application runs on only a single node. If a primary node fails, the applications running on the failed node fail over to another node and continue running.
A scalable service spreads an application across multiple nodes to create a single, logical service. Scalable services leverage the number of nodes and processors in the entire cluster on which they run. One node receives all application requests and dispatches them to multiple nodes on which the application server is running. If this node fails (it is called the Global Interface Node or GIF), the global interface fails over to a surviving node. If any of the nodes on which the application is running fails, the application continues to run on the other nodes with some performance degradation until the failed node returns to the cluster.
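The Global Interface Node's role can be modeled as a simple request dispatcher. The sketch below is illustrative only (hypothetical class and method names, round-robin chosen for simplicity); the cluster's actual load-balancing policy is more sophisticated.

```python
# Toy model of the Global Interface (GIF) node: it receives all
# application requests and fans them out to the nodes running
# application instances. Not the actual cluster networking code.
import itertools

class GlobalInterface:
    def __init__(self, app_nodes):
        self.app_nodes = list(app_nodes)
        self._rr = itertools.cycle(self.app_nodes)

    def dispatch(self, request):
        """Send a request to the next application node, round robin."""
        if not self.app_nodes:
            raise RuntimeError("no application nodes available")
        return next(self._rr)

    def node_failed(self, node):
        # Surviving instances keep serving, at reduced capacity,
        # until the failed node rejoins the cluster.
        self.app_nodes.remove(node)
        self._rr = itertools.cycle(self.app_nodes)

gif = GlobalInterface(["node1", "node2", "node3"])
```

If node2 fails, `node_failed("node2")` removes it from rotation and requests continue to flow to node1 and node3.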
Can I run one or more of the cluster nodes as highly available NFS server(s) with other cluster nodes as clients?
No. Local locking interferes with the ability to kill and restart lockd (which occurs during NFS failover). Between the kill and the restart, a blocked local process can be granted the lock, which prevents the client system that owns the lock from reclaiming it after failover.
Can I use a cluster file system for applications that are not under Resource Group Manager control?
Yes. However, without RGM control, the applications cannot survive the failure of the node on which they are running.
Must all cluster file systems have a mount point under the /global/device-group directory?
No. However, placing cluster file systems under the same mount point, such as /global/device-group, enables better organization and management of these file systems.
What are the differences between using the cluster file system and exporting NFS file systems?
There are several differences:
The cluster file system supports global devices. NFS does not support remote access to devices.
The cluster file system has a global namespace. Only one mount command is required. With NFS, you must mount the file system on each node.
The cluster file system caches files in more cases than does NFS. For example, the cluster file system caches a file when the file is being accessed from multiple nodes for read, write, file locking, or asynchronous I/O.
The cluster file system supports seamless failover if one server fails. NFS supports multiple servers, but failover is only possible for read-only file systems.
The cluster file system is built to exploit future fast cluster interconnects that provide remote DMA and zero-copy functions.
If you change the attributes of a file (using chmod(1), for example) in a cluster file system, the change is reflected immediately on all nodes. With an exported NFS file system, the change can take much longer to appear.
Do I need to mirror all disk devices?
For a disk device to be considered highly available, it must be mirrored, or use RAID-5 hardware. All data services should use either highly available disk devices, or cluster file systems mounted on highly available disk devices. Such configurations can tolerate single disk failures.
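The reason a mirrored or RAID-5 configuration tolerates a single disk failure can be shown with a small example: RAID-5 stores a parity block that is the XOR of the data blocks, so any one missing block can be recomputed from the rest. This is illustrative only; a real array does this per stripe in the controller.

```python
# Why RAID-5 survives a single disk failure: the parity block is the
# XOR of the data blocks, so any one lost block can be rebuilt from
# the surviving blocks plus parity. Illustrative sketch only.
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"disk0", b"disk1", b"disk2"]   # equal-sized blocks on three disks
parity = xor_blocks(data)               # stored on a fourth disk

# Disk 1 fails; rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Mirroring achieves the same single-failure tolerance more directly, by keeping a full second copy of every block.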
What Sun Cluster data services are available?
The list of supported data services is included in the Sun Cluster 3.0 Release Notes.
What application versions are supported by Sun Cluster data services?
The list of supported application versions is included in the Sun Cluster 3.0 Release Notes.
Can I write my own data service?
Yes. See the Sun Cluster 3.0 Data Services Developers' Guide and the Data Service Enabling Technologies documentation provided with the Data Service Development Library API for more information.
When creating network resources, should I specify numeric IP addresses or hostnames?
The preferred method for specifying network resources is to use the UNIX hostname rather than the numeric IP address.
When creating network resources, what is the difference between using a logical hostname (a LogicalHostname resource) or a shared address (a SharedAddress resource)?
Wherever the documentation calls for the use of a LogicalHostname resource in a Failover mode resource group, a SharedAddress resource or LogicalHostname resource may be used interchangeably. The use of a SharedAddress resource incurs some additional overhead because the cluster networking software is configured for a SharedAddress but not for a LogicalHostname.
The advantage of using a SharedAddress resource arises when you configure both scalable and failover data services and want clients to access both services using the same hostname. In this case, the SharedAddress resources and the failover application resource are contained in one resource group, while the scalable service resource is contained in a separate resource group and configured to use the SharedAddress resource. Both the scalable and failover services can then use the same set of hostnames and addresses that are configured in the SharedAddress resource.
What public network adapters does Sun Cluster support?
Currently, Sun Cluster supports Ethernet (10/100BASE-T and 1000BASE-SX Gb) public network adapters. Because new interfaces might be supported in the future, check with your Sun sales representative for the most current information.
What is the role of the MAC address in failover?
When a failover occurs, new Address Resolution Protocol (ARP) packets are generated and broadcast to the world. These ARP packets contain the new MAC address (of the new physical adapter to which the node failed over) and the old IP address. When another machine on the network receives one of these packets, it flushes the old MAC-IP mapping from its ARP cache and uses the new one.
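The ARP packet described above can be constructed by hand to show exactly which fields carry the new MAC address and the old IP address. The field layout follows RFC 826; this sketch only builds the bytes of a gratuitous ARP reply, it does not send anything on the network, and the MAC and IP values are made up for illustration.

```python
# Build the gratuitous ARP reply described above: it advertises the
# pairing of the NEW MAC address with the OLD IP address. Field
# layout per RFC 826. Constructs bytes only; sends nothing.
import socket
import struct

def gratuitous_arp(new_mac: bytes, ip: str) -> bytes:
    ip_bytes = socket.inet_aton(ip)
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                    # hardware type: Ethernet
        0x0800,               # protocol type: IPv4
        6, 4,                 # hardware/protocol address lengths
        2,                    # opcode: ARP reply
        new_mac, ip_bytes,    # sender: adapter that took over, old IP
        new_mac, ip_bytes,    # target: same pair, hence "gratuitous"
    )

pkt = gratuitous_arp(b"\x08\x00\x20\xaa\xbb\xcc", "192.168.1.10")
```

A machine receiving this packet replaces the stale MAC-to-IP mapping in its ARP cache with the sender fields shown here.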
Is Sun Cluster supported to set local-mac-address?=true in the OpenBoot PROM for a host adapter?
No, this variable is not supported.
Do all cluster members need to have the same root password?
You are not required to have the same root password on each cluster member. However, you can simplify administration of the cluster by using the same root password on all nodes.
Is the order in which nodes are booted significant?
In most cases, no. However, the boot order is important to prevent amnesia (refer to "Quorum and Quorum Devices" for details on amnesia). For example, if node two owned the quorum device while node one was down, and you then bring node two down, you must bring node two back up before node one. Otherwise, you could accidentally bring up a node with out-of-date cluster configuration information.
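The rule behind this boot order can be modeled simply: each node carries a copy of the cluster configuration with a generation number, and only a node holding the most recent copy should be allowed to form the cluster. This is a hypothetical model for illustration, not the actual cluster configuration repository implementation.

```python
# Why boot order prevents amnesia (illustrative model only): each
# node stores a configuration copy with a generation number, and a
# booting node should only form the cluster if its copy is current.

def may_form_cluster(node_config_gen, latest_known_gen):
    """A node may (re)form the cluster only with the newest config."""
    return node_config_gen >= latest_known_gen

# Node two was up last, so its copy reflects the latest changes.
node_one_gen, node_two_gen = 4, 5
latest = max(node_one_gen, node_two_gen)
```

In this model node two (generation 5) may boot first and form the cluster; node one's stale copy (generation 4) may not, which is exactly the accident the boot order avoids.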
Do I need to mirror local disks in a cluster node?
Yes. Though mirroring is not a requirement, mirroring the cluster node's local disks prevents a single disk failure from taking down the node. The downside to mirroring a cluster node's local disks is increased system administration overhead.
What are the cluster member backup issues?
You can use several backup methods for a cluster. One method is to have a node as the backup node with a tape drive/library attached. Then use the cluster file system to back up the data. Do not connect this node to the shared disks.
See the Sun Cluster 3.0 System Administration Guide for additional information on backup and restore procedures.
What makes multihost storage highly available?
Multihost storage is highly available because it can survive the loss of a single disk due to mirroring (or due to hardware-based RAID-5 controllers). Because a multihost storage device has more than one host connection, it can also withstand the loss of a single node to which it is connected.
What multihost storage configurations are supported?
Currently, greater than two-node connectivity is not supported. All multihosted disks within a single enclosure must connect to the same two nodes. Refer to "Sun Cluster Topologies" for more information.
Can I use disks configured for SCSI-3 PGR as global devices?
Currently, SCSI-3 PGR is not supported in Sun Cluster. Only SCSI-2 semantics are supported for global disk devices. Because SCSI-3 PGR is not supported, you must use the -R option to scdidadm(1M) to set the correct SCSI semantics on any SCSI-3 disks that you want to use as global devices in a cluster.
What cluster interconnects does Sun Cluster support?
Currently Sun Cluster supports Ethernet (100BASE-T Fast Ethernet and 1000BASE-SX Gb) cluster interconnects. Support is also planned for Scalable Coherent Interface (SCI).
Do I need to consider any special client needs or restrictions for use with a cluster?
Client systems connect to the cluster as they would any other server. In some instances, depending on the data service application, you might need to install client-side software or perform other configuration changes so that the client can connect to the data service application. See individual chapters in Sun Cluster 3.0 Data Services Installation and Configuration Guide for more information on client-side configuration requirements.
Does Sun Cluster require an administrative console?
Yes.
Does the administrative console have to be dedicated to the cluster, or can it be used for other tasks?
Sun Cluster does not require a dedicated administrative console, but using one provides these benefits:
Enables centralized cluster management by grouping console and management tools on the same machine
Provides potentially quicker problem resolution by your hardware service provider
Does the administrative console need to be located "close" to the cluster itself, for example, in the same room?
Check with your hardware service provider. The provider might require that the console be located in close proximity to the cluster itself. No technical reason exists for the console to be located in the same room.
Can an administrative console serve more than one cluster, as long as any distance requirements are also first met?
Yes. You can control multiple clusters from a single administrative console. You can also share a single terminal concentrator between clusters.
Does Sun Cluster require a terminal concentrator?
Sun Cluster 3.0 does not require a terminal concentrator to run. Unlike the Sun Cluster 2.2 product, which required a terminal concentrator for failure fencing, Sun Cluster 3.0 does not depend on the terminal concentrator.
I see that most Sun Cluster servers use a terminal concentrator, but the E10000 does not. Why is that?
The terminal concentrator is effectively a serial-to-Ethernet converter for most servers. Its console port is a serial port. The Sun Enterprise E10000 server doesn't have a serial console. The System Service Processor (SSP) is the console, either through an Ethernet or jtag port. For the Sun Enterprise E10000 server, you always use the SSP for consoles.
What are the benefits of using a terminal concentrator?
Using a terminal concentrator provides console-level access to each node from a remote workstation anywhere on the network, including when the node is at the OpenBoot PROM (OBP).
If I use a terminal concentrator not supported by Sun, what do I need to know to qualify the one that I want to use?
The main difference between the terminal concentrator supported by Sun and other console devices is that the Sun terminal concentrator has special firmware that prevents the terminal concentrator from sending a break to the console when it boots. Note that if you have a console device that can send a break, or a signal that might be interpreted as a break to the console, it shuts down the node.
Can I free a locked port on the terminal concentrator supported by Sun without rebooting it?
Yes. Note the port number that needs to be reset and do the following:
    telnet tc
    Enter Annex port name or number: cli
    annex: su -
    annex# admin
    admin : reset port_number
    admin : quit
    annex# hangup
    #
Refer to the Sun Cluster 3.0 System Administration Guide for more information about configuring and administering the terminal concentrator supported by Sun.
What if the terminal concentrator itself fails? Must I have another one standing by?
No. You do not lose any cluster availability if the terminal concentrator fails. You do lose the ability to connect to the node consoles until the concentrator is back in service.
If I do use a terminal concentrator, what about security?
Generally, the terminal concentrator is attached to a small network used by system administrators, not a network that is used for other client access. You can control security by limiting access to that particular network.