Sun Cluster Concepts Guide for Solaris OS

High Availability FAQs

Question:

What exactly is a highly available system?

Answer:

The Sun Cluster software defines high availability (HA) as the ability of a cluster to keep an application running, even when a failure occurs that would normally make a server system unavailable.

Question:

What is the process by which the cluster provides high availability?

Answer:

Through a process known as failover, the cluster framework provides a highly available environment. Failover is a series of steps that are performed by the cluster to migrate data service resources from a failing node or zone to another operational node or zone in the cluster.
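The failover idea described above can be illustrated with a small sketch. This is only a toy model, not Sun Cluster code: the `Node` class, the `fail_over` function, and the resource name `nfs-service` are hypothetical names chosen for illustration, and picking the first healthy node as the target is an assumption rather than the cluster framework's actual selection policy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node or zone."""
    name: str
    healthy: bool = True
    resources: list = field(default_factory=list)


def fail_over(nodes, failed_name):
    """Move all resources from the failed node to an operational node.

    Assumption: the first healthy node in the list becomes the target.
    """
    failed = next(n for n in nodes if n.name == failed_name)
    failed.healthy = False

    target = next((n for n in nodes if n.healthy), None)
    if target is None:
        raise RuntimeError("no operational node left in the cluster")

    # Migrate the data service resources, then clear them on the failed node.
    target.resources.extend(failed.resources)
    failed.resources = []
    return target


nodes = [Node("node1", resources=["nfs-service"]), Node("node2")]
new_primary = fail_over(nodes, "node1")
print(new_primary.name, new_primary.resources)
```

In this sketch the application is represented only by its resource name. A real failover also involves steps such as stopping and restarting the application and moving network addresses, which are omitted here.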

Question:

What is the difference between a failover data service and a scalable data service?

Answer:

There are two types of highly available data services: failover and scalable.

A failover data service runs an application on only one primary node or zone in the cluster at a time. Other nodes or zones might run other applications, but each application runs on only a single node or zone. If a primary node or zone fails, the applications that were running on it fail over to another node or zone and continue running.

A scalable data service spreads an application across multiple nodes or zones to create a single, logical service. A scalable data service that uses a shared address to balance the service load between nodes can be online in only one zone per physical node. Scalable services take advantage of all the nodes or zones and processors in the cluster on which they run.

For each application, one node hosts the physical interface to the cluster. This node is called a Global Interface (GIF) node. Multiple GIF nodes can exist in the cluster. Each GIF node hosts one or more logical interfaces that can be used by scalable services. These logical interfaces are called global interfaces. One GIF node hosts a global interface for all requests for a particular application and dispatches them to multiple nodes on which the application server is running. If the GIF node fails, the global interface fails over to a surviving node.

If any node or zone on which the application is running fails, the application continues to run on the other nodes or zones with some performance degradation. This degraded operation continues until the failed node or zone returns to the cluster.
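The behavior of a scalable service can be sketched in the same spirit. The `ScalableService` class and its methods are hypothetical names used only for illustration. Round-robin dispatch is an assumption, since the text above does not specify the load-balancing policy, and the sketch further assumes that the GIF node is one of the nodes running the application.

```python
class ScalableService:
    """Toy model of a scalable data service behind one global interface."""

    def __init__(self, app_nodes, gif_node):
        self.app_nodes = list(app_nodes)  # nodes running the application
        self.gif_node = gif_node          # node hosting the global interface
        self._next = 0                    # round-robin counter (assumed policy)

    def dispatch(self, request):
        """Return the node chosen to serve the request."""
        if not self.app_nodes:
            raise RuntimeError("no node is running the application")
        node = self.app_nodes[self._next % len(self.app_nodes)]
        self._next += 1
        return node

    def node_failed(self, name):
        """Drop a failed node; move the global interface if it was the GIF node."""
        if name in self.app_nodes:
            self.app_nodes.remove(name)
        if name == self.gif_node:
            if not self.app_nodes:
                raise RuntimeError("no surviving node can host the global interface")
            # Assumption: the first surviving node takes over the global interface.
            self.gif_node = self.app_nodes[0]


svc = ScalableService(["node1", "node2", "node3"], gif_node="node1")
served_before = [svc.dispatch(i) for i in range(3)]

svc.node_failed("node1")  # the GIF node fails
served_after = [svc.dispatch(i) for i in range(4)]
print(svc.gif_node, served_before, served_after)
```

After `node1` fails, requests are still served, but by two nodes instead of three. This corresponds to the performance degradation that lasts until the failed node returns to the cluster.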