Sun Cluster Geographic Edition software protects applications from unexpected disruptions by using multiple clusters that are geographically separated. These clusters contain identical copies of the Sun Cluster Geographic Edition infrastructure, which manage replicated data between the clusters. Sun Cluster Geographic Edition software is a layered extension of the Sun Cluster software.
This chapter contains the following sections:
Familiarize yourself with the planning information in the Sun Cluster Geographic Edition Installation Guide and the Sun Cluster Geographic Edition Overview before beginning administration tasks. This guide contains the standard tasks that are used to administer and maintain the Sun Cluster Geographic Edition configurations.
For general Sun Cluster, data service, and hardware administration tasks, refer to the Sun Cluster documentation.
You can perform all administration tasks on a cluster that is running the Sun Cluster Geographic Edition software without causing any nodes or the cluster to fail. You can install, configure, start, use, stop, and uninstall the Sun Cluster Geographic Edition software on an operational cluster.
You might be required to take nodes or the cluster offline for preparatory actions, such as installing data replication software and performing Sun Cluster administrative tasks. Refer to the appropriate product documentation for administration restrictions.
You can perform administrative tasks on a cluster that is running Sun Cluster Geographic Edition software by using a graphical user interface (GUI) or the command-line interface (CLI).
The procedures in this guide describe how to perform administrative tasks by using the CLI.
Sun Cluster software supports Sun Cluster Manager, a GUI tool that you can use to perform various administrative tasks on your cluster. For specific information about how to use Sun Cluster Manager, see the Sun Cluster online help.
To administer Sun Cluster Geographic Edition software by using the Sun Cluster Manager – Geographic Edition GUI, ensure that the root passwords are the same on all nodes of both clusters in the partnership.
You can only use the GUI to administer Sun Cluster Geographic Edition software after the software infrastructure has been enabled by using the geoadm start command. Use a shell to run the geoadm start and geoadm stop commands. For information about enabling and disabling the Sun Cluster Geographic Edition infrastructure, see Chapter 3, Administering the Sun Cluster Geographic Edition Infrastructure.
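For example, enabling and checking the infrastructure from a cluster node might look like the following sketch (the node prompt and host name are illustrative):

```
# Enable the Sun Cluster Geographic Edition infrastructure on the
# local cluster (run on one node of each partner cluster):
phys-paris-1# geoadm start

# Display the runtime status of the local cluster:
phys-paris-1# geoadm status

# Disable the infrastructure when required:
phys-paris-1# geoadm stop
```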
The GUI does not support creating custom heartbeats outside of a partnership. If you want to specify a custom heartbeat in a partnership join operation, use the CLI to run the geops join-partnership command.
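A join operation that specifies a custom heartbeat might look like the following sketch, where cluster-paris is the remote partner cluster, paris-newyork-ps is the partnership, and paris-newyork-hb is a previously created custom heartbeat (all names are hypothetical, and exact options can vary by release):

```
# On a node of the cluster that is joining the partnership:
phys-newyork-1# geops join-partnership -h paris-newyork-hb \
cluster-paris paris-newyork-ps
```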
RBAC is not supported in the GUI.
Table 1–1 lists the commands that you can use to administer the Sun Cluster Geographic Edition software. For more information about each command, refer to the Sun Cluster Geographic Edition Reference Manual.

Table 1–1 Sun Cluster Geographic Edition CLI

geoadm    Enables or disables the Sun Cluster Geographic Edition software on the local cluster and displays the runtime status of the local cluster
geohb     Configures and manages the heartbeat mechanism that is provided with the Sun Cluster Geographic Edition software
geops     Creates and manages the partnerships between clusters
geopg     Configures and manages protection groups
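As a quick orientation, each utility provides read-only subcommands for inspecting the current configuration. The following sketch assumes that the infrastructure is already enabled; subcommand names can vary by release:

```
phys-paris-1# geoadm status    # runtime status of the local cluster
phys-paris-1# geohb list       # configured heartbeats
phys-paris-1# geops list       # configured partnerships
phys-paris-1# geopg list       # configured protection groups
```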
This section provides an example of a disaster recovery scenario and actions an administrator might perform.
Company X has two geographically separated clusters, cluster-paris in Paris and cluster-newyork in New York. These clusters are configured as partner clusters. The cluster in Paris is configured as the primary cluster, and the cluster in New York as the secondary.
The cluster-paris cluster fails temporarily as a result of power outages during a windstorm. An administrator can expect the following events:
The heartbeat communication is lost between cluster-paris and cluster-newyork. Because heartbeat notification was configured during the creation of the partnership, a heartbeat-loss notification email is sent to the administrator.
For information about configuring partnerships and heartbeat notification, see Creating and Modifying a Partnership.
The administrator receives the notification email and follows the company procedure to verify that the disconnect occurred because of a situation that requires a takeover by the secondary cluster. Because a takeover might take a long time, depending on the requirements of the applications being protected, Company X does not allow takeovers unless the primary cluster cannot be repaired within two hours.
For information about verifying a disconnect on a system, see one of the following data replication guides:
Because the cluster-paris cluster cannot be brought online again for at least another day, the administrator runs the geopg takeover command on a node of the cluster in New York. This command starts the protection group on the secondary cluster, cluster-newyork.
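Such a takeover might look like the following sketch, where sales-pg is a hypothetical protection group name:

```
# On a node of the secondary cluster in New York, make this cluster
# the new primary for the protection group:
phys-newyork-1# geopg takeover sales-pg
```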
For information about performing a takeover on a system, see one of the following data replication guides:
After the takeover, the secondary cluster cluster-newyork becomes the new primary cluster. The failed cluster in Paris is still configured to be the primary cluster. Therefore, when the cluster-paris cluster restarts, it detects that it was down and lost contact with the partner cluster. Then, the cluster-paris cluster enters an error state that requires administrative action to clear. You might also be required to recover and resynchronize data on the cluster.
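Clearing the error state might involve resynchronizing the partnership and the protection group configuration from the recovered cluster, along the lines of the following sketch. The names are hypothetical, and the exact recovery steps depend on the data replication product in use:

```
# On a node of the recovered cluster in Paris:
phys-paris-1# geops update paris-newyork-ps   # resynchronize the partnership
phys-paris-1# geopg update sales-pg           # resynchronize the protection group
phys-paris-1# geopg validate sales-pg         # verify the configuration
```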
For information about recovering data after a takeover, see one of the following data replication guides:
This section describes the guidelines you must follow in creating applications to be managed by Sun Cluster Geographic Edition software.
Before you create an application to be managed by Sun Cluster Geographic Edition software, determine whether the application satisfies the following requirements for being made highly available or scalable.
If the application fails to meet all requirements, modify the application source code to make it highly available or scalable.
Both network-aware (client-server model) and network-unaware (client-less) applications are potential candidates for being made highly available or scalable in the Sun Cluster Geographic Edition environment. However, Sun Cluster Geographic Edition cannot provide enhanced availability in timesharing environments in which applications are run on a server that is accessed through telnet or rlogin.
The application must be crash tolerant. That is, it must recover disk data (if necessary) when it is started after an unexpected node death. Furthermore, the recovery time after a crash must be bounded. Crash tolerance is a prerequisite for making an application highly available because the ability to recover the disk and restart the application is a data integrity issue. The data service is not required to be able to recover connections.
The application must not depend on the physical host name of the node on which it is running.
The application must operate correctly in environments in which multiple IP addresses are configured to go up. Examples include environments with multihomed hosts, in which the node is located on more than one public network, and environments with nodes on which multiple logical interfaces are configured to go up on one hardware interface.
Application binaries and libraries can be located locally on each node or in the cluster file system. The advantage of being located in the cluster file system is that a single installation is sufficient. The disadvantage is that when you use rolling upgrade for Sun Cluster software, the binaries are in use while the application is running under the control of the Resource Group Manager (RGM).
The client must have the capacity to retry a query automatically if the first attempt times out. If the application and the protocol already handle the case of a single server crashing and rebooting, they can also handle the containing resource group failing over or switching over.
The application must not have UNIX® domain sockets or named pipes in the cluster file system.
A scalable service must meet all the preceding conditions for high availability as well as the following additional requirements.
The application must have the ability to run multiple instances, all operating on the same application data in the cluster file system.
The application must provide data consistency for simultaneous access from multiple nodes.
The application must implement sufficient locking with a globally visible mechanism, such as the cluster file system.
For a scalable service, application characteristics also determine the load-balancing policy. For example, the load-balancing policy Lb_weighted, which allows any instance to respond to client requests, does not work for an application that makes use of an in-memory cache on the server for client connections. In this case, you should specify a load-balancing policy that restricts a given client's traffic to one instance of the application. The load-balancing policies Lb_sticky and Lb_sticky_wild repeatedly send all requests by a client to the same application instance, where they can make use of the in-memory cache. Requests that arrive from different clients are still distributed by the RGM among the instances of the service.
See Chapter 2, Developing a Data Service, in Sun Cluster Data Services Developer’s Guide for Solaris OS for more information about setting the load-balancing policy for scalable data services.
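As a hypothetical illustration, a sticky policy is specified through the Load_balancing_policy resource property when the scalable resource is created. The resource, resource group, and shared address names below are examples, and the exact command and resource type depend on your Sun Cluster release:

```
phys-paris-1# scrgadm -a -j apache-rs -g apache-rg -t SUNW.apache \
-y Scalable=True -y Load_balancing_policy=Lb_sticky \
-y Network_resources_used=shared-addr-rs
```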
The application must be able to meet the following data replication requirements:
Replicated information must not be host-specific or cluster-specific.
When the application fails over to the remote site, the application might run on a host with a different IP address. To allow client nodes to find the remote site, use a Sun Cluster Geographic Edition action script to update the DNS/NIS mapping.
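An action script along these lines might update a DNS record by using the nsupdate utility. This is only a sketch under hypothetical names; the host name, address, and zone setup are examples, and a real script must follow the action script argument interface described in the Sun Cluster Geographic Edition documentation:

```
#!/bin/sh
# Hypothetical action script sketch: repoint the application's DNS
# record at the address that the service uses on the new primary site.
APP_FQDN=app.example.com
NEW_ADDR=192.0.2.10

/usr/sbin/nsupdate <<EOF
update delete ${APP_FQDN} A
update add ${APP_FQDN} 300 A ${NEW_ADDR}
send
EOF
```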
If your application cannot tolerate any data loss, it must use synchronous replication.