Sun Cluster 3.1 Data Service Planning and Administration Guide

Chapter 1 Planning for Sun Cluster Data Services

This chapter provides planning information and guidelines for installing and configuring Sun Cluster data services.

See the Sun Cluster 3.1 Concepts Guide document for conceptual information about data services, resource types, resources, and resource groups.

If your applications are not currently offered as Sun Cluster data services, see the Sun Cluster 3.1 Data Services Developer's Guide for information about how to make other applications highly available as Sun Cluster data services.

Sun Cluster Data Services Installation and Configuration Tasks

The following table lists the books that describe the installation and configuration of Sun Cluster data services.

Table 1–1 Task Map: Installing and Configuring Sun Cluster Data Services

Install and configure Sun Cluster HA for Oracle: Sun Cluster 3.1 Data Service for Oracle
Install and configure Sun Cluster HA for Sun Open Net Environment (Sun ONE) Web Server: Sun Cluster 3.1 Data Service for Sun ONE Web Server
Install and configure Sun Cluster HA for Sun ONE Directory Server: Sun Cluster 3.1 Data Service for Sun ONE Directory Server
Install and configure Sun Cluster HA for Apache: Sun Cluster 3.1 Data Service for Apache
Install and configure Sun Cluster HA for DNS: Sun Cluster 3.1 Data Service for Domain Name Service (DNS)
Install and configure Sun Cluster HA for NFS: Sun Cluster 3.1 Data Service for Network File System (NFS)
Install and configure Sun Cluster Support for Oracle Parallel Server/Real Application Clusters: Sun Cluster 3.1 Data Service for Oracle Parallel Server/Real Application Clusters
Install and configure Sun Cluster HA for SAP: Sun Cluster 3.1 Data Service for SAP
Install and configure Sun Cluster HA for Sybase ASE: Sun Cluster 3.1 Data Service for Sybase ASE
Install and configure Sun Cluster HA for BroadVision One-To-One Enterprise: Sun Cluster 3.1 Data Service for BroadVision One-To-One Enterprise
Install and configure Sun Cluster HA for NetBackup: Sun Cluster 3.1 Data Service for NetBackup
Install and configure Sun Cluster HA for SAP liveCache: Sun Cluster 3.1 Data Service for SAP liveCache
Install and configure Sun Cluster HA for Siebel: Sun Cluster 3.1 Data Service for Siebel

Configuration Guidelines for Sun Cluster Data Services

This section provides configuration guidelines for Sun Cluster data services.

Identifying Data Service Special Requirements

Identify requirements for all of the data services before you begin Solaris and Sun Cluster installation. Failure to do so might result in installation errors that require that you completely reinstall the Solaris and Sun Cluster software.

For example, the Oracle Parallel Fail Safe/Real Application Clusters Guard option of Sun Cluster Support for Oracle Parallel Server/Real Application Clusters has special requirements for the hostnames that you use in the cluster. Sun Cluster HA for SAP also has special requirements. You must accommodate these requirements before you install Sun Cluster software because you cannot change hostnames after you install Sun Cluster software.

Determining the Location of the Application Binaries

You can install the application software and application configuration files in one of the following locations: on the local disks of each cluster node, or on the cluster file system.

Verifying the nsswitch.conf File Contents

The nsswitch.conf file is the configuration file for name-service lookups. This file determines which databases within the Solaris environment to use for name-service lookups and the order in which to consult those databases.

Some data services require that you direct “group” lookups to “files” first. For these data services, change the “group” line in the nsswitch.conf file so that the “files” entry is listed first. See the chapter for the data service that you plan to configure to determine whether you need to change the “group” line.
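For example, the following hypothetical “group” entry lists “files” before the nis name service. The nis entry is only an illustration; your nsswitch.conf file might reference a different name service.

    group: files nis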

See the planning chapter in the Sun Cluster 3.1 Software Installation Guide for additional information on how to configure the nsswitch.conf file for the Sun Cluster environment.

Planning the Cluster File System Configuration

Depending on the data service, you might need to configure the cluster file system to meet Sun Cluster requirements. See the chapter for the data service that you plan to configure to determine whether any special considerations apply.

The resource type HAStoragePlus enables you to use a highly available local file system in a Sun Cluster environment configured for failover. See Enabling Highly Available Local File Systems for information on setting up the HAStoragePlus resource type.

See the planning chapter of the Sun Cluster 3.1 Software Installation Guide for information on how to create cluster file systems.

Relationship Between Resource Groups and Disk Device Groups

Sun Cluster uses the concept of node lists for disk device groups and resource groups. A node list is an ordered list of primary nodes, which are potential masters of the disk device group or resource group. Sun Cluster uses a failback policy to determine what happens when a node that has been down rejoins the cluster and the rejoining node appears earlier in the node list than the current primary node. If failback is set to True, the device group or resource group is switched off the current primary node and onto the rejoining node, which becomes the new primary.

To ensure high availability of a failover resource group, make the resource group's node list match the node list of associated disk device groups. For a scalable resource group, the resource group's node list cannot always match the device group's node list because, currently, a device group's node list must contain exactly two nodes. For a greater-than-two-node cluster, the node list for the scalable resource group can have more than two nodes.

For example, assume that you have a disk device group, disk-group-1, that has nodes phys-schost-1 and phys-schost-2 in its node list, with the failback policy set to Enabled. Assume that you also have a failover resource group, resource-group-1, which uses disk-group-1 to hold its application data. When you set up resource-group-1, also specify phys-schost-1 and phys-schost-2 for the resource group's node list, and set the failback policy to True.
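The following sketch shows one way that you might configure this example with the scconf and scrgadm commands. The group, node, and policy names come from the example above, and the commands assume that disk-group-1 has already been registered as a disk device group.

    # Set the failback policy on the existing disk device group
    scconf -c -D name=disk-group-1,failback=enabled

    # Create the failover resource group with a matching node list and failback policy
    scrgadm -a -g resource-group-1 -h phys-schost-1,phys-schost-2 -y Failback=True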

To ensure high availability of a scalable resource group, make the scalable resource group's node list a superset of the node list for the disk device group. Doing so ensures that the nodes that are directly connected to the disks are also nodes that can run the scalable resource group. The advantage is that, when at least one cluster node connected to the data is up, the scalable resource group runs on that same node, making the scalable services available also.

See the Sun Cluster 3.1 Software Installation Guide for information on how to set up disk device groups. See the Sun Cluster 3.1 Concepts Guide document for more details on the relationship between disk device groups and resource groups.

Understanding HAStorage and HAStoragePlus

The HAStorage and HAStoragePlus resource types can be used to coordinate the startup of resource groups with the disk device groups on which those resource groups depend and, when AffinityOn is set to True, to force a resource group and its disk device groups to be colocated on the same node.

In addition, HAStoragePlus is capable of mounting any global file system found to be in an unmounted state. See Planning the Cluster File System Configuration for more information.


Note –

If the device group is switched to another node while the HAStorage or HAStoragePlus resource is online, AffinityOn has no effect and the resource group does not migrate along with the device group. On the other hand, if the resource group is switched to another node, AffinityOn being set to True causes the device group to follow the resource group to the new node.


See Synchronizing the Startups Between Resource Groups and Disk Device Groups for information about the relationship between disk device groups and resource groups. The SUNW.HAStorage(5) and SUNW.HAStoragePlus(5) man pages provide additional details.

See Enabling Highly Available Local File Systems for procedures for mounting file systems such as VxFS in local mode. The SUNW.HAStoragePlus(5) man page provides additional details.
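The following sketch illustrates how you might register the HAStoragePlus resource type and create an HAStoragePlus resource that mounts a file system and enforces colocation with its device group. The resource group name, resource name, and mount point are placeholders; see Enabling Highly Available Local File Systems for the complete procedure.

    # Register the resource type (needed only once per cluster)
    scrgadm -a -t SUNW.HAStoragePlus

    # Create an HAStoragePlus resource; /global/app-data is a placeholder mount point
    scrgadm -a -j hasp-resource -g resource-group-1 -t SUNW.HAStoragePlus \
        -x FilesystemMountPoints=/global/app-data -x AffinityOn=True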

Determining Whether Your Data Service Requires HAStorage or HAStoragePlus

Choosing Between HAStorage and HAStoragePlus

To determine whether to create HAStorage or HAStoragePlus resources within a data service resource group, consider the following criteria.

Considerations

Use the information in this section to plan the installation and configuration of any data service, and to consider how your decisions affect that installation and configuration. For specific considerations that apply to your data service, see the chapter in this book that applies to your data service.

Node List Properties

You can specify the following three node lists when configuring data services.

  1. installed_nodes – A property of the resource type. This property is a list of the cluster node names on which the resource type is installed and enabled to run.

  2. nodelist – A property of a resource group that specifies a list of cluster node names on which the group can be brought online, in order of preference. These nodes are known as the potential primaries, or masters, of the resource group. For failover services, configure only one resource group node list. For scalable services, configure two resource groups, and therefore two node lists. One resource group and its node list identify the nodes on which the shared addresses are hosted. This group is a failover resource group on which the scalable resources depend. The other resource group and its node list identify the nodes on which the application resources are hosted. The application resources depend on the shared addresses. Therefore, the node list for the resource group that contains the shared addresses must be a superset of the node list for the application resources. For an example of how these two resource groups might be created, see the sketch after this list.

  3. auxnodelist – A property of a shared address resource. This property is a list of physical node IDs that identify cluster nodes that can host the shared address but never serve as primary in the case of failover. These nodes are mutually exclusive with the nodes identified in the node list of the resource group. This list pertains to scalable services only. See the scrgadm(1M) man page for details.
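For illustration, the following sketch shows how the two resource groups for a scalable service might be created with scrgadm. The node names, resource group names, shared address, and property values are placeholders only.

    # Failover resource group that hosts the shared address
    scrgadm -a -g sa-rg -h phys-schost-1,phys-schost-2,phys-schost-3
    # An auxnodelist, if needed, can be supplied with the -X option of scrgadm -a -S
    scrgadm -a -S -g sa-rg -l shared-addr-hostname

    # Scalable resource group for the application resources; it depends on sa-rg
    scrgadm -a -g app-rg -y Maximum_primaries=3 -y Desired_primaries=3 \
        -y RG_dependencies=sa-rg -h phys-schost-1,phys-schost-2,phys-schost-3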

Overview of the Installation and Configuration Process

Use the following procedures to install and configure data services.

Before you install and configure data services, see the Sun Cluster 3.1 Software Installation Guide, which includes procedures on how to install the data service software packages and how to configure the Internet Protocol (IP) Network Multipathing groups that the network resources use.


Note –

You can use SunPlex Manager to install and configure the following data services: Sun Cluster HA for Oracle, Sun Cluster HA for Sun ONE Web Server, Sun Cluster HA for Sun ONE Directory Server, Sun Cluster HA for Apache, Sun Cluster HA for DNS, and Sun Cluster HA for NFS. See the SunPlex Manager online help for more information.


Installation and Configuration Task Flow

The following table shows a task map of the procedures to install and configure a Sun Cluster failover data service.

Table 1–2 Task Map: Sun Cluster Data Service Installation and Configuration

Install the Solaris and Sun Cluster software: Sun Cluster 3.1 Software Installation Guide
Set up IP Network Multipathing groups: Sun Cluster 3.1 Software Installation Guide
Set up multihost disks: Sun Cluster 3.1 Software Installation Guide
Plan resources and resource groups: Sun Cluster 3.1 Release Notes
Decide the location for application binaries, and configure the nsswitch.conf file: Chapter 1, Planning for Sun Cluster Data Services
Install and configure the application software: the chapter for each data service in this book
Install the data service software packages: Sun Cluster 3.1 Software Installation Guide or the chapter for each data service in this book
Register and configure the data service: the chapter for each data service in this book

Example

The example in this section shows how you might set up the resource types, resources, and resource groups for an Oracle application that has been instrumented to be a highly available failover data service.

The main difference between this example and an example of a scalable data service is that, in addition to the failover resource group that contains the network resources, a scalable data service requires a separate resource group (called a scalable resource group) for the application resources.

The Oracle application has two components, a server and a listener. Sun supplies the Sun Cluster HA for Oracle data service, and therefore these components have already been mapped into Sun Cluster resource types. Both of these resource types are associated with resources and resource groups.

Because this example is a failover data service, the example uses logical hostname network resources, which are the IP addresses that fail over from a primary node to a secondary node. Place the logical hostname resources into a failover resource group, and then place the Oracle server resources and listener resources into the same resource group. This grouping enables all of the resources to fail over together.

For Sun Cluster HA for Oracle to run on the cluster, you must define the following objects: a failover resource group that contains the application and network resources, the logical hostname resources, the Oracle server and listener resources, and the resource types for those resources.
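The following command sequence sketches how such a configuration might be created with scrgadm and scswitch. The resource group name, resource names, logical hostname, and Oracle extension property values are placeholders, and the set of extension properties that the data service actually requires is larger than what is shown here; see Sun Cluster 3.1 Data Service for Oracle for the real registration and configuration procedure.

    # Register the Oracle server and listener resource types
    scrgadm -a -t SUNW.oracle_server
    scrgadm -a -t SUNW.oracle_listener

    # Create the failover resource group and add the logical hostname resource
    scrgadm -a -g oracle-rg -h phys-schost-1,phys-schost-2
    scrgadm -a -L -g oracle-rg -l oracle-lh

    # Add the Oracle server and listener resources (property values are placeholders)
    scrgadm -a -j oracle-server-res -g oracle-rg -t SUNW.oracle_server \
        -x ORACLE_SID=ora1 -x ORACLE_HOME=/oracle
    scrgadm -a -j oracle-listener-res -g oracle-rg -t SUNW.oracle_listener \
        -x ORACLE_HOME=/oracle -x LISTENER_NAME=LISTENER

    # Bring the resource group, and therefore all of its resources, online
    scswitch -Z -g oracle-rg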

Tools for Data Service Resource Administration

This section describes the tools that you can use to perform installation and configuration tasks.

The SunPlex Manager Graphical User Interface (GUI)

SunPlex Manager is a web-based tool that enables you to perform the following tasks.

See the Sun Cluster 3.1 Software Installation Guide for instructions on how to use SunPlex Manager to install cluster software. SunPlex Manager provides online help for most administrative tasks.

The Sun Cluster Module for the Sun Management Center GUI

The Sun Cluster module enables you to monitor clusters and to perform some operations on resources and resource groups from the Sun Management Center GUI. See the Sun Cluster 3.1 Software Installation Guide for information about installation requirements and procedures for the Sun Cluster module. Go to http://docs.sun.com to access the Sun Management Center software documentation set, which provides additional information about Sun Management Center.

The scsetup Utility

The scsetup(1M) utility is a menu-driven interface that you can use for general Sun Cluster administration. You can also use this utility to configure data service resources and resource groups. Select option 2 from the scsetup main menu to launch the Resource Group Manager submenu.

The scrgadm Command

You can use the scrgadm command to register and configure data service resources. See the procedure on how to register and configure your data service in the applicable chapter of this book. If, for example, you use Sun Cluster HA for Oracle, see “Installing and Configuring Sun Cluster HA for Oracle” in Sun Cluster 3.1 Data Service for Oracle. Chapter 2, Administering Data Service Resources also contains information on how to use the scrgadm command to administer data service resources. Finally, see the scrgadm(1M) man page for additional information.
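For example, to list the resource types, resource groups, and resources that are currently configured, you can run the scrgadm command in display mode:

    # Display configuration information; add v options for progressively more detail
    scrgadm -p
    scrgadm -pvv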

Data Service Resource Administration Tasks

The following table lists which tool you can use in addition to the command line for different data service resource administration tasks. See Chapter 2, Administering Data Service Resources for more information about these tasks and for details on how to use the command line to complete related procedures.

Table 1–3 Tools You Can Use for Data Service Resource Administration Tasks

Register a resource type: SunPlex Manager, the scsetup utility
Create a resource group: SunPlex Manager, the scsetup utility
Add a resource to a resource group: SunPlex Manager, the scsetup utility
Bring a resource group online: SunPlex Manager, Sun Management Center
Remove a resource group: SunPlex Manager, Sun Management Center
Remove a resource: SunPlex Manager, Sun Management Center
Switch the current primary of a resource group: SunPlex Manager
Disable a resource: SunPlex Manager, Sun Management Center
Move the resource group of a disabled resource into the unmanaged state: SunPlex Manager
Display resource type, resource group, and resource configuration information: SunPlex Manager, Sun Management Center
Change resource properties: SunPlex Manager
Clear the STOP_FAILED error flag on resources: SunPlex Manager
Add a node to a resource group: SunPlex Manager

Sun Cluster Data Service Fault Monitors

This section provides general information about data service fault monitors. The Sun-supplied data services contain fault monitors that are built into the package. The fault monitor (or fault probe) is a process that probes the health of the data service.

Fault Monitor Invocation

The RGM starts the fault monitor when you bring a resource group and its resources online. The RGM does so by internally calling the MONITOR_START method for the data service.
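You do not start the fault monitor directly. For example, when you bring a resource group online with scswitch, the fault monitor starts for each enabled resource in the group. You can also disable and re-enable monitoring for an individual resource, as the following sketch shows; the group and resource names are placeholders.

    # Bring the resource group online; fault monitors start automatically
    scswitch -Z -g resource-group-1

    # Disable, and later re-enable, only the fault monitor for one resource
    scswitch -n -M -j app-server-res
    scswitch -e -M -j app-server-res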

The fault monitor performs the following two functions: monitoring for the abnormal exit of the server process, and checking the health of the data service.

Monitoring of the Abnormal Exit of the Server Process

The Process Monitor Facility (PMF) monitors the data service processes.

The data service fault probe runs in an infinite loop and sleeps for an adjustable amount of time that the resource property Thorough_probe_interval sets. While sleeping, the probe checks with the PMF to determine whether the process has exited. If the process has exited, the probe sets the status of the data service to “Service daemon not running” and takes action. The action can involve restarting the data service locally or failing over the data service to a secondary cluster node. To decide whether to restart or to fail over the data service, the probe checks the values that are set in the resource properties Retry_count and Retry_interval for the data service application resource.
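For example, you might tune how aggressively the probe restarts or fails over a resource by changing these properties with scrgadm. The resource name and the values shown are placeholders only.

    # Probe every 60 seconds; allow 2 restarts within a 600-second history window
    scrgadm -c -j app-server-res -y Thorough_probe_interval=60 \
        -y Retry_count=2 -y Retry_interval=600

    # Probe_timeout is an extension property, so it is set with -x
    scrgadm -c -j app-server-res -x Probe_timeout=120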

Checking the Health of the Data Service

Typically, communication between the probe and the data service occurs through a dedicated command or a successful connection to the specified data service port.

The logic that the probe uses is roughly as follows.

  1. Sleep (Thorough_probe_interval).

  2. Perform health checks, bounded by the time-out value that the property Probe_timeout specifies. Probe_timeout is a resource extension property of each data service that you can set.

  3. If the health checks in Step 2 succeed, that is, the service is healthy, the probe updates the success/failure history. To update the success/failure history, the probe purges any history records that are older than the value that is set for the resource property Retry_interval. The probe sets the status message for the resource to “Service is online” and returns to Step 1.

    If Step 2 resulted in a failure, the probe updates the failure history. The probe then computes the total number of times that the health check failed.

    The result of the health check can range from a complete failure to success. The interpretation of the result depends on the specific data service. Consider a scenario where the probe can successfully connect to the server and send a handshake message to the server, but the probe receives only a partial response before it times out. This scenario is most likely a result of system overload. If some action is taken (such as restarting the service), the clients reconnect to the service, thus further overloading the system. If this event occurs, a data service fault monitor can decide not to treat this “partial” failure as fatal. Instead, the monitor can track this failure as a partial, nonfatal failure of the service. These partial failures are still accumulated over the interval that the Retry_interval property specifies.

    However, if the probe cannot connect to the server at all, the failure can be considered fatal. Partial failures lead to incrementing the failure count by a fractional amount. Every time the failure count reaches total failure (either by a fatal failure or by accumulation of partial failures), the probe restarts or fails over the data service in an attempt to correct the situation.

  4. If the result of the computation in Step 3 (the number of failures in the history interval) is less than the value of the resource property Retry_count, the probe attempts to correct the situation locally (for example, by restarting the service). The probe sets the status message of the resource to “Service is degraded” and returns to Step 1.

  5. If the number of failures within Retry_interval equals or exceeds Retry_count, the probe calls scha_control with the “giveover” option to request failover of the service (a sketch of this call appears after this list). If this request succeeds, the fault probe stops on this node. The probe sets the status message for the resource to “Service has failed.”

  6. The Sun Cluster framework can deny the scha_control request that is issued in the previous step for various reasons. The return code of scha_control identifies the reason, and the probe checks it. If the request is denied, the probe resets the failure/success history and starts afresh. The probe resets the history because the number of failures is already at or above Retry_count, so the fault probe would otherwise issue scha_control in each subsequent iteration and be denied again. These repeated requests would place additional load on the system and would increase the likelihood of further service failures.

    The probe then returns to Step 1.
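For reference, a fault monitor that is implemented as a script could request the giveover described in Step 5 with the scha_control command. A minimal sketch, with placeholder group and resource names:

    # Request that the RGM fail the resource group over to another node
    scha_control -O GIVEOVER -G resource-group-1 -R app-server-res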