Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC Guide
Installing and Configuring HA for Logical Domains
This section describes the probing algorithm and functionality of the HA for Logical Domains fault monitor. It also states the conditions, messages, and recovery actions associated with unsuccessful probing.
For conceptual information about fault monitors, see the Oracle Solaris Cluster Concepts Guide.
The HA for Logical Domains guest domain fault monitor uses the resource properties defined by the SUNW.ldom resource type. For a complete list of these properties, see the SUNW.ldom(5) man page.
The HA for Logical Domains fault monitor is controlled by properties that set its probing frequency. The default values of these properties determine the preset behavior of the fault monitor and are suitable for most Oracle Solaris Cluster installations. You can modify this preset behavior by performing the following actions:
Setting the interval between fault monitor probes (Thorough_probe_interval)
Setting the timeout for fault monitor probes (Probe_timeout)
Setting the number of times the fault monitor attempts to restart the resource (Retry_count)
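The following Python helper is a minimal sketch that builds the `clresource set` command line for changing these properties on a resource. The resource name `ldom-rs` and the property values are hypothetical. The sketch assumes that each property can be passed as a separate `-p name=value` option. It only builds the argument list. On a cluster node, you could pass that list to `subprocess.run`.

```python
import shlex
from typing import List, Mapping


def clresource_set_cmd(resource: str, props: Mapping[str, object]) -> List[str]:
    """Build a 'clresource set' argument list that changes the given properties.

    Each property becomes one "-p name=value" pair; the resource name comes last.
    """
    cmd = ["clresource", "set"]
    for name, value in props.items():
        cmd += ["-p", f"{name}={value}"]
    cmd.append(resource)
    return cmd


if __name__ == "__main__":
    # Hypothetical resource name and illustrative values only.
    cmd = clresource_set_cmd("ldom-rs", {"Thorough_probe_interval": 120, "Retry_count": 3})
    print(shlex.join(cmd))
    # prints: clresource set -p Thorough_probe_interval=120 -p Retry_count=3 ldom-rs
```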
The HA for Logical Domains fault monitor runs in an infinite loop. In each cycle, it checks the domain state and reports either success or failure.
If the check succeeds, the fault monitor returns to its loop and continues with the next cycle of probing and sleeping.
If the check fails, the fault monitor requests that the cluster restart the resource, and it makes the same request each time a later check fails. If the number of restarts exceeds Retry_count within Retry_interval, the fault monitor instead requests that the resource group fail over to a different node.
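The probe-and-sleep loop described above can be sketched in Python as follows. The callables `check_domain` and `request_restart` and the `max_cycles` bound are hypothetical stand-ins introduced for illustration. The sketch also assumes that `interval` plays the role of Thorough_probe_interval. It shows only the cycle of probing, sleeping, and requesting restarts. It does not count restarts against Retry_count.

```python
import time
from typing import Callable, Optional


def probe_loop(
    check_domain: Callable[[], bool],
    request_restart: Callable[[], None],
    interval: float,
    sleep: Callable[[float], None] = time.sleep,
    max_cycles: Optional[int] = None,
) -> int:
    """Run probe cycles and return how many restarts were requested.

    With max_cycles=None the loop never ends, like the fault monitor's
    infinite loop; a bound is accepted only so the sketch can be tested.
    """
    restarts = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        if not check_domain():
            # Failure: request a restart. Counting restarts against
            # Retry_count is left out of this sketch.
            request_restart()
            restarts += 1
        # Sleep before starting the next probing cycle.
        sleep(interval)
    return restarts
```

Injecting `sleep` and bounding the loop with `max_cycles` keep the sketch testable without real delays.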
The probe checks the domain state every 60 seconds by using the ldm list-domain command.
The ldm list-domain command produces a status line for the domain and is accurate at the instant that the command executes.
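The following minimal sketch reads the state from that status line. It assumes the default tabular output of `ldm list-domain`, with a header row that contains NAME and STATE columns. The sample output is illustrative and was not captured from a real system. On a control domain, you might obtain real output with `subprocess.run(["ldm", "list-domain", "ldg1"], capture_output=True, text=True).stdout`, where `ldg1` is a hypothetical domain name.

```python
from typing import Dict

# Illustrative output only, assuming the default ldm list-domain table layout.
SAMPLE_OUTPUT = """\
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    8     4G       0.5%  1d 2h
ldg1             active     -n----  5000    4     2G       0.1%  3h
"""


def parse_domain_states(output: str) -> Dict[str, str]:
    """Map each domain name to the value in its STATE column.

    Columns are located by header name, which works as long as no field
    up to and including NAME and STATE is empty or contains spaces.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    name_col = header.index("NAME")
    state_col = header.index("STATE")
    states = {}
    for line in lines[1:]:
        fields = line.split()
        states[fields[name_col]] = fields[state_col]
    return states


if __name__ == "__main__":
    print(parse_domain_states(SAMPLE_OUTPUT))
```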
The following status modes are considered normal operational modes: active, suspending, resuming, suspended, and starting. Whenever the ldm command reports one of these status modes, the probe considers the domain to be operating acceptably.
The following status modes are considered restartable modes: inactive and stopping. These modes are not acceptable, so when the probe encounters one of them, it requests a restart of the resource.
The probe also requests a restart of the resource if the ldm command reports any unknown status mode.
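The state rules above amount to a small lookup table. A minimal Python sketch follows. The function name and its return labels are hypothetical, and the case-insensitive comparison is an assumption.

```python
# Status modes taken from the lists above; comparison is case-insensitive (an assumption).
NORMAL_STATES = frozenset({"active", "suspending", "resuming", "suspended", "starting"})
RESTARTABLE_STATES = frozenset({"inactive", "stopping"})


def probe_action(state: str) -> str:
    """Classify an ldm status mode.

    Returns "ok" for a normal mode, "restart" for a restartable mode, and
    "restart-unknown" for anything else. Both restart results mean the probe
    requests a restart; the separate label only helps with logging.
    """
    s = state.strip().lower()
    if s in NORMAL_STATES:
        return "ok"
    if s in RESTARTABLE_STATES:
        return "restart"
    return "restart-unknown"
```

Treating every unlisted value as a restart case makes the probe fail safe: a status mode the probe does not recognize is never mistaken for a healthy domain.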
If the guest domain configuration has changed, the probe writes the updated configuration information to the Cluster Configuration Repository (CCR).
The probe runs the user-supplied script or binary specified for plugin_probe. If that program fails, the probe restarts the Logical Domains guest domain resource.
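The following minimal Python sketch shows one way to run such a user-supplied probe program. It assumes that success means exit status 0 within a timeout, and that any other outcome (a nonzero exit status, a timeout, or a failure to start the program) counts as failure. Using Probe_timeout as that timeout is also an assumption. See the SUNW.ldom(5) man page for the actual plugin_probe semantics. The function name is hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def run_plugin_probe(argv: Sequence[str], timeout: float) -> bool:
    """Run a user-supplied probe program and report whether it succeeded.

    Assumption: success means exit status 0 within `timeout` seconds. A
    nonzero status, a timeout, or a failure to start counts as failure.
    """
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # subprocess.run kills the child before raising TimeoutExpired.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in probe programs built from the current Python interpreter.
    ok = run_plugin_probe([sys.executable, "-c", "pass"], timeout=10)
    failed = run_plugin_probe([sys.executable, "-c", "raise SystemExit(3)"], timeout=10)
    print(ok, failed)  # True False
```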
If the Logical Domains guest domain resource is restarted repeatedly and exhausts Retry_count within Retry_interval, a failover of the resource group to another node is initiated, provided that Failover_enabled is set to TRUE.
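The restart-or-failover decision can be sketched as a sliding-window counter. In this minimal Python sketch, up to `retry_count` restarts are allowed within `retry_interval` seconds, and the next failure inside that window leads to failover. The class, its method, and the `"exhausted"` result returned when failover is disabled are hypothetical. This section does not say what happens when Failover_enabled is FALSE.

```python
from collections import deque
from typing import Deque


class RestartPolicy:
    """Decide between restarting a resource and failing over its resource group.

    Restarts are counted in a sliding window of retry_interval seconds.
    """

    def __init__(self, retry_count: int, retry_interval: float, failover_enabled: bool = True):
        self.retry_count = retry_count
        self.retry_interval = retry_interval
        self.failover_enabled = failover_enabled
        self._restarts: Deque[float] = deque()  # timestamps of recent restarts

    def on_failure(self, now: float) -> str:
        # Forget restarts that have fallen out of the Retry_interval window.
        while self._restarts and now - self._restarts[0] > self.retry_interval:
            self._restarts.popleft()
        if len(self._restarts) < self.retry_count:
            self._restarts.append(now)
            return "restart"
        # Retry_count is exhausted within Retry_interval.
        self._restarts.clear()
        return "failover" if self.failover_enabled else "exhausted"
```

For example, with `retry_count=2`, failures at 0 and 60 seconds each return `"restart"`, and a third failure at 120 seconds, still inside a 370-second window, returns `"failover"`.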