
Oracle® Solaris Cluster Data Service for Oracle Solaris Zones Guide


Updated: September 2015
 
 

Tuning the HA for Solaris Zones Fault Monitors

The HA for Solaris Zones fault monitors verify that the following components are running correctly:

  • Zone boot resource

  • Zone script resource

  • Zone SMF resource

Each HA for Solaris Zones fault monitor is contained in the resource that represents the corresponding Solaris Zones component. You create these resources when you register and configure HA for Solaris Zones. For more information, see Registering and Configuring HA for Solaris Zones.

System properties and extension properties of these resources control the behavior of the fault monitor. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Oracle Solaris Cluster installations. Therefore, you should tune the HA for Solaris Zones fault monitor only if you need to modify this preset behavior.

Tuning the HA for Solaris Zones fault monitors involves the following tasks:

  • Setting the interval between fault monitor probes

  • Setting the time-out for fault monitor probes

  • Defining the criteria for persistent faults

  • Specifying the failover behavior of a resource

For more information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster 4.3 Data Services Planning and Administration Guide.
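
The third task, defining the criteria for persistent faults, can be pictured as a count of recent faults inside a sliding time window. The following is a minimal sketch of that idea, not the cluster framework's implementation. It assumes that the criteria are expressed through the standard resource properties Retry_count and Retry_interval, as covered in the guide referenced above. The function name and the example values are hypothetical.

```python
def action_for_fault(earlier_fault_times, now, retry_count, retry_interval):
    """Return "restart" or "failover" for a fault detected at time `now`.

    Simplified model (an assumption, not the framework's code): a fault is
    treated as persistent when more than `retry_count` faults, including the
    current one, fall within the last `retry_interval` seconds.
    """
    window_start = now - retry_interval
    # Keep only earlier faults that are still inside the sliding window.
    recent = [t for t in earlier_fault_times if t > window_start]
    if len(recent) + 1 > retry_count:
        return "failover"
    return "restart"


# Hypothetical values: Retry_count=2, Retry_interval=300 seconds.
print(action_for_fault([], 0, 2, 300))           # first fault
print(action_for_fault([10, 100], 200, 2, 300))  # third fault inside the window
print(action_for_fault([10, 100], 1000, 2, 300)) # earlier faults have aged out
```

In this model, raising the retry count or shortening the retry interval makes the monitor more tolerant of repeated faults before it gives up on local restarts.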

Operation of the Fault Monitor for the Zone Boot Component

The fault monitor for the zone boot component ensures that all of the following requirements for running the zone boot component are met:

  • The corresponding zsched process for the zone is running.

    If this process is not running, the fault monitor restarts the zone. If this fault persists, the fault monitor fails over the resource group that contains the resource for the zone boot component.

  • Every logical hostname that is managed by a SUNW.LogicalHostname resource is operational.

    If a logical hostname is not operational, the fault monitor fails over the resource group that contains the resource for the zone boot component.

  • The milestone that is specified for a zone of the solaris, solaris10, or solaris-kz brand type is either online or degraded.

    If the milestone is neither online nor degraded, the fault monitor restarts the zone. If this fault persists, the fault monitor fails over the resource group that contains the resource for the zone boot component.

    To verify the state of the milestone, the fault monitor connects to the zone. If the fault monitor cannot connect to the zone, the fault monitor retries every five seconds for approximately 60% of the probe time-out. If the attempt to connect still fails, then the fault monitor restarts the resource for the zone boot component.



Caution - The Probe_timeout property defaults to 30 seconds. If you configure multiple Solaris HA zones on the same cluster, or run them in combination with additional workloads, ensure that 60% of the Probe_timeout value is enough time to run the probe successfully, even under high system load. Increase the Probe_timeout value if the default is too sensitive in your deployment.
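
To make the 60% figure concrete, the following sketch estimates how long the fault monitor keeps trying to connect to the zone for a given Probe_timeout value. It is only an approximation based on the description above (a retry every five seconds for roughly 60% of the probe time-out). The function name is hypothetical, and the monitor's exact timing is not specified here.

```python
def connect_retry_budget(probe_timeout, retry_every=5, percent=60):
    """Estimate the connection retry window for a probe time-out in seconds.

    Returns (budget_seconds, retry_intervals). The 5-second retry period and
    the 60% share come from the text above; treating them as exact values is
    an assumption made for illustration.
    """
    budget = probe_timeout * percent / 100
    retry_intervals = int(budget // retry_every)
    return budget, retry_intervals


for timeout in (30, 60, 120):
    budget, retries = connect_retry_budget(timeout)
    print(f"Probe_timeout={timeout}s: about {budget:.0f}s to connect, "
          f"{retries} full retry intervals")
```

With the default of 30 seconds, the estimate is about 18 seconds, which leaves room for only a few retries on a heavily loaded node.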


Operation of the Fault Monitor for the Zone Script Component

The fault monitor for the zone script component runs the script that you specify for the component. The value that this script returns to the fault monitor determines the action that the fault monitor performs. For more information, see Figure 3, Table 3, Zone Script Resource Return Codes.

Operation of the Fault Monitor for the Zone SMF Component

The fault monitor for the zone SMF component verifies that the SMF service is not disabled. If the service is disabled, the fault monitor restarts the SMF service. If this fault persists, the fault monitor fails over the resource group that contains the resource for the zone SMF component.

If the service is not disabled, the fault monitor runs the SMF service probe that you can specify for the component. The value that this probe returns to the fault monitor determines the action that the fault monitor performs. For more information, see Figure 4, Table 4, Zone SMF Resource Return Codes.
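
The two-stage check described above can be summarized as a small decision function. This is an illustrative sketch only: the function name, the action labels, and the run_probe callable are hypothetical stand-ins, and the meaning of the probe's return value is defined by the return code table referenced above, not by this code.

```python
def smf_monitor_step(service_disabled, run_probe):
    """Model one fault monitor pass for the zone SMF component.

    `service_disabled` is whether the SMF service is currently disabled.
    `run_probe` is a hypothetical callable that stands in for the SMF service
    probe specified for the component; its return value is passed back
    unchanged because its interpretation is documented elsewhere.
    """
    if service_disabled:
        # A disabled service is restarted; the probe is not run in this pass.
        return ("restart_service", None)
    # The service is not disabled, so the probe result decides the action.
    return ("act_on_probe_result", run_probe())


print(smf_monitor_step(True, lambda: 0))
print(smf_monitor_step(False, lambda: 0))
```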