Oracle® Solaris Cluster Data Service for Oracle Solaris Zones Guide

Updated: July 2014, E39657-01
 
 

Operation of the Fault Monitor for the Zone Boot Component

The fault monitor for the zone boot component ensures that all requirements for the zone boot component to run are met:

  • The corresponding zsched process for the zone is running.

    If this process is not running, the fault monitor restarts the zone. If this fault persists, the fault monitor fails over the resource group that contains the resource for the zone boot component.

  • Every logical hostname that is managed by a SUNW.LogicalHostname resource is operational.

    If a logical hostname is not operational, the fault monitor fails over the resource group that contains the resource for the zone boot component.

  • The specified milestone for a zone of brand type solaris, solaris10, or solaris-kz is either online or degraded.

    If the milestone is neither online nor degraded, the fault monitor restarts the zone. If this fault persists, the fault monitor fails over the resource group that contains the resource for the zone boot component.

    To verify the state of the milestone, the fault monitor connects to the zone. If the fault monitor cannot connect to the zone, the fault monitor retries every five seconds for approximately 60% of the probe time-out. If the attempt to connect still fails, then the fault monitor restarts the resource for the zone boot component.
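
The connection retry behavior described above can be pictured with a small sketch. This is not the fault monitor's actual implementation, only an illustration under stated assumptions: the function name connect_with_retry, the try_connect callback, and the injectable sleep and clock parameters are hypothetical, and the rule that the loop stops when the next retry would fall outside the 60% budget is one plausible reading of "approximately 60%". Only the five-second interval and the 60% fraction come from the text.

```python
import time
from typing import Callable

RETRY_INTERVAL_S = 5.0  # from the text: retry every five seconds
RETRY_FRACTION = 0.6    # from the text: approximately 60% of the probe time-out


def connect_with_retry(
    try_connect: Callable[[], bool],
    probe_timeout_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once try_connect() succeeds, or False when the budget is spent.

    try_connect is a hypothetical stand-in for whatever the monitor does to
    reach the zone; sleep and clock are injectable so the loop can be tested.
    """
    deadline = clock() + RETRY_FRACTION * probe_timeout_s
    while True:
        if try_connect():
            return True
        # Assumption: give up if another retry would start after the deadline.
        if clock() + RETRY_INTERVAL_S > deadline:
            return False
        sleep(RETRY_INTERVAL_S)
```

In this sketch a False result corresponds to the case in which the fault monitor would go on to restart the resource for the zone boot component. With a 30-second time-out the budget is 18 seconds, so the sketch makes attempts at roughly 0, 5, 10, and 15 seconds before giving up.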


Caution  -  The Probe_timeout property defaults to 30 seconds, so 60% of the default is approximately 18 seconds. If you configure multiple Solaris HA zones on the same cluster, or run them in combination with additional workloads, ensure that 60% of the Probe_timeout is long enough for the probe to run successfully, even under high system load. Increase the Probe_timeout if the default proves too sensitive in your deployment.
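
As a rough sizing aid, the 60% rule can be turned into simple arithmetic. The sketch below is illustrative only: the helper names and the worst_probe_s input (the longest probe duration you have observed under load, in seconds) are hypothetical and are not part of the product. Only the 60% fraction and the 30-second default come from the text.

```python
import math

RETRY_PERCENT = 60  # from the text: about 60% of Probe_timeout is usable


def probe_budget_s(probe_timeout_s: int) -> float:
    """Seconds available to the probe for a given Probe_timeout."""
    return probe_timeout_s * RETRY_PERCENT / 100


def fits_budget(worst_probe_s: float, probe_timeout_s: int) -> bool:
    """True if the slowest observed probe still fits within the budget."""
    return worst_probe_s <= probe_budget_s(probe_timeout_s)


def min_probe_timeout_s(worst_probe_s: float) -> int:
    """Smallest whole-second Probe_timeout whose 60% budget covers the probe."""
    return math.ceil(worst_probe_s * 100 / RETRY_PERCENT)
```

For example, if a probe has been observed to take 24 seconds under load, fits_budget(24, 30) is False, and min_probe_timeout_s(24) suggests a Probe_timeout of at least 40 seconds. Treat the result as a lower bound and leave additional headroom for load spikes.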