Sun Cluster Data Service for SAP Guide for Solaris OS

Sun Cluster HA for SAP Fault Probes for Central Instance

For the central instance, the fault probe executes the following steps.

  1. Retrieves the process IDs for the SAP Message Server and the main dispatcher

  2. Loops indefinitely, sleeping between probes for the interval that the Thorough_probe_interval resource property sets

  3. Checks the availability of the SAP resources

    1. Abnormal exit – If the Process Monitor Facility (PMF) detects that the SAP process tree has failed, the fault monitor treats this problem as a complete failure. Based on the resource's failure history, the fault monitor either restarts the SAP resource or fails it over to another node.

    2. Availability check of the SAP resources through probe – The probe uses the ps(1) command to check the SAP Message Server and main dispatcher processes. If either the SAP Message Server process or the main dispatcher process is missing from the system's list of active processes, the fault monitor treats this problem as a complete failure.

      If you set the extension property Check_ms_retry to a value greater than zero, the probe also checks the SAP Message Server connection. If the extension property Lgtst_ms_with_logicalhostname is set to its default value, TRUE, the probe performs this connection test with the SAP-supplied utility lgtst, which it calls through the logical hostname interface that is specified in the SAP resource group. If you set Lgtst_ms_with_logicalhostname to FALSE, the probe calls lgtst with the node's local hostname (loopback interface).

      If the call to lgtst fails, the SAP Message Server connection is not functioning. In this situation, the fault monitor treats the problem as a partial failure and does not immediately trigger an SAP restart or a failover. The fault monitor counts two partial failures as a complete failure if both of the following conditions are met.

      1. You set the extension property Check_ms_retry to 2.

      2. The fault monitor accumulates two partial failures that happen within the retry interval that the resource property Retry_interval sets.

      A complete failure triggers either a local restart or a failover, based on the resource's failure history.

    3. Database connection status through probe – The probe calls the SAP-supplied utility R3trans to check the status of the database connection. The Sun Cluster HA for SAP fault probes verify that SAP can connect to the database. To determine database availability, however, Sun Cluster HA for SAP depends on the fault probes of the highly available database. If the database connection check fails, the fault monitor logs the message "Database might be down" to /var/adm/messages and sets the status of the SAP resource to DEGRADED. If a later probe finds that the connection has been reestablished, the fault monitor logs the message "Database is up" to /var/adm/messages and sets the status of the SAP resource to OK.

  4. Evaluates the failure history

    Based on the failure history, the fault monitor completes one of the following actions.

    • no action

    • local restart

    • failover
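The probe cycle described in steps 1 through 4 can be outlined in Python as follows. This is a minimal illustration, not the Sun Cluster HA for SAP implementation: the function probe_loop and its callable parameters (retrieve_pids, check_resources, evaluate_history) are hypothetical stand-ins for steps 1, 3, and 4. The optional max_iterations argument exists only so that the loop can be exercised without running forever.

```python
import time
from typing import Callable, Dict, List, Optional


def probe_loop(
    retrieve_pids: Callable[[], Dict[str, int]],
    check_resources: Callable[[Dict[str, int]], bool],
    evaluate_history: Callable[[bool], str],
    thorough_probe_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    max_iterations: Optional[int] = None,
) -> List[str]:
    """Hypothetical outline of the central-instance probe cycle."""
    # Step 1: record the process IDs of the Message Server and main dispatcher.
    pids = retrieve_pids()
    actions: List[str] = []
    iteration = 0
    # Step 2: loop indefinitely (bounded only when max_iterations is given).
    while max_iterations is None or iteration < max_iterations:
        sleep(thorough_probe_interval)
        # Step 3: check_resources returns True when it detects a complete failure.
        complete_failure = check_resources(pids)
        # Step 4: choose no action, local restart, or failover.
        actions.append(evaluate_history(complete_failure))
        iteration += 1
    return actions
```

Passing the sleep function as a parameter keeps the outline testable; in a long-running monitor, the default time.sleep applies.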
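The process check in the availability probe compares the process IDs retrieved in step 1 with the system's active processes. The sketch below assumes that the process list comes from the POSIX form ps -e -o pid=, which prints one process ID per line with no header. The exact ps(1) options that the agent uses are not documented in this section, and the helper names parse_pids, active_pids, missing_processes, and is_complete_failure are hypothetical.

```python
import subprocess
from typing import Dict, List, Set


def parse_pids(ps_output: str) -> Set[int]:
    """Parse `ps -e -o pid=` output: one PID per line, no header."""
    return {int(token) for token in ps_output.split()}


def active_pids() -> Set[int]:
    # Assumption: ps(1) supports the POSIX options -e (all processes) and
    # -o pid= (print only the PID column, with an empty header).
    result = subprocess.run(
        ["ps", "-e", "-o", "pid="], capture_output=True, text=True, check=True
    )
    return parse_pids(result.stdout)


def missing_processes(expected: Dict[str, int], active: Set[int]) -> List[str]:
    """Names of expected processes whose PIDs are no longer running."""
    return sorted(name for name, pid in expected.items() if pid not in active)


def is_complete_failure(expected: Dict[str, int], active: Set[int]) -> bool:
    # A missing Message Server or main dispatcher process is a complete failure.
    return bool(missing_processes(expected, active))
```

A caller would pass the PIDs recorded in step 1, for example is_complete_failure({"msg_server": 4711, "dispatcher": 4712}, active_pids()), where the PID values are illustrative only.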
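The SAP Message Server connection check combines two ideas from the text: choosing the host name that the probe passes to lgtst, and counting partial failures within Retry_interval. In this hypothetical sketch, a complete failure is reported once Check_ms_retry partial failures fall within Retry_interval. Generalizing the documented case, in which Check_ms_retry is set to 2, in this way is an assumption. The sketch does not call lgtst itself, and the names lgtst_host and MessageServerFailureCounter are illustrative only.

```python
from collections import deque
from typing import Deque


def lgtst_host(lgtst_ms_with_logicalhostname: bool,
               logical_hostname: str, local_hostname: str) -> str:
    """Pick the host name that the probe passes to lgtst."""
    # TRUE (the default): use the logical hostname from the SAP resource group.
    # FALSE: use the node's local hostname (loopback interface).
    return logical_hostname if lgtst_ms_with_logicalhostname else local_hostname


class MessageServerFailureCounter:
    """Hypothetical sliding-window counter for lgtst partial failures."""

    def __init__(self, check_ms_retry: int, retry_interval: float) -> None:
        if check_ms_retry <= 0:
            # With Check_ms_retry at zero, the Message Server check is not run.
            raise ValueError("check_ms_retry must be greater than zero")
        self.check_ms_retry = check_ms_retry
        self.retry_interval = retry_interval
        self._failures: Deque[float] = deque()

    def record_partial_failure(self, now: float) -> bool:
        """Record a failed lgtst call; return True on a complete failure."""
        self._failures.append(now)
        # Forget partial failures that fall outside Retry_interval.
        while now - self._failures[0] > self.retry_interval:
            self._failures.popleft()
        if len(self._failures) >= self.check_ms_retry:
            self._failures.clear()  # start counting afresh after escalation
            return True
        return False
```

With check_ms_retry=2 and retry_interval=300, failures at 0 and 100 seconds escalate to a complete failure, whereas failures at 200 and 600 seconds do not, because the first has expired by the time the second occurs.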
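The database connection check changes only the status of the SAP resource. It does not trigger a restart or a failover by itself, because Sun Cluster HA for SAP leaves database availability to the database's own fault probes. The following sketch models that status transition. Logging only when the status changes is an assumption, and the function update_db_status and its log callback are hypothetical. The db_reachable argument stands in for the result of the R3trans call, which the sketch does not make.

```python
from typing import Callable

OK = "OK"
DEGRADED = "DEGRADED"


def update_db_status(current_status: str, db_reachable: bool,
                     log: Callable[[str], None]) -> str:
    """Return the new SAP resource status after a database connection check."""
    if not db_reachable:
        # Connection failed: report a possible database outage once.
        if current_status != DEGRADED:
            log("Database might be down")
        return DEGRADED
    if current_status == DEGRADED:
        # Connection reestablished after an earlier failure.
        log("Database is up")
    return OK
```

For testing, the append method of a list works as the log callback. The real agent writes these messages to /var/adm/messages and sets the resource status through the cluster framework.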
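This section does not detail how step 4 evaluates the failure history. A common pattern for Sun Cluster fault monitors, assumed here rather than taken from this guide, is to restart the resource locally while the number of restarts within Retry_interval is below the Retry_count resource property, and to fail over once that limit is reached. The names Action and evaluate_failure_history are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    NO_ACTION = "no action"
    LOCAL_RESTART = "local restart"
    FAILOVER = "failover"


def evaluate_failure_history(complete_failure: bool,
                             restart_times: Sequence[float], now: float,
                             retry_count: int, retry_interval: float) -> Action:
    """Hypothetical policy: restart locally until Retry_count is reached."""
    if not complete_failure:
        # No failure, or partial failures that have not yet escalated.
        return Action.NO_ACTION
    # Only restarts inside the current Retry_interval window count.
    recent = [t for t in restart_times if now - t <= retry_interval]
    if len(recent) < retry_count:
        return Action.LOCAL_RESTART
    return Action.FAILOVER
```

Under this assumed policy, with retry_count=2 and retry_interval=300, a complete failure at 1000 seconds after restarts at 800 and 900 seconds leads to a failover, but after restarts at 100 and 900 seconds it leads to a local restart.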