Fault monitoring for the Sun Cluster HA for SAP Web Application Server data service is provided by the following fault monitors:
The fault monitor for the SAP enqueue server
The fault monitor for the SAP replica server
The fault monitor for the SAP message server
The fault monitor for the SAP web application server component
The fault monitor for the SAP J2EE engine
Each fault monitor is contained in a resource whose resource type is shown in the following table.
Table 3 Resource Types for the Fault Monitors of Sun Cluster HA for SAP Web Application Server
Component | Resource Type |
---|---|
SAP enqueue server | SUNW.sapenq |
SAP replica server | SUNW.saprepl |
SAP message server | SUNW.sapscs |
SAP web application server component | SUNW.sapwebas |
SAP J2EE engine | SUNW.sapwebas |
System properties and extension properties of the resource types control the behavior of the fault monitors. The default values of these properties determine the preset behavior of the fault monitors. The preset behavior should be suitable for most Sun Cluster installations. Therefore, you should tune the fault monitors only if you need to modify this preset behavior.
Tuning these fault monitors involves the following tasks:
Setting the interval between fault monitor probes
Setting the timeout for fault monitor probes
Defining the criteria for persistent faults
Specifying the failover behavior of a resource
Perform these tasks when you register and configure Sun Cluster HA for SAP Web Application Server, as described in Registering and Configuring Sun Cluster HA for SAP Web Application Server.
For detailed information about these tasks, see Tuning Fault Monitors for Sun Cluster Data Services in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.
To determine whether the SAP enqueue server and the SAP replica server are operating correctly, the fault monitor for the SAP enqueue server resource type probes these resources periodically.
The probe uses the SAP utility ensmon to check the health of the SAP enqueue server and the SAP replica server.
# ensmon -H localhost -S port option
-H localhost – Specifies that the name of the host is localhost.
-S port – Specifies the enqueue port.
option – Specifies the resources that the probe should check. The possible values of this option are as follows:
1 – Check the SAP enqueue server only.
2 – Check both the SAP enqueue server and the SAP replica server.
If you run this command from the command line, it reports its result as a return code.
During a probe, the fault monitor first determines whether both the SAP enqueue server and the SAP replica server are online by running the ensmon command with the option argument set to 2.
# ensmon -H localhost -S port 2
The result of this command determines the action of the probe, as follows:
If the command times out, the SAP enqueue server fault monitor checks whether only the SAP enqueue server is online by running the ensmon command with the option set to 1.
# ensmon -H localhost -S port 1
If this command times out, the SAP enqueue server issues a partial failure. If this timeout occurs one more time within the probe interval period, a failover occurs.
If this command succeeds, the SAP enqueue server fault monitor logs a warning message to explain that the SAP enqueue server is online but the status of the SAP replica server is unknown.
If this command causes a system error, the SAP enqueue server issues a less serious partial failure. If a system error occurs three more times within the probe interval period, a failover occurs.
For all other unsuccessful conditions, the SAP enqueue server triggers a failover.
If the command does not time out, the probe checks the value of the return code from the ensmon command, as follows:
A return code value of 0 indicates that the command is successful, and no further action is taken until the next probe.
A return code value of 4 indicates that the enqueue is running, and the replica is configured, but the replica is not running. The probe logs a warning message to indicate that the replica is not running.
A return code value of 8 indicates that the enqueue server is not running, and the probe triggers a failover.
A return code of 12 indicates an invalid parameter for the command, and the probe triggers a failover.
All other return codes are treated as a partial failure. If such a failure occurs three more times within the probe interval period, a failover occurs.
Note that the values for the number of timeouts and the probe interval period are assigned by the SAP enqueue server fault monitor. You cannot change these values.
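The decision rules above can be summarized as a small lookup. The following is a minimal sketch in Python, not the fault monitor's actual implementation: the names `classify_enqueue_probe`, `Action`, `TIMEOUT`, and `SYSTEM_ERROR` are hypothetical, and the sketch assumes that the first ensmon call either times out or yields an integer return code.

```python
from enum import Enum

TIMEOUT = "timeout"            # hypothetical marker: the ensmon call timed out
SYSTEM_ERROR = "system_error"  # hypothetical marker: the call hit a system error


class Action(Enum):
    NONE = "no action until the next probe"
    WARN = "log a warning"
    MINOR_PARTIAL = "less serious partial failure"
    PARTIAL = "partial failure"
    FAILOVER = "failover"


# Return codes of "ensmon -H localhost -S port 2" when it does not time out.
BOTH_SERVERS_CODES = {
    0: Action.NONE,       # enqueue and replica are both healthy
    4: Action.WARN,       # replica is configured but not running
    8: Action.FAILOVER,   # enqueue server is not running
    12: Action.FAILOVER,  # invalid parameter for the command
}


def classify_enqueue_probe(both_result, enqueue_only_result=None):
    """Map the outcome of one probe cycle to the action described in the text.

    both_result: TIMEOUT or the integer return code of the option-2 call.
    enqueue_only_result: outcome of the option-1 call, which is made only
    when the option-2 call timed out (TIMEOUT, SYSTEM_ERROR, or an integer).
    """
    if both_result != TIMEOUT:
        # Any return code not listed in the table is a partial failure.
        return BOTH_SERVERS_CODES.get(both_result, Action.PARTIAL)
    if enqueue_only_result is None:
        raise ValueError("the option-1 result is required after a timeout")
    if enqueue_only_result == TIMEOUT:
        return Action.PARTIAL
    if enqueue_only_result == 0:
        return Action.WARN
    if enqueue_only_result == SYSTEM_ERROR:
        return Action.MINOR_PARTIAL
    # All other unsuccessful conditions trigger a failover.
    return Action.FAILOVER
```

For example, `classify_enqueue_probe(TIMEOUT, 0)` yields `Action.WARN`: the enqueue server is online, but the status of the replica server is unknown.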
Fault monitoring for the SAP replica server resource type is currently handled by the Process Monitor Facility (PMF) in Sun Cluster.
To determine whether the SAP Message Server is operating correctly, the fault monitor for the SAP Message Server resource type probes this resource periodically.
The probe uses the SAP utility msprot to check the health of the SAP Message Server.
# msprot -mshost localhost -msserv port -r probe_timeout/2
-mshost localhost – Specifies that the name of the host is localhost.
-msserv port – Specifies the message server port.
-r probe_timeout/2 – Specifies the time within which the msprot command should be executed. This value should be set to the probe_timeout value of the resource.
If you run this command from the command line, it reports its result as a return code.
During a probe, the fault monitor determines whether the SAP Message Server is online by running the msprot command.
# msprot -mshost localhost -msserv port -r probe_timeout/2
The result of this command determines the action of the probe, as follows:
If the command times out, the SAP Message Server issues a partial failure. If this timeout occurs one more time within the probe interval period, a failover occurs.
If the command does not time out, the probe checks the value of the return code from the msprot command, as follows:
A return code value of 0 indicates that the command is successful, and no further action is taken until the next probe.
A return code value of 7 indicates that the message server is not responding, and the probe triggers a failover.
All other return codes are treated as a partial failure. If such a failure occurs three more times within the probe interval period, a failover occurs.
Note that the values for the number of timeouts and the probe interval period are assigned by the SAP Message Server fault monitor. You cannot change these values.
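The way partial failures add up to a failover can be illustrated with a sliding time window. The following is a minimal sketch, not the fault monitor's code. The weights are assumptions inferred from the wording above: a timeout that causes a failover on its second occurrence is modeled as 50, and a return code that causes a failover on its fourth occurrence is modeled as 25, against a threshold of 100. The names `FailureWindow` and `msprot_weight` are hypothetical.

```python
from collections import deque

COMPLETE_FAILURE = 100  # assumed threshold at which a failover is triggered


def msprot_weight(timed_out, return_code=None):
    """Assumed failure weight for one msprot probe result."""
    if timed_out:
        return 50                # two timeouts in the window reach 100
    if return_code == 0:
        return 0                 # success: nothing accumulates
    if return_code == 7:
        return COMPLETE_FAILURE  # message server not responding
    return 25                    # four such results in the window reach 100


class FailureWindow:
    """Accumulate failure weights over a window of interval_seconds."""

    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        self.events = deque()  # (timestamp, weight) pairs, oldest first

    def record(self, now, weight):
        """Record one probe result; return True if a failover is due."""
        self.events.append((now, weight))
        # Drop results that are older than the probe interval period.
        while self.events and now - self.events[0][0] > self.interval:
            self.events.popleft()
        return sum(w for _, w in self.events) >= COMPLETE_FAILURE
```

With a 300-second window, two timeouts 60 seconds apart reach the threshold, whereas two timeouts 400 seconds apart do not, because the first one has already left the window.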
To determine whether the SAP Web Application Server and the SAP J2EE Engine are operating correctly, the fault monitor for the SAP Web Application Server resource type probes these resources periodically.
The probe uses the SAP utility dpmon to check the health of the SAP Web Application Server and sends an XML/HTTP request to the SAP J2EE Engine.
# dpmon -info
-info – Specifies that the dispatcher information is to be retrieved.
If you run this command from the command line, it reports its result as a return code.
During a probe, the fault monitor determines whether the SAP Web Application Server is online by running the dpmon command with the -info option.
# dpmon -info
The result of this command determines the action of the probe, as follows:
If the command times out, the SAP Web Application Server issues a partial failure. If this timeout occurs one more time within the probe interval period, a failover occurs.
If the command does not time out, the probe checks the value of the return code from the dpmon command, as follows:
A return code value of 0 indicates that the command is successful, and no further action is taken until the next probe.
All other return codes are treated as a partial failure. If such a failure occurs three more times within the probe interval period, a failover occurs.
The fault monitor probe for the SAP J2EE Engine instance is not configurable.
Note that the values for the number of timeouts and the probe interval period are assigned by the SAP Web Application Server fault monitor. You cannot change these values.
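To make the distinction between a timed-out probe and a non-zero return code concrete, the following minimal sketch runs an arbitrary command under a time limit and classifies the outcome in the way described for dpmon. It is an illustration only: the helper names `run_probe` and `classify_dpmon` are hypothetical, the time limit is supplied by the caller, and the sketch does not cover the XML/HTTP request to the SAP J2EE Engine.

```python
import subprocess


def run_probe(argv, timeout_seconds):
    """Run a probe command; return ("timeout", None) or ("exited", code)."""
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return ("timeout", None)
    return ("exited", completed.returncode)


def classify_dpmon(outcome):
    """Map a run_probe outcome to the action described for dpmon."""
    kind, code = outcome
    if kind == "timeout":
        return "partial failure"
    if code == 0:
        return "no action until the next probe"
    # All other return codes are treated as a partial failure.
    return "partial failure"
```

On a node where dpmon is on the path, a call such as `classify_dpmon(run_probe(["dpmon", "-info"], 30))` would classify a single probe. The 30-second limit is an arbitrary value chosen for the example.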