Tuning the HA for SAP NetWeaver Fault Monitors

Fault monitoring for the HA for SAP NetWeaver data service is provided by the following fault monitors:

The fault monitor for the SAP sapstartsrv process
The fault monitor for the SAP central services
The fault monitor for the SAP replicated enqueue server
The fault monitor for the SAP replicated enqueue preempter
The fault monitor for the SAP application server instance

Each fault monitor is contained in a resource whose resource type is shown in the following table.

Table 3 Resource Types for the Fault Monitors of HA for SAP NetWeaver

Component	Resource Type
SAP `sapstartsrv`	`ORCL.sapstartsrv`
SAP central services	`ORCL.sapcentr`
SAP application server instance	`ORCL.sapdia`
SAP replicated enqueue server	`ORCL.saprepenq`
SAP replicated enqueue preempter	`ORCL.saprepenq_preempt`

Standard properties and extension properties of the resource types control the behavior of the fault monitors. The default values of these properties determine the preset behavior of the fault monitors. The preset behavior should be suitable for most Oracle Solaris Cluster installations. Therefore, you should tune the fault monitors only if you need to modify this preset behavior.

Tuning these fault monitors involves the following tasks:

Setting the interval between fault monitor probes
Setting the timeout for fault monitor probes
Defining the criteria for persistent faults
Specifying the failover behavior of a resource

Perform these tasks when you register and configure HA for SAP NetWeaver, as described in Registering and Configuring HA for SAP NetWeaver.
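
As a rough illustration of the first two tasks, the following sketch prints the commands that could be used to adjust the probe interval, the probe timeout, and the retry window on one resource. It is a dry run that only echoes the commands. The resource name `sap-cs-rs` and all numeric values are hypothetical, and the property names are taken from the fault monitor descriptions in this section; check which properties your resource type actually supports before you run anything.

```shell
#!/bin/sh
# Dry-run sketch: print the clresource commands instead of running them.
# The resource name and every value passed in below are hypothetical.

print_tuning_commands() {
    resource=$1
    probe_interval=$2   # seconds between fault monitor probes
    probe_timeout=$3    # seconds allowed for one probe
    retry_interval=$4   # window, in seconds, in which faults are counted

    echo "clresource set -p thorough_probe_interval=${probe_interval} ${resource}"
    echo "clresource set -p probe_timeout=${probe_timeout} ${resource}"
    echo "clresource set -p retry_interval=${retry_interval} ${resource}"
}

print_tuning_commands sap-cs-rs 120 60 600
```

Printing the commands first makes it possible to review them before they are applied to a running cluster.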

In addition, perform the following task for tuning the SAP NetWeaver profiles and Oracle Solaris Cluster resources:

Ensure that the SAP enqueue server is not restarted when its process fails. To do so, change the `Restart_Program_01 = local $(_EN) pf=$(_PF)` line in the SAP central services profile to `Start_Program_01 = local $(_EN) pf=$(_PF)`. For example:

#-----------------------------------------------------------------------
# Start SAP enqueue server
#-----------------------------------------------------------------------
_EN = en.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_03 = local rm -f $(_EN)
Execute_04 = local ln -s -f $(DIR_EXECUTABLE)/enserver$(FT_EXE) $(_EN)
Start_Program_01 = local $(_EN) pf=$(_PF)
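
The profile change above could be scripted along the following lines. This is a minimal sketch, not a supported tool: the function name `disable_enqueue_restart` and the paths in the usage comment are hypothetical, and the pattern assumes the line has the exact `Restart_Program_NN = local $(_EN) ...` shape shown in this section.

```shell
#!/bin/sh
# Sketch: turn the enqueue server's Restart_Program line into a
# Start_Program line. Function name and paths are hypothetical.

disable_enqueue_restart() {
    # Reads a profile on stdin and writes the modified profile to stdout.
    # Only lines that start the enqueue server through $(_EN) are changed;
    # the numeric suffix (01, 02, ...) is preserved.
    sed 's/^Restart_Program_\([0-9][0-9]*\)\( *= *local \$(_EN) \)/Start_Program_\1\2/'
}

# Usage (hypothetical paths): write to a new file and review it before
# replacing the original profile.
#   disable_enqueue_restart < /path/to/profile > /tmp/profile.new

printf '%s\n' 'Restart_Program_01 = local $(_EN) pf=$(_PF)' | disable_enqueue_restart
```

Lines that restart other programs, such as the message server, do not match the pattern and pass through unchanged.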

For detailed information about these tasks, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster 4.3 Data Services Planning and Administration Guide.

Operation of the Fault Monitor for the SAP `sapstartsrv` Resource Type

To determine whether the SAP sapstartsrv process is operating correctly, the fault monitor for the SAP sapstartsrv resource type probes these resources periodically.

The probe uses the sapcontrol command to check the health of the sapstartsrv process.

# su - sidadm -c "sapcontrol -nr instance_number -function GetProcessList"

sidadm: Specifies the SAP administrative user.
instance_number: Specifies the SAP instance number of the sapstartsrv process.

The return codes 0, 3, and 4 signal a healthy sapstartsrv process. Every other return code indicates a faulty sapstartsrv process.
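
The return-code rule above can be sketched as a small shell function. This only illustrates the rule as stated here and is not the fault monitor's actual implementation; the function name `sapstartsrv_health` is hypothetical.

```shell
#!/bin/sh
# Sketch: classify a sapcontrol return code for the sapstartsrv probe.
# The function name is hypothetical.

sapstartsrv_health() {
    case "$1" in
        0|3|4) echo healthy ;;   # codes named as healthy in this section
        *)     echo faulty ;;    # every other return code
    esac
}

# In a real check, the code would come from the probe command, for example:
#   su - sidadm -c "sapcontrol -nr instance_number -function GetProcessList"
#   rc=$?
for rc in 0 1 3 4; do
    echo "return code ${rc}: $(sapstartsrv_health "${rc}")"
done
```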

If the probe command times out, that is, if it reaches 80% of the value of the probe_timeout property, the return code of the probe command is determined by the timeout_return property.

The number of tolerated consecutive timeouts within retry_interval seconds is obtained by dividing 100 by timeout_return. If this number is greater than the number you obtain by dividing retry_interval by thorough_probe_interval, then timeouts will be tolerated forever.
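
To make the arithmetic concrete, the following sketch checks whether a given combination of values would cause timeouts to be tolerated forever. The function name and the sample values are hypothetical. The comparison 100 / timeout_return > retry_interval / thorough_probe_interval is cross-multiplied so that it stays exact in integer shell arithmetic; positive integer values are assumed.

```shell
#!/bin/sh
# Sketch: decide whether consecutive probe timeouts would be tolerated forever.
# Function name and sample values are hypothetical; inputs must be positive integers.

timeouts_tolerated_forever() {
    timeout_return=$1
    retry_interval=$2
    thorough_probe_interval=$3

    # 100 / timeout_return > retry_interval / thorough_probe_interval
    # is equivalent, for positive values, to the cross-multiplied form below.
    if [ $((100 * $thorough_probe_interval)) -gt $(($retry_interval * $timeout_return)) ]; then
        echo yes
    else
        echo no
    fi
}

# timeout_return=10 tolerates 100/10 = 10 timeouts, but only 300/60 = 5 probes
# fit into the retry interval, so the limit is never reached.
timeouts_tolerated_forever 10 300 60    # prints "yes"

# timeout_return=25 tolerates 100/25 = 4 timeouts, and 600/60 = 10 probes fit
# into the retry interval, so the limit can be reached.
timeouts_tolerated_forever 25 600 60    # prints "no"
```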

For more information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster 4.3 Data Services Planning and Administration Guide.

Operation of the Fault Monitor for the SAP Central Services Resource Type

To determine whether the SAP central services are operating correctly, the fault monitor for the SAP central services resource type probes these resources periodically.

The probe uses the sapcontrol command to check the health of the sapstartsrv process.

# su - sidadm -c "sapcontrol -nr instance_number -function GetProcessList"

sidadm: Specifies the SAP administrative user.
instance_number: Specifies the SAP instance number of the central service.

The return code indications are as follows:

1 signals an internal error.
3 signals that everything is running as expected.
4 indicates that everything is stopped and a failover is initiated.
For all other return codes, the output of the sapcontrol command is evaluated for the status of the critical processes.

Critical processes are the message server, enqueue server, and the gateway reader, if available. The status of the different processes leads to the following different actions:

GREEN: No action.
YELLOW: The probe command returns with the number specified in the Yellow property.
GRAY: The probe indicates a restart.
RED: The probe indicates a failover.
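
The two decision tables above can be sketched as a pair of shell functions. This is an illustration of the documented mapping only, not the monitor's code: the function names are hypothetical, and how the monitor combines the statuses of several critical processes is not covered.

```shell
#!/bin/sh
# Sketch: map a sapcontrol return code, and then a process status color,
# to the action described in this section. Function names are hypothetical.

interpret_return_code() {
    case "$1" in
        1) echo "internal error" ;;
        3) echo "running" ;;
        4) echo "stopped, failover" ;;
        *) echo "evaluate process status" ;;
    esac
}

action_for_status() {
    status=$1
    yellow=$2   # value of the Yellow property of the resource
    case "$status" in
        GREEN)  echo "no action" ;;
        YELLOW) echo "return ${yellow}" ;;
        GRAY)   echo "restart" ;;
        RED)    echo "failover" ;;
        *)      echo "unknown status" ;;
    esac
}

interpret_return_code 3
action_for_status YELLOW 10
```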

If the probe command times out, that is, if it reaches 80% of the value of the probe_timeout property, the return code of the probe command is determined by the Yellow property.

The number of tolerated consecutive YELLOW states or timeouts within retry_interval seconds is obtained by dividing 100 by timeout_return. If this number is greater than the number you obtain by dividing retry_interval by thorough_probe_interval, then timeouts and YELLOW states will be tolerated forever.

For more information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster 4.3 Data Services Planning and Administration Guide.

Operation of the Fault Monitor for the SAP Replicated Enqueue Server Resource Type

To determine whether the SAP replicated enqueue server is operating correctly, the fault monitor for the SAP replicated enqueue server resource type probes these resources periodically.

The probe uses the sapcontrol command to check the health of the sapstartsrv process.

# su - sidadm -c "sapcontrol -nr instance_number -function GetProcessList"

sidadm: Specifies the SAP administrative user.
instance_number: Specifies the SAP instance number of the SAP replicated enqueue server.

The return code indications are as follows:

1 signals an internal error.
3 signals that everything is running perfectly.
4 signals that everything is stopped and a failover is initiated.
For all other return codes, the output of the sapcontrol command is evaluated for the status of the critical processes.

The critical process is the SAP replicated enqueue server. The status of the process leads to the following different actions:

GREEN: No action.
YELLOW: The probe command returns with the number specified in the Yellow property.
GRAY: The probe indicates a restart.
RED: The probe indicates a failover.

If the probe command times out, that is, if it reaches 80% of the value of the probe_timeout property, the return code of the probe command is determined by the Yellow property.

For more information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster 4.3 Data Services Planning and Administration Guide.

Operation of the Fault Monitor for the SAP Application Server Instance Resource Type

To determine whether the SAP application server instance is operating correctly, the fault monitor for the SAP application server instance resource type probes these resources periodically.

The probe uses the sapcontrol command to check the health of the application server instance process.

# su - sidadm -c "sapcontrol -nr instance_number -function GetProcessList"

sidadm: Specifies the SAP administrative user.
instance_number: Specifies the SAP instance number of the application server instance process.

The return code indications are as follows:

1 signals an internal error.
3 signals that everything is running perfectly.
4 signals that everything is stopped and a failover is initiated.
For all other return codes, the output of the sapcontrol command is evaluated for the status of the critical processes.

The critical processes depend on the deployment variation.

For a pure application server instance, the critical processes are as follows:

disp+work, jstart, or jcontrol

For a combined instance, the critical processes are as follows:

disp+work, jstart, or jcontrol
Enqueue server
Message server

The status of a critical process leads to the following different actions:

GREEN: No action.
YELLOW: The probe command returns with the number specified in the Yellow property.
GRAY: The probe indicates a restart.
RED: The probe indicates a failover.

If the probe command times out, that is, if it reaches 80% of the value of the probe_timeout property, the return code of the probe command is determined by the Yellow property.

For more information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster 4.3 Data Services Planning and Administration Guide.

Operation of the Fault Monitor for the SAP Replicated Enqueue Preempter Resource Type

To determine whether the SAP replicated enqueue preempter is operating correctly, the fault monitor for the SAP replicated enqueue preempter resource type probes these resources periodically.

The probe uses the sapcontrol command to check the health of the sapstartsrv process.

# su - sidadm -c "sapcontrol -nr instance_number -function GetProcessList"

sidadm: Specifies the SAP administrative user.
instance_number: Specifies the SAP instance number of the sapstartsrv process.

The return code of 0 indicates that everything is running as expected. All other return codes indicate errors.

If the SAP central services and the SAP replicated enqueue server are running on the same node, the probe evaluates whether a giveover is possible. If so, the probe initiates the giveover to reinstate the redundancy of the lock table.

If the probe command times out, that is, if it reaches 80% of the value of the probe_timeout property, the return code of the probe command is determined by the Yellow property.

For more information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster 4.3 Data Services Planning and Administration Guide.

Oracle® Solaris Cluster Data Service for SAP NetWeaver Guide