Automatically Restarting Software Processes

The LSMS Automatic Software Recovery feature, standard in LSMS Release 2.0 and later, detects failures in certain LSMS processes and attempts to restart them without manual intervention by the customer. The sentryd utility implements this feature.

Detecting Failure Conditions

Table 5-1 shows which processes sentryd checks and the error conditions it checks them for.

The sentryd process uses either of the following methods to detect failures:

For more information about specific methods used to detect failures, see the section shown in the last column of Table 5-1.

Reporting Failures Through the Surveillance Feature

If the Surveillance feature is not enabled, sentryd still detects failures and attempts to restart processes, but important information concerning the state of the LSMS is neither displayed nor logged.

To obtain the full benefit of this feature, the Surveillance feature must be enabled. The Surveillance feature displays notifications about the following conditions and logs them in /var/TKLC/lsms/logs/survlog.log:

Also, whether or not the Surveillance feature is enabled, surveillance agents will restart the sentryd process if it exits abnormally.
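
As an illustration of checking what the Surveillance feature has logged, the following minimal Python sketch prints the most recent lines of the surveillance log. The log path is the one given above. The helper name tail_lines and the default of 20 lines are assumptions made for this example, and nothing is assumed about the format of the log entries.

```python
from collections import deque
from pathlib import Path

# Log path taken from the text above; the entry format is not assumed here.
SURVLOG = Path("/var/TKLC/lsms/logs/survlog.log")


def tail_lines(path, count=20):
    """Return the last `count` lines of a text file as a list of strings.

    Hypothetical helper for this example; it is not part of LSMS.
    """
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    if SURVLOG.exists():
        for line in tail_lines(SURVLOG):
            print(line)
    else:
        print(f"{SURVLOG} not found")
```

If the file is missing, the script only reports that fact. The text above suggests one possible cause, a disabled Surveillance feature, but the script does not check for it.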

Automatically Restarting Processes Hierarchically

Figure 5-1 shows how sentryd restarts processes in a hierarchical order.

Figure 5-1: Order of Automatically Restarting Processes

This figure illustrates:

All recovery procedures start within 60 seconds of failure detection.
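
The idea of restarting processes in a hierarchical order can be sketched in a few lines of Python. This is not how sentryd is implemented. It shows one possible reading, in which a failed process is restarted first, followed by the processes below it in the hierarchy, level by level. The process names and the DEPENDENTS mapping are placeholders invented for this example. The actual hierarchy is the one shown in Figure 5-1.

```python
from collections import deque

# Hypothetical hierarchy: each key is a process and each value lists the
# processes directly below it. These are placeholder names, not LSMS names.
DEPENDENTS = {
    "parent": ["child_a", "child_b"],
    "child_a": ["grandchild"],
    "child_b": [],
    "grandchild": [],
}


def restart_order(failed, dependents):
    """Return `failed` followed by every process below it, level by level."""
    order = []
    queue = deque([failed])
    seen = {failed}
    while queue:
        name = queue.popleft()
        order.append(name)
        for child in dependents.get(name, []):
            if child not in seen:  # never schedule a process twice
                seen.add(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    print(restart_order("parent", DEPENDENTS))
    print(restart_order("child_a", DEPENDENTS))
```

In this sketch a failure of the top-level placeholder process leads to a restart of the whole tree, while a failure lower down affects only the processes beneath it.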