The LSMS Automatic Software Recovery feature, standard in LSMS Release 2.0 and later, detects failures in certain LSMS processes and attempts to restart them without manual intervention by the customer. This feature is implemented by the sentryd utility.
Table 5-1 shows which processes sentryd checks and the error conditions it checks them for.
The sentryd process uses either of the following methods to detect failures:
Verifying that the process has updated its timestamp in the supplemental database periodically
Using standard Linux commands to determine whether a process is running
For more information about specific methods used to detect failures, see the section shown in the last column of Table 5-1.
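The two detection methods above can be sketched as follows. This is a minimal illustration, not the sentryd implementation: the function names, the 60-second staleness threshold, and the choice of `pgrep -x` as the "standard Linux command" are all assumptions made for the example, and the timestamp is passed in directly rather than read from the supplemental database.

```python
import subprocess
import time

# Hypothetical threshold: the source does not say how old a timestamp
# may be before a process is considered failed.
MAX_HEARTBEAT_AGE_SECONDS = 60.0


def heartbeat_is_stale(last_update, now=None, max_age=MAX_HEARTBEAT_AGE_SECONDS):
    """Method 1: the process is suspect if its timestamp is too old.

    `last_update` and `now` are seconds since the epoch. In a real system
    `last_update` would be read from the supplemental database.
    """
    if now is None:
        now = time.time()
    return (now - last_update) > max_age


def process_is_running(name, run=subprocess.run):
    """Method 2: ask the operating system whether the process exists.

    Uses `pgrep -x NAME`, which exits with status 0 when at least one
    process has exactly that name. `run` is injectable for testing.
    """
    result = run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A monitor built this way would treat a process as failed when the check that applies to it reports a problem: a stale timestamp for processes that record one, or a missing process entry otherwise.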
If the Surveillance feature is not enabled, sentryd still detects failures and attempts to restart processes, but important information concerning the state of the LSMS is neither displayed nor logged.
To obtain the full benefit of this feature, the Surveillance feature must be enabled. The Surveillance feature displays and logs (in /var/TKLC/lsms/logs/survlog.log) notifications for the following conditions:
Software failures
Successful recovery of the software
Unsuccessful recovery of the software
Also, whether or not the Surveillance feature is enabled, surveillance agents will restart the sentryd process if it exits abnormally.
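The relationship between a recovery attempt and the three notification conditions listed above can be sketched as follows. This is an illustrative model only: the `Notification` enum, the `recover` function, and the `restart` and `notify` callbacks are hypothetical names, and the sketch does not reproduce the actual format of survlog.log entries.

```python
from enum import Enum


class Notification(Enum):
    """The three conditions the Surveillance feature reports."""
    SOFTWARE_FAILURE = "software failure"
    RECOVERY_SUCCEEDED = "successful recovery"
    RECOVERY_FAILED = "unsuccessful recovery"


def recover(process_name, restart, notify):
    """Report a failure, attempt a restart, then report the outcome.

    `restart(process_name)` returns True if the restart succeeded.
    `notify(kind, process_name)` stands in for displaying and logging a
    notification; with Surveillance disabled it would be a no-op, so the
    restart still happens but nothing is reported.
    """
    notify(Notification.SOFTWARE_FAILURE, process_name)
    succeeded = restart(process_name)
    outcome = (
        Notification.RECOVERY_SUCCEEDED
        if succeeded
        else Notification.RECOVERY_FAILED
    )
    notify(outcome, process_name)
    return succeeded
```

Passing a `notify` callback that does nothing models the case where the Surveillance feature is not enabled: recovery proceeds, but the state of the LSMS is neither displayed nor logged.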
Figure 5-1 shows how sentryd restarts processes in a hierarchical order.
This figure illustrates:
All recovery procedures start within 60 seconds of failure detection.