Automatically Restarting Software Processes

The LSMS Automatic Software Recovery feature, available as a standard feature in LSMS Release 2.0 and later, detects failures in certain LSMS processes and attempts to restart them without manual intervention by the customer. The sentryd utility implements this feature.

Detecting Failure Conditions

Table 1 shows which processes sentryd monitors and the failure conditions for which it checks each one.

Table 1. Processes Monitored by the Automatic Software Recovery Feature

Process                 Unintentional  Inability to   Failed to       See section
                        Exit           Perform        Initialize
                                       Defined Tasks  During Startup
EAGLE agents            X              X              X               Automatically Monitoring and Restarting EAGLE Agent Processes
Regional NPAC agents    X              X              X               Automatically Monitoring and Restarting NPAC Agent Processes
OSI                     X                                             Automatically Monitoring and Restarting OSI Process
Service Assurance       X                                             Automatically Monitoring and Restarting the Service Assurance Process
Local Services Manager  X              X              X               Automatically Monitoring and Restarting Other Processes
Local Data Manager      X              X              X               Automatically Monitoring and Restarting Other Processes
Logger Server           X                             X               Automatically Monitoring and Restarting Other Processes
LSMS SNMP Agent         X                             X               Automatically Monitoring and Restarting Other Processes
Apache web server       X                             X               Automatically Monitoring and Restarting Other Processes
RMTP Manager            X                             X               Automatically Monitoring and Restarting the rmtpmgr Process
RMTP Agent              X                             X               Automatically Monitoring and Restarting the rmtpagent Process
Report Manager          X                             X               Automatically Monitoring and Restarting Other Processes

The sentryd process uses either of the following methods to detect failures:

For more information about the specific methods used to detect failures, see the section listed in the last column of Table 1.
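The coverage in Table 1 can be expressed as a lookup that answers whether a given condition triggers recovery for a given process. The following Python sketch is illustrative only. The names (Condition, MONITORED, classify, triggers_recovery) are hypothetical and are not part of sentryd, and the mapping in classify from an observed process state to a condition is an assumption. The OSI and Service Assurance rows, which carry a single mark in the table, are also assumed here to refer to unintentional exit.

```python
from enum import Enum
from typing import Optional


class Condition(Enum):
    """The three failure conditions listed in Table 1."""
    UNINTENTIONAL_EXIT = "Unintentional Exit"
    INABILITY_TO_PERFORM = "Inability to Perform Defined Tasks"
    FAILED_TO_INITIALIZE = "Failed to Initialize During Startup"


EXIT = Condition.UNINTENTIONAL_EXIT
TASKS = Condition.INABILITY_TO_PERFORM
INIT = Condition.FAILED_TO_INITIALIZE

# Conditions checked for each process, transcribed from Table 1.
# Assumption: the single mark for OSI and Service Assurance means Unintentional Exit.
MONITORED = {
    "EAGLE agents": {EXIT, TASKS, INIT},
    "Regional NPAC agents": {EXIT, TASKS, INIT},
    "OSI": {EXIT},
    "Service Assurance": {EXIT},
    "Local Services Manager": {EXIT, TASKS, INIT},
    "Local Data Manager": {EXIT, TASKS, INIT},
    "Logger Server": {EXIT, INIT},
    "LSMS SNMP Agent": {EXIT, INIT},
    "Apache web server": {EXIT, INIT},
    "RMTP Manager": {EXIT, INIT},
    "RMTP Agent": {EXIT, INIT},
    "Report Manager": {EXIT, INIT},
}


def classify(started_ok: bool, running: bool, healthy: bool) -> Optional[Condition]:
    """Hypothetical mapping from an observed process state to a Table 1 condition.

    Returns None when no failure is observed.
    """
    if not started_ok:
        return INIT
    if not running:
        return EXIT
    if not healthy:
        return TASKS
    return None


def triggers_recovery(process: str, condition: Optional[Condition]) -> bool:
    """True if Table 1 marks this process as checked for this condition."""
    return condition is not None and condition in MONITORED.get(process, set())
```

For example, under this mapping an Apache web server that keeps running but stops doing its work would not trigger recovery, because Table 1 does not mark that process for inability to perform defined tasks.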

Reporting Failures Through the Surveillance Feature

If the Surveillance feature is not enabled, sentryd still detects failures and attempts to restart processes, but important information concerning the state of the LSMS is neither displayed nor logged.

To obtain the full benefit of this feature, enable the Surveillance feature. The Surveillance feature displays and logs (in /var/TKLC/lsms/logs/survlog.log) notifications about the following conditions:

Also, whether or not the Surveillance feature is enabled, surveillance agents will restart the sentryd process if it exits abnormally.
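The restart-on-abnormal-exit behavior described for sentryd can be sketched as a small supervisor loop. This Python example is a generic illustration, not the surveillance agents' implementation. It treats any nonzero exit status as abnormal, and the supervise function and its max_restarts limit are hypothetical.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd and restart it each time it exits abnormally (nonzero status).

    Returns the number of restarts performed. Stops when the process exits
    normally (status 0) or when max_restarts has been reached.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)  # blocks until the child exits
        if result.returncode == 0:
            return restarts  # normal exit: nothing to recover
        if restarts >= max_restarts:
            return restarts  # give up; a real supervisor would also raise an alarm
        restarts += 1


if __name__ == "__main__":
    # Demo: a child that always exits with status 1 is restarted twice, then abandoned.
    always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(always_fails, max_restarts=2))  # prints 2
```

A process killed by a signal reports a negative return code, which this sketch also counts as an abnormal exit.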

Automatically Restarting Processes Hierarchically

Figure 1 shows how sentryd restarts processes in a hierarchical order.

Figure 1. Order of Automatically Restarting Processes

This figure illustrates:

All recovery procedures start within 60 seconds of failure detection.
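A hierarchical restart can be sketched as a walk over a dependency tree that restarts a failed process before the processes beneath it. The actual LSMS order is the one shown in Figure 1. This Python sketch uses a placeholder hierarchy (proc_a through proc_d) instead, and the names, the assumption that dependents are restarted along with their parent, and the helper functions are all hypothetical. The 60-second check mirrors the stated bound on when recovery starts.

```python
from collections import deque

# Placeholder hierarchy (hypothetical names): each process maps to the
# processes directly beneath it. The real LSMS order is shown in Figure 1.
HIERARCHY = {
    "proc_a": ["proc_b", "proc_c"],
    "proc_b": ["proc_d"],
    "proc_c": [],
    "proc_d": [],
}

# Recovery must start within 60 seconds of failure detection.
RECOVERY_START_LIMIT_S = 60


def restart_order(failed, hierarchy):
    """Return the failed process followed by everything beneath it.

    Uses a breadth-first traversal, so in a tree each parent is listed
    before its children.
    """
    order = []
    seen = set()
    queue = deque([failed])
    while queue:
        proc = queue.popleft()
        if proc in seen:
            continue
        seen.add(proc)
        order.append(proc)
        queue.extend(hierarchy.get(proc, []))
    return order


def recovery_started_in_time(detected_at, started_at):
    """True if recovery started no later than 60 s after detection (times in seconds)."""
    return 0 <= started_at - detected_at <= RECOVERY_START_LIMIT_S
```

With this placeholder hierarchy, a failure of proc_b yields the order proc_b, proc_d. A failure of proc_a restarts all four processes, with the parent first.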