5 Restarting Software Processes
This chapter describes how the LSMS automatically attempts to restart certain software processes after they fail. It also describes how to manually verify and restart LSMS software components.
5.1 Introduction
This chapter describes how the LSMS automatically attempts to restart certain software processes after they fail. It also describes how to manually verify and restart LSMS software components.
5.2 Automatically Restarting Software Processes
The LSMS Automatic Software Recovery feature, available as a standard feature for LSMS Release 2.0 and later, detects failures in certain LSMS processes and attempts to restart the processes without the need for manual intervention by the customer. This feature is implemented by the sentryd utility.
Detecting Failure Conditions
Table 5-1 shows which processes are checked by sentryd and the error conditions for which they are checked.
Table 5-1 Processes Monitored by the Automatic Software Recovery Feature
The sentryd process uses either of the following methods to detect failures:
- Verifying that the process has updated its timestamp in the supplemental database periodically
- Using standard Linux commands to determine whether a process is running
For more information about specific methods used to detect failures, see the section shown in the last column of Table 5-1.
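The two detection methods can be illustrated with a minimal Python sketch. This is not the sentryd implementation: the function names and the `max_age` threshold are assumptions made for illustration, and this chapter does not state how often a process must update its timestamp. The second function relies on the standard Linux `pgrep` command.

```python
import subprocess
from datetime import datetime, timedelta


def heartbeat_is_stale(last_update: datetime, now: datetime,
                       max_age: timedelta) -> bool:
    """Return True when the recorded timestamp is older than max_age.

    max_age is a hypothetical threshold; the real interval is not
    documented in this chapter.
    """
    return now - last_update > max_age


def process_is_running(name: str) -> bool:
    """Return True when `pgrep -x NAME` finds an exact-name match."""
    result = subprocess.run(
        ["pgrep", "-x", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # pgrep exits with status 0 only when at least one process matched.
    return result.returncode == 0
```

A timestamp check can detect a process that is still running but no longer doing useful work (for example, one stuck in an infinite loop), which a process-table check alone cannot.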
Reporting Failures Through the Surveillance Feature
If the Surveillance feature is not enabled, sentryd still detects failures and attempts to restart processes, but important information concerning the state of the LSMS is neither displayed nor logged.
To obtain the full benefit of this feature, the Surveillance feature must be enabled. The Surveillance feature displays and logs (in /var/TKLC/lsms/logs/survlog.log) notifications for the following conditions:
- Software failures
- Successful recovery of the software
- Unsuccessful recovery of the software
Also, whether or not the Surveillance feature is enabled, surveillance agents will restart the sentryd process if it exits abnormally.
Automatically Restarting Processes Hierarchically
Figure 5-1 shows how sentryd restarts processes in a hierarchical order.
Figure 5-1 Order of Automatically Restarting Processes

This figure illustrates:
- Which processes sentryd monitors.
- When a failure is detected in a process, sentryd attempts to restart the failed process and all processes shown below it.
- The optional Service Assurance process is monitored for failure, but is not restarted by sentryd. Also, if sentryd restarts the OSI process, it stops the Service Assurance process. (The Surveillance feature restarts the Service Assurance process whenever it detects that the Service Assurance process has stopped.)
All recovery procedures start within 60 seconds of failure detection.
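The rule "restart the failed process and all processes shown below it" can be sketched as a walk over a parent-to-children map. The hierarchy below is illustrative only and is not a transcription of Figure 5-1; it borrows the rmtpmgr dependents named later in this chapter. The function name is likewise an assumption.

```python
from collections import deque

# Illustrative only: NOT a transcription of Figure 5-1.
EXAMPLE_HIERARCHY = {
    "rmtpmgr": ["npacagent", "eagleagent", "supman"],
    "npacagent": [],
    "eagleagent": [],
    "supman": [],
}


def processes_to_restart(hierarchy, failed):
    """Return the failed process followed by every process below it."""
    order = []
    seen = set()
    queue = deque([failed])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # a process reachable by two paths is restarted once
        seen.add(name)
        order.append(name)
        queue.extend(hierarchy.get(name, []))
    return order
```

With this example map, a failure of `rmtpmgr` yields all four processes, while a failure of `supman` yields only `supman`.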
5.2.1 Automatically Monitoring and Restarting EAGLE Agent Processes
The following sections describe the failure conditions for which sentryd monitors the EAGLE agent processes (eagleagent) and the steps performed in attempts to restart the process after failure has been detected.
Monitoring EAGLE Agent Processes
The sentryd process monitors each EAGLE agent process for the following conditions:
- Failure to initialize during automatic system startup
- Failure to initialize during manual startup using the eagle command
- An abnormal exit during normal operation
- Inability to perform its defined tasks, for example, because it is in an infinite loop
Restarting an EAGLE Agent Process
When one of the conditions described in “Monitoring EAGLE Agent Processes” has been detected, sentryd performs the following tasks:
- Generates the following Surveillance notification, where <CLLI> represents the Common Language Location Identifier (CLLI) of the EAGLE:
  LSMS6004|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - FAILD: eagleagent <CLLI>
- Attempts to stop and restart the eagleagent. If the eagleagent restarts, sentryd generates the following Surveillance notification:
  LSMS6005|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RECOV: eagleagent <CLLI>
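The notifications shown above are pipe-delimited, so they can be split into fields when reviewing the Surveillance log. The following Python sketch assumes four fields per line, as in the examples in this chapter. The field names are assumptions (in particular, the meaning of the third field, shown as xxxxxxx, is not stated here), and `STPA` is a hypothetical CLLI.

```python
from typing import NamedTuple


class Notification(NamedTuple):
    event: str      # event number, for example "LSMS6004"
    timestamp: str  # kept as text, for example "08:40 Sep 11, 1998"
    origin: str     # third field; its meaning is an assumption
    text: str       # message text after the third delimiter


def parse_notification(line: str) -> Notification:
    """Split one Surveillance notification into its four fields."""
    parts = [part.strip() for part in line.split("|", 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-delimited fields: {line!r}")
    return Notification(*parts)


sample = "LSMS6004|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - FAILD: eagleagent STPA"
parsed = parse_notification(sample)
```

Splitting at most three times keeps any later `|` characters inside the message text rather than creating extra fields.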
Continuing Attempts to Restart an EAGLE Agent Process
If the attempt to restart the eagleagent fails, sentryd attempts again.
If this attempt is also unsuccessful, the sentryd process generates the following Surveillance notification and continues to attempt to restart the eagleagent process.
LSMS6006|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RFAILD: eagleagent <CLLI>
If this notification appears several times in a row, contact My Oracle Support.
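The sequence of FAILD, RFAILD, and RECOV notifications can be sketched as a retry loop. This is a simplified model, not the sentryd code. It assumes that RFAILD is generated after the second failed attempt and after each later failed attempt, which this chapter does not state precisely. The `max_attempts` bound is hypothetical and exists only so that the sketch terminates; `try_restart` is a caller-supplied stand-in for the actual stop and restart.

```python
from typing import Callable, List


def recover(name: str, try_restart: Callable[[], bool],
            max_attempts: int) -> List[str]:
    """Return the notification texts a recovery run would produce."""
    events = [f"FAILD: {name}"]  # failure detected
    for attempt in range(1, max_attempts + 1):
        if try_restart():
            events.append(f"RECOV: {name}")
            return events
        # Assumption: the first failed attempt is silent; the second
        # and later failed attempts each produce RFAILD.
        if attempt >= 2:
            events.append(f"RFAILD: {name}")
    return events


outcomes = iter([False, False, True])
events = recover("eagleagent STPA", lambda: next(outcomes), max_attempts=5)
```

In this run the restart fails twice and then succeeds, so the result is one FAILD, one RFAILD, and one RECOV entry.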
5.2.2 Automatically Monitoring and Restarting NPAC Agent Processes
The following sections describe the failure conditions for which sentryd monitors the regional NPAC agent processes (npacagents) and the steps performed in attempts to restart an npacagent process after failure has been detected.
Monitoring NPAC Agent Processes
For each region, sentryd monitors its npacagent process for the following conditions:
- Failure to initialize during automatic system startup
- Failure to initialize during manual startup using the lsms command
- An unintentional exit or crash during normal operation
- Inability to perform its defined tasks, for example, because it is in an infinite loop
Restarting NPAC Agent Processes
When one of the conditions described in “Monitoring NPAC Agent Processes” has been detected, sentryd performs the following tasks:
- Generates the following Surveillance notification, where <NPAC_region> indicates the name of the region whose npacagent process has failed:
  LSMS6008|08:40 Sep 11, 1998|xxxxxxx| Notify:Sys Admin - FAILED: <NPAC_region> agent
- Attempts to stop and restart the failed npacagent. If the npacagent restarts, sentryd generates the following Surveillance notification:
  LSMS6009|08:40 Sep 11, 1998|xxxxxxx| Notify:Sys Admin - RECOV: <NPAC_region> agent
Continuing Attempts to Restart NPAC Agent Processes
If the attempt to restart the npacagent fails, sentryd attempts again. If this attempt is also unsuccessful, the sentryd process generates the following Surveillance notification and continues to attempt to restart the npacagent process.
LSMS6010|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RFAILED: <region> agent
If this notification appears several times in a row, contact My Oracle Support.
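One way to check whether a notification has appeared "several times in a row" is to count how many of the most recent log lines carry the same event number. The sketch below is an assumption-laden illustration: the function name is invented, the input is a list of notification lines already read from the log, and this chapter does not define how many repetitions count as "several".

```python
from typing import Iterable


def trailing_run(lines: Iterable[str], event: str) -> int:
    """Count consecutive lines at the end of the log with this event number."""
    count = 0
    for line in reversed(list(lines)):
        if line.split("|", 1)[0].strip() == event:
            count += 1
        else:
            break  # the run ends at the first different event
    return count


log_lines = [
    "LSMS6008|08:40 Sep 11, 1998|xxxxxxx| Notify:Sys Admin - FAILED: Midwest agent",
    "LSMS6010|08:41 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RFAILED: Midwest agent",
    "LSMS6010|08:42 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RFAILED: Midwest agent",
]
repeats = trailing_run(log_lines, "LSMS6010")
```

A real log interleaves notifications from many sources, so a practical check would first filter the lines for one region before counting. The region name and timestamps in the sample are hypothetical.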
5.2.3 Automatically Monitoring and Restarting OSI Process
The following sections describe the failure conditions for which sentryd monitors the OSI process and the steps performed in attempts to restart the processes after failure has been detected.
Monitoring the OSI Process
The sentryd process monitors the OSI process for the following conditions:
- An unintentional exit or crash during normal operation
Restarting the OSI Process
When the condition described in “Monitoring the OSI Process” has been detected, sentryd performs the following tasks:
- Generates the following Surveillance notification:
  LSMS8037|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - FAILD: OSI
- Stops all running npacagent processes and the Service Assurance process, if it is running.
- Attempts to restart the OSI process and all lsmsagent processes that were previously running. If all processes restart, sentryd generates the following Surveillance notifications, where <NPAC_region> is the name of the region served by the npacagent process and <CLLI> is the name of the EAGLE agent:
  LSMS8038|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RECOV: OSI
  LSMS6005|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RECOV: eagleagent <CLLI>
  LSMS6009|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RECOV: <NPAC_region> agent
Continuing Attempts to Restart the OSI Process
If the attempt to restart the OSI process fails, sentryd attempts again. After two failed attempts, sentryd generates the following Surveillance notification.
LSMS8039|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - RFAILD: OSI
If this notification appears, contact My Oracle Support.
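The order of operations during OSI recovery (stop dependents, restart OSI, restart the agents that were running) can be laid out as a simple plan. This sketch only builds an ordered list of actions and does not start or stop anything. The function name, the tuple format, and the sample region and CLLI names are assumptions made for illustration.

```python
from typing import List, Tuple


def osi_recovery_plan(running_regions: List[str], sacw_running: bool,
                      previous_agents: List[str]) -> List[Tuple[str, str]]:
    """Return (action, process) pairs in the order described above."""
    # 1. Stop every running npacagent.
    plan = [("stop", f"npacagent {region}") for region in running_regions]
    # 2. Stop the Service Assurance process only if it is running.
    if sacw_running:
        plan.append(("stop", "sacw"))
    # 3. Restart OSI, then the agents that were running before the failure.
    plan.append(("start", "osi"))
    plan.extend(("start", agent) for agent in previous_agents)
    return plan


plan = osi_recovery_plan(
    running_regions=["Midwest"],
    sacw_running=True,
    previous_agents=["npacagent Midwest", "eagleagent STPA"],
)
```

The plan contains no start action for `sacw` because sentryd does not restart the Service Assurance process; the Surveillance feature does.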
5.2.4 Automatically Monitoring and Restarting the Service Assurance Process
The following sections describe the failure conditions for which sentryd monitors the optional Service Assurance process (sacw) and explain that the Surveillance feature, not sentryd, restarts sacw when it fails.
Monitoring the Service Assurance Process
The sentryd process monitors the optional Service Assurance process (sacw) so that it can be stopped if the OSI process needs to be restarted. It is monitored for the following conditions:
- An unintentional exit or crash during normal operation
- Inability to perform its defined tasks, for example, because it is in an infinite loop
Restarting the Service Assurance Process
The sentryd process does not attempt to restart the Service Assurance process when it fails. The Surveillance feature performs that function. For more information about the Service Assurance process, see “Understanding the Service Assurance Feature”.
5.2.5 Automatically Monitoring and Restarting the rmtpmgr Process
The following sections describe the failure conditions for which sentryd monitors the RMTP Manager process (rmtpmgr) and the steps performed in attempts to restart rmtpmgr after failure has been detected.
Monitoring the rmtpmgr Process
The sentryd process monitors rmtpmgr for the following conditions:
- Failure to initialize during automatic system startup
- An unintentional exit or crash during normal operation
- Inability to perform its defined tasks, for example, because it is in an infinite loop
Restarting the rmtpmgr Process
When one of the conditions described in “Monitoring the rmtpmgr Process” has been detected, sentryd performs the following tasks:
- Generates the following Surveillance notification:
  LSMS4021|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - rmtpmgr failed
- Attempts to stop and restart the process. If the process restarts, no notification is posted. After the sentryd process has restarted the rmtpmgr process, sentryd then attempts to restart the following processes that exited previously due to the rmtpmgr failure:
  - NPAC agents (see Restarting NPAC Agent Processes)
  - EAGLE agents (see Restarting an EAGLE Agent Process)
  - Local Data Manager (see Restarting Other Processes)
Continuing Attempts to Restart the rmtpmgr Process
If the attempt to restart the rmtpmgr process fails, sentryd attempts again. If the attempt fails again, sentryd generates the LSMS4021 notification again. If this notification appears several times in a row, contact My Oracle Support.
5.2.6 Automatically Monitoring and Restarting the rmtpagent Process
The following sections describe the failure conditions for which sentryd monitors the RMTP Agent process (rmtpagent) and the steps performed in attempts to restart rmtpagent after failure has been detected.
Monitoring the rmtpagent Process
The sentryd process monitors rmtpagent for the following conditions:
- Failure to initialize during automatic system startup
- An unintentional exit or crash during normal operation
- Inability to perform its defined tasks, for example, because it is in an infinite loop
Restarting the rmtpagent Process
When one of the conditions described in “Monitoring the rmtpagent Process” has been detected, sentryd performs the following tasks:
- Generates the following Surveillance notification:
  LSMS4021|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - rmtpagent failed
- Attempts to stop and restart the process. If the process restarts, no notification is posted. After the sentryd process has restarted the rmtpagent process, sentryd then attempts to restart the following processes that exited previously due to the rmtpagent failure:
  - NPAC agents (see Restarting NPAC Agent Processes)
  - EAGLE agents (see Restarting an EAGLE Agent Process)
  - Local Data Manager (see Restarting Other Processes)
Continuing Attempts to Restart the rmtpagent Process
If the attempt to restart the rmtpagent process fails, sentryd attempts again. If the attempt fails again, sentryd generates the LSMS4021 notification again. If this notification appears several times in a row, contact My Oracle Support.
5.2.7 Automatically Monitoring and Restarting Other Processes
The following sections describe the failure conditions for which sentryd monitors the following processes and the steps performed in attempts to restart a process after failure has been detected:
- Local Services Manager (lsman)
- LSMS SNMP Agent (lsmsSNMPagent)
- Local Data Manager (supman)
- Report Manager (reportman)
- Logger Server
- Apache Web Server
Monitoring Other Processes
The sentryd process monitors each process for the following conditions:
- Failure to initialize during automatic system startup
- An unintentional exit or crash during normal operation
- Inability to perform its defined tasks, for example, because it is in an infinite loop
Restarting Other Processes
When one of the conditions described in “Monitoring Other Processes” has been detected, sentryd performs the following tasks:
- Generates the following Surveillance notification, where <process_name> is the name of the process:
  LSMS4021|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - <process_name> failed
- Attempts to stop and restart the process. If the process restarts, no notification is posted.
Continuing Attempts to Restart Other Processes
If the attempt to restart the process fails, sentryd attempts again. If the attempt fails again, sentryd generates the LSMS4021 notification again. If this notification appears several times in a row, contact My Oracle Support.
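Because LSMS4021 is shared by several processes, the process name must be read from the message text. The following Python sketch extracts it, assuming the `<process_name> failed` wording shown in this chapter. The function name and the regular expression are illustrative assumptions, not part of the LSMS software.

```python
import re
from typing import Optional

# Matches "- <process_name> failed" at the end of the message text.
_FAILED = re.compile(r"-\s+(\S+)\s+failed\s*$")


def failed_process(line: str) -> Optional[str]:
    """Return the process named in an LSMS4021 notification, else None."""
    fields = line.split("|", 3)
    if len(fields) != 4 or fields[0].strip() != "LSMS4021":
        return None  # not an LSMS4021 notification
    match = _FAILED.search(fields[3])
    return match.group(1) if match else None


name = failed_process(
    "LSMS4021|08:40 Sep 11, 1998|xxxxxxx|Notify:Sys Admin - supman failed"
)
```

Grouping LSMS4021 lines by the extracted name makes it easier to see which process is failing repeatedly before contacting My Oracle Support.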