B Automatic Monitoring of Events

This appendix contains overviews of monitored events, GUI and surveillance notifications, and traps.

Introduction

This appendix contains:

“Overview of Monitored Events”, which describes how the LSMS monitors itself for events and alarms and how it reports them.
“Overview of GUI Notifications”, which describes the display, format, and logging of notifications that appear on the graphical user interface.
“Overview of Surveillance Notifications”, which describes the display, format, and logging of Surveillance notifications.
“Overview of Traps”, which describes the transmission, format, and logging of SNMP traps.
A listing of all events, in numerical order (see Event Descriptions). For each event, this appendix includes:
- Explanation of the probable cause for the event
- Suggested recovery
- Indication of whether the event results in a GUI notification, Surveillance notification, trap, or some combination of these.

Overview of Monitored Events

This section describes the types of events and alarms that the LSMS reports and how the servers report them.

Types of Events and Alarms Reported

The LSMS monitors itself for the types of events and alarms shown in Table B-1. When one of these events occurs, the LSMS does one or more of the following:

Displays a notification on the graphical user interface (GUI notification)
Posts a Surveillance notification at a certain frequency to the administration console by default, or to the second serial port if so configured
Sends a trap to a Network Management System (NMS) if you have installed the optional Remote Monitoring feature

Every GUI notification and Surveillance notification contains its associated event number. Traps contain a trap ID, which is explained in Overview of Traps.

Table B-1 Notification Event Number Categories

Event Number Range	Category	Description
0000–1999	EMS	Events that pertain to an Element Management System (EMS). The EMS is a process that runs on the Multi-Purpose Server (MPS) at a network element.
2000–3999	NPAC	Events that pertain to a Number Portability Administration Center (NPAC)
4000–5999	Platform and switchover (some of these events do not produce GUI notifications)	Events that pertain to system resources, such as disks, hardware, memory, and central processing unit (CPU) utilization, and to switchover functions
6000–7999	Main LSMS processes	Events that pertain to one of the following main LSMS processes: `lsman`, `supman`, `npacagent`, or `eagleagent`
8000–8999	Applications	Events that pertain to LSMS applications that are feature or application dependent, such as LNP Database Synchronization, Service Assurance, or NPA Split Administration
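
The ranges in Table B-1 can be turned into a small lookup when triaging notifications by event number. The following is a minimal sketch; the function and constant names are illustrative and are not part of the LSMS software.

```python
# Ranges copied from Table B-1; names in this sketch are hypothetical.
EVENT_CATEGORIES = [
    (0, 1999, "EMS"),
    (2000, 3999, "NPAC"),
    (4000, 5999, "Platform and switchover"),
    (6000, 7999, "Main LSMS processes"),
    (8000, 8999, "Applications"),
]


def event_category(event_number):
    """Return the Table B-1 category for an event number (string or int)."""
    number = int(event_number)
    for low, high, category in EVENT_CATEGORIES:
        if low <= number <= high:
            return category
    raise ValueError(f"event number {event_number!r} is outside Table B-1")
```

For example, `event_category("0003")` falls in the EMS range and `event_category("8069")` falls in the Applications range.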

How Servers Report Alarms and Events

The LSMS 9.0 servers perform the following functions to monitor and report events:

The standby server:
- Monitors itself only for:
  - Platform events (see Platform Alarms)
  - Switchover-readiness events, such as those that describe database replication or critical network interfaces
- Controls the appropriate Alarm LED (Critical, Major, or Minor) on the front of the server by illuminating the LED when one or more platform alarms in that category exist and turning off the LED when no platform alarms in that category exist
- Sends any notification to its Serial Port 3 and logs the notification in its Surveillance log
- Sends the notification to the active server
The active server:
- Monitors itself for both platform events and application events
- Controls the appropriate Alarm LED (Critical, Major, or Minor) on the front of the server by illuminating the LED when one or more platform alarms in that category exist and turning off the LED when no platform alarms in that category exist
- Sends all platform events for itself, events reported from the standby server, and appropriate application events for itself to its Serial Port 3 and also logs the event as appropriate in its Surveillance log (some event notifications are reported repeatedly; for more information about which events are reported repeatedly, see the individual event descriptions)
  - Alarms that originate from the active server contain the alarm text with no hostname
  - Alarms that originate from the standby server contain the alarm text preceded by the standby server’s hostname
    
    Note:
    Although all events are reported through SNMP traps and all platform alarms are reported through Surveillance notifications, not all application alarms are reported both through the GUI and through Surveillance notifications; for more information about which alarms are reported in which way, see the individual event descriptions.
- Displays one time on the GUI each platform or application event for itself and each platform event received from the standby server:
  - Alarms that originate from the active server display the alarm text with no hostname
  - Alarms that originate from the standby server display the alarm text preceded by the standby server’s hostname
- Sends one Simple Network Management Protocol (SNMP) trap for each platform or application event for itself and for each platform event received from the standby server. Each trap contains the IP address of the server from which the notification originated.

Overview of GUI Notifications

Displaying GUI Notifications

GUI notifications are displayed on the GUI only if the GUI is active when the reported event occurs, but all GUI notifications are logged in an appropriate log as described in Logging GUI Notifications. Figure B-1 shows an example of notifications displayed on the GUI.

Figure B-1 GUI Notifications


Format of GUI Notifications

This section describes the general format used for most GUI notifications, as well as additional fields used for GUI event notifications (used to report information only) and for EMS GUI notifications. The formats are expressed as an ordered sequence of variables. Variables are expressed with the name of the variable enclosed by angle brackets; for example, <Severity> indicates a variable for the severity assigned to a GUI notification. Variables Used in GUI Notification Format Descriptions shows the variables used in GUI notification formats.

General Format for GUI Notifications

The format for most GUI notifications is:


[<Severity>]:<Time Stamp> <Event Number> <Message Text String>

In addition, the following types of GUI notifications contain additional fields:

EMS GUI notifications contain information about the EMS for which they are reporting status (see Format for EMS GUI Notifications)
Notifications that have the severity EVENT can contain additional event data fields (see Format for GUI Notifications with EVENT Severity)

Format for EMS GUI Notifications

EMS GUI notifications (event numbers in the range 0000–1999) contain a <CLLI> value to indicate the Common Language Location Identifier for the network element where the EMS resides. The format for EMS GUI notifications is:


[<Severity>]:<Time Stamp> <Event Number> <CLLI>: <Message Text String>

Format for GUI Notifications with EVENT Severity

Notifications that have the severity EVENT can contain additional event data fields. The format for GUI notifications with severity EVENT is:


[EVENT]:<Time Stamp> <Event Number> <EventType>:<EventData1>, [<EventData2>],...

Variables Used in GUI Notification Format Descriptions

Table B-2 shows the possible values and meanings for each of the variables shown in format definitions for GUI notifications.

Table B-2 Variables Used in GUI Notifications

Field	Description
`<Severity>`	Indicates seriousness of event, using both text and color, as follows:
	Text	Color	Meaning
	`[Critical]`	Red	Reports a serious condition that requires immediate attention
	`[Major]`	Yellow	Reports a moderately serious condition that should be monitored, but does not require immediate attention
	`[Minor]`	Turquoise	Reports a condition of minor significance that should be monitored, but which does not require immediate attention
	`[Cleared]`	Green	Reports status information or the clearing of a condition that caused previous posting of a `[Critical]` or `[Major]` GUI notification
	`[EVENT]`	White	For information only
`<Time Stamp>`	Indicates time that the event was detected, in format: YYYY-MM-DD hh:mm:ss where fields are as follows:
	Field	Meaning	Possible Values
	`YYYY`	Year	Any four digits
	`MM`	Month	01 through 12
	`DD`	Day	01 through 31
	`hh`	Hour	00 through 23
	`mm`	Minute	00 through 59
	`ss`	Second	00 through 59
`<Event Number>`	Four-digit number that identifies the specific GUI notification (also indicates the type of GUI notification, as shown in Table B-1).
`<Message Text String>`	Text string (which may contain one or more variables defined in Table B-3) that provides a small amount of information about the event. For more information about the event, look up the corresponding event number in this appendix; for each event number, this appendix shows the text string as it appears in a GUI notification, as well as a more detailed explanation and suggested recovery.
`<CLLI>`	Used in all EMS GUI notifications to indicate the Common Language Location Identifier for the network element where the EMS resides.
`<EventType>: <EventData1>, [<EventData2>],...`	Optional event data fields, as indicated by square brackets around the field, included in GUI notifications with severity `[EVENT]`. If no data is available for a given field, the field is empty. If other fields follow an empty field, the empty field is indicated by consecutive commas with no intervening data. One of the optional fields in an event notification is an effective timestamp field. This field indicates the time that the event actually occurred. When present, it uses the ASN.1 Generalized Time format.
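
The handling of empty event data fields can be sketched as follows. This is a minimal illustration that assumes the data fields themselves contain no commas and that the event type contains no colon; the function name and the sample input are hypothetical and are not taken from an actual LSMS notification.

```python
def split_event_data(text):
    """Split '<EventType>:<EventData1>,<EventData2>,...' into its parts.

    Empty fields (consecutive commas with no data) are returned as None.
    Assumption: individual data fields never contain a comma.
    """
    event_type, _, data = text.partition(":")
    fields = [item.strip() or None for item in data.split(",")]
    return event_type.strip(), fields
```

With a made-up input such as `"ExampleType:first,,third"`, the second field is reported as `None`, which mirrors the consecutive-comma rule described in Table B-2.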

Variables Used in Message Text String of GUI Notifications

Table B-3 shows the variables that can appear in the message text of a GUI notification.

Table B-3 Variables Used in Message Text of GUI Notifications

Symbol	Possible Values and Meanings	Number of Characters
`<PRIMARY|SECONDARY>`	PRIMARY = Primary NPAC, SECONDARY = Secondary NPAC	7 or 9
`<retry_interval>`	Time, in minutes, between retries of a request sent to an NPAC after it sent a failure response	1-10
`<retry_number>`	Number of times the LSMS will retry to recover from a failure response sent by NPAC	1-10
`<YYYYMMDDhhmmss>`	Year, month, day, hour, minute, second	14
`<NPAC_region_ID>`	CA = Canada, MA = MidAtlantic, MW = Midwest, NE = Northeast, SE = Southeast, SW = Southwest, WE = Western, WC = WestCoast	2

Examples of GUI Notifications

Example of General Format GUI Notifications

Following is an example of a general GUI notification (for a description of its format, see General Format for GUI Notifications):


[Critical]:1998-07-05 11:49:56 2012 NPAC PRIMARY-NE Connection Attempt Failed:
Access Control Failure

Example of an EMS GUI Notification

Following is an example of an EMS GUI notification (for a description of its format, see Format for EMS GUI Notifications). In this example, <CLLI> has the value LNPBUICK:


[Critical]:1998-07-05 11:49:56 0003 LNPBUICK: Primary Association Failed

Example of GUI Notification with EVENT Severity Level

Following is an example of a GUI notification with severity [EVENT]. For a description of its format, see Format for GUI Notifications with EVENT Severity:


[EVENT]: 2000-02-05 11:49:56 8069 LNPBUICK: Audit LNP DB Synchronization Aborted
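
The general and EMS formats shown above can be parsed with a short regular expression. The sketch below is illustrative only: the function name is hypothetical, it treats any event number from 0000 through 1999 as an EMS notification carrying a `<CLLI>` prefix, and it leaves the event data of `[EVENT]` notifications unsplit in the message text.

```python
import re
from datetime import datetime

# [<Severity>]:<Time Stamp> <Event Number> <remaining text>
GUI_PATTERN = re.compile(
    r"^\[(?P<severity>[A-Za-z]+)\]:\s*"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<event_number>\d{4}) "
    r"(?P<rest>.*)$",
    re.DOTALL,
)


def parse_gui_notification(line):
    """Parse one GUI notification into a dict of its fields."""
    match = GUI_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("text does not match the GUI notification format")
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%d %H:%M:%S"
    )
    rest = fields.pop("rest")
    # EMS notifications (0000 through 1999) carry "<CLLI>: " before the text.
    if int(fields["event_number"]) <= 1999 and ": " in rest:
        fields["clli"], rest = rest.split(": ", 1)
    fields["message"] = rest
    return fields
```

Applied to the EMS example above, the sketch returns the severity `Critical`, the event number `0003`, the CLLI `LNPBUICK`, and the message text `Primary Association Failed`.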

Logging GUI Notifications

When an event that generates a GUI notification occurs, the notification is logged in the log file for that type of event. Table B-4 shows the log file used for each event type, where <mmdd> indicates the month and day the event was logged.

Table B-4 Logs for GUI Notifications

Event Type	Log File
EMS Alarms, NPAC Alarms, and Main LSMS Process Alarms	`/var/TKLC/lsms/logs/alarm/LsmsAlarm.log.<mmdd>`
Non-alarm Events	`/var/TKLC/lsms/logs/<region>/LsmsEvent.log.<mmdd>`, where `<region>` indicates the region of the NPAC that generated the information

For information about the format of the logs and how to view the logs, refer to the Database Administrator's Guide.
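
As an illustration of the naming scheme in Table B-4, the following sketch builds the expected log file name for a given date. The function name is hypothetical, and the sketch assumes that `<region>` is the region name used as a directory name, which this appendix does not state explicitly.

```python
from datetime import date


def gui_log_path(day, region=None):
    """Return the Table B-4 log file path for the given date.

    Without a region, the alarm log path is returned; with a region,
    the non-alarm event log path for that region is returned.
    Assumption: <region> is used verbatim as the directory name.
    """
    suffix = day.strftime("%m%d")  # <mmdd>
    if region is None:
        return f"/var/TKLC/lsms/logs/alarm/LsmsAlarm.log.{suffix}"
    return f"/var/TKLC/lsms/logs/{region}/LsmsEvent.log.{suffix}"
```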

Overview of Surveillance Notifications

Surveillance notifications are created by the Surveillance feature. These notifications can report status that is not available through the GUI notifications and report status that can be monitored without human intervention.

Displaying Surveillance Notifications

Surveillance notifications are sent to Serial Port 3 on each server.

Format of Surveillance Notifications

All Surveillance notifications reported on the same server where the event occurred have the following format:


<Event Number>|<Time Stamp>|<Message Text String>

Surveillance notifications that originate on the non-active server and are reported on the active server have an additional field that shows the hostname of the server where the event occurred, as shown in the following format:


<Event Number>|<Time Stamp>|<Host Name>|<Message Text String>

Variables Used in Surveillance Notification Format Descriptions

Table B-5 shows the possible values and meanings for each of the variables shown in format definition for Surveillance notifications.

Table B-5 Variables Used in Surveillance Notifications

Field	Description
`<Event Number>`	Four-digit number that identifies the specific Surveillance notification and also indicates the type of Surveillance notification, as shown in Table B-1.
`<Time Stamp>`	Indicates time that the event was detected, in format: `hh:mm Mon DD, YYYY` where fields are as follows:
	Field	Meaning	Possible Values
	`hh`	Hour	00 through 23
	`mm`	Minute	00 through 59
	`Mon`	Month	First three letters of month’s name
	`DD`	Day	01 through 31
	`YYYY`	Year	Any four digits
`<Host Name>`	First seven letters of the name of the host (one of two redundant servers) that noted the event. (In addition, the documentation of the individual event includes information about whether the event is reported by the active server or inactive server, or both servers.)
`<Message Text String>`	Text string (which may contain one or more variables defined in Table B-6) that provides a small amount of information about the event. For more information about the event, look up the corresponding event number in this appendix; for each event number, this appendix shows the text string as it appears in a Surveillance notification, as well as a more detailed explanation and suggested recovery.

Variables Used in Message Text String of Surveillance Notifications

Table B-6 shows the variables that can appear in the message text of a Surveillance notification.

Table B-6 Variables Used in Message Text of Surveillance Notifications

Symbol	Possible Values and Meanings	Number of Characters
`<CLLI>`	Common Language Location Identifier for the network element	11
`<PRIMARY|SECONDARY>`	`PRIMARY` = Primary NPAC, `SECONDARY` = Secondary NPAC	7 or 9
`<NPAC_cust_ID>`	0000 = Midwest, 0001 = MidAtlantic, 0002 = Northeast, 0003 = Southeast, 0004 = Southwest, 0005 = Western, 0006 = WestCoast, 0008 = Canada	4
`<NPAC_IP_Address>`	IP address of the NPAC	10
`<process_name>`	First 12 characters of process name	12
`<region>`	Midwest, MidAtlantic, Northeast, Southeast, Southwest, Western, WestCoast, Canada	6 to 12
`<return_code>`	Return code	1 or 2
`<Service_Assurance_Manager_name>`	System name of machine that implements the Service Assurance Manager	12
`<volume_name>`	Name of disk volume, for example: `a01`	3
`<volume_name_of_disk_partition>`	Name of disk volume, for example: `a01`	3

Example of a Surveillance Notification

Following is an example of a Surveillance notification:


LSMS8088|14:58 Mar 10, 2000|lsmspri|Notify: sys Admin - Auto Xfer Failure
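
A Surveillance notification such as the example above can be split on its pipe delimiters. The following sketch is a hedged illustration: the function name is hypothetical, it assumes the message text never contains a pipe character, and it relies on English month abbreviations in the time stamp. The first field is kept as a string because the example above shows it with an `LSMS` prefix.

```python
from datetime import datetime


def parse_surveillance_notification(line):
    """Split a Surveillance notification into its fields.

    Three fields: event, time stamp, message (same-server format).
    Four fields: event, time stamp, host name, message.
    Assumption: the message text contains no '|' characters.
    """
    parts = line.rstrip("\n").split("|")
    if len(parts) == 3:
        event, stamp, message = parts
        host = None
    elif len(parts) == 4:
        event, stamp, host, message = parts
    else:
        raise ValueError("expected three or four pipe-delimited fields")
    return {
        "event": event,
        # Time stamp format from Table B-5: hh:mm Mon DD, YYYY
        "timestamp": datetime.strptime(stamp, "%H:%M %b %d, %Y"),
        "host": host,
        "message": message,
    }
```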

Logging Surveillance Notifications

In addition to displaying Surveillance notifications, the Surveillance feature logs all Surveillance notifications in the file survlog.log in the /var/TKLC/lsms/logs directory.

If the LSMS Surveillance feature becomes unable to properly report conditions, it logs the error information in a file, named lsmsSurv.log, in the /var/TKLC/lsms/logs directory on each server’s system disk. When the size of lsmsSurv.log exceeds 1MB, it is copied to a backup file, named lsmsSurv.log.bak, in the same directory. There is only one LSMS Surveillance feature backup log file, which limits the amount of log disk space to approximately 2MB.
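
The single-backup scheme described above is why disk use stays near twice the size limit: at most one live file and one backup exist at a time. The sketch below illustrates the idea and is not the LSMS implementation; the function name is hypothetical, "1MB" is assumed to mean 1,048,576 bytes, and the sketch renames the file where the text above says it is copied.

```python
import os

MAX_BYTES = 1024 * 1024  # assumption: "1MB" taken as 2**20 bytes


def rotate_if_needed(log_path, max_bytes=MAX_BYTES):
    """Move an oversized log to '<log_path>.bak', replacing any older backup.

    Returns True when a rotation happened. Because only one backup file
    is kept, total disk use stays near 2 * max_bytes.
    """
    if os.path.exists(log_path) and os.path.getsize(log_path) > max_bytes:
        os.replace(log_path, log_path + ".bak")
        return True
    return False
```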

Overview of Traps

The optional Remote Monitoring feature provides the capability for the LSMS to report certain events and alarms to a remote location, using the industry-standard Simple Network Management Protocol (SNMP). The LSMS implements an SNMP agent.

Customers can use this feature to cause the LSMS to report events and alarms to another location, which implements an SNMP Network Management System (NMS). An NMS is typically a standalone device, such as a workstation, which serves as an interface through which a human network manager can monitor and control the network. The NMS typically has a set of management applications (for example, data analysis and fault recovery applications).

For more information about the LSMS implementation of an SNMP agent, see “Understanding the SNMP Agent Process”.

SNMP Version 3 Trap PDU Format

An SNMPv3 trap PDU consists of the following fields:

PDU Type, which specifies the type of PDU (in this case, trap).
Request ID, which is used to associate requests with responses.
Error Status, which specifies an error or error type in response PDUs only (otherwise set to 0).
Error Index, which associates an error with a particular object instance in response PDUs only (otherwise set to 0).
Variable Bindings, each of which contains an object field followed by its value field. The object and value fields together specify information about the event being reported.

SNMP Version 1 Trap PDU Format

Following is an overview of the format of the SNMP version 1 trap request. For more information about SNMP message formats, refer to SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, Third Edition, William Stallings, Addison Wesley, ISBN 0-201-48534-6, 1999.

Each SNMP message consists of the following fields:

SNMP authentication header, which consists of:
- Version identifier, used to ensure that both the sender and receiver of the message are using the same version of the SNMP protocol. Currently, the LSMS supports only version 1, which has a version identifier of 0 (zero).
- Community name, used to authenticate the NMS. The SNMP agent uses this field as a password to ensure that the sender of the message is allowed to access the SNMP agent’s information. The LSMS supports only trap requests, which originate at the LSMS; therefore, this field is not significant.
Protocol data unit (PDU), which for a trap request is the SNMPv1 trap PDU described next.

An SNMPv1 trap PDU consists of the following fields:

PDU Type field, which specifies the type of PDU (in this case, trap).
Enterprise field, which identifies the device generating the message. For the LSMS SNMP agent, this field is 323.
Agent address field, which contains the IP address of the host that runs the SNMP agent. For the LSMS SNMP agent, this field contains the IP address of the LSMS active server.
Generic trap type, which can be set to any value from 0 through 6. Currently, the LSMS supports only the value 6, which corresponds to the enterpriseSpecific type of trap request.
Specific trap type, which can be used to identify a specific trap.
Time stamp, which indicates how many hundredths of a second have elapsed since the last reinitialization of the host that runs the SNMP agent.
One or more variable bindings, each of which contains an object field followed by a value field. The object and value fields together specify information about the event being reported.
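
The SNMPv1 trap PDU fields listed above can be modeled as a simple record. The sketch below only holds the values; it does not encode or send a PDU, and the class name, function name, and sample values are hypothetical. The enterprise value 323 and the generic trap type 6 are taken from the field descriptions above.

```python
from dataclasses import dataclass, field

ENTERPRISE_SPECIFIC = 6  # generic trap type used by the LSMS, per the text


def uptime_to_timeticks(uptime_seconds):
    """Convert seconds since reinitialization to hundredths of a second."""
    return int(uptime_seconds * 100)


@dataclass
class TrapV1:
    """Plain holder for the SNMPv1 trap PDU fields (no wire encoding)."""

    agent_address: str   # IP address of the host that runs the agent
    specific_trap: int   # identifies the specific trap
    time_stamp: int      # hundredths of a second since reinitialization
    enterprise: str = "323"
    generic_trap: int = ENTERPRISE_SPECIFIC
    variable_bindings: list = field(default_factory=list)

    def __post_init__(self):
        if not 0 <= self.generic_trap <= 6:
            raise ValueError("generic trap type must be 0 through 6")
```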

Logging SNMP Agent Actions

When the LSMS SNMP agent process starts, stops, or sends a trap request, it logs information about the action in a log file. The log file is named lsmsSNMP.log.<MMDD>, where <MMDD> represents the current month and day. The log file is stored in the directory /usr/TKLC/lsms/logs/snmp.

Table B-7 shows the actions and information logged by the LSMS SNMP agent.

Table B-7 Information Logged by the LSMS SNMP Agent

The SNMP agent starts

Action, followed by day, date, time, and year; for example:

LSMS SNMP agent started: Thu Mar 09 09:02:53 2000

The SNMP agent stops

Action, followed by day, date, time, and year; for example:

LSMS SNMP agent stopped: Thu Mar 09 15:34:50 2000

The SNMP agent sends a trap request

The following fields, delimited by pipe characters:

Timestamp, recorded as YYYYMMDDhhmmss (year, month, date, hour, minute, second)
trap_ID, a unique numeric identifier that corresponds to the specific trap request sent.
For each NMS configured (up to five allowed):
- The NMS’s IP address
- Status (either of the following):
  - S to indicate that the LSMS SNMP agent succeeded in sending the trap request. (Even if the LSMS SNMP agent successfully sends the trap request, there is no guarantee that the NMS receives it.)
  - F to indicate that the LSMS SNMP agent failed in sending the trap request.

Following is a sample entry logged when a trap is sent (in this entry, a trap with a trap_ID of 3 is sent to two NMSs):

20000517093127|3|10.25.60.33|S|10.25.60.10|S
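
A log entry in this format can be read back with a few lines of code. The following is a minimal sketch with a hypothetical function name; it assumes every entry holds a timestamp, a trap_ID, and then one address and status pair for each configured NMS, as described above.

```python
from datetime import datetime


def parse_trap_log_entry(line):
    """Parse one pipe-delimited trap entry from the SNMP agent log."""
    fields = line.strip().split("|")
    if len(fields) < 4 or len(fields) % 2 != 0:
        raise ValueError("expected timestamp, trap_ID, then address/status pairs")
    deliveries = []
    for address, status in zip(fields[2::2], fields[3::2]):
        if status not in ("S", "F"):
            raise ValueError(f"unexpected status {status!r}")
        # True when the agent succeeded in sending to this NMS.
        deliveries.append((address, status == "S"))
    return {
        "timestamp": datetime.strptime(fields[0], "%Y%m%d%H%M%S"),
        "trap_id": int(fields[1]),
        "deliveries": deliveries,
    }
```

For the sample entry above, the sketch yields a trap_ID of 3 and two deliveries, both marked as sent. As the status description notes, a sent status does not guarantee that the NMS received the trap.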

Event Descriptions

0001

Explanation

The EMS Ethernet interface has a problem. The ping utility did not receive a response from the interface associated with the EMS.

Recovery

Consult with your network administrator.

Event Details

Table B-8 Event 0001 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - EMS interface failure
Source	Both servers
Frequency	Every 2.5 minutes as long as condition exists
Trap
Trap ID	16
Trap MIB Name	emsInterfaceFailure

0002

Explanation

The EMS, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text, requires a resynchronization with the LSMS that cannot be accomplished by automatic resynchronization between the LSMS and the EMS.

Recovery

Perform one of the synchronization procedures described in the LNP Database Synchronization User's Guide.

Event Details

Table B-9 Event 0002 Details

GUI Notification
Severity	Critical
Text	DB Maintenance Required
Surveillance Notification
Text	Notify:Sys Admin - NE CLLI=<CLLI>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	33
Trap MIB Name	emsRequiresResynchWithLSMS

0003

Explanation

The LSMS has lost association with the primary EMS of the network element, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text; the association with the secondary EMS is established.

Recovery

Determine why the primary association failed (connectivity problem, EMS software problems, NE software problem, etc.). Correct the problem. Association will be automatically retried.

Event Details

Table B-10 Event 0003 Details

GUI Notification
Severity	Major
Text	Primary Association Failed
Surveillance Notification
Text	Notify:Sys Admin - NE CLLI=<CLLI>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	5
Trap MIB Name	primaryEMSAssocLostSecEstablished

0004

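Explanation

The LSMS has lost association with the primary EMS of the network element, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text; no association with a secondary EMS is established.

Recovery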

Determine why the primary association failed (connectivity problem, EMS software problems, NE software problem, etc.). Correct the problem, and then reestablish the association with the primary EMS.

Event Details

Table B-11 Event 0004 Details

GUI Notification
Severity	Critical
Text	Primary Association Failed
Surveillance Notification
Text	Notify:Sys Admin - NE CLLI=<CLLI>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	36
Trap MIB Name	primaryEMSAssocLostNoSec

0006

Explanation

The pending queue used to hold transactions to be sent to the EMS/NE, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text, is full. To help ensure that no updates are lost, the eagleagent will abort associations with both the primary EMS and secondary EMS. Updates will be queued in a resynchronization log until the EMS reassociates.

Recovery

Determine why the EMS/NE is not receiving LNP updates, and correct the problem.

Event Details

Table B-12 Event 0006 Details

GUI Notification
Severity	Critical
Text	All Association(s) Aborted: Pending Queue Full
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	97
Trap MIB Name	emsAssociationAbortedQueueFull

0007

Explanation

The network element, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text, is busy and is sending “retry later” in response to a message sent by the eagleagent. The eagleagent has already tried resending the same message the maximum number of times. The eagleagent has aborted associations with both the primary EMS and secondary EMS.

Recovery

Correct the problem at the network element. When the EMS reconnects with the LSMS, the LSMS will automatically resynchronize the network element’s LNP database.

Event Details

Table B-13 Event 0007 Details

GUI Notification
Severity	Critical
Text	All Association(s) Aborted: Retries Exhausted
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	98
Trap MIB Name	emsAssocAbortedMaxResend

0008

Explanation

The LSMS has lost association with the secondary EMS, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text. The association with the primary EMS is still up.

Recovery

Determine why the secondary association failed (connectivity problem, EMS software problems, NE software problem, etc.) and then reestablish the association with the secondary EMS.

Event Details

Table B-14 Event 0008 Details

GUI Notification
Severity	Major
Text	Secondary Association Failed
Surveillance Notification
Text	Notify:Sys Admin - NE CLLI=<CLLI>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	130
Trap MIB Name	secondaryEMSAssocLost

0009

Explanation

The LSMS has established the first association with the network element (NE), which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text. The first association established is called the primary association. This EMS is called the primary EMS.

Recovery

No action required; this notification is for information only.

Event Details

Table B-15 Event 0009 Details

GUI Notification
Severity	Cleared
Text	Primary Association Established
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	8
Trap MIB Name	primaryEMSAssocEstablished

0010

Explanation

The LSMS has established the second association with the network element (NE), which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text. The association is established only if a primary association already exists. This EMS is called the secondary EMS.

Recovery

No action required; this notification is for information only.

Event Details

Table B-16 Event 0010 Details

GUI Notification
Severity	Cleared
Text	Secondary Association Established
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	134
Trap MIB Name	secondaryEMSAssocEstablished

0011

Explanation

The primary association for the EMS/NE, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text, is either down or is inhibited, such that transactions sent to the primary EMS will not be received by the NE. Transactions are being sent to the secondary EMS instead of the primary EMS.

Recovery

Determine why the primary association failed (connectivity problem, EMS software problem, NE software problem, or other problem). Correct the problem. Association will be automatically retried. When the association is reestablished, it will be a secondary association, and the EMS will be the secondary EMS.

Event Details

Table B-17 Event 0011 Details

GUI Notification
Severity	Cleared
Text	Successful Switchover Occurred to Secondary EMS
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	139
Trap MIB Name	transactionToSecondary

2000

Explanation

The NPAC Ethernet interface has a problem. The ping utility did not receive a response from the interface associated with the NPAC.

Recovery

Consult with your network administrator.

Event Details

Table B-18 Event 2000 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - NPAC interface failure
Source	Both primary and secondary servers
Frequency	Every 2.5 minutes as long as condition exists
Trap
Trap ID	15
Trap MIB Name	npacInterfaceFailure

2001

Explanation

The association with the NPAC identified by <NPAC_region_ID> has been disconnected by the user.

Recovery

Examine additional GUI notifications to determine whether the LSMS is retrying the association. Follow the recovery actions described for the GUI notification.

Event Details

Table B-19 Event 2001 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Disconnected
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	37
Trap MIB Name	lostNPACAssoc

2002

Explanation

The LSMS is not able to confirm the physical connectivity with the NPAC, which is specified in the System field on the GUI or is indicated by <NPAC_region_ID> in the Surveillance notification.

Recovery

Check the physical connection between the LSMS and the NPAC. The problem may be in the network, a router, or both.

Event Details

Table B-20 Event 2002 Details

GUI Notification
Severity	Critical
Text	LSMS Physical Disconnect with NPAC
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<NPAC_region_ID>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	45
Trap MIB Name	failedNPACConnectivity

2003

Explanation

The NPAC (PRIMARY or SECONDARY, as indicated) identified by <NPAC_region_ID> rejected the association because it received a message from the LSMS that failed security checks. This can be due to one of the following:

The CMIP departure time is more than five minutes out of synchronization with the NPAC servers.
The security key is not valid.
The CMIP sequence number is out of sequence (messages must be returned to the NPAC in the same order in which they were received).

Recovery

Do the following:

Log in as lsmsadm to the active server.
Enter the following command to determine what the LSMS system time is:
$ date
Contact the NPAC administrator to determine what the NPAC time is. If the NPAC time is more than five minutes different from the LSMS time, reset the LSMS system time on both servers and on the administration console using one of the procedures described in “Managing the System Clock”.
After you have verified that the NPAC and LSMS times are within five minutes of each other, cause a different security key to be used by stopping and restarting the regional agent. Enter the following commands, where <region> is the name of the region in which this notification occurred:
$ $LSMS_DIR/lsms stop <region>
$ $LSMS_DIR/lsms start <region>
Start the GUI again.
Attempt to reassociate with the NPAC. For information about associating with an NPAC, refer to the Configuration Guide.
If the problem persists, contact Oracle Technical Service.
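The five-minute comparison in the steps above can be sketched as a small shell check. This is an illustration only: `clock_skew_ok` is a hypothetical helper, not an LSMS command, and it assumes both times have already been converted to seconds since the epoch (the NPAC value still has to come from the NPAC administrator).

```shell
# Sketch only: clock_skew_ok is a hypothetical helper, not an LSMS command.
# Arguments: LSMS time and NPAC time, both as seconds since the epoch.
# Succeeds (exit status 0) when the clocks are within five minutes (300 s).
clock_skew_ok() {
    lsms_epoch=$1
    npac_epoch=$2
    diff=$((lsms_epoch - npac_epoch))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le 300 ]
}

# Example: the NPAC time is reported as four minutes behind the LSMS time.
now=$(date +%s)
if clock_skew_ok "$now" "$((now - 240))"; then
    echo "clocks within five minutes"
else
    echo "clocks differ by more than five minutes"
fi
```

A failing check corresponds to the case where the system time must be reset before the regional agent is restarted.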

Event Details

Table B-21 Event 2003 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PEER: Access Control Failure
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	95
Trap MIB Name	npacRejectedAssocAccessCtrlFail

2004

Explanation

The primary or secondary NPAC, identified by <NPAC_region_ID>, rejected the association because it received data that was not valid.

Recovery

Contact the NPAC administrator.

Event Details

Table B-22 Event 2004 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PEER: Invalid Data Received
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	96
Trap MIB Name	npacRejectedAssocInvalidData

2005

Explanation

The LSMS has lost association with the primary or secondary NPAC identified by <NPAC_region_ID> because the user aborted the association.

Recovery

Reassociate with the NPAC when the reason for aborting the association no longer exists. For information about associating with an NPAC, refer to the Configuration Guide.

Event Details

Table B-23 Event 2005 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>]-<NPAC_region_ID> Association Aborted by User
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	9
Trap MIB Name	npacAbortByUser

2006

Explanation

The LSMS did not receive an association response from the NPAC within the timeout period. The LSMS will attempt the association with the NPAC again after an interval that defaults to two minutes, but can be configured to a different value by Oracle.

Recovery

Determine whether there is a network connection problem and/or contact the NPAC administrator to determine whether the NPAC is up and running.

Event Details

Table B-24 Event 2006 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Bind Timed Out - Auto Retry After NPAC_RETRY_INTERVAL
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	As soon as condition occurs, and at two-minute intervals as long as condition exists
Trap
Trap ID	100
Trap MIB Name	assocRespNPACTimeout

2007

Explanation

The NPAC association attempt was rejected by the NPAC, and the LSMS was informed to attempt the NPAC association again to the same NPAC host after an interval that defaults to two minutes, but can be configured to a different value by Oracle.

Recovery

No action required; the LSMS will automatically try to associate again.

Event Details

Table B-25 Event 2007 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PEER - Auto Retry Same Host After NPAC_RETRY_INTERVAL
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	101
Trap MIB Name	assocRejectedRetrySameHost

2008

Explanation

The NPAC association attempt was rejected by the NPAC, and the LSMS was informed to attempt the NPAC association again to the other NPAC host after an interval that defaults to two minutes, but can be configured to a different value by Oracle.

Recovery

No action required; the LSMS will automatically try to associate again.

Event Details

Table B-26 Event 2008 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>]-<NPAC_region_ID>- Connection Aborted by PEER - Auto Retry Other Host After NPAC_RETRY_INTERVAL
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	102
Trap MIB Name	assocRejectedRetryOtherHost

2009

Explanation

A problem exists in the network connectivity. The LSMS will attempt the association with the NPAC again after an interval that defaults to two minutes, but can be configured to a different value by Oracle.

Recovery

Check the network connectivity for errors. Verify the ability to ping the NPAC from the LSMS.
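The ping verification can be wrapped in a short script, sketched here under stated assumptions: `npac_reachable` and `report_npac` are hypothetical helper names, the `PING_CMD` override exists only so the check can be exercised without a network, and the `-c` count option assumes a Linux-style `ping`.

```shell
# Sketch only: npac_reachable and report_npac are hypothetical helpers.
# PING_CMD may be overridden for a dry run; it defaults to ping.
npac_reachable() {
    host=$1
    # -c 3 sends three echo requests (Linux-style ping assumed).
    # Output is discarded; only the exit status is used.
    ${PING_CMD:-ping} -c 3 "$host" >/dev/null 2>&1
}

report_npac() {
    if npac_reachable "$1"; then
        echo "$1: reachable"
    else
        echo "$1: no response, check the network and routers"
    fi
}
```

To use it, call `report_npac` with the address of the NPAC host as configured for the region in question.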

Event Details

Table B-27 Event 2009 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PROVIDER - Auto Retry Same Host After NPAC_RETRY_INTERVAL
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	103
Trap MIB Name	nwtkProblemRetryNPACAssoc

2010

Explanation

The LSMS received three consecutive responses from the NPAC with a download status of failure from a recovery action request. The LSMS has aborted the association and will attempt to associate again after a retry interval that defaults to five minutes, but can be configured to a different value by Oracle. The LSMS will retry the recovery action after the association is reestablished.

Recovery

No action required; the LSMS will automatically try to associate again.

Event Details

Table B-28 Event 2010 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted Due to Recovery Failure - Auto Retry After NPAC_RETRY_INTERVAL
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	104
Trap MIB Name	lsmsAbortedNPACassocDowRecFail

2011

Explanation

The LSMS has disconnected the association with the NPAC region in question due to the lack of a response to heartbeat messages from the LSMS to the NPAC.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-29 Event 2011 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Disconnected by Heartbeat
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	111
Trap MIB Name	lostNPACAssoc

2012

Explanation

The NPAC (primary or secondary, as indicated) identified by <NPAC_region_ID> rejected the association because of an access control failure. This can be due to one of the following:

The OSI Presentation Address is incorrect.
The Service Provider ID in the regional configuration file is incorrect.
The CMIP departure time is more than five minutes out of synchronization with the NPAC servers.
The security key is not valid.

Recovery

Do the following:

Verify that the correct PSEL, SSEL, TSEL, and NSAP values have been configured for the OSI Presentation Address (for more information, refer to “Viewing a Configured NPAC Component” in the Configuration Guide). If you need to change the values, use the procedure described in “Modifying an NPAC Component” in the Configuration Guide.
Verify that the configured Service Provider ID (SPID) is the same as the SPID assigned by the NPAC. For more information about this configuration file, refer to “Modifying LSMS Configuration Components” in the Configuration Guide.
Verify that the configured NPAC_SMS_NAME is the same as the value assigned by the NPAC (this field is case-sensitive). For more information about this configuration file, refer to “Modifying an NPAC Component” in the Configuration Guide.
Log in as lsmsadm to the active server.
Enter the following command to determine what the LSMS system time is:
$ date
Contact the NPAC administrator to determine what the NPAC time is. If the NPAC time is more than five minutes different from the LSMS time, reset the LSMS system time on both servers and on the administration console by performing one of the procedures described in “Managing the System Clock”.
After you have verified that the NPAC and LSMS times are within five minutes of each other, cause a different security key to be used by stopping and restarting the regional agent. Enter the following commands, where <region> is the name of the region in which this notification occurred:
$ $LSMS_DIR/lsms stop <region>
$ $LSMS_DIR/lsms start <region>
Start the GUI again.
Attempt to reassociate with the NPAC.
If the problem persists, contact Oracle Technical Service.
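The stop-and-restart step for the regional agent can be scripted so that both commands always run in order. The following is a minimal sketch: `restart_region` and the `DRY_RUN` switch are hypothetical, and the `LSMS_DIR` default and region name in the example are placeholders rather than real values.

```shell
# Sketch only: restart_region is a hypothetical wrapper around the
# documented "lsms stop" and "lsms start" commands.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_region() {
    region=$1
    for action in stop start; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$LSMS_DIR/lsms $action $region"
        else
            # Abort if either command fails, so start never runs after a failed stop.
            "$LSMS_DIR/lsms" "$action" "$region" || return 1
        fi
    done
}

# Dry run with placeholder values (not a real installation path or region).
LSMS_DIR=${LSMS_DIR:-/example/lsms/bin}
DRY_RUN=1
restart_region ExampleRegion
```

Running with `DRY_RUN` unset executes the two commands for real, so the region name should be the one shown in the notification.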

Event Details

Table B-30 Event 2012 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Attempt Failed: Access Control Failure
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	106
Trap MIB Name	assocRejDueToAccessControl

2014

Explanation

The userInfo value in the cmipUserInfo portion of the NPAC association response CMIP message is not valid.

Recovery

Contact the NPAC administrator to determine why the NPAC is sending an invalid association response.

Event Details

Table B-31 Event 2014 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Attempt Failed: Invalid Data Received
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	108
Trap MIB Name	npacConnFailedCMIP

2015

Explanation

The NPAC association was terminated gracefully by the NPAC.

Recovery

According to the NANC specifications, this should never occur; if this message is seen, contact the NPAC administrator for the reason for the association unbind.

Event Details

Table B-32 Event 2015 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Disconnected by NPAC
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	109
Trap MIB Name	npacAssocGracefullyTerminated

2018

Explanation

The LSMS was unable to properly resynchronize (with the NPAC) the data that was lost while the LSMS was not associated with the NPAC.

Recovery

Do the following:

Abort the NPAC association (refer to the Configuration Guide).
Attempt to reassociate with the NPAC (refer to the Configuration Guide).
If the reassociation is not successful, contact the NPAC administrator and Oracle Technical Service.

Event Details

Table B-33 Event 2018 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Recovery Failed
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	112
Trap MIB Name	lsmsDataLostBadResynch

2019

Explanation

The LSMS data lost during the resynchronization time was not resynchronized properly with the NPAC.

Recovery

Do the following:

Abort the NPAC association (refer to the Configuration Guide).
Reestablish the NPAC association (refer to the Configuration Guide).
Determine whether the notification “NPAC <PRIMARY|SECONDARY> Recovery Complete” is posted. If notification 2019 reappears instead, perform a resynchronization for a period of time starting one hour before the 2019 notification first appeared, using the GUI (refer to “Resynchronizing for a Defined Period of Time Using the GUI” in the Database Administrator's Guide).
If 2019 continues to appear, contact Oracle Technical Service.

Event Details

Table B-34 Event 2019 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Recovery Partial Failure
Surveillance Notification
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Recovery Failure
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	113
Trap MIB Name	badNPACresynchTime

2020

Explanation

The LSMS aborted the NPAC association because the LSMS received a message from the NPAC that did not have the correct LSMS key signature.

Recovery

Verify that the correct keys are being used by both the NPAC and the LSMS.

Event Details

Table B-35 Event 2020 Details

GUI Notification
Severity	Critical
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Security Violation. Association Aborted. Retrying
Surveillance Notification
Text	Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	114
Trap MIB Name	assocAbortedBadKeys

2021

Explanation

An associate retry timer was in effect. The retry attempt was canceled because a GUI user issued an Associate, Abort or Disconnect request. If an Associate request was issued, the association is attempted immediately.

Recovery

No action required; for information only.

Event Details

Table B-36 Event 2021 Details

GUI Notification
Severity	Major
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Automatic Association Retry Canceled
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	122
Trap MIB Name	npacAutoAssociationRetryCanceled

2022

Explanation

Either the LSMS did not receive any response from the NPAC before a timeout expired or the LSMS received a response from the NPAC with a download status of failure from a recovery action request. The NPAC is unable to process the recovery action due to a temporary resource limitation. The LSMS will retry the request for the number of times indicated by <retry_number> with the interval between each retry indicated by <retry_interval> minutes. If recovery is not successful after the indicated number of retries, the LSMS will abort the association and post the following notification:


[Critical]: <Timestamp> 2010: NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted Due to Recovery Failure - Auto Retry After NPAC_RETRY_INTERVAL

Recovery

No action required; for information only.

Event Details

Table B-37 Event 2022 Details

GUI Notification
Severity	Major
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Fail/No Response from NPAC Recovery - Auto Retry <retry_number> Times in <retry_interval> Minutes
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	123
Trap MIB Name	npacRecoveryFailureResourceLimit

2023

Explanation

The NPAC association will be down for the specified period of time (from the first time field shown in the notification to the second time field shown in the notification) due to NPAC-scheduled down time.

Recovery

When the scheduled down time is over, manually reestablish the NPAC association. For information about aborting and reestablishing an association, refer to the Configuration Guide.

Event Details

Table B-38 Event 2023 Details

GUI Notification
Severity	Major
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] ScheduleDownTime from [<YYYYMMDDhhmmss>] to [<YYYYMMDDhhmmss>]
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	124
Trap MIB Name	npacAssocPeriodDown
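
To know when the association should be reestablished manually, the two time fields can be pulled out of the GUI notification text. The sketch below is illustrative only: `downtime_window` is a hypothetical helper, and the sample message assumes that the placeholders in the table are replaced by literal values inside square brackets.

```shell
# Sketch only: downtime_window is a hypothetical helper.
# Prints the start and end of the scheduled down time (YYYYMMDDhhmmss),
# or nothing if the text is not an event 2023 notification.
downtime_window() {
    printf '%s\n' "$1" |
        sed -n 's/.*ScheduleDownTime from \[\([0-9]\{14\}\)\] to \[\([0-9]\{14\}\)\].*/\1 \2/p'
}

# Made-up sample message; the region name and times are placeholders.
msg='NPAC [PRIMARY-ExampleRegion] ScheduleDownTime from [20240101020000] to [20240101060000]'
downtime_window "$msg"
# prints: 20240101020000 20240101060000
```

The second value is the earliest time at which a manual reassociation is worth attempting.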

2024

Explanation

An Associate request has been sent to the NPAC after a retry timer expired.

Recovery

No action required; for information only.

Event Details

Table B-39 Event 2024 Details

GUI Notification
Severity	Major
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Timer Expired - Resending Association Request
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	125
Trap MIB Name	npacAssocRequestSentAfterRetryTimer

2025

Explanation

The NPAC association was successfully established.

Recovery

No action required; for information only.

Event Details

Table B-40 Event 2025 Details

GUI Notification
Severity	Cleared
Text	NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Successfully Established
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	7
Trap MIB Name	npacAssocEstablished

4000

Explanation

The active server has initiated an automatic switchover to the inactive server.

Recovery

No action required; for information only.

Event Details

Table B-41 Event 4000 Details

GUI Notification
Severity	Event
Text	Switchover Initiated
Surveillance Notification
Text	Notify:Sys Admin - Switchover initiated
Source	Active server
Frequency	Once, as soon as condition occurs.
Trap
Trap ID	11
Trap MIB Name	switchOverStarted

4001

Explanation

LSMS service has been switched over.

Recovery

No action required; for information only.

Event Details

Table B-42 Event 4001 Details

GUI Notification
Severity	Event
Text	Switchover complete
Surveillance Notification
Text	Notify:Sys Admin - Switchover complete
Source	Active server
Frequency	Once, as soon as condition occurs.
Trap
Trap ID	12
Trap MIB Name	switchOverCompleted

4002

Explanation

LSMS service could not be switched over to the inactive server; the inactive server was not able to start LSMS service.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-43 Event 4002 Details

GUI Notification
Severity	Event
Text	Switchover Failed
Surveillance Notification
Text	Notify:Sys Admin - Switchover Failed
Source	Active server
Frequency	Once, as soon as condition occurs.
Trap
Trap ID	13
Trap MIB Name	switchOverFailed

4003

Explanation

This notification indicates that the disk controller <controllerId> is out of service and is affecting shared storage. This notification is only valid on E3000 systems.

controllerId= The specific controller number (either 0 or 1).

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-44 Event 4003 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Loss of disk on <controllerId>
Source	Either server
Frequency	Every 5 minutes as long as condition exists
Trap
Trap ID	14
Trap MIB Name	diskContrService

4004

Explanation

The Ethernet interface used to connect to the application network has a problem. This interface usually connects to network-connected workstations. The ping utility did not receive a response from the interface associated with the application network.

Recovery

Consult with your network administrator.

Event Details

Table B-45 Event 4004 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - APP interface failure
Source	Either server
Frequency	Every 2.5 minutes as long as condition exists
Trap
Trap ID	17
Trap MIB Name	appsInterfaceFailure

4005

Explanation

This notification indicates that the Ethernet interface used to connect to the ADMINISTRATION network has a problem.

Recovery

Consult with your network administrator.

Event Details

Table B-46 Event 4005 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - ADMIN interface failure
Source	Either server
Frequency	Every 2.5 minutes as long as condition exists
Trap
Trap ID	18
Trap MIB Name	adminInterfaceFailure

4006

Explanation

This notification indicates that the system disk has lost synchronization, possibly due to a hardware problem.

driveSpecId= disk drive specification.

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-47 Event 4006 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - <driveSpecId>
Source	Either server
Frequency	Every 5 minutes as long as condition exists
Trap
Trap ID	20
Trap MIB Name	systemDiskSynch

4007

Explanation

Database replication has failed.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-48 Event 4007 Details

GUI Notification
Severity	Critical
Text	DB Repl Err - <dbReplErr>
Surveillance Notification
Text	Notify:Sys Admin - DB repl error
Source	Both servers
Frequency	Every minute as long as condition exists.
Trap
Trap ID	21
Trap MIB Name	dataReplError

4008

Explanation

The database replication process monitor has failed.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-49 Event 4008 Details

GUI Notification
Severity	Critical
Text	DB Proc Mon Err - <dbMonErr>
Surveillance Notification
Text	Notify:Sys Admin - DB monitor failure
Source	Active server
Frequency	Every five minutes as long as condition exists.
Trap
Trap ID	22
Trap MIB Name	dbMonitorFail

4009

Explanation

The server has an internal disk error.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-50 Event 4009 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Internal Disk Error
Source	Either server
Frequency	Within five minutes of the condition occurring and at five-minute intervals as long as condition exists
Trap
Trap ID	23
Trap MIB Name	internalDiskError

4010

Explanation

This notification indicates that the hot-spare feature has completed automatic data resynchronization.

Recovery

No action required; this notification is for information only.

Event Details

Table B-51 Event 4010 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - <driveSpecId>-<driveSpecId>
Source	Either server
Frequency	Once
Trap
Trap ID	24
Trap MIB Name	hotSparedDataResynch

4011

Explanation

This notification indicates that LSMS database replication is delayed.

Recovery

No action required.

Event Details

Table B-52 Event 4011 Details

GUI Notification
Severity	N/A
Text	DB Repl Info
Surveillance Notification
Text	Notify:Sys Admin - DB repl info
Source	Either server
Frequency	Within five minutes of the condition occurring and every minute thereafter as long as condition exists.
Trap
Trap ID	25
Trap MIB Name	dataReplInfo

4012

Explanation

A process specified by <process_name> is utilizing 40 percent or more of the LSMS’s CPU resource. The <second_ID> indicates a specific instance of the process, as follows:

When the <process_name> is eagleagent, the <second_ID> specifies the Common Language Location Indicator (CLLI) of the network element
When the <process_name> is npacagent, the <second_ID> specifies the name of the region
When the <process_name> is not eagleagent or npacagent, the <second_ID> specifies the process ID (PID) of the process.

Recovery

Because this notification is posted every five minutes as long as the condition exists, you may choose to ignore this notification the first time that it appears. However, if this notification is repeated several times in a row, do the following:

If the <process_name> is not npacagent, go to step 4. Otherwise, determine whether the npacagent is still using 40% or more of the CPU resource by entering the following command, where <region> can be optionally specified (it is the name of the region as displayed at the end of the notification text):
$ ps -eo pid,pcpu,args | grep npacagent | grep <region>
If the npacagent is still using 40% or more of the CPU resource, enter the following commands to stop the npacagent and restart it, where <region> is the name of the NPAC region whose npacagent is using 40% or more of the CPU resource:
$ cd $LSMS_DIR
$ lsms stop <region>
$ lsms start <region>
Repeat step 1. If the npacagent you tried to stop is still using 40% or more of the CPU resource, contact Oracle Technical Service.
If the <process_name> is not eagleagent, go to step 7. Otherwise, determine whether the eagleagent is still using 40% or more of the CPU resource by entering the following command, where <CLLI> can be optionally specified (it is the name of the network element as displayed at the end of the notification text):
$ ps -eo pid,pcpu,args | grep eagleagent | grep <CLLI>
If the eagleagent is still using 40% or more of the CPU resource, enter the following commands to stop the eagleagent and restart it, where <CLLI> is the Common Language Location Indicator (CLLI) of the network element whose eagleagent is using 40% or more of the CPU resource:
$ cd $LSMS_DIR
$ eagle stop <CLLI>
$ eagle start <CLLI>
Repeat step 4. If the eagleagent you tried to stop is still using 40% or more of the CPU resource, contact Oracle Technical Service.
If the <process_name> is not eagleagent or npacagent, contact Oracle Technical Service.
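
The repeated `ps` checks in the steps above can be reduced to one filter that lists every process at or above the 40 percent mark. This is a sketch, not an LSMS utility: `high_cpu` is a hypothetical name, and the sample `ps` output (process IDs, percentages, and arguments) is made up for the demonstration.

```shell
# Sketch only: high_cpu is a hypothetical filter.
# Reads "ps -eo pid,pcpu,args" output on stdin, skips the header line, and
# prints each process whose CPU percentage is at or above the threshold
# (default 40, matching the event 4012 condition).
high_cpu() {
    awk -v limit="${1:-40}" 'NR > 1 && $2 + 0 >= limit + 0 { $1 = $1; print }'
}

# Live use: ps -eo pid,pcpu,args | high_cpu
# Self-contained demonstration with made-up sample output:
printf '%s\n' \
    '  PID %CPU COMMAND' \
    ' 1201 55.3 npacagent ExampleRegion' \
    ' 1377  2.1 eagleagent EXAMPLECLLI' \
    ' 1502 40.0 exampled' | high_cpu
```

In the demonstration, the first and third sample processes are printed; the region or CLLI shown after the agent name is the value to pass to the stop and start commands.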

Event Details

Table B-53 Event 4012 Details

GUI Notification
Severity	Major
Text	Process [<process_name>-<second_ID>] Utilizing High Percentage of CPU
Surveillance Notification
Text	Notify:Sys Admin - [<process_name>-<second_ID>]
Source	Either server
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	26
Trap MIB Name	cpuUtilitzationOver39

4013

Explanation

The LSMS server with default hostname lsmspri has been inhibited.

Recovery

As soon as possible, start the server by performing the procedure described in “Starting a Server”.

Event Details

Table B-54 Event 4013 Details

GUI Notification
Severity	Major
Text	Primary Server Inhibited
Surveillance Notification
Text	Notify:Sys Admin - Primary inhibited
Source	Server with default hostname lsmspri
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	27
Trap MIB Name	primaryServerInhibited

4014

Explanation

The LSMS server with default hostname lsmssec has been inhibited.

Recovery

As soon as possible, start the server by performing the procedure described in “Starting a Server”.

Event Details

Table B-55 Event 4014 Details

GUI Notification
Severity	Major
Text	Secondary Server Inhibited
Surveillance Notification
Text	Notify:Sys Admin - Secondary inhibited
Source	Server with default hostname lsmssec
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	28
Trap MIB Name	secondaryServerInhibited

4015

Explanation

A heartbeat link is down.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-56 Event 4015 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Heartbeat failure
Source	Both servers
Frequency	Once, as soon as condition occurs
Trap
Trap ID	29
Trap MIB Name	heartbeatLinkDown

4016

Explanation

This notification indicates that the Heartbeat 2 link is down.

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-57 Event 4016 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Heartbeat 2 failure
Source	Both servers
Frequency	Once
Trap
Trap ID	30
Trap MIB Name	heartbeatLinkTwoDown

4017

Explanation

This notification indicates that the LSMS network configuration is incorrect.

Recovery

Customer or field engineers should:

Verify network configuration and network cabling
Verify serial configuration and cabling if serial keepalive is configured
If the problem persists, contact Oracle Technical Service.

Event Details

Table B-58 Event 4017 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Network setup error
Source	Active server
Frequency	Every 5 minutes
Trap
Trap ID	31
Trap MIB Name	lsmsNtwkConfigError

4018

Explanation

This notification indicates that the LSMS network configuration is not supported or recommended.

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-59 Event 4018 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Network setup unsupp
Source	Active server
Frequency	Every 5 minutes
Trap
Trap ID	32
Trap MIB Name	lsmsNtwkConfigNotSupported

4019

Explanation

This notification indicates that the disk volume specified by <diskVolName> has exceeded the 95 percent usage threshold.

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-60 Event 4019 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - <diskVolName>
Source	Either server
Frequency	Every 5 minutes
Trap
Trap ID	38
Trap MIB Name	diskVolume95Usage
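
Before calling for assistance, it can help to see which volumes are over the threshold. The sketch below assumes that the usage reported by `df -P` reflects what the LSMS monitors, which this appendix does not state; `full_volumes` is a hypothetical name and the sample `df` output is made up.

```shell
# Sketch only: full_volumes is a hypothetical filter.
# Reads "df -P" output on stdin and prints the mount point and usage of
# each volume whose usage exceeds the threshold (default 95 percent).
# Mount points that contain spaces are not handled.
full_volumes() {
    awk -v limit="${1:-95}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)          # strip the percent sign from the Capacity column
        if (pct + 0 > limit + 0) print $6, $5
    }'
}

# Live use: df -P | full_volumes
# Self-contained demonstration with made-up sample output:
printf '%s\n' \
    'Filesystem 1024-blocks Used Available Capacity Mounted on' \
    '/dev/example1 1000 970 30 97% /var' \
    '/dev/example2 1000 950 50 95% /usr' | full_volumes
# prints: /var 97%
```

A volume at exactly 95 percent is not listed, because the event describes the threshold as exceeded.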

4020

Explanation

The server’s swap space has exceeded the critical usage threshold (default = 95%).

Recovery

If the problem persists, contact Oracle Technical Service.

Event Details

Table B-61 Event 4020 Details

GUI Notification
Severity	Critical
Text	Swap space exceeds Critical
Surveillance Notification
Text	Notify:Sys Admin - Swap space Critical
Source	Either server
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	39
Trap MIB Name	swapSpaceCritical

4021

Explanation

The LSMS application or system daemon whose name has <process_name> as the first 12 characters is not running.

Recovery

No user action is necessary. The Surveillance process automatically restarts the Service Assurance process (sacw), and the sentryd process automatically restarts other processes.

Event Details

Table B-62 Event 4021 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - <process_name> failed
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	40
Trap MIB Name	lsmsAppsNotRunning

4022

Explanation

The backup of the LSMS database has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-63 Event 4022 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	DATABASE backup complete
Source	Standby server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	41
Trap MIB Name	backupCompleted

4023

Explanation

The backup of the LSMS database has failed.

Recovery

Review the backup output to determine why the backup failed, correct the problems, and run the backup script again manually.

Note:

Determine whether the NAS can be reached using the ping command. If the NAS cannot be reached, restart the NAS. To restart the NAS, turn the power off, then turn the power on. If the NAS can be reached, contact Oracle Technical Service for assistance.

Event Details

Table B-64 Event 4023 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - DATABASE backup failed
Source	Standby server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	42
Trap MIB Name	backupFailed
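
The NAS check in the note for this event can be sketched in Python as follows. This is a minimal illustration, assuming a Linux `ping` that accepts `-c` (count) and `-W` (timeout in seconds). The function names are hypothetical, and the sketch only reports which step of the note applies; it does not perform any recovery.

```python
import subprocess


def nas_reachable(host, timeout_s=2):
    """Send one ping and return True if the host answered.

    Assumes the Linux ping flags -c and -W are available.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def next_step(reachable):
    """Map the ping result to the step given in the note for Event 4023."""
    if reachable:
        return "NAS reachable: escalate for assistance as described in the note"
    return "NAS unreachable: restart the NAS (power off, then power on)"
```

A call such as `next_step(nas_reachable("nas-host"))` would combine the two, where `nas-host` stands in for the actual NAS hostname or address.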

4024

Explanation

The primary LSMS server (Server 1A) is not providing the LSMS service.

Recovery

No action required; for information only.

Event Details

Table B-65 Event 4024 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Primary not online
Source	Both primary and secondary servers
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	63
Trap MIB Name	primaryServerNotOnline

4025

Explanation

The standby server is not prepared to take over LSMS service.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-66 Event 4025 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Can't switch to standby
Source	Standby server
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	64
Trap MIB Name	standbyNotReadyForSwitchover

4026

Explanation

The secondary LSMS server (Server 1B) is currently providing the LSMS service.

Recovery

No action required; for information only.

Event Details

Table B-67 Event 4026 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Secondary online
Source	Both primary and secondary servers
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	65
Trap MIB Name	secServerProvidingLSMSService

4027

Explanation

The standby LSMS server cannot determine the availability of the LSMS service on the active server.

Recovery

Determine whether the other server is working normally. Also, verify that the heartbeat connections (eth2, eth3, and the serial cable) are connected and functioning properly.

Event Details

Table B-68 Event 4027 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - Primary status unknown
Source	Standby server
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	66
Trap MIB Name	secServerCannotDeterminePrimAvailability

4028

Explanation

This notification indicates an LSMS mirroring inconsistency.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-69 Event 4028 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - < volume_name >
Source	Either server
Frequency	Every 5 minutes
Trap
Trap ID	169
Trap MIB Name	lsmsMirroringInconsistance

4029

Explanation

This notification indicates that the LSMS filesystem is not writeable.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-70 Event 4029 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - < fileSystem >
Source	Either server
Frequency	Every 5 minutes
Trap
Trap ID	170
Trap MIB Name	lsmsFilesystemNotWritable

4030

Explanation

The server’s swap space has exceeded the major usage threshold (default = 80%).

Recovery

If the problem persists, contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-71 Event 4030 Details

GUI Notification
Severity	Major
Text	Swap Space Warning
Surveillance Notification
Text	Notify:Sys Admin - Swap space warning
Source	Both servers
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	190
Trap MIB Name	swapSpaceWarning
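
The threshold comparison behind this event can be illustrated with a short Python sketch. Only the 80% default comes from this appendix. How the LSMS measures swap usage, and whether it compares strictly, are assumptions made for illustration, and the function names are hypothetical.

```python
MAJOR_THRESHOLD_PERCENT = 80.0  # default major threshold stated for Event 4030


def swap_usage_percent(used_kb, total_kb):
    """Return swap usage as a percentage of total swap space."""
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    return 100.0 * used_kb / total_kb


def exceeds_major_threshold(used_kb, total_kb,
                            threshold=MAJOR_THRESHOLD_PERCENT):
    """True when usage is strictly above the threshold (assumed comparison)."""
    return swap_usage_percent(used_kb, total_kb) > threshold
```

With these assumptions, 850 KB used out of 1000 KB exceeds the threshold, while exactly 800 KB out of 1000 KB does not.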

4031

Explanation

A database replication error that was reported earlier by the 4007 event has now been cleared.

Recovery

No action necessary.

Event Details

Table B-72 Event 4031 Details

GUI Notification
Severity	Cleared
Text	Database Replication cleared - <dbReplErr>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	195
Trap MIB Name	dataReplClear

4032

Explanation

A database process monitor error that was reported earlier by the 4008 event has now been cleared.

Recovery

No action necessary.

Event Details

Table B-73 Event 4032 Details

GUI Notification
Severity	Cleared
Text	Database Replication cleared - <dbMonErr>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	196
Trap MIB Name	dbMonitorCLear

4033

Explanation

A count operation on the LSMS database failed, which suggests a corrupt MySQL index.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-74 Event 4033 Details

GUI Notification
Severity	Critical
Text	Database Corrupt Index
Surveillance Notification
Text	None
Source	Both servers
Frequency	Every 30 minutes
Trap
Trap ID	200
Trap MIB Name	dbCorruptIndex

4034

Explanation

This notification indicates that an invalid snapshot has been detected.

Recovery

Perform the procedure in “Clean Up After Failed or Interrupted Snapshot”.

Event Details

Table B-75 Event 4034 Details

GUI Notification
Severity	Critical
Text	Invalid Snapshot - <snapName>
Surveillance Notification
Text	Notify:Sys Admin - Invalid Snapshot
Source	Active server
Frequency	Every 30 minutes
Trap
Trap ID	201
Trap MIB Name	snapInvalidErr

4035

Explanation

This notification indicates that the Invalid Snapshot error has been cleared.

Recovery

No action required; this notification is for information only.

Event Details

Table B-76 Event 4035 Details

GUI Notification
Severity	Cleared
Text	Invalid Snapshot cleared - <snapName>
Surveillance Notification
Text	Invalid Snapshot cleared - <snapName>
Source	Active server
Frequency	Every 30 minutes
Trap
Trap ID	202
Trap MIB Name	snapInvalidClear

4036

Explanation

This notification indicates that the snapshot is more than 80% full.

Recovery

No action required; this notification is for information only.

Event Details

Table B-77 Event 4036 Details

GUI Notification
Severity	Critical
Text	Full Snapshot - <snapName>
Surveillance Notification
Text	Notify:Sys Admin - Full Snapshot
Source	Active server
Frequency	Every 30 minutes
Trap
Trap ID	203
Trap MIB Name	fullSnapshot

4037

Explanation

This notification indicates that the Snapshot full error is cleared.

Recovery

No action required; this notification is for information only.

Event Details

Table B-78 Event 4037 Details

GUI Notification
Severity	Cleared
Text	Full Snapshot cleared - <snapName>
Surveillance Notification
Text	Full Snapshot cleared - <snapName>
Source	Active server
Frequency	Every 30 minutes
Trap
Trap ID	204
Trap MIB Name	fullSnapshotClear

4038

Explanation

The mate server is down.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-79 Event 4038 Details

GUI Notification
Severity	Critical
Text	Mate Server Down
Surveillance Notification
Text	Notify:Sys Admin - Mate Server Down
Source	Both servers
Frequency	Every minute as long as condition exists
Trap
Trap ID	205
Trap MIB Name	mateServerDown

4039

Explanation

The mate server is up.

Recovery

No action is required.

Event Details

Table B-80 Event 4039 Details

GUI Notification
Severity	Cleared
Text	Mate Server Up
Surveillance Notification
Text	Notify:Sys Admin - Mate Server Up
Source	Both servers
Frequency	As soon as condition clears
Trap
Trap ID	206
Trap MIB Name	mateServerUp

4100

Explanation

One or more platform alarms in the minor category exist. To determine which minor platform alarms are being reported, see “How to Decode Platform Alarms”. When the active server reports minor platform alarms that originated on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Note:

If you received Event 4100 in response to an snmpget error, contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 to have the NAS SNMP daemon stopped and restarted.

Event Details

Table B-81 Event 4100 Details

GUI Notification
Severity	Minor
Text	Minor Platform Alarm [hostname]: <alarm_string>
Surveillance Notification
Text	Notify:Sys Admin - ALM <alarm_string>
Source	Both servers
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	191
Trap MIB Name	minorPlatAlarmMask

4101

Explanation

All platform alarms in the minor category have been cleared. When the active server reports that all minor platform alarms have cleared on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

No action necessary.

Event Details

Table B-82 Event 4101 Details

GUI Notification
Severity	Cleared
Text	Minor Platform Alarms Cleared
Surveillance Notification
Text	Notify:Sys Admin - Minor Plat alrms clear
Source	Both servers
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	197
Trap MIB Name	minorPlatAlarmClear

4200

Explanation

One or more platform alarms in the major category exist. To determine which major platform alarms are being reported, see “How to Decode Platform Alarms”. When the active server reports major platform alarms that originated on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-83 Event 4200 Details

GUI Notification
Severity	Major
Text	Major Platform Alarm [hostname]: <alarm_string>
Surveillance Notification
Text	Notify:Sys Admin - ALM <alarm_string>
Source	Both servers
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	192
Trap MIB Name	majorPlatAlarmMask

4201

Explanation

All platform alarms in the major category have been cleared. When the active server reports that all major platform alarms have cleared on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

No action necessary.

Event Details

Table B-84 Event 4201 Details

GUI Notification
Severity	Cleared
Text	Major Platform Alarms Cleared
Surveillance Notification
Text	Notify:Sys Admin - Major Plat alrms clear
Source	Both servers
Frequency	Once
Trap
Trap ID	198
Trap MIB Name	majorPlatAlarmClear

4300

Explanation

One or more platform alarms in the critical category exist. To determine which critical platform alarms are being reported, see “How to Decode Platform Alarms”. When the active server reports critical platform alarms that originated on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-85 Event 4300 Details

GUI Notification
Severity	Critical
Text	Critical Platform Alarm [hostname]: <alarm_string>
Surveillance Notification
Text	Notify:Sys Admin - ALM <alarm_string>
Source	Both servers
Frequency	Once
Trap
Trap ID	193
Trap MIB Name	criticalPlatAlarmMask

4301

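Explanation

All platform alarms in the critical category have been cleared. When the active server reports that all critical platform alarms have cleared on the other server, the hostname of the other server is inserted before the alarm string.
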
Recovery

No action necessary.

Event Details

Table B-86 Event 4301 Details

GUI Notification
Severity	Cleared
Text	Critical Platform Alarms Cleared
Surveillance Notification
Text	Notify:Sys Admin - Crit Plat alrms clear
Source	Both servers
Frequency	Once
Trap
Trap ID	199
Trap MIB Name	criticalPlatAlarmClear
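
Events 4100 through 4301 form three raise/clear pairs, one per platform alarm category. The following Python sketch shows one way a monitoring script could track which categories are still raised from a stream of event numbers. The event numbers come from Tables B-81 through B-86; the function and variable names are hypothetical.

```python
# (raise_event, clear_event) per category, from Tables B-81 through B-86.
PLATFORM_ALARM_PAIRS = {
    "minor": (4100, 4101),
    "major": (4200, 4201),
    "critical": (4300, 4301),
}


def outstanding_categories(event_numbers):
    """Given event numbers in arrival order, return categories still raised."""
    raised = set()
    for number in event_numbers:
        for category, (raise_ev, clear_ev) in PLATFORM_ALARM_PAIRS.items():
            if number == raise_ev:
                raised.add(category)      # repeated raises are harmless
            elif number == clear_ev:
                raised.discard(category)  # clear with no raise is ignored
    return sorted(raised)


print(outstanding_categories([4100, 4300, 4101]))
```

Using a set means that a raise event repeated at its notification frequency does not change the result.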

6000

Explanation

The eagleagent process has been started.

Recovery

No action required; for information only.

Event Details

Table B-87 Event 6000 Details

GUI Notification
Severity	Cleared
Text	Eagleagent <CLLI> Has Been Started
Surveillance Notification
Text	Notify:Sys Admin - <CLLI> started
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	1
Trap MIB Name	eagleAgentStarted

6001

Explanation

The eagleagent process has been stopped by the eagle script.

Recovery

No action required; for information only.

Event Details

Table B-88 Event 6001 Details

GUI Notification
Severity	Critical
Text	Eagleagent <CLLI> Has Been Stopped by User
Surveillance Notification
Text	Notify:Sys Admin - <CLLI> norm exit
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	2
Trap MIB Name	eagleAgentStoppedbyscript

6002

Explanation

The npacagent for the region indicated by <NPAC_region_ID> has been started.

Recovery

No action required; for information only.

Event Details

Table B-89 Event 6002 Details

GUI Notification
Severity	Cleared
Text	NPACagent Has Been Started
Surveillance Notification
Text	Notify:Sys Admin - <NPAC_region_ID> NPACagent started
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	3
Trap MIB Name	NPACAgentStarted

6003

Explanation

The npacagent for the region indicated by <region> has been stopped using the lsms command.

Recovery

No action required; for information only. To restart the agent, do the following:

Log in to the active server as lsmsadm.
Enter the following commands to start the npacagent, where <region> is the name of the NPAC region:
$ cd $LSMS_DIR
$ lsms start <region>

Event Details

Table B-90 Event 6003 Details

GUI Notification
Severity	Critical
Text	NPACAgent Has Been Stopped by User
Surveillance Notification
Text	Notify:Sys Admin - <NPAC_region_ID> norm exit
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	4
Trap MIB Name	lsmsCommandStoppedNPACAgent

6004

Explanation

The eagleagent process for the network element identified by <CLLI> has failed. The sentryd process will attempt to restart it.

Recovery

No action required; the sentryd process will attempt to restart the eagleagent process.

Event Details

Table B-91 Event 6004 Details

GUI Notification
Severity	Critical
Text	Eagleagent [<CLLI>] Has Failed
Surveillance Notification
Text	Notify:Sys Admin - FAILD: <CLLI>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	74
Trap MIB Name	lsmsEagleAgentFailed

6005

Explanation

The eagleagent process for the network element identified by <CLLI> has been successfully restarted by the sentryd process.

Recovery

No action required.

Event Details

Table B-92 Event 6005 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - RECOV: <CLLI>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	75
Trap MIB Name	lsmsEagleAgentRestarted

6006

Explanation

The sentryd process was unable to restart the eagleagent process for the network element identified by <CLLI>.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-93 Event 6006 Details

GUI Notification
Severity	Critical
Text	Failure Restarting Eagleagent [<CLLI>]
Surveillance Notification
Text	Notify:Sys Admin - RFAILD: <CLLI>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	76
Trap MIB Name	failureToRestartEagleAgent

6008

Explanation

The npacagent process for the region specified by <NPAC_region_ID> has failed. The sentryd process will attempt to restart it.

Recovery

No action required; the sentryd process will attempt to restart the npacagent process.

Event Details

Table B-94 Event 6008 Details

GUI Notification
Severity	Critical
Text	NPACagent [<NPAC_region_ID>] Failure
Surveillance Notification
Text	Notify:Sys Admin - FAILD: <NPAC_region_ID> agent
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	78
Trap MIB Name	NPACagentForRegionFailure

6009

Explanation

The npacagent process for the region specified by <NPAC_region_ID> has been successfully restarted by the sentryd process.

Recovery

No action required. Any active LSMS GUI processes will automatically reconnect.

Event Details

Table B-95 Event 6009 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - RECOV: <NPAC_region_ID> agent
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	79
Trap MIB Name	NPACagentForRegionRestarted

6010

Explanation

The sentryd process was unable to restart the npacagent process for the region specified by <NPAC_region_ID>.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-96 Event 6010 Details

GUI Notification
Severity	Critical
Text	Failure Restarting NPACagent [<NPAC_region_ID>]
Surveillance Notification
Text	Notify:Sys Admin - RFAILD: <NPAC_region_ID> agent
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	80
Trap MIB Name	failureToRestartNPACagentRegion
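
The Surveillance text for Events 6004 through 6010 uses three keywords: FAILD (the process failed), RECOV (sentryd restarted it), and RFAILD (sentryd could not restart it). The following Python sketch classifies a Text value by keyword. It operates on the Text field only, as shown in the tables above, and assumes nothing about the rest of the notification line. The function name and the sample CLLI are hypothetical.

```python
import re

# Matches the Text field of the sentryd-related Surveillance notifications.
_PATTERN = re.compile(r"^Notify:Sys Admin - (FAILD|RECOV|RFAILD): (.+)$")

_MEANING = {
    "FAILD": "process failed; sentryd will attempt a restart",
    "RECOV": "process restarted successfully by sentryd",
    "RFAILD": "sentryd could not restart the process",
}


def classify(text):
    """Return (keyword, subject, meaning), or None if the text does not match."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    keyword, subject = match.groups()
    return keyword, subject, _MEANING[keyword]


print(classify("Notify:Sys Admin - RFAILD: STPAXYZ0001"))
```

Only RFAILD calls for operator action according to the Recovery text of these events; the other two keywords are informational.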

6020

Explanation

The npacagent process has been stopped due to a fault in accessing the regional database.

Recovery

A database error has occurred. Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-97 Event 6020 Details

GUI Notification
Severity	Critical
Text	NPACagent Has Been Shut Down - Database Access Error
Surveillance Notification
Text	Notify:Sys Admin - <NPAC_region_ID> DB error
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	189
Trap MIB Name	NPACagentStopRegDBaccessFault

8000

Explanation

The LSMS Surveillance feature is in operation.

Recovery

No action required; for information only.

Event Details

Table B-98 Event 8000 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Keep alive
Source	Both primary and secondary servers
Frequency	Every five minutes as long as condition exists
Trap
Trap ID	19
Trap MIB Name	survFeatureOn

8001

Explanation

The network element resynchronization database contains more than 1 million entries.

Recovery

Each day, as part of a cron job, the LSMS trims the resynchronization database so that it contains 768,000 entries. The occurrence of this event means that more than 232,000 transactions have been received since the last cron job. If this event occurs early in the day, contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
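
The 232,000 figure follows directly from the two sizes given above, as this short Python calculation shows. The constant names are chosen for illustration only.

```python
TRIMMED_ENTRIES = 768_000    # size of the resynchronization database after the daily trim
ALARM_THRESHOLD = 1_000_000  # Event 8001 is reported above this size

# Number of transactions that can arrive after a trim before the alarm size is reached.
headroom = ALARM_THRESHOLD - TRIMMED_ENTRIES
print(headroom)  # 232000
```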

Event Details

Table B-99 Event 8001 Details

GUI Notification
Severity	Major
Text	ResyncDB Contains 1 Mil Entries
Surveillance Notification
Text	Notify:Sys Admin - ResyncDB 1 Mil
Source	Active server
Frequency	Once
Trap
Trap ID	34
Trap MIB Name	resynchLogMidFull

8003

Explanation

The pending queue that holds the transactions to be sent to the network element is more than half full. The network element is indicated in the System field on the GUI, and its CLLI replaces <CLLI> in the Surveillance notification text.

Recovery

No action required; for information only.

Event Details

Table B-100 Event 8003 Details

GUI Notification
Severity	Major
Text	EMS Pending Queue Is Half full
Surveillance Notification
Text	Notify:Sys Admin - CLLI=<CLLI>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	43
Trap MIB Name	ensPendingQueueHalfFull

8004

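Explanation

The pending queue that holds the transactions to be sent to the network element identified by <CLLI> is full.
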
Recovery

No manual recovery required. The LSMS will automatically re-establish the association to the EMS and synchronization will take place.

Event Details

Table B-101 Event 8004 Details

GUI Notification
Severity	Critical
Text	EMS Pending Queue Is Full
Surveillance Notification
Text	Notify:Sys Admin - CLLI=<CLLI>
Source	Active server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
Trap
Trap ID	44
Trap MIB Name	emsPendingQueueMaxReached

8005

Explanation

There was a data error in a record that prevented the LSMS eagleagent from sending the record to the network element.

Recovery

Both the error and the ignored record are written to the file /var/TKLC/lsms/logs/trace/LsmsTrace.log.<mmdd>, where <mmdd> indicates the month and day the error occurred. Examine the log file for the month and day this error was reported to determine what the error was. Enter the data manually or send it again.
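
To locate the trace log for a given day, the file name can be built from the date. The following Python sketch assumes that <mmdd> is a two-digit month followed by a two-digit day, both zero-padded. The function name is hypothetical.

```python
from datetime import date

TRACE_LOG_DIR = "/var/TKLC/lsms/logs/trace"  # directory given in the Recovery text


def trace_log_path(day):
    """Return the LsmsTrace log path for the given date (assumes zero-padded mmdd)."""
    return f"{TRACE_LOG_DIR}/LsmsTrace.log.{day:%m%d}"


print(trace_log_path(date(2024, 3, 7)))
```

For March 7, this yields a file name ending in `LsmsTrace.log.0307`.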

Event Details

Table B-102 Event 8005 Details

GUI Notification
Severity	Minor
Text	Eagleagent <CLLI> Ignoring Record: <DataError>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	46
Trap MIB Name	eagleAgentIgnoredRecord

8024

Explanation

The Service Assurance agent has started successfully.

Recovery

No action required; for information only.

Event Details

Table B-103 Event 8024 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	67
Trap MIB Name	serviceAssuranceAgentStarted

8025

Explanation

Association with the Service Assurance Manager, identified by <Service_Assurance_Manager_Name>, has been established successfully.

Recovery

No action required; for information only.

Event Details

Table B-104 Event 8025 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - <Service_Assurance_Manager_Name>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	68
Trap MIB Name	establishServAssuranceMgrAssoc

8026

Explanation

Association with the Service Assurance Manager, identified by <Service_Assurance_Manager_Name>, has been stopped or disconnected.

Recovery

Contact the Service Assurance system administrator to determine the cause of the disconnection, then have the Service Assurance system administrator reassociate the Service Assurance Manager with the Service Assurance Agent.

Event Details

Table B-105 Event 8026 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - <Service_Assurance_Manager_Name>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	69
Trap MIB Name	servAssuranceMgrAssocBroken

8027

Explanation

The Service Assurance agent is not currently running.

Recovery

No action required; the Service Assurance agent should be restarted automatically.

Event Details

Table B-106 Event 8027 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	70
Trap MIB Name	servAssuranceAgentNotRunning

8030

Explanation

This notification indicates that the LSMS is not able to confirm physical connectivity with the DCM.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-107 Event 8030 Details

GUI Notification
Severity	Critical
Text	EBDA Physical Connection Lost
Surveillance Notification
Text	Notify:Sys Admin - NE=< NE CLLI > EBDA conn lost
Source	Active server
Frequency	Every 5 minutes
Trap
Trap ID	73
Trap MIB Name	noPhysicalConnectivityToDCM

8037

Explanation

The OSI process has failed. The sentryd process will attempt to restart it.

Recovery

No action required; the sentryd process will attempt to restart the failed process.

Event Details

Table B-108 Event 8037 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - FAILD: OSI
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	88
Trap MIB Name	osiDaemonFailure

8038

Explanation

The OSI process has been successfully restarted by the sentryd process.

Recovery

No action required. The sentryd process will attempt to restart the npacagent processes for all active regions. Any active LSMS GUI processes will automatically reconnect.

Event Details

Table B-109 Event 8038 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - RECOV: OSI
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	89
Trap MIB Name	osiDaemonRestarted

8039

Explanation

The sentryd process was not able to restart the OSI process.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-110 Event 8039 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - RFAILD: OSI
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	90
Trap MIB Name	osiDaemonRestartFailure

8040

Explanation

The Surveillance feature has detected that the sentryd process is no longer running.

Recovery

No action required; the LSMS HA software will attempt to restart the sentryd process.

Event Details

Table B-111 Event 8040 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - FAILD: sentryd
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	91
Trap MIB Name	sentrydFailure

8041

Explanation

This notification indicates that the surveillance process has detected that the Legacy lddAgent process has restarted and all functionality has resumed.

Recovery

No action required; this notification is for information only.

Event Details

Table B-112 Event 8041 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - RECOV: lddAgent legacy
Source	Both servers
Frequency	Once, as soon as the condition occurs
Trap
Trap ID	92
Trap MIB Name	IddAgentRestarted

8042

Explanation

This notification indicates that the surveillance process has detected that the SCPMS lddAgent process has restarted and all functionality has resumed.

Recovery

No action required; this notification is for information only.

Event Details

Table B-113 Event 8042 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	Notify:Sys Admin - RECOV: lddAgent scpms
Source	Both servers
Frequency	Once, as soon as the condition occurs
Trap
Trap ID	93
Trap MIB Name	scpmsIddAgentRestarted

8044

Explanation

This notification indicates that the LDD SCPMS Confirmation of Arrival message retry attempts have been exhausted. The MQSeries interface is not operational or network connectivity to the remote system is lost.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-114 Event 8044 Details

GUI Notification
Severity	Critical
Text	LDD SCPMS COA Retry Attempts Exhausted
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	116
Trap MIB Name	scpmsMqSeriesFault

8045

Explanation

This notification indicates that the LDD SCPMS system has not provided a response within the time limit specified by the LDD_SCP_SYSTEM_RESPONSE_TIMEOUT configuration parameter. The SCPMS system is not active.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-115 Event 8045 Details

GUI Notification
Severity	Critical
Text	LDD SCPMS Response Retry Attempts Exhausted
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	117
Trap MIB Name	scpmsNotActive

8046

Explanation

This notification indicates that the LDD Legacy Confirmation of Arrival message retry attempts have been exhausted.

The MQSeries interface is not operational or network connectivity to the remote system is lost.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-116 Event 8046 Details

GUI Notification
Severity	Critical
Text	LDD SCPMS COA Retry Attempts Exhausted
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	118
Trap MIB Name	legacyMqSeriesFault

8047

Explanation

This notification indicates that the LDD Legacy system has not provided a response within the time limit specified by the LDD_SCP_SYSTEM_RESPONSE_TIMEOUT configuration parameter. The SCPMS system is not active.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-117 Event 8047 Details

GUI Notification
Severity	Critical
Text	LDD Legacy Response Retry Attempts Exhausted
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	119
Trap MIB Name	scpmsLegacyNotActive

8048

Explanation

This notification indicates that a connection could not be made to the MQSeries local queue manager. The local queue manager is not started or operational.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-118 Event 8048 Details

GUI Notification
Severity	Critical
Text	Unable to Connect to Queue Manager: < queueMgrName >
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	120
Trap MIB Name	mqSeriesQueueManagerNotActive

8049

Explanation

The EMS/NE has rejected the NPANXX GTT creation, deletion, or modification transaction, and the NPANXX value in the transaction could not be determined.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the NPANXX GTT command to determine why the command failed. Re-enter the NPANXX GTT data correctly, which will cause the LSMS to try the command again.

Event Details

Table B-119 Event 8049 Details

GUI Notification
Severity	Major
Text	<CLLI>: NPANXX GTT <type_of_operation> Failed
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	126
Trap MIB Name	npanxxGTTValueNotFound

8050

Explanation

The EMS/NE has rejected the NPANXX GTT creation, deletion, or modification transaction for the specified NPANXX value.

Recovery

Event Details

Table B-120 Event 8050 Details

GUI Notification
Severity	Major
Text	<CLLI>: NPANXX GTT <type_of_operation> Failed for NPANXX <NPANXX_value>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	127
Trap MIB Name	npanxxGTTValueRejected

8051

Explanation

The EMS/NE has rejected the Override GTT creation, deletion, or modification transaction, and the LRN value in the transaction could not be determined.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the Override GTT command to determine why the command failed. Re-enter the Override GTT data correctly, which will cause the LSMS to try the command again.

Event Details

Table B-121 Event 8051 Details

GUI Notification
Severity	Major
Text	<CLLI>: Override GTT <type_of_operation> Failed
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	128
Trap MIB Name	overrideGTTValueNotFound

8052

Explanation

The EMS/NE has rejected the Override GTT creation, deletion, or modification transaction for the specified LRN value.

Recovery

Event Details

Table B-122 Event 8052 Details

GUI Notification
Severity	Major
Text	<CLLI>: Override GTT <type_of_operation> Failed for LRN <LRN_value>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	129
Trap MIB Name	overrideGTTValueRejected

8053

Explanation

The LSMS was not able to complete the automatic synchronization with the EMS/NE. Possible reasons include:

The network failed temporarily but not long enough to cause the association with the EMS to fail.
The EMS/NE rejected the data because it is busy updating its databases.

Recovery

Verify the connection between the LSMS and the EMS; then reinitialize the MPS. If this notification appears again, perform one of the bulk download procedures in the LNP Database Synchronization User's Guide.

Event Details

Table B-123 Event 8053 Details

GUI Notification
Severity	Major
Text	Short Synchronization Failed
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	131
Trap MIB Name	unableToCompleteAutoResynch

8054

Explanation

The LSMS has started its automatic synchronization with the EMS/NE.

Recovery

No action required; for information only.

Event Details

Table B-124 Event 8054 Details

GUI Notification
Severity	Major
Text	Short Synchronization Started
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	132
Trap MIB Name	autoResynchNEStarted

8055

Explanation

The automatic resynchronization of databases after an outage between the LSMS and the NPAC has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-125 Event 8055 Details

GUI Notification
Severity	Cleared
Text	Recovery Complete
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	133
Trap MIB Name	dbResynchCompleted

8059

Explanation

The LSMS has completed its automatic synchronization with the EMS/NE.

Recovery

No action required; for information only.

Event Details

Table B-126 Event 8059 Details

GUI Notification
Severity	Cleared
Text	Short Synchronization Complete
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	138
Trap MIB Name	emsShortSynchCompleted

8060

Explanation

The EMS pending queue, which holds the transactions to be sent to the EMS/NE identified by <CLLI> in the Surveillance notification, has fallen sufficiently below the halfway-full point.

Recovery

No action required; for information only.

Event Details

Table B-127 Event 8060 Details

GUI Notification
Severity	Cleared
Text	EMS Pending Queue Less Than Half Full
Surveillance Notification
Text	Notify:Sys Admin - CLLI=<CLLI>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	141
Trap MIB Name	pendingQueueHalfFull

8061

Explanation

The EMS pending queue, which holds the transactions to be sent to the EMS/NE identified by <CLLI> in the Surveillance notification, has fallen sufficiently below the full point.

Recovery

No action required; for information only.

Event Details

Table B-128 Event 8061 Details

GUI Notification
Severity	Cleared
Text	EMS Pending Queue No Longer Full
Surveillance Notification
Text	Notify:Sys Admin - CLLI=<CLLI>
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	142
Trap MIB Name	pendingQueueNotFull

8062

Explanation

This notification indicates that the physical connection with the DCM has been restored.

Recovery

No action required; for information only.

Event Details

Table B-129 Event 8062 Details

GUI Notification
Severity	Cleared
Text	EBDA Physical Connection Restored
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	143
Trap MIB Name	dcmConnectionRestored

8063

Explanation

This notification indicates that the connection to the MQSeries local queue manager has been established following an outage.

Recovery

No action required; for information only.

Event Details

Table B-130 Event 8063 Details

GUI Notification
Severity	Cleared
Text	Connected to Queue Manager: <queueMgrName>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	144
Trap MIB Name	connToMqSeriesQueueMngrRest

8064

Explanation

The specified NPA-NXX is opened for portability starting at the value of the <EffectiveTimestamp> field.

Recovery

No action required; for information only.

Event Details

Table B-131 Event 8064 Details

GUI Notification
Severity	Event
Text	New NPA-NXX: SPID [<SPID>], NPANXX [<NPANXX>], TS [<EffectiveTimestamp>]
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	145
Trap MIB Name	npaNxxOpenedForPortabilityAtTS
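
A monitoring script that reads GUI notification logs may need to pull the fields out of the event 8064 text. The following is a minimal sketch that assumes the text matches the template in Table B-131 exactly; the function name `parse_new_npanxx` is hypothetical, and because the field formats are not specified here, each field is captured as an opaque string.

```python
import re

# Pattern mirrors the GUI text template for event 8064 (Table B-131).
# Field contents are treated as opaque strings (an assumption; the
# SPID, NPANXX, and timestamp formats are not defined in this section).
NEW_NPANXX_RE = re.compile(
    r"^New NPA-NXX: SPID \[(?P<spid>[^\]]*)\], "
    r"NPANXX \[(?P<npanxx>[^\]]*)\], "
    r"TS \[(?P<timestamp>[^\]]*)\]$"
)


def parse_new_npanxx(text):
    """Return a dict of the three fields, or None if the text does not match."""
    match = NEW_NPANXX_RE.match(text.strip())
    return match.groupdict() if match else None
```

The same approach should carry over to event 8065 by changing the leading literal to `First use of NPA-NXX:`.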

8065

Explanation

The first telephone number in the specified NPA-NXX is ported starting at the value of the <EffectiveTimestamp> field.

Recovery

No action required; for information only.

Event Details

Table B-132 Event 8065 Details

GUI Notification
Severity	Event
Text	First use of NPA-NXX: SPID [<SPID>], NPANXX [<NPANXX>], TS [<EffectiveTimestamp>]
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	146
Trap MIB Name	npaNxxPortedAtTS

8066

Explanation

An audit of the network element identified by <CLLI> has begun.

Recovery

No action required; for information only.

Event Details

Table B-133 Event 8066 Details

GUI Notification
Severity	Cleared
Text	Audit LNP DB Synchronization Started
Surveillance Notification
Text	NE <CLLI> Audit started
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	147
Trap MIB Name	ebdaAuditActive

8067

Explanation

An audit of the network element identified by <CLLI> has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-134 Event 8067 Details

GUI Notification
Severity	Cleared
Text	Audit LNP DB Synchronization Completed
Surveillance Notification
Text	NE <CLLI> Audit completed
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	148
Trap MIB Name	ebdaAuditSuccess

8068

Explanation

An audit of the network element identified by <CLLI> has failed.

Recovery

Inspect the log file /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD for details as to the cause of the error. After clearing the cause of the error, start the audit again.
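
Several recovery procedures in this appendix refer to dated log files of the form /var/TKLC/lsms/logs/<CLLI>/<name>.MMDD. The sketch below builds such a path for a given date. It assumes that MMDD means a two-digit month followed by a two-digit day; the helper name `dated_log_path` and the sample CLLI are hypothetical.

```python
from datetime import date

# Root directory taken from the log paths quoted in this appendix.
LOG_ROOT = "/var/TKLC/lsms/logs"


def dated_log_path(clli, log_name, day):
    """Build <LOG_ROOT>/<CLLI>/<log_name>.MMDD for the given date.

    Assumes MMDD is a zero-padded month followed by a zero-padded day.
    """
    return f"{LOG_ROOT}/{clli}/{log_name}.{day:%m%d}"


# Example with a hypothetical CLLI value:
print(dated_log_path("STPA", "LsmsTrans.log", date(2024, 3, 7)))
```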

Event Details

Table B-135 Event 8068 Details

GUI Notification
Severity	Critical
Text	Audit LNP DB Synchronization Failed
Surveillance Notification
Text	NE <CLLI> Audit failed
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	149
Trap MIB Name	ebdaAuditFailure

8069

Explanation

The user aborted an audit of the network element identified by <CLLI> before it had completed.

Recovery

No action required; for information only.

Event Details

Table B-136 Event 8069 Details

GUI Notification
Severity	Cleared
Text	Audit LNP DB Synchronization Aborted
Surveillance Notification
Text	NE <CLLI> Audit aborted
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	150
Trap MIB Name	ebdaAuditAbortedByUser

8070

Explanation

A reconcile has started at the completion of an audit.

Recovery

No action required; for information only.

Event Details

Table B-137 Event 8070 Details

GUI Notification
Severity	Cleared
Text	Reconcile LNP DB Synchronization Started
Surveillance Notification
Text	NE <CLLI> Reconcile started
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	151
Trap MIB Name	ebdaReconcileActive

8071

Explanation

A reconcile, which was performed at the end of an audit, has completed.

Recovery

No action required; for information only.

Event Details

Table B-138 Event 8071 Details

GUI Notification
Severity	Cleared
Text	Reconcile LNP DB Synchronization Complete
Surveillance Notification
Text	NE <CLLI> Reconcile completed
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	152
Trap MIB Name	ebdaReconcileSuccess

8072

Explanation

A reconcile, which was performed at the end of an audit, has failed before it completed.

Recovery

Inspect the log file /var/TKLC/lsms/logs/<CLLI>/LsmsAudit.log.MMDD for details as to the cause of the error. After clearing the cause of the error, start the reconcile again.

Event Details

Table B-139 Event 8072 Details

GUI Notification
Severity	Critical
Text	Reconcile LNP DB Synchronization Failed
Surveillance Notification
Text	NE <CLLI> Reconcile failed
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	153
Trap MIB Name	ebdaReconcileFailure

8073

Explanation

The user has stopped a reconcile before it completed.

Recovery

No action required; for information only.

Event Details

Table B-140 Event 8073 Details

GUI Notification
Severity	Cleared
Text	Reconcile LNP DB Synchronization Aborted
Surveillance Notification
Text	NE <CLLI> Reconcile aborted
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	154
Trap MIB Name	ebdaReconcileAbortedByUser

8078

Explanation

A bulk download is currently running.

Recovery

No action required; for information only.

Event Details

Table B-141 Event 8078 Details

GUI Notification
Severity	Cleared
Text	Bulk Load LNP DB Synchronization Started
Surveillance Notification
Text	NE <CLLI> Bulk load started
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	159
Trap MIB Name	ebdaBulkLoadActive

8079

Explanation

A bulk download has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-142 Event 8079 Details

GUI Notification
Severity	Cleared
Text	Bulk Load LNP DB Synchronization Complete
Surveillance Notification
Text	NE <CLLI> Bulk load completed
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	160
Trap MIB Name	ebdaBulkLoadSuccess

8080

Explanation

A bulk download has failed before it completed.

Recovery

Inspect the log file /var/TKLC/lsms/logs/<CLLI>/LsmsBulkLoad.log.MMDD for details as to the cause of the error. After clearing the cause of the error, start the bulk download again.

Event Details

Table B-143 Event 8080 Details

GUI Notification
Severity	Critical
Text	Bulk Load LNP DB Synchronization Failed
Surveillance Notification
Text	NE <CLLI> Bulk load failed
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	161
Trap MIB Name	ebdaBulkLoadFailure

8081

Explanation

The user has stopped a bulk download before it completed.

Recovery

No action required; for information only.

Event Details

Table B-144 Event 8081 Details

GUI Notification
Severity	Cleared
Text	Bulk Load LNP DB Synchronization Aborted
Surveillance Notification
Text	NE <CLLI> Bulk load aborted
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	162
Trap MIB Name	ebdaBulkLoadAbortedByUser

8082

Explanation

A user-initiated resynchronization is currently running.

Recovery

No action required; for information only.

Event Details

Table B-145 Event 8082 Details

GUI Notification
Severity	Cleared
Text	Re-sync LNP DB Synchronization Started
Surveillance Notification
Text	NE <CLLI> Re-sync started
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	163
Trap MIB Name	ebdaResyncActive

8083

Explanation

A user-initiated resynchronization has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-146 Event 8083 Details

GUI Notification
Severity	Cleared
Text	Re-sync LNP DB Synchronization Complete
Surveillance Notification
Text	NE <CLLI> Re-sync completed
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	164
Trap MIB Name	ebdaResyncSuccess

8084

Explanation

A user-initiated resynchronization has failed before it completed.

Recovery

Inspect the contents of the file /var/TKLC/lsms/logs/<CLLI>/LsmsResync.log.MMDD to determine the cause of the error. After clearing the cause of the error, start the user-initiated resynchronization again.

Event Details

Table B-147 Event 8084 Details

GUI Notification
Severity	Critical
Text	Re-sync LNP DB Synchronization Failed
Surveillance Notification
Text	NE <CLLI> Re-sync failed
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	165
Trap MIB Name	ebdaResyncFailure

8085

Explanation

The user has stopped a user-initiated resynchronization before it completed.

Recovery

No action required; for information only.

Event Details

Table B-148 Event 8085 Details

GUI Notification
Severity	Cleared
Text	Re-sync LNP DB Synchronization Aborted
Surveillance Notification
Text	NE <CLLI> Re-sync aborted
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	166
Trap MIB Name	ebdaResyncAbortedByUser
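
The Surveillance notifications for events 8066 through 8073 and 8078 through 8085 share one shape: `NE <CLLI>`, followed by an operation and an outcome. The sketch below classifies such a line. It assumes that the CLLI contains no whitespace, and the function name `parse_ne_sync_text` is hypothetical.

```python
import re

# Operation and outcome words as they appear in the Surveillance "Text"
# rows for events 8066-8073 and 8078-8085. Assumes <CLLI> has no spaces.
NE_SYNC_RE = re.compile(
    r"^NE (?P<clli>\S+) "
    r"(?P<operation>Audit|Reconcile|Bulk load|Re-sync) "
    r"(?P<outcome>started|completed|failed|aborted)$"
)


def parse_ne_sync_text(text):
    """Return (clli, operation, outcome), or None for any other text."""
    match = NE_SYNC_RE.match(text.strip())
    if match is None:
        return None
    return match.group("clli"), match.group("operation"), match.group("outcome")
```

Only the `failed` outcome corresponds to a Critical GUI notification in the tables above, so a script could use the third element to decide whether to page an operator.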

8086

Explanation

This notification indicates that the Sprint lddAgent has failed to communicate with the Sprint Legacy System.

Recovery

No action required; for information only.

Event Details

Table B-149 Event 8086 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	FAILED:IddAgent legacy
Source	Both servers
Frequency	Once, as soon as condition occurs
Trap
Trap ID	167
Trap MIB Name	sprintIddAgentCommFailureLegSys

8087

Explanation

This notification indicates that the Sprint lddAgent has failed to communicate with the Sprint SCPMS System.

Recovery

Contact technical support for assistance.

Event Details

Table B-150 Event 8087 Details

GUI Notification
Severity	None
Text	None
Surveillance Notification
Text	FAILED:IddAgent scpms
Source	Both servers
Frequency	Once, as soon as condition occurs
Trap
Trap ID	168
Trap MIB Name	sprintIddAgentCommFailureScpmsSys

8088

Explanation

A scheduled file transfer has failed.

Recovery

Inspect the error log file /var/TKLC/lsms/logs/aft/aft.log.MMDD for details as to the cause of the error.

Event Details

Table B-151 Event 8088 Details

GUI Notification
Severity	Major
Text	Automatic File Transfer Failure - See Log for Details
Surveillance Notification
Text	Notify:Sys Admin- Auto xfer Failure
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	171
Trap MIB Name	automaticFileTransferFeatureFailure

8089

Explanation

An NPA-NXX split activation completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-152 Event 8089 Details

GUI Notification
Severity	Cleared
Text	Activate Split Successful OldNPA=<old_NPA> NewNPA=<new_NPA> NXX=<NXX>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	10
Trap MIB Name	npaSplitActOk

8090

Explanation

An NPA-NXX split activation failed.

Recovery

Perform an audit and reconcile of NPA Split information at the network element.

Event Details

Table B-153 Event 8090 Details

GUI Notification
Severity	Critical
Text	Activate Split Failed OldNPA=<old_NPA> NewNPA=<new_NPA> NXX=<NXX>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	172
Trap MIB Name	npaSplitActFailed

8091

Explanation

At least one active NPA-NXX split is past its end date and needs to be deleted.

Recovery

Do the following:

View all split objects (for information, refer to the Database Administrator's Guide) to determine which objects have end dates that have already passed.
Delete the objects whose end dates have passed (for information, refer to the Database Administrator's Guide).

Event Details

Table B-154 Event 8091 Details

GUI Notification
Severity	Major
Text	Active Splits Are Past Their End Dates
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	173
Trap MIB Name	activeSplitsPastEndDates

8092

Explanation

This notification indicates that the LDD SCPMS agent is switching from the primary to the backup SCPMS system.

Recovery

No action required; this notification is for information only.

Event Details

Table B-155 Event 8092 Details

GUI Notification
Severity	Critical
Text	LDD SCPMS Agent Switching from Primary to Backup SCPMS System
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	174
Trap MIB Name	lddScpmsAgentSwitchToBackupScpms

8093

Explanation

This notification indicates that the LDD SCPMS agent is switching from the backup to the primary SCPMS system.

Recovery

No action required; this notification is for information only.

Event Details

Table B-156 Event 8093 Details

GUI Notification
Severity	Critical
Text	LDD SCPMS Agent Switching from Backup to Primary SCPMS System
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	175
Trap MIB Name	lddScpmsAgentSwitchFromBackupToPrim

8094

Explanation

This notification indicates the LDD SCPMS current system is primary SCPMS.

Recovery

No action required; this notification is for information only.

Event Details

Table B-157 Event 8094 Details

GUI Notification
Severity	Cleared
Text	LDD SCPMS Current System is Primary SCPMS
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	176
Trap MIB Name	lddScpmsPrimary

8095

Explanation

This notification indicates the LDD SCPMS current system is backup SCPMS.

Recovery

No action required; this notification is for information only.

Event Details

Table B-158 Event 8095 Details

GUI Notification
Severity	Cleared
Text	LDD SCPMS Current System is Backup SCPMS
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	177
Trap MIB Name	lddScpmsBackup

8096

Explanation

The EMS/NE has rejected the NPANXX Split operation indicated by <operation>, and the NPANXX value in the transaction could not be determined.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the NPANXX Split command to determine why the command failed. Delete and re-enter the NPANXX Split data correctly, which will cause the LSMS to try the command again.

Event Details

Table B-159 Event 8096 Details

GUI Notification
Severity	Major
Text	<CLLI>: NPANXX Split <operation> Failed
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	178
Trap MIB Name	EmsNeRejNpaNxxSplitNotDetermined

8097

Explanation

The EMS/NE has rejected the NPANXX Split operation indicated by <operation> for the indicated NPANXX value.

Recovery

Event Details

Table B-160 Event 8097 Details

GUI Notification
Severity	Major
Text	<CLLI>: NPANXX Split <operation> Failed for New NPANXX <NPANXX>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	179
Trap MIB Name	EmsNeRejectedNpaNxxSplit

8098

Explanation

The LSMS is not able to confirm the physical connectivity with the directly connected query server identified by <hostname>. The problem may be one of the following:

Physical connectivity issues between the LSMS and directly connected Query Server.
The query server host name is not associated with the appropriate Internet Protocol (IP) address in the /etc/hosts file.
The IP address specified for the special replication user for the query server is incorrect.
The proper TCP/IP ports are not open in the firewall(s) between the LSMS and the query servers.

Recovery

Check the physical connectivity of the LSMS to the query server.
Check that the query server host name is associated with the corresponding Internet Protocol (IP) address in the /etc/hosts file.
Verify that the IP address for the query server is correct. Display the IP address of all configured query servers by using the $LSMS_TOOLS_DIR/lsmsdb -c queryservers command.
Verify that the firewall TCP/IP port configuration is set correctly for both the LSMS and query servers directly connected to the LSMS (refer to Appendix A, “Configuring the Query Server,” of the Configuration Guide for information about port configuration for firewall protocol filtering).
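
The /etc/hosts check in the steps above can be sketched as follows. The function reads hosts-style text (an address, then one or more names, with `#` starting a comment) and returns every address mapped to a given name. The function name `hosts_lookup` and the sample entries are hypothetical.

```python
def hosts_lookup(hosts_text, hostname):
    """Return the addresses that hosts-style text associates with hostname."""
    addresses = []
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address and names.
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and hostname in entry[1:]:
            addresses.append(entry[0])
    return addresses


# Hypothetical sample content; on a real server, read /etc/hosts instead.
sample = "127.0.0.1 localhost\n192.168.1.20 queryserver1 qs1  # query server\n"
print(hosts_lookup(sample, "queryserver1"))
```

An empty result, or an address that differs from the one reported by the lsmsdb command in the steps above, would point to the host name association as the likely cause.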

Event Details

Table B-161 Event 8098 Details

GUI Notification
Severity	Major
Text	Query Server <hostname> Physical Connection Lost
Surveillance Notification
Text	Query Server=<hostname> Physical Conn Lost
Source	Active Server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
SNMP Trap
Trap ID	180
Trap MIB Name	physicalConnectivityWithQueryServerLost

8099

Explanation

The query server identified by <hostname> does not have a replication connection established with the LSMS. The problem may be one of the following:

Query server cannot establish a connection with the master.
Query server not properly configured to connect to the master.
A query that succeeded on the master failed on the query server.
The binary log(s) that are needed by the query server to resynchronize itself to its master no longer exist.
Data on the query server does not agree with what is on the master when the binary log was started.
Replication was stopped at the query server by a user.

Recovery

At the query server, perform the following substeps:
1. Start the MySQL command line utility on the slave server:
  # cd /opt/mysql/mysql/bin
  # mysql -u root -p
  Enter password: <Query Server's MySQL root user password>
2. Determine whether the query server is running by entering the following command and looking at the Slave_IO_Running and Slave_SQL_Running column values.
  mysql> SHOW SLAVE STATUS \G;
  - If the Slave_IO_Running and Slave_SQL_Running column values show that the slave is not running, verify the query server's /usr/mysql1/my.cnf option file (refer to “MySQL Replication Configuration for Query Servers,” in Appendix A, “Configuring the Query Server,” of the Configuration Guide) and check the error log (/usr/mysql1/<hostname>.err) for messages.
  - If the Slave_IO_Running and Slave_SQL_Running column values show that the slave (query server) is running, enter the following command to verify whether the slave established a connection with the master (LSMS or another query server acting as a master/slave).
    
    mysql> SHOW PROCESSLIST;
    
    Find the thread with the system user value in the User column and none in the Host column, and check the State column. If the State column says “connecting to master,” verify that the master hostname is correct, that the DNS is properly set up, whether the master is actually running, and whether it is reachable from the slave (refer to Appendix A, “Configuring the Query Server,” of the Configuration Guide for information about port configuration for firewall protocol filtering if the master and slave are connecting through a firewall).
  - If the slave was running, but then stopped, enter the following command:
    
    mysql> SHOW SLAVE STATUS;
    
    Look at the output. This error can happen when some query that succeeded on the master fails on the slave, but this situation should never happen while the replication is active if you have taken a proper snapshot of the master and never modify the data on the slave outside of the slave thread.
However, if this is not the case, or if the failed items are not needed and there are only a few of them, try the following:
1. First see if there is some stray record in the way on the query server. Understand how it got there, then delete it from the query server database and run start slave.
2. If the above does not work or does not apply, try to understand if it would be safe to make the update manually (if needed) and then ignore the next query from the LSMS.
3. If you have decided you can skip the next query, enter one of the following command sequences:
  - To skip a query that uses AUTO_INCREMENT or LAST_INSERT_ID(), enter:
    mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER=2;
    mysql> start slave;
    
    Queries that use AUTO_INCREMENT or LAST_INSERT_ID() take two events in the binary log of the master.
  - Otherwise, enter:
    mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
    mysql> start slave;
If you are sure the query server database started out perfectly in sync with the LSMS database, and no one has updated the tables involved outside of the slave thread, contact technical support so you will not have to do the above steps again.
If all else fails, read the error log, /usr/mysql1/<hostname>.err. If the log is big, run the following command on the slave:

grep -i slave /usr/mysql1/<hostname>.err

(There is no generic pattern to search for on the master, as the only errors it logs are general system errors. If it can, the master will send the error to the slave when things go wrong.)
- If the error log on the slave conveys that it could not find a binary log file, this indicates that the binary log files on the master have been removed (purged). Binary logs are periodically purged from the master to prevent them from growing unbounded and consuming large amounts of disk resources. However, if a query server was not replicating and one of the binary log files it wants to read is purged, it will be unable to replicate once it comes up. If this occurs, the query server is required to be reset with another snapshot of data from the master or another query server (see “Reload a Query Server Database from the LSMS” and “Reload a Query Server Database from Another Query Server”).
- When you have determined that there is no user error involved, and replication still either does not work at all or is unstable, contact technical support.
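
The status check in the procedure above can be scripted when many query servers must be inspected. The following sketch parses the vertical (`\G`) output of `SHOW SLAVE STATUS` and reports whether both replication threads are running. It also encodes the skip-counter rule stated above. The function names are hypothetical, and the sketch assumes the output has been captured as text.

```python
def parse_vertical_output(text):
    """Turn 'Name: value' lines from mysql \\G output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # row-separator lines contain no colon and are skipped
            fields[key.strip()] = value.strip()
    return fields


def slave_threads_running(text):
    """True only if Slave_IO_Running and Slave_SQL_Running are both 'Yes'."""
    fields = parse_vertical_output(text)
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")


def skip_counter_value(uses_auto_increment_or_last_insert_id):
    """Value for SQL_SLAVE_SKIP_COUNTER, per the rule in the steps above."""
    return 2 if uses_auto_increment_or_last_insert_id else 1
```

A result of False from `slave_threads_running` corresponds to the first branch of step 2, where the option file and error log are checked.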

Event Details

Table B-162 Event 8099 Details

GUI Notification
Severity	Major
Text	Query Server <hostname> Replication Connection Lost
Surveillance Notification
Text	Query Server=<hostname> Replication Conn Lost
Source	Active Server
Frequency	As soon as condition occurs, and at five-minute intervals as long as condition exists
SNMP Trap
Trap ID	181
Trap MIB Name	queryServerConnectionWithLsmsLost

8100

Explanation

The SV/NPB storage database has exceeded the configured percent usage threshold.

Recovery

Contact technical support.

Event Details

Table B-163 Event 8100 Details

GUI Notification
Severity	Event
Text	SV/NPB Storage Exceeds <%> percent
Surveillance Notification
Text	Notify:Sys Admin - SV/NPB threshold %
Source	Both servers
Frequency	Every 5 minutes after condition occurs
Trap
Trap ID	194
Trap MIB Name	svNpbPercentUsage

8101

Explanation

This event indicates that the SV/NPB storage database usage is below the configured percent usage threshold.

Recovery

No action is required.

Event Details

Table B-164 Event 8101 Details

GUI Notification
Severity	Cleared
Text	SV/NPB storage falls below <%> percent
Surveillance Notification
Text	Notify: Sys Admin - SV/NPB cleared
Source	Both servers
Frequency	As soon as condition clears
Trap
Trap ID	207
Trap MIB Name	svNpbBelowLimit

8102

Explanation

The event number present in the untilClear filter list is cleared. The event number is removed from the untilClear filter list.

Recovery

No action is required.

Event Details

Table B-165 Event 8102 Details

GUI Notification
Severity	Event
Text	<Event number> in the untilClear filter list, event clear received at <%s>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8103

Explanation

The alarm filter counter has reached its limit; the counter will start again from one.

Recovery

No action is required.

Event Details

Table B-166 Event 8103 Details

GUI Notification
Severity	Event
Text	Counter associated with event <event number> exceeds limit <%s>. Resetting counter.
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8104

Explanation

The event number present in the untilTimeout filter list is cleared. The event number is removed from the untilTimeout filter list.

Recovery

No action is required.

Event Details

Table B-167 Event 8104 Details

GUI Notification
Severity	Event
Text	<Event number> in the untilTimeout filter list, event timeout at <%s>
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8105

Explanation

The log capture started by the user has failed.

Recovery

Contact technical support.

Event Details

Table B-168 Event 8105 Details

GUI Notification
Severity	Minor
Text	Logs Capture Failed
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8106

Explanation

The MySQL Port has been updated. The LSMS application must be restarted.

Recovery

The application must be restarted. Restart the LSMS application first on the active server and then on the standby server. For more information, refer to the Configuration Guide.

Event Details

Table B-169 Event 8106 Details

GUI Notification
Severity	Event
Text	MySQL Port changed from <%s> to <%s>. LSMS application restart required.
Surveillance Notification
Text	Notify: Sys Admin - LSMS restart required
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	208
Trap MIB Name	mysqlPortUpdated

8107

Explanation

The MySQL Port has been updated. The Query Server configuration needs to be updated with the new MySQL port.

Recovery

Configure the Query Server with the updated MySQL port. For more information, refer to the Configuration Guide.

Event Details

Table B-170 Event 8107 Details

GUI Notification
Severity	Event
Text	MySQL Port changed from <%s> to <%s>. Query Server configuration updated required.
Surveillance Notification
Text	Notify: Sys Admin - QS updated required
Source	Active server
Frequency	Once, as soon as condition occurs
Trap
Trap ID	209
Trap MIB Name	queryServerResetConfiguration

8108

Explanation

At least one of the connected Query Servers is out of sync, and the binary logs cannot be purged without user confirmation.

Recovery

When the Query Server is out of sync, automatic purging is not possible. To delete all but the last 10 binary logs, log on to the active LSMS server as root and enter the following command:

pruneBinaryLogs -force

Event Details

Table B-171 Event 8108 Details

GUI Notification
Severity	Minor
Text	Automatic purging of binary logs cannot be done. User confirmation required.
Surveillance Notification
Text	Notify: Sys Admin - Purge need confirmation
Source	Both servers
Frequency	Every 45 minutes
Trap
Trap ID	210
Trap MIB Name	purgeConfirmRequired

8109

Explanation

Disk usage is reaching the capacity threshold, and an automatic purge of binary logs is imminent.

Recovery

No action is required.

Event Details

Table B-172 Event 8109 Details

GUI Notification
Severity	Minor
Text	Disk usage reaching <%> percent. Purging of binary logs is imminent.
Surveillance Notification
Text	Notify: Sys Admin - Purging is imminent
Source	Both servers
Frequency	Every 45 minutes
Trap
Trap ID	211
Trap MIB Name	purgeImminent

8110

Explanation

Logs capture has been started by the user.

Recovery

No action is required.

Event Details

Table B-173 Event 8110 Details

GUI Notification
Severity	Cleared
Text	Logs Capture Started
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8111

Explanation

The logs capture started by the user completed successfully.

Recovery

No action is required.

Event Details

Table B-174 Event 8111 Details

GUI Notification
Severity	Minor
Text	Logs Captured Successfully
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8112

Explanation

The cron job was not able to restart syscheck automatically.

Recovery

Contact technical support.

Event Details

Table B-175 Event 8112 Details

GUI Notification
Severity	Event
Text	Failed to restart syscheck services
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8116

Explanation

The HTTP protocol is enabled but secure HTTP (HTTPS) is recommended.

Recovery

For information on configuring the protocols, see Starting a Web-Based LSMS GUI Session.

Event Details

Table B-176 Event 8116 Details

GUI Notification
Severity	Event
Text	HTTP is enabled and it is recommended to use HTTPS.
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8117

Explanation

HTTP is disabled and HTTPS is enabled.

Recovery

No recovery required; only HTTPS is enabled now.

Event Details

Table B-177 Event 8117 Details

GUI Notification
Severity	Event
Text	Only HTTPS is enabled now.
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

8118

Explanation

Both HTTP and HTTPS are enabled, but using only HTTPS is recommended.

Recovery

For information on configuring the protocols, see Starting a Web-Based LSMS GUI Session.

Event Details

Table B-178 Event 8118 Details

GUI Notification
Severity	Event
Text	Both HTTP and HTTPS are enabled and it is recommended to use HTTPS.
Surveillance Notification
Text	None
Source
Frequency
Trap
Trap ID	None
Trap MIB Name

Additional Trap Information

Trap Id	Trap MIB Name	Notification Description	Trap variables def	Retry Interval	Severity	Event Num	GUI Event Text	Pair Event Num
25	dataReplInfo	This notification indicates that database replication is delayed.	eventNbr = Oracle specific unique identifier for event notification. This eventNbr field can be used to reference Oracle documentation. dbReplInfo = Info message from database replication.	Every 5 mins	event_notif_event	4011	DB Repl Info - %s	0
201	snapInvalidErr	This notification indicates that the Invalid Snapshot has been detected.	eventNbr = Oracle specific unique identifier for event notification. This eventNbr field can be used to reference Oracle documentation. snapName = Name of the invalid snapshot.	Every 30 mins	event_notif_critical	4034	Invalid Snapshot - %s	4035
203	snapFullErr	This notification indicates that the Snapshot is greater than 80% full.	eventNbr = Oracle specific unique identifier for event notification. This eventNbr field can be used to reference Oracle documentation. snapName = Name of the invalid/hanging snapshot.	Every 30 mins	event_notif_critical	4036	Full Snapshot - %s	4037

Trap Id	Trap MIB Name	Notification Description	Frequency	Source	Clearing behavior
212	resyncStartTrap	The trap is sent by the LSMS to the NMS when the LSMS is about to start resynchronization	Every time a resynchronization with an NMS starts	/vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl	None
213	resyncStopTrap	The trap is sent by the LSMS to the NMS when resynchronization is complete	Every time a resynchronization with an NMS is complete	/vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl	None
214	resyncRejectTrap	The trap is sent by the LSMS to the NMS when a resynchronization request is rejected by the LSMS	Every time a resynchronization request is initialized while an existing resynchronization is still being processed	/vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl	None
215	resyncRequiredTrap	The trap is sent by the LSMS to the NMS when the LSMS is rebooted or started	Every time the LSMS is rebooted or restarted	/vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl	None
216	heartBeatTrap	The trap is sent by the LSMS to the NMS periodically to indicate that the LSMS is up	Per the configured value in seconds (0, 5-7200), where 0 indicates that the heartbeat trap is disabled	/vobs/lsms/apps/snmp/lsmsSnmpHeartbeatSender.pl	None
217	lsmsAlarmTrapV3	The trap indicates that the following information is for a particular event	Every v3 trap message sent to the NMS carries this OID	/vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl	None
218	resyncErrCode	errorCode = 0, Resynchronization completed successfully. errorCode = 1, Resynchronization aborted by NMS. errorCode = 2, Resynchronization already in progress for the NMS. errorCode = 3, Resynchronization aborted, database error occurred. errorCode = 4, Resynchronization not in progress.	Every time either resyncStopTrap or resyncRejectTrap is sent to the NMS	/vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl	None
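The resyncErrCode values and the heartbeat interval range listed in the table above can be captured in a small lookup and validation sketch. This is illustrative only: the function names are hypothetical and are not part of the LSMS software, and the values are taken directly from the table.

```python
# Meanings of the resyncErrCode values (trap 218), copied from the table above.
RESYNC_ERROR_CODES = {
    0: "Resynchronization completed successfully",
    1: "Resynchronization aborted by NMS",
    2: "Resynchronization already in progress for the NMS",
    3: "Resynchronization aborted, database error occurred",
    4: "Resynchronization not in progress",
}


def describe_resync_error(code: int) -> str:
    """Return the documented meaning of a resyncErrCode value (hypothetical helper)."""
    return RESYNC_ERROR_CODES.get(code, f"Unknown errorCode {code}")


def is_valid_heartbeat_interval(seconds: int) -> bool:
    """Accept 0 (heartbeat trap disabled) or any value from 5 to 7200 seconds."""
    return seconds == 0 or 5 <= seconds <= 7200
```

For example, an NMS-side script might call describe_resync_error on the value received with resyncStopTrap or resyncRejectTrap before logging it.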

Platform Alarms

This section describes the following:

How Platform Alarms Are Reported

Each server runs syscheck periodically and reports any problems found through platform alarms. The severity of platform alarms is one of the following:

Critical, reported through event 4300
Major, reported through event 4200
Minor, reported through event 4100

When one or more problems in a given category have been found, the server reports one corresponding event notification to its Surveillance log and its Serial Port 3. If the server is not the active server, it also sends the event notification to the active server. The active server reports its own platform events to its own Surveillance log and to its Serial Port 3, and also sends an SNMP trap and displays a GUI notification for either its own platform events or the non-active server’s platform events.

Each of the events 4100, 4200, and 4300 contains a 16-character hexadecimal bitmasked string that indicates all of the platform events in that category that currently exist. To decode which platform events exist, use the procedure described in “How to Decode Platform Alarms”.

Each time the combination of platform events in a given category changes, a new event is reported. Following is an example of how platform events are reported:

At first, only one major platform event is reported on the standby server. A 4200 event with the alarm number of the event is reported.
One minute later, another platform event exists on the standby server (and the first one still exists). Another 4200 event is reported, with a bitmasked string that indicates both of the platform events that exist.
One minute later, another platform event exists on the standby server (and the previous ones still exist). Another 4200 event is reported, with a bitmasked string that indicates all of the platform events that exist.
One minute later, the first platform event is cleared. Another 4200 event is reported, with a bitmasked string that indicates the two platform events that still exist.
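The sequence above can be reproduced with a short sketch that combines individual alarm numbers into one bitmasked string. It assumes that the reported string is the bitwise OR of the alarm numbers in a category, which is consistent with the decoding example in this appendix but is not stated as the actual implementation. The function name is hypothetical, and the three alarm numbers are arbitrary entries from the Major Platform Alarms list.

```python
def combine_alarms(alarm_numbers):
    """Bitwise-OR 16-character hexadecimal alarm numbers from one category."""
    combined = 0
    for number in alarm_numbers:
        combined |= int(number, 16)
    return f"{combined:016X}"


# Replay the four reports from the example, using three major alarms.
active = ["3000000000000080"]
first_report = combine_alarms(active)    # one alarm exists

active.append("3000000000000100")
second_report = combine_alarms(active)   # two alarms exist

active.append("3000000000001000")
third_report = combine_alarms(active)    # three alarms exist

active.remove("3000000000000080")
fourth_report = combine_alarms(active)   # the first alarm has cleared
```

Because every alarm number in a category starts with the same leading digit, the OR operation leaves that digit unchanged and only accumulates the individual alarm bits.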

How to Decode Platform Alarms

Use the following procedure to determine all the platform alarms that exist in a given category:

Look in Platform Alarms to see if the alarm number is shown there.
- If the alarm number matches one of the alarms shown in this table, only one alarm (the one that appears in the table) is being reported and you have completed this procedure.
- If the alarm number does not match one of the alarms shown in this table, perform the remaining steps of this procedure.
Log in as any user to either server.
Enter the following command to decode the reported hexadecimal alarm string:
$ /usr/TKLC/plat/bin/almdecode <alarm_number>
The output displays the information about the alarm category and displays the text string for each of the alarms that is represented by the string. For example, if you enter:

$ /usr/TKLC/plat/bin/almdecode 3000000000000180

the following text displays:
```
The string alarm value comes from the Major Platform alarm category.
The following alarms are encoded within the hex string:
Server Swap Space Shortage Failure
Server Provisioning Network Error
```
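The decoding that almdecode performs can be approximated offline with the following sketch. It is not the almdecode implementation. It assumes that the first hexadecimal digit identifies the category (3 for Major and 5 for Minor, as in Table B-179) and that the remaining digits form a bitmask. Only a few alarms from the table are included, and all names are hypothetical.

```python
# Partial lookup tables taken from Table B-179 (extend as needed).
MAJOR_ALARMS = {
    0x80: "Server Swap Space Shortage Failure",
    0x100: "Server provisioning network error",
    0x1000: "Server Disk Space Shortage Error",
}
MINOR_ALARMS = {
    0x1: "Server Disk Space Shortage Warning",
    0x8: "Server RAM Shortage Warning",
}
# Leading digit -> (severity, reporting event number, lookup table).
CATEGORIES = {
    0x3: ("Major", 4200, MAJOR_ALARMS),
    0x5: ("Minor", 4100, MINOR_ALARMS),
}


def decode_alarm_string(alarm_string):
    """Split a 16-character alarm string into severity, event number, and alarm texts."""
    if len(alarm_string) != 16:
        raise ValueError("expected a 16-character hexadecimal string")
    value = int(alarm_string, 16)
    category_digit = value >> 60             # top 4 bits: the category
    alarm_bits = value & ((1 << 60) - 1)     # lower 60 bits: individual alarms
    if category_digit not in CATEGORIES:
        raise ValueError(f"category digit {category_digit:X} is not in this sketch")
    severity, event_number, table = CATEGORIES[category_digit]
    alarms = []
    for position in range(60):
        bit = 1 << position
        if alarm_bits & bit:
            alarms.append(table.get(bit, f"Unlisted alarm bit 0x{bit:X}"))
    return severity, event_number, alarms
```

Calling decode_alarm_string("3000000000000180") yields the Major category, event 4200, and the same two alarms shown in the almdecode output above.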

Platform Alarms

Platform errors are grouped by category and severity. The categories are listed from most to least severe:

Table B-179 shows the alarm numbers and alarm text for all alarms generated by the MPS platform. The order within a category is not significant. Some of the alarms described are not available with specific configurations.

Table B-179 Platform Alarms

Alarm Codes and Error Descriptor
Major Platform Alarms
3000000000000001 – Server fan failure
3000000000000002 - Server Internal Disk Error
3000000000000008 - Server Platform Error
3000000000000010 - Server File System Error
3000000000000020 - Server Platform Process Error
3000000000000080 - Server Swap Space Shortage Failure
3000000000000100 - Server provisioning network error
3000000000001000 - Server Disk Space Shortage Error
3000000000002000 - Server Default Route Network Error
3000000000004000 - Server Temperature Error
3000000000008000 - Server Mainboard Voltage Error
3000000000010000 - Server Power Feed Error
3000000000020000 - Server Disk Health Test Error
3000000000040000 - Server Disk Unavailable Error
3000000000080000 - Device Error
3000000000100000 - Device Interface Error
3000000008000000 - Server HA Keepalive Error
3000000010000000 - DRBD block device can not be mounted
3000000020000000 - DRBD block device is not being replicated to peer
3000000040000000 - DRBD peer needs intervention
3000000400000000 - Multipath device access link problem
3000000800000000 – Switch Link Down Error
3000001000000000 - Half-open Socket Limit
3000002000000000 - Flash Program Failure
3000004000000000 - Serial Mezzanine Unseated
Minor Platform Alarms
5000000000000001 - Server Disk Space Shortage Warning
5000000000000002 - Server Application Process Error
5000000000000004 - Server Hardware Configuration Error
5000000000000008 - Server RAM Shortage Warning
5000000000000020 - Server Swap Space Shortage Warning
5000000000000040 - Server Default Router Not Defined
5000000000000080 – Server temperature warning
5000000000000100 - Server Core File Detected
5000000000000200 - Server NTP Daemon Not Synchronized
5000000000000400 - Server CMOS Battery Voltage Low
5000000000000800 - Server Disk Self Test Warning
5000000000001000 - Device Warning
5000000000002000 - Device Interface Warning
5000000000004000 - Server Reboot Watchdog Initiated
5000000000008000 - Server HA Failover Inhibited
5000000000010000 - Server HA Active To Standby Transition
5000000000020000 - Server HA Standby To Active Transition
5000000000040000 - Platform Health Check Failure
5000000000080000 - NTP Offset Check Failure
5000000000100000 - NTP Stratum Check Failure
5000000000200000 - SAS Presence Sensor Missing
5000000000400000 - SAS Drive Missing
5000000000800000 - DRBD failover busy
5000000001000000 - HP disk resync
5000000020000000 – Server Kernel Dump File Detected
5000000040000000 – TPD Upgrade Failed
5000000080000000 - Half Open Socket Warning Limit
NOTE: The order within a category is not significant.

Alarm Recovery Procedures

This section provides recovery procedures for the MPS, listed by alarm category and Alarm Code (alarm data string) within each category.

Major Platform Alarms

Major platform alarms involve hardware components, memory, and network connections.

3000000000000001 – Server fan failure

Alarm Type: TPD

Description: This alarm indicates that a fan in the EAGLE fan tray in the EAGLE shelf where the E5-APP-B is "jacked in" is either failing or has failed completely. In either case, there is a danger of component failure due to overheating.

Severity: Major

OID: TpdFanErrorNotify 1.3.6.1.4.1.323.5.3.18.3.1.2.1

Alarm ID: TKSPLATMA13000000000000001

Recovery

Note:

Run syscheck in Verbose mode to verify a fan failure using the following command:

[root@hostname1351690497 ~]# syscheck -v hardware fan
Running modules in class hardware...
         fan: Checking Status of Server Fans.
*         fan: FAILURE:: MAJOR::3000000000000001 -- Server Fan Failure. This test uses the leaky bucket algorithm.
*         fan: FAILURE:: Fan RPM is too low, fana: 0, CHIP: FAN
One or more module in class "hardware" FAILED

LOG LOCATION: /var/TKLC/log/syscheck/fail_log

Refer to the procedure for determining the location of the fan assembly that contains the failed fan and replacing a fan assembly in the appropriate hardware manual. After you have opened the front lid to access the fan assemblies, determine whether any objects are interfering with the fan rotation. If some object is interfering with fan rotation, remove the object.
Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
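When syscheck output is saved to a file, the alarm lines can be extracted with a sketch like the following. The regular expression is inferred only from the sample output shown in the note above, so other syscheck versions may format lines differently. The function name is hypothetical.

```python
import re

# Matches lines such as:
# "*         fan: FAILURE:: MAJOR::3000000000000001 -- Server Fan Failure. ..."
ALARM_LINE = re.compile(
    r"^\*\s+(?P<module>\w+): FAILURE:: (?P<severity>[A-Z]+)::"
    r"(?P<code>[0-9A-Fa-f]{16}) -- (?P<text>.*)$"
)


def find_alarms(syscheck_output):
    """Return (module, severity, alarm code, text) for each alarm line found."""
    alarms = []
    for line in syscheck_output.splitlines():
        match = ALARM_LINE.match(line.strip())
        if match:
            alarms.append(
                (match["module"], match["severity"], match["code"], match["text"])
            )
    return alarms


SAMPLE = """Running modules in class hardware...
         fan: Checking Status of Server Fans.
*         fan: FAILURE:: MAJOR::3000000000000001 -- Server Fan Failure. This test uses the leaky bucket algorithm.
*         fan: FAILURE:: Fan RPM is too low, fana: 0, CHIP: FAN
One or more module in class "hardware" FAILED
"""
sample_alarms = find_alarms(SAMPLE)
```

Detail lines that do not carry a severity and alarm code, such as the fan RPM line, are ignored by this pattern.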

3000000000000002 - Server Internal Disk Error

This alarm indicates that the server is experiencing issues replicating data to one or more of its mirrored disk drives. This could indicate that one of the server disks has failed or is approaching failure.

Recovery

Run syscheck in Verbose mode.
Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 and provide the system health check output.

3000000000000008 - Server Platform Error

This alarm indicates a major platform error such as a corrupt system configuration or missing files, or indicates that syscheck itself is corrupt.

Recovery

Run syscheck in Verbose mode.
Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 and provide the system health check output.

3000000000000010 - Server File System Error

This alarm indicates that syscheck was unsuccessful in writing to at least one of the server file systems.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

3000000000000020 - Server Platform Process Error

This alarm indicates that either the minimum number of instances for a required process are not currently running or too many instances of a required process are running.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for recovery procedures.

3000000000000080 - Server Swap Space Shortage Failure

This alarm indicates that the server’s swap space is in danger of being depleted. This is usually caused by a process that has allocated a very large amount of memory over time.

Note:

In order for this alarm to clear, the underlying failure condition must be consistently undetected for a number of polling intervals. Therefore, the alarm may continue to be reported for several minutes after corrective actions are completed.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

3000000000000100 - Server provisioning network error

Alarm Type: TPD

Description: This alarm indicates that the connection between the server’s eth01 Ethernet interface and the customer network is not functioning properly. The eth01 interface is at the upper right port on the rear of the server on the EAGLE backplane.

Note:

The interface identified as eth01 on the hardware is identified as eth91 by the software (in syscheck output, for example).

Severity: Major

OID: TpdProvNetworkErrorNotify 1.3.6.1.4.1.323.5.3.18.3.1.2.9

Alarm ID: TKSPLATMA93000000000000100

Recovery

Check the physical network connectivity between the LSMS and the NAS.
Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

3000000000001000 - Server Disk Space Shortage Error

This alarm indicates that one of the following conditions has occurred:

A file system has exceeded a failure threshold, which means that more than 90% of the available disk storage has been used on the file system.
More than 90% of the total number of available files have been allocated on the file system.
A file system has a different number of blocks than it had when installed.
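The first two conditions can be checked manually with a sketch such as the one below. It uses the 90% threshold stated above and the standard os.statvfs call. How syscheck itself measures usage is not described in this appendix, so treating free blocks and free file nodes this way is an assumption, and the function names are hypothetical.

```python
import os

FAILURE_THRESHOLD = 0.90  # "more than 90%" from the conditions above


def usage_fractions(mount_point):
    """Return (fraction of blocks used, fraction of file nodes used)."""
    stats = os.statvfs(mount_point)
    # Some file systems report zero totals; treat those as 0% used.
    blocks_used = 1 - stats.f_bfree / stats.f_blocks if stats.f_blocks else 0.0
    files_used = 1 - stats.f_ffree / stats.f_files if stats.f_files else 0.0
    return blocks_used, files_used


def exceeds_failure_threshold(mount_point):
    """True if either disk storage or file allocation is above the threshold."""
    return any(f > FAILURE_THRESHOLD for f in usage_fractions(mount_point))
```

For example, exceeds_failure_threshold("/var/TKLC/lsms/free") would indicate whether that file system is a likely source of the alarm.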

Recovery

Run syscheck.
Examine the syscheck output to determine if the file system /var/TKLC/lsms/free is low on space. If it is, continue to the next step; otherwise go to Step 4.
If possible, recover space on the free partition by deleting unnecessary files:
1. Log in to the server generating the alarm as the root user:
  
  Login: root
  
  Password:<Enter root password>
2. Change to the /var/TKLC/lsms/free directory: # cd /var/TKLC/lsms/free
3. Confirm that you are in the /var/TKLC/lsms/free directory: # pwd (the expected output is /var/TKLC/lsms/free)
4. If the pwd command does not output /var/TKLC/lsms/free, go back to Sub-step 2
5. List files to be deleted and delete them using the rm command
6. Re-run syscheck
If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next Step.
If the file system mounted on /var/TKLC/lsms/logs is the file system that syscheck is reporting to be low on space, execute the following steps:
1. Log in to the server generating the alarm as the root user:
  
  Login: root
  
  Password:<Enter root password>
2. Change to the /var/TKLC/lsms/logs directory: # cd /var/TKLC/lsms/logs
3. Confirm that you are in the /var/TKLC/lsms/logs directory: # pwd (the expected output is /var/TKLC/lsms/logs)
4. If the pwd command does not output /var/TKLC/lsms/logs, go back to Sub-step 2
5. Look for files with names matching logs_(hostname)_(date/timestamp).tar, where (hostname) is replaced by the server’s hostname and (date/timestamp) is any date or timestamp: # ls logs_`hostname`_*.tar. Any files listed may be safely deleted. For each file listed in the ls output, execute an rm command: # rm <filename>, where <filename> is replaced by the name of the file to be deleted.
6. Re-run syscheck
  If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next Step.
Core files can occupy a large amount of disk space and may be the cause of this alarm. To collect and remove any core files from the server:
1. Log in to the server generating the alarm as the root user:
  
  Login: root
  
  Password:<Enter root password>
2. Change directory to /var/TKLC/core and list the core files. # cd /var/TKLC/core # ls -l
Note:

The ls command shown above will list any core files found and then compresses and renames the file, adding a ".gz" extension. If any core files are found, transfer them off the system and save them for examination by Oracle Engineering. Once a copy of a compressed file has been saved, it is safe to delete it from the server.
3. Re-run syscheck
  If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next Step.
Execute the following Sub-steps if the file system reported by syscheck is /tmp, otherwise skip to Step 7:
1. Log in to the server generating the alarm as the root user:
  
  Login: root
  
  Password:<Enter root password>
2. Change to the /tmp directory: # cd /tmp
3. Confirm that you are in the /tmp directory: # pwd (the expected output is /tmp)
4. When the pwd command is executed, if /tmp is not output, go back to Step 5.
5. Look for possible candidates for deletion: # ls *.iso *.bz2 *.gz *.tar *.tgz *.zip
6. If any deletable files exist, the output of the ls will show them. For each of the files listed, execute the rm command to delete the file: # rm <filename>
7. Run syscheck
  If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to Step 4.
8. Upon a reboot, the system will clean the /tmp directory.
  To reboot the system issue the # shutdown -r now command.
9. Re-run syscheck
  If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next Step.
Execute the following steps if the file system reported by syscheck is /var, otherwise skip to Step 10:
1. Log in to the server generating the alarm as the root user:
  
  Login: root
  
  Password:<Enter root password>
2. Change to the /var/tmp directory: # cd /var/tmp
3. Confirm that you are in the /var/tmp directory: # pwd (the expected output is /var/tmp)
4. When the pwd command is executed, if /var/tmp is not output, go back to Step 5.
5. Since all files in this directory can be safely deleted, execute the following command to delete them, confirming each prompt: # rm -i *
6. Re-run syscheck
  If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to Step 10.
Execute the following steps if the file system reported by syscheck is /var/TKLC, otherwise skip to Step 10.
1. Log in to the server generating the alarm as the root user:
  
  Login: root
  
  Password:<Enter root password>
2. Change to the /var/TKLC/upgrade directory: # cd /var/TKLC/upgrade
3. Confirm that you are in the /var/TKLC/upgrade directory: # pwd (the expected output is /var/TKLC/upgrade)
4. When the pwd command is executed, if /var/TKLC/upgrade is not output, go back to Step 5.
5. Since all files in this directory can be safely deleted, execute the following command to delete them, confirming each prompt: # rm -i *
6. Re-run syscheck
  If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to Step 10.
For any other file system, execute the following command, where <mountpoint> is the file system’s mount point: # find <mountpoint> -type f -exec du -k {} \; | sort -nr > /tmp/file_sizes.txt
This will produce a list of files in the given file system sorted by file size in the file /tmp/file_sizes.txt.

Note:
The find command above may take a few minutes to complete if the given mountpoint contains many files. Do not delete any files unless you are certain that they are not needed. Continue to Step 10.
Run savelogs to gather all application logs (see Saving Logs Using the LSMS GUI or Command Line).
Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
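The file-size listing produced by the find command in the procedure above can also be approximated in Python, for example when the list needs further filtering. This sketch is not equivalent to du -k: it reports the apparent file size rounded up to kilobytes rather than the allocated disk blocks. The function name is hypothetical.

```python
import os
import stat


def files_by_size(mount_point):
    """Return (size in KB, path) pairs for regular files, largest first."""
    sizes = []
    for directory, _subdirs, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                info = os.lstat(path)  # do not follow symbolic links
            except OSError:
                continue  # the file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):  # same idea as find -type f
                size_kb = (info.st_size + 1023) // 1024
                sizes.append((size_kb, path))
    sizes.sort(reverse=True)
    return sizes
```

As with the find command, review the resulting list carefully and do not delete any file unless you are certain that it is not needed.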

3000000000002000 - Server Default Route Network Error

This alarm indicates that the default network route of the server is experiencing a problem. Running syscheck in Verbose mode will provide information about which type of problem.

Caution:

When changing the network routing configuration of the server, verify that the modifications will not impact the method of connectivity for the current login session. The route information must be entered correctly and set to the correct values. Incorrectly modifying the routing configuration of the server may result in total loss of remote network access.

Recovery

Run syscheck in Verbose mode.
The output should indicate one of the following errors:
- ```
The default router at <IP_address> cannot be pinged.
```
  This error indicates that the router may not be operating or is unreachable. If the syscheck Verbose output returns this error, go to the next Step.
- ```
The default route is not on the provisioning network.
```
  This error indicates that the default route has been defined in the wrong network. If the syscheck Verbose output returns this error, go to Step 3.
- ```
An active route cannot be found for a configured default route.
```
  This error indicates that a mismatch exists between the active configuration and the stored configuration. If the syscheck Verbose output returns this error, go to Step 4.
Note:
If the syscheck Verbose output does not indicate one of the errors above, go to step 5.
Perform the following substeps when syscheck Verbose output indicates:
```
The default router at <IP_address> cannot be pinged
```
1. Verify that the network cables are firmly attached to the server, network switch, router, Ethernet switch or hub, and any other connection points.
2. Verify that the configured router is functioning properly.
  
  Request that the network administrator verify the router is powered on and routing traffic as required.
3. Request that the router administrator verify that the router is configured to reply to pings on that interface.
4. Run syscheck.
  - If the alarm is cleared, the problem is resolved and this procedure is complete.
  - If the alarm is not cleared, go to step 5.
Perform Network Reconfiguration from the command line using the su - lsmsmgr command. Update the default router.
Contact the Customer Care Center for further assistance. Provide the syscheck output collected in the previous steps.

3000000000004000 - Server Temperature Error

Alarm Type: TPD

Description: The internal temperature within the server is unacceptably high.

Severity: Major

OID: TpdTemperatureErrorNotify 1.3.6.1.4.1.323.5.3.18.3.1.2.15

Alarm ID: TKSPLATMA153000000000004000

Recovery

Ensure that nothing is blocking the fan's intake. Remove any blockage.

Verify against the following table that the temperature in the room is normal. If it is too hot, lower the temperature in the room to an acceptable level.

Table B-180 Server Environmental Conditions

Ambient Temperature	Operating: 5 degrees C to 40 degrees C Exceptional Operating Limit: 0 degrees C to 50 degrees C Storage: -20 degrees C to 60 degrees C
Ambient Temperature	Operating: 5° C to 35° C Storage: -20° C to 60° C
Relative Humidity	Operating: 5% to 85% non-condensing Storage: 5% to 95% non-condensing
Elevation	Operating: -300m to +300m Storage: -300m to +1200m
Heating, Ventilation, and Air Conditioning	Capacity must compensate for up to 5100 BTUs/hr for each installed frame. Calculate HVAC capacity as follows: Determine the wattage of the installed equipment. Use the formula: watts x 3.412 = BTUs/hr
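The HVAC calculation in the table can be worked through with a short sketch. It uses the physical conversion of approximately 3.412 BTU/hr per watt and the 5100 BTUs/hr per-frame figure from the table. The function names and the sample wattages are hypothetical.

```python
BTU_PER_HOUR_PER_WATT = 3.412     # physical conversion: 1 W is about 3.412 BTU/hr
FRAME_LIMIT_BTU_PER_HOUR = 5100   # per-frame figure from the table above


def heat_load_btu_per_hour(watts):
    """Convert the wattage of the installed equipment to BTUs/hr."""
    return watts * BTU_PER_HOUR_PER_WATT


def within_frame_allowance(watts):
    """True if the heat load of one frame stays within 5100 BTUs/hr."""
    return heat_load_btu_per_hour(watts) <= FRAME_LIMIT_BTU_PER_HOUR


load_1000w = heat_load_btu_per_hour(1000)   # about 3412 BTUs/hr
ok_1000w = within_frame_allowance(1000)
ok_1500w = within_frame_allowance(1500)     # about 5118 BTUs/hr, over the figure
```

In this example a frame drawing 1000 watts stays within the allowance, while one drawing 1500 watts slightly exceeds it.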

Note:

Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. The alarm may take up to five minutes to clear after conditions improve. It may take about ten minutes after the room returns to an acceptable temperature before syscheck shows the alarm cleared.

Verify that the temperature in the room is normal. If it is too hot, lower the temperature in the room to an acceptable level.

Note:
Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. It may take about ten minutes after the room returns to an acceptable temperature before the alarm clears.
Run syscheck and check whether the alarm has cleared.
- If the alarm has been cleared, the problem is resolved.
- If the alarm has not been cleared, continue with the next step.
Replace the filter (refer to the appropriate hardware manual).

Note:
Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. The alarm may take up to five minutes to clear after conditions improve. It may take about ten minutes after the filter is replaced before syscheck shows the alarm cleared.
Run syscheck.
- If the alarm has been cleared, the problem is resolved.
- If the alarm has not been cleared, continue with the next step.
If the problem has not been resolved, contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

3000000000008000 - Server Mainboard Voltage Error

This alarm indicates that at least one monitored voltage on the server mainboard is not within the normal operating range.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

3000000000010000 - Server Power Feed Error

This alarm indicates that one of the power feeds to the server has failed.

Recovery

Locate the server supplied by the faulty power feed. Verify that all connections to the power supply units are connected securely. To determine where the cables connect to the servers, see the Power Connections and Cables page of the NAS on LSMS E5-APP-B Interconnect.
Run syscheck.
1. If the alarm is cleared, the problem is resolved.
2. If the alarm is not cleared, go to the next step.
Trace the power feed to its connection on the power source.
Verify that the power source is on and that the power feed is properly secured.
Run syscheck.
1. If the alarm is cleared, the problem is resolved.
2. If the alarm is not cleared, go to the next step.
If the power source is functioning properly and all connections are secure, request that an electrician check the voltage on the power feed.
Run syscheck.
1. If the alarm is cleared, the problem is resolved.
2. If the alarm is not cleared, go to the next step.
If the problem is not resolved, call the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.
Run savelogs_plat to gather system information for further troubleshooting, (see Saving Logs Using the LSMS GUI or Command Line), and contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

3000000000020000 - Server Disk Health Test Error

This alarm indicates that the hard drive has failed or failure is imminent.

Recovery

Immediately contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance with a disk replacement.

3000000000040000 - Server Disk Unavailable Error

This alarm indicates that the smartd service is not able to read the disk status because the disk has other problems that are reported by other alarms. This alarm appears only while a server is booting.

Recovery

Perform the recovery procedures for the other alarms that accompany this alarm.

3000000000080000 - Device Error

This alarm indicates that the offboard storage server has a problem with its disk volume filling.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

3000000000100000 - Device Interface Error

This alarm indicates that the IP bond is either not configured or not functioning.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

3000000400000000 - Multipath device access link problem

Alarm Type: TPD

Description: One or more "access paths" of a multipath device are failing or are not healthy, or the multipath device does not exist.

Severity: Major

OID: TpdMpathDeviceProblemNotify 1.3.6.1.4.1.323.5.3.18.3.1.2.35

Alarm ID: TKSPLATMA353000000400000000

Recovery

unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 should do the following:
1. Check in the MSA administration console (web application) that the correct "volumes" exist on the MSA and that read/write access is granted to the blade server.
2. Check whether the multipath daemon/service is running on the blade server: service multipathd status. Resolution:
  1. Start multipathd: service multipathd start
3. Check the output of "multipath -ll", which shows all multipath devices existing in the system and their access paths, and check that the particular /dev/sdX devices exist. If they do not, the SCSI bus and/or FC HBAs may not have been rescanned to detect new devices. Resolution:
  1. Run "/opt/hp/hp_fibreutils/hp_rescan -a"
  2. Run "echo 1 > /sys/class/fc_host/host*/issue_lip"
  3. Run "echo '- - -' > /sys/class/scsi_host/host*/scan"
4. Check whether the syscheck::disk::multipath test is configured to monitor the right multipath devices and their access paths: compare the output of "multipath -ll" with the output of "syscheckAdm disk multipath --get --var=MPATH_LINKS". Resolution:
  1. Configure the disk::multipath check correctly.
Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

3000000800000000 – Switch Link Down Error

This alarm indicates that the switch is reporting that the link is down. The link that is down is reported in the alarm. For example, port 1/1/2 is reported as 1102.

Recovery Procedure:

Verify cabling between the offending port and remote side.
Verify networking on the remote end.
If problem persists, contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 to verify port settings on both the server and the switch.

3000001000000000 - Half-open Socket Limit

Alarm Type: TPD

Description: This alarm indicates that the number of half-open TCP sockets has reached the major threshold. This problem is caused by a remote system failing to complete the TCP 3-way handshake.

Severity: Major

OID: tpdHalfOpenSocketLimit 1.3.6.1.4.1.323.5.3.18.3.1.2.37

Alarm ID: TKSPLATMA37 3000001000000000

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

3000002000000000 - Flash Program Failure

Alarm Type: TPD

Description: This alarm indicates there was an error while trying to update the firmware flash on the E5-APP-B cards.

Severity: Major

OID: tpdFlashProgramFailure 1.3.6.1.4.1.323.5.3.18.3.1.2.38

Alarm ID: TKSPLATMA383000002000000000

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

3000004000000000 - Serial Mezzanine Unseated

Alarm Type: TPD

Description: This alarm indicates that the serial mezzanine board was not properly seated.

Severity: Major

OID: tpdSerialMezzUnseated 1.3.6.1.4.1.323.5.3.18.3.1.2.39

Alarm ID: TKSPLATMA393000004000000000

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

3000000008000000 - Server HA Keepalive Error

This alarm indicates that the heartbeat process has failed to receive a heartbeat packet within the timeout period.

Recovery

Determine if the mate server is currently operating. If the mate server is not operating, attempt to restore it to operation.
Determine if the keepalive interface is operating.
Determine if heartbeat is running (service TKLCha status).
Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

3000000010000000 - DRBD block device can not be mounted

This alarm indicates that DRBD is not functioning properly on the local server. The DRBD state (disk state, node state, or connection state) indicates a problem.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

3000000020000000 - DRBD block device is not being replicated to peer

This alarm indicates that DRBD is not replicating to the peer server. Usually this alarm indicates that DRBD is not connected to the peer server. A DRBD Split Brain may have occurred.

Recovery

Determine if the mate server is currently operating.
Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

3000000040000000 - DRBD peer needs intervention

This alarm indicates that DRBD is not functioning properly on the peer server. DRBD is connected to the peer server, but the DRBD state on the peer server is either unknown or indicates a problem.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Minor Platform Alarms

Minor platform alarms involve disk space, application processes, RAM, and configuration errors.

5000000000000001 - Server Disk Space Shortage Warning

This alarm indicates that one of the following conditions has occurred:

A file system has exceeded a warning threshold, which means that more than 80% (but less than 90%) of the available disk storage has been used on the file system.
More than 80% (but less than 90%) of the total number of available files have been allocated on the file system.
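The relationship between the warning range described here and the corresponding major alarm can be summarized in a small classification sketch. The thresholds are the 80% and 90% figures from this appendix, and the alarm numbers are taken from Table B-179. How exact boundary values are treated is an assumption, and the function name is hypothetical.

```python
DISK_SPACE_SHORTAGE_ERROR = "3000000000001000"    # major alarm, above 90%
DISK_SPACE_SHORTAGE_WARNING = "5000000000000001"  # minor alarm, above 80%


def classify_disk_usage(used_fraction):
    """Return the alarm number expected for a usage fraction, or None."""
    if used_fraction > 0.90:
        return DISK_SPACE_SHORTAGE_ERROR
    if used_fraction > 0.80:
        return DISK_SPACE_SHORTAGE_WARNING
    return None
```

The same classification applies to the fraction of available files that have been allocated on the file system.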

Recovery

Run syscheck.
Examine the syscheck output to determine if the file system /var/TKLC/lsms/free is low on space. If it is, continue to the next step; otherwise go to Step 4.
If possible, recover space on the free partition by deleting unnecessary files:
1. Log in to the server generating the alarm as the root user:
  
  Login: root
  
  Password:<Enter root password>
2. Change to the /var/TKLC/lsms/free directory: # cd /var/TKLC/lsms/free
3. Confirm that you are in the /var/TKLC/lsms/free directory: # pwd (the expected output is /var/TKLC/lsms/free)
4. If the pwd command does not output /var/TKLC/lsms/free, go back to Sub-step 2.
5. List the files to be deleted and delete them using the rm command.
6. Re-run syscheck.
If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next step.
Run savelogs to gather all application logs (see Saving Logs Using the LSMS GUI or Command Line).
Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
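When deciding which files to delete, it can help to see which files occupy the most space. The following Python sketch lists the largest files under a directory. The function name is hypothetical and is not part of the LSMS software; review the results manually before deleting anything.

```python
import os


def largest_files(directory, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                found.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    found.sort(reverse=True)
    return found[:limit]
```

Calling `largest_files("/var/TKLC/lsms/free")` returns the ten largest files in that partition as candidates for review.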

5000000000000002 - Server Application Process Error

This alarm indicates that either fewer than the minimum number of instances of a required process are running, or too many instances of a required process are running.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
If a 3000000000000020 - Server Platform Process Error alarm is also present, execute the recovery procedure associated with that alarm before proceeding.
Log in to the LSMS CLI as the root user.
Stop the LSMS application.
Start the LSMS application.
Capture the log files on both LSMSs (see Saving Logs Using the LSMS GUI or Command Line) and contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

5000000000000004 - Server Hardware Configuration Error

This alarm indicates that one or more of the server’s hardware components are not in compliance with proper specifications (refer to Application B Card Hardware and Installation Guide).

Recovery

Run syscheck in verbose mode.
Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000000008 - Server RAM Shortage Warning

This alarm indicates one of two conditions:

Less memory than the expected amount is installed.
The system is swapping pages in and out of physical memory at a fast rate, indicating a possible degradation in system performance.

This alarm may not clear immediately when conditions fall below the alarm threshold. Conditions must be below the alarm threshold consistently for the alarm to clear. The alarm may take up to five minutes to clear after conditions improve.
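The clearing behavior described above, where conditions must stay below the threshold consistently, can be pictured with the following Python sketch. The class name, the threshold, and the number of samples are assumptions for illustration; the actual clearing logic used by the platform is not described in this appendix.

```python
class ClearAfterQuietPeriod:
    """Keep an alarm raised until readings stay below a threshold long enough."""

    def __init__(self, threshold, samples_to_clear):
        self.threshold = threshold
        self.samples_to_clear = samples_to_clear  # hypothetical setting
        self.active = False
        self._quiet = 0

    def update(self, reading):
        """Feed one reading and return True while the alarm is raised."""
        if reading >= self.threshold:
            # Any reading at or above the threshold restarts the wait.
            self.active = True
            self._quiet = 0
        elif self.active:
            self._quiet += 1
            if self._quiet >= self.samples_to_clear:
                self.active = False
                self._quiet = 0
        return self.active
```

In this sketch, a single reading above the threshold during the quiet period resets the count, which is why an alarm can remain raised for a while after conditions first improve.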

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000000020 - Server Swap Space Shortage Warning

This alarm indicates that the swap space available on the server is less than expected. This is usually caused by a process that has allocated a very large amount of memory over time.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000000040 - Server Default Router Not Defined

This alarm indicates that the default network route is either not configured or the current configuration contains an invalid IP address or hostname.

Caution:

When changing the server’s network routing configuration it is important to verify that the modifications will not impact the method of connectivity for the current login session. It is also crucial that this information not be entered incorrectly or set to improper values. Incorrectly modifying the server’s routing configuration may result in total loss of remote network access.

Recovery

To define the default router:
1. Obtain the proper Provisioning Network netmask and the IP address of the appropriate Default Route on the provisioning network. These are maintained by the customer network administrators.
2. Log in to the LSMS CLI from lsmspri server with username root and run su - lsmsmgr
3. Select Network Configuration Menu from the LSMS Configuration Menu.
4. Select Network Reconfiguration Menu from the Network Configuration Menu. The following warning appears:
  WARNING: This action is service impacting. Are you sure?
5. Choose yes. This displays the configuration screen. See the Configuration Guide for Initial Configuration information.
6. Complete the configuration.
7. Exit from the lsmsmgr menu.
8. Run syscheck again. If the alarm has not been cleared, go to Sub-step 10.
9. Run savelogs to gather all application logs.
10. Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
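Before entering the values obtained in Sub-step 1, it may be useful to sanity-check them. The following Python sketch uses the standard `ipaddress` module to confirm that a default route address is valid and lies within the network defined by the server address and netmask. The function name and the example addresses are hypothetical. The sketch does not resolve hostnames and is not part of the LSMS tooling.

```python
import ipaddress


def check_default_route(server_ip, netmask, gateway):
    """Return a list of problems found; an empty list means none were found."""
    try:
        network = ipaddress.ip_network(f"{server_ip}/{netmask}", strict=False)
    except ValueError as exc:
        return [f"invalid server address or netmask: {exc}"]
    try:
        gw = ipaddress.ip_address(gateway)
    except ValueError:
        return [f"invalid gateway address: {gateway!r}"]

    problems = []
    if gw not in network:
        problems.append(f"{gw} is not in {network}")
    elif gw in (network.network_address, network.broadcast_address):
        problems.append(f"{gw} is the network or broadcast address of {network}")
    elif gw == ipaddress.ip_address(server_ip):
        problems.append(f"{gw} is the server's own address")
    return problems
```

For example, `check_default_route("192.168.1.10", "255.255.255.0", "192.168.2.1")` reports that the gateway is outside the 192.168.1.0/24 network.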

5000000000000080 - Server Temperature Warning

Alarm Type: TPD

Description: This alarm indicates that the internal temperature within the server is outside of the normal operating range. A server Fan Failure may also exist along with the Server Temperature Warning.

Severity: Minor

OID: tpdTemperatureWarningNotify 1.3.6.1.4.1.323.5.3.18.3.1.3.8

Alarm ID: TKSPLATMI8

Recovery

Ensure that nothing is blocking the fan's intake. Remove any blockage.

Verify that the temperature in the room is normal. If it is too hot, lower the temperature in the room to an acceptable level.

Table B-181 Server Environmental Conditions

Ambient Temperature	Operating: 5 degrees C to 40 degrees C; Exceptional Operating Limit: 0 degrees C to 50 degrees C; Storage: -20 degrees C to 60 degrees C
Relative Humidity	Operating: 5% to 85% non-condensing; Storage: 5% to 95% non-condensing
Elevation	Operating: -300 m to +300 m; Storage: -300 m to +1200 m
Heating, Ventilation, and Air Conditioning	Capacity must compensate for up to 5100 BTUs/hr for each installed frame. To calculate HVAC capacity, determine the wattage of the installed equipment and use the formula: watts x 3.412 = BTUs/hr

Note:
Be prepared to wait before continuing with the next step. Conditions must remain below the alarm thresholds consistently for the alarm to clear. It may take about ten minutes after the room returns to an acceptable temperature before the alarm clears.
Run syscheck to see if the alarm has cleared.
- If the alarm has been cleared, the problem is resolved.
- If the alarm has not been cleared, continue with the next step.
Replace the filter (refer to the appropriate hardware manual).

Note:
Be prepared to wait before continuing with the next step. Conditions must remain below the alarm thresholds consistently for the alarm to clear. It may take about ten minutes after the filter is replaced before the alarm clears.
Run syscheck to see if the alarm has cleared.
- If the alarm has been cleared, the problem is resolved.
- If the alarm has not been cleared, contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 and provide the system health check output.
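The ambient temperature ranges in Table B-181 can be expressed as a simple check when reviewing room readings. The following Python sketch is illustrative only; the names are hypothetical. It classifies room temperature, not the internal server temperature that triggers this alarm, whose thresholds are not given in this appendix.

```python
# Ambient temperature ranges in degrees C, from Table B-181.
OPERATING_C = (5, 40)
EXCEPTIONAL_C = (0, 50)


def classify_ambient(temp_c):
    """Classify a room temperature against the operating ranges."""
    low, high = OPERATING_C
    if low <= temp_c <= high:
        return "operating"
    low, high = EXCEPTIONAL_C
    if low <= temp_c <= high:
        return "exceptional"
    return "out of range"
```

A reading of 45 degrees C, for instance, falls outside the normal operating range but within the exceptional operating limit.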

5000000000000100 - Server Core File Detected

This alarm indicates that an application process has failed and debug information is available.

Recovery

Run syscheck in verbose mode.
Run savelogs to gather system information (see Saving Logs Using the LSMS GUI or Command Line).
Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

5000000000000200 - Server NTP Daemon Not Synchronized

This alarm indicates that the NTP daemon (background process) has been unable to locate a server to provide an acceptable time reference for synchronization.

Severity: Minor

Alarm ID: TKSPLATMI10

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

5000000000000400 - Server CMOS Battery Voltage Low

This alarm indicates that the CMOS battery voltage is below the expected value. It is an early warning of CMOS battery end-of-life failure, which will cause problems if the server is powered off.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

5000000000000800 - Server Disk Self Test Warning

This alarm indicates that a non-fatal disk issue exists, such as a sector that cannot be read.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

5000000000001000 - Device Warning

This alarm indicates that either an snmpget cannot be performed on the configured SNMP OID or the returned value failed the specified comparison operation.

Recovery

Run syscheck in Verbose mode.
Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000002000 - Device Interface Warning

This alarm can be generated by either an SNMP trap or an IP bond error. If syscheck is configured to receive SNMP traps, this alarm indicates that an SNMP trap was received with the set state. If syscheck is configured for IP bond monitoring, this alarm can mean that a slave device is not operating, a primary device is not active, or syscheck is unable to read bonding information from interface configuration files.

Recovery

Run syscheck in Verbose mode.
Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000004000 - Server Reboot Watchdog Initiated

This alarm indicates that the server has been rebooted due to a hardware watchdog.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
This condition should never happen.

5000000000008000 - Server HA Failover Inhibited

This alarm indicates that the server has been inhibited and HA failover is prevented from occurring.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000010000 - Server HA Active To Standby Transition

This alarm indicates that the server is in the process of transitioning HA state from Active to Standby.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000020000 - Server HA Standby To Active Transition

This alarm indicates that the server is in the process of transitioning HA state from Standby to Active.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000040000 - Platform Health Check Failure

This alarm indicates a syscheck configuration error.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000080000 - NTP Offset Check Failure

This alarm indicates that time on the server is outside the acceptable range or offset from the NTP server. The alarm message provides the offset value of the server from the NTP server and the offset limit set for the system by the application.

Alarm Type: TPD

Severity: Minor

Alarm ID: TKSPLATMI20

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000000100000 - NTP Stratum Check Failure

This alarm indicates that NTP is syncing to a server, but the stratum level of the NTP server is outside the acceptable limit. The alarm message provides the stratum value of the NTP server and the stratum limit set for the system by the application.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.
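The NTP offset and stratum checks described in the two alarms above both compare a measured value with a limit set by the application. The following Python sketch shows the general shape of such a comparison. The function name and parameters are hypothetical, the limit values are not given in this appendix, and whether the platform treats a value equal to the limit as a failure is an assumption here.

```python
def ntp_check(offset_seconds, stratum, offset_limit, stratum_limit):
    """Return the names of the checks that fail for the given readings."""
    failures = []
    # The offset can be positive or negative, so compare its magnitude.
    if abs(offset_seconds) > offset_limit:
        failures.append("offset")
    if stratum > stratum_limit:
        failures.append("stratum")
    return failures
```

An empty list means both readings are within their limits.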

5000000020000000 - Server Kernel Dump File Detected

Alarm Type: TPD

Description: This alarm indicates that the kernel has crashed and debug information is available.

Severity: Minor

OID: 1.3.6.1.4.1.323.5.3.18.3.1.3.30

Alarm ID: TKSPLATMI30

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

5000000040000000 - TPD Upgrade Failed

Alarm Type: TPD

Description: This alarm indicates that a TPD upgrade has failed.

Severity: Minor

OID: tpdServerUpgradeFailDetectedNotify 1.3.6.1.4.1.323.5.3.18.3.1.3.31

Alarm ID: TKSPLATMI31

Recovery

Run the following command to clear the alarm:
/usr/TKLC/plat/bin/alarmMgr --clear TKSPLATMI31
Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

5000000080000000 - Half Open Socket Warning Limit

Alarm Type: TPD

This alarm indicates that the number of half open TCP sockets has reached the major threshold. This problem is caused by a remote system failing to complete the TCP 3-way handshake.

Severity: Minor

OID: tpdHalfOpenSocketWarningNotify 1.3.6.1.4.1.323.5.3.18.3.1.3.32

Alarm ID: TKSPLATMI32

Recovery

Run syscheck.
Contact the Customer Care Center and provide the system health check output.
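A half-open socket is one where the TCP 3-way handshake has started but not completed. On Linux, such sockets appear in /proc/net/tcp with state code 03 (SYN_RECV). The following Python sketch counts them from the text of that file. It is an illustration only; how syscheck counts half-open sockets and what threshold it applies are not described in this appendix.

```python
SYN_RECV = "03"  # state code for SYN_RECV in /proc/net/tcp


def count_half_open(proc_net_tcp_text):
    """Count SYN_RECV entries in the contents of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Field 3 is the connection state as a two-digit hex code.
        if len(fields) > 3 and fields[3] == SYN_RECV:
            count += 1
    return count
```

On a live system the input can be read with `open("/proc/net/tcp").read()`. IPv6 sockets are listed separately in /proc/net/tcp6.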

5000000000200000 - SAS Presence Sensor Missing

This alarm indicates that the server drive sensor is not working.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance with a replacement server.

5000000000400000 - SAS Drive Missing

This alarm indicates that the number of drives configured for this server is not being detected.

Recovery

Call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 to determine if the alarm is caused by a failed drive or failed configuration.

5000000000800000 - DRBD failover busy

This alarm indicates that a DRBD sync is in progress from the peer server to the local server. The local server is not ready to be the primary DRBD node because its data is not current.

Recovery

Wait for approximately 20 minutes, then check if the DRBD sync has completed. A DRBD sync should take no more than 15 minutes to complete.
If the alarm persists longer than this time interval, call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

5000000001000000 - HP disk resync

This alarm indicates that the HP disk subsystem is currently resyncing after a failed or replaced drive, or after another change in the configuration of the HP disk subsystem. The alarm message includes the disk that is resyncing and the percentage complete. This alarm clears after the disk resync completes. The time to clear depends on the size of the disk and the amount of activity on the system.

Recovery

Run syscheck in Verbose mode.
If the percentage recovering does not change between runs of syscheck made at least 5 minutes apart, call unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 with the syscheck output.
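The platform alarm numbers listed above are 16-digit hexadecimal values. In the entries that also list an Alarm ID (for example, TKSPLATMI10 for 5000000000000200 and TKSPLATMI20 for 5000000000080000), the numeric suffix of the Alarm ID matches the position, counted from 1, of the single bit set after the leading digit. This is an observation drawn from the listed entries, not a rule stated in this appendix. The following Python sketch, with a hypothetical function name, decodes an alarm number on that assumption.

```python
def decode_alarm_number(alarm_number):
    """Split a 16-digit hex alarm number into (leading_digit, bit_positions)."""
    if len(alarm_number) != 16:
        raise ValueError("expected 16 hexadecimal digits")
    leading = alarm_number[0]
    flags = int(alarm_number[1:], 16)
    # Bit positions are counted from 1, starting at the least significant bit.
    positions = [bit + 1 for bit in range(flags.bit_length()) if (flags >> bit) & 1]
    return leading, positions
```

For example, decoding 5000000040000000 yields a leading digit of 5 and bit position 31, which agrees with the TKSPLATMI31 identifier used in the recovery command for the TPD Upgrade Failed alarm.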

Saving Logs Using the LSMS GUI or Command Line

During some corrective procedures, it may be necessary to provide Oracle Communications with information about the LSMS for help in clearing an alarm. These log files are used to aid the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 when troubleshooting the LSMS.

Use the following procedure to save logs using menu selections from the LSMS GUI.

Log in to the User Interface screen of the LSMS GUI (see Starting a Web-Based LSMS GUI Session).
From the menu, select Logs > Capture Logs.
Select the number of days for which you want to capture the logs, as well as the specific logs, and click OK.
To capture logs from the Command Line, enter the following command: /usr/TKLC/plat/sbin/savelogs_plat