B Automatic Monitoring of Events

This appendix contains overviews of monitored events, GUI and surveillance notifications, and traps.

Introduction

This appendix contains:

  • “Overview of Monitored Events”, which describes how the LSMS monitors itself for events and alarms and how it reports them.

  • “Overview of GUI Notifications”, which describes the display, format, and logging of notifications that appear on the graphical user interface.

  • “Overview of Surveillance Notifications”, which describes the display, format, and logging of Surveillance notifications.

  • “Overview of Traps”, which describes the transmission, format, and logging of SNMP traps.

  • A listing of all events, in numerical order. For each event, this appendix includes:

    • Explanation of the probable cause for the event

    • Suggested recovery

    • Indication of whether the event results in a GUI notification, Surveillance notification, trap, or some combination of these.

Overview of Monitored Events

This section describes the types of events and alarms that the LSMS reports and how the servers report them.

Types of Events and Alarms Reported

The LSMS monitors itself for the types of events and alarms shown in Table B-1. When one of these events occurs, the LSMS does one or more of the following:

  • Displays a notification on the graphical user interface (GUI notification)

  • Posts a Surveillance notification at a certain frequency to the administration console by default, or to the second serial port if so configured

  • Sends a trap to a Network Management System (NMS) if you have installed the optional Remote Monitoring feature

Every GUI notification and Surveillance notification contains its associated event number. Traps contain a trap ID, which is explained in Overview of Traps.

Table B-1 Notification Event Number Categories

Event Number Range | Category | Description
0000–1999 | EMS | Events that pertain to an Element Management System (EMS). The EMS is a process that runs on the Multi-Purpose Server (MPS) at a network element.
2000–3999 | NPAC | Events that pertain to a Number Portability Administration Center (NPAC)
4000–5999 | Platform and switchover (some of these events do not produce GUI notifications) | Events that pertain to system resources, such as disks, hardware, memory, central processing unit (CPU) utilization and to switchover functions
6000–7999 | Main LSMS processes | Events that pertain to one of the following main LSMS processes: lsman, supman, npacagent, or eagleagent
8000–8999 | Applications | Events that pertain to LSMS applications that are feature or application dependent, such as LNP Database Synchronization, Service Assurance, or NPA Split Administration
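As a minimal sketch (not part of the LSMS software), the ranges in Table B-1 can be expressed as a lookup. The names EVENT_CATEGORIES and event_category are illustrative assumptions; only the ranges and category names come from the table.

```python
# Illustrative only: ranges and category names are taken from Table B-1;
# the function and variable names are hypothetical.
EVENT_CATEGORIES = (
    (0, 1999, "EMS"),
    (2000, 3999, "NPAC"),
    (4000, 5999, "Platform and switchover"),
    (6000, 7999, "Main LSMS processes"),
    (8000, 8999, "Applications"),
)


def event_category(event_number):
    """Return the Table B-1 category for a four-digit event number."""
    number = int(event_number)  # accepts "0003" as well as 3
    for low, high, category in EVENT_CATEGORIES:
        if low <= number <= high:
            return category
    raise ValueError(f"event number {event_number!r} is outside Table B-1")
```

For example, event_category("0003") returns the EMS category, and a number above 8999 raises ValueError.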

How Servers Report Alarms and Events

The LSMS 9.0 servers perform the following functions to monitor and report events:

  • The standby server:
    • Monitors itself only for:
      • Platform events (see Platform Alarms)
      • Switchover-readiness events, such as those that describe database replication or critical network interfaces
    • Controls the appropriate Alarm LED (Critical, Major, or Minor) on the front of the server by illuminating the LED when one or more platform alarms in that category exist and turning off the LED when no platform alarms in that category exist
    • Sends any notification to its Serial Port 3 and logs the notification in its Surveillance log
    • Sends the notification to the active server
  • The active server performs the following functions:
    • Monitors itself for both platform events and application events
    • Controls the appropriate Alarm LED (Critical, Major, or Minor) on the front of the server by illuminating the LED when one or more platform alarms in that category exist and turning off the LED when no platform alarms in that category exist
    • Sends all platform events for itself, events reported from the standby server, and appropriate application events for itself to its Serial Port 3 and also logs the event as appropriate in its Surveillance log (some event notifications are reported repeatedly; for more information about which events are reported repeatedly, see the individual event descriptions)
      • Alarms that originate from the active server contain the alarm text with no hostname
      • Alarms that originate from the standby server contain the alarm text preceded by the standby server’s hostname

        Note:

        Although all events are reported through SNMP traps and all platform alarms are reported through Surveillance notifications, not all application alarms are reported both through the GUI and through Surveillance notifications; for more information about which alarms are reported in which way, see the individual event descriptions.
    • Displays one time on the GUI each platform or application event for itself and each platform event received from the standby server:
      • Alarms that originate from the active server display the alarm text with no hostname
      • Alarms that originate from the standby server display the alarm text preceded by the standby server’s hostname
    • Sends one Simple Network Management Protocol (SNMP) trap for each platform or application event for itself and for each platform event received from the standby server. Each trap contains the IP address of the server from which the notification originated.

Overview of GUI Notifications

Displaying GUI Notifications

GUI notifications are displayed on the GUI only if the GUI is active when the reported event occurs, but all GUI notifications are logged in an appropriate log as described in Logging GUI Notifications. Figure B-1 shows an example of notifications displayed on the GUI.

Figure B-1 GUI Notifications



Format of GUI Notifications

This section describes the general format used for most GUI notifications, as well as additional fields used for GUI event notifications (used to report information only) and for EMS GUI notifications. The formats are expressed as an ordered sequence of variables. Variables are expressed with the name of the variable enclosed by angle brackets; for example, <Severity> indicates a variable for the severity assigned to a GUI notification. Variables Used in GUI Notification Format Descriptions shows the variables used in GUI notification formats.

General Format for GUI Notifications

The format for most GUI notifications is:


[<Severity>]:<Time Stamp> <Event Number> <Message Text String>

In addition, the following types of GUI notifications contain additional fields:

Format for EMS GUI Notifications

EMS GUI notifications (event numbers in the range 0000–1999) contain a <CLLI> value to indicate the Common Language Location Identifier for the network element where the EMS resides. The format for EMS GUI notifications is:


[<Severity>]:<Time Stamp> <Event Number> <CLLI>: <Message Text String>

Format for GUI Notifications with EVENT Severity

Notifications that have the severity EVENT can contain additional event data fields. The format for GUI notifications with severity EVENT is:


[EVENT]:<Time Stamp> <Event Number> <EventType>:<EventData1>, [<EventData2>],...

Variables Used in GUI Notification Format Descriptions

Table B-2 shows the possible values and meanings for each of the variables shown in format definitions for GUI notifications.

Table B-2 Variables Used in GUI Notifications

Field Description

<Severity>

Indicates seriousness of event, using both text and color, as follows:

Text | Color | Meaning
[Critical] | Red | Reports a serious condition that requires immediate attention
[Major] | Yellow | Reports a moderately serious condition that should be monitored, but does not require immediate attention
[Minor] | Turquoise | Reports a condition of minor significance that should be monitored, but which does not require immediate attention
[Cleared] | Green | Reports status information or the clearing of a condition that caused previous posting of a [Critical] or [Major] GUI notification
[EVENT] | White | For information only

<Time Stamp>

Indicates time that the event was detected, in format:

YYYY-MM-DD hh:mm:ss where fields are as follows:

Field | Meaning | Possible Values
YYYY | Year | Any four digits
MM | Month | 01 through 12
DD | Day | 01 through 31
hh | Hour | 00 through 23
mm | Minute | 00 through 59
ss | Second | 00 through 59

<Event Number>

Four-digit number that identifies the specific GUI notification (also indicates the type of GUI notification, as shown in Table B-1).

<Message Text String>

Text string (which may contain one or more variables defined in Table B-3) that provides a small amount of information about the event. For more information about the event, look up the corresponding event number in this appendix; for each event number, this appendix shows the text string as it appears in a GUI notification, as well as a more detailed explanation and suggested recovery.

<CLLI>

Used in all EMS GUI notifications to indicate the Common Language Location Identifier for the network element where the EMS resides.

<EventType>: <EventData1>, [<EventData2>],...

Optional event data fields, as indicated by square brackets around the field, included in GUI notifications with severity [EVENT]. If no data is available for a given field, the field is empty. If other fields follow an empty field, the empty field is indicated by consecutive commas with no intervening data. One of the optional fields in an event notification is an effective timestamp field. This field indicates the time that the event actually occurred. When present, it uses the ASN.1 Generalized Time format.
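The rule about empty fields can be illustrated with a short sketch. The helper name split_event_fields and the sample string in the usage note are hypothetical; the sketch assumes that the first colon separates <EventType> from the data fields and that no data field itself contains a comma.

```python
def split_event_fields(event_part):
    """Split '<EventType>:<EventData1>,<EventData2>,...' into its pieces.

    Empty fields (consecutive commas) are returned as empty strings so
    that the position of every later field is preserved.
    """
    # Assumption: the first colon ends the event type.
    event_type, _, data = event_part.partition(":")
    fields = [item.strip() for item in data.split(",")] if data else []
    return event_type.strip(), fields
```

With a made-up input such as "SampleType:alpha,,gamma", the second data field comes back as an empty string, so the third field keeps its position.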

Variables Used in Message Text String of GUI Notifications

Table B-3 shows the variables that can appear in the message text of a GUI notification.

Table B-3 Variables Used in Message Text of GUI Notifications

Symbol Possible Values and Meanings Number of Characters

<PRIMARY|SECONDARY>

PRIMARY=Primary NPAC

SECONDARY=Secondary NPAC

7 or 9

<retry_interval>

Time, in minutes, between retries of a request sent to an NPAC after it sent a failure response

1-10

<retry_number>

Number of times the LSMS will retry to recover from a failure response sent by NPAC

1-10

<YYYYMMDDhhmmss>

Year, month, day, hour, minute, second

14

<NPAC_region_ID>

CA = Canada

MA = MidAtlantic

MW = Midwest

NE = Northeast

SE = Southeast

SW = Southwest

WE = Western

WC = WestCoast

2

Examples of GUI Notifications

Example of General Format GUI Notifications

Following is an example of a general GUI notification (for a description of its format, see General Format for GUI Notifications):


[Critical]:1998-07-05 11:49:56 2012 NPAC PRIMARY-NE Connection Attempt Failed:
Access Control Failure

Example of an EMS GUI Notification

Following is an example of an EMS GUI notification (for a description of its format, see Format for EMS GUI Notifications). In this example, <CLLI> has the value LNPBUICK:


[Critical]:1998-07-05 11:49:56 0003 LNPBUICK: Primary Association Failed

Example of GUI Notification with EVENT Severity Level

Following is an example of a GUI notification with severity [EVENT]. For a description of its format, see Format for GUI Notifications with EVENT Severity:


[EVENT]: 2000-02-05 11:49:56 8069 LNPBUICK: Audit LNP DB Synchronization Aborted
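The formats above can be read mechanically. The following is a minimal sketch under stated assumptions, not the parser used by the LSMS: the name parse_gui_notification and the dictionary keys are hypothetical, whitespace after the colon is tolerated because the EVENT example shows a space there, and the <CLLI> is taken to be the text before the first colon-space pair in EMS notifications.

```python
import re
from datetime import datetime

# Pattern for "[<Severity>]:<Time Stamp> <Event Number> <rest of line>".
GUI_PATTERN = re.compile(
    r"^\[(?P<severity>[^\]]+)\]:\s*"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<event>\d{4}) "
    r"(?P<rest>.*)$"
)


def parse_gui_notification(line):
    """Parse one GUI notification line into a dictionary (hypothetical helper)."""
    match = GUI_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a GUI notification: {line!r}")
    result = {
        "severity": match["severity"],
        "timestamp": datetime.strptime(match["timestamp"], "%Y-%m-%d %H:%M:%S"),
        "event_number": match["event"],
        "clli": None,
        "text": match["rest"],
    }
    # EMS notifications (0000-1999) carry "<CLLI>: " before the message text.
    if int(match["event"]) <= 1999 and ": " in match["rest"]:
        result["clli"], result["text"] = match["rest"].split(": ", 1)
    return result
```

Applied to the EMS example above, the sketch yields the severity Critical, the event number 0003, the CLLI LNPBUICK, and the message text Primary Association Failed. Event data fields of EVENT notifications are left unsplit in the text value.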

Logging GUI Notifications

When an event that generates a GUI notification occurs, that notification is logged in the file created for those events. Table B-4 shows the log file used for each type of event, where <mmdd> indicates the month and day the event was logged.

Table B-4 Logs for GUI Notifications

Event Type | Log File
EMS Alarms, NPAC Alarms, and Main LSMS Process Alarms | /var/TKLC/lsms/logs/alarm/LsmsAlarm.log.<mmdd>
Non-alarm Events | /var/TKLC/lsms/logs/<region>/LsmsEvent.log.<mmdd>, where <region> indicates the region of the NPAC that generated the information

For information about the format of the logs and how to view the logs, refer to the Database Administrator's Guide.
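To locate the log for a given day, the file names in Table B-4 can be built as in the following sketch. The function names are hypothetical, and two details are assumptions rather than statements from this appendix: that <mmdd> is a zero-padded two-digit month followed by a zero-padded two-digit day, and that <region> is spelled as the region name (for example, Midwest).

```python
from datetime import date

LOG_ROOT = "/var/TKLC/lsms/logs"  # directory shown in Table B-4


def alarm_log_path(day):
    """Path of the alarm log for the given date (assumes zero-padded mmdd)."""
    return f"{LOG_ROOT}/alarm/LsmsAlarm.log.{day:%m%d}"


def event_log_path(region, day):
    """Path of the non-alarm event log for an NPAC region and date."""
    return f"{LOG_ROOT}/{region}/LsmsEvent.log.{day:%m%d}"
```

For instance, alarm_log_path(date(1998, 7, 5)) produces a name ending in LsmsAlarm.log.0705.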

Overview of Surveillance Notifications

Surveillance notifications are created by the Surveillance feature. These notifications can report status that is not available through the GUI notifications and report status that can be monitored without human intervention.

Displaying Surveillance Notifications

Surveillance notifications are sent to Serial Port 3 on each server.

Format of Surveillance Notifications

All Surveillance notifications reported on the same server where the event occurred have the following format:


<Event Number>|<Time Stamp>|<Message Text String>

Surveillance notifications that originated from the non-active server and are reported on the active server have an additional field that shows the hostname of the server where the event occurred, as shown in the following format:


<Event Number>|<Time Stamp>|<Host Name>|<Message Text String>

Variables Used in Surveillance Notification Format Descriptions

Table B-5 shows the possible values and meanings for each of the variables shown in format definition for Surveillance notifications.

Table B-5 Variables Used in Surveillance Notifications

Field Description

<Event Number>

Four-digit number that identifies the specific Surveillance notification and also indicates the type of Surveillance notification, as shown in Table B-1.

<Time Stamp>

Indicates time that the event was detected, in format:

hh:mm Mon DD, YYYY where fields are as follows:

Field | Meaning | Possible Values
hh | Hour | 00 through 23
mm | Minute | 00 through 59
Mon | Month | First three letters of month’s name
DD | Day | 01 through 31
YYYY | Year | Any four digits

<Host Name>

First seven letters of the name of the host (one of two redundant servers) that noted the event. (In addition, the documentation of the individual event includes information about whether the event is reported by the active server or inactive server, or both servers.)

<Message Text String>

Text string (which may contain one or more variables defined in Table B-6) that provides a small amount of information about the event. For more information about the event, look up the corresponding event number in this appendix; for each event number, this appendix shows the text string as it appears in a Surveillance notification, as well as a more detailed explanation and suggested recovery.

Variables Used in Message Text String of Surveillance Notifications

Table B-6 shows the variables that can appear in the message text of a Surveillance notification.

Table B-6 Variables Used in Message Text of Surveillance Notifications

Symbol Possible Values and Meanings Number of Characters

<CLLI>

Common Language Location Identifier for the network element

11

<PRIMARY|SECONDARY>

PRIMARY=Primary NPAC

SECONDARY=Secondary NPAC

7 or 9

<NPAC_cust_ID>

0000 = Midwest

0001 = MidAtlantic

0002 = Northeast

0003 = Southeast

0004 = Southwest

0005 = Western

0006 = WestCoast

0008 = Canada

4

<NPAC_IP_Address>

IP address of the NPAC

10

<process_name>

First 12 characters of process name

12

<region>

Midwest

MidAtlantic

Northeast

Southeast

Southwest

Western

WestCoast

Canada

6 to 12

<return_code>

Return code

1 or 2

<Service_Assurance_Manager_name>

System name of machine that implements the Service Assurance Manager

12

<volume_name>

Name of disk volume, for example: a01

3

<volume_name_of_disk_partition>

Name of disk volume, for example: a01

3

Example of a Surveillance Notification

Following is an example of a Surveillance notification:


LSMS8088|14:58 Mar 10, 2000|lsmspri|Notify: sys Admin - Auto Xfer Failure
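A script that reads Serial Port 3 output or the Surveillance log could split each notification as in the following sketch. This is illustrative only: the function name and dictionary keys are hypothetical, the message text is assumed to contain no pipe character, and the optional LSMS prefix on the event number is inferred from the example above rather than from the format description.

```python
from datetime import datetime


def parse_surveillance_notification(line):
    """Parse '<Event Number>|<Time Stamp>[|<Host Name>]|<Message Text String>'."""
    fields = line.strip().split("|")
    if len(fields) == 3:
        event, stamp, text = fields
        host = None  # reported on the server where the event occurred
    elif len(fields) == 4:
        event, stamp, host, text = fields
    else:
        raise ValueError(f"unexpected field count in {line!r}")
    # Assumption: the example in this appendix prefixes the four-digit
    # event number with "LSMS", so an optional prefix is stripped here.
    if event.startswith("LSMS"):
        event = event[len("LSMS"):]
    return {
        "event_number": event,
        # %b matches English month abbreviations in the default C locale.
        "timestamp": datetime.strptime(stamp, "%H:%M %b %d, %Y"),
        "host": host,
        "text": text,
    }
```

For the example above, the sketch returns the event number 8088, the host name lsmspri, and a timestamp of 14:58 on March 10, 2000.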

Logging Surveillance Notifications

In addition to displaying Surveillance notifications, the Surveillance feature logs all Surveillance notifications in the file survlog.log in the /var/TKLC/lsms/logs directory.

If the LSMS Surveillance feature becomes unable to properly report conditions, it logs the error information in a file, named lsmsSurv.log, in the /var/TKLC/lsms/logs directory on each server’s system disk. When the size of lsmsSurv.log exceeds 1MB, it is copied to a backup file, named lsmsSurv.log.bak, in the same directory. There is only one LSMS Surveillance feature backup log file, which limits the amount of log disk space to approximately 2MB.

Overview of Traps

The optional Remote Monitoring feature provides the capability for the LSMS to report certain events and alarms to a remote location, using the industry-standard Simple Network Management Protocol (SNMP). The LSMS implements an SNMP agent.

Customers can use this feature to cause the LSMS to report events and alarms to another location, which implements an SNMP Network Management System (NMS). An NMS is typically a standalone device, such as a workstation, which serves as an interface through which a human network manager can monitor and control the network. The NMS typically has a set of management applications (for example, data analysis and fault recovery applications).

For more information about the LSMS implementation of an SNMP agent, see “Understanding the SNMP Agent Process”.

SNMP Version 3 Trap PDU Format

An SNMPv3 trap PDU consists of the following fields:

  • PDU Type

    Specifies the type of PDU (in this case, trap).

  • Request ID

    Used to associate requests with responses.

  • Error Status

    Specifies an error or error type in response PDUs only (else set to 0)

  • Error Index

    Associates an error with a particular object instance in response PDUs only (else set to 0)

  • Variable Bindings

    Each variable binding contains an object field followed by its value field. The object and value fields together specify information about the event being reported.

SNMP Version 1 Trap PDU Format

Following is an overview of the format of the SNMP version 1 trap request. For more information about SNMP message formats, refer to SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, Third Edition, William Stallings, Addison Wesley, ISBN 0-201-48534-6, 1999.

Each SNMP message consists of the following fields:

  • SNMP authentication header, which consists of:
    • Version identifier, used to ensure that both the sender and receiver of the message are using the same version of the SNMP protocol. Currently, the LSMS supports only version 1, which has a version identifier of 0 (zero).
    • Community name, used to authenticate the NMS. The SNMP agent uses this field as a password to ensure that the sender of the message is allowed to access the SNMP agent’s information. The LSMS supports only trap requests, which originate at the LSMS; therefore, this field is not significant.
  • Protocol data unit (PDU), which for a trap request consists of the fields listed below.

An SNMPv1 trap PDU consists of the following fields:

  • PDU Type field, which specifies the type of PDU (in this case, trap).
  • Enterprise field, which identifies the device generating the message. For the LSMS SNMP agent, this field is 323.
  • Agent address field, which contains the IP address of the host that runs the SNMP agent. For the LSMS SNMP agent, this field contains the IP address of the LSMS active server.
  • Generic trap type, which can be set to any value from 0 through 6. Currently, the LSMS supports only the value 6, which corresponds to the enterpriseSpecific type of trap request.
  • Specific trap type, which can be used to identify a specific trap.
  • Time stamp, which indicates how many hundredths of a second have elapsed since the last reinitialization of the host that runs the SNMP agent.
  • One or more variable bindings, each of which contains an object field followed by a value field. The object and value fields together specify information about the event being reported.
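The SNMPv1 trap PDU fields listed above can be pictured as a simple record. The following sketch is a plain data container, not an SNMP encoder and not the LSMS agent's own data structure; the class name, field names, and the choice to hold the enterprise value as a string are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ENTERPRISE_SPECIFIC = 6  # the only generic trap type the LSMS is described as using


@dataclass
class SnmpV1TrapPdu:
    """Plain container for the SNMPv1 trap PDU fields listed above."""

    agent_address: str   # IP address of the LSMS active server
    specific_trap: int   # identifies the specific trap
    time_stamp: int      # hundredths of a second since reinitialization
    pdu_type: str = "trap"
    enterprise: str = "323"  # value given for the LSMS SNMP agent
    generic_trap: int = ENTERPRISE_SPECIFIC
    # Each binding is an (object, value) pair; strings are used for simplicity.
    variable_bindings: List[Tuple[str, str]] = field(default_factory=list)

    def uptime_seconds(self):
        """Convert the time stamp from hundredths of a second to seconds."""
        return self.time_stamp / 100
```

With a hypothetical agent address and a time stamp of 360000, uptime_seconds() reports one hour since the last reinitialization.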

Logging SNMP Agent Actions

When the LSMS SNMP agent process starts, stops, or sends a trap request, it logs information about the action in a log file. The log file is named lsmsSNMP.log.<MMDD>, where <MMDD> represents the current month and day. The log file is stored in the directory /usr/TKLC/lsms/logs/snmp.

Table B-7 shows the actions and information logged by the LSMS SNMP agent.

Table B-7 Information Logged by the LSMS SNMP Agent

Action Information Logged

The SNMP agent starts

Action, followed by day, date, time, and year; for example:

LSMS SNMP agent started: Thu Mar 09 09:02:53 2000

The SNMP agent stops

Action, followed by day, date, time, and year; for example:

LSMS SNMP agent stopped: Thu Mar 09 15:34:50 2000

The SNMP agent sends a trap request

The following fields, delimited by pipe characters:

  • Timestamp, recorded as YYYYMMDDhhmmss (year, month, date, hour, minute, second)

  • trap_ID, a unique numeric identifier that corresponds to the specific trap request sent.

  • For each NMS configured (up to five allowed):

    • The NMS’s IP address

    • Status (either of the following):

      • S to indicate that the LSMS SNMP agent succeeded in sending the trap request. (Even if the LSMS SNMP agent successfully sends the trap request, there is no guarantee that the NMS receives it.)

      • F to indicate that the LSMS SNMP agent failed in sending the trap request.

Following is a sample entry logged when a trap is sent (in this entry, a trap with a trap_ID of 3 is sent to two NMSs):

20000517093127|3|10.25.60.33|S|10.25.60.10|S
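A trap log entry in this format can be split back into its fields as in the following sketch. The function name and the dictionary keys are hypothetical; the field order, the S and F status codes, and the limit of five NMSs come from the description above.

```python
from datetime import datetime


def parse_trap_log_entry(line):
    """Parse '<timestamp>|<trap_ID>|<NMS IP>|<S or F>|...' into a dictionary."""
    fields = line.strip().split("|")
    # Two leading fields, then one (IP address, status) pair per NMS.
    if len(fields) < 4 or len(fields) % 2 != 0:
        raise ValueError(f"malformed trap log entry: {line!r}")
    pairs = list(zip(fields[2::2], fields[3::2]))
    if len(pairs) > 5:
        raise ValueError("more than five NMS entries")
    for _, status in pairs:
        if status not in ("S", "F"):
            raise ValueError(f"unknown status {status!r}")
    return {
        "timestamp": datetime.strptime(fields[0], "%Y%m%d%H%M%S"),
        "trap_id": int(fields[1]),
        # True means the agent succeeded in sending, not that the NMS received it.
        "deliveries": [(ip, status == "S") for ip, status in pairs],
    }
```

For the sample entry above, the sketch returns trap ID 3, a timestamp of 09:31:27 on May 17, 2000, and two successful sends.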

Event Descriptions

0001

Explanation

The EMS Ethernet interface has a problem. The ping utility did not receive a response from the interface associated with the EMS.

Recovery

Consult with your network administrator.

Event Details

Table B-8 Event 0001 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - EMS interface failure

Source

Both servers

Frequency

Every 2.5 minutes as long as condition exists

Trap

Trap ID

16

Trap MIB Name

emsInterfaceFailure

0002

Explanation

The EMS, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text, requires a resynchronization with the LSMS that cannot be accomplished by automatic resynchronization between the LSMS and the EMS.

Recovery

Perform one of the synchronization procedures described in the LNP Database Synchronization User's Guide.

Event Details

Table B-9 Event 0002 Details

GUI Notification

Severity

Critical

Text

DB Maintenance Required

Surveillance Notification

Text

Notify:Sys Admin - NE CLLI=<CLLI>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

33

Trap MIB Name

emsRequiresResynchWithLSMS

0003

Explanation

The LSMS has lost association with the primary EMS of the network element, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text; the association with the secondary EMS is established.

Recovery

Determine why the primary association failed (connectivity problem, EMS software problems, NE software problem, etc.). Correct the problem. Association will be automatically retried.

Event Details

Table B-10 Event 0003 Details

GUI Notification

Severity

Major

Text

Primary Association Failed

Surveillance Notification

Text

Notify:Sys Admin - NE CLLI=<CLLI>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

5

Trap MIB Name

primaryEMSAssocLostSecEstablished

0004

Explanation

The LSMS has lost association with the primary EMS of the network element, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text; the association with the secondary EMS is not established.

Recovery

Determine why the primary association failed (connectivity problem, EMS software problems, NE software problem, etc.). Correct the problem, and then reestablish the association with the primary EMS.

Event Details

Table B-11 Event 0004 Details

GUI Notification

Severity

Critical

Text

Primary Association Failed

Surveillance Notification

Text

Notify:Sys Admin - NE CLLI=<CLLI>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

36

Trap MIB Name

primaryEMSAssocLostNoSec

0006

Explanation

The pending queue used to hold transactions to be sent to the EMS/NE, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text, is full. To help ensure that no updates are lost, the eagleagent will abort associations with both the primary EMS and secondary EMS. Updates will be queued in a resynchronization log until the EMS reassociates.

Recovery

Determine why the EMS/NE is not receiving LNP updates, and correct the problem.

Event Details

Table B-12 Event 0006 Details

GUI Notification

Severity

Critical

Text

All Association(s) Aborted: Pending Queue Full

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

97

Trap MIB Name

emsAssociationAbortedQueueFull

0007

Explanation

The network element, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text, is busy and is sending ’retry later’ in response to a message sent by the eagleagent. The eagleagent has already tried resending the same message the maximum number of times. The eagleagent has aborted associations with both the primary EMS and secondary EMS.

Recovery

Correct the problem at the network element. When the EMS reconnects with the LSMS, the LSMS will automatically resynchronize the network element’s LNP database.

Event Details

Table B-13 Event 0007 Details

GUI Notification

Severity

Critical

Text

All Association(s) Aborted: Retries Exhausted

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

98

Trap MIB Name

emsAssocAbortedMaxResend

0008

Explanation

The LSMS has lost association with the secondary EMS which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text. The association with the primary EMS is still up.

Recovery

Determine why the secondary association failed (connectivity problem, EMS software problems, NE software problem, etc.) and then reestablish the association with the secondary EMS.

Event Details

Table B-14 Event 0008 Details

GUI Notification

Severity

Major

Text

Secondary Association Failed

Surveillance Notification

Text

Notify:Sys Admin - NE CLLI=<CLLI>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

130

Trap MIB Name

secondaryEMSAssocLost

0009

Explanation

The LSMS has established the first association with the network element (NE) which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text. The first association established is called the primary association. This EMS is called the primary EMS.

Recovery

No action required; this notification is for information only.

Event Details

Table B-15 Event 0009 Details

GUI Notification

Severity

Cleared

Text

Primary Association Established

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

8

Trap MIB Name

primaryEMSAssocEstablished

0010

Explanation

The LSMS has established the second association with the network element (NE) which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text. The association is established only if a primary association already exists. This EMS is called the secondary EMS.

Recovery

No action required; this notification is for information only.

Event Details

Table B-16 Event 0010 Details

GUI Notification

Severity

Cleared

Text

Secondary Association Established

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

134

Trap MIB Name

secondaryEMSAssocEstablished

0011

Explanation

The primary association for the EMS/NE, which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text, is either down or is inhibited, such that transactions sent to the primary EMS will not be received by the NE. Transactions are being sent to the secondary EMS instead of the primary EMS.

Recovery

Determine why the primary association failed (connectivity problem, EMS software problem, NE software problem, or other problem). Correct the problem. Association will be automatically retried. When the association is reestablished, it will be a secondary association, and the EMS will be the secondary EMS.

Event Details

Table B-17 Event 0011 Details

GUI Notification

Severity

Cleared

Text

Successful Switchover Occurred to Secondary EMS

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

139

Trap MIB Name

transactionToSecondary

2000

Explanation

The NPAC Ethernet interface has a problem. The ping utility did not receive a response from the interface associated with the NPAC.

Recovery

Consult with your network administrator.

Event Details

Table B-18 Event 2000 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - NPAC interface failure

Source

Both primary and secondary servers

Frequency

Every 2.5 minutes as long as condition exists

Trap

Trap ID

15

Trap MIB Name

npacInterfaceFailure

2001

Explanation

The association with the NPAC identified by <NPAC_region_ID> has been disconnected by the user.

Recovery

Examine additional GUI notifications to determine whether the LSMS is retrying the association. Follow the recovery actions described for the GUI notification.

Event Details

Table B-19 Event 2001 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Disconnected

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

37

Trap MIB Name

lostNPACAssoc

2002

Explanation

The LSMS is not able to confirm the physical connectivity with the NPAC, which is specified in the System field on the GUI or is indicated by <NPAC_region_ID> in the Surveillance notification.

Recovery

Check the physical connection between the LSMS and the NPAC. The problem may be in the network, a router, or both.

Event Details

Table B-20 Event 2002 Details

GUI Notification

Severity

Critical

Text

LSMS Physical Disconnect with NPAC

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<NPAC_region_ID>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

45

Trap MIB Name

failedNPACConnectivity

2003

Explanation

The NPAC (PRIMARY or SECONDARY, as indicated) identified by <NPAC_region_ID> rejected the association because it received a message from the LSMS that failed security checks. This can be due to one of the following:

  • The CMIP departure time is more than five minutes out of synchronization with the NPAC servers.

  • The security key is not valid.

  • The CMIP sequence number is out of sequence (messages must be returned to the NPAC in the same order in which they were received).

Recovery

Do the following:

  1. Log in as lsmsadm to the active server.

  2. Enter the following command to determine what the LSMS system time is:

    $ date

  3. Contact the NPAC administrator to determine what the NPAC time is. If the NPAC time is more than five minutes different from the LSMS time, reset the LSMS system time on both servers and on the administration console using one of the procedures described in “Managing the System Clock”.

  4. After you have verified that the NPAC and LSMS times are within five minutes of each other, cause a different security key to be used by stopping and restarting the regional agent. Enter the following commands, where <region> is the name of the region in which this notification occurred:

    $ $LSMS_DIR/lsms stop <region>

    $ $LSMS_DIR/lsms start <region>
  5. Start the GUI again.

  6. Attempt to reassociate with the NPAC. For information about associating with an NPAC, refer to the Configuration Guide.

  7. If the problem persists, contact Oracle Technical Service.
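
The five-minute comparison in the steps above can be sketched as a small shell helper. This is an illustration only, not an LSMS utility: the function name skew_exceeds_limit is hypothetical, and it assumes both times are available as seconds since the epoch (for example, from a date command that supports +%s).

```shell
# Hypothetical helper (not part of the LSMS software): succeeds (exit 0)
# when two clocks differ by more than the five-minute limit named in the
# procedure above.
# Arguments: LSMS time and NPAC time, both in seconds since the epoch.
skew_exceeds_limit() {
    lsms_epoch=$1
    npac_epoch=$2
    limit=300                                  # five minutes, in seconds
    diff=$((lsms_epoch - npac_epoch))
    [ "$diff" -lt 0 ] && diff=$((0 - diff))    # absolute value
    [ "$diff" -gt "$limit" ]                   # exit status is the answer
}
```

For example, skew_exceeds_limit "$(date +%s)" <NPAC_epoch_seconds> && echo "reset the LSMS system time" prints the reminder only when the two clocks are more than 300 seconds apart.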

Event Details

Table B-21 Event 2003 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PEER: Access Control Failure

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

95

Trap MIB Name

npacRejectedAssocAccessCtrlFail
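
Scripts that watch Surveillance output sometimes need the NPAC role and region from text such as the entry in Table B-21. The following is a minimal sketch, not an LSMS utility: the function name npac_from_surveillance is hypothetical, the region name in the usage note is a made-up sample, and it assumes the role and region appear literally after NPAC= as the table shows.

```shell
# Hypothetical helper: prints "<role> <region>" from Surveillance text of
# the form "Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>".
# Prints nothing when the text does not contain an NPAC= field.
npac_from_surveillance() {
    printf '%s\n' "$1" |
        sed -n 's/.*NPAC=\([A-Z][A-Z]*\)-\([^ ]*\).*/\1 \2/p'
}
```

For example, npac_from_surveillance "Notify:Sys Admin - NPAC=PRIMARY-Midwest" prints PRIMARY Midwest.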

2004

Explanation

The primary or secondary NPAC, identified by <NPAC_region_ID>, rejected the association because it received data that was not valid.

Recovery

Contact the NPAC administrator.

Event Details

Table B-22 Event 2004 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PEER: Invalid Data Received

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

96

Trap MIB Name

npacRejectedAssocInvalidData

2005

Explanation

The LSMS has lost association with the primary or secondary NPAC identified by <NPAC_region_ID> because the user aborted the association.

Recovery

Reassociate with the NPAC when the reason for aborting the association no longer exists. For information about associating with an NPAC, refer to the Configuration Guide.

Event Details

Table B-23 Event 2005 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Association Aborted by User

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

9

Trap MIB Name

npacAbortByUser

2006

Explanation

The LSMS did not receive an association response from the NPAC within the timeout period. The LSMS will attempt the association with the NPAC again after an interval that defaults to two minutes, but can be configured to a different value by Oracle.

Recovery

Determine whether there is a network connection problem and/or contact the NPAC administrator to determine whether the NPAC is up and running.

Event Details

Table B-24 Event 2006 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Bind Timed Out - Auto Retry After NPAC_RETRY_INTERVAL

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

As soon as condition occurs, and at two-minute intervals as long as condition exists

Trap

Trap ID

100

Trap MIB Name

assocRespNPACTimeout

2007

Explanation

The NPAC association attempt was rejected by the NPAC, and the LSMS was informed to attempt the NPAC association again to the same NPAC host after an interval that defaults to two minutes, but can be configured to a different value by Oracle.

Recovery

No action required; the LSMS will automatically try to associate again.

Event Details

Table B-25 Event 2007 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PEER - Auto Retry Same Host After NPAC_RETRY_INTERVAL

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

101

Trap MIB Name

assocRejectedRetrySameHost

2008

Explanation

The NPAC association attempt was rejected by the NPAC, and the LSMS was informed to attempt the NPAC association again to the other NPAC host after an interval that defaults to two minutes, but can be configured to a different value by Oracle.

Recovery

No action required; the LSMS will automatically try to associate again.

Event Details

Table B-26 Event 2008 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PEER - Auto Retry Other Host After NPAC_RETRY_INTERVAL

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

102

Trap MIB Name

assocRejectedRetryOtherHost

2009

Explanation

A problem exists in the network connectivity. The LSMS will attempt the association with the NPAC again after an interval that defaults to two minutes, but can be configured to a different value by Oracle.

Recovery

Check the network connectivity for errors. Verify the ability to ping the NPAC from the LSMS.

Event Details

Table B-27 Event 2009 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted by PROVIDER - Auto Retry Same Host After NPAC_RETRY_INTERVAL

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

103

Trap MIB Name

nwtkProblemRetryNPACAssoc

2010

Explanation

The LSMS received three consecutive responses from the NPAC with a download status of failure from a recovery action request. The LSMS has aborted the association and will attempt to associate again after a retry interval that defaults to five minutes, but can be configured to a different value by Oracle. The LSMS will retry the recovery action after the association is reestablished.

Recovery

No action required; the LSMS will automatically try to associate again.

Event Details

Table B-28 Event 2010 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted Due to Recovery Failure - Auto Retry After NPAC_RETRY_INTERVAL

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

104

Trap MIB Name

lsmsAbortedNPACassocDowRecFail

2011

Explanation

The LSMS has disconnected the association with the NPAC region in question due to the lack of a response to heartbeat messages from the LSMS to the NPAC.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-29 Event 2011 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Disconnected by Heartbeat

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

111

Trap MIB Name

lostNPACAssoc

2012

Explanation

The NPAC (primary or secondary, as indicated) identified by <NPAC_region_ID> rejected the association because of an access control failure. This can be due to one of the following:

  • The OSI Presentation Address is incorrect.

  • The Service Provider ID in the regional configuration file is incorrect.

  • The CMIP departure time is more than five minutes out of synchronization with the NPAC servers.

  • The security key is not valid.

Recovery

Do the following:

  1. Verify that the correct PSEL, SSEL, TSEL, and NSAP values have been configured for the OSI Presentation Address (for more information, refer to “Viewing a Configured NPAC Component” in the Configuration Guide). If you need to change the values, use the procedure described in “Modifying an NPAC Component” in the Configuration Guide.

  2. Verify that the configured Service Provider ID (SPID) is the same as the SPID assigned by the NPAC. For more information about this configuration file, refer to “Modifying LSMS Configuration Components” in the Configuration Guide.

  3. Verify that the configured NPAC_SMS_NAME is the same as the value assigned by the NPAC (this field is case-sensitive). For more information about this configuration file, refer to “Modifying an NPAC Component” in the Configuration Guide.

  4. Log in as lsmsadm to the active server.

  5. Enter the following command to determine what the LSMS system time is:

    $ date
  6. Contact the NPAC administrator to determine what the NPAC time is. If the NPAC time is more than five minutes different from the LSMS time, reset the LSMS system time on both servers and on the administration console by performing one of the procedures described in “Managing the System Clock”.

  7. After you have verified that the NPAC and LSMS times are within five minutes of each other, cause a different security key to be used by stopping and restarting the regional agent. Enter the following commands, where <region> is the name of the region in which this notification occurred:

    $ $LSMS_DIR/lsms stop <region>

    $ $LSMS_DIR/lsms start <region>
  8. Start the GUI again.

  9. Attempt to reassociate with the NPAC.

  10. If the problem persists, contact Oracle Technical Service.

Event Details

Table B-30 Event 2012 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Attempt Failed: Access Control Failure

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

106

Trap MIB Name

assocRejDueToAccessControl

2014

Explanation

The userInfo value in the cmipUserInfo portion of the NPAC association response CMIP message is not valid.

Recovery

Contact the NPAC administrator to determine why the NPAC is sending an invalid association response.

Event Details

Table B-31 Event 2014 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Attempt Failed: Invalid Data Received

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

108

Trap MIB Name

npacConnFailedCMIP

2015

Explanation

The NPAC association was terminated gracefully by the NPAC.

Recovery

According to the NANC specifications, this should never occur; if this message is seen, contact the NPAC administrator for the reason for the association unbind.

Event Details

Table B-32 Event 2015 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Disconnected by NPAC

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

109

Trap MIB Name

npacAssocGracefullyTerminated

2018

Explanation

The LSMS was unable to properly resynchronize (with the NPAC) the data that was lost while the LSMS was not associated with the NPAC.

Recovery

Do the following:

  1. Abort the NPAC association (refer to the Configuration Guide).

  2. Attempt to reassociate with the NPAC (refer to the Configuration Guide).

  3. If the reassociation is not successful, contact the NPAC and contact Oracle Technical Service.

Event Details

Table B-33 Event 2018 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Recovery Failed

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

112

Trap MIB Name

lsmsDataLostBadResynch

2019

Explanation

The LSMS data lost during the resynchronization time was not resynchronized properly with the NPAC.

Recovery

Do the following:

  1. Abort the NPAC association (refer to the Configuration Guide).

  2. Reestablish the NPAC association (refer to the Configuration Guide).

  3. Determine whether the notification NPAC <PRIMARY|SECONDARY> Recovery Complete is posted. If notification 2019 reappears instead, perform a resynchronization for a period of time starting one hour before the 2019 notification first appeared, using the GUI (refer to “Resynchronizing for a Defined Period of Time Using the GUI” in the Database Administrator's Guide).

  4. If 2019 continues to appear, contact Oracle Technical Service.

Event Details

Table B-34 Event 2019 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Recovery Partial Failure

Surveillance Notification

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Recovery Failure

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

113

Trap MIB Name

badNPACresynchTime

2020

Explanation

The LSMS aborted the NPAC association because the LSMS received a message from the NPAC that did not have the correct LSMS key signature.

Recovery

Verify that the correct keys are being used by both the NPAC and the LSMS.

Event Details

Table B-35 Event 2020 Details

GUI Notification

Severity

Critical

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Security Violation. Association Aborted. Retrying

Surveillance Notification

Text

Notify:Sys Admin - NPAC=<PRIMARY|SECONDARY>-<NPAC_region_ID>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

114

Trap MIB Name

assocAbortedBadKeys

2021

Explanation

An associate retry timer was in effect. The retry attempt was canceled because a GUI user issued an Associate, Abort or Disconnect request. If an Associate request was issued, the association is attempted immediately.

Recovery

No action required; for information only.

Event Details

Table B-36 Event 2021 Details

GUI Notification

Severity

Major

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Automatic Association Retry Canceled

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

122

Trap MIB Name

npacAutoAssociationRetryCanceled

2022

Explanation

Either the LSMS did not receive any response from the NPAC before a timeout expired or the LSMS received a response from the NPAC with a download status of failure from a recovery action request. The NPAC is unable to process the recovery action due to a temporary resource limitation. The LSMS will retry the request for the number of times indicated by <retry_number> with the interval between each retry indicated by <retry_interval> minutes. If recovery is not successful after the indicated number of retries, the LSMS will abort the association and post the following notification:


[Critical]: <Timestamp> 2010: NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Aborted Due to Recovery Failure - Auto Retry After NPAC_RETRY_INTERVAL

Recovery

No action required; for information only.

Event Details

Table B-37 Event 2022 Details

GUI Notification

Severity

Major

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Fail/No Response from NPAC Recovery - Auto Retry <retry_number> Times in <retry_interval> Minutes

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

123

Trap MIB Name

npacRecoveryFailureResourceLimit

2023

Explanation

The NPAC association will be down for the specified period of time (from the first time field shown in the notification to the second time field shown in the notification) due to NPAC-scheduled down time.

Recovery

When the scheduled down time is over, manually reestablish the NPAC association. For information about aborting and reestablishing an association, refer to the Configuration Guide.

Event Details

Table B-38 Event 2023 Details

GUI Notification

Severity

Major

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] ScheduleDownTime from [<YYYYMMDDhhmmss>] to [<YYYYMMDDhhmmss>]

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

124

Trap MIB Name

npacAssocPeriodDown
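
The two time fields in the event 2023 notification text use the YYYYMMDDhhmmss layout shown in Table B-38. As a rough sketch of how an operator's script might pull them out and make them easier to read, the helpers below use only sed. The function names are hypothetical, the sample region name in the usage note is invented, and the sketch assumes the square brackets around each time field appear literally in the notification text.

```shell
# Hypothetical helper: prints "<start> <end>" (two 14-digit stamps) from a
# ScheduleDownTime notification text; prints nothing if the text differs.
downtime_window() {
    printf '%s\n' "$1" |
        sed -n 's/.*ScheduleDownTime from \[\([0-9]\{14\}\)\] to \[\([0-9]\{14\}\)\].*/\1 \2/p'
}

# Hypothetical helper: reformats one YYYYMMDDhhmmss stamp as
# "YYYY-MM-DD hh:mm:ss"; prints nothing if the input is not 14 digits.
format_downtime_stamp() {
    printf '%s\n' "$1" |
        sed -n 's/^\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1-\2-\3 \4:\5:\6/p'
}
```

For example, downtime_window "NPAC [PRIMARY-Midwest] ScheduleDownTime from [20240315013000] to [20240315053000]" prints 20240315013000 20240315053000, and format_downtime_stamp 20240315013000 prints 2024-03-15 01:30:00.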

2024

Explanation

An Associate request has been sent to the NPAC after a retry timer expired.

Recovery

No action required; for information only.

Event Details

Table B-39 Event 2024 Details

GUI Notification

Severity

Major

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Timer Expired - Resending Association Request

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

125

Trap MIB Name

npacAssocRequestSentAfterRetryTimer

2025

Explanation

The NPAC association was successfully established.

Recovery

No action required; for information only.

Event Details

Table B-40 Event 2025 Details

GUI Notification

Severity

Cleared

Text

NPAC [<PRIMARY|SECONDARY>-<NPAC_region_ID>] Connection Successfully Established

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

7

Trap MIB Name

npacAssocEstablished

4000

Explanation

The active server has initiated an automatic switchover to the inactive server.

Recovery

No action required; for information only.

Event Details

Table B-41 Event 4000 Details

GUI Notification

Severity

Event

Text

Switchover Initiated

Surveillance Notification

Text

Notify:Sys Admin - Switchover initiated

Source

Active server

Frequency

Once, as soon as condition occurs.

Trap

Trap ID

11

Trap MIB Name

switchOverStarted

4001

Explanation

LSMS service has been switched over.

Recovery

No action required; for information only.

Event Details

Table B-42 Event 4001 Details

GUI Notification

Severity

Event

Text

Switchover complete

Surveillance Notification

Text

Notify:Sys Admin - Switchover complete

Source

Active server

Frequency

Once, as soon as condition occurs.

Trap

Trap ID

12

Trap MIB Name

switchOverCompleted

4002

Explanation

LSMS service could not be switched over to the inactive server; the inactive server was not able to start LSMS service.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-43 Event 4002 Details

GUI Notification

Severity

Event

Text

Switchover Failed

Surveillance Notification

Text

Notify:Sys Admin - Switchover Failed

Source

Active server

Frequency

Once, as soon as condition occurs.

Trap

Trap ID

13

Trap MIB Name

switchOverFailed

4003

Explanation

This notification indicates that the disk controller <controllerId> is out of service and is affecting shared storage. This notification is only valid on E3000 systems.

<controllerId> = the specific controller number (either 0 or 1).

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-44 Event 4003 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Loss of disk on <controllerId>

Source

Either server

Frequency

Every 5 minutes as long as condition exists

Trap

Trap ID

14

Trap MIB Name

diskContrService

4004

Explanation

The Ethernet interface used to connect to the application network has a problem. This interface usually connects to network-connected workstations. The ping utility did not receive a response from the interface associated with the application network.

Recovery

Consult with your network administrator.

Event Details

Table B-45 Event 4004 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - APP interface failure

Source

Either server

Frequency

Every 2.5 minutes as long as condition exists

Trap

Trap ID

17

Trap MIB Name

appsInterfaceFailure

4005

Explanation

This notification indicates that the Ethernet interface used to connect to the ADMINISTRATION network has a problem.

Recovery

Consult with your network administrator.

Event Details

Table B-46 Event 4005 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - ADMIN interface failure

Source

Either server

Frequency

Every 2.5 minutes as long as condition exists

Trap

Trap ID

18

Trap MIB Name

adminInterfaceFailure

4006

Explanation

This notification indicates that the system disk has lost synchronization, possibly due to a hardware problem.

driveSpecId= disk drive specification.

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-47 Event 4006 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - <driveSpecId>

Source

Either server

Frequency

Every 5 minutes as long as condition exists

Trap

Trap ID

20

Trap MIB Name

systemDiskSynch

4007

Explanation

Database replication has failed.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-48 Event 4007 Details

GUI Notification

Severity

Critical

Text

DB Repl Err - <dbReplErr>

Surveillance Notification

Text

Notify:Sys Admin - DB repl error

Source

Both servers

Frequency

Every minute as long as condition exists.

Trap

Trap ID

21

Trap MIB Name

dataReplError

4008

Explanation

The database replication process monitor has failed.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-49 Event 4008 Details

GUI Notification

Severity

Critical

Text

DB Proc Mon Err - <dbMonErr>

Surveillance Notification

Text

Notify:Sys Admin - DB monitor failure

Source

Active server

Frequency

Every five minutes as long as condition exists.

Trap

Trap ID

22

Trap MIB Name

dbMonitorFail

4009

Explanation

The server has an internal disk error.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-50 Event 4009 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Internal Disk Error

Source

Either server

Frequency

Within five minutes of the condition occurring and at five-minute intervals as long as condition exists

Trap

Trap ID

23

Trap MIB Name

internalDiskError

4010

Explanation

This notification indicates that the hot-spare feature has completed automatic data resynchronization.

Recovery

No action required; this notification is for information only.

Event Details

Table B-51 Event 4010 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - <driveSpecId>-<driveSpecId>

Source

Either server

Frequency

Once

Trap

Trap ID

24

Trap MIB Name

hotSparedDataResynch

4011

Explanation

This notification indicates that LSMS database replication is delayed.

Recovery

No action required.

Event Details

Table B-52 Event 4011 Details

GUI Notification

Severity

N/A

Text

DB Repl Info

Surveillance Notification

Text

Notify:Sys Admin - DB repl info

Source

Either server

Frequency

Within five minutes of the condition occurring and every minute thereafter as long as condition exists.

Trap

Trap ID

25

Trap MIB Name

dataReplInfo

4012

Explanation

The process specified by <process_name> is using 40 percent or more of the LSMS’s CPU resource. The <second_ID> identifies the specific instance of the process, as follows:

  • When the <process_name> is eagleagent, the <second_ID> specifies the Common Language Location Indicator (CLLI) of the network element

  • When the <process_name> is npacagent, the <second_ID> specifies the name of the region

  • When the <process_name> is not eagleagent or npacagent, the <second_ID> specifies the process ID (PID) of the process.

Recovery

Because this notification is posted every five minutes as long as the condition exists, you may choose to ignore this notification the first time that it appears. However, if this notification is repeated several times in a row, do the following:

  1. If the <process_name> is not npacagent, go to step 4. Otherwise, determine whether the npacagent is still using 40% or more of the CPU resource by entering the following command, where <region> can be optionally specified (it is the name of the region as displayed at the end of the notification text):

    $ ps -eo pid,pcpu,args | grep npacagent | grep <region>
  2. If the npacagent is still using 40% or more of the CPU resource, enter the following commands to stop the npacagent and restart it, where <region> is the name of the NPAC region whose npacagent is using 40% or more of the CPU resource:

    $ cd $LSMS_DIR

    $ lsms stop <region>

    $ lsms start <region>

  3. Repeat step 1. If the npacagent you tried to stop is still using 40% or more of the CPU resource, contact Oracle Technical Service.

  4. If the <process_name> is not eagleagent, go to step 7. Otherwise, determine whether the eagleagent is still using 40% or more of the CPU resource by entering the following command, where <CLLI> can be optionally specified (it is the name of the network element as displayed at the end of the notification text):

    $ ps -eo pid,pcpu,args | grep eagleagent | grep <CLLI>
  5. If the eagleagent is still using 40% or more of the CPU resource, enter the following commands to stop the eagleagent and restart it, where <CLLI> is the Common Language Location Indicator (CLLI) of the network element whose eagleagent is using 40% or more of the CPU resource:

    $ cd $LSMS_DIR

    $ eagle stop <CLLI>

    $ eagle start <CLLI>

  6. Repeat step 4. If the eagleagent you tried to stop is still using 40% or more of the CPU resource, contact Oracle Technical Service.

  7. If the <process_name> is not eagleagent or npacagent, contact Oracle Technical Service.
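
The ps checks in the procedure above list every matching process, so you still have to read the CPU column yourself. As a sketch only, the filter below keeps just the lines at or above the 40 percent threshold. The function name high_cpu_lines is hypothetical and is not an LSMS command; it assumes input in the pid, pcpu, args column order produced by the ps command shown above.

```shell
# Hypothetical filter: reads "pid pcpu args" lines on standard input and
# prints those whose text contains the given process name and whose CPU
# percentage (second column) is 40 or more. The ps header line is skipped
# automatically because "%CPU" evaluates to 0 as a number.
high_cpu_lines() {
    awk -v name="$1" '$2 + 0 >= 40 && index($0, name) > 0'
}
```

For example, ps -eo pid,pcpu,args | high_cpu_lines npacagent prints nothing when no npacagent instance is at or above the threshold.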

Event Details

Table B-53 Event 4012 Details

GUI Notification

Severity

Major

Text

Process [<process_name>-<second_ID>] Utilizing High Percentage of CPU

Surveillance Notification

Text

Notify:Sys Admin - [<process_name>-<second_ID>]

Source

Either server

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

26

Trap MIB Name

cpuUtilitzationOver39

4013

Explanation

The LSMS server with default hostname lsmspri has been inhibited.

Recovery

As soon as possible, start the server by performing the procedure described in “Starting a Server”.

Event Details

Table B-54 Event 4013 Details

GUI Notification

Severity

Major

Text

Primary Server Inhibited

Surveillance Notification

Text

Notify:Sys Admin - Primary inhibited

Source

Server with default hostname lsmspri

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

27

Trap MIB Name

primaryServerInhibited

4014

Explanation

The LSMS server with default hostname lsmssec has been inhibited.

Recovery

As soon as possible, start the server by performing the procedure described in “Starting a Server”.

Event Details

Table B-55 Event 4014 Details

GUI Notification

Severity

Major

Text

Secondary Server Inhibited

Surveillance Notification

Text

Notify:Sys Admin - Secondary inhibited

Source

Server with default hostname lsmssec

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

28

Trap MIB Name

secondaryServerInhibited

4015

Explanation

A heartbeat link is down.

Recovery

Contact Oracle Technical Service.

Event Details

Table B-56 Event 4015 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Heartbeat failure

Source

Both servers

Frequency

Once, as soon as condition occurs

Trap

Trap ID

29

Trap MIB Name

heartbeatLinkDown

4016

Explanation

This notification indicates that the Heartbeat 2 link is down.

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-57 Event 4016 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Heartbeat 2 failure

Source

Both servers

Frequency

Once

Trap

Trap ID

30

Trap MIB Name

heartbeatLinkTwoDown

4017

Explanation

This notification indicates that the LSMS network configuration is incorrect.

Recovery

Customer or field engineers should:

Event Details

Table B-58 Event 4017 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Network setup error

Source

Active server

Frequency

Every 5 minutes

Trap

Trap ID

31

Trap MIB Name

lsmsNtwkConfigError

4018

Explanation

This notification indicates that the LSMS network configuration is not supported or recommended.

Recovery

Contact Oracle Technical Service for assistance.

Event Details

Table B-59 Event 4018 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Network setup unsupp

Source

Active server

Frequency

Every 5 minutes

Trap

Trap ID

32

Trap MIB Name

lsmsNtwkConfigNotSupported

4019

Explanation

This notification indicates that the disk volume specified by diskVolName has exceeded the 95 percent usage threshold.

Recovery

Contact Oracle Technical Service for assistance.
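
Before making that call, it can help to confirm by hand which volume is over the threshold. The following is a minimal sketch of one way to do so, not a description of how the LSMS itself measures usage: the function name volumes_over_threshold is hypothetical, and it assumes the POSIX df -P column layout (capacity in the fifth column, mount point in the sixth) and mount points without spaces.

```shell
# Hypothetical filter: reads "df -P" output on standard input and prints
# the mount point and capacity of each volume whose usage exceeds the
# limit (default 95 percent). The first line is treated as the header.
volumes_over_threshold() {
    awk -v limit="${1:-95}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)               # "96%" becomes "96"
            if (pct + 0 > limit + 0) print $6, $5
        }'
}
```

For example, df -P | volumes_over_threshold lists only volumes above 95 percent, and df -P | volumes_over_threshold 80 applies a lower limit.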

Event Details

Table B-60 Event 4019 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - <diskVolName>

Source

Either server

Frequency

Every 5 minutes

Trap

Trap ID

38

Trap MIB Name

diskVolume95Usage

4020

Explanation

The server’s swap space has exceeded the critical usage threshold (default = 95%).

Recovery

If the problem persists, contact Oracle Technical Service.

Event Details

Table B-61 Event 4020 Details

GUI Notification

Severity

Critical

Text

Swap space exceeds Critical

Surveillance Notification

Text

Notify:Sys Admin - Swap space Critical

Source

Either server

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

39

Trap MIB Name

swapSpaceCritical

4021

Explanation

The LSMS application or system daemon whose name has <process_name> as the first 12 characters is not running.

Recovery

No user action is necessary. The Surveillance process automatically restarts the Service Assurance process (sacw) and the sentryd process automatically restarts other processes.

Event Details

Table B-62 Event 4021 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - <process_name> failed

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

40

Trap MIB Name

lsmsAppsNotRunning

4022

Explanation

The backup of the LSMS database has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-63 Event 4022 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

DATABASE backup complete

Source

Standby server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

41

Trap MIB Name

backupCompleted

4023

Explanation

The backup of the LSMS database has failed.

Recovery

Review backup output to determine why backup failed, correct the problems, and run backup script again manually.

Note:

Determine whether the NAS can be reached using the ping command. If the NAS cannot be reached, restart the NAS. To restart the NAS, turn the power off, then turn the power on. If the NAS can be reached, contact Oracle Technical Service for assistance.

Event Details

Table B-64 Event 4023 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - DATABASE backup failed

Source

Standby server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

42

Trap MIB Name

backupFailed

4024

Explanation

The primary LSMS server (Server 1A) is not providing the LSMS service.

Recovery

No action required; for information only.

Event Details

Table B-65 Event 4024 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Primary not online

Source

Both primary and secondary servers

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

63

Trap MIB Name

primaryServerNotOnline

4025

Explanation

The standby server is not prepared to take over LSMS service.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-66 Event 4025 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Can't switch to standby

Source

Standby server

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

64

Trap MIB Name

standbyNotReadyForSwitchover

4026

Explanation

The secondary LSMS server (Server 1B) is currently providing the LSMS service.

Recovery

No action required; for information only.

Event Details

Table B-67 Event 4026 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Secondary online

Source

Both primary and secondary servers

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

65

Trap MIB Name

secServerProvidingLSMSService

4027

Explanation

The standby LSMS server cannot determine the availability of the LSMS service on the active server.

Recovery

Determine whether the other server is working normally. Also, verify that the heartbeat connections (eth2, eth3, and the serial cable) are connected and functioning properly.

Event Details

Table B-68 Event 4027 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - Primary status unknown

Source

Standby server

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

66

Trap MIB Name

secServerCannotDeterminePrimAvailability

4028

Explanation

This notification indicates an LSMS mirroring inconsistency.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-69 Event 4028 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - <volume_name>

Source

Either server

Frequency

Every 5 minutes

Trap

Trap ID

169

Trap MIB Name

lsmsMirroringInconsistance

4029

Explanation

This notification indicates that the LSMS filesystem is not writeable.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-70 Event 4029 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - <fileSystem>

Source

Either server

Frequency

Every 5 minutes

Trap

Trap ID

170

Trap MIB Name

lsmsFilesystemNotWritable

4030

Explanation

The server’s swap space has exceeded the major usage threshold (default = 80%).
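As a rough illustration of the threshold comparison, the following Python sketch computes swap usage as a percentage and compares it with the 80% default. It is not the LSMS implementation: the function names are hypothetical, and treating "exceeded" as strictly greater than the threshold is an assumption.

```python
MAJOR_THRESHOLD_PERCENT = 80  # default major threshold stated for Event 4030


def swap_usage_percent(used_kb, total_kb):
    """Return swap usage as a percentage of total swap space."""
    if total_kb <= 0:
        raise ValueError("total swap space must be positive")
    return 100.0 * used_kb / total_kb


def exceeds_major_threshold(used_kb, total_kb, threshold=MAJOR_THRESHOLD_PERCENT):
    """Return True when usage is strictly above the threshold (assumed)."""
    return swap_usage_percent(used_kb, total_kb) > threshold
```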

Recovery

If the problem persists, contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-71 Event 4030 Details

GUI Notification

Severity

Major

Text

Swap Space Warning

Surveillance Notification

Text

Notify:Sys Admin - Swap space warning

Source

Both servers

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

190

Trap MIB Name

swapSpaceWarning

4031

Explanation

A database replication error that was reported earlier by the 4007 event has now been cleared.

Recovery

No action necessary.

Event Details

Table B-72 Event 4031 Details

GUI Notification

Severity

Cleared

Text

Database Replication cleared - <dbReplErr>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

195

Trap MIB Name

dataReplClear

4032

Explanation

A database process monitor error that was reported earlier by the 4008 event has now been cleared.

Recovery

No action necessary.

Event Details

Table B-73 Event 4032 Details

GUI Notification

Severity

Cleared

Text

Database Replication cleared - <dbMonErr>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

196

Trap MIB Name

dbMonitorCLear

4033

Explanation

The LSMS database failed a count operation, which suggests a corrupt MySQL index.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-74 Event 4033 Details

GUI Notification

Severity

Critical

Text

Database Corrupt Index

Surveillance Notification

Text

None

Source

Both servers

Frequency

Every 30 minutes

Trap

Trap ID

200

Trap MIB Name

dbCorruptIndex

4034

Explanation

This notification indicates that an invalid snapshot has been detected.

Recovery

See “Clean Up After Failed or Interrupted Snapshot”.

Event Details

Table B-75 Event 4034 Details

GUI Notification

Severity

Critical

Text

Invalid Snapshot - <snapName>

Surveillance Notification

Text

Notify:Sys Admin - Invalid Snapshot

Source

Active server

Frequency

Every 30 minutes

Trap

Trap ID

201

Trap MIB Name

snapInvalidErr

4035

Explanation

This notification indicates that the Invalid Snapshot error has been cleared.

Recovery

No action required; this notification is for information only.

Event Details

Table B-76 Event 4035 Details

GUI Notification

Severity

Cleared

Text

Invalid Snapshot cleared - <snapName>

Surveillance Notification

Text

Invalid Snapshot cleared - <snapName>

Source

Active server

Frequency

Every 30 minutes

Trap

Trap ID

202

Trap MIB Name

snapInvalidClear

4036

Explanation

This notification indicates that the snapshot is more than 80% full.

Recovery

No action required; this notification is for information only.

Event Details

Table B-77 Event 4036 Details

GUI Notification

Severity

Critical

Text

Full Snapshot - <snapName>

Surveillance Notification

Text

Notify:Sys Admin - Full Snapshot

Source

Active server

Frequency

Every 30 minutes

Trap

Trap ID

203

Trap MIB Name

fullSnapshot

4037

Explanation

This notification indicates that the Snapshot full error is cleared.

Recovery

No action required; this notification is for information only.

Event Details

Table B-78 Event 4037 Details

GUI Notification

Severity

Cleared

Text

Full Snapshot cleared - <snapName>

Surveillance Notification

Text

Full Snapshot cleared - <snapName>

Source

Active server

Frequency

Every 30 minutes

Trap

Trap ID

204

Trap MIB Name

fullSnapshotClear

4038

Explanation

The mate server is down.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-79 Event 4038 Details

GUI Notification

Severity

Critical

Text

Mate Server Down

Surveillance Notification

Text

Notify:Sys Admin - Mate Server Down

Source

Both servers

Frequency

Every minute as long as condition exists

Trap

Trap ID

205

Trap MIB Name

mateServerDown

4039

Explanation

The mate server is up.

Recovery

No action is required.

Event Details

Table B-80 Event 4039 Details

GUI Notification

Severity

Cleared

Text

Mate Server Up

Surveillance Notification

Text

Notify:Sys Admin - Mate Server Up

Source

Both servers

Frequency

As soon as condition clears

Trap

Trap ID

206

Trap MIB Name

mateServerUp

4100

Explanation

One or more platform alarms in the minor category exist. To determine which minor platform alarms are being reported, see “How to Decode Platform Alarms”. When the active server reports minor platform alarms that originated on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Note:

If you received Event 4100 in response to an snmpget error, contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 to have the NAS SNMP daemon stopped and restarted.

Event Details

Table B-81 Event 4100 Details

GUI Notification

Severity

Minor

Text

Minor Platform Alarm [hostname]: <alarm_string>

Surveillance Notification

Text

Notify:Sys Admin - ALM <alarm_string>

Source

Both servers

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

191

Trap MIB Name

minorPlatAlarmMask

4101

Explanation

All platform alarms in the minor category have been cleared. When the active server reports that all minor platform alarms have cleared on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

No action necessary.

Event Details

Table B-82 Event 4101 Details

GUI Notification

Severity

Cleared

Text

Minor Platform Alarms Cleared

Surveillance Notification

Text

Notify:Sys Admin - Minor Plat alrms clear

Source

Both servers

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

197

Trap MIB Name

minorPlatAlarmClear

4200

Explanation

One or more platform alarms in the major category exist. To determine which major platform alarms are being reported, see “How to Decode Platform Alarms”. When the active server reports major platform alarms that originated on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-83 Event 4200 Details

GUI Notification

Severity

Major

Text

Major Platform Alarm [hostname]: <alarm_string>

Surveillance Notification

Text

Notify:Sys Admin - ALM <alarm_string>

Source

Both servers

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

192

Trap MIB Name

majorPlatAlarmMask

4201

Explanation

All platform alarms in the major category have been cleared. When the active server reports that all major platform alarms have cleared on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

No action necessary.

Event Details

Table B-84 Event 4201 Details

GUI Notification

Severity

Cleared

Text

Major Platform Alarms Cleared

Surveillance Notification

Text

Notify:Sys Admin - Major Plat alrms clear

Source

Both servers

Frequency

Once

Trap

Trap ID

198

Trap MIB Name

majorPlatAlarmClear

4300

Explanation

One or more platform alarms in the critical category exist. To determine which critical platform alarms are being reported, see “How to Decode Platform Alarms”. When the active server reports critical platform alarms that originated on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-85 Event 4300 Details

GUI Notification

Severity

Critical

Text

Critical Platform Alarm [hostname]: <alarm_string>

Surveillance Notification

Text

Notify:Sys Admin - ALM <alarm_string>

Source

Both servers

Frequency

Once

Trap

Trap ID

193

Trap MIB Name

criticalPlatAlarmMask

4301

Explanation

All platform alarms in the critical category have been cleared. When the active server reports that all critical platform alarms have cleared on the other server, the hostname of the other server is inserted before the alarm string.

Recovery

No action necessary.

Event Details

Table B-86 Event 4301 Details

GUI Notification

Severity

Cleared

Text

Critical Platform Alarms Cleared

Surveillance Notification

Text

Notify:Sys Admin - Crit Plat alrms clear

Source

Both servers

Frequency

Once

Trap

Trap ID

199

Trap MIB Name

criticalPlatAlarmClear
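For a Network Management System that receives these traps, the platform alarm entries in Tables B-81 through B-86 can be collected into a lookup table. The Python sketch below is one possible arrangement, offered as an illustration only. The trap IDs, event numbers, MIB names, and severities are copied from those tables; the names PLATFORM_ALARM_TRAPS and describe_trap are hypothetical.

```python
# Trap ID -> (event number, MIB name, GUI severity), copied from
# Tables B-81 through B-86.
PLATFORM_ALARM_TRAPS = {
    191: (4100, "minorPlatAlarmMask", "Minor"),
    197: (4101, "minorPlatAlarmClear", "Cleared"),
    192: (4200, "majorPlatAlarmMask", "Major"),
    198: (4201, "majorPlatAlarmClear", "Cleared"),
    193: (4300, "criticalPlatAlarmMask", "Critical"),
    199: (4301, "criticalPlatAlarmClear", "Cleared"),
}


def describe_trap(trap_id):
    """Return a one-line description, or None for an unlisted trap ID."""
    entry = PLATFORM_ALARM_TRAPS.get(trap_id)
    if entry is None:
        return None
    event, mib_name, severity = entry
    return f"Event {event} ({mib_name}), severity {severity}"
```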

6000

Explanation

The eagleagent process has been started.

Recovery

No action required; for information only.

Event Details

Table B-87 Event 6000 Details

GUI Notification

Severity

Cleared

Text

Eagleagent <CLLI> Has Been Started

Surveillance Notification

Text

Notify:Sys Admin - <CLLI> started

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

1

Trap MIB Name

eagleAgentStarted

6001

Explanation

The eagleagent process has been stopped by the eagle script.

Recovery

No action required; for information only.

Event Details

Table B-88 Event 6001 Details

GUI Notification

Severity

Critical

Text

Eagleagent <CLLI> Has Been Stopped by User

Surveillance Notification

Text

Notify:Sys Admin - <CLLI> norm exit

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

2

Trap MIB Name

eagleAgentStoppedbyscript

6002

Explanation

The npacagent for the region indicated by <NPAC_region_ID> has been started.

Recovery

No action required; for information only.

Event Details

Table B-89 Event 6002 Details

GUI Notification

Severity

Cleared

Text

NPACagent Has Been Started

Surveillance Notification

Text

Notify:Sys Admin - <NPAC_region_ID> NPACagent started

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

3

Trap MIB Name

NPACAgentStarted

6003

Explanation

The npacagent for the region indicated by <region> has been stopped using the lsms command.

Recovery

No action required; for information only. To restart the agent, do the following:

  1. Log in to the active server as lsmsadm.

  2. Enter the following commands to start the npacagent where <region> is the name of the NPAC region:

    $ cd $LSMS_DIR

    $ lsms start <region>

Event Details

Table B-90 Event 6003 Details

GUI Notification

Severity

Critical

Text

NPACAgent Has Been Stopped by User

Surveillance Notification

Text

Notify:Sys Admin - <NPAC_region_ID> norm exit

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

4

Trap MIB Name

lsmsCommandStoppedNPACAgent

6004

Explanation

The eagleagent process for the network element identified by <CLLI> has failed. The sentryd process will attempt to restart it.

Recovery

No action required; the sentryd process will attempt to restart the eagleagent process.

Event Details

Table B-91 Event 6004 Details

GUI Notification

Severity

Critical

Text

Eagleagent [<CLLI>] Has Failed

Surveillance Notification

Text

Notify:Sys Admin - FAILD: <CLLI>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

74

Trap MIB Name

lsmsEagleAgentFailed

6005

Explanation

The eagleagent process for the network element identified by <CLLI> has been successfully restarted by the sentryd process.

Recovery

No action required.

Event Details

Table B-92 Event 6005 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - RECOV: <CLLI>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

75

Trap MIB Name

lsmsEagleAgentRestarted

6006

Explanation

The sentryd process was unable to restart the eagleagent process for the network element identified by <CLLI>.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-93 Event 6006 Details

GUI Notification

Severity

Critical

Text

Failure Restarting Eagleagent [<CLLI>]

Surveillance Notification

Text

Notify:Sys Admin - RFAILD: <CLLI>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

76

Trap MIB Name

failureToRestartEagleAgent
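The Surveillance texts for Events 6004, 6005, and 6006 differ only in the status keyword (FAILD, RECOV, or RFAILD) that precedes the <CLLI>. The Python sketch below shows one way a log reader might interpret those keywords. It is an illustration under stated assumptions, not an LSMS utility: the function name, the pattern, and the sample CLLI value STPA01 are hypothetical, and the meanings are paraphrased from the explanations of these three events.

```python
import re

# Meanings paraphrased from the explanations of Events 6004, 6005, and 6006.
STATUS_MEANING = {
    "FAILD": "process failed; sentryd will attempt a restart",
    "RECOV": "process was restarted successfully",
    "RFAILD": "sentryd was unable to restart the process",
}

PATTERN = re.compile(r"^Notify:Sys Admin - (FAILD|RECOV|RFAILD): (.+)$")


def parse_restart_notification(text):
    """Return (status, subject, meaning), or None if the text does not match."""
    match = PATTERN.match(text)
    if match is None:
        return None
    status, subject = match.groups()
    return status, subject, STATUS_MEANING[status]


print(parse_restart_notification("Notify:Sys Admin - RECOV: STPA01"))
```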

6008

Explanation

The npacagent process for the region specified by <NPAC_region_ID> has failed. The sentryd process will attempt to restart it.

Recovery

No action required; the sentryd process will attempt to restart the npacagent process.

Event Details

Table B-94 Event 6008 Details

GUI Notification

Severity

Critical

Text

NPACagent [<NPAC_region_ID>] Failure

Surveillance Notification

Text

Notify:Sys Admin - FAILD: <NPAC_region_ID> agent

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

78

Trap MIB Name

NPACagentForRegionFailure

6009

Explanation

The npacagent process for the region specified by <NPAC_region_ID> has been successfully restarted by the sentryd process.

Recovery

No action required. Any active LSMS GUI processes will automatically reconnect.

Event Details

Table B-95 Event 6009 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - RECOV: <NPAC_region_ID> agent

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

79

Trap MIB Name

NPACagentForRegionRestarted

6010

Explanation

The sentryd process was unable to restart the npacagent process for the region specified by <NPAC_region_ID>.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-96 Event 6010 Details

GUI Notification

Severity

Critical

Text

Failure Restarting NPACagent [<NPAC_region_ID>]

Surveillance Notification

Text

Notify:Sys Admin - RFAILD: <NPAC_region_ID> agent

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

80

Trap MIB Name

failureToRestartNPACagentRegion

6020

Explanation

The npacagent process has been stopped due to a fault in accessing the regional database.

Recovery

A database error has occurred. Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-97 Event 6020 Details

GUI Notification

Severity

Critical

Text

NPACagent Has Been Shut Down - Database Access Error

Surveillance Notification

Text

Notify:Sys Admin - <NPAC_region_ID> DB error

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

189

Trap MIB Name

NPACagentStopRegDBaccessFault

8000

Explanation

The LSMS Surveillance feature is in operation.

Recovery

No action required; for information only.

Event Details

Table B-98 Event 8000 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Keep alive

Source

Both primary and secondary servers

Frequency

Every five minutes as long as condition exists

Trap

Trap ID

19

Trap MIB Name

survFeatureOn

8001

Explanation

The network element resynchronization database contains more than 1 million entries.

Recovery

Each day, as part of a cron job, the LSMS trims the resynchronization database so that it contains 768,000 entries. The occurrence of this event means that more than 232,000 transactions have been received since the last cron job. If this event occurs early in the day, contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
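The 232,000 figure follows from the two sizes given above, as this short Python calculation shows. The constant and function names are hypothetical and are used only for illustration.

```python
TRIGGER_ENTRIES = 1_000_000   # Event 8001 is raised above this count
TRIMMED_ENTRIES = 768_000     # size after the daily cron job trims the database

# Transactions that must arrive after a trim before the event is raised.
headroom = TRIGGER_ENTRIES - TRIMMED_ENTRIES
print(headroom)  # 232000


def event_8001_expected(entries):
    """Return True when the entry count is above the 1 million trigger."""
    return entries > TRIGGER_ENTRIES
```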

Event Details

Table B-99 Event 8001 Details

GUI Notification

Severity

Major

Text

ResyncDB Contains 1 Mil Entries

Surveillance Notification

Text

Notify:Sys Admin - ResyncDB 1 Mil

Source

Active server

Frequency

Once

Trap

Trap ID

34

Trap MIB Name

resynchLogMidFull

8003

Explanation

The pending queue, used to hold the transactions to send to the network element (which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text), is over half full.

Recovery

No action required; for information only.

Event Details

Table B-100 Event 8003 Details

GUI Notification

Severity

Major

Text

EMS Pending Queue Is Half full

Surveillance Notification

Text

Notify:Sys Admin - CLLI=<CLLI>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

43

Trap MIB Name

ensPendingQueueHalfFull

8004

Explanation

The pending queue, used to hold the transactions to send to the network element (which is indicated in the System field on the GUI or whose CLLI has the value that replaces <CLLI> in the Surveillance notification text), is completely full. The association to that EMS will be broken.
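The relationship between Events 8003 and 8004 can be sketched as a comparison of queue depth with queue capacity. The Python function below is a hypothetical illustration: the appendix does not state the queue capacity, so capacity is a parameter, and the function name is invented.

```python
def pending_queue_event(depth, capacity):
    """Return the event number suggested by the queue fill level, or None."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if depth >= capacity:
        return 8004  # queue completely full
    if depth > capacity / 2:
        return 8003  # queue over half full
    return None
```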

Recovery

No manual recovery required. The LSMS will automatically re-establish the association to the EMS and synchronization will take place.

Event Details

Table B-101 Event 8004 Details

GUI Notification

Severity

Critical

Text

EMS Pending Queue Is Full

Surveillance Notification

Text

Notify:Sys Admin - CLLI=<CLLI>

Source

Active server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

Trap

Trap ID

44

Trap MIB Name

emsPendingQueueMaxReached

8005

Explanation

There was a data error in a record that prevented the LSMS eagleagent from sending the record to the network element.

Recovery

Both the error and the ignored record are written to the file /var/TKLC/lsms/logs/trace/LsmsTrace.log.<mmdd>, where <mmdd> indicates the month and day the error occurred. Examine the log file for the month and day this error was reported to determine what the error was. Enter the data manually or send it again.
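To locate the right file, the <mmdd> suffix can be derived from the date on which the error was reported. The Python sketch below builds the file name; it assumes that the month and day are each written as two digits with a leading zero, which the text above does not state explicitly. The function name is hypothetical.

```python
import datetime

TRACE_LOG_PREFIX = "/var/TKLC/lsms/logs/trace/LsmsTrace.log."


def trace_log_path(day):
    """Return the trace log path for the given date (suffix assumed to be mmdd)."""
    return TRACE_LOG_PREFIX + day.strftime("%m%d")


print(trace_log_path(datetime.date(2024, 3, 7)))
```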

Event Details

Table B-102 Event 8005 Details

GUI Notification

Severity

Minor

Text

Eagleagent <CLLI> Ignoring Record: <DataError>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

46

Trap MIB Name

eagleAgentIgnoredRecord

8024

Explanation

The Service Assurance agent has started successfully.

Recovery

No action required; for information only.

Event Details

Table B-103 Event 8024 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

67

Trap MIB Name

serviceAssuranceAgentStarted

8025

Explanation

Association with the Service Assurance Manager, identified by <Service_Assurance_Manager_Name>, has been established successfully.

Recovery

No action required; for information only.

Event Details

Table B-104 Event 8025 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - <Service_Assurance_Manager_Name>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

68

Trap MIB Name

establishServAssuranceMgrAssoc

8026

Explanation

Association with the Service Assurance Manager, identified by <Service_Assurance_Manager_Name>, has been stopped or disconnected.

Recovery

Contact the Service Assurance system administrator to determine the cause of disconnection, then have the Service Assurance system administrator reassociate the Service Assurance Manager to the Service Assurance Agent.

Event Details

Table B-105 Event 8026 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - <Service_Assurance_Manager_Name>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

69

Trap MIB Name

servAssuranceMgrAssocBroken

8027

Explanation

The Service Assurance agent is not currently running.

Recovery

No action required; the Service Assurance agent should be restarted automatically.

Event Details

Table B-106 Event 8027 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

70

Trap MIB Name

servAssuranceAgentNotRunning

8030

Explanation

This notification indicates that the LSMS is not able to confirm physical connectivity with the DCM.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-107 Event 8030 Details

GUI Notification

Severity

Critical

Text

EBDA Physical Connection Lost

Surveillance Notification

Text

Notify:Sys Admin - NE=<NE CLLI> EBDA conn lost

Source

Active server

Frequency

Every 5 minutes

Trap

Trap ID

73

Trap MIB Name

noPhysicalConnectivityToDCM

8037

Explanation

The OSI process has failed. The sentryd process will attempt to restart it.

Recovery

No action required; the sentryd process will attempt to restart the failed process.

Event Details

Table B-108 Event 8037 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - FAILD: OSI

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

88

Trap MIB Name

osiDaemonFailure

8038

Explanation

The OSI process has been successfully restarted by the sentryd process.

Recovery

No action required. The sentryd process will attempt to restart the npacagent processes for all active regions. Any active LSMS GUI processes will automatically reconnect.

Event Details

Table B-109 Event 8038 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - RECOV: OSI

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

89

Trap MIB Name

osiDaemonRestarted

8039

Explanation

The sentryd process was not able to restart the OSI process.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-110 Event 8039 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - RFAILD: OSI

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

90

Trap MIB Name

osiDaemonRestartFailure

8040

Explanation

The Surveillance feature has detected that the sentryd process is no longer running.

Recovery

No action required; the LSMS HA software will attempt to restart the sentryd process.

Event Details

Table B-111 Event 8040 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - FAILD: sentryd

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

91

Trap MIB Name

sentrydFailure

8041

Explanation

This notification indicates that the surveillance process has detected that the Legacy lddAgent process has restarted and all functionality has resumed.

Recovery

No action required; this notification is for information only.

Event Details

Table B-112 Event 8041 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - RECOV: lddAgent legacy

Source

Both servers

Frequency

Once, as soon as the condition occurs

Trap

Trap ID

92

Trap MIB Name

IddAgentRestarted

8042

Explanation

This notification indicates that the surveillance process has detected that the SCPMS lddAgent process has restarted and all functionality has resumed.

Recovery

No action required; this notification is for information only.

Event Details

Table B-113 Event 8042 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

Notify:Sys Admin - RECOV: lddAgent scpms

Source

Both servers

Frequency

Once, as soon as the condition occurs

Trap

Trap ID

93

Trap MIB Name

scpmsIddAgentRestarted

8044

Explanation

This notification indicates that the LDD SCPMS Confirmation of Arrival message retry attempts have been exhausted. The MQSeries interface is not operational or network connectivity to the remote system is lost.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-114 Event 8044 Details

GUI Notification

Severity

Critical

Text

LDD SCPMS COA Retry Attempts Exhausted

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

116

Trap MIB Name

scpmsMqSeriesFault

8045

Explanation

This notification indicates that the LDD SCPMS system has not provided a response within the time limit specified by the LDD_SCP_SYSTEM_RESPONSE_TIMEOUT configuration parameter. The SCPMS system is not active.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-115 Event 8045 Details

GUI Notification

Severity

Critical

Text

LDD SCPMS Response Retry Attempts Exhausted

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

117

Trap MIB Name

scpmsNotActive

8046

Explanation

This notification indicates that the LDD Legacy Confirmation of Arrival message retry attempts have been exhausted. The MQSeries interface is not operational or network connectivity to the remote system is lost.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-116 Event 8046 Details

GUI Notification

Severity

Critical

Text

LDD SCPMS COA Retry Attempts Exhausted

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

118

Trap MIB Name

legacyMqSeriesFault

8047

Explanation

This notification indicates that the LDD Legacy system has not provided a response within the time limit specified by the LDD_SCP_SYSTEM_RESPONSE_TIMEOUT configuration parameter. The SCPMS system is not active.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-117 Event 8047 Details

GUI Notification

Severity

Critical

Text

LDD Legacy Response Retry Attempts Exhausted

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

119

Trap MIB Name

scpmsLegacyNotActive

8048

Explanation

This notification indicates that a connection could not be made to the MQSeries local queue manager. The local queue manager is not started or operational.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-118 Event 8048 Details

GUI Notification

Severity

Critical

Text

Unable to Connect to Queue Manager: <queueMgrName>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

120

Trap MIB Name

mqSeriesQueueManagerNotActive

8049

Explanation

The EMS/NE has rejected the NPANXX GTT creation, deletion, or modification transaction, and the NPANXX value in the transaction could not be determined.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the NPANXX GTT command to determine why the command failed. Re-enter the NPANXX GTT data correctly, which will cause the LSMS to try the command again.

Event Details

Table B-119 Event 8049 Details

GUI Notification

Severity

Major

Text

<CLLI>: NPANXX GTT <type_of_operation> Failed

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

126

Trap MIB Name

npanxxGTTValueNotFound

8050

Explanation

The EMS/NE has rejected the NPANXX GTT creation, deletion, or modification transaction for the specified NPANXX value.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the NPANXX GTT command to determine why the command failed. Re-enter the NPANXX GTT data correctly, which will cause the LSMS to try the command again.

Event Details

Table B-120 Event 8050 Details

GUI Notification

Severity

Major

Text

<CLLI>: NPANXX GTT <type_of_operation> Failed for NPANXX <NPANXX_value>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

127

Trap MIB Name

npanxxGTTValueRejected

8051

Explanation

The EMS/NE has rejected the Override GTT creation, deletion, or modification transaction, and the LRN value in the transaction could not be determined.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the Override GTT command to determine why the command failed. Re-enter the Override GTT data correctly, which will cause the LSMS to try the command again.

Event Details

Table B-121 Event 8051 Details

GUI Notification

Severity

Major

Text

<CLLI>: Override GTT <type_of_operation> Failed

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

128

Trap MIB Name

overrideGTTValueNotFound

8052

Explanation

The EMS/NE has rejected the Override GTT creation, deletion, or modification transaction for the specified LRN value.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the Override GTT command to determine why the command failed. Re-enter the Override GTT data correctly, which will cause the LSMS to try the command again.

Event Details

Table B-122 Event 8052 Details

GUI Notification

Severity

Major

Text

<CLLI>: Override GTT <type_of_operation> Failed for LRN <LRN_value>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

129

Trap MIB Name

overrideGTTValueRejected

8053

Explanation

The LSMS was not able to complete the automatic synchronization with the EMS/NE. Possible reasons include:

  • The network failed temporarily but not long enough to cause the association with the EMS to fail.

  • The EMS/NE rejected the data because it is busy updating its databases.

Recovery

Verify the connection between the LSMS and the EMS; then reinitialize the MPS. If this notification appears again, perform one of the bulk download procedures in the LNP Database Synchronization User's Guide.

Event Details

Table B-123 Event 8053 Details

GUI Notification

Severity

Major

Text

Short Synchronization Failed

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

131

Trap MIB Name

unableToCompleteAutoResynch

8054

Explanation

The LSMS has started its automatic synchronization with the EMS/NE.

Recovery

No action required; for information only.

Event Details

Table B-124 Event 8054 Details

GUI Notification

Severity

Major

Text

Short Synchronization Started

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

132

Trap MIB Name

autoResynchNEStarted

8055

Explanation

The automatic resynchronization of databases after an outage between the LSMS and the NPAC has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-125 Event 8055 Details

GUI Notification

Severity

Cleared

Text

Recovery Complete

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

133

Trap MIB Name

dbResynchCompleted

8059

Explanation

The LSMS has completed its automatic synchronization with the EMS/NE.

Recovery

No action required; for information only.

Event Details

Table B-126 Event 8059 Details

GUI Notification

Severity

Cleared

Text

Short Synchronization Complete

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

138

Trap MIB Name

emsShortSynchCompleted

8060

Explanation

The EMS pending queue, which holds the transactions to send to the EMS/NE identified by <CLLI> in the Surveillance notification, has fallen sufficiently below the halfway full point.

Recovery

No action required; for information only.

Event Details

Table B-127 Event 8060 Details

GUI Notification

Severity

Cleared

Text

EMS Pending Queue Less Than Half Full

Surveillance Notification

Text

Notify:Sys Admin - CLLI=<CLLI>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

141

Trap MIB Name

pendingQueueHalfFull

8061

Explanation

The EMS pending queue, which holds the transactions to send to the EMS/NE identified by <CLLI> in the Surveillance notification, has fallen sufficiently below the full point.

Recovery

No action required; for information only.

Event Details

Table B-128 Event 8061 Details

GUI Notification

Severity

Cleared

Text

EMS Pending Queue No Longer Full

Surveillance Notification

Text

Notify:Sys Admin - CLLI=<CLLI>

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

142

Trap MIB Name

pendingQueueNotFull

8062

Explanation

This notification indicates that the physical connection with the DCM has been restored.

Recovery

No action required; for information only.

Event Details

Table B-129 Event 8062 Details

GUI Notification

Severity

Cleared

Text

EBDA Physical Connection Restored

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

143

Trap MIB Name

dcmConnectionRestored

8063

Explanation

This notification indicates that the connection to the MQSeries local queue manager has been established following an outage.

Recovery

No action required; for information only.

Event Details

Table B-130 Event 8063 Details

GUI Notification

Severity

Cleared

Text

Connected to Queue Manager: < queueMgrName >

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

144

Trap MIB Name

connToMqSeriesQueueMngrRest

8064

Explanation

The specified NPA-NXX is opened for portability starting at the value of the <EffectiveTimestamp> field.

Recovery

No action required; for information only.

Event Details

Table B-131 Event 8064 Details

GUI Notification

Severity

Event

Text

New NPA-NXX: SPID [<SPID>], NPANXX [<NPANXX>], TS [<EffectiveTimestamp>]

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

145

Trap MIB Name

npaNxxOpenedForPortabilityAtTS

8065

Explanation

The first telephone number in the specified NPA-NXX is ported starting at the value of the <EffectiveTimestamp> field.

Recovery

No action required; for information only.

Event Details

Table B-132 Event 8065 Details

GUI Notification

Severity

Event

Text

First use of NPA-NXX: SPID [<SPID>], NPANXX [<NPANXX>], TS [<EffectiveTimestamp>]

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

146

Trap MIB Name

npaNxxPortedAtTS

8066

Explanation

An audit of the network element identified by <CLLI> has begun.

Recovery

No action required; for information only.

Event Details

Table B-133 Event 8066 Details

GUI Notification

Severity

Cleared

Text

Audit LNP DB Synchronization Started

Surveillance Notification

Text

NE <CLLI> Audit started

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

147

Trap MIB Name

ebdaAuditActive

8067

Explanation

An audit of the network element identified by <CLLI> has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-134 Event 8067 Details

GUI Notification

Severity

Cleared

Text

Audit LNP DB Synchronization Completed

Surveillance Notification

Text

NE <CLLI> Audit completed

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

148

Trap MIB Name

ebdaAuditSuccess

8068

Explanation

An audit of the network element identified by <CLLI> has failed.

Recovery

Inspect the log file /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD for details as to the cause of the error. After clearing the cause of the error, start the audit again.

Event Details

Table B-135 Event 8068 Details

GUI Notification

Severity

Critical

Text

Audit LNP DB Synchronization Failed

Surveillance Notification

Text

NE <CLLI> Audit failed

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

149

Trap MIB Name

ebdaAuditFailure

8069

Explanation

The user aborted an audit of the network element identified by <CLLI> before it had completed.

Recovery

No action required; for information only.

Event Details

Table B-136 Event 8069 Details

GUI Notification

Severity

Cleared

Text

Audit LNP DB Synchronization Aborted

Surveillance Notification

Text

NE <CLLI> Audit aborted

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

150

Trap MIB Name

ebdaAuditAbortedByUser

8070

Explanation

A reconcile has started at the completion of an audit.

Recovery

No action required; for information only.

Event Details

Table B-137 Event 8070 Details

GUI Notification

Severity

Cleared

Text

Reconcile LNP DB Synchronization Started

Surveillance Notification

Text

NE <CLLI> Reconcile started

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

151

Trap MIB Name

ebdaReconcileActive

8071

Explanation

A reconcile, which was performed at the end of an audit, has completed.

Recovery

No action required; for information only.

Event Details

Table B-138 Event 8071 Details

GUI Notification

Severity

Cleared

Text

Reconcile LNP DB Synchronization Complete

Surveillance Notification

Text

NE <CLLI> Reconcile completed

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

152

Trap MIB Name

ebdaReconcileSuccess

8072

Explanation

A reconcile, which was performed at the end of an audit, has failed before it completed.

Recovery

Inspect the log file /var/TKLC/lsms/logs/<CLLI>/LsmsAudit.log.MMDD for details as to the cause of the error. After clearing the cause of the error, start the reconcile again.

Event Details

Table B-139 Event 8072 Details

GUI Notification

Severity

Critical

Text

Reconcile LNP DB Synchronization Failed

Surveillance Notification

Text

NE <CLLI> Reconcile failed

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

153

Trap MIB Name

ebdaReconcileFailure

8073

Explanation

The user has stopped a reconcile before it completed.

Recovery

No action required; for information only.

Event Details

Table B-140 Event 8073 Details

GUI Notification

Severity

Cleared

Text

Reconcile LNP DB Synchronization Aborted

Surveillance Notification

Text

NE <CLLI> Reconcile aborted

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

154

Trap MIB Name

ebdaReconcileAbortedByUser
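When trap IDs arrive at an NMS as bare numbers, a small lookup can help relate them to the audit and reconcile events listed above. The following is a sketch only: the function name ebda_trap_name is hypothetical, and the ID-to-name pairs are copied from the Trap ID and Trap MIB Name entries for events 8066 through 8073.

```shell
# Map an audit or reconcile trap ID (147-154, as listed for events 8066-8073)
# to its MIB name. Prints nothing and returns 1 for any other ID.
ebda_trap_name() {
    case $1 in
        147) echo ebdaAuditActive ;;
        148) echo ebdaAuditSuccess ;;
        149) echo ebdaAuditFailure ;;
        150) echo ebdaAuditAbortedByUser ;;
        151) echo ebdaReconcileActive ;;
        152) echo ebdaReconcileSuccess ;;
        153) echo ebdaReconcileFailure ;;
        154) echo ebdaReconcileAbortedByUser ;;
        *)   return 1 ;;
    esac
}
```

Of these, only 149 and 153 correspond to Critical GUI notifications, so a receiving script might treat those two names as the ones that need operator attention.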

8078

Explanation

A bulk download is currently running.

Recovery

No action required; for information only.

Event Details

Table B-141 Event 8078 Details

GUI Notification

Severity

Cleared

Text

Bulk Load LNP DB Synchronization Started

Surveillance Notification

Text

NE <CLLI> Bulk load started

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

159

Trap MIB Name

ebdaBulkLoadActive

8079

Explanation

A bulk download has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-142 Event 8079 Details

GUI Notification

Severity

Cleared

Text

Bulk Load LNP DB Synchronization Complete

Surveillance Notification

Text

NE <CLLI> Bulk load completed

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

160

Trap MIB Name

ebdaBulkLoadSuccess

8080

Explanation

A bulk download has failed before it completed.

Recovery

Inspect the log file /var/TKLC/lsms/logs/<CLLI>/LsmsBulkLoad.log.MMDD for details as to the cause of the error. After clearing the cause of the error, start the bulk download again.

Event Details

Table B-143 Event 8080 Details

GUI Notification

Severity

Critical

Text

Bulk Load LNP DB Synchronization Failed

Surveillance Notification

Text

NE <CLLI> Bulk load failed

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

161

Trap MIB Name

ebdaBulkLoadFailure

8081

Explanation

The user has stopped a bulk download before it completed.

Recovery

No action required; for information only.

Event Details

Table B-144 Event 8081 Details

GUI Notification

Severity

Cleared

Text

Bulk Load LNP DB Synchronization Aborted

Surveillance Notification

Text

NE <CLLI> Bulk load aborted

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

162

Trap MIB Name

ebdaBulkLoadAbortedByUser

8082

Explanation

A user-initiated resynchronization is currently running.

Recovery

No action required; for information only.

Event Details

Table B-145 Event 8082 Details

GUI Notification

Severity

Cleared

Text

Re-sync LNP DB Synchronization Started

Surveillance Notification

Text

NE <CLLI> Re-sync started

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

163

Trap MIB Name

ebdaResyncActive

8083

Explanation

A user-initiated resynchronization has completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-146 Event 8083 Details

GUI Notification

Severity

Cleared

Text

Re-sync LNP DB Synchronization Complete

Surveillance Notification

Text

NE <CLLI> Re-sync completed

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

164

Trap MIB Name

ebdaResyncSuccess

8084

Explanation

A user-initiated resynchronization has failed before it completed.

Recovery

Inspect the contents of the file /var/TKLC/lsms/logs/<CLLI>/LsmsResync.log.MMDD to determine the cause of the error. After clearing the cause of the error, start the user-initiated resynchronization again.
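The recovery steps for events 8072, 8080, and 8084 each name a different log file under the same per-CLLI directory. A minimal sketch of selecting the right file is shown below. The function name lsms_sync_log, its operation keywords, and the CLLI value STPA in the usage note are hypothetical, and MMDD is assumed to be a two-digit month followed by a two-digit day.

```shell
# Hypothetical helper: print the log file path for a failed synchronization
# operation, using the file names given in the recovery steps for
# events 8072 (reconcile), 8080 (bulk download), and 8084 (resynchronization).
# Usage: lsms_sync_log <CLLI> <reconcile|bulkload|resync> [MMDD]
lsms_sync_log() {
    case $2 in
        reconcile) name=LsmsAudit ;;
        bulkload)  name=LsmsBulkLoad ;;
        resync)    name=LsmsResync ;;
        *)         echo "unknown operation: $2" >&2; return 1 ;;
    esac
    # MMDD defaults to today's date.
    printf '/var/TKLC/lsms/logs/%s/%s.log.%s\n' "$1" "$name" "${3:-$(date +%m%d)}"
}
```

For example, lsms_sync_log STPA resync 0315 prints the path of the March 15 resynchronization log for the hypothetical CLLI STPA.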

Event Details

Table B-147 Event 8084 Details

GUI Notification

Severity

Critical

Text

Re-sync LNP DB Synchronization Failed

Surveillance Notification

Text

NE <CLLI> Re-sync failed

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

165

Trap MIB Name

ebdaResyncFailure

8085

Explanation

The user has stopped a user-initiated resynchronization before it completed.

Recovery

No action required; for information only.

Event Details

Table B-148 Event 8085 Details

GUI Notification

Severity

Cleared

Text

Re-sync LNP DB Synchronization Aborted

Surveillance Notification

Text

NE <CLLI> Re-sync aborted

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

166

Trap MIB Name

ebdaResyncAbortedByUser

8086

Explanation

This notification indicates that the Sprint lddAgent has failed to communicate with the Sprint Legacy System.

Recovery

No action required; for information only.

Event Details

Table B-149 Event 8086 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

FAILED:IddAgent legacy

Source

Both servers

Frequency

Once, as soon as condition occurs

Trap

Trap ID

167

Trap MIB Name

sprintIddAgentCommFailureLegSys

8087

Explanation

This notification indicates that the Sprint lddAgent has failed to communicate with the Sprint SCPMS System.

Recovery

Contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 for assistance.

Event Details

Table B-150 Event 8087 Details

GUI Notification

Severity

None

Text

Surveillance Notification

Text

FAILED:IddAgent scpms

Source

Both servers

Frequency

Once, as soon as condition occurs

Trap

Trap ID

168

Trap MIB Name

sprintIddAgentCommFailureScpmsSys

8088

Explanation

A scheduled file transfer has failed.

Recovery

Inspect the error log file /var/TKLC/lsms/logs/aft/aft.log.MMDD for details as to the cause of the error.

Event Details

Table B-151 Event 8088 Details

GUI Notification

Severity

Major

Text

Automatic File Transfer Failure - See Log for Details

Surveillance Notification

Text

Notify:Sys Admin- Auto xfer Failure

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

171

Trap MIB Name

automaticFileTransferFeatureFailure

8089

Explanation

An NPA-NXX split activation completed successfully.

Recovery

No action required; for information only.

Event Details

Table B-152 Event 8089 Details

GUI Notification

Severity

Cleared

Text

Activate Split Successful OldNPA=<old_NPA> NewNPA=<new_NPA> NXX=<NXX>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

10

Trap MIB Name

npaSplitActOk

8090

Explanation

An NPA-NXX split activation failed.

Recovery

Perform an audit and reconcile of NPA Split information at the network element.

Event Details

Table B-153 Event 8090 Details

GUI Notification

Severity

Critical

Text

Activate Split Failed OldNPA=<old_NPA> NewNPA=<new_NPA> NXX=<NXX>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

172

Trap MIB Name

npaSplitActFailed

8091

Explanation

At least one active NPA-NXX split is past its end date and needs to be deleted.

Recovery

Do the following:

  1. View all split objects (for information, refer to the Database Administrator's Guide) to determine which objects have end dates that have already passed.

  2. Delete the objects whose end dates have passed (for information, refer to the Database Administrator's Guide).

Event Details

Table B-154 Event 8091 Details

GUI Notification

Severity

Major

Text

Active Splits Are Past Their End Dates

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

173

Trap MIB Name

activeSplitsPastEndDates

8092

Explanation

This notification indicates that the LDD SCPMS agent is switching from the primary to the backup SCPMS system.

Recovery

No action required; this notification is for information only.

Event Details

Table B-155 Event 8092 Details

GUI Notification

Severity

Critical

Text

LDD SCPMS Agent Switching from Primary to Backup SCPMS System

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

174

Trap MIB Name

lddScpmsAgentSwitchToBackupScpms

8093

Explanation

This notification indicates that the LDD SCPMS agent is switching from the backup to the primary SCPMS system.

Recovery

No action required; this notification is for information only.

Event Details

Table B-156 Event 8093 Details

GUI Notification

Severity

Critical

Text

LDD SCPMS Agent Switching from Backup to Primary SCPMS System

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

175

Trap MIB Name

lddScpmsAgentSwitchFromBackupToPrim

8094

Explanation

This notification indicates that the current LDD SCPMS system is the primary SCPMS.

Recovery

No action required; this notification is for information only.

Event Details

Table B-157 Event 8094 Details

GUI Notification

Severity

Cleared

Text

LDD SCPMS Current System is Primary SCPMS

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

176

Trap MIB Name

lddScpmsPrimary

8095

Explanation

This notification indicates that the current LDD SCPMS system is the backup SCPMS.

Recovery

No action required; this notification is for information only.

Event Details

Table B-158 Event 8095 Details

GUI Notification

Severity

Cleared

Text

LDD SCPMS Current System is Backup SCPMS

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

177

Trap MIB Name

lddScpmsBackup

8096

Explanation

The EMS/NE has rejected the NPANXX Split operation indicated by <operation>, and the NPANXX value in the transaction could not be determined.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the NPANXX Split command to determine why the command failed. Then delete and re-enter the NPANXX Split data correctly, which causes the LSMS to try the command again.

Event Details

Table B-159 Event 8096 Details

GUI Notification

Severity

Major

Text

<CLLI>: NPANXX Split <operation> Failed

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

178

Trap MIB Name

EmsNeRejNpaNxxSplitNotDetermined

8097

Explanation

The EMS/NE has rejected the NPANXX Split operation indicated by <operation> for the indicated NPANXX value.

Recovery

Look in the transaction log file, /var/TKLC/lsms/logs/<CLLI>/LsmsTrans.log.MMDD, and locate the NE’s response to the NPANXX Split command to determine why the command failed. Then delete and re-enter the NPANXX Split data correctly, which causes the LSMS to try the command again.

Event Details

Table B-160 Event 8097 Details

GUI Notification

Severity

Major

Text

<CLLI>: NPANXX Split <operation> Failed for New NPANXX <NPANXX>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

179

Trap MIB Name

EmsNeRejectedNpaNxxSplit

8098

Explanation

The LSMS is not able to confirm the physical connectivity with the directly connected query server identified by <hostname>. The problem may be one of the following:

  • Physical connectivity issues between the LSMS and the directly connected query server.

  • The query server host name is not associated with the appropriate Internet Protocol (IP) address in the /etc/hosts file.

  • The Internet Protocol (IP) address specified for the special replication user for the query server is incorrect.

  • The proper TCP/IP ports are not open in the firewall(s) between the LSMS and the query servers.

Recovery

  • Check the physical connectivity of the LSMS to the query server.

  • Check that the query server host name is associated with the corresponding Internet Protocol (IP) address in the /etc/hosts file.

  • Verify that the IP address for the query server is correct. Display the IP address of all configured query servers by using the $LSMS_TOOLS_DIR/lsmsdb -c queryservers command.

  • Verify that the firewall TCP/IP port configuration is set correctly for both the LSMS and query servers directly connected to the LSMS (refer to Appendix A, “Configuring the Query Server,” of the Configuration Guide for information about port configuration for firewall protocol filtering).
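The host name check in the list above can be sketched as a small shell function that reports which IP address a hosts-format file maps to a given name. The function name hosts_lookup and the host name qs1 in the usage note are hypothetical; the only assumption about the file is the standard hosts layout of an address followed by one or more names, with # starting a comment.

```shell
# Print the IP address(es) mapped to a host name in a hosts-format file.
# Usage: hosts_lookup <hostname> [hosts_file]   (hosts_file defaults to /etc/hosts)
hosts_lookup() {
    awk -v h="$1" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == h) print $1 }
    ' "${2:-/etc/hosts}"
}
```

Running hosts_lookup qs1 on the LSMS and comparing the result with the address shown by the $LSMS_TOOLS_DIR/lsmsdb -c queryservers command is one way to spot a mismatch. Empty output means the name has no entry in the file.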

Event Details

Table B-161 Event 8098 Details

GUI Notification

Severity

Major

Text

Query Server <hostname> Physical Connection Lost

Surveillance Notification

Text

Query Server=<hostname> Physical Conn Lost

Source

Active Server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

SNMP Trap

Trap ID

180

Trap MIB Name

physicalConnectivityWithQueryServerLost

8099

Explanation

The query server identified by <hostname> does not have a replication connection established with the LSMS. The problem may be one of the following:

  • Query server cannot establish a connection with the master.

  • Query server not properly configured to connect to the master.

  • A query that succeeded on the master failed on the query server.

  • The binary log(s) that are needed by the query server to resynchronize itself to its master no longer exist.

  • Data on the query server does not agree with what is on the master when the binary log was started.

  • Replication was stopped at the query server by a user.

Recovery

  1. At the query server, perform the following substeps:

    1. Start the MySQL command line utility on the slave server:

      # cd /opt/mysql/mysql/bin

      # mysql -u root -p

      
      Enter password:
      
      <Query Server's MySQL root user password>
    2. Determine whether the query server is running by entering the following command and looking at the Slave_IO_Running and Slave_SQL_Running column values.

      mysql> SHOW SLAVE STATUS\G
      • If the Slave_IO_Running and Slave_SQL_Running column values show that the slave is not running, verify the query server's /usr/mysql1/my.cnf option file (refer to “MySQL Replication Configuration for Query Servers,” in Appendix A, “Configuring the Query Server,” of the Configuration Guide) and check the error log (/usr/mysql1/<hostname>.err) for messages.

      • If the Slave_IO_Running and Slave_SQL_Running column values show that the slave (query server) is running, enter the following command to verify whether the slave established a connection with the master (LSMS or another query server acting as a master/slave).

        mysql> SHOW PROCESSLIST;

        Find the thread with the system user value in the User column and none in the Host column, and check the State column. If the State column says “connecting to master,” verify that the master hostname is correct, that the DNS is properly set up, whether the master is actually running, and whether it is reachable from the slave (refer to Appendix A, “Configuring the Query Server,” of the Configuration Guide for information about port configuration for firewall protocol filtering if the master and slave are connecting through a firewall).

      • If the slave was running, but then stopped, enter the following command:

        mysql> SHOW SLAVE STATUS;

        Look at the output. This error can happen when a query that succeeded on the master fails on the slave. It should never happen while replication is active, provided that you took a proper snapshot of the master and have never modified the data on the slave outside of the slave thread.

  2. However, if this is not the case, or if the failed items are not needed and there are only a few of them, try the following:

    1. First see if there is some stray record in the way on the query server. Understand how it got there, then delete it from the query server database and run start slave.

    2. If the above does not work or does not apply, try to understand if it would be safe to make the update manually (if needed) and then ignore the next query from the LSMS.

    3. If you have decided you can skip the next query, enter one of the following command sequences:

      • To skip a query that uses AUTO_INCREMENT or LAST_INSERT_ID(), enter:

        mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER=2;

        mysql> start slave;

        Queries that use AUTO_INCREMENT or LAST_INSERT_ID() take two events in the binary log of the master.

      • Otherwise, enter:

        mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;

        mysql> start slave;

  3. If you are sure the query server database started out perfectly in sync with the LSMS database, and no one has updated the tables involved outside of the slave thread, contact unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66 so you will not have to do the above steps again.

  4. If all else fails, read the error log, /usr/mysql1/<hostname>.err. If the log is big, run the following command on the slave:

    grep -i slave /usr/mysql1/<hostname>.err

    (There is no generic pattern to search for on the master, as the only errors it logs are general system errors. If it can, the master will send the error to the slave when things go wrong.)

    • If the error log on the slave reports that it could not find a binary log file, the binary log files on the master have been removed (purged). Binary logs are periodically purged from the master to prevent them from growing unbounded and consuming large amounts of disk resources. However, if a query server was not replicating and one of the binary log files it needs to read is purged, it will be unable to replicate once it comes up. If this occurs, the query server must be reset with another snapshot of data from the master or another query server (see “Reload a Query Server Database from the LSMS” and “Reload a Query Server Database from Another Query Server”).

    • When you have determined that there is no user error involved, and replication still either does not work at all or is unstable, please contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.
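The replication status check in step 1 can be sketched as a shell function that reads the vertical output of SHOW SLAVE STATUS\G and reports the two thread states. The function name slave_threads_ok is hypothetical, and the sketch assumes the usual "Name: value" layout of the MySQL client's vertical output.

```shell
# Read "SHOW SLAVE STATUS\G" output on stdin and report the replication threads.
# Prints "IO=<value> SQL=<value>" and returns 0 only when both values are "Yes".
slave_threads_ok() {
    awk -F': *' '
        /^ *Slave_IO_Running:/  { io  = $2; sub(/[ \t\r]+$/, "", io)  }
        /^ *Slave_SQL_Running:/ { sql = $2; sub(/[ \t\r]+$/, "", sql) }
        END {
            printf "IO=%s SQL=%s\n", io, sql
            exit (io == "Yes" && sql == "Yes") ? 0 : 1
        }
    '
}
```

On the query server this might be used as /opt/mysql/mysql/bin/mysql -u root -p -e 'SHOW SLAVE STATUS\G' | slave_threads_ok. A nonzero exit status indicates that at least one replication thread is not running and that the remaining recovery steps apply.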

Event Details

Table B-162 Event 8099 Details

GUI Notification

Severity

Major

Text

Query Server <hostname> Replication Connection Lost

Surveillance Notification

Text

Query Server=<hostname> Replication Conn Lost

Source

Active Server

Frequency

As soon as condition occurs, and at five-minute intervals as long as condition exists

SNMP Trap

Trap ID

181

Trap MIB Name

queryServerConnectionWithLsmsLost

8100

Explanation

The SV/NPB storage database has exceeded the configured percent usage threshold.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-163 Event 8100 Details

GUI Notification

Severity

Event

Text

SV/NPB Storage Exceeds <%> percent

Surveillance Notification

Text

Notify:Sys Admin - SV/NPB threshold %

Source

Both servers

Frequency

Every 5 minutes after condition occurs

Trap

Trap ID

194

Trap MIB Name

svNpbPercentUsage

8101

Explanation

This event indicates that the SV/NPB storage database usage is below the configured percent usage threshold.

Recovery

No action is required.

Event Details

Table B-164 Event 8101 Details

GUI Notification

Severity

Cleared

Text

SV/NPB storage falls below <%> percent

Surveillance Notification

Text

Notify: Sys Admin - SV/NPB cleared

Source

Both servers

Frequency

As soon as condition clears

Trap

Trap ID

207

Trap MIB Name

svNpbBelowLimit

8102

Explanation

The event number present in the untilClear filter list is cleared. The event number is removed from the untilClear filter list.

Recovery

No action is required.

Event Details

Table B-165 Event 8102 Details

GUI Notification

Severity

Event

Text

<Event number> in the untilClear filter list, event clear received at <%s>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8103

Explanation

The alarm filter counter has reached its limit; the counter will start again from one.

Recovery

No action is required.

Event Details

Table B-166 Event 8103 Details

GUI Notification

Severity

Event

Text

Counter associated with event <event number> exceeds limit <%s>. Resetting counter.

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8104

Explanation

The event number present in the untilTimeout filter list is cleared. The event number is removed from the untilTimeout filter list.

Recovery

No action is required.

Event Details

Table B-167 Event 8104 Details

GUI Notification

Severity

Event

Text

<Event number> in the untilTimeout filter list, event timeout at <%s>

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8105

Explanation

The log capture started by the user has failed.

Recovery

Contact the unresolvable-reference.html#GUID-646F2C79-C167-4B5A-A8DF-7ED0EAA9AD66.

Event Details

Table B-168 Event 8105 Details

GUI Notification

Severity

Minor

Text

Logs Capture Failed

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8106

Explanation

The MySQL Port has been updated. The LSMS application must be restarted.

Recovery

The application must be restarted. Restart the LSMS application first on the active server and then on the standby server. For more information, refer to the Configuration Guide.

Event Details

Table B-169 Event 8106 Details

GUI Notification

Severity

Event

Text

MySQL Port changed from <%s> to <%s>. LSMS application restart required.

Surveillance Notification

Text

Notify: Sys Admin - LSMS restart required

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

208

Trap MIB Name

mysqlPortUpdated

8107

Explanation

The MySQL Port has been updated. The Query Server configuration needs to be updated with the new MySQL port.

Recovery

Configure the Query Server with the updated MySQL port. For more information, refer to the Configuration Guide.

Event Details

Table B-170 Event 8107 Details

GUI Notification

Severity

Event

Text

MySQL Port changed from <%s> to <%s>. Query Server configuration updated required.

Surveillance Notification

Text

Notify: Sys Admin - QS updated required

Source

Active server

Frequency

Once, as soon as condition occurs

Trap

Trap ID

209

Trap MIB Name

queryServerResetConfiguration

8108

Explanation

At least one of the connected Query Servers is out of sync, and the binary logs cannot be purged without user confirmation.

Recovery

When the Query Server is out of sync, automatic purging is not possible. To delete all but the last 10 binary logs, log on to the active LSMS server as root and enter the following command:
pruneBinaryLogs -force

Event Details

Table B-171 Event 8108 Details

GUI Notification

Severity

Minor

Text

Automatic purging of binary logs cannot be done. User confirmation required.

Surveillance Notification

Text

Notify: Sys Admin - Purge need confirmation

Source

Both servers

Frequency

Every 45 minutes

Trap

Trap ID

210

Trap MIB Name

purgeConfirmRequired

8109

Explanation

Disk usage is reaching the capacity threshold, and an automatic purge of binary logs is imminent.

Recovery

No action is required.

Event Details

Table B-172 Event 8109 Details

GUI Notification

Severity

Minor

Text

Disk usage reaching <%> percent. Purging of binary logs is imminent.

Surveillance Notification

Text

Notify: Sys Admin - Purging is imminent

Source

Both servers

Frequency

Every 45 minutes

Trap

Trap ID

211

Trap MIB Name

purgeImminent

8110

Explanation

Logs capture has been started by the user.

Recovery

No action is required.

Event Details

Table B-173 Event 8110 Details

GUI Notification

Severity

Cleared

Text

Logs Capture Started

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8111

Explanation

The logs capture started by the user completed successfully.

Recovery

No action is required.

Event Details

Table B-174 Event 8111 Details

GUI Notification

Severity

Minor

Text

Logs Captured Successfully

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8112

Explanation

The cron job was unable to restart syscheck automatically.

Recovery

Contact the Customer Care Center.

Event Details

Table B-175 Event 8112 Details

GUI Notification

Severity

Event

Text

Failed to restart syscheck services

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8116

Explanation

The HTTP protocol is enabled but secure HTTP (HTTPS) is recommended.

Recovery

For information on configuring the protocols, see Starting an Web-Based LSMS GUI Session.

Event Details

Table B-176 Event 8116 Details

GUI Notification

Severity

Event

Text

HTTP is enabled and it is recommended to use HTTPS.

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8117

Explanation

HTTP is disabled and HTTPS is enabled.

Recovery

No recovery required; only HTTPS is enabled now.

Event Details

Table B-177 Event 8117 Details

GUI Notification

Severity

Event

Text

Only HTTPS is enabled now.

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

8118

Explanation

Both HTTP and HTTPS are enabled, but using only HTTPS is recommended.

Recovery

For information on configuring the protocols, see Starting an Web-Based LSMS GUI Session.

Event Details

Table B-178 Event 8118 Details

GUI Notification

Severity

Event

Text

Both HTTP and HTTPS are enabled and it is recommended to use HTTPS.

Surveillance Notification

Text

None

Source

Frequency

Trap

Trap ID

None

Trap MIB Name

Additional Trap Information

Trap ID: 25
Trap MIB Name: dataReplInfo
Notification Description: This notification indicates that database replication is delayed.
Trap variables def: eventNbr = Oracle specific unique identifier for event notification. This eventNbr field can be used to reference Oracle documentation. dbReplInfo = Info message from database replication.
Retry Interval: Every 5 mins
Severity: event_notif_event
Event Num: 4011
GUI Event Text: DB Repl Info - %s
Pair Event Num: 0

Trap ID: 201
Trap MIB Name: snapInvalidErr
Notification Description: This notification indicates that an invalid snapshot has been detected.
Trap variables def: eventNbr = Oracle specific unique identifier for event notification. This eventNbr field can be used to reference Oracle documentation. snapName = Name of the invalid snapshot.
Retry Interval: Every 30 mins
Severity: event_notif_critical
Event Num: 4034
GUI Event Text: Invalid Snapshot - %s
Pair Event Num: 4035

Trap ID: 203
Trap MIB Name: snapFullErr
Notification Description: This notification indicates that the snapshot is greater than 80% full.
Trap variables def: eventNbr = Oracle specific unique identifier for event notification. This eventNbr field can be used to reference Oracle documentation. snapName = Name of the invalid/hanging snapshot.
Retry Interval: Every 30 mins
Severity: event_notif_critical
Event Num: 4036
GUI Event Text: Full Snapshot - %s
Pair Event Num: 4037

Trap ID: 212
Trap MIB Name: resyncStartTrap
Notification Description: The trap is sent by the LSMS to the NMS when the LSMS is about to start resynchronization.
Frequency: Each time a resynchronization with an NMS starts
Source: /vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl
Clearing behavior: None

Trap ID: 213
Trap MIB Name: resyncStopTrap
Notification Description: The trap is sent by the LSMS to the NMS when resynchronization is complete.
Frequency: Each time a resynchronization with an NMS is complete
Source: /vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl
Clearing behavior: None

Trap ID: 214
Trap MIB Name: resyncRejectTrap
Notification Description: The trap is sent by the LSMS to the NMS when a resynchronization request is rejected by the LSMS.
Frequency: Each time a resynchronization request is initiated while an existing resynchronization is still being processed
Source: /vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl
Clearing behavior: None

Trap ID: 215
Trap MIB Name: resyncRequiredTrap
Notification Description: The trap is sent by the LSMS to the NMS when the LSMS is rebooted or started.
Frequency: Each time the LSMS is rebooted or restarted
Source: /vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl
Clearing behavior: None

Trap ID: 216
Trap MIB Name: heartBeatTrap
Notification Description: The trap is sent by the LSMS to the NMS periodically to indicate that the LSMS is up.
Frequency: Per the configured value in seconds (0, 5-7200), where 0 indicates that the heartbeat trap is disabled
Source: /vobs/lsms/apps/snmp/lsmsSnmpHeartbeatSender.pl
Clearing behavior: None

Trap ID: 217
Trap MIB Name: lsmsAlarmTrapV3
Notification Description: The trap indicates that the following information is for a particular event.
Frequency: Every v3 trap message sent to the NMS carries this OID
Source: /vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl
Clearing behavior: None

Trap ID: 218
Trap MIB Name: resyncErrCode
Notification Description: errorCode = 0, Resynchronization completed successfully. errorCode = 1, Resynchronization aborted by NMS. errorCode = 2, Resynchronization already in progress for the NMS. errorCode = 3, Resynchronization Aborted, Database error occurred. errorCode = 4, Resynchronization not in progress.
Frequency: Each time either resyncStopTrap or resyncRejectTrap is sent to the NMS
Source: /vobs/lsms/apps/snmp/lsmsSNMPResyncHandler.pl
Clearing behavior: None
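
The errorCode values carried by resyncErrCode and the heartBeatTrap interval range lend themselves to a small lookup on the NMS side. The following Python sketch is illustrative only: the function names are hypothetical and are not part of the LSMS software, and the descriptions are copied from the trap listing above.

```python
# Illustrative lookup for the errorCode values carried by the
# resyncErrCode trap (trap ID 218). Names here are hypothetical.
RESYNC_ERROR_CODES = {
    0: "Resynchronization completed successfully",
    1: "Resynchronization aborted by NMS",
    2: "Resynchronization already in progress for the NMS",
    3: "Resynchronization aborted, database error occurred",
    4: "Resynchronization not in progress",
}


def describe_resync_error(error_code):
    """Return the documented text for an errorCode, or a placeholder if unlisted."""
    return RESYNC_ERROR_CODES.get(error_code, f"Undocumented errorCode {error_code}")


def is_valid_heartbeat_interval(seconds):
    """heartBeatTrap interval: 0 disables the trap; otherwise 5-7200 seconds."""
    return seconds == 0 or 5 <= seconds <= 7200


print(describe_resync_error(2))
print(is_valid_heartbeat_interval(3))
```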

Platform Alarms

This section describes the following:

How Platform Alarms Are Reported

Each server runs syscheck periodically and reports any problems found through platform alarms. The severity of platform alarms is one of the following:

  • Critical, reported through event 4300
  • Major, reported through event 4200
  • Minor, reported through event 4100

When one or more problems in a given category have been found, the server reports one corresponding event notification to its Surveillance log and its Serial Port 3. If the server is not the active server, it also sends the event notification to the active server. The active server reports its own platform events to its own Surveillance log and to its Serial Port 3, and also sends an SNMP trap and displays a GUI notification for either its own platform events or the non-active server’s platform events.

Each of the events 4100, 4200, and 4300 contains a 16-character hexadecimal bitmasked string that indicates all of the platform events in that category that currently exist. To decode which platform events exist, use the procedure described in “How to Decode Platform Alarms”.

Each time the combination of platform events in a given category changes, a new event is reported. Following is an example of how platform events are reported:

  1. At first, only one major platform event is reported on the standby server. A 4200 event with the alarm number of the event is reported.

  2. One minute later, another platform event exists on the standby server (and the first one still exists). Another 4200 event is reported, with a bitmasked string that indicates both of the platform events that exist.

  3. One minute later, another platform event exists on the standby server (and the previous ones still exist). Another 4200 event is reported, with a bitmasked string that indicates all of the platform events that exist.

  4. One minute later, the first platform event is cleared. Another 4200 event is reported, with a bitmasked string that indicates the two platform events that still exist.
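
The sequence above can be sketched in Python, assuming (as the alarm codes in Table B-179 suggest) that each alarm occupies one bit and that the reported string is the bitwise OR of the codes of all active alarms in the category. The function name and the three major alarms chosen for the walk-through are hypothetical; the LSMS does not expose this function.

```python
def combine_alarm_codes(alarm_codes):
    """OR together 16-character hex alarm codes from one category."""
    combined = 0
    for code in alarm_codes:
        combined |= int(code, 16)
    return f"{combined:016X}"


# Three major platform alarms taken from Table B-179 (arbitrary choice).
SWAP_SHORTAGE = "3000000000000080"
PROV_NETWORK = "3000000000000100"
DISK_SHORTAGE = "3000000000001000"

active = [SWAP_SHORTAGE]
print(combine_alarm_codes(active))   # step 1 -> 3000000000000080

active.append(PROV_NETWORK)
print(combine_alarm_codes(active))   # step 2 -> 3000000000000180

active.append(DISK_SHORTAGE)
print(combine_alarm_codes(active))   # step 3 -> 3000000000001180

active.remove(SWAP_SHORTAGE)
print(combine_alarm_codes(active))   # step 4 -> 3000000000001100
```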

How to Decode Platform Alarms

Use the following procedure to determine all the platform alarms that exist in a given category:

  1. Look in Platform Alarms to see if the alarm number is shown there.
    • If the alarm number matches one of the alarms shown in this table, only one alarm (the one that appears in the table) is being reported and you have completed this procedure.
    • If the alarm number does not match one of the alarms shown in this table, perform the remaining steps of this procedure.
  2. Log in as any user to either server.
  3. Enter the following command to decode the reported hexadecimal alarm string:
    $ /usr/TKLC/plat/bin/almdecode <alarm_number>

    The output displays the information about the alarm category and displays the text string for each of the alarms that is represented by the string. For example, if you enter:

    $ /usr/TKLC/plat/bin/almdecode 3000000000000180

    the following text displays:

    
    The string alarm value comes from the Major Platform alarm category.
    
    The following alarms are encoded within the hex string:

    Server Swap Space Shortage Failure
    Server Provisioning Network Error
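
For readers who want to see what the decoding involves, the following Python sketch reproduces the almdecode example. It is not the almdecode implementation: it assumes that the leading hexadecimal digit selects the category (3 for Major and 5 for Minor, as in Table B-179) and that each remaining bit is one alarm. Only a few table rows are included, and all names are hypothetical.

```python
# Subset of Table B-179, keyed by the alarm's bit value.
# Extend with the remaining rows as needed.
MAJOR_ALARMS = {
    0x0001: "Server fan failure",
    0x0002: "Server Internal Disk Error",
    0x0080: "Server Swap Space Shortage Failure",
    0x0100: "Server provisioning network error",
    0x1000: "Server Disk Space Shortage Error",
}
MINOR_ALARMS = {
    0x0001: "Server Disk Space Shortage Warning",
    0x0002: "Server Application Process Error",
    0x0020: "Server Swap Space Shortage Warning",
}

# Assumption: the leading hex digit identifies the category.
CATEGORIES = {0x3: ("Major", MAJOR_ALARMS), 0x5: ("Minor", MINOR_ALARMS)}


def decode_alarm_string(alarm_string):
    """Split a 16-character hex alarm string into category and alarm texts."""
    value = int(alarm_string, 16)
    category_digit = value >> 60          # top 4 bits
    flags = value & ((1 << 60) - 1)       # remaining 60 bits
    category, table = CATEGORIES[category_digit]  # KeyError if unlisted
    alarms = []
    bit = 1
    while bit <= flags:
        if flags & bit:
            alarms.append(table.get(bit, f"unlisted alarm bit 0x{bit:X}"))
        bit <<= 1
    return category, alarms


category, alarms = decode_alarm_string("3000000000000180")
print(category)
for text in alarms:
    print(text)
```

Run against 3000000000000180, the sketch reports the Major category and the same two alarms as the almdecode output shown above.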

Platform Alarms

Platform errors are grouped by category and severity. The categories are listed from most to least severe:

Table B-179 shows the alarm numbers and alarm text for all alarms generated by the MPS platform. The order within a category is not significant. Some of the alarms described are not available with specific configurations.

Table B-179 Platform Alarms

Alarm Codes and Error Descriptor
Major Platform Alarms
3000000000000001 – Server fan failure
3000000000000002 - Server Internal Disk Error
3000000000000008 - Server Platform Error
3000000000000010 - Server File System Error
3000000000000020 - Server Platform Process Error
3000000000000080 - Server Swap Space Shortage Failure
3000000000000100 - Server provisioning network error
3000000000001000 - Server Disk Space Shortage Error
3000000000002000 - Server Default Route Network Error
3000000000004000 - Server Temperature Error
3000000000008000 - Server Mainboard Voltage Error
3000000000010000 - Server Power Feed Error
3000000000020000 - Server Disk Health Test Error
3000000000040000 - Server Disk Unavailable Error
3000000000080000 - Device Error
3000000000100000 - Device Interface Error
3000000008000000 - Server HA Keepalive Error
3000000010000000 - DRBD block device can not be mounted
3000000020000000 - DRBD block device is not being replicated to peer
3000000040000000 - DRBD peer needs intervention
3000000400000000 - Multipath device access link problem
3000000800000000 – Switch Link Down Error
3000001000000000 - Half-open Socket Limit
3000002000000000 - Flash Program Failure
3000004000000000 - Serial Mezzanine Unseated
Minor Platform Alarms
5000000000000001 - Server Disk Space Shortage Warning
5000000000000002 - Server Application Process Error
5000000000000004 - Server Hardware Configuration Error
5000000000000008 - Server RAM Shortage Warning
5000000000000020 - Server Swap Space Shortage Warning
5000000000000040 - Server Default Router Not Defined
5000000000000080 – Server temperature warning
5000000000000100 - Server Core File Detected
5000000000000200 - Server NTP Daemon Not Synchronized
5000000000000400 - Server CMOS Battery Voltage Low
5000000000000800 - Server Disk Self Test Warning
5000000000001000 - Device Warning
5000000000002000 - Device Interface Warning
5000000000004000 - Server Reboot Watchdog Initiated
5000000000008000 - Server HA Failover Inhibited
5000000000010000 - Server HA Active To Standby Transition
5000000000020000 - Server HA Standby To Active Transition
5000000000040000 - Platform Health Check Failure
5000000000080000 - NTP Offset Check Failure
5000000000100000 - NTP Stratum Check Failure
5000000000200000 - SAS Presence Sensor Missing
5000000000400000 - SAS Drive Missing
5000000000800000 - DRBD failover busy
5000000001000000 - HP disk resync
5000000020000000 – Server Kernel Dump File Detected
5000000040000000 – TPD Upgrade Failed
5000000080000000 - Half Open Socket Warning Limit
NOTE: The order within a category is not significant.

Alarm Recovery Procedures

This section provides recovery procedures for the MPS, listed by alarm category and Alarm Code (alarm data string) within each category.

Major Platform Alarms

Major platform alarms involve hardware components, memory, and network connections.

3000000000000001 – Server fan failure

Alarm Type: TPD

Description: This alarm indicates that a fan in the EAGLE fan tray in the EAGLE shelf where the E5-APP-B is "jacked in" is either failing or has failed completely. In either case, there is a danger of component failure due to overheating.

Severity: Major

OID: TpdFanErrorNotify 1.3.6.1.4.1.323.5.3.18.3.1.2.1

Alarm ID: TKSPLATMA13000000000000001

Recovery

Note:

  1. Run syscheck in Verbose mode to verify a fan failure using the following command:
    [root@hostname1351690497 ~]# syscheck -v hardware fan
    Running modules in class hardware...
             fan: Checking Status of Server Fans.
    *         fan: FAILURE:: MAJOR::3000000000000001 -- Server Fan Failure. This test uses the leaky bucket algorithm.
    *         fan: FAILURE:: Fan RPM is too low, fana: 0, CHIP: FAN
    One or more module in class "hardware" FAILED
    
    LOG LOCATION: /var/TKLC/log/syscheck/fail_log
    
  2. Refer to the procedure for determining the location of the fan assembly that contains the failed fan and replacing a fan assembly in the appropriate hardware manual. After you have opened the front lid to access the fan assemblies, determine whether any objects are interfering with the fan rotation. If some object is interfering with fan rotation, remove the object.
  3. Contact the Customer Care Center.
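
When syscheck output is collected for a support call, it can help to pull out the alarm codes it reports. The following Python sketch does this for lines shaped like the FAILURE line in the sample output of step 1. The regular expression is inferred from that single sample and may not match every syscheck message; the function name is hypothetical.

```python
import re

# Matches lines such as:
# "*   fan: FAILURE:: MAJOR::3000000000000001 -- Server Fan Failure. ..."
# The pattern is inferred from one sample line and is an assumption.
FAILURE_PATTERN = re.compile(
    r"^\*\s*(?P<module>\w+): FAILURE:: (?P<severity>[A-Z]+)::"
    r"(?P<code>[0-9A-Fa-f]{16}) -- (?P<text>.*)$"
)


def find_reported_alarms(syscheck_output):
    """Return (module, severity, alarm code) for each alarm line found."""
    alarms = []
    for line in syscheck_output.splitlines():
        match = FAILURE_PATTERN.match(line.strip())
        if match:
            alarms.append((match["module"], match["severity"], match["code"]))
    return alarms


sample = """Running modules in class hardware...
         fan: Checking Status of Server Fans.
*         fan: FAILURE:: MAJOR::3000000000000001 -- Server Fan Failure. This test uses the leaky bucket algorithm.
*         fan: FAILURE:: Fan RPM is too low, fana: 0, CHIP: FAN
One or more module in class "hardware" FAILED
"""
print(find_reported_alarms(sample))
```

Only the line that carries a severity and a 16-character code is returned; the detail line about fan RPM is skipped.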
3000000000000002 - Server Internal Disk Error

This alarm indicates that the server is experiencing issues replicating data to one or more of its mirrored disk drives. This could indicate that one of the server disks has failed or is approaching failure.

Recovery

  1. Run syscheck in Verbose mode.
  2. Call the Customer Care Center and provide the system health check output.
3000000000000008 - Server Platform Error

This alarm indicates a major platform error such as a corrupt system configuration or missing files, or indicates that syscheck itself is corrupt.

Recovery

  1. Run syscheck in Verbose mode.
  2. Call the Customer Care Center and provide the system health check output.
3000000000000010 - Server File System Error

This alarm indicates that syscheck was unsuccessful in writing to at least one of the server file systems.

Recovery

  1. Call the Customer Care Center for assistance.
3000000000000020 - Server Platform Process Error

This alarm indicates that either the minimum number of instances of a required process is not currently running or too many instances of a required process are running.

Recovery

  1. Contact the Customer Care Center for recovery procedures.
3000000000000080 - Server Swap Space Shortage Failure

This alarm indicates that the server’s swap space is in danger of being depleted. This is usually caused by a process that has allocated a very large amount of memory over time.

Note:

In order for this alarm to clear, the underlying failure condition must be consistently undetected for a number of polling intervals. Therefore, the alarm may continue to be reported for several minutes after corrective actions are completed.

Recovery

  1. Call the Customer Care Center for assistance.
3000000000000100 - Server provisioning network error

Alarm Type: TPD

Description: This alarm indicates that the connection between the server’s eth01 Ethernet interface and the customer network is not functioning properly. The eth01 interface is at the upper right port on the rear of the server on the EAGLE backplane.

Note:

The interface identified as eth01 on the hardware is identified as eth91 by the software (in syscheck output, for example).

Severity: Major

OID: TpdProvNetworkErrorNotify 1.3.6.1.4.1.323.5.3.18.3.1.2.9

Alarm ID: TKSPLATMA93000000000000100

Recovery

  1. Check the physical network connectivity between the LSMS and the NAS.
  2. Contact the Customer Care Center.
3000000000001000 - Server Disk Space Shortage Error

This alarm indicates that one of the following conditions has occurred:

  • A file system has exceeded a failure threshold, which means that more than 90% of the available disk storage has been used on the file system.

  • More than 90% of the total number of available files have been allocated on the file system.

  • A file system has a different number of blocks than it had when installed.
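
The first two conditions can be checked by hand with a short Python sketch. It assumes the 90% figure stated above and uses the standard os.statvfs call. The way syscheck itself computes usage is not documented here, so results may differ slightly from what syscheck reports. The function names are hypothetical, and the third condition (a changed block count) is not covered.

```python
import os

FAILURE_THRESHOLD_PERCENT = 90.0  # from the alarm description above


def percent_used(total, free):
    """Percentage of a resource in use; 0.0 when the total is zero."""
    if total == 0:
        return 0.0
    return 100.0 * (total - free) / total


def exceeds_failure_threshold(total, free):
    """True when more than 90% of the resource has been used."""
    return percent_used(total, free) > FAILURE_THRESHOLD_PERCENT


def check_filesystem(mount_point):
    """Report whether disk blocks or file entries exceed the threshold."""
    stats = os.statvfs(mount_point)
    return {
        "space_exceeded": exceeds_failure_threshold(stats.f_blocks, stats.f_bfree),
        "files_exceeded": exceeds_failure_threshold(stats.f_files, stats.f_ffree),
    }


if __name__ == "__main__":
    print(check_filesystem("/var/TKLC/lsms/free"))
```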

Recovery

  1. Run syscheck.
  2. Examine the syscheck output to determine if the file system /var/TKLC/lsms/free is low on space. If it is, continue to the next step; otherwise go to Step 4.
  3. If possible, recover space on the free partition by deleting unnecessary files:
    1. Log in to the server generating the alarm as the root user:

      Login: root

      Password:<Enter root password>

    2. Change to the /var/TKLC/lsms/free directory: # cd /var/TKLC/lsms/free
    3. Confirm that you are in the /var/TKLC/lsms/free directory: # pwd (the output should be /var/TKLC/lsms/free)
    4. If the pwd output is not /var/TKLC/lsms/free, go back to Sub-step 2.
    5. List the files to be deleted and delete them using the rm command.
    6. Re-run syscheck.
    If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next Step.
  4. If the file system mounted on /var/TKLC/lsms/logs is the file system that syscheck is reporting to be low on space, execute the following steps:
    1. Log in to the server generating the alarm as the root user:

      Login: root

      Password:<Enter root password>

    2. Change to the /var/TKLC/lsms/logs directory: # cd /var/TKLC/lsms/logs
    3. Confirm that you are in the /var/TKLC/lsms/logs directory: # pwd (the output should be /var/TKLC/lsms/logs)
    4. If the pwd output is not /var/TKLC/lsms/logs, go back to Sub-step 2.
    5. Look for files with names matching logs_(hostname)_(date/timestamp).tar, where (hostname) is the server’s hostname and (date/timestamp) is any date or timestamp:
      # ls logs_`hostname`_*.tar
      Any files listed may be safely deleted. For each file listed in the ls output, execute an rm command, where <filename> is the name of the file to be deleted:
      # rm <filename>
    6. Re-run syscheck.
      If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next Step.
  5. Core files can occupy a large amount of disk space and may be the cause of this alarm. To collect and remove any core files from the server:
    1. Log in to the server generating the alarm as the root user:

      Login: root

      Password:<Enter root password>

    2. Change directory to /var/TKLC/core and list the core files. # cd /var/TKLC/core # ls -l

    Note:

    The ls command shown above will list any core files found and then compresses and renames the file, adding a ".gz" extension. If any core files are found, transfer them off the system and save them for examination by Oracle Engineering. Once a copy of a compressed file has been saved, it is safe to delete it from the server.

    3. Re-run syscheck.
      If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next Step.
  6. Execute the following Sub-steps if the file system reported by syscheck is /tmp, otherwise skip to Step 7:
    1. Log in to the server generating the alarm as the root user:

      Login: root

      Password:<Enter root password>

    2. Change to the /tmp directory: # cd /tmp
    3. Confirm that you are in the /tmp directory: # pwd (the output should be /tmp)
    4. If the pwd output is not /tmp, go back to Sub-step 2.
    5. Look for possible candidates for deletion: # ls *.iso *.bz2 *.gz *.tar *.tgz *.zip
    6. If any deletable files exist, the output of the ls will show them. For each of the files listed, execute the rm command to delete the file: # rm <filename>
    7. Run syscheck
      If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to Step 4.
    8. Upon a reboot, the system will clean the /tmp directory.
      To reboot the system issue the # shutdown -r now command.
    9. Re-run syscheck
      If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next Step.
  7. Execute the following steps if the file system reported by syscheck is /var, otherwise skip to Step 10:
    1. Log in to the server generating the alarm as the root user:

      Login: root

      Password:<Enter root password>

    2. Change to the /var/tmp directory: # cd /var/tmp
    3. Confirm that you are in the /var/tmp directory: # pwd (the output should be /var/tmp)
    4. If the pwd output is not /var/tmp, go back to Sub-step 2.
    5. Since all files in this directory can be safely deleted, execute the rm command to delete all files from the directory: # rm -i *
    6. Re-run syscheck
      If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to Step 10.
  8. Execute the following steps if the file system reported by syscheck is /var/TKLC, otherwise skip to Step 10.
    1. Log in to the server generating the alarm as the root user:

      Login: root

      Password:<Enter root password>

    2. Change to the /var/TKLC/upgrade directory: # cd /var/TKLC/upgrade
    3. Confirm that you are in the /var/TKLC/upgrade directory: # pwd (the output should be /var/TKLC/upgrade)
    4. If the pwd output is not /var/TKLC/upgrade, go back to Sub-step 2.
    5. Since all files in this directory can be safely deleted, execute the rm command to delete all files from the directory: # rm -i *
    6. Re-run syscheck
      If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to Step 10.
  9. For any other file system, execute the following command, where <mountpoint> is the file system’s mount point: # find <mountpoint> -type f -exec du -k {} \; | sort -nr > /tmp/file_sizes.txt
    This will produce a list of files in the given file system sorted by file size in the file /tmp/file_sizes.txt.

    Note:

    The find command above may take a few minutes to complete if the given mountpoint contains many files. Do not delete any files unless you are certain that they are not needed. Continue to Step 10.
  10. Run savelogs to gather all application logs (see Saving Logs Using the LSMS GUI or Command Line).
  11. Contact the Customer Care Center.
3000000000002000 - Server Default Route Network Error

This alarm indicates that the default network route of the server is experiencing a problem. Running syscheck in Verbose mode provides information about the type of problem.

Caution:

When changing the network routing configuration of the server, verify that the modifications will not impact the method of connectivity for the current login session. The route information must be entered correctly and set to the correct values. Incorrectly modifying the routing configuration of the server may result in total loss of remote network access.

Recovery

  1. Run syscheck in Verbose mode.

    The output should indicate one of the following errors:

    • The default router at <IP_address> cannot be pinged.

      This error indicates that the router may not be operating or is unreachable. If the syscheck Verbose output returns this error, go to the next Step.

    • The default route is not on the provisioning network.

      This error indicates that the default route has been defined in the wrong network. If the syscheck Verbose output returns this error, go to Step 3.

    • An active route cannot be found for a configured default route.

      This error indicates that a mismatch exists between the active configuration and the stored configuration. If the syscheck Verbose output returns this error, go to Step 4.

    Note:

    If the syscheck Verbose output does not indicate one of the errors above, go to step 5.
  2. Perform the following substeps when syscheck Verbose output indicates:
    
    The default router at <IP_address> cannot be pinged
    
    1. Verify that the network cables are firmly attached to the server, network switch, router, Ethernet switch or hub, and any other connection points.
    2. Verify that the configured router is functioning properly.

      Request that the network administrator verify the router is powered on and routing traffic as required.

    3. Request that the router administrator verify that the router is configured to reply to pings on that interface.
    4. Run syscheck.
      • If the alarm is cleared, the problem is resolved and this procedure is complete.
      • If the alarm is not cleared, go to step 5.
  3. Perform Network Reconfiguration from the Command Line using the su - lsmsmgr command. Update the default router.
  4. Contact the Customer Care Center for further assistance. Provide the syscheck output collected in the previous steps.
3000000000004000 - Server Temperature Error

Alarm Type: TPD

Description: The internal temperature within the server is unacceptably high.

Severity: Major

OID: TpdTemperatureErrorNotify 1.3.6.1.4.1.323.5.3.18.3.1.2.15

Alarm ID: TKSPLATMA153000000000004000

Recovery

  1. Ensure that nothing is blocking the fan's intake. Remove any blockage.
  2. Verify that the room temperature is within the normal range shown in the following table. If it is too hot, lower the temperature in the room to an acceptable level.

    Table B-180 Server Environmental Conditions

    Ambient Temperature

    Operating: 5 degrees C to 40 degrees C

    Exceptional Operating Limit: 0 degrees C to 50 degrees C

    Storage: -20 degrees C to 60 degrees C

    Ambient Temperature

    Operating: 5° C to 35° C

    Storage: -20° C to 60° C

    Relative Humidity

    Operating: 5% to 85% non-condensing

    Storage: 5% to 95% non-condensing

    Elevation

    Operating: -300m to +300m

    Storage: -300m to +1200m

    Heating, Ventilation, and Air Conditioning

    Capacity must compensate for up to 5100 BTUs/hr for each installed frame.

    Calculate HVAC capacity as follows:

    Determine the wattage of the installed equipment. Use the formula: watts x 3.412 = BTUs/hr

    Note:

    Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. The alarm may take up to five minutes to clear after conditions improve. It may take about ten minutes after the room returns to an acceptable temperature before syscheck shows the alarm cleared.
  3. Verify that the temperature in the room is normal. If it is too hot, lower the temperature in the room to an acceptable level.

    Note:

    Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. It may take about ten minutes after the room returns to an acceptable temperature before the alarm clears.
  4. Run syscheck to check whether the alarm has cleared.
    • If the alarm has been cleared, the problem is resolved.
    • If the alarm has not been cleared, continue with the next step.
  5. Run syscheck to check whether the alarm has cleared.
    • If the alarm has been cleared, the problem is resolved.
    • If the alarm has not been cleared, continue with the next step.
  6. Replace the filter (refer to the appropriate hardware manual).

    Note:

    Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. The alarm may take up to five minutes to clear after conditions improve. It may take about ten minutes after the filter is replaced before syscheck shows the alarm cleared.
  7. Run syscheck.
    • If the alarm has been cleared, the problem is resolved.
    • If the alarm has not been cleared, continue with the next step.
  8. If the problem has not been resolved, contact the Customer Care Center.
3000000000008000 - Server Mainboard Voltage Error

This alarm indicates that at least one monitored voltage on the server mainboard is not within the normal operating range.

Recovery

  1. Contact the Customer Care Center for assistance.
3000000000010000 - Server Power Feed Error

This alarm indicates that one of the power feeds to the server has failed.

Recovery

  1. Locate the server supplied by the faulty power feed. Verify that all connections to the power supply units are connected securely. To determine where the cables connect to the servers, see the Power Connections and Cables page of the NAS on LSMS E5-APP-B Interconnect.
  2. Run syscheck.
    1. If the alarm is cleared, the problem is resolved.
    2. If the alarm is not cleared, go to the next step.
  3. Trace the power feed to its connection on the power source.
    Verify that the power source is on and that the power feed is properly secured.
  4. Run syscheck.
    1. If the alarm is cleared, the problem is resolved.
    2. If the alarm is not cleared, go to the next step.
  5. If the power source is functioning properly and all connections are secure, request that an electrician check the voltage on the power feed.
  6. Run syscheck.
    1. If the alarm is cleared, the problem is resolved.
    2. If the alarm is not cleared, go to the next step.
  7. If the problem is not resolved, call the Customer Care Center for assistance.
  8. Run savelogs_plat to gather system information for further troubleshooting (see Saving Logs Using the LSMS GUI or Command Line), and contact the Customer Care Center.
3000000000020000 - Server Disk Health Test Error

This alarm indicates that the hard drive has failed or failure is imminent.

Recovery

  1. Immediately contact the Customer Care Center for assistance with a disk replacement.
3000000000040000 - Server Disk Unavailable Error

This alarm indicates that the smartd service is not able to read the disk status because the disk has other problems that are reported by other alarms. This alarm appears only while a server is booting.

Recovery

  1. Perform the recovery procedures for the other alarms that accompany this alarm.
3000000000080000 - Device Error

This alarm indicates that the offboard storage server has a problem with its disk volume filling.

Recovery

  1. Call the Customer Care Center for assistance.
3000000000100000 - Device Interface Error

This alarm indicates that the IP bond is either not configured or not functioning.

Recovery

  1. Call the Customer Care Center for assistance.
3000000400000000 - Multipath device access link problem

Alarm Type: TPD

Description: One or more "access paths" of a multipath device are failing or are not healthy, or the multipath device does not exist.

Severity: Major

OID: TpdMpathDeviceProblemNotify 1.3.6.1.4.1.323.5.3.18.3.1.2.35

Alarm ID: TKSPLATMA353000000400000000

Recovery

  1. The Customer Care Center should do the following:
    1. Check in the MSA administration console (web-application) that correct "volumes" on MSA exist, and read/write access is granted to the blade server.
    2. Check if multipath daemon/service is running on the blade server: service multipathd status. Resolution:
      1. start multipathd: service multipathd start
    3. Check output of "multipath -ll": it shows all multipath devices existing in the system and their access paths; check that particular /dev/sdX devices exist. If they do not, the SCSI bus and/or FC HBAs may not have been rescanned to detect new devices. Resolution:
      1. run "/opt/hp/hp_fibreutils/hp_rescan -a",
      2. "echo 1 > /sys/class/fc_host/host*/issue_lip",
      3. "echo '- - -' > /sys/class/scsi_host/host*/scan"
    4. Check if the syscheck::disk::multipath test is configured to monitor the right multipath devices and their access paths: see output of "multipath -ll" and compare it to the "syscheckAdm disk multipath --get --var=MPATH_LINKS" output. Resolution:
      1. configure disk::multipath check correctly.
  2. Contact the Customer Care Center.
3000000800000000 – Switch Link Down Error

This alarm indicates that the switch is reporting that the link is down. The link that is down is reported in the alarm. For example, port 1/1/2 is reported as 1102.
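
The port encoding in this example can be illustrated with a short sketch. It assumes, based only on the single example above, that the first digit is the unit, the second digit is the slot, and the last two digits are the zero-padded port number. The function name decode_switch_port is hypothetical, and other switch models may encode ports differently.

```python
def decode_switch_port(code: str) -> str:
    """Turn an alarm port code such as '1102' into unit/slot/port form.

    Assumed layout (inferred from the 1/1/2 -> 1102 example only):
    digit 1 = unit, digit 2 = slot, digits 3-4 = zero-padded port.
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"unexpected port code: {code!r}")
    unit, slot, port = code[0], code[1], int(code[2:])
    return f"{unit}/{slot}/{port}"


print(decode_switch_port("1102"))  # 1/1/2
```

Rejecting codes that are not exactly four digits keeps the sketch from guessing at layouts that the example does not cover.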

Recovery Procedure:

  1. Verify cabling between the offending port and remote side.
  2. Verify networking on the remote end.
  3. If the problem persists, contact the Customer Care Center to verify port settings on both the server and the switch.
3000001000000000 - Half-open Socket Limit

Alarm Type: TPD

Description: This alarm indicates that the number of half open TCP sockets has reached the major threshold. This problem is caused by a remote system failing to complete the TCP 3-way handshake.
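
As a rough illustration of what a half-open socket looks like on a Linux server, the sketch below counts entries in the SYN_RECV state (state code 03) in text formatted like /proc/net/tcp. How the platform itself counts these sockets, and the threshold it applies, are not stated in this appendix; the function name and the sample rows are invented for illustration.

```python
SYN_RECV = "03"  # Linux state code for a socket waiting to finish the handshake


def count_half_open(proc_net_tcp: str) -> int:
    """Count SYN_RECV entries in text formatted like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # The fourth column ("st") holds the hexadecimal connection state.
        if len(fields) > 3 and fields[3] == SYN_RECV:
            count += 1
    return count


# Made-up sample rows, shortened to the leading columns.
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:0019 00000000:0000 0A 00000000:00000000
   1: 0A00000A:0016 0A000014:C350 03 00000000:00000000
   2: 0A00000A:0016 0A000015:C351 03 00000000:00000000
   3: 0A00000A:0016 0A000016:C352 01 00000000:00000000
"""
print(count_half_open(sample))  # 2
```

On a live server the same function could be given the contents of /proc/net/tcp, read with open("/proc/net/tcp").read().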

Severity: Major

OID: tpdHalfOpenSocketLimit 1.3.6.1.4.1.323.5.3.18.3.1.2.37

Alarm ID: TKSPLATMA37 3000001000000000

Recovery

  1. Contact the Customer Care Center.
3000002000000000 - Flash Program Failure

Alarm Type: TPD

Description: This alarm indicates there was an error while trying to update the firmware flash on the E5-APP-B cards.

Severity: Major

OID: tpdFlashProgramFailure 1.3.6.1.4.1.323.5.3.18.3.1.2.38

Alarm ID: TKSPLATMA38 3000002000000000

Recovery

  1. Contact the Customer Care Center.
3000004000000000 - Serial Mezzanine Unseated

Alarm Type: TPD

Description: This alarm indicates the serial mezzanine board was not properly seated.

Severity: Major

OID: tpdSerialMezzUnseated 1.3.6.1.4.1.323.5.3.18.3.1.2.39

Alarm ID: TKSPLATMA39 3000004000000000

Recovery

  1. Contact the Customer Care Center.
3000000008000000 - Server HA Keepalive Error

This alarm indicates that the heartbeat process has failed to receive a heartbeat packet within the timeout period.

Recovery

  1. Determine if the mate server is currently operating. If the mate server is not operating, attempt to restore it to operation.
  2. Determine if the keepalive interface is operating.
  3. Determine if heartbeat is running (service TKLCha status).
  4. Call the Customer Care Center for assistance.
3000000010000000 - DRBD block device can not be mounted

This alarm indicates that DRBD is not functioning properly on the local server. The DRBD state (disk state, node state, or connection state) indicates a problem.

Recovery

  1. Call the Customer Care Center for assistance.
3000000020000000 - DRBD block device is not being replicated to peer

This alarm indicates that DRBD is not replicating to the peer server. Usually this alarm indicates that DRBD is not connected to the peer server. A DRBD Split Brain may have occurred.

Recovery

  1. Determine if the mate server is currently operating.
  2. Call the Customer Care Center for assistance.
3000000040000000 - DRBD peer needs intervention

This alarm indicates that DRBD is not functioning properly on the peer server. DRBD is connected to the peer server, but the DRBD state on the peer server is either unknown or indicates a problem.

Recovery

  1. Call the Customer Care Center for assistance.

Minor Platform Alarms

Minor platform alarms involve disk space, application processes, RAM, and configuration errors.
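
The alarm numbers and alarm IDs listed in this appendix appear to follow a regular pattern: the leading digit distinguishes major (3) from minor (5) alarms, the remaining 15 hexadecimal digits contain a single set bit, and the number in the alarm ID is the position of that bit counted from 1. The following sketch illustrates this. The pattern is inferred from the values printed in this section, not from a published specification, and the function name alarm_id is hypothetical.

```python
# Assumed mapping of the leading digit, taken from the section headings.
SEVERITY_BY_PREFIX = {"3": "MA", "5": "MI"}  # major, minor


def alarm_id(alarm_number: str) -> str:
    """Derive an ID such as TKSPLATMI8 from a 16-digit alarm number."""
    if len(alarm_number) != 16:
        raise ValueError("expected a 16-digit alarm number")
    severity = SEVERITY_BY_PREFIX[alarm_number[0]]
    mask = int(alarm_number[1:], 16)
    # Exactly one bit must be set for the number to name a single alarm.
    if mask == 0 or mask & (mask - 1):
        raise ValueError("expected exactly one alarm bit to be set")
    return f"TKSPLAT{severity}{mask.bit_length()}"


print(alarm_id("5000000000000080"))  # TKSPLATMI8
print(alarm_id("3000000400000000"))  # TKSPLATMA35
```

Both printed values match the Alarm ID fields given for the Server Temperature Warning and Multipath device access link problem alarms.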

5000000000000001 - Server Disk Space Shortage Warning

This alarm indicates that one of the following conditions has occurred:

  • A file system has exceeded a warning threshold, which means that more than 80% (but less than 90%) of the available disk storage has been used on the file system.

  • More than 80% (but less than 90%) of the total number of available files have been allocated on the file system.
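
The two conditions above can be sketched as a simple range check. This is only an illustration of the stated 80% and 90% boundaries; the function names are hypothetical, and the percentages computed here may differ slightly from what syscheck or df reports (for example, because of reserved blocks).

```python
import os
import shutil


def in_warning_range(percent: float) -> bool:
    """True when usage is above 80% but below 90%, as described above."""
    return 80.0 < percent < 90.0


def space_used_percent(path: str) -> float:
    """Percentage of disk storage in use on the file system holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def files_used_percent(path: str) -> float:
    """Percentage of available files (inodes) allocated on the file system."""
    st = os.statvfs(path)
    if st.f_files == 0:  # some file systems do not report inode totals
        return 0.0
    return 100.0 * (st.f_files - st.f_ffree) / st.f_files


print(in_warning_range(85.0))  # True
print(in_warning_range(92.5))  # False
```

For example, in_warning_range(space_used_percent("/var/TKLC/lsms/free")) would report whether that file system currently falls in the warning range.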

Recovery

  1. Run syscheck.
  2. Examine the syscheck output to determine if the file system /var/TKLC/lsms/free is low on space. If it is, continue to the next step; otherwise go to Step 4.
  3. If possible, recover space on the free partition by deleting unnecessary files:
    1. Log in to the server generating the alarm as the root user:

      Login: root

      Password: <Enter root password>

    2. Change to the /var/TKLC/lsms/free directory: # cd /var/TKLC/lsms/free
    3. Confirm that you are in the /var/TKLC/lsms/free directory by entering # pwd, which should output /var/TKLC/lsms/free.
    4. If the pwd command does not output /var/TKLC/lsms/free, go back to Sub-step 2.
    5. List the files to be deleted and delete them using the rm command.
    6. Re-run syscheck.
    If the alarm is cleared, the problem is solved. If the alarm is not cleared, go to the next step.
  4. Run savelogs to gather all application logs (see Saving Logs Using the LSMS GUI or Command Line).
  5. Contact the Customer Care Center.
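
When deciding which files to remove in Step 3, it can help to see the largest files first. The sketch below only lists files and deletes nothing; the function name largest_files is hypothetical, and which files are safe to delete remains a judgment for the operator.

```python
import os


def largest_files(directory, limit=10):
    """Return up to limit (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:limit]


# Prints nothing if the directory does not exist on this machine.
for size, path in largest_files("/var/TKLC/lsms/free"):
    print(f"{size:>12}  {path}")
```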
5000000000000002 - Server Application Process Error

This alarm indicates that either the minimum number of instances of a required process is not currently running or too many instances of a required process are running.

Recovery

  1. Contact the Customer Care Center.
  2. If a 3000000000000020 - Server Platform Process Error alarm is also present, execute the recovery procedure associated with that alarm before proceeding.
  3. Log in to the LSMS CLI using the root password.
  4. Stop the LSMS application.
  5. Start the LSMS application.
  6. Capture the log files on both LSMSs (see Saving Logs Using the LSMS GUI or Command Line) and contact the Customer Care Center.
5000000000000004 - Server Hardware Configuration Error

This alarm indicates that one or more of the server’s hardware components are not in compliance with proper specifications (refer to Application B Card Hardware and Installation Guide).

Recovery

  1. Run syscheck in verbose mode.
  2. Call the Customer Care Center for assistance.
5000000000000008 - Server RAM Shortage Warning

This alarm indicates one of two conditions:

  • Less memory than the expected amount is installed.
  • The system is swapping pages in and out of physical memory at a fast rate, indicating a possible degradation in system performance.

This alarm may not clear immediately when conditions fall below the alarm threshold. Conditions must be below the alarm threshold consistently for the alarm to clear. The alarm may take up to five minutes to clear after conditions improve.

Recovery

  1. Call the Customer Care Center for assistance.
5000000000000020 - Server Swap Space Shortage Warning

This alarm indicates that the swap space available on the server is less than expected. This is usually caused by a process that has allocated a very large amount of memory over time.

Note:

In order for this alarm to clear, the underlying failure condition must be consistently undetected for a number of polling intervals. Therefore, the alarm may continue to be reported for several minutes after corrective actions are completed.

Recovery

  1. Call the Customer Care Center for assistance.
5000000000000040 - Server Default Router Not Defined

This alarm indicates that the default network route is either not configured or the current configuration contains an invalid IP address or hostname.

Caution:

When changing the server’s network routing configuration it is important to verify that the modifications will not impact the method of connectivity for the current login session. It is also crucial that this information not be entered incorrectly or set to improper values. Incorrectly modifying the server’s routing configuration may result in total loss of remote network access.

Recovery

  1. To define the default router:
    1. Obtain the proper Provisioning Network netmask and the IP address of the appropriate Default Route on the provisioning network. These are maintained by the customer network administrators.
    2. Log in to the LSMS CLI from the lsmspri server with username root and run su - lsmsmgr
    3. Select Network Configuration Menu from the LSMS Configuration Menu.
    4. Select Network Reconfiguration Menu from the Network Configuration Menu. The following warning appears:
      WARNING: This action is service impacting. Are you sure?
    5. Choose yes. This displays the configuration screen. See the Configuration Guide for Initial Configuration information.
    6. Complete the configuration.
    7. Exit from the lsmsmgr menu.
    8. Run syscheck again. If the alarm has not been cleared, go to Sub-step 10.
    9. Run savelogs to gather all application logs.
    10. Contact the Customer Care Center.
5000000000000080 - Server Temperature Warning

Alarm Type: TPD

Description: This alarm indicates that the internal temperature within the server is outside of the normal operating range. A server Fan Failure may also exist along with the Server Temperature Warning.

Severity: Minor

OID: tpdTemperatureWarningNotify 1.3.6.1.4.1.323.5.3.18.3.1.3.8

Alarm ID: TKSPLATMI8 5000000000000080

Recovery

  1. Ensure that nothing is blocking the fan's intake. Remove any blockage.
  2. Verify that the temperature in the room is normal. If it is too hot, lower the temperature in the room to an acceptable level.

    Table B-181 Server Environmental Conditions

    Ambient Temperature

    Operating: 5 degrees C to 40 degrees C

    Exceptional Operating Limit: 0 degrees C to 50 degrees C

    Storage: -20 degrees C to 60 degrees C

    Relative Humidity

    Operating: 5% to 85% non-condensing

    Storage: 5% to 95% non-condensing

    Elevation

    Operating: -300m to +300m

    Storage: -300m to +1200m

    Heating, Ventilation, and Air Conditioning

    Capacity must compensate for up to 5100 BTUs/hr for each installed frame.

    Calculate HVAC capacity as follows:

    Determine the wattage of the installed equipment. Use the formula: watts x 3.413 = BTUs/hr

    Note:

    Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. The alarm may take up to five minutes to clear after conditions improve. It may take about ten minutes after the room returns to an acceptable temperature before syscheck shows the alarm cleared.
  3. Verify that the temperature in the room is normal. If it is too hot, lower the temperature in the room to an acceptable level.

    Note:

    Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. It may take about ten minutes after the room returns to an acceptable temperature before the alarm clears.
  4. Run syscheck to see if the alarm has cleared.
    • If the alarm has been cleared, the problem is resolved.
    • If the alarm has not been cleared, continue with the next step.
  5. Replace the filter (refer to the appropriate hardware manual).

    Note:

    Be prepared to wait the appropriate period of time before continuing with the next step. Conditions need to be below alarm thresholds consistently for the alarm to clear. It may take about ten minutes after the filter is replaced before the alarm clears.
  6. Run syscheck to see if the alarm has cleared.
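
The HVAC sizing calculation in Table B-181 can be worked through as follows. This sketch uses the standard conversion of about 3.413 BTU/hr per watt and treats the 5100 BTUs/hr figure as the heat load that the HVAC capacity is expected to cover for one frame; that interpretation and the function names are assumptions made for illustration.

```python
BTU_PER_HOUR_PER_WATT = 3.413  # approximate standard conversion factor
FRAME_ALLOWANCE_BTU_PER_HOUR = 5100  # per-frame figure from Table B-181


def heat_load_btu_per_hour(watts: float) -> float:
    """Convert the wattage of installed equipment to BTUs/hr."""
    return watts * BTU_PER_HOUR_PER_WATT


def within_frame_allowance(watts: float) -> bool:
    """True when the heat load does not exceed the per-frame figure."""
    return heat_load_btu_per_hour(watts) <= FRAME_ALLOWANCE_BTU_PER_HOUR


print(round(heat_load_btu_per_hour(1000)))  # 3413
print(within_frame_allowance(1000))  # True
print(within_frame_allowance(1600))  # False
```

A frame drawing 1000 watts therefore produces roughly 3413 BTUs/hr, while 1600 watts (about 5461 BTUs/hr) would exceed the 5100 BTUs/hr figure.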
5000000000000100 - Server Core File Detected

This alarm indicates that an application process has failed and debug information is available.

Recovery

  1. Run syscheck in verbose mode.
  2. Run savelogs to gather system information (see Saving Logs Using the LSMS GUI or Command Line).
  3. Contact the Customer Care Center.
5000000000000200 - Server NTP Daemon Not Synchronized

This alarm indicates that the NTP daemon (background process) has been unable to locate a server to provide an acceptable time reference for synchronization.

Severity: Minor

Alarm ID: TKSPLATMI10

Recovery

  1. Contact the Customer Care Center.
5000000000000400 - Server CMOS Battery Voltage Low

This alarm indicates that the CMOS battery voltage is below the expected value. It is an early warning of CMOS battery end-of-life failure, which will cause problems if the server is powered off.

Recovery

  1. Contact the Customer Care Center.
5000000000000800 - Server Disk Self Test Warning

This alarm indicates that a non-fatal disk issue exists, such as a sector that cannot be read.

Recovery

  1. Contact the Customer Care Center.
5000000000001000 - Device Warning

This alarm indicates that either an snmpget cannot be performed on the configured SNMP OID or the returned value failed the specified comparison operation.

Recovery

  1. Run syscheck in Verbose mode.
  2. Call the Customer Care Center for assistance.
5000000000002000 - Device Interface Warning

This alarm can be generated by either an SNMP trap or an IP bond error. If syscheck is configured to receive SNMP traps, this alarm indicates that an SNMP trap was received with the set state. If syscheck is configured for IP bond monitoring, this alarm can mean that a slave device is not operating, a primary device is not active, or syscheck is unable to read bonding information from interface configuration files.

Recovery

  1. Run syscheck in Verbose mode.
  2. Call the Customer Care Center for assistance.
5000000000004000 - Server Reboot Watchdog Initiated

This alarm indicates that the server has been rebooted due to a hardware watchdog.

Recovery

  1. Contact the Customer Care Center.
    This condition should never happen.
5000000000008000 - Server HA Failover Inhibited

This alarm indicates that the server has been inhibited and HA failover is prevented from occurring.

Recovery

  1. Call the Customer Care Center for assistance.
5000000000010000 - Server HA Active To Standby Transition

This alarm indicates that the server is in the process of transitioning HA state from Active to Standby.

Recovery

  1. Call the Customer Care Center for assistance.
5000000000020000 - Server HA Standby To Active Transition

This alarm indicates that the server is in the process of transitioning HA state from Standby to Active.

Recovery

  1. Call the Customer Care Center for assistance.
5000000000040000 - Platform Health Check Failure

This alarm indicates a syscheck configuration error.

Recovery

  1. Call the Customer Care Center for assistance.
5000000000080000 - NTP Offset Check Failure

This alarm indicates that time on the server is outside the acceptable range or offset from the NTP server. The alarm message provides the offset value of the server from the NTP server and the offset limit set for the system by the application.

Alarm Type: TPD

Severity: Minor

Alarm ID: TKSPLATMI20

Recovery

  1. Call the Customer Care Center for assistance.
5000000000100000 - NTP Stratum Check Failure

This alarm indicates that NTP is syncing to a server, but the stratum level of the NTP server is outside the acceptable limit. The alarm message provides the stratum value of the NTP server and the stratum limit set for the system by the application.

Recovery

  1. Call the Customer Care Center for assistance.
5000000020000000 - Server Kernel Dump File Detected

Alarm Type: TPD

Description: This alarm indicates that the kernel has crashed and debug information is available.

Severity: Minor

OID: 1.3.6.1.4.1.323.5.3.18.3.1.3.30

Alarm ID: TKSPLATMI30 5000000020000000

Recovery

  1. Contact the Customer Care Center.
5000000040000000 - TPD Upgrade Failed

Alarm Type: TPD

Description: This alarm indicates that a TPD upgrade has failed.

Severity: Minor

OID: tpdServerUpgradeFailDetectedNotify 1.3.6.1.4.1.323.5.3.18.3.1.3.31

Alarm ID: TKSPLATMI31 5000000040000000

Recovery

  1. Run the following command to clear the alarm.
    /usr/TKLC/plat/bin/alarmMgr --clear TKSPLATMI31
  2. Contact the Customer Care Center.
5000000080000000 - Half Open Socket Warning Limit

Alarm Type: TPD

Description: This alarm indicates that the number of half open TCP sockets has reached the warning threshold. This problem is caused by a remote system failing to complete the TCP 3-way handshake.

Severity: Minor

OID: tpdHalfOpenSocketWarningNotify 1.3.6.1.4.1.323.5.3.18.3.1.3.32

Alarm ID: TKSPLATMI32 5000000080000000

Recovery

  1. Run syscheck.
  2. Contact the Customer Care Center and provide the system health check output.
5000000000200000 - SAS Presence Sensor Missing

This alarm indicates that the server drive sensor is not working.

Recovery

  1. Call the Customer Care Center for assistance with a replacement server.
5000000000400000 - SAS Drive Missing

This alarm indicates that the number of drives configured for this server is not being detected.

Recovery

  1. Call the Customer Care Center to determine if the alarm is caused by a failed drive or failed configuration.
5000000000800000 - DRBD failover busy

This alarm indicates that a DRBD sync is in progress from the peer server to the local server. The local server is not ready to be the primary DRBD node because its data is not current.

Recovery

  1. Wait for approximately 20 minutes, then check if the DRBD sync has completed. A DRBD sync should take no more than 15 minutes to complete.
  2. If the alarm persists longer than this time interval, call the Customer Care Center for assistance.
5000000001000000 - HP disk resync

This alarm indicates that the HP disk subsystem is currently resyncing after a failed or replaced drive, or after another change in the configuration of the HP disk subsystem. The output of the message includes the disk that is resyncing and the percentage complete. This alarm eventually clears after the resync of the disk is completed. The time to clear depends on the size of the disk and the amount of activity on the system.

Recovery

  1. Run syscheck in Verbose mode.
  2. If the percent recovering is not updating, wait at least 5 minutes between subsequent runs of syscheck, then call the Customer Care Center with the syscheck output.

Saving Logs Using the LSMS GUI or Command Line

During some corrective procedures, it may be necessary to provide Oracle Communications with information about the LSMS for help in clearing an alarm. These log files aid the Customer Care Center when troubleshooting the LSMS.

Use the following procedure to save logs using menu selections from the LSMS GUI.

  1. Log in to the User Interface screen of the LSMS GUI (see Starting a Web-Based LSMS GUI Session).
  2. From the menu, select Logs>Capture Logs.

  3. Select the number of days for which you want to capture the logs, as well as the specific logs, and click OK.
  4. To capture logs from the Command Line, enter the following command: /usr/TKLC/plat/sbin/savelogs_plat