8 Critical and Major Alarms Analysis

The following procedure identifies critical and major alarms that should be resolved before proceeding with an upgrade or backout.

Note:

If the 31149 - DB Late Write Nonactive alarm displays at any time during the upgrade, ignore it. This alarm has no effect on functionality.
  1. Log/View all current alarms at the NOAM
    1. Navigate to Alarms & Events, then View Active.
    2. Click Report to generate an Alarms report.
    3. Save the report, print it, or both.
  2. Analyze the Active Alarms Data
    Refer to Table 8-1 and Table 8-2 for the list of alarms.

    Note:

    If any alarm listed in Table 8-1 or Table 8-2 is active in the system, resolve it before starting the upgrade.

    Refer to DSR Alarms and KPIs Reference for in-depth details on a specific alarm.

    Alarms fall into the following two categories.
    • High impact alarms

      It is almost certain that the presence of an alarm ID from this category in the active alarm list should prevent the upgrade from continuing. Resolve alarms of this category before upgrading.

    • Medium impact alarms

      It is possible that the presence of an alarm ID from this category should prevent the upgrade from continuing; concurrence is needed. Alarms of this category may or may not need to be resolved before upgrading.

    Some of the criteria for placing alarms in these categories are as follows.
    • Any alarm indicating an actual hardware error, or an impending or potential hardware error, is automatically included in the high impact alarm list. This category includes all Platform Group alarms (PLAT) of severity Minor, Major, and Critical.
    • If an alarm ID indicates a pending resource exhaustion issue or another threshold-crossed condition, it is almost always included in the medium impact alarm list. Resource exhaustion conditions must be fixed before upgrading.
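The two categories and the note about alarm 31149 can be sketched as a small triage helper. This is only an illustration under stated assumptions: the ID sets are short excerpts of Table 8-1 and Table 8-2, the names `triage` and `upgrade_decision` are hypothetical, and the active alarm IDs are assumed to have already been extracted from the saved report as strings (the report format itself is not described here).

```python
from typing import Dict, Iterable, List

# Excerpts only (assumption): a real check would load every row of
# Table 8-1 and Table 8-2. IDs are strings because some IDs in
# Table 8-2 are hyphenated, for example 8000-001.
HIGH_IMPACT = {"10000", "10134", "31101", "31117", "32113"}
MEDIUM_IMPACT = {"5101", "10003", "19800", "22200", "31100", "8000-001"}

# The procedure says to ignore this alarm if it displays during the upgrade.
IGNORED = {"31149"}


def triage(active_alarm_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group active alarm IDs by their upgrade impact category."""
    groups: Dict[str, List[str]] = {
        "high": [], "medium": [], "ignored": [], "other": [],
    }
    for raw in active_alarm_ids:
        alarm_id = raw.strip()
        if alarm_id in IGNORED:
            groups["ignored"].append(alarm_id)
        elif alarm_id in HIGH_IMPACT:
            groups["high"].append(alarm_id)
        elif alarm_id in MEDIUM_IMPACT:
            groups["medium"].append(alarm_id)
        else:
            groups["other"].append(alarm_id)
    return groups


def upgrade_decision(groups: Dict[str, List[str]]) -> str:
    """Map the grouped alarms to the action each category calls for."""
    if groups["high"]:
        return "resolve before upgrade"
    if groups["medium"]:
        return "concurrence needed"
    return "no listed alarms active"
```

For example, `upgrade_decision(triage(["31149", "22200"]))` yields `"concurrence needed"`, because the only listed alarm that remains after ignoring 31149 is a medium impact one.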

    Table 8-1 High Impact Alarms

    Alarm ID Name
    5010 Unknown Linux iptables command error
    5011 System or platform error prohibiting operation
    10000 Incompatible database version
    10134 Server Upgrade Failed
    10200 Remote database initialization in progress
    19217 Node isolated - all links down
    19805 Communication Agent Failed to Align Connection
    19855 Communication Agent Resource Has Multiple Actives
    19901 CFG-DB Validation Error
    19902 CFG-DB Update Failure
    19903 CFG-DB post-update Error
    19904 CFG-DB post-update Failure
    22223 MpMemCongested
    22950 Connection Status Inconsistency Exists
    22961 Insufficient Memory for Feature Set
    22733 SBR Failed to Free Binding Memory After PCRF Pooling Binding Migration
    22734 Policy and Charging Unexpected Stack Event Version
    25500 No DA-MP Leader Detected
    25510 Multiple DA-MP Leader Detected
    31101 Database replication to slave failure
    31116 Excessive shared memory
    31117 Low disk free
    31125 Database durability degraded
    31128 ADIC Found Error
    31133 DB Replication Switchover Exceeds Threshold
    31215 Process resources exceeded
    31288 HA Site Configuration Error
    32100 Breaker Panel Feed Unavailable
    32101 Breaker Panel Breaker Failure
    32102 Breaker Panel Monitoring Failure
    32103 Power Feed Unavailable
    32104 Power Supply 1 Failure
    32105 Power Supply 2 Failure
    32106 Power Supply 3 Failure
    32107 Raid Feed Unavailable
    32108 Raid Power 1 Failure
    32109 Raid Power 2 Failure
    32110 Raid Power 3 Failure
    32111 Device Failure
    32112 Device Interface Failure
    32113 Uncorrectable ECC memory error
    32114 SNMP get failure
    32115 TPD NTP Daemon Not Synchronized Failure
    32116 TPD Server's Time Has Gone Backwards
    32117 TPD NTP Offset Check Failure
    32300 Server fan failure
    32301 Server internal disk error
    32302 Server RAID disk error
    32303 Server Platform error
    32304 Server file system error
    32305 Server Platform process error
    32306 Server RAM shortage error
    32307 Server swap space shortage failure
    32308 Server provisioning network error
    32309 Eagle Network A Error
    32310 Eagle Network B Error
    32311 Sync Network Error
    32312 Server disk space shortage error
    32313 Server default route network error
    32314 Server temperature error
    32315 Server mainboard voltage error
    32316 Server power feed error
    32317 Server disk health test error
    32318 Server disk unavailable error
    32319 Device error
    32320 Device interface error
    32321 Correctable ECC memory error
    32322 Power Supply A error
    32323 Power Supply B error
    32324 Breaker panel feed error
    32325 Breaker panel breaker error
    32326 Breaker panel monitoring error
    32327 Server HA Keep alive error
    32328 DRBD is unavailable
    32329 DRBD is not replicating
    32330 DRBD peer problem
    32331 HP disk problem
    32332 HP Smart Array controller problem
    32333 HP hpacucliStatus utility problem
    32334 Multipath device access link problem
    32335 Switch link down error
    32336 Half Open Socket Limit
    32337 Flash Program Failure
    32338 Serial Mezzanine Unseated
    32339 TPD Max Number Of Running Processes Error
    32340 TPD NTP Daemon Not Synchronized Error
    32341 TPD NTP Daemon Not Synchronized Error
    32342 NTP Offset Check Error
    32343 TPD RAID disk
    32344 TPD RAID controller problem
    32345 Server Upgrade snapshot(s) invalid
    32346 OEM hardware management service reports an error
    32347 The hwmgmtcliStatus daemon needs intervention
    32348 FIPS subsystem problem
    32349 File Tampering
    32350 Security Process Terminated
    32500 Server disk space shortage warning
    32501 Server application process error
    32502 Server hardware configuration error
    32503 Server RAM shortage warning
    32504 Software Configuration Error
    32505 Server swap space shortage warning
    32506 Server default router not defined
    32507 Server temperature warning
    32508 Server core file detected
    32509 Server NTP Daemon not synchronized
    32510 CMOS battery voltage low
    32511 Server disk self-test warning
    32512 Device warning
    32513 Device interface warning
    32514 Server reboot watchdog initiated
    32515 Server HA failover inhibited
    32516 Server HA Active to Standby transition
    32517 Server HA Standby to Active transition
    32518 Platform Health Check failure
    32519 NTP Offset Check failure
    32520 NTP Stratum Check failure
    32521 SAS Presence Sensor Missing
    32522 SAS Drive Missing
    32523 DRBD failover busy
    32524 HP disk resync
    32525 Telco Fan Warning
    32526 Telco Temperature Warning
    32527 Telco Power Supply Warning
    32528 Invalid BIOS value
    32529 Server Kernel Dump File Detected
    32530 TPD Upgrade Failed
    32531 Half Open Socket Warning Limit
    32532 Server Upgrade Pending Accept/Reject
    32533 TPD Max Number Of Running Processes Warning
    32534 TPD NTP Source Is Bad Warning
    32535 TPD RAID disk resync
    32536 TPD Server Upgrade snapshot(s) warning
    32537 FIPS subsystem warning event
    32538 Platform Data Collection Error
    32539 Server Patch Pending Accept/Reject
    32540 CPU Power limit mismatch
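The 32xxx rows of Table 8-1 form three contiguous ID ranges (32100 to 32117, 32300 to 32350, and 32500 to 32540), so they can be matched with a range check rather than an explicit list. The sketch below assumes these rows are the Platform Group alarms mentioned earlier; the names `PLATFORM_RANGES` and `is_platform_alarm` are hypothetical.

```python
# Inclusive ID ranges taken from the 32xxx rows of Table 8-1.
# Treating them as "the platform alarms" is an assumption of this sketch.
PLATFORM_RANGES = [(32100, 32117), (32300, 32350), (32500, 32540)]


def is_platform_alarm(alarm_id: str) -> bool:
    """Return True if alarm_id falls inside one of the ranges above."""
    alarm_id = alarm_id.strip()
    if not alarm_id.isdecimal():
        # Hyphenated IDs such as 8000-001 cannot be in these ranges.
        return False
    value = int(alarm_id)
    return any(low <= value <= high for low, high in PLATFORM_RANGES)
```

A range check keeps the helper short, but it would also match any ID later added inside a range, so an explicit list is the safer choice if the table changes between releases.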

    Table 8-2 Medium Impact Alarms

    Alarm ID Name
    5002 IPFE Address configuration error
    5003 IPFE state sync run error
    5004 IPFE IP tables configuration error
    5006 Error reading from Ethernet device
    5012 Signaling interface heartbeat timeout
    5013 Throttling traffic
    5100 Traffic overload
    5101 CPU Overload
    5102 Disk Becoming Full
    5103 Memory Overload
    10003 Database backup failed
    10006 Database restoration failed
    10020 Backup failure
    10117 Health Check Failed
    10118 Health Check Not Run
    10121 Server Group Upgrade Cancelled - Validation Failed
    10123 Server Group Upgrade Failed
    10131 Server Upgrade Cancelled (Validation Failed)
    10133 Server Upgrade Failed
    10141 Site Upgrade Cancelled (Validation Failed)
    10143 Site Upgrade Failed
    19200 RSP/Destination unavailable
    19202 Linkset unavailable
    19204 Preferred route unavailable
    19246 Local SCCP subsystem prohibited
    19251 Ingress message rate
    19252 PDU buffer pool utilization
    19253 SCCP stack event queue utilization
    19254 M3RL stack event queue utilization
    19255 M3RL network management event queue utilization
    19256 M3UA stack event queue utilization
    19258 SCTP Aggregate Egress queue utilization
    19272 TCAP active dialogue utilization
    19273 TCAP active operation utilization
    19274 TCAP stack event queue utilization
    19276 SCCP Egress Message Rate
    19408 Single Transport Egress-Queue Utilization
    19800 Communication Agent Connection Down
    19803 Communication Agent stack event queue utilization
    19806 Communication Agent CommMessage mempool utilization
    19807 Communication Agent User Data FIFO Queue Utilization
    19808 Communication Agent Connection FIFO Queue utilization
    19818 Communication Agent DataEvent Mempool utilization
    19820 Communication Agent Routed Service Unavailable
    19824 Communication Agent Pending Transaction Utilization
    19825 Communication Agent Transaction Failure Rate
    19827 SMS stack event queue utilization
    19846 Communication Agent Resource Degraded
    19847 Communication Agent Resource Unavailable
    19848 Communication Agent Resource Error
    19860 Communication Agent Configuration Daemon Table Monitoring Failure
    19861 Communication Agent Configuration Daemon Script Failure
    19862 Communication Agent Ingress Stack Event Rate
    19900 Process CPU Utilization
    19905 Measurement Initialization Failure
    19910 Message Discarded at Test Connection
    8000-001 MpEvFsmException_SocketFailure
    8000-002 MpEvFsmException_BindFailure
    8000-003 MpEvFsmException_OptionFailure
    8000-101 MpEvFsmException_ListenFailure
    8002-003 MpEvRxException_CpuCongested
    8002-004 MpEvRxException_SigEvPoolCongested
    8002-006 MpEvRxException_DstMpCongested
    8002-007 MpEvRxException_DrlReqQueueCongested
    8002-008 MpEvRxException_DrlAnsQueueCongested
    8002-009 MpEvRxException_ComAgentCongested
    8002-203 MpEvRxException_RadiusMsgPoolCongested
    8006-101 EvFsmException_SocketFailure
    8011 EcRate
    8013 MpNgnPsStateMismatch
    8200 MpRadiusMsgPoolCongested
    8201 RclRxTaskQueueCongested
    8202 RclItrPoolCongested
    8203 RclTxTaskQueueCongested
    8204 RclEtrPoolCongested
    22016 Peer Node Alarm Aggregation Threshold
    22017 Route List Alarm Aggregation Threshold
    22056 Connection Admin State Inconsistency Exists
    22200 MpCpuCongested
    22201 MpRxAllRate
    22202 MpDiamMsgPoolCongested
    22203 PTR Buffer Pool Utilization
    22204 Request Message Queue Utilization
    22205 Answer Message Queue Utilization
    22206 Reroute Queue Utilization
    22207 DclTxTaskQueueCongested
    22208 DclTxConnQueueCongested
    22214 Message Copy Queue Utilization
    22221 Routing MPS Rate
    22222 Long Timeout PTR Buffer Pool Utilization
    22349 IPFE Connection Alarm Aggregation Threshold
    22350 Fixed Connection Alarm Aggregation Threshold
    22407 Routing attempt failed due to internal database inconsistency failure
    22500 DSR Application Unavailable
    22501 DSR Application Degraded
    22502 DSR Application Request Message Queue Utilization
    22503 DSR Application Answer Message Queue Utilization
    22504 DSR Application Ingress Message Rate
    22607 Routing attempt failed due to DRL queue exhaustion
    22608 Database query could not be sent due to DB congestion
    22609 Database connection exhausted
    22631 FABR DP Response Task Message Queue Utilization
    22632 COM Agent Registration Failure
    22703 Diameter Message Routing Failure Due to Full DRL Queue
    22710 SBR Sessions Threshold Exceeded
    22711 SBR Database Error
    22712 SBR Communication Error
    22717 SBR Alternate Key Creation Failure Rate
    22720 Policy SBR To PCA Response Queue Utilization Threshold Exceeded
    22721 Policy and Charging Server In Congestion
    22722 Policy Binding Sub-resource Unavailable
    22723 Policy and Charging Session Sub-resource Unavailable
    22724 SBR Memory Utilization Threshold Exceeded
    22725 SBR Server In Congestion
    22726 SBR Queue Utilization Threshold Exceeded
    22727 SBR Initialization Failure
    22728 SBR Bindings Threshold Exceeded
    22729 PCRF Not Configured
    22730 Policy and Charging Configuration Error
    22731 Policy and Charging Database Inconsistency
    22732 SBR Process CPU Utilization Threshold Exceeded
    22737 Configuration Database Not Synced
    22740 SBR Reconfiguration Plan Completion Failure
    31100 Database replication fault
    31102 Database replication from master failure
    31103 DB Replication update fault
    31104 DB Replication latency over threshold
    31106 Database merge to parent failure
    31107 Database merge from child failure
    31108 Database merge latency over threshold
    31113 DB replication manually disabled
    31114 DB replication over SOAP has failed
    31118 Database disk store fault
    31121 Low disk free early warning
    31122 Excessive shared memory early warning
    31124 ADIC error
    31126 Audit blocked
    31130 Network health warning
    31131 DB Ousted Throttle Behind
    31134 DB Site Replication To Slave Failure
    31135 DB Site Replication to Master Failure
    31137 DB Site Replication Latency Over Threshold
    31146 DB mastership fault
    31147 DB upsynclog overrun
    31200 Process management fault
    31201 Process not running
    31202 Unkillable zombie process
    31209 Hostname lookup failed
    31217 Network Health Warning
    31220 HA configuration monitor fault
    31221 HA alarm monitor fault
    31222 HA not configured
    31223 HA Heartbeat transmit failure
    31224 HA configuration error
    31225 HA service start failure
    31226 HA availability status degraded
    31228 HA standby offline
    31230 Recent alarm processing fault
    31231 Platform alarm agent fault
    31233 HA Path Down
    31234 Untrusted Time Upon Initialization
    31235 Untrusted Time After Initialization
    31236 HA Link Down
    31282 HA Management Fault
    31283 Lost Communication with server
    31322 HA Configuration Error
    33001 Diameter-to-MAP Service Registration Failure on DA-MP
    33105 Routing Attempt failed due to queue exhaustion
    33120 Policy SBR Binding Sub-Resource Unavailable
    33301 Update Config Data Failure
    33303 U-SBR Event Queue Utilization
    33310 U-SBR Sub-resource Unavailable
    33312 DCA Script Generation Error
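To compare active alarms against these tables programmatically, the rows can be loaded into a lookup keyed by alarm ID. The following is a minimal sketch, assuming the table is available as plain text with one "ID Name" row per line; `parse_alarm_rows` is a hypothetical helper. It accepts hyphenated IDs such as 8000-001 and keeps the first name seen for an ID if a row is repeated.

```python
import re
from typing import Dict

# An ID is a run of digits, optionally followed by a hyphen and more
# digits (for example 8000-001); the rest of the line is the alarm name.
ROW_PATTERN = re.compile(r"^\s*(\d+(?:-\d+)?)\s+(.+?)\s*$")


def parse_alarm_rows(text: str) -> Dict[str, str]:
    """Build an {alarm ID: alarm name} lookup from flattened table rows."""
    alarms: Dict[str, str] = {}
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # captions, the header row, and blank lines
        alarm_id, name = match.groups()
        alarms.setdefault(alarm_id, name)  # first occurrence wins
    return alarms
```

Keeping IDs as strings avoids losing the leading zeros in suffixes such as `-001`.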