8 Critical and Major Alarms Analysis
Note:
If the 31149 - DB Late Write Nonactive alarm displays at any time during the upgrade, ignore it. This alarm has no effect on functionality.
- Log/View all current alarms at the NOAM
- Navigate to Alarms & Events, then View Active.
- Click Report to generate an Alarms report.
- Save the report, print it, or both.
- Analyze the Active Alarms Data
Refer to Table 8-1 and Table 8-2 for the list of alarms.
Note:
If any alarm listed in Table 8-1 or Table 8-2 displays in the system, resolve it before starting the upgrade. Refer to DSR Alarms and KPIs Reference for in-depth details on a specific alarm.
Alarms fall into the following two categories.
- High impact alarms
The presence of an alarm ID from this category in the active alarm list should almost certainly prevent the upgrade from continuing. Resolve alarms of this category before upgrading.
- Medium impact alarms
The presence of an alarm ID from this category may prevent the upgrade from continuing; concurrence is needed. Alarms of this category may or may not need to be resolved before upgrading.
- Any alarm indicating an actual hardware error, or an impending or potential hardware error, is automatically included in the high impact alarm list. This category includes all Platform Group (PLAT) alarms of severity Minor, Major, and Critical.
- If an alarm ID indicates a pending resource exhaustion issue or another threshold-crossed condition, it is almost always included in the medium impact alarm list. Resource exhaustion states must be fixed before upgrading.
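The categorization described above can be sketched as a small triage helper. This is an illustration only, not part of the DSR tooling: the names `classify`, `triage`, and `must_resolve_first` are hypothetical, the ID sets are short excerpts of Table 8-1 and Table 8-2 that would need to be filled in with the full lists, and the Platform alarm range 32100 to 32540 is an assumption taken from the IDs that appear in Table 8-1.

```python
from enum import Enum


class Impact(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    UNLISTED = "unlisted"


# Excerpts only; extend these sets with the full contents of the two tables.
HIGH_IMPACT = {"5010", "10000", "19217", "31101", "31117"}
MEDIUM_IMPACT = {"5101", "10003", "19800", "22200", "31100", "8000-001"}

# Assumption: Platform (PLAT) alarms fall in the 32100-32540 range seen in Table 8-1.
PLAT_RANGE = range(32100, 32541)

# The guide says alarm 31149 (DB Late Write Nonactive) can be ignored during upgrade.
IGNORED = {"31149"}


def classify(alarm_id: str) -> Impact:
    """Map one alarm ID to its impact category."""
    alarm_id = alarm_id.strip()
    if alarm_id in HIGH_IMPACT:
        return Impact.HIGH
    # IDs such as "8000-001" are not purely numeric, so guard before int().
    if alarm_id.isdigit() and int(alarm_id) in PLAT_RANGE:
        return Impact.HIGH
    if alarm_id in MEDIUM_IMPACT:
        return Impact.MEDIUM
    return Impact.UNLISTED


def triage(active_ids):
    """Group active alarm IDs by impact, skipping the ignorable alarm."""
    groups = {impact: [] for impact in Impact}
    for raw in active_ids:
        alarm_id = raw.strip()
        if alarm_id in IGNORED:
            continue
        groups[classify(alarm_id)].append(alarm_id)
    return groups


def must_resolve_first(active_ids):
    """Return the high impact alarm IDs that should be cleared before upgrading."""
    return triage(active_ids)[Impact.HIGH]


if __name__ == "__main__":
    active = ["31149", "32113", "5101", "31101", "99999"]
    for impact, ids in triage(active).items():
        print(impact.value, ids)
```

Medium impact alarms are only grouped here, not treated as automatic blockers, because the guide leaves that decision to concurrence.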
Table 8-1 High Impact Alarms
| Alarm ID | Name |
|---|---|
| 5010 | Unknown Linux iptables command error |
| 5011 | System or platform error prohibiting operation |
| 10000 | Incompatible database version |
| 10134 | Server Upgrade Failed |
| 10200 | Remote database initialization in progress |
| 19217 | Node isolated - all links down |
| 19805 | Communication Agent Failed to Align Connection |
| 19855 | Communication Agent Resource Has Multiple Actives |
| 19901 | CFG-DB Validation Error |
| 19902 | CFG-DB Update Failure |
| 19903 | CFG-DB post-update Error |
| 19904 | CFG-DB post-update Failure |
| 22223 | MpMemCongested |
| 22950 | Connection Status Inconsistency Exists |
| 22961 | Insufficient Memory for Feature Set |
| 22733 | SBR Failed to Free Binding Memory After PCRF Pooling Binding Migration |
| 22734 | Policy and Charging Unexpected Stack Event Version |
| 25500 | No DA-MP Leader Detected |
| 25510 | Multiple DA-MP Leader Detected |
| 31101 | Database replication to slave failure |
| 31116 | Excessive shared memory |
| 31117 | Low disk free |
| 31125 | Database durability degraded |
| 31128 | ADIC Found Error |
| 31133 | DB Replication Switchover Exceeds Threshold |
| 31215 | Process resources exceeded |
| 31288 | HA Site Configuration Error |
| 32100 | Breaker Panel Feed Unavailable |
| 32101 | Breaker Panel Breaker Failure |
| 32102 | Breaker Panel Monitoring Failure |
| 32103 | Power Feed Unavailable |
| 32104 | Power Supply 1 Failure |
| 32105 | Power Supply 2 Failure |
| 32106 | Power Supply 3 Failure |
| 32107 | Raid Feed Unavailable |
| 32108 | Raid Power 1 Failure |
| 32109 | Raid Power 2 Failure |
| 32110 | Raid Power 3 Failure |
| 32111 | Device Failure |
| 32112 | Device Interface Failure |
| 32113 | Uncorrectable ECC memory error |
| 32114 | SNMP get failure |
| 32115 | TPD NTP Daemon Not Synchronized Failure |
| 32116 | TPD Server's Time Has Gone Backwards |
| 32117 | TPD NTP Offset Check Failure |
| 32300 | Server fan failure |
| 32301 | Server internal disk error |
| 32302 | Server RAID disk error |
| 32303 | Server Platform error |
| 32304 | Server file system error |
| 32305 | Server Platform process error |
| 32306 | Server RAM shortage error |
| 32307 | Server swap space shortage failure |
| 32308 | Server provisioning network error |
| 32309 | Eagle Network A Error |
| 32310 | Eagle Network B Error |
| 32311 | Sync Network Error |
| 32312 | Server disk space shortage error |
| 32313 | Server default route network error |
| 32314 | Server temperature error |
| 32315 | Server mainboard voltage error |
| 32316 | Server power feed error |
| 32317 | Server disk health test error |
| 32318 | Server disk unavailable error |
| 32319 | Device error |
| 32320 | Device interface error |
| 32321 | Correctable ECC memory error |
| 32322 | Power Supply A error |
| 32323 | Power Supply B error |
| 32324 | Breaker panel feed error |
| 32325 | Breaker panel breaker error |
| 32326 | Breaker panel monitoring error |
| 32327 | Server HA Keep alive error |
| 32328 | DRBD is unavailable |
| 32329 | DRBD is not replicating |
| 32330 | DRBD peer problem |
| 32331 | HP disk problem |
| 32332 | HP Smart Array controller problem |
| 32333 | HP hpacucliStatus utility problem |
| 32334 | Multipath device access link problem |
| 32335 | Switch link down error |
| 32336 | Half Open Socket Limit |
| 32337 | Flash Program Failure |
| 32338 | Serial Mezzanine Unseated |
| 32339 | TPD Max Number Of Running Processes Error |
| 32340 | TPD NTP Daemon Not Synchronized Error |
| 32341 | TPD NTP Daemon Not Synchronized Error |
| 32342 | NTP Offset Check Error |
| 32343 | TPD RAID disk |
| 32344 | TPD RAID controller problem |
| 32345 | Server Upgrade snapshot(s) invalid |
| 32346 | OEM hardware management service reports an error |
| 32347 | The hwmgmtcliStatus daemon needs intervention |
| 32348 | FIPS subsystem problem |
| 32349 | File Tampering |
| 32350 | Security Process Terminated |
| 32500 | Server disk space shortage warning |
| 32501 | Server application process error |
| 32502 | Server hardware configuration error |
| 32503 | Server RAM shortage warning |
| 32504 | Software Configuration Error |
| 32505 | Server swap space shortage warning |
| 32506 | Server default router not defined |
| 32507 | Server temperature warning |
| 32508 | Server core file detected |
| 32509 | Server NTP Daemon not synchronized |
| 32510 | CMOS battery voltage low |
| 32511 | Server disk self-test warning |
| 32512 | Device warning |
| 32513 | Device interface warning |
| 32514 | Server reboot watchdog initiated |
| 32515 | Server HA failover inhibited |
| 32516 | Server HA Active to Standby transition |
| 32517 | Server HA Standby to Active transition |
| 32518 | Platform Health Check failure |
| 32519 | NTP Offset Check failure |
| 32520 | NTP Stratum Check failure |
| 32521 | SAS Presence Sensor Missing |
| 32522 | SAS Drive Missing |
| 32523 | DRBD failover busy |
| 32524 | HP disk resync |
| 32525 | Telco Fan Warning |
| 32526 | Telco Temperature Warning |
| 32527 | Telco Power Supply Warning |
| 32528 | Invalid BIOS value |
| 32529 | Server Kernel Dump File Detected |
| 32530 | TPD Upgrade Failed |
| 32531 | Half Open Socket Warning Limit |
| 32532 | Server Upgrade Pending Accept/Reject |
| 32533 | TPD Max Number Of Running Processes Warning |
| 32534 | TPD NTP Source Is Bad Warning |
| 32535 | TPD RAID disk resync |
| 32536 | TPD Server Upgrade snapshot(s) warning |
| 32537 | FIPS subsystem warning event |
| 32538 | Platform Data Collection Error |
| 32539 | Server Patch Pending Accept/Reject |
| 32540 | CPU Power limit mismatch |

Table 8-2 Medium Impact Alarms
| Alarm ID | Name |
|---|---|
| 5002 | IPFE Address configuration error |
| 5003 | IPFE state sync run error |
| 5004 | IPFE IP tables configuration error |
| 5006 | Error reading from Ethernet device |
| 5012 | Signaling interface heartbeat timeout |
| 5013 | Throttling traffic |
| 5100 | Traffic overload |
| 5101 | CPU Overload |
| 5102 | Disk Becoming Full |
| 5103 | Memory Overload |
| 10003 | Database backup failed |
| 10006 | Database restoration failed |
| 10020 | Backup failure |
| 10117 | Health Check Failed |
| 10118 | Health Check Not Run |
| 10121 | Server Group Upgrade Cancelled - Validation Failed |
| 10123 | Server Group Upgrade Failed |
| 10131 | Server Upgrade Cancelled (Validation Failed) |
| 10133 | Server Upgrade Failed |
| 10141 | Site Upgrade Cancelled (Validation Failed) |
| 10143 | Site Upgrade Failed |
| 19200 | RSP/Destination unavailable |
| 19202 | Linkset unavailable |
| 19204 | Preferred route unavailable |
| 19246 | Local SCCP subsystem prohibited |
| 19251 | Ingress message rate |
| 19252 | PDU buffer pool utilization |
| 19253 | SCCP stack event queue utilization |
| 19254 | M3RL stack event queue utilization |
| 19255 | M3RL network management event queue utilization |
| 19256 | M3UA stack event queue utilization |
| 19258 | SCTP Aggregate Egress queue utilization |
| 19272 | TCAP active dialogue utilization |
| 19273 | TCAP active operation utilization |
| 19274 | TCAP stack event queue utilization |
| 19276 | SCCP Egress Message Rate |
| 19408 | Single Transport Egress-Queue Utilization |
| 19800 | Communication Agent Connection Down |
| 19803 | Communication Agent stack event queue utilization |
| 19806 | Communication Agent CommMessage mempool utilization |
| 19807 | Communication Agent User Data FIFO Queue Utilization |
| 19808 | Communication Agent Connection FIFO Queue utilization |
| 19818 | Communication Agent DataEvent Mempool utilization |
| 19820 | Communication Agent Routed Service Unavailable |
| 19824 | Communication Agent Pending Transaction Utilization |
| 19825 | Communication Agent Transaction Failure Rate |
| 19827 | SMS stack event queue utilization |
| 19846 | Communication Agent Resource Degraded |
| 19847 | Communication Agent Resource Unavailable |
| 19848 | Communication Agent Resource Error |
| 19860 | Communication Agent Configuration Daemon Table Monitoring Failure |
| 19861 | Communication Agent Configuration Daemon Script Failure |
| 19862 | Communication Agent Ingress Stack Event Rate |
| 19900 | Process CPU Utilization |
| 19905 | Measurement Initialization Failure |
| 19910 | Message Discarded at Test Connection |
| 8000-001 | MpEvFsmException_SocketFailure |
| 8000-002 | MpEvFsmException_BindFailure |
| 8000-003 | MpEvFsmException_OptionFailure |
| 8000-101 | MpEvFsmException_ListenFailure |
| 8002-003 | MpEvRxException_CpuCongested |
| 8002-004 | MpEvRxException_SigEvPoolCongested |
| 8002-006 | MpEvRxException_DstMpCongested |
| 8002-007 | MpEvRxException_DrlReqQueueCongested |
| 8002-008 | MpEvRxException_DrlAnsQueueCongested |
| 8002-009 | MpEvRxException_ComAgentCongested |
| 8002-203 | MpEvRxException_RadiusMsgPoolCongested |
| 8006-101 | EvFsmException_SocketFailure |
| 8011 | EcRate |
| 8013 | MpNgnPsStateMismatch |
| 8200 | MpRadiusMsgPoolCongested |
| 8201 | RclRxTaskQueueCongested |
| 8202 | RclItrPoolCongested |
| 8203 | RclTxTaskQueueCongested |
| 8204 | RclEtrPoolCongested |
| 22016 | Peer Node Alarm Aggregation Threshold |
| 22017 | Route List Alarm Aggregation Threshold |
| 22056 | Connection Admin State Inconsistency Exists |
| 22200 | MpCpuCongested |
| 22201 | MpRxAllRate |
| 22202 | MpDiamMsgPoolCongested |
| 22203 | PTR Buffer Pool Utilization |
| 22204 | Request Message Queue Utilization |
| 22205 | Answer Message Queue Utilization |
| 22206 | Reroute Queue Utilization |
| 22207 | DclTxTaskQueueCongested |
| 22208 | DclTxConnQueueCongested |
| 22214 | Message Copy Queue Utilization |
| 22221 | Routing MPS Rate |
| 22222 | Long Timeout PTR Buffer Pool Utilization |
| 22349 | IPFE Connection Alarm Aggregation Threshold |
| 22350 | Fixed Connection Alarm Aggregation Threshold |
| 22407 | Routing attempt failed due to internal database inconsistency failure |
| 22500 | DSR Application Unavailable |
| 22501 | DSR Application Degraded |
| 22502 | DSR Application Request Message Queue Utilization |
| 22503 | DSR Application Answer Message Queue Utilization |
| 22504 | DSR Application Ingress Message Rate |
| 22607 | Routing attempt failed due to DRL queue exhaustion |
| 22608 | Database query could not be sent due to DB congestion |
| 22609 | Database connection exhausted |
| 22631 | FABR DP Response Task Message Queue Utilization |
| 22632 | COM Agent Registration Failure |
| 22703 | Diameter Message Routing Failure Due to Full DRL Queue |
| 22710 | SBR Sessions Threshold Exceeded |
| 22711 | SBR Database Error |
| 22712 | SBR Communication Error |
| 22717 | SBR Alternate Key Creation Failure Rate |
| 22720 | Policy SBR To PCA Response Queue Utilization Threshold Exceeded |
| 22721 | Policy and Charging Server In Congestion |
| 22722 | Policy Binding Sub-resource Unavailable |
| 22723 | Policy and Charging Session Sub-resource Unavailable |
| 22724 | SBR Memory Utilization Threshold Exceeded |
| 22725 | SBR Server In Congestion |
| 22726 | SBR Queue Utilization Threshold Exceeded |
| 22727 | SBR Initialization Failure |
| 22728 | SBR Bindings Threshold Exceeded |
| 22729 | PCRF Not Configured |
| 22730 | Policy and Charging Configuration Error |
| 22731 | Policy and Charging Database Inconsistency |
| 22732 | SBR Process CPU Utilization Threshold Exceeded |
| 22737 | Configuration Database Not Synced |
| 22740 | SBR Reconfiguration Plan Completion Failure |
| 31100 | Database replication fault |
| 31102 | Database replication from master failure |
| 31103 | DB Replication update fault |
| 31104 | DB Replication latency over threshold |
| 31106 | Database merge to parent failure |
| 31107 | Database merge from child failure |
| 31108 | Database merge latency over threshold |
| 31113 | DB replication manually disabled |
| 31114 | DB replication over SOAP has failed |
| 31118 | Database disk store fault |
| 31121 | Low disk free early warning |
| 31122 | Excessive shared memory early warning |
| 31124 | ADIC error |
| 31126 | Audit blocked |
| 31130 | Network health warning |
| 31131 | DB Ousted Throttle Behind |
| 31134 | DB Site Replication To Slave Failure |
| 31135 | DB Site Replication to Master Failure |
| 31137 | DB Site Replication Latency Over Threshold |
| 31146 | DB mastership fault |
| 31147 | DB upsynclog overrun |
| 31200 | Process management fault |
| 31201 | Process not running |
| 31202 | Unkillable zombie process |
| 31209 | Hostname lookup failed |
| 31217 | Network Health Warning |
| 31220 | HA configuration monitor fault |
| 31221 | HA alarm monitor fault |
| 31222 | HA not configured |
| 31223 | HA Heartbeat transmit failure |
| 31224 | HA configuration error |
| 31225 | HA service start failure |
| 31226 | HA availability status degraded |
| 31228 | HA standby offline |
| 31230 | Recent alarm processing fault |
| 31231 | Platform alarm agent fault |
| 31233 | HA Path Down |
| 31234 | Untrusted Time Upon Initialization |
| 31235 | Untrusted Time After Initialization |
| 31236 | HA Link Down |
| 31282 | HA Management Fault |
| 31283 | Lost Communication with server |
| 31322 | HA Configuration Error |
| 33001 | Diameter-to-MAP Service Registration Failure on DA-MP |
| 33105 | Routing Attempt failed due to queue exhaustion |
| 33120 | Policy SBR Binding Sub-Resource Unavailable |
| 33301 | Update Config Data Failure |
| 33303 | U-SBR Event Queue Utilization |
| 33310 | U-SBR Sub-resource Unavailable |
| 33312 | DCA Script Generation Error |