16 OCNADD Alarms
This section describes the alarms generated by OCNADD.
Alarm Types
The following table lists the OCNADD alarm types and their alarm number ranges:
Table 16-1 Alarm Type
Alarm Type | Reason | Range |
---|---|---|
SECURITY | Security Violation | 1000-1999 |
COMMUNICATION | Communication Failure | 2000-2999 |
QOS | Quality Of Service | 3000-3999 |
PROCESSING_ERROR | Processing Error | 4000-4999 |
OPERATIONAL_ALARMS | Operational Alarms | 5000-5999 |
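The ranges in Table 16-1 can be read as a lookup from alarm number to alarm type. The following is a minimal sketch of that lookup; the names `ALARM_TYPE_RANGES` and `alarm_type_for` are hypothetical and are not part of OCNADD.

```python
from typing import Optional

# Alarm number ranges copied from Table 16-1 (upper bounds are inclusive there,
# so each Python range ends one past the documented maximum).
ALARM_TYPE_RANGES = {
    "SECURITY": range(1000, 2000),
    "COMMUNICATION": range(2000, 3000),
    "QOS": range(3000, 4000),
    "PROCESSING_ERROR": range(4000, 5000),
    "OPERATIONAL_ALARMS": range(5000, 6000),
}


def alarm_type_for(alarm_number: int) -> Optional[str]:
    """Return the alarm type whose range contains alarm_number, or None."""
    for alarm_type, numbers in ALARM_TYPE_RANGES.items():
        if alarm_number in numbers:
            return alarm_type
    return None


if __name__ == "__main__":
    print(alarm_type_for(2003))
    print(alarm_type_for(4002))
```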
Note:
Alarm Purge or Clear Criteria: A raised alarm persists in the database and is cleared or purged when either of the following conditions is met:
- The corresponding service sends a clear alarm request to the alarm service.
- The configured purge alarm timeout expires. The default timeout value is 7 days.
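The two criteria above can be expressed as a single predicate. This is only an illustrative sketch: the function name `should_remove` is hypothetical, and it assumes the purge timeout is measured from the time the alarm was raised, which the text does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# Default purge timeout stated in the note above.
DEFAULT_PURGE_TIMEOUT = timedelta(days=7)


def should_remove(
    raised_at: datetime,
    now: datetime,
    clear_requested: bool,
    purge_timeout: timedelta = DEFAULT_PURGE_TIMEOUT,
) -> bool:
    """Return True when an alarm would be cleared or purged.

    Assumption: the timeout counts from raised_at.
    """
    if clear_requested:
        # The owning service sent a clear alarm request.
        return True
    # Otherwise the alarm is purged once the timeout has elapsed.
    return (now - raised_at) >= purge_timeout


if __name__ == "__main__":
    raised = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(should_remove(raised, raised + timedelta(days=3), clear_requested=False))
    print(should_remove(raised, raised + timedelta(days=8), clear_requested=False))
```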
OCNADD OIDs
OCNADD OIDs are listed below:
OCNADD OID: 1.3.6.1.4.1.323.5.3.53
Table 16-2 OCNADD OID: 1.3.6.1.4.1.323.5.3.53
Name | Value |
---|---|
ocnaddconfiguration | 1.3.6.1.4.1.323.5.3.53.1.20 |
ocnaddgui | 1.3.6.1.4.1.323.5.3.53.1.21 |
ocnaddscpaggregation | 1.3.6.1.4.1.323.5.3.53.1.22 |
ocnaddegw (deprecated) | 1.3.6.1.4.1.323.5.3.53.1.23 |
ocnaddalarm | 1.3.6.1.4.1.323.5.3.53.1.24 |
ocnaddadapter | 1.3.6.1.4.1.323.5.3.53.1.25 |
ocnadduirouter | 1.3.6.1.4.1.323.5.3.53.1.26 |
ocnaddkafka | 1.3.6.1.4.1.323.5.3.53.1.27 |
ocnaddhealthmonitoring | 1.3.6.1.4.1.323.5.3.53.1.28 |
ocnaddsystem | 1.3.6.1.4.1.323.5.3.53.1.29 |
ocnaddadmin | 1.3.6.1.4.1.323.5.3.53.1.30 |
ocnaddnrfaggregation | 1.3.6.1.4.1.323.5.3.53.1.31 |
ocnaddseppaggregation | 1.3.6.1.4.1.323.5.3.53.1.32 |
ocnaddcorrelation | 1.3.6.1.4.1.323.5.3.53.1.33 |
ocnaddfilter | 1.3.6.1.4.1.323.5.3.53.1.34 |
ocnaddredundancyagent | 1.3.6.1.4.1.323.5.3.53.1.35 |
ocnaddingressadapter | 1.3.6.1.4.1.323.5.3.53.1.36 |
ocnaddnonoracleaggregation | 1.3.6.1.4.1.323.5.3.53.1.37 |
ocnaddstorageadapter | 1.3.6.1.4.1.323.5.3.53.1.38 |
ocnaddexport | 1.3.6.1.4.1.323.5.3.53.1.39 |
ocnaddbsfaggregation | 1.3.6.1.4.1.323.5.3.53.1.40 |
ocnaddpcfaggregation | 1.3.6.1.4.1.323.5.3.53.1.41 |
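Every value in Table 16-2 is the OCNADD base OID followed by `.1.` and a per-service number. The sketch below shows one way to build or reverse such OIDs, for example when labelling received SNMP notifications. The helper names are hypothetical, and treating any longer OID under a service OID as belonging to that service is an assumption.

```python
from typing import Optional

OCNADD_BASE_OID = "1.3.6.1.4.1.323.5.3.53"

# Per-service sub-identifiers copied from Table 16-2.
SERVICE_SUBIDS = {
    "ocnaddconfiguration": 20, "ocnaddgui": 21, "ocnaddscpaggregation": 22,
    "ocnaddegw": 23,  # deprecated
    "ocnaddalarm": 24, "ocnaddadapter": 25, "ocnadduirouter": 26,
    "ocnaddkafka": 27, "ocnaddhealthmonitoring": 28, "ocnaddsystem": 29,
    "ocnaddadmin": 30, "ocnaddnrfaggregation": 31, "ocnaddseppaggregation": 32,
    "ocnaddcorrelation": 33, "ocnaddfilter": 34, "ocnaddredundancyagent": 35,
    "ocnaddingressadapter": 36, "ocnaddnonoracleaggregation": 37,
    "ocnaddstorageadapter": 38, "ocnaddexport": 39,
    "ocnaddbsfaggregation": 40, "ocnaddpcfaggregation": 41,
}
_SERVICE_BY_SUBID = {subid: name for name, subid in SERVICE_SUBIDS.items()}


def service_oid(service_name: str) -> str:
    """Build the full OID for a service listed in Table 16-2."""
    return f"{OCNADD_BASE_OID}.1.{SERVICE_SUBIDS[service_name]}"


def service_for_oid(oid: str) -> Optional[str]:
    """Map an OID at or below a service OID back to the service name."""
    prefix = OCNADD_BASE_OID + ".1."
    if not oid.startswith(prefix):
        return None
    first = oid[len(prefix):].split(".", 1)[0]
    if not first.isdigit():
        return None
    return _SERVICE_BY_SUBID.get(int(first))


if __name__ == "__main__":
    print(service_oid("ocnaddalarm"))
    print(service_for_oid("1.3.6.1.4.1.323.5.3.53.1.39"))
```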
Alarm Details
Table 16-3 Alarm Details
Alarm Detail | Description |
---|---|
alarmName | The alarm name is constructed as OCNADDnnnnn (OCNADD followed by a five-digit number). For example, OCNADD01000, where the number is the alarm number for the defined alarm type. |
alarmType | Type of alarm [SECURITY, COMMUNICATION, QOS, PROCESSING_ERROR, OPERATIONAL_ALARMS]. |
alarmSeverity | The severity of alarms as per the alarm cause [CRITICAL, MAJOR, MINOR, WARN, INFO]. |
alarmDescription | The alarm description shall report the specific problem for which the alarm is raised. |
additionalInfo | This is optional and provides additional troubleshooting and recovery steps that the user should perform on the occurrence of an alarm. |
serviceName | Name of the service raising the alarm. |
instance | Instance ID of the POD in which the alarm is raised. |
workerGroup | The worker group, or the management group (that is, in the case of the Export service), in which the affected microservices are present. |
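The fields in Table 16-3 can be modelled as a small record with basic validation. The following is a sketch only: the class name `Alarm`, the validation rules, and the `number` property are assumptions for illustration, not the OCNADD alarm service schema. The allowed types and severities are taken from the alarmType and alarmSeverity rows.

```python
import re
from dataclasses import dataclass
from typing import Optional

# alarmName format from Table 16-3: "OCNADD" followed by a five-digit number.
ALARM_NAME_RE = re.compile(r"^OCNADD(\d{5})$")
ALARM_TYPES = {"SECURITY", "COMMUNICATION", "QOS", "PROCESSING_ERROR",
               "OPERATIONAL_ALARMS"}
SEVERITIES = {"CRITICAL", "MAJOR", "MINOR", "WARN", "INFO"}


@dataclass(frozen=True)
class Alarm:
    alarmName: str
    alarmType: str
    alarmSeverity: str
    alarmDescription: str
    serviceName: str
    instance: str
    workerGroup: str
    additionalInfo: Optional[str] = None  # optional per Table 16-3

    def __post_init__(self) -> None:
        # Reject values that do not fit the documented formats.
        if not ALARM_NAME_RE.match(self.alarmName):
            raise ValueError(f"invalid alarmName: {self.alarmName!r}")
        if self.alarmType not in ALARM_TYPES:
            raise ValueError(f"invalid alarmType: {self.alarmType!r}")
        if self.alarmSeverity not in SEVERITIES:
            raise ValueError(f"invalid alarmSeverity: {self.alarmSeverity!r}")

    @property
    def number(self) -> int:
        """Numeric part of the alarm name, for example 2003."""
        return int(ALARM_NAME_RE.match(self.alarmName).group(1))


if __name__ == "__main__":
    alarm = Alarm("OCNADD02003", "COMMUNICATION", "CRITICAL",
                  "Service is not able to connect to Kafka Broker",
                  "ocnaddadminservice", "pod-0", "wg1")
    print(alarm.number)
```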
Communication Failure Alarms
Table 16-4 Communication Failure Alarms
alarmName | alarmType | alarmSeverity | alarmDescription | additionalInfo | serviceName | instance(POD Instance Id) |
---|---|---|---|---|---|---|
OCNADD02000: Loss of Connection | COMMUNICATION | MAJOR | Raise: Connection could not be established with the service <service_name> Clear: Connection established again for service <service_name> In case of ocnaddredundancyagent, Raise: Connection could not be established with the mate redundancy agent service Clear: Connection established again with the mate service agent ocnaddredundancyagent | In case of ocnaddredundancyagent, the alarm is raised when the configurable number of heartbeats between the mate redundancy agents is missed. Configuration sync is impacted, with possible traffic disruption. | ocnaddhealthmonitoring, ocnaddredundancyagent | - |
OCNADD02001: Loss of Heartbeat | COMMUNICATION | MINOR | Raise: Missing heartbeat from service <service_name> Clear: Heartbeat received from <service_name> In case of ocnaddredundancyagent, Raise: Missing heartbeat from service ocnaddredundancyagent Clear: Heartbeat received from ocnaddredundancyagent | The heartbeat from a service is missed. In case of ocnaddredundancyagent, the heartbeat from the mate site redundancy agent is missed. | ocnaddhealthmonitoring, ocnaddredundancyagent | - |
OCNADD02002: Service Down | COMMUNICATION | MAJOR | Raise: Service <service_name> is down Clear: Service <service_name> is up | The service is not accessible. The configured number of consecutive heartbeats may have been missed, or the service is not connecting after the configured retries. | All the services | Prometheus Alert |
OCNADD02003: Kafka Broker Not Available | COMMUNICATION | CRITICAL | Raise: Service <service_name> is not able to connect to Kafka Broker Clear: Service <service_name> is able to connect to Kafka again | - | ocnaddadminservice | - |
OCNADD02004: Kafka Consumption Paused | COMMUNICATION | MINOR | Raise: Kafka consumption by service <service_name> paused Clear: Kafka consumption by service <service_name> resumed | The service may have experienced connection timeouts or failures from the peer end, applied circuit breaking, and paused consumption from the Kafka topic. | ocnaddadminservice | - |
OCNADD02005: ThirdParty Connection Failure | COMMUNICATION | MAJOR | Raise: Connection to third-party is failed Clear: Connection to third-party is successful | Check connectivity to the third party from the server where the Egress adapter is deployed. | ocnaddconsumeradapter | - |
OCNADD02006: Mate Site Down | COMMUNICATION | MAJOR | Raise: Mate worker group down in the mate site Clear: Mate worker group comes up again in the mate site | <worker group name> down, the traffic is configured in <mode> and NFs should be switched to the secondary site. | ocnaddredundancyagent | - |
OCNADD02007: Database not available | COMMUNICATION | MAJOR | Raise: Database connection goes down. The alarm may also be triggered by the export service when the connection to the XDR database goes down. Clear: Mate worker group comes up again in the mate site. The alarm will also be cleared when the XDR database connection is restored. | "<worker group name>" is down, and the traffic is configured in "<mode>". NFs should be switched to the secondary site. The database connection to the XDR database is affected, triggering an alarm during the export processing. | ocnaddredundancyagent, ocnaddexport | - |
OCNADD02008: Xdr Data Send Not Successful to Database | COMMUNICATION | MAJOR | Raise: The alarm is triggered by the storage adapter service when the XDR cannot be written to the XDR database. Clear: The alarm is cleared when the XDRs are successfully written to the XDR database by the storage adapter service. | The storage adapter service raises this alarm when it encounters any database-related error while writing the XDRs into the XDR database. | ocnaddstorageadapter | - |
OCNADD02009: SFTP service is unreachable | COMMUNICATION | MAJOR | Raise: The alarm is triggered by the export service when the SFTP connection to the third-party server cannot be established or is broken during the transfer of the export file. Clear: The alarm is cleared when the file transfer to the third-party storage server via SFTP is successfully restored. | The user should check the connectivity between the export service and the third-party storage server. | ocnaddexport | - |
Quality of Service Alarms
Table 16-5 Quality of Service Alarms
alarmName | alarmType | alarmSeverity | alarmDescription | additionalInfo | serviceName | instance(POD Instance Id) | Remarks |
---|---|---|---|---|---|---|---|
OCNADD03006: No Data Available | QOS | MINOR | Raise: No Data available on the Kafka Stream Clear: Data received on the Kafka Stream | Check the connectivity between the producer and Kafka, and verify whether the producers are generating data. | ocnaddadminservice | | |
Processing Error Alarms
Table 16-6 Processing Error Alarms
alarmName | alarmType | alarmSeverity | alarmDescription | additionalInfo | serviceName | instance(POD Instance Id) |
---|---|---|---|---|---|---|
OCNADD04000: Out of Memory | PROCESSING_ERROR | MAJOR | Raise: Not enough memory available for service <service_name> Clear: Memory available to service <service_name> | - | All the services | - |
OCNADD04002: CPU Overload | PROCESSING_ERROR | MAJOR | Raise: CPU usage crossed 70% for service <service_name> Clear: CPU usage back to less than 70% for service <service_name> | - | All the services | Prometheus Alert |
OCNADD04004: Storage full | PROCESSING_ERROR | MAJOR | Raise: Storage full for the service <service_name> Clear: Storage available for the service <service_name> | The alarm may be raised by the export service when the third-party SFTP server's storage is full. | ocnaddhealthmonitoring, ocnaddexport | - |
OCNADD04005: Memory overload | PROCESSING_ERROR | MAJOR | Raise: Memory usage crossed 70% for service <service_name> Clear: Memory usage back to less than 70% for service <service_name> | - | All the services | Prometheus Alert |
Operational Alarms
Table 16-7 Operational Alarms
alarmName | alarmType | alarmSeverity | alarmDescription | additionalInfo | serviceName | instance (POD Instance ID) |
---|---|---|---|---|---|---|
OCNADD05001: POD Instance Created | OPERATIONAL_ALARM | INFO | New POD for the service <service_name> created/registered | - | ocnaddhealthmonitoring | - |
OCNADD05002: POD Instance Destroyed | OPERATIONAL_ALARM | INFO | POD for the service <service_name> destroyed/de-registered | - | ocnaddhealthmonitoring | - |
OCNADD05005: Max instances reached | OPERATIONAL_ALARM | INFO | Max instance reached for the service <service_name> | - | ocnaddhealthmonitoring | - |
OCNADD05006: POD Restarted | OPERATIONAL_ALARM | MINOR | Raised by Prometheus when a POD for OCNADD has restarted. | - | All services | Prometheus Alert |
OCNADD05007: Ingress MPS Threshold crossed | OPERATIONAL_ALARM | WARN, MINOR, MAJOR, CRITICAL | The ingress MPS threshold crossed WARN: 80% MINOR: 90% MAJOR: 95% CRITICAL: 100% Clear: The threshold alerts are cleared when the traffic goes back below the set threshold alert values. | - | Kafka Aggregation | Prometheus Alert |
OCNADD05008: Egress MPS Threshold crossed | OPERATIONAL_ALARM | WARN, MINOR, MAJOR, CRITICAL | The egress MPS threshold crossed WARN: 80% MINOR: 90% MAJOR: 95% CRITICAL: 100% Clear: The threshold alerts are cleared when the traffic goes back below the set threshold alert values. | - | ocnaddconsumeradapter | Prometheus Alert |
OCNADD05009: Egress MPS Threshold crossed for a particular consumer application | OPERATIONAL_ALARM | CRITICAL | The egress MPS threshold crossed for a particular consumer CRITICAL: 100% Clear: The threshold alerts are cleared when the traffic goes back below the set threshold alert values. | - | ocnaddconsumeradapter | Prometheus Alert |
OCNADD05010: Average E2E latency threshold crossed | OPERATIONAL_ALARM | WARN, MINOR, MAJOR, CRITICAL | The average E2E latency threshold crossed WARN: 80% MINOR: 90% MAJOR: 95% CRITICAL: 100% Clear: The threshold alerts are cleared when the latency goes back below the set threshold alert values. | - | ocnaddconsumeradapter | Prometheus Alert |
OCNADD05011: Average Ingress Packet Drop rate threshold crossed | OPERATIONAL_ALARM | MAJOR, CRITICAL | The average ingress packet drop rate threshold crossed MAJOR: 1% CRITICAL: 10% Clear: The threshold alerts are cleared when the packet drop rate goes back below the set threshold alert values. | - | Kafka Aggregation | Prometheus Alert |
OCNADD05012: Average Egress failure rate threshold crossed | OPERATIONAL_ALARM | INFO, WARN, MINOR, MAJOR, CRITICAL | The egress failure rate threshold crossed WARN: 1% MINOR: 10% MAJOR: 25% CRITICAL: 50% Clear: The threshold alerts are cleared when the failure rate goes back below the set threshold alert values. | - | ocnaddconsumeradapter | Prometheus Alert |
OCNADD05013: Ingress Traffic spike threshold crossed | OPERATIONAL_ALARM | MAJOR | The ingress traffic spike threshold crossed MAJOR: 10% Clear: The threshold alerts are cleared when the traffic spike goes back below the set threshold alert values. | - | Kafka Aggregation | Prometheus Alert |
OCNADD05014: Topic unavailable:<TopicName> | OPERATIONAL_ALARM | MAJOR | Raise: <TopicName> topic is not available Clear: <TopicName> topic is available | Create the <TopicName> topic in Kafka from the Admin service. | ocnaddconsumeradapter, ocnaddaggregation, ocnaddstorageadapter, ocnaddingressadapter | - |
OCNADD05015: Worker Group Created | OPERATIONAL_ALARM | INFO | New worker group <workerGroup> created | - | ocnaddsystem | - |
OCNADD05016: Worker Group Deleted | OPERATIONAL_ALARM | WARN | Worker group <workerGroup> deleted | - | ocnaddsystem | - |
OCNADD05017: Max Worker Groups Reached | OPERATIONAL_ALARM | WARN | Number of worker groups crossed threshold value =<Threshold Value>% of maximum worker groups supported | - | ocnaddsystem | - |
OCNADD05018: Consumer Feed Configuration Sync Discrepancy | OPERATIONAL_ALARM | MAJOR | Raise: The consumer feed configuration does not match between the mated worker group pair. Clear: The user must manually clear the discrepancy alarm. | The <consumer feed name> does not match for the <parameters> in <worker group name> and <mate worker group name> | ocnaddredundancyagent | - |
OCNADD05019: Kafka Feed Configuration Sync Discrepancy | OPERATIONAL_ALARM | MAJOR | Raise: The Kafka feed configuration does not match between the mated worker group pair. Clear: The user must manually clear the discrepancy alarm. | The <kafka feed name> does not match for the <parameters> in <worker group name> and <mate worker group name> | ocnaddredundancyagent | - |
OCNADD05020: Filter Configuration Sync Discrepancy | OPERATIONAL_ALARM | MAJOR | Raise: The filter configuration does not match between the mated worker group pair. Clear: The user must manually clear the discrepancy alarm. | The <filter name> does not match for the <parameters> in <worker group name> and <mate worker group name> | ocnaddredundancyagent | - |
OCNADD05021: Correlation Configuration Sync Discrepancy | OPERATIONAL_ALARM | MAJOR | Raise: The correlation configuration does not match between the mated worker group pair. Clear: The user must manually clear the discrepancy alarm. | The <correlation config name> does not match for the <parameters> in <worker group name> and <mate worker group name> | ocnaddredundancyagent | - |
OCNADD05022: Mate configuration failure | OPERATIONAL_ALARM | MAJOR | Raise: The mate configuration could not be added between the primary and secondary worker group mated pairs. Clear: The user must manually clear the alarm once the mate site configuration issue is fixed. | <worker group name> and <mate worker group name> mate configuration creation failure | ocnaddredundancyagent | - |
OCNADD05023: Configuration Out of Sync | OPERATIONAL_ALARM | INFO | Raise: The mate configuration is set to Unidirectional and the secondary site configuration is updated. Clear: The user must manually clear the alarm when the configuration is fixed. | <Sync Config Name> <Sync Config Type> configuration in <worker group> worker group may be inconsistent. To verify, run sync from the Redundancy configuration. | ocnaddredundancyagent | - |
OCNADD05024: Same third-party URI for Consumer Feeds | OPERATIONAL_ALARM | MINOR | Raise: Consumer feeds with different names in the primary and secondary sites have the same third-party URI endpoint, so the third party receives more traffic. Clear: The user must manually clear the alarm. | <Consumer Feed Name> and <worker group name> worker group Consumer Feed sync triggered Same third-party URI alarm | ocnaddredundancyagent | - |
OCNADD05025: File Server Credentials unavailable | OPERATIONAL_ALARM | MAJOR | Raise: The alarm is raised by the export service when the credentials of the third-party export server are missing or could not be retrieved. Clear: The alarm is cleared when the credentials are available again. | There may be an issue in fetching the credentials of the third-party server. Verify that communication between all the OCNADD services is working. | ocnaddexport | - |
OCNADD05026: No Data available for export | OPERATIONAL_ALARM | MINOR | Raise: The alarm is raised by the export service when there is no data available for export in the XDR database. Clear: The alarm is cleared when the data is available in the XDR database. | Check the database for the presence of the data, or check the query. | ocnaddexport | - |
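Several threshold alarms in Table 16-7 raise different severities at different percentages. The following sketch illustrates how a measured percentage maps to the highest severity crossed. In OCNADD these alerts are raised by Prometheus, so this is only an illustration of the levels in the table and not the product's alert rules. The names `MPS_LEVELS`, `EGRESS_FAILURE_LEVELS`, and `severity_for` are hypothetical, and treating "crossed" as "greater than or equal to" is an assumption.

```python
from typing import List, Optional, Tuple

# (severity, percent) pairs in ascending order, copied from Table 16-7.
MPS_LEVELS: List[Tuple[str, float]] = [
    ("WARN", 80.0), ("MINOR", 90.0), ("MAJOR", 95.0), ("CRITICAL", 100.0),
]
EGRESS_FAILURE_LEVELS: List[Tuple[str, float]] = [
    ("WARN", 1.0), ("MINOR", 10.0), ("MAJOR", 25.0), ("CRITICAL", 50.0),
]


def severity_for(percent: float,
                 levels: List[Tuple[str, float]]) -> Optional[str]:
    """Return the highest severity whose level is reached, or None.

    Assumption: a level counts as crossed when percent >= level.
    """
    crossed = None
    for severity, level in levels:
        if percent >= level:
            crossed = severity
    return crossed


if __name__ == "__main__":
    print(severity_for(92.0, MPS_LEVELS))
    print(severity_for(12.0, EGRESS_FAILURE_LEVELS))
    print(severity_for(50.0, MPS_LEVELS))
```

A return value of `None` corresponds to the clear condition in the table, where the measured value is back below every configured level.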