16 OCNADD Alarms

This section provides information about all the alarms generated by OCNADD.

Alarm Types

The following table lists the OCNADD alarm types, their reasons, and their number ranges:

Table 16-1 Alarm Type

Alarm Type	Reason	Range
SECURITY	Security Violation	1000-1999
COMMUNICATION	Communication Failure	2000-2999
QOS	Quality Of Service	3000-3999
PROCESSING_ERROR	Processing Error	4000-4999
OPERATIONAL_ALARMS	Operational Alarms	5000-5999
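Because each alarm type owns a fixed block of numbers, the type can be derived from the numeric part of an alarm name. The following Python sketch encodes the ranges from Table 16-1. The function name `alarm_type_for` is hypothetical and is not part of any OCNADD API.

```python
# Alarm type ranges from Table 16-1 (inclusive bounds).
ALARM_TYPE_RANGES = [
    ("SECURITY", 1000, 1999),
    ("COMMUNICATION", 2000, 2999),
    ("QOS", 3000, 3999),
    ("PROCESSING_ERROR", 4000, 4999),
    ("OPERATIONAL_ALARMS", 5000, 5999),
]


def alarm_type_for(number: int) -> str:
    """Return the alarm type whose documented range contains `number` (hypothetical helper)."""
    for alarm_type, low, high in ALARM_TYPE_RANGES:
        if low <= number <= high:
            return alarm_type
    raise ValueError(f"alarm number {number} is outside all documented ranges")
```

For example, `alarm_type_for(2003)` returns `COMMUNICATION`, which matches OCNADD02003 in the Communication Failure Alarms table.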

Note:

Alarm Purge or Clear Criteria:

The raised alarm persists in the database and is cleared or purged when either of the following conditions is met:

The corresponding service sends a clear alarm request to the alarm service.
The alarm is purged after the configured purge alarm timeout expires. The default timeout value is 7 days.
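The two purge or clear criteria can be modeled as a single check. This is a minimal sketch, assuming a clear request is represented as a boolean flag and timestamps are timezone-aware. `should_remove` is a hypothetical helper; the actual alarm service logic and its configuration parameter names are not documented in this section.

```python
from datetime import datetime, timedelta, timezone

# Default purge alarm timeout stated in the documentation (7 days).
DEFAULT_PURGE_TIMEOUT = timedelta(days=7)


def should_remove(raised_at: datetime, now: datetime, clear_requested: bool,
                  purge_timeout: timedelta = DEFAULT_PURGE_TIMEOUT) -> bool:
    """Return True if a persisted alarm should be cleared or purged (hypothetical helper)."""
    if clear_requested:
        # Condition 1: the raising service sent a clear alarm request.
        return True
    # Condition 2: the configured purge timeout has expired (assumed inclusive).
    return now - raised_at >= purge_timeout
```

An alarm raised eight days ago with no clear request is purged under the default timeout, while the same alarm is kept if the timeout is configured to ten days.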

OCNADD OIDs

OCNADD OIDs are listed below:

OCNADD OID: 1.3.6.1.4.1.323.5.3.53

Table 16-2 OCNADD OID: 1.3.6.1.4.1.323.5.3.53

Name	Value
ocnaddconfiguration	1.3.6.1.4.1.323.5.3.53.1.20
ocnaddgui	1.3.6.1.4.1.323.5.3.53.1.21
ocnaddscpaggregation	1.3.6.1.4.1.323.5.3.53.1.22
ocnaddegw (deprecated)	1.3.6.1.4.1.323.5.3.53.1.23
ocnaddalarm	1.3.6.1.4.1.323.5.3.53.1.24
ocnaddadapter	1.3.6.1.4.1.323.5.3.53.1.25
ocnadduirouter	1.3.6.1.4.1.323.5.3.53.1.26
ocnaddkafka	1.3.6.1.4.1.323.5.3.53.1.27
ocnaddhealthmonitoring	1.3.6.1.4.1.323.5.3.53.1.28
ocnaddsystem	1.3.6.1.4.1.323.5.3.53.1.29
ocnaddadmin	1.3.6.1.4.1.323.5.3.53.1.30
ocnaddnrfaggregation	1.3.6.1.4.1.323.5.3.53.1.31
ocnaddseppaggregation	1.3.6.1.4.1.323.5.3.53.1.32
ocnaddcorrelation	1.3.6.1.4.1.323.5.3.53.1.33
ocnaddfilter	1.3.6.1.4.1.323.5.3.53.1.34
ocnaddredundancyagent	1.3.6.1.4.1.323.5.3.53.1.35
ocnaddingressadapter	1.3.6.1.4.1.323.5.3.53.1.36
ocnaddnonoracleaggregation	1.3.6.1.4.1.323.5.3.53.1.37
ocnaddstorageadapter	1.3.6.1.4.1.323.5.3.53.1.38
ocnaddexport	1.3.6.1.4.1.323.5.3.53.1.39
ocnaddbsfaggregation	1.3.6.1.4.1.323.5.3.53.1.40
ocnaddpcfaggregation	1.3.6.1.4.1.323.5.3.53.1.41
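When decoding SNMP traps, it can help to map an OID back to the service that owns it. The sketch below builds the lookup from Table 16-2 and treats any OID that extends a service OID with further arcs as belonging to that service. That prefix rule is an assumption about how downstream tooling might use these values, and `service_for_oid` is a hypothetical name.

```python
from typing import Dict, Optional

OCNADD_BASE_OID = "1.3.6.1.4.1.323.5.3.53"

# Service name -> final arc, from Table 16-2 (full OID = base + ".1." + arc).
SERVICE_ARCS: Dict[str, int] = {
    "ocnaddconfiguration": 20, "ocnaddgui": 21, "ocnaddscpaggregation": 22,
    "ocnaddegw": 23,  # deprecated
    "ocnaddalarm": 24, "ocnaddadapter": 25, "ocnadduirouter": 26,
    "ocnaddkafka": 27, "ocnaddhealthmonitoring": 28, "ocnaddsystem": 29,
    "ocnaddadmin": 30, "ocnaddnrfaggregation": 31, "ocnaddseppaggregation": 32,
    "ocnaddcorrelation": 33, "ocnaddfilter": 34, "ocnaddredundancyagent": 35,
    "ocnaddingressadapter": 36, "ocnaddnonoracleaggregation": 37,
    "ocnaddstorageadapter": 38, "ocnaddexport": 39,
    "ocnaddbsfaggregation": 40, "ocnaddpcfaggregation": 41,
}

OID_TO_SERVICE: Dict[str, str] = {
    f"{OCNADD_BASE_OID}.1.{arc}": name for name, arc in SERVICE_ARCS.items()
}


def service_for_oid(oid: str) -> Optional[str]:
    """Return the service whose OID equals `oid` or is a dotted prefix of it."""
    for service_oid, name in OID_TO_SERVICE.items():
        # Require a "." boundary so an OID ending in .1.200 is not attributed to .1.20.
        if oid == service_oid or oid.startswith(service_oid + "."):
            return name
    return None
```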

Alarm Details

Table 16-3 Alarm Details

Alarm Detail	Description
alarmName	The alarm name is constructed as OCNADDnnnnn (OCNADD followed by a five-digit number), for example, OCNADD01000, where the number is the alarm number within the range defined for the alarm type.
alarmType	Type of alarm [SECURITY, COMMUNICATION, QOS, PROCESSING_ERROR, OPERATIONAL_ALARMS].
alarmSeverity	The severity of alarms as per the alarm cause [CRITICAL, MAJOR, MINOR, WARN, INFO].
alarmDescription	The alarm description shall report the specific problem for which the alarm is raised.
additionalInfo	This is optional and provides additional troubleshooting and recovery steps that the user should perform on the occurrence of an alarm.
serviceName	Name of the service raising the alarm.
instance	Instance ID of the POD in which the alarm is raised.
workerGroup	The worker group, or the management group (as in the case of the export service), in which the affected microservices are present.
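The alarmName format can be checked mechanically. The following Python sketch formats and parses OCNADDnnnnn names and validates the severity against the values listed above. The class `AlarmRecord`, its snake_case field names, and the treatment of every field except additionalInfo as mandatory are assumptions for illustration, not the alarm service's actual schema.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Severities listed for alarmSeverity in Table 16-3.
SEVERITIES = {"CRITICAL", "MAJOR", "MINOR", "WARN", "INFO"}
_ALARM_NAME = re.compile(r"OCNADD(\d{5})")


def format_alarm_name(number: int) -> str:
    """Build 'OCNADD' followed by the alarm number zero-padded to five digits."""
    if not 0 <= number <= 99999:
        raise ValueError(f"alarm number {number} does not fit in five digits")
    return f"OCNADD{number:05d}"


def parse_alarm_name(name: str) -> int:
    """Return the numeric part of an OCNADDnnnnn alarm name."""
    match = _ALARM_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an OCNADDnnnnn alarm name: {name!r}")
    return int(match.group(1))


@dataclass
class AlarmRecord:
    """Hypothetical container mirroring the fields of Table 16-3."""
    alarm_name: str
    alarm_type: str
    alarm_severity: str
    alarm_description: str
    service_name: str
    instance: str
    worker_group: str
    additional_info: Optional[str] = None  # optional per Table 16-3

    def __post_init__(self) -> None:
        parse_alarm_name(self.alarm_name)  # raises ValueError on a malformed name
        if self.alarm_severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.alarm_severity}")
```

For example, `format_alarm_name(1000)` returns `OCNADD01000`, and `parse_alarm_name("OCNADD02003")` returns `2003`.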

Communication Failure Alarms

Table 16-4 Communication Failure Alarms

alarmName	alarmType	alarmSeverity	alarmDescription	additionalInfo	serviceName	instance (POD Instance ID)
OCNADD02000: Loss of Connection	COMMUNICATION	MAJOR	Raise: Connection could not be established with the service <service_name>. Clear: Connection established again with the service <service_name>. In case of ocnaddredundancyagent: Raise: Connection could not be established with the mate redundancy agent service. Clear: Connection established again with the mate redundancy agent service (ocnaddredundancyagent).	In case of ocnaddredundancyagent, the alarm is raised when the configured number of heartbeats between the mate redundancy agents is missed. Configuration sync is impacted, and traffic may be disrupted.	ocnaddhealthmonitoring, ocnaddredundancyagent	-
OCNADD02001: Loss of Heartbeat	COMMUNICATION	MINOR	Raise: Missing heartbeat from service <service_name>. Clear: Heartbeat received from <service_name>. In case of ocnaddredundancyagent: Raise: Missing heartbeat from service ocnaddredundancyagent. Clear: Heartbeat received from ocnaddredundancyagent.	The heartbeat from a service is missed. In case of ocnaddredundancyagent, the heartbeat from the mate site redundancy agent is missed.	ocnaddhealthmonitoring, ocnaddredundancyagent	-
OCNADD02002: Service Down	COMMUNICATION	MAJOR	Raise: Service <service_name> is down. Clear: Service <service_name> is up.	The service is not accessible. The configured number of consecutive heartbeats may have been missed, or the service could not connect after the configured number of retries.	All the services	Prometheus Alert
OCNADD02003: Kafka Broker Not Available	COMMUNICATION	CRITICAL	Raise: Service <service_name> is not able to connect to the Kafka broker. Clear: Service <service_name> is able to connect to Kafka again.		ocnaddadminservice	-
OCNADD02004: Kafka Consumption Paused	COMMUNICATION	MINOR	Raise: Kafka consumption by service <service_name> paused. Clear: Kafka consumption by service <service_name> resumed.	The service may have experienced connection timeouts or failures from the peer end, applied circuit breaking, and paused consumption from the Kafka topic.	ocnaddadminservice	-
OCNADD02005: ThirdParty Connection Failure	COMMUNICATION	MAJOR	Raise: Connection to the third party failed. Clear: Connection to the third party is successful.	Check connectivity to the third party from the server where the egress adapter is deployed.	ocnaddconsumeradapter	-
OCNADD02006: Mate Site Down	COMMUNICATION	MAJOR	Raise: The mate worker group is down in the mate site. Clear: The mate worker group comes up again in the mate site.	<worker group name> is down, the traffic is configured in <mode>, and NFs should be switched to the secondary site.	ocnaddredundancyagent	-
OCNADD02007: Database not available	COMMUNICATION	MAJOR	Raise: The database connection goes down. The alarm may also be triggered by the export service when the connection to the XDR database goes down. Clear: The mate worker group comes up again in the mate site. The alarm is also cleared when the XDR database connection is restored.	<worker group name> is down, and the traffic is configured in <mode>. NFs should be switched to the secondary site. The connection to the XDR database is affected, triggering the alarm during export processing.	ocnaddredundancyagent, ocnaddexport	-
OCNADD02008: Xdr Data Send Not Successful to Database	COMMUNICATION	MAJOR	Raise: The alarm is triggered by the storage adapter service when XDRs cannot be written to the XDR database. Clear: The alarm is cleared when the storage adapter service successfully writes the XDRs to the XDR database.	The storage adapter service raises this alarm when it encounters any database-related error while writing XDRs to the XDR database.	ocnaddstorageadapter	-
OCNADD02009: SFTP service is unreachable	COMMUNICATION	MAJOR	Raise: The alarm is triggered by the export service when the SFTP connection with the third-party server cannot be established or breaks during the transfer of the export file. Clear: The alarm is cleared when file transfer to the third-party storage server via SFTP is restored.	Check the connectivity between the export service and the third-party storage server.	ocnaddexport	-
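Several alarms above (OCNADD02000, OCNADD02002) are raised after a configured number of heartbeats is missed and cleared once communication resumes. The following Python sketch models that raise/clear cycle as a consecutive-miss counter. It is an illustration only: `HeartbeatMonitor` and `max_missed` are hypothetical names, and the real heartbeat interval and threshold settings are not specified in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical sketch: raise after `max_missed` consecutive misses, clear on the next heartbeat."""
    max_missed: int  # stands in for the configurable number of missed heartbeats
    missed: int = 0
    alarm_raised: bool = False

    def on_interval(self, heartbeat_received: bool) -> Optional[str]:
        """Process one heartbeat interval; return "RAISE", "CLEAR", or None."""
        if heartbeat_received:
            self.missed = 0
            if self.alarm_raised:
                self.alarm_raised = False
                return "CLEAR"
            return None
        self.missed += 1
        if not self.alarm_raised and self.missed >= self.max_missed:
            self.alarm_raised = True
            return "RAISE"
        return None
```

With `max_missed=3`, the third consecutive missed interval returns `RAISE`, further misses return `None` because the alarm is already active, and the next received heartbeat returns `CLEAR`.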

Quality of Service Alarms

Table 16-5 Quality of Service Alarms

alarmName	alarmType	alarmSeverity	alarmDescription	additionalInfo	serviceName	instance (POD Instance ID)	Remarks
OCNADD03006: No Data Available	QOS	MINOR	Raise: No data available on the Kafka stream. Clear: Data received on the Kafka stream.	Check the connectivity between the producers and Kafka, and verify whether the producers are generating data.	ocnaddadminservice

Processing Error Alarms

Table 16-6 Processing Error Alarms

alarmName	alarmType	alarmSeverity	alarmDescription	additionalInfo	serviceName	instance (POD Instance ID)
OCNADD04000: Out of Memory	PROCESSING_ERROR	MAJOR	Raise: Not enough memory available for service <service_name>. Clear: Memory available to service <service_name>.	-	All the services	-
OCNADD04002: CPU Overload	PROCESSING_ERROR	MAJOR	Raise: CPU usage crossed 70% for service <service_name>. Clear: CPU usage back below 70% for service <service_name>.	-	All the services	Prometheus Alert
OCNADD04004: Storage full	PROCESSING_ERROR	MAJOR	Raise: Storage full for the service <service_name>. Clear: Storage available for the service <service_name>.	The alarm may be raised by the export service when the third-party SFTP server's storage is full.	ocnaddhealthmonitoring, ocnaddexport	-
OCNADD04005: Memory overload	PROCESSING_ERROR	MAJOR	Raise: Memory usage crossed 70% for service <service_name>. Clear: Memory usage back below 70% for service <service_name>.	-	All the services	Prometheus Alert

Operational Alarms

Table 16-7 Operational Alarms

alarmName	alarmType	alarmSeverity	alarmDescription	additionalInfo	serviceName	instance (POD Instance ID)
OCNADD05001: POD Instance Created	OPERATIONAL_ALARM	INFO	New POD for the service <service_name> created/registered	-	ocnaddhealthmonitoring	-
OCNADD05002: POD Instance Destroyed	OPERATIONAL_ALARM	INFO	POD for the service <service_name> destroyed/deregistered	-	ocnaddhealthmonitoring	-
OCNADD05005: Max instances reached	OPERATIONAL_ALARM	INFO	Maximum number of instances reached for the service <service_name>	-	ocnaddhealthmonitoring	-
OCNADD05006: POD Restarted	OPERATIONAL_ALARM	MINOR	Raised by Prometheus when an OCNADD POD restarts.	-	All services	Prometheus Alert
OCNADD05007: Ingress MPS Threshold crossed	OPERATIONAL_ALARM	WARN, MINOR, MAJOR, CRITICAL	The ingress MPS threshold is crossed (WARN: 80%, MINOR: 90%, MAJOR: 95%, CRITICAL: 100%). Clear: The threshold alerts are cleared when the traffic falls below the set threshold alert values.	-	Kafka Aggregation	Prometheus Alert
OCNADD05008: Egress MPS Threshold crossed	OPERATIONAL_ALARM	WARN, MINOR, MAJOR, CRITICAL	The egress MPS threshold is crossed (WARN: 80%, MINOR: 90%, MAJOR: 95%, CRITICAL: 100%). Clear: The threshold alerts are cleared when the traffic falls below the set threshold alert values.	-	ocnaddconsumeradapter	Prometheus Alert
OCNADD05009: Egress MPS Threshold crossed for a particular consumer application	OPERATIONAL_ALARM	CRITICAL	The egress MPS threshold is crossed for a particular consumer (CRITICAL: 100%). Clear: The threshold alerts are cleared when the traffic falls below the set threshold alert values.	-	ocnaddconsumeradapter	Prometheus Alert
OCNADD05010: Average E2E latency threshold crossed	OPERATIONAL_ALARM	WARN, MINOR, MAJOR, CRITICAL	The average E2E latency threshold is crossed (WARN: 80%, MINOR: 90%, MAJOR: 95%, CRITICAL: 100%). Clear: The threshold alerts are cleared when the latency falls below the set threshold alert values.	-	ocnaddconsumeradapter	Prometheus Alert
OCNADD05011: Average Ingress Packet Drop rate threshold crossed	OPERATIONAL_ALARM	MAJOR, CRITICAL	The average ingress packet drop rate threshold is crossed (MAJOR: 1%, CRITICAL: 10%). Clear: The threshold alerts are cleared when the packet drop rate falls below the set threshold alert values.	-	Kafka Aggregation	Prometheus Alert
OCNADD05012: Average Egress failure rate threshold crossed	OPERATIONAL_ALARM	INFO, WARN, MINOR, MAJOR, CRITICAL	The egress failure rate threshold is crossed (WARN: 1%, MINOR: 10%, MAJOR: 25%, CRITICAL: 50%). Clear: The threshold alerts are cleared when the failure rate falls below the set threshold alert values.	-	ocnaddconsumeradapter	Prometheus Alert
OCNADD05013: Ingress Traffic spike threshold crossed	OPERATIONAL_ALARM	MAJOR	The ingress traffic spike threshold is crossed (MAJOR: 10%). Clear: The threshold alerts are cleared when the traffic spike falls below the set threshold alert values.	-	Kafka Aggregation	Prometheus Alert
OCNADD05014: Topic unavailable:<TopicName>	OPERATIONAL_ALARM	MAJOR	Raise: The <TopicName> topic is not available. Clear: The <TopicName> topic is available.	Create the <TopicName> topic in Kafka from the Admin service.	ocnaddconsumeradapter, ocnaddaggregation, ocnaddstorageadapter, ocnaddingressadapter	-
OCNADD05015: Worker Group Created	OPERATIONAL_ALARM	INFO	New worker group <workerGroup> created	-	ocnaddsystem	-
OCNADD05016: Worker Group Deleted	OPERATIONAL_ALARM	WARN	Worker group <workerGroup> deleted	-	ocnaddsystem	-
OCNADD05017: Max Worker Groups Reached	OPERATIONAL_ALARM	WARN	The number of worker groups crossed the threshold value of <Threshold Value>% of the maximum number of worker groups supported.	-	ocnaddsystem	-
OCNADD05018: Consumer Feed Configuration Sync Discrepancy	OPERATIONAL_ALARM	MAJOR	Raise: The consumer feed configuration does not match between the mated worker group pair. Clear: The user must manually clear the discrepancy alarm.	The <consumer feed name> does not match for the <parameters> in <worker group name> and <mate worker group name>.	ocnaddredundancyagent	-
OCNADD05019: Kafka Feed Configuration Sync Discrepancy	OPERATIONAL_ALARM	MAJOR	Raise: The Kafka feed configuration does not match between the mated worker group pair. Clear: The user must manually clear the discrepancy alarm.	The <kafka feed name> does not match for the <parameters> in <worker group name> and <mate worker group name>.	ocnaddredundancyagent	-
OCNADD05020: Filter Configuration Sync Discrepancy	OPERATIONAL_ALARM	MAJOR	Raise: The filter configuration does not match between the mated worker group pair. Clear: The user must manually clear the discrepancy alarm.	The <filter name> does not match for the <parameters> in <worker group name> and <mate worker group name>.	ocnaddredundancyagent	-
OCNADD05021: Correlation Configuration Sync Discrepancy	OPERATIONAL_ALARM	MAJOR	Raise: The correlation configuration does not match between the mated worker group pair. Clear: The user must manually clear the discrepancy alarm.	The <correlation config name> does not match for the <parameters> in <worker group name> and <mate worker group name>.	ocnaddredundancyagent	-
OCNADD05022: Mate configuration failure	OPERATIONAL_ALARM	MAJOR	Raise: The mate configuration could not be added between the primary and secondary worker group mated pair. Clear: The user must manually clear the alarm after the mate site configuration issue is fixed.	Mate configuration creation failed for <worker group name> and <mate worker group name>.	ocnaddredundancyagent	-
OCNADD05023: Configuration Out of Sync	OPERATIONAL_ALARM	INFO	Raise: The mate configuration is set to Unidirectional and the secondary site configuration is updated. Clear: The user must manually clear the alarm when the configuration is fixed.	The <Sync Config Name> <Sync Config Type> configuration in the <worker group> worker group may be inconsistent. To verify, run sync from the Redundancy configuration.	ocnaddredundancyagent	-
OCNADD05024: Same third-party URI for Consumer Feeds	OPERATIONAL_ALARM	MINOR	Raise: Consumer feeds with different names in the primary and secondary sites have the same third-party URI endpoint, so the third party receives more traffic. Clear: The user must manually clear the alarm.	Consumer feed sync for <Consumer Feed Name> and the <worker group name> worker group triggered the Same third-party URI alarm.	ocnaddredundancyagent	-
OCNADD05025: File Server Credentials unavailable	OPERATIONAL_ALARM	MAJOR	Raise: The alarm is raised by the export service when the credentials of the third-party export server are missing or could not be retrieved. Clear: The alarm is cleared when the credentials are available again.	There may be an issue fetching the credentials of the third-party server. Ensure that communication between all the OCNADD services is working.	ocnaddexport	-
OCNADD05026: No Data available for export	OPERATIONAL_ALARM	MINOR	Raise: The alarm is raised by the export service when no data is available for export in the XDR database. Clear: The alarm is cleared when data is available in the XDR database.	Check the database for the presence of the data, or check the query.	ocnaddexport	-
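Several operational alarms map a measured percentage to one of several severities. The following Python sketch picks the highest severity whose threshold has been reached, using the levels listed for OCNADD05007, OCNADD05011, and OCNADD05012. It assumes a level is reached when the value is greater than or equal to its threshold and that the input is already expressed as a percentage; the actual Prometheus alert expressions are not reproduced here, and `severity_for` is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple

# (threshold percentage, severity) pairs in ascending order, from Table 16-7.
INGRESS_MPS_LEVELS = [(80.0, "WARN"), (90.0, "MINOR"), (95.0, "MAJOR"), (100.0, "CRITICAL")]
INGRESS_DROP_RATE_LEVELS = [(1.0, "MAJOR"), (10.0, "CRITICAL")]
EGRESS_FAILURE_RATE_LEVELS = [(1.0, "WARN"), (10.0, "MINOR"), (25.0, "MAJOR"), (50.0, "CRITICAL")]


def severity_for(percent: float, levels: Sequence[Tuple[float, str]]) -> Optional[str]:
    """Return the highest severity whose threshold `percent` has reached, or None if cleared."""
    severity = None
    for threshold, level in levels:
        if percent >= threshold:  # assumption: reaching the threshold counts as crossing it
            severity = level
    return severity
```

For instance, an ingress rate at 96% of the threshold maps to MAJOR, and a value below 80% returns `None`, which corresponds to the alert being cleared.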