OCNADD Alarms

Alarm Types

The following table lists the OCNADD alarm types and their alarm number ranges:

Table 9-1 Alarm Type

Alarm Type	Reason	Range
SECURITY	Security Violation	1000-1999
COMMUNICATION	Communication Failure	2000-2999
QOS	Quality Of Service	3000-3999
PROCESSING_ERROR	Processing Error	4000-4999
OPERATIONAL_ALARMS	Operational Alarms	5000-5999
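
The ranges in Table 9-1 can be used to derive the alarm type from an alarm number. The following Python sketch is illustrative only: the names ALARM_RANGES and alarm_type_for are hypothetical and are not part of OCNADD.

```python
# Illustrative lookup built from Table 9-1; these names are hypothetical.
ALARM_RANGES = {
    "SECURITY": range(1000, 2000),
    "COMMUNICATION": range(2000, 3000),
    "QOS": range(3000, 4000),
    "PROCESSING_ERROR": range(4000, 5000),
    "OPERATIONAL_ALARMS": range(5000, 6000),
}


def alarm_type_for(alarm_number: int) -> str:
    """Return the alarm type whose range contains alarm_number."""
    for alarm_type, numbers in ALARM_RANGES.items():
        if alarm_number in numbers:
            return alarm_type
    raise ValueError(f"alarm number {alarm_number} is outside all defined ranges")
```

For example, alarm number 2003 falls in the 2000-2999 range, so the sketch reports it as a COMMUNICATION alarm.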

Note:

Alarm Purge or Clear Criteria:

The raised alarm persists in the database and is cleared or purged when either of the following conditions is met:

The corresponding service sends a clear alarm request to the alarm service.
The alarm is purged after the configured alarm purge timeout expires. The default timeout value is 7 days.
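
The two criteria above can be expressed as a single check. The following Python sketch is an assumption about how such a check might look, not the alarm service implementation: the function name should_remove, its parameters, and the choice to treat an alarm as expired exactly at the timeout are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Default purge timeout from the note above; the actual setting name is not given here.
DEFAULT_PURGE_TIMEOUT = timedelta(days=7)


def should_remove(raised_at, now, clear_requested, purge_timeout=DEFAULT_PURGE_TIMEOUT):
    """Return True when the alarm was cleared or its purge timeout has expired."""
    if clear_requested:
        # Criterion 1: the owning service sent a clear alarm request.
        return True
    # Criterion 2: the alarm is at least as old as the purge timeout (assumed inclusive).
    return now - raised_at >= purge_timeout


raised = datetime(2024, 1, 1, tzinfo=timezone.utc)
six_days_later = raised + timedelta(days=6)
seven_days_later = raised + timedelta(days=7)
```

With the default timeout, an uncleared alarm checked at six_days_later is kept, while the same alarm checked at seven_days_later is purged.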

OCNADD OIDs

OCNADD OIDs are listed below:

OCNADD OID: 1.3.6.1.4.1.323.5.3.51

Table 9-2 OCNADD OID

Name	Value
ocnaddconfiguration	1.3.6.1.4.1.323.5.3.51.20
ocnaddscpaggregation, ocnaddnrfaggregation	1.3.6.1.4.1.323.5.3.51.22
ocnaddalarm	1.3.6.1.4.1.323.5.3.51.24
<appname>-adapter	1.3.6.1.4.1.323.5.3.51.25
ocnaddgui	1.3.6.1.4.1.323.5.3.51
ocnadduirouter	1.3.6.1.4.1.323.5.3.51
ocnaddkafka	1.3.6.1.4.1.323.5.3.51.27
ocnaddhealthmonitoring	1.3.6.1.4.1.323.5.3.51.28
ocnaddsystem	1.3.6.1.4.1.323.5.3.51.29
ocnaddadmin	1.3.6.1.4.1.323.5.3.51.30

Alarm Details

Table 9-3 Alarm Information

Alarm Detail	Description
alarmName	The alarm name is constructed as OCNADDnnnnn (OCNADD followed by a five-digit number), for example, OCNADD01000, where the number is the alarm number for the defined alarm type.
alarmType	Type of alarm [SECURITY, COMMUNICATION, QOS, PROCESSING_ERROR, OPERATIONAL_ALARMS]
alarmSeverity	Severity of alarms as per the alarm cause [CRITICAL, MAJOR, MINOR, WARN, INFO]
alarmDescription	The alarm description shall report the specific problem for which the alarm is raised
additionalInfo	Optional field that provides additional troubleshooting and recovery steps that the user should perform when the alarm occurs
serviceName	Name of the service that raises the alarm
instance	Instance Id of the POD in which the alarm is raised
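
The fields in Table 9-3 map naturally onto a simple record type. The following Python sketch is illustrative: the Alarm class and the build_alarm_name helper are hypothetical names, and only the field names and the OCNADDnnnnn format come from the table.

```python
from dataclasses import dataclass
from typing import Optional


def build_alarm_name(alarm_number: int) -> str:
    """Format an alarm number as OCNADD followed by five zero-padded digits."""
    if not 0 <= alarm_number <= 99999:
        raise ValueError("alarm number must fit in five digits")
    return f"OCNADD{alarm_number:05d}"


@dataclass
class Alarm:
    """Hypothetical container mirroring the fields of Table 9-3."""
    alarmName: str
    alarmType: str
    alarmSeverity: str
    alarmDescription: str
    serviceName: str
    instance: str
    additionalInfo: Optional[str] = None  # optional per Table 9-3


example = Alarm(
    alarmName=build_alarm_name(2001),
    alarmType="COMMUNICATION",
    alarmSeverity="MINOR",
    alarmDescription="Missing heartbeat from service <service_name>",
    serviceName="ocnaddhealthmonitoring",
    instance="<pod_instance_id>",  # placeholder, not a real instance Id
)
```

Here build_alarm_name(1000) yields OCNADD01000, which matches the example given for alarmName in the table.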

Communication Failure Alarms

Table 9-4 Communication Failure Alarms

alarmName	alarmType	alarmSeverity	alarmDescription	additionalInfo	serviceName	instance(POD Instance Id)
OCNADD02000: Loss of Connection	COMMUNICATION	MAJOR	Raise: Connection could not be established with the service <service_name> Clear: Connection Established again for service <service_name>		ocnaddhealthmonitoring
OCNADD02001: Loss of Heartbeat	COMMUNICATION	MINOR	Raise: Missing heartbeat from service <service_name> Clear: Heartbeat received from <service_name>	The heartbeat from a service is missed	ocnaddhealthmonitoring
OCNADD02002: Service Down	COMMUNICATION	MAJOR	Raise: Service <service_name> is down Clear: Service <service_name> is up	The service is not accessible. The configured number of consecutive heartbeats may have been missed, or the service is not connected after the configured number of retries.	All the services	Prometheus Alert
OCNADD02003: Kafka Broker Not Available	COMMUNICATION	CRITICAL	Raise: Service <service_name> is not able to connect to Kafka Broker Clear: Service <service_name> is able to connect to Kafka again		ocnaddadminservice
OCNADD02004: Kafka Consumption Paused	COMMUNICATION	MINOR	Raise: Kafka consumption by service <service_name> paused Clear: Kafka consumption by service <service_name> resumed	The service may have experienced connection timeouts or failures from the peer end, applied circuit breaking, and paused consumption from the Kafka topic.	ocnaddadminservice
OCNADD02005: ThirdParty Connection Failure	COMMUNICATION	MAJOR	Raise: Connection to third party is failed Clear: Connection to third party is successful	Check connectivity to the third party from the server where the Egress adapter is deployed.	ocnaddconsumeradapter

Quality of Service Alarms

Table 9-5 Quality of Service Alarms

alarmName	alarmType	alarmSeverity	alarmDescription	additionalInfo	serviceName	instance(POD Instance Id)	Remarks
OCNADD03006: No Data Available	QOS	MINOR	Raise: No Data available on the Kafka Stream Clear: Data received on the Kafka Stream	Check the connectivity between the producer and Kafka, and verify whether the producers are generating data.	ocnaddadminservice

Processing Error Alarms

Table 9-6 Processing Error Alarms

alarmName	alarmType	alarmSeverity	alarmDescription	serviceName	Remarks
OCNADD04000: Out of Memory	PROCESSING_ERROR	MAJOR	Raise: Not enough memory available for service <service_name> Clear: Memory Available to service <service_name>	All the services
OCNADD04002: CPU Overload	PROCESSING_ERROR	MAJOR	Raise: CPU usage crossed 70% for service <service_name> Clear: CPU usage back to less than 70% for service <service_name>	All the services	Prometheus Alert
OCNADD04004: Storage full	PROCESSING_ERROR	MAJOR	Raise: Storage full for the service <service_name> Clear: Storage available for the service <service_name>	ocnaddhealthmonitoring
OCNADD04005: Memory overload	PROCESSING_ERROR	MAJOR	Raise: Memory usage crossed 70% for service <service_name> Clear: Memory usage back to less than 70% for service <service_name>	All the services	Prometheus Alert

Operational Alarms

Table 9-7 Operational Alarms

alarmName	alarmType	alarmSeverity	alarmDescription	additionalInfo	serviceName	Remarks
OCNADD05001: POD Instance Created	OPERATIONAL_ALARM	INFO	New POD for the service <service_name> created or registered		ocnaddhealthmonitoring
OCNADD05002: POD Instance Destroyed	OPERATIONAL_ALARM	INFO	POD for the service <service_name> destroyed or de-registered		ocnaddhealthmonitoring
OCNADD05005: Max instances reached	OPERATIONAL_ALARM	INFO	Max instance reached for the service <service_name>		ocnaddhealthmonitoring
OCNADD05006: POD Restarted	OPERATIONAL_ALARM	MINOR	Raised by Prometheus when a POD for OCNADD has restarted		All services	Prometheus Alert
OCNADD05007: Ingress MPS Threshold crossed	OPERATIONAL_ALARM	WARN, MINOR, MAJOR, CRITICAL	The ingress MPS threshold crossed WARN: 80%, MINOR: 90%, MAJOR: 95%, and CRITICAL: 100%. The threshold alerts are cleared when the traffic drops back below the set threshold values.		Kafka Aggregation	Prometheus Alert
OCNADD05008: Egress MPS Threshold crossed	OPERATIONAL_ALARM	WARN, MINOR, MAJOR, CRITICAL	The egress MPS threshold crossed WARN: 80%, MINOR: 90%, MAJOR: 95%, and CRITICAL: 100%. The threshold alerts are cleared when the traffic drops back below the set threshold values.		ocnaddconsumeradapter	Prometheus Alert
OCNADD05009: Egress MPS Threshold crossed for a particular consumer application	OPERATIONAL_ALARM	CRITICAL	The egress MPS threshold crossed for a particular consumer CRITICAL: 100%. The threshold alerts are cleared when the traffic drops back below the set threshold values.		ocnaddconsumeradapter	Prometheus Alert
OCNADD05010: Average E2E latency threshold crossed	OPERATIONAL_ALARM	WARN, MINOR, MAJOR, CRITICAL	The average E2E latency threshold crossed WARN: 80%, MINOR: 90%, MAJOR: 95%, and CRITICAL: 100%. The threshold alerts are cleared when the latency drops back below the set threshold values.		ocnaddconsumeradapter	Prometheus Alert
OCNADD05011: Average Ingress Packet Drop rate threshold crossed	OPERATIONAL_ALARM	MAJOR, CRITICAL	The average ingress packet drop rate threshold crossed MAJOR: 1% and CRITICAL: 10%. The threshold alerts are cleared when the packet drop rate drops back below the set threshold values.		Kafka Aggregation	Prometheus Alert
OCNADD05012: Average Egress failure rate threshold crossed	OPERATIONAL_ALARM	INFO, WARN, MINOR, MAJOR, CRITICAL	The egress failure rate threshold crossed WARN: 1%, MINOR: 10%, MAJOR: 25%, and CRITICAL: 50%. The threshold alerts are cleared when the failure rate drops back below the set threshold values.		ocnaddconsumeradapter	Prometheus Alert
OCNADD05013: Ingress Traffic spike threshold crossed	OPERATIONAL_ALARM	MAJOR	The ingress traffic spike threshold crossed MAJOR: 10%. The threshold alerts are cleared when the traffic spike drops back below the set threshold values.		Kafka Aggregation	Prometheus Alert
OCNADD05014: Topic unavailable	OPERATIONAL_ALARM	MAJOR	Raise: <TopicName> topic is not available Clear: <TopicName> topic is available	Create the <TopicName> topic in Kafka from the Admin service.	ocnaddconsumeradapter, ocnaddaggregation
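
The tiered thresholds listed for the ingress and egress MPS alarms (WARN at 80%, MINOR at 90%, MAJOR at 95%, CRITICAL at 100%) can be read as a mapping from a utilization percentage to a severity. The following Python sketch is illustrative only. It assumes that the percentages are relative to the configured MPS limit and that reaching a threshold counts as crossing it; the names MPS_THRESHOLDS and mps_severity are hypothetical, and the actual alerts are evaluated by Prometheus rules that are not shown here.

```python
# Thresholds as listed for the ingress and egress MPS alarms, highest first.
MPS_THRESHOLDS = [
    (100.0, "CRITICAL"),
    (95.0, "MAJOR"),
    (90.0, "MINOR"),
    (80.0, "WARN"),
]


def mps_severity(percent_of_limit):
    """Return the highest severity whose threshold is reached, or None."""
    for threshold, severity in MPS_THRESHOLDS:
        if percent_of_limit >= threshold:  # assumption: reaching the value counts
            return severity
    return None  # below every threshold: no alert, or the alert is cleared
```

Ordering the list from highest to lowest threshold means the first match is always the most severe applicable level, so traffic at 96% of the limit is reported as MAJOR rather than WARN.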
