6 OCNRF Metrics, KPIs, and Alerts

OCNRF Metrics

This section describes the metrics for Oracle Communications Network Repository Function (OCNRF).

Note:

A sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates. The metrics and functions used to derive the KPIs are covered in OCNRF Custom Templates. Refer to the Oracle Help Center site for information about OCNRF Custom Templates.

Dimensions Legend for the Metrics

The following table includes the details about the metrics dimensions:

Table 6-1 Dimensions Legend

Dimension	Details
Method	HTTP method name. For example: PUT, GET
Status	HTTP status code in the response
Uri	URI defined to identify the service operation at the Ingress Gateway
Node	Name of the Kubernetes worker node on which the microservice is running
NrfLevel	OCNRF deployment name by which the OCNRF can be identified. It is the OCNRF Instance Id passed through Helm.
NfType	Type of Network Function (NF)
NfInstanceId	Unique identity of the NF Instance sending the request to OCNRF
HttpStatusCode	HTTP status code
ServiceName	Name of the service instance (for example, "nudm-sdm")
ServiceInstanceId	Unique ID of the service instance within a given NF Instance
UpdateType(Partial/Complete)	NF update with the PUT (Complete) or PATCH (Partial) method
OperationType	For the NFSubscribe service operation, indicates whether the request creates or updates the subscription
NotificationEventType	Indicates the event types for which the subscription request is made. For example: NF_REGISTERED, NF_DEREGISTERED, and NF_PROFILE_CHANGED
TargetNfType	Indicates the target NF type for which the request is made
RequesterNfType	Indicates the NF type that originated the request. The value is taken from the User-Agent header. For the NFDiscover service operation, it is taken from the search query. If the header or value is absent, the metric reports UNKNOWN.
TargetNfInstanceId	Indicates the target NF Instance Id for NF Access Token
ClientNfInstanceId	Indicates the client NF Instance Id for NF Access Token
RejectionReason	Indicates the rejection reason for NF Access Token
SubscriptionIdType	Indicates the Subscription Id type for which the SLF query is received
GroupId	Indicates the GroupId returned by SLF/UDR corresponding to the SubscriptionId
BucketSize	Indicates how many profiles are returned in the response to a Discovery request. The range is not configurable. Possible values are 0-10 and +Inf. The bucket corresponding to the number of NF profiles returned is incremented by one. For example, if 2 profiles are returned, bucket 2 is incremented by one. Responses returning more than 10 profiles fall in the +Inf bucket.
DBOperation	Create, update, delete, and find
TableName	OCNRF table name
SubscriptionStatus	Status of the subscription: 'SUBSCRIBED', 'SUSPENDED', or 'UNSUBSCRIBED'
DbReplicationStatus	"ACTIVE" or "INACTIVE"
RemoteNrfInstanceId	Remote OCNRF Instance Id
HeartbeatTimer	The heartbeatTimer of the NfProfile, in seconds.
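
The BucketSize rule in the table above can be sketched in a few lines of Python. This is only an illustration of the described behaviour (buckets 0 through 10 plus +Inf), not OCNRF code; the function names `bucket_label` and `tally` are hypothetical.

```python
def bucket_label(profile_count: int) -> str:
    """Return the BucketSize label for one discovery response.

    Assumption: buckets are the integers 0..10 plus "+Inf",
    as described for the BucketSize dimension.
    """
    if profile_count < 0:
        raise ValueError("profile_count must be non-negative")
    return str(profile_count) if profile_count <= 10 else "+Inf"


def tally(profile_counts):
    """Count how many discovery responses fall into each bucket."""
    counts = {}
    for n in profile_counts:
        label = bucket_label(n)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Two responses with 2 profiles each and one with 15 profiles.
print(tally([2, 2, 15]))
```

With this input, bucket 2 is incremented twice and the +Inf bucket once.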

Table 6-2 OCNRF Metrics

Sl. No#	Metric Name	Metric Details	Metric filter	Recommended legend to see dimension level data (as applicable)	Dimensions
1	Total number of ingress requests	Total number of requests received at OCNRF	oc_ingressgateway_http_requests_total
2	NF Register Success	Total number of successful NFRegister service operations at OCNRF	oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
3	NF Update Success (Complete Replacement)	Total number of successful NFUpdate service operations at OCNRF	oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
4	NF Update Success (Partial Replacement)	Total number of successful NFUpdate service operations at OCNRF	oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PATCH"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
5	NF List/Profile Retrieval Success	Total number of successful NF List/Profile retrieval service operations at OCNRF	oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="GET"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
6	Access Token Success	Total number of successful Access Token service operations at OCNRF	oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*/oauth2/token.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
7	NF De-register Success	Total number of successful NFDeregister service operations at OCNRF	oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
8	NF Subscribe Success	Total number of successful NFSubscribe service operations at OCNRF	oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
9	NF Unsubscribe Success	Total number of successful NFUnSubscribe service operations at OCNRF	oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
10	NF Discover Success	Total number of successful NFDiscover service operations at OCNRF	oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
11	4xx Responses (NF-Instances)	Total number of 4xx responses (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval)	oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
12	4xx Responses (Subscriptions)	Total number of 4xx responses (NfSubscribe/NfUnsubscribe)	oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
13	4xx Responses (Discovery)	Total number of 4xx responses (NfDiscover)	oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
14	4xx Responses (AccessToken)	Total number of 4xx responses (NfAccessToken)	oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*oauth2/token.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
15	5xx Responses (NF-Instances)	Total number of 5xx responses (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval)	oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
16	5xx Responses (Subscriptions)	Total number of 5xx responses (NfSubscribe/NfUnsubscribe)	oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
17	5xx Responses (Discovery)	Total number of 5xx responses (NfDiscover)	oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
18	5xx Responses (AccessToken)	Total number of 5xx responses (NfAccessToken)	oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*oauth2/token.*"}		Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running
19	NfRegistrations Total	Number of Registration Requests received	ocnrf_nfRegister_rx_requests_total	NfRegistrations Total	NrfLevel, NfInstanceId, RequesterNfType
20	NfRegistrations Responses Total	Number of Registration Responses sent	ocnrf_nfRegister_tx_responses_total	NfRegistrations Responses Total	NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode
21	NfRegistrations Per Service Total	Number of Registrations received and processed successfully per Service	ocnrf_nfRegister_rx_requests_success_perService_total	NfRegistrations Per Service [ serviceName :- {{ serviceName }}, nfInstanceId :- {{NfInstanceId}} ]	NrfLevel, NfInstanceId, ServiceName, ServiceInstanceId
22	NFUpdates Total	Number of Update Requests received	ocnrf_nfUpdate_rx_requests_total	NfUpdates Total	NrfLevel, NfInstanceId, RequesterNfType, UpdateType(Partial/Complete)
23	NFUpdates Responses Total	Number of Update Responses sent	ocnrf_nfUpdate_tx_responses_total	NfUpdates Responses Total	NrfLevel, NfInstanceId, RequesterNfType, UpdateType(Partial/Complete), HttpStatusCode
24	NFUpdates Per Service Total	Number of NfUpdates received and processed successfully per Service	ocnrf_nfUpdate_rx_requests_success_perService_total	NFUpdates Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ]	NrfLevel, UpdateType(Partial/Complete), NfInstanceId, ServiceName, ServiceInstanceId
25	Heartbeat Requests Total	Number of Heartbeat Requests received	ocnrf_nfHeartbeat_rx_requests_total		NrfLevel, NfInstanceId, RequesterNfType
26	Heartbeat Responses Total	Number of Heartbeat Responses sent	ocnrf_nfHeartbeat_tx_responses_total		NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode
27	NF De-Registration Requests Total	Number of De-registration requests received	ocnrf_nfDeregister_rx_requests_total		NrfLevel, NfInstanceId, RequesterNfType
28	NF De-Registration Responses Total	Number of De-registration responses sent	ocnrf_nfDeregister_tx_responses_total		NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode
29	NF De-Registrations Per Service Total	Number of De-registration requests received and processed successfully per Service	ocnrf_nfDeregister_rx_requests_success_perService_total	NFDeregistration Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ]	NrfLevel, ServiceName, ServiceInstanceId, NfInstanceId
30	NF List Retrieval Requests Total	Number of NFListRetrieval requests received	ocnrf_nfListRetrieval_rx_requests_total		NrfLevel, RequesterNfType
31	NF List Retrieval Responses Total	Number of NFListRetrieval responses sent	ocnrf_nfListRetrieval_tx_responses_total		NrfLevel, RequesterNfType , HttpStatusCode
32	NF Profile Retrieval Requests Total	Number of NFProfileRetrieval requests received	ocnrf_nfProfileRetrieval_rx_requests_total		NrfLevel, NfInstanceId
33	NF Profile Retrieval Responses Total	Number of NFProfileRetrieval responses sent	ocnrf_nfProfileRetrieval_tx_responses_total		NrfLevel, NfInstanceId, HttpStatusCode
34	Number of Heartbeats missed	Number of heartbeats missed.	ocnrf_heartbeat_missed_total		NrfLevel, RequesterNfType , NfInstanceId
35	NF Status Subscribe Requests Total	Number of NfStatusSubscribe requests received	ocnrf_nfStatusSubscribe_rx_requests_total		NrfLevel, RequesterNfType, OperationType
36	NF Status Subscribe Responses Total	Number of NfStatusSubscribe responses sent	ocnrf_nfStatusSubscribe_tx_responses_total		NrfLevel, RequesterNfType , HttpStatusCode, OperationType
37	NF Status UnSubscribe Requests Total	Number of NfStatusUnsubscribe requests received	ocnrf_nfStatusUnsubscribe_rx_requests_total		NrfLevel, RequesterNfType
38	NF Status UnSubscribe Responses Total	Number of NfStatusUnsubscribe responses sent	ocnrf_nfStatusUnsubscribe_tx_responses_total		NrfLevel, RequesterNfType, HttpStatusCode
39	NF Status Notifications Requests Sent	Number of NfStatusNotify requests sent	ocnrf_nfStatusNotify_tx_requests_total		NrfLevel, NotificationEventType, TargetNfType
40	NF Status Notifications Responses Received	Number of NfStatusNotify responses received	ocnrf_nfStatusNotify_rx_responses_total		NrfLevel, NotificationEventType, TargetNfType, HttpStatusCode
41	NF Status Notifications Requests Failed	Number of NfStatusNotify requests that failed to be sent	ocnrf_nfStatusNotify_requests_failed_total		NrfLevel, NotificationEventType, TargetNfType
42	NfDiscover Requests Total	Number of NfDiscover Requests received	ocnrf_nfDiscover_rx_requests_total	NfDiscover Req [ TargetNf :- {{ TargetNfType }}, RequesterNfType :- {{RequesterNfType}} ]	NrfLevel, TargetNfType, RequesterNfType
43	NfDiscover Responses Total	Number of NfDiscover responses sent	ocnrf_nfDiscover_tx_responses_total		NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode
44	NFDiscover Per Service Total	Number of NfDiscover requests received and processed successfully per Service	ocnrf_nfDiscover_rx_requests_success_perService_total	NFDiscover Per Service [ serviceName :- {{ serviceName }} ]	NrfLevel, RequesterNfType, ServiceName
45	Discovered profiles	Number of profiles returned in the discovery response. Each bucket and its corresponding value indicate how many discovery responses returned that number of profiles.	ocnrf_nfDiscover_profiles_discovered_total	Discovered profiles [ TargetNfType :- {{TargetNfType}}, Bucket :- {{ Bucket }} ]	NrfLevel, TargetNfType, BucketSize, NfFqdn
46	Active Registrations	Number of active registered NFs at any point of time	ocnrf_active_registrations_count	Active Registrations [ NfType-{{ NfType }}, NrfLevel-{{ NrfLevel }} ]	NfType, NrfLevel
47	Avg NRF Latency taken by NRF specific microservice	Time taken by NRF specific microservice to process the service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NfSubscribe/NfUnsubscribe/NfAccessToken) Note: Latency calculated by this metric doesn't include time taken by OCNRF API gateway.	ocnrf_message_processing_time_seconds	Avg NRF Latency {{ ServiceOperation }} {{ RequesterNfType }}	NrfLevel,RequesterNfType ,ServiceOperation
48	OCNRF database operations	Database operation count corresponding to every service operation	ocnrf_dbmetric_total		Method, DBOperation, NrfLevel, HttpStatusCode
49	Database operation round trip time	Time (in microseconds) taken by the database operation corresponding to every service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NfSubscribe/NfUnsubscribe/NfAccessToken)	ocnrf_dbmetrics_round_trip_time_seconds		Method, DBOperation, ServiceOperation, TableName (NRF table names), NrfLevel, HttpStatusCode

In the above NRF Metrics table, 4xx and 5xx refer to the HTTP client-error and server-error status code classes returned by the REST API.
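
Prometheus anchors regular-expression label matchers (`=~`) at both ends, so a `Route_path` or `Status` pattern must cover the whole label value. The sketch below emulates that behaviour with Python's `re.fullmatch` to show why the filters need leading and trailing `.*` wildcards. The sample label values are hypothetical, and Python's regex dialect is only an approximation of the RE2 syntax that Prometheus uses, although the two agree for patterns this simple.

```python
import re


def label_matches(pattern: str, value: str) -> bool:
    """Emulate a Prometheus =~ matcher, which must match the whole value."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical label values, for illustration only.
route = "/nnrf-nfm/v1/nf-instances/{nfInstanceID}"
status = "201 CREATED"

# Wildcards on both sides let the pattern cover the full route.
print(label_matches(".*nnrf-nfm/v1/nf-instances.*", route))  # True

# Without wildcards the anchored pattern does not match.
print(label_matches("nnrf-nfm/v1/nf-instances", route))      # False

# "2.*" selects any status starting with 2; "4.*" does not match 201.
print(label_matches("2.*", status))                          # True
print(label_matches("4.*", status))                          # False
```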

Table 6-3 NF Screening specific metrics

Sl. No#	Metric Name	Metric Details	Metric filter	Service Operation	Dimensions	Notes
1	Total NF Requests for which Screening Failed	The total number of requests for which screening failed against NF FQDN screening list.	ocnrf_nfScreening_nfFqdn_requestFailed_total	NFRegister, NFUpdate	NRF level NF type	See Note 1 below this table.
2	Total NF Requests Rejected due to Screening Failed	The total number of requests rejected because screening failed against NF FQDN screening list.	ocnrf_nfScreening_nfFqdn_requestRejected_total	NFRegister, NFUpdate	NRF level NF type	See Note 1 below this table.
3	Total NF Requests for which Screening Failed	The total number of requests for which screening failed against NF IP endpoint screening list.	ocnrf_nfScreening_nfIpEndPoint_requestFailed_total	NFRegister, NFUpdate	NRF level NF type	See Note 1 below this table.
4	Total NF Requests Rejected due to Screening Failed	The total number of requests rejected because screening failed against NF IP endpoint screening list.	ocnrf_nfScreening_nfIpEndPoint_requestRejected_total	NFRegister, NFUpdate	NRF level NF type	See Note 1 below this table.
5	Total NF Requests for which Screening Failed	The total number of requests for which screening failed against Callback URI screening list.	ocnrf_nfScreening_callbackUri_requestFailed_total	NFRegister, NFUpdate, NFSubscribe	NRF level NF type	See Note 1 below this table.
6	Total NF Requests Rejected due to Screening Failed	The total number of requests rejected because screening failed against Callback URI screening list.	ocnrf_nfScreening_callbackUri_requestRejected_total	NFRegister, NFUpdate, NFSubscribe	NRF level NF type	See Note 1 below this table.
7	Total NF Requests for which Screening Failed	The total number of requests for which screening failed against PLMN id screening list.	ocnrf_nfScreening_plmnId_requestFailed_total	NFRegister, NFUpdate	NRF level NF type	See Note 1 below this table.
8	Total NF Requests Rejected due to Screening Failed	The total number of requests rejected because screening failed against PLMN id screening list.	ocnrf_nfScreening_plmnId_requestRejected_total	NFRegister, NFUpdate	NRF level NF type	See Note 1 below this table.
9	Total NF Requests for which Screening Failed	The total number of NFRegister requests for which screening failed against NF type screening list.	ocnrf_nfScreening_nfTypeRegister_requestFailed_total	NFRegister	NRF level NF type	See Note 1 below this table.
10	Total NF Requests Rejected due to Screening Failed	The total number of NFRegister requests rejected as NF type was not allowed to register with NRF.	ocnrf_nfScreening_nfTypeRegister_requestRejected_total	NFRegister	NRF level NF type	See Note 1 below this table.
11	NF Screening not applied Internal Error	The total number of times screening was not applied due to an internal error.	ocnrf_nfScreening_notApplied_InternalError_total	NFRegister, NFUpdate, NFSubscribe	NRF level NF type	See Note 1 below this table.

Note:

In the above "NF Screening metrics" table, the dimension NF Type is a requester NF Type.

NF Access token metrics

Table 6-4 NF Access token metrics

Sl. No#	Metric Name	Metric Details	Metric filter	Service Operation	Dimensions
1	NF Access Token Request Received Total	The total number of access token requests received	ocnrf_accessToken_rx_requests_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel
2	NF Access Token Responses Sent Total	The total number of access token responses sent	ocnrf_accessToken_tx_responses_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode
3	NF Access Token Request Rejected (ClientNotAuthorized)	Number of access token requests for which client authorization failed. RejectionReason = ClientNotAuthorized	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ClientNotAuthorized
4	NF Access Token Request Rejected (ProducerWithRequestedScopeNotFound)	Number of access tokens not granted because no producer instance is registered for the service(s) in the scope. RejectionReason = ProducerWithRequestedScopeNotFound	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ProducerWithRequestedScopeNotFound
5	NF Access Token Request Rejected (ProducerWithRequestedNfInstanceIdNotFound)	Number of access tokens not granted because no producer instance is registered at all for the target Instance Id provided in the request. RejectionReason = ProducerWithRequestedNfInstanceIdNotFound	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ProducerWithRequestedNfInstanceIdNotFound
6	NF Access Token Request Rejected (InconsistentScope)	Number of access tokens not granted because the services in the scope belong to different NF types. RejectionReason = InconsistentScope	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = InconsistentScope
7	NF Access Token Request Rejected (ConsumerNFTypeMismatch)	Number of access tokens not granted because the consumer NF type in the profile does not match the access token request. RejectionReason = ConsumerNFTypeMismatch	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ConsumerNFTypeMismatch
8	NF Access Token Request Rejected (ProducerNFTypeMismatch)	Number of access tokens not granted because the producer NF type in the profile does not match the access token request. RejectionReason = ProducerNFTypeMismatch	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ProducerNFTypeMismatch
9	NF Access Token Request Rejected (InternalError)	Number of access tokens not granted because of a failure at NRF due to an internal error. RejectionReason = InternalError	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode. RejectionReason = InternalError
10	NF Access Token Request Rejected (ConsumerNfTypeNotAllowed)	Number of access tokens not granted because the consumer NF type is not allowed to access the requested NF.	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode. RejectionReason = ConsumerNfTypeNotAllowed
11	NF Access Token Request Rejected (ConsumerPlmnNotAllowed)	Number of access tokens not granted because the consumer NF PLMN is not allowed to access the requested NF.	ocnrf_accessToken_tx_rejected_total	AccessToken	TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode. RejectionReason = ConsumerPlmnNotAllowed
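
Because all rejection rows in the table above share the metric `ocnrf_accessToken_tx_rejected_total` and differ only in the RejectionReason value, a query typically narrows the metric with a label selector. The Python helper below is a minimal sketch of building such a selector string. The helper name `selector` is hypothetical, and it assumes the label is exposed to Prometheus under the same name as the dimension (`RejectionReason`), which should be verified against the deployed metrics.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector with exact-match labels."""

    def escape(value: str) -> str:
        # Escape characters that are special inside a PromQL string literal.
        return (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    if not labels:
        return metric
    parts = [f'{name}="{escape(value)}"' for name, value in sorted(labels.items())]
    return metric + "{" + ",".join(parts) + "}"


# Assumed label name: RejectionReason (taken from the Dimensions column).
print(selector("ocnrf_accessToken_tx_rejected_total",
               RejectionReason="InternalError"))
```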

NRF-SLF specific metrics

Table 6-5 NRF-SLF specific metrics

Sl. No#	Metric Name	Metric Details	Metric filter	Service Operation	Dimensions
1	Discover Request Received For SLF Total	The total number of NF Discover requests received for SLF	ocnrf_nfDiscover_ForSLF_rx_requests_total	NFDiscover	TargetNfType, NRFLevel
2	Discover Response Sent For SLF Total	The total number of NF Discover responses sent for SLF	ocnrf_nfDiscover_ForSLF_tx_responses_total	NFDiscover	TargetNfType, NRFLevel, HttpStatusCode, RejectionReason. Possible rejection reasons: SLFCommunicationFailure, MandatoryParamsMissing, SLFConfigurationMissing, GroupIdNotFound, ErrorFromSLF, InternalError, NotApplicable. NotApplicable applies to 2xx status codes.
3	SLF Query Requests Sent Total	The total number of SLF query requests sent	ocnrf_SLF_tx_requests_total	NFDiscover	TargetNfType, NRFLevel, SubscriptionIdType
4	SLF Query Responses Received Total	The total number of SLF query responses received	ocnrf_SLF_rx_responses_total	NFDiscover	TargetNfType, NRFLevel, SubscriptionIdType, HttpStatusCode, GroupId
5	SLF Round Trip Time Total	Time (in microseconds) between sending a query to the SLF and receiving the response from the SLF	ocnrf_slf_round_trip_time_seconds	NFDiscover	TargetNfType, SubscriptionIdType, HttpStatusCode, GroupId, NrfLevel, SLF ApiRoot

NRF Forwarding Metrics

Table 6-6 NRF Forwarding Metrics

Sl. No#	Metric Name	Metric Details	Metric filter	Service Operation	Dimensions
1	NF Access Token Requests Forwarded Total	The total number of Access Token Requests forwarded to Primary/Secondary NRF	ocnrf_forward_accessToken_tx_requests_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel
2	NF Access Token Forwarded Responses Total	The total number of Access Token Responses for requests forwarded to Primary/Secondary NRF	ocnrf_forward_accessToken_rx_responses_total	AccessToken	TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, RejectionReason. Possible rejection reasons: InternalError, NRFCommunicationFailure, ErrorFromNRF, NRFForwardingConfigurationMissing, LoopDetected, NotApplicable. NotApplicable applies to 2xx status codes.
3	NF Profile Retrieval Requests Forwarded Total	The total number of Profile Retrieval Requests forwarded to Primary/Secondary NRF	ocnrf_forward_nfProfileRetrieval_tx_requests_total	NFProfileRetrieval	NrfLevel, NfInstanceId
4	NF Profile Retrieval Forwarded Responses Total	The total number of Profile Retrieval Responses for requests forwarded to Primary/Secondary NRF	ocnrf_forward_nfProfileRetrieval_rx_responses_total	NFProfileRetrieval	NrfLevel, NfInstanceId, HttpStatusCode, RejectionReason. Possible rejection reasons: InternalError, NRFCommunicationFailure, ErrorFromNRF, NRFForwardingConfigurationMissing, LoopDetected, NotApplicable. NotApplicable applies to 2xx status codes.
5	NF Status Subscribe Forwarded Requests Total	The total number of Status Subscribe Requests forwarded to Primary/Secondary NRF	ocnrf_forward_nfStatusSubscribe_tx_requests_total	NFStatusSubscribe, NFStatusUnsubscribe	NrfLevel, RequesterNfType, OperationType
6	NF Status Subscribe Forwarded Responses Total	The total number of Responses for Status Subscribe Requests forwarded to Primary/Secondary NRF	ocnrf_forward_nfStatusSubscribe_rx_responses_total	NFStatusSubscribe, NFStatusUnsubscribe	NrfLevel, RequesterNfType, HttpStatusCode, OperationType, RejectionReason. Possible rejection reasons: InternalError, NRFCommunicationFailure, ErrorFromNRF, NRFForwardingConfigurationMissing, LoopDetected, NotApplicable. NotApplicable applies to 2xx status codes.
7	NF Discovery Forwarded Requests Total	The total number of NF Discovery Requests forwarded to Primary/Secondary NRF	ocnrf_forward_nfDiscover_tx_requests_total	NFDiscover	NrfLevel, TargetNfType, RequesterNfType
8	NF Discovery Forwarded Responses Total	The total number of Responses for NF Discovery Requests forwarded to Primary/Secondary NRF	ocnrf_forward_nfDiscover_rx_responses_total	NFDiscover	NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode, RejectionReason. Possible rejection reasons: InternalError, NrfCommunicationFailure, NrfForwardingConfigurationMissing, LoopDetected, ErrorFromNrf, NotApplicable. NotApplicable applies to 2xx status codes.
9	Avg Latency for NRF Message Forwarding	Time taken by the NRF specific microservice to forward the message to another Primary/Secondary NRF for the service operations NFProfileRetrieval/NFDiscover/NFStatusSubscribe/NfStatusUnsubscribe/AccessToken	ocnrf_forward_round_trip_time_seconds	NFStatusSubscribe, NFStatusUnsubscribe, NFProfileRetrieval, NFDiscover, AccessToken	NrfLevel, RequesterNfType, ServiceOperation

GeoRedundancy metrics

Table 6-7 GeoRedundancy metrics

Sl. No#	Metric Name	Metric Details	Metric filter	Service Operation	Dimensions
1	DB Replication status	The current replication status of the DB tier service.	ocnrf_dbreplication_status	NA	NrfLevel, DbReplicationStatus
2	DB Replication down Time	Time taken for the replication status to change from "INACTIVE" to "ACTIVE"	ocnrf_dbreplication_down_time_seconds	NA	NrfLevel, DbReplicationDownStartTime, DbReplicationDownEndTime
3	Total NfInstances switched over from mated site	The number of NfInstances that switched over from the mated site.	ocnrf_nf_switch_over_total	NfRegister, NfUpdate, NfDeregister, NfHeartbeat	NrfLevel, NfInstanceId, RemoteNrfInstanceId, ServiceOperation, OperationType
4	Total NfSubscriptions switched over from mated site	The number of NfSubscriptions that switched over from the mated site.	ocnrf_nfSubscriptions_switch_over_total	NfStatusSubscribe, NfStatusUnsubscribe, NrfAuditor	NrfLevel, SubscriptionId, RemoteNrfInstanceId, ServiceOperation, OperationType
5	Total NfInstances removed by OCNRF because they are stale	The number of NfInstances deleted by the NrfAuditor when it detects a stale record.	ocnrf_stale_nf_deleted_total	NA	NrfLevel, NfInstanceId, NfStatus
6	Total NfSubscriptions removed by OCNRF because they are stale	The number of NfSubscriptions deleted by the NrfAuditor when it detects a stale record.	ocnrf_stale_nfSubscriptions_deleted_total	NA	NrfLevel, NfSubscriptionId, SubscriptionStatus
7	Total NfInstances that have been marked as SUSPENDED by the OCNRF Auditor	The number of profiles marked as SUSPENDED because they missed the number of heartbeats allowed by nfHeartBeatMissAllowed.	ocnrf_nf_suspended_total	NA	NrfLevel, NfInstanceId, NfStatus, HeartbeatTimer
8	Total NfSubscriptions whose validityTime has expired	The number of NfSubscriptions whose validityTime has expired	ocnrf_nfSubscriptions_expired_total	NA	NrfLevel, SubscriptionId

NF AccessToken Authorization Metrics

Table 6-8 NF AccessToken Authorization Metrics

Sl. No#	Metric Name	Metric Details	Metric filter	Service Operation	Dimensions
1	NF Access Token Request Rejected (AuthScreeningFailed)	Number of access tokens not granted because the consumer NF is not authorized to access the requested NF or its services.	ocnrf_accessToken_tx_rejected_total	NfAccessToken	TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode. RejectionReason = ClientNotAuthorized

NF Authentication Metrics

Table 6-9 NF Authentication Metrics

Sl. No#	Metric Name	Metric Details	Metric filter	Service Operation	Dimensions
1	NF Authentication Failure Total	The total number of requests for which FQDN-based authentication failed at OCNRF	ocnrf_nf_authentication_failure_total	NFAccessToken/NFRegistration/NFSubscription/NFDiscovery/NfListRetrieval/NfProfileRetrieval	NrfLevel, Method, ServiceOperation, NfFqdn, TLSFqdn. For the NfListRetrieval and NfProfileRetrieval service operations, NfFqdn is filled as NotApplicable. If the OC-XFCC-DNS header is not received at the NRF microservice, TLSFqdn is filled as "UNKNOWN".

OCNRF KPIs

This section includes information about KPIs for Oracle Communications Network Repository Function (OCNRF).

Note:

A sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates. The metrics and functions used to compute the KPIs are covered in OCNRF Custom Templates.

Table 6-10 KPI Details

KPI Name	KPI Details	Metric used for KPI	Service Operation	Response code
OCNRF Ingress Request	Rate of HTTP requests received at OCNRF Ingress Gateway	oc_ingressgateway_http_requests_total	All	Not Applicable
NF Register Success		sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m]))	NFRegister	201
NF Update Success (Complete Replacement)		sum(irate(oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m]))	NFUpdate	200
NF DeRegister Success		sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"}[5m]))	NFDeregister	204
NF Subscribe Success		sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"}[5m]))	NFStatusSubscribe	201
NF Unsubscribe Success		sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"}[5m]))	NFStatusUnsubscribe	204
NF Discover Success		sum(irate(oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"}[5m]))	NFDiscover	200
4xx Responses (NF-Instances)		sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m]))	NFRegister/NFUpdate/NFDeregister	4xx
4xx Responses (Subscriptions)		sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m]))	NFStatusSubscribe/NFStatusUnsubscribe	4xx
4xx Responses (Discovery)		sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m]))	NFDiscover	4xx
5xx Responses (NF-Instances)		sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m]))	NFRegister/NFUpdate/NFDeregister	5xx
5xx Responses (Subscriptions)		sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m]))	NFStatusSubscribe/NFStatusUnsubscribe	5xx
5xx Responses (Discovery)		sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m]))	NFDiscover	5xx
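
All KPI expressions in Table 6-10 share one shape: a summed `irate` over `oc_ingressgateway_http_responses_total`, filtered by Status, Route_path and, for the success KPIs, Method. The sketch below builds such an expression as a string. The function name `kpi_query` and its parameters are hypothetical, and the route-path patterns are assumed to use `.*` wildcards on both sides of the path.

```python
from typing import Optional


def kpi_query(
    status_matcher: str,
    route_regex: str,
    method: Optional[str] = None,
    window: str = "5m",
) -> str:
    """Build a KPI expression over Ingress Gateway responses.

    status_matcher includes the operator, for example '="201 CREATED"'
    for an exact match or '=~"4.*"' for a regular-expression match.
    """
    matchers = [f"Status{status_matcher}", f'Route_path=~"{route_regex}"']
    # The 4xx and 5xx KPIs do not filter on the HTTP method.
    if method is not None:
        matchers.append(f'Method="{method}"')
    selector = ",".join(matchers)
    return (
        "sum(irate(oc_ingressgateway_http_responses_total"
        f"{{{selector}}}[{window}]))"
    )


# NF Register Success: 201 responses to PUT on the nf-instances resource.
register_success = kpi_query(
    '="201 CREATED"', ".*nnrf-nfm/v1/nf-instances.*", method="PUT"
)

# 4xx Responses (Discovery): any 4xx status on the discovery resource.
discovery_4xx = kpi_query('=~"4.*"', ".*nnrf-disc/v1/nf-instances.*")
```

The resulting strings can be pasted into a Grafana panel or a Prometheus query; changing `window` adjusts the range over which `irate` is evaluated.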

OCNRF Alerts

This section includes information about alerts for OCNRF.
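
Several alerts in Table 6-11 are tiered: a lower-severity alert clears when the next tier is crossed, so only the highest crossed tier is active at a time. The sketch below illustrates that tiering with the default values listed in the table. It is not how OCNRF evaluates alerts (Prometheus rules do that); the function and constant names are hypothetical, and the use of strict comparisons at the boundaries is an assumption.

```python
from typing import Optional, Sequence, Tuple

Tier = Tuple[float, str]

# Default thresholds as listed in Table 6-11 (configurable per deployment).
INGRESS_TPS_TIERS = [(800, "Minor"), (900, "Major"), (950, "Critical")]
MEMORY_PERCENT_TIERS = [(50, "Minor"), (60, "Major"), (70, "Critical")]
ERROR_RATE_PERCENT_TIERS = [
    (0.1, "Warning"), (1, "Warning"), (10, "Minor"), (25, "Major"), (50, "Critical"),
]
# Registered-NF alerts fire when the count falls BELOW a threshold.
REGISTERED_NF_TIERS = [(2, "Critical"), (10, "Major"), (20, "Minor"), (30, "Warning")]


def severity_above(value: float, tiers: Sequence[Tier]) -> Optional[str]:
    """Severity of the highest threshold that value exceeds, or None."""
    result = None
    for threshold, severity in sorted(tiers):
        if value > threshold:  # assumption: strictly greater than
            result = severity
    return result


def severity_below(count: float, tiers: Sequence[Tier]) -> Optional[str]:
    """Severity of the lowest threshold that count is under, or None."""
    for threshold, severity in sorted(tiers):
        if count < threshold:
            return severity
    return None
```

With these defaults, an ingress rate of 920 requests per second maps to Major, and five registered NFs also map to Major because the count is at least 2 but below 10.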

Table 6-11 Alert Details

Alert	Trigger Condition	Severity	Alert details provided	OID	Metric Used	Resolution	Notes
System Level Alerts
OcnrfNfStatusUnavailable	All the OCNRF services are unavailable, either because the OCNRF is being deployed or purged. The OCNRF services considered are nfregistration, nfsubscription, nrfauditor, nrfconfiguration, nfaccesstoken, nfdiscovery, appinfo, ingressgateway, and egressgateway.	Critical	description: 'OCNRF services unavailable' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : All OCNRF services are unavailable.'	1.3.6.1.4.1.323.5.3.36.1.2.7016	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared automatically when the OCNRF services become available. Steps: Check for service-specific alerts. Refer to the application logs on Kibana and check for database-related failures such as connectivity or invalid secrets. The logs can be filtered based on the services. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfPodsRestart	A pod belonging to any of the OCNRF services has restarted.	Major	description: 'Pod <Pod Name> has restarted.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : A Pod has restarted'	1.3.6.1.4.1.323.5.3.36.1.2.7017	'kube_pod_container_status_restarts_total' Note: This is a Kubernetes metric. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared automatically if the specific pod is up. Steps: Refer to the application logs on Kibana, filter based on the pod name, and check for database-related failures such as connectivity or Kubernetes secrets. Check the orchestration logs for liveness or readiness probe failures. If the issue persists, contact My Oracle Support.
NnrfNFManagementServiceDown	Any of the NFRegistration, NFSubscription, or NrfAuditor services is unavailable.	Critical	description: 'OCNRF Nnrf_Management service <nfregistration\|nfsubscription\|nrfauditor> is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NFManagement service is down'	1.3.6.1.4.1.323.5.3.36.1.2.7018	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when all the Nnrf_NFManagement services (nfregistration, nfsubscription, and nrfauditor) are available. Steps: Check whether NfService-specific alerts are generated to understand which service is down. Check the orchestration logs of the nfregistration, nfsubscription, and nrfauditor services for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the above service names. Check for ERROR and WARNING logs for each of these services. Refer to the application logs on Kibana, filter on the appinfo service, and check the service status of the nfregistration, nfsubscription, and nrfauditor services. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
NnrfAccessTokenServiceDown	NFAccessToken service is unavailable.	Critical	description: 'OCNRF Nnrf_NFAccessToken service nfaccesstoken is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NFAccessToken service down'	1.3.6.1.4.1.323.5.3.36.1.2.7020	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the Nnrf_AccessToken service is available. Steps: Check the orchestration logs of the nfaccesstoken service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the nfaccesstoken service name. Check for ERROR and WARNING logs. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
NnrfNFDiscoveryServiceDown	NFDiscovery service is unavailable.	Critical	description: 'OCNRF Nnrf_NFDiscovery service nfdiscovery is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NFDiscovery service unavailable.'	1.3.6.1.4.1.323.5.3.36.1.2.7019	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the Nnrf_NFDiscovery service is available. Steps: Check the orchestration logs of the nfdiscovery service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the nfdiscovery service name. Check for ERROR and WARNING logs. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfRegistrationServiceDown	None of the pods of the NFRegistration microservice is available.	Critical	description: 'OCNRF NFRegistration service nfregistration is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NFRegistration service is down'	1.3.6.1.4.1.323.5.3.36.1.2.7021	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the nfregistration service is available. Steps: Check the orchestration logs of the nfregistration service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the nfregistration service name. Check for ERROR and WARNING logs. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfSubscriptionServiceDown	None of the pods of the NFSubscription microservice is available.	Critical	description: 'OCNRF NFSubscription service nfsubscription is down.' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NFSubscription service is down'	1.3.6.1.4.1.323.5.3.36.1.2.7022	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the nfsubscription service is available. Steps: Check the orchestration logs of the nfsubscription service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the nfsubscription service name. Check for ERROR and WARNING logs. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfDiscoveryServiceDown	None of the pods of the NFDiscovery microservice is available.	Critical	description: 'OCNRF NFDiscovery service nfdiscovery is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NFDiscovery service down'	1.3.6.1.4.1.323.5.3.36.1.2.7023	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the nfdiscovery service is available. Steps: Check the orchestration logs of the nfdiscovery service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the nfdiscovery service name. Check for ERROR and WARNING logs. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfAccessTokenServiceDown	None of the pods of the NFAccessToken microservice is available.	Critical	description: 'OCNRF NFAccessToken service nfaccesstoken is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NFAccessToken service down'	1.3.6.1.4.1.323.5.3.36.1.2.7024	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the nfaccesstoken service is available. Steps: Check the orchestration logs of the nfaccesstoken service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the nfaccesstoken service name. Check for ERROR and WARNING logs. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfAuditorServiceDown	None of the pods of the NrfAuditor microservice is available.	Critical	description: 'OCNRF NrfAuditor service nrfauditor is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NrfAuditor service down'	1.3.6.1.4.1.323.5.3.36.1.2.7026	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the nrfauditor service is available. Steps: Check the orchestration logs of the nrfauditor service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the nrfauditor service name. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfConfigurationServiceDown	None of the pods of the NrfConfiguration microservice is available.	Critical	description: 'OCNRF NrfConfiguration service nrfconfiguration is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : NrfConfiguration service down'	1.3.6.1.4.1.323.5.3.36.1.2.7025	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the nrfconfiguration service is available. Steps: Check the orchestration logs of the nrfconfiguration service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the nrfconfiguration service name. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfAppInfoServiceDown	None of the pods of the App Info microservice is available.	Critical	description: 'OCNRF Appinfo service appinfo is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : Appinfo service down'	1.3.6.1.4.1.323.5.3.36.1.2.7027	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the appinfo service is available. Steps: Check the orchestration logs of the appinfo service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the appinfo service name. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfIngressGatewayServiceDown	None of the pods of the Ingress-Gateway microservice is available.	Critical	description: 'OCNRF Ingress-Gateway service ingressgateway is down.' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : Ingress-gateway service down'	1.3.6.1.4.1.323.5.3.36.1.2.7028	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the ingressgateway service is available. Steps: Check the orchestration logs of the ingress-gateway service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the ingress-gateway service name. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfEgressGatewayServiceDown	None of the pods of the Egress-Gateway microservice is available.	Critical	description: 'OCNRF Egress-Gateway service egressgateway is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : Egress-Gateway service down'	1.3.6.1.4.1.323.5.3.36.1.2.7029	'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.	The alert is cleared when the egressgateway service is available. Note: The threshold is configurable in alerts.yaml. Steps: Check the orchestration logs of the egress-gateway service for liveness or readiness probe failures. Refer to the application logs on Kibana and filter based on the egress-gateway service name. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, contact My Oracle Support.
OcnrfMemoryUsageCrossedMinorThreshold	A pod has reached the configured minor threshold (50%) of its memory resource limits.	Minor	description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured minor threshold (50 %) (value={{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 50% of its limit.'	1.3.6.1.4.1.323.5.3.36.1.2.7030	'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: These are Kubernetes metrics used for instance availability monitoring. If the metrics are not available, use similar metrics exposed by the monitoring system.	The alert is cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the OcnrfMemoryUsageCrossedMajorThreshold alert is raised. Note: The threshold is configurable in alerts.yaml. If guidance is required, contact My Oracle Support.
OcnrfMemoryUsageCrossedMajorThreshold	A pod has reached the configured major threshold (60%) of its memory resource limits.	Major	description: 'OCNRF Memory Usage for pod <Pod name> has crossed the major threshold (60%) (value = {{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 60% of its limit.'	1.3.6.1.4.1.323.5.3.36.1.2.7031	'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: These are Kubernetes metrics used for instance availability monitoring. If the metrics are not available, use similar metrics exposed by the monitoring system.	The alert is cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the OcnrfMemoryUsageCrossedCriticalThreshold alert is raised. Note: The threshold is configurable in alerts.yaml. If guidance is required, contact My Oracle Support.
OcnrfMemoryUsageCrossedCriticalThreshold	A pod has reached the configured critical threshold (70%) of its memory resource limits.	Critical	description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured critical threshold (70%) (value = {{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.'	1.3.6.1.4.1.323.5.3.36.1.2.7032	'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: These are Kubernetes metrics used for instance availability monitoring. If the metrics are not available, use similar metrics exposed by the monitoring system.	The alert is cleared when the memory utilization falls below the critical threshold. Note: The threshold is configurable in alerts.yaml. If guidance is required, contact My Oracle Support.
OcnrfTotalIngressTrafficRateAboveMinorThreshold	The total OCNRF Ingress Message rate has crossed the configured minor threshold of 800 TPS. By default, the trigger point in NrfAlertValues.yaml is when the OCNRF ingress rate crosses 80% of 1000 (the maximum ingress request rate).	Minor	description: 'Total Ingress traffic Rate is above configured minor threshold i.e. 800 requests per second (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'	1.3.6.1.4.1.323.5.3.36.1.2.7001	'oc_ingressgateway_http_requests_total'	The alert is cleared either when the total ingress traffic rate falls below the minor threshold or when it crosses the major threshold, in which case the OcnrfTotalIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in alerts.yaml. Steps: Reassess why the OCNRF is receiving additional traffic (for example, the geo-redundant OCNRF is unavailable). If this is unexpected, contact My Oracle Support.
OcnrfTotalIngressTrafficRateAboveMajorThreshold	The total OCNRF Ingress Message rate has crossed the configured major threshold of 900 TPS. By default, the trigger point in NrfAlertValues.yaml is when the OCNRF ingress rate crosses 90% of 1000 (the maximum ingress request rate).	Major	description: 'Total Ingress traffic Rate is above major threshold i.e. 900 requests per second (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'	1.3.6.1.4.1.323.5.3.36.1.2.7002	'oc_ingressgateway_http_requests_total'	The alert is cleared either when the total ingress traffic rate falls below the major threshold or when it crosses the critical threshold, in which case the OcnrfTotalIngressTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in alerts.yaml. Steps: Reassess why the OCNRF is receiving additional traffic (for example, the geo-redundant OCNRF is unavailable). If this is unexpected, contact My Oracle Support.
OcnrfTotalIngressTrafficRateAboveCriticalThreshold	The total OCNRF Ingress Message rate has crossed the configured critical threshold of 950 TPS. By default, the trigger point in NrfAlertValues.yaml is when the OCNRF ingress rate crosses 95% of 1000 (the maximum ingress request rate).	Critical	description: 'Total Ingress traffic Rate is above critical threshold i.e. 950 requests per second (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Traffic Rate is above 95 Percent of Max requests per second(1000)'	1.3.6.1.4.1.323.5.3.36.1.2.7003	'oc_ingressgateway_http_requests_total'	The alert is cleared when the ingress traffic rate falls below the critical threshold. Note: The threshold is configurable in alerts.yaml. Steps: Reassess why the OCNRF is receiving additional traffic (for example, the geo-redundant OCNRF is unavailable). If this is unexpected, contact My Oracle Support.
OcnrfTransactionErrorRateAbove0.1Percent	The number of failed transactions is above 0.1 percent of the total transactions.	Warning	description: 'Transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 0.1 Percent of Total Transactions'	1.3.6.1.4.1.323.5.3.36.1.2.7004	'oc_ingressgateway_http_responses_total'	The alert is cleared when the number of failed transactions is below 0.1 percent of the total transactions, or when it crosses the 1% threshold, in which case the OcnrfTransactionErrorRateAbove1Percent alert is raised. Steps: Check the service-specific metrics to understand the specific service request errors. For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx. If guidance is required, contact My Oracle Support.
OcnrfTransactionErrorRateAbove1Percent	The number of failed transactions is above 1 percent of the total transactions.	Warning	description: 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 1 Percent of Total Transactions'	1.3.6.1.4.1.323.5.3.36.1.2.7005	'oc_ingressgateway_http_responses_total'	The alert is cleared when the number of failed transactions is below 1% of the total transactions, or when it crosses the 10% threshold, in which case the OcnrfTransactionErrorRateAbove10Percent alert is raised. Steps: Check the service-specific metrics to understand the specific service request errors. For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx. If guidance is required, contact My Oracle Support.
OcnrfTransactionErrorRateAbove10Percent	The number of failed transactions has crossed the minor threshold of 10 percent of the total transactions.	Minor	description: 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 10 Percent of Total Transactions'	1.3.6.1.4.1.323.5.3.36.1.2.7006	'oc_ingressgateway_http_responses_total'	The alert is cleared when the number of failed transactions is below 10% of the total transactions, or when it crosses the 25% threshold, in which case the OcnrfTransactionErrorRateAbove25Percent alert is raised. Steps: Check the service-specific metrics to understand the specific service request errors. For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx. If guidance is required, contact My Oracle Support.
OcnrfTransactionErrorRateAbove25Percent	The number of failed transactions has crossed the major threshold of 25 percent of the total transactions.	Major	description: 'Transaction Error rate is above 25 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 25 Percent of Total Transactions'	1.3.6.1.4.1.323.5.3.36.1.2.7007	'oc_ingressgateway_http_responses_total'	The alert is cleared when the number of failed transactions is below 25% of the total transactions, or when it crosses the 50% threshold, in which case the OcnrfTransactionErrorRateAbove50Percent alert is raised. Steps: Check the service-specific metrics to understand the specific service request errors. For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx. If guidance is required, contact My Oracle Support.
OcnrfTransactionErrorRateAbove50Percent	The number of failed transactions has crossed the critical threshold of 50 percent of the total transactions.	Critical	description: 'Transaction Error rate is above 50 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 50 Percent of Total Transactions'	1.3.6.1.4.1.323.5.3.36.1.2.7008	'oc_ingressgateway_http_responses_total'	The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. Steps: Check the service-specific metrics to understand the specific service request errors. For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx. If guidance is required, contact My Oracle Support.
OCNRF Application Alerts
OcnrfRegisteredNFsBelowCriticalThreshold	The number of NFs currently registered with OCNRF is below the critical threshold. By default, the trigger point in NrfAlertValues.yaml is when the count of NFs registered with OCNRF is below 2.	Critical	description: 'The number of registered NFs detected below critical threshold (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: The number of registered NFs detected below critical threshold.'	1.3.6.1.4.1.323.5.3.36.1.2.7009	'ocnrf_active_registrations_count'	The alert is cleared when the number of registered NFs is above the critical threshold. Steps: No action required. This is an informational alert.	The operator shall configure the threshold values with respect to the number of NFs expected within the network. NFs with NFStatus 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowMajorThreshold	The number of NFs currently registered with OCNRF is below the major threshold. By default, the trigger point in NrfAlertValues.yaml is when the count of NFs registered with OCNRF is greater than or equal to 2 and less than 10.	Major	description: 'The number of registered NFs detected below major threshold (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: The number of registered NFs detected below major threshold.'	1.3.6.1.4.1.323.5.3.36.1.2.7010	'ocnrf_active_registrations_count'	The alert is cleared when the number of registered NFs is above the major threshold. Steps: No action required. This is an informational alert.	The operator shall configure the threshold values with respect to the number of NFs expected within the network. NFs with NFStatus 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowMinorThreshold	The number of NFs currently registered with OCNRF is below the minor threshold. By default, NrfAlertValues.yaml triggers this alert when the count of NFs registered with OCNRF is greater than or equal to 10 and less than 20.	Minor	description: 'The number of registered NFs detected below minor threshold (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below minor threshold.'	1.3.6.1.4.1.323.5.3.36.1.2.7011	'ocnrf_active_registrations_count'	The alert is cleared when the number of registered NFs is above the minor threshold. Steps: No action required. This is an informational alert.	The operator shall configure the threshold values according to the number of NFs expected within the network. NFs with NFStatus 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowThreshold	The number of NFs currently registered with OCNRF is approaching the minor threshold. By default, NrfAlertValues.yaml triggers this alert when the count of NFs registered with OCNRF is greater than or equal to 20 and less than 30.	Warning	description: 'The number of registered NFs is approaching minor threshold (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs approaching minor threshold.'	1.3.6.1.4.1.323.5.3.36.1.2.7012	'ocnrf_active_registrations_count'	The alert is cleared when the number of registered NFs is outside the warning range. Steps: No action required. This is an informational alert.	The operator shall configure the threshold values according to the number of NFs expected within the network. NFs with NFStatus 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfDbReplicationStatusInactive	The DB tier replication service status is inactive across the georedundant OCNRFs.	Critical	description: 'The Database Replication Status is currently INACTIVE.' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, dbreplicationstatus: {{$labels.DbReplicationStatus}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The database replication status is INACTIVE.'	1.3.6.1.4.1.323.5.3.36.1.2.7013	'ocnrf_dbreplication_status'	The alert is cleared when the DB tier replication service is active.	The alarm shall be included only if the Georedundancy feature is enabled.
OcnrfAccessTokenRequestsRejected	OCNRF rejected an AccessToken Request	critical warning	description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF.'	1.3.6.1.4.1.323.5.3.36.1.2.7014	'ocnrf_accessToken_tx_rejected_total'	The alert is cleared automatically. Steps: The rejection reason is present in the alert. If the RejectionReason is AuthScreeningFailed or ClientNotAuthorized, either re-evaluate the configuration or check the consumer NF that requested the unauthorized token. For any other reason, act according to the RejectionReason.
OcnrfNfAuthenticationFailureRequestsRejected	OCNRF rejected a service request due to NF authentication failure	critical warning	description: 'Service request(s) received from NF have been rejected by OCNRF (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Request rejected for Nf FQDN based Authentication failure.'	1.3.6.1.4.1.323.5.3.36.1.2.7015	'ocnrf_nf_authentication_failure_total'	The alert is cleared automatically. Steps: No action required for OCNRF. This is an informational alert. The rejection reason is present in the alert.
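
The default bands of the four registered-NF alerts in the table above can be summarized in a small shell sketch. The function name `nf_registration_severity` is hypothetical, and the boundaries (2, 10, 20, 30) are the documented defaults. The actual evaluation is done by Prometheus against `ocnrf_active_registrations_count`, not by a script like this one.

```shell
#!/usr/bin/env bash
# Illustrative only: maps a registered-NF count to the alert severity that
# would apply under the default thresholds (2, 10, 20, 30).
# nf_registration_severity is a hypothetical helper, not part of OCNRF.
nf_registration_severity() {
  local count="$1"
  if   [ "$count" -lt 2 ];  then echo "Critical"
  elif [ "$count" -lt 10 ]; then echo "Major"
  elif [ "$count" -lt 20 ]; then echo "Minor"
  elif [ "$count" -lt 30 ]; then echo "Warning"
  else                           echo "None"
  fi
}

nf_registration_severity 1    # Critical
nf_registration_severity 15   # Minor
nf_registration_severity 30   # None
```

If the thresholds are changed in the alert file, the bands shift accordingly; the ordering of the severities stays the same.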

OCNRF Alert Configuration

This section describes the measurement-based alert rules configuration for OCNRF. The Alert Manager uses the Prometheus measurement values reported by the microservices in the conditions under the alert rules to trigger alerts.

Note:

The alert file is packaged with the OCNRF custom templates. Download the OCNRF templates.zip file from OHC and unzip it to get the NrfAlertRules.yaml file.
Review the NrfAlertRules.yaml file and, if the default values need to be changed, edit the parameter values before configuring the alerts. See the table below for details.
kubernetes_namespace is set to the Kubernetes namespace in which NRF is deployed. The default value is OCNRF. Update the NrfAlertRules.yaml file to reflect the correct OCNRF Kubernetes namespace.

Alert details which can be updated in NrfAlertRules.yaml file before configuration

Table 6-12 Alerts

Alert Name	Details	Default Value	Notes
OcnrfTotalIngressTrafficRateAboveMinorThreshold	Traffic rate is above 80 percent of the maximum requests per second	Greater than or equal to 800 and less than 900	The maximum ingress rate considered is 1000 requests per second, so in the default value 800 is 80% of 1000 and 900 is 90% of 1000. To update the value for a different maximum ingress request rate, set [80% of Max Ingress Request Rate] and [90% of Max Ingress Request Rate] for this alert.
OcnrfTotalIngressTrafficRateAboveMajorThreshold	Traffic rate is above 90 percent of the maximum requests per second	Greater than or equal to 900 and less than 950	The maximum ingress rate considered is 1000 requests per second, so in the default value 900 is 90% of 1000 and 950 is 95% of 1000. To update the value for a different maximum ingress request rate, set [90% of Max Ingress Request Rate] and [95% of Max Ingress Request Rate] for this alert.
OcnrfTotalIngressTrafficRateAboveCriticalThreshold	Traffic rate is above 95 percent of the maximum requests per second	Greater than or equal to 950	The maximum ingress rate considered is 1000 requests per second, so in the default value 950 is 95% of 1000. To update the value for a different maximum ingress request rate, set [95% of Max Ingress Request Rate] for this alert.
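
The percentages in the table can be turned into concrete boundary values with simple integer arithmetic. The following is a minimal sketch; `ingress_thresholds` is a hypothetical helper name, and the second call uses an assumed maximum rate of 2500 requests per second purely for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: derives the 80%, 90% and 95% alert boundaries from a
# maximum ingress rate. ingress_thresholds is a hypothetical helper.
ingress_thresholds() {
  local max_rate="$1"
  echo "$(( max_rate * 80 / 100 )) $(( max_rate * 90 / 100 )) $(( max_rate * 95 / 100 ))"
}

ingress_thresholds 1000   # 800 900 950 (the documented defaults)
ingress_thresholds 2500   # 2000 2250 2375
```

The three numbers correspond to the lower bounds of the minor, major, and critical alerts; each alert's upper bound is the next alert's lower bound.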

OCNRF Alert configuration in Prometheus

This section describes the measurement-based alert rules configuration for OCNRF in Prometheus. Use the NrfAlertRules.yaml file updated in the OCNRF Alert Configuration section.

_NAME_: Helm release name of Prometheus

_Namespace_: Kubernetes namespace in which Prometheus is installed

Back up the current Prometheus configuration map:

kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml

Check for the OCNRF alert file name in the Prometheus configuration map and add it:

sed -i '/etc\/config\/alertsnrf/d' /tmp/tempConfig.yaml
sed -i '/rule_files:/a\  \- /etc/config/alertsnrf' /tmp/tempConfig.yaml
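
The first `sed` command deletes any existing `alertsnrf` entry before the second one appends it, so repeating the step should not create duplicate entries. The sketch below checks this on a scratch file; the file content is a simplified stand-in, not a real Prometheus configuration map, and GNU sed is assumed (as it is by the commands above).

```shell
#!/usr/bin/env bash
# Illustrative only: runs the two sed commands twice against a scratch file
# (not a real Prometheus ConfigMap) to show that the entry is added once.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
rule_files:
- /etc/config/rules
- /etc/config/alerts
EOF

for _ in 1 2; do
  sed -i '/etc\/config\/alertsnrf/d' "$cfg"
  sed -i '/rule_files:/a\  \- /etc/config/alertsnrf' "$cfg"
done

count="$(grep -c 'alertsnrf' "$cfg")"
echo "alertsnrf entries: $count"
```

Before replacing the configuration map, it is worth inspecting the edited file to confirm that the new entry is indented consistently with the other `rule_files` entries.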

Update the configuration map with the file that now contains the OCNRF alert file name:
```
kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
```

Add the OCNRF alert rules to the configuration map under the OCNRF alert file name:

kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/NrfAlertrules.yaml)"

Note:

The Prometheus server automatically reloads the updated configuration map after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the OCNRF alerts have been reloaded.

Disable OCNRF Alert in Prometheus

To disable an alert in Prometheus:

Edit the NrfAlertrules.yaml file to remove the specific alert.

The following sample from NrfAlertrules.yaml shows the content of a single alert entry, to illustrate what needs to be removed in order to disable that alert.

## ALERT SAMPLE START##
      - alert: OcnrfTrafficRateAboveMinorThreshold
        annotations:
          description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
        labels:
          severity: Minor
## ALERT SAMPLE END##

Remove the content of the specific alert that needs to be disabled.
Perform the alert configuration again. See the OCNRF Alert configuration in Prometheus section above for detailed steps.

Disabling Alerts

This section explains the procedure to disable the alerts in OCNRF.

Edit the NrfAlertrules.yaml file to remove the specific alert.

Remove the complete content of the specific alert from the NrfAlertrules.yaml file.

For example, to remove the OcnrfTrafficRateAboveMinorThreshold alert, remove its complete content:

## ALERT SAMPLE START##

      - alert: OcnrfTrafficRateAboveMinorThreshold
        annotations:
          description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
        labels:
          severity: Minor
## ALERT SAMPLE END##

Perform Alert configuration. See OCNRF Alert Configuration section above for details.
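
The manual edit described above can also be scripted. The following is a minimal sketch that drops one alert entry from a scratch rules file with `awk`. The file layout (a `groups` list with `- name:` and `- alert:` entries), the group name, and the `ExampleAlertToKeep` entry are assumptions made for illustration; check the result against the real NrfAlertrules.yaml before using anything similar.

```shell
#!/usr/bin/env bash
# Illustrative only: drops one alert entry from a scratch rules file.
# The file layout and ExampleAlertToKeep are assumptions for this sketch.
rules="$(mktemp)"
cat > "$rules" <<'EOF'
groups:
  - name: OcnrfAlerts
    rules:
      - alert: OcnrfTrafficRateAboveMinorThreshold
        expr: sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
        labels:
          severity: Minor
      - alert: ExampleAlertToKeep
        expr: vector(1)
        labels:
          severity: Major
EOF

target="OcnrfTrafficRateAboveMinorThreshold"

# Skip lines from the matching "- alert:" line up to the next list entry
# ("- alert:" or "- name:"); print everything else unchanged.
awk -v target="$target" '
  /^ *- (alert|name):/ { skip = ($2 == "alert:" && $3 == target) }
  !skip
' "$rules" > "$rules.new"

remaining="$(grep -c -- '- alert:' "$rules.new")"
echo "alerts remaining: $remaining"
```

Writing to a separate `.new` file keeps the original intact, so the two files can be compared before the alert configuration is performed again.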

Configuring SNMP Notifier

This section describes the procedure to configure the SNMP Notifier.

Configure and Validate Alerts in Prometheus Server

Refer to the OCNRF Alert Configuration section for the procedure to configure the alerts.

Validating Alerts

After configuring the alerts in the Prometheus server, verify them using the following steps:

Open the Prometheus server in your browser using <IP>:<Port>.
Navigate to Status and then Rules.
Search for Ocnrf. The OcnrfAlerts list is displayed.

Note:
If you are unable to see the alerts, it means the alert file is not loaded in a proper format which the Prometheus server accepts. Modify the file and try again.

Configuring SNMP-Notifier

Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:

Execute the following command to edit the deployment:

kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>

Example:

$ kubectl edit deploy occne-snmp-notifier -n occne-infra

Edit the destination as follows:
```
--snmp.destination=<destination_ip>:<destination_port>
```
Example:
```
--snmp.destination=10.75.203.94:162
```

Checking SNMP Traps

The following example shows how to capture the logs of the trap receiver server to view the generated SNMP traps:

$ docker logs <trapd_container_id>

Sample output:

2020-04-29 15:34:24 10.75.203.103 [UDP: [10.75.203.103]:2747->[172.17.0.4]:162]:DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (158510800) 18 days, 8:18:28.00        SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003    SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.1 = STRING: "1.3.6.1.4.1.323.5.3.36.1.2.7003[]"  SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.2 = STRING: "critical"      SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- Alert: OcnrfActiveSubscribersBelowCriticalThreshold  Summary: namespace: ocnrf, nftype:5G_EIR, nrflevel:6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, podname: ocnrf-nrfauditor-6b459f5db5-4kvt4,
        timestamp: 2020-04-29 15:33:24.408 +0000 UTC: Current number of registered NFs detected below critical threshold.  Description: The number of registered NFs detected below critical threshold (current value
          is: 0)
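
When many traps are logged, it can help to pull out just the alert name and the trap OID from each line. The following is a minimal sketch using `sed`; the `line` value is abbreviated from the sample output above, and the patterns assume the field layout shown there.

```shell
#!/usr/bin/env bash
# Illustrative only: extracts the alert name and trap OID from one trap log
# line. The line is abbreviated from the sample output above.
line='SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003    SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- Alert: OcnrfActiveSubscribersBelowCriticalThreshold  Summary: namespace: ocnrf"'

# Alert name: the word that follows "Alert: ".
alert="$(printf '%s\n' "$line" | sed -n 's/.*Alert: \([A-Za-z0-9_]*\).*/\1/p')"
# Trap OID: the token that follows "snmpTrapOID.0 = OID: ".
oid="$(printf '%s\n' "$line" | sed -n 's/.*snmpTrapOID\.0 = OID: \([^ ]*\).*/\1/p')"

echo "alert=$alert"
echo "oid=$oid"
```

The same two `sed` expressions could be applied to the output of `docker logs <trapd_container_id>`, provided each trap is logged on a single line.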

MIB Files for OCNRF

Two MIB files are used to generate the traps. Users need to update these files along with the alert file in order to fetch the traps in their environment.

OCNRF-MIB-TC-1.8.0.mib
This is the OCNRF top-level MIB file, where the objects and their data types are defined.
OCNRF-MIB-1.8.0.mib
This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.

Note:

MIB files are packaged along with OCNRF Custom Templates. Download the file from OHC. Refer to OCNRF Installation and Upgrade guide for more details.