6 OCNRF Metrics, KPIs, and Alerts

OCNRF Metrics

This section includes information about Metrics for Oracle Communications Network Repository Function.

Note:

A sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates. The metrics and functions used to derive the KPIs are covered in OCNRF Custom Templates. For information about OCNRF Custom Templates, refer to the Oracle Help Center site.

Dimensions Legend for the Metrics

The following table includes the details about the metrics dimensions:

Table 6-1 Dimensions Legend

Dimension Details
Method HTTP method name. For example: PUT, GET
Status HTTP Status Code in response
Uri URI defined to identify the Service Operation at Ingress Gateway
Node Name of the Kubernetes worker node on which the microservice is running
NrfLevel OCNRF deployment name by which OCNRF can be identified. It is the OCNRF Instance Id passed through Helm.
NfType Types of Network Functions (NF)
NfInstanceId Unique identity of the NF Instance sending request to OCNRF
HttpStatusCode HTTP Status Code
ServiceName Name of the service instance (e.g. "nudm-sdm")
ServiceInstanceId Unique ID of the service instance within a given NF Instance
UpdateType(Partial/Complete) NF Update with PUT (Complete) or PATCH (Partial) methods
OperationType Indicates whether an NFSubscribe service operation request creates or updates the subscription
NotificationEventType Indicates the event types for which the subscription request is made. For example: NF_REGISTERED, NF_DEREGISTERED, and NF_PROFILE_CHANGED
TargetNfType Indicates the target NF type of the request
RequesterNfType Indicates the NF type that originated the request. The value is taken from the User-Agent header; for the NFDiscover service operation, it is taken from the search query. If the header or value is absent, the metric reports UNKNOWN.

TargetNfInstanceId Dimension indicates the target NF Instance Id for NF Access Token
ClientNfInstanceId Dimension indicates the client NF Instance Id for NF Access Token
RejectionReason Dimension indicates the rejection reason for NF Access Token
SubscriptionIdType Dimension indicates the Subscription Id type for which SLF query is received
GroupId Dimension indicates the GroupId returned by SLF/UDR corresponding to SubscriptionId
BucketSize Indicates how many profiles are returned in the response to a Discovery request. The range is not configurable; possible values are 0 to 10 and +Inf. The bucket corresponding to the number of NF profiles returned is incremented by one. For example, if 2 profiles are returned, bucket 2 is incremented by one. Responses returning more than 10 profiles fall in the +Inf bucket.
DBOperation Database operation: create, update, delete, or find
TableName OCNRF Table Name
SubscriptionStatus Status of subscription shall be 'SUBSCRIBED', 'SUSPENDED' or 'UNSUBSCRIBED'
DbReplicationStatus "ACTIVE" or "INACTIVE"
RemoteNrfInstanceId Remote OCNRF Instance Id
HeartbeatTimer The heartbeatTimer of the NfProfile. The value is in seconds.
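
The BucketSize rule described in the legend can be sketched as follows. This is only an illustration of the rule as stated in the table, not OCNRF's implementation; the function names `bucket_label` and `count_buckets` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def bucket_label(profile_count: int) -> str:
    """Return the BucketSize value for one discovery response (hypothetical helper)."""
    if profile_count < 0:
        raise ValueError("profile_count must be non-negative")
    # Buckets 0..10 correspond one-to-one to the number of profiles returned;
    # any response with more than 10 profiles falls in the +Inf bucket.
    return str(profile_count) if profile_count <= 10 else "+Inf"


def count_buckets(profile_counts: Iterable[int]) -> Counter:
    """Increment the matching bucket by one for each discovery response."""
    buckets: Counter = Counter()
    for count in profile_counts:
        buckets[bucket_label(count)] += 1
    return buckets


# Three responses returning 2, 2, and 15 profiles.
print(count_buckets([2, 2, 15]))
```

In this sketch, two responses land in bucket 2 and one lands in the +Inf bucket.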

Table 6-2 OCNRF Metrics

Sl. No# Metric Name Metric Details Metric filter Recommended legend to see dimension level data (as applicable) Dimensions
1 Total number of ingress requests Total number of requests received at OCNRF oc_ingressgateway_http_requests_total    
2 NF Register Success Total number of successful NFRegister service operations at OCNRF oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

3 NF Update Success (Complete Replacement) Total number of successful NFUpdate service operations at OCNRF oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

4 NF Update Success (Partial Replacement) Total number of successful NFUpdate service operations at OCNRF oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PATCH"}

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

5 NF List/Profile Retrieval Success Total number of successful NF List/Profile retrieval service operations at OCNRF oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="GET"}

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

6 Access Token Success Total number of successful Access Token service operations at OCNRF oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*/oauth2/token.*"}

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

7 NF De-register Success Total number of successful NFDeregister service operations at OCNRF oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"}

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

8 NF Subscribe Success Total number of successful NFSubscribe service operations at OCNRF oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

9 NF Unsubscribe Success Total number of successful NFUnSubscribe service operations at OCNRF oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

10 NF Discover Success Total number of successful NFDiscover service operations at OCNRF oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

11 4xx Responses (NF-Instances) Total number of 4xx responses(NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval) oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

12 4xx Responses (Subscriptions) Total number of 4xx responses(NfSubscribe/NfUnsubscribe) oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

13 4xx Responses (Discovery) Total number of 4xx responses(NfDiscover) oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

14 4xx Responses (AccessToken) Total number of 4xx responses(NfAccessToken) oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*oauth2/token.*"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

15 5xx Responses (NF-Instances) Total number of 5xx responses(NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval) oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

16 5xx Responses (Subscriptions) Total number of 5xx responses(NfSubscribe/NfUnsubscribe) oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

17 5xx Responses (Discovery) Total number of 5xx responses(NfDiscover) oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

18 5xx Responses (AccessToken) Total number of 5xx responses(NfAccessToken) oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*oauth2/token.*"}  

Method: HTTP method of the request

Status: status code in the HTTP response

Uri: URI from the request line

Node: name of the Kubernetes worker node on which the microservice is running

19 NfRegistrations Total Number of Registration Requests received ocnrf_nfRegister_rx_requests_total NfRegistrations Total

NrfLevel

NfInstanceId

RequesterNfType

20 NfRegistrations Responses Total Number of Registration Responses sent. ocnrf_nfRegister_tx_responses_total NfRegistrations Responses Total

NrfLevel

NfInstanceId

RequesterNfType

HttpStatusCode

21 NfRegistrations Per Service Total Number of Registrations received and processed successfully per Service. ocnrf_nfRegister_rx_requests_success_perService_total NfRegistrations Per Service [ serviceName :- {{ serviceName }}, nfInstanceId :- {{NfInstanceId}} ]

NrfLevel

NfInstanceId

ServiceName

ServiceInstanceId

22 NFUpdates Total Number of Update Requests received. ocnrf_nfUpdate_rx_requests_total NfUpdates Total

NrfLevel

NfInstanceId

RequesterNfType

UpdateType(Partial/Complete)

23 NFUpdates Responses Total Number of Update Responses sent. ocnrf_nfUpdate_tx_responses_total NfUpdates Responses Total

NrfLevel

NfInstanceId

RequesterNfType

UpdateType(Partial/Complete)

HttpStatusCode

24 NFUpdates Per Service Total Number of NfUpdates received and processed successfully per Service. ocnrf_nfUpdate_rx_requests_success_perService_total NFUpdates Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ]

NrfLevel

UpdateType(Partial/Complete)

NfInstanceId

ServiceName

ServiceInstanceId

25 Heartbeat Requests Total Number of Heartbeat Requests received ocnrf_nfHeartbeat_rx_requests_total  

NrfLevel

NfInstanceId

RequesterNfType

26 Heartbeat Responses Total Number of Heartbeat Responses sent ocnrf_nfHeartbeat_tx_responses_total

NrfLevel,

NfInstanceId,

RequesterNfType,

HttpStatusCode

27 NF De-Registration Requests Total Number of De-registration requests received ocnrf_nfDeregister_rx_requests_total  

NrfLevel,

NfInstanceId,

RequesterNfType

28 NF De-Registration Responses Total Number of De-registration responses sent ocnrf_nfDeregister_tx_responses_total  

NrfLevel,

NfInstanceId,

RequesterNfType ,

HttpStatusCode

29 NF De-Registrations Per Service Total Number of De-registration requests received and processed successfully per Service ocnrf_nfDeregister_rx_requests_success_perService_total NFDeregistration Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ]

NrfLevel,

ServiceName,

ServiceInstanceId,

NfInstanceId

30 NF List Retrieval Requests Total Number of NFListRetrieval requests received ocnrf_nfListRetrieval_rx_requests_total  

NrfLevel,

RequesterNfType

31 NF List Retrieval Responses Total Number of NFListRetrieval responses sent ocnrf_nfListRetrieval_tx_responses_total  

NrfLevel,

RequesterNfType ,

HttpStatusCode

32 NF Profile Retrieval Requests Total Number of NFProfileRetrieval requests received ocnrf_nfProfileRetrieval_rx_requests_total  

NrfLevel,

NfInstanceId

33 NF Profile Retrieval Responses Total Number of NFProfileRetrieval responses sent ocnrf_nfProfileRetrieval_tx_responses_total  

NrfLevel,

NfInstanceId,

HttpStatusCode

34 Number of Heartbeats missed Number of heartbeats missed. ocnrf_heartbeat_missed_total  

NrfLevel,

RequesterNfType ,

NfInstanceId

35 NF Status Subscribe Requests Total Number of NfStatusSubscribe requests received ocnrf_nfStatusSubscribe_rx_requests_total

NrfLevel,

RequesterNfType, OperationType

36 NF Status Subscribe Responses Total Number of NfStatusSubscribe responses sent ocnrf_nfStatusSubscribe_tx_responses_total  

NrfLevel,

RequesterNfType ,

HttpStatusCode, OperationType

37 NF Status UnSubscribe Requests Total Number of NfStatusUnsubscribe requests received ocnrf_nfStatusUnsubscribe_rx_requests_total  

NrfLevel,

RequesterNfType

38 NF Status UnSubscribe Responses Total Number of NfStatusUnsubscribe responses sent ocnrf_nfStatusUnsubscribe_tx_responses_total  

NrfLevel,

RequesterNfType,

HttpStatusCode

39 NF Status Notifications Requests Sent Number of NfStatusNotify requests sent ocnrf_nfStatusNotify_tx_requests_total  

NrfLevel,

NotificationEventType,

TargetNfType

40 NF Status Notifications Responses Received Number of NfStatusNotify responses received ocnrf_nfStatusNotify_rx_responses_total  

NrfLevel,

NotificationEventType,

TargetNfType,

HttpStatusCode

41 NF Status Notifications Requests Failed Number of NfStatusNotify requests that failed to be sent ocnrf_nfStatusNotify_requests_failed_total

NrfLevel,

NotificationEventType,

TargetNfType

42 NfDiscover Requests Total Number of NfDiscover Requests received ocnrf_nfDiscover_rx_requests_total NfDiscover Req [ TargetNf :- {{ TargetNfType }}, RequesterNfType :- {{RequesterNfType}} ]

NrfLevel,

TargetNfType,

RequesterNfType

43 NfDiscover Responses Total Number of NfDiscover responses sent ocnrf_nfDiscover_tx_responses_total  

NrfLevel,

TargetNfType,

RequesterNfType,

HttpResponseCode

44 NFDiscover Per Service Total Number of NfDiscover requests received and processed successfully per Service ocnrf_nfDiscover_rx_requests_success_perService_total NFDiscover Per Service [ serviceName :- {{ serviceName }} ]

NrfLevel,

RequesterNfType,

ServiceName

45 Discovered profiles Number of profiles returned in the discovery response. The bucket size and its corresponding value indicate how many profiles are returned in the discovery response. ocnrf_nfDiscover_profiles_discovered_total Discovered profiles [ TargetNfType :- {{TargetNfType}}, Bucket :- {{ Bucket }} ]

NrfLevel,

TargetNfType,

BucketSize

NfFqdn

46 Active Registrations Number of active registered NFs at any point of time ocnrf_active_registrations_count Active Registrations [ NfType-{{ NfType }}, NrfLevel-{{ NrfLevel }} ]

NfType,

NrfLevel

47 Avg NRF Latency taken by NRF specific microservice Time taken by NRF specific microservice to process the service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NfSubscribe/NfUnsubscribe/NfAccessToken)

Note: Latency calculated by this metric doesn't include time taken by OCNRF API gateway.

ocnrf_message_processing_time_seconds Avg NRF Latency {{ ServiceOperation }} {{ RequesterNfType }} NrfLevel,RequesterNfType ,ServiceOperation
48 OCNRF database operations Database operation count corresponding to every service operation   ocnrf_dbmetric_total

Method,

DBOperation,

NrfLevel,

HttpStatusCode

49 Database operation round trip time Time (in microseconds) taken by the database operation corresponding to every service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NfSubscribe/NfUnsubscribe/NfAccessToken)

ocnrf_dbmetrics_round_trip_time_seconds  
  • Method
  • DBOperation
  • ServiceOperation
  • TableName: (NRF Table Names)
  • NrfLevel
  • HttpStatusCode

In Table 6-2, 4xx and 5xx refer to the HTTP error response codes returned by the REST API.
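
The metric filters in Table 6-2 rely on regular-expression label matchers (`=~`), which Prometheus anchors at both ends of the label value. The sketch below approximates that behavior with Python's `re.fullmatch` (the two regex dialects differ in some details). The sample label values, such as "404 NOT_FOUND" and the route path, are assumptions made for illustration, and the helper names are hypothetical.

```python
import re
from typing import Mapping


def label_matches(pattern: str, value: str) -> bool:
    """Approximate a PromQL =~ matcher: the pattern must match the whole value."""
    return re.fullmatch(pattern, value) is not None


def series_selected(labels: Mapping[str, str], regex_matchers: Mapping[str, str]) -> bool:
    """Return True when every regex matcher accepts the corresponding label."""
    return all(
        label_matches(pattern, labels.get(name, ""))
        for name, pattern in regex_matchers.items()
    )


# Matchers taken from the "4xx Responses (NF-Instances)" filter.
FOUR_XX_NF_INSTANCES = {
    "Status": "4.*",
    "Route_path": ".*nnrf-nfm/v1/nf-instances.*",
}

# Hypothetical series labels, for illustration only.
example_series = {
    "Status": "404 NOT_FOUND",
    "Route_path": "/nnrf-nfm/v1/nf-instances/abc",
    "Method": "GET",
}

print(series_selected(example_series, FOUR_XX_NF_INSTANCES))
print(label_matches("2.*", "502 BAD_GATEWAY"))
print(label_matches(".*2.*", "502 BAD_GATEWAY"))
```

Because the match is anchored, a status pattern that starts with the class digit (for example `2.*`) selects only that class, whereas a pattern with a leading `.*` (for example `.*2.*`) accepts any status value containing the digit.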

Table 6-3 NF Screening specific metrics

Sl. No# Metric Name Metric Details Metric filter Service Operation Dimensions Notes
1 Total NF Requests for which Screening Failed The total number of requests for which screening failed against NF FQDN screening list. ocnrf_nfScreening_nfFqdn_requestFailed_total NFRegister, NFUpdate NRF level NF type See Note 1 below this table.
2 Total NF Requests Rejected due to Screening Failed The total number of requests rejected because screening failed against NF FQDN screening list. ocnrf_nfScreening_nfFqdn_requestRejected_total NFRegister, NFUpdate NRF level NF type See Note 1 below this table.
3 Total NF Requests for which Screening Failed The total number of requests for which screening failed against NF IP endpoint screening list. ocnrf_nfScreening_nfIpEndPoint_requestFailed_total NFRegister, NFUpdate NRF level NF type See Note 1 below this table.
4 Total NF Requests Rejected due to Screening Failed The total number of requests rejected because screening failed against NF IP endpoint screening list. ocnrf_nfScreening_nfIpEndPoint_requestRejected_total NFRegister, NFUpdate NRF level NF type See Note 1 below this table.
5 Total NF Requests for which Screening Failed The total number of requests for which screening failed against Callback URI screening list. ocnrf_nfScreening_callbackUri_requestFailed_total NFRegister, NFUpdate, NFSubscribe NRF level NF type See Note 1 below this table.
6 Total NF Requests Rejected due to Screening Failed The total number of requests rejected because screening failed against Callback URI screening list. ocnrf_nfScreening_callbackUri_requestRejected_total NFRegister, NFUpdate, NFSubscribe NRF level NF type See Note 1 below this table.
7 Total NF Requests for which Screening Failed The total number of requests for which screening failed against PLMN id screening list. ocnrf_nfScreening_plmnId_requestFailed_total NFRegister, NFUpdate NRF level NF type See Note 1 below this table.
8 Total NF Requests Rejected due to Screening Failed The total number of requests rejected because screening failed against PLMN id screening list. ocnrf_nfScreening_plmnId_requestRejected_total NFRegister, NFUpdate NRF level NF type See Note 1 below this table.
9 Total NF Requests for which Screening Failed The total number of NFRegister requests for which screening failed against NF type screening list. ocnrf_nfScreening_nfTypeRegister_requestFailed_total NFRegister NRF level NF type See Note 1 below this table.
10 Total NF Requests Rejected due to Screening Failed The total number of NFRegister requests rejected as NF type was not allowed to register with NRF. ocnrf_nfScreening_nfTypeRegister_requestRejected_total NFRegister NRF level NF type See Note 1 below this table.
11 NF Screening not applied Internal Error The total number of times screening was not applied due to an internal error. ocnrf_nfScreening_notApplied_InternalError_total NFRegister, NFUpdate, NFSubscribe NRF level NF type See Note 1 below this table.

Note:

In Table 6-3, the NF type dimension is the requester NF type.

NF Access token metrics

Table 6-4 NF Access token metrics

Sl. No# Metric Name Metric Details Metric filter Service Operation Dimensions
1 NF Access Token Request Received Total The total number of access token requests received ocnrf_accessToken_rx_requests_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel
2 NF Access Token Responses Sent Total The total number of access token responses sent ocnrf_accessToken_tx_responses_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode
3 NF Access Token Request Rejected (ClientNotAuthorized) Number of access token requests for which client authorization failed. RejectionReason = ClientNotAuthorized ocnrf_accessToken_tx_rejected_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode

RejectionReason = ClientNotAuthorized

4 NF Access Token Request Rejected (ProducerWithRequestedScopeNotFound) Number of access tokens not granted because no producer instance is registered for the services in the scope. RejectionReason = ProducerWithRequestedScopeNotFound ocnrf_accessToken_tx_rejected_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode

RejectionReason = ProducerWithRequestedScopeNotFound

5 NF Access Token Request Rejected (ProducerWithRequestedNfInstanceIdNotFound) Number of access tokens not granted because no producer instance is registered for the target Instance Id provided in the request. RejectionReason = ProducerWithRequestedNfInstanceIdNotFound ocnrf_accessToken_tx_rejected_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode

RejectionReason = ProducerWithRequestedNfInstanceIdNotFound

6 NF Access Token Request Rejected (InconsistentScope) Number of access tokens not granted because services in the scope belong to different NF types. RejectionReason = InconsistentScope ocnrf_accessToken_tx_rejected_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode

RejectionReason = InconsistentScope

7 NF Access Token Request Rejected (ConsumerNFTypeMismatch) Number of access tokens not granted because the consumer NF type in the profile does not match the access token request. RejectionReason = ConsumerNFTypeMismatch ocnrf_accessToken_tx_rejected_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode

RejectionReason = ConsumerNFTypeMismatch

8 NF Access Token Request Rejected (ProducerNFTypeMismatch) Number of access tokens not granted because the producer NF type in the profile does not match the access token request. RejectionReason = ProducerNFTypeMismatch ocnrf_accessToken_tx_rejected_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode

RejectionReason = ProducerNFTypeMismatch

9 NF Access Token Request Rejected (InternalError) Number of access tokens not granted because of a failure at NRF due to an internal error. RejectionReason = InternalError ocnrf_accessToken_tx_rejected_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode

RejectionReason = InternalError

10 NF Access Token Request Rejected (ConsumerNfTypeNotAllowed) Number of access tokens not granted because the consumer NF type is not allowed to access the requested NF. ocnrf_accessToken_tx_rejected_total AccessToken

TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode

RejectionReason = ConsumerNfTypeNotAllowed

11 NF Access Token Request Rejected (ConsumerPlmnNotAllowed) Number of access tokens not granted because the consumer NF PLMN is not allowed to access the requested NF. ocnrf_accessToken_tx_rejected_total AccessToken

TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode

RejectionReason = ConsumerPlmnNotAllowed
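
All rejection rows in Table 6-4 share the metric ocnrf_accessToken_tx_rejected_total and differ only in the RejectionReason value. As a minimal sketch, the helper below builds a per-reason selector and a rate expression in the same `sum(irate(...[5m]))` form used in the KPI table. The helper names and the 5m default window are illustrative assumptions, not part of OCNRF or its Custom Templates.

```python
METRIC = "ocnrf_accessToken_tx_rejected_total"

# Rejection reasons listed in Table 6-4.
REJECTION_REASONS = (
    "ClientNotAuthorized",
    "ProducerWithRequestedScopeNotFound",
    "ProducerWithRequestedNfInstanceIdNotFound",
    "InconsistentScope",
    "ConsumerNFTypeMismatch",
    "ProducerNFTypeMismatch",
    "InternalError",
    "ConsumerNfTypeNotAllowed",
    "ConsumerPlmnNotAllowed",
)


def rejected_selector(reason: str) -> str:
    """Build a selector for one rejection reason (hypothetical helper)."""
    if reason not in REJECTION_REASONS:
        raise ValueError(f"unknown rejection reason: {reason}")
    return f'{METRIC}{{RejectionReason="{reason}"}}'


def rejected_rate_expr(reason: str, window: str = "5m") -> str:
    """Wrap the selector in sum(irate(...)) over the given range window."""
    return f"sum(irate({rejected_selector(reason)}[{window}]))"


for reason in REJECTION_REASONS:
    print(rejected_rate_expr(reason))
```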

NRF-SLF specific metrics

Table 6-5 NRF-SLF specific metrics

Sl. No# Metric Name Metric Details Metric filter Service Operation Dimensions
1 Discover Request Received For SLF Total The total number of NF Discover requests received for SLF ocnrf_nfDiscover_ForSLF_rx_requests_total NFDiscover TargetNfType, NRFLevel
2 Discover Response Sent For SLF Total The total number of NF Discover responses sent for SLF ocnrf_nfDiscover_ForSLF_tx_responses_total NFDiscover TargetNfType, NRFLevel, HttpStatusCode, RejectionReason RejectionReason:
  • SLFCommunicationFailure
  • MandatoryParamsMissing
  • SLFConfigurationMissing
  • GroupIdNotFound
  • ErrorFromSLF
  • InternalError
  • NotApplicable*

*NotApplicable is applicable for 2xx Status code

3 SLF Query Requests Sent Total The total number of SLF query requests sent ocnrf_SLF_tx_requests_total NFDiscover TargetNfType, NRFLevel, SubscriptionIdType
4 SLF Query Responses Received Total The total number of SLF query responses received ocnrf_SLF_rx_responses_total NFDiscover TargetNfType, NRFLevel, SubscriptionIdType, HttpStatusCode, GroupId
5 SLF Round Trip Time Total Time (in microseconds) between sending a query to SLF and receiving the response from SLF ocnrf_slf_round_trip_time_seconds NFDiscover

TargetNfType, SubscriptionIdType, HttpStatusCode, GroupId, NrfLevel, SLF ApiRoot

NRF Forwarding Metrics

Table 6-6 NRF Forwarding Metrics

Sl. No# Metric Name Metric Details Metric filter Service Operation Dimensions
1 NF Access Token Requests Forwarded Total The total number of Access Token Request forwarded to Primary/Secondary NRF ocnrf_forward_accessToken_tx_requests_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel
2 NF Access Token Forwarded Responses Total The total number of Access Token Responses for request forwarded to Primary/Secondary NRF ocnrf_forward_accessToken_rx_responses_total AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel,HttpStatusCode, RejectionReason RejectionReason:
  • InternalError
  • NRFCommunicationFailure
  • ErrorFromNRF
  • NRFForwardingConfigurationMissing
  • LoopDetected

*NotApplicable is applicable for 2xx Status code

3 NF Profile Retrieval Requests Forwarded Total The total number of Profile Retrieval Request forwarded to Primary/Secondary NRF ocnrf_forward_nfProfileRetrieval_tx_requests_total NFProfileRetrieval NrfLevel, NfInstanceId
4 NF Profile Retrieval Forwarded Responses Total The total number of Profile Retrieval Responses for Request forwarded to Primary/Secondary NRF ocnrf_forward_nfProfileRetrieval_rx_responses_total NFProfileRetrieval NrfLevel, NfInstanceId, HttpStatusCode, RejectionReason RejectionReason:
  • InternalError
  • NRFCommunicationFailure
  • ErrorFromNRF
  • NRFForwardingConfigurationMissing
  • LoopDetected

*NotApplicable is applicable for 2xx Status code

5 NF Status Subscribe Forwarded Requests Total The total number of Status Subscribe Request forwarded to Primary/Secondary NRF ocnrf_forward_nfStatusSubscribe_tx_requests_total NFStatusSubscribe, NFStatusUnsubscribe NrfLevel, RequesterNfType, OperationType
6 NF Status Subscribe Forwarded Responses Total The total number of Responses for Status Subscribe Request forwarded to Primary/Secondary NRF ocnrf_forward_nfStatusSubscribe_rx_responses_total NFStatusSubscribe, NFStatusUnsubscribe NrfLevel, RequesterNfType, HttpStatusCode, OperationType, RejectionReason RejectionReason:
  • InternalError
  • NRFCommunicationFailure
  • ErrorFromNRF
  • NRFForwardingConfigurationMissing
  • LoopDetected

*NotApplicable is applicable for 2xx Status code

7 NF Discovery Forwarded Requests Total The total number of NF Discovery Request forwarded to Primary/Secondary NRF ocnrf_forward_nfDiscover_tx_requests_total NFDiscover NrfLevel, TargetNfType, RequesterNfType
8 NF Discovery Forwarded Responses Total The total number of Responses for NF Discovery Request forwarded to Primary/Secondary NRF ocnrf_forward_nfDiscover_rx_responses_total NFDiscover NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode, RejectionReason RejectionReason:
  • InternalError
  • NrfCommunicationFailure
  • ErrorFromNrf
  • NrfForwardingConfigurationMissing
  • LoopDetected

*NotApplicable is applicable for 2xx Status code

9 Avg Latency for NRF Message Forwarding Time taken by NRF specific microservice to forward the message to other Primary/Secondary NRF with the service operation: (NFProfileRetrieval/NFDiscover/NFStatusSubscribe/NfStatusUnsubscribe/AccessToken) ocnrf_forward_round_trip_time_seconds NFStatusSubscribe, NFStatusUnsubscribe, NFProfileRetrieval, NFDiscover, AccessToken NrfLevel, RequesterNfType, ServiceOperation

GeoRedundancy metrics

Table 6-7 GeoRedundancy metrics

Sl. No# Metric Name Metric Details Metric filter Service Operation Dimensions
1. DB Replication status The current replication status of the db tier service. ocnrf_dbreplication_status NA NrfLevel,DbReplicationStatus
2. DB Replication down Time Time taken for the replication status to change from "INACTIVE" to "ACTIVE" ocnrf_dbreplication_down_time_seconds NA NrfLevel,DbReplicationDownStartTime,DbReplicationDownEndTime
3. Total NfInstances switched over from mated site The number of NFInstances that got switched over from the mated site. ocnrf_nf_switch_over_total NfRegister, NfUpdate,NfDeregister, NfHeartbeat NrfLevel, NfInstanceId,RemoteNrfInstanceId,ServiceOperation,OperationType
4. Total NfSubscriptions switched over from mated site The number of NfSubscriptions that got switched over from the mated site. ocnrf_nfSubscriptions_switch_over_total NfStatusSubscribe,NfStatusUnsubscribe, NrfAuditor NrfLevel,SubscriptionId,RemoteNrfInstanceId,ServiceOperation,OperationType
5. Total NfInstances removed by OCNRF as stale The number of NfInstances that get deleted by the NrfAuditor when it detects a record to be stale. ocnrf_stale_nf_deleted_total NA

NrfLevel,

NfInstanceId,

NfStatus

6. Total NfSubscriptions removed by OCNRF as stale The number of NfSubscriptions that get deleted by the NrfAuditor when it detects a record to be stale. ocnrf_stale_nfSubscriptions_deleted_total NA NrfLevel,NfSubscriptionId,SubscriptionStatus
7. Total NfInstances that have been marked as SUSPENDED by the OCNRF Auditor The number of profiles that have been marked as SUSPENDED when a profile has missed nfHeartBeatMissAllowed. ocnrf_nf_suspended_total NA

NrfLevel,

NfInstanceId,

NfStatus,

HeartbeatTimer

8. Total NfSubscriptions whose validityTime has expired The number of NfSubscriptions whose validityTime has expired ocnrf_nfSubscriptions_expired_total NA NrfLevel,SubscriptionId

NF AccessToken Authorization Metrics

Table 6-8 NF AccessToken Authorization Metrics

Sl. No# Metric Name Metric Details Metric filter Service Operation Dimensions
1 NF Access Token Request Rejected (AuthScreeningFailed) Number of access tokens not granted because the consumer NF is not authorized to access the requested NF or its services. ocnrf_accessToken_tx_rejected_total NfAccessToken

TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel,HttpStatusCode

RejectionReason = ClientNotAuthorized

NF Authentication Metrics

Table 6-9 NF Authentication Metrics

Sl. No# Metric Name Metric Details Metric filter Service Operation Dimensions
1 NF Authentication Failure Total The total number of requests for which FQDN-based authentication failed at OCNRF ocnrf_nf_authentication_failure_total NrfLevel,

Method,

ServiceOperation,

NfFqdn,

TLSFqdn

NFAccessToken/NFRegistration/NFSubscription/NFDiscovery/NfListRetrieval/NfProfileRetrieval

For the NfListRetrieval and NfProfileRetrieval service operations, NfFqdn is filled as NotApplicable.

If the OC-XFCC-DNS header is not received at the NRF microservice, TLSFqdn is filled as "UNKNOWN".
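
The two label rules above can be illustrated with a small sketch. It is a reading of the notes for Table 6-9 under stated assumptions, not OCNRF code: the function name, its parameters, and the case-insensitive header lookup are all hypothetical.

```python
from typing import Dict, Mapping

# Service operations for which NfFqdn is reported as NotApplicable.
RETRIEVAL_OPERATIONS = {"NfListRetrieval", "NfProfileRetrieval"}


def authentication_failure_labels(
    service_operation: str,
    nf_fqdn: str,
    headers: Mapping[str, str],
) -> Dict[str, str]:
    """Derive the NfFqdn and TLSFqdn label values (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # TLSFqdn falls back to UNKNOWN when the OC-XFCC-DNS header is absent.
    tls_fqdn = lowered.get("oc-xfcc-dns", "UNKNOWN")
    if service_operation in RETRIEVAL_OPERATIONS:
        nf_fqdn = "NotApplicable"
    return {
        "ServiceOperation": service_operation,
        "NfFqdn": nf_fqdn,
        "TLSFqdn": tls_fqdn,
    }


# Example with made-up FQDN values.
print(authentication_failure_labels(
    "NFRegistration", "amf1.example.com", {"OC-XFCC-DNS": "amf1.example.com"}
))
print(authentication_failure_labels("NfListRetrieval", "", {}))
```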

OCNRF KPIs

This section includes information about KPIs for Oracle Communications Network Repository Function (OCNRF).

Note:

Sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates. Metrics and functions used to achieve KPI are already covered in OCNRF Custom Templates.

Table 6-10 KPI Details

KPI Name KPI Details Metric used for KPI Service Operation Response code
OCNRF Ingress Request Rate of HTTP requests received at OCNRF Ingress Gateway oc_ingressgateway_http_requests_total All Not Applicable
NF Register Success   sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m])) NFRegister 201
NF Update Success (Complete Replacement)   sum(irate(oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m])) NFUpdate 200
NF DeRegister Success   sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"}[5m])) NFDeregister 204
NF Subscribe Success   sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"}[5m])) NFStatusSubscribe 201
NF Unsubscribe Success   sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"}[5m])) NFStatusUnsubscribe 204
NF Discover Success   sum(irate(oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"}[5m])) NFDiscover 200
4xx Responses (NF-Instances)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m])) NFRegister/NFUpdate/NFDeregister 4xx
4xx Responses (Subscriptions)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m])) NFStatusSubscribe/NFStatusUnsubscribe 4xx
4xx Responses (Discovery)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m])) NFDiscover 4xx
5xx Responses (NF-Instances)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m])) NFRegister/NFUpdate/NFDeregister 5xx
5xx Responses (Subscriptions)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m])) NFStatusSubscribe/NFStatusUnsubscribe 5xx
5xx Responses (Discovery)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m])) NFDiscover 5xx
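The KPI expressions in Table 6-10 all follow one pattern: a label selector on oc_ingressgateway_http_responses_total wrapped in sum(irate(...[5m])). The following sketch builds such expressions as strings so that the pattern is easier to see. The helper name `kpi_query` and its parameters are assumptions made for illustration and are not part of the OCNRF Custom Templates.

```python
from typing import Optional


def kpi_query(status: str,
              route_path: str,
              method: Optional[str] = None,
              status_is_regex: bool = False,
              window: str = "5m") -> str:
    """Build a KPI expression in the style of Table 6-10 (hypothetical helper)."""
    # Exact status strings use "=", status classes such as "4.*" use "=~".
    status_op = "=~" if status_is_regex else "="
    matchers = [
        'Status' + status_op + '"' + status + '"',
        'Route_path=~"' + route_path + '"',
    ]
    # The per-operation success KPIs also filter on the HTTP method.
    if method is not None:
        matchers.append('Method="' + method + '"')
    selector = ",".join(matchers)
    return ("sum(irate(oc_ingressgateway_http_responses_total{"
            + selector + "}[" + window + "]))")


if __name__ == "__main__":
    # NF Register Success
    print(kpi_query("201 CREATED", ".*nnrf-nfm/v1/nf-instances.*", "PUT"))
    # 4xx Responses (Discovery)
    print(kpi_query("4.*", ".*nnrf-disc/v1/nf-instances.*", status_is_regex=True))
```

The resulting strings can be pasted into a Prometheus or Grafana query field; the range window is a parameter here only to show where the 5m value sits in the expression.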

OCNRF Alerts

This section includes information about alerts for OCNRF.

Table 6-11 Alert Details

Alert Trigger Condition Severity Alert details provided OID Metric Used Resolution Notes
System Level Alerts              
OcnrfNfStatusUnavailable All the OCNRF services are unavailable, either because the OCNRF is being deployed or purged. The OCNRF services considered are nfregistration, nfsubscription, nrfauditor, nrfconfiguration, nfaccesstoken, nfdiscovery, appinfo, ingressgateway and egressgateway. Critical

description: 'OCNRF services unavailable'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All OCNRF services are unavailable.'

1.3.6.1.4.1.323.5.3.36.1.2.7016

'up'

Note: This is a prometheus metric used for instance availability monitoring.

If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared automatically when the OCNRF services start becoming available.

Steps:

  1. Check for service specific alerts.
  2. Refer to the application logs on Kibana and check for database-related failures such as connectivity issues or invalid secrets. The logs can be filtered based on the services.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfPodsRestart A pod belonging to any of the OCNRF services has restarted. Major

description: 'Pod <Pod Name> has restarted.'

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted'

1.3.6.1.4.1.323.5.3.36.1.2.7017 'kube_pod_container_status_restarts_total'

Note: This is a kubernetes metric. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared automatically if the specific pod is up.

Steps:

  1. Refer to the application logs on Kibana, filter based on pod name, and check for database-related failures such as connectivity or Kubernetes secret issues.
  2. Check orchestration logs for liveness or readiness probe failures.
  3. In case the issue persists, contact My Oracle Support.
 
NnrfNFManagementServiceDown Either NFRegistration or NFSubscription or NrfAuditor services are unavailable. Critical

description: 'OCNRF Nnrf_Management service <nfregistration|nfsubscription|nrfauditor> is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFManagement service is down'

1.3.6.1.4.1.323.5.3.36.1.2.7018 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.
The alert is cleared when all the Nnrf_NFManagement services are available that is nfregistration, nfsubscription and nrfauditor.

Steps:

  1. Check if NfService specific alerts are generated to understand which service is down.
  2. Check the orchestration logs of nfregistration, nfsubscription and nrfauditor services and check for liveness or readiness probe failures.
  3. Refer to the application logs on Kibana and filter based on the above service names. Check for ERROR and WARNING logs for each of these services.
  4. Refer to the application logs on Kibana, filter on the appinfo service, and check the service status of the nfregistration, nfsubscription and nrfauditor services.
  5. Depending on the failure reason, take the resolution steps.
  6. In case the issue persists, contact My Oracle Support.
 
NnrfAccessTokenServiceDown NFAccessToken service is unavailable. Critical

description: 'OCNRF Nnrf_NFAccessToken service nfaccesstoken is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccessToken service down'

1.3.6.1.4.1.323.5.3.36.1.2.7020 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.
The alert is cleared when the Nnrf_AccessToken service is available.

Steps:

  1. Check the orchestration logs of nfaccesstoken service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfaccesstoken service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
NnrfNFDiscoveryServiceDown NFDiscovery is unavailable. Critical

description: 'OCNRF Nnrf_NFDiscovery service nfdiscovery is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service unavailable.'

1.3.6.1.4.1.323.5.3.36.1.2.7019 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the Nnrf_NFDiscovery service is available.

Steps:

  1. Check the orchestration logs of nfdiscovery service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfdiscovery service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfRegistrationServiceDown None of the pods of the NFRegistration microservice is available. Critical

description: 'OCNRF NFRegistration service nfregistration is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFRegistration service is down'

1.3.6.1.4.1.323.5.3.36.1.2.7021 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the nfregistration service is available.

Steps:

  1. Check the orchestration logs of nfregistration service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfregistration service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfSubscriptionServiceDown None of the pods of the NFSubscription microservice is available. Critical

description: 'OCNRF NFSubscription service nfsubscription is down.'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down'

1.3.6.1.4.1.323.5.3.36.1.2.7022 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.
The alert is cleared when the nfsubscription service is available.

Steps:

  1. Check the orchestration logs of nfsubscription service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfsubscription service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfDiscoveryServiceDown None of the pods of the NFDiscovery microservice is available. Critical

description: 'OCNRF NFDiscovery service nfdiscovery is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service down'

1.3.6.1.4.1.323.5.3.36.1.2.7023 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.
The alert is cleared when the nfdiscovery service is available.

Steps:

  1. Check the orchestration logs of nfdiscovery service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfdiscovery service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfAccessTokenServiceDown None of the pods of the NFAccessToken microservice is available. Critical

description: 'OCNRF NFAccessToken service nfaccesstoken is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccesstoken service down'

1.3.6.1.4.1.323.5.3.36.1.2.7024 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.
The alert is cleared when the nfaccesstoken service is available.

Steps:

  1. Check the orchestration logs of nfaccesstoken service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfaccesstoken service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfAuditorServiceDown None of the pods of the NrfAuditor microservice is available. Critical

description: 'OCNRF NrfAuditor service nrfauditor is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfAuditor service down'

1.3.6.1.4.1.323.5.3.36.1.2.7026 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the nrfauditor service is available.

Steps:

  1. Check the orchestration logs of nrfauditor service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nrfauditor service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfConfigurationServiceDown None of the pods of the NrfConfiguration microservice is available. Critical

description: 'OCNRF NrfConfiguration service nrfconfiguration is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfConfiguration service down'

1.3.6.1.4.1.323.5.3.36.1.2.7025 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the nrfconfiguration service is available.

Steps:

  1. Check the orchestration logs of nrfconfiguration service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nrfconfiguration service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfAppInfoServiceDown None of the pods of the App Info microservice is available. Critical

description: 'OCNRF Appinfo service appinfo is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Appinfo service down'

1.3.6.1.4.1.323.5.3.36.1.2.7027 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the app-info service is available.

Steps:

  1. Check the orchestration logs of appinfo service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the appinfo service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfIngressGatewayServiceDown None of the pods of the Ingress-Gateway microservice is available. Critical

description: 'OCNRF Ingress-Gateway service ingressgateway is down.'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

1.3.6.1.4.1.323.5.3.36.1.2.7028 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the ingressgateway service is available.

Steps:

  1. Check the orchestration logs of ingress-gateway service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the ingress-gateway service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfEgressGatewayServiceDown None of the pods of the Egress-Gateway microservice is available. Critical

description: 'OCNRF Egress-Gateway service egressgateway is down'

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down'

1.3.6.1.4.1.323.5.3.36.1.2.7029 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the egressgateway service is available.

Note: The threshold is configurable in the alerts.yaml

Steps:

  1. Check the orchestration logs of egress-gateway service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the egress-gateway service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfMemoryUsageCrossedMinorThreshold A pod has reached the configured minor threshold (50%) of its memory resource limits. Minor

description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured minor threshold (50 %) (value={{ $value }}) of its limit.'

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 50% of its limit.'

1.3.6.1.4.1.323.5.3.36.1.2.7030

'container_memory_usage_bytes'

'container_spec_memory_limit_bytes'

Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.
The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case OcnrfMemoryUsageCrossedMajorThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml

If guidance is required, contact My Oracle Support.

 
OcnrfMemoryUsageCrossedMajorThreshold A pod has reached the configured major threshold (60%) of its memory resource limits. Major

description: 'OCNRF Memory Usage for pod <Pod name> has crossed the major threshold(60%) (value = {{ $value }}) of its limit.'

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 60% of its limit.'

1.3.6.1.4.1.323.5.3.36.1.2.7031

'container_memory_usage_bytes'

'container_spec_memory_limit_bytes'

Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.

The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case OcnrfMemoryUsageCrossedCriticalThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml

If guidance is required, contact My Oracle Support.

 
OcnrfMemoryUsageCrossedCriticalThreshold A pod has reached the configured critical threshold (70%) of its memory resource limits. Critical

description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured critical threshold (70%) (value = {{ $value }}) of its limit.'

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.'

1.3.6.1.4.1.323.5.3.36.1.2.7032

'container_memory_usage_bytes'

'container_spec_memory_limit_bytes'

Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.

The alert gets cleared when the memory utilization falls below the Critical Threshold.

Note: The threshold is configurable in the alerts.yaml

If guidance is required, contact My Oracle Support.
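The three memory alerts above form a ladder in which each alert clears once the next one up takes over. The sketch below illustrates that ladder using the default 50%, 60% and 70% thresholds from the text. The function name is hypothetical, and treating "reached" as greater-than-or-equal is an assumption; the actual comparison is defined by the rule in alerts.yaml.

```python
from typing import Optional


def memory_alert_severity(usage_bytes: float,
                          limit_bytes: float,
                          minor_pct: float = 50.0,
                          major_pct: float = 60.0,
                          critical_pct: float = 70.0) -> Optional[str]:
    """Return the severity that would be active for a pod (hypothetical helper).

    usage_bytes and limit_bytes stand for samples of
    container_memory_usage_bytes and container_spec_memory_limit_bytes.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    used_pct = 100.0 * usage_bytes / limit_bytes
    # Check the highest band first so that only one severity is reported.
    # Assumption: a threshold counts as reached when used_pct >= threshold.
    if used_pct >= critical_pct:
        return "Critical"
    if used_pct >= major_pct:
        return "Major"
    if used_pct >= minor_pct:
        return "Minor"
    return None


if __name__ == "__main__":
    limit = 1024 * 1024 * 1024  # example 1 GiB limit, made up for illustration
    for used_fraction in (0.40, 0.55, 0.65, 0.75):
        print(used_fraction, memory_alert_severity(used_fraction * limit, limit))
```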

 
OcnrfTotalIngressTrafficRateAboveMinorThreshold

The total OCNRF Ingress Message rate has crossed the configured minor threshold of 800 TPS.

Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 80 % of 1000 (Maximum ingress request rate)

Minor

description: 'Total Ingress traffic Rate is above configured minor threshold i.e. 800 requests per second (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

1.3.6.1.4.1.323.5.3.36.1.2.7001 'oc_ingressgateway_http_requests_total'

The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the OcnrfTotalIngressTrafficRateAboveMajorThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml

Steps:

Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable).

If this is unexpected, contact My Oracle Support.

 
OcnrfTotalIngressTrafficRateAboveMajorThreshold

The total OCNRF Ingress Message rate has crossed the configured major threshold of 900 TPS.

Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 90 % of 1000 (Maximum ingress request rate)

Major

description: 'Total Ingress traffic Rate is above major threshold i.e. 900 requests per second (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

1.3.6.1.4.1.323.5.3.36.1.2.7002 'oc_ingressgateway_http_requests_total'

The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the OcnrfTotalIngressTrafficRateAboveCriticalThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml

Steps:

Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable).

If this is unexpected, contact My Oracle Support.

 
OcnrfTotalIngressTrafficRateAboveCriticalThreshold

The total OCNRF Ingress Message rate has crossed the configured critical threshold of 950 TPS.

Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 95 % of 1000 (Maximum ingress request rate)

Critical

description: 'Total Ingress traffic Rate is above critical threshold i.e. 950 requests per second (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 95 Percent of Max requests per second(1000)'

1.3.6.1.4.1.323.5.3.36.1.2.7003 'oc_ingressgateway_http_requests_total'

The alert is cleared when the Ingress Traffic rate falls below the Critical threshold.

Note: The threshold is configurable in the alerts.yaml

Steps:

Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable).

If this is unexpected, contact My Oracle Support.
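The default trigger points of the three ingress traffic alerts are percentages (80, 90 and 95) of a maximum ingress request rate of 1000. The sketch below shows that arithmetic and how a per-second rate could be derived from two samples of the oc_ingressgateway_http_requests_total counter. All function names are hypothetical, and treating "crossed" as strictly greater than is an assumption, so the deployed alert rules remain the reference.

```python
from typing import Dict, Optional

MAX_INGRESS_RATE = 1000  # maximum ingress request rate stated above
THRESHOLD_PERCENT = {"Minor": 80, "Major": 90, "Critical": 95}


def threshold_tps(max_rate: float = MAX_INGRESS_RATE) -> Dict[str, float]:
    """Convert the percentage trigger points into requests per second."""
    return {sev: max_rate * pct / 100 for sev, pct in THRESHOLD_PERCENT.items()}


def request_rate(count_start: float, count_end: float, seconds: float) -> float:
    """Average per-second rate between two counter samples."""
    return (count_end - count_start) / seconds


def ingress_severity(rate: float,
                     max_rate: float = MAX_INGRESS_RATE) -> Optional[str]:
    """Return the severity band for a given rate (hypothetical helper)."""
    limits = threshold_tps(max_rate)
    # Assumption: a threshold counts as crossed when rate > threshold.
    for severity in ("Critical", "Major", "Minor"):
        if rate > limits[severity]:
            return severity
    return None


if __name__ == "__main__":
    print(threshold_tps())
    rate = request_rate(0, 51000, 60)  # 51000 requests over 60 seconds
    print(rate, ingress_severity(rate))
```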

 
OcnrfTransactionErrorRateAbove0.1Percent The number of failed transactions is above 0.1 percent of the total transactions. Warning

description: 'Transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 0.1 Percent of Total Transactions'

1.3.6.1.4.1.323.5.3.36.1.2.7004 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 0.1 percent of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the OcnrfTransactionErrorRateAbove1Percent alert shall be raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    for example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
 
OcnrfTransactionErrorRateAbove1Percent The number of failed transactions is above 1 percent of the total transactions. Warning

description: 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 1 Percent of Total Transactions'

1.3.6.1.4.1.323.5.3.36.1.2.7005 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the OcnrfTransactionErrorRateAbove10Percent alert shall be raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    for example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
 
OcnrfTransactionErrorRateAbove10Percent The number of failed transactions has crossed the minor threshold of 10 percent of the total transactions. Minor

description: 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 10 Percent of Total Transactions'

1.3.6.1.4.1.323.5.3.36.1.2.7006 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the OcnrfTransactionErrorRateAbove25Percent alert shall be raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    for example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
 
OcnrfTransactionErrorRateAbove25Percent The number of failed transactions has crossed the major threshold of 25 percent of the total transactions. Major

description: 'Transaction Error rate is above 25 Percent of Total Transactions (current value is {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 25 Percent of Total Transactions'
1.3.6.1.4.1.323.5.3.36.1.2.7007 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the OcnrfTransactionErrorRateAbove50Percent alert shall be raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    for example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
 
OcnrfTransactionErrorRateAbove50Percent The number of failed transactions has crossed the critical threshold of 50 percent of the total transactions. Critical

description: 'Transaction Error rate is above 50 Percent of Total Transactions (current value is {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 50 Percent of Total Transactions'
1.3.6.1.4.1.323.5.3.36.1.2.7008 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 50 percent of the total transactions.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    for example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
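The five transaction error rate alerts above are bands on one value: failed transactions as a percentage of total transactions. The sketch below maps a pair of counts to the alert name whose band applies. The function name is hypothetical, the caller decides what counts as a failed transaction, and the strict greater-than comparison is an assumption; none of this is taken from the shipped alert rules.

```python
from typing import Optional

# Thresholds (percent) and alert names as listed above, highest band first.
ERROR_RATE_BANDS = [
    (50.0, "OcnrfTransactionErrorRateAbove50Percent"),
    (25.0, "OcnrfTransactionErrorRateAbove25Percent"),
    (10.0, "OcnrfTransactionErrorRateAbove10Percent"),
    (1.0, "OcnrfTransactionErrorRateAbove1Percent"),
    (0.1, "OcnrfTransactionErrorRateAbove0.1Percent"),
]


def error_rate_alert(failed: int, total: int) -> Optional[str]:
    """Return the alert whose band contains the error rate (hypothetical helper)."""
    if total <= 0:
        return None  # no traffic, so no error rate can be computed
    error_pct = 100.0 * failed / total
    # Assumption: a band applies when the rate is strictly above its threshold.
    for threshold, alert_name in ERROR_RATE_BANDS:
        if error_pct > threshold:
            return alert_name
    return None


if __name__ == "__main__":
    for failed in (0, 5, 30, 300, 600):
        print(failed, error_rate_alert(failed, 1000))
```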
 
OCNRF Application Alerts              
OcnrfRegisteredNFsBelowCriticalThreshold

The number of NFs currently registered with OCNRF is below the critical threshold.

Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is below 2.

Critical

description: 'The number of registered NFs detected below critical threshold (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below critical threshold.'

1.3.6.1.4.1.323.5.3.36.1.2.7009 'ocnrf_active_registrations_count'

The alert is cleared when the number of registered NFs is above the critical threshold.

Steps:

No Action required. This is an information alert.

  1. Operator shall configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowMajorThreshold

The number of NFs currently registered with OCNRF is below the major threshold.

Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is greater than or equal to 2 and less than 10.

Major

description: 'The number of registered NFs detected below major threshold (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below major threshold.'

1.3.6.1.4.1.323.5.3.36.1.2.7010 'ocnrf_active_registrations_count'

The alert is cleared when the number of registered NFs is above the major threshold.

Steps:

No Action required. This is an information alert.

  1. Operator shall configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or "UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowMinorThreshold

The number of NFs currently registered with OCNRF is below the minor threshold.

By default (as set in NrfAlertValues.yaml), this alert is triggered when the number of NFs registered with OCNRF is greater than or equal to 10 and less than 20.

Minor

description: 'The number of registered NFs detected below minor threshold (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below minor threshold.'

1.3.6.1.4.1.323.5.3.36.1.2.7011 'ocnrf_active_registrations_count'

The alert is cleared when the number of registered NFs is above the minor threshold.

Steps:

No action required. This is an information alert.

  1. The operator shall configure the threshold values according to the number of NFs expected within the network.
  2. NFs with NFStatus 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowThreshold

The number of NFs currently registered with OCNRF is approaching the minor threshold.

By default (as set in NrfAlertValues.yaml), this alert is triggered when the number of NFs registered with OCNRF is greater than or equal to 20 and less than 30.

Warning

description: 'The number of registered NFs is approaching minor threshold (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs approaching minor threshold.'

1.3.6.1.4.1.323.5.3.36.1.2.7012 'ocnrf_active_registrations_count'

The alert is cleared when the number of registered NFs is no longer approaching the minor threshold.

Steps:

No action required. This is an information alert.

  1. The operator shall configure the threshold values according to the number of NFs expected within the network.
  2. NFs with NFStatus 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
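
The default ranges quoted for the four registered-NF alerts above do not overlap, so a given count maps to at most one alert. The following is a minimal sketch of that mapping, assuming the default values stated in this section; the function name `alert_for_registered_nfs` and the `DEFAULT_RANGES` table are hypothetical helpers for illustration and are not part of OCNRF.

```python
from typing import Optional, Tuple

# Default ranges as quoted in this section: lower bound inclusive, upper bound exclusive.
# The table itself is a hypothetical illustration, not an OCNRF artifact.
DEFAULT_RANGES = [
    (0, 2, "OcnrfRegisteredNFsBelowCriticalThreshold", "Critical"),
    (2, 10, "OcnrfRegisteredNFsBelowMajorThreshold", "Major"),
    (10, 20, "OcnrfRegisteredNFsBelowMinorThreshold", "Minor"),
    (20, 30, "OcnrfRegisteredNFsBelowThreshold", "Warning"),
]


def alert_for_registered_nfs(count: int) -> Optional[Tuple[str, str]]:
    """Return (alert name, severity) for a registered-NF count, or None if no alert applies."""
    if count < 0:
        raise ValueError("count must not be negative")
    for low, high, name, severity in DEFAULT_RANGES:
        if low <= count < high:
            return name, severity
    return None  # 30 or more registered NFs: none of the four alerts applies


if __name__ == "__main__":
    for sample in (0, 5, 15, 25, 40):
        print(sample, alert_for_registered_nfs(sample))
```

If the thresholds are changed to match the number of NFs expected in the network, the same non-overlapping structure should be preserved so that only one severity is raised at a time.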
OcnrfDbReplicationStatusInactive

The DB tier replication service status is inactive across the georedundant OCNRFs.

Critical

description: 'The Database Replication Status is currently INACTIVE.'

summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, dbreplicationstatus: {{$labels.DbReplicationStatus}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The database replication status is INACTIVE.'

1.3.6.1.4.1.323.5.3.36.1.2.7013 'ocnrf_dbreplication_status'

The alert is cleared when the DB tier replication service is active.

The alarm shall be included only if the Georedundancy feature is enabled.
OcnrfAccessTokenRequestsRejected

OCNRF rejected an AccessToken request.

critical

warning

description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF.'

1.3.6.1.4.1.323.5.3.36.1.2.7014 'ocnrf_accessToken_tx_rejected_total'

The alert is cleared automatically.

Steps:

The RejectionReason shall be present in the alert.

If the RejectionReason is AuthScreeningFailed or ClientNotAuthorized, either re-evaluate the configuration or check the consumer NF that requested the unauthorized token.

For any other reason, act according to the RejectionReason.

 
OcnrfNfAuthenticationFailureRequestsRejected

OCNRF rejected a service request due to NF authentication failure.

critical

warning

description: 'Service request(s) received from NF have been rejected by OCNRF (current value is: {{ $value }})'

summary: 'namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Request rejected for Nf FQDN based Authentication failure.'

1.3.6.1.4.1.323.5.3.36.1.2.7015 'ocnrf_nf_authentication_failure_total'

The alert is cleared automatically.

Steps:

No action required for OCNRF. This is an information alert. The RejectionReason shall be present in the alert.
 

OCNRF Alert Configuration

This section describes the measurement-based alert rules configuration for OCNRF. The Alert Manager uses the Prometheus measurement values reported by the microservices in the conditions defined under the alert rules to trigger alerts.

Note:

  • The alert file is packaged with the OCNRF custom templates. The OCNRF templates.zip file can be downloaded from OHC. Unzip the OCNRF templates.zip file to get the NrfAlertRules.yaml file.
  • Before configuring the alerts, review the NrfAlertRules.yaml file and edit any parameter values that need to differ from the defaults. See the table below for details.
  • kubernetes_namespace is set to the Kubernetes namespace in which NRF is deployed. The default value is OCNRF. Update the NrfAlertRules.yaml file to reflect the correct OCNRF Kubernetes namespace.
Alert details that can be updated in the NrfAlertRules.yaml file before configuration:

Table 6-12 Alerts

Alert Name Details Default Value Notes
OcnrfTotalIngressTrafficRateAboveMinorThreshold
Details: Traffic rate is above 80 percent of the maximum requests per second.
Default Value: Greater than or equal to 800 and less than 900.
Notes: The maximum ingress rate considered is 1000 requests per second, so the default values 800 and 900 are 80% and 90% of 1000. To change the values, set [80% of Max Ingress Request Rate] and [90% of Max Ingress Request Rate] for this alert, based on the maximum ingress request rate.

OcnrfTotalIngressTrafficRateAboveMajorThreshold
Details: Traffic rate is above 90 percent of the maximum requests per second.
Default Value: Greater than or equal to 900 and less than 950.
Notes: The maximum ingress rate considered is 1000 requests per second, so the default values 900 and 950 are 90% and 95% of 1000. To change the values, set [90% of Max Ingress Request Rate] and [95% of Max Ingress Request Rate] for this alert, based on the maximum ingress request rate.

OcnrfTotalIngressTrafficRateAboveCriticalThreshold
Details: Traffic rate is above 95 percent of the maximum requests per second.
Default Value: Greater than or equal to 950.
Notes: The maximum ingress rate considered is 1000 requests per second, so the default value 950 is 95% of 1000. To change the value, set [95% of Max Ingress Request Rate] for this alert, based on the maximum ingress request rate.
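
The percentages in the table can be turned into concrete threshold values for any maximum ingress rate. The following is a minimal sketch of that arithmetic, assuming the 80%, 90%, and 95% levels stated above; the function name `ingress_thresholds` is a hypothetical helper, and the resulting numbers would still have to be entered by hand in the alert rules file.

```python
from typing import Dict, Optional, Tuple


def ingress_thresholds(max_rate: int) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return (lower bound inclusive, upper bound exclusive) per alert.

    max_rate is the maximum ingress request rate in requests per second.
    Integer arithmetic is used so the results are exact whole numbers.
    """
    minor = max_rate * 80 // 100
    major = max_rate * 90 // 100
    critical = max_rate * 95 // 100
    return {
        "OcnrfTotalIngressTrafficRateAboveMinorThreshold": (minor, major),
        "OcnrfTotalIngressTrafficRateAboveMajorThreshold": (major, critical),
        # The critical alert has no upper bound.
        "OcnrfTotalIngressTrafficRateAboveCriticalThreshold": (critical, None),
    }


if __name__ == "__main__":
    for name, bounds in ingress_thresholds(1000).items():
        print(name, bounds)
```

With the default maximum of 1000 requests per second, this reproduces the default values in the table (800, 900, and 950).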

OCNRF Alert configuration in Prometheus

This section describes the measurement-based alert rules configuration for OCNRF in Prometheus. Use the NrfAlertRules.yaml file updated in the OCNRF Alert Configuration section.

_NAME_: Helm release name of Prometheus

_Namespace_: Kubernetes namespace in which Prometheus is installed

  1. Take Backup of current configuration map of Prometheus:
    kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
  2. Check and add OCNRF Alert file name inside Prometheus configuration map:
    sed -i '/etc\/config\/alertsnrf/d' /tmp/tempConfig.yaml
    sed -i '/rule_files:/a\  \- /etc/config/alertsnrf' /tmp/tempConfig.yaml
  3. Update configuration map with updated file name of OCNRF alert file:
    kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
  4. Add OCNRF Alert rules in configuration map under file name of OCNRF alert file:
    kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch \
    "$(cat ~/NrfAlertrules.yaml)"

Note:

The Prometheus server automatically reloads the updated configuration map after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the OCNRF alerts have been reloaded.
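
The two sed commands in step 2 first delete any existing line that references the OCNRF alert file and then append a fresh entry after the rule_files: line, which makes the step safe to repeat. The following is a minimal sketch of the same text transformation, assuming the saved configuration map is processed as plain text; the function name `add_rule_file` is hypothetical, and the fixed two-space indent simply mirrors the sed command rather than any verified Prometheus layout.

```python
def add_rule_file(config_text: str, rule_path: str = "/etc/config/alertsnrf") -> str:
    """Mirror the delete-then-append sed pair from step 2 on a text copy of the config map."""
    out = []
    for line in config_text.splitlines():
        if rule_path in line:
            # Equivalent of: sed '/etc\/config\/alertsnrf/d'
            continue
        out.append(line)
        if "rule_files:" in line:
            # Equivalent of: sed '/rule_files:/a\  \- /etc/config/alertsnrf'
            out.append("  - " + rule_path)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical fragment of a saved configuration map, for illustration only.
    sample = "data:\n  prometheus.yml: |\n    rule_files:\n    - /etc/config/rules\n"
    print(add_rule_file(sample))
```

Because the existing entry is removed before the new one is appended, running the step twice leaves exactly one reference to the alert file.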

Disable OCNRF Alert in Prometheus

Steps to disable alerts in Prometheus:
  1. Edit the NrfAlertrules.yaml file to remove the specific alert:

    Sample alert content from NrfAlertrules.yaml is shown below. It illustrates the details of a specific alert in NrfAlertrules.yaml that needs to be disabled.

    ## ALERT SAMPLE START##
          - alert: OcnrfTrafficRateAboveMinorThreshold
            annotations:
              description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
              summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
            expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
            labels:
              severity: Minor
    ## ALERT SAMPLE END##
  2. Remove the content of the specific alert that needs to be disabled.
  3. Perform the alert configuration again. See the OCNRF Alert configuration in Prometheus section above for detailed steps.

Disabling Alerts

This section explains the procedure to disable the alerts in OCNRF.
  1. Edit the NrfAlertrules.yaml file to remove the specific alert.
  2. Remove the complete content of the specific alert from the NrfAlertrules.yaml file.
    For example, to remove the OcnrfTrafficRateAboveMinorThreshold alert, remove the following content in full:
    ## ALERT SAMPLE START##
    
          - alert: OcnrfTrafficRateAboveMinorThreshold
            annotations:
              description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
              summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
            expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
            labels:
              severity: Minor
    ## ALERT SAMPLE END##
  3. Perform the alert configuration. See the OCNRF Alert Configuration section above for details.
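
Removing an alert by hand means deleting the `- alert:` line together with every more deeply indented line that belongs to it. The following is a minimal sketch of that edit as a plain-text operation, assuming each rule starts with a `- alert: <name>` line and its fields are indented further than that line; the function name `remove_alert` is hypothetical, and the result should still be reviewed before it is applied to Prometheus.

```python
import re


def remove_alert(rules_text: str, alert_name: str) -> str:
    """Drop the rule named alert_name, including all of its more deeply indented lines."""
    pattern = re.compile(r"-\s+alert:\s*" + re.escape(alert_name) + r"\s*$")
    out = []
    skip_indent = None  # indent of the "- alert:" line currently being removed
    for line in rules_text.splitlines(keepends=True):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if skip_indent is not None:
            # Blank lines and deeper-indented lines still belong to the removed rule.
            if stripped.strip() == "" or indent > skip_indent:
                continue
            skip_indent = None
        if pattern.match(stripped):
            skip_indent = indent
            continue
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical rules fragment, for illustration only.
    rules = (
        "rules:\n"
        "  - alert: OcnrfTrafficRateAboveMinorThreshold\n"
        "    labels:\n"
        "      severity: Minor\n"
        "  - alert: OtherAlert\n"
        "    expr: up == 0\n"
    )
    print(remove_alert(rules, "OcnrfTrafficRateAboveMinorThreshold"))
```

The sketch works on indentation only and does not parse YAML, so it relies on the rule layout shown in the sample above.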

Configuring SNMP Notifier

This section describes the procedure for configuring the SNMP Notifier.

Configure and Validate Alerts in Prometheus Server

Refer to the OCNRF Alert Configuration section for the procedure to configure the alerts.

Validating Alerts

After configuring the alerts in the Prometheus server, verify them with the following steps:

  • Open the Prometheus server in your browser using <IP>:<Port>.
  • Navigate to Status and then Rules.
  • Search for Ocnrf. The OcnrfAlerts list is displayed.

    Note:

    If you are unable to see the alerts, it means the alert file is not loaded in a proper format which the Prometheus server accepts. Modify the file and try again.
Configuring SNMP-Notifier
Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:
  1. Execute the following command to edit the deployment:
    kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>

    Example:

    $ kubectl edit deploy occne-snmp-notifier -n occne-infra
  2. Edit the destination as follows:
    --snmp.destination=<destination_ip>:<destination_port>

    Example:

    --snmp.destination=10.75.203.94:162
Checking SNMP Traps
The following example shows how to capture the logs of the trap receiver server to view the generated SNMP traps:
$ docker logs <trapd_container_id>
Sample output:
2020-04-29 15:34:24 10.75.203.103 [UDP: [10.75.203.103]:2747->[172.17.0.4]:162]:DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (158510800) 18 days, 8:18:28.00        SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003    SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.1 = STRING: "1.3.6.1.4.1.323.5.3.36.1.2.7003[]"  SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.2 = STRING: "critical"      SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- Alert: OcnrfActiveSubscribersBelowCriticalThreshold  Summary: namespace: ocnrf, nftype:5G_EIR, nrflevel:6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, podname: ocnrf-nrfauditor-6b459f5db5-4kvt4,
        timestamp: 2020-04-29 15:33:24.408 +0000 UTC: Current number of registered NFs detected below critical threshold.  Description: The number of registered NFs detected below critical threshold (current value
          is: 0)
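
The trap OID in the sample output is printed relative to the `enterprises` node, whereas the alert descriptions in this chapter list full numeric OIDs. The following is a minimal sketch that extracts the trap OID and alert name from such a log line, assuming the net-snmp style text format shown in the sample; the function names are hypothetical, and the only external fact used is that `enterprises` corresponds to the prefix 1.3.6.1.4.1.

```python
import re
from typing import Optional

ENTERPRISES_PREFIX = "1.3.6.1.4.1"  # numeric OID of the SNMPv2-SMI enterprises node


def trap_oid(log_line: str) -> Optional[str]:
    """Return the full numeric trap OID found in a trap receiver log line, if any."""
    match = re.search(r"snmpTrapOID\.0 = OID: SNMPv2-SMI::enterprises\.([\d.]+)", log_line)
    return f"{ENTERPRISES_PREFIX}.{match.group(1)}" if match else None


def alert_name(log_line: str) -> Optional[str]:
    """Return the alert name that follows the 'Alert:' label in the trap text, if any."""
    match = re.search(r"Alert: (\w+)", log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Shortened copy of the sample output above.
    line = (
        "SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003    "
        'SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- '
        'Alert: OcnrfActiveSubscribersBelowCriticalThreshold  Summary: namespace: ocnrf"'
    )
    print(trap_oid(line), alert_name(line))
```

The extracted OID can then be compared with the OIDs listed for each alert to confirm which alert produced the trap.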
MIB Files for OCNRF

Two MIB files are used to generate the traps. The user needs to update these files along with the alert file in order to fetch the traps in their environment.

  • OCNRF-MIB-TC-1.8.0.mib

    This is the OCNRF top-level MIB file, where the objects and their data types are defined.

  • OCNRF-MIB-1.8.0.mib

    This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.

Note:

MIB files are packaged along with OCNRF Custom Templates. Download the file from OHC. Refer to OCNRF Installation and Upgrade guide for more details.