10 OCNRF Metrics, KPIs, and Alerts

OCNRF Metrics

This section includes information about metrics for Oracle Communications Network Repository Function.

Note:

A sample OCNRF dashboard for Grafana is delivered to customers through the OCNRF Custom Templates. The metrics and functions used to derive the KPIs are covered in the OCNRF Custom Templates. Refer to the Oracle Help Center site for information about the OCNRF Custom Templates.

Dimensions Legend for the Metrics

The following table includes the details about the metrics dimensions:

Table 10-1 Dimensions Legend

Dimension Details
application Application name; for OCNRF this is ocnrf
vendor Vendor name; for OCNRF this is Oracle
Method HTTP Method Name. For example: PUT, GET
Status HTTP Status Code in response
Uri URI defined to identify the Service Operation at Ingress Gateway
Node Name of the Kubernetes worker node on which the microservice is running
NrfLevel OCNRF deployment name by which OCNRF can be identified; this is the OCNRF Instance Id passed through Helm
NfType Types of Network Functions (NF)
NfInstanceId Unique identity of the NF Instance sending request to OCNRF
HttpStatusCode HTTP Status Code
ServiceName Name of the service instance (for example: "nudm-sdm")
ServiceInstanceId Unique ID of the service instance within a given NF Instance
UpdateType(Partial/Complete) NF Update with PUT (Complete) or PATCH (Partial) methods
OperationType Indicates, for the NFSubscribe service operation, whether the request creates or updates the subscription
NotificationEventType Indicates the event types that the subscription request is for. For example: NF_REGISTERED, NF_DEREGISTERED and NF_PROFILE_CHANGED
TargetNfType Indicates the target NF type that the request is for
RequesterNfType Indicates the NF type that originated the request. This value comes from the User-Agent header; for the NFDiscover service operation it is taken from the search query. If the header or its value is absent, this dimension is UNKNOWN in the metrics


TargetNfInstanceId Dimension indicates the target NF Instance Id for NF Access Token
ClientNfInstanceId Dimension indicates the client NF Instance Id for NF Access Token
RejectionReason Dimension indicates the rejection reason for NF Access Token
SubscriptionIdType Dimension indicates the Subscription Id type for which SLF query is received
GroupId Dimension indicates the GroupId returned by SLF/UDR corresponding to SubscriptionId
BucketSize Indicates how many profiles are returned in the response to a Discovery request. The range is not configurable. Possible values are 0-10 and +Inf. The bucket that corresponds to the number of NF profiles returned is incremented by one. For example, if 2 profiles are returned, bucket 2 is incremented by one. Responses that return more than 10 profiles fall in the +Inf bucket.
DBOperation Database operation: create, update, delete, or find
TableName OCNRF Table Name
SubscriptionStatus Status of subscription shall be 'SUBSCRIBED', 'SUSPENDED' or 'UNSUBSCRIBED'
DbReplicationStatus "ACTIVE" or "INACTIVE"
RemoteNrfInstanceId Remote OCNRF Instance Id
HeartbeatTimer The heartbeatTimer of the NfProfile. The value is considered in seconds.
TLSFqdn FQDN received in TLS Certificate
NfFqdn FQDN of the consumer NF. This dimension is available only if the service mesh sends the consumer NF FQDN in the XFCC header; otherwise this value is UNKNOWN in the metrics
ServiceOperation Service operations as defined in the 3GPP specification for NRF
Scope Scope as received in the AccessToken Request
ResponseReason Response Reason in Response sent back to NF
SubscriptionId Subscription Id generated by OCNRF for NFStatusSubscribe Service Operation
NFType Used in Gateway metrics. NF type extracted from the URI. The path has the form /nxxx-yyy/vz/......., where xxx, converted to upper case, is the NFType. UNKNOWN if the NFType cannot be extracted from the path. Example: nnrf-nfm/v1/nf-instances
NFServiceType Used in Gateway metrics. NF service type extracted from the URI. The path has the form /nxxx-yyy/vz/......., where nxxx-yyy is the NFServiceType. UNKNOWN if the NFServiceType cannot be extracted from the path. Example: nnrf-nfm/v1/nf-instances
Host Used in Gateway metrics. (IP or FQDN):port of the gateway
HttpVersion HTTP protocol version: HTTP/1.1 or HTTP/2.0
Scheme HTTP protocol scheme: HTTP, HTTPS, or UNKNOWN
ClientCertIdentity Used in Gateway metrics. Certificate identity of the client, for example SAN=127.0.0.1,localhost CN=localhost; N/A if the data is not available
Route_Path Used in Gateway metrics. Path predicate or header predicate that matched the current request
InstanceIdentifier Used in Gateway metrics. Prefix of the pod configured in Helm when there are multiple instances in the same deployment; UNKNOWN otherwise
ErrorOriginator Used in Gateway metrics. Captures the originator of the error: ServiceProducer, Nrf, IngressGW, or None
Direction Used in Gateway metrics. Direction of the request or response: egress or egressOut
error_reason Reason for the failure response received. If a message is sent in the response, the dimension carries that message; otherwise it carries the exception class. In case of a successful response it is filled with "no-error". Examples: error_reason="no_error" (in case a successful response is received), error_reason="java.nio.channels.ClosedChannelException", error_reason="unable to find valid certification path to requested target"
KeyId Key Id from Access Token Configuration used to sign the Access Token
KeyType Key type of Access Token Configuration (private key or certificate)
isCurrentKeyId True or False; indicates whether the metric is for the current key id in the Access Token Configuration.
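
The NFType and NFServiceType extraction rules in the legend can be sketched in Python as follows. This is an illustrative sketch only, not the gateway's implementation: the function name extract_nf_labels and the regular expression are assumptions, chosen to match paths of the form /nxxx-yyy/vz/... described in the legend.

```python
import re
from typing import Tuple

# Assumed shape of the leading path segments: "n<nf>-<service>/v<digits>",
# for example "nnrf-nfm/v1". This pattern is illustrative, not taken from OCNRF.
_SEGMENT = re.compile(r"^/?(n([a-z0-9]+)-[a-z0-9-]+)/v\d+(?:/|$)")


def extract_nf_labels(path: str) -> Tuple[str, str]:
    """Return (NFType, NFServiceType) for a path, or UNKNOWN for both."""
    match = _SEGMENT.match(path)
    if match is None:
        return ("UNKNOWN", "UNKNOWN")
    service_type = match.group(1)   # nxxx-yyy, for example "nnrf-nfm"
    nf_type = match.group(2)        # xxx, for example "nrf"
    return (nf_type.upper(), service_type)


labels = extract_nf_labels("nnrf-nfm/v1/nf-instances")
```

In this sketch, nnrf-nfm/v1/nf-instances yields NFType NRF and NFServiceType nnrf-nfm, and a path that does not start with an nxxx-yyy segment yields UNKNOWN for both.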

OCNRF Gateways Metrics

Table 10-2 OCNRF Gateways Metrics

Metric Name Metric Details Metric filter Metric Type Dimensions
Total number of ingress requests Total number of requests received at OCNRF oc_ingressgateway_http_requests_total Counter Method, NFType, NFServiceType, Host, HttpVersion, Scheme, Route_path, InstanceIdentifier, ClientCertIdentity
NF Register Success Total number of successful NFRegister service operations at OCNRF oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"} Counter Method, NFType, NFServiceType, Host, HttpVersion, Scheme, Route_path, InstanceIdentifier, ClientCertIdentity
NF Update Success (Complete Replacement) Total number of successful NFUpdate service operations at OCNRF oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
NF Update Success (Partial Replacement) Total number of successful NFUpdate service operations at OCNRF oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PATCH"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
NF List/Profile Retrieval Success Total number of successful NF List/Profile retrieval service operations at OCNRF oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="GET"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
Access Token Success Total number of successful Access Token service operations at OCNRF oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*/oauth2/token.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
NF De-register Success Total number of successful NFDeregister service operations at OCNRF oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
NF Subscribe Success Total number of successful NFStatusSubscribe service operations at OCNRF oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
NF Unsubscribe Success Total number of successful NFStatusUnSubscribe service operations at OCNRF oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
NF Discover Success Total number of successful NFDiscover service operations at OCNRF oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
4xx Responses (NF-Instances) Total number of 4xx responses (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval) oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
4xx Responses (Subscriptions) Total number of 4xx responses (NFStatusSubscribe/NFStatusUnSubscribe) oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
4xx Responses (Discovery) Total number of 4xx responses (NfDiscover) oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
4xx Responses (AccessToken) Total number of 4xx responses (NfAccessToken) oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*oauth2/token.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
5xx Responses (NF-Instances) Total number of 5xx responses (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval) oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
5xx Responses (Subscriptions) Total number of 5xx responses (NFStatusSubscribe/NFStatusUnSubscribe) oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
5xx Responses (Discovery) Total number of 5xx responses (NfDiscover) oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, Identifier, ClientCertIdentity
5xx Responses (AccessToken) Total number of 5xx responses (NfAccessToken) oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*oauth2/token.*"} Counter Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity
Avg NRF Latency Time (in microseconds) to process an ingress request, measured from when the request is received to when the response is sent oc_ingressgateway_request_latency_seconds Timer quantile, InstanceIdentifier
Connection Failures Ingress Gateway Captures connection failures that occur when connecting to the destination service fails. For the Ingress Gateway, the destination service is a backend microservice of the NF. Also captures TLS connection failures when connecting to ingress. oc_ingressgateway_connection_failure_total Counter Host, Port, InstanceIdentifier, error_reason
Ingress Gateway Request Processing Latency Captures the time taken to process the request within the Ingress Gateway only. oc_ingressgateway_request_processing_latency_seconds Timer quantile, InstanceIdentifier
Total number of Egress requests Captures the count of requests that reach the Egress Gateway from the application microservice, pegged with Direction as egress. Requests that leave the Egress Gateway are pegged with Direction as egressOut. oc_egressgateway_http_requests_total Counter Method, NFType, NFServiceType, Host, HttpVersion, Scheme, Proxy, InstanceIdentifier, Direction
Total number of Egress responses Captures the responses that the Egress Gateway sends back to the backend NF microservice, pegged with Direction as egress. Responses received at the Egress Gateway are pegged with Direction as egressOut. oc_egressgateway_http_responses_total Counter Status, Method, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, Direction
Connection Failures Egress Gateway Captures failures while connecting the backend microservice and the destination service oc_egressgateway_connection_failure_total Counter Host, Port, InstanceIdentifier, Direction, error_reason
Egress Gateway Request Processing Latency Captures the time taken to process the request within the Egress Gateway only. oc_egressgateway_request_processing_latency_seconds Timer quantile, InstanceIdentifier
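
The metric filters in Table 10-2 combine exact matchers (=) with regular-expression matchers (=~). In Prometheus, a regular-expression matcher is fully anchored, so the pattern must match the entire label value. The following Python sketch mimics that behaviour on a few hypothetical label sets. The sample label values and the helper name select are assumptions made for illustration, and Python's re module stands in for the RE2 engine that Prometheus uses.

```python
import re


def select(samples, exact=None, regex=None):
    """Return the label sets that satisfy all '=' and '=~' matchers."""
    exact = exact or {}
    regex = regex or {}
    selected = []
    for labels in samples:
        exact_ok = all(labels.get(name, "") == value
                       for name, value in exact.items())
        # re.fullmatch anchors the pattern at both ends, like a PromQL '=~' matcher.
        regex_ok = all(re.fullmatch(pattern, labels.get(name, "")) is not None
                       for name, pattern in regex.items())
        if exact_ok and regex_ok:
            selected.append(labels)
    return selected


# Hypothetical label sets; real Route_path values depend on the gateway routes.
samples = [
    {"Status": "404 NOT_FOUND", "Method": "GET",
     "Route_path": "/nnrf-nfm/v1/nf-instances/**"},
    {"Status": "201 CREATED", "Method": "PUT",
     "Route_path": "/nnrf-nfm/v1/nf-instances/**"},
    {"Status": "400 BAD_REQUEST", "Method": "GET",
     "Route_path": "/nnrf-disc/v1/nf-instances/**"},
]

# Same matchers as the "4xx Responses (NF-Instances)" filter.
four_xx_nfm = select(samples, regex={
    "Status": "4.*",
    "Route_path": ".*nnrf-nfm/v1/nf-instances.*",
})

# Same matchers as the "NF Register Success" filter.
register_ok = select(
    samples,
    exact={"Status": "201 CREATED", "Method": "PUT"},
    regex={"Route_path": ".*nnrf-nfm/v1/nf-instances.*"},
)
```

Because the match is anchored, a pattern such as "4.*" selects only Status values that begin with 4, which is why the filters can separate 4xx and 5xx responses with a one-character prefix.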

OCNRF NF Metrics

Table 10-3 OCNRF NF Metrics

Metric Name Metric Details Metric filter Metric Type Recommended legend to see dimension level data (as applicable) Dimensions
NfRegistrations Total Number of Registration Requests received ocnrf_nfRegister_rx_requests_total Counter NfRegistrations Total NrfLevel, NfInstanceId, RequesterNfType, NfFqdn
NfRegistrations Responses Total Number of Registration Responses sent. ocnrf_nfRegister_tx_responses_total Counter NfRegistrations Responses Total NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode, NfFqdn
NfRegistrations Per Service Total Number of Registrations received and processed successfully per Service. ocnrf_nfRegister_rx_requests_success_perService_total Counter NfRegistrations Per Service [ serviceName :- {{ serviceName }}, nfInstanceId :- {{NfInstanceId}} ] NrfLevel, NfInstanceId, ServiceName, ServiceInstanceId, NfFqdn
NFUpdates Total Number of Update Requests received. ocnrf_nfUpdate_rx_requests_total Counter NfUpdates Total NrfLevel, NfInstanceId, RequesterNfType, UpdateType (Partial/Complete), HttpStatusCode, NfFqdn
NFUpdates Responses Total Number of Update Responses sent. ocnrf_nfUpdate_tx_responses_total Counter NfUpdates Responses Total NrfLevel, NfInstanceId, RequesterNfType, UpdateType (Partial/Complete), HttpStatusCode, NfFqdn
NFUpdates Per Service Total Number of NfUpdates received and processed successfully per Service. ocnrf_nfUpdate_rx_requests_success_perService_total Counter NFUpdates Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ] NrfLevel, UpdateType (Partial/Complete), NfInstanceId, ServiceName, ServiceInstanceId, NfFqdn
Heartbeat Requests Total Number of Heartbeat Requests received ocnrf_nfHeartbeat_rx_requests_total Counter NrfLevel, NfInstanceId, RequesterNfType, NfFqdn
Heartbeat Responses Total Number of Heartbeat Responses sent ocnrf_nfHeartbeat_tx_responses_total Counter NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode, NfFqdn
NF De-Registration Requests Total Number of De-registration requests received ocnrf_nfDeregister_rx_requests_total Counter NrfLevel, NfInstanceId, RequesterNfType, NfFqdn
NF De-Registration Responses Total Number of De-registration responses sent ocnrf_nfDeregister_tx_responses_total Counter NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode, NfFqdn
NF De-Registrations Per Service Total Number of De-registration requests received and processed successfully per Service ocnrf_nfDeregister_rx_requests_success_perService_total Counter NFDeregistration Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ] NrfLevel, ServiceName, ServiceInstanceId, NfInstanceId, NfFqdn
NF List Retrieval Requests Total Number of NFListRetrieval requests received ocnrf_nfListRetrieval_rx_requests_total Counter NrfLevel, RequesterNfType, NfFqdn
NF List Retrieval Responses Total Number of NFListRetrieval responses sent ocnrf_nfListRetrieval_tx_responses_total Counter NrfLevel, RequesterNfType, HttpStatusCode, NfFqdn
NF Profile Retrieval Requests Total Number of NFProfileRetrieval requests received ocnrf_nfProfileRetrieval_rx_requests_total Counter NrfLevel, NfInstanceId, NfFqdn
NF Profile Retrieval Responses Total Number of NFProfileRetrieval responses sent ocnrf_nfProfileRetrieval_tx_responses_total Counter NrfLevel, NfInstanceId, HttpStatusCode, NfFqdn
Number of Heartbeats missed Number of heartbeats missed. ocnrf_heartbeat_missed_total Counter NrfLevel, NfType, NfInstanceId, NfFqdn
NF Status Subscribe Requests Total Number of NfStatusSubscribe requests received ocnrf_nfStatusSubscribe_rx_requests_total Counter NrfLevel, RequesterNfType, OperationType, NfFqdn
NF Status Subscribe Responses Total Number of NfStatusSubscribe responses sent ocnrf_nfStatusSubscribe_tx_responses_total Counter NrfLevel, RequesterNfType, HttpStatusCode, OperationType, NfFqdn
NF Status UnSubscribe Requests Total Number of NfStatusUnsubscribe requests received ocnrf_nfStatusUnsubscribe_rx_requests_total Counter NrfLevel, RequesterNfType, NfFqdn
NF Status UnSubscribe Responses Total Number of NfStatusUnsubscribe responses sent ocnrf_nfStatusUnsubscribe_tx_responses_total Counter NrfLevel, RequesterNfType, HttpStatusCode, NfFqdn
NF Status Notifications Requests Sent Number of NfStatusNotify requests sent ocnrf_nfStatusNotify_tx_requests_total Counter NrfLevel, NotificationEventType, TargetNfType, NfFqdn, SubscriptionId
NF Status Notifications Responses Received Number of NfStatusNotify responses received ocnrf_nfStatusNotify_rx_responses_total Counter NrfLevel, NotificationEventType, TargetNfType, HttpStatusCode, NfFqdn, SubscriptionId
NF Status Notifications Requests Failed Number of NfStatusNotify requests that failed to be sent ocnrf_nfStatusNotify_requests_failed_total Counter NrfLevel, NotificationEventType, TargetNfType, NfFqdn, SubscriptionId
NfDiscover Requests Total Number of NfDiscover Requests received ocnrf_nfDiscover_rx_requests_total Counter NfDiscover Req [ TargetNf :- {{ TargetNfType }}, RequesterNfType :- {{RequesterNfType}} ] NrfLevel, TargetNfType, RequesterNfType, NfFqdn
NfDiscover Responses Total Number of NfDiscover responses sent ocnrf_nfDiscover_tx_responses_total Counter NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode, NfFqdn
NFDiscover Per Service Total Number of NfDiscover requests received and processed successfully per Service ocnrf_nfDiscover_rx_requests_success_perService_total Counter NFDiscover Per Service [ serviceName :- {{ serviceName }} ] NrfLevel, RequesterNfType, ServiceName, NfFqdn
Discovered profiles Number of profiles returned in the discovery response. The BucketSize dimension and its corresponding value indicate how many profiles were returned in the discovery response. ocnrf_nfDiscover_profiles_discovered_total Counter Discovered profiles [ TargetNfType :- {{TargetNfType}}, Bucket :- {{ Bucket }} ] NrfLevel, TargetNfType, BucketSize, NfFqdn
Active Registrations Number of active registered NFs at any point of time ocnrf_active_registrations_count Gauge Active Registrations [ NfType-{{ NfType }}, NrfLevel-{{ NrfLevel }} ] NfType, NrfLevel
Avg NRF Latency taken by NRF specific microservice Time taken by the NRF-specific microservice to process the service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NFStatusSubscribe/NFStatusUnSubscribe/NfAccessToken). Note: The latency calculated by this metric does not include the time taken by the OCNRF API gateway. ocnrf_message_processing_time_seconds Timer Avg NRF Latency {{ ServiceOperation }} {{ RequesterNfType }} NrfLevel, RequesterNfType, ServiceOperation
OCNRF database operations Database operation count corresponding to every service operation ocnrf_dbmetric_total Counter Method, DBOperation, NrfLevel, HttpStatusCode
Database operation round trip time Time (in microseconds) taken by the database operation corresponding to every service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NFStatusSubscribe/NFStatusUnSubscribe/NfAccessToken) ocnrf_dbmetrics_round_trip_time_seconds Timer Method, DBOperation, ServiceOperation, TableName (NRF table names), NrfLevel, HttpStatusCode
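
The BucketSize rule used by ocnrf_nfDiscover_profiles_discovered_total (buckets 0-10, with anything above 10 counted under +Inf) can be sketched as follows. This is a minimal illustration of the rule as stated in the dimensions legend, not OCNRF code: the function names are hypothetical, and the exact formatting of the label values is an assumption.

```python
from collections import Counter


def bucket_for(profile_count: int) -> str:
    """Map the number of profiles in one discovery response to a bucket label."""
    if profile_count < 0:
        raise ValueError("profile_count must be non-negative")
    # Buckets 0..10 are counted individually; larger responses share "+Inf".
    return str(profile_count) if profile_count <= 10 else "+Inf"


def tally(profile_counts):
    """Increment one bucket per discovery response, as the legend describes."""
    return Counter(bucket_for(count) for count in profile_counts)


# Five hypothetical discovery responses returning 2, 0, 2, 11 and 10 profiles.
buckets = tally([2, 0, 2, 11, 10])
```

Each discovery response increments exactly one bucket, so the sum over all buckets equals the number of responses counted.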

NF Screening Metrics

Table 10-4 NF Screening metrics

Metric Name Metric Details Metric filter Metric Type Service Operation Dimensions
Total NF Requests for which Screening Failed The total number of requests for which screening failed against NF FQDN screening list. ocnrf_nfScreening_nfFqdn_requestFailed_total Counter NFRegister, NFUpdate NRF level, NF type, NfFqdn
Total NF Requests Rejected due to Screening Failed The total number of requests rejected because screening failed against NF FQDN screening list. ocnrf_nfScreening_nfFqdn_requestRejected_total Counter NFRegister, NFUpdate NRF level, NF type, NfFqdn
Total NF Requests for which Screening Failed The total number of requests for which screening failed against NF IP endpoint screening list. ocnrf_nfScreening_nfIpEndPoint_requestFailed_total Counter NFRegister, NFUpdate NRF level, NF type, NfFqdn
Total NF Requests Rejected due to Screening Failed The total number of requests rejected because screening failed against NF IP endpoint screening list. ocnrf_nfScreening_nfIpEndPoint_requestRejected_total Counter NFRegister, NFUpdate NRF level, NF type, NfFqdn
Total NF Requests for which Screening Failed The total number of requests for which screening failed against Callback URI screening list. ocnrf_nfScreening_callbackUri_requestFailed_total Counter NFRegister, NFUpdate, NFStatusSubscribe NRF level, NF type, NfFqdn
Total NF Requests Rejected due to Screening Failed The total number of requests rejected because screening failed against Callback URI screening list. ocnrf_nfScreening_callbackUri_requestRejected_total Counter NFRegister, NFUpdate, NFStatusSubscribe NRF level, NF type, NfFqdn
Total NF Requests for which Screening Failed The total number of requests for which screening failed against PLMN id screening list. ocnrf_nfScreening_plmnId_requestFailed_total Counter NFRegister, NFUpdate NRF level, NF type, NfFqdn
Total NF Requests Rejected due to Screening Failed The total number of requests rejected because screening failed against PLMN id screening list. ocnrf_nfScreening_plmnId_requestRejected_total Counter NFRegister, NFUpdate NRF level, NF type, NfFqdn
Total NF Requests for which Screening Failed The total number of NFRegister requests for which screening failed against NF type screening list. ocnrf_nfScreening_nfTypeRegister_requestFailed_total Counter NFRegister NRF level, NF type, NfFqdn
Total NF Requests Rejected due to Screening Failed The total number of NFRegister requests rejected as NF type was not allowed to register with NRF. ocnrf_nfScreening_nfTypeRegister_requestRejected_total Counter NFRegister NRF level, NF type, NfFqdn
NF Screening not applied Internal Error The total number of times screening was not applied due to an internal error. ocnrf_nfScreening_notApplied_InternalError_total Counter NFRegister, NFUpdate, NFStatusSubscribe NRF level, NF type, NfFqdn

Note:

In the above "NF Screening metrics" table, the dimension NF Type is a requester NF Type.

NF Access token Metrics

Table 10-5 NF Access token metrics

Metric Name Metric Details Metric filter Metric Type Service Operation Dimensions
NF Access Token Request Received Total The total number of access token requests received ocnrf_accessToken_rx_requests_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, NfFqdn
NF Access Token Responses Sent Total The total number of access token responses sent ocnrf_accessToken_tx_responses_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType
NF Access Token Request Rejected (ClientNotAuthorized) Number of access token requests for which client authorization failed. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = ClientNotAuthorized

NF Access Token Request Rejected (ProducerWithRequestedScopeNotFound) Number of access tokens not granted because no producer instance is registered for the service(s) in the scope. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = ProducerWithRequestedScopeNotFound

NF Access Token Request Rejected (ProducerWithRequestedNfInstanceIdNotFound) Number of access tokens not granted because no producer instance is registered for the target Instance Id provided in the request. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = ProducerWithRequestedNfInstanceIdNotFound

NF Access Token Request Rejected (InconsistentScope) Number of access tokens not granted because the services in the scope belong to different NF types. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = InconsistentScope

NF Access Token Request Rejected (ConsumerNFTypeMismatch) Number of access tokens not granted because the consumer NF type in the profile does not match the access token request. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = ConsumerNFTypeMismatch

NF Access Token Request Rejected (ProducerNFTypeMismatch) Number of access tokens not granted because the producer NF type in the profile does not match the access token request. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = ProducerNFTypeMismatch

NF Access Token Request Rejected (InternalError) Number of access tokens not granted because of a failure at NRF due to an internal error. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = InternalError

NF Access Token Request Rejected (ConsumerNfTypeNotAllowed) Number of access tokens not granted because the consumer NFType is not allowed to access the requested NF. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = ConsumerNfTypeNotAllowed

NF Access Token Request Rejected (ConsumerPlmnNotAllowed) Number of access tokens not granted because the consumer NF PLMN is not allowed to access the requested NF. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = ConsumerPlmnNotAllowed

NF Access Token Request Rejected (SecretNotAccessible) Number of access tokens not granted because the secret for the current key id is not accessible. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = SecretNotAccessible

NF Access Token Request Rejected (InvalidFileData) Number of access tokens not granted because the file data for the current key id is invalid. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = InvalidFileData

NF Access Token Request Rejected (NamespaceNotAccessible) Number of access tokens not granted because the namespace for the current key id is not accessible. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = NamespaceNotAccessible

NF Access Token Request Rejected (FileNotFound) Number of access tokens not granted because the file was not found in the secrets. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = FileNotFound

NF Access Token Request Rejected (CurrentKeyIdNotConfigured) Number of access tokens not granted because the current key id is not configured. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = CurrentKeyIdNotConfigured

NF Access Token Request Rejected (ExpiredCertificate) Number of access tokens not granted because the OCNRF certificate is expired. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = ExpiredCertificate

NF Access Token Request Rejected (BadRequest) Number of access tokens not granted because the request is incorrect. ocnrf_accessToken_tx_rejected_total Counter AccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType

RejectionReason = BadRequest
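
All the rejection rows in Table 10-5 share the single metric ocnrf_accessToken_tx_rejected_total and differ only in the RejectionReason dimension. The following Python sketch shows one way to break a set of samples down by that dimension. The sample values, the (labels, value) pair layout, and the function name are hypothetical and are not taken from OCNRF or from any Prometheus client library.

```python
from collections import defaultdict


def rejected_by_reason(samples):
    """Sum counter values per RejectionReason across all other dimensions."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Samples without the label are grouped under UNKNOWN in this sketch.
        totals[labels.get("RejectionReason", "UNKNOWN")] += value
    return dict(totals)


# Hypothetical samples of ocnrf_accessToken_tx_rejected_total.
samples = [
    ({"RejectionReason": "ClientNotAuthorized", "TargetNfType": "UDM"}, 3.0),
    ({"RejectionReason": "ClientNotAuthorized", "TargetNfType": "AMF"}, 1.0),
    ({"RejectionReason": "BadRequest", "TargetNfType": "UDM"}, 2.0),
]

totals = rejected_by_reason(samples)
```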

NRF Configuration Metrics

Table 10-6 NRF Configuration Metrics

Metric Name Metric Details Metric Filter Metric Type Service Operation Dimensions
OCNRF Oauth Token Signing Keys Health Status Oauth Token Signing keys health status ocnrf_oauth_keyData_healthStatus (Value 0 - Healthy, Value 1 - Unhealthy) Gauge Configuration KeyId, KeyType, isCurrentKeyId, NrfLevel
OCNRF Oauth Current KeyId Configuration Status Oauth Current Key Id Configuration Status ocnrf_oauth_currentKeyId_configuredStatus (Value 0 - Healthy, Value 1 - Unhealthy) Gauge Configuration NrfLevel
OCNRF Oauth Token Signing Keys Expiry Status Oauth Token Signing keys Expiry Status ocnrf_oauth_keyData_expiryStatus (Value is expiry time in epoch time) Gauge Configuration KeyId, isCurrentKeyId, NrfLevel
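
The gauge values in Table 10-6 can be interpreted with a few lines of Python. This is a sketch under stated assumptions: the table says only that ocnrf_oauth_keyData_expiryStatus carries the expiry time in epoch time, so the unit of seconds is assumed here, and both function names are hypothetical.

```python
from datetime import datetime, timezone


def is_healthy(gauge_value: float) -> bool:
    """Table 10-6 defines 0 as Healthy and 1 as Unhealthy."""
    return gauge_value == 0


def days_until_expiry(expiry_epoch_seconds: float, now: datetime) -> float:
    """Days between 'now' and the expiry time (assumed to be epoch seconds)."""
    expiry = datetime.fromtimestamp(expiry_epoch_seconds, tz=timezone.utc)
    return (expiry - now).total_seconds() / 86400


# Example with a fixed reference time and a key that expires 10 days later.
reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
remaining = days_until_expiry(reference.timestamp() + 10 * 86400, reference)
```

A negative result means the expiry time has already passed relative to the reference time.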

NRF-SLF Metrics

Table 10-7 NRF-SLF metrics

Metric Name Metric Details Metric filter Metric Type Service Operation Dimensions
Discover Request Received For SLF Total The total number of NF Discover requests received for SLF ocnrf_nfDiscover_ForSLF_rx_requests_total Counter NFDiscover TargetNfType, NRFLevel, NfFqdn
Discover Response Sent For SLF Total The total number of NF Discover responses sent for SLF ocnrf_nfDiscover_ForSLF_tx_responses_total Counter NFDiscover TargetNfType, NRFLevel, HttpStatusCode, ResponseReason, NfFqdn

Possible Response Reasons:

ResponseReason = SLFCommunicationFailure

ResponseReason = MandatoryParamsMissing

ResponseReason = SLFSubscriberNotProvisioned

ResponseReason = ErrorFromSLF

ResponseReason = InternalError

ResponseReason = SuccessFromSLF

ResponseReason = GroupIdUsedFromSearchQuery

SLF Query Requests Sent Total The total number of SLF query requests sent ocnrf_SLF_tx_requests_total Counter NFDiscover TargetNfType, NRFLevel, SubscriptionIdType, NfFqdn
SLF Query Responses Received Total The total number of SLF query responses received ocnrf_SLF_rx_responses_total Counter NFDiscover TargetNfType, NRFLevel, SubscriptionIdType, HttpStatusCode, GroupId, NfFqdn
SLF Round Trip Time Total time (in microseconds) between sending a query to SLF and receiving the response from SLF ocnrf_slf_round_trip_time_seconds Timer NFDiscover TargetNfType, SubscriptionIdType, HttpStatusCode, GroupId, NrfLevel, SLF ApiRoot, NfFqdn

NRF Forwarding Metrics

Table 10-8 NRF Forwarding Metrics

Metric Name Metric Details Metric filter Metric Type Service Operation Dimensions
NF Access Token Requests Forwarded Total The total number of Access Token Requests forwarded to Primary/Secondary NRF ocnrf_forward_accessToken_tx_requests_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, NfFqdn
NF Access Token Forwarded Responses Total The total number of Access Token Responses for requests forwarded to Primary/Secondary NRF ocnrf_forward_accessToken_rx_responses_total Counter AccessToken TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, HttpStatusCode, RejectionReason, NfFqdn RejectionReason:
  • InternalError
  • NRFCommunicationFailure
  • ErrorFromNRF
  • NRFForwardingConfigurationMissing
  • LoopDetected

*NotApplicable is applicable for 2xx Status code

NF Profile Retrieval Requests Forwarded Total The total number of Profile Retrieval Requests forwarded to Primary/Secondary NRF ocnrf_forward_nfProfileRetrieval_tx_requests_total Counter NFProfileRetrieval NrfLevel, NfInstanceId, NfFqdn
NF Profile Retrieval Forwarded Responses Total The total number of Profile Retrieval Responses for requests forwarded to Primary/Secondary NRF ocnrf_forward_nfProfileRetrieval_rx_responses_total Counter NFProfileRetrieval NrfLevel, NfInstanceId, HttpStatusCode, RejectionReason, NfFqdn RejectionReason:
  • InternalError
  • NRFCommunicationFailure
  • ErrorFromNRF
  • NRFForwardingConfigurationMissing
  • LoopDetected

*NotApplicable is applicable for 2xx Status code

NF Status Subscribe Forwarded Requests Total The total number of Status Subscribe Requests forwarded to Primary/Secondary NRF ocnrf_forward_nfStatusSubscribe_tx_requests_total Counter NFStatusSubscribe, NFStatusUnsubscribe NrfLevel, RequesterNfType, OperationType, NfFqdn
NF Status Subscribe Forwarded Responses Total The total number of Responses for Status Subscribe Requests forwarded to Primary/Secondary NRF ocnrf_forward_nfStatusSubscribe_rx_responses_total Counter NFStatusSubscribe, NFStatusUnsubscribe NrfLevel, RequesterNfType, HttpStatusCode, OperationType, RejectionReason, NfFqdn RejectionReason:
  • InternalError
  • NRFCommunicationFailure
  • ErrorFromNRF
  • NRFForwardingConfigurationMissing
  • LoopDetected

*NotApplicable is applicable for 2xx Status code

NF Discovery Forwarded Requests Total The total number of NF Discovery Requests forwarded to Primary/Secondary NRF ocnrf_forward_nfDiscover_tx_requests_total Counter NFDiscover NrfLevel, TargetNfType, RequesterNfType, NfFqdn
NF Discovery Forwarded Responses Total The total number of Responses for NF Discovery Requests forwarded to Primary/Secondary NRF ocnrf_forward_nfDiscover_rx_responses_total Counter NFDiscover NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode, RejectionReason, NfFqdn RejectionReason:
  • InternalError
  • NrfCommunicationFailure
  • ErrorFromNrf
  • NrfForwardingConfigurationMissing
  • LoopDetected

*NotApplicable is applicable for 2xx Status code

Avg Latency for NRF Message Forwarding Time taken by NRF specific microservice to forward the message to other Primary/Secondary NRF with the service operation: (NFProfileRetrieval/NFDiscover/NFStatusSubscribe/NfStatusUnsubscribe/AccessToken) ocnrf_forward_round_trip_time_seconds Timer NFStatusSubscribe, NFStatusUnsubscribe, NFProfileRetrieval, NFDiscover, AccessToken NrfLevel, RequesterNfType, ServiceOperation, NfFqdn

GeoRedundancy metrics

Table 10-9 GeoRedundancy metrics

Metric Name Metric Details Metric filter Metric Type Service Operation Dimensions
DB Replication status The current replication status of the DBTier service. This metric is pegged only if the GeoRedundancy Feature is enabled. ocnrf_dbreplication_status Gauge NA NrfLevel, DbReplicationStatus
DB Replication down Time Time taken for the replication status to change from "INACTIVE" to "ACTIVE". This metric is pegged only if the GeoRedundancy Feature is enabled. ocnrf_dbreplication_down_time_seconds Timer NA NrfLevel, DbReplicationDownStartTime, DbReplicationDownEndTime
Total NfInstances switched over from mated site The number of NFInstances that got switched over from the mated site. ocnrf_nf_switch_over_total Counter NfRegister, NfUpdate, NfDeregister, NfHeartbeat NrfLevel, NfInstanceId, RemoteNrfInstanceId, ServiceOperation, OperationType, NfFqdn
Total NfSubscriptions switched over from mated site The number of NfSubscriptions that got switched over from the mated site. ocnrf_nfSubscriptions_switch_over_total Counter NfStatusSubscribe, NfStatusUnsubscribe, NrfAuditor NrfLevel, SubscriptionId, RemoteNrfInstanceId, ServiceOperation, OperationType
Total NfInstances removed by OCNRF as they are stale The number of NfInstances that get deleted by the NrfAuditor when it detects a record to be stale. ocnrf_stale_nf_deleted_total Counter NA NrfLevel, NfInstanceId, NfStatus, NfFqdn
Total NfSubscriptions removed by OCNRF as they are stale The number of NfSubscriptions that get deleted by the NrfAuditor when it detects a record to be stale. ocnrf_stale_nfSubscriptions_deleted_total Counter NA NrfLevel, NfSubscriptionId, SubscriptionStatus
Total NfInstances that have been marked as SUSPENDED by the OCNRF Auditor The number of profiles that have been marked as SUSPENDED when a profile has missed nfHeartBeatMissAllowed. ocnrf_nf_suspended_total Counter NA NrfLevel, NfInstanceId, NfStatus, HeartbeatTimer, NfFqdn
Total NfSubscriptions whose validityTime has expired The number of NfSubscriptions whose validityTime has expired ocnrf_nfSubscriptions_expired_total Counter   NrfLevel, SubscriptionId

NF AccessToken Authorization Metrics

Table 10-10 NF AccessToken Authorization Metrics

Metric Name Metric Details Metric filter Metric Type Service Operation Dimensions
NF Access Token Request Rejected (AuthScreeningFailed) Number of access tokens not granted because the consumer NF is not authorized to access the requested NF or its services. ocnrf_accessToken_tx_rejected_total Counter NfAccessToken TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName Scope, NrfLevel, NfFqdn, HttpStatusCode

RejectionReason = ClientNotAuthorized

NF Authentication Metrics

Table 10-11 NF Authentication Metrics

Metric Name Metric Details Metric filter Metric Type Service Operation Dimensions
NF Authentication Failure Total The total number of requests for which FQDN-based authentication failed at OCNRF ocnrf_nf_authentication_failure_total Counter NFAccessToken/NFRegistration/NFSubscription/NFDiscovery/NfListRetrieval/NfProfileRetrieval NrfLevel, Method, ServiceOperation, NfFqdn, TLSFqdn

For NfListRetrieval and NfProfileRetrieval serviceOperations NfFqdn is filled as NotApplicable.

If OC-XFCC-DNS header is not received at NRF Microservice then TLSFqdn is filled as "UNKNOWN".

OCNRF KPIs

This section includes information about KPIs for Oracle Communications Network Repository Function (OCNRF).

Note:

Sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates. Metrics and functions used to achieve KPI are already covered in OCNRF Custom Templates.

Table 10-12 KPI Details

KPI Name KPI Details Metric used for KPI Service Operation Response code
OCNRF Ingress Request Rate of HTTP requests received at OCNRF Ingress Gateway oc_ingressgateway_http_requests_total All Not Applicable
NF Register Success   sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m])) NFRegister 201
NF Update Success (Complete Replacement)   sum(irate(oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m])) NFUpdate 200
NF DeRegister Success   sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"}[5m])) NFDeregister 204
NF Subscribe Success   sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"}[5m])) NFStatusSubscribe 201
NF Unsubscribe Success   sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"}[5m])) NFStatusUnsubscribe 204
NF Discover Success   sum(irate(oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"}[5m])) NFDiscover 200
4xx Responses (NF-Instances)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m])) NFRegister/NFUpdate/NFDeregister 4xx
4xx Responses (Subscriptions)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m])) NFStatusSubscribe/NFStatusUnsubscribe 4xx
4xx Responses (Discovery)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m])) NFDiscover 4xx
5xx Responses (NF-Instances)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m])) NFRegister/NFUpdate/NFDeregister 5xx
5xx Responses (Subscriptions)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m])) NFStatusSubscribe/NFStatusUnsubscribe 5xx
5xx Responses (Discovery)   sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m])) NFDiscover 5xx
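Every KPI expression in the table has the form sum(irate(counter[5m])). As an illustration of what that computes, the sketch below reimplements the idea in plain Python: the per-second rate is taken from the last two samples of each counter series, and the per-series rates are then added. This is a simplified model for explanation only, not the Prometheus implementation. The sample layout (a list of (timestamp, value) pairs per series) and the function names are assumptions made for the example.

```python
def irate(samples):
    """Per-second rate from the last two samples of one counter series.

    samples: list of (timestamp_seconds, counter_value) pairs, oldest first.
    Returns None when fewer than two usable samples are present.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    interval = t_last - t_prev
    if interval <= 0:
        return None
    # A counter that went down is treated as a reset: count from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / interval


def kpi_sum(series_samples):
    """Add the instantaneous rates of all series that match a KPI filter."""
    rates = [irate(samples) for samples in series_samples]
    return sum(rate for rate in rates if rate is not None)
```

Because only the last two samples are used, the result reacts quickly to bursts, which suits dashboards better than long-term trend analysis.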

OCNRF Alerts

This section includes information about alerts for OCNRF.

Table 10-13 Alert Details

Alert Trigger Condition Severity Alert details provided OID Metric Used Resolution Notes
System Level Alerts              
OcnrfNfStatusUnavailable All the OCNRF services are unavailable, either because the OCNRF is getting deployed or purged. The OCNRF services considered are nfregistration, nfsubscription, nrfauditor, nrfconfiguration, nfaccesstoken, nfdiscovery, appinfo, ingressgateway, and egressgateway. Critical

description: 'OCNRF services unavailable'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All OCNRF services are unavailable.'

1.3.6.1.4.1.323.5.3.36.1.2.7016

'up'

Note: This is a prometheus metric used for instance availability monitoring.

If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared automatically when the OCNRF services start becoming available.

Steps:

  1. Check for service specific alerts.
  2. Refer to the application logs on Kibana and check for database-related failures such as connectivity or invalid secrets. The logs can be filtered based on the services.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfPodsRestart A pod belonging to any of the OCNRF services has restarted. Major

description: 'Pod <Pod Name> has restarted.'

summary: 'kubernetes_namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted'

1.3.6.1.4.1.323.5.3.36.1.2.7017 'kube_pod_container_status_restarts_total'

Note: This is a kubernetes metric. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared automatically if the specific pod is up.

Steps:

  1. Refer to the application logs on Kibana, filter based on pod name, and check for database-related failures such as connectivity or kubernetes secrets.
  2. Check orchestration logs for liveness or readiness probe failures.
  3. In case the issue persists, contact My Oracle Support.
 
NnrfNFManagementServiceDown One of the NFRegistration, NFSubscription, or NrfAuditor services is unavailable. Critical

description: 'OCNRF Nnrf_Management service <nfregistration|nfsubscription|nrfauditor> is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFManagement service is down'

1.3.6.1.4.1.323.5.3.36.1.2.7018 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when all the Nnrf_NFManagement services (nfregistration, nfsubscription, and nrfauditor) are available.

Steps:

  1. Check if NfService specific alerts are generated to understand which service is down.
  2. Check the orchestration logs of nfregistration, nfsubscription and nrfauditor services and check for liveness or readiness probe failures.
  3. Refer to the application logs on Kibana and filter based on the above service names. Check for ERROR and WARNING logs for each of these services.
  4. Refer to the application logs on Kibana, filter on the appinfo service, and check the service status of the nfregistration, nfsubscription, and nrfauditor services.
  5. Depending on the failure reason, take the resolution steps.
  6. In case the issue persists, contact My Oracle Support.
 
NnrfAccessTokenServiceDown NFAccessToken service is unavailable. Critical

description: 'OCNRF Nnrf_NFAccessToken service nfaccesstoken is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccessToken service down'

1.3.6.1.4.1.323.5.3.36.1.2.7020 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the Nnrf_AccessToken service is available.

Steps:

  1. Check the orchestration logs of nfaccesstoken service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfaccesstoken service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
NnrfNFDiscoveryServiceDown NFDiscovery is unavailable. Critical

description: 'OCNRF Nnrf_NFDiscovery service nfdiscovery is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service down'

1.3.6.1.4.1.323.5.3.36.1.2.7019 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the Nnrf_NFDiscovery service is available.

Steps:

  1. Check the orchestration logs of nfdiscovery service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfdiscovery service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfRegistrationServiceDown None of the pods of the NFRegistration microservice is available. Critical

description: 'OCNRF NFRegistration service nfregistration is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFRegistration service is down'

1.3.6.1.4.1.323.5.3.36.1.2.7021 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the nfregistration service is available.

Steps:

  1. Check the orchestration logs of nfregistration service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfregistration service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfSubscriptionServiceDown None of the pods of the NFSubscription microservice is available. Critical

description: 'OCNRF NFSubscription service nfsubscription is down.'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down'

1.3.6.1.4.1.323.5.3.36.1.2.7022 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.
The alert is cleared when the nfsubscription service is available.

Steps:

  1. Check the orchestration logs of nfsubscription service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfsubscription service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfDiscoveryServiceDown None of the pods of the NFDiscovery microservice is available. Critical

description: 'OCNRF NFDiscovery service nfdiscovery is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service down'

1.3.6.1.4.1.323.5.3.36.1.2.7023 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.
The alert is cleared when the nfdiscovery service is available.

Steps:

  1. Check the orchestration logs of nfdiscovery service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfdiscovery service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfAccessTokenServiceDown None of the pods of the NFAccessToken microservice is available. Critical

description: 'OCNRF NFAccessToken service nfaccesstoken is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccesstoken service down'

1.3.6.1.4.1.323.5.3.36.1.2.7024 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.
The alert is cleared when the nfaccesstoken service is available.

Steps:

  1. Check the orchestration logs of nfaccesstoken service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nfaccesstoken service name. Check for ERROR and WARNING logs.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfAuditorServiceDown None of the pods of the NrfAuditor microservice is available. Critical

description: 'OCNRF NrfAuditor service nrfauditor is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfAuditor service down'

1.3.6.1.4.1.323.5.3.36.1.2.7026 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the nrfauditor service is available.

Steps:

  1. Check the orchestration logs of nrfauditor service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nrfauditor service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfConfigurationServiceDown None of the pods of the NrfConfiguration microservice is available. Critical

description: 'OCNRF NrfConfiguration service nrfconfiguration is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfConfiguration service down'

1.3.6.1.4.1.323.5.3.36.1.2.7025 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the nrfconfiguration service is available.

Steps:

  1. Check the orchestration logs of nrfconfiguration service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the nrfconfiguration service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfAppInfoServiceDown None of the pods of the App Info microservice is available. Critical

description: 'OCNRF Appinfo service appinfo is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Appinfo service down'

1.3.6.1.4.1.323.5.3.36.1.2.7027 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the app-info service is available.

Steps:

  1. Check the orchestration logs of appinfo service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the appinfo service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfIngressGatewayServiceDown None of the pods of the Ingress-Gateway microservice is available. Critical

description: 'OCNRF Ingress-Gateway service ingressgateway is down.'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

1.3.6.1.4.1.323.5.3.36.1.2.7028 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the ingressgateway service is available.

Steps:

  1. Check the orchestration logs of ingress-gateway service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the ingress-gateway service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfEgressGatewayServiceDown None of the pods of the Egress-Gateway microservice is available. Critical

description: 'OCNRF Egress-Gateway service egressgateway is down'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down'

1.3.6.1.4.1.323.5.3.36.1.2.7029 'up'

Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system.

The alert is cleared when the egressgateway service is available.

Note: The threshold is configurable in the alerts.yaml

Steps:

  1. Check the orchestration logs of egress-gateway service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the egress-gateway service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.
 
OcnrfMemoryUsageCrossedMinorThreshold A pod has reached the configured minor threshold (50%) of its memory resource limits. Minor

description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured minor threshold (50 %) (value={{ $value }}) of its limit.'

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 50% of its limit.'

1.3.6.1.4.1.323.5.3.36.1.2.7030

'container_memory_usage_bytes'

'container_spec_memory_limit_bytes'

Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.
The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case OcnrfMemoryUsageCrossedMajorThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml

If guidance is required, contact My Oracle Support.

 
OcnrfMemoryUsageCrossedMajorThreshold A pod has reached the configured major threshold (60%) of its memory resource limits. Major

description: 'OCNRF Memory Usage for pod <Pod name> has crossed the major threshold(60%) (value = {{ $value }}) of its limit.'

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 60% of its limit.'

1.3.6.1.4.1.323.5.3.36.1.2.7031

'container_memory_usage_bytes'

'container_spec_memory_limit_bytes'

Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.

The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case OcnrfMemoryUsageCrossedCriticalThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml

If guidance is required, contact My Oracle Support.

 
OcnrfMemoryUsageCrossedCriticalThreshold A pod has reached the configured critical threshold (70%) of its memory resource limits. Critical

description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured critical threshold (70%) (value = {{ $value }}) of its limit.'

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.'

1.3.6.1.4.1.323.5.3.36.1.2.7032

'container_memory_usage_bytes'

'container_spec_memory_limit_bytes'

Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.

The alert gets cleared when the memory utilization falls below the Critical Threshold.

Note: The threshold is configurable in the alerts.yaml

If guidance is required, contact My Oracle Support.
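The three memory alerts above form a ladder: each one clears when usage drops below its own threshold or rises into the next band. The following minimal sketch shows that banding, using the default percentages from this section (50, 60, and 70) and the two metrics named in the rows. The function name is hypothetical, and the strict "greater than" comparison is an assumption; the exact comparison used in alerts.yaml may differ.

```python
# Default thresholds from this section, highest first so the first match wins.
MEMORY_THRESHOLDS = [
    (70.0, "OcnrfMemoryUsageCrossedCriticalThreshold"),
    (60.0, "OcnrfMemoryUsageCrossedMajorThreshold"),
    (50.0, "OcnrfMemoryUsageCrossedMinorThreshold"),
]


def memory_alert(usage_bytes, limit_bytes):
    """Return the alert expected for a pod, or None if no threshold is crossed.

    usage_bytes stands for container_memory_usage_bytes and limit_bytes for
    container_spec_memory_limit_bytes.
    """
    if limit_bytes <= 0:
        # No memory limit configured, so a percentage cannot be computed.
        return None
    percent = 100.0 * usage_bytes / limit_bytes
    for threshold, alert_name in MEMORY_THRESHOLDS:
        # Assumption: an alert fires only when usage is strictly above the threshold.
        if percent > threshold:
            return alert_name
    return None
```

For example, a pod that uses 650 MiB of a 1000 MiB limit is at 65% and falls in the major band.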

 
OcnrfTotalIngressTrafficRateAboveMinorThreshold

The total OCNRF Ingress Message rate has crossed the configured minor threshold of 800 TPS.

Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 80 % of 1000 (Maximum ingress request rate)

Minor

description: 'Total Ingress traffic Rate is above configured minor threshold i.e. 800 requests per second (current value is: {{ $value }})'

summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

1.3.6.1.4.1.323.5.3.36.1.2.7001 'oc_ingressgateway_http_requests_total'

The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the OcnrfTotalIngressTrafficRateAboveMajorThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml

Steps:

Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable).

If this is unexpected, contact My Oracle Support.

 
OcnrfTotalIngressTrafficRateAboveMajorThreshold

The total OCNRF Ingress Message rate has crossed the configured major threshold of 900 TPS.

Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 90 % of 1000 (Maximum ingress request rate)

Major

description: 'Total Ingress traffic Rate is above major threshold i.e. 900 requests per second (current value is: {{ $value }})'

summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

1.3.6.1.4.1.323.5.3.36.1.2.7002 'oc_ingressgateway_http_requests_total'

The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the OcnrfTotalIngressTrafficRateAboveCriticalThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml

Steps:

Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable).

If this is unexpected, contact My Oracle Support.

 
OcnrfTotalIngressTrafficRateAboveCriticalThreshold

The total OCNRF Ingress Message rate has crossed the configured critical threshold of 950 TPS.

Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 95 % of 1000 (Maximum ingress request rate)

Critical

description: 'Total Ingress traffic Rate is above critical threshold i.e. 950 requests per second (current value is: {{ $value }})'

summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 95 Percent of Max requests per second(1000)'

1.3.6.1.4.1.323.5.3.36.1.2.7003 'oc_ingressgateway_http_requests_total'

The alert is cleared when the Ingress Traffic rate falls below the Critical threshold.

Note: The threshold is configurable in the alerts.yaml

Steps:

Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable).

If this is unexpected, contact My Oracle Support.

 
OcnrfTransactionErrorRateAbove0.1Percent The number of failed transactions is above 0.1 percent of the total transactions. Warning

description: 'Transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'

summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 0.1 Percent of Total Transactions'

1.3.6.1.4.1.323.5.3.36.1.2.7004 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 0.1 percent of the total transactions, or when the number of failed transactions crosses the 1% threshold, in which case the OcnrfTransactionErrorRateAbove1Percent alert shall be raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
 
OcnrfTransactionErrorRateAbove1Percent

The number of failed transactions is above 1 percent of the total transactions.

Warning

description: 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'

summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 1 Percent of Total Transactions'

1.3.6.1.4.1.323.5.3.36.1.2.7005 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 1% of the total transactions, or when the number of failed transactions crosses the 10% threshold, in which case the OcnrfTransactionErrorRateAbove10Percent alert shall be raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
 
OcnrfTransactionErrorRateAbove10Percent

The number of failed transactions has crossed the minor threshold of 10 percent of the total transactions.

Minor

description: 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'

summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 10 Percent of Total Transactions'

1.3.6.1.4.1.323.5.3.36.1.2.7006 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 10% of the total transactions, or when the number of failed transactions crosses the 25% threshold, in which case the OcnrfTransactionErrorRateAbove25Percent alert shall be raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
 
OcnrfTransactionErrorRateAbove25Percent

The number of failed transactions has crossed the major threshold of 25 percent of the total transactions.

Major

description: 'Transaction Error rate is above 25 Percent of Total Transactions (current value is {{ $value }})'

summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 25 Percent of Total Transactions'

1.3.6.1.4.1.323.5.3.36.1.2.7007 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 25% of the total transactions, or when the number of failed transactions crosses the 50% threshold, in which case the OcnrfTransactionErrorRateAbove50Percent alert shall be raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
 
OcnrfTransactionErrorRateAbove50Percent

The number of failed transactions has crossed the critical threshold of 50 percent of the total transactions.

Critical

description: 'Transaction Error rate is above 50 Percent of Total Transactions (current value is {{ $value }})'

summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 50 Percent of Total Transactions'

1.3.6.1.4.1.323.5.3.36.1.2.7008 'oc_ingressgateway_http_responses_total'

The alert is cleared when the number of failed transactions is below 50 percent of the total transactions.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. If guidance is required, contact My Oracle Support.
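
The transaction error rate alerts above form a ladder in which each alert clears once the next tier is raised. The following Python sketch illustrates that ladder using only the thresholds and severities documented in this section. The function name `active_error_rate_alert` is hypothetical, and reading "above" as strictly greater than is an assumption; the packaged alert rules remain the authoritative definition.

```python
# Documented tiers, highest first: (threshold in percent, alert name, severity).
ERROR_RATE_TIERS = [
    (50.0, "OcnrfTransactionErrorRateAbove50Percent", "Critical"),
    (25.0, "OcnrfTransactionErrorRateAbove25Percent", "Major"),
    (10.0, "OcnrfTransactionErrorRateAbove10Percent", "Minor"),
    (1.0, "OcnrfTransactionErrorRateAbove1Percent", "Warning"),
    (0.1, "OcnrfTransactionErrorRateAbove0.1Percent", "Warning"),
]


def active_error_rate_alert(failed, total):
    """Return (alert name, severity) for the highest tier exceeded, or None."""
    if total <= 0:
        return None  # no traffic, so there is no error rate to evaluate
    rate_percent = 100.0 * failed / total
    for threshold, name, severity in ERROR_RATE_TIERS:
        # Assumption: "above" is read as strictly greater than the threshold.
        if rate_percent > threshold:
            return name, severity
    return None
```

For example, 30 failed transactions out of 100 fall in the 25 to 50 percent band, so only OcnrfTransactionErrorRateAbove25Percent would be active in this model.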
 
OCNRF Application Alerts

OcnrfRegisteredNFsBelowCriticalThreshold

The number of NFs currently registered with OCNRF is below the critical threshold.

Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is below 2.

Critical

description: 'The number of registered NFs detected below critical threshold (current value is: {{ $value }})'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.RequesterNfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below critical threshold.'

1.3.6.1.4.1.323.5.3.36.1.2.7009 'ocnrf_active_registrations_count'

The alert is cleared when the number of registered NFs is above the critical threshold.

Steps:

No action is required. This is an informational alert.

  1. Operator shall configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowMajorThreshold

The number of NFs currently registered with OCNRF is below the major threshold.

Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is greater than or equal to 2 and less than 10.

Major

description: 'The number of registered NFs detected below major threshold (current value is: {{ $value }})'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below major threshold.'

1.3.6.1.4.1.323.5.3.36.1.2.7010 'ocnrf_active_registrations_count'

The alert is cleared when the number of registered NFs is above the major threshold.

Steps:

No action is required. This is an informational alert.

  1. Operator shall configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowMinorThreshold

The number of NFs currently registered with OCNRF is below the minor threshold.

Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is greater than or equal to 10 and less than 20.

Minor

description: 'The number of registered NFs detected below minor threshold (current value is: {{ $value }})'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below minor threshold.'

1.3.6.1.4.1.323.5.3.36.1.2.7011 'ocnrf_active_registrations_count'

The alert is cleared when the number of registered NFs is above the minor threshold.

Steps:

No action is required. This is an informational alert.

  1. Operator shall configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
OcnrfRegisteredNFsBelowThreshold

The number of NFs currently registered with OCNRF is approaching minor threshold.

Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is greater than or equal to 20 and less than 30.

Warning

description: 'The number of registered NFs is approaching minor threshold (current value is: {{ $value }})'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs approaching minor threshold.'

1.3.6.1.4.1.323.5.3.36.1.2.7012 'ocnrf_active_registrations_count'

The alert is cleared when the number of registered NFs is no longer approaching the minor threshold.

Steps:

No action is required. This is an informational alert.

  1. Operator shall configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' shall not be considered as registered.
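
The four registered-NF alerts above partition the registration count into bands. As a minimal sketch, assuming the default trigger points quoted from NrfAlertValues.yaml (2, 10, 20, and 30), the mapping from a count to the active alert could be expressed as follows. The function name and its keyword parameters are hypothetical and exist only for illustration.

```python
def registered_nf_alert(count, critical=2, major=10, minor=20, warning=30):
    """Map a registered-NF count to (alert name, severity), or None.

    The default boundaries are the documented default trigger points;
    operators are expected to tune them to the size of their network.
    """
    if count < critical:
        return "OcnrfRegisteredNFsBelowCriticalThreshold", "Critical"
    if count < major:
        return "OcnrfRegisteredNFsBelowMajorThreshold", "Major"
    if count < minor:
        return "OcnrfRegisteredNFsBelowMinorThreshold", "Minor"
    if count < warning:
        return "OcnrfRegisteredNFsBelowThreshold", "Warning"
    return None  # enough NFs are registered; no alert in this model
```

Note that the count passed in should exclude NFs whose NFStatus is 'SUSPENDED' or 'UNDISCOVERABLE', since the section states that these are not considered registered.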
OcnrfDbReplicationStatusInactive

The db tier replication service status is inactive across the georedundant OCNRFs. The alarm is raised or cleared only if the Georedundancy feature is enabled.

Critical

description: 'The Database Replication Status is currently INACTIVE.'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, dbreplicationstatus: {{$labels.DbReplicationStatus}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The database replication status is INACTIVE.'

1.3.6.1.4.1.323.5.3.36.1.2.7013 'ocnrf_dbreplication_status'

The alert is cleared when the db tier replication service is active. The alarm is included only if the Georedundancy feature is enabled.
OcnrfAccessTokenRequestsRejected

OCNRF rejected an AccessToken Request.

Warning

description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF.'

1.3.6.1.4.1.323.5.3.36.1.2.7014 'ocnrf_accessToken_tx_rejected_total'

The alert is cleared automatically.

Steps:

The Rejection Reason shall be present in the alert.

If the RejectionReason is AuthScreeningFailed or ClientNotAuthorized, either reevaluate the configurations or check the consumer NF that requested the unauthorized token.

For other reasons, act according to the RejectionReason.

 
OcnrfNfAuthenticationFailureRequestsRejected

OCNRF rejected a service request due to NF authentication failure.

Warning

description: 'Service request(s) received from NF have been rejected by OCNRF (current value is: {{ $value }})'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Request rejected for Nf FQDN based Authentication failure.'

1.3.6.1.4.1.323.5.3.36.1.2.7015 'ocnrf_nf_authentication_failure_total'

The alert is cleared automatically.

Steps:

No action is required for OCNRF. This is an informational alert. The Response Reason shall be present in the alert.
 
OcnrfAccessTokenCurrentKeyIdNotConfigured

OCNRF rejected an Access Token request because CurrentKeyId is not configured.

Critical

description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF as Current Key Id is not configured.'

1.3.6.1.4.1.323.5.3.36.1.2.7033 ocnrf_accessToken_tx_rejected_total

The alert is cleared automatically, because it is raised only when OCNRF receives an Access Token Request while the Current Key Id is not selected.
OcnrfAccessTokenCurrentKeyIdInvalidDetails

OCNRF rejected an Access Token request because the token signing details corresponding to CurrentKeyId are invalid.

Critical

description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyType: {{$labels.KeyType}}, RejectionReason: {{$labels.RejectionReason}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF as CurrentKeyId details are invalid.'

1.3.6.1.4.1.323.5.3.36.1.2.7034 ocnrf_accessToken_tx_rejected_total

The alert is cleared automatically, because it is raised only when OCNRF receives an Access Token Request while the Current Key Id details are invalid.
OcnrfOauthCurrentKeyNotConfigured

Oauth Current Key ID is not configured.

Critical

description: 'OCNRF Oauth Access token Current Key Id is not configured'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token Current Key Id is not configured.'

1.3.6.1.4.1.323.5.3.36.1.2.7035 ocnrf_oauth_currentKeyId_configuredStatus

The alert is cleared when current key id is configured.

Steps:

Configure a valid current key id in the Access Token Configuration.

 
OcnrfOauthCurrentKeyDataHealthStatus

The health of the Oauth Current Key ID details is not good.

Critical

description: 'OCNRF Oauth Access token Current Key Id status is not healthy'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, KeyType: {{$labels.KeyType}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token Current Key Id status is not healthy.'

1.3.6.1.4.1.323.5.3.36.1.2.7036 ocnrf_oauth_keyData_healthStatus

The alert is cleared when current key id status is healthy.

Steps:

Key Data Health Status details can be checked using OCNRF configuration status REST APIs and configuration microservice logs.

Rectify the condition by checking ErrorCondition.

 
OcnrfOauthNonCurrentKeyDataHealthStatus

The health of the Oauth Non Current Key details is not good.

Info

description: 'OCNRF Oauth Access token Non current Key Id status is not healthy'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, KeyType: {{$labels.KeyType}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id status is not healthy.'

1.3.6.1.4.1.323.5.3.36.1.2.7037 ocnrf_oauth_keyData_healthStatus

The alert is cleared when the non-current key id status is healthy.

Steps:

Key Data Health Status details can be checked using OCNRF configuration status REST APIs and configuration microservice logs.

Rectify the condition by checking ErrorCondition.

 
OcnrfOauthCurrentCertificateExpiringIn24Hours

Oauth Current Key ID details are expiring in less than 24 hours.

Critical

description: 'OCNRF Oauth Access token current Key Id certificate is expiring in less than 24 hours'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token current Key Id certificate is expiring in less than 24 hours.'

1.3.6.1.4.1.323.5.3.36.1.2.7038 ocnrf_oauth_keyData_expiryStatus

The alert is cleared when key expiry time is more than 24 hours.

Steps:

Replace the expiring certificate key pair with a new one.
 
OcnrfOauthNonCurrentCertificateExpiringIn24Hours

Oauth Non Current Key ID details are expiring in less than 24 hours.

Info

description: 'OCNRF Oauth Access token non current Key Id certificate is expiring in less than 24 hours'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id certificate is expiring in less than 24 hours.'

1.3.6.1.4.1.323.5.3.36.1.2.7039 ocnrf_oauth_keyData_expiryStatus

The alert is cleared when key expiry time is more than 24 hours.

Steps:

Replace the expiring certificate key pair with a new one.
 
OcnrfOauthCurrentCertificateExpiringIn30days

Oauth Current Key ID details are expiring in more than 24 hours and less than 30 days.

Critical

description: 'OCNRF Oauth Access token current Key Id certificate is expiring in less than 30 days'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token current Key Id certificate is expiring in less than 30 days.'

1.3.6.1.4.1.323.5.3.36.1.2.7040 ocnrf_oauth_keyData_expiryStatus

The alert is cleared when the expiry time of the certificate for the current key id is more than 30 days.

Steps:

Replace the expiring certificate key pair with a new one.
 
OcnrfOauthNonCurrentCertificateExpiringIn30days

Oauth Non Current Key ID details are expiring in more than 24 hours and less than 30 days.

Info

description: 'OCNRF Oauth Access token non current Key Id certificate is expiring in less than 30 days'

summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id certificate is expiring in less than 30 days.'

1.3.6.1.4.1.323.5.3.36.1.2.7041 ocnrf_oauth_keyData_expiryStatus

The alert is cleared when the expiry time of the certificate for the non-current key id is more than 30 days.

Steps:

Replace the expiring certificate key pair with a new one.
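
The four certificate expiry alerts above differ only in the remaining-validity window (less than 24 hours, or between 24 hours and 30 days) and in whether the key is the current one. The following Python sketch shows that classification under stated assumptions: the function name `expiry_alert` is hypothetical, an exact boundary of 24 hours is placed in the 30-day band, and the real alerts are driven by the ocnrf_oauth_keyData_expiryStatus metric, whose encoding is not described here.

```python
from datetime import datetime, timedelta, timezone


def expiry_alert(expires_at, now, is_current_key):
    """Return (alert name, severity) for a certificate, or None if no alert.

    Severities follow this section: Critical for the current key id,
    Info for non-current key ids.
    """
    remaining = expires_at - now
    prefix = "OcnrfOauthCurrent" if is_current_key else "OcnrfOauthNonCurrent"
    severity = "Critical" if is_current_key else "Info"
    if remaining < timedelta(hours=24):
        return prefix + "CertificateExpiringIn24Hours", severity
    if remaining < timedelta(days=30):
        # Assumption: exactly 24 hours remaining falls into this band.
        return prefix + "CertificateExpiringIn30days", severity
    return None


# Usage with a fixed reference time so the result is reproducible.
reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
result = expiry_alert(reference + timedelta(days=10), reference, True)
```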
 

OCNRF Alert Configuration

This section describes the measurement-based alert rules configuration for OCNRF. The Alert Manager uses the Prometheus measurement values reported by the microservices in the conditions under the alert rules to trigger alerts.

Note:

  • The alert file is packaged with the OCNRF custom templates. The OCNRF templates.zip file can be downloaded from MOS. Unzip the OCNRF templates.zip file to get the NrfAlertRules.yaml file.
  • Review the NrfAlertRules.yaml file and, if the default values need to be changed, edit the parameter values before configuring the alerts. See the table below for details.
  • kubernetes_namespace is configured as the Kubernetes namespace in which OCNRF is deployed. The default value is OCNRF. Update the NrfAlertRules.yaml file to reflect the correct OCNRF Kubernetes namespace.
Alert details which can be updated in NrfAlertRules.yaml file before configuration

Table 10-14 Alerts

Alert Name: OcnrfTotalIngressTrafficRateAboveMinorThreshold
Details: Traffic Rate is above 80 Percent of Max requests per second
Default Value: Greater than or equal to 800 and less than 900
Notes: The maximum ingress rate considered is 1000 requests per second, so in the default value 800 is 80% of 1000 and 900 is 90% of 1000. If the value needs to be updated, set [80% of Max Ingress Request Rate] and [90% of Max Ingress Request Rate] for this alert, depending on the maximum ingress request rate.

Alert Name: OcnrfTotalIngressTrafficRateAboveMajorThreshold
Details: Traffic Rate is above 90 Percent of Max requests per second
Default Value: Greater than or equal to 900 and less than 950
Notes: The maximum ingress rate considered is 1000 requests per second, so in the default value 900 is 90% of 1000 and 950 is 95% of 1000. If the value needs to be updated, set [90% of Max Ingress Request Rate] and [95% of Max Ingress Request Rate] for this alert, depending on the maximum ingress request rate.

Alert Name: OcnrfTotalIngressTrafficRateAboveCriticalThreshold
Details: Traffic Rate is above 95 Percent of Max requests per second
Default Value: Greater than or equal to 950
Notes: The maximum ingress rate considered is 1000 requests per second, so in the default value 950 is 95% of 1000. If the value needs to be updated, set [95% of Max Ingress Request Rate] for this alert, depending on the maximum ingress request rate.
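
The threshold values in the table are plain percentages of the maximum ingress request rate. As an illustration, the following Python sketch derives the three bands for any maximum rate. The function name `ingress_thresholds` is hypothetical, and integer rounding down is an assumption; the values still have to be entered in the alert file by hand.

```python
def ingress_thresholds(max_rate):
    """Derive the alert bands (lower bound, upper bound) from the maximum
    ingress request rate, using the documented 80%, 90%, and 95% levels.

    An upper bound of None means the band is open-ended.
    """
    minor = max_rate * 80 // 100
    major = max_rate * 90 // 100
    critical = max_rate * 95 // 100
    return {
        "OcnrfTotalIngressTrafficRateAboveMinorThreshold": (minor, major),
        "OcnrfTotalIngressTrafficRateAboveMajorThreshold": (major, critical),
        "OcnrfTotalIngressTrafficRateAboveCriticalThreshold": (critical, None),
    }
```

With the default maximum of 1000 requests per second this reproduces the documented defaults of 800, 900, and 950.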

OCNRF Alert configuration in Prometheus

This section describes the measurement based Alert rules configuration for OCNRF in Prometheus.

_NAME_ :- Helm Release of Prometheus

_Namespace_ :- Kubernetes NameSpace in which Prometheus is installed

  1. Take a backup of the current Prometheus configuration map:
    kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
  2. Check and add the OCNRF alert file name inside the Prometheus configuration map:
    sed -i '/etc\/config\/alertsnrf/d' /tmp/tempConfig.yaml
    sed -i '/rule_files:/a\  \- /etc/config/alertsnrf' /tmp/tempConfig.yaml
  3. Update the configuration map with the updated file name of the OCNRF alert file:
    kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
  4. Add the OCNRF alert rules to the configuration map under the file name of the OCNRF alert file:
    kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/NrfAlertrules.yaml)"

Note:

The Prometheus server automatically reloads the updated configuration map after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the OCNRF Alerts have been reloaded.
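
The two sed commands in step 2 work as a pair: the first deletes any existing reference to the OCNRF alert file, and the second appends a fresh entry after the rule_files: line, so repeating the step does not create duplicates. The following Python sketch mirrors that behavior on an in-memory string purely to illustrate the idea. The function name is hypothetical and the sketch is not a replacement for the documented commands.

```python
RULE_FILE = "/etc/config/alertsnrf"


def add_ocnrf_rule_file(config_text):
    """Mimic the two sed commands on the text of the saved configuration map."""
    # First sed: drop every line that already references the alert file.
    kept = [line for line in config_text.splitlines()
            if "etc/config/alertsnrf" not in line]
    # Second sed: append the entry after each line containing rule_files:.
    result = []
    for line in kept:
        result.append(line)
        if "rule_files:" in line:
            result.append("  - " + RULE_FILE)
    return "\n".join(result) + "\n"


# Hypothetical, heavily shortened configuration text for demonstration.
sample = "rule_files:\n  - /etc/config/rules\n"
once = add_ocnrf_rule_file(sample)
twice = add_ocnrf_rule_file(once)
```

Because the delete runs before the append, `once` and `twice` are identical, which is the property that makes the step safe to rerun.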

Disable OCNRF Alert in Prometheus

Steps to disable Alerts in Prometheus:
  1. Edit the NrfAlertrules.yaml file to remove the specific alert:

    The following sample from NrfAlertrules.yaml illustrates the content of a specific alert that can be disabled.

    ## ALERT SAMPLE START##
          - alert: OcnrfTrafficRateAboveMinorThreshold
            annotations:
              description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
              summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
            expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
            labels:
              severity: Minor
    ## ALERT SAMPLE END##
  2. Remove the content of the specific alert that needs to be disabled.
  3. Perform the alert configuration again. See the OCNRF Alert configuration in Prometheus section above for detailed steps.

Disabling Alerts

This section explains the procedure to disable the alerts in OCNRF.
  1. Edit the NrfAlertrules.yaml file to remove the specific alert.
  2. Remove the complete content of the specific alert from the NrfAlertrules.yaml file.
    For example: If you want to remove the OcnrfTrafficRateAboveMinorThreshold alert, remove the complete content:
    ## ALERT SAMPLE START##
    
          - alert: OcnrfTrafficRateAboveMinorThreshold
            annotations:
              description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
              summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
            expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
            labels:
              severity: Minor
    ## ALERT SAMPLE END##
  3. Perform the alert configuration. See the OCNRF Alert Configuration section above for details.
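
Disabling an alert amounts to dropping one entry from the list of rules before reapplying the configuration. As a rough sketch, assuming the rules have already been loaded into Python dictionaries with a YAML library and that they follow the usual Prometheus layout of groups containing a rules list (an assumption, since the full file is not shown here), the removal could look like this. The function name `remove_alert` is hypothetical.

```python
def remove_alert(groups, alert_name):
    """Return a copy of the rule groups without the named alert.

    The input is left unmodified so the original rules can be kept
    as a backup.
    """
    return [
        {**group,
         "rules": [rule for rule in group.get("rules", [])
                   if rule.get("alert") != alert_name]}
        for group in groups
    ]


# Hypothetical, shortened rule data for demonstration.
groups = [{
    "name": "example-group",
    "rules": [
        {"alert": "OcnrfTrafficRateAboveMinorThreshold"},
        {"alert": "OcnrfTransactionErrorRateAbove1Percent"},
    ],
}]
remaining = remove_alert(groups, "OcnrfTrafficRateAboveMinorThreshold")
```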

Configuring SNMP Notifier

This section describes the procedure to configure the SNMP Notifier.

Configure and Validate Alerts in Prometheus Server

Refer to the OCNRF Alert Configuration section for the procedure to configure the alerts.

Validating Alerts

After configuring the alerts in the Prometheus server, verify them using the following steps:

  • Open the Prometheus server from your browser using <IP>:<Port>.
  • Navigate to Status and then Rules.
  • Search for Ocnrf. The OcnrfAlerts list is displayed.

    Note:

    If you are unable to see the alerts, it means the alert file is not loaded in a proper format which the Prometheus server accepts. Modify the file and try again.
Configuring SNMP-Notifier
Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:
  1. Execute the following command to edit the deployment:
    kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>

    Example:

    $ kubectl edit deploy occne-snmp-notifier -n occne-infra
  2. Edit the destination as follows:
    --snmp.destination=<destination_ip>:<destination_port>

    Example:

    --snmp.destination=10.75.203.94:162
Checking SNMP Traps
The following example shows how to capture the logs of the trap receiver server to view the generated SNMP traps:
$ docker logs <trapd_container_id>
Sample output:
2020-04-29 15:34:24 10.75.203.103 [UDP: [10.75.203.103]:2747->[172.17.0.4]:162]:DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (158510800) 18 days, 8:18:28.00        SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003    SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.1 = STRING: "1.3.6.1.4.1.323.5.3.36.1.2.7003[]"  SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.2 = STRING: "critical"      SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- Alert: OcnrfActiveSubscribersBelowCriticalThreshold  Summary: namespace: ocnrf, nftype:5G_EIR, nrflevel:6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, podname: ocnrf-nrfauditor-6b459f5db5-4kvt4,
        timestamp: 2020-04-29 15:33:24.408 +0000 UTC: Current number of registered NFs detected below critical threshold.  Description: The number of registered NFs detected below critical threshold (current value
          is: 0)
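
When many traps are logged, it can help to pull the trap OID, severity, and alert name out of each entry. The following Python sketch does this with regular expressions written against the layout of the sample output above. The function name `parse_trap` is hypothetical, and the patterns are assumptions that may need adjusting for other trap receivers or log formats.

```python
import re


def parse_trap(text):
    """Extract the trap OID, severity, and alert name from one log entry.

    Fields that cannot be found are returned as None.
    """
    oid = re.search(r"snmpTrapOID\.0 = OID: (\S+)", text)
    status = re.search(r"Status: (\w+)", text)
    alert = re.search(r"Alert: (\w+)", text)
    return {
        "trap_oid": oid.group(1) if oid else None,
        "severity": status.group(1) if status else None,
        "alert": alert.group(1) if alert else None,
    }


# Shortened excerpt of the sample output shown above.
sample = (
    "SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003    "
    'SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- '
    "Alert: OcnrfActiveSubscribersBelowCriticalThreshold  Summary: namespace: ocnrf"
)
parsed = parse_trap(sample)
```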
MIB Files for OCNRF

Two MIB files are used to generate the traps. The user needs to update these files along with the alert file in order to fetch the traps in their environment.

  • OCNRF-MIB-TC-1.10.0.mib

    This is considered as OCNRF top level mib file, where the Objects and their data types are defined.

  • OCNRF-MIB-1.10.0.mib

    This file fetches the Objects from the top level mib file and based on the Alert notification, these objects can be selected for display.

Note:

MIB files are packaged along with OCNRF Custom Templates. Download the file from MOS. Refer to OCNRF Installation and Upgrade guide for more details.