6 OCNRF Metrics, KPIs, and Alerts
OCNRF Metrics
This section includes information about Metrics for Oracle Communications Network Repository Function.
Note:
Sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates. Metrics and functions used to achieve KPI are covered in OCNRF Custom Templates. Refer to Oracle Help Center site for the information about OCNRF Custom Templates.
Dimensions Legend for the Metrics
The following table includes the details about the metrics dimensions:
Table 6-1 Dimensions Legend
Dimension | Details |
---|---|
Method | HTTP method name. For example: PUT, GET |
Status | HTTP Status Code in response |
Uri | URI defined to identify the Service Operation at Ingress Gateway |
Node | Name of the Kubernetes worker node on which the microservice is running |
NrfLevel | OCNRF deployment name by which OCNRF can be identified. This is the OCNRF Instance Id passed through Helm |
NfType | Types of Network Functions (NF) |
NfInstanceId | Unique identity of the NF Instance sending request to OCNRF |
HttpStatusCode | HTTP Status Code |
ServiceName | Name of the service instance (e.g. "nudm-sdm") |
ServiceInstanceId | Unique ID of the service instance within a given NF Instance |
UpdateType(Partial/Complete) | NF Update with PUT (Complete) or PATCH (Partial) methods |
OperationType | Indicates whether an NFSubscribe service operation request creates or updates the subscription |
NotificationEventType | Event types for which the subscription request is made. For example: NF_REGISTERED, NF_DEREGISTERED, and NF_PROFILE_CHANGED |
TargetNfType | Target NF type for which the request is made |
RequesterNfType | NF type that originates the request. The value is taken from the User-Agent header. For the NFDiscover service operation, it is taken from the search query. If the header or value is absent, the metric reports UNKNOWN |
TargetNfInstanceId | Dimension indicates the target NF Instance Id for NF Access Token |
ClientNfInstanceId | Dimension indicates the client NF Instance Id for NF Access Token |
RejectionReason | Dimension indicates the rejection reason for NF Access Token |
SubscriptionIdType | Dimension indicates the Subscription Id type for which SLF query is received |
GroupId | Dimension indicates the GroupId returned by SLF/UDR corresponding to SubscriptionId |
BucketSize | Number of profiles returned in the response to a Discovery request. The range is not configurable. Possible values are 0-10 and +Inf. The bucket matching the number of NF profiles returned is incremented by one. For example, if 2 profiles are returned, bucket 2 is incremented by one. Responses with more than 10 profiles fall in the +Inf bucket. |
DBOperation | Create, update, delete, and find |
TableName | OCNRF Table Name |
SubscriptionStatus | Status of subscription shall be 'SUBSCRIBED', 'SUSPENDED' or 'UNSUBSCRIBED' |
DbReplicationStatus | "ACTIVE" or "INACTIVE" |
RemoteNrfInstanceId | Remote OCNRF Instance Id |
HeartbeatTimer | The heartbeatTimer of the NfProfile. The value is considered in seconds. |
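The BucketSize rule in the legend above can be sketched in a few lines of Python. This is only an illustration of the rule as described in the table, not OCNRF code. The function name `bucket_label` and the sample response sizes are hypothetical.

```python
def bucket_label(profiles_returned):
    """Map the number of NF profiles in a discovery response to a BucketSize label."""
    if profiles_returned < 0:
        raise ValueError("profile count cannot be negative")
    # Counts 0 through 10 each have their own bucket; larger counts go to +Inf.
    return str(profiles_returned) if profiles_returned <= 10 else "+Inf"


# Hypothetical discovery responses: number of profiles returned in each one.
responses = [2, 0, 11, 2, 10, 37]

counts = {}
for n in responses:
    label = bucket_label(n)
    # Each response increments exactly one bucket by one.
    counts[label] = counts.get(label, 0) + 1

print(counts)
```

With this sample, bucket 2 is incremented twice and the two responses with more than 10 profiles both land in the +Inf bucket.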
Table 6-2 OCNRF Metrics
Sl. No# | Metric Name | Metric Details | Metric filter | Recommended legend to see dimension level data (as applicable) | Dimensions |
---|---|---|---|---|---|
1 | Total number of ingress requests | Total number of requests received at OCNRF | oc_ingressgateway_http_requests_total | ||
2 | NF Register Success | Total number of successful NFRegister service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
3 | NF Update Success (Complete Replacement) | Total number of successful NFUpdate service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
4 | NF Update Success (Partial Replacement) | Total number of successful NFUpdate service operations at OCNRF | oc_ingressgateway_http_responses_total{Status=~".*2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PATCH"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
5 | NF List/Profile Retrieval Success | Total number of successful NF List/Profile retrieval service operations at OCNRF | oc_ingressgateway_http_responses_total{Status=~".*2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="GET"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
6 | Access Token Success | Total number of successful Access Token service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*/oauth2/token.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
7 | NF De-register Success | Total number of successful NFDeregister service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
8 | NF Subscribe Success | Total number of successful NFSubscribe service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
9 | NF Unsubscribe Success | Total number of successful NFUnSubscribe service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
10 | NF Discover Success | Total number of successful NFDiscover service operations at OCNRF | oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
11 | 4xx Responses (NF-Instances) | Total number of 4xx responses (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval) | oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
12 | 4xx Responses (Subscriptions) | Total number of 4xx responses (NfSubscribe/NfUnsubscribe) | oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
13 | 4xx Responses (Discovery) | Total number of 4xx responses (NfDiscover) | oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
14 | 4xx Responses (AccessToken) | Total number of 4xx responses (NfAccessToken) | oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*oauth2/token.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
15 | 5xx Responses (NF-Instances) | Total number of 5xx responses (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval) | oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
16 | 5xx Responses (Subscriptions) | Total number of 5xx responses (NfSubscribe/NfUnsubscribe) | oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
17 | 5xx Responses (Discovery) | Total number of 5xx responses (NfDiscover) | oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
18 | 5xx Responses (AccessToken) | Total number of 5xx responses (NfAccessToken) | oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*oauth2/token.*"} | | Method: HTTP method of request; Status: status code in HTTP response; Uri: URI from the request line; Node: name of the Kubernetes worker node on which the microservice is running |
19 | NfRegistrations Total | Number of Registration Requests received | ocnrf_nfRegister_rx_requests_total | NfRegistrations Total | NrfLevel, NfInstanceId, RequesterNfType |
20 | NfRegistrations Responses Total | Number of Registration Responses sent. | ocnrf_nfRegister_tx_responses_total | NfRegistrations Responses Total | NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode |
21 | NfRegistrations Per Service Total | Number of Registrations received and processed successfully per Service. | ocnrf_nfRegister_rx_requests_success_perService_total | NfRegistrations Per Service [ serviceName :- {{ serviceName }}, nfInstanceId :- {{NfInstanceId}} ] | NrfLevel, NfInstanceId, ServiceName, ServiceInstanceId |
22 | NFUpdates Total | Number of Update Requests received. | ocnrf_nfUpdate_rx_requests_total | NfUpdates Total | NrfLevel, NfInstanceId, RequesterNfType, UpdateType(Partial/Complete) |
23 | NFUpdates Responses Total | Number of Update Responses sent. | ocnrf_nfUpdate_tx_responses_total | NfUpdates Responses Total | NrfLevel, NfInstanceId, RequesterNfType, UpdateType(Partial/Complete), HttpStatusCode |
24 | NFUpdates Per Service Total | Number of NfUpdates received and processed successfully per Service. | ocnrf_nfUpdate_rx_requests_success_perService_total | NFUpdates Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ] | NrfLevel, UpdateType(Partial/Complete), NfInstanceId, ServiceName, ServiceInstanceId |
25 | Heartbeat Requests Total | Number of Heartbeat Requests received | ocnrf_nfHeartbeat_rx_requests_total | | NrfLevel, NfInstanceId, RequesterNfType |
26 | Heartbeat Responses Total | Number of Heartbeat Responses sent | ocnrf_nfHeartbeat_tx_responses_total | | NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode |
27 | NF De-Registration Requests Total | Number of De-registration requests received | ocnrf_nfDeregister_rx_requests_total | | NrfLevel, NfInstanceId, RequesterNfType |
28 | NF De-Registration Responses Total | Number of De-registration responses sent | ocnrf_nfDeregister_tx_responses_total | | NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode |
29 | NF De-Registrations Per Service Total | Number of De-registration requests received and processed successfully per Service | ocnrf_nfDeregister_rx_requests_success_perService_total | NFDeregistration Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ] | NrfLevel, ServiceName, ServiceInstanceId, NfInstanceId |
30 | NF List Retrieval Requests Total | Number of NFListRetrieval requests received | ocnrf_nfListRetrieval_rx_requests_total | | NrfLevel, RequesterNfType |
31 | NF List Retrieval Responses Total | Number of NFListRetrieval responses sent | ocnrf_nfListRetrieval_tx_responses_total | | NrfLevel, RequesterNfType, HttpStatusCode |
32 | NF Profile Retrieval Requests Total | Number of NFProfileRetrieval requests received | ocnrf_nfProfileRetrieval_rx_requests_total | | NrfLevel, NfInstanceId |
33 | NF Profile Retrieval Responses Total | Number of NFProfileRetrieval responses sent | ocnrf_nfProfileRetrieval_tx_responses_total | | NrfLevel, NfInstanceId, HttpStatusCode |
34 | Number of Heartbeats missed | Number of heartbeats missed. | ocnrf_heartbeat_missed_total | | NrfLevel, RequesterNfType, NfInstanceId |
35 | NF Status Subscribe Requests Total | Number of NfStatusSubscribe requests received | ocnrf_nfStatusSubscribe_rx_requests_total | | NrfLevel, RequesterNfType, OperationType |
36 | NF Status Subscribe Responses Total | Number of NfStatusSubscribe responses sent | ocnrf_nfStatusSubscribe_tx_responses_total | | NrfLevel, RequesterNfType, HttpStatusCode, OperationType |
37 | NF Status UnSubscribe Requests Total | Number of NfStatusUnsubscribe requests received | ocnrf_nfStatusUnsubscribe_rx_requests_total | | NrfLevel, RequesterNfType |
38 | NF Status UnSubscribe Responses Total | Number of NfStatusUnsubscribe responses sent | ocnrf_nfStatusUnsubscribe_tx_responses_total | | NrfLevel, RequesterNfType, HttpStatusCode |
39 | NF Status Notifications Requests Sent | Number of NfStatusNotify requests sent | ocnrf_nfStatusNotify_tx_requests_total | | NrfLevel, NotificationEventType, TargetNfType |
40 | NF Status Notifications Responses Received | Number of NfStatusNotify responses received | ocnrf_nfStatusNotify_rx_responses_total | | NrfLevel, NotificationEventType, TargetNfType, HttpStatusCode |
41 | NF Status Notifications Requests Failed | Number of NfStatusNotify requests that failed to be sent | ocnrf_nfStatusNotify_requests_failed_total | | NrfLevel, NotificationEventType, TargetNfType |
42 | NfDiscover Requests Total | Number of NfDiscover Requests received | ocnrf_nfDiscover_rx_requests_total | NfDiscover Req [ TargetNf :- {{ TargetNfType }}, RequesterNfType :- {{RequesterNfType}} ] | NrfLevel, TargetNfType, RequesterNfType |
43 | NfDiscover Responses Total | Number of NfDiscover responses sent | ocnrf_nfDiscover_tx_responses_total | | NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode |
44 | NFDiscover Per Service Total | Number of NfDiscover requests received and processed successfully per Service | ocnrf_nfDiscover_rx_requests_success_perService_total | NFDiscover Per Service [ serviceName :- {{ serviceName }} ] | NrfLevel, RequesterNfType, ServiceName |
45 | Discovered profiles | Number of profiles returned in the discovery response. The bucket size and its corresponding value show how many profiles were returned. | ocnrf_nfDiscover_profiles_discovered_total | Discovered profiles [ TargetNfType :- {{TargetNfType}}, Bucket :- {{ Bucket }} ] | NrfLevel, TargetNfType, BucketSize, NfFqdn |
46 | Active Registrations | Number of active registered NFs at any point of time | ocnrf_active_registrations_count | Active Registrations [ NfType-{{ NfType }}, NrfLevel-{{ NrfLevel }} ] | NfType, NrfLevel |
47 | Avg NRF Latency taken by NRF specific microservice | Time taken by the NRF-specific microservice to process the service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NfSubscribe/NfUnsubscribe/NfAccessToken). Note: The latency calculated by this metric does not include time taken by the OCNRF API gateway. | ocnrf_message_processing_time_seconds | Avg NRF Latency {{ ServiceOperation }} {{ RequesterNfType }} | NrfLevel, RequesterNfType, ServiceOperation |
48 | OCNRF database operations | Database operation count corresponding to every service operation | ocnrf_dbmetric_total | | Method, DBOperation, NrfLevel, HttpStatusCode |
49 | Database operation round trip time | Time (in microseconds) taken by the database operation corresponding to every service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NfSubscribe/NfUnsubscribe/NfAccessToken) | ocnrf_dbmetrics_round_trip_time_seconds | | |
In the OCNRF Metrics table above, 4xx and 5xx denote the classes of HTTP error status codes returned by the REST API.
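The `Status=~"..."` filters in the table rely on Prometheus regular-expression matchers, which must match the entire label value. The Python sketch below mimics that behaviour with `re.fullmatch` to show which status values a given pattern selects. It is an approximation: Prometheus uses RE2 syntax, which behaves the same as Python's `re` for these simple patterns. The helper name `label_matches` is hypothetical, and the 4xx/5xx status strings are assumed to follow the same "code REASON" form as the `200 OK` and `204 NO_CONTENT` values shown in the table.

```python
import re


def label_matches(pattern, value):
    """Mimic a Prometheus `=~` matcher: the pattern must match the whole value."""
    return re.fullmatch(pattern, value) is not None


# "200 OK", "201 CREATED" and "204 NO_CONTENT" appear in the table above;
# the remaining values are assumed examples in the same format.
statuses = [
    "200 OK",
    "201 CREATED",
    "204 NO_CONTENT",
    "404 NOT_FOUND",
    "502 BAD_GATEWAY",
]

four_xx = [s for s in statuses if label_matches("4.*", s)]
five_xx = [s for s in statuses if label_matches("5.*", s)]
two_xx = [s for s in statuses if label_matches("2.*", s)]

# A pattern with a leading ".*" matches any value that contains the digit 2.
contains_two = [s for s in statuses if label_matches(".*2.*", s)]

print(four_xx, five_xx, two_xx, contains_two, sep="\n")
```

Because the match is anchored at both ends, `2.*` selects only values that start with 2, whereas `.*2.*` also selects a value such as `502 BAD_GATEWAY`. This is worth keeping in mind when adapting the success filters.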
Table 6-3 NF Screening specific metrics
Sl. No# | Metric Name | Metric Details | Metric filter | Service Operation | Dimensions | Notes |
---|---|---|---|---|---|---|
1 | Total NF Requests for which Screening Failed | The total number of requests for which screening failed against the NF FQDN screening list. | ocnrf_nfScreening_nfFqdn_requestFailed_total | NFRegister, NFUpdate | NRF level, NF type | See Note 1 below this table. |
2 | Total NF Requests Rejected due to Screening Failed | The total number of requests rejected because screening failed against the NF FQDN screening list. | ocnrf_nfScreening_nfFqdn_requestRejected_total | NFRegister, NFUpdate | NRF level, NF type | See Note 1 below this table. |
3 | Total NF Requests for which Screening Failed | The total number of requests for which screening failed against the NF IP endpoint screening list. | ocnrf_nfScreening_nfIpEndPoint_requestFailed_total | NFRegister, NFUpdate | NRF level, NF type | See Note 1 below this table. |
4 | Total NF Requests Rejected due to Screening Failed | The total number of requests rejected because screening failed against the NF IP endpoint screening list. | ocnrf_nfScreening_nfIpEndPoint_requestRejected_total | NFRegister, NFUpdate | NRF level, NF type | See Note 1 below this table. |
5 | Total NF Requests for which Screening Failed | The total number of requests for which screening failed against the Callback URI screening list. | ocnrf_nfScreening_callbackUri_requestFailed_total | NFRegister, NFUpdate, NFSubscribe | NRF level, NF type | See Note 1 below this table. |
6 | Total NF Requests Rejected due to Screening Failed | The total number of requests rejected because screening failed against the Callback URI screening list. | ocnrf_nfScreening_callbackUri_requestRejected_total | NFRegister, NFUpdate, NFSubscribe | NRF level, NF type | See Note 1 below this table. |
7 | Total NF Requests for which Screening Failed | The total number of requests for which screening failed against the PLMN id screening list. | ocnrf_nfScreening_plmnId_requestFailed_total | NFRegister, NFUpdate | NRF level, NF type | See Note 1 below this table. |
8 | Total NF Requests Rejected due to Screening Failed | The total number of requests rejected because screening failed against the PLMN id screening list. | ocnrf_nfScreening_plmnId_requestRejected_total | NFRegister, NFUpdate | NRF level, NF type | See Note 1 below this table. |
9 | Total NF Requests for which Screening Failed | The total number of NFRegister requests for which screening failed against the NF type screening list. | ocnrf_nfScreening_nfTypeRegister_requestFailed_total | NFRegister | NRF level, NF type | See Note 1 below this table. |
10 | Total NF Requests Rejected due to Screening Failed | The total number of NFRegister requests rejected because the NF type was not allowed to register with NRF. | ocnrf_nfScreening_nfTypeRegister_requestRejected_total | NFRegister | NRF level, NF type | See Note 1 below this table. |
11 | NF Screening not applied Internal Error | The total number of times screening was not applied due to an internal error. | ocnrf_nfScreening_notApplied_InternalError_total | NFRegister, NFUpdate, NFSubscribe | NRF level, NF type | See Note 1 below this table. |
Note:
In the above "NF Screening metrics" table, the dimension NF Type is a requester NF Type.
NF Access token metrics
Table 6-4 NF Access token metrics
Sl. No# | Metric Name | Metric Details | Metric filter | Service Operation | Dimensions |
---|---|---|---|---|---|
1 | NF Access Token Request Received Total | The total number of access token requests received | ocnrf_accessToken_rx_requests_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel |
2 | NF Access Token Responses Sent Total | The total number of access token responses sent | ocnrf_accessToken_tx_responses_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode |
3 | NF Access Token Request Rejected (ClientNotAuthorized) | Number of access token requests for which client authorization failed. RejectionReason = ClientNotAuthorized | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ClientNotAuthorized |
4 | NF Access Token Request Rejected (ProducerWithRequestedScopeNotFound) | Number of access tokens not granted because no producer instance is registered for the service(s) in the scope. RejectionReason = ProducerWithRequestedScopeNotFound | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ProducerWithRequestedScopeNotFound |
5 | NF Access Token Request Rejected (ProducerWithRequestedNfInstanceIdNotFound) | Number of access tokens not granted because no producer instance is registered for the target Instance Id provided in the request. RejectionReason = ProducerWithRequestedNfInstanceIdNotFound | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ProducerWithRequestedNfInstanceIdNotFound |
6 | NF Access Token Request Rejected (InconsistentScope) | Number of access tokens not granted because services in the scope belong to different NF types. RejectionReason = InconsistentScope | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = InconsistentScope |
7 | NF Access Token Request Rejected (ConsumerNFTypeMismatch) | Number of access tokens not granted because the consumer NF type in the profile does not match the access token request. RejectionReason = ConsumerNFTypeMismatch | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ConsumerNFTypeMismatch |
8 | NF Access Token Request Rejected (ProducerNFTypeMismatch) | Number of access tokens not granted because the producer NF type in the profile does not match the access token request. RejectionReason = ProducerNFTypeMismatch | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, RejectionReason, HttpStatusCode. RejectionReason = ProducerNFTypeMismatch |
9 | NF Access Token Request Rejected (InternalError) | Number of access tokens not granted because of a failure at NRF due to an internal error. RejectionReason = InternalError | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode. RejectionReason = InternalError |
10 | NF Access Token Request Rejected (ConsumerNfTypeNotAllowed) | Number of access tokens not granted because the consumer NFType is not allowed to access the requested NF. | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode. RejectionReason = ConsumerNfTypeNotAllowed |
11 | NF Access Token Request Rejected (ConsumerPlmnNotAllowed) | Number of access tokens not granted because the consumer NF PLMN is not allowed to access the requested NF. | ocnrf_accessToken_tx_rejected_total | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode. RejectionReason = ConsumerPlmnNotAllowed |
NRF-SLF specific metrics
Table 6-5 NRF-SLF specific metrics
Sl. No# | Metric Name | Metric Details | Metric filter | Service Operation | Dimensions |
---|---|---|---|---|---|
1 | Discover Request Received For SLF Total | The total number of NF Discover requests received for SLF | ocnrf_nfDiscover_ForSLF_rx_requests_total | NFDiscover | TargetNfType, NRFLevel |
2 | Discover Response Sent For SLF Total | The total number of NF Discover responses sent for SLF | ocnrf_nfDiscover_ForSLF_tx_responses_total | NFDiscover | TargetNfType, NRFLevel, HttpStatusCode, RejectionReason. Possible rejection reasons: SLFCommunicationFailure, MandatoryParamsMissing, SLFConfigurationMissing, GroupIdNotFound, ErrorFromSLF, InternalError, NotApplicable (NotApplicable applies to 2xx status codes) |
3 | SLF Query Requests Sent Total | The total number of SLF query requests sent | ocnrf_SLF_tx_requests_total | NFDiscover | TargetNfType, NRFLevel, SubscriptionIdType |
4 | SLF Query Responses Received Total | The total number of SLF query responses received | ocnrf_SLF_rx_responses_total | NFDiscover | TargetNfType, NRFLevel, SubscriptionIdType, HttpStatusCode, GroupId |
5 | SLF Round Trip Time Total | Time (in microseconds) between sending a query to SLF and receiving the response from SLF | ocnrf_slf_round_trip_time_seconds | NFDiscover | TargetNfType, SubscriptionIdType, HttpStatusCode, GroupId, NrfLevel, SLF ApiRoot |
NRF Forwarding Metrics
Table 6-6 NRF Forwarding Metrics
Sl. No# | Metric Name | Metric Details | Metric filter | Service Operation | Dimensions |
---|---|---|---|---|---|
1 | NF Access Token Requests Forwarded Total | The total number of Access Token Request forwarded to Primary/Secondary NRF | ocnrf_forward_accessToken_tx_requests_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel |
2 | NF Access Token Forwarded Responses Total | The total number of Access Token Responses for requests forwarded to Primary/Secondary NRF | ocnrf_forward_accessToken_rx_responses_total | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, RejectionReason. RejectionReason is NotApplicable for 2xx status codes |
3 | NF Profile Retrieval Requests Forwarded Total | The total number of Profile Retrieval Requests forwarded to Primary/Secondary NRF | ocnrf_forward_nfProfileRetrieval_tx_requests_total | NFProfileRetrieval | NrfLevel, NfInstanceId |
4 | NF Profile Retrieval Forwarded Responses Total | The total number of Profile Retrieval Responses for requests forwarded to Primary/Secondary NRF | ocnrf_forward_nfProfileRetrieval_rx_responses_total | NFProfileRetrieval | NrfLevel, NfInstanceId, HttpStatusCode, RejectionReason. RejectionReason is NotApplicable for 2xx status codes |
5 | NF Status Subscribe Forwarded Requests Total | The total number of Status Subscribe Requests forwarded to Primary/Secondary NRF | ocnrf_forward_nfStatusSubscribe_tx_requests_total | NFStatusSubscribe, NFStatusUnsubscribe | NrfLevel, RequesterNfType, OperationType |
6 | NF Status Subscribe Forwarded Responses Total | The total number of Responses for Status Subscribe Requests forwarded to Primary/Secondary NRF | ocnrf_forward_nfStatusSubscribe_rx_responses_total | NFStatusSubscribe, NFStatusUnsubscribe | NrfLevel, RequesterNfType, HttpStatusCode, OperationType, RejectionReason. RejectionReason is NotApplicable for 2xx status codes |
7 | NF Discovery Forwarded Requests Total | The total number of NF Discovery Requests forwarded to Primary/Secondary NRF | ocnrf_forward_nfDiscover_tx_requests_total | NFDiscover | NrfLevel, TargetNfType, RequesterNfType |
8 | NF Discovery Forwarded Responses Total | The total number of Responses for NF Discovery Requests forwarded to Primary/Secondary NRF | ocnrf_forward_nfDiscover_rx_responses_total | NFDiscover | NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode, RejectionReason. RejectionReason: ErrorFromNrf; NotApplicable for 2xx status codes |
9 | Avg Latency for NRF Message Forwarding | Time taken by NRF specific microservice to forward the message to other Primary/Secondary NRF with the service operation: (NFProfileRetrieval/NFDiscover/NFStatusSubscribe/NfStatusUnsubscribe/AccessToken) | ocnrf_forward_round_trip_time_seconds | NFStatusSubscribe, NFStatusUnsubscribe, NFProfileRetrieval, NFDiscover, AccessToken | NrfLevel, RequesterNfType, ServiceOperation |
GeoRedundancy metrics
Table 6-7 GeoRedundancy metrics
Sl. No# | Metric Name | Metric Details | Metric filter | Service Operation | Dimensions |
---|---|---|---|---|---|
1 | DB Replication status | The current replication status of the db tier service. | ocnrf_dbreplication_status | NA | NrfLevel, DbReplicationStatus |
2 | DB Replication down Time | Time taken for the replication status to change from "INACTIVE" to "ACTIVE" | ocnrf_dbreplication_down_time_seconds | NA | NrfLevel, DbReplicationDownStartTime, DbReplicationDownEndTime |
3 | Total NfInstances switched over from mated site | The number of NfInstances that got switched over from the mated site. | ocnrf_nf_switch_over_total | NfRegister, NfUpdate, NfDeregister, NfHeartbeat | NrfLevel, NfInstanceId, RemoteNrfInstanceId, ServiceOperation, OperationType |
4 | Total NfSubscriptions switched over from mated site | The number of NfSubscriptions that got switched over from the mated site. | ocnrf_nfSubscriptions_switch_over_total | NfStatusSubscribe, NfStatusUnsubscribe, NrfAuditor | NrfLevel, SubscriptionId, RemoteNrfInstanceId, ServiceOperation, OperationType |
5 | Total NfInstances removed by OCNRF as stale | The number of NfInstances that get deleted by the NrfAuditor when it detects a record to be stale. | ocnrf_stale_nf_deleted_total | NA | NrfLevel, NfInstanceId, NfStatus |
6 | Total NfSubscriptions removed by OCNRF as stale | The number of NfSubscriptions that get deleted by the NrfAuditor when it detects a record to be stale. | ocnrf_stale_nfSubscriptions_deleted_total | NA | NrfLevel, NfSubscriptionId, SubscriptionStatus |
7 | Total NfInstances that have been marked as SUSPENDED by the OCNRF Auditor | The number of profiles that have been marked as SUSPENDED when a profile has missed nfHeartBeatMissAllowed. | ocnrf_nf_suspended_total | NA | NrfLevel, NfInstanceId, NfStatus, HeartbeatTimer |
8 | Total NfSubscriptions whose validityTime has expired | The number of NfSubscriptions whose validityTime has expired | ocnrf_nfSubscriptions_expired_total | NA | NrfLevel, SubscriptionId |
NF AccessToken Authorization Metrics
Table 6-8 NF AccessToken Authorization Metrics
Sl. No# | Metric Name | Metric Details | Metric filter | Service Operation | Dimensions |
---|---|---|---|---|---|
1 | NF Access Token Request Rejected (AuthScreeningFailed) | Number of access tokens not granted because the consumer NF is not authorized to access the requested NF or its services. | ocnrf_accessToken_tx_rejected_total | NfAccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode. RejectionReason = ClientNotAuthorized |
NF Authentication Metrics
Table 6-9 NF Authentication Metrics
Sl. No# | Metric Name | Metric Details | Metric filter | Service Operation | Dimensions |
---|---|---|---|---|---|
1 | NF Authentication Failure Total | The total number of requests for which FQDN-based authentication failed at OCNRF | ocnrf_nf_authentication_failure_total | NFAccessToken/NFRegistration/NFSubscription/NFDiscovery/NfListRetrieval/NfProfileRetrieval | NrfLevel, Method, ServiceOperation, NfFqdn, TLSFqdn. For the NfListRetrieval and NfProfileRetrieval service operations, NfFqdn is filled as NotApplicable. If the OC-XFCC-DNS header is not received at the NRF microservice, TLSFqdn is filled as "UNKNOWN". |
OCNRF KPIs
This section includes information about KPIs for Oracle Communications Network Repository Function (OCNRF).
Note:
Sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates. Metrics and functions used to achieve KPI are already covered in OCNRF Custom Templates.
Table 6-10 KPI Details
KPI Name | KPI Details | Metric used for KPI | Service Operation | Response code |
---|---|---|---|---|
OCNRF Ingress Request | Rate of HTTP requests received at OCNRF Ingress Gateway | oc_ingressgateway_http_requests_total | All | Not Applicable |
NF Register Success | | sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m])) | NFRegister | 201 |
NF Update Success (Complete Replacement) | | sum(irate(oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m])) | NFUpdate | 200 |
NF DeRegister Success | | sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"}[5m])) | NFDeregister | 204 |
NF Subscribe Success | | sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"}[5m])) | NFStatusSubscribe | 201 |
NF Unsubscribe Success | | sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"}[5m])) | NFStatusUnsubscribe | 204 |
NF Discover Success | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"}[5m])) | NFDiscover | 200 |
4xx Responses (NF-Instances) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m])) | NFRegister/NFUpdate/NFDeregister | 4xx |
4xx Responses (Subscriptions) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m])) | NFStatusSubscribe/NFStatusUnsubscribe | 4xx |
4xx Responses (Discovery) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m])) | NFDiscover | 4xx |
5xx Responses (NF-Instances) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m])) | NFRegister/NFUpdate/NFDeregister | 5xx |
5xx Responses (Subscriptions) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m])) | NFStatusSubscribe/NFStatusUnsubscribe | 5xx |
5xx Responses (Discovery) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m])) | NFDiscover | 5xx |
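The success and error KPIs in Table 6-10 share one pattern: a sum(irate(...[5m])) over oc_ingressgateway_http_responses_total, filtered by Status, Route_path, and (for the success KPIs) Method. The following Python sketch only assembles such expression strings so that the pattern is easier to see. The helper name kpi_expression and its parameters are hypothetical; they are not part of OCNRF or Prometheus, and the sketch does not query a Prometheus server.

```python
# Hypothetical helper: formats the PromQL text used by the KPIs in Table 6-10.
# It builds a string only; nothing is sent to Prometheus.

def kpi_expression(matchers, metric="oc_ingressgateway_http_responses_total", window="5m"):
    """Return sum(irate(metric{matchers}[window])) as a string.

    matchers is a list of (label, operator, value) tuples, where operator
    is "=" for an exact match or "=~" for a regular-expression match.
    """
    selector = ",".join(f'{label}{op}"{value}"' for label, op, value in matchers)
    return f"sum(irate({metric}{{{selector}}}[{window}]))"


# NF Register Success, as listed in the table.
nf_register_success = kpi_expression([
    ("Status", "=", "201 CREATED"),
    ("Route_path", "=~", ".*nnrf-nfm/v1/nf-instances.*"),
    ("Method", "=", "PUT"),
])

# 5xx Responses (Discovery): regex match on Status, no Method filter.
discovery_5xx = kpi_expression([
    ("Status", "=~", "5.*"),
    ("Route_path", "=~", ".*nnrf-disc/v1/nf-instances.*"),
])

print(nf_register_success)
print(discovery_5xx)
```

Keeping the label matchers as data makes it easy to see that the error KPIs differ from the success KPIs only in the Status matcher and in the absence of a Method filter.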
OCNRF Alerts
This section includes information about alerts for OCNRF.
Table 6-11 Alert Details
Alert | Trigger Condition | Severity | Alert details provided | OID | Metric Used | Resolution | Notes |
---|---|---|---|---|---|---|---|
System Level Alerts | |||||||
OcnrfNfStatusUnavailable | All the OCNRF services are unavailable, either because the OCNRF is getting deployed or purged. The OCNRF services considered are nfregistration, nfsubscription, nrfauditor, nrfconfiguration, nfaccesstoken, nfdiscovery, appinfo, ingressgateway, and egressgateway. | Critical |
description: 'OCNRF services unavailable' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All OCNRF services are unavailable.' |
1.3.6.1.4.1.323.5.3.36.1.2.7016 |
'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared automatically when the OCNRF services start becoming available.
Steps:
|
|
OcnrfPodsRestart | A pod belonging to any of the OCNRF services has restarted. | Major |
description: 'Pod <Pod Name> has restarted.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted' |
1.3.6.1.4.1.323.5.3.36.1.2.7017 | 'kube_pod_container_status_restarts_total' Note: This is a kubernetes metric. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared automatically if the specific pod is up. Steps:
|
|
NnrfNFManagementServiceDown | Either NFRegistration or NFSubscription or NrfAuditor services are unavailable. | Critical |
description: 'OCNRF Nnrf_Management service <nfregistration|nfsubscription|nrfauditor> is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFManagement service is down' |
1.3.6.1.4.1.323.5.3.36.1.2.7018 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when all the Nnrf_NFManagement services (nfregistration, nfsubscription, and nrfauditor) are available.
Steps:
|
|
NnrfAccessTokenServiceDown | NFAccessToken service is unavailable. | Critical |
description: 'OCNRF Nnrf_NFAccessToken service nfaccesstoken is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccessToken service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7020 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when the Nnrf_AccessToken service is available.
Steps:
|
|
NnrfNFDiscoveryServiceDown | NFDiscovery is unavailable. | Critical |
description: 'OCNRF Nnrf_NFDiscovery service nfdiscovery is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service unavailable.' |
1.3.6.1.4.1.323.5.3.36.1.2.7019 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the Nnrf_NFDiscovery service is available. Steps:
|
|
OcnrfRegistrationServiceDown | None of the pods of the NFRegistration microservice is available. | Critical |
description: 'OCNRF NFRegistration service nfregistration is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFRegistration service is down' |
1.3.6.1.4.1.323.5.3.36.1.2.7021 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the nfregistration service is available. Steps:
|
|
OcnrfSubscriptionServiceDown | None of the pods of the NFSubscription microservice is available. | Critical |
description: 'OCNRF NFSubscription service nfsubscription is down.' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down' |
1.3.6.1.4.1.323.5.3.36.1.2.7022 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when the nfsubscription service is available.
Steps:
|
|
OcnrfDiscoveryServiceDown | None of the pods of the NFDiscovery microservice is available. | Critical |
description: 'OCNRF NFDiscovery service nfdiscovery is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7023 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when the nfdiscovery service is available.
Steps:
|
|
OcnrfAccessTokenServiceDown | None of the pods of the NFAccessToken microservice is available. | Critical |
description: 'OCNRF NFAccessToken service nfaccesstoken is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccesstoken service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7024 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when the nfaccesstoken service is available.
Steps:
|
|
OcnrfAuditorServiceDown | None of the pods of the NrfAuditor microservice is available. | Critical | description: 'OCNRF NrfAuditor service nrfauditor is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfAuditor service down' | 1.3.6.1.4.1.323.5.3.36.1.2.7026 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the nrfauditor service is available. Steps:
|
|
OcnrfConfigurationServiceDown | None of the pods of the NrfConfiguration microservice is available. | Critical |
description: 'OCNRF NrfConfiguration service nrfconfiguration is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfConfiguration service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7025 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the nrfconfiguration service is available. Steps:
|
|
OcnrfAppInfoServiceDown | None of the pods of the App Info microservice is available. | Critical |
description: 'OCNRF Appinfo service appinfo is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Appinfo service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7027 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the app-info service is available. Steps:
|
|
OcnrfIngressGatewayServiceDown | None of the pods of the Ingress-Gateway microservice is available. | Critical |
description: 'OCNRF Ingress-Gateway service ingressgateway is down.' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7028 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the ingressgateway service is available. Steps:
|
|
OcnrfEgressGatewayServiceDown | None of the pods of the Egress-Gateway microservice is available. | Critical |
description: 'OCNRF Egress-Gateway service egressgateway is down' summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7029 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the egressgateway service is available. Note: The threshold is configurable in the alerts.yaml Steps:
|
|
OcnrfMemoryUsageCrossedMinorThreshold | A pod has reached the configured minor threshold (50%) of its memory resource limits. | Minor |
description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured minor threshold (50 %) (value={{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 50% of its limit.' |
1.3.6.1.4.1.323.5.3.36.1.2.7030 | 'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system. | The alert gets cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the OcnrfMemoryUsageCrossedMajorThreshold alert shall be raised. Note: The threshold is configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
|
OcnrfMemoryUsageCrossedMajorThreshold | A pod has reached the configured major threshold (60%) of its memory resource limits. | Major |
description: 'OCNRF Memory Usage for pod <Pod name> has crossed the major threshold(60%) (value = {{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 60% of its limit.' |
1.3.6.1.4.1.323.5.3.36.1.2.7031 |
'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system. |
The alert gets cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the OcnrfMemoryUsageCrossedCriticalThreshold alert shall be raised. Note: The threshold is configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
|
OcnrfMemoryUsageCrossedCriticalThreshold | A pod has reached the configured critical threshold (70%) of its memory resource limits. | Critical |
description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured critical threshold (70%) (value = {{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.' |
1.3.6.1.4.1.323.5.3.36.1.2.7032 |
'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system. |
The alert gets cleared when the memory utilization falls below the critical threshold. Note: The threshold is configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
|
OcnrfTotalIngressTrafficRateAboveMinorThreshold |
The total OCNRF Ingress Message rate has crossed the configured minor threshold of 800 TPS. Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 80 % of 1000 (Maximum ingress request rate) |
Minor |
description: 'Total Ingress traffic Rate is above configured minor threshold i.e. 800 requests per second (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' |
1.3.6.1.4.1.323.5.3.36.1.2.7001 | 'oc_ingressgateway_http_requests_total' |
The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the OcnrfTotalIngressTrafficRateAboveMajorThreshold alert shall be raised. Note: The threshold is configurable in the alerts.yaml file. Steps: Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable). If this is unexpected, contact My Oracle Support. |
|
OcnrfTotalIngressTrafficRateAboveMajorThreshold |
The total OCNRF Ingress Message rate has crossed the configured major threshold of 900 TPS. Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 90 % of 1000 (Maximum ingress request rate) |
Major |
description: 'Total Ingress traffic Rate is above major threshold i.e. 900 requests per second (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' |
1.3.6.1.4.1.323.5.3.36.1.2.7002 | 'oc_ingressgateway_http_requests_total' |
The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the OcnrfTotalIngressTrafficRateAboveCriticalThreshold alert shall be raised. Note: The threshold is configurable in the alerts.yaml file. Steps: Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable). If this is unexpected, contact My Oracle Support. |
|
OcnrfTotalIngressTrafficRateAboveCriticalThreshold |
The total OCNRF Ingress Message rate has crossed the configured critical threshold of 950 TPS. Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 95 % of 1000 (Maximum ingress request rate) |
Critical |
description: 'Total Ingress traffic Rate is above critical threshold i.e. 950 requests per second (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 95 Percent of Max requests per second(1000)' |
1.3.6.1.4.1.323.5.3.36.1.2.7003 | 'oc_ingressgateway_http_requests_total' |
The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the alerts.yaml file. Steps: Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable). If this is unexpected, contact My Oracle Support. |
|
OcnrfTransactionErrorRateAbove0.1Percent | The number of failed transactions is above 0.1 percent of the total transactions. | Warning |
description: 'Transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 0.1 Percent of Total Transactions' |
1.3.6.1.4.1.323.5.3.36.1.2.7004 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 0.1 percent of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the OcnrfTransactionErrorRateAbove1Percent alert shall be raised. Steps:
|
|
OcnrfTransactionErrorRateAbove1Percent | The number of failed transactions is above 1 percent of the total transactions. | Warning | description: 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 1 Percent of Total Transactions' | 1.3.6.1.4.1.323.5.3.36.1.2.7005 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the OcnrfTransactionErrorRateAbove10Percent alert shall be raised. Steps:
|
|
OcnrfTransactionErrorRateAbove10Percent | The number of failed transactions has crossed the minor threshold of 10 percent of the total transactions. | Minor |
description: 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 10 Percent of Total Transactions' |
1.3.6.1.4.1.323.5.3.36.1.2.7006 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the OcnrfTransactionErrorRateAbove25Percent alert shall be raised. Steps:
|
|
OcnrfTransactionErrorRateAbove25Percent | The number of failed transactions has crossed the major threshold of 25 percent of the total transactions. | Major |
description: 'Transaction Error rate is above 25 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 25 Percent of Total Transactions' |
1.3.6.1.4.1.323.5.3.36.1.2.7007 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the OcnrfTransactionErrorRateAbove50Percent alert shall be raised. Steps:
|
|
OcnrfTransactionErrorRateAbove50Percent | The number of failed transactions has crossed the critical threshold of 50 percent of the total transactions. | Critical |
description: 'Transaction Error rate is above 50 Percent of Total Transactions (current value is {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 50 Percent of Total Transactions' |
1.3.6.1.4.1.323.5.3.36.1.2.7008 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. Steps:
|
|
OCNRF Application Alerts | |||||||
OcnrfRegisteredNFsBelowCriticalThreshold |
The number of NFs currently registered with OCNRF is below the critical threshold. Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is below 2. |
Critical |
description: 'The number of registered NFs detected below critical threshold (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below critical threshold.' |
1.3.6.1.4.1.323.5.3.36.1.2.7009 | 'ocnrf_active_registrations_count' |
The alert is cleared when the number of registered NFs is above the critical threshold. Steps: No action required. This is an information alert. |
|
OcnrfRegisteredNFsBelowMajorThreshold |
The number of NFs currently registered with OCNRF is below the major threshold. Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is greater than or equal to 2 and less than 10. |
Major |
description: 'The number of registered NFs detected below major threshold (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below major threshold.' |
1.3.6.1.4.1.323.5.3.36.1.2.7010 | 'ocnrf_active_registrations_count' |
The alert is cleared when the number of registered NFs is above the major threshold. Steps: No action required. This is an information alert. |
|
OcnrfRegisteredNFsBelowMinorThreshold |
The number of NFs currently registered with OCNRF is below the minor threshold. Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is greater than or equal to 10 and less than 20. |
Minor |
description: 'The number of registered NFs detected below minor threshold (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below minor threshold.' |
1.3.6.1.4.1.323.5.3.36.1.2.7011 | 'ocnrf_active_registrations_count' |
The alert is cleared when the number of registered NFs is above the minor threshold. Steps: No action required. This is an information alert. |
|
OcnrfRegisteredNFsBelowThreshold |
The number of NFs currently registered with OCNRF is approaching the minor threshold. Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is greater than or equal to 20 and less than 30. |
Warning |
description: 'The number of registered NFs is approaching minor threshold (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs approaching minor threshold.' |
1.3.6.1.4.1.323.5.3.36.1.2.7012 | 'ocnrf_active_registrations_count' |
The alert is cleared when the number of registered NFs is no longer approaching the minor threshold. Steps: No action required. This is an information alert. |
|
OcnrfDbReplicationStatusInactive | The db tier replication service status is inactive across the georedundant OCNRFs. | Critical |
description: 'The Database Replication Status is currently INACTIVE.' summary: 'namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, dbreplicationstatus: {{$labels.DbReplicationStatus}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The database replication status is INACTIVE.' |
1.3.6.1.4.1.323.5.3.36.1.2.7013 | 'ocnrf_dbreplication_status' | The alert is cleared when the dbtier replication service is active. | The alarm shall be included only if the Georedundancy feature is enabled. |
OcnrfAccessTokenRequestsRejected | OCNRF rejected an AccessToken Request |
critical warning |
description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF.' |
1.3.6.1.4.1.323.5.3.36.1.2.7014 | 'ocnrf_accessToken_tx_rejected_total' | The alert is cleared automatically.
Steps: The RejectionReason shall be present in the alert. If the RejectionReason is AuthScreeningFailed/ClientNotAuthorized, either reevaluate the configuration or check the consumer NF that requested the unauthorized token. For any other reason, follow the RejectionReason. |
|
OcnrfNfAuthenticationFailureRequestsRejected | OCNRF rejected a service request due to NF authentication failure |
critical warning |
description: 'Service request(s) received from NF have been rejected by OCNRF (current value is: {{ $value }})' summary: 'namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Request rejected for Nf FQDN based Authentication failure.' |
1.3.6.1.4.1.323.5.3.36.1.2.7015 | 'ocnrf_nf_authentication_failure_total' | The alert is cleared automatically.
Steps: No action required for OCNRF. This is an information alert. The RejectionReason shall be present in the alert. |
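The transaction error rate alerts in Table 6-11 form a ladder: each alert clears when the next higher threshold is crossed and the higher alert is raised instead. The following Python sketch illustrates that ladder under stated assumptions. The function name active_error_rate_alert is hypothetical, the way failed and total transactions are counted is taken as given input, and treating each threshold as a strict "greater than" comparison is an assumption; the actual evaluation is performed by the Prometheus alert rules.

```python
# Thresholds (percent), alert names, and severities as listed in Table 6-11,
# ordered from highest to lowest so the first match is the active alert.
ERROR_RATE_ALERTS = [
    (50.0, "OcnrfTransactionErrorRateAbove50Percent", "Critical"),
    (25.0, "OcnrfTransactionErrorRateAbove25Percent", "Major"),
    (10.0, "OcnrfTransactionErrorRateAbove10Percent", "Minor"),
    (1.0, "OcnrfTransactionErrorRateAbove1Percent", "Warning"),
    (0.1, "OcnrfTransactionErrorRateAbove0.1Percent", "Warning"),
]


def active_error_rate_alert(failed, total):
    """Return (alert name, severity) for the given counts, or None.

    Assumption: an alert applies when the error percentage is strictly
    above its threshold; only the highest matching alert is returned.
    """
    if total <= 0:
        return None  # no traffic, so no error rate can be computed
    percent = 100.0 * failed / total
    for threshold, name, severity in ERROR_RATE_ALERTS:
        if percent > threshold:
            return name, severity
    return None


print(active_error_rate_alert(30, 100))   # 30% of transactions failed
print(active_error_rate_alert(5, 1000))   # 0.5% of transactions failed
print(active_error_rate_alert(0, 1000))   # no failures
```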
OCNRF Alert Configuration
This section describes the measurement-based alert rules configuration for OCNRF. The Alert Manager uses the Prometheus measurement values reported by the microservices in the conditions of the alert rules to trigger alerts.
Note:
- The alert file is packaged with the OCNRF custom templates. The OCNRF templates.zip file can be downloaded from OHC. Unzip the OCNRF templates.zip file to get the NrfAlertRules.yaml file.
- Review the NrfAlertRules.yaml file and, if the default values need to be changed, edit the parameter values before configuring the alerts. See the table below for details.
- kubernetes_namespace is configured as the Kubernetes namespace in which OCNRF is deployed. The default value is OCNRF. Update the NrfAlertRules.yaml file to reflect the correct OCNRF Kubernetes namespace.
Table 6-12 Alerts
Alert Name | Details | Default Value | Notes |
---|---|---|---|
OcnrfTotalIngressTrafficRateAboveMinorThreshold | Traffic Rate is above 80 Percent of Max requests per second | Greater than/equal to 800 and Less than 900 |
The maximum ingress rate considered is 1000 requests per second, so in the default value 800 is 80% of 1000 and 900 is 90% of 1000. If the value needs to be updated, then depending on the maximum ingress request rate, set [80% of Max Ingress Request Rate] and [90% of Max Ingress Request Rate] for this alert. |
OcnrfTotalIngressTrafficRateAboveMajorThreshold | Traffic Rate is above 90 Percent of Max requests per second | Greater than/equal to 900 and Less than 950 |
Maximum Ingress rate considered is 1000 requests per second. So, here in default value 900 is 90% of 1000 and 950 is 95% of 1000. For example, if value need to be updated then depending upon maximum ingress request rate, set [ 90% of Max Ingress Request Rate] and [ 95% of Max Ingress Request Rate] for this alert |
OcnrfTotalIngressTrafficRateAboveCriticalThreshold | Traffic Rate is above 95 Percent of Max requests per second | Greater than/equal to 950 |
Maximum Ingress rate considered is 1000 requests per second. So, here in default value 950 is 95% of 1000. For example, if value need to be updated then depending upon maximum ingress request rate, set [ 95% of Max Ingress Request Rate] for this alert |
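The percentages in Table 6-12 can be applied to any maximum ingress rate with simple integer arithmetic. The sketch below is illustrative only; the function name ingress_thresholds is hypothetical and is not part of the OCNRF templates.

```shell
# Hypothetical helper: print the minor, major, and critical ingress
# thresholds (80%, 90%, and 95%) for a maximum rate in requests per second.
# Integer arithmetic is used, so fractional results are truncated.
ingress_thresholds() {
  max_rate="$1"
  echo "minor=$(( max_rate * 80 / 100 ))" \
       "major=$(( max_rate * 90 / 100 ))" \
       "critical=$(( max_rate * 95 / 100 ))"
}

ingress_thresholds 1000   # prints: minor=800 major=900 critical=950
```

The printed values are the numbers that would replace 800, 900, and 950 in the expressions of the three traffic rate alerts.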
OCNRF Alert configuration in Prometheus
This section describes the measurement-based alert rule configuration for OCNRF in Prometheus. Use the NrfAlertRules.yaml file updated in the OCNRF Alert Configuration section.
_NAME_ :- Helm Release of Prometheus
_Namespace_ :- Kubernetes NameSpace in which Prometheus is installed
- Take a backup of the current configuration map of Prometheus:
kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
- Check and add the OCNRF alert file name inside the Prometheus configuration map:
sed -i '/etc\/config\/alertsnrf/d' /tmp/tempConfig.yaml
sed -i '/rule_files:/a\ \- /etc/config/alertsnrf' /tmp/tempConfig.yaml
- Update the configuration map with the updated file name of the OCNRF alert file:
kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
- Add the OCNRF alert rules to the configuration map under the file name of the OCNRF alert file:
kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/NrfAlertrules.yaml)"
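The two sed commands in the second step are ordered so that the step can be repeated safely: the first removes any existing reference to the OCNRF alert file, and the second adds one back after the rule_files: line. The following sketch demonstrates this on a throwaway file. It assumes GNU sed (for -i and the one-line a\ form); the function name add_alert_file_entry and the sample file content are hypothetical.

```shell
# Hypothetical wrapper around the two sed commands from the step above.
# $1 = path to the saved configuration map YAML. Assumes GNU sed.
add_alert_file_entry() {
  # Drop any earlier reference to the OCNRF alert file ...
  sed -i '/etc\/config\/alertsnrf/d' "$1"
  # ... then add a single reference after the rule_files: line.
  sed -i '/rule_files:/a\ \- /etc/config/alertsnrf' "$1"
}

# Throwaway sample standing in for /tmp/tempConfig.yaml (made-up content).
demo=$(mktemp)
printf 'rule_files:\n- /etc/config/rules\n' > "$demo"
add_alert_file_entry "$demo"
add_alert_file_entry "$demo"    # repeating the step adds no duplicate
grep -c 'alertsnrf' "$demo"     # counts the references to the alert file
rm -f "$demo"
```

Because the delete runs first, the count stays at one however often the step is repeated.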
Note:
The Prometheus server automatically reloads the updated configuration map after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the OCNRF Alerts have been reloaded.
Disable OCNRF Alert in Prometheus
- Edit the NrfAlertrules.yaml file to remove the specific alert:
Sample alert content from NrfAlertrules.yaml is shown below. It illustrates the details of a specific alert in NrfAlertrules.yaml that is to be disabled.
## ALERT SAMPLE START##
- alert: OcnrfTrafficRateAboveMinorThreshold
  annotations:
    description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
    summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
  expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
  labels:
    severity: Minor
## ALERT SAMPLE END##
- Remove the content of the specific alert that needs to be disabled.
- Perform the alert configuration again. See the OCNRF Alert configuration in Prometheus section above for detailed steps.
Disabling Alerts
- Edit the NrfAlertrules.yaml file to remove the specific alert.
- Remove the complete content of the specific alert from the NrfAlertrules.yaml file.
For example, to remove the OcnrfTrafficRateAboveMinorThreshold alert, remove the complete content:
## ALERT SAMPLE START##
- alert: OcnrfTrafficRateAboveMinorThreshold
  annotations:
    description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
    summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
  expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
  labels:
    severity: Minor
## ALERT SAMPLE END##
- Perform the alert configuration. See the OCNRF Alert Configuration section above for details.
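Removing an alert by hand is straightforward for a single rule, but the edit can also be scripted. The following is a minimal sketch, assuming every alert begins with a "- alert: <name>" line and extends to the next "- alert:" line or the end of the file. The function name remove_alert and the output file name are hypothetical, and the result should be reviewed before it is used, because any content that follows the last alert would be dropped along with it.

```shell
# Hypothetical helper: print a rules file without one alert.
# $1 = alert name, $2 = path to the rules file.
# Assumption: each alert starts with a "- alert: <name>" line and runs
# until the next "- alert:" line or the end of the file.
remove_alert() {
  awk -v name="$1" '
    /^[ \t]*- alert:/ { skip = ($3 == name) }   # decide at each alert header
    !skip { print }                             # copy everything else through
  ' "$2"
}

# Usage (output file name is an assumption):
#   remove_alert OcnrfTrafficRateAboveMinorThreshold NrfAlertrules.yaml > NrfAlertrules.trimmed.yaml
```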
Configuring SNMP Notifier
This section describes the procedure to configure the SNMP Notifier.
Configure and Validate Alerts in Prometheus Server
Refer to the OCNRF Alert Configuration section for the procedure to configure the alerts.
Validating Alerts
After configuring the alerts in the Prometheus server, verify them with the following steps:
- Open the Prometheus server in your browser using <IP>:<Port>.
- Navigate to Status and then Rules.
- Search for Ocnrf. The OcnrfAlerts list is displayed.
Note:
If the alerts are not displayed, the alert file was not loaded in a format that the Prometheus server accepts. Correct the file and try again.
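The same check can be made from a terminal through the Prometheus HTTP API, which exposes the loaded rules at /api/v1/rules. The sketch below filters that response for names beginning with Ocnrf. It assumes the JSON is returned in compact form ("name":"..." without spaces); the function name list_ocnrf_rules is hypothetical. Rule group names also carry a "name" field, so a group such as OcnrfAlerts appears in the output as well.

```shell
# Hypothetical helper: read the JSON returned by the Prometheus
# /api/v1/rules endpoint on stdin and print every name that starts
# with "Ocnrf", one per line (rule names and group names alike).
# Assumption: the JSON is compact ("name":"..." without spaces).
list_ocnrf_rules() {
  grep -o '"name":"Ocnrf[^"]*"' | cut -d'"' -f4 | sort -u
}

# Usage against a live server (<IP> and <Port> as in the steps above):
#   curl -s "http://<IP>:<Port>/api/v1/rules" | list_ocnrf_rules
```

An empty result corresponds to the situation described in the note: the alert file was not accepted by the Prometheus server.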
- Execute the following command to edit the deployment:
kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
Example:
$ kubectl edit deploy occne-snmp-notifier -n occne-infra
- Edit the destination as follows:
--snmp.destination=<destination_ip>:<destination_port>
Example:
--snmp.destination=10.75.203.94:162
$ docker logs <trapd_container_id>
2020-04-29 15:34:24 10.75.203.103 [UDP: [10.75.203.103]:2747->[172.17.0.4]:162]:DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (158510800) 18 days, 8:18:28.00 SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003 SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.1 = STRING: "1.3.6.1.4.1.323.5.3.36.1.2.7003[]" SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.2 = STRING: "critical" SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- Alert: OcnrfActiveSubscribersBelowCriticalThreshold Summary: namespace: ocnrf, nftype:5G_EIR, nrflevel:6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, podname: ocnrf-nrfauditor-6b459f5db5-4kvt4, timestamp: 2020-04-29 15:33:24.408 +0000 UTC: Current number of registered NFs detected below critical threshold. Description: The number of registered NFs detected below critical threshold (current value is: 0)
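Trap log entries like the one above are long, so it can help to reduce each one to its trap OID and alert name. The following is a minimal sketch, assuming each trap is logged on a single line that contains both the snmpTrapOID.0 value and the "Alert: <name>" text, as in the sample. The function name summarize_traps is hypothetical.

```shell
# Hypothetical helper: read trap log text on stdin and print
# "<enterprise OID suffix> <alert name>" for every trap line.
# Assumption: snmpTrapOID.0 and "Alert: <name>" share one log line.
summarize_traps() {
  sed -n 's/.*snmpTrapOID\.0 = OID: [^ ]*enterprises\.\([0-9.]*\).*Alert: \([A-Za-z0-9_]*\).*/\1 \2/p'
}

# Usage (container ID as in the command above):
#   docker logs <trapd_container_id> 2>&1 | summarize_traps
```

For the sample entry this yields the OID suffix 323.5.3.36.1.2.7003 together with OcnrfActiveSubscribersBelowCriticalThreshold; lines that do not match the pattern are skipped.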
Two MIB files are used to generate the traps. Users need to update these files along with the alert file in order to fetch the traps in their environment.
- OCNRF-MIB-TC-1.8.0.mib
This is considered the OCNRF top-level MIB file, where the objects and their data types are defined.
- OCNRF-MIB-1.8.0.mib
This file fetches the objects from the top-level MIB file; based on the alert notification, these objects can be selected for display.
Note:
MIB files are packaged along with the OCNRF Custom Templates. Download the files from OHC. Refer to the OCNRF Installation and Upgrade guide for more details.