10 OCNRF Metrics, KPIs, and Alerts
OCNRF Metrics
This section includes information about metrics for Oracle Communications Network Repository Function.
Note:
A sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates, which also cover the metrics and functions used to derive the KPIs. For information about OCNRF Custom Templates, refer to the Oracle Help Center site.
Dimensions Legend for the Metrics
The following table includes the details about the metrics dimensions:
Table 10-1 Dimensions Legend
| Dimension | Details |
|---|---|
| application | application name, here it is ocnrf |
| vendor | For OCNRF, vendor is Oracle |
| Method | HTTP Method Name. For example: PUT, GET |
| Status | HTTP Status Code in response |
| Uri | URI defined to identify the Service Operation at Ingress Gateway |
| Node | Name of the Kubernetes worker node on which the microservice is running |
| NrfLevel | OCNRF deployment name by which OCNRF can be identified. It is the OCNRF Instance Id passed through Helm |
| NfType | Types of Network Functions (NF) |
| NfInstanceId | Unique identity of the NF Instance sending request to OCNRF |
| HttpStatusCode | HTTP Status Code |
| ServiceName | Name of the service instance (for example: "nudm-sdm") |
| ServiceInstanceId | Unique ID of the service instance within a given NF Instance |
| UpdateType(Partial/Complete) | NF Update with PUT (Complete) or PATCH (Partial) methods |
| OperationType | For the NFSubscribe service operation, indicates whether the request creates or updates the subscription |
| NotificationEventType | Event types for which the subscription request is made. For example: NF_REGISTERED, NF_DEREGISTERED, and NF_PROFILE_CHANGED |
| TargetNfType | Target NF type for which the request is made |
| RequesterNfType | NF type that originated the request. This value comes from the UserAgent header. For the NFDiscover service operation, it is taken from the search query. If the header or value is absent, this value is UNKNOWN in the metrics |
| TargetNfInstanceId | Dimension indicates the target NF Instance Id for NF Access Token |
| ClientNfInstanceId | Dimension indicates the client NF Instance Id for NF Access Token |
| RejectionReason | Dimension indicates the rejection reason for NF Access Token |
| SubscriptionIdType | Dimension indicates the Subscription Id type for which SLF query is received |
| GroupId | Dimension indicates the GroupId returned by SLF/UDR corresponding to SubscriptionId |
| BucketSize | Indicates how many profiles are returned in the response to a Discovery request. The range is not configurable. Possible values are 0-10 and +Inf. The bucket corresponding to the number of NF profiles returned is incremented by one. For example, if 2 profiles are returned, bucket 2 is incremented by one. Responses returning more than 10 profiles fall in the +Inf bucket. |
| DBOperation | Create, update, delete, and find |
| TableName | OCNRF Table Name |
| SubscriptionStatus | Status of subscription shall be 'SUBSCRIBED', 'SUSPENDED' or 'UNSUBSCRIBED' |
| DbReplicationStatus | "ACTIVE" or "INACTIVE" |
| RemoteNrfInstanceId | Remote OCNRF Instance Id |
| HeartbeatTimer | The heartbeatTimer of the NfProfile. The value is considered in seconds. |
| TLSFqdn | FQDN received in TLS Certificate |
| NfFqdn | FQDN of the consumer NF. This dimension is available only if the service mesh sends the consumer NF FQDN in the XFCC header; otherwise this value is UNKNOWN in the metrics |
| ServiceOperation | Service operations as defined in 3gpp specification for NRF |
| Scope | Scope as received in the AccessToken Request |
| ResponseReason | Response Reason in Response sent back to NF |
| SubscriptionId | Subscription Id generated by OCNRF for NFStatusSubscribe Service Operation |
| NFType | Used in Gateway metrics. NF type extracted from the URI. The path is /nxxx-yyy/vz/......., where xxx converted to upper case is the NFType. UNKNOWN if the NFType cannot be extracted from the path. Example: nnrf-nfm/v1/nf-instances |
| NFServiceType | Used in Gateway metrics. NF service type extracted from the URI. The path is /nxxx-yyy/vz/......., where nxxx-yyy is the NFServiceType. UNKNOWN if the NFServiceType cannot be extracted from the path. Example: nnrf-nfm/v1/nf-instances |
| Host | Used in Gateway metrics. (IP or FQDN):port of the gateway |
| HttpVersion | HTTP protocol version: HTTP/1.1, HTTP/2.0 |
| Scheme | HTTP protocol scheme: HTTP, HTTPS, UNKNOWN |
| ClientCertIdentity | Used in Gateway metrics. Certificate identity of the client, for example SAN=127.0.0.1,localhost CN=localhost. N/A if data is not available |
| Route_Path | Used in Gateway metrics. Path predicate or header predicate that matched the current request |
| InstanceIdentifier | Used in Gateway metrics. Prefix of the pod configured in Helm when there are multiple instances in the same deployment. The value is the prefix configured in Helm, otherwise UNKNOWN |
| ErrorOriginator | Used in Gateway metrics. Originator of the error. Possible values: ServiceProducer, Nrf, IngressGW, None |
| Direction | Used in Gateway metrics. Direction of the request or response. Possible values: egress, egressOut |
| error_reason | Reason for the failure response received. If a message is sent in the response, it is filled with the message; otherwise it is filled with the exception class. For a successful response it is filled with "no_error". Examples: error_reason="no_error" (when a successful response is received), error_reason="java.nio.channels.ClosedChannelException", error_reason="unable to find valid certification path to requested target" |
| KeyId | Key Id from Access Token Configuration used to sign the Access Token |
| KeyType | Key type of Access Token Configuration (private key or certificate) |
| isCurrentKeyId | True if the metric is for the current key ID in the Access Token Configuration, otherwise False. |
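Two of the dimension rules in the legend above are mechanical enough to sketch in code: how NFType and NFServiceType are derived from a path of the form /nxxx-yyy/vz/..., and how a profile count maps to a BucketSize value. The following Python is only an illustration under stated assumptions. The function names and the regular expression are hypothetical and are not taken from the OCNRF implementation.

```python
import re
from typing import Tuple

# Assumed shape of the path: optional leading slash, then "n<xxx>-<yyy>/v<z>/...".
# The real gateway parsing rules are not documented here; this regex is a guess.
_PATH_RE = re.compile(r"^/?(n([a-z0-9]+)-[a-z0-9-]+)/v\d+(?:/|$)")


def nf_dimensions(path: str) -> Tuple[str, str]:
    """Return (NFType, NFServiceType) for a path, or UNKNOWN for both."""
    match = _PATH_RE.match(path)
    if match is None:
        return ("UNKNOWN", "UNKNOWN")
    service_type = match.group(1)   # for example "nnrf-nfm"
    nf_letters = match.group(2)     # for example "nrf"
    return (nf_letters.upper(), service_type)


def bucket_label(profile_count: int) -> str:
    """Map the number of profiles in a discovery response to a BucketSize value."""
    if profile_count < 0:
        raise ValueError("profile_count must not be negative")
    # Buckets 0..10 are exact counts; anything larger falls into +Inf.
    return str(profile_count) if profile_count <= 10 else "+Inf"
```

With the example path from the legend, `nf_dimensions("nnrf-nfm/v1/nf-instances")` yields the NFType `NRF` and the NFServiceType `nnrf-nfm`, while a path that does not follow the pattern yields UNKNOWN for both.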
OCNRF Gateways Metrics
Table 10-2 OCNRF Gateways Metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Dimensions |
|---|---|---|---|---|
| Total number of ingress requests | Total number of requests received at OCNRF | oc_ingressgateway_http_requests_total | Counter | Method, NFType, NFServiceType, Host, HttpVersion, Scheme, Route_path, InstanceIdentifier, ClientCertIdentity |
| NF Register Success | Total number of successful NFRegister service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"} | Counter | Method, NFType, NFServiceType, Host, HttpVersion, Scheme, Route_path, InstanceIdentifier, ClientCertIdentity |
| NF Update Success (Complete Replacement) | Total number of successful NFUpdate service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| NF Update Success (Partial Replacement) | Total number of successful NFUpdate service operations at OCNRF | oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PATCH"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| NF List/Profile Retrieval Success | Total number of successful NF List/Profile retrieval service operations at OCNRF | oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="GET"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| Access Token Success | Total number of successful Access Token service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*/oauth2/token.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| NF De-register Success | Total number of successful NFDeregister service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| NF Subscribe Success | Total number of successful NFStatusSubscribe service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| NF Unsubscribe Success | Total number of successful NFStatusUnSubscribe service operations at OCNRF | oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| NF Discover Success | Total number of successful NFDiscover service operations at OCNRF | oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| 4xx Responses (NF-Instances) | Total number of 4xx responses (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval) | oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| 4xx Responses (Subscriptions) | Total number of 4xx responses (NFStatusSubscribe/NFStatusUnSubscribe) | oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| 4xx Responses (Discovery) | Total number of 4xx responses (NfDiscover) | oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| 4xx Responses (AccessToken) | Total number of 4xx responses (NfAccessToken) | oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*oauth2/token.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| 5xx Responses (NF-Instances) | Total number of 5xx responses (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval) | oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| 5xx Responses (Subscriptions) | Total number of 5xx responses (NFStatusSubscribe/NFStatusUnSubscribe) | oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| 5xx Responses (Discovery) | Total number of 5xx responses (NfDiscover) | oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| 5xx Responses (AccessToken) | Total number of 5xx responses (NfAccessToken) | oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*oauth2/token.*"} | Counter | Status, Method, Route_path, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, ClientCertIdentity |
| Avg NRF Latency | Time (in microseconds) to process an ingress request, measured from when the request is received to when the response is sent | oc_ingressgateway_request_latency_seconds | Timer | quantile, InstanceIdentifier |
| Connection Failures Ingress Gateway | Connection failures that occur when connecting to the destination service fails. For the Ingress Gateway, the destination service is a backend microservice of the NF. Also captures TLS connection failures when connecting to the Ingress Gateway. | oc_ingressgateway_connection_failure_total | Counter | Host, Port, InstanceIdentifier, error_reason |
| Ingress Gateway Request Processing Latency | Time taken to process the request within the Ingress Gateway only. | oc_ingressgateway_request_processing_latency_seconds | Timer | quantile, InstanceIdentifier |
| Total number of Egress requests | Number of requests. Pegged with Direction as egress when a request reaches the Egress Gateway from the application microservice, and with Direction as egressOut when the request leaves the Egress Gateway. | oc_egressgateway_http_requests_total | Counter | Method, NFType, NFServiceType, Host, HttpVersion, Scheme, Proxy, InstanceIdentifier, Direction |
| Total number of Egress responses | Number of responses. Pegged with Direction as egress when the Egress Gateway sends a response back to the backend NF microservice, and with Direction as egressOut when a response is received at the Egress Gateway. | oc_egressgateway_http_responses_total | Counter | Status, Method, NFType, NFServiceType, Host, HttpVersion, Scheme, InstanceIdentifier, Direction |
| Connection Failures Egress Gateway | Connection failures between the backend microservice and the destination service | oc_egressgateway_connection_failure_total | Counter | Host, Port, InstanceIdentifier, Direction, error_reason |
| Egress Gateway Request Processing Latency | Time taken to process the request within the Egress Gateway only. | oc_egressgateway_request_processing_latency_seconds | Timer | quantile, InstanceIdentifier |
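The metric filters in the table above rely on Prometheus regular-expression matchers (`=~`), which are fully anchored: the pattern must match the entire label value. The following Python sketch emulates that behaviour with `re.fullmatch` so the filters can be checked against sample label values. It is only an approximation (Prometheus uses RE2 syntax, which agrees with Python's `re` for simple patterns like these), and the helper names and sample label values are hypothetical.

```python
import re
from typing import Dict


def label_matches(pattern: str, value: str) -> bool:
    """Emulate a Prometheus `=~` matcher: the pattern must match the whole value."""
    return re.fullmatch(pattern, value) is not None


def selects(sample_labels: Dict[str, str], matchers: Dict[str, str]) -> bool:
    """Return True if every matcher pattern matches the corresponding label.

    A missing label is treated as an empty string. For simplicity every
    matcher is treated as a regular expression, including exact ones.
    """
    return all(
        label_matches(pattern, sample_labels.get(label, ""))
        for label, pattern in matchers.items()
    )


# Matchers taken from the "4xx Responses (NF-Instances)" row.
FOUR_XX_NF_INSTANCES = {
    "Status": "4.*",
    "Route_path": ".*nnrf-nfm/v1/nf-instances.*",
}
```

Because of the anchoring, `"4.*"` selects a Status such as `404 NOT_FOUND` but not `204 NO_CONTENT`. By the same rule, a looser pattern such as `".*2.*"` also selects values like `502 BAD_GATEWAY`, so a status-class filter is safer when written as `"2.*"`.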
OCNRF NF Metrics
Table 10-3 OCNRF NF Metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Recommended legend to see dimension level data (as applicable) | Dimensions |
|---|---|---|---|---|---|
| NfRegistrations Total | Number of Registration Requests received | ocnrf_nfRegister_rx_requests_total | Counter | NfRegistrations Total | NrfLevel, NfInstanceId, RequesterNfType, NfFqdn |
| NfRegistrations Responses Total | Number of Registration Responses sent. | ocnrf_nfRegister_tx_responses_total | Counter | NfRegistrations Responses Total | NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode, NfFqdn |
| NfRegistrations Per Service Total | Number of Registrations received and processed successfully per Service. | ocnrf_nfRegister_rx_requests_success_perService_total | Counter | NfRegistrations Per Service [ serviceName :- {{ serviceName }}, nfInstanceId :- {{NfInstanceId}} ] | NrfLevel, NfInstanceId, ServiceName, ServiceInstanceId, NfFqdn |
| NFUpdates Total | Number of Update Requests received. | ocnrf_nfUpdate_rx_requests_total | Counter | NfUpdates Total | NrfLevel, NfInstanceId, RequesterNfType, UpdateType (Partial/Complete), HttpStatusCode, NfFqdn |
| NFUpdates Responses Total | Number of Update Responses sent. | ocnrf_nfUpdate_tx_responses_total | Counter | NfUpdates Responses Total | NrfLevel, NfInstanceId, RequesterNfType, UpdateType (Partial/Complete), HttpStatusCode, NfFqdn |
| NFUpdates Per Service Total | Number of NfUpdates received and processed successfully per Service. | ocnrf_nfUpdate_rx_requests_success_perService_total | Counter | NFUpdates Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ] | NrfLevel, UpdateType (Partial/Complete), NfInstanceId, ServiceName, ServiceInstanceId, NfFqdn |
| Heartbeat Requests Total | Number of Heartbeat Requests received | ocnrf_nfHeartbeat_rx_requests_total | Counter | | NrfLevel, NfInstanceId, RequesterNfType, NfFqdn |
| Heartbeat Responses Total | Number of Heartbeat Responses sent | ocnrf_nfHeartbeat_tx_responses_total | Counter | | NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode, NfFqdn |
| NF De-Registration Requests Total | Number of De-registration requests received | ocnrf_nfDeregister_rx_requests_total | Counter | | NrfLevel, NfInstanceId, RequesterNfType, NfFqdn |
| NF De-Registration Responses Total | Number of De-registration responses sent | ocnrf_nfDeregister_tx_responses_total | Counter | | NrfLevel, NfInstanceId, RequesterNfType, HttpStatusCode, NfFqdn |
| NF De-Registrations Per Service Total | Number of De-registration requests received and processed successfully per Service | ocnrf_nfDeregister_rx_requests_success_perService_total | Counter | NFDeregistration Per Service [ serviceName :- {{ serviceName }}, serviceInstanceId:- {{ServiceInstanceId}} ] | NrfLevel, ServiceName, ServiceInstanceId, NfInstanceId, NfFqdn |
| NF List Retrieval Requests Total | Number of NFListRetrieval requests received | ocnrf_nfListRetrieval_rx_requests_total | Counter | | NrfLevel, RequesterNfType, NfFqdn |
| NF List Retrieval Responses Total | Number of NFListRetrieval responses sent | ocnrf_nfListRetrieval_tx_responses_total | Counter | | NrfLevel, RequesterNfType, HttpStatusCode, NfFqdn |
| NF Profile Retrieval Requests Total | Number of NFProfileRetrieval requests received | ocnrf_nfProfileRetrieval_rx_requests_total | Counter | | NrfLevel, NfInstanceId, NfFqdn |
| NF Profile Retrieval Responses Total | Number of NFProfileRetrieval responses sent | ocnrf_nfProfileRetrieval_tx_responses_total | Counter | | NrfLevel, NfInstanceId, HttpStatusCode, NfFqdn |
| Number of Heartbeats missed | Number of heartbeats missed. | ocnrf_heartbeat_missed_total | Counter | | NrfLevel, NfType, NfInstanceId, NfFqdn |
| NF Status Subscribe Requests Total | Number of NfStatusSubscribe requests received | ocnrf_nfStatusSubscribe_rx_requests_total | Counter | | NrfLevel, RequesterNfType, OperationType, NfFqdn |
| NF Status Subscribe Responses Total | Number of NfStatusSubscribe responses sent | ocnrf_nfStatusSubscribe_tx_responses_total | Counter | | NrfLevel, RequesterNfType, HttpStatusCode, OperationType, NfFqdn |
| NF Status UnSubscribe Requests Total | Number of NfStatusUnsubscribe requests received | ocnrf_nfStatusUnsubscribe_rx_requests_total | Counter | | NrfLevel, RequesterNfType, NfFqdn |
| NF Status UnSubscribe Responses Total | Number of NfStatusUnsubscribe responses sent | ocnrf_nfStatusUnsubscribe_tx_responses_total | Counter | | NrfLevel, RequesterNfType, HttpStatusCode, NfFqdn |
| NF Status Notifications Requests Sent | Number of NfStatusNotify requests sent | ocnrf_nfStatusNotify_tx_requests_total | Counter | | NrfLevel, NotificationEventType, TargetNfType, NfFqdn, SubscriptionId |
| NF Status Notifications Responses Received | Number of NfStatusNotify responses received | ocnrf_nfStatusNotify_rx_responses_total | Counter | | NrfLevel, NotificationEventType, TargetNfType, HttpStatusCode, NfFqdn, SubscriptionId |
| NF Status Notifications Requests Failed | Number of NfStatusNotify requests that failed to be sent | ocnrf_nfStatusNotify_requests_failed_total | Counter | | NrfLevel, NotificationEventType, TargetNfType, NfFqdn, SubscriptionId |
| NfDiscover Requests Total | Number of NfDiscover Requests received | ocnrf_nfDiscover_rx_requests_total | Counter | NfDiscover Req [ TargetNf :- {{ TargetNfType }}, RequesterNfType :- {{RequesterNfType}} ] | NrfLevel, TargetNfType, RequesterNfType, NfFqdn |
| NfDiscover Responses Total | Number of NfDiscover responses sent | ocnrf_nfDiscover_tx_responses_total | Counter | | NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode, NfFqdn |
| NFDiscover Per Service Total | Number of NfDiscover requests received and processed successfully per Service | ocnrf_nfDiscover_rx_requests_success_perService_total | Counter | NFDiscover Per Service [ serviceName :- {{ serviceName }} ] | NrfLevel, RequesterNfType, ServiceName, NfFqdn |
| Discovered profiles | Number of profiles returned in the discovery response. The BucketSize dimension and its corresponding value indicate how many profiles were returned in a discovery response. | ocnrf_nfDiscover_profiles_discovered_total | Counter | Discovered profiles [ TargetNfType :- {{TargetNfType}}, Bucket :- {{ Bucket }} ] | NrfLevel, TargetNfType, BucketSize, NfFqdn |
| Active Registrations | Number of active registered NFs at any point of time | ocnrf_active_registrations_count | Gauge | Active Registrations [ NfType-{{ NfType }}, NrfLevel-{{ NrfLevel }} ] | NfType, NrfLevel |
| Avg NRF Latency taken by NRF specific microservice | Time taken by NRF specific microservice to process the service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NFStatusSubscribe/NFStatusUnSubscribe/NfAccessToken) Note: Latency calculated by this metric doesn't include time taken by OCNRF API gateway. | ocnrf_message_processing_time_seconds | Timer | Avg NRF Latency {{ ServiceOperation }} {{ RequesterNfType }} | NrfLevel, RequesterNfType, ServiceOperation |
| OCNRF database operations | Database operation count corresponding to every service operation | ocnrf_dbmetric_total | Counter | | Method, DBOperation, NrfLevel, HttpStatusCode |
| Database operation round trip time | Time (in microseconds) taken by the database operation corresponding to every service operation (NfRegister/NfUpdate/NfDelete/NfProfileRetrieval/NfListRetrieval/NfHeartbeat/NfDiscover/NFStatusSubscribe/NFStatusUnSubscribe/NfAccessToken) | ocnrf_dbmetrics_round_trip_time_seconds | Timer | | Method, DBOperation, ServiceOperation, TableName (NRF table names), NrfLevel, HttpStatusCode |
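The request and response counters above are typically combined into ratio-style KPIs. As a rough illustration, the following Python computes a success ratio from per-status increases of a response counter such as ocnrf_nfRegister_tx_responses_total over some time window. The function name and sample numbers are invented for the example, and it is an assumption that HttpStatusCode label values begin with the numeric status code.

```python
from typing import Dict, Optional


def success_ratio(increase_by_status: Dict[str, float]) -> Optional[float]:
    """Return the share of 2xx responses among all responses in a window.

    `increase_by_status` maps an HttpStatusCode label value to the increase of
    the counter for that value over the window. Returns None when no responses
    were sent, because the ratio is undefined in that case.
    """
    total = sum(increase_by_status.values())
    if total == 0:
        return None
    # Assumption: the label value starts with the numeric code, e.g. "201".
    successful = sum(
        count
        for status, count in increase_by_status.items()
        if status.strip().startswith("2")
    )
    return successful / total


# Hypothetical window: 90 successful registrations, 6 client errors, 4 server errors.
example_window = {"201": 90, "400": 6, "500": 4}
```

Counters are cumulative, so the inputs should be increases over a window (for example, the difference between two scrapes) rather than raw counter values.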
NF Screening Metrics
Table 10-4 NF Screening metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Service Operation | Dimensions |
|---|---|---|---|---|---|
| Total NF Requests for which Screening Failed | The total number of requests for which screening failed against NF FQDN screening list. | ocnrf_nfScreening_nfFqdn_requestFailed_total | Counter | NFRegister, NFUpdate | NRF level, NF type, NfFqdn |
| Total NF Requests Rejected due to Screening Failed | The total number of requests rejected because screening failed against NF FQDN screening list. | ocnrf_nfScreening_nfFqdn_requestRejected_total | Counter | NFRegister, NFUpdate | NRF level, NF type, NfFqdn |
| Total NF Requests for which Screening Failed | The total number of requests for which screening failed against NF IP endpoint screening list. | ocnrf_nfScreening_nfIpEndPoint_requestFailed_total | Counter | NFRegister, NFUpdate | NRF level, NF type, NfFqdn |
| Total NF Requests Rejected due to Screening Failed | The total number of requests rejected because screening failed against NF IP endpoint screening list. | ocnrf_nfScreening_nfIpEndPoint_requestRejected_total | Counter | NFRegister, NFUpdate | NRF level, NF type, NfFqdn |
| Total NF Requests for which Screening Failed | The total number of requests for which screening failed against Callback URI screening list. | ocnrf_nfScreening_callbackUri_requestFailed_total | Counter | NFRegister, NFUpdate, NFStatusSubscribe | NRF level, NF type, NfFqdn |
| Total NF Requests Rejected due to Screening Failed | The total number of requests rejected because screening failed against Callback URI screening list. | ocnrf_nfScreening_callbackUri_requestRejected_total | Counter | NFRegister, NFUpdate, NFStatusSubscribe | NRF level, NF type, NfFqdn |
| Total NF Requests for which Screening Failed | The total number of requests for which screening failed against PLMN id screening list. | ocnrf_nfScreening_plmnId_requestFailed_total | Counter | NFRegister, NFUpdate | NRF level, NF type, NfFqdn |
| Total NF Requests Rejected due to Screening Failed | The total number of requests rejected because screening failed against PLMN id screening list. | ocnrf_nfScreening_plmnId_requestRejected_total | Counter | NFRegister, NFUpdate | NRF level, NF type, NfFqdn |
| Total NF Requests for which Screening Failed | The total number of NFRegister requests for which screening failed against NF type screening list. | ocnrf_nfScreening_nfTypeRegister_requestFailed_total | Counter | NFRegister | NRF level, NF type, NfFqdn |
| Total NF Requests Rejected due to Screening Failed | The total number of NFRegister requests rejected as NF type was not allowed to register with NRF. | ocnrf_nfScreening_nfTypeRegister_requestRejected_total | Counter | NFRegister | NRF level, NF type, NfFqdn |
| NF Screening not applied Internal Error | The total number of times screening not applied due to internal error. | ocnrf_nfScreening_notApplied_InternalError_total | Counter | NFRegister, NFUpdate, NFStatusSubscribe | NRF level, NF type, NfFqdn |
Note:
In the above "NF Screening metrics" table, the dimension NF Type is a requester NF Type.
NF Access token Metrics
Table 10-5 NF Access token metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Service Operation | Dimensions |
|---|---|---|---|---|---|
| NF Access Token Request Received Total | The total number of access token requests received | ocnrf_accessToken_rx_requests_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, NfFqdn |
| NF Access Token Responses Sent Total | The total number of access token responses sent | ocnrf_accessToken_tx_responses_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType |
| NF Access Token Request Rejected (ClientNotAuthorized) | Number of access token requests for which client authorization failed. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = ClientNotAuthorized |
| NF Access Token Request Rejected (ProducerWithRequestedScopeNotFound) | Number of access tokens not granted because no producer instance is registered for the services in the scope. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = ProducerWithRequestedScopeNotFound |
| NF Access Token Request Rejected (ProducerWithRequestedNfInstanceIdNotFound) | Number of access tokens not granted because no producer instance is registered for the target Instance Id provided in the request. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = ProducerWithRequestedNfInstanceIdNotFound |
| NF Access Token Request Rejected (InconsistentScope) | Number of access tokens not granted because services in the scope belong to different NF types. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = InconsistentScope |
| NF Access Token Request Rejected (ConsumerNFTypeMismatch) | Number of access tokens not granted because the consumer NF type in the profile does not match the access token request. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = ConsumerNFTypeMismatch |
| NF Access Token Request Rejected (ProducerNFTypeMismatch) | Number of access tokens not granted because the producer NF type in the profile does not match the access token request. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = ProducerNFTypeMismatch |
| NF Access Token Request Rejected (InternalError) | Number of access tokens not granted because of a failure at NRF due to an internal error. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, RejectionReason, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = InternalError |
| NF Access Token Request Rejected (ConsumerNfTypeNotAllowed) | Number of access tokens not granted because the consumer NFType is not allowed to access the requested NF. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = ConsumerNfTypeNotAllowed |
| NF Access Token Request Rejected (ConsumerPlmnNotAllowed) | Number of access tokens not granted because the consumer NF PLMN is not allowed to access the requested NF. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = ConsumerPlmnNotAllowed |
| NF Access Token Request Rejected (SecretNotAccessible) | Number of access tokens not granted because the secret for the current key id is not accessible. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = SecretNotAccessible |
| NF Access Token Request Rejected (InvalidFileData) | Number of access tokens not granted because the current key id file data is invalid. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = InvalidFileData |
| NF Access Token Request Rejected (NamespaceNotAccessible) | Number of access tokens not granted because the namespace for the current key id is not accessible. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = NamespaceNotAccessible |
| NF Access Token Request Rejected (FileNotFound) | Number of access tokens not granted because the file is not found in the secrets. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = FileNotFound |
| NF Access Token Request Rejected (CurrentKeyIdNotConfigured) | Number of access tokens not granted because the current key id is not configured. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = CurrentKeyIdNotConfigured |
| NF Access Token Request Rejected (ExpiredCertificate) | Number of access tokens not granted because the OCNRF certificate is expired. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = ExpiredCertificate |
| NF Access Token Request Rejected (BadRequest) | Number of access tokens not granted because the request is incorrect. | ocnrf_accessToken_tx_rejected_total | Counter | AccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, Scope, NrfLevel, HttpStatusCode, NfFqdn, KeyId, KeyType. RejectionReason = BadRequest |
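All rejection rows above share one metric, ocnrf_accessToken_tx_rejected_total, and differ only in the RejectionReason dimension. The following Python sketch shows one way to generate a Prometheus selector per reason, for example to drive one dashboard panel per rejection cause. The helper name is hypothetical, the reason list is copied from the row titles in the table, and label values are not escaped, which is acceptable only for simple values like these.

```python
from typing import List

METRIC = "ocnrf_accessToken_tx_rejected_total"

# Rejection reasons as they appear in the row titles of the table above.
REJECTION_REASONS: List[str] = [
    "ClientNotAuthorized",
    "ProducerWithRequestedScopeNotFound",
    "ProducerWithRequestedNfInstanceIdNotFound",
    "InconsistentScope",
    "ConsumerNFTypeMismatch",
    "ProducerNFTypeMismatch",
    "InternalError",
    "ConsumerNfTypeNotAllowed",
    "ConsumerPlmnNotAllowed",
    "SecretNotAccessible",
    "InvalidFileData",
    "NamespaceNotAccessible",
    "FileNotFound",
    "CurrentKeyIdNotConfigured",
    "ExpiredCertificate",
    "BadRequest",
]


def rejected_selector(reason: str, **extra_labels: str) -> str:
    """Build a selector for one RejectionReason plus optional exact-match labels."""
    labels = {"RejectionReason": reason, **extra_labels}
    body = ",".join(f'{name}="{value}"' for name, value in labels.items())
    return f"{METRIC}{{{body}}}"


# One selector per rejection reason.
ALL_SELECTORS = [rejected_selector(reason) for reason in REJECTION_REASONS]
```

Extra dimensions can be passed as keyword arguments, for example `rejected_selector("BadRequest", NrfLevel="nrf-1")`, where the NrfLevel value is a made-up deployment name.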
NRF Configuration Metrics
Table 10-6 NRF Configuration Metrics
| Metric Name | Metric Details | Metric Filter | Metric Type | Service Operation | Dimensions |
|---|---|---|---|---|---|
| OCNRF Oauth Token Signing Keys Health Status | Oauth Token Signing keys health status | ocnrf_oauth_keyData_healthStatus (Value 0 - Healthy, Value 1 - Unhealthy) | Gauge | Configuration | KeyId, KeyType, isCurrentKeyId, NrfLevel |
| OCNRF Oauth Current KeyId Configuration Status | Oauth Current Key Id Configuration Status | ocnrf_oauth_currentKeyId_configuredStatus (Value 0 - Healthy, Value 1 - Unhealthy) | Gauge | Configuration | NrfLevel |
| OCNRF Oauth Token Signing Keys Expiry Status | Oauth Token Signing keys Expiry Status | ocnrf_oauth_keyData_expiryStatus (Value is expiry time in epoch time) | Gauge | Configuration | KeyId, isCurrentKeyId, NrfLevel |
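The expiry gauge above reports an epoch timestamp rather than a remaining duration, so a consumer has to convert it before it is useful for alerting. The following is a minimal sketch of that conversion. It assumes the value is in epoch seconds (the table says only "epoch time"), and the function names `days_until_expiry` and `expiry_as_utc` are hypothetical helpers, not part of OCNRF.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def days_until_expiry(expiry_epoch_seconds: float, now_epoch_seconds: float) -> float:
    """Days left before the signing key expires; negative if already expired.

    Assumption: ocnrf_oauth_keyData_expiryStatus is expressed in epoch seconds.
    """
    return (expiry_epoch_seconds - now_epoch_seconds) / SECONDS_PER_DAY


def expiry_as_utc(expiry_epoch_seconds: float) -> str:
    """Render the gauge value as an ISO 8601 UTC timestamp for display."""
    return datetime.fromtimestamp(expiry_epoch_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    now = 1_700_000_000  # illustrative "current time", not a real sample
    sample_gauge_value = now + 30 * SECONDS_PER_DAY
    print(expiry_as_utc(sample_gauge_value), days_until_expiry(sample_gauge_value, now))
```

If the deployment exposes the value in milliseconds instead, divide it by 1000 before calling these helpers.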
NRF-SLF Metrics
Table 10-7 NRF-SLF metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Service Operation | Dimensions |
|---|---|---|---|---|---|
| Discover Request Received For SLF Total | The total number of NF Discover request received for SLF | ocnrf_nfDiscover_ForSLF_rx_requests_total | Counter | NFDiscover | TargetNfType, NRFLevel, NfFqdn |
| Discover Response Sent For SLF Total | The total number of NF Discover responses sent for SLF | ocnrf_nfDiscover_ForSLF_tx_responses_total | Counter | NFDiscover | TargetNfType, NRFLevel, HttpStatusCode, ResponseReason, NfFqdn. Possible Response Reasons: SLFCommunicationFailure, MandatoryParamsMissing, SLFSubscriberNotProvisioned, ErrorFromSLF, InternalError, SuccessFromSLF, GroupIdUsedFromSearchQuery |
| SLF Query Requests Sent Total | The total number of SLF query request sent | ocnrf_SLF_tx_requests_total | Counter | NFDiscover | TargetNfType, NRFLevel, SubscriptionIdType, NfFqdn |
| SLF Query Responses Received Total | The total number of SLF query response received | ocnrf_SLF_rx_responses_total | Counter | NFDiscover | TargetNfType, NRFLevel, SubscriptionIdType,HttpStatusCode, GroupId, NfFqdn |
| SLF Round Trip Time Total | Time (in microseconds) between sending a query to the SLF and receiving its response | ocnrf_slf_round_trip_time_seconds | Timer | NFDiscover | TargetNfType, SubscriptionIdType, HttpStatusCode, GroupId, NrfLevel, SLF ApiRoot, NfFqdn |
NRF Forwarding Metrics
Table 10-8 NRF Forwarding Metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Service Operation | Dimensions |
|---|---|---|---|---|---|
| NF Access Token Requests Forwarded Total | The total number of Access Token Request forwarded to Primary/Secondary NRF | ocnrf_forward_accessToken_tx_requests_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, NfFqdn |
| NF Access Token Forwarded Responses Total | The total number of Access Token Responses for request forwarded to Primary/Secondary NRF | ocnrf_forward_accessToken_rx_responses_total | Counter | AccessToken | TargetNfType, ClientNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, NrfLevel, HttpStatusCode, RejectionReason, NfFqdn. RejectionReason is NotApplicable for 2xx status codes. |
| NF Profile Retrieval Requests Forwarded Total | The total number of Profile Retrieval Request forwarded to Primary/Secondary NRF | ocnrf_forward_nfProfileRetrieval_tx_requests_total | Counter | NFProfileRetrieval | NrfLevel, NfInstanceId, NfFqdn |
| NF Profile Retrieval Forwarded Responses Total | The total number of Profile Retrieval Responses for Request forwarded to Primary/Secondary NRF | ocnrf_forward_nfProfileRetrieval_rx_responses_total | Counter | NFProfileRetrieval | NrfLevel, NfInstanceId, HttpStatusCode, RejectionReason, NfFqdn. RejectionReason is NotApplicable for 2xx status codes. |
| NF Status Subscribe Forwarded Requests Total | The total number of Status Subscribe Request forwarded to Primary/Secondary NRF | ocnrf_forward_nfStatusSubscribe_tx_requests_total | Counter | NFStatusSubscribe, NFStatusUnsubscribe | NrfLevel, RequesterNfType, OperationType, NfFqdn |
| NF Status Subscribe Forwarded Responses Total | The total number of Responses for Status Subscribe Request forwarded to Primary/Secondary NRF | ocnrf_forward_nfStatusSubscribe_rx_responses_total | Counter | NFStatusSubscribe, NFStatusUnsubscribe | NrfLevel, RequesterNfType, HttpStatusCode, OperationType, RejectionReason, NfFqdn. RejectionReason is NotApplicable for 2xx status codes. |
| NF Discovery Forwarded Requests Total | The total number of NF Discovery Request forwarded to Primary/Secondary NRF | ocnrf_forward_nfDiscover_tx_requests_total | Counter | NFDiscover | NrfLevel, TargetNfType, RequesterNfType, NfFqdn |
| NF Discovery Forwarded Responses Total | The total number of Responses for NF Discovery Request forwarded to Primary/Secondary NRF | ocnrf_forward_nfDiscover_rx_responses_total | Counter | NFDiscover | NrfLevel, TargetNfType, RequesterNfType, HttpResponseCode, RejectionReason, NfFqdn. RejectionReason: ErrorFromNrf. RejectionReason is NotApplicable for 2xx status codes. |
| Avg Latency for NRF Message Forwarding | Time taken by NRF specific microservice to forward the message to other Primary/Secondary NRF with the service operation: (NFProfileRetrieval/NFDiscover/NFStatusSubscribe/NfStatusUnsubscribe/AccessToken) | ocnrf_forward_round_trip_time_seconds | Timer | NFStatusSubscribe, NFStatusUnsubscribe, NFProfileRetrieval, NFDiscover, AccessToken | NrfLevel, RequesterNfType, ServiceOperation, NfFqdn |
GeoRedundancy metrics
Table 10-9 GeoRedundancy metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Service Operation | Dimensions |
|---|---|---|---|---|---|
| DB Replication status | The current replication status of the DBTier service. This metric is pegged only if the GeoRedundancy Feature is enabled. | ocnrf_dbreplication_status | Gauge | NA | NrfLevel, DbReplicationStatus |
| DB Replication down Time | Time taken for the replication status to change from "INACTIVE" to "ACTIVE". This metric is pegged only if the GeoRedundancy Feature is enabled. | ocnrf_dbreplication_down_time_seconds | Timer | NA | NrfLevel,DbReplicationDownStartTime,DbReplicationDownEndTime |
| Total NfInstances switched over from mated site | The number of NFInstances that got switched over from the mated site. | ocnrf_nf_switch_over_total | Counter | NfRegister, NfUpdate,NfDeregister, NfHeartbeat | NrfLevel, NfInstanceId,RemoteNrfInstanceId,ServiceOperation,OperationType, NfFqdn |
| Total NfSubscriptions switched over from mated site | The number of NfSubscriptions that got switched over from the mated site. | ocnrf_nfSubscriptions_switch_over_total | Counter | NfStatusSubscribe,NfStatusUnsubscribe, NrfAuditor | NrfLevel,SubscriptionId,RemoteNrfInstanceId,ServiceOperation,OperationType |
| Total Nfinstances removed by OCNRF as it is stale | The number of NfInstances that get deleted by the NrfAuditor when it detects a record to be stale. | ocnrf_stale_nf_deleted_total | Counter | NA | NrfLevel, NfInstanceId, NfStatus, NfFqdn |
| Total NfSubscriptions removed by OCNRF as it is stale | The number of NfSubscriptions that get deleted by the NrfAuditor when it detects a record to be stale. | ocnrf_stale_nfSubscriptions_deleted_total | Counter | NA | NrfLevel,NfSubscriptionId,SubscriptionStatus |
| Total NfInstances that have been marked as SUSPENDED by the OCNRF Auditor | The number of profiles that have been marked as SUSPENDED when a profile has missed nfHeartBeatMissAllowed. | ocnrf_nf_suspended_total | Counter | NA | NrfLevel, NfInstanceId,NfStatus, HeartbeatTimer, NfFqdn |
| Total NfSubscriptions whose validityTime has expired | The number of NfSubscriptions whose validityTime has expired | ocnrf_nfSubscriptions_expired_total | Counter | NrfLevel, SubscriptionId |
NF AccessToken Authorization Metrics
Table 10-10 NF AccessToken Authorization Metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Service Operation | Dimensions |
|---|---|---|---|---|---|
| NF Access Token Request Rejected (AuthScreeningFailed) | Number of access token not granted because the consumer NF is not authorized to access the requested NF or its services. | ocnrf_accessToken_tx_rejected_total | Counter | NfAccessToken | TargetNfType, RequesterNfType, TargetNfInstanceId, ClientNfInstanceId, ServiceName, Scope, NrfLevel, NfFqdn, HttpStatusCode. RejectionReason = ClientNotAuthorized |
NF Authentication Metrics
Table 10-11 NF Authentication Metrics
| Metric Name | Metric Details | Metric filter | Metric Type | Service Operation | Dimensions |
|---|---|---|---|---|---|
| NF Authentication Failure Total | The total number of requests for which FQDN-based authentication failed at OCNRF | ocnrf_nf_authentication_failure_total | Counter | NFAccessToken/NFRegistration/NFSubscription/NFDiscovery/NfListRetrieval/NfProfileRetrieval | NrfLevel, Method, ServiceOperation, NfFqdn, TLSFqdn. For the NfListRetrieval and NfProfileRetrieval service operations, NfFqdn is filled as NotApplicable. If the OC-XFCC-DNS header is not received at the NRF microservice, TLSFqdn is filled as "UNKNOWN". |
OCNRF KPIs
This section includes information about KPIs for Oracle Communications Network Repository Function (OCNRF).
Note:
Sample OCNRF dashboard for Grafana is delivered to the customer through OCNRF Custom Templates. Metrics and functions used to achieve KPI are already covered in OCNRF Custom Templates.
Table 10-12 KPI Details
| KPI Name | KPI Details | Metric used for KPI | Service Operation | Response code |
|---|---|---|---|---|
| OCNRF Ingress Request | Rate of HTTP requests received at OCNRF Ingress Gateway | oc_ingressgateway_http_requests_total | All | Not Applicable |
| NF Register Success | | sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m])) | NFRegister | 201 |
| NF Update Success (Complete Replacement) | | sum(irate(oc_ingressgateway_http_responses_total{Status="200 OK",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="PUT"}[5m])) | NFUpdate | 200 |
| NF DeRegister Success | | sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/nf-instances.*",Method="DELETE"}[5m])) | NFDeregister | 204 |
| NF Subscribe Success | | sum(irate(oc_ingressgateway_http_responses_total{Status="201 CREATED",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="POST"}[5m])) | NFStatusSubscribe | 201 |
| NF Unsubscribe Success | | sum(irate(oc_ingressgateway_http_responses_total{Status="204 NO_CONTENT",Route_path=~".*nnrf-nfm/v1/subscriptions.*",Method="DELETE"}[5m])) | NFStatusUnsubscribe | 204 |
| NF Discover Success | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"2.*",Route_path=~".*nnrf-disc/v1/nf-instances.*",Method="GET"}[5m])) | NFDiscover | 200 |
| 4xx Responses (NF-Instances) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m])) | NFRegister/NFUpdate/NFDeregister | 4xx |
| 4xx Responses (Subscriptions) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m])) | NFStatusSubscribe/NFStatusUnsubscribe | 4xx |
| 4xx Responses (Discovery) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"4.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m])) | NFDiscover | 4xx |
| 5xx Responses (NF-Instances) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/nf-instances.*"}[5m])) | NFRegister/NFUpdate/NFDeregister | 5xx |
| 5xx Responses (Subscriptions) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-nfm/v1/subscriptions.*"}[5m])) | NFStatusSubscribe/NFStatusUnsubscribe | 5xx |
| 5xx Responses (Discovery) | | sum(irate(oc_ingressgateway_http_responses_total{Status=~"5.*",Route_path=~".*nnrf-disc/v1/nf-instances.*"}[5m])) | NFDiscover | 5xx |
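The KPI expressions in the table above all share one shape: a `sum(irate(...[5m]))` over `oc_ingressgateway_http_responses_total`, filtered by `Status`, `Route_path`, and optionally `Method`. The sketch below shows how such expressions could be generated consistently, for example when building a custom dashboard. The helper name `kpi_query` and its parameters are hypothetical and are not part of the OCNRF Custom Templates.

```python
from typing import Optional


def kpi_query(
    status: str,
    route_path: str,
    method: Optional[str] = None,
    window: str = "5m",
    status_is_regex: bool = False,
) -> str:
    """Build a PromQL expression in the same shape as the KPI table rows.

    status_is_regex selects the `=~` matcher (used for "2.*", "4.*", "5.*");
    otherwise an exact `=` match is used (for example "201 CREATED").
    """
    status_op = "=~" if status_is_regex else "="
    matchers = [
        f'Status{status_op}"{status}"',
        f'Route_path=~"{route_path}"',
    ]
    if method is not None:
        matchers.append(f'Method="{method}"')
    selector = ",".join(matchers)
    return (
        "sum(irate(oc_ingressgateway_http_responses_total"
        f"{{{selector}}}[{window}]))"
    )


if __name__ == "__main__":
    # Reproduces the "NF Register Success" expression from the table.
    print(kpi_query("201 CREATED", ".*nnrf-nfm/v1/nf-instances.*", method="PUT"))
    # Reproduces the "4xx Responses (Discovery)" expression.
    print(kpi_query("4.*", ".*nnrf-disc/v1/nf-instances.*", status_is_regex=True))
```

Generating the strings from one function keeps the route patterns and the rate window identical across panels, which makes the KPIs directly comparable.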
OCNRF Alerts
This section includes information about alerts for OCNRF.
Table 10-13 Alert Details
| Alert | Trigger Condition | Severity | Alert details provided | OID | Metric Used | Resolution | Notes |
|---|---|---|---|---|---|---|---|
| System Level Alerts | |||||||
| OcnrfNfStatusUnavailable | All the OCNRF services are unavailable, either because the OCNRF is getting deployed or purged. The OCNRF services considered are nfregistration, nfsubscription, nrfauditor, nrfconfiguration, nfaccesstoken, nfdiscovery, appinfo, ingressgateway, and egressgateway. | Critical |
description: 'OCNRF services unavailable' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All OCNRF services are unavailable.' |
1.3.6.1.4.1.323.5.3.36.1.2.7016 |
'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared automatically when the OCNRF services start
becoming available.
Steps:
|
|
| OcnrfPodsRestart | A pod belonging to any of the OCNRF services has restarted. | Major |
description: 'Pod <Pod Name> has restarted.' summary: 'kubernetes_namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted' |
1.3.6.1.4.1.323.5.3.36.1.2.7017 | 'kube_pod_container_status_restarts_total' Note: This is a kubernetes metric. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared automatically if the specific pod is up. Steps:
|
|
| NnrfNFManagementServiceDown | Either NFRegistration or NFSubscription or NrfAuditor services are unavailable. | Critical |
description: 'OCNRF Nnrf_Management service <nfregistration|nfsubscription|nrfauditor> is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFManagement service is down' |
1.3.6.1.4.1.323.5.3.36.1.2.7018 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when all the Nnrf_NFManagement services (nfregistration, nfsubscription, and nrfauditor) are available.
Steps:
|
|
| NnrfAccessTokenServiceDown | NFAccessToken service is unavailable. | Critical |
description: 'OCNRF Nnrf_NFAccessToken service nfaccesstoken is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccessToken service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7020 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when the Nnrf_AccessToken service is available.
Steps:
|
|
| NnrfNFDiscoveryServiceDown | NFDiscovery is unavailable. | Critical |
description: 'OCNRF Nnrf_NFDiscovery service nfdiscovery is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7019 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the Nnrf_NFDiscovery service is available. Steps:
|
|
| OcnrfRegistrationServiceDown | None of the pods of the NFRegistration microservice is available. | Critical |
description: 'OCNRF NFRegistration service nfregistration is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFRegistration service is down' |
1.3.6.1.4.1.323.5.3.36.1.2.7021 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the nfregistration service is available. Steps:
|
|
| OcnrfSubscriptionServiceDown | None of the pods of the NFSubscription microservice is available. | Critical |
description: 'OCNRF NFSubscription service nfsubscription is down.' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down' |
1.3.6.1.4.1.323.5.3.36.1.2.7022 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when the nfsubscription service is
available.
Steps:
|
|
| OcnrfDiscoveryServiceDown | None of the pods of the NFDiscovery microservice is available. | Critical |
description: 'OCNRF NFDiscovery service nfdiscovery is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7023 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when the nfdiscovery service is available.
Steps:
|
|
| OcnrfAccessTokenServiceDown | None of the pods of the NFAccessToken microservice is available. | Critical |
description: 'OCNRF NFAccessToken service nfaccesstoken is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccesstoken service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7024 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. | The alert is cleared when the nfaccesstoken service is available.
Steps:
|
|
| OcnrfAuditorServiceDown | None of the pods of the NrfAuditor microservice is available. | Critical | description: 'OCNRF NrfAuditor service nrfauditor is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfAuditor service down' | 1.3.6.1.4.1.323.5.3.36.1.2.7026 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the nrfauditor service is available. Steps:
|
|
| OcnrfConfigurationServiceDown | None of the pods of the NrfConfiguration microservice is available. | Critical |
description: 'OCNRF NrfConfiguration service nrfconfiguration is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfConfiguration service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7025 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the nrfconfiguration service is available. Steps:
|
|
| OcnrfAppInfoServiceDown | None of the pods of the App Info microservice is available. | Critical |
description: 'OCNRF Appinfo service appinfo is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Appinfo service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7027 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the app-info service is available. Steps:
|
|
| OcnrfIngressGatewayServiceDown | None of the pods of the Ingress-Gateway microservice is available. | Critical |
description: 'OCNRF Ingress-Gateway service ingressgateway is down.' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7028 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the ingressgateway service is available. Steps:
|
|
| OcnrfEgressGatewayServiceDown | None of the pods of the Egress-Gateway microservice is available. | Critical |
description: 'OCNRF Egress-Gateway service egressgateway is down' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down' |
1.3.6.1.4.1.323.5.3.36.1.2.7029 | 'up' Note: This is a prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
The alert is cleared when the egressgateway service is available. Note: The threshold is configurable in the alerts.yaml Steps:
|
|
| OcnrfMemoryUsageCrossedMinorThreshold | A pod has reached the configured minor threshold( 50%) of its memory resource limits. | Minor |
description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured minor threshold (50 %) (value={{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 50% of its limit.' |
1.3.6.1.4.1.323.5.3.36.1.2.7030 | 'container_memory_usage_bytes''container_spec_memory_limit_bytes' Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system. | The alert gets cleared when the memory utilization falls below
the minor threshold or crosses the major threshold, in which case the OcnrfMemoryUsageCrossedMajorThreshold alert is raised.
Note: The threshold is configurable in the alerts.yaml. If guidance is required, contact My Oracle Support. |
|
| OcnrfMemoryUsageCrossedMajorThreshold | A pod has reached the configured major threshold( 60%) of its memory resource limits. | Major |
description: 'OCNRF Memory Usage for pod <Pod name> has crossed the major threshold(60%) (value = {{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 60% of its limit.' |
1.3.6.1.4.1.323.5.3.36.1.2.7031 |
'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system. |
The alert gets cleared when the memory utilization falls below
the major threshold or crosses the critical threshold, in which case the OcnrfMemoryUsageCrossedCriticalThreshold alert is raised.
Note: The threshold is configurable in the alerts.yaml. If guidance is required, contact My Oracle Support. |
|
| OcnrfMemoryUsageCrossedCriticalThreshold | A pod has reached the configured critical threshold ( 70% ) of its memory resource limits. | Critical |
description: 'OCNRF Memory Usage for pod <Pod name> has crossed the configured critical threshold (70%) (value = {{ $value }}) of its limit.' summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.' |
1.3.6.1.4.1.323.5.3.36.1.2.7032 |
'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: This is a kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system. |
The alert gets cleared when the memory utilization falls below
the critical threshold.
Note: The threshold is configurable in the alerts.yaml. If guidance is required, contact My Oracle Support. |
|
| OcnrfTotalIngressTrafficRateAboveMinorThreshold |
The total OCNRF Ingress Message rate has crossed the configured minor threshold of 800 TPS. Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 80 % of 1000 (Maximum ingress request rate) |
Minor |
description: 'Total Ingress traffic Rate is above configured minor threshold i.e. 800 requests per second (current value is: {{ $value }})' summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' |
1.3.6.1.4.1.323.5.3.36.1.2.7001 | 'oc_ingressgateway_http_requests_total' |
The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the OcnrfTotalIngressTrafficRateAboveMajorThreshold alert shall be raised. Note: The threshold is configurable in the alerts.yaml. Steps: Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable). If this is unexpected, contact My Oracle Support. |
|
| OcnrfTotalIngressTrafficRateAboveMajorThreshold |
The total OCNRF Ingress Message rate has crossed the configured major threshold of 900 TPS. Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 90 % of 1000 (Maximum ingress request rate) |
Major |
description: 'Total Ingress traffic Rate is above major threshold i.e. 900 requests per second (current value is: {{ $value }})' summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' |
1.3.6.1.4.1.323.5.3.36.1.2.7002 | 'oc_ingressgateway_http_requests_total' |
The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the OcnrfTotalIngressTrafficRateAboveCriticalThreshold alert shall be raised. Note: The threshold is configurable in the alerts.yaml. Steps: Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable). If this is unexpected, contact My Oracle Support. |
|
| OcnrfTotalIngressTrafficRateAboveCriticalThreshold |
The total OCNRF Ingress Message rate has crossed the configured critical threshold of 950 TPS. Default value of this alert trigger point in NrfAlertValues.yaml is when OCNRF Ingress Rate crosses 95 % of 1000 (Maximum ingress request rate) |
Critical |
description: 'Total Ingress traffic Rate is above critical threshold i.e. 950 requests per second (current value is: {{ $value }})' summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 95 Percent of Max requests per second(1000)' |
1.3.6.1.4.1.323.5.3.36.1.2.7003 | 'oc_ingressgateway_http_requests_total' |
The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the alerts.yaml. Steps: Reassess why the OCNRF is receiving additional traffic (for example: geo redundancy OCNRF is unavailable). If this is unexpected, contact My Oracle Support. |
|
| OcnrfTransactionErrorRateAbove0.1Percent | The number of failed transactions is above 0.1 percent of the total transactions. | Warning |
description: 'Transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 0.1 Percent of Total Transactions' |
1.3.6.1.4.1.323.5.3.36.1.2.7004 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 0.1 percent of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the OcnrfTransactionErrorRateAbove1Percent alert shall be raised. Steps:
|
|
| OcnrfTransactionErrorRateAbove1Percent | The number of failed transactions is above 1 percent of the total transactions. | Warning | description: 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})' summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 1 Percent of Total Transactions' | 1.3.6.1.4.1.323.5.3.36.1.2.7005 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the OcnrfTransactionErrorRateAbove10Percent alert shall be raised. Steps:
|
|
| OcnrfTransactionErrorRateAbove10Percent | The number of failed transactions has crossed the minor threshold of 10 percent of the total transactions. | Minor |
description: 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 10 Percent of Total Transactions' |
1.3.6.1.4.1.323.5.3.36.1.2.7006 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the OcnrfTransactionErrorRateAbove25Percent alert shall be raised. Steps:
|
|
| OcnrfTransactionErrorRateAbove25Percent | The number of failed transactions has crossed the major threshold of 25 percent of the total transactions. | Major |
description: 'Transaction Error rate is above 25 Percent of Total Transactions (current value is {{ $value }})' summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 25 Percent of Total Transactions' |
1.3.6.1.4.1.323.5.3.36.1.2.7007 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the OcnrfTransactionErrorRateAbove50Percent alert shall be raised. Steps:
|
|
| OcnrfTransactionErrorRateAbove50Percent | The number of failed transactions has crossed the critical threshold of 50 percent of the total transactions. | Critical |
description: 'Transaction Error rate is above 50 Percent of Total Transactions (current value is {{ $value }})' summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 50 Percent of Total Transactions' |
1.3.6.1.4.1.323.5.3.36.1.2.7008 | 'oc_ingressgateway_http_responses_total' |
The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. Steps:
|
|
| OCNRF Application Alerts | |||||||
| OcnrfRegisteredNFsBelowCriticalThreshold |
The number of NFs currently registered with OCNRF is below the critical threshold. Default value of this alert trigger point in NrfAlertValues.yaml is when Registered NFs count with OCNRF is below 2. |
Critical |
description: 'The number of registered NFs detected below critical threshold (current value is: {{ $value }})' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.RequesterNfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below critical threshold.' |
1.3.6.1.4.1.323.5.3.36.1.2.7009 | 'ocnrf_active_registrations_count' |
The alert is cleared when the number of registered NFs is above the critical threshold. Steps: No action required. This is an informational alert. |
|
| OcnrfRegisteredNFsBelowMajorThreshold |
The number of NFs currently registered with OCNRF is below the major threshold. By default, NrfAlertValues.yaml triggers this alert when the count of NFs registered with OCNRF is greater than or equal to 2 and less than 10. |
Major |
description: 'The number of registered NFs detected below major threshold (current value is: {{ $value }})' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below major threshold.' |
1.3.6.1.4.1.323.5.3.36.1.2.7010 | 'ocnrf_active_registrations_count' |
The alert is cleared when the number of registered NFs is above the major threshold. Steps: No action required. This is an informational alert. |
|
| OcnrfRegisteredNFsBelowMinorThreshold |
The number of NFs currently registered with OCNRF is below the minor threshold. By default, NrfAlertValues.yaml triggers this alert when the count of NFs registered with OCNRF is greater than or equal to 10 and less than 20. |
Minor |
description: 'The number of registered NFs detected below minor threshold (current value is: {{ $value }})' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below minor threshold.' |
1.3.6.1.4.1.323.5.3.36.1.2.7011 | 'ocnrf_active_registrations_count' |
The alert is cleared when the number of registered NFs is above the minor threshold. Steps: No action required. This is an informational alert. |
|
| OcnrfRegisteredNFsBelowThreshold |
The number of NFs currently registered with OCNRF is approaching the minor threshold. By default, NrfAlertValues.yaml triggers this alert when the count of NFs registered with OCNRF is greater than or equal to 20 and less than 30. |
Warning |
description: 'The number of registered NFs is approaching minor threshold (current value is: {{ $value }})' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs approaching minor threshold.' |
1.3.6.1.4.1.323.5.3.36.1.2.7012 | 'ocnrf_active_registrations_count' |
The alert is cleared when the number of registered NFs is no longer approaching the minor threshold. Steps: No action required. This is an informational alert. |
|
| OcnrfDbReplicationStatusInactive | The DB tier replication service status is inactive across the georedundant OCNRFs. The alarm is raised or cleared only if the Georedundancy feature is enabled. | Critical |
description: 'The Database Replication Status is currently INACTIVE.' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, dbreplicationstatus: {{$labels.DbReplicationStatus}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The database replication status is INACTIVE.' |
1.3.6.1.4.1.323.5.3.36.1.2.7013 | 'ocnrf_dbreplication_status' | The alert is cleared when the DB tier replication service is active. | The alarm is included only if the Georedundancy feature is enabled. |
| OcnrfAccessTokenRequestsRejected | OCNRF rejected an AccessToken Request |
Warning |
description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF.' |
1.3.6.1.4.1.323.5.3.36.1.2.7014 | 'ocnrf_accessToken_tx_rejected_total' | The alert is cleared automatically.
Steps: The rejection reason is present in the alert. If the RejectionReason is AuthScreeningFailed or ClientNotAuthorized, either re-evaluate the configuration or check the consumer NF that requested the unauthorized token. For any other reason, act according to the RejectionReason. |
|
| OcnrfNfAuthenticationFailureRequestsRejected | OCNRF rejected a service request due to NF authentication failure |
Warning |
description: 'Service request(s) received from NF have been rejected by OCNRF (current value is: {{ $value }})' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Request rejected for Nf FQDN based Authentication failure.' |
1.3.6.1.4.1.323.5.3.36.1.2.7015 | 'ocnrf_nf_authentication_failure_total' | The alert is cleared automatically.
Steps: No action required for OCNRF. This is an informational alert. The response reason is present in the alert. |
|
| OcnrfAccessTokenCurrentKeyIdNotConfigured | OCNRF Access Token Rejected due to CurrentKeyId not configured | Critical |
description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF as Current Key Id is not configured.' |
1.3.6.1.4.1.323.5.3.36.1.2.7033 | ocnrf_accessToken_tx_rejected_total | The alert is cleared automatically, because it is raised only when OCNRF receives an Access Token Request while the Current Key Id is not selected. | |
| OcnrfAccessTokenCurrentKeyIdInvalidDetails | OCNRF rejected an Access Token Request because the token signing details corresponding to CurrentKeyId are invalid | Critical |
description: 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyType: {{$labels.KeyType}}, RejectionReason: {{$labels.RejectionReason}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF as CurrentKeyId details are invalid.' |
1.3.6.1.4.1.323.5.3.36.1.2.7034 | ocnrf_accessToken_tx_rejected_total | The alert is cleared automatically as this will be raised when OCNRF receives Access Token Request and at that point Current Key Id details are invalid. | |
| OcnrfOauthCurrentKeyNotConfigured | Oauth Current Key ID is not configured | Critical |
description: 'OCNRF Oauth Access token Current Key Id is not configured' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token Current Key Id is not configured.' |
1.3.6.1.4.1.323.5.3.36.1.2.7035 | ocnrf_oauth_currentKeyId_configuredStatus |
The alert is cleared when current key id is configured. Steps: Configure valid current key id in Access Token Configuration |
|
| OcnrfOauthCurrentKeyDataHealthStatus | Oauth Current Key ID details health is not good | Critical |
description: 'OCNRF Oauth Access token Current Key Id status is not healthy' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, KeyType: {{$labels.KeyType}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token Current Key Id status is not healthy.' |
1.3.6.1.4.1.323.5.3.36.1.2.7036 | ocnrf_oauth_keyData_healthStatus |
The alert is cleared when current key id status is healthy. Steps: Key Data Health Status details can be checked using OCNRF configuration status REST APIs and configuration microservice logs. Rectify the condition by checking ErrorCondition |
|
| OcnrfOauthNonCurrentKeyDataHealthStatus | Oauth Non Current Key details health is not good | Info |
description: 'OCNRF Oauth Access token Non current Key Id status is not healthy' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, KeyType: {{$labels.KeyType}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id status is not healthy.' |
1.3.6.1.4.1.323.5.3.36.1.2.7037 | ocnrf_oauth_keyData_healthStatus |
The alert is cleared when the non-current key id status is healthy. Steps: Key Data Health Status details can be checked using OCNRF configuration status REST APIs and configuration microservice logs. Rectify the condition by checking ErrorCondition |
|
| OcnrfOauthCurrentCertificateExpiringIn24Hours | Oauth Current Key ID details are expiring in less than 24 hours | Critical |
description: 'OCNRF Oauth Access token current Key Id certificate is expiring in less than 24 hours' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token current Key Id certificate is expiring in less than 24 hours.' |
1.3.6.1.4.1.323.5.3.36.1.2.7038 | ocnrf_oauth_keyData_expiryStatus |
The alert is cleared when key expiry time is more than 24 hours. Steps: Replace expiring certificate key pair with new ones |
|
| OcnrfOauthNonCurrentCertificateExpiringIn24Hours | Oauth Non Current Key ID details are expiring in less than 24 hours | Info |
description: 'OCNRF Oauth Access token non current Key Id certificate is expiring in less than 24 hours' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id certificate is expiring in less than 24 hours.' |
1.3.6.1.4.1.323.5.3.36.1.2.7039 | ocnrf_oauth_keyData_expiryStatus |
The alert is cleared when key expiry time is more than 24 hours. Steps: Replace expiring certificate key pair with new ones |
|
| OcnrfOauthCurrentCertificateExpiringIn30days | Oauth Current Key ID details are expiring in more than 24 hours and less than 30 days | Critical |
description: 'OCNRF Oauth Access token current Key Id certificate is expiring in less than 30 days' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token current Key Id certificate is expiring in less than 30 days.' |
1.3.6.1.4.1.323.5.3.36.1.2.7040 | ocnrf_oauth_keyData_expiryStatus |
The alert is cleared when the expiry time of the certificate for the current key id is more than 30 days. Steps: Replace expiring certificate key pair with new ones |
|
| OcnrfOauthNonCurrentCertificateExpiringIn30days | Oauth Non Current Key ID details are expiring in more than 24 hours and less than 30 days | Info |
description: 'OCNRF Oauth Access token non current Key Id certificate is expiring in less than 30 days' summary: 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id certificate is expiring in less than 30 days.' |
1.3.6.1.4.1.323.5.3.36.1.2.7041 | ocnrf_oauth_keyData_expiryStatus |
The alert is cleared when the expiry time of the certificate for the non-current key id is more than 30 days. Steps: Replace expiring certificate key pair with new ones
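
The default trigger points of the four OcnrfRegisteredNFsBelow... alerts in the table above form consecutive bands (below 2, 2 to 9, 10 to 19, 20 to 29). As a rough illustration only, the following sketch maps a registered-NF count to the severity whose band it falls in. The function name `registered_nf_severity` is hypothetical and not part of OCNRF, and the band limits are the documented defaults, which may differ in a customized NrfAlertValues.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OCNRF): map a registered-NF count to the
# severity band described by the default alert trigger points.
registered_nf_severity() {
  local count="$1"   # number of NFs currently registered (assumed integer)
  if   [ "$count" -lt 2 ];  then echo "Critical"
  elif [ "$count" -lt 10 ]; then echo "Major"
  elif [ "$count" -lt 20 ]; then echo "Minor"
  elif [ "$count" -lt 30 ]; then echo "Warning"
  else echo "none"          # 30 or more: no registered-NF alert is expected
  fi
}

registered_nf_severity 15   # prints "Minor"
```

Because the bands do not overlap, at most one of these alerts is expected to be active for a given count.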
OCNRF Alert Configuration
This section describes the measurement-based alert rules configuration for OCNRF. The Alert Manager uses the Prometheus measurement values reported by the microservices, as referenced in the conditions of the alert rules, to trigger alerts.
Note:
- The alert file is packaged with the OCNRF custom templates. Download the OCNRF templates.zip file from MOS and unzip it to get the NrfAlertRules.yaml file.
- Review the NrfAlertRules.yaml file and, if the default values need to change, edit the parameter values before configuring the alerts. See the table below for details.
- kubernetes_namespace is set to the Kubernetes namespace in which OCNRF is deployed. The default value is ocnrf. Update the NrfAlertRules.yaml file to reflect the correct OCNRF Kubernetes namespace.
Table 10-14 Alerts
| Alert Name | Details | Default Value | Notes |
|---|---|---|---|
| OcnrfTotalIngressTrafficRateAboveMinorThreshold | Traffic Rate is above 80 Percent of Max requests per second | Greater than or equal to 800 and less than 900 | The maximum ingress rate considered is 1000 requests per second, so the default values 800 and 900 are 80% and 90% of 1000. To change the values for a different maximum ingress request rate, set [80% of Max Ingress Request Rate] and [90% of Max Ingress Request Rate] for this alert. |
| OcnrfTotalIngressTrafficRateAboveMajorThreshold | Traffic Rate is above 90 Percent of Max requests per second | Greater than or equal to 900 and less than 950 | The maximum ingress rate considered is 1000 requests per second, so the default values 900 and 950 are 90% and 95% of 1000. To change the values for a different maximum ingress request rate, set [90% of Max Ingress Request Rate] and [95% of Max Ingress Request Rate] for this alert. |
| OcnrfTotalIngressTrafficRateAboveCriticalThreshold | Traffic Rate is above 95 Percent of Max requests per second | Greater than or equal to 950 | The maximum ingress rate considered is 1000 requests per second, so the default value 950 is 95% of 1000. To change the value for a different maximum ingress request rate, set [95% of Max Ingress Request Rate] for this alert. |
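
The notes in the table above derive each default from a maximum ingress rate of 1000 requests per second. As a minimal sketch, assuming an integer maximum rate and a hypothetical helper named `ingress_thresholds` (not part of OCNRF), the same 80%, 90%, and 95% ratios can be computed for a different rate before editing the alert file:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OCNRF): derive the alert thresholds from a
# maximum ingress rate, using the 80% / 90% / 95% ratios from the table.
ingress_thresholds() {
  local max_rate="$1"   # maximum ingress requests per second (assumed integer)
  local minor=$(( max_rate * 80 / 100 ))
  local major=$(( max_rate * 90 / 100 ))
  local critical=$(( max_rate * 95 / 100 ))
  echo "minor: >= ${minor} < ${major}"
  echo "major: >= ${major} < ${critical}"
  echo "critical: >= ${critical}"
}

# With the documented default of 1000 requests per second this prints
# the 800/900, 900/950, and 950 limits listed in the table.
ingress_thresholds 1000
```

Integer arithmetic rounds down, so for rates that are not multiples of 20 the computed limits should be checked against the intended percentages.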
OCNRF Alert configuration in Prometheus
This section describes the measurement based Alert rules configuration for OCNRF in Prometheus.
In the commands below, _NAME_ is the Helm release name of Prometheus and _Namespace_ is the Kubernetes namespace in which Prometheus is installed.
- Take a backup of the current configuration map of Prometheus:
    kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
- Check and add the OCNRF alert file name inside the Prometheus configuration map:
    sed -i '/etc\/config\/alertsnrf/d' /tmp/tempConfig.yaml
    sed -i '/rule_files:/a\ \- /etc/config/alertsnrf' /tmp/tempConfig.yaml
- Update the configuration map with the updated file name of the OCNRF alert file:
    kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
- Add the OCNRF alert rules in the configuration map under the file name of the OCNRF alert file:
    kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/NrfAlertrules.yaml)"
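
The two sed commands in the second step first delete any existing alertsnrf entry and then append a fresh one after the rule_files: line, which makes the step safe to repeat. The following sketch tries that pattern on a scratch file rather than on the real configuration map dump. The sample file content and the function name `add_nrf_rule_file` are assumptions for illustration, and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Dry run of the delete-then-append sed pattern on a scratch file.
# The content below is a made-up stand-in for the real ConfigMap dump.
tmp_config="$(mktemp)"
cat > "$tmp_config" <<'EOF'
data:
  prometheus.yml: |
    rule_files:
    - /etc/config/rules
    - /etc/config/alerts
EOF

add_nrf_rule_file() {
  # Remove any existing alertsnrf entry, then append one after rule_files:
  sed -i '/etc\/config\/alertsnrf/d' "$1"
  sed -i '/rule_files:/a\ \- /etc/config/alertsnrf' "$1"
}

add_nrf_rule_file "$tmp_config"
add_nrf_rule_file "$tmp_config"   # repeating the step must not add a duplicate

# Count the lines that reference the OCNRF alert file.
grep -c '/etc/config/alertsnrf' "$tmp_config"
```

Inspecting the scratch file after the run also shows how the appended entry is indented, which is worth comparing with the neighboring rule_files entries before replacing the real configuration map.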
Note:
The Prometheus server automatically reloads the updated configuration map after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the OCNRF alerts have been reloaded.
Disable OCNRF Alert in Prometheus
- Edit the NrfAlertrules.yaml file to remove the specific alert.
  The following sample from NrfAlertrules.yaml shows the content of one alert that can be disabled:
    ## ALERT SAMPLE START##
    - alert: OcnrfTrafficRateAboveMinorThreshold
      annotations:
        description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
        summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
      expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
      labels:
        severity: Minor
    ## ALERT SAMPLE END##
- Remove the content of the specific alert that needs to be disabled.
- Perform the alert configuration again. See the OCNRF Alert configuration in Prometheus section above for detailed steps.
Disabling Alerts
- Edit the NrfAlertrules.yaml file to remove the specific alert.
- Remove the complete content of the specific alert from the NrfAlertrules.yaml file.
  For example, to remove the OcnrfTrafficRateAboveMinorThreshold alert, remove the complete content:
    ## ALERT SAMPLE START##
    - alert: OcnrfTrafficRateAboveMinorThreshold
      annotations:
        description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
        summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
      expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
      labels:
        severity: Minor
    ## ALERT SAMPLE END##
- Perform the alert configuration. See the OCNRF Alert Configuration section above for details.
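
Removing an alert by hand means deleting every line from its "- alert:" entry up to the next alert. As a sketch of that edit, the hypothetical function below (`remove_alert`, not part of OCNRF) prints a rules file without one alert's block. It assumes each alert starts with a "- alert: <name>" line and that the block runs until the next "- alert:" line or the end of the file, so any unrelated content that follows the last alert would also be dropped. The two-alert sample file is made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a rules file without one alert block.
remove_alert() {
  local alert_name="$1" rules_file="$2"
  awk -v name="$alert_name" '
    /^[ \t]*- alert:/ { skip = ($3 == name) }  # start of a block: drop it or keep it
    !skip { print }                            # copy every line that is kept
  ' "$rules_file"
}

# Made-up sample with two alerts, for demonstration only.
sample_rules="$(mktemp)"
cat > "$sample_rules" <<'EOF'
- alert: OcnrfTrafficRateAboveMinorThreshold
  expr: vector(1)
  labels:
    severity: Minor
- alert: OcnrfTrafficRateAboveMajorThreshold
  expr: vector(1)
  labels:
    severity: Major
EOF

# Prints only the OcnrfTrafficRateAboveMajorThreshold block.
remove_alert OcnrfTrafficRateAboveMinorThreshold "$sample_rules"
```

Writing the result to a new file and comparing it with the original before reapplying the alert configuration keeps the change easy to review.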
Configuring SNMP Notifier
This section describes the procedure to configure the SNMP Notifier.
Configure and Validate Alerts in Prometheus Server
Refer to the OCNRF Alert Configuration section for the procedure to configure the alerts.
Validating Alerts
After configuring the alerts in the Prometheus server, verify them using the following steps:
- Open the Prometheus server from your browser using <IP>:<Port>.
- Navigate to Status and then Rules.
- Search for Ocnrf. The OcnrfAlerts list is displayed.
Note:
If the alerts are not displayed, the alert file is not in a format that the Prometheus server accepts. Correct the file and try again.
- Execute the following command to edit the deployment:
    kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
  Example:
    $ kubectl edit deploy occne-snmp-notifier -n occne-infra
- Edit the destination as follows:
    --snmp.destination=<destination_ip>:<destination_port>
  Example:
    --snmp.destination=10.75.203.94:162
$ docker logs <trapd_container_id>
2020-04-29 15:34:24 10.75.203.103 [UDP: [10.75.203.103]:2747->[172.17.0.4]:162]:DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (158510800) 18 days, 8:18:28.00 SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003 SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.1 = STRING: "1.3.6.1.4.1.323.5.3.36.1.2.7003[]" SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.2 = STRING: "critical" SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- Alert: OcnrfActiveSubscribersBelowCriticalThreshold Summary: namespace: ocnrf, nftype:5G_EIR, nrflevel:6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, podname: ocnrf-nrfauditor-6b459f5db5-4kvt4,
timestamp: 2020-04-29 15:33:24.408 +0000 UTC: Current number of registered NFs detected below critical threshold. Description: The number of registered NFs detected below critical threshold (current value
is: 0)
Two MIB files are used to generate the traps. The user needs to update these files along with the alert file in order to fetch the traps in their environment.
- OCNRF-MIB-TC-1.10.0.mib
This is the OCNRF top-level MIB file, where the objects and their data types are defined.
- OCNRF-MIB-1.10.0.mib
This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.
Note:
MIB files are packaged along with OCNRF Custom Templates. Download the file from MOS. Refer to OCNRF Installation and Upgrade guide for more details.