8 Alerts
This section provides information on Policy alerts and their configuration.
Note:
The performance and capacity of the system can vary based on the call model and configuration, including but not limited to the deployed policies and corresponding data (for example, policy tables). You can configure alerts in Prometheus using the Alertrules.yaml file.
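For reference, an entry in the Alertrules.yaml file follows the standard Prometheus alerting-rule format. The sketch below is illustrative only: it reuses the PodCongestionL1 alert data from Table 8-2, while the group name and annotation wording are assumptions.

```yaml
groups:
  - name: occnp_common_alerts        # group name is an assumption
    rules:
      - alert: PodCongestionL1
        # Expression and severity taken from Table 8-2.
        expr: occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2
        labels:
          severity: critical
        annotations:
          summary: "Alert when cpu of pod is in CONGESTION_L1 state."
          oid: "1.3.6.1.4.1.323.5.3.52.1.2.71"   # SNMP OID from Table 8-2
```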
The following table describes the various severity types of alerts generated by Policy:
Table 8-1 Alerts Levels or Severity Types
| Alerts Levels / Severity Types | Definition |
|---|---|
| Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that can affect the service of Policy. |
| Major | Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that can affect the service of Policy. |
| Minor | Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that can affect the service of Policy. |
| Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of Policy. |
For details on how to configure Policy alerts, see the Configuring Alerts section in Oracle Communications Cloud Native Core, Converged Policy Installation, Upgrade, and Fault Recovery Guide.
For details on how to configure SNMP Notifier, see the Configuring SNMP Notifier section in Oracle Communications Cloud Native Core, Converged Policy Installation, Upgrade, and Fault Recovery Guide.
8.1 List of Alerts
- Common Alerts - This category of alerts is common and required for all three modes of deployment.
- PCF Alerts - This category of alerts is specific to PCF microservices and required for Converged and PCF only modes of deployment.
- PCRF Alerts - This category of alerts is specific to PCRF microservices and required for Converged and PCRF only modes of deployment.
8.1.1 Common Alerts
This section provides information about alerts that are common for PCF and PCRF.
8.1.1.1 POD_CONGESTION_L1
Table 8-2 POD_CONGESTION_L1
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodCongestionL1 |
| Description | Alert when cpu of pod is in CONGESTION_L1 state. |
| Summary | Alert when cpu of pod is in CONGESTION_L1 state. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.71 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.2 POD_CONGESTION_L2
Table 8-3 POD_CONGESTION_L2
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodCongestionL2 |
| Description | Alert when cpu of pod is in CONGESTION_L2 state. |
| Summary | Alert when cpu of pod is in CONGESTION_L2 state. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="cpu"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.72 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.3 POD_PENDING_REQUEST_CONGESTION_L1
Table 8-4 POD_PENDING_REQUEST_CONGESTION_L1
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodPendingRequestCongestionL1 |
| Description | Alert when queue of pod is in CONGESTION_L1 state. |
| Summary | Alert when queue of pod is in CONGESTION_L1 state. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="queue",container!~"bulwark|diam-gateway"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.73 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.4 POD_PENDING_REQUEST_CONGESTION_L2
Table 8-5 POD_PENDING_REQUEST_CONGESTION_L2
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodPendingRequestCongestionL2 |
| Description | Alert when queue of pod is in CONGESTION_L2 state. |
| Summary | Alert when queue of pod is in CONGESTION_L2 state. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="queue"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.74 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.5 POD_CPU_CONGESTION_L1
Table 8-6 POD_CPU_CONGESTION_L1
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodCPUCongestionL1 |
| Description | Alert when cpu of pod is in CONGESTION_L1 state. |
| Summary | Alert when cpu of pod is in CONGESTION_L1 state. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.73 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.6 POD_CPU_CONGESTION_L2
Table 8-7 POD_CPU_CONGESTION_L2
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodCPUCongestionL2 |
| Description | Alert when cpu of pod is in CONGESTION_L2 state. |
| Summary | Alert when cpu of pod is in CONGESTION_L2 state. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="cpu"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.74 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.7 Pod_Memory_DoC
Table 8-8 Pod_Memory_DoC
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.service}} service is DoC for Memory type |
| Summary | Pod Resource Congestion status of {{$labels.service}} service is DoC for Memory type |
| Severity | Major |
| Expression | occnp_pod_resource_congestion_state{type="memory"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.31 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | The alert triggers based on the resource limit usage and load shedding configurations in congestion control. The CPU, memory, and queue usage can be checked using the Grafana dashboard. Note: Threshold levels can be configured using the PCF_Alertrules.yaml file. For any additional guidance, contact My Oracle Support. |
8.1.1.8 Pod_Memory_Congested
Table 8-9 Pod_Memory_Congested
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.service}} service is congested for Memory type |
| Summary | Pod Resource Congestion status of {{$labels.service}} service is congested for Memory type |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="memory"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.32 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | The alert triggers based on the resource limit usage and load shedding configurations in congestion control. The CPU, memory, and queue usage can be checked using the Grafana dashboard. For any additional guidance, contact My Oracle Support. |
8.1.1.9 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-10 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Rx fail count exceeds the critical threshold limit. |
| Summary | RAA Rx fail count exceeds the critical threshold limit. |
| Severity | Critical |
| Expression | sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.35 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.10 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-11 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Rx fail count exceeds the major threshold limit. |
| Summary | RAA Rx fail count exceeds the major threshold limit. |
| Severity | Major |
| Expression | sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.35 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.11 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-12 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Rx fail count exceeds the minor threshold limit. |
| Summary | RAA Rx fail count exceeds the minor threshold limit. |
| Severity | Minor |
| Expression | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 80 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.35 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
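The three RAA Rx failure alerts above use the same failure-ratio expression with non-overlapping bands, so only one severity fires at a time. A sketch of the critical rule, with the expression taken verbatim from Table 8-10 (the group name and label layout are assumptions):

```yaml
groups:
  - name: occnp_raa_rx_alerts           # group name is an assumption
    rules:
      # Failure ratio: non-2xx RAA responses as a percentage of all RAA
      # responses over a 5-minute window (expression from Table 8-10).
      - alert: RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
        expr: >
          sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m]))
          / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 90
        labels:
          severity: critical
      # The MAJOR (80-90%) and MINOR (60-80%) rules in Tables 8-11 and 8-12
      # add an upper bound ("and ... <= 90" / "and ... <= 80") so that the
      # bands do not overlap.
```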
8.1.1.12 ASA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-13 ASA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA Rx fail count exceeds the critical threshold limit. |
| Summary | ASA Rx fail count exceeds the critical threshold limit. |
| Severity | Critical |
| Expression | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.66 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.13 ASA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-14 ASA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA Rx fail count exceeds the major threshold limit. |
| Summary | ASA Rx fail count exceeds the major threshold limit. |
| Severity | Major |
| Expression | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.66 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.14 ASA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-15 ASA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA Rx fail count exceeds the minor threshold limit. |
| Summary | ASA Rx fail count exceeds the minor threshold limit. |
| Severity | Minor |
| Expression | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 80 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.66 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.15 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-16 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA Rx timeout count exceeds the minor threshold limit |
| Summary | ASA Rx timeout count exceeds the minor threshold limit |
| Severity | Minor |
| Expression | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 80 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.67 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.16 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-17 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA Rx timeout count exceeds the major threshold limit |
| Summary | ASA Rx timeout count exceeds the major threshold limit |
| Severity | Major |
| Expression | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.67 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.17 ASA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-18 ASA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA Rx timeout count exceeds the critical threshold limit |
| Summary | ASA Rx timeout count exceeds the critical threshold limit |
| Severity | Critical |
| Expression | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.67 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.18 SCP_PEER_UNAVAILABLE
Table 8-19 SCP_PEER_UNAVAILABLE
| Field | Details |
|---|---|
| Description | Configured SCP peer is unavailable. |
| Summary | SCP peer [ {{$labels.peer}} ] is unavailable. |
| Severity | Major |
| Expression | occnp_oc_egressgateway_peer_health_status != 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.60 |
| Metric Used | occnp_oc_egressgateway_peer_health_status |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.19 SCP_PEER_SET_UNAVAILABLE
Table 8-20 SCP_PEER_SET_UNAVAILABLE
| Field | Details |
|---|---|
| Description | None of the SCP peers are available for the configured peer set. |
| Summary | {{ $value }} SCP peers under peer set {{$labels.peerset}} are currently unavailable. |
| Severity | Critical |
| Expression | (occnp_oc_egressgateway_peer_count > 0 and (occnp_oc_egressgateway_peer_available_count) == 0) |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.61 |
| Metric Used | occnp_oc_egressgateway_peer_count and occnp_oc_egressgateway_peer_available_count |
| Recommended Actions | The NF clears the critical alarm when at least one SCP peer in the peer set becomes available, even if all other SCP peers in the peer set remain unavailable. For any additional guidance, contact My Oracle Support. |
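As a sketch, the SCP_PEER_SET_UNAVAILABLE condition can be written as a rule entry. The expression is taken from Table 8-20; the group name and annotation wording are assumptions.

```yaml
groups:
  - name: occnp_scp_alerts              # group name is an assumption
    rules:
      - alert: SCP_PEER_SET_UNAVAILABLE
        # Fires when peers are configured for a peer set but none is available.
        expr: (occnp_oc_egressgateway_peer_count > 0 and (occnp_oc_egressgateway_peer_available_count) == 0)
        labels:
          severity: critical
        annotations:
          summary: "{{ $value }} SCP peers under peer set {{ $labels.peerset }} are currently unavailable."
```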
8.1.1.20 STALE_CONFIGURATION
Table 8-21 STALE_CONFIGURATION
| Field | Details |
|---|---|
| Description | In the last 10 minutes, the current service config_level has not matched the config_level from the config-server. |
| Summary | In the last 10 minutes, the current service config_level has not matched the config_level from the config-server. |
| Severity | Major |
| Expression | (sum by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) != (sum by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"})) |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.62 |
| Metric Used | topic_version |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.21 POLICY_SERVICES_DOWN
Table 8-22 POLICY_SERVICES_DOWN
| Field | Details |
|---|---|
| Name in Alert Yaml File | PCF_SERVICES_DOWN |
| Description | {{$labels.service}} service is not running. None of the pods of the CNC Policy application are available. |
| Summary | {{$labels.service}} service is not running. |
| Severity | Critical |
| Expression | appinfo_service_running{vendor="Oracle", application="occnp", category!=""} != 1 |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.1 |
| Metric Used | appinfo_service_running |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
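A corresponding rule sketch for POLICY_SERVICES_DOWN, built from the appinfo_service_running check shown in Table 8-22 (the group name and annotation wording are assumptions):

```yaml
groups:
  - name: occnp_service_alerts          # group name is an assumption
    rules:
      - alert: PCF_SERVICES_DOWN        # name in the alert YAML file per Table 8-22
        # Fires when a Policy service reports itself as not running.
        expr: appinfo_service_running{vendor="Oracle", application="occnp", category!=""} != 1
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.service }} service is not running."
```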
8.1.1.22 DIAM_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-23 DIAM_TRAFFIC_RATE_ABOVE_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | DiamTrafficRateAboveThreshold |
| Description | Diameter Connector Ingress traffic Rate is above threshold of Max MPS (current value is: {{ $value }}) |
| Summary | Traffic Rate is above 90 Percent of Max requests per second. |
| Severity | Major |
| Expression | The total Ingress traffic rate for the Diameter Connector has crossed the configured threshold of 900 TPS. The default trigger point in the Common_Alertrules.yaml file is when the Diameter Connector Ingress rate crosses 90% of the maximum ingress requests per second. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.6 |
| Metric Used | ocpm_ingress_request_total |
| Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the Common_Alertrules.yaml file. It is recommended to assess the reason for the additional traffic. For any additional guidance, contact My Oracle Support. |
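The table above gives the metric (ocpm_ingress_request_total) and the 900 TPS trigger but not the exact PromQL, so the sketch below is an assumption throughout: the rate window, expression shape, and group name are illustrative only.

```yaml
groups:
  - name: occnp_diam_traffic_alerts     # group name is an assumption
    rules:
      - alert: DiamTrafficRateAboveThreshold
        # Hypothetical expression: total Diameter Connector ingress request
        # rate over an assumed 5-minute window, compared against the 900 TPS
        # (90% of max MPS) threshold described in Table 8-23.
        expr: sum(rate(ocpm_ingress_request_total[5m])) > 900
        labels:
          severity: major
```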
8.1.1.23 DIAM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-24 DIAM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
| Field | Details |
|---|---|
| Name in Alert Yaml File | DiamIngressErrorRateAbove10Percent |
| Description | Transaction Error Rate detected above 10 Percent of Total on Diameter Connector (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions. |
| Severity | Critical |
| Expression | The number of failed transactions is above 10 percent of the total transactions on Diameter Connector. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.7 |
| Metric Used | ocpm_ingress_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions falls below 10% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.1.1.24 DIAM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Table 8-25 DIAM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
| Field | Details |
|---|---|
| Name in Alert Yaml File | DiamEgressErrorRateAbove1Percent |
| Description | Egress Transaction Error Rate detected above 1 Percent of Total on Diameter Connector (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | Minor |
| Expression | The number of failed transactions is above 1 percent of the total Egress Gateway transactions on Diameter Connector. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.8 |
| Metric Used | ocpm_egress_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions falls below 1% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.1.1.25 UDR_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-26 UDR_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | PcfUdrIngressTrafficRateAboveThreshold |
| Description | User service Ingress traffic Rate from UDR is above threshold of Max MPS (current value is: {{ $value }}) |
| Summary | Traffic Rate is above 90 Percent of Max requests per second |
| Severity | Major |
| Expression | The total User Service Ingress traffic rate from UDR has crossed the configured threshold of 900 TPS. The default trigger point in the Common_Alertrules.yaml file is when the User service Ingress rate from UDR crosses 90% of the maximum ingress requests per second. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.9 |
| Metric Used | ocpm_userservice_inbound_count_total{service_resource="udr-service"} |
| Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the Common_Alertrules.yaml file. It is recommended to assess the reason for the additional traffic. For any additional guidance, contact My Oracle Support. |
8.1.1.26 UDR_EGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-27 UDR_EGRESS_ERROR_RATE_ABOVE_10_PERCENT
| Field | Details |
|---|---|
| Name in Alert Yaml File | PcfUdrEgressErrorRateAbove10Percent |
| Description | Egress Transaction Error Rate detected above 10 Percent of Total on User service (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | Critical |
| Expression | The number of failed transactions from UDR is more than 10 percent of the total transactions. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.10 |
| Metric Used | ocpm_udr_tracking_response_total{servicename_3gpp="nudr-dr",response_code!~"2.*"} |
| Recommended Actions | The alert gets cleared when the number of failed transactions falls below the configured threshold. Note: Threshold levels can be configured using the Common_Alertrules.yaml file. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.1.1.27 POLICYDS_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-28 POLICYDS_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | PolicyDsIngressTrafficRateAboveThreshold |
| Description | Ingress Traffic Rate is above threshold of Max MPS (current value is: {{ $value }}) |
| Summary | Traffic Rate is above 90 Percent of Max requests per second |
| Severity | Critical |
| Expression | The total PolicyDS Ingress message rate has crossed the configured threshold of 900 TPS. The default trigger point in the Common_Alertrules.yaml file is when the PolicyDS Ingress rate crosses 90% of the maximum ingress requests per second. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.13 |
| Metric Used | client_request_total. Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
| Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the Common_Alertrules.yaml file. It is recommended to assess the reason for the additional traffic. For any additional guidance, contact My Oracle Support. |
8.1.1.28 POLICYDS_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-29 POLICYDS_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
| Field | Details |
|---|---|
| Name in Alert Yaml File | PolicyDsIngressErrorRateAbove10Percent |
| Description | Ingress Transaction Error Rate detected above 10 Percent of Total on PolicyDS service (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | Critical |
| Expression | The number of failed transactions is above 10 percent of the total transactions for PolicyDS service. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.14 |
| Metric Used | client_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions falls below 10% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.1.1.29 POLICYDS_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Table 8-30 POLICYDS_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
| Field | Details |
|---|---|
| Name in Alert Yaml File | PolicyDsEgressErrorRateAbove1Percent |
| Description | Egress Transaction Error Rate detected above 1 Percent of Total on PolicyDS service (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | Minor |
| Expression | The number of failed transactions is above 1 percent of the total transactions for PolicyDS service. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.15 |
| Metric Used | server_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions falls below 1% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.1.1.30 UDR_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD
Table 8-31 UDR_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | PcfUdrIngressTimeoutErrorAboveMajorThreshold |
| Description | Ingress Timeout Error Rate detected above 10 Percent of Total towards UDR service (current value is: {{ $value }}) |
| Summary | Timeout Error Rate detected above 10 Percent of Total Transactions |
| Severity | Major |
| Expression | The number of failed transactions due to timeout is above 10 percent of the total transactions for UDR service. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.16 |
| Metric Used | ocpm_udr_tracking_request_timeout_total{servicename_3gpp="nudr-dr"} |
| Recommended Actions | The alert gets cleared when the number of failed transactions due to timeout falls below 10% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.1.1.31 DB_TIER_DOWN_ALERT
Table 8-32 DB_TIER_DOWN_ALERT
| Field | Details |
|---|---|
| Name in Alert Yaml File | DBTierDownAlert |
| Description | DB is not reachable. |
| Summary | DB is not reachable. |
| Severity | Critical |
| Expression | Database is not available. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.18 |
| Metric Used | appinfo_category_running{category="database"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.32 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
Table 8-33 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | CPUUsagePerServiceAboveMinorThreshold |
| Description | CPU usage for {{$labels.service}} service is above 60 |
| Summary | CPU usage for {{$labels.service}} service is above 60 |
| Severity | Minor |
| Expression | A service pod has reached the configured minor threshold (60%) of its CPU usage limits. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.19 |
| Metric Used | container_cpu_usage_seconds_total. Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
| Recommended Actions | The alert gets cleared when the CPU utilization falls below the minor threshold or crosses the major threshold, in which case the CPUUsagePerServiceAboveMajorThreshold alert is raised. Note: Threshold levels can be configured using the Common_Alertrules.yaml file. For any additional guidance, contact My Oracle Support. |
8.1.1.33 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
Table 8-34 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | CPUUsagePerServiceAboveMajorThreshold |
| Description | CPU usage for {{$labels.service}} service is above 80 |
| Summary | CPU usage for {{$labels.service}} service is above 80 |
| Severity | Major |
| Expression | A service pod has reached the configured major threshold (80%) of its CPU usage limits. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.20 |
| Metric Used | container_cpu_usage_seconds_total. Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
| Recommended Actions | The alert gets cleared when the CPU utilization falls below the major threshold or crosses the critical threshold, in which case the CPUUsagePerServiceAboveCriticalThreshold alert is raised. Note: Threshold levels can be configured using the Common_Alertrules.yaml file. For any additional guidance, contact My Oracle Support. |
8.1.1.34 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
Table 8-35 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | CPUUsagePerServiceAboveCriticalThreshold |
| Description | CPU usage for {{$labels.service}} service is above 90 |
| Summary | CPU usage for {{$labels.service}} service is above 90 |
| Severity | Critical |
| Expression | A service pod has reached the configured critical threshold (90%) of its CPU usage limits. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.21 |
| Metric Used | container_cpu_usage_seconds_total. Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
| Recommended Actions | The alert gets cleared when the CPU utilization falls below the critical threshold. Note: Threshold levels can be configured using the Common_Alertrules.yaml file. For any additional guidance, contact My Oracle Support. |
8.1.1.35 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
Table 8-36 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | MemoryUsagePerServiceAboveMinorThreshold |
| Description | Memory usage for {{$labels.service}} service is above 60 |
| Summary | Memory usage for {{$labels.service}} service is above 60 |
| Severity | Minor |
| Expression | A service pod has reached the configured minor threshold (60%) of its memory usage limits. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.22 |
| Metric Used | container_memory_usage_bytes. Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
| Recommended Actions | The alert gets cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the MemoryUsagePerServiceAboveMajorThreshold alert is raised. Note: Threshold levels can be configured using the Common_Alertrules.yaml file. For any additional guidance, contact My Oracle Support. |
8.1.1.36 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
Table 8-37 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | MemoryUsagePerServiceAboveMajorThreshold |
| Description | Memory usage for {{$labels.service}} service is above 80 |
| Summary | Memory usage for {{$labels.service}} service is above 80 |
| Severity | Major |
| Expression | A service pod has reached the configured major threshold (80%) of its memory usage limits. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.23 |
| Metric Used | container_memory_usage_bytes. Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
| Recommended Actions | The alert gets cleared when the memory utilization falls
below the major threshold or crosses the critical threshold, in which
case the MemoryUsagePerServiceAboveCriticalThreshold alert is raised.
Note: Threshold levels can be configured in the alert configuration file (Alertrules.yaml). For any additional guidance, contact My Oracle Support. |
8.1.1.37 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
Table 8-38 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | MemoryUsagePerServiceAboveCriticalThreshold |
| Description | Memory usage for {{$labels.service}} service is above 90 |
| Summary | Memory usage for {{$labels.service}} service is above 90 |
| Severity | Critical |
| Expression | A service pod has reached the configured critical threshold (90%) of its memory usage limits. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.24 |
| Metric Used | container_memory_usage_bytes
Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
| Recommended Actions | The alert gets cleared when the memory utilization falls
below the critical threshold.
Note: Threshold levels can be configured in the alert configuration file (Alertrules.yaml). For any additional guidance, contact My Oracle Support. |
8.1.1.38 POD_CONGESTED
Table 8-39 POD_CONGESTED
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodCongested |
| Description | The pod congestion status is set to congested. |
| Summary | Pod Congestion status of {{$labels.service}} service is congested |
| Severity | Critical |
| Expression | occnp_pod_congestion_state == 4 |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.26 |
| Metric Used | occnp_pod_congestion_state |
| Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
8.1.1.39 POD_DANGER_OF_CONGESTION
Table 8-40 POD_DANGER_OF_CONGESTION
| Field | Details |
|---|---|
| Description | The pod congestion status is set to Danger of Congestion. |
| Summary | Pod Congestion status of {{$labels.service}} service is DoC |
| Severity | Major |
| Expression | occnp_pod_congestion_state == 1 |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.25 |
| Metric Used | occnp_pod_congestion_state |
| Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
8.1.1.40 POD_PENDING_REQUEST_CONGESTED
Table 8-41 POD_PENDING_REQUEST_CONGESTED
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodPendingRequestCongested |
| Description | The pod congestion status is set to congested for PendingRequest. |
| Summary | Pod Resource Congestion status of {{$labels.service}} service is congested for PendingRequest type. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="queue"} == 4 |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.28 |
| Metric Used | occnp_pod_resource_congestion_state{type="queue"} |
| Recommended Actions | The alert gets cleared when the number of pending
requests in the queue falls below the configured threshold value.
For any additional guidance, contact My Oracle Support. |
8.1.1.41 POD_PENDING_REQUEST_DANGER_OF_CONGESTION
Table 8-42 POD_PENDING_REQUEST_DANGER_OF_CONGESTION
| Field | Details |
|---|---|
| Description | The pod congestion status is set to DoC for pending requests. |
| Summary | Pod Resource Congestion status of {{$labels.service}} service is DoC for PendingRequest type. |
| Severity | Major |
| Expression | occnp_pod_resource_congestion_state{type="queue"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.27 |
| Metric Used | occnp_pod_resource_congestion_state{type="queue"} |
| Recommended Actions | The alert gets cleared when the number of pending
requests in the queue falls below the configured threshold value.
For any additional guidance, contact My Oracle Support. |
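The two pending-request alerts above differ only in the congestion-state value they match (1 for danger of congestion, 4 for congested). A rough Alertrules.yaml sketch; the severities follow the tables, but the DoC rule name is a guess because Table 8-42 omits the yaml name.

```yaml
# Illustrative rule pair for pending-request congestion (not the shipped file).
- alert: PodPendingRequestDangerOfCongestion   # name assumed
  expr: occnp_pod_resource_congestion_state{type="queue"} == 1
  labels:
    severity: major
- alert: PodPendingRequestCongested
  expr: occnp_pod_resource_congestion_state{type="queue"} == 4
  labels:
    severity: critical
```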
8.1.1.42 POD_CPU_CONGESTED
Table 8-43 POD_CPU_CONGESTED
| Field | Details |
|---|---|
| Name in Alert Yaml File | PodCPUCongested |
| Description | The pod congestion status is set to congested for CPU. |
| Summary | Pod Resource Congestion status of {{$labels.service}} service is congested for CPU type. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="cpu"} == 4 |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.30 |
| Metric Used | occnp_pod_resource_congestion_state{type="cpu"} |
| Recommended Actions | The alert gets cleared when the system CPU usage falls
below the configured threshold value.
For any additional guidance, contact My Oracle Support. |
8.1.1.43 POD_CPU_DANGER_OF_CONGESTION
Table 8-44 POD_CPU_DANGER_OF_CONGESTION
| Field | Details |
|---|---|
| Description | The pod congestion status is set to DoC for CPU. |
| Summary | Pod Resource Congestion status of {{$labels.service}} service is DoC for CPU type. |
| Severity | Major |
| Expression | occnp_pod_resource_congestion_state{type="cpu"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.29 |
| Metric Used | occnp_pod_resource_congestion_state{type="cpu"} |
| Recommended Actions | The alert gets cleared when the system CPU usage falls
below the configured threshold value.
For any additional guidance, contact My Oracle Support. |
8.1.1.44 SERVICE_OVERLOADED
Table 8-45 SERVICE_OVERLOADED
| Field | Details |
|---|---|
| Description | Overload Level of {{$labels.service}} service is L1 |
| Summary | Overload Level of {{$labels.service}} service is L1 |
| Severity | Minor |
| Expression | The overload level of the service is L1. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.40 |
| Metric Used | load_level |
| Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
Table 8-46 SERVICE_OVERLOADED
| Field | Details |
|---|---|
| Description | Overload Level of {{$labels.service}} service is L2 |
| Summary | Overload Level of {{$labels.service}} service is L2 |
| Severity | Major |
| Expression | The overload level of the service is L2. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.40 |
| Metric Used | load_level |
| Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
Table 8-47 SERVICE_OVERLOADED
| Field | Details |
|---|---|
| Description | Overload Level of {{$labels.service}} service is L3 |
| Summary | Overload Level of {{$labels.service}} service is L3 |
| Severity | Critical |
| Expression | The overload level of the service is L3. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.40 |
| Metric Used | load_level |
| Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
8.1.1.45 SERVICE_RESOURCE_OVERLOADED
Alerts when service is in overload state due to memory usage
Table 8-48 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L1 for {{$labels.type}} type |
| Summary | {{$labels.service}} service is L1 for {{$labels.type}} type |
| Severity | Minor |
| Expression | The overload level of the service is L1 due to memory usage. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="memory"} |
| Recommended Actions | The alert gets cleared when the memory usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-49 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L2 for {{$labels.type}} type |
| Summary | {{$labels.service}} service is L2 for {{$labels.type}} type |
| Severity | Major |
| Expression | The overload level of the service is L2 due to memory usage. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="memory"} |
| Recommended Actions | The alert gets cleared when the memory usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-50 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L3 for {{$labels.type}} type. |
| Summary | {{$labels.service}} service is L3 for {{$labels.type}} type |
| Severity | Critical |
| Expression | The overload level of the service is L3 due to memory usage. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="memory"} |
| Recommended Actions | The alert gets cleared when the memory usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Alerts when service is in overload state due to CPU usage
Table 8-51 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L1 for {{$labels.type}} type |
| Summary | {{$labels.service}} service is L1 for {{$labels.type}} type |
| Severity | Minor |
| Expression | The overload level of the service is L1 due to CPU usage. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="cpu"} |
| Recommended Actions | The alert gets cleared when the CPU usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-52 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L2 for {{$labels.type}} type |
| Summary | {{$labels.service}} service is L2 for {{$labels.type}} type |
| Severity | Major |
| Expression | The overload level of the service is L2 due to CPU usage. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="cpu"} |
| Recommended Actions | The alert gets cleared when the CPU usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-53 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L3 for {{$labels.type}} type |
| Summary | {{$labels.service}} service is L3 for {{$labels.type}} type |
| Severity | Critical |
| Expression | The overload level of the service is L3 due to CPU usage. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="cpu"} |
| Recommended Actions | The alert gets cleared when the CPU usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Alerts when service is in overload state due to number of pending messages
Table 8-54 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L1 for {{$labels.type}} type |
| Summary | {{$labels.service}} service is L1 for {{$labels.type}} type |
| Severity | Minor |
| Expression | The overload level of the service is L1 due to number of pending messages. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="svc_pending_count"} |
| Recommended Actions | The alert gets cleared when the number of pending
messages of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-55 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L2 for {{$labels.type}} type |
| Summary | {{$labels.service}} service is L2 for {{$labels.type}} type |
| Severity | Major |
| Expression | The overload level of the service is L2 due to number of pending messages. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="svc_pending_count"} |
| Recommended Actions | The alert gets cleared when the number of pending
messages of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-56 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L3 for {{$labels.type}} type |
| Summary | {{$labels.service}} service is L3 for {{$labels.type}} type |
| Severity | Critical |
| Expression | The overload level of the service is L3 due to number of pending messages. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="svc_pending_count"} |
| Recommended Actions | The alert gets cleared when the number of pending
messages of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Alerts when service is in overload state due to number of failed requests
Table 8-57 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L1 for {{$labels.type}} type. |
| Summary | {{$labels.service}} service is L1 for {{$labels.type}} type. |
| Severity | Minor |
| Expression | The overload level of the service is L1 due to number of failed requests. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="svc_failure_count"} |
| Recommended Actions | The alert gets cleared when the number of failed
requests of the service returns to normal.
For any additional guidance, contact My Oracle Support. |
Table 8-58 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L2 for {{$labels.type}} type. |
| Summary | {{$labels.service}} service is L2 for {{$labels.type}} type. |
| Severity | Major |
| Expression | The overload level of the service is L2 due to number of failed requests. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="svc_failure_count"} |
| Recommended Actions | The alert gets cleared when the number of failed
requests of the service returns to normal.
For any additional guidance, contact My Oracle Support. |
Table 8-59 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is L3 for {{$labels.type}} type. |
| Summary | {{$labels.service}} service is L3 for {{$labels.type}} type. |
| Severity | Critical |
| Expression | The overload level of the service is L3 due to number of failed requests. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
| Metric Used | service_resource_overload_level{type="svc_failure_count"} |
| Recommended Actions | The alert gets cleared when the number of failed
requests of the service returns to normal.
For any additional guidance, contact My Oracle Support. |
8.1.1.46 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_CRITICAL_THRESHOLD
Table 8-60 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | Notification Transaction Error exceeds the critical threshold limit for a given Subscriber Notification server |
| Summary | Transaction Error exceeds the critical threshold limit for a given Subscriber Notification server |
| Severity | Critical |
| Expression | The number of error responses for a given subscriber notification server exceeds the critical threshold of 1000. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.42 |
| Metric Used | http_notification_response_total{responseCode!~"2.*"} |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
Table 8-61 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Notification Transaction Error exceeds the major threshold limit for a given Subscriber Notification server |
| Summary | Transaction Error exceeds the major threshold limit for a given Subscriber Notification server |
| Severity | Major |
| Expression | The number of error responses for a given subscriber notification server exceeds the major threshold value, that is, between 750 and 1000. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.42 |
| Metric Used | http_notification_response_total{responseCode!~"2.*"} |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
Table 8-62 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Notification Transaction Error exceeds the minor threshold limit for a given Subscriber Notification server |
| Summary | Transaction Error exceeds the minor threshold limit for a given Subscriber Notification server |
| Severity | Minor |
| Expression | The number of error responses for a given subscriber notification server exceeds the minor threshold value, that is, between 500 and 750. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.42 |
| Metric Used | http_notification_response_total{responseCode!~"2.*"} |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
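The three subscriber-notification alerts partition the error count for a notification server into bands (above 500, above 750, above 1000). A small illustrative Python helper that mirrors this banding; it is not part of the product, and the exclusive lower bounds are an assumption based on the word "exceeds" in the tables.

```python
def notification_error_severity(error_count: int):
    """Map a subscriber-notification error count to an alert severity.

    Bands per tables 8-60 to 8-62: >1000 critical, >750 major,
    >500 minor; otherwise no alert is raised.
    """
    if error_count > 1000:
        return "critical"
    if error_count > 750:
        return "major"
    if error_count > 500:
        return "minor"
    return None


if __name__ == "__main__":
    for count in (400, 600, 900, 1500):
        print(count, notification_error_severity(count))
```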
8.1.1.47 SYSTEM_IMPAIRMENT_MAJOR
Table 8-63 SYSTEM_IMPAIRMENT_MAJOR
| Field | Details |
|---|---|
| Description | Major impairment alert raised when REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN occurs, or when BINLOG_STORAGE usage is more than 80% for 10 minutes. |
| Summary | Major impairment alert raised when REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN occurs, or when BINLOG_STORAGE usage is more than 80% for 10 minutes. |
| Severity | Major |
| Expression | (db_tier_replication_status{role="failed"} == 0) or (db_tier_replication_status{role="active"} == 0) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="standby"})) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="failed"})) or (avg_over_time(db_tier_binlog_used_bytes_percentage[5m])>= 80) |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.43 |
| Metric Used | db_tier_replication_status and db_tier_binlog_used_bytes_percentage |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.48 SYSTEM_IMPAIRMENT_CRITICAL
Table 8-64 SYSTEM_IMPAIRMENT_CRITICAL
| Field | Details |
|---|---|
| Description | Critical impairment alert raised when REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN occurs, or when BINLOG_STORAGE usage is more than 80% for 30 minutes. |
| Summary | Critical impairment alert raised when REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN occurs, or when BINLOG_STORAGE usage is more than 80% for 30 minutes. |
| Severity | Critical |
| Expression | (db_tier_replication_status{role="failed"} == 0) or (db_tier_replication_status{role="active"} == 0) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="standby"})) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="failed"})) or (avg_over_time(db_tier_binlog_used_bytes_percentage[5m])>= 80) |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.43 |
| Metric Used | db_tier_replication_status and db_tier_binlog_used_bytes_percentage |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
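The impairment expression above packs several replication checks into a single line. The same expression, reformatted as a rule sketch for readability: the `for: 10m` window follows the 10-minute condition in the major alert description (the critical variant would use 30m), while the rule layout itself is illustrative.

```yaml
# Illustrative layout of the SYSTEM_IMPAIRMENT expression (not the shipped file).
- alert: SYSTEM_IMPAIRMENT_MAJOR
  expr: |
    (db_tier_replication_status{role="failed"} == 0)
    or (db_tier_replication_status{role="active"} == 0)
    or (count by (site_name) (db_tier_replication_status)
        == count by (site_name) (db_tier_replication_status{role="standby"}))
    or (count by (site_name) (db_tier_replication_status)
        == count by (site_name) (db_tier_replication_status{role="failed"}))
    or (avg_over_time(db_tier_binlog_used_bytes_percentage[5m]) >= 80)
  for: 10m
  labels:
    severity: major
```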
8.1.1.49 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN
Table 8-65 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN
| Field | Details |
|---|---|
| Description | System Operational State is now in partial shutdown state. |
| Summary | System Operational State is now in partial shutdown state. |
| Severity | Major |
| Expression | system_operational_state == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.17 |
| Metric Used | system_operational_state |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.50 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN
Table 8-66 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN
| Field | Details |
|---|---|
| Description | System Operational State is now in complete shutdown state |
| Summary | System Operational State is now in complete shutdown state |
| Severity | Critical |
| Expression | system_operational_state == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.17 |
| Metric Used | system_operational_state |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.51 TDF_CONNECTION_DOWN
Table 8-67 TDF_CONNECTION_DOWN
| Field | Details |
|---|---|
| Description | TDF connection is down. |
| Summary | TDF connection is down. |
| Severity | Critical |
| Expression | occnp_diam_conn_app_network{applicationName="Sd"} == 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.48 |
| Metric Used | occnp_diam_conn_app_network |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.52 DIAM_CONN_PEER_DOWN
Table 8-68 DIAM_CONN_PEER_DOWN
| Field | Details |
|---|---|
| Description | Diameter connection to peer {{ $labels.peerHost }} is down. |
| Summary | Diameter connection to peer is down. |
| Severity | Major |
| Expression | Diameter connection to peer {{ $labels.peerHost }} in the given namespace is down. |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.50 |
| Metric Used | occnp_diam_conn_network |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.53 DIAM_CONN_NETWORK_DOWN
Table 8-69 DIAM_CONN_NETWORK_DOWN
| Field | Details |
|---|---|
| Description | All the diameter network connections are down. |
| Summary | All the diameter network connections are down. |
| Severity | Critical |
| Expression | sum by (kubernetes_namespace)(occnp_diam_conn_network) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.51 |
| Metric Used | occnp_diam_conn_network |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.54 DIAM_CONN_BACKEND_DOWN
Table 8-70 DIAM_CONN_BACKEND_DOWN
| Field | Details |
|---|---|
| Description | All the diameter backend connections are down. |
| Summary | All the diameter backend connections are down. |
| Severity | Critical |
| Expression | sum by (kubernetes_namespace)(occnp_diam_conn_backend) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.52 |
| Metric Used | occnp_diam_conn_backend |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.55 PerfInfoActiveOverloadThresholdFetchFailed
Table 8-71 PerfInfoActiveOverloadThresholdFetchFailed
| Field | Details |
|---|---|
| Description | The application fails to get the current active overload level threshold data. |
| Summary | The application fails to get the current active overload level threshold data. |
| Severity | Major |
| Expression | active_overload_threshold_fetch_failed == 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.53 |
| Metric Used | active_overload_threshold_fetch_failed |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.56 SLA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-72 SLA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | SLA Sy fail count exceeds the critical threshold limit |
| Summary | SLA Sy fail count exceeds the critical threshold limit |
| Severity | Critical |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 90 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.58 |
| Metric Used |
occnp_diam_response_local_total |
| Recommended Actions |
Check the connectivity between the diam-gw pod(s) and the OCS server. If the user has not been added to the OCS configuration, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.1.1.57 SLA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-73 SLA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
SLA Sy fail count exceeds the major threshold limit |
| Summary |
SLA Sy fail count exceeds the major threshold limit |
| Severity | Major |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 <= 90 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.58 |
| Metric Used |
occnp_diam_response_local_total |
| Recommended Actions |
Check the connectivity between the diam-gw pod(s) and the OCS server. If the user has not been added to the OCS configuration, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.1.1.58 SLA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-74 SLA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
SLA Sy fail count exceeds the minor threshold limit |
| Summary |
SLA Sy fail count exceeds the minor threshold limit |
| Severity | Minor |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 <= 80 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.58 |
| Metric Used |
occnp_diam_response_local_total |
| Recommended Actions |
Check the connectivity between the diam-gw pod(s) and the OCS server. If the user has not been added to the OCS configuration, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.1.1.59 STA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-75 STA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description |
STA Sy fail count exceeds the critical threshold limit. |
| Summary |
STA Sy fail count exceeds the critical threshold limit. |
| Severity | Critical |
| Expression | The failure rate of Sy STA responses is more than 90% of the total responses:
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 90 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.59 |
| Metric Used |
occnp_diam_response_local_total |
| Recommended Actions |
Check the connectivity between the diam-gw pod(s) and the OCS server. If the user has not been added to the OCS configuration, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.1.1.60 STA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-76 STA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
STA Sy fail count exceeds the major threshold limit. |
| Summary |
STA Sy fail count exceeds the major threshold limit. |
| Severity | Major |
| Expression | The failure rate of Sy STA responses is more than 80% and less than or equal to 90% of the total responses:
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 <= 90 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.59 |
| Metric Used |
occnp_diam_response_local_total |
| Recommended Actions |
Check the connectivity between the diam-gw pod(s) and the OCS server. If the user has not been added to the OCS configuration, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.1.1.61 STA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-77 STA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
STA Sy fail count exceeds the minor threshold limit. |
| Summary |
STA Sy fail count exceeds the minor threshold limit. |
| Severity | Minor |
| Expression | The failure rate of Sy STA responses is more than 60% and less than or equal to 80% of the total responses:
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 <= 80 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.59 |
| Metric Used |
occnp_diam_response_local_total |
| Recommended Actions |
Check the connectivity between the diam-gw pod(s) and the OCS server. If the user has not been added to the OCS configuration, configure the user(s). For any additional guidance, contact My Oracle Support. |
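The SLA, STA, and SNA alerts all use the same PromQL pattern: error (non-2xxx) responses as a percentage of all responses over a 5-minute window, compared against the 60/80/90 bands. An illustrative Python equivalent of that classification; it is not part of the product, and the zero-traffic guard is an assumption.

```python
def diameter_failure_severity(error_responses: float, total_responses: float):
    """Classify a Diameter answer failure rate the way the Sy/Rx alerts do.

    rate = error / total * 100; bands per tables 8-72 to 8-77:
    >90 critical, >80 major, >60 minor, otherwise no alert.
    """
    if total_responses == 0:
        return None  # no traffic, nothing to classify
    rate = error_responses / total_responses * 100
    if rate > 90:
        return "critical"
    if rate > 80:
        return "major"
    if rate > 60:
        return "minor"
    return None
```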
8.1.1.62 SMSC_CONNECTION_DOWN
Table 8-78 SMSC_CONNECTION_DOWN
| Field | Details |
|---|---|
| Description | This alert is triggered when connection to SMSC host is down. |
| Summary | Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}} |
| Severity | Major |
| Expression | sum by(namespace, pod, smscName)(occnp_active_smsc_conn_count) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.63 |
| Metric Used | occnp_active_smsc_conn_count |
| Recommended Actions |
Check the connectivity between the notifier service pod(s) and the SMSC host and ensure the connection is restored. For any additional guidance, contact My Oracle Support. |
8.1.1.63 STA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-79 STA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description |
The failure rate of Rx STA responses is more than 90% of the total responses. |
| Summary |
STA Rx fail count exceeds the critical threshold limit. |
| Severity | Critical |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 90 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.64 |
| Metric Used |
occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"} |
| Recommended Actions |
Check the connectivity between the diam-gw pod(s) and the AF and ensure connectivity is present. Check that the session and user are valid and have not been removed from the Policy database; if they have, configure the user(s) again. For any additional guidance, contact My Oracle Support. |
8.1.1.64 STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-80 STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
The failure rate of Rx STA responses is more than 80% and less than or equal to 90% of the total responses. |
| Summary |
STA Rx fail count exceeds the major threshold limit. |
| Severity | Major |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 90 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.64 |
| Metric Used |
occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"} |
| Recommended Actions |
Check the connectivity between the diam-gw pod(s) and the AF and ensure connectivity is present. Check that the session and user are valid and have not been removed from the Policy database; if they have, configure the user(s) again. For any additional guidance, contact My Oracle Support. |
8.1.1.65 STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-81 STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
The failure rate of Rx STA responses is more than 60% and less than or equal to 80% of the total responses. |
| Summary |
STA Rx fail count exceeds the minor threshold limit. |
| Severity | Minor |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 80 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.64 |
| Metric Used |
occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"} |
| Recommended Actions |
Check the connectivity between diam-gw pod(s) and AF and ensure connectivity is present. Check that the session and user are valid and have not been removed from the Policy database, then configure the user(s). For any additional guidance, contact My Oracle Support. |
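Taken together, the three STA_RX_FAIL_COUNT alerts band the same failure ratio at 60%, 80%, and 90%. A minimal sketch of that banding logic in Python (illustrative only; the function name is hypothetical, and the actual evaluation is performed by Prometheus using the expressions above):

```python
def sta_rx_severity(failed, total):
    """Map an Rx STA failure rate onto the severity bands used by
    the STA_RX_FAIL_COUNT_EXCEEDS_* alerts (60/80/90 percent)."""
    if total == 0:
        return None  # no responses, no alert
    rate = failed / total * 100
    if rate > 90:
        return "critical"
    if rate > 80:
        return "major"   # 80 < rate <= 90
    if rate > 60:
        return "minor"   # 60 < rate <= 80
    return None
```

For example, 85 failed responses out of 100 (85%) falls in the major band, matching the expression in Table 8-80.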
8.1.1.66 SNA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-82 SNA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description |
The failure rate of Sy SNA responses is more than 90% of the total responses. |
| Summary |
SNA Sy fail count exceeds the critical threshold limit |
| Severity | Critical |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 90 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.65 |
| Metric Used |
occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"} |
| Recommended Actions |
Check the connectivity between diam-gw pod(s) and the OCS server and ensure connectivity is present. Check that the session and user have not been removed from the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.1.1.67 SNA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-83 SNA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
The failure rate of Sy SNA responses is more than 80% and less than or equal to 90% of the total responses. |
| Summary |
SNA Sy fail count exceeds the major threshold limit |
| Severity | Major |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 <= 90 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.65 |
| Metric Used |
occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"} |
| Recommended Actions |
Check the connectivity between diam-gw pod(s) and the OCS server and ensure connectivity is present. Check that the session and user have not been removed from the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.1.1.68 SNA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-84 SNA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
The failure rate of Sy SNA responses is more than 60% and less than or equal to 80% of the total responses. |
| Summary |
SNA Sy fail count exceeds the minor threshold limit |
| Severity | Minor |
| Expression |
sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 <= 80 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.65 |
| Metric Used |
occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"} |
| Recommended Actions |
Check the connectivity between diam-gw pod(s) and the OCS server and ensure connectivity is present. Check that the session and user have not been removed from the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.1.1.69 STALE_DIAMETER_REQUEST_CLEANUP_MINOR
Table 8-85 STALE_DIAMETER_REQUEST_CLEANUP_MINOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 10% of the received Diameter requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | More than 10% of Diameter requests are being discarded due to processing timeouts. |
| Severity | Minor |
| Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
| Metric Used | occnp_stale_diam_request_cleanup_total, occnp_diam_request_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.70 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
Table 8-86 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 20 % of the received Diameter requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | More than 20% of Diameter requests are being discarded due to processing timeouts. |
| Severity | Major |
| Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
| Metric Used | occnp_stale_diam_request_cleanup_total, occnp_diam_request_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.71 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
Table 8-87 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 30 % of the received Diameter requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | More than 30% of Diameter requests are being discarded due to processing timeouts. |
| Severity | Critical |
| Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
| Metric Used | occnp_stale_diam_request_cleanup_total, occnp_diam_request_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
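All three STALE_DIAMETER_REQUEST_CLEANUP alerts share one ratio: stale-request cleanups divided by received Diameter requests, with heartbeat and connection setup messages (DWR, CER) excluded from the denominator, over a 24-hour window. A small sketch of the ratio (the function name and sample counts are hypothetical):

```python
def stale_diam_pct(stale_cleanups, local_requests):
    """Percentage of received Diameter requests cancelled as stale,
    mirroring occnp_stale_diam_request_cleanup_total /
    occnp_diam_request_local_total{msgType!~"DWR|CER"} * 100."""
    return stale_cleanups / local_requests * 100

# 150 stale cleanups out of 1000 non-DWR/CER requests is 15%,
# which lies in the Minor band (>= 10 and below 20).
pct = stale_diam_pct(150, 1000)
```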
8.1.1.72 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
Table 8-88 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
| Field | Details |
|---|---|
| Description | Certificate expiry in less than 6 months. |
| Summary | Certificate expiry in less than 6 months. |
| Severity | Minor |
| Expression | dgw_tls_cert_expiration_seconds - time() <= 15724800 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.47 |
| Metric Used | dgw_tls_cert_expiration_seconds |
| Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.1.73 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
Table 8-89 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
| Field | Details |
|---|---|
| Description | Certificate expiry in less than 3 months. |
| Summary | Certificate expiry in less than 3 months. |
| Severity | Major |
| Expression | dgw_tls_cert_expiration_seconds - time() <= 7862400 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.47 |
| Metric Used | dgw_tls_cert_expiration_seconds |
| Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.1.74 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
Table 8-90 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
| Field | Details |
|---|---|
| Description | Certificate expiry in less than 1 month. |
| Summary | Certificate expiry in less than 1 month. |
| Severity | Critical |
| Expression | dgw_tls_cert_expiration_seconds - time() <= 2592000 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.47 |
| Metric Used | dgw_tls_cert_expiration_seconds |
| Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
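The three expiry windows in the expressions above are expressed in seconds. As a quick check of those constants, assuming the windows approximate 6 months as 182 days, 3 months as 91 days, and 1 month as 30 days:

```python
DAY = 86_400  # seconds in a day

# The alert expressions compare (certificate expiry time - now) against:
MINOR_WINDOW = 182 * DAY     # ~6 months -> 15724800 seconds
MAJOR_WINDOW = 91 * DAY      # ~3 months -> 7862400 seconds
CRITICAL_WINDOW = 30 * DAY   # ~1 month  -> 2592000 seconds
```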
8.1.1.75 DGW_TLS_CONNECTION_FAILURE
Table 8-91 DGW_TLS_CONNECTION_FAILURE
| Field | Details |
|---|---|
| Description | Alert for TLS connection establishment. |
| Summary | TLS Connection failure when Diam gateway is an initiator. |
| Severity | Major |
| Expression | sum by (namespace,reason)(occnp_diam_failed_conn_network) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.81 |
| Metric Used | occnp_diam_failed_conn_network |
| Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
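Each table entry in this chapter corresponds to a Prometheus alerting rule in the Alertrules.yaml file. A hedged sketch of how the DGW_TLS_CONNECTION_FAILURE entry might be expressed (the expression and severity come from the table above; the remaining fields are illustrative, so consult the shipped Alertrules.yaml for the authoritative form):

```yaml
- alert: DGW_TLS_CONNECTION_FAILURE
  expr: sum by (namespace, reason) (occnp_diam_failed_conn_network) > 0
  labels:
    severity: major
  annotations:
    summary: "TLS connection failure when Diam gateway is an initiator"
```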
8.1.1.76 POLICY_CONNECTION_FAILURE
Table 8-92 POLICY_CONNECTION_FAILURE
| Field | Details |
|---|---|
| Description | Connection failure on Egress and Ingress Gateways for incoming and outgoing connections. |
| Summary | Connection failure on Egress and Ingress Gateways for incoming and outgoing connections. |
| Severity | Major |
| Expression | sum(increase(occnp_oc_ingressgateway_connection_failure_total[5m]) >0 or (occnp_oc_ingressgateway_connection_failure_total unless occnp_oc_ingressgateway_connection_failure_total offset 5m )) by (namespace,app, error_reason) > 0 or sum(increase(occnp_oc_egressgateway_connection_failure_total[5m]) >0 or (occnp_oc_egressgateway_connection_failure_total unless occnp_oc_egressgateway_connection_failure_total offset 5m )) by (namespace,app, error_reason) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.76 |
| Metric Used | occnp_oc_ingressgateway_connection_failure_total, occnp_oc_egressgateway_connection_failure_total |
| Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.1.77 AUDIT_NOT_RUNNING
Table 8-93 AUDIT_NOT_RUNNING
| Field | Details |
|---|---|
| Description | Audit has not been running for at least 1 hour. |
| Summary | Audit has not been running for at least 1 hour. |
| Severity | CRITICAL |
| Expression | (absent_over_time(data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h]) == 1) OR (sum(increase(data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h])) == 0) |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.78 |
| Metric Used | data_repository_invocations_seconds_count |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.78 DIAMETER_POD_ERROR_RESPONSE_MINOR
Table 8-94 DIAMETER_POD_ERROR_RESPONSE_MINOR
| Field | Details |
|---|---|
| Description | At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER. |
| Summary | At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER. |
| Severity | MINOR |
| Expression | (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.79 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.79 DIAMETER_POD_ERROR_RESPONSE_MAJOR
Table 8-95 DIAMETER_POD_ERROR_RESPONSE_MAJOR
| Field | Details |
|---|---|
| Description | At least 5% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER. |
| Summary | At least 5% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER. |
| Severity | MAJOR |
| Expression | (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=5 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.79 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.80 DIAMETER_POD_ERROR_RESPONSE_CRITICAL
Table 8-96 DIAMETER_POD_ERROR_RESPONSE_CRITICAL
| Field | Details |
|---|---|
| Description | At least 10% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER |
| Summary | At least 10% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER |
| Severity | CRITICAL |
| Expression | (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.79 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.1.81 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
Table 8-97 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | lockAcquisitionExceedsCriticalThreshold |
| Description | The count of lock requests that fail to acquire the lock exceeds the critical threshold limit (current value: {{ $value }}). |
| Summary | Keys used in Bulwark lock requests that are already in a locked state detected above 75 percent of total transactions. |
| Severity | Critical |
| Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
| Metric Used | lock_request_total, lock_response_total |
| Recommended Actions |
Cause: This alert fires when, within a 5-minute window, more than 75% of lock acquisition requests (acquireLock) to the Bulwark service in any namespace fail. Elevated lock acquisition failure rates may indicate:
Diagnostic Information
Recovery
|
8.1.1.82 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
Table 8-98 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | lockAcquisitionExceedsMajorThreshold |
| Description | The count of lock requests that fail to acquire the lock exceeds the major threshold limit (current value: {{ $value }}). |
| Summary | Keys used in Bulwark lock requests that are already in a locked state detected above 50 percent of total transactions. |
| Severity | Major |
| Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 50 and (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 < 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
| Metric Used | lock_request_total, lock_response_total |
| Recommended Actions |
Cause: This alert fires when, within a 5-minute window, between 50% and 75% of lock acquisition requests (acquireLock) to the Bulwark service in any namespace fail. Elevated lock acquisition failure rates may indicate:
Diagnostic Information
Recovery
|
8.1.1.83 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
Table 8-99 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | lockAcquisitionExceedsMinorThreshold |
| Description | The count of lock requests that fail to acquire the lock exceeds the minor threshold limit (current value: {{ $value }}). |
| Summary | Keys used in Bulwark lock requests that are already in a locked state detected above 20 percent of total transactions. |
| Severity | Minor |
| Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 20 and (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
| Metric Used | lock_request_total, lock_response_total |
| Recommended Actions |
Cause: This alert fires when, within a 5-minute window, between 20% and 50% of lock acquisition requests (acquireLock) to the Bulwark service in any namespace fail. Elevated lock acquisition failure rates may indicate:
Diagnostic Information
Recovery
|
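The three LOCK_ACQUISITION_EXCEEDS alerts band the same 5-minute acquireLock failure percentage at 20%, 50%, and 75%. A minimal sketch of the banding (the function name is hypothetical; Prometheus performs the actual evaluation):

```python
def lock_failure_severity(failures, requests):
    """Band the Bulwark acquireLock failure percentage per the
    LOCK_ACQUISITION_EXCEEDS_* thresholds (20/50/75 percent)."""
    if requests == 0:
        return None  # no lock traffic, no alert
    pct = failures / requests * 100
    if pct >= 75:
        return "critical"
    if pct >= 50:
        return "major"   # 50 <= pct < 75
    if pct >= 20:
        return "minor"   # 20 <= pct < 50
    return None
```

Note the lock alerts use inclusive lower bounds (>= 20, >= 50, >= 75), unlike the STA/SNA alerts, which use strict lower bounds.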
8.1.1.84 CERTIFICATE_EXPIRY_MINOR
Table 8-100 CERTIFICATE_EXPIRY_MINOR
| Field | Details |
|---|---|
| Description | Certificate expiry in less than 6 months |
| Summary | Certificate expiry in less than 6 months |
| Severity | MINOR |
| Expression | security_cert_x509_expiration_seconds - time() <= 15724800 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.77 |
| Metric Used | - |
| Recommended Actions | - |
8.1.1.85 CERTIFICATE_EXPIRY_MAJOR
Table 8-101 CERTIFICATE_EXPIRY_MAJOR
| Field | Details |
|---|---|
| Description | Certificate expiry in less than 3 months |
| Summary | Certificate expiry in less than 3 months |
| Severity | MAJOR |
| Expression | security_cert_x509_expiration_seconds - time() <= 7862400 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.77 |
| Metric Used | - |
| Recommended Actions | - |
8.1.1.86 CERTIFICATE_EXPIRY_CRITICAL
Table 8-102 CERTIFICATE_EXPIRY_CRITICAL
| Field | Details |
|---|---|
| Description | Certificate expiry in less than 1 month |
| Summary | Certificate expiry in less than 1 month |
| Severity | CRITICAL |
| Expression | security_cert_x509_expiration_seconds - time() <= 2592000 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.77 |
| Metric Used | - |
| Recommended Actions | - |
8.1.1.87 PERF_INFO_ACTIVE_OVERLOADTHRESHOLD_DATA_PRESENT
Table 8-103 PERF_INFO_ACTIVE_OVERLOADTHRESHOLD_DATA_PRESENT
| Field | Details |
|---|---|
| Description | - |
| Summary | - |
| Severity | MINOR |
| Expression | active_overload_threshold_fetch_failed == 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.53 |
| Metric Used | - |
| Recommended Actions | - |
8.1.1.88 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-104 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
| Field | Details |
|---|---|
| Description | More than 10% of incoming requests towards UDR-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Summary | More than 10% of incoming requests towards UDR-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Severity | MINOR |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.85 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.89 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-105 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
| Field | Details |
|---|---|
| Description | More than 20% of incoming requests towards UDR-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Summary | More than 20% of incoming requests towards UDR-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Severity | MAJOR |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.85 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.90 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-106 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
| Field | Details |
|---|---|
| Description | More than 30% of incoming requests towards UDR-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Summary | More than 30% of incoming requests towards UDR-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Severity | CRITICAL |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.85 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.91 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-107 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
| Field | Details |
|---|---|
| Description | More than 10% of incoming requests towards CHF-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Summary | More than 10% of incoming requests towards CHF-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Severity | MINOR |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.86 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.92 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-108 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
| Field | Details |
|---|---|
| Description | More than 20% of incoming requests towards CHF-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Summary | More than 20% of incoming requests towards CHF-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Severity | MAJOR |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.86 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.93 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-109 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
| Field | Details |
|---|---|
| Description | More than 30% of incoming requests towards CHF-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Summary | More than 30% of incoming requests towards CHF-connector are rejected due to the request being stale on arrival or during processing by the connector |
| Severity | CRITICAL |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.86 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.94 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
Table 8-110 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
| Field | Details |
|---|---|
| Description | Policy Egress Gateway Data Director unreachable for {{$labels.namespace}}. |
| Summary | kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} Policy Egress Gateway Data Director unreachable |
| Severity | Major |
| Expression | sum(oc_egressgateway_dd_unreachable) by(namespace,container) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.84 |
| Metric Used | oc_egressgateway_dd_unreachable |
| Recommended Actions | Alert gets cleared automatically when the connection with data director is established. |
8.1.1.95 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
Table 8-111 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
| Field | Details |
|---|---|
| Description | Policy Ingress Gateway Data Director unreachable for {{$labels.namespace}}. |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} Policy Ingress Gateway Data Director unreachable' |
| Severity | Major |
| Expression | sum(oc_ingressgateway_dd_unreachable) by(namespace,container) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.83 |
| Metric Used | oc_ingressgateway_dd_unreachable |
| Recommended Actions | Alert gets cleared automatically when the connection with data director is established. |
8.1.1.96 STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-112 STALE_HTTP_REQUEST_CLEANUP_CRITICAL
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 30 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | This alert is triggered when more than 30 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Severity | Critical |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.97 STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-113 STALE_HTTP_REQUEST_CLEANUP_MAJOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 20 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | This alert is triggered when more than 20 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Severity | Major |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.98 STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-114 STALE_HTTP_REQUEST_CLEANUP_MINOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 10 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | This alert is triggered when more than 10 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Severity | Minor |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.99 STALE_BINDING_REQUEST_REJECTION_CRITICAL
Table 8-115 STALE_BINDING_REQUEST_REJECTION_CRITICAL
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 30 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | {{ $value }}% of requests are being discarded by the binding service due to the request being stale either on arrival or during processing. More than 30% of the Binding requests failed with error TIMED_OUT_REQUEST. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*binding"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) / (sum by (namespace) (rate(ocpm_binding_inbound_request_total{microservice=~".*binding"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.87 |
| Metric Used | occnp_late_processing_rejection_total, occnp_late_arrival_rejection_total, ocpm_binding_inbound_request_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.100 STALE_BINDING_REQUEST_REJECTION_MAJOR
Table 8-116 STALE_BINDING_REQUEST_REJECTION_MAJOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 20 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | {{ $value }}% of requests are being discarded by the binding service due to the request being stale either on arrival or during processing. More than 20% of the Binding requests failed with error TIMED_OUT_REQUEST. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m])))/(sum by (namespace) (rate(ocpm_binding_inbound_request_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.87 |
| Metric Used | occnp_late_processing_rejection_total, occnp_late_arrival_rejection_total, ocpm_binding_inbound_request_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.101 STALE_BINDING_REQUEST_REJECTION_MINOR
Table 8-117 STALE_BINDING_REQUEST_REJECTION_MINOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 10 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them). |
| Summary | {{ $value }}% of requests are being discarded by the binding service due to the request being stale either on arrival or during processing. More than 10% of the Binding requests failed with error TIMED_OUT_REQUEST. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*binding"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) / (sum by (namespace) (rate(ocpm_binding_inbound_request_total{microservice=~".*binding"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.87 |
| Metric Used | occnp_late_processing_rejection_total, occnp_late_arrival_rejection_total, ocpm_binding_inbound_request_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
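All three STALE_BINDING_REQUEST_REJECTION alerts compute one percentage, and its denominator adds the late-arrival rejections to the inbound request count (presumably because requests rejected on arrival are not counted by the inbound metric). A small sketch of the ratio (the function name and sample counts are hypothetical):

```python
def stale_binding_pct(late_processing, late_arrival, inbound):
    """(late_processing + late_arrival) / (inbound + late_arrival) * 100,
    mirroring the STALE_BINDING_REQUEST_REJECTION expressions."""
    return (late_processing + late_arrival) / (inbound + late_arrival) * 100

# 10 late-processing + 20 late-arrival rejections against 80 inbound
# requests gives 30 / 100 = 30%, which meets the Critical threshold.
pct = stale_binding_pct(10, 20, 80)
```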
8.1.1.102 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-118 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
| Field | Details |
|---|---|
| Description | At least 10 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them). |
| Summary | At least 10 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them). |
| Severity | Minor |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | - |
8.1.1.103 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-119 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
| Field | Details |
|---|---|
| Description | At least 20 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them). |
| Summary | At least 20 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them). |
| Severity | Major |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.104 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-120 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
| Field | Details |
|---|---|
| Description | At least 30 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them). |
| Summary | At least 30 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them). |
| Severity | Critical |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.105 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-121 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
| Field | Details |
|---|---|
| Description | At least 10 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them). |
| Summary | At least 10 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them). |
| Severity | Minor |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.106 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-122 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
| Field | Details |
|---|---|
| Description | At least 20% of the received HTTP requests are cancelled per operation type because they are stale (received too late, or took too long to process). |
| Summary | At least 20% of the received HTTP requests are cancelled per operation type because they are stale (received too late, or took too long to process). |
| Severity | Major |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.107 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-123 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
| Field | Details |
|---|---|
| Description | At least 30% of the received HTTP requests are cancelled per operation type because they are stale (received too late, or took too long to process). |
| Summary | At least 30% of the received HTTP requests are cancelled per operation type because they are stale (received too late, or took too long to process). |
| Severity | Critical |
| Expression | - |
| OID | - |
| Metric Used | - |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.1.108 UPDATE_NOTIFY_TIMEOUT_ABOVE_70_PERCENT
Table 8-124 UPDATE_NOTIFY_TIMEOUT_ABOVE_70_PERCENT
| Field | Details |
|---|---|
| Description | The number of Update Notify requests that failed because of a timeout is equal to or above 70% in a given time period. |
| Summary | The number of Update Notify requests that failed because of a timeout is equal to or above 70% in a given time period. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70 |
| OID | - |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.1.109 UPDATE_NOTIFY_TIMEOUT_ABOVE_50_PERCENT
Table 8-125 UPDATE_NOTIFY_TIMEOUT_ABOVE_50_PERCENT
| Field | Details |
|---|---|
| Description | The number of Update Notify requests that failed because of a timeout is equal to or above 50% but less than 70% in a given time period. |
| Summary | The number of Update Notify requests that failed because of a timeout is equal to or above 50% but less than 70% in a given time period. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70 |
| OID | - |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.1.110 UPDATE_NOTIFY_TIMEOUT_ABOVE_30_PERCENT
Table 8-126 UPDATE_NOTIFY_TIMEOUT_ABOVE_30_PERCENT
| Field | Details |
|---|---|
| Description | The number of Update Notify requests that failed because of a timeout is equal to or above 30% but less than 50% of total Rx sessions. |
| Summary | The number of Update Notify requests that failed because of a timeout is equal to or above 30% but less than 50% of total Rx sessions. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50 |
| OID | - |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
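The alerts in this family are defined as Prometheus rules in the Alertrules.yaml file mentioned at the beginning of this chapter. The following is a minimal sketch of how the critical-threshold rule from Table 8-124 could be expressed; the group name, `for` duration, and annotation text are illustrative assumptions, while the expression is taken from the table.

```yaml
groups:
  - name: occnp_update_notify_alerts   # group name is an assumption
    rules:
      - alert: UPDATE_NOTIFY_TIMEOUT_ABOVE_70_PERCENT
        # Ratio of timed-out Update Notify requests over all Update Notify
        # responses, expressed as a percentage (expression from Table 8-124).
        expr: |
          (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m]))
           / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70
        for: 5m                         # hold duration is an assumption
        labels:
          severity: critical
        annotations:
          summary: 'Update Notify timeout failures are at or above 70% in namespace {{ $labels.namespace }}'
```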
8.1.1.111 POLICYDS_PREEXPIRY_RESUBSCRIBE_FAILURE_MINOR
Table 8-127 POLICYDS_PREEXPIRY_RESUBSCRIBE_FAILURE_MINOR
| Field | Details |
|---|---|
| Description | This alert is raised when 30% to 50% of subscriptions in the PRE_EXPIRY period fail to resubscribe. |
| Summary | This alert is raised when 30% to 50% of subscriptions in the PRE_EXPIRY period fail to resubscribe. |
| Severity | Minor |
| Expression | (sum by (microservice, namespace) (rate(occnp_policy_data_resubscription_response_total{expiryStatus="PRE_EXPIRY",response!~"2.*"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_policy_data_resubscription_response_total{expiryStatus="PRE_EXPIRY"}[5m]))) * 100 > 30 <= 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.129 |
| Metric Used | occnp_policy_data_resubscription_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.1.112 POLICYDS_PREEXPIRY_RESUBSCRIBE_FAILURE_MAJOR
Table 8-128 POLICYDS_PREEXPIRY_RESUBSCRIBE_FAILURE_MAJOR
| Field | Details |
|---|---|
| Description | This alert is raised when 50% to 70% of subscriptions in the PRE_EXPIRY period fail to resubscribe. |
| Summary | This alert is raised when 50% to 70% of subscriptions in the PRE_EXPIRY period fail to resubscribe. |
| Severity | Major |
| Expression | (sum by (microservice, namespace) (rate(occnp_policy_data_resubscription_response_total{expiryStatus="PRE_EXPIRY",response!~"2.*"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_policy_data_resubscription_response_total{expiryStatus="PRE_EXPIRY"}[5m]))) * 100 > 50 <= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.129 |
| Metric Used | occnp_policy_data_resubscription_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.1.113 POLICYDS_PREEXPIRY_RESUBSCRIBE_FAILURE_CRITICAL
Table 8-129 POLICYDS_PREEXPIRY_RESUBSCRIBE_FAILURE_CRITICAL
| Field | Details |
|---|---|
| Description | This alert is raised when more than 70% of subscriptions in the PRE_EXPIRY period fail to resubscribe. |
| Summary | This alert is raised when more than 70% of subscriptions in the PRE_EXPIRY period fail to resubscribe. |
| Severity | Critical |
| Expression | (sum by (microservice, namespace) (rate(occnp_policy_data_resubscription_response_total{expiryStatus="PRE_EXPIRY",response!~"2.*"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_policy_data_resubscription_response_total{expiryStatus="PRE_EXPIRY"}[5m]))) * 100 > 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.129 |
| Metric Used | occnp_policy_data_resubscription_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.1.114 POLICYDS_EXPIRED_SUBSCRIPTION
Table 8-130 POLICYDS_EXPIRED_SUBSCRIPTION
| Field | Details |
|---|---|
| Description | If more than 10% of audited subscriptions are expired, this alert will be raised. |
| Summary | If more than 10% of audited subscriptions are expired, this alert will be raised. |
| Severity | Major |
| Expression | (sum by (microservice, namespace) (rate(occnp_policy_data_resubscription_request_total{expiryStatus="EXPIRED"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_policy_data_resubscription_request_total[5m]))) * 100 > 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.130 |
| Metric Used | occnp_policy_data_resubscription_request_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
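The ratio evaluated by the alert in Table 8-130 can also be checked ad hoc in the Prometheus expression browser, before or while the alert fires. A sketch using the same metric and labels as the table:

```promql
# Percentage of audited subscriptions that are expired,
# broken down per microservice and namespace.
(sum by (microservice, namespace)
   (rate(occnp_policy_data_resubscription_request_total{expiryStatus="EXPIRED"}[5m])))
/
(sum by (microservice, namespace)
   (rate(occnp_policy_data_resubscription_request_total[5m]))) * 100
```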
8.1.1.115 LDAP_PEER_CONNECTION_LOST
Table 8-131 LDAP_PEER_CONNECTION_LOST
| Field | Details |
|---|---|
| Name in Alert Yaml File | LDAP_PEER_CONNECTION_LOST |
| Description | This alert is triggered when the LDAP Gateway loses connection to its LDAP peer(s). It is based on the value of the occnp_ldap_conn_total metric falling to zero. The connection re-attempt and alert clearance behavior is governed by a new configuration parameter, LDAP_CONNECTION_REVERT_DELAY. |
| Summary | LDAP Gateway loses connection to its LDAP peer(s). |
| Severity | Major |
| Expression | sum by (namespace,peer)(occnp_ldap_conn_total) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.113 |
| Metric Used | occnp_ldap_conn_total |
| Recommended Actions | - |
8.1.1.116 IGW_POD_PROTECTION_DOC_STATE
Table 8-132 IGW_POD_PROTECTION_DOC_STATE
| Field | Details |
|---|---|
| Description | The Ingress Gateway is in Danger_of_Congestion Level for the pod {{$labels.pod}} in namespace {{$labels.namespace}} ( current congestion level: {{ $value }} % ) |
| Summary | Ingress Gateway pod congestion state in Danger_of_Congestion Level. |
| Severity | Minor |
| Expression | oc_ingressgateway_congestion_system_state{microservice=~".*ingress-gateway"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.123 |
| Metric Used | oc_ingressgateway_congestion_system_state |
| Recommended Actions | The alert is cleared when the pod CPU consumption drops below the configured abatement value for the DOC level. |
8.1.1.117 IGW_POD_PROTECTION_CONGESTED_STATE
Table 8-133 IGW_POD_PROTECTION_CONGESTED_STATE
| Field | Details |
|---|---|
| Description | The Ingress Gateway is in Congested Level for the pod {{$labels.pod}} in namespace {{$labels.namespace}} ( current congestion level: {{ $value }} % ) |
| Summary | Ingress Gateway pod congestion state in Congested level. |
| Severity | Critical |
| Expression | sum(oc_ingressgateway_congestion_system_state{app_kubernetes_io_name="occnp-ingress-gateway"}) by (pod) == 4 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.123 |
| Metric Used | oc_ingressgateway_congestion_system_state |
| Recommended Actions | The alert is cleared when the pod CPU consumption drops below the configured abatement value for the Congested level. |
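When alerts are forwarded through the SNMP Notifier referenced at the start of this chapter, the OID from the table typically travels with the rule as an annotation. A minimal sketch for the Congested-level rule; the `oid` annotation key and summary wording are assumptions, while the expression, severity, and OID value are from Table 8-133.

```yaml
- alert: IGW_POD_PROTECTION_CONGESTED_STATE
  # Gauge value 4 corresponds to the Congested level per Table 8-133.
  expr: sum(oc_ingressgateway_congestion_system_state{app_kubernetes_io_name="occnp-ingress-gateway"}) by (pod) == 4
  labels:
    severity: critical
  annotations:
    oid: '1.3.6.1.4.1.323.5.3.52.1.2.123'   # annotation key is an assumption
    summary: 'Ingress Gateway pod congestion state in Congested level.'
```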
8.1.2 PCF Alerts
This section provides information on PCF alerts.
8.1.2.1 UDR_SM_IMMREP_RESPONSE_MISSING_DATA_MINOR
Table 8-134 UDR_SM_IMMREP_RESPONSE_MISSING_DATA_MINOR
| Field | Details |
|---|---|
| Description | UDR returned a POST subscribe response without user data for SM as part of immediate reporting for more than 10% of traffic for service {{$labels.microservice}} in {{$labels.namespace}} (current value: {{ $value }}%). |
| Summary | For 10% or more but less than 20% of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Severity | Minor |
| Expression | (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post"}[5m]))) * 100 >= 10 < 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.127 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting. Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.2 UDR_SM_IMMREP_RESPONSE_MISSING_DATA_MAJOR
Table 8-135 UDR_SM_IMMREP_RESPONSE_MISSING_DATA_MAJOR
| Field | Details |
|---|---|
| Description | For 20% or more but less than 30% of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Summary | For 20% or more but less than 30% of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Severity | Major |
| Expression | (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post"}[5m]))) * 100 >= 20 < 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.127 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting. Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.3 UDR_SM_IMMREP_RESPONSE_MISSING_DATA_CRITICAL
Table 8-136 UDR_SM_IMMREP_RESPONSE_MISSING_DATA_CRITICAL
| Field | Details |
|---|---|
| Description | For 30% or more of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Summary | For 30% or more of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Severity | Critical |
| Expression | (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.127 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting. Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.4 UDR_SM_IMMREP_FEATURE_NEGOTIATION_FAILED_MINOR
Table 8-137 UDR_SM_IMMREP_FEATURE_NEGOTIATION_FAILED_MINOR
| Field | Details |
|---|---|
| Description | For 10% or more but less than 20% of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Summary | For 10% or more but less than 20% of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Severity | Minor |
| Expression | (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post"}[5m]))) * 100 >= 10 < 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.128 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting. Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.5 UDR_SM_IMMREP_FEATURE_NEGOTIATION_FAILED_MAJOR
Table 8-138 UDR_SM_IMMREP_FEATURE_NEGOTIATION_FAILED_MAJOR
| Field | Details |
|---|---|
| Description | For 20% or more but less than 30% of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Summary | For 20% or more but less than 30% of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Severity | Major |
| Expression | (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post"}[5m]))) * 100 >= 20 < 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.128 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting. Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.6 UDR_SM_IMMREP_FEATURE_NEGOTIATION_FAILED_CRITICAL
Table 8-139 UDR_SM_IMMREP_FEATURE_NEGOTIATION_FAILED_CRITICAL
| Field | Details |
|---|---|
| Description | For 30% or more of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Summary | For 30% or more of the traffic, UDR returned a POST subscribe response without user data for SM as part of immediate reporting. |
| Severity | Critical |
| Expression | (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (microservice, namespace) (rate(occnp_immrep_response_total{service_subresource="sm-data",operation_type="post"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.128 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting. Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.7 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MINOR
Table 8-140 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MINOR
| Field | Details |
|---|---|
| Description | The Diameter requests are being discarded due to timeout processing occurring above 10% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | More than 10% of the Diam Connector requests failed with error DIAMETER_ERROR_TIMED_OUT_REQUEST. |
| Severity | Minor |
| Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.88 |
| Metric Used | occnp_stale_diam_request_cleanup_total, occnp_diam_request_local_total |
| Recommended Actions |
The alert gets cleared when the number of stale requests is below 10% of the total requests. To troubleshoot and resolve the issue, perform the following steps:
For further assistance, contact My Oracle Support. |
8.1.2.8 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MAJOR
Table 8-141 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MAJOR
| Field | Details |
|---|---|
| Description | The Diameter requests are being discarded due to timeout processing occurring above 20% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | More than 20% of the Diam Connector requests failed with error DIAMETER_ERROR_TIMED_OUT_REQUEST. |
| Severity | Major |
| Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.88 |
| Metric Used | occnp_stale_diam_request_cleanup_total, occnp_diam_request_local_total |
| Recommended Actions |
The alert gets cleared when the number of stale requests is below 20% of the total requests. To troubleshoot and resolve the issue, perform the following steps:
For further assistance, contact My Oracle Support. |
8.1.2.9 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_CRITICAL
Table 8-142 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_CRITICAL
| Field | Details |
|---|---|
| Description | The Diameter requests are being discarded due to timeout processing occurring above 30% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | More than 30% of the Diam Connector requests failed with error DIAMETER_ERROR_TIMED_OUT_REQUEST. |
| Severity | Critical |
| Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.88 |
| Metric Used | occnp_stale_diam_request_cleanup_total, occnp_diam_request_local_total |
| Recommended Actions |
The alert gets cleared when the number of stale requests is below 30% of the total requests. To troubleshoot and resolve the issue, perform the following steps:
For further assistance, contact My Oracle Support. |
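To identify which pod is discarding stale requests before the thresholds above are crossed, the per-pod ratio used by these alerts can be queried directly (note that the label value `diam-connector` must be quoted in PromQL):

```promql
# Per-pod percentage of stale Diam Connector requests over the last 5 minutes.
(sum by (namespace, microservice, pod)
   (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m])))
/
(sum by (namespace, microservice, pod)
   (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100
```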
8.1.2.10 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_CRITICAL_THRESHOLD
Table 8-143 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | {{ $value }}% of bindings were missing but restored from BSF, out of all bindings audited in {{$labels.namespace}}. |
| Summary | 70% or more of bindings were missing but restored from BSF, out of all bindings audited. |
| Severity | Critical |
| Expression | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code="2xx",action="restored"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.89 |
| Metric Used | occnp_session_binding_revalidation_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.11 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MAJOR_THRESHOLD
Table 8-144 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | {{ $value }}% of bindings were missing but restored from BSF, out of all bindings audited in {{$labels.namespace}}. |
| Summary | 50% to 70% of bindings were missing but restored from BSF, out of all bindings audited. |
| Severity | Major |
| Expression | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code="2xx",action="restored"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 50 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.89 |
| Metric Used | occnp_session_binding_revalidation_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.12 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MINOR_THRESHOLD
Table 8-145 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | {{ $value }}% of bindings were missing but restored from BSF, out of all bindings audited in {{$labels.namespace}}. |
| Summary | 30% to 50% of bindings were missing but restored from BSF, out of all bindings audited. |
| Severity | Minor |
| Expression | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx",action="restored"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 30 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.89 |
| Metric Used | occnp_session_binding_revalidation_response_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.13 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_CRITICAL_THRESHOLD
Table 8-146 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | {{ $value }}% of the Revalidation Responses received from BSF failed, out of total Revalidation Responses in {{$labels.namespace}}. |
| Summary | 70% or more of the Revalidation Responses received from BSF failed, out of total Revalidation Responses. |
| Severity | Critical |
| Expression | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.90 |
| Metric Used | occnp_session_binding_revalidation_response_total |
| Recommended Actions | Verify the health condition of BSF Management Service. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.14 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MAJOR_THRESHOLD
Table 8-147 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | {{ $value }}% of the Revalidation Responses received from BSF failed, out of total Revalidation Responses in {{$labels.namespace}}. |
| Summary | 50% to 70% of the Revalidation Responses received from BSF failed, out of total Revalidation Responses. |
| Severity | Major |
| Expression | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 50 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.90 |
| Metric Used | occnp_session_binding_revalidation_response_total |
| Recommended Actions | Verify the health condition of BSF Management Service. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.15 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MINOR_THRESHOLD
Table 8-148 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | {{ $value }}% of the Revalidation Responses received from BSF failed, out of total Revalidation Responses in {{$labels.namespace}}. |
| Summary | 30% to 50% of the Revalidation Responses received from BSF failed, out of total Revalidation Responses. |
| Severity | Minor |
| Expression | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 30 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.90 |
| Metric Used | occnp_session_binding_revalidation_response_total |
| Recommended Actions | Verify the health condition of BSF Management Service. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.16 N7_OPTIMIZED_LOOKUP_ERROR_RATE_ABOVE_MINOR_THRESHOLD_PERCENT
Table 8-149 N7_OPTIMIZED_LOOKUP_ERROR_RATE_ABOVE_MINOR_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }}% of primary key lookups failed during PA create in namespace {{$labels.namespace}}. |
| Summary | Primary Key lookup failures are equal to or above 10% but less than 50% of total PA creates. |
| Severity | Minor |
| Expression | sum by (namespace)(increase(occnp_optimized_smpolicyassociation_lookup_query_total{status="failed"}[30m])) / sum by (namespace)(increase(occnp_optimized_smpolicyassociation_lookup_query_total[30m])) * 100 >= 10 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.124 |
| Metric Used | occnp_optimized_smpolicyassociation_lookup_query_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.17 N7_OPTIMIZED_LOOKUP_ERROR_RATE_ABOVE_MAJOR_THRESHOLD_PERCENT
Table 8-150 N7_OPTIMIZED_LOOKUP_ERROR_RATE_ABOVE_MAJOR_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }}% of primary key lookups failed during PA create in namespace {{$labels.namespace}}. |
| Summary | Primary Key lookup failures are equal to or above 50% but less than 75% of total PA creates. |
| Severity | Major |
| Expression | sum by (namespace)(increase(occnp_optimized_smpolicyassociation_lookup_query_total{status="failed"}[30m])) / sum by (namespace)(increase(occnp_optimized_smpolicyassociation_lookup_query_total[30m])) * 100 >= 50 < 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.124 |
| Metric Used | occnp_optimized_smpolicyassociation_lookup_query_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.18 N7_OPTIMIZED_LOOKUP_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD_PERCENT
Table 8-151 N7_OPTIMIZED_LOOKUP_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }}% of primary key lookups failed during PA create in namespace {{$labels.namespace}}. |
| Summary | Primary Key lookup failures are equal to or above 75% of total PA creates. |
| Severity | Critical |
| Expression | sum by (namespace)(increase(occnp_optimized_smpolicyassociation_lookup_query_total{status="failed"}[30m])) / sum by (namespace)(increase(occnp_optimized_smpolicyassociation_lookup_query_total[30m])) * 100 >= 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.124 |
| Metric Used | occnp_optimized_smpolicyassociation_lookup_query_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
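The minor and major expressions above band the failure ratio with two chained comparisons. In PromQL, comparison operators act as filters on the result vector, so `>= 50 < 75` first keeps only series at or above 50 and then keeps only those still below 75, confining each alert to a single band. For example, the major band alone:

```promql
# Major band only: the ratio is kept when it is >= 50 and < 75.
sum by (namespace)(increase(occnp_optimized_smpolicyassociation_lookup_query_total{status="failed"}[30m]))
/ sum by (namespace)(increase(occnp_optimized_smpolicyassociation_lookup_query_total[30m]))
* 100 >= 50 < 75
```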
8.1.2.19 SM_SVC_REQ_ENHANCED_OVERLOAD_REJECTION_MINOR
Table 8-152 SM_SVC_REQ_ENHANCED_OVERLOAD_REJECTION_MINOR
| Field | Details |
|---|---|
| Description | {{ $value }}% of incoming requests towards the pcf_sm service are rejected due to the enhanced overload control mechanism. |
| Summary | At least 10% of the received requests have been rejected due to the Overload state of the pcf-sm service in namespace {{$labels.namespace}}. |
| Severity | Minor |
| Expression | ( sum by (namespace) (rate(occnp_enhanced_overload_reject_total{microservice=~".*pcf_sm"}[2m])) / (sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_sm"}[2m]) or occnp_enhanced_overload_reject_total * 0) + (sum by (namespace) (rate(session_oam_request_total{microservice=~".*pcf_sm"}[2m]) or occnp_enhanced_overload_reject_total * 0) ) ) ) * 100 >= 10 < 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.125 |
| Metric Used | occnp_enhanced_overload_reject_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.20 SM_SVC_REQ_ENHANCED_OVERLOAD_REJECTION_MAJOR
Table 8-153 SM_SVC_REQ_ENHANCED_OVERLOAD_REJECTION_MAJOR
| Field | Details |
|---|---|
| Description | {{ $value }}% of incoming requests towards the pcf_sm service are rejected due to the enhanced overload control mechanism. |
| Summary | At least 20% of the received requests have been rejected due to the Overload state of the pcf-sm service in namespace {{$labels.namespace}}. |
| Severity | Major |
| Expression | ( sum by (namespace) (rate(occnp_enhanced_overload_reject_total{microservice=~".*pcf_sm"}[2m])) / (sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_sm"}[2m]) or occnp_enhanced_overload_reject_total * 0) + (sum by (namespace) (rate(session_oam_request_total{microservice=~".*pcf_sm"}[2m]) or occnp_enhanced_overload_reject_total * 0) ) ) ) * 100 >= 20 < 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.125 |
| Metric Used | occnp_enhanced_overload_reject_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.1.2.21 SM_SVC_REQ_ENHANCED_OVERLOAD_REJECTION_CRITICAL
Table 8-154 SM_SVC_REQ_ENHANCED_OVERLOAD_REJECTION_CRITICAL
| Field | Details |
|---|---|
| Description | {{ $value }}% of incoming requests towards the pcf_sm service are rejected due to the enhanced overload control mechanism. |
| Summary | At least 30% of the received requests have been rejected due to the Overload state of the pcf-sm service in namespace {{$labels.namespace}}. |
| Severity | Critical |
| Expression | ( sum by (namespace) (rate(occnp_enhanced_overload_reject_total{microservice=~".*pcf_sm"}[2m])) / (sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_sm"}[2m]) or occnp_enhanced_overload_reject_total * 0) + (sum by (namespace) (rate(session_oam_request_total{microservice=~".*pcf_sm"}[2m]) or occnp_enhanced_overload_reject_total * 0) ) ) ) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.125 |
| Metric Used | occnp_enhanced_overload_reject_total |
| Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
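The Major and Critical expressions above evaluate the same rejection ratio and differ only in the band applied (20% to 30% versus 30% and above). As a minimal sketch of that arithmetic, assuming the per-second rates have already been computed by Prometheus (the helper below is illustrative and not part of the product):

```python
def overload_rejection_severity(rejected_rate, ingress_rate, oam_rate):
    """Band the enhanced-overload rejection percentage the way the two
    alert expressions do. Inputs are 2-minute per-second rates, as
    produced by rate(...[2m]) over occnp_enhanced_overload_reject_total,
    ocpm_ingress_request_total, and session_oam_request_total."""
    total = ingress_rate + oam_rate
    if total == 0:
        return None  # no traffic: the PromQL division yields no sample
    pct = rejected_rate / total * 100
    if pct >= 30:
        return "critical"  # SM_SVC_REQ_ENHANCED_OVERLOAD_REJECTION_CRITICAL
    if pct >= 20:
        return "major"     # >= 20 and < 30 fires the MAJOR variant
    return None
```

Because the bands are half-open, at most one of the two alerts fires for a given rejection percentage.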
8.1.2.22 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD
Table 8-155 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD |
| Description | More than 70% of timer capacity has been occupied for N1N2 transfer failure notification. |
| Summary | More than 70% of timer capacity has been occupied for N1N2 transfer failure notification. |
| Severity | Minor |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.107 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: This alert indicates sustained high utilization of the UE N1N2 Transfer Failure Notification timer pool. The occnp_timer_capacity gauge tracks the current number of outstanding timers per timerName and is updated on every timer scan.
Diagnostic Information:
Recovery:
|
8.1.2.23 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD
Table 8-156 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD |
| Description | More than 80% of timer capacity has been occupied for N1N2 transfer failure notification. |
| Summary | More than 80% of timer capacity has been occupied for N1N2 transfer failure notification. |
| Severity | Major |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 80 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.107 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: This alert indicates sustained high utilization of the UE N1N2 Transfer Failure Notification timer pool. The occnp_timer_capacity gauge tracks the current number of outstanding timers per timerName and is updated on every timer scan.
- These timers are created when URSP rules cannot be delivered to the UE and the system initiates a reattempt flow using a backoff timer. High utilization suggests that many failures are triggering the N1N2 transfer failure notification flow.
- The alert is raised when utilization for timerName "UE_N1N2TransferFailure" exceeds 80% of a baseline capacity of 360000.
Dimensions:
- timerName: UE_N1N2TransferFailure
- namespace: Prometheus label used in aggregation
- siteId: underlying metric label; the rule aggregates with max by (namespace)
Diagnostic Information:
Recovery:
|
8.1.2.24 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD
Table 8-157 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD |
| Description | More than 90% of timer capacity has been occupied for N1N2 transfer failure notification. |
| Summary | More than 90% of timer capacity has been occupied for N1N2 transfer failure notification. |
| Severity | Critical |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.107 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: This alert indicates sustained high utilization of the UE N1N2 Transfer Failure Notification timer pool. The occnp_timer_capacity gauge tracks the current number of outstanding timers per timerName and is updated on every timer scan.
- These timers are created when URSP rules cannot be delivered to the UE and the system initiates a reattempt flow using a backoff timer. High utilization suggests that many failures are triggering the N1N2 transfer failure notification flow.
- The alert is raised when utilization for timerName "UE_N1N2TransferFailure" exceeds 90% of a baseline capacity of 360000.
Dimensions:
- timerName: UE_N1N2TransferFailure
- namespace: Prometheus label used in aggregation
- siteId: underlying metric label; the rule aggregates with max by (namespace)
Diagnostic Information:
Recovery:
|
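The timer-capacity alerts in this chapter (70%, 80%, and 90% thresholds) all compare the same utilization ratio against the fixed baseline of 360000 timers used in the expressions. A minimal sketch of that computation, with an illustrative helper name:

```python
# Fixed baseline used as the denominator in the alert expressions.
TIMER_CAPACITY_BASELINE = 360000

def timer_capacity_severity(outstanding_timers):
    """Mirror (occnp_timer_capacity / 360000) * 100 against the
    70 / 80 / 90 percent thresholds of the Minor / Major / Critical alerts."""
    pct = outstanding_timers / TIMER_CAPACITY_BASELINE * 100
    if pct > 90:
        return "critical"
    if pct > 80:
        return "major"
    if pct > 70:
        return "minor"
    return None
```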
8.1.2.25 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD
Table 8-158 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD |
| Description | More than 70% of timer capacity has been occupied for AMF discovery. |
| Summary | More than 70% of timer capacity has been occupied for AMF discovery. |
| Severity | Minor |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.95 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 70% of timer capacity has been occupied for AMF discovery.
Diagnostic Information:
Recovery:
|
8.1.2.26 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD
Table 8-159 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD |
| Description | More than 80% of timer capacity has been occupied for AMF discovery. |
| Summary | More than 80% of timer capacity has been occupied for AMF discovery. |
| Severity | Major |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 80 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.95 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 80% of timer capacity has been occupied for AMF discovery.
Diagnostic Information:
Recovery:
|
8.1.2.27 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD
Table 8-160 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD |
| Description | More than 90% of timer capacity has been occupied for AMF discovery. |
| Summary | More than 90% of timer capacity has been occupied for AMF discovery. |
| Severity | Critical |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.95 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 90% of timer capacity has been occupied for AMF discovery.
Diagnostic Information:
Recovery:
|
8.1.2.28 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD
Table 8-161 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD |
| Description | More than 70% of timer capacity has been occupied for N1N2 subscribe. |
| Summary | More than 70% of timer capacity has been occupied for N1N2 subscribe. |
| Severity | Minor |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.96 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 70% of timer capacity has been occupied for N1N2 subscription.
Diagnostic Information:
Recovery:
|
8.1.2.29 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD
Table 8-162 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD |
| Description | More than 80% of timer capacity has been occupied for N1N2 subscribe. |
| Summary | More than 80% of timer capacity has been occupied for N1N2 subscribe. |
| Severity | Major |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 80 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.96 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 80% of timer capacity has been occupied for N1N2 subscription.
Diagnostic Information:
Recovery:
|
8.1.2.30 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD
Table 8-163 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD |
| Description | More than 90% of timer capacity has been occupied for N1N2 subscribe. |
| Summary | More than 90% of timer capacity has been occupied for N1N2 subscribe. |
| Severity | Critical |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.96 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 90% of timer capacity has been occupied for N1N2 subscription.
Diagnostic Information:
Recovery:
|
8.1.2.31 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD
Table 8-164 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD |
| Description | More than 70% of timer capacity has been occupied for N1N2 transfer. |
| Summary | More than 70% of timer capacity has been occupied for N1N2 transfer. |
| Severity | Minor |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.97 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 70% of timer capacity has been occupied for N1N2 transfer.
Diagnostic Information:
Recovery:
|
8.1.2.32 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD
Table 8-165 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD |
| Description | More than 80% of timer capacity has been occupied for N1N2 transfer. |
| Summary | More than 80% of timer capacity has been occupied for N1N2 transfer. |
| Severity | Major |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 80 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.97 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 80% of timer capacity has been occupied for N1N2 transfer.
Diagnostic Information:
Recovery:
|
8.1.2.33 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD
Table 8-166 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD |
| Description | More than 90% of timer capacity has been occupied for N1N2 transfer. |
| Summary | More than 90% of timer capacity has been occupied for N1N2 transfer. |
| Severity | Critical |
| Expression | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.97 |
| Metric Used | occnp_timer_capacity |
| Recommended Actions |
Cause: More than 90% of timer capacity has been occupied for N1N2 transfer.
Diagnostic Information:
Recovery:
|
8.1.2.34 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Table 8-167 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD |
| Description | More than 25% of N1N2 subscribe reattempts failed. |
| Summary | More than 25% of N1N2 subscribe reattempts failed. |
| Severity | Minor |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 25 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.99 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert is raised when a certain percentage of reattempts for UE N1N2 subscribe have failed. If failures increase, the operator can investigate why the flow triggering the N1N2 subscription is failing, or whether the AMF that the requests are going to is unhealthy.
Cause: An elevated percentage of reattempt failures has been detected for UE N1N2 subscriptions.
Diagnostic Information:
Recovery:
|
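The reattempt-failure alerts in this family divide non-2xx reattempt responses by reattempt requests over a five-minute window, then compare the result against the 25%, 50%, and 75% thresholds. A sketch of that classification, assuming the windowed counts are already available (the helper is illustrative, not product code):

```python
def reattempt_failure_severity(responses, request_count):
    """responses: HTTP status codes received for reattempted N1N2 subscribe
    requests in the window; request_count: number of reattempted requests
    sent. Mirrors the 25 / 50 / 75 percent thresholds of the alerts."""
    if request_count == 0:
        return None
    # responseCode!~"2.*" in the PromQL: anything outside 2xx is a failure
    failures = sum(1 for code in responses if not (200 <= code < 300))
    pct = failures / request_count * 100
    if pct > 75:
        return "critical"
    if pct > 50:
        return "major"
    if pct > 25:
        return "minor"
    return None
```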
8.1.2.35 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Table 8-168 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD |
| Description | More than 50% of N1N2 subscribe reattempts failed. |
| Summary | More than 50% of N1N2 subscribe reattempts failed. |
| Severity | Major |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.99 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert is raised when a certain percentage of reattempts for UE N1N2 subscribe have failed. If failures increase, the operator can investigate why the flow triggering the N1N2 subscription is failing, or whether the AMF that the requests are going to is unhealthy.
Cause: An elevated percentage of reattempt failures has been detected for UE N1N2 subscriptions.
Diagnostic Information:
- AMF (Access and Mobility Management Function) unavailability or instability: the target AMF may be experiencing outages or heavy load, or is otherwise unhealthy, causing it to reject or fail to respond to subscription requests.
- Network issues or communication failures: network congestion, routing problems, or transient communication errors may prevent successful delivery of N1N2 subscription requests or receipt of responses.
- Configuration errors: misconfiguration of endpoints (such as incorrect URLs, authentication, or authorization settings) may cause subscription requests to be rejected or fail.
- High load or resource exhaustion: if the AMF or intermediate network components are overloaded or have exhausted necessary resources (for example, memory, threads, or process slots), reattempted requests may be rejected.
Recovery:
|
8.1.2.36 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Table 8-169 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD |
| Description | More than 75% of N1N2 subscribe reattempts failed. |
| Summary | More than 75% of N1N2 subscribe reattempts failed. |
| Severity | Critical |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.99 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert is raised when a certain percentage of reattempts for UE N1N2 subscribe have failed. If failures increase, the operator can investigate why the flow triggering the N1N2 subscription is failing, or whether the AMF that the requests are going to is unhealthy.
Cause: An elevated percentage of reattempt failures has been detected for UE N1N2 subscriptions.
Diagnostic Information:
Recovery:
|
8.1.2.37 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Table 8-170 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD |
| Description | More than 25% of N1N2 transfer reattempts failed. |
| Summary | More than 25% of N1N2 transfer reattempts failed. |
| Severity | Minor |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 25 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.100 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert is raised when a certain percentage of reattempts for UE N1N2 transfer have failed. If failures increase, the operator can investigate why the flow triggering the N1N2 message transfer is failing, or whether the AMF that the requests are going to is unhealthy.
Cause: An increased percentage of reattempt failures has been detected for UE N1N2 message transfers.
Diagnostic Information:
Recovery:
|
8.1.2.38 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Table 8-171 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD |
| Description | More than 50% of N1N2 transfer reattempts failed. |
| Summary | More than 50% of N1N2 transfer reattempts failed. |
| Severity | Major |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.100 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert is raised when a certain percentage of reattempts for UE N1N2 transfer have failed. If failures increase, the operator can investigate why the flow triggering the N1N2 message transfer is failing, or whether the AMF that the requests are going to is unhealthy.
Cause: An increased percentage of reattempt failures has been detected for UE N1N2 message transfers.
Diagnostic Information:
Recovery:
|
8.1.2.39 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Table 8-172 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD |
| Description | More than 75% of N1N2 transfer reattempts failed. |
| Summary | More than 75% of N1N2 transfer reattempts failed. |
| Severity | Critical |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.100 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert is raised when a certain percentage of reattempts for UE N1N2 transfer have failed. If failures increase, the operator can investigate why the flow triggering the N1N2 message transfer is failing, or whether the AMF that the requests are going to is unhealthy.
Cause: An increased percentage of reattempt failures has been detected for UE N1N2 message transfers.
Diagnostic Information:
Recovery:
|
8.1.2.40 SM_STALE_REQUEST_PROCESSING_REJECT_MINOR
Table 8-173 SM_STALE_REQUEST_PROCESSING_REJECT_MINOR
| Field | Details |
|---|---|
| Name in Alert Yaml File | SM_STALE_REQUEST_PROCESSING_REJECT_MINOR |
| Description |
More than 10% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to the requests being stale |
| Summary |
More than 10% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to the requests being stale |
| Severity | Minor |
| Expression |
(sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 10 < 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.101 |
| Metric Used | occnp_late_processing_rejection_total, ocpm_ingress_request_total |
| Recommended Actions | The occnp_late_processing_rejection_total metric is pegged when Late Processing finds a stale session.
Cause: The occnp_late_processing_rejection_total metric is incremented when the SM Service determines that a request has become stale. For example, if a request includes the following header parameters:
In this scenario, if there is a delay in receiving a response from the external Network Function (NF), a stale check is performed later. If the request is deemed stale during this check, it is counted in the metric. This alarm is raised when more than 10% of the Ingress requests fail with error 504 GATEWAY_TIMEOUT.
Diagnostic Information:
Recovery: Once the recommended diagnostic actions are implemented and responses from the external NF are received within the expected timeframe, the percentage of rejected messages will begin to decline, ultimately clearing the alert. |
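The stale check described in the cause above compares the time elapsed since the request was originated against the window the sender is willing to wait. The exact header parameters are not listed here, so the parameter names below are illustrative placeholders only:

```python
import time

def is_request_stale(origination_ts_ms, max_rsp_time_ms, now_ms=None):
    """Return True if the request's allowed processing window has elapsed.
    origination_ts_ms: epoch milliseconds at which the sender created the
    request; max_rsp_time_ms: how long the sender is willing to wait for a
    response. Both are assumed to come from request header parameters."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - origination_ts_ms > max_rsp_time_ms
```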
8.1.2.41 SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR
Table 8-174 SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR
| Field | Details |
|---|---|
| Name in Alert Yaml File | SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR |
| Description |
More than 20% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to the requests being stale |
| Summary |
More than 20% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to the requests being stale |
| Severity | Major |
| Expression |
(sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 20 < 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.101 |
| Metric Used | occnp_late_processing_rejection_total, ocpm_ingress_request_total |
| Recommended Actions | The occnp_late_processing_rejection_total metric is pegged when Late Processing finds a stale session.
Cause: The occnp_late_processing_rejection_total metric is incremented when the SM Service determines that a request has become stale. For example, if a request includes the following header parameters:
In this scenario, if there is a delay in receiving a response from the external Network Function (NF), a stale check is performed later. If the request is deemed stale during this check, it is counted in the metric. This alarm is raised when more than 20% and less than 30% of the Ingress requests fail with error 504 GATEWAY_TIMEOUT.
Diagnostic Information:
Recovery: Once the recommended diagnostic actions are implemented and responses from the external NF are received within the expected timeframe, the percentage of rejected messages will begin to decline, ultimately clearing the alert. |
8.1.2.42 SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
Table 8-175 SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
| Field | Details |
|---|---|
| Name in Alert Yaml File | SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL |
| Description |
More than 30% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to the requests being stale |
| Summary |
More than 30% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to the requests being stale |
| Severity | Critical |
| Expression |
(sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.101 |
| Metric Used | occnp_late_processing_rejection_total, ocpm_ingress_request_total |
| Recommended Actions | The occnp_late_processing_rejection_total metric is pegged when Late Processing finds a stale session.
Cause: The occnp_late_processing_rejection_total metric is incremented when the SM Service determines that a request has become stale. For example, if a request includes the following header parameters:
In this scenario, if there is a delay in receiving a response from the external Network Function (NF), a stale check is performed later. If the request is deemed stale during this check, it is counted in the metric. This alarm is raised when more than 30% of the Ingress requests fail with error 504 GATEWAY_TIMEOUT.
Diagnostic Information:
Recovery: Once the recommended diagnostic actions are implemented and responses from the external NF are received within the expected timeframe, the percentage of rejected messages will begin to decline, ultimately clearing the alert. |
8.1.2.43 UE_STALE_REQUEST_PROCESSING_REJECT_MAJOR
Table 8-176 UE_STALE_REQUEST_PROCESSING_REJECT_MAJOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 20% of the incoming requests towards the UE Policy service are rejected due to the requests going stale while being processed by the service. |
| Summary | This alert is triggered when more than 20% of the incoming requests towards the UE Policy service are rejected due to the requests going stale while being processed by the service. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.104 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | The occnp_late_processing_rejection_total metric is pegged when requests being processed become stale.
Cause: More than 20% of incoming requests to the ue-service have been rejected because they became stale during processing. The service flags a request as stale when its processing exceeds an acceptable time window.
Diagnostic Information:
Recovery:
|
8.1.2.44 UE_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
Table 8-177 UE_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 30% of the incoming requests towards the UE Policy service are rejected due to the requests going stale while being processed by the service. |
| Summary | This alert is triggered when more than 30% of the incoming requests towards the UE Policy service are rejected due to the requests going stale while being processed by the service. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.104 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | The occnp_late_processing_rejection_total metric is pegged when requests being processed become stale.
Cause: More than 30% of incoming requests to the ue-service have been rejected because they became stale during processing. The service flags a request as stale when its processing exceeds an acceptable time window.
Diagnostic Information:
Recovery:
|
8.1.2.45 UE_STALE_REQUEST_PROCESSING_REJECT_MINOR
Table 8-178 UE_STALE_REQUEST_PROCESSING_REJECT_MINOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 10% of the incoming requests towards the UE Policy service are rejected due to the requests going stale while being processed by the service. |
| Summary | This alert is triggered when more than 10% of the incoming requests towards the UE Policy service are rejected due to the requests going stale while being processed by the service. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.104 |
| Metric Used | occnp_late_processing_rejection_total |
| Recommended Actions | The occnp_late_processing_rejection_total metric is pegged when requests being processed become stale.
Cause: More than 10% of incoming requests to the ue-service have been rejected because they became stale during processing. The service flags a request as stale when its processing exceeds an acceptable time window.
Diagnostic Information:
Recovery:
|
8.1.2.46 UE_STALE_REQUEST_ARRIVAL_REJECT_MINOR
Table 8-179 UE_STALE_REQUEST_ARRIVAL_REJECT_MINOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
| Summary | This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 10 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.109 |
| Metric Used | ocpm_late_arrival_rejection_total |
| Recommended Actions | The ocpm_late_arrival_rejection_total metric is pegged when a received request is stale.
Cause: The ocpm_late_arrival_rejection_total metric is pegged when a received request is stale.
Diagnostic Information:
Recovery:
|
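The UE stale-request alerts distinguish two cases: a request that is already stale on arrival pegs ocpm_late_arrival_rejection_total, while a request that goes stale during processing pegs occnp_late_processing_rejection_total. That split can be sketched as follows (the helper name and time values are illustrative):

```python
def classify_stale_rejection(arrival_ms, deadline_ms, processing_done_ms):
    """Which stale-rejection metric a request would peg, per the alerts
    above. Times are epoch milliseconds; deadline_ms is the point after
    which the request is considered stale."""
    if arrival_ms > deadline_ms:
        # Stale before the service even started work on it.
        return "ocpm_late_arrival_rejection_total"
    if processing_done_ms > deadline_ms:
        # Arrived in time but went stale while being processed.
        return "occnp_late_processing_rejection_total"
    return None  # served in time; no rejection metric pegged
```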
8.1.2.47 UE_STALE_REQUEST_ARRIVAL_REJECT_MAJOR
Table 8-180 UE_STALE_REQUEST_ARRIVAL_REJECT_MAJOR
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
| Summary | This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 20 |
| OID |
1.3.6.1.4.1.323.5.3.52.1.2.109 |
| Metric Used | ocpm_late_arrival_rejection_total |
| Recommended Actions | The ocpm_late_arrival_rejection_total metric is pegged when a received request is stale.
Cause: The ocpm_late_arrival_rejection_total metric is pegged when a received request is stale.
Diagnostic Information:
Recovery:
|
8.1.2.48 UE_STALE_REQUEST_ARRIVAL_REJECT_CRITICAL
Table 8-181 UE_STALE_REQUEST_ARRIVAL_REJECT_CRITICAL
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
| Summary | This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.109 |
| Metric Used | ocpm_late_arrival_rejection_total |
| Recommended Actions | The ocpm_late_arrival_rejection_total metric is pegged when a received request is stale.
Cause: The ocpm_late_arrival_rejection_total metric is pegged when a received request is stale upon arrival at the service.
Diagnostic Information:
Recovery:
|
8.1.2.49 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Table 8-182 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD |
| Description | More than 75% of N1N2 transfer failure notification reattempts failed. |
| Summary | More than 75% of N1N2 transfer failure notification reattempts failed. |
| Severity | Critical |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.106 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions |
The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert notifies when a certain amount of reattempts for UE N1N2 transfer failure notifications fail. If failures increase, the operator can investigate the following:
Cause: The http_out_conn_response_total metric with the indicated dimensions is pegged when PCF-UE receives a response for an outgoing reattempt transfer request triggered by an N1N2TransferFailure notification. Dimensions: isReattempt: true, reattemptType: UE_N1N2TransferFailure, operationType: transfer, responseCode: non-2xx.
In this case, more than 75% of outgoing transfer reattempts (due to N1N2TransferFailure as notified by AMF) received a non-2xx (failure) response in the last 5 minutes (or the selected sample frame).
Diagnostic Information:
Recovery:
|
8.1.2.50 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Table 8-183 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD |
| Description | More than 50% of N1N2 transfer failure notification reattempts failed. |
| Summary | More than 50% of N1N2 transfer failure notification reattempts failed. |
| Severity | Major |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.106 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions |
The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert notifies when a certain amount of reattempts for UE N1N2 transfer failure notifications fail. If failures increase, the operator can investigate the following:
Cause: The http_out_conn_response_total metric with the indicated dimensions is pegged when PCF-UE receives a response for an outgoing reattempt transfer request triggered by an N1N2TransferFailure notification. Dimensions: isReattempt: true, reattemptType: UE_N1N2TransferFailure, operationType: transfer, responseCode: non-2xx.
In this case, more than 50% of outgoing transfer reattempts (due to N1N2TransferFailure as notified by AMF) received a non-2xx (failure) response in the last 5 minutes (or the selected sample frame).
Diagnostic Information:
Recovery:
|
8.1.2.51 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Table 8-184 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD |
| Description | More than 25% of N1N2 transfer failure notification reattempts failed. |
| Summary | More than 25% of N1N2 transfer failure notification reattempts failed. |
| Severity | Minor |
| Expression | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 25 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.106 |
| Metric Used | http_out_conn_response_total, http_out_conn_request_total |
| Recommended Actions |
The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert notifies when a certain amount of reattempts for UE N1N2 transfer failure notifications fail. If failures increase, the operator can investigate the following:
Cause: The http_out_conn_response_total metric with the indicated dimensions is pegged when PCF-UE receives a response for an outgoing reattempt transfer request triggered by an N1N2TransferFailure notification. Dimensions: isReattempt: true, reattemptType: UE_N1N2TransferFailure, operationType: transfer, responseCode: non-2xx.
In this case, more than 25% of outgoing transfer reattempts (due to N1N2TransferFailure as notified by AMF) received a non-2xx (failure) response in the last 5 minutes (or the selected sample frame).
Diagnostic Information:
Recovery:
|
8.1.2.52 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Table 8-185 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD |
| Description | More than 75% of AMF discovery reattempts failed. |
| Summary | More than 75% of AMF discovery reattempts failed. |
| Severity | Critical |
| Expression | (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.105 |
| Metric Used | occnp_ue_nf_discovery_reattempt_response_total |
| Recommended Actions | The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert notifies when a certain number of reattempts fail while discovering the AMF. If failures increase, the operator can investigate the following:
Cause: More than 75% of AMF discovery reattempts received a failure (non-2xx) response in the last 5 minutes.
Diagnostic Information:
Recovery:
|
8.1.2.53 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Table 8-186 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD |
| Description | More than 50% of AMF discovery reattempts failed. |
| Summary | More than 50% of AMF discovery reattempts failed. |
| Severity | Major |
| Expression | (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.105 |
| Metric Used | occnp_ue_nf_discovery_reattempt_response_total |
| Recommended Actions | The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert notifies when a certain number of reattempts fail while discovering the AMF. If failures increase, the operator can investigate the following:
Cause: More than 50% of AMF discovery reattempts received a failure (non-2xx) response in the last 5 minutes.
Diagnostic Information:
Recovery:
|
8.1.2.54 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Table 8-187 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD |
| Description | More than 25% of AMF discovery reattempts failed. |
| Summary | More than 25% of AMF discovery reattempts failed. |
| Severity | Minor |
| Expression | (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 25 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.105 |
| Metric Used | occnp_ue_nf_discovery_reattempt_response_total |
| Recommended Actions | The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message going out of the NF. This alert notifies when a certain number of reattempts fail while discovering the AMF. If failures increase, the operator can investigate the following:
Cause: More than 25% of AMF discovery reattempts received a failure (non-2xx) response in the last 5 minutes.
Diagnostic Information:
Recovery:
|
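When any of the three AMF discovery reattempt alerts fires, a per-response-code breakdown of the failure metric can help isolate the dominant error. The query below is a diagnostic sketch built from the label names in the alert expressions above; it can be run in the Prometheus expression browser.

```promql
# Break AMF discovery reattempt failures down by response code to see
# which error class (for example 4xx vs 5xx) dominates over the alert window.
sum by (namespace, responseCode) (
  increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])
)
```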
8.1.2.55 INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD
Table 8-188 INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD
| Field | Details |
|---|---|
| Name in Alert Yaml File | IngressErrorRateAbove10PercentPerPod |
| Description | Ingress Error Rate above 10 Percent in {{$labels.kubernetes_name}} in {{$labels.kubernetes_namespace}} |
| Summary | Transaction Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Critical |
| Expression | (sum by(pod)(rate(ocpm_ingress_response_total{response_code!~"2.*"}[24h]) or (up * 0)) / sum by(pod)(rate(ocpm_ingress_response_total[24h]))) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.2 |
| Metric Used | ocpm_ingress_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
Cause: This alert fires when 10% or more of ingress (incoming) HTTP requests handled by any individual pod result in non-2xx (unsuccessful) responses, measured over a 24-hour window. A high ingress error rate per pod suggests issues that could impact application availability, reliability, or user experience.
Common causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
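To locate the pod behind INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD, the same per-pod ratio used in the alert expression can be ranked. This is an illustrative diagnostic query, not part of the shipped rules:

```promql
# Rank pods by their non-2xx ingress response percentage over the same
# 24h window the alert evaluates.
topk(5,
  (sum by (pod) (rate(ocpm_ingress_response_total{response_code!~"2.*"}[24h]))
    / sum by (pod) (rate(ocpm_ingress_response_total[24h]))) * 100
)
```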
8.1.2.56 SM_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-189 SM_TRAFFIC_RATE_ABOVE_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | SMTrafficRateAboveThreshold |
| Description | SM service Ingress traffic Rate is above threshold of Max MPS (current value is: {{ $value }}) |
| Summary | Traffic Rate is above 90 Percent of Max requests per second |
| Severity | Major |
| Expression | The total SM service Ingress traffic rate has crossed
the configured threshold of 900 TPS.
Default value of this alert trigger point in PCF_Alertrules.yaml file is when SM service Ingress Rate crosses 90% of maximum ingress requests per second. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.3 |
| Metric Used | ocpm_ingress_request_total{servicename_3gpp="npcf-smpolicycontrol"} |
| Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold.
Note: Threshold levels can be configured in the PCF_Alertrules.yaml file.
It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
Cause: The ocpm_ingress_request_total metric with dimension servicename_3gpp="npcf-smpolicycontrol" is incremented for every ingress request to the SM service. The alert fires when the ingress rate crosses the configured threshold.
Diagnostic Information:
Examine Current Rate: Query ocpm_ingress_request_total for servicename_3gpp="npcf-smpolicycontrol" to assess the current ingress traffic rate.
Review Upstream Sources: Identify if request rates from any upstream SMF, AF, or TDF instances have increased.
Inspect Application Logs: Check for errors or unusual activity in the SM service logs.
Recovery: When the traffic rate falls below the threshold, the system clears the SM_TRAFFIC_RATE_ABOVE_THRESHOLD alert.
For any additional guidance, contact My Oracle Support. |
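The SM ingress-rate trigger above is described in prose rather than PromQL. A hedged sketch of what such an expression may look like is shown below; the 900 TPS figure comes from the table, while the 2-minute rate window and exact label set are assumptions, not the shipped rule:

```promql
# Assumed shape of the SM ingress-rate trigger; confirm against the
# actual PCF_Alertrules.yaml before relying on it.
sum by (namespace) (
  rate(ocpm_ingress_request_total{servicename_3gpp="npcf-smpolicycontrol"}[2m])
) > 900
```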
8.1.2.57 SM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-190 SM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
| Field | Details |
|---|---|
| Name in Alert Yaml File | SMIngressErrorRateAbove10Percent |
| Description | Transaction Error Rate detected above 10 Percent of Total on SM service (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | Critical |
| Expression | The number of failed transactions is above 10 percent of the total transactions. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.4 |
| Metric Used | ocpm_ingress_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
Cause: This alert fires when more than 10% of all HTTP responses returned by the SM service (npcf-smpolicycontrol) are non-2xx (unsuccessful).
Diagnostic Information
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.58 SM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Table 8-191 SM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
| Field | Details |
|---|---|
| Name in Alert Yaml File | SMEgressErrorRateAbove1Percent |
| Description | Egress Transaction Error Rate detected above 1 Percent of Total Transactions (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | Minor |
| Expression | The number of failed transactions is above 1 percent of the total transactions. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.5 |
| Metric Used | system_operational_state == 1 |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 1% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
Cause: This alert fires when more than 1% of responses received for egress transactions initiated by the SM service are non-2xx (unsuccessful).
Diagnostic Information
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.59 PCF_CHF_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-192 PCF_CHF_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | PcfChfIngressTrafficRateAboveThreshold |
| Description | User service Ingress traffic Rate from CHF is above threshold of Max MPS (current value is: {{ $value }}) |
| Summary | Traffic Rate is above 90 Percent of Max requests per second |
| Severity | Major |
| Expression | The total User Service Ingress traffic rate from CHF has
crossed the configured threshold of 900 TPS.
Default value of this alert trigger point in PCF_Alertrules.yaml file is when user service Ingress Rate from CHF crosses 90% of maximum ingress requests per second. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.11 |
| Metric Used | ocpm_userservice_inbound_count_total{service_resource="chf-service"} |
| Recommended Actions |
Cause: The ocpm_userservice_inbound_count_total metric with dimension service_resource="chf-service" is incremented for every inbound HTTP request reaching the CHF connector service. If the 2-minute average exceeds the configured threshold of 900 TPS, the system may be experiencing an overload or an abnormal spike in traffic.
Diagnostic Information:
Examine Current Rate: Query ocpm_userservice_inbound_count_total for service_resource="chf-service" to assess the current ingress traffic rate.
Review Upstream Sources: Identify if request rates from any upstream CHF, SMF, or AMF instances have increased.
Inspect Application Logs: Check for errors or unusual activity in the CHF connector service logs.
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.60 PCF_CHF_EGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-193 PCF_CHF_EGRESS_ERROR_RATE_ABOVE_10_PERCENT
| Field | Details |
|---|---|
| Name in Alert Yaml File | PcfChfEgressErrorRateAbove10Percent |
| Description | The number of failed transactions from CHF is more than 10 percent of the total transactions. |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | Critical |
| Expression | (sum(rate(ocpm_chf_tracking_response_total{servicename_3gpp="nchf-spendinglimitcontrol",response_code!~"2.*"}[24h]) or (up * 0)) / sum(rate(ocpm_chf_tracking_response_total{servicename_3gpp="nchf-spendinglimitcontrol"}[24h]))) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.12 |
| Metric Used | ocpm_chf_tracking_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions falls below the configured threshold.
Note: Threshold levels can be configured in the PCF_Alertrules.yaml file.
It is recommended to assess the reason for failed transactions. Perform the following steps to analyze the cause of increased traffic:
Cause: This alert fires when more than 10% of all HTTP responses for the CHF connector (the PCF component that calls the external CHF through nchf-spendinglimitcontrol) over the past day are non-2xx (that is, not successful). This may be due to:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.61 PCF_CHF_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD
Table 8-194 PCF_CHF_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Ingress Timeout Error Rate detected above 10 Percent of Total towards CHF service (current value is: {{ $value }}) |
| Summary | Timeout Error Rate detected above 10 Percent of Total Transactions |
| Severity | Major |
| Expression | The number of failed transactions due to timeout is above 10 percent of the total transactions for CHF service. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.17 |
| Metric Used | ocpm_chf_tracking_request_timeout_total{servicename_3gpp="nchf-spendinglimitcontrol"} |
| Recommended Actions | The alert gets cleared when the number of failed transactions due to timeout is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
Cause: This alert is triggered when more than 10% of all inbound requests from the PCF (Policy Control Function) to the CHF (Charging Function) fail due to timeout.
Common causes include:
Diagnostic Information:
Recovery:
Alert Resolution: This alert will auto-resolve once the ingress timeout error rate drops below 10% of total requests to CHF over the evaluation window. For any additional guidance, contact My Oracle Support. |
8.1.2.62 PCF_PENDING_BINDING_SITE_TAKEOVER
Table 8-195 PCF_PENDING_BINDING_SITE_TAKEOVER
| Field | Details |
|---|---|
| Description | The site takeover configuration has been activated |
| Summary | The site takeover configuration has been activated |
| Severity | Critical |
| Expression | sum by (application, container, namespace) (changes(occnp_pending_binding_site_takeover[2m])) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.45 |
| Metric Used | occnp_pending_binding_site_takeover |
| Recommended Actions |
Cause: This alert fires when the site takeover functionality is engaged to handle geo-redundancy scenarios. Site takeover is typically activated when a site in a distributed PCF deployment is down or unreachable, empowering another site to process that site’s pending binding operations for service continuity.
Diagnostic Information:
Recovery & Actions:
Alert Resolution: The alert will auto-resolve once there are no new site takeover events, and the takeover configuration is deactivated or no longer required. For any additional guidance, contact My Oracle Support. |
8.1.2.63 PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED
Table 8-196 PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED
| Field | Details |
|---|---|
| Description | The Pending Operation table threshold has been reached. |
| Summary | The Pending Operation table threshold has been reached. |
| Severity | Critical |
| Expression | sum by (application, container, namespace) (changes(occnp_threshold_limit_reached_total[2m])) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.46 |
| Metric Used | occnp_threshold_limit_reached_total |
| Recommended Actions |
Cause: This alert fires when the number of records in the Pending Operation table (used to reattempt binding registration in BSF at a later time) reaches a predefined threshold. This means the system's retry or pending queue for binding operations is saturated and may be at risk of delaying or failing new operations. Exceeding this threshold typically signals that retries or binding registrations are not clearing at the expected rate.
Common causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
8.1.2.64 PCF_PENDING_BINDING_RECORDS_COUNT
Table 8-197 PCF_PENDING_BINDING_RECORDS_COUNT
| Field | Details |
|---|---|
| Description | An attempt to internally recreate a PCF binding has been triggered by PCF |
| Summary | An attempt to internally recreate a PCF binding has been triggered by PCF |
| Severity | Minor |
| Expression | sum by (application, container, namespace) (changes(occnp_pending_operation_records_count[10s])) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.47 |
| Metric Used | occnp_pending_operation_records_count |
| Recommended Actions |
Cause: This alert fires when a new pending binding operation is inserted into the system by the SM service (to reattempt binding registration in BSF at a later time). This typically happens when the BSF reattempt settings are configured and the response from BSF to a binding registration indicates an error condition that requires a retry (as per preconfigured error codes).
Common causes for entries in the PendingOperation table include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
8.1.2.65 AUTONOMOUS_SUBSCRIPTION_FAILURE
Table 8-198 AUTONOMOUS_SUBSCRIPTION_FAILURE
| Field | Details |
|---|---|
| Description | Autonomous subscription failed for a configured Slice Load Level |
| Summary | Autonomous subscription failed for a configured Slice Load Level |
| Severity | Critical |
| Expression | The number of failed Autonomous Subscriptions for a configured Slice Load Level in nwdaf-agent is greater than zero. |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.49 |
| Metric Used | subscription_failure{requestType="autonomous"} |
| Recommended Actions | The alert gets cleared when the failed Autonomous
Subscription is corrected.
To clear the alert, perform the
following steps:
Cause: This alert activates when there is at least one autonomous subscription (such as the NWDAF event subscription process) failure detected for a given S-NSSAI, indicating that the system was unable to successfully initiate or maintain a subscription for a specific network slice. Common causes may include:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.66 AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
Table 8-199 AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
| Field | Details |
|---|---|
| Description | AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }}) |
| Summary | AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }}) |
| Severity | Minor |
| Expression | (sum(rate(http_out_conn_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])) / sum(rate(http_out_conn_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.54 |
| Metric Used | http_out_conn_response_total |
| Recommended Actions |
Cause: This alert triggers when 1% or more of notification requests sent from the AM service (part of PCF) to the AMF (npcf-am-policy-control endpoint) result in non-2xx (unsuccessful) responses over a 1-day window. These notifications inform AMF about access or mobility events. A significant portion of errors could be 404 responses, which occur when AMF does not have the corresponding session in its context. This may indicate attempts to notify AMF about sessions that have already ended or were never established.
Other possible causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
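Since 404 responses from AMF (sessions no longer in its context) are a common contributor to this alert, splitting the failures by response code is a useful first check. The query below is an illustrative diagnostic built from the labels in the alert expression:

```promql
# Check whether 404s dominate the AM notification failures.
sum by (responseCode) (
  rate(http_out_conn_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])
)
```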
8.1.2.67 AM_AR_ERROR_RATE_ABOVE_1_PERCENT
Table 8-200 AM_AR_ERROR_RATE_ABOVE_1_PERCENT
| Field | Details |
|---|---|
| Description | Alternate Routing Error Rate detected above 1 Percent of Total on AM Service (current value is: {{ $value }}) |
| Summary | Alternate Routing Error Rate detected above 1 Percent of Total on AM Service (current value is: {{ $value }}) |
| Severity | Minor |
| Expression | (sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])) / sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.55 |
| Metric Used | ocpm_ar_response_total |
| Recommended Actions |
Cause: This alert fires when 1% or more of alternate routing (AR) requests initiated by the AM service (as part of PCF) to AMF (npcf-am-policy-control) result in non-2xx (unsuccessful) responses over a 1-day window, grouped by FQDN. Alternate routing is the process of retrying the original request to a different AMF instance when the initial attempt fails. A rising AR error rate suggests persistent issues with connectivity, service health, or configuration for primary or alternate AMF endpoints.
Typical causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
8.1.2.68 UE_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
Table 8-201 UE_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
| Field | Details |
|---|---|
| Description | UE Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }}) |
| Summary | UE Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }}) |
| Severity | Minor |
| Expression | (sum(rate(http_out_conn_response_total{pod=~".*ueservice.*",responseCode!~"2.*",servicename3gpp="npcf-ue-policy-control"}[1d])) / sum(rate(http_out_conn_response_total{pod=~".*ueservice.*",servicename3gpp="npcf-ue-policy-control"}[1d]))) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.56 |
| Metric Used | http_out_conn_response_total |
| Recommended Actions |
Cause: This alert triggers when 1% or more of notification requests sent from the UE service (part of PCF) to the AMF (npcf-ue-policy-control endpoint) result in non-2xx (unsuccessful) responses over a 1-day window. These notifications inform AMF about UE policy events. A significant portion of errors could be 404 responses, which occur when AMF does not have the corresponding session in its context. This may indicate attempts to notify AMF about sessions that have already ended or were never established.
Other possible causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
8.1.2.69 UE_AR_FAILURE_RATE_ABOVE_1_PERCENT
Table 8-202 UE_AR_FAILURE_RATE_ABOVE_1_PERCENT
| Field | Details |
|---|---|
| Description | Alternate Routing Error Rate detected above 1 Percent of Total on UE Service (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 1 Percent of Total Transactions on UE Alternate Routing |
| Severity | Minor |
| Expression | (sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*ueservice.*",responseCode!~"2.*",servicename3gpp="npcf-ue-policy-control"}[1d])) / sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*ueservice.*",servicename3gpp="npcf-ue-policy-control"}[1d]))) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.57 |
| Metric Used | ocpm_ar_response_total |
| Recommended Actions |
Cause: This alert fires when 1% or more of alternate routing (AR) requests initiated by the UE service (as part of PCF) to AMF (npcf-ue-policy-control) result in non-2xx (unsuccessful) responses over a 1-day window, grouped by FQDN. Alternate routing is the process of retrying the original request to a different AMF instance when the initial attempt fails. A rising AR error rate suggests persistent issues with connectivity, service health, or configuration for primary or alternate AMF endpoints.
Typical causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
8.1.2.70 SMSC_CONNECTION_DOWN
Table 8-203 SMSC_CONNECTION_DOWN
| Field | Details |
|---|---|
| Description | Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}} |
| Summary | Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}} |
| Severity | Major |
| Expression | sum by(namespace, pod, smscName)(occnp_active_smsc_conn_count) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.63 |
| Metric Used | occnp_active_smsc_conn_count |
| Recommended Actions |
Cause: This alert fires when the connection count to a specific SMSC (Short Message Service Center) peer (identified by the smscName label) drops to zero in a notifier service pod.
Common causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
8.1.2.71 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
Table 8-204 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | lockAcquisitionExceedsMinorThreshold |
| Description | The count of lock requests that fail to acquire the lock exceeds the minor threshold limit (current value is: {{ $value }}) |
| Summary | Keys used in Bulwark lock request which are already in locked state detected above 20 Percent of Total Transactions. |
| Severity | Minor |
| Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) / sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 20 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
| Metric Used | lock_request_total |
| Recommended Actions |
Cause: This alert fires when, within a 5-minute window, between 20% and 50% of lock acquisition requests (acquireLock) to the Bulwark service in any namespace fail. Elevated lock acquisition failure rates may indicate:
Diagnostic Information
Recovery
|
8.1.2.72 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
Table 8-205 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | lockAcquisitionExceedsMajorThreshold |
| Description | The count of lock requests that fail to acquire the lock exceeds the major threshold limit (current value is: {{ $value }}) |
| Summary | Keys used in Bulwark lock request which are already in locked state detected above 50 Percent of Total Transactions. |
| Severity | Major |
| Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 50 < 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
| Metric Used | lock_request_total |
| Recommended Actions |
Cause: This alert fires when, within a 5-minute window, between 50% and 75% of lock acquisition requests (acquireLock) to the Bulwark service in any namespace fail. Elevated lock acquisition failure rates may indicate:
Diagnostic Information
Recovery
|
8.1.2.73 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
Table 8-206 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Name in Alert Yaml File | lockAcquisitionExceedsCriticalThreshold |
| Description | The number of lock requests that failed to acquire the lock exceeds the critical threshold limit (current value: {{ $value }}). |
| Summary | Keys used in Bulwark lock requests that are already in a locked state are detected above 75 percent of total transactions. |
| Severity | Critical |
| Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
| Metric Used | lock_request_total |
| Recommended Actions |
Cause: This alert fires when, within a 5-minute window, more than 75% of lock acquisition requests (acquireLock) to the Bulwark service in any namespace fail. Elevated lock acquisition failure rates may indicate:
Diagnostic Information
Recovery
|
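The three lock-acquisition alerts above differ only in their threshold bands (20-50%, 50-75%, and 75% or higher). A minimal Python sketch of the computation the PromQL expressions perform — the acquireLock failure percentage over a window, mapped to a severity — using hypothetical counter increases:

```python
from typing import Optional

def lock_failure_severity(failed_increase: float, total_increase: float) -> Optional[str]:
    """Map the acquireLock failure percentage over a 5-minute window to the
    severity bands used by the lockAcquisitionExceeds* alerts (20/50/75)."""
    if total_increase == 0:
        return None  # no lock traffic in the window; no alert fires
    pct = failed_increase / total_increase * 100
    if pct >= 75:
        return "Critical"
    if pct >= 50:
        return "Major"
    if pct >= 20:
        return "Minor"
    return None

# Example: 120 failed acquisitions out of 400 requests in 5 minutes is 30%,
# which falls in the Minor band (>= 20, < 50).
```
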
8.1.2.74 SM_UPDATE_NOTIFY_FAILED_ABOVE_50_PERCENT
Table 8-207 SM_UPDATE_NOTIFY_FAILED_ABOVE_50_PERCENT
| Field | Details |
|---|---|
| Description | Update Notify Terminate requests sent to the SMF failed at a rate equal to or above 50% but below 60%. |
| Summary | Update Notify Terminate requests sent to the SMF failed at a rate equal to or above 50% but below 60%. |
| Severity | Minor |
| Expression | (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 50 < 60 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.80 |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions |
Cause: This alert fires when, over the evaluation period, between 50% and 60% of Update Notify Terminate requests sent to the SMF fail.
Other common causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
8.1.2.75 SM_UPDATE_NOTIFY_FAILED_ABOVE_60_PERCENT
Table 8-208 SM_UPDATE_NOTIFY_FAILED_ABOVE_60_PERCENT
| Field | Details |
|---|---|
| Description | Update Notify Terminate requests sent to the SMF failed at a rate equal to or above 60% but below 70%. |
| Summary | Update Notify Terminate requests sent to the SMF failed at a rate equal to or above 60% but below 70%. |
| Severity | Major |
| Expression | (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 60 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.80 |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions |
Cause: This alert fires when, over the evaluation period, between 60% and 70% of Update Notify Terminate requests sent to the SMF fail.
Other common causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
8.1.2.76 SM_UPDATE_NOTIFY_FAILED_ABOVE_70_PERCENT
Table 8-209 SM_UPDATE_NOTIFY_FAILED_ABOVE_70_PERCENT
| Field | Details |
|---|---|
| Description | Update Notify Terminate requests sent to the SMF failed at a rate equal to or above 70%. |
| Summary | Update Notify Terminate requests sent to the SMF failed at a rate equal to or above 70%. |
| Severity | Critical |
| Expression | (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.80 |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions |
Cause: This alert fires when, over the evaluation period, 70% or more of Update Notify Terminate requests sent to the SMF fail.
Other common causes include:
Diagnostic Information
Recovery
For any additional guidance, contact My Oracle Support. |
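The three SM_UPDATE_NOTIFY alerts above select failures with the label matcher responseCode!~"2.*". Because Prometheus label matchers are fully anchored, this excludes exactly the 2xx response codes. A minimal Python sketch of that selection, using hypothetical (responseCode, count) samples:

```python
import re

# Prometheus regex matchers are fully anchored, so responseCode!~"2.*"
# excludes exactly those codes whose full string matches 2.* (the 2xx family).
SUCCESS = re.compile(r"2.*")

def terminate_notify_failure_pct(samples):
    """samples: (responseCode, count) pairs for occnp_http_out_conn_response_total
    with operationType="terminate_notify". Returns the non-2xx percentage."""
    total = sum(count for _, count in samples)
    failed = sum(count for code, count in samples if not SUCCESS.fullmatch(code))
    return failed * 100 / total if total else 0.0

# Example: 40 x 201, 50 x 503, 10 x 408 gives a 60% failure rate,
# which lands in the Major band (>= 60, < 70).
```
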
8.1.2.77 UPDATE_NOTIFY_FAILURE_ABOVE_30_PERCENT
Table 8-210 UPDATE_NOTIFY_FAILURE_ABOVE_30_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }}% of update notify requests sent to the SMF failed. |
| Summary | More than 30% of update notify requests sent to the SMF failed. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.94 |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions |
This alert indicates a significant rate of update-notify failures toward the SMF. If the update-notify failure rate increases, the operator can check whether all the flows that trigger update-notify are failing, analyze which flow fails the most, or verify whether the SMF that the requests are going to is unhealthy. For any additional guidance, contact My Oracle Support. |
8.1.2.78 UPDATE_NOTIFY_FAILURE_ABOVE_50_PERCENT
Table 8-211 UPDATE_NOTIFY_FAILURE_ABOVE_50_PERCENT
| Field | Details |
|---|---|
| Description | The percentage of update notify requests that failed is equal to or above 50% but less than 70% in a given time period. |
| Summary | The percentage of update notify requests that failed is equal to or above 50% but less than 70% in a given time period. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.94 |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions |
This alert indicates a significant rate of update-notify failures toward the SMF. If the update-notify failure rate increases, the operator can check whether all the flows that trigger update-notify are failing, analyze which flow fails the most, or verify whether the SMF that the requests are going to is unhealthy. For any additional guidance, contact My Oracle Support. |
8.1.2.79 UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT
Table 8-212 UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }}% of update notify requests sent to the SMF failed. |
| Summary | More than 70% of update notify requests sent to the SMF failed. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.94 |
| Metric Used | occnp_http_out_conn_response_total |
| Recommended Actions |
This alert indicates a significant rate of update-notify failures toward the SMF. If the update-notify failure rate increases, the operator can check whether all the flows that trigger update-notify are failing, analyze which flow fails the most, or verify whether the SMF that the requests are going to is unhealthy. For any additional guidance, contact My Oracle Support. |
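As noted at the start of this chapter, Policy alerts are configured in the Alertrules.yaml file. The UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT rule above can be sketched as a standard Prometheus alerting rule; the group name and annotation keys below are illustrative assumptions, not the shipped file contents:

```yaml
groups:
  - name: policy-sm-update-notify    # illustrative group name
    rules:
      - alert: UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT
        expr: |
          (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m]))
            / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70
        labels:
          severity: Critical
        annotations:
          summary: "More than 70% of update notify sent to SMF failed"
          description: "{{ $value }} % of update notify sent to SMF that failed"
```

Running `promtool check rules Alertrules.yaml` can validate the rule syntax before the file is loaded into Prometheus.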
8.1.2.80 POD_PROTECTION_BY_RATELIMIT_REJECTED_REQUEST
Table 8-213 POD_PROTECTION_BY_RATELIMIT_REJECTED_REQUEST
| Field | Details |
|---|---|
| Description | More than 1% of Ingress Gateway traffic is rejected because of rate limiting. |
| Summary | More than 1% of Ingress Gateway traffic is rejected because of rate limiting. |
| Severity | Major |
| Expression | (sum by (namespace,pod) (rate(oc_ingressgateway_http_request_ratelimit_values_total {Allowed="false",app_kubernetes_io_name="occnp-ingress-gateway"}[2m])))/ (sum by (namespace,pod) (rate(oc_ingressgateway_http_request_ratelimit_values_total {app_kubernetes_io_name="occnp-ingress-gateway"}[2m]))) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.103 |
| Metric Used | oc_ingressgateway_http_request_ratelimit_values_total |
| Recommended Actions |
Cause: This alert is triggered when the percentage of denied requests is above 1% of the total TPS. Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
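Unlike the 5-minute alerts above, this expression uses rate(...[2m]), a per-second rate over a 2-minute window. A simplified sketch of that computation and the resulting rejection percentage, using hypothetical counter readings taken 120 seconds apart (real rate() additionally handles counter resets and extrapolation):

```python
def per_second_rate(earlier: float, later: float, window_s: float) -> float:
    """Approximate Prometheus rate(): counter increase divided by the window."""
    return (later - earlier) / window_s

def rejected_pct(denied_rate: float, total_rate: float) -> float:
    """Percentage of requests denied by rate limiting."""
    return denied_rate * 100 / total_rate if total_rate else 0.0

# Hypothetical counters sampled 120 s apart:
# denied (Allowed="false"): 300 -> 312, total: 10000 -> 10600
denied = per_second_rate(300, 312, 120)      # 0.1 rejections/s
total = per_second_rate(10000, 10600, 120)   # 5.0 requests/s
# rejected_pct(denied, total) is 2.0, which is above the 1% alert threshold.
```
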
8.1.2.81 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MINOR_THRESHOLD
Table 8-214 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | The rate of UE N1N2 notifications containing MANAGE_UE_POLICY_COMMAND_REJECT requests from the AMF is detected to be above 20 percent of total N1N2 notify requests. |
| Summary | The rate of UE N1N2 notifications containing MANAGE_UE_POLICY_COMMAND_REJECT requests from the AMF is detected to be above 20 percent of total N1N2 notify requests. |
| Severity | Minor |
| Expression | sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.91 |
| Metric Used | ue_n1_transfer_ue_notification_total |
| Recommended Actions |
The ue_n1_transfer_ue_notification_total metric is pegged when a fragment delivered by the PCF (pcf-ue service) is rejected by the UE (User Equipment). So, the operator needs to check on the AMF/UE side why these UPSI/URSP rules were rejected. For any additional guidance, contact My Oracle Support. |
8.1.2.82 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MAJOR_THRESHOLD
Table 8-215 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | The rate of UE N1N2 notifications containing MANAGE_UE_POLICY_COMMAND_REJECT requests from the AMF is detected to be above 50 percent of total N1N2 notify requests. |
| Summary | The rate of UE N1N2 notifications containing MANAGE_UE_POLICY_COMMAND_REJECT requests from the AMF is detected to be above 50 percent of total N1N2 notify requests. |
| Severity | Major |
| Expression | sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.91 |
| Metric Used | ue_n1_transfer_ue_notification_total |
| Recommended Actions | The ue_n1_transfer_ue_notification_total metric is pegged when a fragment delivered by the PCF (pcf-ue service) is rejected by the UE (User Equipment). The operator needs to check on the AMF/UE side why these UPSI/URSP rules were rejected. For any additional guidance, contact My Oracle Support. |
8.1.2.83 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_CRITICAL_THRESHOLD
Table 8-216 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | The rate of UE N1N2 notifications containing MANAGE_UE_POLICY_COMMAND_REJECT requests from the AMF is detected to be above 75 percent of total N1N2 notify requests. |
| Summary | The rate of UE N1N2 notifications containing MANAGE_UE_POLICY_COMMAND_REJECT requests from the AMF is detected to be above 75 percent of total N1N2 notify requests. |
| Severity | Critical |
| Expression | sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.91 |
| Metric Used | ue_n1_transfer_ue_notification_total |
| Recommended Actions | The ue_n1_transfer_ue_notification_total metric is pegged when a fragment delivered by the PCF (pcf-ue service) is rejected by the UE (User Equipment). The operator needs to check on the AMF/UE side why these UPSI/URSP rules were rejected. For any additional guidance, contact My Oracle Support. |
8.1.2.84 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MINOR_THRESHOLD
Table 8-217 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Over 20 percent of total N1N2 transfer requests from the AMF are N1N2 transfer failure notifications. |
| Summary | Over 20 percent of total N1N2 transfer requests from the AMF are N1N2 transfer failure notifications. |
| Severity | Minor |
| Expression | sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.92 |
| Metric Used | ue_n1_transfer_failure_notification_total |
| Recommended Actions |
The ue_n1_transfer_failure_notification_total metric is pegged when the PCF receives a transfer failure notification from the AMF. In this case, the operator needs to check for connectivity issues between the AMF and the UE to determine why the fragment transfer to the UE failed. The operator might also have to check whether the AMF has proper retransmission and reattempt configurations in place. For any additional guidance, contact My Oracle Support. |
8.1.2.85 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MAJOR_THRESHOLD
Table 8-218 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Over 50 percent of total N1N2 transfer requests from the AMF are N1N2 transfer failure notifications. |
| Summary | Over 50 percent of total N1N2 transfer requests from the AMF are N1N2 transfer failure notifications. |
| Severity | Major |
| Expression | sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.92 |
| Metric Used | ue_n1_transfer_failure_notification_total |
| Recommended Actions |
The ue_n1_transfer_failure_notification_total metric is pegged when the PCF receives a transfer failure notification from the AMF. In this case, the operator needs to check for connectivity issues between the AMF and the UE to determine why the fragment transfer to the UE failed. The operator might also have to check whether the AMF has proper retransmission and reattempt configurations in place. For any additional guidance, contact My Oracle Support. |
8.1.2.86 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_CRITICAL_THRESHOLD
Table 8-219 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | Over 75 percent of total N1N2 transfer requests from the AMF are N1N2 transfer failure notifications. |
| Summary | Over 75 percent of total N1N2 transfer requests from the AMF are N1N2 transfer failure notifications. |
| Severity | Critical |
| Expression | sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.92 |
| Metric Used | ue_n1_transfer_failure_notification_total |
| Recommended Actions |
The ue_n1_transfer_failure_notification_total metric is pegged when the PCF receives a transfer failure notification from the AMF. In this case, the operator needs to check for connectivity issues between the AMF and the UE to determine why the fragment transfer to the UE failed. The operator might also have to check whether the AMF has proper retransmission and reattempt configurations in place. For any additional guidance, contact My Oracle Support. |
8.1.2.87 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MINOR_THRESHOLD
Table 8-220 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
Over 20% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
| Summary |
Over 20% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
| Severity | Minor |
| Expression | sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.93 |
| Metric Used | ue_n1_transfer_t3501_expiry_total |
| Recommended Actions |
The ue_n1_transfer_t3501_expiry_total metric is pegged when the PCF does not receive any N1N2 notification message from the AMF before the T3501 timer expires. In this case, the operator needs to check on the AMF side why the N1N2 message was delayed; connectivity between the PCF and the AMF must also be checked. If the connection between the PCF and the AMF is not the issue, then as a workaround the operator can increase the T3501 timer by navigating in the PCF GUI to the Service Configuration -> PCF UE Timer Setting section and increasing the T3501 Timer Duration field to a larger value. For any additional guidance, contact My Oracle Support. |
8.1.2.88 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MAJOR_THRESHOLD
Table 8-221 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description |
Over 50% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
| Summary |
Over 50% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
| Severity | Major |
| Expression | sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.93 |
| Metric Used | ue_n1_transfer_t3501_expiry_total |
| Recommended Actions |
The ue_n1_transfer_t3501_expiry_total metric is pegged when the PCF does not receive any N1N2 notification message from the AMF before the T3501 timer expires. In this case, the operator needs to check on the AMF side why the N1N2 message was delayed; connectivity between the PCF and the AMF must also be checked. If the connection between the PCF and the AMF is not the issue, then as a workaround the operator can increase the T3501 timer by navigating in the PCF GUI to the Service Configuration -> PCF UE Timer Setting section and increasing the T3501 Timer Duration field to a larger value. For any additional guidance, contact My Oracle Support. |
8.1.2.89 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_CRITICAL_THRESHOLD
Table 8-222 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description |
Over 75% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
| Summary |
Over 75% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
| Severity | Critical |
| Expression | sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.93 |
| Metric Used | ue_n1_transfer_t3501_expiry_total |
| Recommended Actions |
The ue_n1_transfer_t3501_expiry_total metric is pegged when the PCF does not receive any N1N2 notification message from the AMF before the T3501 timer expires. In this case, the operator needs to check on the AMF side why the N1N2 message was delayed; connectivity between the PCF and the AMF must also be checked. If the connection between the PCF and the AMF is not the issue, then as a workaround the operator can increase the T3501 timer by navigating in the PCF GUI to the Service Configuration -> PCF UE Timer Setting section and increasing the T3501 Timer Duration field to a larger value. For any additional guidance, contact My Oracle Support. |
8.1.2.90 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_CRITICAL_THRESHOLD
Table 8-223 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | This alert is triggered when the percentage of update notify requests that failed due to an error response is equal to or above 70% in a given time period. |
| Summary | This alert is triggered when the percentage of update notify requests that failed due to an error response is equal to or above 70% in a given time period. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.111 |
| Metric Used | ocpm_handle_update_notify_error_response_as_pending_confirmation_total |
| Recommended Actions |
Cause: The ocpm_handle_update_notify_error_response_as_pending_confirmation_total metric is pegged when the Update Notify operation towards the SMF ends with an error response.
Diagnostic Information:
Recover:
For any additional guidance, contact My Oracle Support. |
8.1.2.91 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MAJOR_THRESHOLD
Table 8-224 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | This alert is triggered when the percentage of update notify requests that failed due to an error response is equal to or above 50% but less than 70% in a given time period. |
| Summary | This alert is triggered when the percentage of update notify requests that failed due to an error response is equal to or above 50% but less than 70% in a given time period. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.111 |
| Metric Used | ocpm_handle_update_notify_error_response_as_pending_confirmation_total |
| Recommended Actions |
Cause: The ocpm_handle_update_notify_error_response_as_pending_confirmation_total metric is pegged when the Update Notify operation towards the SMF ends with an error response.
Diagnostic Information:
Recover:
For any additional guidance, contact My Oracle Support. |
8.1.2.92 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MINOR_THRESHOLD
Table 8-225 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | This alert is triggered when the percentage of update notify requests that failed due to an error response is equal to or above 30% but less than 50% of total Rx sessions. |
| Summary | This alert is triggered when the percentage of update notify requests that failed due to an error response is equal to or above 30% but less than 50% of total Rx sessions. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm", responseCode=~"4.*|5.*"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.111 |
| Metric Used | ocpm_handle_update_notify_error_response_as_pending_confirmation_total |
| Recommended Actions |
Cause: The ocpm_handle_update_notify_error_response_as_pending_confirmation_total metric is pegged when the Update Notify operation towards the SMF ends with an error response. Metrics:
Alarm Condition:
Diagnostic Information:
Recover:
|
8.1.2.93 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_CRITICAL_THRESHOLD
Table 8-226 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | This alert is triggered when the percentage of update notify requests that failed due to a timeout is equal to or above 70% in a given time period. |
| Summary | This alert is triggered when the percentage of update notify requests that failed due to a timeout is equal to or above 70% in a given time period. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.112 |
| Metric Used | ocpm_handle_update_notify_timeout_as_pending_confirmation_total |
| Recommended Actions |
Cause: Metric ocpm_handle_update_notify_timeout_as_pending_confirmation_total is pegged when the Update Notify operation towards SMF ends up with a timeout. Metrics:
Alarm Condition:
Diagnostic Information:
Recover:
For any additional guidance, contact My Oracle Support. |
8.1.2.94 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MAJOR_THRESHOLD
Table 8-227 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | This alert is triggered when the percentage of update notify requests that failed due to a timeout is equal to or above 50% but less than 70% in a given time period. |
| Summary | This alert is triggered when the percentage of update notify requests that failed due to a timeout is equal to or above 50% but less than 70% in a given time period. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.112 |
| Metric Used | ocpm_handle_update_notify_timeout_as_pending_confirmation_total |
| Recommended Actions |
Cause: The ocpm_handle_update_notify_timeout_as_pending_confirmation_total metric is pegged when the Update Notify operation towards the SMF ends with a timeout.
Diagnostic Information:
Recover:
For any additional guidance, contact My Oracle Support. |
8.1.2.95 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MINOR_THRESHOLD
Table 8-228 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | This alert is triggered when the percentage of update notify requests that failed due to a timeout is equal to or above 30% but less than 50% of total Rx sessions. |
| Summary | This alert is triggered when the percentage of update notify requests that failed due to a timeout is equal to or above 30% but less than 50% of total Rx sessions. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.112 |
| Metric Used | ocpm_handle_update_notify_timeout_as_pending_confirmation_total |
| Recommended Actions |
Cause: Metric ocpm_handle_update_notify_timeout_as_pending_confirmation_total is pegged when the Update Notify operation towards SMF ends up with a timeout. Metrics:
Alarm Condition:
Diagnostic Information:
Recover:
For any additional guidance, contact My Oracle Support. |
8.1.2.96 PCF_STATE_NON_FUNCTIONAL_CRITICAL
Table 8-229 PCF_STATE_NON_FUNCTIONAL_CRITICAL
| Field | Details |
|---|---|
| Description | Policy is in a non-functional state because the DB cluster state is down. |
| Summary | Policy is in a non-functional state because the DB cluster state is down. |
| Severity | Critical |
| Expression | appinfo_nfDbFunctionalState_current{nfDbFunctionalState="Not_Running"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.102 |
| Metric Used | appinfo_nfDbFunctionalState_current |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
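The expression above is a plain equality check on a gauge. One way to verify the same condition manually is to issue an instant query to the Prometheus HTTP API; the sketch below only parses a response body of the documented /api/v1/query shape — the sample payload is an assumption for illustration:

```python
import json

def db_not_running(api_response: str) -> bool:
    """Return True if any series of
    appinfo_nfDbFunctionalState_current{nfDbFunctionalState="Not_Running"}
    has value 1 in a Prometheus /api/v1/query response body."""
    body = json.loads(api_response)
    if body.get("status") != "success":
        return False
    return any(float(r["value"][1]) == 1 for r in body["data"]["result"])

# Hypothetical instant-query response for the alert expression:
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector",
             "result": [{"metric": {"nfDbFunctionalState": "Not_Running"},
                         "value": [1700000000, "1"]}]},
})
```
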
8.1.2.97 UDR_GET_REVALIDATION_FAILURE_ABOVE_MAJOR_PERCENT
Table 8-230 UDR_GET_REVALIDATION_FAILURE_ABOVE_MAJOR_PERCENT
| Field | Details |
|---|---|
| Description | This alert is triggered when 50% or more, but less than 70%, of UDR revalidation GET calls failed. |
| Summary | This alert is triggered when 50% or more, but less than 70%, of UDR revalidation GET calls failed. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",response_code!~"2.*",service_resource="subscription-revalidation"}[5m])) / sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",service_resource="subscription-revalidation"}[5m]))) * 100 >= 50 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.108 |
| Metric Used | ocpm_udr_tracking_response_total |
| Recommended Actions |
Cause: The ocpm_udr_tracking_response_total metric is pegged whenever a response is received from the UDR in the UDR Connector. In this case, alerts are triggered when the number of failed responses received from the UDR for the resubscribe operation exceeds the configured threshold. This alert is triggered when more than 50% but less than 70% of GET calls for UDR revalidation sent by the PCF-UserService fail (that is, receive non-2xx HTTP response codes). Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.98 UDR_GET_REVALIDATION_FAILURE_ABOVE_CRITICAL_PERCENT
Table 8-231 UDR_GET_REVALIDATION_FAILURE_ABOVE_CRITICAL_PERCENT
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 70% of UDR revalidation GET calls failed. |
| Summary | This alert is triggered when more than 70% of UDR revalidation GET calls failed. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",response_code!~"2.*",service_resource="subscription-revalidation"}[5m])) / sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",service_resource="subscription-revalidation"}[5m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.108 |
| Metric Used | ocpm_udr_tracking_response_total |
| Recommended Actions |
Cause: The ocpm_udr_tracking_response_total metric is pegged whenever a response is received from the UDR in the UDR Connector. In this case, alerts are triggered when the number of failed responses received from the UDR for the resubscribe operation exceeds the configured threshold. This alert is triggered when more than 70% of GET calls for UDR revalidation sent by the PCF-UserService fail (that is, receive non-2xx HTTP response codes). Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.99 UDR_GET_REVALIDATION_FAILURE_ABOVE_MINOR_PERCENT
Table 8-232 UDR_GET_REVALIDATION_FAILURE_ABOVE_MINOR_PERCENT
| Field | Details |
|---|---|
| Description | This alert is triggered when 30% or more, but less than 50%, of UDR revalidation GET calls failed. |
| Summary | This alert is triggered when 30% or more, but less than 50%, of UDR revalidation GET calls failed. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",response_code!~"2.*",service_resource="subscription-revalidation"}[5m])) / sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",service_resource="subscription-revalidation"}[5m]))) * 100 >= 30 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.108 |
| Metric Used | ocpm_udr_tracking_response_total |
| Recommended Actions |
Cause: The ocpm_udr_tracking_response_total metric is pegged whenever a response is received from the UDR in the UDR Connector. In this case, alerts are triggered when the number of failed responses received from the UDR for the resubscribe operation is above the configured threshold. This alert is triggered when more than 30% but less than 50% of GET calls for UDR revalidation sent by the PCF-UserService fail (that is, receive non-2xx HTTP response codes).
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.100 UDR_GET_REVALIDATION_404_FAILURE_ABOVE_CRITICAL_PERCENT
Table 8-233 UDR_GET_REVALIDATION_404_FAILURE_ABOVE_CRITICAL_PERCENT
| Field | Details |
|---|---|
| Description | This alert is triggered when more than 70% of UDR revalidation GET calls failed with status code 404 NOT FOUND. |
| Summary | This alert is triggered when more than 70% of UDR revalidation GET calls failed with status code 404 NOT FOUND. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",response_code="404",service_resource="subscription-revalidation"}[5m])) / sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",service_resource="subscription-revalidation"}[5m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.110 |
| Metric Used | ocpm_udr_tracking_response_total |
| Recommended Actions |
The Cause: This alert is triggered when 70% or more of UDR revalidation GET operations managed by PCF fail with an HTTP 404 (Not Found) response code within the specified window. A 404 response indicates that the requested subscription for revalidation was not found in UDR.
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.101 UDR_GET_REVALIDATION_404_FAILURE_ABOVE_MAJOR_PERCENT
Table 8-234 UDR_GET_REVALIDATION_404_FAILURE_ABOVE_MAJOR_PERCENT
| Field | Details |
|---|---|
| Description | This alert is triggered when more than or equal to 50% but less than 70% of the UDR revalidation GET calls failed with status code 404 NOT FOUND. |
| Summary | This alert is triggered when more than or equal to 50% but less than 70% of the UDR revalidation GET calls failed with status code 404 NOT FOUND. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",response_code="404",service_resource="subscription-revalidation"}[5m])) / sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",service_resource="subscription-revalidation"}[5m]))) * 100 >= 50 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.110 |
| Metric Used | ocpm_udr_tracking_response_total |
| Recommended Actions |
The Cause: This alert is triggered when 50% or more (but less than 70%) of UDR revalidation GET operations managed by PCF fail with an HTTP 404 (Not Found) response code within the specified window. A 404 response indicates that the requested subscription for revalidation was not found in UDR.
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.102 UDR_GET_REVALIDATION_404_FAILURE_ABOVE_MINOR_PERCENT
Table 8-235 UDR_GET_REVALIDATION_404_FAILURE_ABOVE_MINOR_PERCENT
| Field | Details |
|---|---|
| Description | This alert is triggered when more than or equal to 30% but less than 50% of the UDR revalidation GET calls failed with status code 404 NOT FOUND. |
| Summary | This alert is triggered when more than or equal to 30% but less than 50% of the UDR revalidation GET calls failed with status code 404 NOT FOUND. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",response_code="404",service_resource="subscription-revalidation"}[5m])) / sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",service_resource="subscription-revalidation"}[5m]))) * 100 >= 30 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.110 |
| Metric Used | ocpm_udr_tracking_response_total |
| Recommended Actions |
The Cause: This alert is triggered when 30% or more (but less than 50%) of UDR revalidation GET operations managed by PCF fail with an HTTP 404 (Not Found) response code within the specified window. A 404 response indicates that the requested subscription for revalidation was not found in UDR.
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
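The three 404 alerts above (Critical, Major, and Minor) evaluate the same ratio and differ only in thresholds. Optionally, the percentage can be precomputed once with a Prometheus recording rule; this is a sketch with an illustrative record and group name, not part of the shipped Alertrules.yaml:

```yaml
groups:
  - name: policy_udr_recording         # illustrative group name
    rules:
      - record: policy:udr_revalidation_404_failure:percent
        expr: >
          (sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",response_code="404",service_resource="subscription-revalidation"}[5m]))
          / sum by (namespace) (rate(ocpm_udr_tracking_response_total{operation_type="resubscribe",microservice=~".*pcf_user",service_resource="subscription-revalidation"}[5m]))) * 100
```

Each alert expression then reduces to a threshold check, for example `policy:udr_revalidation_404_failure:percent >= 30 < 50` for the Minor severity.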
8.1.2.103 UDR_AM_IMMREP_RESPONSE_MISSING_DATA_MINOR
Table 8-236 UDR_AM_IMMREP_RESPONSE_MISSING_DATA_MINOR
| Field | Details |
|---|---|
| Description | For more than or equal to 10% but less than 20% of the traffic, UDR returned a POST subscribe response without user data for AM as part of immediate reporting. |
| Summary | For more than or equal to 10% but less than 20% of the traffic, UDR returned a POST subscribe response without user data for AM as part of immediate reporting. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post"}[5m]))) * 100 >= 10 < 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.116 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.104 UDR_AM_IMMREP_RESPONSE_MISSING_DATA_MAJOR
Table 8-237 UDR_AM_IMMREP_RESPONSE_MISSING_DATA_MAJOR
| Field | Details |
|---|---|
| Description | For more than or equal to 20% but less than 30% of the traffic, UDR returned a POST subscribe response without user data for AM as part of immediate reporting. |
| Summary | For more than or equal to 20% but less than 30% of the traffic, UDR returned a POST subscribe response without user data for AM as part of immediate reporting. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post"}[5m]))) * 100 >= 20 < 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.116 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.105 UDR_AM_IMMREP_RESPONSE_MISSING_DATA_CRITICAL
Table 8-238 UDR_AM_IMMREP_RESPONSE_MISSING_DATA_CRITICAL
| Field | Details |
|---|---|
| Description | For more than or equal to 30% of the traffic, UDR returned a POST subscribe response without user data for AM as part of immediate reporting. |
| Summary | For more than or equal to 30% of the traffic, UDR returned a POST subscribe response without user data for AM as part of immediate reporting. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.116 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.106 UDR_AM_IMMREP_FEATURE_NEGOTIATION_FAILED_MINOR
Table 8-239 UDR_AM_IMMREP_FEATURE_NEGOTIATION_FAILED_MINOR
| Field | Details |
|---|---|
| Description | For more than or equal to 10% but less than 20% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for AM as part of immediate reporting. |
| Summary | For more than or equal to 10% but less than 20% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for AM as part of immediate reporting. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post"}[5m]))) * 100 >= 10 < 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.117 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.107 UDR_AM_IMMREP_FEATURE_NEGOTIATION_FAILED_MAJOR
Table 8-240 UDR_AM_IMMREP_FEATURE_NEGOTIATION_FAILED_MAJOR
| Field | Details |
|---|---|
| Description | For more than or equal to 20% but less than 30% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for AM as part of immediate reporting. |
| Summary | For more than or equal to 20% but less than 30% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for AM as part of immediate reporting. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post"}[5m]))) * 100 >= 20 < 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.117 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.108 UDR_AM_IMMREP_FEATURE_NEGOTIATION_FAILED_CRITICAL
Table 8-241 UDR_AM_IMMREP_FEATURE_NEGOTIATION_FAILED_CRITICAL
| Field | Details |
|---|---|
| Description | For more than or equal to 30% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for AM as part of immediate reporting. |
| Summary | For more than or equal to 30% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for AM as part of immediate reporting. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="am-data",operation_type="post"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.117 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.109 UDR_UE_IMMREP_RESPONSE_MISSING_DATA_MINOR
Table 8-242 UDR_UE_IMMREP_RESPONSE_MISSING_DATA_MINOR
| Field | Details |
|---|---|
| Description | For more than or equal to 10% but less than 20% of the traffic, UDR returned a POST subscribe response without user data for UE as part of immediate reporting. |
| Summary | For more than or equal to 10% but less than 20% of the traffic, UDR returned a POST subscribe response without user data for UE as part of immediate reporting. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post"}[5m]))) * 100 >= 10 < 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.118 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.110 UDR_UE_IMMREP_RESPONSE_MISSING_DATA_MAJOR
Table 8-243 UDR_UE_IMMREP_RESPONSE_MISSING_DATA_MAJOR
| Field | Details |
|---|---|
| Description | For more than or equal to 20% but less than 30% of the traffic, UDR returned a POST subscribe response without user data for UE as part of immediate reporting. |
| Summary | For more than or equal to 20% but less than 30% of the traffic, UDR returned a POST subscribe response without user data for UE as part of immediate reporting. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post"}[5m]))) * 100 >= 20 < 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.118 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: The metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.111 UDR_UE_IMMREP_RESPONSE_MISSING_DATA_CRITICAL
Table 8-244 UDR_UE_IMMREP_RESPONSE_MISSING_DATA_CRITICAL
| Field | Details |
|---|---|
| Description | For more than or equal to 30% of the traffic, UDR returned a POST subscribe response without user data for UE as part of immediate reporting. |
| Summary | For more than or equal to 30% of the traffic, UDR returned a POST subscribe response without user data for UE as part of immediate reporting. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post",imm_reports_present="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.118 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: Metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for a POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.112 UDR_UE_IMMREP_FEATURE_NEGOTIATION_FAILED_MINOR
Table 8-245 UDR_UE_IMMREP_FEATURE_NEGOTIATION_FAILED_MINOR
| Field | Details |
|---|---|
| Description | For more than or equal to 10% but less than 20% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for UE as part of immediate reporting. |
| Summary | For more than or equal to 10% but less than 20% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for UE as part of immediate reporting. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post"}[5m]))) * 100 >= 10 < 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.119 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: Metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.113 UDR_UE_IMMREP_FEATURE_NEGOTIATION_FAILED_MAJOR
Table 8-246 UDR_UE_IMMREP_FEATURE_NEGOTIATION_FAILED_MAJOR
| Field | Details |
|---|---|
| Description | For more than or equal to 20% but less than 30% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for UE as part of immediate reporting. |
| Summary | For more than or equal to 20% but less than 30% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for UE as part of immediate reporting. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post"}[5m]))) * 100 >= 20 < 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.119 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: Metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.114 UDR_UE_IMMREP_FEATURE_NEGOTIATION_FAILED_CRITICAL
Table 8-247 UDR_UE_IMMREP_FEATURE_NEGOTIATION_FAILED_CRITICAL
| Field | Details |
|---|---|
| Description | For more than or equal to 30% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for UE as part of immediate reporting. |
| Summary | For more than or equal to 30% of the traffic, UDR returned a POST subscribe response with failed feature negotiation for UE as part of immediate reporting. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post",immediate_report_pcc="false"}[5m]))) / (sum by (namespace) (rate(occnp_immrep_response_total{service_subresource="ue-policy-set",operation_type="post"}[5m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.119 |
| Metric Used | occnp_immrep_response_total |
| Recommended Actions |
Cause: Metric occnp_immrep_response_total is pegged when UDR-C receives a user data response from UDR for POST Subscription with Immediate Reporting.
Metric:
Alarm Condition:
Diagnostic Information:
Recovery:
For any additional guidance, contact My Oracle Support. |
8.1.2.115 POD_PROTECTION_BY_RATELIMIT_REJECTED_REQUEST_EGW
Table 8-248 POD_PROTECTION_BY_RATELIMIT_REJECTED_REQUEST_EGW
| Field | Details |
|---|---|
| Description | 1% or more of Egress Gateway traffic is being rejected because of rate limiting. |
| Summary | 1% or more of Egress Gateway traffic is being rejected because of rate limiting. |
| Severity | Major |
| Expression | (sum(rate(oc_egressgateway_http_request_ratelimit_values_total{allowed="false",app_kubernetes_io_name="egress-gateway",namespace="$NAMESPACE"}[2m]) or (up * 0))) / sum(rate(oc_egressgateway_http_request_ratelimit_values_total{app_kubernetes_io_name="egress-gateway",namespace="$NAMESPACE"}[2m])) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.114 |
| Metric Used | oc_egressgateway_http_request_ratelimit_values_total |
| Recommended Actions |
The alert is cleared when the failure rate goes below 1% of the total TPS. For any additional guidance, contact My Oracle Support. |
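The `or (up * 0)` clause in the numerator of this expression is a PromQL idiom worth noting: when no requests have been rejected yet, the rate over the `allowed="false"` series is an empty vector, and `or (up * 0)` substitutes zero-valued series so the ratio evaluates to 0 instead of disappearing from alert evaluation. A stripped-down sketch with an illustrative counter name:

```yaml
# "<vector> or (up * 0)" falls back to zero-valued series when <vector>
# is empty, keeping a failure ratio well-defined at zero traffic.
expr: sum(rate(some_requests_total{allowed="false"}[2m]) or (up * 0))
```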
8.1.2.116 SMF_REQUESTED_SERVICE_NOT_AUTHORIZED_ABOVE_CRITICAL_THRESHOLD_PERCENT
Table 8-249 SMF_REQUESTED_SERVICE_NOT_AUTHORIZED_ABOVE_CRITICAL_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }} % of PATCH requests failed in {{$labels.namespace}}. |
| Summary | This alert is triggered when the percentage of PATCH requests that failed is equal to or above 60% in a given time period. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_pa_sponsored_sessions_total{responseCode="403"}[5m])))/(sum by (namespace) (rate(occnp_pa_sponsored_sessions_total[5m]))) * 100 >= 60 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.120 |
| Metric Used | occnp_pa_sponsored_sessions_total |
| Recommended Actions | If this alert is triggered, Prometheus metrics or other tools can be used to check which error codes are being returned and to identify whether the error comes from the NF being reached (in this case SM).
Cause: Alerts are triggered when Sponsored Connectivity requests processed by PA-Service fail with a 403 Requested Service Not Authorized response. This occurs when the client sends a Sponsored request with
Diagnostic Information:
Verification steps:
Monitoring recommendations:
Recovery:
|
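The Recommended Actions above suggest using Prometheus metrics to check which error codes are being returned. The following ad-hoc queries, run in the Prometheus expression browser, are an illustrative starting point; they assume only the metric and `responseCode` label already shown in the alert expression:

```yaml
# Illustrative diagnostics for PA-Service sponsored sessions.
# Traffic broken down by response code, per namespace:
by_response_code: >
  sum by (namespace, responseCode) (rate(occnp_pa_sponsored_sessions_total[5m]))
# Percentage of requests failing with 403 (the ratio this alert evaluates):
failure_percent_403: >
  (sum by (namespace) (rate(occnp_pa_sponsored_sessions_total{responseCode="403"}[5m]))
  / sum by (namespace) (rate(occnp_pa_sponsored_sessions_total[5m]))) * 100
```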
8.1.2.117 SMF_REQUESTED_SERVICE_NOT_AUTHORIZED_ABOVE_MAJOR_THRESHOLD_PERCENT
Table 8-250 SMF_REQUESTED_SERVICE_NOT_AUTHORIZED_ABOVE_MAJOR_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }} % of PATCH requests failed in {{$labels.namespace}}. |
| Summary | This alert is triggered when the percentage of PATCH requests that failed is equal to or above 40% in a given time period. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_pa_sponsored_sessions_total{responseCode="403"}[5m])))/(sum by (namespace) (rate(occnp_pa_sponsored_sessions_total[5m]))) * 100 >= 40 < 60 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.120 |
| Metric Used | occnp_pa_sponsored_sessions_total |
| Recommended Actions | If this alert is triggered, Prometheus metrics or other tools can be used to check which error codes are being returned and to identify whether the error comes from the NF being reached (in this case SM).
Cause: Alerts are triggered when Sponsored Connectivity requests processed by PA-Service fail with a 403 Requested Service Not Authorized response. This occurs when the client sends a Sponsored request with
Diagnostic Information:
Verification steps:
Monitoring recommendations:
Recovery:
|
8.1.2.118 SMF_REQUESTED_SERVICE_NOT_AUTHORIZED_ABOVE_MINOR_THRESHOLD_PERCENT
Table 8-251 SMF_REQUESTED_SERVICE_NOT_AUTHORIZED_ABOVE_MINOR_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }} % of PATCH requests failed in {{$labels.namespace}}. |
| Summary | This alert is triggered when the percentage of PATCH requests that failed is equal to or above 20% in a given time period. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_pa_sponsored_sessions_total{responseCode="403"}[5m])))/(sum by (namespace) (rate(occnp_pa_sponsored_sessions_total[5m]))) * 100 >= 20 < 40 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.120 |
| Metric Used | occnp_pa_sponsored_sessions_total |
| Recommended Actions | If this alert is triggered, Prometheus metrics or other tools can be used to check which error codes are being returned and to identify whether the error comes from the NF being reached (in this case SM).
Cause: Alerts are triggered when Sponsored Connectivity requests processed by PA-Service fail with a 403 Requested Service Not Authorized response. This occurs when the client sends a Sponsored request with
Diagnostic Information:
Verification steps:
Monitoring recommendations:
Recovery:
|
8.1.2.119 AF_MANDATORY_IE_MISSING_SC_ABOVE_CRITICAL_THRESHOLD_PERCENT
Table 8-252 AF_MANDATORY_IE_MISSING_SC_ABOVE_CRITICAL_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }} % of PATCH requests failed in {{$labels.namespace}}. |
| Summary | This alert is triggered when the percentage of PATCH requests that failed is equal to or above 60% in a given time period. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(occnp_pa_sponsored_sessions_total{responseCode="400",cause="MANDATORY_IE_MISSING"}[5m])))/(sum by (namespace) (rate(occnp_pa_sponsored_sessions_total[5m]))) * 100 >= 60 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.122 |
| Metric Used | occnp_pa_sponsored_sessions_total |
| Recommended Actions | If this alert is triggered, Prometheus metrics or other tools can be used to check which error codes are being returned and to identify whether the error comes from the NF being reached (in this case SM).
Cause: Alerts are triggered when Sponsored Connectivity requests processed by PA-Service fail with a 400 Bad Request due to a missing mandatory IE (cause MANDATORY_IE_MISSING).
Diagnostic Information:
Verification steps:
Monitoring recommendations:
Recovery:
|
8.1.2.120 AF_MANDATORY_IE_MISSING_SC_ABOVE_MAJOR_THRESHOLD_PERCENT
Table 8-253 AF_MANDATORY_IE_MISSING_SC_ABOVE_MAJOR_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }} % of PATCH requests failed in {{$labels.namespace}}. |
| Summary | This alert is triggered when the percentage of PATCH requests that failed is equal to or above 40% in a given time period. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(occnp_pa_sponsored_sessions_total{responseCode="400",cause="MANDATORY_IE_MISSING"}[5m])))/(sum by (namespace) (rate(occnp_pa_sponsored_sessions_total[5m]))) * 100 >= 40 < 60 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.122 |
| Metric Used | occnp_pa_sponsored_sessions_total |
| Recommended Actions | If this alert is triggered, Prometheus metrics or other tools can be used to check which error codes are being returned and to identify whether the error comes from the NF being reached (in this case SM).
Cause: Alerts are triggered when Sponsored Connectivity requests processed by PA-Service fail with a 400 Bad Request due to a missing mandatory IE (cause MANDATORY_IE_MISSING).
Diagnostic Information:
Verification steps:
Monitoring recommendations:
Recovery:
|
8.1.2.121 AF_MANDATORY_IE_MISSING_SC_ABOVE_MINOR_THRESHOLD_PERCENT
Table 8-254 AF_MANDATORY_IE_MISSING_SC_ABOVE_MINOR_THRESHOLD_PERCENT
| Field | Details |
|---|---|
| Description | {{ $value }} % of PATCH requests failed in {{$labels.namespace}}. |
| Summary | This alert is triggered when the percentage of PATCH requests that failed is equal to or above 20% in a given time period. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(occnp_pa_sponsored_sessions_total{responseCode="400",cause="MANDATORY_IE_MISSING"}[5m])))/(sum by (namespace) (rate(occnp_pa_sponsored_sessions_total[5m]))) * 100 >= 20 < 40 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.122 |
| Metric Used | occnp_pa_sponsored_sessions_total |
| Recommended Actions | If this alert is triggered, Prometheus metrics or other tools can be used to check which error codes are being returned and to identify whether the error comes from the NF being reached (in this case SM).
Cause: Alerts are triggered when Sponsored Connectivity requests processed by PA-Service fail with a 400 Bad Request due to a missing mandatory IE (cause MANDATORY_IE_MISSING).
Diagnostic Information:
Verification steps:
Monitoring recommendations:
Recovery:
|
8.1.3 PCRF Alerts
This section provides information about PCRF alerts.
8.1.3.1 PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD
Table 8-255 PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | PRE fail count exceeds the critical threshold limit. |
| Summary | Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | PRE fail count exceeds the critical threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.9 |
| Metric Used | http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.2 PRE_UNREACHABLE_EXCEEDS_MAJOR_THRESHOLD
Table 8-256 PRE_UNREACHABLE_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | PRE fail count exceeds the major threshold limit. |
| Summary | Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | PRE fail count exceeds the major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.9 |
| Metric Used | http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.3 PRE_UNREACHABLE_EXCEEDS_MINOR_THRESHOLD
Table 8-257 PRE_UNREACHABLE_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | PRE fail count exceeds the minor threshold limit. |
| Summary | Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | PRE fail count exceeds the minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.9 |
| Metric Used | http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.4 PCRF_DOWN
Table 8-258 PCRF_DOWN
| Field | Details |
|---|---|
| Description | PCRF Service is down |
| Summary | Alert PCRF_DOWN NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | None of the pods of the PCRF service are available. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.33 |
| Metric Used | appinfo_service_running{service=~".*pcrf-core"} |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
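Unlike the percentage-based alerts earlier in this chapter, the PCRF_DOWN table lists a Condition and the metric used but no expression. The following is a hedged sketch of how the condition ("none of the pods of the PCRF service are available") might be expressed, assuming appinfo_service_running reports 1 while the service is up; the shipped Alertrules.yaml rule may differ:

```yaml
- alert: PCRF_DOWN                     # illustrative declaration only
  expr: max by (namespace) (appinfo_service_running{service=~".*pcrf-core"}) == 0
  labels:
    severity: critical
```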
8.1.3.5 CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-259 CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | CCA fail count exceeds the critical threshold limit |
| Summary | Alert CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The failure rate of CCA messages has exceeded the configured threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.13 |
| Metric Used | occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"} |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.3.6 CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-260 CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | CCA fail count exceeds the major threshold limit |
| Summary | Alert CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The failure rate of CCA messages has exceeded the configured major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.13 |
| Metric Used | occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"} |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.3.7 CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-261 CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | CCA fail count exceeds the minor threshold limit |
| Summary | Alert CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The failure rate of CCA messages has exceeded the configured minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.13 |
| Metric Used | occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"} |
| Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.1.3.8 AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-262 AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | AAA fail count exceeds the critical threshold limit |
| Summary | Alert AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The failure rate of AAA messages has exceeded the critical threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.34 |
| Metric Used | occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.9 AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-263 AAA Fail Count Exceeds Major Threshold
| Field | Details |
|---|---|
| Description | AAA fail count exceeds the major threshold limit |
| Summary | Alert AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The failure rate of AAA messages has exceeded the major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.34 |
| Metric Used | occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.10 AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
AAA Fail Count Exceeds Minor Threshold
Table 8-264 AAA Fail Count Exceeds Minor Threshold
| Field | Details |
|---|---|
| Description | AAA fail count exceeds the minor threshold limit |
| Summary | Alert AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The failure rate of AAA messages has exceeded the minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.34 |
| Metric Used | occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.11 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-265 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Rx fail count exceeds the critical threshold limit |
| Summary | Alert RAA_Rx_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The failure rate of RAA Rx messages has exceeded the configured threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.35 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.12 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-266 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Rx fail count exceeds the major threshold limit |
| Summary | Alert RAA_Rx_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The failure rate of RAA Rx messages has exceeded the configured major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.35 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.13 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-267 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Rx fail count exceeds the minor threshold limit |
| Summary | Alert RAA_Rx_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The failure rate of RAA Rx messages has exceeded the configured minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.35 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.14 RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-268 RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Gx fail count exceeds the critical threshold limit |
| Summary | Alert RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The failure rate of RAA Gx messages has exceeded the configured threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.18 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.15 RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-269 RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Gx fail count exceeds the major threshold limit |
| Summary | Alert RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The failure rate of RAA Gx messages has exceeded the configured major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.18 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.16 RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-270 RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Gx fail count exceeds the minor threshold limit |
| Summary | Alert RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The failure rate of RAA Gx messages has exceeded the configured minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.18 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.17 ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-271 ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA fail count exceeds the critical threshold limit |
| Summary | Alert ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The failure rate of ASA messages has exceeded the configured threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.17 |
| Metric Used | occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.18 ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-272 ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA fail count exceeds the major threshold limit |
| Summary | Alert ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The failure rate of ASA messages has exceeded the configured major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.17 |
| Metric Used | occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.19 ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-273 ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA fail count exceeds the minor threshold limit |
| Summary | Alert ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The failure rate of ASA messages has exceeded the configured minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.17 |
| Metric Used | occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.20 STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-274 STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | STA fail count exceeds the critical threshold limit. |
| Summary | Alert STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The failure rate of STA messages has exceeded the configured critical threshold limit: sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.19 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.21 STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-275 STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | STA fail count exceeds the major threshold limit. |
| Summary | Alert STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The failure rate of STA messages has exceeded the configured major threshold limit: sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 80 |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.19 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.22 STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-276 STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | STA fail count exceeds the minor threshold limit. |
| Summary | Alert STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The failure rate of STA messages has exceeded the configured minor threshold limit: sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 60 |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.19 |
| Metric Used | occnp_diam_response_local_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.23 ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-277 ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA timeout count exceeds the critical threshold limit |
| Summary | Alert ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The timeout rate of ASA messages has exceeded the configured threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.31 |
| Metric Used | occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.24 ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-278 ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA timeout count exceeds the major threshold limit |
| Summary | Alert ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The timeout rate of ASA messages has exceeded the configured major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.31 |
| Metric Used | occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.25 ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-279 ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | ASA timeout count exceeds the minor threshold limit |
| Summary | Alert ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The timeout rate of ASA messages has exceeded the configured minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.31 |
| Metric Used | occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.26 RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-280 RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Gx timeout count exceeds the critical threshold limit |
| Summary | Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The timeout rate of RAA Gx messages has exceeded the configured threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.32 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.27 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-281 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Gx timeout count exceeds the major threshold limit |
| Summary | Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The timeout rate of RAA Gx messages has exceeded the configured major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.32 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.28 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-282 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Gx timeout count exceeds the minor threshold limit |
| Summary | Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The timeout rate of RAA Gx messages has exceeded the configured minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.44.1.2.32 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.29 RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
RAA Rx Timeout Count Exceeds Critical Threshold
Table 8-283 RAA Rx Timeout Count Exceeds Critical Threshold
| Field | Details |
|---|---|
| Description | RAA Rx timeout count exceeds the critical threshold limit |
| Summary | Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The timeout rate of RAA Rx messages has exceeded the configured threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.36 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.30 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-284 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Rx timeout count exceeds the major threshold limit |
| Summary | Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The timeout rate of RAA Rx messages has exceeded the configured major threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.36 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.31 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-285 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | RAA Rx timeout count exceeds the minor threshold limit |
| Summary | Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The timeout rate of RAA Rx messages has exceeded the configured minor threshold limit. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.36 |
| Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.32 RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Table 8-286 RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
| Field | Details |
|---|---|
| Description | The combined CCA, AAA, RAA, ASA, and STA error rate is above 10 percent |
| Summary | Alert RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 10% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.37 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.33 RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Table 8-287 RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
| Field | Details |
|---|---|
| Description | The combined CCA, AAA, RAA, ASA, and STA error rate is above 5 percent |
| Summary | Alert RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 5% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.37 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.34 RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Table 8-288 RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
| Field | Details |
|---|---|
| Description | The combined CCA, AAA, RAA, ASA, and STA error rate is above 1 percent |
| Summary | Alert RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 1% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.37 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.35 Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Table 8-289 Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
| Field | Details |
|---|---|
| Description | The Rx error rate is above 10 percent |
| Summary | Alert Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The failure rate of Rx responses is more than 10% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.38 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.36 Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Table 8-290 Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
| Field | Details |
|---|---|
| Description | The Rx error rate is above 5 percent |
| Summary | Alert Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The failure rate of Rx responses is more than 5% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.38 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.37 Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Table 8-291 Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
| Field | Details |
|---|---|
| Description | The Rx error rate is above 1 percent |
| Summary | Alert Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The failure rate of Rx responses is more than 1% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.38 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.38 Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Table 8-292 Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
| Field | Details |
|---|---|
| Description | The Gx error rate is above 10 percent |
| Summary | Alert Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Critical |
| Condition | The failure rate of Gx responses is more than 10% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.39 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.39 Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Table 8-293 Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
| Field | Details |
|---|---|
| Description | The Gx error rate is above 5 percent |
| Summary | Alert Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Major |
| Condition | The failure rate of Gx responses is more than 5% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.39 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.40 Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Table 8-294 Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
| Field | Details |
|---|---|
| Description | The Gx error rate is above 1 percent |
| Summary | Alert Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
| Severity | Minor |
| Condition | The failure rate of Gx responses is more than 1% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.36.1.2.39 |
| Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"} |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.41 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
Table 8-295 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
| Field | Details |
|---|---|
| Description | Diameter requests are being discarded due to timeout, and the discard rate is above 30% |
| Summary | Diameter requests are being discarded due to timeout, and the discard rate is above 30% |
| Severity | Critical |
| Condition | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
| Metric Used | occnp_stale_diam_request_cleanup_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.42 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
Table 8-296 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
| Field | Details |
|---|---|
| Description | Diameter requests are being discarded due to timeout, and the discard rate is above 20% |
| Summary | Diameter requests are being discarded due to timeout, and the discard rate is above 20% |
| Severity | Major |
| Condition | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 20 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
| Metric Used | occnp_stale_diam_request_cleanup_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.1.3.43 STALE_DIAMETER_REQUEST_CLEANUP_MINOR
STALE_DIAMETER_REQUEST_CLEANUP_MINOR
Table 8-297 STALE_DIAMETER_REQUEST_CLEANUP_MINOR
| Field | Details |
|---|---|
| Description | Diameter requests are being discarded due to timeout, and the discard rate is above 10% |
| Summary | Diameter requests are being discarded due to timeout, and the discard rate is above 10% |
| Severity | Minor |
| Condition | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
| Metric Used | occnp_stale_diam_request_cleanup_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |