8 Alerts

This section provides information on Policy alerts and their configuration.

Note:

The performance and capacity of the system can vary based on the call model and configuration, including but not limited to the deployed policies and their corresponding data, for example, policy tables.

You can configure alerts in Prometheus using the Alertrules YAML files.

The following table describes the various severity types of alerts generated by Policy:

Table 8-1 Alerts Levels or Severity Types

Alerts Levels / Severity Types Definition
Critical Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that can affect the service of Policy.
Major Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that can affect the service of Policy.
Minor Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that can affect the service of Policy.
Info or Warn (Informational) Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of Policy.

8.1 Configuring Alerts

This section describes how to configure alerts in Policy. The Alert Manager uses the measurement values reported by the microservices to Prometheus and evaluates them against the conditions defined in the alert rules to trigger alerts.

Note:

  • Sample alert files are packaged with Policy Custom Templates. The Policy Custom Templates.zip file can be downloaded from MOS. Unzip the file to access the following files:
    • Common_Alertrules_cne1.9+.yaml
    • PCF_Alertrules_cne1.9+.yaml
    • PCRF_Alertrules_cne1.9+.yaml
  • The name in the metadata section should be unique when applying more than one alert file. For example:
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      creationTimestamp: null
      labels:
        role: cnc-alerting-rules
      name: occnp-pcf-alerting-rules
  • If required, edit the threshold values of various alerts in the alert files before configuring the alerts.
  • The Alert Manager and Prometheus tools should run in the CNE namespace, for example, occne-infra.
  • Use the following table to select the appropriate files based on the deployment mode and CNE version.

    Table 8-2 Alert Configuration

    Deployment Mode CNE 1.9+
    Converged Mode Common_Alertrules_cne1.9+.yaml, PCF_Alertrules_cne1.9+.yaml, PCRF_Alertrules_cne1.9+.yaml
    PCF only Common_Alertrules_cne1.9+.yaml, PCF_Alertrules_cne1.9+.yaml
    PCRF only Common_Alertrules_cne1.9+.yaml, PCRF_Alertrules_cne1.9+.yaml

Configuring Alerts in Prometheus for CNE 1.9.0 and later versions

To configure Policy alerts in Prometheus for CNE 1.9.0 and later versions, perform the following steps:
  1. Copy the required files to the Bastion Host.
  2. To create or replace the PrometheusRule CRD, run the following command:
    $ kubectl apply -f Common_Alertrules_cne1.9+.yaml -n <namespace>
    $ kubectl apply -f PCF_Alertrules_cne1.9+.yaml -n <namespace>
    $ kubectl apply -f PCRF_Alertrules_cne1.9+.yaml -n <namespace>

    Note:

    These are sample commands for the Converged mode of deployment.
    To verify if the CRD is created, run the following command:
    kubectl get prometheusrule -n <namespace>
    Example:
    kubectl get prometheusrule -n occnp
  3. Verify the alerts in the Prometheus GUI. To do so, select the Alerts tab, and view alert details by selecting any individual rule from the list.
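
The alert rules in these files follow the standard PrometheusRule format. The following is an illustrative sketch only, based on the PodCongestionL1 alert described in the List of Alerts section; the PrometheusRule name and rule group name shown here are examples, and the exact labels, annotations, and thresholds in the packaged files may differ:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: occnp-common-alerting-rules
spec:
  groups:
  - name: common_alerts
    rules:
    - alert: PodCongestionL1
      expr: occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2
      labels:
        severity: critical
      annotations:
        summary: 'Alert when cpu of pod is in CONGESTION_L1 state.'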

Validating Alerts

After configuring the alerts in the Prometheus server, verify them using the following procedure:
  • Open the Prometheus server in your browser using <IP>:<Port>.
  • Navigate to Status and then Rules.
  • Search for Policy. The list of Policy alerts is displayed.

If you are unable to see the alerts, verify if the alert file is correct and then try again.
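
As an additional check from the Bastion Host, you can display the rules carried by an applied PrometheusRule resource, for example (the resource name and namespace below are taken from the earlier examples):

kubectl get prometheusrule occnp-pcf-alerting-rules -n occnp -o yaml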

Adding worker node name in metrics

To add the worker node name in metrics, perform the following steps:
  1. Edit the occne-prometheus-server configmap in the occne-infra namespace.
  2. Locate the following job:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
  3. Add the following under relabel_configs:
    - action: replace
      source_labels:
      - __meta_kubernetes_pod_node_name
      target_label: kubernetes_pod_node_name
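
After the edit, the kubernetes-pods job should look similar to the following sketch (other settings and relabel rules in the job are omitted here for brevity):

- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: kubernetes_pod_node_name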

8.2 Configuring SNMP Notifier

This section describes the procedure to configure SNMP Notifier.

Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:
  1. Run the following command to edit the deployment:
    $ kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>

    Example:

    $ kubectl edit deploy occne-snmp-notifier -n occne-infra

    The SNMP deployment YAML file is displayed.

  2. Edit the SNMP destination in the deployment YAML file as follows (see the sketch after these steps):
    --snmp.destination=<destination_ip>:<destination_port>

    Example:

    --snmp.destination=10.75.203.94:162
  3. Save the file.
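
In the deployment YAML file, the destination is passed as one of the container arguments. The following is an illustrative sketch only; the container name is an example and the rest of the deployment specification is omitted:

containers:
- name: snmp-notifier   # illustrative container name
  args:
  - --snmp.destination=10.75.203.94:162
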
Checking SNMP Traps
The following is an example of how to capture the logs of the trap receiver server to view the generated SNMP traps:
$ docker logs <trapd_container_id>
Sample output:

Figure 8-1 Sample output for SNMP Trap

MIB Files for CNC Policy

There are two MIB files that are used to generate the traps. Update these files along with the alert file to fetch the traps in your environment.

  • toplevel.mib

    This is the top-level MIB file, where the objects and their data types are defined.

  • policy-alarm-mib.mib

    This file fetches objects from the top-level MIB file, and these objects can be selected for display.

Note:

MIB files are packaged along with CNC Policy Custom Templates. Download the file from MOS. For more information on downloading custom templates, see Oracle Communications Cloud Native Core Policy Installation and Upgrade Guide.
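
The following is an illustrative sketch only, assuming a net-snmp based trap receiver and that both MIB files have been copied to a local directory (shown here as /path/to/policy-mibs); it loads the MIB files when starting the trap daemon so that received traps are displayed with resolved names:

$ snmptrapd -f -Lo -m ALL -M +/path/to/policy-mibs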

8.3 List of Alerts

This section provides detailed information about the alert rules defined for Policy. It consists of the following three types of alerts:
  1. Common Alerts - This category of alerts is common and required for all three modes of deployment.
  2. PCF Alerts - This category of alerts is specific to PCF microservices and required for Converged and PCF only modes of deployment.
  3. PCRF Alerts - This category of alerts is specific to PCRF microservices and required for Converged and PCRF only modes of deployment.

8.3.1 Common Alerts

This section provides information about alerts that are common for PCF and PCRF.

8.3.1.1 POD_CONGESTION_L1

Table 8-3 POD_CONGESTION_L1

Field Details
Name in Alert Yaml File PodCongestionL1
Description Alert when cpu of pod is in CONGESTION_L1 state.
Summary Alert when cpu of pod is in CONGESTION_L1 state.
Severity Critical
Condition occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2
OID 1.3.6.1.4.1.323.5.3.52.1.2.71
Metric Used occnp_pod_resource_congestion_state
Recommended Actions For any additional guidance, contact My Oracle Support.
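
To inspect the metric behind this condition manually, the same selector used in the alert condition can be queried in the Prometheus GUI, for example:

occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"}
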
8.3.1.2 POD_CONGESTION_L2

Table 8-4 POD_CONGESTION_L2

Field Details
Name in Alert Yaml File PodCongestionL2
Description Alert when cpu of pod is in CONGESTION_L2 state.
Summary Alert when cpu of pod is in CONGESTION_L2 state.
Severity Critical
Condition occnp_pod_resource_congestion_state{type="cpu"} == 3
OID 1.3.6.1.4.1.323.5.3.52.1.2.72
Metric Used occnp_pod_resource_congestion_state
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.3 POD_PENDING_REQUEST_CONGESTION_L1

Table 8-5 POD_PENDING_REQUEST_CONGESTION_L1

Field Details
Name in Alert Yaml File PodPendingRequestCongestionL1
Description Alert when queue of pod is in CONGESTION_L1 state.
Summary Alert when queue of pod is in CONGESTION_L1 state.
Severity Critical
Condition occnp_pod_resource_congestion_state{type="queue",container!~"bulwark|diam-gateway"} == 2
OID 1.3.6.1.4.1.323.5.3.52.1.2.73
Metric Used occnp_pod_resource_congestion_state
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.4 POD_PENDING_REQUEST_CONGESTION_L2

Table 8-6 POD_PENDING_REQUEST_CONGESTION_L2

Field Details
Name in Alert Yaml File PodPendingRequestCongestionL2
Description Alert when queue of pod is in CONGESTION_L2 state.
Summary Alert when queue of pod is in CONGESTION_L2 state.
Severity Critical
Condition occnp_pod_resource_congestion_state{type="queue"} == 3
OID 1.3.6.1.4.1.323.5.3.52.1.2.74
Metric Used occnp_pod_resource_congestion_state
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.5 POD_CPU_CONGESTION_L1

Table 8-7 POD_CPU_CONGESTION_L1

Field Details
Name in Alert Yaml File PodCPUCongestionL1
Description Alert when cpu of pod is in CONGESTION_L1 state.
Summary Alert when cpu of pod is in CONGESTION_L1 state.
Severity Critical
Condition occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2
OID 1.3.6.1.4.1.323.5.3.52.1.2.73
Metric Used occnp_pod_resource_congestion_state
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.6 POD_CPU_CONGESTION_L2

Table 8-8 POD_CPU_CONGESTION_L2

Field Details
Name in Alert Yaml File PodCPUCongestionL2
Description Alert when cpu of pod is in CONGESTION_L2 state.
Summary Alert when cpu of pod is in CONGESTION_L2 state.
Severity Critical
Condition occnp_pod_resource_congestion_state{type="cpu"} == 3
OID 1.3.6.1.4.1.323.5.3.52.1.2.74
Metric Used occnp_pod_resource_congestion_state
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.7 Pod_Memory_DoC

Table 8-9 Pod_Memory_DoC

Field Details
Description Pod Resource Congestion status of {{$labels.service}} service is DoC for Memory type
Summary Pod Resource Congestion status of {{$labels.service}} service is DoC for Memory type
Severity Major
Condition occnp_pod_resource_congestion_state{type="memory"} == 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.31
Metric Used occnp_pod_resource_congestion_state
Recommended Actions
The alert triggers based on the resource limit usage and the load shedding configurations in congestion control. The CPU, memory, and queue usage can be checked using the Grafana Dashboard.

Note:

Threshold levels can be configured using the PCF_Alertrules.yaml file.

For any additional guidance, contact My Oracle Support.

8.3.1.8 Pod_Memory_Congested

Table 8-10 Pod_Memory_Congested

Field Details
Description Pod Resource Congestion status of {{$labels.service}} service is congested for Memory type
Summary Pod Resource Congestion status of {{$labels.service}} service is congested for Memory type
Severity Critical
Condition occnp_pod_resource_congestion_state{type="memory"} == 2
OID 1.3.6.1.4.1.323.5.3.52.1.2.32
Metric Used occnp_pod_resource_congestion_state
Recommended Actions

The alert triggers based on the resource limit usage and the load shedding configurations in congestion control. The CPU, memory, and queue usage can be checked using the Grafana Dashboard.

For any additional guidance, contact My Oracle Support.

8.3.1.9 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-11 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description RAA Rx fail count exceeds the critical threshold limit.
Summary RAA Rx fail count exceeds the critical threshold limit.
Severity CRITICAL
Condition sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.35
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.10 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-12 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description RAA Rx fail count exceeds the major threshold limit.
Summary RAA Rx fail count exceeds the major threshold limit.
Severity MAJOR
Condition sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.35
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.11 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-13 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description RAA Rx fail count exceeds the minor threshold limit.
Summary RAA Rx fail count exceeds the minor threshold limit.
Severity MINOR
Condition sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 80
OID 1.3.6.1.4.1.323.5.3.52.1.2.35
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.12 ASA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-14 ASA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description ASA Rx fail count exceeds the critical threshold limit.
Summary ASA Rx fail count exceeds the critical threshold limit.
Severity CRITICAL
Condition sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.66
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.13 ASA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-15 ASA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description ASA Rx fail count exceeds the major threshold limit.
Summary ASA Rx fail count exceeds the major threshold limit.
Severity MAJOR
Condition sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.66
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.14 ASA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-16 ASA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description ASA Rx fail count exceeds the minor threshold limit.
Summary ASA Rx fail count exceeds the minor threshold limit.
Severity MINOR
Condition sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 80
OID 1.3.6.1.4.1.323.5.3.52.1.2.66
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.15 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-17 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description ASA Rx timeout count exceeds the minor threshold limit
Summary ASA Rx timeout count exceeds the minor threshold limit
Severity MINOR
Condition sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 80
OID 1.3.6.1.4.1.323.5.3.52.1.2.67
Metric Used -
Recommended Actions -
8.3.1.16 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-18 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description ASA Rx timeout count exceeds the major threshold limit
Summary ASA Rx timeout count exceeds the major threshold limit
Severity MAJOR
Condition sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.67
Metric Used -
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.17 ASA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-19 ASA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description ASA Rx timeout count exceeds the critical threshold limit
Summary ASA Rx timeout count exceeds the critical threshold limit
Severity CRITICAL
Condition sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.67
Metric Used -
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.18 SCP_PEER_UNAVAILABLE

Table 8-20 SCP_PEER_UNAVAILABLE

Field Details
Description Configured SCP peer is unavailable.
Summary Configured SCP peer is unavailable.
Severity Major
Condition occnp_oc_egressgateway_peer_health_status != 0 (SCP peer [ {{$labels.peer}} ] is unavailable).
OID 1.3.6.1.4.1.323.5.3.52.1.2.60
Metric Used occnp_oc_egressgateway_peer_health_status
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.19 SCP_PEER_SET_UNAVAILABLE

Table 8-21 SCP_PEER_SET_UNAVAILABLE

Field Details
Description None of the SCP peers are available for the configured peerset.
Summary None of the SCP peers are available for the configured peerset.
Severity Critical
Condition One of the SCPs has been marked unhealthy.
OID 1.3.6.1.4.1.323.5.3.52.1.2.61
Metric Used oc_egressgateway_peer_count and oc_egressgateway_peer_available_count
Recommended Actions

NF clears the critical alarm when at least one SCP peer in a peerset becomes available, even if all the other SCP peers in the given peerset are still unavailable.

For any additional guidance, contact My Oracle Support.
8.3.1.20 STALE_CONFIGURATION

Table 8-22 STALE_CONFIGURATION

Field Details
Description In the last 10 minutes, the current service config_level does not match the config_level from the config-server.
Summary In the last 10 minutes, the current service config_level does not match the config_level from the config-server.
Severity Major
Condition (sum by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) != (sum by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"}))
OID 1.3.6.1.4.1.323.5.3.52.1.2.62
Metric Used topic_version
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.21 POLICY_SERVICES_DOWN

Table 8-23 POLICY_SERVICES_DOWN

Field Details
Name in Alert Yaml File PCF_SERVICES_DOWN
Description {{$labels.service}} service is not running.
Summary {{$labels.service}} service is not running.
Severity Critical
Condition None of the pods of the CNC Policy application are available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.1
Metric Used appinfo_service_running{vendor="Oracle", application="occnp", category!=""}!= 1
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.22 DIAM_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-24 DIAM_TRAFFIC_RATE_ABOVE_THRESHOLD

Field Details
Name in Alert Yaml File DiamTrafficRateAboveThreshold
Description Diameter Connector Ingress traffic Rate is above threshold of Max MPS (current value is: {{ $value }})
Summary Traffic Rate is above 90 Percent of Max requests per second.
Severity Major
Condition The total Ingress traffic rate for Diameter connector has crossed the configured threshold of 900 TPS.

The default trigger point for this alert in the Common_Alertrules.yaml file is when the Diameter Connector Ingress rate crosses 90% of the maximum ingress requests per second.

OID 1.3.6.1.4.1.323.5.3.36.1.2.6
Metric Used ocpm_ingress_request_total
Recommended Actions The alert gets cleared when the Ingress traffic rate falls below the threshold.

Note: Threshold levels can be configured using the Common_Alertrules.yaml file.

It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
  1. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes.
  2. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.
For any additional guidance, contact My Oracle Support.
8.3.1.23 DIAM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-25 DIAM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field Details
Name in Alert Yaml File DiamIngressErrorRateAbove10Percent
Description Transaction Error Rate detected above 10 Percent of Total on Diameter Connector (current value is: {{ $value }})
Summary Transaction Error Rate detected above 10 Percent of Total Transactions.
Severity Critical
Condition The number of failed transactions is above 10 percent of the total transactions on Diameter Connector.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7
Metric Used ocpm_ingress_response_total
Recommended Actions The alert gets cleared when the number of failed transactions is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the service specific errors. For instance: ocpm_ingress_response_total{servicename_3gpp="rx",response_code!~"2.*"}
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.
For any additional guidance, contact My Oracle Support.
8.3.1.24 DIAM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Table 8-26 DIAM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Field Details
Name in Alert Yaml File DiamEgressErrorRateAbove1Percent
Description Egress Transaction Error Rate detected above 1 Percent of Total on Diameter Connector (current value is: {{ $value }})
Summary Transaction Error Rate detected above 1 Percent of Total Transactions
Severity Minor
Condition The number of failed transactions is above 1 percent of the total Egress Gateway transactions on Diameter Connector.
OID 1.3.6.1.4.1.323.5.3.36.1.2.8
Metric Used ocpm_egress_response_total
Recommended Actions The alert gets cleared when the number of failed transactions is below 1% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the errors. For instance: ocpm_egress_response_total{servicename_3gpp="rx",response_code!~"2.*"}
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.
For any additional guidance, contact My Oracle Support.
8.3.1.25 UDR_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-27 UDR_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Field Details
Name in Alert Yaml File PcfUdrIngressTrafficRateAboveThreshold
Description User service Ingress traffic Rate from UDR is above threshold of Max MPS (current value is: {{ $value }})
Summary Traffic Rate is above 90 Percent of Max requests per second
Severity Major
Condition The total User Service Ingress traffic rate from UDR has crossed the configured threshold of 900 TPS.

The default trigger point for this alert in the Common_Alertrules.yaml file is when the user service Ingress rate from UDR crosses 90% of the maximum ingress requests per second.

OID 1.3.6.1.4.1.323.5.3.36.1.2.9
Metric Used ocpm_userservice_inbound_count_total{service_resource="udr-service"}
Recommended Actions The alert gets cleared when the Ingress traffic rate falls below the threshold.

Note: Threshold levels can be configured using the Common_Alertrules.yaml file.

It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
  1. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes.
  2. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.

For any additional guidance, contact My Oracle Support.

8.3.1.26 UDR_EGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-28 UDR_EGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field Details
Name in Alert Yaml File PcfUdrEgressErrorRateAbove10Percent
Description Egress Transaction Error Rate detected above 10 Percent of Total on User service (current value is: {{ $value }})
Summary Transaction Error Rate detected above 10 Percent of Total Transactions
Severity Critical
Condition The number of failed transactions from UDR is more than 10 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.10
Metric Used ocpm_udr_tracking_response_total{servicename_3gpp="nudr-dr",response_code!~"2.*"}
Recommended Actions The alert gets cleared when the number of failed transactions falls below the configured threshold.

Note: Threshold levels can be configured using the Common_Alertrules.yaml file.

It is recommended to assess the reason for failed transactions. Perform the following steps to analyze the cause of increased traffic:
  1. Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes.
  2. Check the Egress Gateway logs on Kibana to determine the reason for the errors.

For any additional guidance, contact My Oracle Support.

8.3.1.27 POLICYDS_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-29 POLICYDS_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Field Details
Name in Alert Yaml File PolicyDsIngressTrafficRateAboveThreshold
Description Ingress Traffic Rate is above threshold of Max MPS (current value is: {{ $value }})
Summary Traffic Rate is above 90 Percent of Max requests per second
Severity Critical
Condition The total PolicyDS Ingress message rate has crossed the configured threshold of 900 TPS, that is, 90% of the maximum Ingress request rate.

The default trigger point for this alert in the Common_Alertrules.yaml file is when the PolicyDS Ingress rate crosses 90% of the maximum ingress requests per second.

OID 1.3.6.1.4.1.323.5.3.36.1.2.13
Metric Used client_request_total

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.

Recommended Actions The alert gets cleared when the Ingress traffic rate falls below the threshold.

Note: Threshold levels can be configured using the Common_Alertrules.yaml file.

It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
  1. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes.
  2. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.

For any additional guidance, contact My Oracle Support.

8.3.1.28 POLICYDS_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-30 POLICYDS_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field Details
Name in Alert Yaml File PolicyDsIngressErrorRateAbove10Percent
Description Ingress Transaction Error Rate detected above 10 Percent of Total on PolicyDS service (current value is: {{ $value }})
Summary Transaction Error Rate detected above 10 Percent of Total Transactions
Severity Critical
Condition The number of failed transactions is above 10 percent of the total transactions for PolicyDS service.
OID 1.3.6.1.4.1.323.5.3.36.1.2.14
Metric Used client_response_total
Recommended Actions The alert gets cleared when the number of failed transactions is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the service specific errors. For instance: client_response_total{response!~"2.*"}
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.

For any additional guidance, contact My Oracle Support.

8.3.1.29 POLICYDS_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Table 8-31 POLICYDS_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Field Details
Name in Alert Yaml File PolicyDsEgressErrorRateAbove1Percent
Description Egress Transaction Error Rate detected above 1 Percent of Total on PolicyDS service (current value is: {{ $value }})
Summary Transaction Error Rate detected above 1 Percent of Total Transactions
Severity Minor
Condition The number of failed transactions is above 1 percent of the total transactions for PolicyDS service.
OID 1.3.6.1.4.1.323.5.3.36.1.2.15
Metric Used server_response_total
Recommended Actions The alert gets cleared when the number of failed transactions is below 1% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the service specific errors. For instance: server_response_total{response!~"2.*"}
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.

For any additional guidance, contact My Oracle Support.

8.3.1.30 UDR_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD

Table 8-32 UDR_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File PcfUdrIngressTimeoutErrorAboveMajorThreshold
Description Ingress Timeout Error Rate detected above 10 Percent of Total towards UDR service (current value is: {{ $value }})
Summary Timeout Error Rate detected above 10 Percent of Total Transactions
Severity Major
Condition The number of failed transactions due to timeout is above 10 percent of the total transactions for UDR service.
OID 1.3.6.1.4.1.323.5.3.36.1.2.16
Metric Used ocpm_udr_tracking_request_timeout_total{servicename_3gpp="nudr-dr"}
Recommended Actions The alert gets cleared when the number of failed transactions due to timeout is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the service specific errors. For instance: ocpm_udr_tracking_request_timeout_total{servicename_3gpp="nudr-dr"}
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.

For any additional guidance, contact My Oracle Support.

8.3.1.31 DB_TIER_DOWN_ALERT

Table 8-33 DB_TIER_DOWN_ALERT

Field Details
Name in Alert Yaml File DBTierDownAlert
Description DB is not reachable.
Summary DB is not reachable.
Severity Critical
Condition Database is not available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.18
Metric Used appinfo_category_running{category="database"}
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.32 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD

Table 8-34 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File CPUUsagePerServiceAboveMinorThreshold
Description CPU usage for {{$labels.service}} service is above 60
Summary CPU usage for {{$labels.service}} service is above 60
Severity Minor
Condition A service pod has reached the configured minor threshold (60%) of its CPU usage limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.19
Metric Used container_cpu_usage_seconds_total

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.

Recommended Actions The alert gets cleared when the CPU utilization falls below the minor threshold or crosses the major threshold, in which case CPUUsagePerServiceAboveMajorThreshold alert shall be raised.

Note: Threshold levels can be configured using the PCF_Alertrules.yaml file.

For any additional guidance, contact My Oracle Support.

8.3.1.33 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD

Table 8-35 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File CPUUsagePerServiceAboveMajorThreshold
Description CPU usage for {{$labels.service}} service is above 80
Summary CPU usage for {{$labels.service}} service is above 80
Severity Major
Condition A service pod has reached the configured major threshold (80%) of its CPU usage limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.20
Metric Used container_cpu_usage_seconds_total

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.

Recommended Actions The alert gets cleared when the CPU utilization falls below the major threshold or crosses the critical threshold, in which case CPUUsagePerServiceAboveCriticalThreshold alert shall be raised.

Note: Threshold levels can be configured using the PCF_Alertrules.yaml file.

For any additional guidance, contact My Oracle Support.

8.3.1.34 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD

Table 8-36 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File CPUUsagePerServiceAboveCriticalThreshold
Description CPU usage for {{$labels.service}} service is above 90
Summary CPU usage for {{$labels.service}} service is above 90
Severity Critical
Condition A service pod has reached the configured critical threshold (90%) of its CPU usage limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.21
Metric Used container_cpu_usage_seconds_total

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.

Recommended Actions The alert gets cleared when the CPU utilization falls below the critical threshold.

Note: Threshold levels can be configured using the PCF_Alertrules.yaml file.

For any additional guidance, contact My Oracle Support.

8.3.1.35 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD

Table 8-37 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File MemoryUsagePerServiceAboveMinorThreshold
Description Memory usage for {{$labels.service}} service is above 60
Summary Memory usage for {{$labels.service}} service is above 60
Severity Minor
Condition A service pod has reached the configured minor threshold (60%) of its memory usage limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.22
Metric Used container_memory_usage_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.

Recommended Actions The alert gets cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case MemoryUsagePerServiceAboveMajorThreshold alert shall be raised.

Note: Threshold levels can be configured using the PCF_Alertrules.yaml file.

For any additional guidance, contact My Oracle Support.

8.3.1.36 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD

Table 8-38 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File MemoryUsagePerServiceAboveMajorThreshold
Description Memory usage for {{$labels.service}} service is above 80
Summary Memory usage for {{$labels.service}} service is above 80
Severity Major
Condition A service pod has reached the configured major threshold (80%) of its memory usage limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.23
Metric Used container_memory_usage_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.

Recommended Actions The alert gets cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case MemoryUsagePerServiceAboveCriticalThreshold alert shall be raised.

Note: Threshold levels can be configured using the PCF_Alertrules.yaml file.

For any additional guidance, contact My Oracle Support.

8.3.1.37 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD

Table 8-39 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File MemoryUsagePerServiceAboveCriticalThreshold
Description Memory usage for {{$labels.service}} service is above 90
Summary Memory usage for {{$labels.service}} service is above 90
Severity Critical
Condition A service pod has reached the configured critical threshold (90%) of its memory usage limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.24
Metric Used container_memory_usage_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.

Recommended Actions The alert gets cleared when the memory utilization falls below the critical threshold.

Note: Threshold levels can be configured using the PCF_Alertrules.yaml file.

For any additional guidance, contact My Oracle Support.

8.3.1.38 POD_CONGESTED

Table 8-40 POD_CONGESTED

Field Details
Name in Alert Yaml File PodCongested
Description The pod congestion status is set to congested.
Summary Pod Congestion status of {{$labels.service}} service is congested
Severity Critical
Condition occnp_pod_congestion_state == 4
OID 1.3.6.1.4.1.323.5.3.36.1.2.26
Metric Used occnp_pod_congestion_state
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support.

8.3.1.39 POD_DANGER_OF_CONGESTION

Table 8-41 POD_DANGER_OF_CONGESTION

Field Details
Description The pod congestion status is set to Danger of Congestion.
Summary Pod Congestion status of {{$labels.service}} service is DoC
Severity Major
Condition occnp_pod_resource_congestion_state == 1
OID 1.3.6.1.4.1.323.5.3.36.1.2.25
Metric Used occnp_pod_congestion_state
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support.

8.3.1.40 POD_PENDING_REQUEST_CONGESTED

Table 8-42 POD_PENDING_REQUEST_CONGESTED

Field Details
Name in Alert Yaml File PodPendingRequestCongested
Description The pod congestion status is set to congested for PendingRequest.
Summary Pod Resource Congestion status of {{$labels.service}} service is congested for PendingRequest type.
Severity Critical
Condition occnp_pod_resource_congestion_state{type="queue"} == 4
OID 1.3.6.1.4.1.323.5.3.36.1.2.28
Metric Used occnp_pod_resource_congestion_state{type="queue"}
Recommended Actions The alert gets cleared when the number of pending requests in the queue falls below the configured threshold value.

For any additional guidance, contact My Oracle Support.

8.3.1.41 POD_PENDING_REQUEST_DANGER_OF_CONGESTION

Table 8-43 POD_PENDING_REQUEST_DANGER_OF_CONGESTION

Field Details
Description The pod congestion status is set to DoC for pending requests.
Summary Pod Resource Congestion status of {{$labels.service}} service is DoC for PendingRequest type.
Severity Major
Condition occnp_pod_resource_congestion_state{type="queue"} == 1
OID 1.3.6.1.4.1.323.5.3.36.1.2.27
Metric Used occnp_pod_resource_congestion_state{type="queue"}
Recommended Actions The alert gets cleared when the number of pending requests in the queue falls below the configured threshold value.

For any additional guidance, contact My Oracle Support.

8.3.1.42 POD_CPU_CONGESTED

Table 8-44 POD_CPU_CONGESTED

Field Details
Name in Alert Yaml File PodCPUCongested
Description The pod congestion status is set to congested for CPU.
Summary Pod Resource Congestion status of {{$labels.service}} service is congested for CPU type.
Severity Critical
Condition occnp_pod_resource_congestion_state{type="cpu"} == 4
OID 1.3.6.1.4.1.323.5.3.36.1.2.30
Metric Used occnp_pod_resource_congestion_state{type="cpu"}
Recommended Actions The alert gets cleared when the system CPU usage comes below the configured threshold value.

For any additional guidance, contact My Oracle Support.

8.3.1.43 POD_CPU_DANGER_OF_CONGESTION

Table 8-45 POD_CPU_DANGER_OF_CONGESTION

Field Details
Description Pod Resource Congestion status of {{$labels.service}} service is DoC for CPU type.
Summary Pod Resource Congestion status of {{$labels.service}} service is DoC for CPU type.
Severity Major
Condition The pod congestion status is set to DoC for CPU.
OID 1.3.6.1.4.1.323.5.3.36.1.2.29
Metric Used occnp_pod_resource_congestion_state{type="cpu"}
Recommended Actions The alert gets cleared when the system CPU usage comes below the configured threshold value.

For any additional guidance, contact My Oracle Support.

8.3.1.44 SERVICE_OVERLOADED

Table 8-46 SERVICE_OVERLOADED

Field Details
Description Overload Level of {{$labels.service}} service is L1
Summary Overload Level of {{$labels.service}} service is L1
Severity Minor
Condition The overload level of the service is L1.
OID 1.3.6.1.4.1.323.5.3.36.1.2.40
Metric Used load_level
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-47 SERVICE_OVERLOADED

Field Details
Description Overload Level of {{$labels.service}} service is L2
Summary Overload Level of {{$labels.service}} service is L2
Severity Major
Condition The overload level of the service is L2.
OID 1.3.6.1.4.1.323.5.3.36.1.2.40
Metric Used load_level
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-48 SERVICE_OVERLOADED

Field Details
Description Overload Level of {{$labels.service}} service is L3
Summary Overload Level of {{$labels.service}} service is L3
Severity Critical
Condition The overload level of the service is L3.
OID 1.3.6.1.4.1.323.5.3.36.1.2.40
Metric Used load_level
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support.

8.3.1.45 SERVICE_RESOURCE_OVERLOADED

Alerts when service is in overload state due to memory usage

Table 8-49 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L1 for {{$labels.type}} type
Summary {{$labels.service}} service is L1 for {{$labels.type}} type
Severity Minor
Condition The overload level of the service is L1 due to memory usage.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="memory"}
Recommended Actions The alert gets cleared when the memory usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-50 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L2 for {{$labels.type}} type
Summary {{$labels.service}} service is L2 for {{$labels.type}} type
Severity Major
Condition The overload level of the service is L2 due to memory usage.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="memory"}
Recommended Actions The alert gets cleared when the memory usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-51 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L3 for {{$labels.type}} type.
Summary {{$labels.service}} service is L3 for {{$labels.type}} type
Severity Critical
Condition The overload level of the service is L3 due to memory usage.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="memory"}
Recommended Actions The alert gets cleared when the memory usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to CPU usage

Table 8-52 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L1 for {{$labels.type}} type
Summary {{$labels.service}} service is L1 for {{$labels.type}} type
Severity Minor
Condition The overload level of the service is L1 due to CPU usage.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="cpu"}
Recommended Actions The alert gets cleared when the CPU usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-53 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L2 for {{$labels.type}} type
Summary {{$labels.service}} service is L2 for {{$labels.type}} type
Severity Major
Condition The overload level of the service is L2 due to CPU usage.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="cpu"}
Recommended Actions The alert gets cleared when the CPU usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-54 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L3 for {{$labels.type}} type
Summary {{$labels.service}} service is L3 for {{$labels.type}} type
Severity Critical
Condition The overload level of the service is L3 due to CPU usage.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="cpu"}
Recommended Actions The alert gets cleared when the CPU usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to number of pending messages

Table 8-55 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L1 for {{$labels.type}} type
Summary {{$labels.service}} service is L1 for {{$labels.type}} type
Severity Minor
Condition The overload level of the service is L1 due to number of pending messages.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="svc_pending_count"}
Recommended Actions The alert gets cleared when the number of pending messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-56 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L2 for {{$labels.type}} type
Summary {{$labels.service}} service is L2 for {{$labels.type}} type
Severity Major
Condition The overload level of the service is L2 due to number of pending messages.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="svc_pending_count"}
Recommended Actions The alert gets cleared when the number of pending messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-57 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L3 for {{$labels.type}} type
Summary {{$labels.service}} service is L3 for {{$labels.type}} type
Severity Critical
Condition The overload level of the service is L3 due to number of pending messages.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="svc_pending_count"}
Recommended Actions The alert gets cleared when the number of pending messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to number of failed requests

Table 8-58 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L1 for {{$labels.type}} type.
Summary {{$labels.service}} service is L1 for {{$labels.type}} type.
Severity Minor
Condition The overload level of the service is L1 due to number of failed requests.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="svc_failure_count"}
Recommended Actions The alert gets cleared when the number of failed messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-59 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L2 for {{$labels.type}} type.
Summary {{$labels.service}} service is L2 for {{$labels.type}} type.
Severity Major
Condition The overload level of the service is L2 due to number of failed requests.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="svc_failure_count"}
Recommended Actions The alert gets cleared when the number of failed messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 8-60 SERVICE_RESOURCE_OVERLOADED

Field Details
Description {{$labels.service}} service is L3 for {{$labels.type}} type.
Summary {{$labels.service}} service is L3 for {{$labels.type}} type.
Severity Critical
Condition The overload level of the service is L3 due to number of failed requests.
OID 1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used service_resource_overload_level{type="svc_failure_count"}
Recommended Actions The alert gets cleared when the number of failed messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

8.3.1.46 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_CRITICAL_THRESHOLD

Table 8-61 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description Notification Transaction Error exceeds the critical threshold limit for a given Subscriber Notification server
Summary Transaction Error exceeds the critical threshold limit for a given Subscriber Notification server
Severity Critical
Condition The number of error responses for a given subscriber notification server exceeds the critical threshold of 1000.
OID 1.3.6.1.4.1.323.5.3.36.1.2.42
Metric Used http_notification_response_total{responseCode!~"2.*"}
Recommended Actions

For any additional guidance, contact My Oracle Support.

Table 8-62 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description Notification Transaction Error exceeds the major threshold limit for a given Subscriber Notification server
Summary Transaction Error exceeds the major threshold limit for a given Subscriber Notification server
Severity Major
Condition The number of error responses for a given subscriber notification server exceeds the major threshold value, that is, between 750 and 1000.
OID 1.3.6.1.4.1.323.5.3.36.1.2.42
Metric Used http_notification_response_total{responseCode!~"2.*"}
Recommended Actions

For any additional guidance, contact My Oracle Support.

Table 8-63 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_MINOR_THRESHOLD

Field Details
Description Notification Transaction Error exceeds the minor threshold limit for a given Subscriber Notification server
Summary Transaction Error exceeds the minor threshold limit for a given Subscriber Notification server
Severity Minor
Condition The number of error responses for a given subscriber notification server exceeds the minor threshold value, that is, between 500 and 750.
OID 1.3.6.1.4.1.323.5.3.36.1.2.42
Metric Used http_notification_response_total{responseCode!~"2.*"}
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.47 SYSTEM_IMPAIRMENT_MAJOR

Table 8-64 SYSTEM_IMPAIRMENT_MAJOR

Field Details
Description Major impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Summary Major impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Severity Major
Condition Major Impairment alert
OID 1.3.6.1.4.1.323.5.3.36.1.2.43
Metric Used db_tier_replication_status
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.48 SYSTEM_IMPAIRMENT_CRITICAL

Table 8-65 SYSTEM_IMPAIRMENT_CRITICAL

Field Details
Description Critical Impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Summary Critical Impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Severity Critical
Condition Critical Impairment alert
OID 1.3.6.1.4.1.323.5.3.36.1.2.43
Metric Used db_tier_replication_status
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.49 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN

Table 8-66 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN

Field Details
Description System Operational State is now in partial shutdown state.
Summary System Operational State is now in partial shutdown state.
Severity Major
Condition System Operational State is now in partial shutdown state
OID 1.3.6.1.4.1.323.5.3.36.1.2.44
Metric Used system_operational_state == 2
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.50 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN

Table 8-67 SYSTEM_OPERATIONAL_COMPLETE_SHUTDOWN

Field Details
Description System Operational State is now in complete shutdown state
Summary System Operational State is now in complete shutdown state
Severity Info
Condition System Operational State is now in complete shutdown state
OID 1.3.6.1.4.1.323.5.3.36.1.2.44
Metric Used system_operational_state == 3
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.51 TDF_CONNECTION_DOWN

Table 8-68 TDF_CONNECTION_DOWN

Field Details
Description TDF connection is down.
Summary TDF connection is down.
Severity Critical
Condition occnp_diam_conn_app_network{applicationName="Sd"} == 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.48
Metric Used occnp_diam_conn_app_network
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.52 DIAM_CONN_PEER_DOWN

Table 8-69 DIAM_CONN_PEER_DOWN

Field Details
Description Diameter connection to peer {{ $labels.peerHost }} is down.
Summary Diameter connection to peer is down.
Severity Major
Condition Diameter connection to peer peerHost in given namespace is down.
OID 1.3.6.1.4.1.323.5.3.52.1.2.50
Metric Used occnp_diam_conn_network
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.53 DIAM_CONN_NETWORK_DOWN

Table 8-70 DIAM_CONN_NETWORK_DOWN

Field Details
Description All the diameter network connections are down.
Summary All the diameter network connections are down.
Severity Critical
Condition sum by (kubernetes_namespace)(occnp_diam_conn_network) == 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.51
Metric Used occnp_diam_conn_network
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.54 DIAM_CONN_BACKEND_DOWN

Table 8-71 DIAM_CONN_BACKEND_DOWN

Field Details
Description All the diameter backend connections are down.
Summary All the diameter backend connections are down.
Severity Critical
Condition sum by (kubernetes_namespace)(occnp_diam_conn_backend) == 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.52
Metric Used occnp_diam_conn_backend
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.55 PerfInfoActiveOverloadThresholdFetchFailed

Table 8-72 PerfInfoActiveOverloadThresholdFetchFailed

Field Details
Description The application fails to get the current active overload level threshold data.
Summary The application fails to get the current active overload level threshold data.
Severity Major
Condition active_overload_threshold_fetch_failed == 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.53
Metric Used active_overload_threshold_fetch_failed
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.56 SLA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-73 SLA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description SLA Sy fail count exceeds the critical threshold limit
Summary SLA Sy fail count exceeds the critical threshold limit
Severity Critical
Condition sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.58
Metric Used occnp_diam_response_local_total
Recommended Actions

Check the connectivity between the diam-gw pod(s) and the OCS server and ensure that connectivity is present. If the user has not been added in the OCS configuration, configure the user(s).

For any additional guidance, contact My Oracle Support.

8.3.1.57 SLA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-74 SLA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description

SLA Sy fail count exceeds the major threshold limit

Summary

SLA Sy fail count exceeds the major threshold limit

Severity Major
Condition

sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 <= 90

OID

1.3.6.1.4.1.323.5.3.52.1.2.58

Metric Used

occnp_diam_response_local_total

Recommended Actions

Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s).

For any additional guidance, contact My Oracle Support.

8.3.1.58 SLA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-75 SLA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description

SLA Sy fail count exceeds the minor threshold limit

Summary

SLA Sy fail count exceeds the minor threshold limit

Severity Minor
Condition

sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 <= 80

OID

1.3.6.1.4.1.323.5.3.52.1.2.58

Metric Used

occnp_diam_response_local_total

Recommended Actions

Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s).

For any additional guidance, contact My Oracle Support.

8.3.1.59 STA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-76 STA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description

STA Sy fail count exceeds the critical threshold limit.

Summary

STA Sy fail count exceeds the critical threshold limit.

Severity Critical
Condition

The failure rate of Sy STA responses is more than 90% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 90

OID

1.3.6.1.4.1.323.5.3.52.1.2.59

Metric Used

occnp_diam_response_local_total

Recommended Actions

Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s).

For any additional guidance, contact My Oracle Support.

8.3.1.60 STA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-77 STA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description

STA Sy fail count exceeds the major threshold limit.

Summary

STA Sy fail count exceeds the major threshold limit.

Severity Major
Condition

The failure rate of Sy STA responses is more than 80% and less than or equal to 90% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 <= 90

OID

1.3.6.1.4.1.323.5.3.52.1.2.59

Metric Used

occnp_diam_response_local_total

Recommended Actions

Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s).

For any additional guidance, contact My Oracle Support.

8.3.1.61 STA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-78 STA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description

STA Sy fail count exceeds the minor threshold limit.

Summary

STA Sy fail count exceeds the minor threshold limit.

Severity Minor
Condition

The failure rate of Sy STA responses is more than 60% and less than or equal to 80% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 <= 80

OID

1.3.6.1.4.1.323.5.3.52.1.2.59

Metric Used

occnp_diam_response_local_total

Recommended Actions

Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s).

For any additional guidance, contact My Oracle Support.

8.3.1.62 SMSC_CONNECTION_DOWN

Table 8-79 SMSC_CONNECTION_DOWN

Field Details
Description This alert is triggered when connection to SMSC host is down.
Summary Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}}
Severity Major
Condition sum by(namespace, pod, smscName)(occnp_active_smsc_conn_count) == 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.63
Metric Used occnp_active_smsc_conn_count
Recommended Actions

Check the connectivity between the notifier service pod(s) and the SMSC host and ensure that connectivity is present.

For any additional guidance, contact My Oracle Support.

8.3.1.63 STA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-80 STA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description

STA Rx fail count exceeds the critical threshold limit.

Summary

STA Rx fail count exceeds the critical threshold limit.

Severity Critical
Condition

The failure rate of Rx STA responses is more than 90% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 90

OID

1.3.6.1.4.1.323.5.3.52.1.2.64

Metric Used

occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}

Recommended Actions

Check the connectivity between diam-gw pod(s) and AF and ensure connectivity is present.

Check that the session and user are valid and have not been removed from the Policy database; if they have been removed, configure the user(s) again.

For any additional guidance, contact My Oracle Support.

8.3.1.64 STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-81 STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description

STA Rx fail count exceeds the major threshold limit.

Summary

STA Rx fail count exceeds the major threshold limit.

Severity Major
Condition

The failure rate of Rx STA responses is more than 80% and less than or equal to 90% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 90

OID

1.3.6.1.4.1.323.5.3.52.1.2.64

Metric Used

occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}

Recommended Actions

Check the connectivity between diam-gw pod(s) and AF and ensure connectivity is present.

Check that the session and user are valid and have not been removed from the Policy database; if they have been removed, configure the user(s) again.

For any additional guidance, contact My Oracle Support.

8.3.1.65 STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-82 STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description

STA Rx fail count exceeds the minor threshold limit.

Summary

STA Rx fail count exceeds the minor threshold limit.

Severity Minor
Condition

The failure rate of Rx STA responses is more than 60% and less than or equal to 80% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 80

OID

1.3.6.1.4.1.323.5.3.52.1.2.64

Metric Used

occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}

Recommended Actions

Check the connectivity between diam-gw pod(s) and AF and ensure connectivity is present.

Check that the session and user are valid and have not been removed from the Policy database; if they have been removed, configure the user(s) again.

For any additional guidance, contact My Oracle Support.

8.3.1.66 SNA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-83 SNA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description

SNA Sy fail count exceeds the critical threshold limit

Summary

SNA Sy fail count exceeds the critical threshold limit

Severity Critical
Condition

The failure rate of Sy SNA responses is more than 90% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 90

OID

1.3.6.1.4.1.323.5.3.52.1.2.65

Metric Used

occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}

Recommended Actions

Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present.

Check that the session and user have not been removed from the OCS configuration; if they have been removed, configure the user(s) again.

For any additional guidance, contact My Oracle Support.

8.3.1.67 SNA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-84 SNA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description

SNA Sy fail count exceeds the major threshold limit

Summary

SNA Sy fail count exceeds the major threshold limit

Severity Major
Condition

The failure rate of Sy SNA responses is more than 80% and less than or equal to 90% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 <= 90

OID

1.3.6.1.4.1.323.5.3.52.1.2.65

Metric Used

occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}

Recommended Actions

Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present.

Check that the session and user have not been removed from the OCS configuration; if they have been removed, configure the user(s) again.

For any additional guidance, contact My Oracle Support.

8.3.1.68 SNA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-85 SNA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description

SNA Sy fail count exceeds the minor threshold limit

Summary

SNA Sy fail count exceeds the minor threshold limit

Severity Minor
Condition

The failure rate of Sy SNA responses is more than 60% and less than or equal to 80% of the total responses.

Expression

sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 <= 80

OID

1.3.6.1.4.1.323.5.3.52.1.2.65

Metric Used

occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}

Recommended Actions

Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present.

Check that the session and user have not been removed from the OCS configuration; if they have been removed, configure the user(s) again.

For any additional guidance, contact My Oracle Support.

8.3.1.69 STALE_DIAMETER_REQUEST_CLEANUP_MINOR

Table 8-86 STALE_DIAMETER_REQUEST_CLEANUP_MINOR

Field Details
Description This alert is triggered when more than 10 % of the received Diameter requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary The Diam requests are being discarded due to timeout processing occurring above 10%.
Severity Minor
Expression (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 10
OID 1.3.6.1.4.1.323.5.3.52.1.2.82
Metric Used
  • occnp_stale_diam_request_cleanup_total
  • occnp_diam_request_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.70 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR

Table 8-87 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR

Field Details
Description This alert is triggered when more than 20 % of the received Diameter requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary The Diam requests are being discarded due to timeout processing occurring above 20%.
Severity Major
Expression (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.82
Metric Used
  • occnp_stale_diam_request_cleanup_total
  • occnp_diam_request_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.71 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL

Table 8-88 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL

Field Details
Description This alert is triggered when more than 30 % of the received Diameter requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary The Diam requests are being discarded due to timeout processing occurring above 30%.
Severity Critical
Expression (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.82
Metric Used
  • occnp_stale_diam_request_cleanup_total
  • occnp_diam_request_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
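
The three STALE_DIAMETER_REQUEST_CLEANUP alerts compute the discard percentage over a 24-hour increase() window, so they rise and clear slowly. For ad hoc troubleshooting, the same ratio can be evaluated over a shorter range; the fragment below is a sketch of an optional recording rule that exposes the per-pod percentage over the last hour. It is not part of the packaged alert files, and the group and rule names are hypothetical.

    groups:
      - name: occnp-stale-diam-troubleshooting   # hypothetical group, not in the packaged files
        rules:
          - record: occnp:stale_diam_request_discard_percentage:1h
            # Same ratio as the STALE_DIAMETER_REQUEST_CLEANUP alerts, over a 1-hour window
            expr: |
              (
                sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[1h]))
                /
                sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[1h]))
              ) * 100
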
8.3.1.72 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Table 8-89 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Field Details
Description Certificate expiry in less than 6 months.
Summary Certificate expiry in less than 6 months.
Severity Minor
Condition dgw_tls_cert_expiration_seconds - time() <= 15724800
OID 1.3.6.1.4.1.323.5.3.52.1.2.75
Metric Used dgw_tls_cert_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.73 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Table 8-90 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Field Details
Description Certificate expiry in less than 3 months.
Summary Certificate expiry in less than 3 months.
Severity Major
Condition dgw_tls_cert_expiration_seconds - time() <= 7862400
OID 1.3.6.1.4.1.323.5.3.52.1.2.75
Metric Used dgw_tls_cert_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.74 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Table 8-91 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Field Details
Description Certificate expiry in less than 1 month.
Summary Certificate expiry in less than 1 month.
Severity Critical
Condition dgw_tls_cert_expiration_seconds - time() <= 2592000
OID 1.3.6.1.4.1.323.5.3.52.1.2.75
Metric Used dgw_tls_cert_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).
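
The three DIAM_GATEWAY_CERTIFICATE_EXPIRY alerts differ only in the threshold applied to dgw_tls_cert_expiration_seconds - time(): 15724800 seconds is about 6 months (182 days), 7862400 seconds is about 3 months (91 days), and 2592000 seconds is 30 days. A minimal sketch of the three expressions side by side is shown below; the group name is illustrative.

    groups:
      - name: occnp-dgw-certificate-alerts   # illustrative group name
        rules:
          - alert: DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
            expr: dgw_tls_cert_expiration_seconds - time() <= 15724800   # about 6 months (182 days)
            labels:
              severity: minor
          - alert: DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
            expr: dgw_tls_cert_expiration_seconds - time() <= 7862400    # about 3 months (91 days)
            labels:
              severity: major
          - alert: DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
            expr: dgw_tls_cert_expiration_seconds - time() <= 2592000    # 30 days
            labels:
              severity: critical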

8.3.1.75 DGW_TLS_CONNECTION_FAILURE

Table 8-92 DGW_TLS_CONNECTION_FAILURE

Field Details
Description Alert for TLS connection establishment.
Summary TLS Connection failure when Diam gateway is an initiator.
Severity Major
Condition sum by (namespace,reason)(occnp_diam_failed_conn_network) > 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.81
Metric Used occnp_diam_failed_conn_network
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.76 POLICY_CONNECTION_FAILURE

Table 8-93 POLICY_CONNECTION_FAILURE

Field Details
Description Connection failure on Egress and Ingress Gateways for incoming and outgoing connections.
Summary
Severity Major
Condition This alert is raised when a connection failure occurs on Egress or Ingress Gateway for an incoming or outgoing connection.
OID 1.3.6.1.4.1.323.5.3.52.1.2.43
Metric Used occnp_oc_ingressgateway_connection_failure_total
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.77 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Table 8-94 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Field Details
Description TLS certificate to expire in 1 month.
Summary TLS certificate to expire in 1 month.
Severity Critical
Condition This alert is raised when the TLS certificate is about to expire in one month.
Expression security_cert_x509_expiration_seconds - time() <= 2592000
OID 1.3.6.1.4.1.323.5.3.52.1.2.44
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.78 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Table 8-95 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Field Details
Description TLS certificate to expire in 3 months.
Summary TLS certificate to expire in 3 months.
Severity Major
Condition This alert is raised when the TLS certificate is about to expire in three months.
Expression security_cert_x509_expiration_seconds - time() <= 7862400
OID 1.3.6.1.4.1.323.5.3.52.1.2.44
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.79 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Table 8-96 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Field Details
Description TLS certificate to expire in 6 months.
Summary TLS certificate to expire in 6 months.
Severity Minor
Condition This alert is raised when the TLS certificate is about to expire in six months.
Expression security_cert_x509_expiration_seconds - time() <= 15724800
OID 1.3.6.1.4.1.323.5.3.52.1.2.44
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.80 AUDIT_NOT_RUNNING

Table 8-97 AUDIT_NOT_RUNNING

Field Details
Description Audit has not been running for at least 1 hour.
Summary Audit has not been running for at least 1 hour.
Severity CRITICAL
Condition (absent_over_time(spring_data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h]) == 1) OR (sum(increase(spring_data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h])) == 0)
OID 1.3.6.1.4.1.323.5.3.52.1.2.78
Metric Used spring_data_repository_invocations_seconds_count
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.81 DIAMETER_POD_ERROR_RESPONSE_MINOR

Table 8-98 DIAMETER_POD_ERROR_RESPONSE_MINOR

Field Details
Description At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.
Summary At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.
Severity MINOR
Condition (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=1
OID 1.3.6.1.4.1.323.5.3.52.1.2.79
Metric Used ocbsf_diam_response_network_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.82 DIAMETER_POD_ERROR_RESPONSE_MAJOR

Table 8-99 DIAMETER_POD_ERROR_RESPONSE_MAJOR

Field Details
Description At least 5% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.
Summary At least 5% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.
Severity MAJOR
Condition (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=5
OID 1.3.6.1.4.1.323.5.3.52.1.2.79
Metric Used ocbsf_diam_response_network_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.83 DIAMETER_POD_ERROR_RESPONSE_CRITICAL

Table 8-100 DIAMETER_POD_ERROR_RESPONSE_CRITICAL

Field Details
Description At least 10% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER
Summary At least 10% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER
Severity CRITICAL
Condition (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=10
OID 1.3.6.1.4.1.323.5.3.52.1.2.79
Metric Used ocbsf_diam_response_network_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.1.84 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD

Table 8-101 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File lockAcquisitionExceedsCriticalThreshold
Description The count of lock requests that fail to acquire the lock exceeds the critical threshold limit (current value: {{ $value }}).
Summary Keys used in Bulwark lock request which are already in locked state detected above 75 Percent of Total Transactions.
Severity Critical
Expression (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=75
OID 1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used -
Recommended Actions -
8.3.1.85 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD

Table 8-102 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File lockAcquisitionExceedsMajorThreshold
Description The count of lock requests that fail to acquire the lock exceeds the major threshold limit (current value: {{ $value }}).
Summary Keys used in Bulwark lock request which are already in locked state detected above 50 Percent of Total Transactions.
Severity Major
Expression (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 50 < 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used -
Recommended Actions -
8.3.1.86 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD

Table 8-103 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File lockAcquisitionExceedsMinorThreshold
Description The count of lock requests that fail to acquire the lock exceeds the minor threshold limit (current value: {{ $value }}).
Summary Keys used in Bulwark lock request which are already in locked state detected above 20 Percent of Total Transactions.
Severity Minor
Expression (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=20 < 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used -
Recommended Actions -
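
The expressions for the major and minor lock acquisition alerts use the shorthand ">= 50 < 75" and ">= 20 < 50" to describe a band between two thresholds. In PromQL a band is written as two comparisons joined with and, in the same way the Sy fail-count alerts earlier in this section express their major and minor bands. The fragment below is a sketch of the major band in that explicit form, using the metrics and 5-minute window from Table 8-102; the group name and layout are illustrative.

    groups:
      - name: occnp-bulwark-alerts          # illustrative group name
        rules:
          - alert: LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
            # ">= 50 < 75" from Table 8-102, written as an explicit PromQL band
            expr: |
              (
                sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m]))
                /
                sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))
              ) * 100 >= 50
              and
              (
                sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m]))
                /
                sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))
              ) * 100 < 75
            labels:
              severity: major
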
8.3.1.87 CERTIFICATE_EXPIRY_MINOR

Table 8-104 CERTIFICATE_EXPIRY_MINOR

Field Details
Description Certificate expiry in less than 6 months
Summary Certificate expiry in less than 6 months
Severity MINOR
Condition security_cert_x509_expiration_seconds - time() <= 15724800
OID 1.3.6.1.4.1.323.5.3.52.1.2.77
Metric Used -
Recommended Actions -
8.3.1.88 CERTIFICATE_EXPIRY_MAJOR

Table 8-105 CERTIFICATE_EXPIRY_MAJOR

Field Details
Description Certificate expiry in less than 3 months
Summary Certificate expiry in less than 3 months
Severity MAJOR
Condition security_cert_x509_expiration_seconds - time() <= 7862400
OID 1.3.6.1.4.1.323.5.3.52.1.2.77
Metric Used -
Recommended Actions -
8.3.1.89 CERTIFICATE_EXPIRY_CRITICAL

Table 8-106 CERTIFICATE_EXPIRY_CRITICAL

Field Details
Description Certificate expiry in less than 1 month
Summary Certificate expiry in less than 1 month
Severity CRITICAL
Condition security_cert_x509_expiration_seconds - time() <= 2592000
OID 1.3.6.1.4.1.323.5.3.52.1.2.77
Metric Used -
Recommended Actions -
8.3.1.90 PERF_INFO_ACTIVE_OVERLOADTHRESHOLD_DATA_PRESENT

Table 8-107 PERF_INFO_ACTIVE_OVERLOADTHRESHOLD_DATA_PRESENT

Field Details
Description -
Summary -
Severity MINOR
Condition active_overload_threshold_fetch_failed == 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.53
Metric Used -
Recommended Actions -
8.3.1.91 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Table 8-108 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Field Details
Description More than 10% of incoming requests towards the UDR-connector are rejected due to the request being stale on arrival or during processing by the connector
Summary More than 10% of incoming requests towards the UDR-connector are rejected due to the request being stale on arrival or during processing by the connector
Severity MINOR
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 10
OID 1.3.6.1.4.1.323.5.3.52.1.2.85
Metric Used -
Recommended Actions -
8.3.1.92 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Table 8-109 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Field Details
Description More than 20% of incoming requests towards the UDR-connector are rejected due to the request being stale on arrival or during processing by the connector
Summary More than 20% of incoming requests towards the UDR-connector are rejected due to the request being stale on arrival or during processing by the connector
Severity MAJOR
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.85
Metric Used -
Recommended Actions -
8.3.1.93 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Table 8-110 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Field Details
Description More than 30% of incoming requests towards the UDR-connector are rejected due to the request being stale on arrival or during processing by the connector
Summary More than 30% of incoming requests towards the UDR-connector are rejected due to the request being stale on arrival or during processing by the connector
Severity CRITICAL
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.85
Metric Used -
Recommended Actions -
8.3.1.94 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Table 8-111 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Field Details
Description More than 10% of incoming requests towards the CHF-connector are rejected due to the request being stale on arrival or during processing by the connector
Summary More than 10% of incoming requests towards the CHF-connector are rejected due to the request being stale on arrival or during processing by the connector
Severity MINOR
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 10
OID 1.3.6.1.4.1.323.5.3.52.1.2.86
Metric Used -
Recommended Actions -
8.3.1.95 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Table 8-112 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Field Details
Description More than 20% of incoming requests towards the CHF-connector are rejected due to the request being stale on arrival or during processing by the connector
Summary More than 20% of incoming requests towards the CHF-connector are rejected due to the request being stale on arrival or during processing by the connector
Severity MAJOR
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.86
Metric Used -
Recommended Actions -
8.3.1.96 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Table 8-113 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Field Details
Description More than 30% of incoming requests towards the CHF-connector are rejected due to the request being stale on arrival or during processing by the connector
Summary More than 30% of incoming requests towards the CHF-connector are rejected due to the request being stale on arrival or during processing by the connector
Severity CRITICAL
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.86
Metric Used -
Recommended Actions -
8.3.1.97 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Table 8-114 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Field Details
Description This alarm is raised when OCNADD is not reachable.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Egress Gateway Data Director unreachable'
Severity Major
Condition This alarm is raised when data director is not reachable from Egress Gateway.
OID 1.3.6.1.4.1.323.5.3.37.1.2.48
Metric Used oc_egressgateway_dd_unreachable
Recommended Actions Alert gets cleared automatically when the connection with data director is established.
8.3.1.98 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Table 8-115 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Field Details
Description This alarm is raised when OCNADD is not reachable.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Ingress Gateway Data Director unreachable'
Severity Major
Condition This alarm is raised when data director is not reachable from Ingress Gateway.
OID 1.3.6.1.4.1.323.5.3.37.1.2.47
Metric Used oc_ingressgateway_dd_unreachable
Recommended Actions Alert gets cleared automatically when the connection with data director is established.
8.3.1.99 STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Table 8-116 STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Field Details
Description This alert is triggered when more than 30 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary -
Severity Critical
Expression -
OID -
Metric Used
  • ocpm_late_processing_rejection_total
  • occnp_diam_request_local_total
Recommended Actions -
8.3.1.100 STALE_HTTP_REQUEST_CLEANUP_MAJOR

Table 8-117 STALE_HTTP_REQUEST_CLEANUP_MAJOR

Field Details
Description This alert is triggered when more than 20 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary -
Severity Major
Expression -
OID -
Metric Used
  • ocpm_late_processing_rejection_total
  • occnp_diam_request_local_total
Recommended Actions -
8.3.1.101 STALE_HTTP_REQUEST_CLEANUP_MINOR

Table 8-118 STALE_HTTP_REQUEST_CLEANUP_MINOR

Field Details
Description This alert is triggered when more than 10 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary -
Severity Minor
Expression -
OID -
Metric Used
  • ocpm_late_processing_rejection_total
  • occnp_diam_request_local_total
Recommended Actions -
8.3.1.102 STALE_BINDING_REQUEST_REJECTION_CRITICAL

Table 8-119 STALE_BINDING_REQUEST_REJECTION_CRITICAL

Field Details
Description This alert is triggered when more than 30 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary '{{ $value }} % of requests are being discarded by binding service due to request being stale either on arrival or during processing.' More than 30% of the Binding requests failed with error TIMED_OUT_REQUEST.
Severity Critical
Expression (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m])))/(sum by (namespace) (rate(ocpm_binding_inbound_request_total{microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.87
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_binding_inbound_request_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.103 STALE_BINDING_REQUEST_REJECTION_MAJOR

Table 8-120 STALE_BINDING_REQUEST_REJECTION_MAJOR

Field Details
Description This alert is triggered when more than 20 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary '{{ $value }} % of requests are being discarded by binding service due to request being stale either on arrival or during processing.' More than 20% of the Binding requests failed with error TIMED_OUT_REQUEST.
Severity Major
Expression (sum by (namespace) (rate(occnp_late_processing_rejection_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m])))/(sum by (namespace) (rate(ocpm_binding_inbound_request_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.87
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_binding_inbound_request_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.104 STALE_BINDING_REQUEST_REJECTION_MINOR

Table 8-121 STALE_BINDING_REQUEST_REJECTION_MINOR

Field Details
Description This alert is triggered when more than 10 % of the received HTTP requests are cancelled due to them being stale (received too late, or took too much time to process them).
Summary '{{ $value }} % of requests are being discarded by binding service due to request being stale either on arrival or during processing.' More than 10% of the Binding requests failed with error TIMED_OUT_REQUEST.
Severity Minor
Expression (sum by (namespace) (rate(occnp_late_processing_rejection_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"} [5m])))/(sum by (namespace) (rate(ocpm_binding_inbound_request_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 10
OID 1.3.6.1.4.1.323.5.3.52.1.2.87
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_binding_inbound_request_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.1.105 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_CRITICAL

Table 8-122 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_CRITICAL

Field Details
Description The Diameter requests are being discarded due to timeout processing occurring above 30% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}}
Summary The Diameter requests are being discarded due to timeout processing occurring above 30% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}}
Severity Critical
Expression (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.88
Metric Used
  • occnp_diam_request_local_total
  • occnp_stale_diam_request_cleanup_total
Recommended Actions

The alert gets cleared when the number of stale requests is below 30% of the total requests. To troubleshoot and resolve the issue, perform the following steps:

  1. Identify the root cause of the timeout processing by reviewing the logs for the pod {{$labels.pod}} and service {{$labels.microservice}} in {{$labels.namespace}}.
  2. Verify the performance and resource utilization (CPU, memory) of the pod and make sure it has sufficient resources to process the requests in a timely manner.
  3. Review the configuration settings of the Diameter connector and check timeout settings if necessary.
  4. Ensure that the backend services that the Diameter connector communicates with are healthy and responsive.

For further assistance, contact My Oracle Support.

8.3.1.106 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MAJOR

Table 8-123 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MAJOR

Field Details
Description The Diameter requests are being discarded due to timeout processing occurring above 20% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}}
Summary The Diameter requests are being discarded due to timeout processing occurring above 20% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}}
Severity Major
Expression (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.88
Metric Used
  • occnp_diam_request_local_total
  • occnp_stale_diam_request_cleanup_total
Recommended Actions

The alert gets cleared when the number of stale requests is below 20% of the total requests. To troubleshoot and resolve the issue, perform the following steps:

  1. Identify the root cause of the timeout processing by reviewing the logs for the pod {{$labels.pod}} and service {{$labels.microservice}} in {{$labels.namespace}}.
  2. Verify the performance and resource utilization (CPU, memory) of the pod and make sure it has sufficient resources to process the requests in a timely manner.
  3. Review the configuration settings of the Diameter connector and check timeout settings if necessary.
  4. Ensure that the backend services that the Diameter connector communicates with are healthy and responsive.

For further assistance, contact My Oracle Support.

8.3.1.107 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MINOR

Table 8-124 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MINOR

Field Details
Description The Diameter requests are being discarded due to timeout processing occurring above 10% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}}
Summary The Diameter requests are being discarded due to timeout processing occurring above 10% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}}
Severity Minor
Expression (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 10
OID 1.3.6.1.4.1.323.5.3.52.1.2.88
Metric Used
  • occnp_diam_request_local_total
  • occnp_stale_diam_request_cleanup_total
Recommended Actions

The alert gets cleared when the number of stale requests is below 10% of the total requests. To troubleshoot and resolve the issue, perform the following steps:

  1. Identify the root cause of the timeout processing by reviewing the logs for the pod {{$labels.pod}} and service {{$labels.microservice}} in {{$labels.namespace}}.
  2. Verify the performance and resource utilization (CPU, memory) of the pod and make sure it has sufficient resources to process the requests in a timely manner.
  3. Review the configuration settings of the Diameter connector and check timeout settings if necessary.
  4. Ensure that the backend services that the Diameter connector communicates with are healthy and responsive.

For further assistance, contact My Oracle Support.
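
In the expressions above, the diam-connector selector is a PromQL label matcher, so the label value is quoted: microservice="diam-connector". The sketch below shows the critical variant as a complete rule; keeping namespace, microservice, and pod in the sum by clause is what allows the {{$labels.pod}}, {{$labels.microservice}}, and {{$labels.namespace}} templates in the Summary to resolve. The group name and the for duration are illustrative.

    groups:
      - name: occnp-diam-connector-alerts   # illustrative group name
        rules:
          - alert: STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_CRITICAL
            expr: |
              (
                sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))
                /
                sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))
              ) * 100 >= 30
            for: 5m                          # illustrative hold duration
            labels:
              severity: critical
            annotations:
              summary: 'The Diameter requests are being discarded due to timeout processing occurring above 30% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}}'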

8.3.1.108 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Table 8-125 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Field Details
Description At least 10 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Summary At least 10 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Severity Minor
Expression -
OID -
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_userservice_inbound_count_total
Recommended Actions -
8.3.1.109 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Table 8-126 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Field Details
Description At least 20 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Summary At least 20 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Severity Major
Expression -
OID -
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_userservice_inbound_count_total
Recommended Actions -
8.3.1.110 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Table 8-127 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Field Details
Description At least 30 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Summary At least 30 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Severity Critical
Expression -
OID -
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_userservice_inbound_count_total
Recommended Actions -
8.3.1.111 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Table 8-128 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Field Details
Description At least 10 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Summary At least 10 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Severity Minor
Expression -
OID -
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_userservice_inbound_count_total
Recommended Actions -
8.3.1.112 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Table 8-129 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Field Details
Description At least 20 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Summary At least 20 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Severity Major
Expression -
OID -
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_userservice_inbound_count_total
Recommended Actions -
8.3.1.113 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Table 8-130 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Field Details
Description At least 30 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Summary At least 30 % of the received HTTP requests are cancelled per operation type due to them being stale (received too late, or took too much time to process them).
Severity Critical
Expression -
OID -
Metric Used
  • occnp_late_arrival_rejection_total
  • occnp_late_processing_rejection_total
  • ocpm_userservice_inbound_count_total
Recommended Actions -
8.3.1.114 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_CRITICAL_THRESHOLD

Table 8-131 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description The number of revalidation responses indicating that the binding was missing but was restored from BSF, across the valid sessions being audited, is equal to or above 70% of the total revalidation responses.
Summary The number of revalidation responses indicating that the binding was missing but was restored from BSF, across the valid sessions being audited, is equal to or above 70% of the total revalidation responses.
Severity Critical
Condition

(sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code="2xx",action="restored"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 70

OID 1.3.6.1.4.1.323.5.3.52.1.2.89
Metric Used occnp_session_binding_revalidation_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.115 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MAJOR_THRESHOLD

Table 8-132 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description The number of revalidation responses indicating that the binding was missing but was restored from BSF, across the valid sessions being audited, is equal to or above 50% but less than 70% of the total revalidation responses.
Summary The number of revalidation responses indicating that the binding was missing but was restored from BSF, across the valid sessions being audited, is equal to or above 50% but less than 70% of the total revalidation responses.
Severity Major
Condition

(sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code="2xx",action="restored"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 50 < 70

OID 1.3.6.1.4.1.323.5.3.52.1.2.89
Metric Used occnp_session_binding_revalidation_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.116 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MINOR_THRESHOLD

Table 8-133 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MINOR_THRESHOLD

Field Details
Description The number of revalidation responses indicating that the binding was missing but was restored from BSF, across the valid sessions being audited, is equal to or above 30% but less than 50% of the total revalidation responses.
Summary The number of revalidation responses indicating that the binding was missing but was restored from BSF, across the valid sessions being audited, is equal to or above 30% but less than 50% of the total revalidation responses.
Severity Minor
Condition

(sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx",action="restored"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 30 < 50

OID 1.3.6.1.4.1.323.5.3.52.1.2.89
Metric Used occnp_session_binding_revalidation_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.117 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_CRITICAL_THRESHOLD

Table 8-134 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description The number of unsuccessful revalidation responses received from BSF with an error, while the binding association is still valid in PCF, is equal to or above 70% of the total revalidation responses.
Summary The number of unsuccessful revalidation responses received from BSF with an error, while the binding association is still valid in PCF, is equal to or above 70% of the total revalidation responses.
Severity Critical
Condition

(sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 70

OID 1.3.6.1.4.1.323.5.3.52.1.2.90
Metric Used occnp_session_binding_revalidation_response_total
Recommended Actions

Verify the health condition of BSF Management Service.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.118 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MAJOR_THRESHOLD

Table 8-135 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description The number of unsuccessful revalidation responses received from BSF with an error, while the binding association is still valid in PCF, is equal to or above 50% but less than 70% of the total revalidation responses.
Summary The number of unsuccessful revalidation responses received from BSF with an error, while the binding association is still valid in PCF, is equal to or above 50% but less than 70% of the total revalidation responses.
Severity Major
Condition

(sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 50 < 70

OID 1.3.6.1.4.1.323.5.3.52.1.2.90
Metric Used occnp_session_binding_revalidation_response_total
Recommended Actions

Verify the health condition of BSF Management Service.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.119 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MINOR_THRESHOLD

Table 8-136 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MINOR_THRESHOLD

Field Details
Description The number of unsuccessful revalidation responses received from BSF with an error, while the binding association is still valid in PCF, is equal to or above 30% but less than 50% of the total revalidation responses.
Summary The number of unsuccessful revalidation responses received from BSF with an error, while the binding association is still valid in PCF, is equal to or above 30% but less than 50% of the total revalidation responses.
Severity Minor
Condition

(sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) /sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 30 < 50

OID 1.3.6.1.4.1.323.5.3.52.1.2.90
Metric Used occnp_session_binding_revalidation_response_total
Recommended Actions

Verify the health condition of BSF Management Service.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.120 UPDATE_NOTIFY_TIMEOUT_ABOVE_70_PERCENT

Table 8-137 UPDATE_NOTIFY_TIMEOUT_ABOVE_70_PERCENT

Field Details
Description The number of Update Notify requests that failed because of a timeout is equal to or above 70% of the total in a given time period.
Summary The number of Update Notify requests that failed because of a timeout is equal to or above 70% of the total in a given time period.
Severity Critical
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70
OID -
Metric Used -
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.121 UPDATE_NOTIFY_TIMEOUT_ABOVE_50_PERCENT

Table 8-138 UPDATE_NOTIFY_TIMEOUT_ABOVE_50_PERCENT

Field Details
Description The number of Update Notify requests that failed because of a timeout is equal to or above 50% but less than 70% of the total in a given time period.
Summary The number of Update Notify requests that failed because of a timeout is equal to or above 50% but less than 70% of the total in a given time period.
Severity Major
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70
OID -
Metric Used -
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.122 UPDATE_NOTIFY_TIMEOUT_ABOVE_30_PERCENT

Table 8-139 UPDATE_NOTIFY_TIMEOUT_ABOVE_30_PERCENT

Field Details
Description The number of Update Notify requests that failed because of a timeout is equal to or above 30% but less than 50% of the total Rx sessions.
Summary The number of Update Notify requests that failed because of a timeout is equal to or above 30% but less than 50% of the total Rx sessions.
Severity Minor
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50
OID -
Metric Used -
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.2 PCF Alerts

This section provides information on PCF alerts.

8.3.2.1 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD

Table 8-140 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD
Description More than 70% of timer capacity has been occupied for n1n2 transfer failure notification
Summary More than 70% of timer capacity has been occupied for n1n2 transfer failure notification
Severity Minor
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.107
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan and provides the current timer count. These timers are created when the URSP rules could not be delivered to the UE and a reattempt with back-off is pending. This alert is triggered when the timer capacity for the N1N2 transfer failure notification reaches 70% of the maximum capacity of 360K timers. The operator can troubleshoot and identify the reasons for failures in the flow that triggers the N1N2 transfer failure notification, and possibly enable retransmission.

8.3.2.2 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD

Table 8-141 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD
Description More than 80% of timer capacity has been occupied for n1n2 transfer failure notification
Summary More than 80% of timer capacity has been occupied for n1n2 transfer failure notification
Severity Major
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 80
OID 1.3.6.1.4.1.323.5.3.52.1.2.107
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan and provides the current timer count. These timers are created when the URSP rules could not be delivered to the UE and a reattempt with back-off is pending. This alert is triggered when the timer capacity for the N1N2 transfer failure notification reaches 80% of the maximum capacity of 360K timers. The operator can troubleshoot and identify the reasons for failures in the flow that triggers the N1N2 transfer failure notification, and possibly enable retransmission.

8.3.2.3 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD

Table 8-142 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD
Description More than 90% of timer capacity has been occupied for n1n2 transfer failure notification
Summary More than 90% of timer capacity has been occupied for n1n2 transfer failure notification
Severity Critical
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.107
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to the N1N2 transfer failure notification reaches 90% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with the flow triggering the N1N2 transfer failure notification and possibly enable retransmission.

8.3.2.4 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD

Table 8-143 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD
Description More than 70% of timers capacity has been occupied for amf discovery.
Summary More than 70% of timers capacity has been occupied for amf discovery.
Severity Minor
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.95
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to AMF discovery reaches 70% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with NRF discovery and possibly enable direct or indirect alternate routing from the NRF client.
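To check how close the current usage is to this threshold, the expression in the Condition field can be run as an ad-hoc Prometheus query with the comparison removed, which returns the current percentage per namespace. A minimal sketch:

  # Current UE_AMFDiscovery timer usage as a percentage of the 360K limit, per namespace
  (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"}) / 360000) * 100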

8.3.2.5 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD

Table 8-144 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD
Description More than 80% of timer capacity has been occupied for amf discovery.
Summary More than 80% of timer capacity has been occupied for amf discovery.
Severity Major
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 80
OID 1.3.6.1.4.1.323.5.3.52.1.2.95
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to AMF discovery reaches 80% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with NRF discovery and possibly enable direct or indirect alternate routing from the NRF client.

8.3.2.6 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD

Table 8-145 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD
Description More than 90% of timer capacity has been occupied for amf discovery.
Summary More than 90% of timer capacity has been occupied for amf discovery.
Severity Critical
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.95
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to AMF discovery reaches 90% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with NRF discovery and possibly enable direct or indirect alternate routing from the NRF client.

8.3.2.7 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD

Table 8-146 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD
Description More than 70% of timer capacity has been occupied for n1n2 subscribe.
Summary More than 70% of timer capacity has been occupied for n1n2 subscribe.
Severity Minor
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.96
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to N1N2 subscribe reaches 70% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with the flow triggering the N1N2 subscription or on the AMF side, and possibly enable direct or indirect alternate routing.

8.3.2.8 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD

Table 8-147 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD
Description More than 80% of timer capacity has been occupied for n1n2 subscribe.
Summary More than 80% of timer capacity has been occupied for n1n2 subscribe.
Severity Major
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 80
OID 1.3.6.1.4.1.323.5.3.52.1.2.96
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to N1N2 subscribe reaches 80% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with the flow triggering the N1N2 subscription or on the AMF side, and possibly enable direct or indirect alternate routing.

8.3.2.9 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD

Table 8-148 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD
Description More than 90% of timer capacity has been occupied for n1n2 subscribe.
Summary More than 90% of timer capacity has been occupied for n1n2 subscribe.
Severity Critical
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.96
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to N1N2 subscribe reaches 90% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with the flow triggering the N1N2 subscription or on the AMF side, and possibly enable direct or indirect alternate routing.

8.3.2.10 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD

Table 8-149 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD
Description More than 70% of timer capacity has been occupied for n1n2 transfer.
Summary More than 70% of timer capacity has been occupied for n1n2 transfer.
Severity Minor
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.97
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to N1N2 transfer reaches 70% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with the flow triggering the N1N2 transfer and possibly enable direct or indirect alternate routing.

8.3.2.11 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD

Table 8-150 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD
Description More than 80% of timer capacity has been occupied for n1n2 transfer.
Summary More than 80% of timer capacity has been occupied for n1n2 transfer.
Severity Major
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 80
OID 1.3.6.1.4.1.323.5.3.52.1.2.97
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to N1N2 transfer reaches 80% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with the flow triggering the N1N2 transfer and possibly enable direct or indirect alternate routing.

8.3.2.12 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD

Table 8-151 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD
Description More than 90% of timer capacity has been occupied for n1n2 transfer.
Summary More than 90% of timer capacity has been occupied for n1n2 transfer.
Severity Critical
Condition (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.52.1.2.97
Metric Used occnp_timer_capacity
Recommended Actions

The occnp_timer_capacity metric is pegged during each timer scan, providing the current timer count. These timers are created when the UE is not able to deliver the URSP rules and a reattempt with back-off is scheduled. This alert is triggered when the timer capacity corresponding to N1N2 transfer reaches 90% of the maximum limit of 360K. In this case, the operator can troubleshoot and identify the reasons for failures with the flow triggering the N1N2 transfer and possibly enable direct or indirect alternate routing.

8.3.2.13 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD

Table 8-152 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Description More than 25% of n1n2 subscribe reattempt failed.
Summary More than 25% of n1n2 subscribe reattempt failed.
Severity Minor
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 25
OID 1.3.6.1.4.1.323.5.3.52.1.2.99
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 subscribe reattempts fail. If failures increase, the operator can investigate why the flow triggering the N1N2 subscription is failing or whether the AMF to which the requests are sent is unhealthy.
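To see which response codes dominate the failed reattempts, the numerator of the Condition expression can be grouped by response code as an ad-hoc Prometheus query. A minimal sketch, using only the metric and labels shown above:

  # Non-2xx N1N2 subscribe reattempt responses over the last 5 minutes, per namespace and response code
  sum by (namespace, responseCode) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m]))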
8.3.2.14 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD

Table 8-153 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Description More than 50% of n1n2 subscribe reattempt failed.
Summary More than 50% of n1n2 subscribe reattempt failed.
Severity Major
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.99
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 subscribe reattempts fail. If failures increase, the operator can investigate why the flow triggering the N1N2 subscription is failing or whether the AMF to which the requests are sent is unhealthy.
8.3.2.15 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD

Table 8-154 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Description More than 75% of n1n2 subscribe reattempt failed.
Summary More than 75% of n1n2 subscribe reattempt failed.
Severity Critical
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.99
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 subscribe reattempts fail. If failures increase, the operator can investigate why the flow triggering the N1N2 subscription is failing or whether the AMF to which the requests are sent is unhealthy.
8.3.2.16 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD

Table 8-155 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Description More than 25% of n1n2 transfer reattempt failed.
Summary More than 25% of n1n2 transfer reattempt failed.
Severity Minor
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 25
OID 1.3.6.1.4.1.323.5.3.52.1.2.100
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 transfer reattempts fail. If failures increase, the operator can investigate why the flow triggering the N1N2 message transfer is failing or whether the AMF to which the requests are sent is unhealthy.
8.3.2.17 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD

Table 8-156 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Description More than 50% of n1n2 transfer reattempt failed.
Summary More than 50% of n1n2 transfer reattempt failed.
Severity Major
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.100
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 transfer reattempts fail. If failures increase, the operator can investigate why the flow triggering the N1N2 message transfer is failing or whether the AMF to which the requests are sent is unhealthy.
8.3.2.18 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD

Table 8-157 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Description More than 75% of n1n2 transfer reattempt failed.
Summary More than 75% of n1n2 transfer reattempt failed.
Severity Critical
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.100
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 transfer reattempts fail. If failures increase, the operator can investigate why the flow triggering the N1N2 message transfer is failing or whether the AMF to which the requests are sent is unhealthy.
8.3.2.19 SM_STALE_REQUEST_PROCESSING_REJECT_MINOR

Table 8-158 SM_STALE_REQUEST_PROCESSING_REJECT_MINOR

Field Details
Name in Alert Yaml File SM_STALE_REQUEST_PROCESSING_REJECT_MINOR
Description More than 10% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale
Summary More than 10% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale
Severity Minor
Condition (sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 10 < 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.101
Metric Used occnp_late_processing_rejection_total, ocpm_ingress_request_total
Recommended Actions The metric occnp_late_processing_rejection_total is pegged when Late Processing finds a stale session.
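To check the current per-pod rejection level, the ratio from the Condition field can be run as an ad-hoc Prometheus query without the threshold comparison. A minimal sketch built from that expression:

  # Percentage of SM ingress requests rejected as stale, per namespace and pod, over the last 5 minutes
  (sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m]))) / (sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100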
8.3.2.20 SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR

Table 8-159 SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR

Field Details
Name in Alert Yaml File SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR
Description More than 20% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale
Summary More than 20% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale
Severity Major
Condition (sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 20 < 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.101
Metric Used occnp_late_processing_rejection_total, ocpm_ingress_request_total
Recommended Actions The metric occnp_late_processing_rejection_total is pegged when Late Processing finds a stale session.
8.3.2.21 SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL

Table 8-160 SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL

Field Details
Name in Alert Yaml File SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
Description More than 30% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale
Summary More than 30% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale
Severity Critical
Condition (sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.101
Metric Used occnp_late_processing_rejection_total, ocpm_ingress_request_total
Recommended Actions The metric occnp_late_processing_rejection_total is pegged when Late Processing finds a stale session.
8.3.2.22 UE_STALE_REQUEST_PROCESSING_REJECT_MAJOR

Table 8-161 UE_STALE_REQUEST_PROCESSING_REJECT_MAJOR

Field Details
Description This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service.
Summary This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service.
Severity Major
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.104
Metric Used occnp_late_processing_rejection_total
Recommended Actions Metric occnp_late_processing_rejection_total is pegged when requests being processed become stale.
8.3.2.23 UE_STALE_REQUEST_PROCESSING_REJECT_CRITICAL

Table 8-162 UE_STALE_REQUEST_PROCESSING_REJECT_CRITICAL

Field Details
Description This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service.
Summary This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service.
Severity Critical
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.104
Metric Used occnp_late_processing_rejection_total
Recommended Actions Metric occnp_late_processing_rejection_total is pegged when requests being processed become stale.
8.3.2.24 UE_STALE_REQUEST_PROCESSING_REJECT_MINOR

Table 8-163 UE_STALE_REQUEST_PROCESSING_REJECT_MINOR

Field Details
Description This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service.
Summary This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service.
Severity Minor
Condition (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 10
OID 1.3.6.1.4.1.323.5.3.52.1.2.104
Metric Used occnp_late_processing_rejection_total
Recommended Actions Metric occnp_late_processing_rejection_total is pegged when requests being processed become stale.
8.3.2.25 UE_STALE_REQUEST_ARRIVAL_REJECT_MINOR

Table 8-164 UE_STALE_REQUEST_ARRIVAL_REJECT_MINOR

Field Details
Description This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service.
Summary This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service.
Severity Minor
Condition (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 10
OID 1.3.6.1.4.1.323.5.3.52.1.2.109
Metric Used ocpm_late_arrival_rejection_total
Recommended Actions Metric ocpm_late_arrival_rejection_total is pegged when a received request is stale.
8.3.2.26 UE_STALE_REQUEST_ARRIVAL_REJECT_MAJOR

Table 8-165 UE_STALE_REQUEST_ARRIVAL_REJECT_MAJOR

Field Details
Description This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service.
Summary This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service.
Severity Major
Condition (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.109
Metric Used ocpm_late_arrival_rejection_total
Recommended Actions Metric ocpm_late_arrival_rejection_total is pegged when a received request is stale.
8.3.2.27 UE_STALE_REQUEST_ARRIVAL_REJECT_CRITICAL

Table 8-166 UE_STALE_REQUEST_ARRIVAL_REJECT_CRITICAL

Field Details
Description This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service.
Summary This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service.
Severity Critical
Condition (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.109
Metric Used ocpm_late_arrival_rejection_total
Recommended Actions Metric ocpm_late_arrival_rejection_total is pegged when a received request is stale.
8.3.2.28 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD

Table 8-167 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Description More than 75% of N1N2 transfer failure notification reattempts failed.
Summary More than 75% of N1N2 transfer failure notification reattempts failed.
Severity Critical
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.106
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions
The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 transfer failure notification reattempts fail. If failures increase, the operator can investigate the following (a sample breakdown query is shown after this list):
  • Why the flow triggering the N1N2 transfer failure notification is failing, or
  • The health of the AMF to which the requests are sent
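The following is a sketch of an ad-hoc Prometheus query, derived from the metric and labels in the Condition field, that groups the failed reattempts by response code; a predominance of 5xx codes generally points to the AMF or network side, while 4xx codes point to the requests themselves:

  # Failed UE N1N2 transfer failure notification reattempts over the last 5 minutes, per namespace and response code
  sum by (namespace, responseCode) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m]))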
8.3.2.29 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD

Table 8-168 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Description More than 50% of N1N2 transfer failure notification reattempts failed.
Summary More than 50% of N1N2 transfer failure notification reattempts failed.
Severity Major
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.106
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions
The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 transfer failure notification reattempts fail. If failures increase, the operator can investigate the following:
  • Why the flow triggering the N1N2 transfer failure notification is failing, or
  • The health of the AMF to which the requests are sent
8.3.2.30 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD

Table 8-169 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Description More than 25% of N1N2 transfer failure notification reattempts failed.
Summary More than 25% of N1N2 transfer failure notification reattempts failed.
Severity Minor
Condition (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 25
OID 1.3.6.1.4.1.323.5.3.52.1.2.106
Metric Used http_out_conn_response_total, http_out_conn_request_total
Recommended Actions
The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain percentage of UE N1N2 transfer failure notification reattempts fail. If failures increase, the operator can investigate the following:
  • Why the flow triggering the N1N2 transfer failure notification is failing, or
  • The health of the AMF to which the requests are sent
8.3.2.31 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD

Table 8-170 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Description More than 75% of amf discovery reattempts failed.
Summary More than 75% of amf discovery reattempts failed.
Severity Critical
Condition (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.105
Metric Used occnp_ue_nf_discovery_reattempt_response_total
Recommended Actions The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain number of reattempts fail while discovering the AMF. If failures increase, the operator can investigate the following:
  • Why the AMF discovery flow is failing, or
  • The health of the AMF to which the requests are sent
8.3.2.32 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD

Table 8-171 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Description More than 50% of amf discovery reattempts failed.
Summary More than 50% of amf discovery reattempts failed.
Severity Major
Condition (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.105
Metric Used occnp_ue_nf_discovery_reattempt_response_total
Recommended Actions The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain number of reattempts fail while discovering the AMF. If failures increase, the operator can investigate the following:
  • Why the AMF discovery flow is failing, or
  • The health of the AMF to which the requests are sent
8.3.2.33 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD

Table 8-172 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Description More than 25% of amf discovery reattempts failed.
Summary More than 25% of amf discovery reattempts failed.
Severity Minor
Condition (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 25
OID 1.3.6.1.4.1.323.5.3.52.1.2.105
Metric Used occnp_ue_nf_discovery_reattempt_response_total
Recommended Actions The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies when a certain number of reattempts fail while discovering the AMF. If failures increase, the operator can investigate the following:
  • Why the AMF discovery flow is failing, or
  • The health of the AMF to which the requests are sent
8.3.2.34 INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD

Table 8-173 INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD

Field Details
Name in Alert Yaml File IngressErrorRateAbove10PercentPerPod
Description Ingress Error Rate above 10 Percent in {{$labels.kubernetes_name}} in {{$labels.kubernetes_namespace}}
Summary Transaction Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Critical
Condition The total number of failed transactions per pod is above 10 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.2
Metric Used ocpm_ingress_response_total
Recommended Actions The alert gets cleared when the number of failed transactions falls below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the service specific errors.
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.

For any additional guidance, contact My Oracle Support.

8.3.2.35 SM_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-174 SM_TRAFFIC_RATE_ABOVE_THRESHOLD

Field Details
Name in Alert Yaml File SMTrafficRateAboveThreshold
Description SM service Ingress traffic Rate is above threshold of Max MPS (current value is: {{ $value }})
Summary Traffic Rate is above 90 Percent of Max requests per second
Severity Major
Condition The total SM service Ingress traffic rate has crossed the configured threshold of 900 TPS.

Default value of this alert trigger point in PCF_Alertrules.yaml file is when SM service Ingress Rate crosses 90% of maximum ingress requests per second.

OID 1.3.6.1.4.1.323.5.3.36.1.2.3
Metric Used ocpm_ingress_request_total{servicename_3gpp="npcf-smpolicycontrol"}
Recommended Actions The alert gets cleared when the Ingress traffic rate falls below the threshold.

Note: Threshold levels can be configured using the PCF_Alertrules.yaml file.

It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
  1. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes.
  2. Check Ingress Gateway logs on Kibana to determine the reason for the errors.

For any additional guidance, contact My Oracle Support.
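
To check the current SM service ingress rate against the configured threshold, the metric listed in the table above can be rated over a short window as an ad-hoc Prometheus query. The sketch below assumes a 5-minute window, which is an illustrative choice:

  # Current SM service ingress request rate (requests per second), per namespace
  sum by (namespace) (rate(ocpm_ingress_request_total{servicename_3gpp="npcf-smpolicycontrol"}[5m]))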

8.3.2.36 SM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-175 SM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field Details
Name in Alert Yaml File SMIngressErrorRateAbove10Percent
Description Transaction Error Rate detected above 10 Percent of Total on SM service (current value is: {{ $value }})
Summary Transaction Error Rate detected above 10 Percent of Total Transactions
Severity Critical
Condition The number of failed transactions is above 10 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.4
Metric Used ocpm_ingress_response_total
Recommended Actions The alert gets cleared when the number of failed transactions falls below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the service specific errors. For instance: ocpm_ingress_response_total{servicename_3gpp="npcf-smpolicycontrol",response_code!~"2.*"}
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.

For any additional guidance, contact My Oracle Support.

8.3.2.37 SM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Table 8-176 SM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Field Details
Name in Alert Yaml File SMEgressErrorRateAbove1Percent
Description Egress Transaction Error Rate detected above 1 Percent of Total Transactions (current value is: {{ $value }})
Summary Transaction Error Rate detected above 1 Percent of Total Transactions
Severity Minor
Condition The number of failed transactions is above 1 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.5
Metric Used system_operational_state == 1
Recommended Actions The alert gets cleared when the number of failed transactions falls below 1% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the service specific errors. For instance: ocpm_egress_response_total{servicename_3gpp="npcf-smpolicycontrol",response_code!~"2.*"}
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.

For any additional guidance, contact My Oracle Support.

8.3.2.38 PCF_CHF_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-177 PCF_CHF_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Field Details
Name in Alert Yaml File PcfChfIngressTrafficRateAboveThreshold
Description User service Ingress traffic Rate from CHF is above threshold of Max MPS (current value is: {{ $value }})
Summary Traffic Rate is above 90 Percent of Max requests per second
Severity Major
Condition The total User Service Ingress traffic rate from CHF has crossed the configured threshold of 900 TPS.

Default value of this alert trigger point in PCF_Alertrules.yaml file is when user service Ingress Rate from CHF crosses 90% of maximum ingress requests per second.

OID 1.3.6.1.4.1.323.5.3.36.1.2.11
Metric Used ocpm_userservice_inbound_count_total{service_resource="chf-service"}
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.39 PCF_CHF_EGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-178 PCF_CHF_EGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field Details
Name in Alert Yaml File PcfChfEgressErrorRateAbove10Percent
Description The number of failed transactions from CHF is more than 10 percent of the total transactions.
Summary Transaction Error Rate detected above 10 Percent of Total Transactions
Severity Critical
Condition (sum(rate(ocpm_chf_tracking_response_total {servicename_3gpp="nchf-spendinglimitcontrol",response_code!~"2.*"} [24h]) or (up * 0)) / sum(rate(ocpm_chf_tracking_response_total {servicename_3gpp="nchf-spendinglimitcontrol"} [24h]))) * 100 >= 10
OID 1.3.6.1.4.1.323.5.3.36.1.2.12
Metric Used ocpm_chf_tracking_response_total
Recommended Actions The alert gets cleared when the number of failure transactions falls below the configured threshold.

Note: Threshold levels can be configured using the PCF_Alertrules.yaml file.

It is recommended to assess the reason for failed transactions. Perform the following steps to analyze the cause of the failures:
  1. Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes.
  2. Check Egress Gateway logs on Kibana to determine the reason for the errors.

For any additional guidance, contact My Oracle Support.

8.3.2.40 PCF_CHF_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD

Table 8-179 PCF_CHF_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD

Field Details
Description Ingress Timeout Error Rate detected above 10 Percent of Total towards CHF service (current value is: {{ $value }})
Summary Timeout Error Rate detected above 10 Percent of Total Transactions
Severity Major
Condition The number of failed transactions due to timeout is above 10 percent of the total transactions for CHF service.
OID 1.3.6.1.4.1.323.5.3.36.1.2.17
Metric Used ocpm_chf_tracking_request_timeout_total{servicename_3gpp="nchf-spendinglimitcontrol"}
Recommended Actions The alert gets cleared when the number of failed transactions due to timeout falls below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
  1. Check the service specific metrics to understand the service specific errors. For instance: ocpm_chf_tracking_request_timeout_total{servicename_3gpp="nchf-spendinglimitcontrol"}
  2. The service specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH.

For any additional guidance, contact My Oracle Support.

8.3.2.41 PCF_PENDING_BINDING_SITE_TAKEOVER

Table 8-180 PCF_PENDING_BINDING_SITE_TAKEOVER

Field Details
Description The site takeover configuration has been activated
Summary The site takeover configuration has been activated
Severity CRITICAL
Condition sum by (application, container, namespace) (changes(occnp_pending_binding_site_takeover_total[2m])) > 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.45
Metric Used occnp_pending_binding_site_takeover_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.42 PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED

Table 8-181 PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED

Field Details
Description The Pending Operation table threshold has been reached.
Summary The Pending Operation table threshold has been reached.
Severity CRITICAL
Condition sum by (application, container, namespace) (changes(occnp_threshold_limit_reached_total[2m])) > 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.46
Metric Used occnp_threshold_limit_reached_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.43 PCF_PENDING_BINDING_RECORDS_COUNT

Table 8-182 PCF_PENDING_BINDING_RECORDS_COUNT

Field Details
Description An attempt to internally recreate a PCF binding has been triggered by PCF
Summary An attempt to internally recreate a PCF binding has been triggered by PCF
Severity MINOR
Condition sum by (application, container, namespace) (changes(occnp_pending_operation_records_count[10s])) > 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.47
Metric Used occnp_pending_operation_records_count
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.44 AUTONOMOUS_SUBSCRIPTION_FAILURE

Table 8-183 AUTONOMOUS_SUBSCRIPTION_FAILURE

Field Details
Description Autonomous subscription failed for a configured Slice Load Level
Summary Autonomous subscription failed for a configured Slice Load Level
Severity Critical
Condition The number of failed Autonomous Subscriptions for a configured Slice Load Level in nwdaf-agent is greater than zero.
OID 1.3.6.1.4.1.323.5.3.52.1.2.49
Metric Used subscription_failure{requestType="autonomous"}
Recommended Actions The alert gets cleared when the failed Autonomous Subscription is corrected.
To clear the alert, perform the following steps:
  1. Delete the Slice Load Level configuration.
  2. Re-provision the Slice Load Level configuration.

For any additional guidance, contact My Oracle Support.

8.3.2.45 AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT

Table 8-184 AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT

Field Details
Description AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})
Summary AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})
Severity MINOR
Condition (sum(rate(http_out_conn_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])) / sum(rate(http_out_conn_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))) * 100 >= 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.54
Metric Used http_out_conn_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.46 AM_AR_ERROR_RATE_ABOVE_1_PERCENT

Table 8-185 AM_AR_ERROR_RATE_ABOVE_1_PERCENT

Field Details
Description Alternate Routing Error Rate detected above 1 Percent of Total on AM Service (current value is: {{ $value }})
Summary Alternate Routing Error Rate detected above 1 Percent of Total on AM Service (current value is: {{ $value }})
Severity MINOR
Condition (sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])) / sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))) * 100 >= 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.55
Metric Used ocpm_ar_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.47 UE_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT

Table 8-186 UE_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT

Field Details
Description UE Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})
Summary UE Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})
Severity MINOR
Condition (sum(rate(http_out_conn_response_total{pod=~".*ueservice.*",responseCode!~"2.*",servicename3gpp="npcf-ue-policy-control"}[1d])) / sum(rate(http_out_conn_response_total{pod=~".*ueservice.*",servicename3gpp="npcf-ue-policy-control"}[1d]))) * 100 >= 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.56
Metric Used http_out_conn_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.48 UE_AR_ERROR_RATE_ABOVE_1_PERCENT

Table 8-187 UE_AR_ERROR_RATE_ABOVE_1_PERCENT

Field Details
Description Alternate Routing Error Rate detected above 1 Percent of Total on UE Service (current value is: {{ $value }})
Summary Alternate Routing Error Rate detected above 1 Percent of Total on UE Service (current value is: {{ $value }})
Severity MINOR
Condition (sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*ueservice.*",responseCode!~"2.*",servicename3gpp="npcf-ue-policy-control"}[1d])) / sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*ueservice.*",servicename3gpp="npcf-ue-policy-control"}[1d]))) * 100 >= 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.57
Metric Used ocpm_ar_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.49 SMSC_CONNECTION_DOWN

Table 8-188 SMSC_CONNECTION_DOWN

Field Details
Description Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}}
Summary Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}}
Severity MAJOR
Condition sum by(namespace, pod, smscName)(occnp_active_smsc_conn_count) == 0
OID 1.3.6.1.4.1.323.5.3.52.1.2.63
Metric Used occnp_active_smsc_conn_count
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.50 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD

Table 8-189 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD

Field Details
Name in Alert Yaml File lockAcquisitionExceedsMinorThreshold
Description The count of lock requests that fail to acquire the lock exceeds the minor threshold limit (current value is: {{ $value }}).
Summary Keys used in Bulwark lock request which are already in locked state detected above 20 Percent of Total Transactions.
Severity Minor
Expression (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=20 < 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used -
Recommended Actions -
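To observe how the lock acquisition failure rate is trending, the Expression above can be run as an ad-hoc Prometheus query without the threshold comparison. A minimal sketch:

  # Percentage of acquireLock requests that failed over the last 5 minutes, per namespace
  (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) / sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100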
8.3.2.51 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD

Table 8-190 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD

Field Details
Name in Alert Yaml File lockAcquisitionExceedsMajorThreshold
Description The count of lock requests that fail to acquire the lock exceeds the major threshold limit (current value is: {{ $value }}).
Summary Keys used in Bulwark lock request which are already in locked state detected above 50 Percent of Total Transactions.
Severity Major
Expression (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 50 < 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used -
Recommended Actions -
8.3.2.52 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD

Table 8-191 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Name in Alert Yaml File lockAcquisitionExceedsCriticalThreshold
Description The count of lock requests that fail to acquire the lock exceeds the critical threshold limit (current value is: {{ $value }}).
Summary Keys used in Bulwark lock request which are already in locked state detected above 75 Percent of Total Transactions.
Severity Critical
Expression (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=75
OID 1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used -
Recommended Actions -
8.3.2.53 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_MINOR_THRESHOLD

Table 8-192 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_MINOR_THRESHOLD

Field Details
Description The count of failures to register the coherence callback subscription for already locked keys exceeds the minor threshold limit.
Summary Coherence callback registrations failures detected above 20 percent of total transactions.
Severity Minor
Expression (sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration",opStatus="failure"}[5m])) /sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration"}[5m]))) * 100 >=20 < 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.70
Metric Used -
Recommended Actions -
8.3.2.54 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_MAJOR_THRESHOLD

Table 8-193 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description The count of failures to register the coherence callback subscription for already locked keys exceeds the major threshold limit (current value is: {{ $value }}).
Summary Coherence callback registrations failures detected above 50 percent of total transactions.
Severity Major
Expression (sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration",opStatus="failure"}[5m])) /sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration"}[5m]))) * 100 >=50 < 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.70
Metric Used -
Recommended Actions -
8.3.2.55 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_CRITICAL_THRESHOLD

Table 8-194 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description The count of failures to register the coherence callback subscription for already locked keys exceeds the critical threshold limit (current value is: {{ $value }}).
Summary Coherence callback registrations failures detected above 75 percent of total transactions.
Severity Critical
Expression (sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration",opStatus="failure"}[5m])) /sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration"}[5m]))) * 100 >=75
OID 1.3.6.1.4.1.323.5.3.52.1.2.70
Metric Used -
Recommended Actions -
8.3.2.56 SM_UPDATE_NOTIFY_FAILED_ABOVE_50_PERCENT

Table 8-195 SM_UPDATE_NOTIFY_FAILED_ABOVE_50_PERCENT

Field Details
Description The percentage of Update Notify Terminate requests sent to SMF that failed is equal to or above 50% and below 60%.
Summary The percentage of Update Notify Terminate requests sent to SMF that failed is equal to or above 50% and below 60%.
Severity MINOR
Condition (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 50 < 60
OID 1.3.6.1.4.1.323.5.3.52.1.2.80
Metric Used occnp_http_out_conn_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.57 SM_UPDATE_NOTIFY_FAILED_ABOVE_60_PERCENT

Table 8-196 SM_UPDATE_NOTIFY_FAILED_ABOVE_60_PERCENT

Field Details
Description The percentage of Update Notify Terminate requests sent to SMF that failed is equal to or above 60% and below 70%.
Summary The percentage of Update Notify Terminate requests sent to SMF that failed is equal to or above 60% and below 70%.
Severity MAJOR
Condition (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 60 < 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.80
Metric Used occnp_http_out_conn_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.58 SM_UPDATE_NOTIFY_FAILED_ABOVE_70_PERCENT

Table 8-197 SM_UPDATE_NOTIFY_FAILED_ABOVE_70_PERCENT

Field Details
Description The percentage of Update Notify Terminate requests sent to SMF that failed is equal to or above 70%.
Summary The percentage of Update Notify Terminate requests sent to SMF that failed is equal to or above 70%.
Severity CRITICAL
Condition (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.80
Metric Used occnp_http_out_conn_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.59 UPDATE_NOTIFY_FAILURE_ABOVE_30_PERCENT

Table 8-198 UPDATE_NOTIFY_FAILURE_ABOVE_30_PERCENT

Field Details
Description The number of Update Notify requests that failed is equal to or above 30% but less than 50% of the total Rx sessions.
Summary The number of Update Notify requests that failed is equal to or above 30% but less than 50% of the total Rx sessions.
Severity MINOR
Condition (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.94
Metric Used occnp_http_out_conn_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.60 UPDATE_NOTIFY_FAILURE_ABOVE_50_PERCENT

Table 8-199 UPDATE_NOTIFY_FAILURE_ABOVE_50_PERCENT

Field Details
Description The number of Update Notify requests that failed is equal to or above 50% but less than 70% in a given time period.
Summary The number of Update Notify requests that failed is equal to or above 50% but less than 70% in a given time period.
Severity MAJOR
Condition (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.94
Metric Used occnp_http_out_conn_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.61 UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT

Table 8-200 UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT

Field Details
Description The number of Update Notify requests that failed is equal to or above 70% in a given time period.
Summary The number of Update Notify requests that failed is equal to or above 70% in a given time period.
Severity CRITICAL
Condition (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.94
Metric Used occnp_http_out_conn_response_total
Recommended Actions

For any additional guidance, contact My Oracle Support.
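
To check the current Update Notify failure percentage before adjusting the thresholds of these alerts, the documented condition can be run in the Prometheus expression browser without the threshold comparison, for example:

# Percentage of SM Update Notify requests that received a non-2xx response
# over the last 5 minutes, per namespace.
(
  sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m]))
/
  sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))
) * 100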

8.3.2.62 POD_PROTECTION_BY_RATELIMIT_REJECTED_REQUEST

Table 8-201 POD_PROTECTION_BY_RATELIMIT_REJECTED_REQUEST

Field Details
Description More than 1% of Ingress Gateway traffic is rejected because of rate limiting.
Summary More than 1% of Ingress Gateway traffic is rejected because of rate limiting.
Severity Major
Condition (sum by (namespace,pod) (rate(oc_ingressgateway_http_request_ratelimit_values_total {Allowed="false",app_kubernetes_io_name="occnp-ingress-gateway"}[2m])))/ (sum by (namespace,pod) (rate(oc_ingressgateway_http_request_ratelimit_values_total {app_kubernetes_io_name="occnp-ingress-gateway"}[2m]))) * 100 >= 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.103
Metric Used oc_ingressgateway_http_request_ratelimit_values_total
Recommended Actions

For any additional guidance, contact My Oracle Support.
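
The per-pod rejection percentage behind this alert can be inspected with the same expression as the documented condition, minus the >= 1 comparison, for example:

# Percentage of Ingress Gateway requests rejected by rate limiting over the
# last 2 minutes, per namespace and pod.
(
  sum by (namespace, pod) (rate(oc_ingressgateway_http_request_ratelimit_values_total{Allowed="false",app_kubernetes_io_name="occnp-ingress-gateway"}[2m]))
/
  sum by (namespace, pod) (rate(oc_ingressgateway_http_request_ratelimit_values_total{app_kubernetes_io_name="occnp-ingress-gateway"}[2m]))
) * 100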

8.3.2.63 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MINOR_THRESHOLD

Table 8-202 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MINOR_THRESHOLD

Field Details
Description The rate of UE N1N2 notifications containing a MANAGE_UE_POLICY_COMMAND_REJECT request from AMF is above 20 percent of the total N1N2 notify requests.
Summary The rate of UE N1N2 notifications containing a MANAGE_UE_POLICY_COMMAND_REJECT request from AMF is above 20 percent of the total N1N2 notify requests.
Severity Minor
Condition sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.91
Metric Used ue_n1_transfer_ue_notification_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.64 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MAJOR_THRESHOLD

Table 8-203 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MAJOR_THRESHOLD

Field Details
Description The rate of UE N1N2 notifications containing a MANAGE_UE_POLICY_COMMAND_REJECT request from AMF is above 50 percent of the total N1N2 notify requests.
Summary The rate of UE N1N2 notifications containing a MANAGE_UE_POLICY_COMMAND_REJECT request from AMF is above 50 percent of the total N1N2 notify requests.
Severity Major
Condition sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.91
Metric Used ue_n1_transfer_ue_notification_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.65 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_CRITICAL_THRESHOLD

Table 8-204 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_CRITICAL_THRESHOLD

Field Details
Description The rate of UE N1N2 notifications containing a MANAGE_UE_POLICY_COMMAND_REJECT request from AMF is above 75 percent of the total N1N2 notify requests.
Summary The rate of UE N1N2 notifications containing a MANAGE_UE_POLICY_COMMAND_REJECT request from AMF is above 75 percent of the total N1N2 notify requests.
Severity Critical
Condition sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.91
Metric Used ue_n1_transfer_ue_notification_total
Recommended Actions

For any additional guidance, contact My Oracle Support.
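
To observe the rejection rate that drives these three alerts, the shared expression from their conditions can be run directly in Prometheus, for example:

# Percentage of N1N2 notify requests from AMF that carry
# MANAGE_UE_POLICY_COMMAND_REJECT, over the last 5 minutes, per namespace.
sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m]))
  / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100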

8.3.2.66 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MINOR_THRESHOLD

Table 8-205 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MINOR_THRESHOLD

Field Details
Description Over 20 percent of the total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests.
Summary Over 20 percent of the total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests.

Severity Minor
Condition sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.92
Metric Used ue_n1_transfer_failure_notification_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.67 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MAJOR_THRESHOLD

Table 8-206 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MAJOR_THRESHOLD

Field Details
Description Over 50 percent of the total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests.
Summary Over 50 percent of the total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests.

Severity Major
Condition sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.92
Metric Used ue_n1_transfer_failure_notification_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.68 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_CRITICAL_THRESHOLD

Table 8-207 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_CRITICAL_THRESHOLD

Field Details
Description Over 75 percent of the total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests.
Summary Over 75 percent of the total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests.

Severity Critical
Condition sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.92
Metric Used ue_n1_transfer_failure_notification_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.69 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MINOR_THRESHOLD

Table 8-208 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MINOR_THRESHOLD

Field Details
Description Over 20% of UE N1N2 transfers have a T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer.
Summary Over 20% of UE N1N2 transfers have a T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer.

Severity Minor
Condition sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.93
Metric Used ue_n1_transfer_t3501_expiry_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.70 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MAJOR_THRESHOLD

Table 8-209 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MAJOR_THRESHOLD

Field Details
Description Over 50% of UE N1N2 transfers have a T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer.
Summary Over 50% of UE N1N2 transfers have a T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer.

Severity Major
Condition sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.93
Metric Used ue_n1_transfer_t3501_expiry_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.71 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_CRITICAL_THRESHOLD

Table 8-210 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_CRITICAL_THRESHOLD

Field Details
Description Over 75% of UE N1N2 transfers have a T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer.
Summary Over 75% of UE N1N2 transfers have a T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer.

Severity Critical
Condition sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75
OID 1.3.6.1.4.1.323.5.3.52.1.2.93
Metric Used ue_n1_transfer_t3501_expiry_total
Recommended Actions

For any additional guidance, contact My Oracle Support.
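
The three T3501 expiry alerts share one expression and differ only in the threshold and severity. A sketch of how they could appear as PrometheusRule entries is shown below; the group name and the label key are assumptions and may differ from the packaged rules file.

groups:
  - name: ue-n1n2-t3501-alerts               # illustrative group name
    rules:
      - alert: UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MINOR_THRESHOLD
        expr: sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20
        labels:
          severity: minor
      - alert: UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MAJOR_THRESHOLD
        expr: sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50
        labels:
          severity: major
      - alert: UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_CRITICAL_THRESHOLD
        expr: sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75
        labels:
          severity: critical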

8.3.2.72 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_CRITICAL_THRESHOLD

Table 8-211 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_CRITICAL_THRESHOLD

Field Details
Description This alert is triggered when the percentage of Update Notify requests that failed due to an error response is equal to or above 70% in a given time period.
Summary This alert is triggered when the percentage of Update Notify requests that failed due to an error response is equal to or above 70% in a given time period.
Severity Critical
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.111
Metric Used ocpm_handle_update_notify_error_response_as_pending_confirmation_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.73 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MAJOR_THRESHOLD

Table 8-212 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MAJOR_THRESHOLD

Field Details
Description This alert is triggered when the percentage of Update Notify requests that failed due to an error response is equal to or above 50% but less than 70% in a given time period.
Summary This alert is triggered when the percentage of Update Notify requests that failed due to an error response is equal to or above 50% but less than 70% in a given time period.
Severity Major
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.111
Metric Used ocpm_handle_update_notify_error_response_as_pending_confirmation_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.74 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MINOR_THRESHOLD

Table 8-213 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MINOR_THRESHOLD

Field Details
Description This alert is triggered when the percentage of Update Notify requests that failed due to an error response is equal to or above 30% but less than 50% of total Rx sessions.
Summary This alert is triggered when the percentage of Update Notify requests that failed due to an error response is equal to or above 30% but less than 50% of total Rx sessions.
Severity Minor
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm", responseCode=~"5xx/4xx"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.111
Metric Used ocpm_handle_update_notify_error_response_as_pending_confirmation_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.75 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_CRITICAL_THRESHOLD

Table 8-214 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_CRITICAL_THRESHOLD

Field Details
Description This alert is triggered when the percentage of Update Notify requests that failed due to a timeout is equal to or above 70% in a given time period.
Summary This alert is triggered when the percentage of Update Notify requests that failed due to a timeout is equal to or above 70% in a given time period.
Severity Critical
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.112
Metric Used ocpm_handle_update_notify_timeout_as_pending_confirmation_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.76 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MAJOR_THRESHOLD

Table 8-215 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MAJOR_THRESHOLD

Field Details
Description This alert is triggered when the percentage of Update Notify requests that failed due to a timeout is equal to or above 50% but less than 70% in a given time period.
Summary This alert is triggered when the percentage of Update Notify requests that failed due to a timeout is equal to or above 50% but less than 70% in a given time period.
Severity Major
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70
OID 1.3.6.1.4.1.323.5.3.52.1.2.112
Metric Used ocpm_handle_update_notify_timeout_as_pending_confirmation_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.77 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MINOR_THRESHOLD

Table 8-216 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MINOR_THRESHOLD

Field Details
Description This alert is triggered when the percentage of Update Notify requests that failed due to a timeout is equal to or above 30% but less than 50% of total Rx sessions.
Summary This alert is triggered when the percentage of Update Notify requests that failed due to a timeout is equal to or above 30% but less than 50% of total Rx sessions.
Severity Minor
Condition (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50
OID 1.3.6.1.4.1.323.5.3.52.1.2.112
Metric Used ocpm_handle_update_notify_timeout_as_pending_confirmation_total
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.2.78 PCF_STATE_NON_FUNCTIONAL_CRITICAL

Table 8-217 PCF_STATE_NON_FUNCTIONAL_CRITICAL

Field Details
Description Policy is in a non-functional state because the DB cluster state is down.
Summary Policy is in a non-functional state because the DB cluster state is down.
Severity Critical
Condition appinfo_nfDbFunctionalState_current{nfDbFunctionalState="Not_Running"} == 1
OID 1.3.6.1.4.1.323.5.3.52.1.2.102
Metric Used appinfo_nfDbFunctionalState_current
Recommended Actions

For any additional guidance, contact My Oracle Support.
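
The condition of this alert can also be evaluated on demand in the Prometheus expression browser to confirm the reported database functional state:

# Returns a series with value 1 while Policy reports the DB cluster state as
# Not_Running; an empty result means the alert condition is not met.
appinfo_nfDbFunctionalState_current{nfDbFunctionalState="Not_Running"} == 1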

8.3.3 PCRF Alerts

This section provides information about PCRF alerts.

8.3.3.1 PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD

PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD

Table 8-218 PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description PRE fail count exceeds the critical threshold limit.
Summary Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition PRE fail count exceeds the critical threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.9
Metric Used http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.2 PRE_UNREACHABLE_EXCEEDS_MAJOR_THRESHOLD

PRE_UNREACHABLE_EXCEEDS_MAJOR_THRESHOLD

Table 8-219 PRE_UNREACHABLE_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description PRE fail count exceeds the major threshold limit.
Summary Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition PRE fail count exceeds the major threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.9
Metric Used http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.3 PRE_UNREACHABLE_EXCEEDS_MINOR_THRESHOLD

PRE_UNREACHABLE_EXCEEDS_MINOR_THRESHOLD

Table 8-220 PRE_UNREACHABLE_EXCEEDS_MINOR_THRESHOLD

Field Details
Description PRE fail count exceeds the minor threshold limit.
Summary Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition PRE fail count exceeds the minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.9
Metric Used http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.4 PCRF_DOWN

Table 8-221 PCRF_DOWN

Field Details
Description PCRF Service is down
Summary Alert PCRF_DOWN NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition None of the pods of the PCRF service are available.
OID 1.3.6.1.4.1.323.5.3.44.1.2.33
Metric Used appinfo_service_running{service=~".*pcrf-core"}
Recommended Actions

For any additional guidance, contact My Oracle Support.
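
As a quick check for this condition, the metric used by the alert can be queried directly. Assuming the metric reports 1 while the service is available, a value of 0 (or no pcrf-core series at all) points to the state behind PCRF_DOWN.

# Per-service availability reported by appinfo for pcrf-core
# (the interpretation of the value as 1 = running is an assumption).
appinfo_service_running{service=~".*pcrf-core"}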

8.3.3.5 CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-222 CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description CCA fail count exceeds the critical threshold limit
Summary Alert CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The failure rate of CCA messages has exceeded the configured threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.13
Metric Used occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"}
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.3.6 CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-223 CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description CCA fail count exceeds the major threshold limit
Summary Alert CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The failure rate of CCA messages has exceeded the configured major threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.13
Metric Used occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"}
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.3.7 CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-224 CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description CCA fail count exceeds the minor threshold limit
Summary Alert CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The failure rate of CCA messages has exceeded the configured minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.13
Metric Used occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"}
Recommended Actions

For any additional guidance, contact My Oracle Support.

8.3.3.8 AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-225 AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description AAA fail count exceeds the critical threshold limit
Summary Alert AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The failure rate of AAA messages has exceeded the critical threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.34
Metric Used occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.9 AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-226 AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description AAA fail count exceeds the major threshold limit
Summary Alert AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The failure rate of AAA messages has exceeded the major threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.34
Metric Used occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.10 AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-227 AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description AAA fail count exceeds the minor threshold limit
Summary Alert AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The failure rate of AAA messages has exceeded the minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.34
Metric Used occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.11 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-228 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description RAA Rx fail count exceeds the critical threshold limit
Summary Alert RAA_Rx_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The failure rate of RAA Rx messages has exceeded the configured threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.35
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.12 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-229 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description RAA Rx fail count exceeds the major threshold limit
Summary Alert RAA_Rx_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The failure rate of RAA Rx messages has exceeded the configured major threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.35
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.13 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-230 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description RAA Rx fail count exceeds the minor threshold limit
Summary Alert RAA_Rx_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The failure rate of RAA Rx messages has exceeded the configured minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.35
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.14 RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-231 RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description RAA Gx fail count exceeds the critical threshold limit
Summary Alert RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The failure rate of RAA Gx messages has exceeded the configured threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.18
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.15 RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-232 RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description RAA Gx fail count exceeds the major threshold limit
Summary Alert RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The failure rate of RAA Gx messages has exceeded the configured major threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.18
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.16 RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-233 RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description RAA Gx fail count exceeds the minor threshold limit
Summary Alert RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The failure rate of RAA Gx messages has exceeded the configured minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.18
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.17 ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-234 ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description ASA fail count exceeds the critical threshold limit
Summary Alert ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The failure rate of ASA messages has exceeded the configured threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.17
Metric Used occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.18 ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-235 ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description ASA fail count exceeds the major threshold limit
Summary Alert ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The failure rate of ASA messages has exceeded the configured major threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.17
Metric Used occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.19 ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-236 ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description ASA fail count exceeds the minor threshold limit
Summary Alert ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The failure rate of ASA messages has exceeded the configured minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.17
Metric Used occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.20 STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-237 STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description STA fail count exceeds the critical threshold limit.
Summary The failure rate of STA messages has exceeded the configured critical threshold limit.
Severity Critical
Condition sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.44.1.2.19
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.21 STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-238 STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description STA fail count exceeds the major threshold limit.
Summary The failure rate of STA messages has exceeded the configured major threshold limit.
Severity Major
Condition sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 80
OID 1.3.6.1.4.1.323.5.3.44.1.2.19
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.22 STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-239 STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description STA fail count exceeds the minor threshold limit.
Summary The failure rate of STA messages has exceeded the configured minor threshold limit.
Severity Minor
Condition sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 60
OID 1.3.6.1.4.1.323.5.3.44.1.2.19
Metric Used occnp_diam_response_local_total
Recommended Actions For any additional guidance, contact My Oracle Support.
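
The three STA alerts use the same expression with thresholds of 90, 80, and 60. A sketch of how they could be laid out as PrometheusRule entries is shown below; the group name and the label key are assumptions, and the trailing threshold in each expr is the value to edit if different limits are required.

groups:
  - name: pcrf-sta-alerts                    # illustrative group name
    rules:
      - alert: STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
        expr: sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 90
        labels:
          severity: critical
      - alert: STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
        expr: sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 80
        labels:
          severity: major
      - alert: STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
        expr: sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 60
        labels:
          severity: minor
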
8.3.3.23 ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-240 ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description ASA timeout count exceeds the critical threshold limit
Summary Alert ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The timeout rate of ASA messages has exceeded the configured threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.31
Metric Used occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.24 ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-241 ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description ASA timeout count exceeds the major threshold limit
Summary Alert ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The timeout rate of ASA messages has exceeded the configured major threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.31
Metric Used occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.25 ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-242 ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description ASA timeout count exceeds the minor threshold limit
Summary Alert ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The timeout rate of ASA messages has exceeded the configured minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.31
Metric Used occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.26 RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-243 RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description RAA Gx timeout count exceeds the critical threshold limit
Summary Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The timeout rate of RAA Gx messages has exceeded the configured threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.32
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.27 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-244 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description RAA Gx timeout count exceeds the major threshold limit
Summary Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The timeout rate of RAA Gx messages has exceeded the configured major threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.32
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.28 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-245 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description RAA Gx timeout count exceeds the minor threshold limit
Summary Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The timeout rate of RAA Gx messages has exceeded the configured minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.44.1.2.32
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.29 RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-246 RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description RAA Rx timeout count exceeds the critical threshold limit
Summary Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The timeout rate of RAA Rx messages has exceeded the configured threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.36
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.30 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-247 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description RAA Rx timeout count exceeds the major threshold limit
Summary Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The timeout rate of RAA Rx messages has exceeded the configured major threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.36
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.31 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-248 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description RAA Rx timeout count exceeds the minor threshold limit
Summary Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The timeout rate of RAA Rx messages has exceeded the configured minor threshold limit.
OID 1.3.6.1.4.1.323.5.3.36.1.2.36
Metric Used occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.32 RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Table 8-249 RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Field Details
Description CCA, AAA, RAA, ASA and STA error rate combined is above 10 percent
Summary Alert RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 10% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.37
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.33 RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

Table 8-250 RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

Field Details
Description CCA, AAA, RAA, ASA and STA error rate combined is above 5 percent
Summary Alert RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 5% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.37
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.34 RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

Table 8-251 RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

Field Details
Description CCA, AAA, RAA, ASA and STA error rate combined is above 1 percent
Summary Alert RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 1% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.37
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*"}
Recommended Actions For any additional guidance, contact My Oracle Support.
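
The combined Diameter answer error percentage behind these three alerts can be approximated with the query below. The 5-minute rate() window is an assumption, and the packaged rule may additionally restrict the msgType label to CCA, AAA, RAA, ASA, and STA.

# Approximate combined non-2xx percentage across locally generated Diameter
# answers over the last 5 minutes.
sum(rate(occnp_diam_response_local_total{responseCode!~"2.*"}[5m]))
  / sum(rate(occnp_diam_response_local_total[5m])) * 100
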
8.3.3.35 Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Table 8-252 Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Field Details
Description Rx error rate combined is above 10 percent
Summary Alert Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The failure rate of Rx responses is more than 10% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.38
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.36 Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

Table 8-253 Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

Field Details
Description Rx error rate combined is above 5 percent
Summary Alert Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The failure rate of Rx responses is more than 5% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.38
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.37 Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

Table 8-254 Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

Field Details
Description Rx error rate combined is above 1 percent
Summary Alert Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The failure rate of Rx responses is more than 1% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.38
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.38 Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Table 8-255 Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Field Details
Description Gx error rate combined is above 10 percent
Summary Alert Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Critical
Condition The failure rate of Gx responses is more than 10% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.39
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.39 Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

Table 8-256 Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT

Field Details
Description Gx error rate combined is above 5 percent
Summary Alert Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Major
Condition The failure rate of Gx responses is more than 5% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.39
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.40 Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

Table 8-257 Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT

Field Details
Description Gx error rate combined is above 1 percent
Summary Alert Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity Minor
Condition The failure rate of Gx responses is more than 1% of the total responses.
OID 1.3.6.1.4.1.323.5.3.36.1.2.39
Metric Used occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"}
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.41 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL

STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL

Table 8-258 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL

Field Details
Description Diameter requests are being discarded due to timeout processing at a rate of 30% or above.
Summary Diameter requests are being discarded due to timeout processing at a rate of 30% or above.
Severity Critical
Condition (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 30
OID 1.3.6.1.4.1.323.5.3.52.1.2.82
Metric Used occnp_stale_diam_request_cleanup_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.42 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR

STALE_DIAMETER_REQUEST_CLEANUP_MAJOR

Table 8-259 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR

Field Details
Description Diameter requests are being discarded due to timeout processing at a rate of 20% or above.
Summary Diameter requests are being discarded due to timeout processing at a rate of 20% or above.
Severity Major
Condition (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 20
OID 1.3.6.1.4.1.323.5.3.52.1.2.82
Metric Used occnp_stale_diam_request_cleanup_total
Recommended Actions For any additional guidance, contact My Oracle Support.
8.3.3.43 STALE_DIAMETER_REQUEST_CLEANUP_MINOR

STALE_DIAMETER_REQUEST_CLEANUP_MINOR

Table 8-260 STALE_DIAMETER_REQUEST_CLEANUP_MINOR

Field Details
Description Diameter requests are being discarded due to timeout processing at a rate of 10% or above.
Summary Diameter requests are being discarded due to timeout processing at a rate of 10% or above.
Severity Minor
Condition (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 10
OID 1.3.6.1.4.1.323.5.3.52.1.2.82
Metric Used occnp_stale_diam_request_cleanup_total
Recommended Actions For any additional guidance, contact My Oracle Support.
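
The stale-request cleanup percentage evaluated by these three alerts can be inspected per pod with the documented condition, minus the threshold comparison, for example:

# Percentage of non-DWR/CER Diameter requests cleaned up as stale over the
# last 24 hours, per namespace, microservice, and pod.
(
  sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h]))
/
  sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))
) * 100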