8 Alerts

This section provides information on Policy alerts and their configuration.

Note:

The performance and capacity of the system can vary based on the call model and the configuration, including but not limited to the deployed policies and the corresponding data, for example, policy tables.

8.1 Configuring Alerts

This section describes how to configure alerts in Policy. The Alert Manager uses the Prometheus measurement values, as reported by the microservices, in the conditions under alert rules to trigger alerts.

Note:

Sample alert files are packaged with Policy Custom Templates. The Policy Custom Templates.zip file can be downloaded from MOS. Unzip the folder to access the following files:
- Common_Alertrules_cne1.9+.yaml
- PCF_Alertrules_cne1.9+.yaml
- PCRF_Alertrules_cne1.9+.yaml
The name in the metadata section must be unique when applying more than one file. For example:
```
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    role: cnc-alerting-rules
  name: occnp-pcf-alerting-rules
```
If required, edit the threshold values of various alerts in the alert files before configuring the alerts.
The Alert Manager and Prometheus tools should run in CNE namespace, for example, occne-infra.
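
As a rough sketch of the two points above, the fragment below shows how a second alert file might carry its own `metadata.name`, and where a threshold sits inside a rule expression. The resource name `occnp-common-alerting-rules`, the group name, and the label layout are illustrative assumptions and are not copied from the packaged templates. Only the alert name and expression are taken from the alert list later in this chapter.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  # Assumed name; it only needs to differ from the names used by the other files.
  name: occnp-common-alerting-rules
spec:
  groups:
  - name: occnp-common-alerts   # hypothetical group name
    rules:
    - alert: PodMemoryCongested
      # The value on the right-hand side of the comparison is the threshold to edit.
      expr: occnp_pod_resource_congestion_state{type="memory"} == 2
      labels:
        severity: critical
```

If two files share the same `metadata.name` in the same namespace, applying the second one replaces the rules from the first, which is why the names must differ.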

Use the following table to select the appropriate files based on the deployment mode and CNE version.

Table 8-1 Alert Configuration

Deployment Mode	CNE 1.9+
Converged Mode	Common_Alertrules_cne1.9+.yaml, PCF_Alertrules_cne1.9+.yaml, PCRF_Alertrules_cne1.9+.yaml
PCF only	Common_Alertrules_cne1.9+.yaml, PCF_Alertrules_cne1.9+.yaml
PCRF only	Common_Alertrules_cne1.9+.yaml, PCRF_Alertrules_cne1.9+.yaml

Configuring Alerts in Prometheus for CNE 1.9.0 and later versions

To configure PCF alerts in Prometheus for CNE 1.9.0, perform the following steps:

Copy the required files to the Bastion Host.
To create or replace the PrometheusRule CRD, run the following command:
```
$ kubectl apply -f Common_Alertrules_cne1.9+.yaml -n <namespace>
```
```
$ kubectl apply -f PCF_Alertrules_cne1.9+.yaml -n <namespace>
```
```
$ kubectl apply -f PCRF_Alertrules_cne1.9+.yaml -n <namespace>
```
Note:
These are sample commands for the Converged mode of deployment.
To verify if the CRD is created, run the following command:
```
kubectl get prometheusrule -n <namespace>
```
Example:
```
kubectl get prometheusrule -n occnp
```
Verify the alerts in the Prometheus GUI. To do so, select the Alerts tab, and view alert details by selecting any individual rule from the list.

Validating Alerts

After configuring the alerts in the Prometheus server, verify them using the following procedure:

Open the Prometheus server in your browser using <IP>:<Port>.
Navigate to Status and then Rules.
Search for Policy. The list of Policy alerts is displayed.

If you are unable to see the alerts, verify if the alert file is correct and then try again.

Adding worker node name in metrics

To add the worker node name in metrics, perform the following steps:

Edit the ConfigMap occne-prometheus-server in the occne-infra namespace.
Locate the following job:
```
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
```

Add the following in the relabel_configs:

- action: replace
  source_labels:
  - __meta_kubernetes_pod_node_name
  target_label: kubernetes_pod_node_name
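
To show where the new rule sits, the following sketch places it inside the job definition. It assumes the standard Prometheus scrape configuration layout. Any relabel rules already present in the job are not reproduced here and should be left unchanged.

```yaml
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Existing relabel rules for this job stay as they are.
  # The rule below copies the node name discovered for each pod
  # into a kubernetes_pod_node_name label on every scraped metric.
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: kubernetes_pod_node_name
```

Once Prometheus has reloaded the configuration, metrics scraped by this job should carry the `kubernetes_pod_node_name` label.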

8.2 Configuring SNMP Notifier

This section describes the procedure to configure SNMP Notifier.

Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:

Run the following command to edit the deployment:
```
$ kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
```
Example:
```
$ kubectl edit deploy occne-snmp-notifier -n occne-infra
```
The SNMP deployment YAML file is displayed.
Edit the SNMP destination in the deployment yaml file as follows:
```
--snmp.destination=<destination_ip>:<destination_port>
```
Example:
```
--snmp.destination=10.75.203.94:162
```
Save the file.
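
For orientation, the destination is typically one entry in the container argument list of the deployment. The fragment below is a minimal sketch that assumes the flag is passed under `args`. The container name is hypothetical, and other fields of the deployment are omitted.

```yaml
spec:
  template:
    spec:
      containers:
      - name: snmp-notifier   # assumed container name
        args:
        # Other arguments, if any, are left unchanged.
        # Replace the IP and port with those of the SNMP trap receiver.
        - --snmp.destination=10.75.203.94:162
```

Saving the edited deployment normally causes Kubernetes to roll out a new pod with the updated argument.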

Checking SNMP Traps

The following example shows how to capture the logs of the trap receiver server to view the generated SNMP traps:

$ docker logs <trapd_container_id>

Sample output:

Figure 8-1 Sample output for SNMP Trap

MIB Files for Policy

Two MIB files are used to generate the traps. Update these files along with the alert file to fetch the traps in your environment.

toplevel.mib
This is the top-level MIB file, where the objects and their data types are defined.
policy-alarm-mib.mib
This file fetches objects from the top-level MIB file, and these objects can be selected for display.

Note:

MIB files are packaged along with Custom Templates. Download the file from MOS. For more information on downloading custom templates, see Oracle Communications Cloud Native Core, Converged Policy Installation, Upgrade, and Fault Recovery Guide.

8.3 List of Alerts

This section provides detailed information about the alert rules defined for Policy. It consists of the following three types of alerts:

Common Alerts - This category of alerts is common and required for all three modes of deployment.
PCF Alerts - This category of alerts is specific to PCF microservices and required for Converged and PCF only modes of deployment.
PCRF Alerts - This category of alerts is specific to PCRF microservices and required for Converged and PCRF only modes of deployment.

8.3.1 Common Alerts

This section provides information about alerts that are common for PCF and PCRF.

8.3.1.1 POD_CONGESTION_L1

Table 8-2 POD_CONGESTION_L1

Field	Details
Name in Alert Yaml File	PodCongestionL1
Description	Alert when cpu of pod is in CONGESTION_L1 state.
Summary	Alert when cpu of pod is in CONGESTION_L1 state.
Severity	Critical
Condition	occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2
OID	1.3.6.1.4.1.323.5.3.52.1.2.71
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.2 POD_CONGESTION_L2

Table 8-3 POD_CONGESTION_L2

Field	Details
Name in Alert Yaml File	PodCongestionL2
Description	Alert when cpu of pod is in CONGESTION_L2 state.
Summary	Alert when cpu of pod is in CONGESTION_L2 state.
Severity	Critical
Condition	occnp_pod_resource_congestion_state{type="cpu"} == 3
OID	1.3.6.1.4.1.323.5.3.52.1.2.72
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	For any additional guidance, contact My Oracle Support.
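
The two tables above map onto alert rules roughly as sketched below. The expressions, alert names, severities, and OIDs come from the tables. Placing the OID in a label and the summary in an annotation is an assumption about the file layout, not a copy of the packaged alert file.

```yaml
- alert: PodCongestionL1
  # State 2 corresponds to CONGESTION_L1; bulwark and diam-gateway containers are excluded.
  expr: occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2
  labels:
    severity: critical
    oid: "1.3.6.1.4.1.323.5.3.52.1.2.71"   # label placement is an assumption
  annotations:
    summary: Alert when cpu of pod is in CONGESTION_L1 state.
- alert: PodCongestionL2
  # State 3 corresponds to CONGESTION_L2; no containers are excluded here.
  expr: occnp_pod_resource_congestion_state{type="cpu"} == 3
  labels:
    severity: critical
    oid: "1.3.6.1.4.1.323.5.3.52.1.2.72"   # label placement is an assumption
  annotations:
    summary: Alert when cpu of pod is in CONGESTION_L2 state.
```

Because the two expressions test different state values, both rules cannot fire for the same pod at the same time.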

8.3.1.3 POD_PENDING_REQUEST_CONGESTION_L1

Table 8-4 POD_PENDING_REQUEST_CONGESTION_L1

Field	Details
Name in Alert Yaml File	PodPendingRequestCongestionL1
Description	Alert when queue of pod is in CONGESTION_L1 state.
Summary	Alert when queue of pod is in CONGESTION_L1 state.
Severity	critical
Condition	occnp_pod_resource_congestion_state{type="queue",container!~"bulwark|diam-gateway"} == 2
OID	1.3.6.1.4.1.323.5.3.52.1.2.73
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.4 POD_PENDING_REQUEST_CONGESTION_L2

Table 8-5 POD_PENDING_REQUEST_CONGESTION_L2

Field	Details
Name in Alert Yaml File	PodPendingRequestCongestionL2
Description	Alert when queue of pod is in CONGESTION_L2 state.
Summary	Alert when queue of pod is in CONGESTION_L2 state.
Severity	critical
Condition	occnp_pod_resource_congestion_state{type="queue"} == 3
OID	1.3.6.1.4.1.323.5.3.52.1.2.74
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.5 POD_CPU_CONGESTION_L1

Table 8-6 POD_CPU_CONGESTION_L1

Field	Details
Name in Alert Yaml File	PodCPUCongestionL1
Description	Alert when cpu of pod is in CONGESTION_L1 state.
Summary	Alert when cpu of pod is in CONGESTION_L1 state.
Severity	Critical
Condition	occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2
OID	1.3.6.1.4.1.323.5.3.52.1.2.73
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.6 POD_CPU_CONGESTION_L2

Table 8-7 POD_CPU_CONGESTION_L2

Field	Details
Name in Alert Yaml File	PodCPUCongestionL2
Description	Alert when cpu of pod is in CONGESTION_L2 state.
Summary	Alert when cpu of pod is in CONGESTION_L2 state.
Severity	critical
Condition	occnp_pod_resource_congestion_state{type="cpu"} == 3
OID	1.3.6.1.4.1.323.5.3.52.1.2.74
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.7 PodMemoryDoC

Table 8-8 PodMemoryDoC

Field	Details
Description	Pod Resource Congestion status of {{$labels.service}} service is DoC for Memory type
Summary	Pod Resource Congestion status of {{$labels.service}} service is DoC for Memory type
Severity	Major
Condition	occnp_pod_resource_congestion_state{type="memory"} == 1
OID	1.3.6.1.4.1.323.5.3.52.1.2.31
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	The alert triggers based on the resource limit usage and load shedding configurations in congestion control. The CPU, memory, and queue usage can be viewed on the Grafana dashboard. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. For any additional guidance, contact My Oracle Support.

8.3.1.8 PodMemoryCongested

Table 8-9 PodMemoryCongested

Field	Details
Description	Pod Resource Congestion status of {{$labels.service}} service is congested for Memory type
Summary	Pod Resource Congestion status of {{$labels.service}} service is congested for Memory type
Severity	Critical
Condition	occnp_pod_resource_congestion_state{type="memory"} == 2
OID	1.3.6.1.4.1.323.5.3.52.1.2.32
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	The alert triggers based on the resource limit usage and load shedding configurations in congestion control. The CPU, memory, and queue usage can be viewed on the Grafana dashboard. For any additional guidance, contact My Oracle Support.

8.3.1.9 PodDoc

Table 8-10 PodDoc

Field	Details
Description	Pod Congestion status of {{$labels.service}} service is DoC.
Summary	Pod Congestion status of {{$labels.service}} service is DoC.
Severity	Major
Condition	occnp_pod_congestion_state == 1
OID	1.3.6.1.4.1.323.5.3.52.1.2.25
Metric Used	occnp_pod_congestion_state
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.10 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-11 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	RAA Rx fail count exceeds the critical threshold limit.
Summary	RAA Rx fail count exceeds the critical threshold limit.
Severity	CRITICAL
Condition	sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.35
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.11 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-12 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	RAA Rx fail count exceeds the major threshold limit.
Summary	RAA Rx fail count exceeds the major threshold limit.
Severity	MAJOR
Condition	sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.35
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.12 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-13 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field	Details
Description	RAA Rx fail count exceeds the minor threshold limit.
Summary	RAA Rx fail count exceeds the minor threshold limit.
Severity	MINOR
Condition	sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 80
OID	1.3.6.1.4.1.323.5.3.52.1.2.35
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.
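
The three RAA Rx failure alerts above split the failure percentage into non-overlapping bands: above 60 and up to 80 is minor, above 80 and up to 90 is major, and above 90 is critical. As an illustration only, the major band could be written as a rule like the one below. The `* 100` scaling and the `2.*` success pattern are assumed here by analogy with the ASA timeout conditions and the `response_code!~"2.*"` examples elsewhere in this chapter. The label layout is hypothetical rather than copied from the packaged alert file.

```yaml
- alert: RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
  # Failure percentage = RAA responses with a non-2xxx result / all RAA responses * 100.
  # The upper bound keeps this rule from firing together with the critical one.
  expr: |
    sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m]))
      / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 > 80
    and
    sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m]))
      / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 90
  labels:
    severity: major
    oid: "1.3.6.1.4.1.323.5.3.52.1.2.35"   # placing the OID in a label is an assumption
```

The minor and critical rules would differ only in their bounds: 60 and 80 for minor, and a single lower bound of 90 for critical.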

8.3.1.13 ASA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-14 ASA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	ASA Rx fail count exceeds the critical threshold limit.
Summary	ASA Rx fail count exceeds the critical threshold limit.
Severity	CRITICAL
Condition	sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.66
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.14 ASA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-15 ASA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	ASA Rx fail count exceeds the major threshold limit.
Summary	ASA Rx fail count exceeds the major threshold limit.
Severity	MAJOR
Condition	sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.66
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.15 ASA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-16 ASA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field	Details
Description	ASA Rx fail count exceeds the minor threshold limit.
Summary	ASA Rx fail count exceeds the minor threshold limit.
Severity	MINOR
Condition	sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 80
OID	1.3.6.1.4.1.323.5.3.52.1.2.66
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.16 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-17 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD

Field	Details
Description	ASA Rx timeout count exceeds the minor threshold limit
Summary	ASA Rx timeout count exceeds the minor threshold limit
Severity	MINOR
Condition	sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 80
OID	1.3.6.1.4.1.323.5.3.52.1.2.67
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.17 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-18 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	ASA Rx timeout count exceeds the major threshold limit
Summary	ASA Rx timeout count exceeds the major threshold limit
Severity	MAJOR
Condition	sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.67
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.18 ASA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-19 ASA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	ASA Rx timeout count exceeds the critical threshold limit
Summary	ASA Rx timeout count exceeds the critical threshold limit
Severity	CRITICAL
Condition	sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.67
Metric Used	occnp_diam_response_local_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.19 SCP_PEER_UNAVAILABLE

Table 8-20 SCP_PEER_UNAVAILABLE

Field	Details
Description	Configured SCP peer is unavailable.
Summary	Configured SCP peer is unavailable.
Severity	Major
Condition	occnp_oc_egressgateway_peer_health_status != 0. SCP peer [ {{$labels.peer}} ] is unavailable.
OID	1.3.6.1.4.1.323.5.3.52.1.2.60
Metric Used	occnp_oc_egressgateway_peer_health_status
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.20 SCP_PEER_SET_UNAVAILABLE

Table 8-21 SCP_PEER_SET_UNAVAILABLE

Field	Details
Description	None of the SCP peer available for configured peerset.
Summary	None of the SCP peer available for configured peerset.
Severity	Critical
Condition	One of the SCPs has been marked unhealthy.
OID	1.3.6.1.4.1.323.5.3.52.1.2.61
Metric Used	oc_egressgateway_peer_count and oc_egressgateway_peer_available_count
Recommended Actions	NF clears the critical alarm when at least one SCP peer in a peerset becomes available, even if all other SCP peers in the given peerset are still unavailable. For any additional guidance, contact My Oracle Support.

8.3.1.21 STALE_CONFIGURATION

Table 8-22 STALE_CONFIGURATION

Field	Details
Description	In last 10 minutes, the current service config_level does not match the config_level from the config-server.
Summary	In last 10 minutes, the current service config_level does not match the config_level from the config-server.
Severity	Major
Condition	(sum by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) != (sum by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"}))
OID	1.3.6.1.4.1.323.5.3.52.1.2.62
Metric Used	topic_version
Recommended Actions	For any additional guidance, contact My Oracle Support.
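
The condition above divides a sum by a count on each side, which is the average `config.level` version per namespace. It therefore compares the average version reported by config-server with the average reported by all other services. A shorter, equivalent form using `avg` is sketched below. The `for: 10m` clause is an assumption inferred from "In last 10 minutes" in the description, and the label layout is hypothetical.

```yaml
- alert: STALE_CONFIGURATION
  # avg by(namespace) is equivalent to sum by(namespace) / count by(namespace).
  expr: |
    avg by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})
    !=
    avg by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"})
  for: 10m   # assumed from the alert description
  labels:
    severity: major
```

If every service in a namespace has caught up with config-server, the two averages are equal and the expression returns no result.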

8.3.1.22 POLICY_SERVICES_DOWN

Table 8-23 POLICY_SERVICES_DOWN

Field	Details
Name in Alert Yaml File	PCF_SERVICES_DOWN
Description	{{$labels.service}} service is not running.
Summary	{{$labels.service}} service is not running.
Severity	Critical
Condition	None of the pods of the CNC Policy application are available.
OID	1.3.6.1.4.1.323.5.3.36.1.2.1
Metric Used	appinfo_service_running{vendor="Oracle", application="occnp", category!=""}!= 1
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.23 DIAM_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-24 DIAM_TRAFFIC_RATE_ABOVE_THRESHOLD

Field	Details
Name in Alert Yaml File	DiamTrafficRateAboveThreshold
Description	Diameter Connector Ingress traffic Rate is above threshold of Max MPS (current value is: {{ $value }})
Summary	Traffic Rate is above 90 Percent of Max requests per second.
Severity	Major
Condition	The total Ingress traffic rate for Diameter connector has crossed the configured threshold of 900 TPS. By default, this alert in the Common_Alertrules.yaml file triggers when the Diameter Connector Ingress Rate crosses 90% of the maximum ingress requests per second.
OID	1.3.6.1.4.1.323.5.3.36.1.2.6
Metric Used	ocpm_ingress_request_total
Recommended Actions	The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the `Common_Alertrules.yaml` file. It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic: Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes. Check the Ingress Gateway logs on Kibana to determine the reason for the errors. For any additional guidance, contact My Oracle Support.

8.3.1.24 DIAM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-25 DIAM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field	Details
Name in Alert Yaml File	DiamIngressErrorRateAbove10Percent
Description	Transaction Error Rate detected above 10 Percent of Total on Diameter Connector (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 10 Percent of Total Transactions.
Severity	Critical
Condition	The number of failed transactions is above 10 percent of the total transactions on Diameter Connector.
OID	1.3.6.1.4.1.323.5.3.36.1.2.7
Metric Used	ocpm_ingress_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the service-specific errors. For instance: `ocpm_ingress_response_total{servicename_3gpp="rx",response_code!~"2.*"}` The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.1.25 DIAM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Table 8-26 DIAM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Field	Details
Name in Alert Yaml File	DiamEgressErrorRateAbove1Percent
Description	Egress Transaction Error Rate detected above 1 Percent of Total on Diameter Connector (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 1 Percent of Total Transactions
Severity	Minor
Condition	The number of failed transactions is above 1 percent of the total Egress Gateway transactions on Diameter Connector.
OID	1.3.6.1.4.1.323.5.3.36.1.2.8
Metric Used	ocpm_egress_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the errors. For instance: `ocpm_egress_response_total{servicename_3gpp="rx",response_code!~"2.*"}` The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.1.26 UDR_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-27 UDR_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Field	Details
Description	User service Ingress traffic Rate from UDR is above threshold of Max MPS (current value is: {{ $value }})
Summary	Traffic Rate is above 90 Percent of Max requests per second
Severity	Major
Condition	The total User Service Ingress traffic rate from UDR has crossed the configured threshold of 900 TPS. By default, this alert in the Common_Alertrules.yaml file triggers when the user service Ingress Rate from UDR crosses 90% of the maximum ingress requests per second.
OID	1.3.6.1.4.1.323.5.3.36.1.2.9
Metric Used	ocpm_userservice_inbound_count_total{service_resource="udr-service"}
Recommended Actions	The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the `Common_Alertrules.yaml` file. It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic: Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes. Check the Ingress Gateway logs on Kibana to determine the reason for the errors. For any additional guidance, contact My Oracle Support.

8.3.1.27 UDR_EGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-28 UDR_EGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field	Details
Description	Egress Transaction Error Rate detected above 10 Percent of Total on User service (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 10 Percent of Total Transactions
Severity	Critical
Condition	The number of failed transactions from UDR is more than 10 percent of the total transactions.
OID	1.3.6.1.4.1.323.5.3.36.1.2.10
Metric Used	ocpm_udr_tracking_response_total{servicename_3gpp="nudr-dr",response_code!~"2.*"}
Recommended Actions	The alert gets cleared when the number of failed transactions falls below the configured threshold. Note: Threshold levels can be configured using the `Common_Alertrules.yaml` file. It is recommended to assess the reason for failed transactions. Perform the following steps to analyze the cause of the failed transactions: Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes. Check the Egress Gateway logs on Kibana to determine the reason for the errors. For any additional guidance, contact My Oracle Support.

8.3.1.28 POLICYDS_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-29 POLICYDS_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Field	Details
Description	Ingress Traffic Rate is above threshold of Max MPS (current value is: {{ $value }})
Summary	Traffic Rate is above 90 Percent of Max requests per second
Severity	Critical
Condition	The total PolicyDS Ingress message rate has crossed the configured threshold of 900 TPS. By default, this alert in the Common_Alertrules.yaml file triggers when the PolicyDS Ingress Rate crosses 90% of the maximum ingress requests per second.
OID	1.3.6.1.4.1.323.5.3.36.1.2.13
Metric Used	client_request_total Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.
Recommended Actions	The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the `Common_Alertrules.yaml` file. It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic: Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes. Check the Ingress Gateway logs on Kibana to determine the reason for the errors. For any additional guidance, contact My Oracle Support.

8.3.1.29 POLICYDS_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-30 POLICYDS_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field	Details
Description	Ingress Transaction Error Rate detected above 10 Percent of Total on PolicyDS service (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 10 Percent of Total Transactions
Severity	Critical
Condition	The number of failed transactions is above 10 percent of the total transactions for PolicyDS service.
OID	1.3.6.1.4.1.323.5.3.36.1.2.14
Metric Used	client_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the service-specific errors. For instance: `client_response_total{response!~"2.*"}` The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.1.30 POLICYDS_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Table 8-31 POLICYDS_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Field	Details
Description	Egress Transaction Error Rate detected above 1 Percent of Total on PolicyDS service (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 1 Percent of Total Transactions
Severity	Minor
Condition	The number of failed transactions is above 1 percent of the total transactions for PolicyDS service.
OID	1.3.6.1.4.1.323.5.3.36.1.2.15
Metric Used	server_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the service-specific errors. For instance: `server_response_total{response!~"2.*"}` The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.1.31 UDR_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD

Table 8-32 UDR_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD

Field	Details
Description	Ingress Timeout Error Rate detected above 10 Percent of Total towards UDR service (current value is: {{ $value }})
Summary	Timeout Error Rate detected above 10 Percent of Total Transactions
Severity	Major
Condition	The number of failed transactions due to timeout is above 10 percent of the total transactions for UDR service.
OID	1.3.6.1.4.1.323.5.3.36.1.2.16
Metric Used	ocpm_udr_tracking_request_timeout_total{servicename_3gpp="nudr-dr"}
Recommended Actions	The alert gets cleared when the number of failed transactions due to timeout is below 10% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the service-specific errors. For instance: `ocpm_udr_tracking_request_timeout_total{servicename_3gpp="nudr-dr"}` The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.1.32 DB_TIER_DOWN_ALERT

Table 8-33 DB_TIER_DOWN_ALERT

Field	Details
Description	DB cannot be reachable.
Summary	DB cannot be reachable.
Severity	Critical
Condition	Database is not available.
OID	1.3.6.1.4.1.323.5.3.36.1.2.18
Metric Used	appinfo_category_running{category="database"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.33 CPUUsagePerServiceAboveMinorThreshold

Table 8-34 CPUUsagePerServiceAboveMinorThreshold

Field	Details
Description	CPU usage for {{$labels.service}} service is above 60
Summary	CPU usage for {{$labels.service}} service is above 60
Severity	Minor
Condition	A service pod has reached the configured minor threshold (60%) of its CPU usage limits.
OID	1.3.6.1.4.1.323.5.3.36.1.2.19
Metric Used	container_cpu_usage_seconds_total Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.
Recommended Actions	The alert gets cleared when the CPU utilization falls below the minor threshold or crosses the major threshold, in which case CPUUsagePerServiceAboveMajorThreshold alert shall be raised. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. For any additional guidance, contact My Oracle Support.

8.3.1.34 CPUUsagePerServiceAboveMajorThreshold

Table 8-35 CPUUsagePerServiceAboveMajorThreshold

Field	Details
Description	CPU usage for {{$labels.service}} service is above 80
Summary	CPU usage for {{$labels.service}} service is above 80
Severity	Major
Condition	A service pod has reached the configured major threshold (80%) of its CPU usage limits.
OID	1.3.6.1.4.1.323.5.3.36.1.2.20
Metric Used	container_cpu_usage_seconds_total Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.
Recommended Actions	The alert gets cleared when the CPU utilization falls below the major threshold or crosses the critical threshold, in which case CPUUsagePerServiceAboveCriticalThreshold alert shall be raised. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. For any additional guidance, contact My Oracle Support.

8.3.1.35 CPUUsagePerServiceAboveCriticalThreshold

Table 8-36 CPUUsagePerServiceAboveCriticalThreshold

Field	Details
Description	CPU usage for {{$labels.service}} service is above 90
Summary	CPU usage for {{$labels.service}} service is above 90
Severity	Critical
Condition	A service pod has reached the configured critical threshold (90%) of its CPU usage limits.
OID	1.3.6.1.4.1.323.5.3.36.1.2.21
Metric Used	container_cpu_usage_seconds_total Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.
Recommended Actions	The alert gets cleared when the CPU utilization falls below the critical threshold. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. For any additional guidance, contact My Oracle Support.
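
The three CPU alerts above form a ladder in which only the highest matching alert is expected to be active at a time. The following Python sketch illustrates that ladder using the default thresholds from the tables (60, 80, and 90). The function name `cpu_alert_for` is hypothetical, and treating each threshold as a strict "greater than" comparison is an assumption; the actual comparison is defined by the expressions in the alert rules file.

```python
from typing import Optional

# Default thresholds from the tables above, as a percentage of the CPU limit.
MINOR, MAJOR, CRITICAL = 60.0, 80.0, 90.0


def cpu_alert_for(usage_percent: float) -> Optional[str]:
    """Return the CPU alert expected to be active for a usage value, or None.

    Assumption: a threshold is crossed when usage is strictly above it.
    """
    if usage_percent > CRITICAL:
        return "CPUUsagePerServiceAboveCriticalThreshold"
    if usage_percent > MAJOR:
        return "CPUUsagePerServiceAboveMajorThreshold"
    if usage_percent > MINOR:
        return "CPUUsagePerServiceAboveMinorThreshold"
    return None


if __name__ == "__main__":
    for sample in (55.0, 65.0, 85.0, 95.0):
        print(sample, cpu_alert_for(sample))
```

If the thresholds are edited in the alert rules file, the constants in this sketch would need to change to match.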

8.3.1.36 MemoryUsagePerServiceAboveMinorThreshold

Table 8-37 MemoryUsagePerServiceAboveMinorThreshold

Field	Details
Description	Memory usage for {{$labels.service}} service is above 60
Summary	Memory usage for {{$labels.service}} service is above 60
Severity	Minor
Condition	A service pod has reached the configured minor threshold (60%) of its memory usage limits.
OID	1.3.6.1.4.1.323.5.3.36.1.2.22
Metric Used	container_memory_usage_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.
Recommended Actions	The alert gets cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case MemoryUsagePerServiceAboveMajorThreshold alert shall be raised. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. For any additional guidance, contact My Oracle Support.

8.3.1.37 MemoryUsagePerServiceAboveMajorThreshold

Table 8-38 MemoryUsagePerServiceAboveMajorThreshold

Field	Details
Description	Memory usage for {{$labels.service}} service is above 80
Summary	Memory usage for {{$labels.service}} service is above 80
Severity	Major
Condition	A service pod has reached the configured major threshold (80%) of its memory usage limits.
OID	1.3.6.1.4.1.323.5.3.36.1.2.23
Metric Used	container_memory_usage_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.
Recommended Actions	The alert gets cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case MemoryUsagePerServiceAboveCriticalThreshold alert shall be raised. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. For any additional guidance, contact My Oracle Support.

8.3.1.38 MemoryUsagePerServiceAboveCriticalThreshold

Table 8-39 MemoryUsagePerServiceAboveCriticalThreshold

Field	Details
Description	Memory usage for {{$labels.service}} service is above 90
Summary	Memory usage for {{$labels.service}} service is above 90
Severity	Critical
Condition	A service pod has reached the configured critical threshold (90%) of its memory usage limits.
OID	1.3.6.1.4.1.323.5.3.36.1.2.24
Metric Used	container_memory_usage_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system.
Recommended Actions	The alert gets cleared when the memory utilization falls below the critical threshold. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. For any additional guidance, contact My Oracle Support.

8.3.1.39 POD_CONGESTED

Table 8-40 POD_CONGESTED

Field	Details
Description	Pod Congestion status of {{$labels.service}} service is congested
Summary	Pod Congestion status of {{$labels.service}} service is congested
Severity	Critical
Condition	The pod congestion status is set to congested.
OID	1.3.6.1.4.1.323.5.3.36.1.2.26
Metric Used	occnp_pod_congestion_state
Recommended Actions	The alert gets cleared when the system is back to normal state. For any additional guidance, contact My Oracle Support.

8.3.1.40 POD_DANGER_OF_CONGESTION

Table 8-41 POD_DANGER_OF_CONGESTION

Field	Details
Description	Pod Congestion status of {{$labels.service}} service is DoC
Summary	Pod Congestion status of {{$labels.service}} service is DoC
Severity	Major
Condition	The pod congestion status is set to Danger of Congestion.
OID	1.3.6.1.4.1.323.5.3.36.1.2.25
Metric Used	occnp_pod_congestion_state
Recommended Actions	The alert gets cleared when the system is back to normal state. For any additional guidance, contact My Oracle Support.

8.3.1.41 POD_PENDING_REQUEST_CONGESTED

Table 8-42 POD_PENDING_REQUEST_CONGESTED

Field	Details
Description	Pod Resource Congestion status of {{$labels.service}} service is congested for PendingRequest type.
Summary	Pod Resource Congestion status of {{$labels.service}} service is congested for PendingRequest type.
Severity	Critical
Condition	The pod congestion status is set to congested for PendingRequest.
OID	1.3.6.1.4.1.323.5.3.36.1.2.28
Metric Used	occnp_pod_resource_congestion_state{type="queue"}
Recommended Actions	The alert gets cleared when the number of pending requests in the queue falls below the configured threshold value. For any additional guidance, contact My Oracle Support.

8.3.1.42 POD_PENDING_REQUEST_DANGER_OF_CONGESTION

Table 8-43 POD_PENDING_REQUEST_DANGER_OF_CONGESTION

Field	Details
Description	Pod Resource Congestion status of {{$labels.service}} service is DoC for PendingRequest type.
Summary	Pod Resource Congestion status of {{$labels.service}} service is DoC for PendingRequest type.
Severity	Major
Condition	The pod congestion status is set to DoC for pending requests.
OID	1.3.6.1.4.1.323.5.3.36.1.2.27
Metric Used	occnp_pod_resource_congestion_state{type="queue"}
Recommended Actions	The alert gets cleared when the number of pending requests in the queue falls below the configured threshold value. For any additional guidance, contact My Oracle Support.

8.3.1.43 POD_CPU_CONGESTED

Table 8-44 POD_CPU_CONGESTED

Field	Details
Description	Pod Resource Congestion status of {{$labels.service}} service is congested for CPU type.
Summary	Pod Resource Congestion status of {{$labels.service}} service is congested for CPU type.
Severity	Critical
Condition	The pod congestion status is set to congested for CPU.
OID	1.3.6.1.4.1.323.5.3.36.1.2.30
Metric Used	occnp_pod_resource_congestion_state{type="cpu"}
Recommended Actions	The alert gets cleared when the system CPU usage comes below the configured threshold value. For any additional guidance, contact My Oracle Support.

8.3.1.44 POD_CPU_DANGER_OF_CONGESTION

Table 8-45 POD_CPU_DANGER_OF_CONGESTION

Field	Details
Description	Pod Resource Congestion status of {{$labels.service}} service is DoC for CPU type.
Summary	Pod Resource Congestion status of {{$labels.service}} service is DoC for CPU type.
Severity	Major
Condition	The pod congestion status is set to DoC for CPU.
OID	1.3.6.1.4.1.323.5.3.36.1.2.29
Metric Used	occnp_pod_resource_congestion_state{type="cpu"}
Recommended Actions	The alert gets cleared when the system CPU usage comes below the configured threshold value. For any additional guidance, contact My Oracle Support.

8.3.1.45 SERVICE_OVERLOADED

Table 8-46 SERVICE_OVERLOADED

Field	Details
Description	Overload Level of {{$labels.service}} service is L1
Summary	Overload Level of {{$labels.service}} service is L1
Severity	Minor
Condition	The overload level of the service is L1.
OID	1.3.6.1.4.1.323.5.3.36.1.2.40
Metric Used	load_level
Recommended Actions	The alert gets cleared when the system is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-47 SERVICE_OVERLOADED

Field	Details
Description	Overload Level of {{$labels.service}} service is L2
Summary	Overload Level of {{$labels.service}} service is L2
Severity	Major
Condition	The overload level of the service is L2.
OID	1.3.6.1.4.1.323.5.3.36.1.2.40
Metric Used	load_level
Recommended Actions	The alert gets cleared when the system is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-48 SERVICE_OVERLOADED

Field	Details
Description	Overload Level of {{$labels.service}} service is L3
Summary	Overload Level of {{$labels.service}} service is L3
Severity	Critical
Condition	The overload level of the service is L3.
OID	1.3.6.1.4.1.323.5.3.36.1.2.40
Metric Used	load_level
Recommended Actions	The alert gets cleared when the system is back to normal state. For any additional guidance, contact My Oracle Support.
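
The three SERVICE_OVERLOADED tables above differ only in overload level and severity. As a minimal sketch, the pairing can be expressed as a lookup. The function name and the use of the strings "L1", "L2", and "L3" as keys are assumptions made for illustration; how the `load_level` metric encodes the level is not stated here.

```python
from typing import Optional

# Severity for each overload level, taken from the three tables above.
SEVERITY_BY_OVERLOAD_LEVEL = {
    "L1": "Minor",
    "L2": "Major",
    "L3": "Critical",
}


def service_overloaded_severity(level: str) -> Optional[str]:
    """Return the SERVICE_OVERLOADED severity for a level, or None if unknown."""
    return SEVERITY_BY_OVERLOAD_LEVEL.get(level)


if __name__ == "__main__":
    for level in ("L1", "L2", "L3", "Normal"):
        print(level, service_overloaded_severity(level))
```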

8.3.1.46 SERVICE_RESOURCE_OVERLOADED

Alerts when service is in overload state due to memory usage

Table 8-49 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L1 for {{$labels.type}} type
Summary	{{$labels.service}} service is L1 for {{$labels.type}} type
Severity	Minor
Condition	The overload level of the service is L1 due to memory usage.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="memory"}
Recommended Actions	The alert gets cleared when the memory usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-50 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L2 for {{$labels.type}} type
Summary	{{$labels.service}} service is L2 for {{$labels.type}} type
Severity	Major
Condition	The overload level of the service is L2 due to memory usage.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="memory"}
Recommended Actions	The alert gets cleared when the memory usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-51 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L3 for {{$labels.type}} type.
Summary	{{$labels.service}} service is L3 for {{$labels.type}} type
Severity	Critical
Condition	The overload level of the service is L3 due to memory usage.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="memory"}
Recommended Actions	The alert gets cleared when the memory usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to CPU usage

Table 8-52 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L1 for {{$labels.type}} type
Summary	{{$labels.service}} service is L1 for {{$labels.type}} type
Severity	Minor
Condition	The overload level of the service is L1 due to CPU usage.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="cpu"}
Recommended Actions	The alert gets cleared when the CPU usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-53 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L2 for {{$labels.type}} type
Summary	{{$labels.service}} service is L2 for {{$labels.type}} type
Severity	Major
Condition	The overload level of the service is L2 due to CPU usage.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="cpu"}
Recommended Actions	The alert gets cleared when the CPU usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-54 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L3 for {{$labels.type}} type
Summary	{{$labels.service}} service is L3 for {{$labels.type}} type
Severity	Critical
Condition	The overload level of the service is L3 due to CPU usage.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="cpu"}
Recommended Actions	The alert gets cleared when the CPU usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to number of pending messages

Table 8-55 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L1 for {{$labels.type}} type
Summary	{{$labels.service}} service is L1 for {{$labels.type}} type
Severity	Minor
Condition	The overload level of the service is L1 due to number of pending messages.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="svc_pending_count"}
Recommended Actions	The alert gets cleared when the number of pending messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-56 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L2 for {{$labels.type}} type
Summary	{{$labels.service}} service is L2 for {{$labels.type}} type
Severity	Major
Condition	The overload level of the service is L2 due to number of pending messages.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="svc_pending_count"}
Recommended Actions	The alert gets cleared when the number of pending messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-57 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L3 for {{$labels.type}} type
Summary	{{$labels.service}} service is L3 for {{$labels.type}} type
Severity	Critical
Condition	The overload level of the service is L3 due to number of pending messages.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="svc_pending_count"}
Recommended Actions	The alert gets cleared when the number of pending messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to number of failed requests

Table 8-58 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L1 for {{$labels.type}} type.
Summary	{{$labels.service}} service is L1 for {{$labels.type}} type.
Severity	Minor
Condition	The overload level of the service is L1 due to number of failed requests.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="svc_failure_count"}
Recommended Actions	The alert gets cleared when the number of failed messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-59 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L2 for {{$labels.type}} type.
Summary	{{$labels.service}} service is L2 for {{$labels.type}} type.
Severity	Major
Condition	The overload level of the service is L2 due to number of failed requests.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="svc_failure_count"}
Recommended Actions	The alert gets cleared when the number of failed messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 8-60 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.service}} service is L3 for {{$labels.type}} type.
Summary	{{$labels.service}} service is L3 for {{$labels.type}} type.
Severity	Critical
Condition	The overload level of the service is L3 due to number of failed requests.
OID	1.3.6.1.4.1.323.5.3.36.1.2.41
Metric Used	service_resource_overload_level{type="svc_failure_count"}
Recommended Actions	The alert gets cleared when the number of failed messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

8.3.1.47 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_CRITICAL_THRESHOLD

Table 8-61 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	Notification Transaction Error exceeds the critical threshold limit for a given Subscriber Notification server
Summary	Transaction Error exceeds the critical threshold limit for a given Subscriber Notification server
Severity	Critical
Condition	The number of error responses for a given subscriber notification server exceeds the critical threshold of 1000.
OID	1.3.6.1.4.1.323.5.3.36.1.2.42
Metric Used	http_notification_response_total{responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

Table 8-62 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	Notification Transaction Error exceeds the major threshold limit for a given Subscriber Notification server
Summary	Transaction Error exceeds the major threshold limit for a given Subscriber Notification server
Severity	Major
Condition	The number of error responses for a given subscriber notification server exceeds the major threshold value, that is, between 750 and 1000.
OID	1.3.6.1.4.1.323.5.3.36.1.2.42
Metric Used	http_notification_response_total{responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

Table 8-63 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_MINOR_THRESHOLD

Field	Details
Description	Notification Transaction Error exceeds the minor threshold limit for a given Subscriber Notification server
Summary	Transaction Error exceeds the minor threshold limit for a given Subscriber Notification server
Severity	Minor
Condition	The number of error responses for a given subscriber notification server exceeds the minor threshold value, that is, between 500 and 750.
OID	1.3.6.1.4.1.323.5.3.36.1.2.42
Metric Used	http_notification_response_total{responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.
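
The three subscriber notification alerts above partition the error count into bands. The sketch below shows one reading of those bands in Python. The function name is hypothetical, and the handling of the exact boundary values (500, 750, and 1000) is an assumption, because the tables describe the ranges only as "between" two values.

```python
from typing import Optional


def subscriber_notification_severity(error_count: int) -> Optional[str]:
    """Map a non-2xx notification response count to a severity band.

    Assumption: each band excludes its lower bound and includes its upper
    bound, so 750 is Minor and 1000 is Major.
    """
    if error_count > 1000:
        return "Critical"
    if error_count > 750:
        return "Major"
    if error_count > 500:
        return "Minor"
    return None


if __name__ == "__main__":
    for count in (400, 600, 800, 1200):
        print(count, subscriber_notification_severity(count))
```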

8.3.1.48 SYSTEM_IMPAIRMENT_MAJOR

Table 8-64 SYSTEM_IMPAIRMENT_MAJOR

Field	Details
Description	Major impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Summary	Major impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Severity	Major
Condition	Major Impairment alert
OID	1.3.6.1.4.1.323.5.3.36.1.2.43
Metric Used	db_tier_replication_status
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.49 SYSTEM_IMPAIRMENT_CRITICAL

Table 8-65 SYSTEM_IMPAIRMENT_CRITICAL

Field	Details
Description	Critical Impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Summary	Critical Impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Severity	Critical
Condition	Critical Impairment alert
OID	1.3.6.1.4.1.323.5.3.36.1.2.43
Metric Used	db_tier_replication_status
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.50 SYSTEM_OPERATIONAL_STATE_NORMAL

Table 8-66 SYSTEM_OPERATIONAL_STATE_NORMAL

Field	Details
Description	System Operational State is now in normal state
Summary	System Operational State is now in normal state
Severity	Info
Condition	System Operational State is now in normal state
OID	1.3.6.1.4.1.323.5.3.36.1.2.44
Metric Used	system_operational_state == 1
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.51 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN

Table 8-67 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN

Field	Details
Description	System Operational State is now in partial shutdown state.
Summary	System Operational State is now in partial shutdown state.
Severity	Info
Condition	System Operational State is now in partial shutdown state
OID	1.3.6.1.4.1.323.5.3.36.1.2.44
Metric Used	system_operational_state == 2
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.52 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN

Table 8-68 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN

Field	Details
Description	System Operational State is now in complete shutdown state
Summary	System Operational State is now in complete shutdown state
Severity	Info
Condition	System Operational State is now in complete shutdown state
OID	1.3.6.1.4.1.323.5.3.36.1.2.44
Metric Used	system_operational_state == 3
Recommended Actions	For any additional guidance, contact My Oracle Support.
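
The three operational state alerts above are distinguished only by the value of the `system_operational_state` metric. A minimal Python sketch of that mapping follows; the function name is hypothetical, and any value other than 1, 2, or 3 is treated as unknown because no other values are listed in the tables.

```python
from typing import Optional

# Metric value to alert name, from the "Metric Used" rows above.
ALERT_BY_OPERATIONAL_STATE = {
    1: "SYSTEM_OPERATIONAL_STATE_NORMAL",
    2: "SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN",
    3: "SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN",
}


def operational_state_alert(value: int) -> Optional[str]:
    """Return the alert that corresponds to a system_operational_state value."""
    return ALERT_BY_OPERATIONAL_STATE.get(value)


if __name__ == "__main__":
    for value in (1, 2, 3):
        print(value, operational_state_alert(value))
```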

8.3.1.53 TDFConnectionDown

Table 8-69 TDFConnectionDown

Field	Details
Description	TDF connection is down.
Summary	TDF connection is down.
Severity	Critical
Condition	occnp_diam_conn_app_network{applicationName="Sd"} == 0
OID	1.3.6.1.4.1.323.5.3.52.1.2.48
Metric Used	occnp_diam_conn_app_network
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.54 DiamConnPeerDown

Table 8-70 DiamConnPeerDown

Field	Details
Description	Diameter connection to peer is down.
Summary	Diameter connection to peer is down.
Severity	Major
Condition	Diameter connection to peer is down.
OID	1.3.6.1.4.1.323.5.3.52.1.2.50
Metric Used	occnp_diam_conn_network
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.55 DiamConnNetworkDown

Table 8-71 DiamConnNetworkDown

Field	Details
Description	All the diameter network connections are down.
Summary	All the diameter network connections are down.
Severity	Critical
Condition	sum by (kubernetes_namespace)(occnp_diam_conn_network) == 0
OID	1.3.6.1.4.1.323.5.3.52.1.2.51
Metric Used	occnp_diam_conn_network
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.56 DiamConnBackendDown

Table 8-72 DiamConnBackendDown

Field	Details
Description	All the diameter backend connections are down.
Summary	All the diameter backend connections are down.
Severity	Critical
Condition	sum by (kubernetes_namespace)(occnp_diam_conn_backend) == 0
OID	1.3.6.1.4.1.323.5.3.52.1.2.52
Metric Used	occnp_diam_conn_backend
Recommended Actions	For any additional guidance, contact My Oracle Support.
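
The DiamConnNetworkDown and DiamConnBackendDown conditions both use the pattern `sum by (kubernetes_namespace)(...) == 0`, which fires only when every connection gauge in a namespace is zero. The Python sketch below mimics that aggregation on a list of (namespace, value) samples. The function name and the sample data are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def namespaces_with_all_connections_down(
    samples: Iterable[Tuple[str, float]],
) -> List[str]:
    """Return namespaces whose summed connection gauge equals zero."""
    totals: Dict[str, float] = defaultdict(float)
    for namespace, value in samples:
        totals[namespace] += value
    return sorted(ns for ns, total in totals.items() if total == 0)


if __name__ == "__main__":
    # Hypothetical samples: two pods per namespace.
    samples = [("ns-a", 0), ("ns-a", 0), ("ns-b", 1), ("ns-b", 0)]
    print(namespaces_with_all_connections_down(samples))
```

A namespace with at least one non-zero sample is not reported, which is why a single peer going down is covered by the separate DiamConnPeerDown alert.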

8.3.1.57 PerfInfoActiveOverloadThresholdFetchFailed

Table 8-73 PerfInfoActiveOverloadThresholdFetchFailed

Field	Details
Description	The application fails to get the current active overload level threshold data.
Summary	The application fails to get the current active overload level threshold data.
Severity	Major
Condition	active_overload_threshold_fetch_failed == 1
OID	1.3.6.1.4.1.323.5.3.52.1.2.53
Metric Used	active_overload_threshold_fetch_failed
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.58 SLASYFailCountExceedsCritcalThreshold

Table 8-74 SLASYFailCountExceedsCritcalThreshold

Field	Details
Description	SLA Sy fail count exceeds the critical threshold limit
Summary	SLA Sy fail count exceeds the critical threshold limit
Severity	Critical
Condition	sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.58
Metric Used	occnp_diam_response_local_total
Recommended Actions	Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.59 SLASYFailCountExceedsMajorThreshold

Table 8-75 SLASYFailCountExceedsMajorThreshold

Field	Details
Description	SLA Sy fail count exceeds the major threshold limit
Summary	SLA Sy fail count exceeds the major threshold limit
Severity	Major
Condition	sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 <= 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.58
Metric Used	occnp_diam_response_local_total
Recommended Actions	Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.60 SLASYFailCountExceedsMinorThreshold

Table 8-76 SLASYFailCountExceedsMinorThreshold

Field	Details
Description	SLA Sy fail count exceeds the minor threshold limit
Summary	SLA Sy fail count exceeds the minor threshold limit
Severity	Minor
Condition	sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 <= 80
OID	1.3.6.1.4.1.323.5.3.52.1.2.58
Metric Used	occnp_diam_response_local_total
Recommended Actions	Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support.
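
The three SLA Sy conditions above compute the same failure percentage and differ only in the band they test: above 90, above 80 up to 90, and above 60 up to 80. The Python sketch below shows that arithmetic. The function name is hypothetical, the inputs stand in for the two `sum(rate(...[5m]))` terms, and returning no severity when there is no traffic is an assumption about how an empty ratio should be treated.

```python
from typing import Optional


def sla_sy_severity(failed_rate: float, total_rate: float) -> Optional[str]:
    """Classify the SLA failure percentage into the bands used above."""
    if total_rate <= 0:
        # Assumption: with no SLA responses there is nothing to alert on.
        return None
    percent = 100.0 * failed_rate / total_rate
    if percent > 90:
        return "Critical"
    if percent > 80:
        return "Major"
    if percent > 60:
        return "Minor"
    return None


if __name__ == "__main__":
    print(sla_sy_severity(95.0, 100.0))
    print(sla_sy_severity(90.0, 100.0))
    print(sla_sy_severity(50.0, 100.0))
```

Because the upper bound of each band is inclusive (`<= 90`, `<= 80`), a failure percentage of exactly 90 falls in the major band rather than the critical one.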

8.3.1.61 STASYFailCountExceedsCritcalThreshold

Table 8-77 STASYFailCountExceedsCritcalThreshold

Field	Details
Description	STA Sy fail count exceeds the critical threshold limit.
Summary	STA Sy fail count exceeds the critical threshold limit.
Severity	Critical
Condition	The failure rate of Sy STA responses is more than 90% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.59
Metric Used	occnp_diam_response_local_total
Recommended Actions	Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.62 STA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-78 STA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	STA Sy fail count exceeds the major threshold limit.
Summary	STA Sy fail count exceeds the major threshold limit.
Severity	Major
Condition	The failure rate of Sy STA responses is more than 80% and less than or equal to 90% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 <= 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.59
Metric Used	occnp_diam_response_local_total
Recommended Actions	Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.63 STASYFailCountExceedsMinorThreshold

Table 8-79 STASYFailCountExceedsMinorThreshold

Field	Details
Description	STA Sy fail count exceeds the minor threshold limit.
Summary	STA Sy fail count exceeds the minor threshold limit.
Severity	Minor
Condition	The failure rate of Sy STA responses is more than 60% and less than or equal to 80% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 <= 80
OID	1.3.6.1.4.1.323.5.3.52.1.2.59
Metric Used	occnp_diam_response_local_total
Recommended Actions	Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.64 SMSC_CONNECTION_DOWN

Table 8-80 SMSC_CONNECTION_DOWN

Field	Details
Description	This alert is triggered when connection to SMSC host is down.
Summary	Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}}
Severity	Major
Condition	sum by(namespace, pod, smscName)(occnp_active_smsc_conn_count) == 0
OID	1.3.6.1.4.1.323.5.3.52.1.2.63
Metric Used	occnp_active_smsc_conn_count
Recommended Actions	Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.65 STA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-81 STA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	STA Rx fail count exceeds the critical threshold limit.
Summary	STA Rx fail count exceeds the critical threshold limit.
Severity	Critical
Condition	The failure rate of Rx STA responses is more than 90% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.64
Metric Used	occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}
Recommended Actions	Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. Check that the session and user hasn't been removed in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.66 STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-82 STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	STA Rx fail count exceeds the major threshold limit.
Summary	STA Rx fail count exceeds the major threshold limit.
Severity	Major
Condition	The failure rate of Rx STA responses is more than 80% and less than or equal to 90% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.64
Metric Used	occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}
Recommended Actions	Check the connectivity between diam-gw pod(s) and AF and ensure connectivity is present. Check that the session and user are valid and haven't been removed in the Policy database, then configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.67 STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-83 STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field	Details
Description	STA Rx fail count exceeds the minor threshold limit.
Summary	STA Rx fail count exceeds the minor threshold limit.
Severity	Minor
Condition	The failure rate of Rx STA responses is more than 60% and less than or equal to 80% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 80
OID	1.3.6.1.4.1.323.5.3.52.1.2.64
Metric Used	occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}
Recommended Actions	Check the connectivity between diam-gw pod(s) & AF and ensure connectivity is present. Check that the session and user is valid and hasn't been removed in the Policy database, then configure the user(s). For any additional guidance, contact My Oracle Support.
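
The major and minor bands for STA Rx failures can be sketched as a PrometheusRule resource. This is only an illustration and not the packaged alert file: the `metadata.name`, the group name, and the lowercase `severity` label values are assumptions. The expressions use the `2.*` regular expression from the Metric Used rows and multiply the ratio by 100 so that it can be compared against the percentage thresholds.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: example-sta-rx-alerting-rules   # hypothetical; must be unique across applied files
spec:
  groups:
    - name: ExampleStaRxAlerts          # hypothetical group name
      rules:
        # Major: failure percentage greater than 80 and at most 90
        - alert: STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
          expr: |
            sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m]))
              / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 80
            and
            sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m]))
              / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 90
          labels:
            severity: major             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.64"
          annotations:
            summary: 'STA Rx fail count exceeds the major threshold limit.'
        # Minor: failure percentage greater than 60 and at most 80
        - alert: STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
          expr: |
            sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m]))
              / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 60
            and
            sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m]))
              / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 80
          labels:
            severity: minor             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.64"
          annotations:
            summary: 'STA Rx fail count exceeds the minor threshold limit.'
```

Because each band has both a lower and an upper bound, at most one of the two rules matches for a given failure percentage.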

8.3.1.68 SNA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-84 SNA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	SNA Sy fail count exceeds the critical threshold limit
Summary	SNA Sy fail count exceeds the critical threshold limit
Severity	Critical
Condition	The failure rate of Sy SNA responses is more than 90% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.65
Metric Used	occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}
Recommended Actions	Check the connectivity between the diam-gw pod(s) and the OCS server, and ensure that connectivity is present. Check that the session and user have not been removed from the OCS configuration. If they have been removed, configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.69 SNA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 8-85 SNA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	SNA Sy fail count exceeds the major threshold limit
Summary	SNA Sy fail count exceeds the major threshold limit
Severity	Major
Condition	The failure rate of Sy SNA responses is more than 80% and less than or equal to 90% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 <= 90
OID	1.3.6.1.4.1.323.5.3.52.1.2.65
Metric Used	occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}
Recommended Actions	Check the connectivity between the diam-gw pod(s) and the OCS server, and ensure that connectivity is present. Check that the session and user have not been removed from the OCS configuration. If they have been removed, configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.70 SNA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 8-86 SNA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field	Details
Description	SNA Sy fail count exceeds the minor threshold limit
Summary	SNA Sy fail count exceeds the minor threshold limit
Severity	Minor
Condition	The failure rate of Sy SNA responses is more than 60% and less than or equal to 80% of the total responses.
Expression	sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 <= 80
OID	1.3.6.1.4.1.323.5.3.52.1.2.65
Metric Used	occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}
Recommended Actions	Check the connectivity between the diam-gw pod(s) and the OCS server, and ensure that connectivity is present. Check that the session and user have not been removed from the OCS configuration. If they have been removed, configure the user(s). For any additional guidance, contact My Oracle Support.

8.3.1.71 STALE_DIAMETER_REQUEST_CLEANUP_MINOR

Table 8-87 STALE_DIAMETER_REQUEST_CLEANUP_MINOR

Field	Details
Description	This alert is triggered when more than 10% of the received Diameter requests are cancelled because they are stale (received too late, or took too long to process).
Summary
Severity	Minor
Expression
OID
Metric Used	ocpm_stale_diam_request_cleanup_total occnp_diam_request_local_total
Recommended Actions

8.3.1.72 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR

Table 8-88 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR

Field	Details
Description	This alert is triggered when more than 20% of the received Diameter requests are cancelled because they are stale (received too late, or took too long to process).
Summary
Severity	Major
Expression
OID
Metric Used	ocpm_late_arrival_rejection_total occnp_diam_request_local_total
Recommended Actions

8.3.1.73 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL

Table 8-89 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL

Field	Details
Description	This alert is triggered when more than 30% of the received Diameter requests are cancelled because they are stale (received too late, or took too long to process).
Summary
Severity	Critical
Expression
OID
Metric Used	ocpm_late_arrival_rejection_total occnp_diam_request_local_total
Recommended Actions

8.3.1.74 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Table 8-90 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Field	Details
Description	Certificate expiry in less than 6 months.
Summary	Certificate expiry in less than 6 months.
Severity	Minor
Condition	dgw_tls_cert_expiration_seconds - time() <= 15724800
OID	1.3.6.1.4.1.323.5.3.52.1.2.75
Metric Used	dgw_tls_cert_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.75 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Table 8-91 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Field	Details
Description	Certificate expiry in less than 3 months.
Summary	Certificate expiry in less than 3 months.
Severity	Major
Condition	dgw_tls_cert_expiration_seconds - time() <= 7862400
OID	1.3.6.1.4.1.323.5.3.52.1.2.75
Metric Used	dgw_tls_cert_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.76 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Table 8-92 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Field	Details
Description	Certificate expiry in less than 1 month.
Summary	Certificate expiry in less than 1 month.
Severity	Critical
Condition	dgw_tls_cert_expiration_seconds - time() <= 2592000
OID	1.3.6.1.4.1.323.5.3.52.1.2.75
Metric Used	dgw_tls_cert_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).
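
The three Diameter Gateway certificate expiry conditions differ only in the number of seconds remaining before expiry. The following sketch lists them side by side as PrometheusRule entries, with each threshold converted to days in a comment. It is an illustration only: the `metadata.name`, the group name, and the lowercase `severity` label values are assumptions and are not taken from the packaged alert files.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: example-dgw-cert-expiry-rules   # hypothetical; must be unique across applied files
spec:
  groups:
    - name: ExampleDgwCertExpiry        # hypothetical group name
      rules:
        - alert: DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
          # 15724800 s / 86400 s per day = 182 days (about 6 months)
          expr: dgw_tls_cert_expiration_seconds - time() <= 15724800
          labels:
            severity: minor             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.75"
        - alert: DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
          # 7862400 s / 86400 s per day = 91 days (about 3 months)
          expr: dgw_tls_cert_expiration_seconds - time() <= 7862400
          labels:
            severity: major             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.75"
        - alert: DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
          # 2592000 s / 86400 s per day = 30 days (about 1 month)
          expr: dgw_tls_cert_expiration_seconds - time() <= 2592000
          labels:
            severity: critical          # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.75"
```

Each condition has only an upper bound, so a certificate that is within 30 days of expiry satisfies all three expressions at the same time.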

8.3.1.77 DGW_TLS_CONNECTION_FAILURE

Table 8-93 DGW_TLS_CONNECTION_FAILURE

Field	Details
Description	Alert for TLS connection establishment.
Summary	TLS Connection failure when Diam gateway is an initiator.
Severity	Major
Condition	sum by (namespace,reason)(occnp_diam_failed_conn_network) > 0
OID	1.3.6.1.4.1.323.5.3.52.1.2.81
Metric Used	occnp_diam_failed_conn_network
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.78 POLICY_CONNECTION_FAILURE

Table 8-94 POLICY_CONNECTION_FAILURE

Field	Details
Description	Connection failure on Egress and Ingress Gateways for incoming and outgoing connections.
Summary
Severity	Major
Condition	This alert is raised when the TLS certificate is about to expire in three months.
OID	1.3.6.1.4.1.323.5.3.52.1.2.43
Metric Used	occnp_oc_ingressgateway_connection_failure_total
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.79 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Table 8-95 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Field	Details
Description	TLS certificate to expire in 1 month.
Summary	security_cert_x509_expiration_seconds - time() <= 2592000
Severity	Critical
Condition	This alert is raised when the TLS certificate is about to expire in one month.
OID	1.3.6.1.4.1.323.5.3.52.1.2.44
Metric Used	security_cert_x509_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.80 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Table 8-96 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Field	Details
Description	TLS certificate to expire in 3 months.
Summary	security_cert_x509_expiration_seconds - time() <= 7862400
Severity	Major
Condition	This alert is raised when the TLS certificate is about to expire in three months.
OID	1.3.6.1.4.1.323.5.3.52.1.2.44
Metric Used	security_cert_x509_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.81 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Table 8-97 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Field	Details
Description	TLS certificate to expire in 6 months.
Summary	security_cert_x509_expiration_seconds - time() <= 15724800
Severity	Minor
Condition	This alert is raised when the TLS certificate is about to expire in six months.
OID	1.3.6.1.4.1.323.5.3.52.1.2.44
Metric Used	security_cert_x509_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

8.3.1.82 AUDIT_NOT_RUNNING

Table 8-98 AUDIT_NOT_RUNNING

Field	Details
Description	Audit has not been running for at least 1 hour.
Summary	Audit has not been running for at least 1 hour.
Severity	CRITICAL
Condition	(absent_over_time(spring_data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h]) == 1) OR (sum(increase(spring_data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h])) == 0)
OID	1.3.6.1.4.1.323.5.3.52.1.2.78
Metric Used	spring_data_repository_invocations_seconds_count
Recommended Actions	For any additional guidance, contact My Oracle Support.
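
The AUDIT_NOT_RUNNING condition combines two checks, which the following sketch separates onto their own lines. The first branch matches when no sample of the metric exists in the last hour; the second matches when the series exists but its counter did not increase in that hour. The `metadata.name`, the group name, and the lowercase `severity` label value are assumptions made for this illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: example-audit-alerting-rules    # hypothetical; must be unique across applied files
spec:
  groups:
    - name: ExampleAuditAlerts          # hypothetical group name
      rules:
        - alert: AUDIT_NOT_RUNNING
          expr: |
            (absent_over_time(spring_data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h]) == 1)
            OR
            (sum(increase(spring_data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h])) == 0)
          labels:
            severity: critical          # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.78"
          annotations:
            summary: 'Audit has not been running for at least 1 hour.'
```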

8.3.1.83 DIAMETER_POD_ERROR_RESPONSE_MINOR

Table 8-99 DIAMETER_POD_ERROR_RESPONSE_MINOR

Field	Details
Description	At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.
Summary	At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.
Severity	MINOR
Condition	(topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=1
OID	1.3.6.1.4.1.323.5.3.52.1.2.79
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.84 DIAMETER_POD_ERROR_RESPONSE_MAJOR

Table 8-100 DIAMETER_POD_ERROR_RESPONSE_MAJOR

Field	Details
Description	At least 5% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.
Summary	At least 5% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.
Severity	MAJOR
Condition	(topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=5
OID	1.3.6.1.4.1.323.5.3.52.1.2.79
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.1.85 DIAMETER_POD_ERROR_RESPONSE_CRITICAL

Table 8-101 DIAMETER_POD_ERROR_RESPONSE_CRITICAL

Field	Details
Description	At least 10% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER
Summary	At least 10% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER
Severity	CRITICAL
Condition	(topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=10
OID	1.3.6.1.4.1.323.5.3.52.1.2.79
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support.
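
The DIAMETER_POD_ERROR_RESPONSE conditions compute, for each pod, the percentage of responses with response code 3002 and then keep only the pod with the highest percentage by using `topk(1, ...)`. The following sketch shows the minor rule; the major and critical rules differ only in the final threshold (5 and 10). The `metadata.name`, the group name, and the lowercase `severity` label value are assumptions made for this illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: example-diam-pod-error-rules    # hypothetical; must be unique across applied files
spec:
  groups:
    - name: ExampleDiamPodErrorAlerts   # hypothetical group name
      rules:
        - alert: DIAMETER_POD_ERROR_RESPONSE_MINOR
          # Per-pod share of responseCode 3002, highest pod only, compared with 1 percent
          expr: |
            (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))
              / (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >= 1
          labels:
            severity: minor             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.79"
          annotations:
            summary: 'At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER.'
```

The three thresholds have no upper bound, so a pod at 10% or more satisfies the minor, major, and critical conditions together.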

8.3.1.86 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD

Table 8-102 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	The number of lock requests that fail to acquire the lock exceeds the critical threshold limit (current value is: {{ $value }}).
Summary	Keys used in Bulwark lock request which are already in locked state detected above 75 Percent of Total Transactions.
Severity	Critical
Expression	(sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=75
OID	1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used
Recommended Actions

8.3.1.87 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD

Table 8-103 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	The number of lock requests that fail to acquire the lock exceeds the major threshold limit (current value is: {{ $value }}).
Summary	Keys used in Bulwark lock request which are already in locked state detected above 50 Percent of Total Transactions.
Severity	Major
Expression	(sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 50 < 75
OID	1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used
Recommended Actions

8.3.1.88 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD

Table 8-104 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD

Field	Details
Description	The number of lock requests that fail to acquire the lock exceeds the minor threshold limit (current value is: {{ $value }}).
Summary	Keys used in Bulwark lock request which are already in locked state detected above 20 Percent of Total Transactions.
Severity	Minor
Expression	(sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=20 < 50
OID	1.3.6.1.4.1.323.5.3.52.1.2.69
Metric Used
Recommended Actions
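
The three LOCK_ACQUISITION expressions share the same failure percentage and differ only in the comparison applied to it. The following sketch places the critical, major, and minor bands next to each other. It is an illustration only: the `metadata.name`, the group name, and the lowercase `severity` label values are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: example-lock-acquisition-rules  # hypothetical; must be unique across applied files
spec:
  groups:
    - name: ExampleLockAcquisition      # hypothetical group name
      rules:
        # Critical: 75 percent or more of acquireLock requests fail
        - alert: LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
          expr: |
            (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m]))
              / sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 75
          labels:
            severity: critical          # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.69"
        # Major: at least 50 percent and below 75 percent
        - alert: LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
          expr: |
            (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m]))
              / sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 50 < 75
          labels:
            severity: major             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.69"
        # Minor: at least 20 percent and below 50 percent
        - alert: LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
          expr: |
            (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m]))
              / sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 20 < 50
          labels:
            severity: minor             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.69"
```

The chained comparisons (`>= 50 < 75`) act as successive filters, so the bands do not overlap and only one severity matches for a given namespace.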

8.3.1.89 CERTIFICATE_EXPIRY_MINOR

Table 8-105 CERTIFICATE_EXPIRY_MINOR

Field	Details
Description	Certificate expiry in less than 6 months
Summary	Certificate expiry in less than 6 months
Severity	MINOR
Condition	security_cert_x509_expiration_seconds - time() <= 15724800
OID	1.3.6.1.4.1.323.5.3.52.1.2.77
Metric Used	-
Recommended Actions	-

8.3.1.90 CERTIFICATE_EXPIRY_MAJOR

Table 8-106 CERTIFICATE_EXPIRY_MAJOR

Field	Details
Description	Certificate expiry in less than 3 months
Summary	Certificate expiry in less than 3 months
Severity	MAJOR
Condition	security_cert_x509_expiration_seconds - time() <= 7862400
OID	1.3.6.1.4.1.323.5.3.52.1.2.77
Metric Used	-
Recommended Actions	-

8.3.1.91 CERTIFICATE_EXPIRY_CRITICAL

Table 8-107 CERTIFICATE_EXPIRY_CRITICAL

Field	Details
Description	Certificate expiry in less than 1 month
Summary	Certificate expiry in less than 1 month
Severity	CRITICAL
Condition	security_cert_x509_expiration_seconds - time() <= 2592000
OID	1.3.6.1.4.1.323.5.3.52.1.2.77
Metric Used	-
Recommended Actions	-

8.3.1.92 PERF_INFO_ACTIVE_OVERLOADTHRESHOLD_DATA_PRESENT

Table 8-108 PERF_INFO_ACTIVE_OVERLOADTHRESHOLD_DATA_PRESENT

Field	Details
Description
Summary
Severity	MINOR
Condition	active_overload_threshold_fetch_failed == 1
OID	1.3.6.1.4.1.323.5.3.52.1.2.53
Metric Used
Recommended Actions

8.3.1.93 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Table 8-109 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Field	Details
Description	More than 10% of incoming requests towards the UDR-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Summary	More than 10% of incoming requests towards the UDR-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Severity	MINOR
Condition	(sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 10
OID	1.3.6.1.4.1.323.5.3.52.1.2.85
Metric Used	-
Recommended Actions	-

8.3.1.94 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Table 8-110 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Field	Details
Description	More than 20% of incoming requests towards the UDR-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Summary	More than 20% of incoming requests towards the UDR-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Severity	MAJOR
Condition	(sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 20
OID	1.3.6.1.4.1.323.5.3.52.1.2.85
Metric Used	-
Recommended Actions	-

8.3.1.95 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Table 8-111 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Field	Details
Description	More than 30% of incoming requests towards the UDR-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Summary	More than 30% of incoming requests towards the UDR-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Severity	CRITICAL
Condition	(sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 30
OID	1.3.6.1.4.1.323.5.3.52.1.2.85
Metric Used	-
Recommended Actions	-
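
The UDR_C_STALE_HTTP_REQUEST_CLEANUP conditions divide the stale rejections (late processing plus late arrival) by the inbound requests plus the late-arrival rejections. The following sketch lays out that ratio for the minor rule; the major and critical rules differ only in the final threshold (20 and 30). The `metadata.name`, the group name, and the lowercase `severity` label value are assumptions made for this illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: example-udr-c-stale-rules       # hypothetical; must be unique across applied files
spec:
  groups:
    - name: ExampleUdrConnectorStale    # hypothetical group name
      rules:
        - alert: UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
          # Numerator: requests rejected as stale during processing or on arrival
          # Denominator: inbound requests plus requests rejected on arrival
          expr: |
            (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m]))
              + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))
            /
            (sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m]))
              + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))
            * 100 > 10
          labels:
            severity: minor             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.85"
```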

8.3.1.96 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Table 8-112 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR

Field	Details
Description	More than 10% of incoming requests towards the CHF-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Summary	More than 10% of incoming requests towards the CHF-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Severity	MINOR
Condition	(sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 10
OID	1.3.6.1.4.1.323.5.3.52.1.2.86
Metric Used	-
Recommended Actions	-

8.3.1.97 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Table 8-113 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR

Field	Details
Description	More than 20% of incoming requests towards the CHF-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Summary	More than 20% of incoming requests towards the CHF-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Severity	MAJOR
Condition	(sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 20
OID	1.3.6.1.4.1.323.5.3.52.1.2.86
Metric Used	-
Recommended Actions	-

8.3.1.98 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Table 8-114 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL

Field	Details
Description	More than 30% of incoming requests towards the CHF-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Summary	More than 30% of incoming requests towards the CHF-connector are rejected because the requests are stale on arrival or become stale during processing by the connector.
Severity	CRITICAL
Condition	(sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 30
OID	1.3.6.1.4.1.323.5.3.52.1.2.86
Metric Used	-
Recommended Actions	-

8.3.1.99 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Table 8-115 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Field	Details
Description	This alarm is raised when OCNADD is not reachable.
Summary	'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Egress Gateway Data Director unreachable'
Severity	Major
Condition	This alarm is raised when data director is not reachable from Egress Gateway.
OID	1.3.6.1.4.1.323.5.3.37.1.2.48
Metric Used	oc_egressgateway_dd_unreachable
Recommended Actions	Alert gets cleared automatically when the connection with data director is established.

8.3.1.100 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Table 8-116 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Field	Details
Description	This alarm is raised when OCNADD is not reachable.
Summary	'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Ingress Gateway Data Director unreachable'
Severity	Major
Condition	This alarm is raised when data director is not reachable from Ingress Gateway.
OID	1.3.6.1.4.1.323.5.3.37.1.2.47
Metric Used	oc_ingressgateway_dd_unreachable
Recommended Actions	Alert gets cleared automatically when the connection with data director is established.

8.3.2 PCF Alerts

This section provides information on PCF alerts.

8.3.2.1 INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD

Table 8-117 INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD

Field	Details
Description	Ingress Error Rate above 10 Percent in {{$labels.kubernetes_name}} in {{$labels.kubernetes_namespace}}
Summary	Transaction Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Critical
Condition	The total number of failed transactions per pod is above 10 percent of the total transactions.
OID	1.3.6.1.4.1.323.5.3.36.1.2.2
Metric Used	ocpm_ingress_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the service-specific errors. The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.2.2 SM_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-118 SM_TRAFFIC_RATE_ABOVE_THRESHOLD

Field	Details
Description	SM service Ingress traffic Rate is above threshold of Max MPS (current value is: {{ $value }})
Summary	Traffic Rate is above 90 Percent of Max requests per second
Severity	Major
Condition	The total SM service Ingress traffic rate has crossed the configured threshold of 900 TPS. By default, the PCF_Alertrules.yaml file triggers this alert when the SM service Ingress rate crosses 90% of the maximum ingress requests per second.
OID	1.3.6.1.4.1.323.5.3.36.1.2.3
Metric Used	ocpm_ingress_request_total{servicename_3gpp="npcf-smpolicycontrol"}
Recommended Actions	The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic: Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes. Check the Ingress Gateway logs on Kibana to determine the reason for the errors. For any additional guidance, contact My Oracle Support.

8.3.2.3 SM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Table 8-119 SM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT

Field	Details
Description	Transaction Error Rate detected above 10 Percent of Total on SM service (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 10 Percent of Total Transactions
Severity	Critical
Condition	The number of failed transactions is above 10 percent of the total transactions.
OID	1.3.6.1.4.1.323.5.3.36.1.2.4
Metric Used	ocpm_ingress_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the service-specific errors. For instance: `ocpm_ingress_response_total{servicename_3gpp="npcf-smpolicycontrol",response_code!~"2.*"}` The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.2.4 SM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Table 8-120 SM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT

Field	Details
Description	Egress Transaction Error Rate detected above 1 Percent of Total Transactions (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 1 Percent of Total Transactions
Severity	Minor
Condition	The number of failed transactions is above 1 percent of the total transactions.
OID	1.3.6.1.4.1.323.5.3.36.1.2.5
Metric Used	ocpm_egress_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the service-specific errors. For instance: `ocpm_egress_response_total{servicename_3gpp="npcf-smpolicycontrol",response_code!~"2.*"}` The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.2.5 PCF_CHF_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Table 8-121 PCF_CHF_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD

Field	Details
Description	User service Ingress traffic Rate from CHF is above threshold of Max MPS (current value is: {{ $value }})
Summary	Traffic Rate is above 90 Percent of Max requests per second
Severity	Major
Condition	The total User Service Ingress traffic rate from CHF has crossed the configured threshold of 900 TPS. By default, the PCF_Alertrules.yaml file triggers this alert when the user service Ingress rate from CHF crosses 90% of the maximum ingress requests per second.
OID	1.3.6.1.4.1.323.5.3.36.1.2.11
Metric Used	ocpm_userservice_inbound_count_total{service_resource="chf-service"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.2.6 PcfChfEgressErrorRateAbove10Percent

Table 8-122 PcfChfEgressErrorRateAbove10Percent

Field	Details
Description	Egress Transaction Error Rate detected above 10 Percent of Total on User service (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 10 Percent of Total Transactions
Severity	Critical
Condition	The number of failed transactions from CHF is more than 10 percent of the total transactions.
OID	1.3.6.1.4.1.323.5.3.36.1.2.12
Metric Used	ocpm_chf_tracking_response_total{servicename_3gpp="nchf-spendinglimitcontrol",response_code!~"2.*"}
Recommended Actions	The alert gets cleared when the number of failed transactions falls below the configured threshold. Note: Threshold levels can be configured using the `PCF_Alertrules.yaml` file. It is recommended to assess the reason for failed transactions. Perform the following steps to analyze the cause of the failures: Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error response codes. Check the Egress Gateway logs on Kibana to determine the reason for the errors. For any additional guidance, contact My Oracle Support.

8.3.2.7 PcfChfIngressErrorAboveMajorThreshold

Table 8-123 PcfChfIngressErrorAboveMajorThreshold

Field	Details
Description	Ingress Timeout Error Rate detected above 10 Percent of Total towards CHF service (current value is: {{ $value }})
Summary	Timeout Error Rate detected above 10 Percent of Total Transactions
Severity	Major
Condition	The number of failed transactions due to timeout is above 10 percent of the total transactions for CHF service.
OID	1.3.6.1.4.1.323.5.3.36.1.2.17
Metric Used	ocpm_chf_tracking_request_timeout_total{servicename_3gpp="nchf-spendinglimitcontrol"}
Recommended Actions	The alert gets cleared when the number of failed transactions due to timeout is below 10% of the total transactions. To assess the reason for failed transactions, perform the following steps: Check the service-specific metrics to understand the service-specific errors. For instance: `ocpm_chf_tracking_request_timeout_total{servicename_3gpp="nchf-spendinglimitcontrol"}` The service-specific errors can be further filtered for errors specific to a method such as GET, PUT, POST, DELETE, and PATCH. For any additional guidance, contact My Oracle Support.

8.3.2.8 PCF_PENDING_BINDING_SITE_TAKEOVER

Table 8-124 PCF_PENDING_BINDING_SITE_TAKEOVER

Field	Details
Description	The site takeover configuration has been activated
Summary	The site takeover configuration has been activated
Severity	CRITICAL
Condition	sum by (application, container, namespace) (changes(occnp_pending_binding_site_takeover_total[2m])) > 0
OID	1.3.6.1.4.1.323.5.3.52.1.2.45
Metric Used	occnp_pending_binding_site_takeover_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.2.9 PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED

Table 8-125 PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED

Field	Details
Description	The Pending Operation table threshold has been reached.
Summary	The Pending Operation table threshold has been reached.
Severity	CRITICAL
Condition	sum by (application, container, namespace) (changes(occnp_threshold_limit_reached_total[2m])) > 0
OID	1.3.6.1.4.1.323.5.3.52.1.2.46
Metric Used	occnp_threshold_limit_reached_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.2.10 PCF_PENDING_BINDING_RECORDS_COUNT

Table 8-126 PCF_PENDING_BINDING_RECORDS_COUNT

Field	Details
Description	An attempt to internally recreate a PCF binding has been triggered by PCF
Summary	An attempt to internally recreate a PCF binding has been triggered by PCF
Severity	MINOR
Condition	sum by (application, container, namespace) (changes(occnp_pending_operation_records_count[10s])) > 0
OID	1.3.6.1.4.1.323.5.3.52.1.2.47
Metric Used	occnp_pending_operation_records_count
Recommended Actions	For any additional guidance, contact My Oracle Support.
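
The three pending binding alerts all use `changes()` over a short window, so each one matches when the tracked metric changed value at least once in that window. The following sketch groups them in one PrometheusRule resource. It is an illustration only: the `metadata.name`, the group name, and the lowercase `severity` label values are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: example-pending-binding-rules   # hypothetical; must be unique across applied files
spec:
  groups:
    - name: ExamplePendingBinding       # hypothetical group name
      rules:
        - alert: PCF_PENDING_BINDING_SITE_TAKEOVER
          # Matches when the counter changed within the last 2 minutes
          expr: sum by (application, container, namespace) (changes(occnp_pending_binding_site_takeover_total[2m])) > 0
          labels:
            severity: critical          # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.45"
        - alert: PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED
          # Matches when the counter changed within the last 2 minutes
          expr: sum by (application, container, namespace) (changes(occnp_threshold_limit_reached_total[2m])) > 0
          labels:
            severity: critical          # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.46"
        - alert: PCF_PENDING_BINDING_RECORDS_COUNT
          # Matches when the record count changed within the last 10 seconds
          expr: sum by (application, container, namespace) (changes(occnp_pending_operation_records_count[10s])) > 0
          labels:
            severity: minor             # assumed label value
            oid: "1.3.6.1.4.1.323.5.3.52.1.2.47"
```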

8.3.2.11 TDF_CONNECTION_DOWN

Table 8-127 TDF_CONNECTION_DOWN

Field	Details
Description	TDF connection is down.
Summary	TDF connection is down.
Severity	Critical
Condition	Diameter gateway raises an alert any time there is a disconnection with TDF peer node that is configured.
OID	1.3.6.1.4.1.323.5.3.52.1.2.48
Metric Used	occnp_diam_conn_app_network{applicationName="Sd"} == 0
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.2.12 AUTONOMOUS_SUBSCRIPTION_FAILURE

Table 8-128 AUTONOMOUS_SUBSCRIPTION_FAILURE

Field	Details
Description	Autonomous subscription failed for a configured Slice Load Level
Summary	Autonomous subscription failed for a configured Slice Load Level
Severity	Critical
Condition	The number of failed Autonomous Subscriptions for a configured Slice Load Level in nwdaf-agent is greater than zero.
OID	1.3.6.1.4.1.323.5.3.52.1.2.49
Metric Used	subscription_failure{requestType="autonomous"}
Recommended Actions	The alert gets cleared when the failed Autonomous Subscription is corrected. To clear the alert, perform the following steps: Delete the Slice Load Level configuration. Re-provision the Slice Load Level configuration. For any additional guidance, contact My Oracle Support.

8.3.2.13 AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT

Table 8-129 AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT

Field	Details
Description	AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})
Summary	AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})
Severity	MINOR
Condition	(sum(rate(http_out_conn_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])) / sum(rate(http_out_conn_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.52.1.2.54
Metric Used	http_out_conn_response_total
Recommended Actions	For any additional guidance, contact My Oracle Support.
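
The condition divides the rate of non-2xx responses by the rate of all responses over one day and scales the result to a percentage. A minimal sketch of how this might look as a rule entry follows. It assumes the regular-expression matchers are `.*amservice.*` and `2.*`, and the `severity` and `oid` label names are assumptions rather than values confirmed by the packaged alert files.

```yaml
# Hypothetical rule entry built from the fields of Table 8-129.
- alert: AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
  expr: |
    (
      # Numerator: rate of responses whose code does not start with 2.
      sum(rate(http_out_conn_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d]))
      /
      # Denominator: rate of all responses for the same service.
      sum(rate(http_out_conn_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))
    ) * 100 >= 1
  labels:
    severity: minor                         # label name assumed
    oid: "1.3.6.1.4.1.323.5.3.52.1.2.54"    # label name assumed
  annotations:
    summary: "AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})"
```

The `{{ $value }}` template in the summary is replaced by Prometheus with the computed percentage when the alert fires.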

8.3.2.14 AM_AR_ERROR_RATE_ABOVE_1_PERCENT

Table 8-130 AM_AR_ERROR_RATE_ABOVE_1_PERCENT

Field	Details
Description	Alternate Routing Error Rate detected above 1 Percent of Total on AM Service (current value is: {{ $value }})
Summary	Alternate Routing Error Rate detected above 1 Percent of Total on AM Service (current value is: {{ $value }})
Severity	MINOR
Condition	(sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])) / sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.52.1.2.55
Metric Used	ocpm_ar_response_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.2.15 UE_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT

Table 8-131 UE_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT

Field	Details
Description	UE Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})
Summary	UE Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }})
Severity	MINOR
Condition	(sum(rate(http_out_conn_response_total{pod=~".*ueservice.*",responseCode!~"2.*",servicename3gpp="npcf-ue-policy-control"}[1d])) / sum(rate(http_out_conn_response_total{pod=~".*ueservice.*",servicename3gpp="npcf-ue-policy-control"}[1d]))) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.52.1.2.56
Metric Used	http_out_conn_response_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.2.16 UE_AR_ERROR_RATE_ABOVE_1_PERCENT

Table 8-132 UE_AR_ERROR_RATE_ABOVE_1_PERCENT

Field	Details
Description	Alternate Routing Error Rate detected above 1 Percent of Total on UE Service (current value is: {{ $value }})
Summary	Alternate Routing Error Rate detected above 1 Percent of Total on UE Service (current value is: {{ $value }})
Severity	MINOR
Condition	(sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*ueservice.*",responseCode!~"2.*",servicename3gpp="npcf-ue-policy-control"}[1d])) / sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*ueservice.*",servicename3gpp="npcf-ue-policy-control"}[1d]))) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.52.1.2.57
Metric Used	ocpm_ar_response_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3 PCRF Alerts

This section provides information about PCRF alerts.

8.3.3.1 PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD

Table 8-133 PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	PRE fail count exceeds the critical threshold limit.
Summary	Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	PRE fail count exceeds the critical threshold limit.
OID	1.3.6.1.4.1.323.5.3.44.1.2.9
Metric Used	http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.2 PcrfDown

Table 8-134 PcrfDown

Field	Details
Description	PCRF Service is down
Summary	Alert PCRF_DOWN NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	None of the pods of the PCRF service are available.
OID	1.3.6.1.4.1.323.5.3.44.1.2.33
Metric Used	appinfo_service_running{service=~".*pcrf-core"}
Recommended Actions	For any additional guidance, contact My Oracle Support.
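
One possible way to express "none of the pods are available" with the listed metric is sketched below. It assumes that `appinfo_service_running` reports 1 while the service is running. The comparison, the `for` duration, and the label names are all assumptions and may differ from the rule in PCRF_Alertrules_cne1.9+.yaml.

```yaml
# Hypothetical rule entry; the metric selector, OID value, and summary text
# come from Table 8-134, the rest is assumed for illustration.
- alert: PcrfDown
  # Assumes the metric is 1 when the PCRF core service is running.
  expr: 'appinfo_service_running{service=~".*pcrf-core"} != 1'
  for: 1m                                   # assumed hold time
  labels:
    severity: critical                      # label name assumed
    oid: "1.3.6.1.4.1.323.5.3.44.1.2.33"    # label name assumed
  annotations:
    summary: "Alert PCRF_DOWN NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}"
```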

8.3.3.3 CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-135 CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	CCA fail count exceeds the critical threshold limit
Summary	Alert CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The failure rate of CCA messages has exceeded the configured threshold limit.
OID	1.3.6.1.4.1.323.5.3.44.1.2.13
Metric Used	occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.4 AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-136 AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	AAA fail count exceeds the critical threshold limit
Summary	Alert AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The failure rate of AAA messages has exceeded the critical threshold limit.
OID	1.3.6.1.4.1.323.5.3.36.1.2.34
Metric Used	occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.5 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-137 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	RAA Rx fail count exceeds the critical threshold limit
Summary	Alert RAA_Rx_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The failure rate of RAA Rx messages has exceeded the configured threshold limit.
OID	1.3.6.1.4.1.323.5.3.36.1.2.35
Metric Used	occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.6 RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-138 RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	RAA Gx fail count exceeds the critical threshold limit
Summary	Alert RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The failure rate of RAA Gx messages has exceeded the configured threshold limit.
OID	1.3.6.1.4.1.323.5.3.44.1.2.18
Metric Used	occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.7 ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-139 ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	ASA fail count exceeds the critical threshold limit
Summary	Alert ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The failure rate of ASA messages has exceeded the configured threshold limit.
OID	1.3.6.1.4.1.323.5.3.44.1.2.17
Metric Used	occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.8 ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-140 ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	ASA timeout count exceeds the critical threshold limit
Summary	Alert ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The timeout rate of ASA messages has exceeded the configured threshold limit.
OID	1.3.6.1.4.1.323.5.3.44.1.2.31
Metric Used	occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.9 RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-141 RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	RAA Rx timeout count exceeds the critical threshold limit
Summary	Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The timeout rate of RAA Rx messages has exceeded the configured threshold limit.
OID	1.3.6.1.4.1.323.5.3.36.1.2.36
Metric Used	occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.10 RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 8-142 RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	RAA Gx timeout count exceeds the critical threshold limit
Summary	Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The timeout rate of RAA Gx messages has exceeded the configured threshold limit.
OID	1.3.6.1.4.1.323.5.3.44.1.2.32
Metric Used	occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.11 RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Table 8-143 RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Field	Details
Description	CCA, AAA, RAA, ASA and STA error rate combined is above 10 percent
Summary	Alert RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 10% of the total responses.
OID	1.3.6.1.4.1.323.5.3.36.1.2.37
Metric Used	occnp_diam_response_local_total{ responseCode!~"2.*"}
Recommended Actions	For any additional guidance, contact My Oracle Support.
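
The condition is stated in prose, so the following is only a sketch of how a 10 percent combined error rate could be written in PromQL. The 5-minute rate window, the unfiltered denominator, and the label names are assumptions. Consult the packaged PCRF alert file for the actual expression.

```yaml
# Hypothetical rule entry; the metric selector, OID value, and 10% threshold
# come from Table 8-143, the window and labels are assumed.
- alert: RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
  expr: |
    (
      # Rate of Diameter responses with a non-2xxx result code.
      sum(rate(occnp_diam_response_local_total{responseCode!~"2.*"}[5m]))
      /
      # Rate of all Diameter responses (assumed denominator).
      sum(rate(occnp_diam_response_local_total[5m]))
    ) * 100 > 10
  labels:
    severity: critical                      # label name assumed
    oid: "1.3.6.1.4.1.323.5.3.36.1.2.37"    # label name assumed
  annotations:
    summary: "Alert RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}"
```

Under the same assumptions, adding `appType="Rx"` or `appType="Gx"` to both selectors would give the per-interface variants.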

8.3.3.12 Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Table 8-144 Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Field	Details
Description	Rx error rate combined is above 10 percent
Summary	Alert Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The failure rate of Rx responses is more than 10% of the total responses.
OID	1.3.6.1.4.1.323.5.3.36.1.2.38
Metric Used	occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"}
Recommended Actions	For any additional guidance, contact My Oracle Support.

8.3.3.13 Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Table 8-145 Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT

Field	Details
Description	Gx error rate combined is above 10 percent
Summary	Alert Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }}
Severity	Critical
Condition	The failure rate of Gx responses is more than 10% of the total responses.
OID	1.3.6.1.4.1.323.5.3.36.1.2.39
Metric Used	occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"}
Recommended Actions	For any additional guidance, contact My Oracle Support.