8 Alerts
This section provides information on Policy alerts and their configuration.
Note:
The performance and capacity of the system can vary based on the call model and configuration, including but not limited to the deployed policies and corresponding data, for example, policy tables.
You can configure alerts in Prometheus and the Alertrules.yaml file.
The following table describes the various severity types of alerts generated by Policy:
Table 8-1 Alerts Levels or Severity Types
Alerts Levels / Severity Types | Definition |
---|---|
Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires immediate response to address the situation and prevent serious consequences. Raised for conditions that can affect the service of Policy. |
Major | Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that can affect the service of Policy. |
Minor | Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that can affect the service of Policy. |
Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of Policy. |
8.1 Configuring Alerts
This section describes how to configure alerts in Policy. To trigger alerts, the Alert Manager uses the Prometheus measurement values reported by the microservices, evaluated against the conditions defined in the alert rules.
Note:
- Sample alert files are packaged with Policy Custom Templates. The Policy Custom Templates.zip file can be downloaded from MOS. Unzip the folder to access the following files:
  - Common_Alertrules_cne1.9+.yaml
  - PCF_Alertrules_cne1.9+.yaml
  - PCRF_Alertrules_cne1.9+.yaml
- The name in the metadata section must be unique when applying more than one file. For example:
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      creationTimestamp: null
      labels:
        role: cnc-alerting-rules
      name: occnp-pcf-alerting-rules
- If required, edit the threshold values of various alerts in the alert files before configuring the alerts.
- The Alert Manager and Prometheus tools should run in CNE namespace, for example, occne-infra.
- Use the following table to select the appropriate files on the basis of deployment mode and CNE version.
Table 8-2 Alert Configuration
Deployment Mode | CNE 1.9+ |
---|---|
Converged Mode | Common_Alertrules_cne1.9+.yaml, PCF_Alertrules_cne1.9+.yaml, PCRF_Alertrules_cne1.9+.yaml |
PCF only | Common_Alertrules_cne1.9+.yaml, PCF_Alertrules_cne1.9+.yaml |
PCRF only | Common_Alertrules_cne1.9+.yaml, PCRF_Alertrules_cne1.9+.yaml |
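As a minimal sketch of the unique-name requirement in the note above, the metadata sections of two alert files applied to the same namespace might differ as follows. The name occnp-common-alerting-rules is a hypothetical value chosen for illustration and is not taken from the packaged files; occnp-pcf-alerting-rules and the role label come from the example in the note. The spec section of each file is omitted.

```yaml
# Fragment 1: metadata for the common alert file.
# The name below is an assumption for illustration only.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: occnp-common-alerting-rules
# spec omitted
---
# Fragment 2: metadata for the PCF alert file, as in the example in the note.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: cnc-alerting-rules
  name: occnp-pcf-alerting-rules
# spec omitted
```

If two files share the same name in the same namespace, the second kubectl apply replaces the rules created by the first instead of adding to them.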
Configuring Alerts in Prometheus for CNE 1.9.0 and later versions
- Copy the required file to the Bastion Host.
- To create or replace the PrometheusRule CRD, run the following commands:
  $ kubectl apply -f Common_Alertrules_cne1.9+.yaml -n <namespace>
  $ kubectl apply -f PCF_Alertrules_cne1.9+.yaml -n <namespace>
  $ kubectl apply -f PCRF_Alertrules_cne1.9+.yaml -n <namespace>
  Note:
  These are sample commands for the Converged mode of deployment.
- To verify that the CRD is created, run the following command:
  $ kubectl get prometheusrule -n <namespace>
  Example:
  $ kubectl get prometheusrule -n occnp
- Verify the alerts in the Prometheus GUI. To do so, select the Alerts tab, and view alert details by selecting any individual rule from the list.
Validating Alerts
- Open the Prometheus server in your browser using <IP>:<Port>.
- Navigate to Status and then Rules.
- Search for Policy. The list of Policy alerts is displayed.
If you are unable to see the alerts, verify if the alert file is correct and then try again.
Adding worker node name in metrics
- Edit the configmap occne-prometheus-server in the occne-infra namespace.
- Locate the following job:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
- Add the following in the relabel_configs:
    - action: replace
      source_labels: [__meta_kubernetes_pod_node_name]
      target_label: kubernetes_pod_node_name
8.2 Configuring SNMP Notifier
This section describes the procedure to configure SNMP Notifier.
- Run the following command to edit the deployment:
  $ kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
  Example:
  $ kubectl edit deploy occne-snmp-notifier -n occne-infra
  The SNMP deployment yaml file is displayed.
- Edit the SNMP destination in the deployment yaml file as follows:
  --snmp.destination=<destination_ip>:<destination_port>
  Example:
  --snmp.destination=10.75.203.94:162
- Save the file.
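For orientation, the edited portion of the deployment in the steps above might look like the following fragment. This is a sketch under assumptions: the container name is hypothetical, and whether the flag appears under args or command depends on how the SNMP Notifier deployment is defined in your CNE. Only the --snmp.destination value comes from the example above.

```yaml
# Fragment of the SNMP Notifier Deployment as shown by kubectl edit.
spec:
  template:
    spec:
      containers:
      - name: snmp-notifier            # hypothetical container name
        args:                          # may be "command" in some deployments
        - --snmp.destination=10.75.203.94:162
```

Leave the other entries in the list unchanged when you edit the destination.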
To view the generated traps, check the logs of the trap receiver container:
$ docker logs <trapd_container_id>
Figure 8-1 Sample output for SNMP Trap

Two MIB files are used to generate the traps. Update these files along with the alert file to fetch the traps in your environment.
toplevel.mib
This is the top-level MIB file, where the objects and their data types are defined.
policy-alarm-mib.mib
This file fetches objects from the top-level MIB file, and these objects can be selected for display.
Note:
MIB files are packaged along with CNC Policy Custom Templates. Download the file from MOS. For more information on downloading custom templates, see Oracle Communications Cloud Native Core Policy Installation and Upgrade Guide.
8.3 List of Alerts
- Common Alerts - This category of alerts is common and required for all three modes of deployment.
- PCF Alerts - This category of alerts is specific to PCF microservices and required for Converged and PCF only modes of deployment.
- PCRF Alerts - This category of alerts is specific to PCRF microservices and required for Converged and PCRF only modes of deployment.
8.3.1 Common Alerts
This section provides information about alerts that are common for PCF and PCRF.
8.3.1.1 POD_CONGESTION_L1
Table 8-3 POD_CONGESTION_L1
Field | Details |
---|---|
Name in Alert Yaml File | PodCongestionL1 |
Description | Alert when cpu of pod is in CONGESTION_L1 state. |
Summary | Alert when cpu of pod is in CONGESTION_L1 state. |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.71 |
Metric Used | occnp_pod_resource_congestion_state |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
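The fields in the table above map onto a Prometheus alerting rule roughly as sketched below. The group name is hypothetical, and carrying the OID as a label is an assumption about how the packaged file is laid out; the alert name, expression, severity, and annotation text are taken from the table. The fragment corresponds to the groups section that sits under spec in a PrometheusRule.

```yaml
# Sketch only: group name and the placement of "oid" are assumptions.
groups:
- name: example-policy-common-alerts   # hypothetical group name
  rules:
  - alert: PodCongestionL1
    expr: occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2
    labels:
      severity: critical
      oid: "1.3.6.1.4.1.323.5.3.52.1.2.71"   # assumed to be carried as a label
    annotations:
      description: 'Alert when cpu of pod is in CONGESTION_L1 state.'
      summary: 'Alert when cpu of pod is in CONGESTION_L1 state.'
```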
8.3.1.2 POD_CONGESTION_L2
Table 8-4 POD_CONGESTION_L2
Field | Details |
---|---|
Name in Alert Yaml File | PodCongestionL2 |
Description | Alert when cpu of pod is in CONGESTION_L2 state. |
Summary | Alert when cpu of pod is in CONGESTION_L2 state. |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="cpu"} == 3 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.72 |
Metric Used | occnp_pod_resource_congestion_state |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.3 POD_PENDING_REQUEST_CONGESTION_L1
Table 8-5 POD_PENDING_REQUEST_CONGESTION_L1
Field | Details |
---|---|
Name in Alert Yaml File | PodPendingRequestCongestionL1 |
Description | Alert when queue of pod is in CONGESTION_L1 state. |
Summary | Alert when queue of pod is in CONGESTION_L1 state. |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="queue",container!~"bulwark|diam-gateway"} == 2 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.73 |
Metric Used | occnp_pod_resource_congestion_state |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.4 POD_PENDING_REQUEST_CONGESTION_L2
Table 8-6 POD_PENDING_REQUEST_CONGESTION_L2
Field | Details |
---|---|
Name in Alert Yaml File | PodPendingRequestCongestionL2 |
Description | Alert when queue of pod is in CONGESTION_L2 state. |
Summary | Alert when queue of pod is in CONGESTION_L2 state. |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="queue"} == 3 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.74 |
Metric Used | occnp_pod_resource_congestion_state |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.5 POD_CPU_CONGESTION_L1
Table 8-7 POD_CPU_CONGESTION_L1
Field | Details |
---|---|
Name in Alert Yaml File | PodCPUCongestionL1 |
Description | Alert when cpu of pod is in CONGESTION_L1 state. |
Summary | Alert when cpu of pod is in CONGESTION_L1 state. |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="cpu",container!~"bulwark|diam-gateway"} == 2 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.73 |
Metric Used | occnp_pod_resource_congestion_state |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.6 POD_CPU_CONGESTION_L2
Table 8-8 POD_CPU_CONGESTION_L2
Field | Details |
---|---|
Name in Alert Yaml File | PodCPUCongestionL2 |
Description | Alert when cpu of pod is in CONGESTION_L2 state. |
Summary | Alert when cpu of pod is in CONGESTION_L2 state. |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="cpu"} == 3 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.74 |
Metric Used | occnp_pod_resource_congestion_state |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.7 Pod_Memory_DoC
Table 8-9 Pod_Memory_DoC
Field | Details |
---|---|
Description | Pod Resource Congestion status of {{$labels.service}} service is DoC for Memory type |
Summary | Pod Resource Congestion status of {{$labels.service}} service is DoC for Memory type |
Severity | Major |
Condition | occnp_pod_resource_congestion_state{type="memory"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.31 |
Metric Used | occnp_pod_resource_congestion_state |
Recommended Actions | The alert triggers based on the resource limit usage and load shedding configurations in congestion control. The CPU, memory, and queue usage can be viewed on the Grafana Dashboard. Note: Threshold levels can be configured using the PCF_Alertrules.yaml file. For any additional guidance, contact My Oracle Support. |
8.3.1.8 Pod_Memory_Congested
Table 8-10 Pod_Memory_Congested
Field | Details |
---|---|
Description | Pod Resource Congestion status of {{$labels.service}} service is congested for Memory type |
Summary | Pod Resource Congestion status of {{$labels.service}} service is congested for Memory type |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="memory"} == 2 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.32 |
Metric Used | occnp_pod_resource_congestion_state |
Recommended Actions | The alert triggers based on the resource limit usage and load shedding configurations in congestion control. The CPU, memory, and queue usage can be viewed on the Grafana Dashboard. For any additional guidance, contact My Oracle Support. |
8.3.1.9 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-11 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx fail count exceeds the critical threshold limit. |
Summary | RAA Rx fail count exceeds the critical threshold limit. |
Severity | CRITICAL |
Condition | sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.35 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.10 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-12 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx fail count exceeds the major threshold limit. |
Summary | RAA Rx fail count exceeds the major threshold limit. |
Severity | MAJOR |
Condition | sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.35 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.11 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-13 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx fail count exceeds the minor threshold limit. |
Summary | RAA Rx fail count exceeds the minor threshold limit. |
Severity | MINOR |
Condition | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 80 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.35 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
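The three RAA Rx failure alerts above share one ratio (non-2xxx RAA responses as a percentage of all RAA responses over 5 minutes) and differ only in the band they test: above 90 for critical, above 80 up to 90 for major, and above 60 up to 80 for minor. The sketch below shows how such non-overlapping bands can be written as rules. The alert names reuse the section titles and the group name is hypothetical, since the tables do not list the names used in the alert file; the expressions are the ones from the Condition rows.

```yaml
# Sketch only: alert names and group name are assumptions.
groups:
- name: example-rx-raa-alerts   # hypothetical group name
  rules:
  - alert: RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
    expr: |
      sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m]))
        / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 90
    labels:
      severity: critical
  - alert: RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
    # Upper bound keeps this rule from firing together with the critical rule.
    expr: |
      sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236", responseCode!~"2.*"}[5m]))
        / sum(rate(occnp_diam_response_local_total{msgType="RAA", appId="16777236"}[5m])) * 100 > 80
      and
      sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m]))
        / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 90
    labels:
      severity: major
  - alert: RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
    # Upper bound keeps this rule from firing together with the major rule.
    expr: |
      sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m]))
        / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 > 60
      and
      sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA",responseCode!~"2.*"}[5m]))
        / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="RAA"}[5m])) * 100 <= 80
    labels:
      severity: minor
```

When adjusting thresholds, change the lower bound of one band and the upper bound of the band below it together so that the bands stay contiguous.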
8.3.1.12 ASA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-14 ASA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | ASA Rx fail count exceeds the critical threshold limit. |
Summary | ASA Rx fail count exceeds the critical threshold limit. |
Severity | CRITICAL |
Condition | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.66 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.13 ASA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-15 ASA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | ASA Rx fail count exceeds the major threshold limit. |
Summary | ASA Rx fail count exceeds the major threshold limit. |
Severity | MAJOR |
Condition | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.66 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.14 ASA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-16 ASA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | ASA Rx fail count exceeds the minor threshold limit. |
Summary | ASA Rx fail count exceeds the minor threshold limit. |
Severity | MINOR |
Condition | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 80 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.66 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.15 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-17 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | ASA Rx timeout count exceeds the minor threshold limit |
Summary | ASA Rx timeout count exceeds the minor threshold limit |
Severity | MINOR |
Condition | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 80 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.67 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | - |
8.3.1.16 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-18 ASA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | ASA Rx timeout count exceeds the major threshold limit |
Summary | ASA Rx timeout count exceeds the major threshold limit |
Severity | MAJOR |
Condition | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 <= 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.67 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.17 ASA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-19 ASA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | ASA Rx timeout count exceeds the critical threshold limit |
Summary | ASA Rx timeout count exceeds the critical threshold limit |
Severity | CRITICAL |
Condition | sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA",responseCode="timeout"}[5m])) / sum(rate(occnp_diam_response_local_total{appId="16777236",msgType="ASA"}[5m])) * 100 > 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.67 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.18 SCP_PEER_UNAVAILABLE
Table 8-20 SCP_PEER_UNAVAILABLE
Field | Details |
---|---|
Description | Configured SCP peer is unavailable. |
Summary | Configured SCP peer is unavailable. |
Severity | Major |
Condition | occnp_oc_egressgateway_peer_health_status != 0. SCP peer [ {{$labels.peer}} ] is unavailable. |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.60 |
Metric Used | occnp_oc_egressgateway_peer_health_status |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.19 SCP_PEER_SET_UNAVAILABLE
Table 8-21 SCP_PEER_SET_UNAVAILABLE
Field | Details |
---|---|
Description | None of the SCP peers are available for the configured peerset. |
Summary | None of the SCP peers are available for the configured peerset. |
Severity | Critical |
Condition | One of the SCPs has been marked unhealthy. |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.61 |
Metric Used | oc_egressgateway_peer_count and oc_egressgateway_peer_available_count |
Recommended Actions | The NF clears the critical alarm when at least one SCP peer in a peerset becomes available, even if all other SCP peers in that peerset are still unavailable. For any additional guidance, contact My Oracle Support. |
8.3.1.20 STALE_CONFIGURATION
Table 8-22 STALE_CONFIGURATION
Field | Details |
---|---|
Description | In the last 10 minutes, the current service config_level does not match the config_level from the config-server. |
Summary | In the last 10 minutes, the current service config_level does not match the config_level from the config-server. |
Severity | Major |
Condition | (sum by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) != (sum by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"})) |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.62 |
Metric Used | topic_version |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.21 POLICY_SERVICES_DOWN
Table 8-23 POLICY_SERVICES_DOWN
Field | Details |
---|---|
Name in Alert Yaml File | PCF_SERVICES_DOWN |
Description | {{$labels.service}} service is not running. |
Summary | {{$labels.service}} service is not running. |
Severity | Critical |
Condition | None of the pods of the CNC Policy application are available. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.1 |
Metric Used | appinfo_service_running{vendor="Oracle", application="occnp", category!=""}!= 1 |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.22 DIAM_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-24 DIAM_TRAFFIC_RATE_ABOVE_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | DiamTrafficRateAboveThreshold |
Description | Diameter Connector Ingress traffic Rate is above threshold of Max MPS (current value is: {{ $value }}) |
Summary | Traffic Rate is above 90 Percent of Max requests per second. |
Severity | Major |
Condition | The total Ingress traffic rate for the Diameter connector has crossed the configured threshold of 900 TPS. The default trigger point for this alert in the Common_Alertrules.yaml file is when the Diameter Connector Ingress Rate crosses 90% of the maximum ingress requests per second. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.6 |
Metric Used | ocpm_ingress_request_total |
Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured in the alert file. It is recommended to assess the reason for the additional traffic. For any additional guidance, contact My Oracle Support. |
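To illustrate how a rate threshold of this kind is typically expressed, the sketch below compares the per-second rate of the ingress request counter against 900. It is not the expression from the packaged alert file: the 2-minute window, the absence of label filters, and the group name are all assumptions made for the example. Only the alert name, metric name, severity, and the 900 TPS value come from the table above.

```yaml
# Sketch only: window, selector, and group name are assumptions.
groups:
- name: example-diameter-traffic   # hypothetical group name
  rules:
  - alert: DiamTrafficRateAboveThreshold
    # 900 is 90% of a maximum of 1000 requests per second;
    # scale this value if your dimensioned maximum differs.
    expr: sum(rate(ocpm_ingress_request_total[2m])) > 900
    labels:
      severity: major
```

For example, with a dimensioned maximum of 2000 requests per second, a 90% trigger point would be 1800.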
8.3.1.23 DIAM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-25 DIAM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Field | Details |
---|---|
Name in Alert Yaml File | DiamIngressErrorRateAbove10Percent |
Description | Transaction Error Rate detected above 10 Percent of Total on Diameter Connector (current value is: {{ $value }}) |
Summary | Transaction Error Rate detected above 10 Percent of Total Transactions. |
Severity | Critical |
Condition | The number of failed transactions is above 10 percent of the total transactions on Diameter Connector. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.7 |
Metric Used | ocpm_ingress_response_total |
Recommended Actions | The alert gets cleared when the number of failed transactions falls below 10% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.3.1.24 DIAM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Table 8-26 DIAM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Field | Details |
---|---|
Name in Alert Yaml File | DiamEgressErrorRateAbove1Percent |
Description | Egress Transaction Error Rate detected above 1 Percent of Total on Diameter Connector (current value is: {{ $value }}) |
Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
Severity | Minor |
Condition | The number of failed transactions is above 1 percent of the total Egress Gateway transactions on Diameter Connector. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.8 |
Metric Used | ocpm_egress_response_total |
Recommended Actions | The alert gets cleared when the number of failed transactions falls below 1% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.3.1.25 UDR_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-27 UDR_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | PcfUdrIngressTrafficRateAboveThreshold |
Description | User service Ingress traffic Rate from UDR is above threshold of Max MPS (current value is: {{ $value }}) |
Summary | Traffic Rate is above 90 Percent of Max requests per second |
Severity | Major |
Condition | The total User Service Ingress traffic rate from UDR has crossed the configured threshold of 900 TPS. The default trigger point for this alert in the Common_Alertrules.yaml file is when the user service Ingress Rate from UDR crosses 90% of the maximum ingress requests per second. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.9 |
Metric Used | ocpm_userservice_inbound_count_total{service_resource="udr-service"} |
Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured in the alert file. It is recommended to assess the reason for the additional traffic. For any additional guidance, contact My Oracle Support. |
8.3.1.26 UDR_EGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-28 UDR_EGRESS_ERROR_RATE_ABOVE_10_PERCENT
Field | Details |
---|---|
Name in Alert Yaml File | PcfUdrEgressErrorRateAbove10Percent |
Description | Egress Transaction Error Rate detected above 10 Percent of Total on User service (current value is: {{ $value }}) |
Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
Severity | Critical |
Condition | The number of failed transactions from UDR is more than 10 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.10 |
Metric Used | ocpm_udr_tracking_response_total{servicename_3gpp="nudr-dr",response_code!~"2.*"} |
Recommended Actions | The alert gets cleared when the number of failed transactions falls below the configured threshold. Note: Threshold levels can be configured in the alert file. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.3.1.27 POLICYDS_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-29 POLICYDS_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | PolicyDsIngressTrafficRateAboveThreshold |
Description | Ingress Traffic Rate is above threshold of Max MPS (current value is: {{ $value }}) |
Summary | Traffic Rate is above 90 Percent of Max requests per second |
Severity | Critical |
Condition | The total PolicyDS Ingress message rate has crossed the configured threshold of 900 TPS (90% of the maximum Ingress request rate). The default trigger point for this alert in the Common_Alertrules.yaml file is when the PolicyDS Ingress Rate crosses 90% of the maximum ingress requests per second. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.13 |
Metric Used | client_request_total Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured in the alert file. It is recommended to assess the reason for the additional traffic. For any additional guidance, contact My Oracle Support. |
8.3.1.28 POLICYDS_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-30 POLICYDS_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Field | Details |
---|---|
Name in Alert Yaml File | PolicyDsIngressErrorRateAbove10Percent |
Description | Ingress Transaction Error Rate detected above 10 Percent of Total on PolicyDS service (current value is: {{ $value }}) |
Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
Severity | Critical |
Condition | The number of failed transactions is above 10 percent of the total transactions for PolicyDS service. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.14 |
Metric Used | client_response_total |
Recommended Actions | The alert gets cleared when the number of failed transactions falls below 10% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.3.1.29 POLICYDS_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Table 8-31 POLICYDS_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Field | Details |
---|---|
Name in Alert Yaml File | PolicyDsEgressErrorRateAbove1Percent |
Description | Egress Transaction Error Rate detected above 1 Percent of Total on PolicyDS service (current value is: {{ $value }}) |
Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
Severity | Minor |
Condition | The number of failed transactions is above 1 percent of the total transactions for PolicyDS service. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.15 |
Metric Used | server_response_total |
Recommended Actions | The alert gets cleared when the number of failed transactions falls below 1% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.3.1.30 UDR_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD
Table 8-32 UDR_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | PcfUdrIngressTimeoutErrorAboveMajorThreshold |
Description | Ingress Timeout Error Rate detected above 10 Percent of Total towards UDR service (current value is: {{ $value }}) |
Summary | Timeout Error Rate detected above 10 Percent of Total Transactions |
Severity | Major |
Condition | The number of failed transactions due to timeout is above 10 percent of the total transactions for UDR service. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.16 |
Metric Used | ocpm_udr_tracking_request_timeout_total{servicename_3gpp="nudr-dr"} |
Recommended Actions | The alert gets cleared when the number of transactions that failed due to timeout falls below 10% of the total transactions. It is recommended to assess the reason for the failed transactions. For any additional guidance, contact My Oracle Support. |
8.3.1.31 DB_TIER_DOWN_ALERT
Table 8-33 DB_TIER_DOWN_ALERT
Field | Details |
---|---|
Name in Alert Yaml File | DBTierDownAlert |
Description | DB is not reachable. |
Summary | DB is not reachable. |
Severity | Critical |
Condition | Database is not available. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.18 |
Metric Used | appinfo_category_running{category="database"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.32 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
Table 8-34 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | CPUUsagePerServiceAboveMinorThreshold |
Description | CPU usage for {{$labels.service}} service is above 60 |
Summary | CPU usage for {{$labels.service}} service is above 60 |
Severity | Minor |
Condition | A service pod has reached the configured minor threshold (60%) of its CPU usage limits. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.19 |
Metric Used | container_cpu_usage_seconds_total Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
Recommended Actions | The alert gets cleared when the CPU utilization falls below the minor threshold or crosses the major threshold, in which case the CPUUsagePerServiceAboveMajorThreshold alert is raised. Note: Threshold levels can be configured in the alert file. For any additional guidance, contact My Oracle Support. |
8.3.1.33 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
Table 8-35 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | CPUUsagePerServiceAboveMajorThreshold |
Description | CPU usage for {{$labels.service}} service is above 80 |
Summary | CPU usage for {{$labels.service}} service is above 80 |
Severity | Major |
Condition | A service pod has reached the configured major threshold (80%) of its CPU usage limits. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.20 |
Metric Used | container_cpu_usage_seconds_total Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
Recommended Actions | The alert gets cleared when the CPU utilization falls below the major threshold or crosses the critical threshold, in which case the CPUUsagePerServiceAboveCriticalThreshold alert is raised. Note: Threshold levels can be configured in the alert file. For any additional guidance, contact My Oracle Support. |
8.3.1.34 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
Table 8-36 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | CPUUsagePerServiceAboveCriticalThreshold |
Description | CPU usage for {{$labels.service}} service is above 90 |
Summary | CPU usage for {{$labels.service}} service is above 90 |
Severity | Critical |
Condition | A service pod has reached the configured critical threshold (90%) of its CPU usage limits. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.21 |
Metric Used | container_cpu_usage_seconds_total
Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the CPU utilization falls below the critical threshold.
Note: Threshold levels can be configured in the alert rules YAML file.
For any additional guidance, contact My Oracle Support. |
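As an illustration only, a CPU threshold alert of this kind could be written as a Prometheus rule along the following lines. This is a sketch, not the rule shipped in the Policy alert files: the group name, the `for` duration, the `oid` label key, the grouping by `namespace` and `pod`, the 2-minute rate window, and the use of the cAdvisor metrics `container_spec_cpu_quota` and `container_spec_cpu_period` to derive the CPU limit are all assumptions.

```yaml
groups:
  - name: policy-cpu-usage-sketch          # hypothetical group name
    rules:
      - alert: CPUUsagePerServiceAboveCriticalThreshold
        # CPU in use as a percentage of the CPU limit, per pod.
        # The limit (in cores) is derived as quota / period (assumption).
        expr: |
          (
            sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[2m]))
            /
            sum by (namespace, pod) (container_spec_cpu_quota{container!=""} / container_spec_cpu_period{container!=""})
          ) * 100 > 90
        for: 2m                             # assumed hold time before firing
        labels:
          severity: critical
          oid: "1.3.6.1.4.1.323.5.3.36.1.2.21"   # label key is an assumption
        annotations:
          # Uses the pod label because this sketch does not derive a service label.
          summary: "CPU usage for pod {{ $labels.pod }} is above 90"
```

Under the same assumptions, the minor and major rules would use 60 and 80 as the lower bound and add an upper bound (`<= 80` and `<= 90` respectively), which is what lets each alert clear when the next severity level is raised.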
8.3.1.35 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
Table 8-37 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | MemoryUsagePerServiceAboveMinorThreshold |
Description | Memory usage for {{$labels.service}} service is above 60 |
Summary | Memory usage for {{$labels.service}} service is above 60 |
Severity | Minor |
Condition | A service pod has reached the configured minor threshold (60%) of its memory usage limits. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.22 |
Metric Used | container_memory_usage_bytes
Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the MemoryUsagePerServiceAboveMajorThreshold alert is raised.
Note: Threshold levels can be configured in the alert rules YAML file.
For any additional guidance, contact My Oracle Support. |
8.3.1.36 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
Table 8-38 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | MemoryUsagePerServiceAboveMajorThreshold |
Description | Memory usage for {{$labels.service}} service is above 80 |
Summary | Memory usage for {{$labels.service}} service is above 80 |
Severity | Major |
Condition | A service pod has reached the configured major threshold (80%) of its memory usage limits. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.23 |
Metric Used | container_memory_usage_bytes
Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the MemoryUsagePerServiceAboveCriticalThreshold alert is raised.
Note: Threshold levels can be configured in the alert rules YAML file.
For any additional guidance, contact My Oracle Support. |
8.3.1.37 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
Table 8-39 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | MemoryUsagePerServiceAboveCriticalThreshold |
Description | Memory usage for {{$labels.service}} service is above 90 |
Summary | Memory usage for {{$labels.service}} service is above 90 |
Severity | Critical |
Condition | A service pod has reached the configured critical threshold (90%) of its memory usage limits. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.24 |
Metric Used | container_memory_usage_bytes
Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use similar metrics exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the memory utilization falls below the critical threshold.
Note: Threshold levels can be configured in the alert rules YAML file.
For any additional guidance, contact My Oracle Support. |
8.3.1.38 POD_CONGESTED
Table 8-40 POD_CONGESTED
Field | Details |
---|---|
Name in Alert Yaml File | PodCongested |
Description | The pod congestion status is set to congested. |
Summary | Pod Congestion status of {{$labels.service}} service is congested |
Severity | Critical |
Condition | occnp_pod_congestion_state == 4 |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.26 |
Metric Used | occnp_pod_congestion_state |
Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
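For reference, the condition in the table above maps directly onto a Prometheus alerting rule. The following is a minimal sketch under stated assumptions: the group name, the `for` duration, and the `oid` label key are illustrative and are not taken from the packaged alert files.

```yaml
groups:
  - name: policy-pod-congestion-sketch     # hypothetical group name
    rules:
      - alert: PodCongested
        # Fires while the pod reports congestion state 4 (congested).
        expr: occnp_pod_congestion_state == 4
        for: 1m                             # assumed hold time before firing
        labels:
          severity: critical
          oid: "1.3.6.1.4.1.323.5.3.36.1.2.26"   # label key is an assumption
        annotations:
          summary: "Pod Congestion status of {{$labels.service}} service is congested"
          description: "The pod congestion status is set to congested."
```

Because the expression is a plain comparison on a gauge, the alert resolves on the first evaluation in which the state value is no longer 4.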
8.3.1.39 POD_DANGER_OF_CONGESTION
Table 8-41 POD_DANGER_OF_CONGESTION
Field | Details |
---|---|
Description | The pod congestion status is set to Danger of Congestion. |
Summary | Pod Congestion status of {{$labels.service}} service is DoC |
Severity | Major |
Condition | occnp_pod_congestion_state == 1 |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.25 |
Metric Used | occnp_pod_congestion_state |
Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
8.3.1.40 POD_PENDING_REQUEST_CONGESTED
Table 8-42 POD_PENDING_REQUEST_CONGESTED
Field | Details |
---|---|
Name in Alert Yaml File | PodPendingRequestCongested |
Description | The pod congestion status is set to congested for PendingRequest. |
Summary | Pod Resource Congestion status of {{$labels.service}} service is congested for PendingRequest type. |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="queue"} == 4 |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.28 |
Metric Used | occnp_pod_resource_congestion_state{type="queue"} |
Recommended Actions | The alert is cleared when the number of pending requests in the queue falls below the configured threshold value.
For any additional guidance, contact My Oracle Support. |
8.3.1.41 POD_PENDING_REQUEST_DANGER_OF_CONGESTION
Table 8-43 POD_PENDING_REQUEST_DANGER_OF_CONGESTION
Field | Details |
---|---|
Description | The pod congestion status is set to DoC for pending requests. |
Summary | Pod Resource Congestion status of {{$labels.service}} service is DoC for PendingRequest type. |
Severity | Major |
Condition | occnp_pod_resource_congestion_state{type="queue"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.27 |
Metric Used | occnp_pod_resource_congestion_state{type="queue"} |
Recommended Actions | The alert is cleared when the number of pending requests in the queue falls below the configured threshold value.
For any additional guidance, contact My Oracle Support. |
8.3.1.42 POD_CPU_CONGESTED
Table 8-44 POD_CPU_CONGESTED
Field | Details |
---|---|
Name in Alert Yaml File | PodCPUCongested |
Description | The pod congestion status is set to congested for CPU. |
Summary | Pod Resource Congestion status of {{$labels.service}} service is congested for CPU type. |
Severity | Critical |
Condition | occnp_pod_resource_congestion_state{type="cpu"} == 4 |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.30 |
Metric Used | occnp_pod_resource_congestion_state{type="cpu"} |
Recommended Actions | The alert is cleared when the system CPU usage falls below the configured threshold value.
For any additional guidance, contact My Oracle Support. |
8.3.1.43 POD_CPU_DANGER_OF_CONGESTION
Table 8-45 POD_CPU_DANGER_OF_CONGESTION
Field | Details |
---|---|
Description | Pod Resource Congestion status of {{$labels.service}} service is DoC for CPU type. |
Summary | Pod Resource Congestion status of {{$labels.service}} service is DoC for CPU type. |
Severity | Major |
Condition | The pod congestion status is set to DoC for CPU. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.29 |
Metric Used | occnp_pod_resource_congestion_state{type="cpu"} |
Recommended Actions | The alert is cleared when the system CPU usage falls below the configured threshold value.
For any additional guidance, contact My Oracle Support. |
8.3.1.44 SERVICE_OVERLOADED
Table 8-46 SERVICE_OVERLOADED
Field | Details |
---|---|
Description | Overload Level of {{$labels.service}} service is L1 |
Summary | Overload Level of {{$labels.service}} service is L1 |
Severity | Minor |
Condition | The overload level of the service is L1. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.40 |
Metric Used | load_level |
Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
Table 8-47 SERVICE_OVERLOADED
Field | Details |
---|---|
Description | Overload Level of {{$labels.service}} service is L2 |
Summary | Overload Level of {{$labels.service}} service is L2 |
Severity | Major |
Condition | The overload level of the service is L2. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.40 |
Metric Used | load_level |
Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
Table 8-48 SERVICE_OVERLOADED
Field | Details |
---|---|
Description | Overload Level of {{$labels.service}} service is L3 |
Summary | Overload Level of {{$labels.service}} service is L3 |
Severity | Critical |
Condition | The overload level of the service is L3. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.40 |
Metric Used | load_level |
Recommended Actions | The alert gets cleared when the system is back to normal
state.
For any additional guidance, contact My Oracle Support. |
8.3.1.45 SERVICE_RESOURCE_OVERLOADED
Alerts when service is in overload state due to memory usage
Table 8-49 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L1 for {{$labels.type}} type |
Summary | {{$labels.service}} service is L1 for {{$labels.type}} type |
Severity | Minor |
Condition | The overload level of the service is L1 due to memory usage. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="memory"} |
Recommended Actions | The alert gets cleared when the memory usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-50 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L2 for {{$labels.type}} type |
Summary | {{$labels.service}} service is L2 for {{$labels.type}} type |
Severity | Major |
Condition | The overload level of the service is L2 due to memory usage. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="memory"} |
Recommended Actions | The alert gets cleared when the memory usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-51 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L3 for {{$labels.type}} type. |
Summary | {{$labels.service}} service is L3 for {{$labels.type}} type |
Severity | Critical |
Condition | The overload level of the service is L3 due to memory usage. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="memory"} |
Recommended Actions | The alert gets cleared when the memory usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Alerts when service is in overload state due to CPU usage
Table 8-52 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L1 for {{$labels.type}} type |
Summary | {{$labels.service}} service is L1 for {{$labels.type}} type |
Severity | Minor |
Condition | The overload level of the service is L1 due to CPU usage. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="cpu"} |
Recommended Actions | The alert gets cleared when the CPU usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-53 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L2 for {{$labels.type}} type |
Summary | {{$labels.service}} service is L2 for {{$labels.type}} type |
Severity | Major |
Condition | The overload level of the service is L2 due to CPU usage. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="cpu"} |
Recommended Actions | The alert gets cleared when the CPU usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-54 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L3 for {{$labels.type}} type |
Summary | {{$labels.service}} service is L3 for {{$labels.type}} type |
Severity | Critical |
Condition | The overload level of the service is L3 due to CPU usage. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="cpu"} |
Recommended Actions | The alert gets cleared when the CPU usage of the
service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Alerts when service is in overload state due to number of pending messages
Table 8-55 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L1 for {{$labels.type}} type |
Summary | {{$labels.service}} service is L1 for {{$labels.type}} type |
Severity | Minor |
Condition | The overload level of the service is L1 due to number of pending messages. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="svc_pending_count"} |
Recommended Actions | The alert gets cleared when the number of pending
messages of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-56 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L2 for {{$labels.type}} type |
Summary | {{$labels.service}} service is L2 for {{$labels.type}} type |
Severity | Major |
Condition | The overload level of the service is L2 due to number of pending messages. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="svc_pending_count"} |
Recommended Actions | The alert gets cleared when the number of pending
messages of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-57 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L3 for {{$labels.type}} type |
Summary | {{$labels.service}} service is L3 for {{$labels.type}} type |
Severity | Critical |
Condition | The overload level of the service is L3 due to number of pending messages. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="svc_pending_count"} |
Recommended Actions | The alert gets cleared when the number of pending
messages of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Alerts when service is in overload state due to number of failed requests
Table 8-58 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L1 for {{$labels.type}} type. |
Summary | {{$labels.service}} service is L1 for {{$labels.type}} type. |
Severity | Minor |
Condition | The overload level of the service is L1 due to number of failed requests. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="svc_failure_count"} |
Recommended Actions | The alert is cleared when the number of failed requests of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-59 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L2 for {{$labels.type}} type. |
Summary | {{$labels.service}} service is L2 for {{$labels.type}} type. |
Severity | Major |
Condition | The overload level of the service is L2 due to number of failed requests. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="svc_failure_count"} |
Recommended Actions | The alert is cleared when the number of failed requests of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
Table 8-60 SERVICE_RESOURCE_OVERLOADED
Field | Details |
---|---|
Description | {{$labels.service}} service is L3 for {{$labels.type}} type. |
Summary | {{$labels.service}} service is L3 for {{$labels.type}} type. |
Severity | Critical |
Condition | The overload level of the service is L3 due to number of failed requests. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.41 |
Metric Used | service_resource_overload_level{type="svc_failure_count"} |
Recommended Actions | The alert is cleared when the number of failed requests of the service is back to normal state.
For any additional guidance, contact My Oracle Support. |
8.3.1.46 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_CRITICAL_THRESHOLD
Table 8-61 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | Notification Transaction Error exceeds the critical threshold limit for a given Subscriber Notification server |
Summary | Transaction Error exceeds the critical threshold limit for a given Subscriber Notification server |
Severity | Critical |
Condition | The number of error responses for a given subscriber notification server exceeds the critical threshold of 1000. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.42 |
Metric Used | http_notification_response_total{responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
Table 8-62 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | Notification Transaction Error exceeds the major threshold limit for a given Subscriber Notification server |
Summary | Transaction Error exceeds the major threshold limit for a given Subscriber Notification server |
Severity | Major |
Condition | The number of error responses for a given subscriber notification server exceeds the major threshold value, that is, between 750 and 1000. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.42 |
Metric Used | http_notification_response_total{responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
Table 8-63 SUBSCRIBER_NOTIFICATION_ERROR_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | Notification Transaction Error exceeds the minor threshold limit for a given Subscriber Notification server |
Summary | Transaction Error exceeds the minor threshold limit for a given Subscriber Notification server |
Severity | Minor |
Condition | The number of error responses for a given subscriber notification server exceeds the minor threshold value, that is, between 500 and 750. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.42 |
Metric Used | http_notification_response_total{responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.47 SYSTEM_IMPAIRMENT_MAJOR
Table 8-64 SYSTEM_IMPAIRMENT_MAJOR
Field | Details |
---|---|
Description | Major impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage |
Summary | Major impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage |
Severity | Major |
Condition | Major Impairment alert |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.43 |
Metric Used | db_tier_replication_status |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.48 SYSTEM_IMPAIRMENT_CRITICAL
Table 8-65 SYSTEM_IMPAIRMENT_CRITICAL
Field | Details |
---|---|
Description | Critical Impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage |
Summary | Critical Impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage |
Severity | Critical |
Condition | Critical Impairment alert |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.43 |
Metric Used | db_tier_replication_status |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.49 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN
Table 8-66 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN
Field | Details |
---|---|
Description | System Operational State is now in partial shutdown state. |
Summary | System Operational State is now in partial shutdown state. |
Severity | Major |
Condition | System Operational State is now in partial shutdown state |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.44 |
Metric Used | system_operational_state == 2 |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.50 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN
Table 8-67 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN
Field | Details |
---|---|
Description | System Operational State is now in complete shutdown state |
Summary | System Operational State is now in complete shutdown state |
Severity | Info |
Condition | System Operational State is now in complete shutdown state |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.44 |
Metric Used | system_operational_state == 3 |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.51 TDF_CONNECTION_DOWN
Table 8-68 TDF_CONNECTION_DOWN
Field | Details |
---|---|
Description | TDF connection is down. |
Summary | TDF connection is down. |
Severity | Critical |
Condition | occnp_diam_conn_app_network{applicationName="Sd"} == 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.48 |
Metric Used | occnp_diam_conn_app_network |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.52 DIAM_CONN_PEER_DOWN
Table 8-69 DIAM_CONN_PEER_DOWN
Field | Details |
---|---|
Description | Diameter connection to peer {{ $labels.peerHost }} is down. |
Summary | Diameter connection to peer is down. |
Severity | Major |
Condition | The Diameter connection to the peer identified by peerHost in the given namespace is down. |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.50 |
Metric Used | occnp_diam_conn_network |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.53 DIAM_CONN_NETWORK_DOWN
Table 8-70 DIAM_CONN_NETWORK_DOWN
Field | Details |
---|---|
Description | All the diameter network connections are down. |
Summary | All the diameter network connections are down. |
Severity | Critical |
Condition | sum by (kubernetes_namespace)(occnp_diam_conn_network) == 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.51 |
Metric Used | occnp_diam_conn_network |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.54 DIAM_CONN_BACKEND_DOWN
Table 8-71 DIAM_CONN_BACKEND_DOWN
Field | Details |
---|---|
Description | All the diameter backend connections are down. |
Summary | All the diameter backend connections are down. |
Severity | Critical |
Condition | sum by (kubernetes_namespace)(occnp_diam_conn_backend) == 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.52 |
Metric Used | occnp_diam_conn_backend |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
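The two aggregate conditions above (network and backend connections) could be expressed as Prometheus rules roughly as follows. This is a sketch under stated assumptions: the group name, the `for` duration, and the `oid` label key are illustrative, and the alert names are placeholders copied from the section titles because the tables do not list the names used in the alert YAML file.

```yaml
groups:
  - name: policy-diameter-connectivity-sketch   # hypothetical group name
    rules:
      - alert: DIAM_CONN_NETWORK_DOWN             # placeholder alert name
        # Total number of network-side Diameter connections per namespace is zero.
        expr: sum by (kubernetes_namespace) (occnp_diam_conn_network) == 0
        for: 1m                                    # assumed hold time before firing
        labels:
          severity: critical
          oid: "1.3.6.1.4.1.323.5.3.52.1.2.51"    # label key is an assumption
        annotations:
          summary: "All the diameter network connections are down."
      - alert: DIAM_CONN_BACKEND_DOWN             # placeholder alert name
        # Total number of backend Diameter connections per namespace is zero.
        expr: sum by (kubernetes_namespace) (occnp_diam_conn_backend) == 0
        for: 1m                                    # assumed hold time before firing
        labels:
          severity: critical
          oid: "1.3.6.1.4.1.323.5.3.52.1.2.52"    # label key is an assumption
        annotations:
          summary: "All the diameter backend connections are down."
```

One property of this shape of expression is worth keeping in mind: `sum by (...)` returns no result when the metric has no series at all (for example, if the pods exposing it are not being scraped), so a rule like this stays silent in that situation rather than firing.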
8.3.1.55 PerfInfoActiveOverloadThresholdFetchFailed
Table 8-72 PerfInfoActiveOverloadThresholdFetchFailed
Field | Details |
---|---|
Description | The application fails to get the current active overload level threshold data. |
Summary | The application fails to get the current active overload level threshold data. |
Severity | Major |
Condition | active_overload_threshold_fetch_failed == 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.53 |
Metric Used | active_overload_threshold_fetch_failed |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.56 SLA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-73 SLA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | SLA Sy fail count exceeds the critical threshold limit |
Summary | SLA Sy fail count exceeds the critical threshold limit |
Severity | Critical |
Condition |
sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 90 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.58 |
Metric Used |
occnp_diam_response_local_total |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.57 SLA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-74 SLA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description |
SLA Sy fail count exceeds the major threshold limit |
Summary |
SLA Sy fail count exceeds the major threshold limit |
Severity | Major |
Condition |
sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 <= 90 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.58 |
Metric Used |
occnp_diam_response_local_total |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.58 SLA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-75 SLA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description |
SLA Sy fail count exceeds the minor threshold limit |
Summary |
SLA Sy fail count exceeds the minor threshold limit |
Severity | Minor |
Condition |
sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m])) * 100 <= 80 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.58 |
Metric Used |
occnp_diam_response_local_total |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
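The major and minor conditions above repeat the same failure-percentage ratio twice. One way to keep such rules readable is to compute the ratio once in a recording rule and compare against it. The following is a sketch of that idea only: the group name, the recording rule name `occnp:sla_sy_failure_percent:rate5m`, and the `oid` label key are hypothetical, and the alert names are placeholders copied from the section titles because the tables do not list the names used in the alert YAML file.

```yaml
groups:
  - name: policy-sla-sy-sketch                    # hypothetical group name
    rules:
      # Percentage of SLA responses with a non-2xxx result code over 5 minutes.
      - record: occnp:sla_sy_failure_percent:rate5m   # hypothetical rule name
        expr: |
          sum(rate(occnp_diam_response_local_total{msgType="SLA", responseCode!~"2.*"}[5m]))
          /
          sum(rate(occnp_diam_response_local_total{msgType="SLA"}[5m]))
          * 100
      - alert: SLA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD   # placeholder alert name
        expr: occnp:sla_sy_failure_percent:rate5m > 90
        labels:
          severity: critical
          oid: "1.3.6.1.4.1.323.5.3.52.1.2.58"    # label key is an assumption
      - alert: SLA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD      # placeholder alert name
        expr: occnp:sla_sy_failure_percent:rate5m > 80 and occnp:sla_sy_failure_percent:rate5m <= 90
        labels:
          severity: major
          oid: "1.3.6.1.4.1.323.5.3.52.1.2.58"
      - alert: SLA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD      # placeholder alert name
        expr: occnp:sla_sy_failure_percent:rate5m > 60 and occnp:sla_sy_failure_percent:rate5m <= 80
        labels:
          severity: minor
          oid: "1.3.6.1.4.1.323.5.3.52.1.2.58"
```

Rules within a group are evaluated in order, so the alerts see the value recorded in the same evaluation cycle. The upper bounds (`<= 90`, `<= 80`) mirror the documented conditions and ensure that only one of the three severities is active for a given failure percentage.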
8.3.1.59 STA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-76 STA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description |
STA Sy fail count exceeds the critical threshold limit. |
Summary |
STA Sy fail count exceeds the critical threshold limit. |
Severity | Critical |
Condition |
The failure rate of Sy STA responses is more than 90% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 90 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.59 |
Metric Used |
occnp_diam_response_local_total |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.60 STA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-77 STA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description |
STA Sy fail count exceeds the major threshold limit. |
Summary |
STA Sy fail count exceeds the major threshold limit. |
Severity | Major |
Condition |
The failure rate of Sy STA responses is more than 80% and less than or equal to 90% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 <= 90 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.59 |
Metric Used |
occnp_diam_response_local_total |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.61 STA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-78 STA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description |
STA Sy fail count exceeds the minor threshold limit. |
Summary |
STA Sy fail count exceeds the minor threshold limit. |
Severity | Minor |
Condition |
The failure rate of Sy STA responses is more than 60% and less than or equal to 80% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777302"}[5m])) * 100 <= 80 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.59 |
Metric Used |
occnp_diam_response_local_total |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.62 SMSC_CONNECTION_DOWN
Table 8-79 SMSC_CONNECTION_DOWN
Field | Details |
---|---|
Description | This alert is triggered when connection to SMSC host is down. |
Summary | Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}} |
Severity | Major |
Condition | sum by(namespace, pod, smscName)(occnp_active_smsc_conn_count) == 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.63 |
Metric Used | occnp_active_smsc_conn_count |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. If the user hasn't been added in the OCS configuration, then configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.63 STA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-80 STA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description |
STA Rx fail count exceeds the critical threshold limit. |
Summary |
STA Rx fail count exceeds the critical threshold limit. |
Severity | Critical |
Condition |
The failure rate of Rx STA responses is more than 90% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 90 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.64 |
Metric Used |
occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"} |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and OCS server and ensure connectivity is present. Check that the session and user have not been removed from the OCS configuration; if they have, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.64 STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-81 STA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description |
STA Rx fail count exceeds the major threshold limit. |
Summary |
STA Rx fail count exceeds the major threshold limit. |
Severity | Major |
Condition |
The failure rate of Rx STA responses is more than 80% and less than or equal to 90% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 90 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.64 |
Metric Used |
occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"} |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and the AF and ensure connectivity is present. Check that the session and user are valid and have not been removed from the Policy database; if they have, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.65 STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-82 STA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description |
STA Rx fail count exceeds the minor threshold limit. |
Summary |
STA Rx fail count exceeds the minor threshold limit. |
Severity | Minor |
Condition |
The failure rate of Rx STA responses is more than 60% and less than or equal to 80% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA", appId="16777236"}[5m])) * 100 <= 80 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.64 |
Metric Used |
occnp_diam_response_local_total{msgType="STA", appId="16777236", responseCode!~"2.*"} |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and the AF and ensure connectivity is present. Check that the session and user are valid and have not been removed from the Policy database; if they have, configure the user(s). For any additional guidance, contact My Oracle Support. |
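The three STA Rx alerts split the failure percentage into non-overlapping bands. The following is a minimal Python sketch of that banding, not the Prometheus rule itself. The function name `sta_rx_severity` and its arguments are hypothetical; they only mirror the thresholds in the expressions above.

```python
from typing import Optional


def sta_rx_severity(failed_rate: float, total_rate: float) -> Optional[str]:
    """Map an Rx STA failure percentage to the alert band it falls in.

    failed_rate and total_rate stand in for the two sum(rate(...[5m]))
    terms of the expressions; both names are illustrative assumptions.
    """
    if total_rate <= 0:
        # No STA responses in the window, so no band applies.
        return None
    pct = 100 * failed_rate / total_rate
    if pct > 90:
        return "critical"
    if pct > 80:  # more than 80 and at most 90
        return "major"
    if pct > 60:  # more than 60 and at most 80
        return "minor"
    return None
```

With these bounds, a failure rate of exactly 90% falls in the major band and exactly 80% falls in the minor band, which matches the `<= 90` and `<= 80` upper limits in the expressions.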
8.3.1.66 SNA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-83 SNA_SY_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description |
SNA Sy fail count exceeds the critical threshold limit |
Summary |
SNA Sy fail count exceeds the critical threshold limit |
Severity | Critical |
Condition |
The failure rate of Sy SNA responses is more than 90% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 90 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.65 |
Metric Used |
occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"} |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and the OCS server and ensure connectivity is present. Check that the session and user have not been removed from the OCS configuration; if they have, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.67 SNA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-84 SNA_SY_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description |
SNA Sy fail count exceeds the major threshold limit |
Summary |
SNA Sy fail count exceeds the major threshold limit |
Severity | Major |
Condition |
The failure rate of Sy SNA responses is more than 80% and less than or equal to 90% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 80 and sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 <= 90 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.65 |
Metric Used |
occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"} |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and the OCS server and ensure connectivity is present. Check that the session and user have not been removed from the OCS configuration; if they have, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.68 SNA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-85 SNA_SY_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description |
SNA Sy fail count exceeds the minor threshold limit |
Summary |
SNA Sy fail count exceeds the minor threshold limit |
Severity | Minor |
Condition |
The failure rate of Sy SNA responses is more than 60% and less than or equal to 80% of the total responses. |
Expression |
sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 > 60 and sum(rate(occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="SNA"}[5m])) * 100 <= 80 |
OID |
1.3.6.1.4.1.323.5.3.52.1.2.65 |
Metric Used |
occnp_diam_response_local_total{msgType="SNA", responseCode!~"2.*"} |
Recommended Actions |
Check the connectivity between diam-gw pod(s) and the OCS server and ensure connectivity is present. Check that the session and user have not been removed from the OCS configuration; if they have, configure the user(s). For any additional guidance, contact My Oracle Support. |
8.3.1.69 STALE_DIAMETER_REQUEST_CLEANUP_MINOR
Table 8-86 STALE_DIAMETER_REQUEST_CLEANUP_MINOR
Field | Details |
---|---|
Description | This alert is triggered when more than 10% of the received Diameter requests are cancelled because they are stale (received too late or took too long to process). |
Summary | The Diam requests are being discarded due to timeout processing occurring above 10%. |
Severity | Minor |
Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
Metric Used |
|
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.70 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
Table 8-87 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
Field | Details |
---|---|
Description | This alert is triggered when more than 20% of the received Diameter requests are cancelled because they are stale (received too late or took too long to process). |
Summary | The Diam requests are being discarded due to timeout processing occurring above 20%. |
Severity | Major |
Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
Metric Used |
|
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.71 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
Table 8-88 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
Field | Details |
---|---|
Description | This alert is triggered when more than 30% of the received Diameter requests are cancelled because they are stale (received too late or took too long to process). |
Summary | The Diam requests are being discarded due to timeout processing occurring above 30%. |
Severity | Critical |
Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
Metric Used |
|
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.72 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
Table 8-89 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
Field | Details |
---|---|
Description | Certificate expiry in less than 6 months. |
Summary | Certificate expiry in less than 6 months. |
Severity | Minor |
Condition | dgw_tls_cert_expiration_seconds - time() <= 15724800 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.75 |
Metric Used | dgw_tls_cert_expiration_seconds |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.73 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
Table 8-90 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
Field | Details |
---|---|
Description | Certificate expiry in less than 3 months. |
Summary | Certificate expiry in less than 3 months. |
Severity | Major |
Condition | dgw_tls_cert_expiration_seconds - time() <= 7862400 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.75 |
Metric Used | dgw_tls_cert_expiration_seconds |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.74 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
Table 8-91 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
Field | Details |
---|---|
Description | Certificate expiry in less than 1 month. |
Summary | Certificate expiry in less than 1 month. |
Severity | Critical |
Condition | dgw_tls_cert_expiration_seconds - time() <= 2592000 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.75 |
Metric Used | dgw_tls_cert_expiration_seconds |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
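The numeric limits in the three Condition rows are day counts expressed in seconds. The short Python sketch below, assuming 86,400-second days, shows the conversion and how a remaining lifetime maps to a severity. The names `THRESHOLDS` and `expiry_severity` are hypothetical and not part of Policy.

```python
SECONDS_PER_DAY = 86_400

# Limits copied from the Condition rows, most severe first.
THRESHOLDS = [
    ("critical", 2_592_000),   # 30 days
    ("major", 7_862_400),      # 91 days
    ("minor", 15_724_800),     # 182 days
]


def expiry_severity(expiration_epoch: float, now_epoch: float):
    """Return the most severe band whose limit covers the remaining lifetime."""
    remaining = expiration_epoch - now_epoch  # mirrors metric - time()
    for severity, limit in THRESHOLDS:
        if remaining <= limit:
            return severity
    return None


for severity, limit in THRESHOLDS:
    print(severity, limit // SECONDS_PER_DAY, "days")
```

The conditions as written have no lower bound, so a certificate within 30 days of expiry satisfies all three. The sketch reports only the most severe match, which is a simplification.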
8.3.1.75 DGW_TLS_CONNECTION_FAILURE
Table 8-92 DGW_TLS_CONNECTION_FAILURE
Field | Details |
---|---|
Description | Alert for TLS connection establishment. |
Summary | TLS Connection failure when Diam gateway is an initiator. |
Severity | Major |
Condition | sum by (namespace,reason)(occnp_diam_failed_conn_network) > 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.81 |
Metric Used | occnp_diam_failed_conn_network |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.76 POLICY_CONNECTION_FAILURE
Table 8-93 POLICY_CONNECTION_FAILURE
Field | Details |
---|---|
Description | Connection failure on Egress and Ingress Gateways for incoming and outgoing connections. |
Summary | |
Severity | Major |
Condition | This alert is raised when the TLS certificate is about to expire in three months. |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.43 |
Metric Used | occnp_oc_ingressgateway_connection_failure_total |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.77 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
Table 8-94 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
Field | Details |
---|---|
Description | TLS certificate to expire in 1 month. |
Summary | security_cert_x509_expiration_seconds - time() <= 2592000 |
Severity | Critical |
Condition | This alert is raised when the TLS certificate is about to expire in one month. |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.44 |
Metric Used | security_cert_x509_expiration_seconds |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.78 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
Table 8-95 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
Field | Details |
---|---|
Description | TLS certificate to expire in 3 months. |
Summary | security_cert_x509_expiration_seconds - time() <= 7862400 |
Severity | Major |
Condition | This alert is raised when the TLS certificate is about to expire in three months. |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.44 |
Metric Used | security_cert_x509_expiration_seconds |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.79 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
Table 8-96 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
Field | Details |
---|---|
Description | TLS certificate to expire in 6 months. |
Summary | security_cert_x509_expiration_seconds - time() <= 15724800 |
Severity | Minor |
Condition | This alert is raised when the TLS certificate is about to expire in six months. |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.44 |
Metric Used | security_cert_x509_expiration_seconds |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.80 AUDIT_NOT_RUNNING
Table 8-97 AUDIT_NOT_RUNNING
Field | Details |
---|---|
Description | Audit has not been running for at least 1 hour. |
Summary | Audit has not been running for at least 1 hour. |
Severity | CRITICAL |
Condition | (absent_over_time(spring_data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h]) == 1) OR (sum(increase(spring_data_repository_invocations_seconds_count{method="getQueuedTablesToAudit"}[1h])) == 0) |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.78 |
Metric Used | spring_data_repository_invocations_seconds_count |
Recommended Actions |
For any additional guidance, contact My Oracle Support. |
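The AUDIT_NOT_RUNNING condition combines two cases: the counter series is missing for the whole hour, or it exists but did not grow. A rough Python sketch of that logic follows. The function `audit_not_running` and its input format are assumptions made for illustration, and the subtraction is only a simplified stand-in for the PromQL `increase()` function.

```python
from typing import Sequence


def audit_not_running(samples: Sequence[float]) -> bool:
    """Decide whether the alert condition holds.

    samples: counter values scraped during the last hour, oldest first
    (a hypothetical input format, not a Prometheus API).
    """
    if not samples:
        # Corresponds to absent_over_time(...[1h]) == 1: no series at all.
        return True
    # Simplified stand-in for increase(...[1h]); it ignores counter resets
    # and the extrapolation that Prometheus applies.
    return samples[-1] - samples[0] == 0
```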
8.3.1.81 DIAMETER_POD_ERROR_RESPONSE_MINOR
Table 8-98 DIAMETER_POD_ERROR_RESPONSE_MINOR
Field | Details |
---|---|
Description | At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER. |
Summary | At least 1% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER. |
Severity | MINOR |
Condition | (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.79 |
Metric Used | ocbsf_diam_response_network_total |
Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.3.1.82 DIAMETER_POD_ERROR_RESPONSE_MAJOR
Table 8-99 DIAMETER_POD_ERROR_RESPONSE_MAJOR
Field | Details |
---|---|
Description | At least 5% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER. |
Summary | At least 5% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER. |
Severity | MAJOR |
Condition | (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=5 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.79 |
Metric Used | ocbsf_diam_response_network_total |
Recommended Actions |
For any additional guidance, contact My Oracle Support. |
8.3.1.83 DIAMETER_POD_ERROR_RESPONSE_CRITICAL
Table 8-100 DIAMETER_POD_ERROR_RESPONSE_CRITICAL
Field | Details |
---|---|
Description | At least 10% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER |
Summary | At least 10% of the Diam Response connection requests failed with error DIAMETER_UNABLE_TO_DELIVER |
Severity | CRITICAL |
Condition | (topk(1,((sort_desc(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m])))) * 100))) >=10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.79 |
Metric Used | ocbsf_diam_response_network_total |
Recommended Actions |
For any additional guidance, contact My Oracle Support. |
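The DIAMETER_POD_ERROR_RESPONSE conditions compute a per-pod error percentage and then keep only the highest value through `topk(1, ...)`, so the alert reflects the worst pod. The Python sketch below illustrates that selection. The function `worst_pod_error_pct` and the dictionary inputs are hypothetical stand-ins for the per-pod `rate(...[2m])` results.

```python
from typing import Dict, Optional, Tuple


def worst_pod_error_pct(
    error_rate: Dict[str, float], total_rate: Dict[str, float]
) -> Optional[Tuple[str, float]]:
    """Return (pod, percentage) for the pod with the highest 3002 share."""
    # Only pods that reported 3002 responses appear on the left-hand side,
    # so only those pods can produce a ratio. Pods with no traffic are
    # skipped here (an assumption made to avoid division by zero).
    ratios = {
        pod: 100 * error_rate[pod] / total_rate[pod]
        for pod in error_rate
        if total_rate.get(pod, 0) > 0
    }
    if not ratios:
        return None
    pod = max(ratios, key=ratios.get)
    return pod, ratios[pod]
```

The returned percentage would then be compared against 1, 5, or 10 for the minor, major, and critical alerts respectively.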
8.3.1.84 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
Table 8-101 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | lockAcquisitionExceedsCriticalThreshold |
Description | The number of lock requests that fail to acquire the lock exceeds the critical threshold limit (current value is: {{ $value }}). |
Summary | Keys used in Bulwark lock request which are already in locked state detected above 75 Percent of Total Transactions. |
Severity | Critical |
Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
Metric Used | - |
Recommended Actions | - |
8.3.1.85 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
Table 8-102 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | lockAcquisitionExceedsMajorThreshold |
Description | The number of lock requests that fail to acquire the lock exceeds the major threshold limit (current value is: {{ $value }}). |
Summary | Keys used in Bulwark lock request which are already in locked state detected above 50 Percent of Total Transactions. |
Severity | Major |
Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 50 < 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
Metric Used | - |
Recommended Actions | - |
8.3.1.86 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
Table 8-103 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | lockAcquisitionExceedsMinorThreshold |
Description | The number of lock requests that fail to acquire the lock exceeds the minor threshold limit (current value is: {{ $value }}). |
Summary | Keys used in Bulwark lock request which are already in locked state detected above 20 Percent of Total Transactions. |
Severity | Minor |
Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=20 < 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
Metric Used | - |
Recommended Actions | - |
8.3.1.87 CERTIFICATE_EXPIRY_MINOR
Table 8-104 CERTIFICATE_EXPIRY_MINOR
Field | Details |
---|---|
Description | Certificate expiry in less than 6 months |
Summary | Certificate expiry in less than 6 months |
Severity | MINOR |
Condition | security_cert_x509_expiration_seconds - time() <= 15724800 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.77 |
Metric Used | - |
Recommended Actions | - |
8.3.1.88 CERTIFICATE_EXPIRY_MAJOR
Table 8-105 CERTIFICATE_EXPIRY_MAJOR
Field | Details |
---|---|
Description | Certificate expiry in less than 3 months |
Summary | Certificate expiry in less than 3 months |
Severity | MAJOR |
Condition | security_cert_x509_expiration_seconds - time() <= 7862400 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.77 |
Metric Used | - |
Recommended Actions | - |
8.3.1.89 CERTIFICATE_EXPIRY_CRITICAL
Table 8-106 CERTIFICATE_EXPIRY_CRITICAL
Field | Details |
---|---|
Description | Certificate expiry in less than 1 month |
Summary | Certificate expiry in less than 1 month |
Severity | CRITICAL |
Condition | security_cert_x509_expiration_seconds - time() <= 2592000 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.77 |
Metric Used | - |
Recommended Actions | - |
8.3.1.90 PERF_INFO_ACTIVE_OVERLOADTHRESHOLD_DATA_PRESENT
Table 8-107 PERF_INFO_ACTIVE_OVERLOADTHRESHOLD_DATA_PRESENT
Field | Details |
---|---|
Description | - |
Summary | - |
Severity | MINOR |
Condition | active_overload_threshold_fetch_failed == 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.53 |
Metric Used | - |
Recommended Actions | - |
8.3.1.91 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-108 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Field | Details |
---|---|
Description | More than 10% of incoming requests towards UDR-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Summary | More than 10% of incoming requests towards UDR-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Severity | MINOR |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.85 |
Metric Used | - |
Recommended Actions | - |
8.3.1.92 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-109 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Field | Details |
---|---|
Description | More than 20% of incoming requests towards UDR-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Summary | More than 20% of incoming requests towards UDR-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Severity | MAJOR |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.85 |
Metric Used | - |
Recommended Actions | - |
8.3.1.93 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-110 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Field | Details |
---|---|
Description | More than 30% of incoming requests towards UDR-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Summary | More than 30% of incoming requests towards UDR-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Severity | CRITICAL |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="UDR-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="udr-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="UDR-C"}[5m]))) * 100 > 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.85 |
Metric Used | - |
Recommended Actions | - |
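The UDR-C conditions share one ratio: late-processing plus late-arrival rejections, divided by inbound requests plus late-arrival rejections. Adding the late-arrival rejections to the denominator suggests that requests rejected on arrival are not counted by the inbound counter, although that is an inference and not stated here. The Python sketch below reproduces the arithmetic; the function names and arguments are hypothetical.

```python
from typing import List


def stale_rejection_pct(
    late_processing: float, late_arrival: float, inbound: float
) -> float:
    """Percentage of requests rejected as stale, as in the Condition rows."""
    denominator = inbound + late_arrival
    if denominator == 0:
        # No traffic: treated as 0% here. PromQL would yield NaN,
        # which also fails the threshold comparison.
        return 0.0
    return 100 * (late_processing + late_arrival) / denominator


def udr_c_severities(pct: float) -> List[str]:
    """List every UDR-C alert whose condition holds for this percentage."""
    limits = (("minor", 10), ("major", 20), ("critical", 30))
    return [name for name, limit in limits if pct > limit]
```

The conditions use strict `>` comparisons with no upper bound, so a value above 30% satisfies the minor, major, and critical conditions at the same time.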
8.3.1.94 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-111 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Field | Details |
---|---|
Description | More than 10% of incoming requests towards CHF-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Summary | More than 10% of incoming requests towards CHF-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Severity | MINOR |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.86 |
Metric Used | - |
Recommended Actions | - |
8.3.1.95 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-112 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Field | Details |
---|---|
Description | More than 20% of incoming requests towards CHF-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Summary | More than 20% of incoming requests towards CHF-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Severity | MAJOR |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.86 |
Metric Used | - |
Recommended Actions | - |
8.3.1.96 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-113 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Field | Details |
---|---|
Description | More than 30% of incoming requests towards CHF-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Summary | More than 30% of incoming requests towards CHF-connector are rejected because the request is stale on arrival or becomes stale during processing by the connector |
Severity | CRITICAL |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{mode="CHF-C"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m])))/(sum by (namespace) (rate(ocpm_userservice_inbound_count_total{service_resource="chf-service"}[5m])) + sum by (namespace) (rate(occnp_late_arrival_rejection_total{mode="CHF-C"}[5m]))) * 100 > 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.86 |
Metric Used | - |
Recommended Actions | - |
8.3.1.97 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
Table 8-114 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
Field | Details |
---|---|
Description | This alarm is raised when OCNADD is not reachable. |
Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Egress Gateway Data Director unreachable' |
Severity | Major |
Condition | This alarm is raised when data director is not reachable from Egress Gateway. |
OID | 1.3.6.1.4.1.323.5.3.37.1.2.48 |
Metric Used | oc_egressgateway_dd_unreachable |
Recommended Actions | Alert gets cleared automatically when the connection with data director is established. |
8.3.1.98 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
Table 8-115 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
Field | Details |
---|---|
Description | This alarm is raised when OCNADD is not reachable. |
Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Ingress Gateway Data Director unreachable' |
Severity | Major |
Condition | This alarm is raised when data director is not reachable from Ingress Gateway. |
OID | 1.3.6.1.4.1.323.5.3.37.1.2.47 |
Metric Used | oc_ingressgateway_dd_unreachable |
Recommended Actions | Alert gets cleared automatically when the connection with data director is established. |
8.3.1.99 STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-116 STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Field | Details |
---|---|
Description | This alert is triggered when more than 30% of the received HTTP requests are cancelled because they are stale (received too late or took too long to process). |
Summary | - |
Severity | Critical |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.100 STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-117 STALE_HTTP_REQUEST_CLEANUP_MAJOR
Field | Details |
---|---|
Description | This alert is triggered when more than 20% of the received HTTP requests are cancelled because they are stale (received too late or took too long to process). |
Summary | - |
Severity | Major |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.101 STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-118 STALE_HTTP_REQUEST_CLEANUP_MINOR
Field | Details |
---|---|
Description | This alert is triggered when more than 10% of the received HTTP requests are cancelled because they are stale (received too late or took too long to process). |
Summary | - |
Severity | Minor |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.102 STALE_BINDING_REQUEST_REJECTION_CRITICAL
Table 8-119 STALE_BINDING_REQUEST_REJECTION_CRITICAL
Field | Details |
---|---|
Description | This alert is triggered when more than 30% of the received HTTP requests are cancelled because they are stale (received too late or took too long to process). |
Summary | '{{ $value }} % of requests are being discarded by binding svc due to request being stale either on arrival or during processing.' summary: "More than 30% of the Binding requests failed with error TIMED_OUT_REQUEST" |
Severity | Critical |
Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m])))/(sum by (namespace) (rate(ocpm_binding_inbound_request_total{microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.87 |
Metric Used |
|
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.103 STALE_BINDING_REQUEST_REJECTION_MAJOR
Table 8-120 STALE_BINDING_REQUEST_REJECTION_MAJOR
Field | Details |
---|---|
Description | This alert is triggered when more than 20% of the received HTTP requests are cancelled because they are stale (received too late or took too long to process). |
Summary | '{{ $value }} % of requests are being discarded by binding svc due to request being stale either on arrival or during processing.' summary: "More than 20% of the Binding requests failed with error TIMED_OUT_REQUEST" |
Severity | Major |
Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m])))/(sum by (namespace) (rate(ocpm_binding_inbound_request_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.87 |
Metric Used |
|
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.104 STALE_BINDING_REQUEST_REJECTION_MINOR
Table 8-121 STALE_BINDING_REQUEST_REJECTION_MINOR
Field | Details |
---|---|
Description | This alert is triggered when more than 10% of the received HTTP requests are cancelled because they are stale (received too late or took too long to process). |
Summary | '{{ $value }} % of requests are being discarded by binding service due to request being stale either on arrival or during processing.' summary: "More than 10% of the Binding requests failed with error TIMED_OUT_REQUEST" |
Severity | Minor |
Expression | (sum by (namespace) (rate(occnp_late_processing_rejection_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"} [5m])))/(sum by (namespace) (rate(ocpm_binding_inbound_request_total {microservice=~".*binding"}[5m]))+sum by (namespace) (rate(occnp_late_arrival_rejection_total{microservice=~".*binding"}[5m]))) * 100 >= 10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.87 |
Metric Used |
|
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.1.105 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_CRITICAL
Table 8-122 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_CRITICAL
Field | Details |
---|---|
Description | The Diameter requests are being discarded due to timeout processing occurring above 30% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
Summary | The Diameter requests are being discarded due to timeout processing occurring above 30% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
Severity | Critical |
Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.88 |
Metric Used |
|
Recommended Actions |
The alert gets cleared when the number of stale requests is below 30% of the total requests. To troubleshoot and resolve the issue, perform the following steps:
For further assistance, contact My Oracle Support. |
8.3.1.106 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MAJOR
Table 8-123 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MAJOR
Field | Details |
---|---|
Description | The Diameter requests are being discarded due to timeout processing occurring above 20% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
Summary | The Diameter requests are being discarded due to timeout processing occurring above 20% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
Severity | Major |
Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.88 |
Metric Used |
|
Recommended Actions |
The alert gets cleared when the number of stale requests is below 20% of the total requests. To troubleshoot and resolve the issue, perform the following steps:
For further assistance, contact My Oracle Support. |
8.3.1.107 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MINOR
Table 8-124 STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP_MINOR
Field | Details |
---|---|
Description | The Diameter requests are being discarded due to timeout processing occurring above 10% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
Summary | The Diameter requests are being discarded due to timeout processing occurring above 10% inside pod {{$labels.pod}} for service {{$labels.microservice}} in {{$labels.namespace}} |
Severity | Minor |
Expression | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total{microservice="diam-connector"}[5m]))) / (sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER", microservice="diam-connector"}[5m]))) * 100 >= 10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.88 |
Metric Used | occnp_stale_diam_request_cleanup_total, occnp_diam_request_local_total |
Recommended Actions |
The alert gets cleared when the number of stale requests is below 10% of the total requests. To troubleshoot and resolve the issue, perform the following steps:
For further assistance, contact My Oracle Support. |
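The STALE_DIAMETER_CONNECTOR_REQUEST_CLEANUP expressions share one ratio and differ only in the threshold. Because none of them sets an upper bound, a ratio that satisfies the 30% condition also satisfies the 20% and 10% conditions. The following Python sketch works through that arithmetic. The function name and the sample counter increases are hypothetical and are not part of Policy.

```python
# Thresholds (percent, inclusive) copied from the alert expressions above.
THRESHOLDS = {"Minor": 10, "Major": 20, "Critical": 30}


def matching_severities(stale_increase, total_increase):
    """Return every severity whose '>= threshold' condition holds.

    stale_increase: 5-minute increase of occnp_stale_diam_request_cleanup_total
    total_increase: 5-minute increase of occnp_diam_request_local_total
                    (DWR and CER messages excluded)
    """
    if total_increase == 0:
        # Simplification made by this sketch: with no requests there is
        # no meaningful ratio, so no severity is reported.
        return []
    percent = stale_increase * 100 / total_increase
    return [name for name, limit in THRESHOLDS.items() if percent >= limit]


if __name__ == "__main__":
    # Hypothetical sample: 70 stale requests out of 200 is 35%.
    print(matching_severities(70, 200))
```

In this sketch a 35% ratio matches all three conditions, which is why the list can contain more than one severity.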
8.3.1.108 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-125 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Field | Details |
---|---|
Description | At least 10% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Summary | At least 10% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Severity | Minor |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.109 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-126 UDR_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Field | Details |
---|---|
Description | At least 20% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Summary | At least 20% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Severity | Major |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.110 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-127 UDR_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Field | Details |
---|---|
Description | At least 30% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Summary | At least 30% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Severity | Critical |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.111 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Table 8-128 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MINOR
Field | Details |
---|---|
Description | At least 10% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Summary | At least 10% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Severity | Minor |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.112 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Table 8-129 CHF_C_STALE_HTTP_REQUEST_CLEANUP_MAJOR
Field | Details |
---|---|
Description | At least 20% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Summary | At least 20% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Severity | Major |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.113 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Table 8-130 CHF_C_STALE_HTTP_REQUEST_CLEANUP_CRITICAL
Field | Details |
---|---|
Description | At least 30% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Summary | At least 30% of the received HTTP requests are cancelled per operation type because they are stale (they were received too late or took too long to process). |
Severity | Critical |
Expression | - |
OID | - |
Metric Used |
|
Recommended Actions | - |
8.3.1.114 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_CRITICAL_THRESHOLD
Table 8-131 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | The number of revalidation responses indicating that the binding was missing but restored from BSF is equal to or above 70% of the total revalidation responses for the valid sessions being audited. |
Summary | The number of revalidation responses indicating that the binding was missing but restored from BSF is equal to or above 70% of the total revalidation responses for the valid sessions being audited. |
Severity | Critical |
Condition | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code="2xx",action="restored"}[5m])) / sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.89 |
Metric Used | occnp_session_binding_revalidation_response_total |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.115 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MAJOR_THRESHOLD
Table 8-132 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | The number of revalidation responses indicating that the binding was missing but restored from BSF is equal to or above 50% but less than 70% of the total revalidation responses for the valid sessions being audited. |
Summary | The number of revalidation responses indicating that the binding was missing but restored from BSF is equal to or above 50% but less than 70% of the total revalidation responses for the valid sessions being audited. |
Severity | Major |
Condition | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code="2xx",action="restored"}[5m])) / sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 50 < 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.89 |
Metric Used | occnp_session_binding_revalidation_response_total |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.116 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MINOR_THRESHOLD
Table 8-133 SESSION_BINDING_MISSING_FROM_BSF_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | The number of revalidation responses indicating that the binding was missing but restored from BSF is equal to or above 30% but less than 50% of the total revalidation responses for the valid sessions being audited. |
Summary | The number of revalidation responses indicating that the binding was missing but restored from BSF is equal to or above 30% but less than 50% of the total revalidation responses for the valid sessions being audited. |
Severity | Minor |
Condition | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx",action="restored"}[5m])) / sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding",response_code="2xx"}[5m]))) * 100 >= 30 < 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.89 |
Metric Used | occnp_session_binding_revalidation_response_total |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.117 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_CRITICAL_THRESHOLD
Table 8-134 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | The number of unsuccessful revalidation responses (errors received from BSF while the binding association is valid in PCF) is equal to or above 70% of the total revalidation responses. |
Summary | The number of unsuccessful revalidation responses (errors received from BSF while the binding association is valid in PCF) is equal to or above 70% of the total revalidation responses. |
Severity | Critical |
Condition | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) / sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.90 |
Metric Used | occnp_session_binding_revalidation_response_total |
Recommended Actions |
Verify the health condition of BSF Management Service. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.118 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MAJOR_THRESHOLD
Table 8-135 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | The number of unsuccessful revalidation responses (errors received from BSF while the binding association is valid in PCF) is equal to or above 50% but less than 70% of the total revalidation responses. |
Summary | The number of unsuccessful revalidation responses (errors received from BSF while the binding association is valid in PCF) is equal to or above 50% but less than 70% of the total revalidation responses. |
Severity | Major |
Condition | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) / sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 50 < 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.90 |
Metric Used | occnp_session_binding_revalidation_response_total |
Recommended Actions |
Verify the health condition of BSF Management Service. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.119 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MINOR_THRESHOLD
Table 8-136 SESSION_BINDING_REVALIDATION_WITH_BSF_FAILURE_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | The number of unsuccessful revalidation responses (errors received from BSF while the binding association is valid in PCF) is equal to or above 30% but less than 50% of the total revalidation responses. |
Summary | The number of unsuccessful revalidation responses (errors received from BSF while the binding association is valid in PCF) is equal to or above 30% but less than 50% of the total revalidation responses. |
Severity | Minor |
Condition | (sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding", response_code!~"2.*"}[5m])) / sum by (namespace)(rate(occnp_session_binding_revalidation_response_total{microservice=~".*binding"}[5m]))) * 100 >= 30 < 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.90 |
Metric Used | occnp_session_binding_revalidation_response_total |
Recommended Actions |
Verify the health condition of BSF Management Service. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
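The session binding revalidation conditions use chained comparisons such as `>= 50 < 70`, which keep only values inside the stated range, so a given ratio falls into at most one severity band. The following Python sketch illustrates that banding. The function name and inputs are hypothetical, and the zero-traffic handling is a simplification made for the example.

```python
def revalidation_severity(matching_rate, total_rate):
    """Return the single severity band a revalidation ratio falls into.

    Bands follow the conditions above:
      >= 70            -> Critical
      >= 50 and < 70   -> Major
      >= 30 and < 50   -> Minor
    Returns None when no condition holds.
    """
    if total_rate == 0:
        # Simplification: no responses means no ratio to evaluate.
        return None
    percent = matching_rate * 100 / total_rate
    if percent >= 70:
        return "Critical"
    if percent >= 50:
        return "Major"
    if percent >= 30:
        return "Minor"
    return None


if __name__ == "__main__":
    # Hypothetical sample: 55 matching responses out of 100.
    print(revalidation_severity(55, 100))
```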
8.3.1.120 UPDATE_NOTIFY_TIMEOUT_ABOVE_70_PERCENT
Table 8-137 UPDATE_NOTIFY_TIMEOUT_ABOVE_70_PERCENT
Field | Details |
---|---|
Description | The number of Update Notify requests that failed because of a timeout is equal to or above 70% in a given time period. |
Summary | The number of Update Notify requests that failed because of a timeout is equal to or above 70% in a given time period. |
Severity | Critical |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70 |
OID | - |
Metric Used | - |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.121 UPDATE_NOTIFY_TIMEOUT_ABOVE_50_PERCENT
Table 8-138 UPDATE_NOTIFY_TIMEOUT_ABOVE_50_PERCENT
Field | Details |
---|---|
Description | The number of Update Notify requests that failed because of a timeout is equal to or above 50% but less than 70% in a given time period. |
Summary | The number of Update Notify requests that failed because of a timeout is equal to or above 50% but less than 70% in a given time period. |
Severity | Major |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70 |
OID | - |
Metric Used | - |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.1.122 UPDATE_NOTIFY_TIMEOUT_ABOVE_30_PERCENT
Table 8-139 UPDATE_NOTIFY_TIMEOUT_ABOVE_30_PERCENT
Field | Details |
---|---|
Description | The number of Update Notify requests that failed because of a timeout is equal to or above 30% but less than 50% of total Rx sessions. |
Summary | The number of Update Notify requests that failed because of a timeout is equal to or above 30% but less than 50% of total Rx sessions. |
Severity | Minor |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_for_rx_collision_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50 |
OID | - |
Metric Used | - |
Recommended Actions |
For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
8.3.2 PCF Alerts
This section provides information on PCF alerts.
8.3.2.1 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD
Table 8-140 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MINOR_THRESHOLD |
Description | More than 70% of timer capacity has been occupied for n1n2 transfer failure notification |
Summary | More than 70% of timer capacity has been occupied for n1n2 transfer failure notification |
Severity | Minor |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.107 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.2 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD
Table 8-141 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_MAJOR_THRESHOLD |
Description | More than 80% of timer capacity has been occupied for n1n2 transfer failure notification |
Summary | More than 80% of timer capacity has been occupied for n1n2 transfer failure notification |
Severity | Major |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 80 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.107 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.3 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD
Table 8-142 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_ABOVE_CRITICAL_THRESHOLD |
Description | More than 90% of timer capacity has been occupied for n1n2 transfer failure notification |
Summary | More than 90% of timer capacity has been occupied for n1n2 transfer failure notification |
Severity | Critical |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2TransferFailure"})/360000) * 100 > 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.107 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.4 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD
Table 8-143 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MINOR_THRESHOLD |
Description | More than 70% of timer capacity has been occupied for amf discovery. |
Summary | More than 70% of timer capacity has been occupied for amf discovery. |
Severity | Minor |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.95 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.5 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD
Table 8-144 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_MAJOR_THRESHOLD |
Description | More than 80% of timer capacity has been occupied for amf discovery. |
Summary | More than 80% of timer capacity has been occupied for amf discovery. |
Severity | Major |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 80 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.95 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.6 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD
Table 8-145 AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_AMF_DISCOVERY_ABOVE_CRITICAL_THRESHOLD |
Description | More than 90% of timer capacity has been occupied for amf discovery. |
Summary | More than 90% of timer capacity has been occupied for amf discovery. |
Severity | Critical |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_AMFDiscovery"})/360000) * 100 > 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.95 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.7 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD
Table 8-146 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MINOR_THRESHOLD |
Description | More than 70% of timer capacity has been occupied for n1n2 subscribe. |
Summary | More than 70% of timer capacity has been occupied for n1n2 subscribe. |
Severity | Minor |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.96 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.8 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD
Table 8-147 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_MAJOR_THRESHOLD |
Description | More than 80% of timer capacity has been occupied for n1n2 subscribe. |
Summary | More than 80% of timer capacity has been occupied for n1n2 subscribe. |
Severity | Major |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 80 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.96 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.9 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD
Table 8-148 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_SUBSCRIBE_ABOVE_CRITICAL_THRESHOLD |
Description | More than 90% of timer capacity has been occupied for n1n2 subscribe. |
Summary | More than 90% of timer capacity has been occupied for n1n2 subscribe. |
Severity | Critical |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageSubscribe"})/360000) * 100 > 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.96 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.10 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD
Table 8-149 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MINOR_THRESHOLD |
Description | More than 70% of timer capacity has been occupied for n1n2 transfer. |
Summary | More than 70% of timer capacity has been occupied for n1n2 transfer. |
Severity | Minor |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.97 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.11 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD
Table 8-150 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_MAJOR_THRESHOLD |
Description | More than 80% of timer capacity has been occupied for n1n2 transfer. |
Summary | More than 80% of timer capacity has been occupied for n1n2 transfer. |
Severity | Major |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 80 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.97 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
8.3.2.12 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD
Table 8-151 AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | AUDIT_TIMER_CAPACITY_FOR_UE_N1N2_TRANSFER_ABOVE_CRITICAL_THRESHOLD |
Description | More than 90% of timer capacity has been occupied for n1n2 transfer. |
Summary | More than 90% of timer capacity has been occupied for n1n2 transfer. |
Severity | Critical |
Condition | (max by (namespace) (occnp_timer_capacity{timerName="UE_N1N2MessageTransfer"})/360000) * 100 > 90 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.97 |
Metric Used | occnp_timer_capacity |
Recommended Actions |
The |
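The AUDIT_TIMER_CAPACITY conditions above divide the largest `occnp_timer_capacity` value in a namespace by the constant 360000 and compare the resulting percentage with 70, 80, and 90 using a strict greater-than. The following Python sketch reproduces that arithmetic. The function name and sample values are hypothetical, and the sketch makes no claim about what the constant represents beyond its use in the conditions.

```python
TIMER_CAPACITY_DIVISOR = 360000  # constant taken from the alert conditions

# Thresholds (percent, exclusive) for the Minor, Major, and Critical alerts.
THRESHOLDS = (("Minor", 70), ("Major", 80), ("Critical", 90))


def timer_capacity_alerts(max_timer_capacity):
    """Return (percent, severities) for one timerName in one namespace.

    max_timer_capacity: largest occnp_timer_capacity value reported for
    the timer, mirroring 'max by (namespace)' in the conditions.
    """
    percent = max_timer_capacity * 100 / TIMER_CAPACITY_DIVISOR
    # The conditions have no upper bound, so a high value can satisfy
    # more than one of them.
    severities = [name for name, limit in THRESHOLDS if percent > limit]
    return percent, severities


if __name__ == "__main__":
    # Hypothetical sample value.
    print(timer_capacity_alerts(300000))
```

Because the comparison is strict, a value of exactly 252000 (70%) does not satisfy the Minor condition in this sketch.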
8.3.2.13 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Table 8-152 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD |
Description | More than 25% of n1n2 subscribe reattempt failed. |
Summary | More than 25% of n1n2 subscribe reattempt failed. |
Severity | Minor |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 25 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.99 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message that it sent out of the NF. This alert notifies you when the share of failed reattempts for UE N1N2 subscribe exceeds the threshold. If failures increase, review why the flow that triggers the N1N2 subscription is failing, or check whether the AMF that the requests are sent to is unhealthy.
|
8.3.2.14 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Table 8-153 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD |
Description | More than 50% of n1n2 subscribe reattempt failed. |
Summary | More than 50% of n1n2 subscribe reattempt failed. |
Severity | Major |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.99 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message that it sent out of the NF. This alert notifies you when the share of failed reattempts for UE N1N2 subscribe exceeds the threshold. If failures increase, review why the flow that triggers the N1N2 subscription is failing, or check whether the AMF that the requests are sent to is unhealthy.
|
8.3.2.15 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Table 8-154 UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_SUBSCRIBE_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD |
Description | More than 75% of n1n2 subscribe reattempt failed. |
Summary | More than 75% of n1n2 subscribe reattempt failed. |
Severity | Critical |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",operationType="subscribe",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",operationType="subscribe"}[5m]))) * 100 > 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.99 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message that it sent out of the NF. This alert notifies you when the share of failed reattempts for UE N1N2 subscribe exceeds the threshold. If failures increase, review why the flow that triggers the N1N2 subscription is failing, or check whether the AMF that the requests are sent to is unhealthy.
|
8.3.2.16 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Table 8-155 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD |
Description | More than 25% of n1n2 transfer reattempt failed. |
Summary | More than 25% of n1n2 transfer reattempt failed. |
Severity | Minor |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 25 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.100 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message that it sent out of the NF. This alert notifies you when the share of failed reattempts for UE N1N2 transfer exceeds the threshold. If failures increase, review why the flow that triggers the N1N2 message transfer is failing, or check whether the AMF that the requests are sent to is unhealthy.
|
8.3.2.17 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Table 8-156 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD |
Description | More than 50% of n1n2 transfer reattempt failed. |
Summary | More than 50% of n1n2 transfer reattempt failed. |
Severity | Major |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.100 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message that it sent out of the NF. This alert notifies you when the share of failed reattempts for UE N1N2 transfer exceeds the threshold. If failures increase, review why the flow that triggers the N1N2 message transfer is failing, or check whether the AMF that the requests are sent to is unhealthy.
|
8.3.2.18 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Table 8-157 UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_TRANSFER_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD |
Description | More than 75% of n1n2 transfer reattempt failed. |
Summary | More than 75% of n1n2 transfer reattempt failed. |
Severity | Critical |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2MessageTransfer", operationType="transfer"}[5m]))) * 100 > 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.100 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message that it sent out of the NF. This alert notifies you when the share of failed reattempts for UE N1N2 transfer exceeds the threshold. If failures increase, review why the flow that triggers the N1N2 message transfer is failing, or check whether the AMF that the requests are sent to is unhealthy.
|
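The reattempt failure conditions divide the increase in non-2xx reattempt responses by the increase in reattempt requests and compare the percentage with 25, 50, and 75 using a strict greater-than. The following Python sketch illustrates the calculation, including the case where no reattempts were sent. The function names are hypothetical, and the zero-denominator handling follows ordinary IEEE 754 floating-point behavior as an assumption of the sketch.

```python
import math

# Thresholds (percent, exclusive), ordered from most to least severe.
THRESHOLDS = (("Critical", 75), ("Major", 50), ("Minor", 25))


def reattempt_failure_percent(failed_responses, reattempt_requests):
    """Percentage of reattempts that received a non-2xx response."""
    if reattempt_requests == 0:
        # Floating-point division semantics: 0/0 is NaN, x/0 is +Inf.
        return math.inf if failed_responses > 0 else math.nan
    return failed_responses * 100 / reattempt_requests


def most_severe_alert(percent):
    """Return the most severe alert whose '> threshold' condition holds."""
    for name, limit in THRESHOLDS:
        # Any comparison with NaN is False, so NaN matches no alert.
        if percent > limit:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical sample: 6 failed responses for 10 reattempt requests.
    print(most_severe_alert(reattempt_failure_percent(6, 10)))
```

The sketch reports only the most severe match. Since the conditions have no upper bound, the less severe conditions hold at the same time.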
8.3.2.19 SM_STALE_REQUEST_PROCESSING_REJECT_MINOR
Table 8-158 SM_STALE_REQUEST_PROCESSING_REJECT_MINOR
Field | Details |
---|---|
Name in Alert Yaml File | SM_STALE_REQUEST_PROCESSING_REJECT_MINOR |
Description | More than 10% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale |
Summary | More than 10% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale |
Severity | Minor |
Condition | (sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 10 < 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.101 |
Metric Used | occnp_late_processing_rejection_total, ocpm_ingress_request_total |
Recommended Actions | The metric occnp_late_processing_rejection_total is pegged when Late Processing finds a stale session. |
8.3.2.20 SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR
Table 8-159 SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR
Field | Details |
---|---|
Name in Alert Yaml File | SM_STALE_REQUEST_PROCESSING_REJECT_MAJOR |
Description | More than 20% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale |
Summary | More than 20% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale |
Severity | Major |
Condition | (sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 20 < 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.101 |
Metric Used | occnp_late_processing_rejection_total, ocpm_ingress_request_total |
Recommended Actions | The metric occnp_late_processing_rejection_total is pegged when Late Processing finds a stale session. |
8.3.2.21 SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
Table 8-160 SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
Field | Details |
---|---|
Name in Alert Yaml File | SM_STALE_REQUEST_PROCESSING_REJECT_CRITICAL |
Description | More than 30% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale |
Summary | More than 30% of the Ingress requests failed with error 504 GATEWAY_TIMEOUT due to request being stale |
Severity | Critical |
Condition | (sum by (namespace,pod) (rate(occnp_late_processing_rejection_total{microservice=~"occnp_pcf_sm"}[5m])))/(sum by (namespace,pod) (rate(ocpm_ingress_request_total{microservice=~"occnp_pcf_sm"}[5m]))) * 100 >= 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.101 |
Metric Used | occnp_late_processing_rejection_total, ocpm_ingress_request_total |
Recommended Actions | The metric occnp_late_processing_rejection_total is pegged when Late Processing finds a stale session. |
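The three SM_STALE_REQUEST_PROCESSING_REJECT conditions split the rejection percentage into non-overlapping bands: 10 up to 20, 20 up to 30, and 30 or more. The following Python sketch shows that arithmetic only. The function name and the sample rates are illustrative assumptions and are not part of Policy or Prometheus.

```python
from typing import Optional


def sm_stale_reject_severity(rejection_rate: float, ingress_rate: float) -> Optional[str]:
    """Return the severity band for a stale-rejection percentage.

    Both arguments are hypothetical per-second rates over the same window,
    standing in for the two rate() terms in the Condition rows.
    """
    if ingress_rate <= 0:
        # Assumption for this sketch: with no ingress traffic no band applies.
        return None
    percent = rejection_rate * 100 / ingress_rate
    if percent >= 30:
        return "Critical"
    if percent >= 20:
        return "Major"
    if percent >= 10:
        return "Minor"
    return None


# Example: 12 rejections per second out of 100 requests per second.
print(sm_stale_reject_severity(12.0, 100.0))
```

Because the bands do not overlap, at most one of the three severities applies to a given value.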
8.3.2.22 UE_STALE_REQUEST_PROCESSING_REJECT_MAJOR
Table 8-161 UE_STALE_REQUEST_PROCESSING_REJECT_MAJOR
Field | Details |
---|---|
Description | This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service. |
Summary | This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service. |
Severity | Major |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.104 |
Metric Used | occnp_late_processing_rejection_total |
Recommended Actions | The metric occnp_late_processing_rejection_total is pegged when requests being processed become stale. |
8.3.2.23 UE_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
Table 8-162 UE_STALE_REQUEST_PROCESSING_REJECT_CRITICAL
Field | Details |
---|---|
Description | This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service. |
Summary | This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service. |
Severity | Critical |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.104 |
Metric Used | occnp_late_processing_rejection_total |
Recommended Actions | The metric occnp_late_processing_rejection_total is pegged when requests being processed become stale. |
8.3.2.24 UE_STALE_REQUEST_PROCESSING_REJECT_MINOR
Table 8-163 UE_STALE_REQUEST_PROCESSING_REJECT_MINOR
Field | Details |
---|---|
Description | This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service. |
Summary | This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to request going stale, while being processed by the service. |
Severity | Minor |
Condition | (sum by (namespace) (rate(occnp_late_processing_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace) (rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.104 |
Metric Used | occnp_late_processing_rejection_total |
Recommended Actions | The metric occnp_late_processing_rejection_total is pegged when requests being processed become stale. |
8.3.2.25 UE_STALE_REQUEST_ARRIVAL_REJECT_MINOR
Table 8-164 UE_STALE_REQUEST_ARRIVAL_REJECT_MINOR
Field | Details |
---|---|
Description | This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
Summary | This alert is triggered when more than 10% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
Severity | Minor |
Condition | (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.109 |
Metric Used | ocpm_late_arrival_rejection_total |
Recommended Actions | The metric ocpm_late_arrival_rejection_total is pegged when a received request is stale. |
8.3.2.26 UE_STALE_REQUEST_ARRIVAL_REJECT_MAJOR
Table 8-165 UE_STALE_REQUEST_ARRIVAL_REJECT_MAJOR
Field | Details |
---|---|
Description | This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
Summary | This alert is triggered when more than 20% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
Severity | Major |
Condition | (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.109 |
Metric Used | ocpm_late_arrival_rejection_total |
Recommended Actions | The metric ocpm_late_arrival_rejection_total is pegged when a received request is stale. |
8.3.2.27 UE_STALE_REQUEST_ARRIVAL_REJECT_CRITICAL
Table 8-166 UE_STALE_REQUEST_ARRIVAL_REJECT_CRITICAL
Field | Details |
---|---|
Description | This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
Summary | This alert is triggered when more than 30% of the incoming requests towards UE Policy service are rejected due to requests being stale upon arrival to the service. |
Severity | Critical |
Condition | (sum by (namespace) (rate(ocpm_late_arrival_rejection_total{microservice=~".*pcf_ueservice"}[5m])) / sum by (namespace)(rate(ocpm_ingress_request_total{microservice=~".*pcf_ueservice"}[5m]))) * 100 > 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.109 |
Metric Used | ocpm_late_arrival_rejection_total |
Recommended Actions | The metric ocpm_late_arrival_rejection_total is pegged when a received request is stale. |
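The UE stale-request conditions use a lower bound only (> 10, > 20, > 30), so a value above 30% satisfies all three expressions at once. The sketch below lists which UE_STALE_REQUEST_ARRIVAL_REJECT conditions would be true for a given percentage. The helper name is hypothetical, and whether lower severities are suppressed in practice depends on alert routing configuration that this section does not describe.

```python
from typing import List

# Thresholds copied from the Condition rows of the three arrival-reject tables.
UE_ARRIVAL_THRESHOLDS = {
    "UE_STALE_REQUEST_ARRIVAL_REJECT_MINOR": 10,
    "UE_STALE_REQUEST_ARRIVAL_REJECT_MAJOR": 20,
    "UE_STALE_REQUEST_ARRIVAL_REJECT_CRITICAL": 30,
}


def alerts_with_true_condition(percent: float) -> List[str]:
    """Return every alert whose strict '>' threshold is exceeded (hypothetical helper)."""
    return [name for name, limit in UE_ARRIVAL_THRESHOLDS.items() if percent > limit]


# Example: 35% of requests rejected as stale on arrival.
for name in alerts_with_true_condition(35.0):
    print(name)
```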
8.3.2.28 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Table 8-167 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD |
Description | More than 75% of N1N2 transfer failure notification reattempts failed. |
Summary | More than 75% of N1N2 transfer failure notification reattempts failed. |
Severity | Critical |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.106 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies the operator when the percentage of failed reattempts for UE N1N2 transfer failure notifications exceeds the threshold. If failures increase, the operator can investigate the following: |
8.3.2.29 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Table 8-168 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD |
Description | More than 50% of N1N2 transfer failure notification reattempts failed. |
Summary | More than 50% of N1N2 transfer failure notification reattempts failed. |
Severity | Major |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.106 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies the operator when the percentage of failed reattempts for UE N1N2 transfer failure notifications exceeds the threshold. If failures increase, the operator can investigate the following: |
8.3.2.30 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Table 8-169 UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_N1N2_TRANSFER_FAILURE_NOTIFICATION_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD |
Description | More than 25% of N1N2 transfer failure notification reattempts failed. |
Summary | More than 25% of N1N2 transfer failure notification reattempts failed. |
Severity | Minor |
Condition | (sum by (namespace) (increase(http_out_conn_response_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(http_out_conn_request_total{isReattempt="true",reattemptType="UE_N1N2TransferFailure",operationType="transfer"}[5m]))) * 100 > 25 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.106 |
Metric Used | http_out_conn_response_total, http_out_conn_request_total |
Recommended Actions | The http_out_conn_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies the operator when the percentage of failed reattempts for UE N1N2 transfer failure notifications exceeds the threshold. If failures increase, the operator can investigate the following: |
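The N1N2 reattempt conditions count a response as failed when its responseCode label does not match the regular expression 2.*. Prometheus anchors label regular expressions at both ends, so this pattern matches any code that begins with 2. The Python sketch below imitates that with re.fullmatch and then computes the failure percentage. The function names and sample counts are illustrative assumptions.

```python
import re
from typing import Dict

# Same pattern as the responseCode!~"2.*" matcher in the Condition rows.
SUCCESS_PATTERN = re.compile(r"2.*")


def is_failure(response_code: str) -> bool:
    # re.fullmatch mirrors the fully anchored regex matching used by Prometheus.
    return SUCCESS_PATTERN.fullmatch(response_code) is None


def reattempt_failure_percent(responses_by_code: Dict[str, int], total_requests: int) -> float:
    """Percentage of reattempt requests that received a non-2xx response."""
    if total_requests == 0:
        # Assumption for this sketch: report 0 when there were no reattempts.
        return 0.0
    failed = sum(count for code, count in responses_by_code.items() if is_failure(code))
    return failed * 100 / total_requests


# Hypothetical 5-minute increases: 40 successes and 60 failures out of 100 reattempts.
print(reattempt_failure_percent({"200": 40, "404": 30, "503": 30}, 100))
```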
8.3.2.31 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Table 8-170 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_CRITICAL_THRESHOLD |
Description | More than 75% of AMF discovery reattempts failed. |
Summary | More than 75% of AMF discovery reattempts failed. |
Severity | Critical |
Condition | (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.105 |
Metric Used | occnp_ue_nf_discovery_reattempt_response_total |
Recommended Actions | The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies the operator when the percentage of failed AMF discovery reattempts exceeds the threshold. If failures increase, the operator can investigate the following: |
8.3.2.32 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Table 8-171 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MAJOR_THRESHOLD |
Description | More than 50% of AMF discovery reattempts failed. |
Summary | More than 50% of AMF discovery reattempts failed. |
Severity | Major |
Condition | (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.105 |
Metric Used | occnp_ue_nf_discovery_reattempt_response_total |
Recommended Actions | The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies the operator when the percentage of failed AMF discovery reattempts exceeds the threshold. If failures increase, the operator can investigate the following: |
8.3.2.33 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Table 8-172 UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | UE_AMF_DISCOVERY_REATTEMPT_FAILURE_ABOVE_MINOR_THRESHOLD |
Description | More than 25% of AMF discovery reattempts failed. |
Summary | More than 25% of AMF discovery reattempts failed. |
Severity | Minor |
Condition | (sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_response_total{operationType="timer_expiry_notification",responseCode!~"2.*"}[5m])) / sum by (namespace) (increase(occnp_ue_nf_discovery_reattempt_request_total{operationType="timer_expiry_notification"}[5m]))) * 100 > 25 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.105 |
Metric Used | occnp_ue_nf_discovery_reattempt_response_total |
Recommended Actions | The occnp_ue_nf_discovery_reattempt_response_total metric is pegged when PCF-UE receives a response to a message sent out of the NF. This alert notifies the operator when the percentage of failed AMF discovery reattempts exceeds the threshold. If failures increase, the operator can investigate the following: |
8.3.2.34 INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD
Table 8-173 INGRESS_ERROR_RATE_ABOVE_10_PERCENT_PER_POD
Field | Details |
---|---|
Name in Alert Yaml File | IngressErrorRateAbove10PercentPerPod |
Description | Ingress Error Rate above 10 Percent in {{$labels.kubernetes_name}} in {{$labels.kubernetes_namespace}} |
Summary | Transaction Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
Severity | Critical |
Condition | The total number of failed transactions per pod is above 10 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.2 |
Metric Used | ocpm_ingress_response_total |
Recommended Actions | The alert gets cleared when the number of failed transactions is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
For any additional guidance, contact My Oracle Support. |
8.3.2.35 SM_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-174 SM_TRAFFIC_RATE_ABOVE_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | SMTrafficRateAboveThreshold |
Description | SM service Ingress traffic Rate is above threshold of Max MPS (current value is: {{ $value }}) |
Summary | Traffic Rate is above 90 Percent of Max requests per second |
Severity | Major |
Condition | The total SM service Ingress traffic rate has crossed the configured threshold of 900 TPS.
The default trigger point for this alert in the PCF_Alertrules.yaml file is when the SM service Ingress rate crosses 90% of the maximum ingress requests per second. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.3 |
Metric Used | ocpm_ingress_request_total{servicename_3gpp="npcf-smpolicycontrol"} |
Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold.
Note: Threshold levels can be configured using the
It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
For any additional guidance, contact My Oracle Support. |
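The condition quotes a threshold of 900 TPS and a default trigger point of 90% of the maximum ingress rate, which together correspond to a maximum of 1000 requests per second. The following sketch shows the arithmetic with the maximum treated as an input. The function names are hypothetical, and the maximum for a given deployment is an assumption to be replaced with the dimensioned value.

```python
def traffic_threshold(max_requests_per_second: float, percent: float = 90.0) -> float:
    """Rate at which the alert would trigger for a given maximum (hypothetical helper)."""
    return max_requests_per_second * percent / 100


def exceeds_threshold(current_rate: float, max_requests_per_second: float,
                      percent: float = 90.0) -> bool:
    # "Crossed" is read here as strictly greater than the threshold (an assumption).
    return current_rate > traffic_threshold(max_requests_per_second, percent)


# With an assumed maximum of 1000 requests per second, the threshold is 900.
print(traffic_threshold(1000))
print(exceeds_threshold(950, 1000))
```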
8.3.2.36 SM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-175 SM_INGRESS_ERROR_RATE_ABOVE_10_PERCENT
Field | Details |
---|---|
Name in Alert Yaml File | SMIngressErrorRateAbove10Percent |
Description | Transaction Error Rate detected above 10 Percent of Total on SM service (current value is: {{ $value }}) |
Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
Severity | Critical |
Condition | The number of failed transactions is above 10 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.4 |
Metric Used | ocpm_ingress_response_total |
Recommended Actions | The alert gets cleared when the number of failed transactions is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
For any additional guidance, contact My Oracle Support. |
8.3.2.37 SM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Table 8-176 SM_EGRESS_ERROR_RATE_ABOVE_1_PERCENT
Field | Details |
---|---|
Name in Alert Yaml File | SMEgressErrorRateAbove1Percent |
Description | Egress Transaction Error Rate detected above 1 Percent of Total Transactions (current value is: {{ $value }}) |
Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
Severity | Minor |
Condition | The number of failed transactions is above 1 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.5 |
Metric Used | system_operational_state == 1 |
Recommended Actions | The alert gets cleared when the number of failed transactions is below 1% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
For any additional guidance, contact My Oracle Support. |
8.3.2.38 PCF_CHF_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Table 8-177 PCF_CHF_INGRESS_TRAFFIC_RATE_ABOVE_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | PcfChfIngressTrafficRateAboveThreshold |
Description | User service Ingress traffic Rate from CHF is above threshold of Max MPS (current value is: {{ $value }}) |
Summary | Traffic Rate is above 90 Percent of Max requests per second |
Severity | Major |
Condition | The total User Service Ingress traffic rate from CHF has crossed the configured threshold of 900 TPS.
The default trigger point for this alert in the PCF_Alertrules.yaml file is when the User Service Ingress rate from CHF crosses 90% of the maximum ingress requests per second. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.11 |
Metric Used | ocpm_userservice_inbound_count_total{service_resource="chf-service"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.39 PCF_CHF_EGRESS_ERROR_RATE_ABOVE_10_PERCENT
Table 8-178 PCF_CHF_EGRESS_ERROR_RATE_ABOVE_10_PERCENT
Field | Details |
---|---|
Name in Alert Yaml File | PcfChfEgressErrorRateAbove10Percent |
Description | The number of failed transactions from CHF is more than 10 percent of the total transactions. |
Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
Severity | Critical |
Condition | (sum(rate(ocpm_chf_tracking_response_total {servicename_3gpp="nchf-spendinglimitcontrol",response_code!~"2.*"} [24h]) or (up * 0 ) ) / sum(rate(ocpm_chf_tracking_response_total {servicename_3gpp="nchf-spendinglimitcontrol"} [24h]))) * 100 >= 10 |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.12 |
Metric Used | ocpm_chf_tracking_response_total |
Recommended Actions | The alert gets cleared when the number of failed transactions falls below the configured threshold.
Note: Threshold levels can be configured using the
It is recommended to assess the reason for failed transactions. Perform the following steps to analyze the cause of increased traffic:
For any additional guidance, contact My Oracle Support. |
8.3.2.40 PCF_CHF_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD
Table 8-179 PCF_CHF_INGRESS_TIMEOUT_ERROR_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | Ingress Timeout Error Rate detected above 10 Percent of Total towards CHF service (current value is: {{ $value }}) |
Summary | Timeout Error Rate detected above 10 Percent of Total Transactions |
Severity | Major |
Condition | The number of failed transactions due to timeout is above 10 percent of the total transactions for CHF service. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.17 |
Metric Used | ocpm_chf_tracking_request_timeout_total{servicename_3gpp="nchf-spendinglimitcontrol"} |
Recommended Actions | The alert gets cleared when the number of failed transactions due to timeout is below 10% of the total transactions.
To assess the reason for failed transactions, perform the following steps:
For any additional guidance, contact My Oracle Support. |
8.3.2.41 PCF_PENDING_BINDING_SITE_TAKEOVER
Table 8-180 PCF_PENDING_BINDING_SITE_TAKEOVER
Field | Details |
---|---|
Description | The site takeover configuration has been activated |
Summary | The site takeover configuration has been activated |
Severity | CRITICAL |
Condition | sum by (application, container, namespace) (changes(occnp_pending_binding_site_takeover_total[2m])) > 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.45 |
Metric Used | occnp_pending_binding_site_takeover_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.42 PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED
Table 8-181 PCF_PENDING_BINDING_THRESHOLD_LIMIT_REACHED
Field | Details |
---|---|
Description | The Pending Operation table threshold has been reached. |
Summary | The Pending Operation table threshold has been reached. |
Severity | CRITICAL |
Condition | sum by (application, container, namespace) (changes(occnp_threshold_limit_reached_total[2m])) > 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.46 |
Metric Used | occnp_threshold_limit_reached_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.43 PCF_PENDING_BINDING_RECORDS_COUNT
Table 8-182 PCF_PENDING_BINDING_RECORDS_COUNT
Field | Details |
---|---|
Description | An attempt to internally recreate a PCF binding has been triggered by PCF |
Summary | An attempt to internally recreate a PCF binding has been triggered by PCF |
Severity | MINOR |
Condition | sum by (application, container, namespace) (changes(occnp_pending_operation_records_count[10s])) > 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.47 |
Metric Used | occnp_pending_operation_records_count |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.44 AUTONOMOUS_SUBSCRIPTION_FAILURE
Table 8-183 AUTONOMOUS_SUBSCRIPTION_FAILURE
Field | Details |
---|---|
Description | Autonomous subscription failed for a configured Slice Load Level |
Summary | Autonomous subscription failed for a configured Slice Load Level |
Severity | Critical |
Condition | The number of failed Autonomous Subscriptions for a configured Slice Load Level in nwdaf-agent is greater than zero. |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.49 |
Metric Used | subscription_failure{requestType="autonomous"} |
Recommended Actions | The alert gets cleared when the failed Autonomous Subscription is corrected.
To clear the alert, perform the following steps:
For any additional guidance, contact My Oracle Support. |
8.3.2.45 AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
Table 8-184 AM_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
Field | Details |
---|---|
Description | AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }}) |
Summary | AM Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }}) |
Severity | MINOR |
Condition | (sum(rate(http_out_conn_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])) / sum(rate(http_out_conn_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))) * 100 >= 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.54 |
Metric Used | http_out_conn_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.46 AM_AR_ERROR_RATE_ABOVE_1_PERCENT
Table 8-185 AM_AR_ERROR_RATE_ABOVE_1_PERCENT
Field | Details |
---|---|
Description | Alternate Routing Error Rate detected above 1 Percent of Total on AM Service (current value is: {{ $value }}) |
Summary | Alternate Routing Error Rate detected above 1 Percent of Total on AM Service (current value is: {{ $value }}) |
Severity | MINOR |
Condition | (sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*amservice.*",responseCode!~"2.*",servicename3gpp="npcf-am-policy-control"}[1d])) / sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*amservice.*",servicename3gpp="npcf-am-policy-control"}[1d]))) * 100 >= 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.55 |
Metric Used | ocpm_ar_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.47 UE_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
Table 8-186 UE_NOTIFICATION_ERROR_RATE_ABOVE_1_PERCENT
Field | Details |
---|---|
Description | UE Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }}) |
Summary | UE Notification Error Rate detected above 1 Percent of Total (current value is: {{ $value }}) |
Severity | MINOR |
Condition | (sum(rate(http_out_conn_response_total{pod=~".*ueservice.*",responseCode!~"2.*",servicename3gpp="npcf-ue-policy-control"}[1d])) / sum(rate(http_out_conn_response_total{pod=~".*ueservice.*",servicename3gpp="npcf-ue-policy-control"}[1d]))) * 100 >= 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.56 |
Metric Used | http_out_conn_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.48 UE_AR_ERROR_RATE_ABOVE_1_PERCENT
Table 8-187 UE_AR_ERROR_RATE_ABOVE_1_PERCENT
Field | Details |
---|---|
Description | Alternate Routing Error Rate detected above 1 Percent of Total on UE Service (current value is: {{ $value }}) |
Summary | Alternate Routing Error Rate detected above 1 Percent of Total on UE Service (current value is: {{ $value }}) |
Severity | MINOR |
Condition | (sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*ueservice.*",responseCode!~"2.*",servicename3gpp="npcf-ue-policy-control"}[1d])) / sum by (fqdn) (rate(ocpm_ar_response_total{pod=~".*ueservice.*",servicename3gpp="npcf-ue-policy-control"}[1d]))) * 100 >= 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.57 |
Metric Used | ocpm_ar_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.49 SMSC_CONNECTION_DOWN
Table 8-188 SMSC_CONNECTION_DOWN
Field | Details |
---|---|
Description | Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}} |
Summary | Connection to SMSC peer {{$labels.smscName}} is down in notifier service pod {{$labels.pod}} |
Severity | MAJOR |
Condition | sum by(namespace, pod, smscName)(occnp_active_smsc_conn_count) == 0 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.63 |
Metric Used | occnp_active_smsc_conn_count |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.50 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
Table 8-189 LOCK_ACQUISITION_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | lockAcquisitionExceedsMinorThreshold |
Description | The number of lock requests that fail to acquire the lock exceeds the minor threshold limit (current value is: {{ $value }}). |
Summary | Keys used in Bulwark lock request which are already in locked state detected above 20 Percent of Total Transactions. |
Severity | Minor |
Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=20 < 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
Metric Used | - |
Recommended Actions | - |
8.3.2.51 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
Table 8-190 LOCK_ACQUISITION_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | lockAcquisitionExceedsMajorThreshold |
Description | The number of lock requests that fail to acquire the lock exceeds the major threshold limit (current value is: {{ $value }}). |
Summary | Keys used in Bulwark lock request which are already in locked state detected above 50 Percent of Total Transactions. |
Severity | Major |
Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >= 50 < 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
Metric Used | - |
Recommended Actions | - |
8.3.2.52 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
Table 8-191 LOCK_ACQUISITION_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Name in Alert Yaml File | lockAcquisitionExceedsCriticalThreshold |
Description | The number of lock requests that fail to acquire the lock exceeds the critical threshold limit (current value is: {{ $value }}). |
Summary | Keys used in Bulwark lock request which are already in locked state detected above 75 Percent of Total Transactions. |
Severity | Critical |
Expression | (sum by (namespace) (increase(lock_response_total{requestType="acquireLock",responseType="failure"}[5m])) /sum by (namespace) (increase(lock_request_total{requestType="acquireLock"}[5m]))) * 100 >=75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.69 |
Metric Used | - |
Recommended Actions | - |
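In PromQL, comparing an instant vector with a scalar acts as a filter: series whose value fails the comparison are dropped from the result. Chaining >= 20 < 50 therefore keeps only the namespaces whose lock failure percentage lies in that band. The following sketch imitates the filter on a plain dictionary. The namespace names and values are made up for illustration.

```python
from typing import Dict, Optional


def keep_in_band(values_by_namespace: Dict[str, float], lower: float,
                 upper: Optional[float] = None) -> Dict[str, float]:
    """Keep entries with lower <= value < upper, like chained PromQL comparisons."""
    kept = {}
    for namespace, value in values_by_namespace.items():
        if value >= lower and (upper is None or value < upper):
            kept[namespace] = value
    return kept


# Hypothetical lock failure percentages per namespace.
sample = {"ns-a": 15.0, "ns-b": 35.0, "ns-c": 60.0, "ns-d": 80.0}

print(keep_in_band(sample, 20, 50))  # minor band
print(keep_in_band(sample, 50, 75))  # major band
print(keep_in_band(sample, 75))      # critical band, no upper bound
```

An alert is raised for each entry that survives the filter, so each namespace falls into at most one of the three lock acquisition severities.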
8.3.2.53 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_MINOR_THRESHOLD
Table 8-192 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | The number of failed coherence callback subscription registrations for already locked keys exceeds the minor threshold limit. |
Summary | Coherence callback registration failures detected above 20 percent of total transactions. |
Severity | Minor |
Expression | (sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration",opStatus="failure"}[5m])) /sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration"}[5m]))) * 100 >=20 < 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.70 |
Metric Used | - |
Recommended Actions | - |
8.3.2.54 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_MAJOR_THRESHOLD
Table 8-193 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | The number of failed coherence callback subscription registrations for already locked keys exceeds the major threshold limit (current value is: {{ $value }}). |
Summary | Coherence callback registration failures detected above 50 percent of total transactions. |
Severity | Major |
Expression | (sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration",opStatus="failure"}[5m])) /sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration"}[5m]))) * 100 >=50 < 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.70 |
Metric Used | - |
Recommended Actions | - |
8.3.2.55 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_CRITICAL_THRESHOLD
Table 8-194 LOCK_SUBSCRIPTION_CALLBACK_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | The number of failed coherence callback subscription registrations for already locked keys exceeds the critical threshold limit (current value is: {{ $value }}). |
Summary | Coherence callback registration failures detected above 75 percent of total transactions. |
Severity | Critical |
Expression | (sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration",opStatus="failure"}[5m])) /sum by (namespace) (increase(coherence_callback_operation_total{opType="Registration"}[5m]))) * 100 >=75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.70 |
Metric Used | - |
Recommended Actions | - |
8.3.2.56 SM_UPDATE_NOTIFY_FAILED_ABOVE_50_PERCENT
Table 8-195 SM_UPDATE_NOTIFY_FAILED_ABOVE_50_PERCENT
Field | Details |
---|---|
Description | The percentage of failed Update Notify Terminate requests sent to SMF is >= 50 and < 60. |
Summary | The percentage of failed Update Notify Terminate requests sent to SMF is >= 50 and < 60. |
Severity | MINOR |
Condition | (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 50 < 60 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.80 |
Metric Used | occnp_http_out_conn_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.57 SM_UPDATE_NOTIFY_FAILED_ABOVE_60_PERCENT
Table 8-196 SM_UPDATE_NOTIFY_FAILED_ABOVE_60_PERCENT
Field | Details |
---|---|
Description | The failure rate of Update Notify Terminate requests sent to SMF is 60% or higher but below 70%. |
Summary | The failure rate of Update Notify Terminate requests sent to SMF is 60% or higher but below 70%. |
Severity | Major |
Condition | (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 60 < 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.80 |
Metric Used | occnp_http_out_conn_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.58 SM_UPDATE_NOTIFY_FAILED_ABOVE_70_PERCENT
Table 8-197 SM_UPDATE_NOTIFY_FAILED_ABOVE_70_PERCENT
Field | Details |
---|---|
Description | The failure rate of Update Notify Terminate requests sent to SMF is 70% or higher. |
Summary | The failure rate of Update Notify Terminate requests sent to SMF is 70% or higher. |
Severity | Critical |
Condition | (sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol",responseCode!~"2.*"})*100)/ sum(occnp_http_out_conn_response_total{operationType="terminate_notify",pod=~".*smservice.*",servicename3gpp="npcf-smpolicycontrol"}) >= 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.80 |
Metric Used | occnp_http_out_conn_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.59 UPDATE_NOTIFY_FAILURE_ABOVE_30_PERCENT
Table 8-198 UPDATE_NOTIFY_FAILURE_ABOVE_30_PERCENT
Field | Details |
---|---|
Description | The number of Update Notify requests that failed is equal to or above 30% but less than 50% of total Rx sessions. |
Summary | The number of Update Notify requests that failed is equal to or above 30% but less than 50% of total Rx sessions. |
Severity | Minor |
Condition | (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.94 |
Metric Used | occnp_http_out_conn_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.60 UPDATE_NOTIFY_FAILURE_ABOVE_50_PERCENT
Table 8-199 UPDATE_NOTIFY_FAILURE_ABOVE_50_PERCENT
Field | Details |
---|---|
Description | The number of Update Notify requests that failed is equal to or above 50% but less than 70% in a given time period. |
Summary | The number of Update Notify requests that failed is equal to or above 50% but less than 70% in a given time period. |
Severity | Major |
Condition | (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.94 |
Metric Used | occnp_http_out_conn_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.61 UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT
Table 8-200 UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT
Field | Details |
---|---|
Description | The number of Update Notify requests that failed is equal to or above 70% in a given time period. |
Summary | The number of Update Notify requests that failed is equal to or above 70% in a given time period. |
Severity | Critical |
Condition | (sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m])) / sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.94 |
Metric Used | occnp_http_out_conn_response_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
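The three UPDATE_NOTIFY_FAILURE alerts evaluate the same failure ratio against different bands. One way to keep the bands consistent is to compute the ratio once in a recording rule and compare it in each alert. The sketch below is illustrative only and is not the layout of the packaged alert files: the recording rule name `namespace:update_notify_failure:percent_5m` and the group name are hypothetical, and the bands follow the Description rows (30 to 50, 50 to 70, 70 and above).

```yaml
groups:
  - name: update-notify-failure-example   # hypothetical group name
    rules:
      # Failure percentage per namespace over a 5-minute window.
      - record: namespace:update_notify_failure:percent_5m   # hypothetical name
        expr: |
          (
            sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm",responseCode!~"2.*"}[5m]))
            /
            sum by (namespace) (rate(occnp_http_out_conn_response_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))
          ) * 100
      # Each alert compares the recorded percentage against its own band.
      - alert: UPDATE_NOTIFY_FAILURE_ABOVE_30_PERCENT
        expr: namespace:update_notify_failure:percent_5m >= 30 < 50
        labels:
          severity: minor
      - alert: UPDATE_NOTIFY_FAILURE_ABOVE_50_PERCENT
        expr: namespace:update_notify_failure:percent_5m >= 50 < 70
        labels:
          severity: major
      - alert: UPDATE_NOTIFY_FAILURE_ABOVE_70_PERCENT
        expr: namespace:update_notify_failure:percent_5m >= 70
        labels:
          severity: critical
```

Rules within a group are evaluated in order, so the alerts read the value recorded in the same evaluation cycle. Because the bands do not overlap, at most one of the three alerts fires for a given namespace at a time.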
8.3.2.62 POD_PROTECTION_BY_RATELIMIT_REJECTED_REQUEST
Table 8-201 POD_PROTECTION_BY_RATELIMIT_REJECTED_REQUEST
Field | Details |
---|---|
Description | More than 1% of Ingress Gateway traffic is rejected because of rate limiting. |
Summary | More than 1% of Ingress Gateway traffic is rejected because of rate limiting. |
Severity | Major |
Condition | (sum by (namespace,pod) (rate(oc_ingressgateway_http_request_ratelimit_values_total {Allowed="false",app_kubernetes_io_name="occnp-ingress-gateway"}[2m])))/ (sum by (namespace,pod) (rate(oc_ingressgateway_http_request_ratelimit_values_total {app_kubernetes_io_name="occnp-ingress-gateway"}[2m]))) * 100 >= 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.103 |
Metric Used | oc_ingressgateway_http_request_ratelimit_values_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.63 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MINOR_THRESHOLD
Table 8-202 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Description | The rate of UE N1N2 notifications from AMF containing MANAGE_UE_POLICY_COMMAND_REJECT is above 20 percent of total N1N2 notify requests. |
Summary | The rate of UE N1N2 notifications from AMF containing MANAGE_UE_POLICY_COMMAND_REJECT is above 20 percent of total N1N2 notify requests. |
Severity | Minor |
Condition | sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.91 |
Metric Used | ue_n1_transfer_ue_notification_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.64 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MAJOR_THRESHOLD
Table 8-203 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | The rate of UE N1N2 notifications from AMF containing MANAGE_UE_POLICY_COMMAND_REJECT is above 50 percent of total N1N2 notify requests. |
Summary | The rate of UE N1N2 notifications from AMF containing MANAGE_UE_POLICY_COMMAND_REJECT is above 50 percent of total N1N2 notify requests. |
Severity | Major |
Condition | sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.91 |
Metric Used | ue_n1_transfer_ue_notification_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.65 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_CRITICAL_THRESHOLD
Table 8-204 UE_N1N2_NOTIFY_REJECTION_RATE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | The rate of UE N1N2 notifications from AMF containing MANAGE_UE_POLICY_COMMAND_REJECT is above 75 percent of total N1N2 notify requests. |
Summary | The rate of UE N1N2 notifications from AMF containing MANAGE_UE_POLICY_COMMAND_REJECT is above 75 percent of total N1N2 notify requests. |
Severity | Critical |
Condition | sum by (namespace) (rate(ue_n1_transfer_ue_notification_total{commandType="MANAGE_UE_POLICY_COMMAND_REJECT"}[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.91 |
Metric Used | ue_n1_transfer_ue_notification_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.66 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MINOR_THRESHOLD
Table 8-205 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Description | Over 20% of total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests from AMF. |
Summary | Over 20% of total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests from AMF. |
Severity | Minor |
Condition | sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.92 |
Metric Used | ue_n1_transfer_failure_notification_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.67 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MAJOR_THRESHOLD
Table 8-206 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | Over 50% of total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests from AMF. |
Summary | Over 50% of total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests from AMF. |
Severity | Major |
Condition | sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.92 |
Metric Used | ue_n1_transfer_failure_notification_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.68 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_CRITICAL_THRESHOLD
Table 8-207 UE_N1N2_TRANSFER_FAILURE_RATE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | Over 75% of total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests from AMF. |
Summary | Over 75% of total N1N2 transfer requests from AMF are N1N2 transfer failure notification requests from AMF. |
Severity | Critical |
Condition | sum by (namespace) (rate(ue_n1_transfer_failure_notification_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.92 |
Metric Used | ue_n1_transfer_failure_notification_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.69 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MINOR_THRESHOLD
Table 8-208 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Description | Over 20% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
Summary | Over 20% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
Severity | Minor |
Condition | sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.93 |
Metric Used | ue_n1_transfer_t3501_expiry_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.70 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MAJOR_THRESHOLD
Table 8-209 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | Over 50% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
Summary | Over 50% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
Severity | Major |
Condition | sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.93 |
Metric Used | ue_n1_transfer_t3501_expiry_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.71 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_CRITICAL_THRESHOLD
Table 8-210 UE_N1N2_TRANSFER_T3501_TIMER_EXPIRY_RATE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | Over 75% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
Summary | Over 75% of UE N1N2 transfers have T3501 timer expiry before the N1N2 notify is received from AMF for the respective transfer. |
Severity | Critical |
Condition | sum by (namespace) (rate(ue_n1_transfer_t3501_expiry_total[5m])) / sum by (namespace) (rate(ue_n1_transfer_response_total[5m])) * 100 > 75 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.93 |
Metric Used | ue_n1_transfer_t3501_expiry_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.72 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_CRITICAL_THRESHOLD
Table 8-211 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | This alert is triggered when the number of update notify requests that failed because of an error response is equal to or above 70% in a given time period. |
Summary | This alert is triggered when the number of update notify requests that failed because of an error response is equal to or above 70% in a given time period. |
Severity | Critical |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.111 |
Metric Used | ocpm_handle_update_notify_error_response_as_pending_confirmation_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.73 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MAJOR_THRESHOLD
Table 8-212 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | This alert is triggered when the number of update notify requests that failed because of an error response is equal to or above 50% but less than 70% in a given time period. |
Summary | This alert is triggered when the number of update notify requests that failed because of an error response is equal to or above 50% but less than 70% in a given time period. |
Severity | Major |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.111 |
Metric Used | ocpm_handle_update_notify_error_response_as_pending_confirmation_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.74 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MINOR_THRESHOLD
Table 8-213 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_ERROR_RESPONSE_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Description | This alert is triggered when the number of update notify requests that failed because of an error response is equal to or above 30% but less than 50% of total Rx sessions. |
Summary | This alert is triggered when the number of update notify requests that failed because of an error response is equal to or above 30% but less than 50% of total Rx sessions. |
Severity | Minor |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_error_response_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm", responseCode=~"5xx/4xx"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.111 |
Metric Used | ocpm_handle_update_notify_error_response_as_pending_confirmation_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.75 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_CRITICAL_THRESHOLD
Table 8-214 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | This alert is triggered when the number of update notify requests that failed because of a timeout is equal to or above 70% in a given time period. |
Summary | This alert is triggered when the number of update notify requests that failed because of a timeout is equal to or above 70% in a given time period. |
Severity | Critical |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.112 |
Metric Used | ocpm_handle_update_notify_timeout_as_pending_confirmation_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.76 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MAJOR_THRESHOLD
Table 8-215 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | This alert is triggered when the number of update notify requests that failed because of a timeout is equal to or above 50% but less than 70% in a given time period. |
Summary | This alert is triggered when the number of update notify requests that failed because of a timeout is equal to or above 50% but less than 70% in a given time period. |
Severity | Major |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 50 < 70 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.112 |
Metric Used | ocpm_handle_update_notify_timeout_as_pending_confirmation_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.77 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MINOR_THRESHOLD
Table 8-216 RX_PENDING_CONFIRMATION_UPDATE_NOTIFY_TIMEOUT_ABOVE_MINOR_THRESHOLD
Field | Details |
---|---|
Description | This alert is triggered when the number of update notify requests that failed because of a timeout is equal to or above 30% but less than 50% of total Rx sessions. |
Summary | This alert is triggered when the number of update notify requests that failed because of a timeout is equal to or above 30% but less than 50% of total Rx sessions. |
Severity | Minor |
Condition | (sum by (namespace) (rate(ocpm_handle_update_notify_timeout_as_pending_confirmation_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m])) / sum by (namespace) (rate(ocpm_rx_update_notify_request_total{operationType="update_notify",microservice=~".*pcf_sm"}[5m]))) * 100 >= 30 < 50 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.112 |
Metric Used | ocpm_handle_update_notify_timeout_as_pending_confirmation_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.2.78 PCF_STATE_NON_FUNCTIONAL_CRITICAL
Table 8-217 PCF_STATE_NON_FUNCTIONAL_CRITICAL
Field | Details |
---|---|
Description | Policy is in a non-functional state because the DB cluster is down. |
Summary | Policy is in a non-functional state because the DB cluster is down. |
Severity | Critical |
Condition | appinfo_nfDbFunctionalState_current{nfDbFunctionalState="Not_Running"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.102 |
Metric Used | appinfo_nfDbFunctionalState_current |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
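For a state gauge such as this one, the rule needs no ratio or rate window. A minimal sketch, assuming a plain `groups` rule file: the group name, the `for` duration, and the `oid` label key are hypothetical, while the expression, severity, and OID value come from Table 8-217.

```yaml
groups:
  - name: pcf-state-example   # hypothetical group name
    rules:
      - alert: PCF_STATE_NON_FUNCTIONAL_CRITICAL
        # Matches while the gauge reports the Not_Running state.
        expr: 'appinfo_nfDbFunctionalState_current{nfDbFunctionalState="Not_Running"} == 1'
        for: 1m   # assumed hold time to ride out brief flaps; not from the table
        labels:
          severity: critical
          oid: "1.3.6.1.4.1.323.5.3.52.1.2.102"   # label key is an assumption
```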
8.3.3 PCRF Alerts
This section provides information about PCRF alerts.
8.3.3.1 PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD
Table 8-218 PRE_UNREACHABLE_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | PRE fail count exceeds the critical threshold limit. |
Summary | Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | PRE fail count exceeds the critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.9 |
Metric Used | http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.2 PRE_UNREACHABLE_EXCEEDS_MAJOR_THRESHOLD
Table 8-219 PRE_UNREACHABLE_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | PRE fail count exceeds the major threshold limit. |
Summary | Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | PRE fail count exceeds the major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.9 |
Metric Used | http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.3 PRE_UNREACHABLE_EXCEEDS_MINOR_THRESHOLD
Table 8-220 PRE_UNREACHABLE_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | PRE fail count exceeds the minor threshold limit. |
Summary | Alert PRE unreachable NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | PRE fail count exceeds the minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.9 |
Metric Used | http_out_conn_response_total{container="pcrf-core", responseCode!~"2.*", serviceResource="PRE"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.4 PCRF_DOWN
Table 8-221 PCRF_DOWN
Field | Details |
---|---|
Description | PCRF Service is down |
Summary | Alert PCRF_DOWN NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | None of the pods of the PCRF service are available. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.33 |
Metric Used | appinfo_service_running{service=~".*pcrf-core"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.5 CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-222 CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | CCA fail count exceeds the critical threshold limit |
Summary | Alert CCA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The failure rate of CCA messages has exceeded the configured critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.13 |
Metric Used | occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.6 CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-223 CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | CCA fail count exceeds the major threshold limit |
Summary | Alert CCA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The failure rate of CCA messages has exceeded the configured major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.13 |
Metric Used | occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.7 CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-224 CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | CCA fail count exceeds the minor threshold limit |
Summary | Alert CCA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The failure rate of CCA messages has exceeded the configured minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.13 |
Metric Used | occnp_diam_response_local_total{msgType=~"CCA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.8 AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-225 AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | AAA fail count exceeds the critical threshold limit |
Summary | Alert AAA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The failure rate of AAA messages has exceeded the critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.34 |
Metric Used | occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.9 AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-226 AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | AAA fail count exceeds the major threshold limit |
Summary | Alert AAA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The failure rate of AAA messages has exceeded the major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.34 |
Metric Used | occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.10 AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-227 AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | AAA fail count exceeds the minor threshold limit |
Summary | Alert AAA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The failure rate of AAA messages has exceeded the minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.34 |
Metric Used | occnp_diam_response_local_total{msgType=~"AAA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.11 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-228 RAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx fail count exceeds the critical threshold limit |
Summary | Alert RAA_Rx_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The failure rate of RAA Rx messages has exceeded the configured critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.35 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.12 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-229 RAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx fail count exceeds the major threshold limit |
Summary | Alert RAA_Rx_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The failure rate of RAA Rx messages has exceeded the configured major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.35 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.13 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-230 RAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx fail count exceeds the minor threshold limit |
Summary | Alert RAA_Rx_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The failure rate of RAA Rx messages has exceeded the configured minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.35 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.14 RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-231 RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | RAA Gx fail count exceeds the critical threshold limit |
Summary | Alert RAA_GX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The failure rate of RAA Gx messages has exceeded the configured critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.18 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.15 RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-232 RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Gx fail count exceeds the major threshold limit |
Summary | Alert RAA_GX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The failure rate of RAA Gx messages has exceeded the configured major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.18 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.16 RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-233 RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Gx fail count exceeds the minor threshold limit |
Summary | Alert RAA_GX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The failure rate of RAA Gx messages has exceeded the configured minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.18 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.17 ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-234 ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | ASA fail count exceeds the critical threshold limit |
Summary | Alert ASA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The failure rate of ASA messages has exceeded the configured critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.17 |
Metric Used | occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.18 ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-235 ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | ASA fail count exceeds the major threshold limit |
Summary | Alert ASA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The failure rate of ASA messages has exceeded the configured major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.17 |
Metric Used | occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.19 ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-236 ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | ASA fail count exceeds the minor threshold limit |
Summary | Alert ASA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The failure rate of ASA messages has exceeded the configured minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.17 |
Metric Used | occnp_diam_response_local_total{msgType=~"ASA.*", responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.20 STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-237 STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | STA fail count exceeds the critical threshold limit. |
Summary | sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 90 |
Severity | Critical |
Condition | The failure rate of STA messages has exceeded the configured critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.19 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.21 STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-238 STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | STA fail count exceeds the major threshold limit. |
Summary | sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 80 |
Severity | Major |
Condition | The failure rate of STA messages has exceeded the configured major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.19 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.22 STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-239 STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | STA fail count exceeds the minor threshold limit. |
Summary | sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m])) / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 60 |
Severity | Minor |
Condition | The failure rate of STA messages has exceeded the configured minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.19 |
Metric Used | occnp_diam_response_local_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
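The expressions shown in the Summary fields of the three STA tables above can be written as Prometheus alerting rules. The following is a minimal sketch, not the packaged alert rules file: the alert names, expressions, thresholds (90, 80, and 60), and OID are taken from the tables, while the group name, the `for` duration, and the label keys `severity` and `oid` are assumptions made for illustration.

```yaml
groups:
  - name: sta-failure-sketch            # hypothetical group name
    rules:
      - alert: STA_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
        # Percentage of STA responses whose result code does not start with 2,
        # measured over a 5-minute rate window.
        expr: |
          sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m]))
            / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 90
        for: 1m                          # assumed hold time, not stated in the tables
        labels:
          severity: critical             # assumed label key
          oid: "1.3.6.1.4.1.323.5.3.44.1.2.19"
        annotations:
          description: 'STA fail count exceeds the critical threshold limit.'

      - alert: STA_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
        expr: |
          sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m]))
            / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 80
        for: 1m                          # assumed hold time
        labels:
          severity: major
          oid: "1.3.6.1.4.1.323.5.3.44.1.2.19"
        annotations:
          description: 'STA fail count exceeds the major threshold limit.'

      - alert: STA_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
        expr: |
          sum(rate(occnp_diam_response_local_total{msgType="STA", responseCode!~"2.*"}[5m]))
            / sum(rate(occnp_diam_response_local_total{msgType="STA"}[5m])) * 100 > 60
        for: 1m                          # assumed hold time
        labels:
          severity: minor
          oid: "1.3.6.1.4.1.323.5.3.44.1.2.19"
        annotations:
          description: 'STA fail count exceeds the minor threshold limit.'
```

Because each expression has only a lower bound, a failure percentage above 90 satisfies all three conditions at once. Whether the lower-severity alerts are suppressed in that case depends on how Alertmanager is configured in the deployment.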
8.3.3.23 ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-240 ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | ASA timeout count exceeds the critical threshold limit |
Summary | Alert ASA_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The timeout rate of ASA messages has exceeded the configured critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.31 |
Metric Used | occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.24 ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-241 ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | ASA timeout count exceeds the major threshold limit |
Summary | Alert ASA_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The timeout rate of ASA messages has exceeded the configured major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.31 |
Metric Used | occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.25 ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-242 ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | ASA timeout count exceeds the minor threshold limit |
Summary | Alert ASA_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The timeout rate of ASA messages has exceeded the configured minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.31 |
Metric Used | occnp_diam_response_local_total{msgType="ASA", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.26 RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-243 RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | RAA Gx timeout count exceeds the critical threshold limit |
Summary | Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The timeout rate of RAA Gx messages has exceeded the configured critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.32 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.27 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-244 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Gx timeout count exceeds the major threshold limit |
Summary | Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The timeout rate of RAA Gx messages has exceeded the configured major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.32 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.28 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-245 RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Gx timeout count exceeds the minor threshold limit |
Summary | Alert RAA_GX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The timeout rate of RAA Gx messages has exceeded the configured minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.44.1.2.32 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Gx", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.29 RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 8-246 RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx timeout count exceeds the critical threshold limit |
Summary | Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_CRITICAL_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The timeout rate of RAA Rx messages has exceeded the configured critical threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.36 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.30 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 8-247 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx timeout count exceeds the major threshold limit |
Summary | Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_MAJOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The timeout rate of RAA Rx messages has exceeded the configured major threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.36 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.31 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 8-248 RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD
Field | Details |
---|---|
Description | RAA Rx timeout count exceeds the minor threshold limit |
Summary | Alert RAA_RX_TIMEOUT_COUNT_EXCEEDS_MINOR_THRESHOLD NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The timeout rate of RAA Rx messages has exceeded the configured minor threshold limit. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.36 |
Metric Used | occnp_diam_response_local_total{msgType="RAA", appType="Rx", responseCode="timeout"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.32 RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Table 8-249 RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Field | Details |
---|---|
Description | CCA, AAA, RAA, ASA and STA error rate combined is above 10 percent |
Summary | Alert RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 10% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.37 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.33 RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Table 8-250 RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Field | Details |
---|---|
Description | CCA, AAA, RAA, ASA and STA error rate combined is above 5 percent |
Summary | Alert RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 5% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.37 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.34 RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Table 8-251 RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Field | Details |
---|---|
Description | CCA, AAA, RAA, ASA and STA error rate combined is above 1 percent |
Summary | Alert RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The combined failure rate of CCA, AAA, RAA, ASA, and STA messages is more than 1% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.37 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
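The three RESPONSE_ERROR_RATE tables above name the metric and the percentage limits (10, 5, and 1) but do not list the full expression. As a sketch only, the rule below assumes the same ratio form and 5-minute rate window used by the STA failure expressions earlier in this section. The group name, the `for` duration, and the label keys are also assumptions. Refer to the packaged alert rules file for the actual expression.

```yaml
groups:
  - name: response-error-rate-sketch    # hypothetical group name
    rules:
      - alert: RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
        # Assumed form: responses with a result code not starting with 2,
        # as a percentage of all responses counted by the same metric.
        expr: |
          sum(rate(occnp_diam_response_local_total{responseCode!~"2.*"}[5m]))
            / sum(rate(occnp_diam_response_local_total[5m])) * 100 > 10
        for: 1m                          # assumed hold time
        labels:
          severity: critical             # assumed label key
          oid: "1.3.6.1.4.1.323.5.3.36.1.2.37"
        annotations:
          description: 'CCA, AAA, RAA, ASA and STA error rate combined is above 10 percent'
      # Under the same assumptions, the MAJOR and MINOR rules would differ only
      # in the alert name, the severity label, and the limit (5 and 1).
```

Note that a plain `sum(...)` removes every label from the result, so label templates such as `{{ $labels.kubernetes_namespace }}` would render empty with this particular form. Adding a `by (...)` clause to both sides of the division is one way to keep the labels that the Summary text references.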
8.3.3.35 Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Table 8-252 Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Field | Details |
---|---|
Description | Rx error rate combined is above 10 percent |
Summary | Alert Rx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The failure rate of Rx responses is more than 10% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.38 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.36 Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Table 8-253 Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Field | Details |
---|---|
Description | Rx error rate combined is above 5 percent |
Summary | Alert Rx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The failure rate of Rx responses is more than 5% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.38 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.37 Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Table 8-254 Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Field | Details |
---|---|
Description | Rx error rate combined is above 1 percent |
Summary | Alert Rx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The failure rate of Rx responses is more than 1% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.38 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Rx"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.38 Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Table 8-255 Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT
Field | Details |
---|---|
Description | Gx error rate combined is above 10 percent |
Summary | Alert Gx_RESPONSE_ERROR_RATE_ABOVE_CRITICAL_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Critical |
Condition | The failure rate of Gx responses is more than 10% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.39 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.39 Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Table 8-256 Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT
Field | Details |
---|---|
Description | Gx error rate combined is above 5 percent |
Summary | Alert Gx_RESPONSE_ERROR_RATE_ABOVE_MAJOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Major |
Condition | The failure rate of Gx responses is more than 5% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.39 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.40 Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Table 8-257 Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT
Field | Details |
---|---|
Description | Gx error rate combined is above 1 percent |
Summary | Alert Gx_RESPONSE_ERROR_RATE_ABOVE_MINOR_PERCENT NS:{{ $labels.kubernetes_namespace }}, PODNAME:{{ $labels.kubernetes_pod_name }}, INST:{{ $labels.instance }} REL:{{ $labels.release }} |
Severity | Minor |
Condition | The failure rate of Gx responses is more than 1% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.36.1.2.39 |
Metric Used | occnp_diam_response_local_total{ responseCode!~"2.*", appType="Gx"} |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.41 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
Table 8-258 STALE_DIAMETER_REQUEST_CLEANUP_CRITICAL
Field | Details |
---|---|
Description | Diameter requests are being discarded due to timeout processing at a rate of 30% or more. |
Summary | The Diameter requests are being discarded due to timeout processing occurring above 30% |
Severity | Critical |
Condition | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 30 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
Metric Used | occnp_stale_diam_request_cleanup_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.42 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
Table 8-259 STALE_DIAMETER_REQUEST_CLEANUP_MAJOR
Field | Details |
---|---|
Description | Diameter requests are being discarded due to timeout processing at a rate of 20% or more. |
Summary | The Diameter requests are being discarded due to timeout processing occurring above 20% |
Severity | Major |
Condition | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 20 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
Metric Used | occnp_stale_diam_request_cleanup_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
8.3.3.43 STALE_DIAMETER_REQUEST_CLEANUP_MINOR
STALE_DIAMETER_REQUEST_CLEANUP_MINOR
Table 8-260 STALE_DIAMETER_REQUEST_CLEANUP_MINOR
Field | Details |
---|---|
Description | Diameter requests are being discarded due to timeout processing at a rate of 10% or more. |
Summary | The Diameter requests are being discarded due to timeout processing occurring above 10% |
Severity | Minor |
Condition | (sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h])) / sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))) * 100 >= 10 |
OID | 1.3.6.1.4.1.323.5.3.52.1.2.82 |
Metric Used | occnp_stale_diam_request_cleanup_total |
Recommended Actions | For any additional guidance, contact My Oracle Support. |
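The Condition fields of the three STALE_DIAMETER_REQUEST_CLEANUP tables above differ only in the limit they compare against. The sketch below places the minor expression in a Prometheus alerting rule. The expression, description, and OID come from the table, while the group name and the label keys are assumptions for illustration.

```yaml
groups:
  - name: stale-diameter-cleanup-sketch  # hypothetical group name
    rules:
      - alert: STALE_DIAMETER_REQUEST_CLEANUP_MINOR
        # Stale requests cleaned up in the last 24 hours, as a percentage of
        # all locally received requests except DWR and CER, per pod.
        expr: |
          (
            sum by (namespace, microservice, pod) (increase(occnp_stale_diam_request_cleanup_total[24h]))
            /
            sum by (namespace, microservice, pod) (increase(occnp_diam_request_local_total{msgType!~"DWR|CER"}[24h]))
          ) * 100 >= 10
        labels:
          severity: minor                # assumed label key
          oid: "1.3.6.1.4.1.323.5.3.52.1.2.82"
        annotations:
          description: 'The Diameter requests are being discarded due to timeout processing occurring above 10%'
      # The MAJOR and CRITICAL rules use the same expression with >= 20 and >= 30.
```

As a worked example with made-up numbers: if a pod received 50,000 requests (excluding DWR and CER) over 24 hours and 6,000 of them were cleaned up as stale, the ratio is 6,000 / 50,000 * 100 = 12. That satisfies the minor condition (>= 10) but not the major condition (>= 20).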