4 Alert Configuration

This section describes how to configure alert rules for the UDR. The rules are measurement based: the alerting system evaluates metrics reported by UDR microservices against the conditions specified in each rule and issues notifications when those conditions are met. For more information about configuring UDR alerts in Prometheus, see the “Alert Configuration” section in Oracle Communications Cloud Native Core, Unified Data Repository Installation, Upgrade, and Fault Recovery Guide.

4.1 Alert Details

This section describes alerts in detail.

Note:

The maximum ingress request rate considered for the alerts in this section is 1000 requests per second.

Table 4-1 Alert Levels or Severity Types

Alert Levels / Severity Types	Definition
Critical	Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of UDR.
Major	Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the service of UDR.
Minor	Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the service of UDR.
Info or Warn (Informational)	Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of UDR.

The following table lists the alert names for UDR/SLF and the corresponding alert names for EIR.

Table 4-2 Alert names for UDR/SLF and EIR

UDR/SLF	EIR
OcudrTrafficRateAboveMajorThreshold	OceirTrafficRateAboveMajorThreshold
OcudrTrafficRateAboveMinorThreshold	OceirTrafficRateAboveMinorThreshold
OcudrTrafficRateAboveCriticalThreshold	OceirTrafficRateAboveCriticalThreshold
OcudrTransactionErrorRateAbove0.1Percent	OceirTransactionErrorRateAbove0.1Percent
OcudrTransactionErrorRateAbove1Percent	OceirTransactionErrorRateAbove1Percent
OcudrTransactionErrorRateAbove10Percent	OceirTransactionErrorRateAbove10Percent
OcudrTransactionErrorRateAbove25Percent	OceirTransactionErrorRateAbove25Percent
OcudrTransactionErrorRateAbove50Percent	OceirTransactionErrorRateAbove50Percent
OcudrSubscriberNotFoundAbove1Percent	OceirSubscriberNotFoundAbove1Percent
OcudrSubscriberNotFoundAbove10Percent	OceirSubscriberNotFoundAbove10Percent
OcudrSubscriberNotFoundAbove25Percent	OceirSubscriberNotFoundAbove25Percent
OcudrSubscriberNotFoundAbove50Percent	OceirSubscriberNotFoundAbove50Percent
OcudrPodsRestart	OceirPodsRestart
NudrServiceDown	NudrServiceDown
NudrProvServiceDown	NudrProvServiceDown
NudrNotifyServiceServiceDown	NA
NudrNRFClientServiceDown	NudrNRFClientServiceDown
NudrConfigServiceDown	NudrConfigServiceDown
NudrDiameterProxyServiceDown	NudrDiameterProxyServiceDown
NudrOnDemandMigrationServiceDown	NA
OcudrIngressGatewayServiceDown	OceirIngressGatewayServiceDown
OcudrEgressGatewayServiceDown	OceirEgressGatewayServiceDown
OcudrDbServiceDown	OceirDbServiceDown
OcudrXFCCValidationFailureAbove10Percent	OceirXFCCValidationFailureAbove10Percent
OcudrXFCCValidationFailureAbove20Percent	OceirXFCCValidationFailureAbove20Percent
OcudrXFCCValidationFailureAbove50Percent	OceirXFCCValidationFailureAbove50Percent
DRServiceOverload60Percent	DRServiceOverload60Percent
DRServiceOverload75Percent	DRServiceOverload75Percent
DRServiceOverload80Percent	DRServiceOverload80Percent
DRServiceOverload90Percent	DRServiceOverload90Percent
SLFSucessTxnDefaultGroupIdRateAbove1Percent	NA
SLFSucessTxnDefaultGroupIdRateAbove10Percent	NA
SLFSucessTxnDefaultGroupIdRateAbove25Percent	NA
SLFSucessTxnDefaultGroupIdRateAbove50Percent	NA
OcudrDiameterCongestionCongestedState	OceirDiameterCongestionCongestedState
OcudrDiameterCongestionDocState	OceirDiameterCongestionDocState
DRProvServiceOverload60Percent	DRProvServiceOverload60Percent
DRProvServiceOverload75Percent	DRProvServiceOverload75Percent
DRProvServiceOverload80Percent	DRProvServiceOverload80Percent
DRProvServiceOverload90Percent	DRProvServiceOverload90Percent
OcudrIngressGatewayProvServiceDown	OceirIngressGatewayProvServiceDown
OcudrProvisioningTrafficRateAboveMajorThreshold	OceirProvisioningTrafficRateAboveMajorThreshold
OcudrProvisioningTrafficRateAboveCriticalThreshold	OceirProvisioningTrafficRateAboveCriticalThreshold
OcudrProvisioningTransactionErrorRateAbove25Percent	OceirProvisioningTransactionErrorRateAbove25Percent
OcudrProvisioningTransactionErrorRateAbove50Percent	OceirProvisioningTransactionErrorRateAbove50Percent
PVCFullForSLFExport	NA
FailedExtractForSLFExport	NA
BulkImportTransferInFailed	BulkImportTransferInFailed
BulkImportTransferOutFailed	BulkImportTransferOutFailed
ExportToolTransferOutFailed	ExportToolTransferOutFailed
PVCFullForXMLBulkImport	PVCFullForXMLBulkImport
PVCFullForBulkImport	PVCFullForBulkImport
OperationalStatusCompleteShutdown	OperationalStatusCompleteShutdown
NFScoreCalculationFailed	NFScoreCalculationFailed
PVCFullForUDRExport	NA
UDRExportFailed	NA
IngressgatewayPodProtectionDocState	IngressgatewayPodProtectionDocState
IngressgatewayPodProtectionCongestedState	IngressgatewayPodProtectionCongestedState
RetryNotificationRecordsMaxLimitExceeded	RetryNotificationRecordsMaxLimitExceeded
UserAgentHeaderNotFoundMorethan10PercentRequest	NA
EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold	EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold	EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold	EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
NudrDiameterGatewayDown	NudrDiameterGatewayDown
DiameterPeerConnectionsDropped	DiameterPeerConnectionsDropped
IGWSignallingPodProtectionDOCState	NA
IGWSignallingPodProtectionCongestedState	NA
IGWSignallingPodProtectionByRateLimitRejectedRequest	NA

Note:

The following alert details provide only the UDR alert names. For the corresponding EIR alert names, see Table 4-2.


4.1.1 System Level Alerts

This section lists the system level alerts.


4.1.1.1 OcudrSubscriberNotFoundAbove1Percent

Table 4-3 OcudrSubscriberNotFoundAbove1Percent

Field	Details
Description	Total number of subscriber-not-found responses is above 1% of ingress traffic
Summary	Total number of subscriber-not-found responses is above 1% of ingress traffic
Severity	Warning
Condition	Alert if the number of subscriber-not-found responses exceeds 1% of all ingress traffic
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7003 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 (For EIR alert name, see Alert Details)
Metric Used	udr_subscriber_not_found_total
Recommended Actions	The alert is cleared when the number of Subscriber Not Found failures falls below 1% of the total ingress traffic. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.


4.1.1.2 OcudrSubscriberNotFoundAbove10Percent

Table 4-4 OcudrSubscriberNotFoundAbove10Percent

Field	Details
Description	Total number of subscriber-not-found responses is above 10% of ingress traffic
Summary	Total number of subscriber-not-found responses is above 10% of ingress traffic
Severity	Minor
Condition	Alert if the number of subscriber-not-found responses exceeds 10% of all ingress traffic
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7003 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 (For EIR alert name, see Alert Details)
Metric Used	udr_subscriber_not_found_total
Recommended Actions	The alert is cleared when the number of Subscriber Not Found failures falls below 10% of the total ingress traffic. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.


4.1.1.3 OcudrSubscriberNotFoundAbove25Percent

Table 4-5 OcudrSubscriberNotFoundAbove25Percent

Field	Details
Description	Total number of subscriber-not-found responses is above 25% of ingress traffic
Summary	Total number of subscriber-not-found responses is above 25% of ingress traffic
Severity	Major
Condition	Alert if the number of subscriber-not-found responses exceeds 25% of all ingress traffic
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7003 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 (For EIR alert name, see Alert Details)
Metric Used	udr_subscriber_not_found_total
Recommended Actions	The alert is cleared when the number of Subscriber Not Found failures falls below 25% of the total ingress traffic. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.


4.1.1.4 OcudrSubscriberNotFoundAbove50Percent

Table 4-6 OcudrSubscriberNotFoundAbove50Percent

Field	Details
Description	Total number of subscriber-not-found responses is above 50% of ingress traffic
Summary	Total number of subscriber-not-found responses is above 50% of ingress traffic
Severity	Critical
Condition	Alert if the number of subscriber-not-found responses exceeds 50% of all ingress traffic
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7003 EIR: NA
Metric Used	udr_subscriber_not_found_total
Recommended Actions	The alert is cleared when the number of Subscriber Not Found failures falls below 50% of the total ingress traffic. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.


4.1.1.5 OcudrNfStatusUnavailable

Table 4-7 OcudrNfStatusUnavailable

Field	Details
Description	OCUDR services unavailable
Summary	OCUDR services unavailable
Severity	Critical
Condition	This alert is triggered if OCUDR services are unavailable.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7004 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7004
Metric Used	absent(up{app_kubernetes_io_part_of="ocudr",kubernetes_namespace="ocudr"}) or sum(up{app_kubernetes_io_part_of="ocudr",kubernetes_namespace="ocudr"}) == 0
Recommended Actions	The alert is cleared when all the OCUDR services are available. Steps: Check the service-specific metrics to understand the specific service request errors. For example: absent(up{app_kubernetes_io_part_of="ocudr",kubernetes_namespace="ocudr"}) or sum(up{app_kubernetes_io_part_of="ocudr",kubernetes_namespace="ocudr"}) == 0. If guidance is required, contact My Oracle Support.
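
The expression in the Metric Used row combines two cases: no `up` series exists for the namespace at all, or every `up` series reports 0. The following Python sketch mirrors that logic for illustration only. The function name and the mapping of pod names to `up` values are hypothetical and are not part of UDR or Prometheus.

```python
from typing import Mapping


def nf_status_unavailable(up_samples: Mapping[str, int]) -> bool:
    """Mirror `absent(up{...}) or sum(up{...}) == 0`.

    up_samples maps a (hypothetical) target name to its `up` value,
    where 1 means the last scrape succeeded and 0 means it failed.
    """
    if not up_samples:
        # absent(): no matching series exists, so the alert condition holds
        return True
    # sum(...) == 0: series exist, but none of the targets is up
    return sum(up_samples.values()) == 0
```

With this sketch, an empty mapping and a mapping where every value is 0 both evaluate to unavailable, while a single target reporting 1 is enough to keep the condition false.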


4.1.1.6 OcudrPodsRestart

Table 4-8 OcudrPodsRestart

Field	Details
Description	Pod {{$labels.pod}} has restarted.
Summary	namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted
Severity	Major
Condition	Alert if any pod has restarted
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7005 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7005 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7005 (For EIR alert name, see Alert Details)
Metric Used	kube_pod_container_status_restarts_total
Recommended Actions	The alert is cleared automatically when the specific pod is up. Steps: Refer to the application logs on Kibana, filter based on the pod name, and check for database-related failures such as connectivity or Kubernetes secrets. Check the orchestration logs for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <desired full pod name> -n <namespace>. Check the DB status. For more information, see Oracle Communications Cloud Native Core, cnDBTier User Guide. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.
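
The step of noting the pod that is not running can be scripted. The following Python sketch is a minimal illustration that assumes the default table output of `kubectl get po` (a header line followed by NAME, READY, STATUS, RESTARTS, and AGE columns). The function name is hypothetical, and the sketch only parses text that has already been captured.

```python
from typing import List


def pods_not_running(kubectl_get_po_output: str) -> List[str]:
    """Return the names of pods whose STATUS column is not "Running".

    Assumes the default `kubectl get po` table layout: the first line is
    the header and the columns are NAME, READY, STATUS, RESTARTS, AGE.
    """
    names = []
    for line in kubectl_get_po_output.strip().splitlines()[1:]:
        columns = line.split()
        if len(columns) < 3:
            continue  # skip blank or malformed lines
        name, status = columns[0], columns[2]
        if status != "Running":
            names.append(name)
    return names
```

Each returned name can then be passed to `kubectl describe pod`. Pods of finished jobs (status Completed) are also reported by this sketch, so the result may need filtering.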


4.1.1.7 NudrServiceDown

Table 4-9 NudrServiceDown

Field	Details
Description	OCUDR Nudr_DRService {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : DR Service is down
Severity	Critical
Condition	Alert if Nudr-dr service is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7006 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7006 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7006
Metric Used	app_kubernetes_io_name="nudr-drservice"
Recommended Actions	The alert is cleared when the NudrService service is available. Steps: Check the orchestration logs of the appinfo service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the appinfo service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.8 NudrProvServiceDown

Table 4-10 NudrProvServiceDown

Field	Details
Description	OCUDR Nudr_DR_PROVService {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : DR Prov Service is down
Severity	Critical
Condition	Alert if Nudr-dr-prov service is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7016 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7015 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7014
Metric Used	app_kubernetes_io_name="nudr-dr-provservice"
Recommended Actions	The alert is cleared when the NudrProvService service is available. Steps: Check the orchestration logs of the appinfo service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the appinfo service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.9 NudrNotifyServiceServiceDown

Table 4-11 NudrNotifyServiceServiceDown

Field	Details
Description	OCUDR NudrNotifyServiceService {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Nudr Notify Service down.
Severity	Critical
Condition	Alert if Nudr Notify service is down
OID	1.3.6.1.4.1.323.5.3.43.1.2.7016
Metric Used	app_kubernetes_io_name="nudr-notify-service"
Recommended Actions	The alert is cleared when the NotifyService service is available. Steps: Check the orchestration logs of the appinfo service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the appinfo service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.10 NudrNRFClientServiceDown

Table 4-12 NudrNRFClientServiceDown

Field	Details
Description	OCUDR NRFClient service {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NRF Client service down
Severity	Critical
Condition	Alert if Nudr NRF Client service is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7007 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7007 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7007
Metric Used	app_kubernetes_io_name="nrf-client-nfmanagement"
Recommended Actions	The alert is cleared when the NRFClientService service is available. Steps: Check the orchestration logs of the appinfo service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the appinfo service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.11 NudrConfigServiceDown

Table 4-13 NudrConfigServiceDown

Field	Details
Description	OCUDR config service {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : nudr-config service down
Severity	Critical
Condition	Alert if Nudr Config service is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7010 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7008 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7008
Metric Used	app_kubernetes_io_name="nudr-config"
Recommended Actions	The alert is cleared when the ConfigService service is available. Steps: Check the orchestration logs of the appinfo service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the appinfo service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.12 NudrDiameterProxyServiceDown

Table 4-14 NudrDiameterProxyServiceDown

Field	Details
Description	OCUDR diameterproxy service {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : nudr-diameterproxy service is down
Severity	Critical
Condition	Alert if Nudr Diameter Proxy is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7008 SLF: NA EIR: NA
Metric Used	app_kubernetes_io_name="nudr-diameterproxy"
Recommended Actions	The alert is cleared when the DiameterProxyService service is available. Steps: Check the orchestration logs of the appinfo service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the appinfo service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.13 NudrOnDemandMigrationServiceDown

Table 4-15 NudrOnDemandMigrationServiceDown

Field	Details
Description	OCUDR ondemand-migration service {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down
Severity	Critical
Condition	Alert if Nudr On Demand Migration is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7009 SLF: NA EIR: NA
Metric Used	app_kubernetes_io_name="nudr-ondemand-migration"
Recommended Actions	The alert is cleared when the OnDemandMigrationService service is available. Steps: Check the orchestration logs of the appinfo service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the appinfo service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.14 OcudrIngressGatewayServiceDown

Table 4-16 OcudrIngressGatewayServiceDown

Field	Details
Description	OCUDR Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down
Severity	Critical
Condition	Alert if Ingress Service is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7011 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7009 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7009 (For EIR alert name, see Alert Details)
Metric Used	app_kubernetes_io_name="ingressgateway"
Recommended Actions	The alert is cleared when the ingressgateway service is available. Steps: Check the orchestration logs of the ingress-gateway service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the ingress-gateway service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.15 OcudrEgressGatewayServiceDown

Table 4-17 OcudrEgressGatewayServiceDown

Field	Details
Description	OCUDR Egress-Gateway service {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down
Severity	Critical
Condition	Alert if Egress Service is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7012 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7010 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7010 (For EIR alert name, see Alert Details)
Metric Used	app_kubernetes_io_name="egressgateway"
Recommended Actions	The alert is cleared when the egressgateway service is available. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Check the orchestration logs of the egress-gateway service for liveness or readiness probe failures. Run kubectl get po -n <namespace> and note the full name of the pod that is not running, then run kubectl describe pod <specific desired full pod name> -n <namespace>. Refer to the application logs on Kibana and filter based on the egress-gateway service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.1.16 OcudrDbServiceDown

Table 4-18 OcudrDbServiceDown

Field	Details
Description	MySQL connectivity service is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : MySQL connectivity service down
Severity	Critical
Condition	Alert if MySQL connectivity is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7013 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7011 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7011 (For EIR alert name, see Alert Details)
Metric Used	appinfo_service_running
Recommended Actions	This alert clears when the microservice nudr-drservice is up and running.


4.1.1.17 OcudrIngressGatewayProvServiceDown

Table 4-19 OcudrIngressGatewayProvServiceDown

Field	Details
Description	OCUDR Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down
Summary	namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down
Severity	Critical
Condition	Alert if Ingressgateway-prov service is down
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7019 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7017 EIR: NA
Metric Used	app_kubernetes_io_name="ingressgateway-prov"
Recommended Actions	The alert is cleared when the ingress-gateway service is available. Steps: Check the orchestration logs of the ingress-gateway service for liveness or readiness probe failures using the following command: `kubectl get po -n <namespace>` Note the full name of the pod that is not running and use it in the following command: `kubectl describe pod <specific desired full pod name> -n <namespace>` Refer to the application logs on Kibana and filter based on the ingress-gateway service names. Check for ERROR and WARNING logs related to thread exceptions. Depending on the failure reason, take the resolution steps. If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.


4.1.2 Application Level Alerts

This section lists the application level alerts.


4.1.2.1 OcudrSignallingTrafficRateAboveMajorThreshold

Table 4-20 OcudrSignallingTrafficRateAboveMajorThreshold

Field	Details
Description	Ingress traffic rate is above the major threshold, that is, 900 requests per second
Summary	Traffic Rate is above 90 Percent of Max requests per second (1000)
Severity	Major
Condition	Alert if Ingress traffic reaches 90% of max TPS
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7001 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7001 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7001 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_requests_total
Recommended Actions	The alert is cleared when the Ingress Traffic rate falls below the Major threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and do the following: Refer to Grafana to determine which service is receiving high traffic. Refer to the Ingress Gateway section in Grafana to determine whether 4xx and 5xx error codes have increased. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.


4.1.2.2 OcudrSignallingTrafficRateAboveMinorThreshold

Table 4-21 OcudrSignallingTrafficRateAboveMinorThreshold

Field	Details
Description	Ingress traffic rate is above the minor threshold, that is, 800 requests per second
Summary	Traffic rate is above 80 Percent of Max requests per second (1000)
Severity	Minor
Condition	Alert if Ingress traffic reaches 80% of max TPS
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7001 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7001 EIR: NA
Metric Used	oc_ingressgateway_http_requests_total
Recommended Actions	The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the OcudrTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and do the following: Refer to Grafana to determine which service is receiving high traffic. Refer to the Ingress Gateway section in Grafana to determine whether 4xx and 5xx error codes have increased. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.


4.1.2.3 OcudrSignallingTrafficRateAboveCriticalThreshold

Table 4-22 OcudrSignallingTrafficRateAboveCriticalThreshold

Field	Details
Description	Ingress traffic rate is above the critical threshold, that is, 950 requests per second
Summary	Traffic Rate is above 95 Percent of Max requests per second (1000)
Severity	Critical
Condition	Alert if Ingress traffic reaches 95% of max TPS
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7001 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7001 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7001 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_requests_total
Recommended Actions	The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and do the following: Refer to Grafana to determine which service is receiving high traffic. Refer to the Ingress Gateway section in Grafana to determine whether 4xx and 5xx error codes have increased. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.
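
The three traffic-rate alerts use thresholds of 80%, 90%, and 95% of the maximum ingress rate of 1000 requests per second. The following Python sketch shows how those percentages translate into request rates and severities. It is an illustration only: the function and constant names are hypothetical, and the use of "greater than or equal to" follows the wording of the Condition rows rather than the deployed rule expressions in UDR_Alertrules.yaml.

```python
from typing import Optional

MAX_INGRESS_TPS = 1000  # maximum ingress rate considered in this section

# (percent of maximum TPS, severity), most severe first
TRAFFIC_THRESHOLDS = [(95, "Critical"), (90, "Major"), (80, "Minor")]


def traffic_rate_severity(requests_per_second: float,
                          max_tps: int = MAX_INGRESS_TPS) -> Optional[str]:
    """Return the most severe traffic-rate level reached, or None."""
    for percent, severity in TRAFFIC_THRESHOLDS:
        # 80% of 1000 is 800, 90% is 900, and 95% is 950 requests/second
        if requests_per_second >= percent * max_tps / 100:
            return severity
    return None
```

If the thresholds are changed in UDR_Alertrules.yaml, the percentages in this sketch would need to change with them.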

4.1.2.4 OcudrSignallingTransactionErrorRateAbove0.1Percent

Table 4-23 OcudrSignallingTransactionErrorRateAbove0.1Percent

Field	Details
Description	Transaction error rate is above 0.1 Percent of Total Transactions
Summary	Transaction Error Rate detected above 0.1 Percent of Total Transactions
Severity	Warning
Condition	Alert if the rate of all errors exceeds 0.1% of the total transactions
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 SLF: NA EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_responses_total
Recommended Actions	The alert is cleared when the number of failed transactions falls below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the OcudrSignallingTransactionErrorRateAbove1Percent alert is raised. Steps: Check the metrics per service and per method. For example, discovery requests can be deduced from the following metrics: Metrics="oc_ingressgateway_http_responses_total" Method="GET" Status="503 SERVICE_UNAVAILABLE". If guidance is required, contact My Oracle Support.
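
The per-method check described in the steps above can be sketched as a simple filter over labeled samples. This is only an illustration: the sample values are made up, the function name is hypothetical, and the label names Method and Status are taken from the example in the Recommended Actions row.

```python
def count_responses(samples, method, status):
    """Sum the sample values whose Method and Status labels match."""
    return sum(
        value
        for labels, value in samples
        if labels.get("Method") == method and labels.get("Status") == status
    )


# Hypothetical samples of oc_ingressgateway_http_responses_total: (labels, value).
samples = [
    ({"Method": "GET", "Status": "200 OK"}, 9500),
    ({"Method": "GET", "Status": "503 SERVICE_UNAVAILABLE"}, 40),
    ({"Method": "PUT", "Status": "503 SERVICE_UNAVAILABLE"}, 7),
]

print(count_responses(samples, "GET", "503 SERVICE_UNAVAILABLE"))  # 40
```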

4.1.2.5 OcudrSignallingTransactionErrorRateAbove1Percent

Table 4-24 OcudrSignallingTransactionErrorRateAbove1Percent

Field	Details
Description	Transaction error rate is above 1 Percent of Total Transactions
Summary	Transaction Error Rate detected above 1 Percent of Total Transactions
Severity	Warning
Condition	Alert if the rate of all errors exceeds 1% of the total transactions
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 SLF: NA EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_responses_total
Recommended Actions	The alert is cleared when the number of failed transactions falls below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the OcudrSignallingTransactionErrorRateAbove10Percent alert is raised. Steps: Check the metrics per service and per method. For example, discovery requests can be deduced from the following metrics: Metrics="oc_ingressgateway_http_responses_total" Method="GET" Status="503 SERVICE_UNAVAILABLE". If guidance is required, contact My Oracle Support.

4.1.2.6 OcudrSignallingTransactionErrorRateAbove10Percent

Table 4-25 OcudrSignallingTransactionErrorRateAbove10Percent

Field	Details
Description	Transaction error rate is above 10 Percent of Total Transactions
Summary	Transaction Error Rate detected above 10 Percent of Total Transactions
Severity	Minor
Condition	Alert if the rate of all errors exceeds 10% of the total transactions
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 SLF: NA EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_responses_total
Recommended Actions	The alert is cleared when the number of failed transactions falls below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the OcudrSignallingTransactionErrorRateAbove25Percent alert is raised. Steps: Check the metrics per service and per method. For example, discovery requests can be deduced from the following metrics: Metrics="oc_ingressgateway_http_responses_total" Method="GET" Status="503 SERVICE_UNAVAILABLE". If guidance is required, contact My Oracle Support.

4.1.2.7 OcudrSignallingTransactionErrorRateAbove25Percent

Table 4-26 OcudrSignallingTransactionErrorRateAbove25Percent

Field	Details
Description	Transaction Error Rate detected above 25 Percent of Total Transactions
Summary	Transaction Error Rate detected above 25 Percent of Total Transactions
Severity	Major
Condition	Alert if the rate of all errors exceeds 25% of the total transactions
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7002 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_responses_total
Recommended Actions	The alert is cleared when the number of failed transactions falls below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the OcudrSignallingTransactionErrorRateAbove50Percent alert is raised. Steps: Check the metrics per service and per method. For example, discovery requests can be deduced from the following metrics: Metrics="oc_ingressgateway_http_responses_total" Method="GET" Status="503 SERVICE_UNAVAILABLE". If guidance is required, contact My Oracle Support.

4.1.2.8 OcudrSignallingTransactionErrorRateAbove50Percent

Table 4-27 OcudrSignallingTransactionErrorRateAbove50Percent

Field	Details
Description	Transaction Error Rate detected above 50 Percent of Total Transactions
Summary	Transaction Error Rate detected above 50 Percent of Total Transactions
Severity	Critical
Condition	Alert if the rate of all errors exceeds 50% of the total transactions
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7002 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_responses_total
Recommended Actions	The alert is cleared when the number of failed transactions falls below 50% of the total transactions. Steps: Check the metrics per service and per method. For example, discovery requests can be deduced from the following metrics: Metrics="oc_ingressgateway_http_responses_total" Method="GET" Status="503 SERVICE_UNAVAILABLE". If guidance is required, contact My Oracle Support.
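
The signalling transaction error rate alerts form a ladder of thresholds, where a lower alert is cleared once the next threshold is crossed. The following sketch, with hypothetical function names, shows one way to work out which alert in the ladder applies to a given failure count. It assumes a strict "exceeds" comparison and uses the thresholds and severities listed in the tables of this section.

```python
# (threshold in percent, alert name, severity) taken from the tables above.
ERROR_RATE_TIERS = [
    (0.1, "OcudrSignallingTransactionErrorRateAbove0.1Percent", "Warning"),
    (1, "OcudrSignallingTransactionErrorRateAbove1Percent", "Warning"),
    (10, "OcudrSignallingTransactionErrorRateAbove10Percent", "Minor"),
    (25, "OcudrSignallingTransactionErrorRateAbove25Percent", "Major"),
    (50, "OcudrSignallingTransactionErrorRateAbove50Percent", "Critical"),
]


def error_rate_percent(failed, total):
    """Failed transactions as a percentage of all transactions."""
    return 100 * failed / total if total else 0.0


def active_alert(failed, total):
    """Return (alert name, severity) for the highest threshold exceeded."""
    rate = error_rate_percent(failed, total)
    active = None
    for threshold, name, severity in ERROR_RATE_TIERS:
        if rate > threshold:
            active = (name, severity)
    return active


print(active_alert(300, 1000))
# ('OcudrSignallingTransactionErrorRateAbove25Percent', 'Major')
```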

4.1.2.9 OcudrXFCCValidationFailureAbove10Percent

Table 4-28 OcudrXFCCValidationFailureAbove10Percent

Field	Details
Description	Total number of responses with XFCC validation failure is above 10% of ingress traffic
Summary	Total number of responses with XFCC validation failure is above 10% of ingress traffic
Severity	Minor
Condition	Alert if XFCC validation failures exceed 10% of the total XFCC validations
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7014 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7012 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7012 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_xfcc_header_validate_total
Recommended Actions	The alert is cleared when the number of XFCC validation failures falls below 10% of the total. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.10 OcudrXFCCValidationFailureAbove20Percent

Table 4-29 OcudrXFCCValidationFailureAbove20Percent

Field	Details
Description	Total number of responses with XFCC validation failure is above 20% of ingress traffic
Summary	Total number of responses with XFCC validation failure is above 20% of ingress traffic
Severity	Major
Condition	Alert if XFCC validation failures exceed 20% of the total XFCC validations
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7014 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7012 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7012 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_xfcc_header_validate_total
Recommended Actions	The alert is cleared when the number of XFCC validation failures falls below 20% of the total. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.11 OcudrXFCCValidationFailureAbove50Percent

Table 4-30 OcudrXFCCValidationFailureAbove50Percent

Field	Details
Description	Total number of responses with XFCC validation failure is above 50% of ingress traffic
Summary	Total number of responses with XFCC validation failure is above 50% of ingress traffic
Severity	Critical
Condition	Alert if XFCC validation failures exceed 50% of the total XFCC validations
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7014 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7012 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7012 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_xfcc_header_validate_total
Recommended Actions	The alert is cleared when the number of XFCC validation failures falls below 50% of the total. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.12 DRServiceOverload60Percent

Table 4-31 DRServiceOverload60Percent

Field	Details
Description	This alert is fired when the application reaches the Warn overload level
Summary	This alert is fired when the application reaches the Warn overload level
Severity	Warning
Condition	Alert if the application overload level reaches 60%
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7015 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7013 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7013
Metric Used	load_level
Recommended Actions	This alert is cleared when the incoming traffic is reduced to below the Warn level. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.13 DRServiceOverload75Percent

Table 4-32 DRServiceOverload75Percent

Field	Details
Description	This alert is fired when the application reaches the Minor overload level
Summary	This alert is fired when the application reaches the Minor overload level
Severity	Minor
Condition	Alert if the application overload level reaches 75%
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7015 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7013 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7013
Metric Used	load_level
Recommended Actions	This alert is cleared when the incoming traffic is reduced to below the Minor level. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.14 DRServiceOverload80Percent

Table 4-33 DRServiceOverload80Percent

Field	Details
Description	This alert is fired when the application reaches the Major overload level
Summary	This alert is fired when the application reaches the Major overload level
Severity	Major
Condition	Alert if the application overload level reaches 80%
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7015 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7013 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7013
Metric Used	load_level
Recommended Actions	This alert is cleared when the incoming traffic is reduced to below the Major level. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.15 DRServiceOverload90Percent

Table 4-34 DRServiceOverload90Percent

Field	Details
Description	This alert is fired when the application reaches the Critical overload level
Summary	This alert is fired when the application reaches the Critical overload level
Severity	Critical
Condition	Alert if the application overload level reaches 90%
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7015 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7013 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7013
Metric Used	load_level
Recommended Actions	This alert is cleared when the incoming traffic is reduced to below the Critical level. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.
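
The four DRServiceOverload alerts correspond to the overload percentages 60, 75, 80, and 90. As a rough sketch, the lookup from a percentage to the matching alert can be written with a sorted threshold list. The function name is hypothetical, and the sketch assumes the overload level is available as a plain percentage, which may not match how the load_level metric is actually encoded.

```python
from bisect import bisect_right

# Overload percentages and severities listed in the DRServiceOverload tables.
OVERLOAD_THRESHOLDS = [60, 75, 80, 90]
SEVERITIES = ["Warning", "Minor", "Major", "Critical"]


def overload_alert(overload_percent):
    """Return (alert name, severity) for the highest threshold reached."""
    index = bisect_right(OVERLOAD_THRESHOLDS, overload_percent)
    if index == 0:
        return None  # below the lowest threshold, no overload alert
    threshold = OVERLOAD_THRESHOLDS[index - 1]
    return f"DRServiceOverload{threshold}Percent", SEVERITIES[index - 1]


print(overload_alert(78))  # ('DRServiceOverload75Percent', 'Minor')
```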

4.1.2.16 SLFSucessTxnDefaultGroupIdRateAbove1Percent

Table 4-35 SLFSucessTxnDefaultGroupIdRateAbove1Percent

Field	Details
Description	Transaction Error Rate detected above 1 Percent of Total Transactions
Summary	Transaction Error rate is above 1 Percent of Total Transactions
Severity	Warning
Condition	Alert if number of SLF Lookup requests responded with default Group ID exceeds 1% of the total responses.
OID	UDR: NA SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7014 EIR: NA
Metric Used	slf_sucess_txn_default_grp_id_total
Recommended Actions	This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received in Lookup requests and make sure that requests for unexpected, out-of-range subscribers are avoided.

4.1.2.17 SLFSucessTxnDefaultGroupIdRateAbove10Percent

Table 4-36 SLFSucessTxnDefaultGroupIdRateAbove10Percent

Field	Details
Description	Transaction Error Rate detected above 10 Percent of Total Transactions
Summary	Transaction Error rate is above 10 Percent of Total Transactions
Severity	Minor
Condition	Alert if number of SLF Lookup requests responded with default Group ID exceeds 10% of the total responses.
OID	UDR: NA SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7014 EIR: NA
Metric Used	slf_sucess_txn_default_grp_id_total
Recommended Actions	This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received in Lookup requests and make sure that requests for unexpected, out-of-range subscribers are avoided.

4.1.2.18 SLFSucessTxnDefaultGroupIdRateAbove25Percent

Table 4-37 SLFSucessTxnDefaultGroupIdRateAbove25Percent

Field	Details
Description	Transaction Error Rate detected above 25 Percent of Total Transactions
Summary	Transaction Error rate is above 25 Percent of Total Transactions
Severity	Major
Condition	Alert if number of SLF Lookup requests responded with default Group ID exceeds 25% of the total responses.
OID	UDR: NA SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7014 EIR: NA
Metric Used	slf_sucess_txn_default_grp_id_total
Recommended Actions	This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received in Lookup requests and make sure that requests for unexpected, out-of-range subscribers are avoided.

4.1.2.19 SLFSucessTxnDefaultGroupIdRateAbove50Percent

Table 4-38 SLFSucessTxnDefaultGroupIdRateAbove50Percent

Field	Details
Description	Transaction Error Rate detected above 50 Percent of Total Transactions
Summary	Transaction Error rate is above 50 Percent of Total Transactions
Severity	Critical
Condition	Alert if number of SLF Lookup requests responded with default Group ID exceeds 50% of the total responses.
OID	UDR: NA SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7014 EIR: NA
Metric Used	slf_sucess_txn_default_grp_id_total
Recommended Actions	This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received in Lookup requests and make sure that requests for unexpected, out-of-range subscribers are avoided.

4.1.2.20 OcudrDiameterCongestionCongestedState

Table 4-39 OcudrDiameterCongestionCongestedState

Field	Details
Description	Alert will be raised if the diameter gateway pod is in CONGESTED state.
Summary	Alert will be raised if the diameter gateway pod is in CONGESTED state.
Severity	Critical
Condition	Alert will be raised if the diameter gateway pod is in CONGESTED state.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7018 SLF: NA EIR: NA
Metric Used	ocudr_pod_congestion_state == 2
Recommended Actions	This alert is raised when the Diameter Gateway pod congestion level is set to the CONGESTED state. Steps: Reduce the traffic load or allocate resources that match the performance requirements. Check the pod congestion configurations and resource limits in CNC Console.

4.1.2.21 OcudrDiameterCongestionDocState

Table 4-40 OcudrDiameterCongestionDocState

Field	Details
Description	Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state.
Summary	Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state.
Severity	Major
Condition	Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7018 SLF: NA EIR: NA
Metric Used	ocudr_pod_congestion_state == 1
Recommended Actions	This alert is raised when the Diameter Gateway pod congestion level is set to the Danger of Congestion (DOC) state. Steps: Reduce the traffic load or allocate resources that match the performance requirements. Check the pod congestion configurations and resource limits in CNC Console.
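
The two Diameter congestion alerts differ only in the value of ocudr_pod_congestion_state they match: 1 for DOC and 2 for CONGESTED. A minimal sketch of that mapping follows. The function name is hypothetical, and treating every other value as "no alert" is an assumption, since this section does not describe the remaining states.

```python
# State values and severities as listed in the two Diameter congestion tables.
CONGESTION_ALERTS = {
    1: ("OcudrDiameterCongestionDocState", "Major"),
    2: ("OcudrDiameterCongestionCongestedState", "Critical"),
}


def congestion_alert(state):
    """Return (alert name, severity) for a congestion state value, or None."""
    return CONGESTION_ALERTS.get(state)


print(congestion_alert(2))  # ('OcudrDiameterCongestionCongestedState', 'Critical')
```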

4.1.2.22 DRProvServiceOverload60Percent

Table 4-41 DRProvServiceOverload60Percent

Field	Details
Description	This alert is fired when the application reaches the Warn overload level
Summary	This alert is fired when the application reaches the Warn overload level
Severity	Warning
Condition	Alert if the application overload level reaches 60%
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7017 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7016 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7015
Metric Used	load_level
Recommended Actions	This alert is cleared when the incoming traffic is reduced to below the Warn level. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.23 DRProvServiceOverload75Percent

Table 4-42 DRProvServiceOverload75Percent

Field	Details
Description	This alert is fired when the application reaches the Minor overload level
Summary	This alert is fired when the application reaches the Minor overload level
Severity	Minor
Condition	Alert if the application overload level reaches 75%
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7017 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7016 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7015
Metric Used	load_level
Recommended Actions	This alert is cleared when the incoming traffic is reduced to below the Minor level. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.24 DRProvServiceOverload80Percent

Table 4-43 DRProvServiceOverload80Percent

Field	Details
Description	This alert is fired when the application reaches the Major overload level
Summary	This alert is fired when the application reaches the Major overload level
Severity	Major
Condition	Alert if the application overload level reaches 80%
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7017 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7016 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7015
Metric Used	load_level
Recommended Actions	This alert is cleared when the incoming traffic is reduced to below the Major level. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.25 DRProvServiceOverload90Percent

Table 4-44 DRProvServiceOverload90Percent

Field	Details
Description	This alert is fired when the application reaches the Critical overload level
Summary	This alert is fired when the application reaches the Critical overload level
Severity	Critical
Condition	Alert if the application overload level reaches 90%
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7017 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7016 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7015
Metric Used	load_level
Recommended Actions	This alert is cleared when the incoming traffic is reduced to below the Critical level. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. If guidance is required, contact My Oracle Support.

4.1.2.26 OcudrProvisioningTrafficRateAboveMajorThreshold

Table 4-45 OcudrProvisioningTrafficRateAboveMajorThreshold

Field	Details
Description	Ingress traffic rate is above the major threshold, that is, 900 requests per second
Summary	Traffic rate is above 90 percent of max requests per second (1000)
Severity	Major
Condition	Alert if Ingress traffic reaches 90% of max TPS
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7020 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7018 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7017 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_requests_total
Recommended Actions	The alert is cleared when the total Ingress traffic rate falls below the Major threshold or when the total traffic rate exceeds the Critical threshold, in which case the OcudrProvisioningTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in `UDR_Alertrules.yaml`. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a geo-redundancy scenario). If this is unexpected, contact My Oracle Support. Refer to Grafana to determine the service that is receiving high traffic. Refer to the Ingress Gateway section in Grafana to determine whether 4xx and 5xx error codes have increased. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.

4.1.2.27 OcudrProvisioningTrafficRateAboveCriticalThreshold

Table 4-46 OcudrProvisioningTrafficRateAboveCriticalThreshold

Field	Details
Description	Ingress traffic rate is above the critical threshold, that is, 950 requests per second
Summary	Traffic rate is above 95 percent of max requests per second (1000)
Severity	Critical
Condition	Alert if Ingress traffic reaches 95% of max TPS
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7020 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7018 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7017 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_requests_total
Recommended Actions	The alert is cleared when the Ingress traffic rate falls below the Critical threshold. Note: The threshold is configurable in `UDR_Alertrules.yaml`. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a geo-redundancy scenario). If this is unexpected, contact My Oracle Support. Refer to Grafana to determine the service that is receiving high traffic. Refer to the Ingress Gateway section in Grafana to determine whether 4xx and 5xx error codes have increased. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.

4.1.2.28 OcudrProvisioningTransactionErrorRateAbove25Percent

Table 4-47 OcudrProvisioningTransactionErrorRateAbove25Percent

Field	Details
Description	Transaction Error Rate detected above 25 Percent of Total Transactions
Summary	Transaction Error Rate detected above 25 Percent of Total Transactions
Severity	Major
Condition	Alert if the rate of all errors exceeds 25% of the total transactions
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7021 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7019 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7018 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_responses_total
Recommended Actions	The alert is cleared when the number of failed transactions falls below 25% of the total transactions or when the number of failed transactions exceeds the 50% threshold, in which case the OcudrProvisioningTransactionErrorRateAbove50Percent alert is raised. Steps: Check the metrics per service and per method. For example, discovery requests can be deduced from the following metrics: Metrics="oc_ingressgateway_http_responses_total" Method="GET" Status="503 SERVICE_UNAVAILABLE". If guidance is required, contact My Oracle Support.

4.1.2.29 OcudrProvisioningTransactionErrorRateAbove50Percent

Table 4-48 OcudrProvisioningTransactionErrorRateAbove50Percent

Field	Details
Description	Transaction Error Rate detected above 50 Percent of Total Transactions
Summary	Transaction Error Rate detected above 50 Percent of Total Transactions
Severity	Critical
Condition	Alert if the rate of all errors exceeds 50% of the total transactions
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7021 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7019 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7018 (For EIR alert name, see Alert Details)
Metric Used	oc_ingressgateway_http_responses_total
Recommended Actions	The alert is cleared when the number of failed transactions falls below 50% of the total transactions. Steps: Check the metrics per service and per method. For example, discovery requests can be deduced from the following metrics: Metrics="oc_ingressgateway_http_responses_total" Method="GET" Status="503 SERVICE_UNAVAILABLE". If guidance is required, contact My Oracle Support.

4.1.2.30 PVCFullForSLFExport

Table 4-49 PVCFullForSLFExport

Field	Details
Description	Storage for Export tool is full
Summary	Storage for Export tool is full
Severity	Critical
Condition	Alert if the PVC allocated for the export tool dump path is full
OID	UDR: NA SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7020 EIR: NA
Metric Used	export_tool_full_usage
Recommended Actions	The alert is cleared when the PVC usage is optimized. Configure maxDumps to a lower value to clear old dumps. Remove old dumps, if any, from the export tool container.

4.1.2.31 FailedExtractForSLFExport

Table 4-50 FailedExtractForSLFExport

Field	Details
Description	Export tool job has failed
Summary	Export tool job has failed
Severity	Critical
Condition	Alert if the export operation fails
OID	UDR: NA SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7021 EIR: NA
Metric Used	export_failure
Recommended Actions	Check logs for failure. The alert will be cleared when the export job succeeds next time.

4.1.2.32 BulkImportTransferInFailed

Table 4-51 BulkImportTransferInFailed

Field	Details
Description	Transfer-in failed for bulk import
Summary	Transfer-in failed for bulk import
Severity	Major
Condition	Alert will be raised if Transfer-In failed from Remote to PVC
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7022 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7022 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7019
Metric Used	bulkimport_transfer_in_status
Recommended Actions	This alert is cleared when the transfer-in for bulk import succeeds. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. Contact My Oracle Support if guidance is required.

4.1.2.33 ExportToolTransferOutFailed

Table 4-52 ExportToolTransferOutFailed

Field	Details
Description	Transfer-out failed for export-tool
Summary	Transfer-out failed for export-tool
Severity	Major
Condition	Alert will be raised if Transfer-Out failed from PVC to Remote
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7024 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7024 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7021
Metric Used	sftp_transfer_status
Recommended Actions	This alert is cleared when the transfer-out from the export tool succeeds. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. Contact My Oracle Support if guidance is required.

4.1.2.34 BulkImportTransferOutFailed

Table 4-53 BulkImportTransferOutFailed

Field	Details
Description	Transfer-out failed for bulk import
Summary	Transfer-out failed for bulk import
Severity	Major
Condition	Alert will be raised if Transfer-Out failed from PVC to Remote
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7023 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7023 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7020
Metric Used	bulkimport_transfer_out_status
Recommended Actions	This alert is cleared when the transfer-out for bulk import succeeds. Steps: Check the service-specific metrics to understand the specific service request errors, for example, udr_rest_failure_response_total. Contact My Oracle Support if guidance is required.

4.1.2.35 PVCFullForXMLBulkImport

Table 4-54 PVCFullForXMLBulkImport

Field	Details
Description	Storage for XML Bulk Import tool is full
Summary	Storage for XML Bulk Import tool is full
Severity	Critical
Condition	Alert will be raised if the PVC is full for xml-csv container
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7025 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7025 EIR: NA
Metric Used	nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-xmltocsv",kubernetes_namespace="ocudr"}==1
Recommended Actions	This alert will be cleared when the PVC is back to normal. Steps: Check the service specific metrics to understand the specific service request errors. For example, udr_rest_failure_response_total. Contact My Oracle Support, if guidance is required.

4.1.2.36 PVCFullForBulkImport

Table 4-55 PVCFullForBulkImport

Field	Details
Description	Storage for Bulk Import tool is full
Summary	Storage for Bulk Import tool is full
Severity	Critical
Condition	Alert will be raised if the PVC is full for bulk import container
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7026 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7026 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7025
Metric Used	nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-bulk-import",kubernetes_namespace="ocudr"}==1
Recommended Actions	This alert will be cleared when the PVC is back to normal. Steps: Check the service specific metrics to understand the specific service request errors. For example, udr_rest_failure_response_total. Contact My Oracle Support, if guidance is required.

4.1.2.37 OperationalStatusCompleteShutdown

Table 4-56 OperationalStatusCompleteShutdown

Field	Details
Description	Operational state is complete shutdown
Summary	Operational state is complete shutdown
Severity	Critical
Condition	Alert will be raised if the operational state of the UDR, SLF, or EIR is COMPLETE_SHUTDOWN
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7027 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7027 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7026
Metric Used	nudr_config_operational_status{kubernetes_namespace="ocudr"}==1
Recommended Actions	This alert will be cleared when the operational status is back to normal. Steps: Check the service specific metrics to understand the specific service request errors. For example, udr_rest_failure_response_total. Contact My Oracle Support, if guidance is required.

4.1.2.38 NFScoreCalculationFailed

Table 4-57 NFScoreCalculationFailed

Field	Details
Description	NFScoreCalculationFailed
Summary	NFScoreCalculationFailed
Severity	Major
Condition	Alert is raised if the NF Score calculation fails for any of the scoring factors
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7028 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7028 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7027
Metric Used	nfscore{kubernetes_namespace="ocudr",factor=~"successTPS|signallingConnections|serviceHealth|replicationHealth|localityPreference|bulkImport|bulkExport",calculatedStatus="failed"}
Recommended Actions	This alert is cleared when the NF score calculation is successful. Steps: Check the service specific metrics to understand the specific service request errors. For example, udr_rest_failure_response_total. Contact My Oracle Support, if guidance is required.
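
The factor selector in the Metric Used row is a regular expression that lists the scoring factors the alert watches. Prometheus regex matchers are anchored to the whole label value, so a label matches only if it equals one of the listed names. The following sketch imitates that selection in Python with `re.fullmatch`. The function name and the sample data are hypothetical, including the "success" status value.

```python
import re

# Alternation of scoring factors, as listed in the Metric Used row.
FACTOR_PATTERN = re.compile(
    "successTPS|signallingConnections|serviceHealth|replicationHealth"
    "|localityPreference|bulkImport|bulkExport"
)


def failed_factors(samples):
    """Return the sorted factor names whose score calculation failed."""
    return sorted(
        labels["factor"]
        for labels in samples
        if FACTOR_PATTERN.fullmatch(labels.get("factor", ""))
        and labels.get("calculatedStatus") == "failed"
    )


# Hypothetical label sets of nfscore samples.
samples = [
    {"factor": "serviceHealth", "calculatedStatus": "failed"},
    {"factor": "bulkImport", "calculatedStatus": "success"},
    {"factor": "bulkImportExtra", "calculatedStatus": "failed"},
    {"factor": "successTPS", "calculatedStatus": "failed"},
]

print(failed_factors(samples))  # ['serviceHealth', 'successTPS']
```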

4.1.2.39 PVCFullForUDRExport

Table 4-58 PVCFullForUDRExport

Field	Details
Description	Storage for Export tool is full
Summary	Storage for Export tool is full
Severity	Critical
Condition	Alert is raised if PVC allocated for export tool dump path is full.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7030 SLF: NA EIR: NA
Metric Used	export_tool_full_usage{namespace="ocudr"}==1
Recommended Actions	Alert is cleared when the PVC usage is optimized. You must configure maxDumps to a lower value to clear old dumps. Steps: If present, remove the old dumps from the export tool container.

4.1.2.40 UDRExportFailed

Table 4-59 UDRExportFailed

Field	Details
Description	Export tool job has failed
Summary	Export tool job has failed
Severity	Critical
Condition	Alert is raised if the export operation fails for UDR Mode
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7031 SLF: NA EIR: NA
Metric Used	export_failure{namespace="ocudr"}== 1
Recommended Actions	Check the export tool logs for the cause of the failure. The alert is cleared when the next export job succeeds.

4.1.2.41 IngressgatewayPodProtectionDocState

Table 4-60 IngressgatewayPodProtectionDocState

Field	Details
Description	Ingress congestion in DOC state
Summary	Ingress congestion in DOC state
Severity	Critical
Condition	Alert is raised if ingress congestion is in DOC state.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7032 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7029 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7028
Metric Used	oc_ingressgateway_pod_congestion_state{namespace="ocudr"}==1
Recommended Actions	This alert will be cleared when the ingress gateway returns to the normal state. Steps: Check the service-specific metrics to understand the specific service request errors. For example, udr_rest_failure_response_total. Contact My Oracle Support, if guidance is required.

4.1.2.42 IngressgatewayPodProtectionCongestedState

Table 4-61 IngressgatewayPodProtectionCongestedState

Field	Details
Description	Ingress congestion in Congested state
Summary	Ingress congestion in Congested state
Severity	Critical
Condition	Alert is raised if ingress congestion is in congested state.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7033 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7030 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7029
Metric Used	oc_ingressgateway_pod_congestion_state{namespace="ocudr"}==2
Recommended Actions	This alert will be cleared when the ingress gateway returns to the normal state. Steps: Check the service-specific metrics to understand the specific service request errors. For example, udr_rest_failure_response_total. Contact My Oracle Support, if guidance is required.

4.1.2.43 RetryNotificationRecordsMaxLimitExceeded

Table 4-62 RetryNotificationRecordsMaxLimitExceeded

Field	Details
Description	Alert will be raised if the retry notifications stored in the UDR database exceed the maximum limit.
Summary	Alert will be raised if the retry notifications stored in the UDR database exceed the maximum limit.
Severity	Critical
Condition	Alert will be raised if the retry notifications stored in the UDR database exceed the maximum limit.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7036 SLF: NA EIR: NA
Metric Used	nudr_notif_records_limit_exceeded{namespace="ocudr"}==1
Recommended Actions	This alert is raised when notification failures increase and more than 50k retry notifications are stored in the database. Steps: Check the notification failure rate and fix the cause of the failures. This reduces the number of notifications marked for retry that are stored in the UDR database. Contact My Oracle Support, if guidance is required.

4.1.2.44 UserAgentHeaderNotFoundMorethan10PercentRequest

Table 4-63 UserAgentHeaderNotFoundMorethan10PercentRequest

Field	Details
Description	Alert will be raised if requests without a User-Agent header make up 10% or more of ingress traffic when the suppress notification feature is enabled.
Summary	Alert will be raised if requests without a User-Agent header make up 10% or more of ingress traffic when the suppress notification feature is enabled.
Severity	Critical
Condition	Alert will be raised if requests without a User-Agent header make up 10% or more of ingress traffic.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7035 SLF: NA EIR: NA
Metric Used	(sum by(namespace)(rate(suppress_user_agent_not_found_total{namespace="ocudr"}[5m]))/sum by(namespace)(rate(oc_ingressgateway_http_requests_total{namespace="ocudr"}[5m])))*100 >= 10
Recommended Actions	This alert is cleared if the total number of requests not having User-Agent header is less than 10% of ingress traffic. Steps: Check the service specific metrics to understand the specific service request errors. Contact My Oracle Support, if guidance is required.
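
The percentage check in the expression above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the UDR implementation. The function names and sample rates are hypothetical; only the 10% threshold is taken from the alert expression.

```python
def user_agent_missing_percent(missing_rate, ingress_rate):
    """Share of ingress requests without a User-Agent header, in percent.

    Both arguments are per-second rates, as produced by rate(...[5m]).
    """
    if ingress_rate <= 0:
        # Assumption for this sketch: with no ingress traffic there is nothing to alert on.
        return 0.0
    return missing_rate * 100 / ingress_rate


def user_agent_alert_is_raised(missing_rate, ingress_rate, threshold_percent=10):
    # Mirrors the ">= 10" comparison in the alert expression.
    return user_agent_missing_percent(missing_rate, ingress_rate) >= threshold_percent


# Hypothetical sample rates: 100 of 1000 requests per second lack the header.
print(user_agent_alert_is_raised(100.0, 1000.0))
print(user_agent_alert_is_raised(99.0, 1000.0))
```

Because the comparison is `>=`, traffic at exactly 10% already raises the alert.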

4.1.2.45 EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold

Table 4-64 EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold

Field	Details
Description	Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit.
Summary	Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit.
Severity	Minor
Condition	Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7034 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7034 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7034
Metric Used	sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1300000000
Recommended Actions	This alert is cleared if the egress gateway JVM buffer memory is below the minor threshold limit. Steps: Check why the egress gateway JVM buffer memory is above the threshold limit and why it is not releasing enough memory on its own to drop below the threshold limit. Contact My Oracle Support, if guidance is required.

4.1.2.46 EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold

Table 4-65 EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold

Field	Details
Description	Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit.
Summary	Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit.
Severity	Major
Condition	Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7034 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7034 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7034
Metric Used	sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1500000000
Recommended Actions	This alert is cleared if the egress gateway JVM buffer memory is below the major threshold limit. Steps: Check why the egress gateway JVM buffer memory is above the threshold limit and why it is not releasing enough memory on its own to drop below the threshold limit. Contact My Oracle Support, if guidance is required.

4.1.2.47 EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold

Table 4-66 EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold

Field	Details
Description	Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit.
Summary	Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit.
Severity	Critical
Condition	Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7034 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7034 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7034
Metric Used	sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1800000000
Recommended Actions	This alert is cleared if the egress gateway JVM buffer memory is below the critical threshold limit. Steps: Check why the egress gateway JVM buffer memory is above the threshold limit and why it is not releasing enough memory on its own to drop below the threshold limit. Contact My Oracle Support, if guidance is required.
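
The three egress gateway JVM buffer memory alerts differ only in their byte thresholds. The following Python sketch shows how one measured value relates to them. The function name is hypothetical; the thresholds are copied from the alert expressions. The expressions have no upper bound, so in this sketch a value above the critical threshold also satisfies the minor and major conditions.

```python
# Thresholds in bytes, copied from the three alert expressions.
THRESHOLDS = [
    ("Minor", 1_300_000_000),
    ("Major", 1_500_000_000),
    ("Critical", 1_800_000_000),
]


def firing_severities(used_bytes):
    """Return every severity whose ">=" condition holds for one (id, pod) series."""
    return [name for name, limit in THRESHOLDS if used_bytes >= limit]


# Hypothetical sample: 1.6 GB of buffer memory in use.
print(firing_severities(1_600_000_000))
```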

4.1.2.48 NudrDiameterGatewayDown

Table 4-67 NudrDiameterGatewayDown

Field	Details
Description	Alert will be raised if Nudr-diam-gateway service is down.
Summary	Alert will be raised if Nudr-diam-gateway service is down.
Severity	Critical
Condition	Alert will be raised if Nudr-diam-gateway service is down.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7037 SLF: NA EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7037
Metric Used	absent(up{container="nudr-diam-gateway",namespace="ocudr"}) or up{container="nudr-diam-gateway",namespace="ocudr"} == 0
Recommended Actions	This alert is cleared when the NudrDiamGateway service is available. Steps: Run the following command to check the orchestration logs of the appinfo service and look for liveness or readiness probe failures: `kubectl get po -n <namespace>` Run the following command using the full name of the pod that is not running: `kubectl describe pod <specific desired full pod name> -n <namespace>` Refer to the application logs on Kibana and filter based on the appinfo service names. Check for `ERROR` and `WARNING` logs related to thread exceptions. Perform the resolution steps depending on the reason for failure. Contact My Oracle Support, if guidance is required. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

4.1.2.49 DiameterPeerConnectionsDropped

Table 4-68 DiameterPeerConnectionsDropped

Field	Details
Description	Alert will be raised if there are no connections between diameter peer and diameter gateway.
Summary	Alert will be raised if there are no connections between diameter peer and diameter gateway.
Severity	Major
Condition	Alert will be raised if there are no connections between diameter peer and diameter gateway.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7029 SLF: NA EIR: NA
Metric Used	sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) < 2 or sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) < 2 or (sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",kubernetes_namespace="ocudr"} or vector(0)) + sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"}) or vector(0)) < 5
Recommended Actions	This alert is cleared when the NudrDiamGateway service is available. Steps: Run the following command to check the orchestration logs of the appinfo service and look for liveness or readiness probe failures: `kubectl get po -n <namespace>` Run the following command using the full name of the pod that is not running: `kubectl describe pod <specific desired full pod name> -n <namespace>` Refer to the application logs on Kibana and filter based on the appinfo service names. Check for `ERROR` and `WARNING` logs related to thread exceptions. Perform the resolution steps depending on the reason for failure. Contact My Oracle Support, if guidance is required. Note: Use the CNC NF Data Collector tool for capturing logs. For more information, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.
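
The connection check in the expression above combines a per-site minimum with an overall minimum. The following Python sketch is a simplified reading of that logic, not the actual rule evaluation. The function name and the sample counts are hypothetical; the site labels (CHI, IND) and the limits (2 per site, 5 in total) are taken from the expression. A site with no reported connections is counted as zero, similar to the `or vector(0)` fallback.

```python
def diameter_connections_dropped(conns_by_site, min_per_site=2, min_total=5):
    """Return True if any site has too few connections or the total is too low."""
    per_site_low = any(count < min_per_site for count in conns_by_site.values())
    total_low = sum(conns_by_site.values()) < min_total
    return per_site_low or total_low


# Hypothetical connection counts per site.
print(diameter_connections_dropped({"CHI": 3, "IND": 2}))  # both limits satisfied
print(diameter_connections_dropped({"CHI": 2, "IND": 2}))  # total below 5
```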

4.1.2.50 IGWSignallingPodProtectionDOCState

Table 4-69 IGWSignallingPodProtectionDOCState

Field	Details
Description	Alert will be raised when the ingress gateway signaling traffic is in DOC state.
Summary	Alert will be raised when the ingress gateway signaling traffic is in DOC state.
Severity	Major
Condition	Alert will be raised when the ingress gateway signaling traffic is in DOC state.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7038 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7038 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7038
Metric Used	sum(oc_ingressgateway_congestion_system_state{namespace="ocudr",container="ingressgateway-sig"}) by (pod) == 2
Recommended Actions	This alert is cleared when the signaling traffic reaches NORMAL state. Steps: Check the service-specific metrics for the specific service request errors. For example, `oc_ingressgateway_congestion_system_state`. Contact My Oracle Support, if guidance is required.

4.1.2.51 IGWSignallingPodProtectionCongestedState

Table 4-70 IGWSignallingPodProtectionCongestedState

Field	Details
Description	Alert will be raised when the ingress gateway signaling traffic is in Congested state.
Summary	Alert will be raised when the ingress gateway signaling traffic is in Congested state.
Severity	Critical
Condition	Alert will be raised when the ingress gateway signaling traffic is in Congested state.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7038 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7038 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7038
Metric Used	sum(oc_ingressgateway_congestion_system_state{namespace="ocudr",container="ingressgateway-sig"}) by (pod) == 3
Recommended Actions	This alert is cleared when the signaling traffic reaches NORMAL or DOC state. Steps: Check the service-specific metrics for the specific service request errors. For example, `oc_ingressgateway_congestion_system_state`. Contact My Oracle Support, if guidance is required.
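
The ingress gateway congestion alerts in this chapter compare two different metrics against different numeric values for the same state names, which is easy to mix up. The following Python sketch collects the values that appear in the `== N` comparisons. The dictionary and function names are hypothetical, and the sketch assumes that the signalling DOC alert reads the same metric as the signalling Congested alert. Values not used by these alerts are deliberately left unmapped.

```python
# Numeric state values taken from the "== N" comparisons in the alert expressions.
STATE_VALUES = {
    "oc_ingressgateway_pod_congestion_state": {1: "DOC", 2: "Congested"},
    # Assumption: the signalling DOC alert reads the same metric as the Congested alert.
    "oc_ingressgateway_congestion_system_state": {2: "DOC", 3: "Congested"},
}


def state_name(metric, value):
    """Return "DOC" or "Congested", or None for values these alerts do not cover."""
    return STATE_VALUES.get(metric, {}).get(value)


# The same value 2 means different states depending on the metric.
print(state_name("oc_ingressgateway_pod_congestion_state", 2))
print(state_name("oc_ingressgateway_congestion_system_state", 2))
```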

4.1.2.52 IGWSignallingPodProtectionByRateLimitRejectedRequest

Table 4-71 IGWSignallingPodProtectionByRateLimitRejectedRequest

Field	Details
Description	Alert will be raised when rejected requests reach 1% or more of the total incoming traffic.
Summary	Alert will be raised when rejected requests reach 1% or more of the total incoming traffic.
Severity	Critical
Condition	Alert will be raised when rejected requests reach 1% or more of the total incoming traffic.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7039 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7039 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7039
Metric Used	(sum (rate(oc_ingressgateway_http_request_ratelimit_denied_count_total{Action="REJECT",namespace="ocudr"}[2m]) or (up * 0 ) ) )/ sum(rate(oc_ingressgateway_http_requests_total{container="ingressgateway-sig",namespace="ocudr"}[2m])) * 100 >= 1
Recommended Actions	This alert is cleared when rejections fall below 1% of the total traffic. Steps: Check the service-specific metrics for the specific service request errors. For example, `oc_ingressgateway_congestion_system_state`. Contact My Oracle Support, if guidance is required.
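
The expression above divides the rate of rate-limited rejections by the rate of all signalling requests. The `or (up * 0)` term supplies zero-valued series, so the numerator still evaluates to a number when no rejection series exist. The following Python sketch mirrors that behaviour under simplifying assumptions. The function names and sample rates are hypothetical; only the 1% threshold comes from the alert expression.

```python
def rejection_percent(rejected_rate, total_rate):
    """Rejected share of incoming signalling traffic, in percent.

    rejected_rate may be None when no rejection has been recorded yet.
    """
    # Mirrors the "or (up * 0)" fallback: a missing numerator counts as zero.
    rejected = 0.0 if rejected_rate is None else rejected_rate
    if total_rate <= 0:
        # Assumption for this sketch: without traffic there is no percentage to report.
        return None
    return rejected * 100 / total_rate


def rate_limit_alert_is_raised(rejected_rate, total_rate, threshold_percent=1):
    percent = rejection_percent(rejected_rate, total_rate)
    return percent is not None and percent >= threshold_percent


# Hypothetical sample: 5 of 500 requests per second are rejected.
print(rate_limit_alert_is_raised(5.0, 500.0))
```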

4.1.2.53 DRServiceRequestLatencyMajor

Table 4-72 DRServiceRequestLatencyMajor

Field	Details
Description	DR service request latency is more than 100ms
Summary	DR service request latency is above 100ms
Severity	Major
Condition	Alert will be raised when the 95th percentile of DR service request latency, calculated over 5 minutes, is at least 100ms and below 250ms.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7046 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7046 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7046
Metric Used	histogram_quantile(95 / 100, sum by(le) (rate(udr_request_processing_time_seconds_bucket{namespace="ocudr",container="nudr-drservice"}[5m])))*1000 >= 100 < 250
Recommended Actions	The alert is cleared when DR service latency falls below 100ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.54 DRServiceRequestLatencyCritical

Table 4-73 DRServiceRequestLatencyCritical

Field	Details
Description	DR service request latency is more than 250ms
Summary	DR service request latency is above 250ms
Severity	Critical
Condition	Alert will be raised when the 95th percentile of DR service request latency, calculated over 5 minutes, is 250ms or more.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7046 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7046 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7046
Metric Used	histogram_quantile(95 / 100, sum by(le) (rate(udr_request_processing_time_seconds_bucket{namespace="ocudr",container="nudr-drservice"}[5m])))*1000 >= 250
Recommended Actions	The alert is cleared when DR service latency falls below 250ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.
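
The DR service latency alerts rely on `histogram_quantile` to estimate the 95th percentile from histogram buckets and then compare the result, in milliseconds, with the major and critical thresholds. The following Python sketch is a simplified re-implementation of the linear interpolation used for classic histogram buckets. It omits several edge cases that Prometheus handles and is meant only to show where the number comes from. The function names and bucket values are hypothetical; the 100ms and 250ms thresholds are taken from the two alert expressions.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound_seconds, cumulative_rate) pairs and
    must include the +Inf bucket (math.inf) as its largest upper bound.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total <= 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls into the +Inf bucket: report the highest finite bound.
                return lower_bound
            if count == lower_count:
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


def dr_request_latency_severity(p95_ms):
    """Map a p95 latency in milliseconds to the alert band it falls into."""
    if p95_ms >= 250:
        return "Critical"
    if p95_ms >= 100:
        return "Major"
    return None


# Hypothetical per-second bucket rates for a 5-minute window.
sample = [(0.05, 50.0), (0.1, 80.0), (0.25, 96.0), (0.5, 100.0), (math.inf, 100.0)]
p95_seconds = histogram_quantile(0.95, sample)
print(dr_request_latency_severity(p95_seconds * 1000))
```

With these sample buckets the 95th percentile falls between the 0.1s and 0.25s bounds, so only the major band applies.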

4.1.2.55 DRServiceDBLatencyMajor

Table 4-74 DRServiceDBLatencyMajor

Field	Details
Description	DR service DB latency is more than 25ms
Summary	DR service DB latency is above 25ms
Severity	Major
Condition	Alert will be raised when the 95th percentile of DR service DB latency, calculated over 5 minutes, is at least 25ms and below 50ms.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7047 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7047 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7047
Metric Used	histogram_quantile(95 / 100, sum by(le) (rate(udr_db_processing_time_seconds_bucket{namespace="ocudr",container="nudr-drservice"}[5m])))*1000 >= 25 < 50
Recommended Actions	The alert is cleared when DR service DB latency falls below 25ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.56 DRServiceDBLatencyCritical

Table 4-75 DRServiceDBLatencyCritical

Field	Details
Description	DR service DB latency is more than 50ms
Summary	DR service DB latency is above 50ms
Severity	Critical
Condition	Alert will be raised when the 95th percentile of DR service DB latency, calculated over 5 minutes, is 50ms or more.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7047 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7047 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7047
Metric Used	histogram_quantile(95 / 100, sum by(le) (rate(udr_db_processing_time_seconds_bucket{namespace="ocudr",container="nudr-drservice"}[5m])))*1000 >= 50
Recommended Actions	The alert is cleared when DR service DB latency falls below 50ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.57 IGWSignallingTotalAvgLatencyMajor

Table 4-76 IGWSignallingTotalAvgLatencyMajor

Field	Details
Description	IGW signalling average latency is more than 250ms
Summary	IGW signalling average latency is above 250ms
Severity	Major
Condition	Alert will be raised when IGW signalling average latency is at least 250ms and below 500ms.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7048 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7048 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7048
Metric Used	((sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ) + (sum(irate(oc_ingressgateway_request_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_request_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ) + (sum(irate(oc_ingressgateway_response_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_response_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ))*1000 >= 250 < 500
Recommended Actions	The alert is cleared when IGW signalling average latency falls below 250ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.58 IGWSignallingTotalAvgLatencyCritical

Table 4-77 IGWSignallingTotalAvgLatencyCritical

Field	Details
Description	IGW signalling average latency is more than 500ms
Summary	IGW signalling average latency is above 500ms
Severity	Critical
Condition	Alert will be raised when IGW signalling average latency is 500ms or more.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7048 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7048 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7048
Metric Used	((sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ) + (sum(irate(oc_ingressgateway_request_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_request_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ) + (sum(irate(oc_ingressgateway_response_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_response_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ))*1000 >= 500
Recommended Actions	The alert is cleared when IGW signalling average latency falls below 500ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.
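
The IGW signalling latency expressions add up three average latencies (backend invocation, request processing, and response processing), each computed as a `_sum` rate divided by a `_count` rate, and convert the total to milliseconds. The following Python sketch illustrates that arithmetic. The function names and sample rates are hypothetical; the 250ms and 500ms thresholds are taken from the two alert expressions.

```python
def total_avg_latency_ms(stages):
    """Sum the average latency of each stage and convert it to milliseconds.

    stages is an iterable of (latency_seconds_sum_rate, latency_seconds_count_rate)
    pairs. This sketch assumes every count rate is non-zero.
    """
    total_seconds = 0.0
    for latency_sum_rate, latency_count_rate in stages:
        total_seconds += latency_sum_rate / latency_count_rate
    return total_seconds * 1000


def igw_latency_severity(latency_ms):
    """Map a total average latency in milliseconds to the alert band it falls into."""
    if latency_ms >= 500:
        return "Critical"
    if latency_ms >= 250:
        return "Major"
    return None


# Hypothetical rates: backend invocation, request processing, response processing.
sample_stages = [(1.5, 10.0), (1.0, 10.0), (0.5, 10.0)]
print(igw_latency_severity(total_avg_latency_ms(sample_stages)))
```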

4.1.2.59 DRProvServiceRequestLatencyMajor

Table 4-78 DRProvServiceRequestLatencyMajor

Field	Details
Description	DR provisioning service request latency is more than 100ms
Summary	DR provisioning service request latency is above 100ms
Severity	Major
Condition	Alert will be raised when the 95th percentile of DR provisioning service request latency, calculated over 5 minutes, is at least 100ms and below 250ms.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7049 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7049 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7049
Metric Used	histogram_quantile(95 / 100, sum by(le) (rate(udr_request_processing_time_seconds_bucket{namespace="ocudr",container="nudr-dr-provservice"}[5m])))*1000 >= 100 < 250
Recommended Actions	The alert is cleared when DR provisioning service latency falls below 100ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.60 DRProvServiceRequestLatencyCritical

Table 4-79 DRProvServiceRequestLatencyCritical

Field	Details
Description	DR provisioning service request latency is more than 250ms
Summary	DR provisioning service request latency is above 250ms
Severity	Critical
Condition	Alert will be raised when the 95th percentile of DR provisioning service request latency, calculated over 5 minutes, is 250ms or more.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7049 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7049 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7049
Metric Used	histogram_quantile(95 / 100, sum by(le) (rate(udr_request_processing_time_seconds_bucket{namespace="ocudr",container="nudr-dr-provservice"}[5m])))*1000 >= 250
Recommended Actions	The alert is cleared when DR provisioning service latency falls below 250ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.61 DRProvServiceDBLatencyMajor

Table 4-80 DRProvServiceDBLatencyMajor

Field	Details
Description	DR provisioning service DB latency is more than 25ms
Summary	DR provisioning service DB latency is above 25ms
Severity	Major
Condition	Alert will be raised when the 95th percentile of DR provisioning service DB latency, calculated over 5 minutes, is at least 25ms and below 50ms.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7050 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7050 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7050
Metric Used	histogram_quantile(95 / 100, sum by(le) (rate(udr_db_processing_time_seconds_bucket{namespace="ocudr",container="nudr-dr-provservice"}[5m])))*1000 >= 25 < 50
Recommended Actions	The alert is cleared when DR provisioning service DB latency falls below 25ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.62 DRProvServiceDBLatencyCritical

Table 4-81 DRProvServiceDBLatencyCritical

Field	Details
Description	DR provisioning service DB latency is more than 50ms
Summary	DR provisioning service DB latency is above 50ms
Severity	Critical
Condition	Alert will be raised when the 95th percentile of DR provisioning service DB latency, calculated over 5 minutes, is 50ms or more.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7050 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7050 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7050
Metric Used	histogram_quantile(95 / 100, sum by(le) (rate(udr_db_processing_time_seconds_bucket{namespace="ocudr",container="nudr-dr-provservice"}[5m])))*1000 >= 50
Recommended Actions	The alert is cleared when DR provisioning service DB latency falls below 50ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.63 IGWProvisioningTotalAvgLatencyMajor

Table 4-82 IGWProvisioningTotalAvgLatencyMajor

Field	Details
Description	IGW provisioning average latency is more than 250ms
Summary	IGW provisioning average latency is above 250ms
Severity	Major
Condition	Alert will be raised when IGW provisioning average latency is at least 250ms and below 500ms.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7051 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7051 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7051
Metric Used	((sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ) + (sum(irate(oc_ingressgateway_request_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_request_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ) + (sum(irate(oc_ingressgateway_response_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_response_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ))*1000 >= 250 < 500
Recommended Actions	The alert is cleared when IGW provisioning average latency falls below 250ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.

4.1.2.64 IGWProvisioningTotalAvgLatencyCritical

Table 4-83 IGWProvisioningTotalAvgLatencyCritical

Field	Details
Description	IGW provisioning average latency is more than 500ms
Summary	IGW provisioning average latency is above 500ms
Severity	Critical
Condition	Alert will be raised when IGW provisioning average latency is 500ms or more.
OID	UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7051 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7051 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7051
Metric Used	((sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ) + (sum(irate(oc_ingressgateway_request_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_request_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ) + (sum(irate(oc_ingressgateway_response_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_response_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ))*1000 >= 500
Recommended Actions	The alert is cleared when IGW provisioning average latency falls below 500ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support.
