4 Alert Configuration
This section describes how to configure measurement-based alert rules for the UDR. The alerting system evaluates metrics reported by UDR microservices against the conditions defined in each rule and issues a notification when a condition is met. For more information about configuring UDR alerts in Prometheus, see the “Alert Configuration” section in Oracle Communications Cloud Native Core, Unified Data Repository Installation, Upgrade, and Fault Recovery Guide.
4.1 Alert Details
This section describes alerts in detail.
Note:
The maximum ingress request rate considered is 1000 requests per second.
Table 4-1 Alert Levels or Severity Types
| Alert Level / Severity Type | Definition |
|---|---|
| Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of UDR. |
| Major | Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the service of UDR. |
| Minor | Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the service of UDR. |
| Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of UDR. |
The following table lists the alert names for UDR/SLF and EIR.
Table 4-2 Alert names for UDR/SLF and EIR
| UDR/SLF | EIR |
|---|---|
| OcudrTrafficRateAboveMajorThreshold | OceirTrafficRateAboveMajorThreshold |
| OcudrTrafficRateAboveMinorThreshold | OceirTrafficRateAboveMinorThreshold |
| OcudrTrafficRateAboveCriticalThreshold | OceirTrafficRateAboveCriticalThreshold |
| OcudrTransactionErrorRateAbove0.1Percent | OceirTransactionErrorRateAbove0.1Percent |
| OcudrTransactionErrorRateAbove1Percent | OceirTransactionErrorRateAbove1Percent |
| OcudrTransactionErrorRateAbove10Percent | OceirTransactionErrorRateAbove10Percent |
| OcudrTransactionErrorRateAbove25Percent | OceirTransactionErrorRateAbove25Percent |
| OcudrTransactionErrorRateAbove50Percent | OceirTransactionErrorRateAbove50Percent |
| OcudrSubscriberNotFoundAbove1Percent | OceirSubscriberNotFoundAbove1Percent |
| OcudrSubscriberNotFoundAbove10Percent | OceirSubscriberNotFoundAbove10Percent |
| OcudrSubscriberNotFoundAbove25Percent | OceirSubscriberNotFoundAbove25Percent |
| OcudrSubscriberNotFoundAbove50Percent | OceirSubscriberNotFoundAbove50Percent |
| OcudrPodsRestart | OceirPodsRestart |
| NudrServiceDown | NudrServiceDown |
| NudrProvServiceDown | NudrProvServiceDown |
| NudrNotifyServiceServiceDown | NA |
| NudrNRFClientServiceDown | NudrNRFClientServiceDown |
| NudrConfigServiceDown | NudrConfigServiceDown |
| NudrDiameterProxyServiceDown | NudrDiameterProxyServiceDown |
| NudrOnDemandMigrationServiceDown | NA |
| OcudrIngressGatewayServiceDown | OceirIngressGatewayServiceDown |
| OcudrEgressGatewayServiceDown | OceirEgressGatewayServiceDown |
| OcudrDbServiceDown | OceirDbServiceDown |
| OcudrXFCCValidationFailureAbove10Percent | OceirXFCCValidationFailureAbove10Percent |
| OcudrXFCCValidationFailureAbove20Percent | OceirXFCCValidationFailureAbove20Percent |
| OcudrXFCCValidationFailureAbove50Percent | OceirXFCCValidationFailureAbove50Percent |
| DRServiceOverload60Percent | DRServiceOverload60Percent |
| DRServiceOverload75Percent | DRServiceOverload75Percent |
| DRServiceOverload80Percent | DRServiceOverload80Percent |
| DRServiceOverload90Percent | DRServiceOverload90Percent |
| SLFSucessTxnDefaultGroupIdRateAbove1Percent | NA |
| SLFSucessTxnDefaultGroupIdRateAbove10Percent | NA |
| SLFSucessTxnDefaultGroupIdRateAbove25Percent | NA |
| SLFSucessTxnDefaultGroupIdRateAbove50Percent | NA |
| OcudrDiameterCongestionCongestedState | OceirDiameterCongestionCongestedState |
| OcudrDiameterCongestionDocState | OceirDiameterCongestionDocState |
| DRProvServiceOverload60Percent | DRProvServiceOverload60Percent |
| DRProvServiceOverload75Percent | DRProvServiceOverload75Percent |
| DRProvServiceOverload80Percent | DRProvServiceOverload80Percent |
| DRProvServiceOverload90Percent | DRProvServiceOverload90Percent |
| OcudrIngressGatewayProvServiceDown | OceirIngressGatewayProvServiceDown |
| OcudrProvisioningTrafficRateAboveMajorThreshold | OceirProvisioningTrafficRateAboveMajorThreshold |
| OcudrProvisioningTrafficRateAboveCriticalThreshold | OceirProvisioningTrafficRateAboveCriticalThreshold |
| OcudrProvisioningTransactionErrorRateAbove25Percent | OceirProvisioningTransactionErrorRateAbove25Percent |
| OcudrProvisioningTransactionErrorRateAbove50Percent | OceirProvisioningTransactionErrorRateAbove50Percent |
| PVCFullForSLFExport | NA |
| FailedExtractForSLFExport | NA |
| BulkImportTransferInFailed | BulkImportTransferInFailed |
| BulkImportTransferOutFailed | BulkImportTransferOutFailed |
| ExportToolTransferOutFailed | ExportToolTransferOutFailed |
| PVCFullForXMLBulkImport | PVCFullForXMLBulkImport |
| PVCFullForBulkImport | PVCFullForBulkImport |
| OperationalStatusCompleteShutdown | OperationalStatusCompleteShutdown |
| NFScoreCalculationFailed | NFScoreCalculationFailed |
| PVCFullForUDRExport | NA |
| UDRExportFailed | NA |
| IngressgatewayPodProtectionDocState | IngressgatewayPodProtectionDocState |
| IngressgatewayPodProtectionCongestedState | IngressgatewayPodProtectionCongestedState |
| RetryNotificationRecordsMaxLimitExceeded | RetryNotificationRecordsMaxLimitExceeded |
| UserAgentHeaderNotFoundMorethan10PercentRequest | NA |
| EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold | EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold |
| EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold | EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold |
| EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold | EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold |
| NudrDiameterGatewayDown | NudrDiameterGatewayDown |
| DiameterPeerConnectionsDropped | DiameterPeerConnectionsDropped |
| IGWSignallingPodProtectionDOCState | NA |
| IGWSignallingPodProtectionCongestedState | NA |
| IGWSignallingPodProtectionByRateLimitRejectedRequest | NA |
Note:
For the following alert details, only the UDR alert names are provided. The corresponding EIR alert names can be found in Table 4-2.
4.1.1 System Level Alerts
This section lists the system level alerts.
- OcudrSubscriberNotFoundAbove1Percent
- OcudrSubscriberNotFoundAbove10Percent
- OcudrSubscriberNotFoundAbove25Percent
- OcudrSubscriberNotFoundAbove50Percent
- OcudrNfStatusUnavailable
- OcudrPodsRestart
- NudrServiceDown
- NudrProvServiceDown
- NudrNotifyServiceServiceDown
- NudrNRFClientServiceDown
- NudrConfigServiceDown
- NudrDiameterProxyServiceDown
- NudrOnDemandMigrationServiceDown
- OcudrIngressGatewayServiceDown
- OcudrEgressGatewayServiceDown
- OcudrDbServiceDown
- OcudrIngressGatewayProvServiceDown
4.1.1.1 OcudrSubscriberNotFoundAbove1Percent
Table 4-3 OcudrSubscriberNotFoundAbove1Percent
| Field | Details |
|---|---|
| Description | Total number of response if subscriber not found is about 1% of ingress traffic |
| Summary | Total number of response if subscriber not found is about 1% of ingress traffic |
| Severity | Warning |
| Condition | Alert if the number of subscriber-not-found responses exceeds 1% of all ingress traffic |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7003; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7003; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 (for the EIR alert name, see Alert Details) |
| Metric Used | udr_subscriber_not_found_total |
| Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 1% of the total. |
4.1.1.2 OcudrSubscriberNotFoundAbove10Percent
Table 4-4 OcudrSubscriberNotFoundAbove10Percent
| Field | Details |
|---|---|
| Description | Total number of response if subscriber not found is about 10% of ingress traffic |
| Summary | Total number of response if subscriber not found is about 10% of ingress traffic |
| Severity | Minor |
| Condition | Alert if the number of subscriber-not-found responses exceeds 10% of all ingress traffic |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7003; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7003; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 (for the EIR alert name, see Alert Details) |
| Metric Used | udr_subscriber_not_found_total |
| Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 10% of the total. |
4.1.1.3 OcudrSubscriberNotFoundAbove25Percent
Table 4-5 OcudrSubscriberNotFoundAbove25Percent
| Field | Details |
|---|---|
| Description | Total number of response if subscriber not found is about 25% of ingress traffic |
| Summary | Total number of response if subscriber not found is about 25% of ingress traffic |
| Severity | Major |
| Condition | Alert if the number of subscriber-not-found responses exceeds 25% of all ingress traffic |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7003; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7003; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7003 (for the EIR alert name, see Alert Details) |
| Metric Used | udr_subscriber_not_found_total |
| Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 25% of the total. |
4.1.1.4 OcudrSubscriberNotFoundAbove50Percent
Table 4-6 OcudrSubscriberNotFoundAbove50Percent
| Field | Details |
|---|---|
| Description | Total number of response if subscriber not found is about 50% of ingress traffic |
| Summary | Total number of response if subscriber not found is about 50% of ingress traffic |
| Severity | Critical |
| Condition | Alert if the number of subscriber-not-found responses exceeds 50% of all ingress traffic |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7003; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7003; EIR: NA |
| Metric Used | udr_subscriber_not_found_total |
| Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 50% of the total. |
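The four OcudrSubscriberNotFound alerts above differ only in threshold and severity. The following is a minimal sketch of how such tiered rules could be evaluated independently against the same ratio. It is illustrative only: the function name and the strict "greater than" comparison are assumptions, and the actual rule expressions are defined in the UDR alert rules file, not here.

```python
# Thresholds (percent of ingress traffic) and severities taken from
# Tables 4-3 to 4-6. The evaluation logic below is an assumption.
SUBSCRIBER_NOT_FOUND_RULES = [
    ("OcudrSubscriberNotFoundAbove1Percent", 1.0, "Warning"),
    ("OcudrSubscriberNotFoundAbove10Percent", 10.0, "Minor"),
    ("OcudrSubscriberNotFoundAbove25Percent", 25.0, "Major"),
    ("OcudrSubscriberNotFoundAbove50Percent", 50.0, "Critical"),
]


def firing_subscriber_not_found_alerts(not_found, ingress_total):
    """Return (alert name, severity) for every rule whose threshold is exceeded.

    Each rule is checked on its own, so several tiers can fire at once.
    """
    if ingress_total <= 0:
        return []  # no traffic in the window, nothing to evaluate
    percent = 100.0 * not_found / ingress_total
    return [
        (name, severity)
        for name, threshold, severity in SUBSCRIBER_NOT_FOUND_RULES
        if percent > threshold  # assumed: strictly above the threshold
    ]
```

For example, 120 subscriber-not-found responses out of 1000 ingress requests (12%) would satisfy both the 1% and the 10% conditions in this sketch.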
4.1.1.5 OcudrNfStatusUnavailable
Table 4-7 OcudrNfStatusUnavailable
| Field | Details |
|---|---|
| Description | OCUDR services unavailable |
| Summary | OCUDR services unavailable |
| Severity | Critical |
| Condition | This alert is triggered if OCUDR services are unavailable. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7004; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7004 |
| Metric Used | absent(up{app_kubernetes_io_part_of="ocudr",kubernetes_namespace="ocudr"}) or sum(up{app_kubernetes_io_part_of="ocudr",kubernetes_namespace="ocudr"}) == 0 |
| Recommended Actions | The alert is cleared when all the OCUDR services are available. |
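The expression in the Metric Used row above combines two cases: no `up` series matches the selector at all, or every matching series reports 0. A minimal Python sketch of that logic follows. The function name and the list-of-samples input are hypothetical stand-ins for the query result, not part of any UDR or Prometheus API.

```python
def nf_status_unavailable(up_samples):
    """Mirror `absent(up{...}) or sum(up{...}) == 0` for a list of 0/1 samples.

    up_samples holds the current value of each `up` series that matches
    the selector (1 = target scraped successfully, 0 = target down).
    """
    if not up_samples:
        # absent(...): no matching series exists, for example because
        # nothing is deployed or nothing is being scraped.
        return True
    # sum(...) == 0: series exist, but every target is down.
    return sum(up_samples) == 0
```

With this sketch, a single healthy target is enough to keep the condition false, which matches the description that the alert concerns all OCUDR services being unavailable.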
4.1.1.6 OcudrPodsRestart
Table 4-8 OcudrPodsRestart
| Field | Details |
|---|---|
| Description | Pod {{$labels.pod}} has restarted. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted |
| Severity | Major |
| Condition | Alert if any pod has restarted |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7005; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7005; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7005 (for the EIR alert name, see Alert Details) |
| Metric Used | kube_pod_container_status_restarts_total |
| Recommended Actions | The alert is cleared automatically when the specific pod is up. |
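The OcudrPodsRestart alert is based on a restart counter. The exact rule expression is not shown in this section, so the following sketch only illustrates the general idea under an assumption: a pod counts as restarted when its `kube_pod_container_status_restarts_total` value has increased between two observations. The function name and the dictionary inputs are hypothetical.

```python
def restarted_pods(previous, current):
    """Return the sorted names of pods whose restart counter increased.

    previous and current map pod name -> cumulative restart count from
    two successive observations of the counter.
    """
    return sorted(
        pod
        for pod, count in current.items()
        # A pod missing from the earlier observation is treated as 0 restarts.
        if count > previous.get(pod, 0)
    )
```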
4.1.1.7 NudrServiceDown
Table 4-9 NudrServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Nudr_DRService {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : DR Service is down |
| Severity | Critical |
| Condition | Alert if Nudr-dr service is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7006; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7006; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7006 |
| Metric Used | app_kubernetes_io_name="nudr-drservice" |
| Recommended Actions | The alert is cleared when the NudrService service is available. |
4.1.1.8 NudrProvServiceDown
Table 4-10 NudrProvServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Nudr_DR_PROVService {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : DR Prov Service is down |
| Severity | Critical |
| Condition | Alert if Nudr-dr-prov service is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7016; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7015; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7014 |
| Metric Used | app_kubernetes_io_name="nudr-dr-provservice" |
| Recommended Actions | The alert is cleared when the NudrProvService service is available. |
4.1.1.9 NudrNotifyServiceServiceDown
Table 4-11 NudrNotifyServiceServiceDown
| Field | Details |
|---|---|
| Description | OCUDR NudrNotifyServiceService {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Nudr Notify Service down. |
| Severity | Critical |
| Condition | Alert if Nudr Notify service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7016 |
| Metric Used | app_kubernetes_io_name="nudr-notify-service" |
| Recommended Actions | The alert is cleared when the NotifyService service is available. |
4.1.1.10 NudrNRFClientServiceDown
Table 4-12 NudrNRFClientServiceDown
| Field | Details |
|---|---|
| Description | OCUDR NRFClient service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NRF Client service down |
| Severity | Critical |
| Condition | Alert if Nudr Nrf Client service is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7007; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7007; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7007 |
| Metric Used | app_kubernetes_io_name="nrf-client-nfmanagement" |
| Recommended Actions | The alert is cleared when the NRFClientService service is available. |
4.1.1.11 NudrConfigServiceDown
Table 4-13 NudrConfigServiceDown
| Field | Details |
|---|---|
| Description | OCUDR config service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : nudr-config service down |
| Severity | Critical |
| Condition | Alert if Nudr Config service is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7010; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7008; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7008 |
| Metric Used | app_kubernetes_io_name="nudr-config" |
| Recommended Actions | The alert is cleared when the ConfigService service is available. |
4.1.1.12 NudrDiameterProxyServiceDown
Table 4-14 NudrDiameterProxyServiceDown
| Field | Details |
|---|---|
| Description | OCUDR diameterproxy service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : nudr-diameterproxy service is down |
| Severity | Critical |
| Condition | Alert if Nudr Diameter Proxy is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7008; SLF: NA; EIR: NA |
| Metric Used | app_kubernetes_io_name="nudr-diameterproxy" |
| Recommended Actions | The alert is cleared when the DiameterProxyService service is available. |
4.1.1.13 NudrOnDemandMigrationServiceDown
Table 4-15 NudrOnDemandMigrationServiceDown
| Field | Details |
|---|---|
| Description | OCUDR ondemand-migration service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down |
| Severity | Critical |
| Condition | Alert if Nudr On Demand Migration is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7009; SLF: NA; EIR: NA |
| Metric Used | app_kubernetes_io_name="nudr-ondemand-migration" |
| Recommended Actions | The alert is cleared when the OnDemandMigrationService service is available. |
4.1.1.14 OcudrIngressGatewayServiceDown
Table 4-16 OcudrIngressGatewayServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down |
| Severity | Critical |
| Condition | Alert if Ingress Service is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7011; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7009; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7009 (for the EIR alert name, see Alert Details) |
| Metric Used | app_kubernetes_io_name="ingressgateway" |
| Recommended Actions | The alert is cleared when the ingressgateway service is available. |
4.1.1.15 OcudrEgressGatewayServiceDown
Table 4-17 OcudrEgressGatewayServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Egress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down |
| Severity | Critical |
| Condition | Alert if Egress Service is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7012; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7010; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7010 (for the EIR alert name, see Alert Details) |
| Metric Used | app_kubernetes_io_name="egressgateway" |
| Recommended Actions | The alert is cleared when the egressgateway service is available. Note: The threshold is configurable in the UDR_Alertrules.yaml file. |
4.1.1.16 OcudrDbServiceDown
Table 4-18 OcudrDbServiceDown
| Field | Details |
|---|---|
| Description | Mysql connectivity service is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : MySQL connectivity service down |
| Severity | Critical |
| Condition | Alert if MySQL connectivity is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7013; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7011; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7011 (for the EIR alert name, see Alert Details) |
| Metric Used | appinfo_service_running |
| Recommended Actions | This alert clears when the microservice nudr-drservice is up and running. |
4.1.1.17 OcudrIngressGatewayProvServiceDown
Table 4-19 OcudrIngressGatewayProvServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down |
| Severity | Critical |
| Condition | Alert if Ingressgateway-prov service is down |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7019; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7017; EIR: NA |
| Metric Used | app_kubernetes_io_name="ingressgateway-prov" |
| Recommended Actions | The alert is cleared when the ingress-gateway service is available. |
4.1.2 Application Level Alerts
This section lists the application level alerts.
- OcudrSignallingTrafficRateAboveMajorThreshold
- OcudrSignallingTrafficRateAboveMinorThreshold
- OcudrSignallingTrafficRateAboveCriticalThreshold
- OcudrSignallingTransactionErrorRateAbove0.1Percent
- OcudrSignallingTransactionErrorRateAbove1Percent
- OcudrSignallingTransactionErrorRateAbove10Percent
- OcudrSignallingTransactionErrorRateAbove25Percent
- OcudrSignallingTransactionErrorRateAbove50Percent
- OcudrXFCCValidationFailureAbove10Percent
- OcudrXFCCValidationFailureAbove20Percent
- OcudrXFCCValidationFailureAbove50Percent
- DRServiceOverload60Percent
- DRServiceOverload75Percent
- DRServiceOverload80Percent
- DRServiceOverload90Percent
- SLFSucessTxnDefaultGroupIdRateAbove1Percent
- SLFSucessTxnDefaultGroupIdRateAbove10Percent
- SLFSucessTxnDefaultGroupIdRateAbove25Percent
- SLFSucessTxnDefaultGroupIdRateAbove50Percent
- OcudrDiameterCongestionCongestedState
- OcudrDiameterCongestionDocState
- DRProvServiceOverload60Percent
- DRProvServiceOverload75Percent
- DRProvServiceOverload80Percent
- DRProvServiceOverload90Percent
- OcudrProvisioningTrafficRateAboveMajorThreshold
- OcudrProvisioningTrafficRateAboveCriticalThreshold
- OcudrProvisioningTransactionErrorRateAbove25Percent
- OcudrProvisioningTransactionErrorRateAbove50Percent
- PVCFullForSLFExport
- FailedExtractForSLFExport
- BulkImportTransferInFailed
- ExportToolTransferOutFailed
- BulkImportTransferOutFailed
- PVCFullForXMLBulkImport
- PVCFullForBulkImport
- OperationalStatusCompleteShutdown
- NFScoreCalculationFailed
- PVCFullForUDRExport
- UDRExportFailed
- IngressgatewayPodProtectionDocState
- IngressgatewayPodProtectionCongestedState
- RetryNotificationRecordsMaxLimitExceeded
- UserAgentHeaderNotFoundMorethan10PercentRequest
- EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
- EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
- EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
- NudrDiameterGatewayDown
- DiameterPeerConnectionsDropped
- IGWSignallingPodProtectionDOCState
- IGWSignallingPodProtectionCongestedState
- IGWSignallingPodProtectionByRateLimitRejectedRequest
- DRServiceRequestLatencyMajor
- DRServiceRequestLatencyCritical
- DRServiceDBLatencyMajor
- DRServiceDBLatencyCritical
- IGWSignallingTotalAvgLatencyMajor
- IGWSignallingTotalAvgLatencyCritical
- DRProvServiceRequestLatencyMajor
- DRProvServiceRequestLatencyCritical
- DRProvServiceDBLatencyMajor
- DRProvServiceDBLatencyCritical
- IGWProvisioningTotalAvgLatencyMajor
- IGWProvisioningTotalAvgLatencyCritical
4.1.2.1 OcudrSignallingTrafficRateAboveMajorThreshold
Table 4-20 OcudrSignallingTrafficRateAboveMajorThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic Rate is above major threshold i.e. 900 requests per second |
| Summary | Traffic Rate is above 90 Percent of Max requests per second(1000) |
| Severity | Major |
| Condition | Alert if Ingress traffic reaches 90% of max TPS |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7001; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7001; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7001 (for the EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared when the Ingress Traffic rate falls below the Major threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.2 OcudrSignallingTrafficRateAboveMinorThreshold
Table 4-21 OcudrSignallingTrafficRateAboveMinorThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic rate is above minor threshold i.e. 800 requests per second |
| Summary | Traffic rate is above 80 Percent of Max requests per second(1000) |
| Severity | Minor |
| Condition | Alert if Ingress traffic reaches 80% of max TPS |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7001; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7001; EIR: NA |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the OcudrTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.3 OcudrSignallingTrafficRateAboveCriticalThreshold
Table 4-22 OcudrSignallingTrafficRateAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic Rate is above critical threshold i.e. 950 requests per second |
| Summary | Traffic Rate is above 95 Percent of Max requests per second(1000) |
| Severity | Critical |
| Condition | Alert if Ingress traffic reaches 95% of max TPS |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7001; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7001; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7001 (for the EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
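The three traffic rate alerts above express their thresholds as percentages of the maximum ingress rate (1000 requests per second in this guide). The sketch below shows how those percentages translate into absolute rates, which can be useful when a deployment is dimensioned for a different maximum. The helper names are hypothetical, and treating "reaches" as "greater than or equal to" is an assumption; the authoritative values are those set in UDR_Alertrules.yaml.

```python
MAX_INGRESS_TPS = 1000  # maximum ingress requests per second assumed in this guide

# Percent of maximum TPS per severity, from Tables 4-20 to 4-22.
THRESHOLD_PERCENT = {"Minor": 80, "Major": 90, "Critical": 95}


def traffic_thresholds(max_tps=MAX_INGRESS_TPS):
    """Convert the percentage thresholds into absolute requests per second."""
    return {
        severity: max_tps * percent / 100
        for severity, percent in THRESHOLD_PERCENT.items()
    }


def reaches_threshold(current_tps, severity, max_tps=MAX_INGRESS_TPS):
    """Return True if the current rate has reached the given severity threshold."""
    return current_tps >= traffic_thresholds(max_tps)[severity]
```

With the default maximum, the computed thresholds are 800, 900, and 950 requests per second, matching the values quoted in the alert descriptions.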
4.1.2.4 OcudrSignallingTransactionErrorRateAbove0.1Percent
Table 4-23 OcudrSignallingTransactionErrorRateAbove0.1Percent
| Field | Details |
|---|---|
| Description | Transaction error rate is above 0.1 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 0.1 Percent of Total Transactions |
| Severity | Warning |
| Condition | Alert if the overall error rate exceeds 0.1% of the total transactions |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002; SLF: NA; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (for the EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the OcudrTransactionErrorRateAbove1Percent alert is raised. |
4.1.2.5 OcudrSignallingTransactionErrorRateAbove1Percent
Table 4-24 OcudrSignallingTransactionErrorRateAbove1Percent
| Field | Details |
|---|---|
| Description | Transaction Error rate is above 1 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | Warning |
| Condition | Alert if the overall error rate exceeds 1% of the total transactions |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002; SLF: NA; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (for the EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the OcudrTransactionErrorRateAbove10Percent alert is raised. |
4.1.2.6 OcudrSignallingTransactionErrorRateAbove10Percent
Table 4-25 OcudrSignallingTransactionErrorRateAbove10Percent
| Field | Details |
|---|---|
| Description | Transaction error rate is above 10 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | Minor |
| Condition | Alert if the overall error rate exceeds 10% of the total transactions |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002; SLF: NA; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (for the EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the OcudrTransactionErrorRateAbove25Percent alert is raised. |
4.1.2.7 OcudrSignallingTransactionErrorRateAbove25Percent
Table 4-26 OcudrSignallingTransactionErrorRateAbove25Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Severity | Major |
| Condition | Alert if the overall error rate exceeds 25% of the total transactions |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7002; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (for the EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the OcudrTransactionErrorRateAbove50Percent alert is raised. |
4.1.2.8 OcudrSignallingTransactionErrorRateAbove50Percent
Table 4-27 OcudrSignallingTransactionErrorRateAbove50Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Severity | Critical |
| Condition | Alert if the overall error rate exceeds 50% of the total transactions |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7002; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7002; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7002 (for the EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 50% of the total transactions. |
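The transaction error rate alerts above form a ladder: each alert clears either when the error rate drops below its own threshold or when it crosses the next one, at which point the next alert takes over. The following sketch illustrates that banding. It rests on several assumptions that are not confirmed by this section: errors are taken to be all non-2xx responses, the rate is computed from the difference between two snapshots of `oc_ingressgateway_http_responses_total` keyed by status code, and the function names are hypothetical.

```python
# Thresholds in percent, paired with the alert names used in this section.
ERROR_RATE_ALERTS = [
    (0.1, "OcudrSignallingTransactionErrorRateAbove0.1Percent"),
    (1.0, "OcudrSignallingTransactionErrorRateAbove1Percent"),
    (10.0, "OcudrSignallingTransactionErrorRateAbove10Percent"),
    (25.0, "OcudrSignallingTransactionErrorRateAbove25Percent"),
    (50.0, "OcudrSignallingTransactionErrorRateAbove50Percent"),
]


def error_rate_percent(previous, current):
    """Compute the error percentage between two counter snapshots.

    previous and current map HTTP status code -> cumulative response count.
    """
    total = 0
    failed = 0
    for status, count in current.items():
        delta = count - previous.get(status, 0)
        if delta < 0:
            delta = count  # counter was reset, count from zero
        total += delta
        if not 200 <= status < 300:  # assumed: any non-2xx response is an error
            failed += delta
    return 100.0 * failed / total if total else 0.0


def active_error_rate_alert(percent):
    """Return the single alert whose band contains the error rate, or None."""
    active = None
    for threshold, name in ERROR_RATE_ALERTS:
        if percent > threshold:
            active = name  # a higher tier replaces the lower one
    return active
```

In this sketch an error rate of 15% maps only to the 10 percent alert, reflecting the clearing behavior described in the Recommended Actions rows.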
4.1.2.9 OcudrXFCCValidationFailureAbove10Percent
Table 4-28 OcudrXFCCValidationFailureAbove10Percent
| Field | Details |
|---|---|
| Description | Total number of responses with XFCC validation failure is above 10% of ingress traffic |
| Summary | Total number of responses with XFCC validation failure is above 10% of ingress traffic |
| Severity | Minor |
| Condition | Alert if XFCC validation failures exceed 10% of the total XFCC validations |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7014; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7012; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7012 (For EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_xfcc_header_validate_total |
| Recommended Actions | The alert is cleared when the number of XFCC validation failures falls below 10% of the total. Steps: |
4.1.2.10 OcudrXFCCValidationFailureAbove20Percent
Table 4-29 OcudrXFCCValidationFailureAbove20Percent
| Field | Details |
|---|---|
| Description | Total number of responses with XFCC validation failure is above 20% of ingress traffic |
| Summary | Total number of responses with XFCC validation failure is above 20% of ingress traffic |
| Severity | Major |
| Condition | Alert if XFCC validation failures exceed 20% of the total XFCC validations |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7014; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7012; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7012 (For EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_xfcc_header_validate_total |
| Recommended Actions | The alert is cleared when the number of XFCC validation failures falls below 20% of the total. Steps: |
4.1.2.11 OcudrXFCCValidationFailureAbove50Percent
Table 4-30 OcudrXFCCValidationFailureAbove50Percent
| Field | Details |
|---|---|
| Description | Total number of responses with XFCC validation failure is above 50% of ingress traffic |
| Summary | Total number of responses with XFCC validation failure is above 50% of ingress traffic |
| Severity | Critical |
| Condition | Alert if XFCC validation failures exceed 50% of the total XFCC validations |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7014; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7012; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7012 (For EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_xfcc_header_validate_total |
| Recommended Actions | The alert is cleared when the number of XFCC validation failures falls below 50% of the total. Steps: |
4.1.2.12 DRServiceOverload60Percent
Table 4-31 DRServiceOverload60Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Warn overload level |
| Summary | This alert is fired when the application reaches the Warn overload level |
| Severity | Warning |
| Condition | Alert if the application overload level reaches 60% |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7015; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7013; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7013 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below Warn level. Steps: |
4.1.2.13 DRServiceOverload75Percent
Table 4-32 DRServiceOverload75Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Minor overload level |
| Summary | This alert is fired when the application reaches the Minor overload level |
| Severity | Minor |
| Condition | Alert if the application overload level reaches 75% |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7015; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7013; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7013 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below Minor level. Steps: |
4.1.2.14 DRServiceOverload80Percent
Table 4-33 DRServiceOverload80Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Major overload level |
| Summary | This alert is fired when the application reaches the Major overload level |
| Severity | Major |
| Condition | Alert if the application overload level reaches 80% |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7015; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7013; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7013 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below Major level. Steps: |
4.1.2.15 DRServiceOverload90Percent
Table 4-34 DRServiceOverload90Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Critical overload level |
| Summary | This alert is fired when the application reaches the Critical overload level |
| Severity | Critical |
| Condition | Alert if the application overload level reaches 90% |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7015; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7013; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7013 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below Critical level. Steps: |
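The four DRServiceOverload alerts form a ladder of thresholds (60%, 75%, 80%, and 90%) with increasing severity. The following Python sketch shows that mapping. How the load_level metric encodes the load is not described in this section, so the percentage input and the function name are assumptions made for illustration.

```python
from typing import Optional

# Thresholds (percent) and severities as listed in the DRServiceOverload tables,
# ordered from highest to lowest so the first match is the highest level reached.
OVERLOAD_LEVELS = [
    (90, "Critical"),
    (80, "Major"),
    (75, "Minor"),
    (60, "Warning"),
]


def overload_severity(load_percent: float) -> Optional[str]:
    """Return the highest overload severity reached, or None below 60%.

    Hypothetical helper: `load_percent` is an assumed percentage value,
    not the raw value of the load_level metric.
    """
    for threshold, severity in OVERLOAD_LEVELS:
        if load_percent >= threshold:
            return severity
    return None
```

Ordering the list from the highest threshold down means that only the most severe matching level is returned.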
4.1.2.16 SLFSucessTxnDefaultGroupIdRateAbove1Percent
Table 4-35 SLFSucessTxnDefaultGroupIdRateAbove1Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Summary | Transaction Error rate is above 1 Percent of Total Transactions |
| Severity | Warning |
| Condition | Alert if the number of SLF Lookup requests answered with the default Group ID exceeds 1% of the total responses. |
| OID | UDR: NA; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7014; EIR: NA |
| Metric Used | slf_sucess_txn_default_grp_id_total |
| Recommended Actions | This alert is cleared when the number of SLF Lookup requests for unprovisioned subscribers decreases. Steps: Check the subscriber range received in Lookup requests and make sure that it does not include unexpected out-of-range subscribers. |
4.1.2.17 SLFSucessTxnDefaultGroupIdRateAbove10Percent
Table 4-36 SLFSucessTxnDefaultGroupIdRateAbove10Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Summary | Transaction Error rate is above 10 Percent of Total Transactions |
| Severity | Minor |
| Condition | Alert if the number of SLF Lookup requests answered with the default Group ID exceeds 10% of the total responses. |
| OID | UDR: NA; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7014; EIR: NA |
| Metric Used | slf_sucess_txn_default_grp_id_total |
| Recommended Actions | This alert is cleared when the number of SLF Lookup requests for unprovisioned subscribers decreases. Steps: Check the subscriber range received in Lookup requests and make sure that it does not include unexpected out-of-range subscribers. |
4.1.2.18 SLFSucessTxnDefaultGroupIdRateAbove25Percent
Table 4-37 SLFSucessTxnDefaultGroupIdRateAbove25Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Summary | Transaction Error rate is above 25 Percent of Total Transactions |
| Severity | Major |
| Condition | Alert if the number of SLF Lookup requests answered with the default Group ID exceeds 25% of the total responses. |
| OID | UDR: NA; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7014; EIR: NA |
| Metric Used | slf_sucess_txn_default_grp_id_total |
| Recommended Actions | This alert is cleared when the number of SLF Lookup requests for unprovisioned subscribers decreases. Steps: Check the subscriber range received in Lookup requests and make sure that it does not include unexpected out-of-range subscribers. |
4.1.2.19 SLFSucessTxnDefaultGroupIdRateAbove50Percent
Table 4-38 SLFSucessTxnDefaultGroupIdRateAbove50Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Summary | Transaction Error rate is above 50 Percent of Total Transactions |
| Severity | Critical |
| Condition | Alert if the number of SLF Lookup requests answered with the default Group ID exceeds 50% of the total responses. |
| OID | UDR: NA; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7014; EIR: NA |
| Metric Used | slf_sucess_txn_default_grp_id_total |
| Recommended Actions | This alert is cleared when the number of SLF Lookup requests for unprovisioned subscribers decreases. Steps: Check the subscriber range received in Lookup requests and make sure that it does not include unexpected out-of-range subscribers. |
4.1.2.20 OcudrDiameterCongestionCongestedState
Table 4-39 OcudrDiameterCongestionCongestedState
| Field | Details |
|---|---|
| Description | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
| Summary | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
| Severity | Critical |
| Condition | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7018; SLF: NA; EIR: NA |
| Metric Used | ocudr_pod_congestion_state == 2 |
| Recommended Actions | This alert is raised when the Diameter Gateway pod congestion level is set to the CONGESTED state. Steps: |
4.1.2.21 OcudrDiameterCongestionDocState
Table 4-40 OcudrDiameterCongestionDocState
| Field | Details |
|---|---|
| Description | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
| Summary | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
| Severity | Major |
| Condition | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7018; SLF: NA; EIR: NA |
| Metric Used | ocudr_pod_congestion_state == 1 |
| Recommended Actions | This alert is raised when the Diameter Gateway pod congestion level is set to the Danger of Congestion (DOC) state. Steps: |
4.1.2.22 DRProvServiceOverload60Percent
Table 4-41 DRProvServiceOverload60Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Warn overload level |
| Summary | This alert is fired when the application reaches the Warn overload level |
| Severity | Warning |
| Condition | Alert if the application overload level reaches 60% |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7017; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7016; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7015 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below Warn level. Steps: |
4.1.2.23 DRProvServiceOverload75Percent
Table 4-42 DRProvServiceOverload75Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Minor overload level |
| Summary | This alert is fired when the application reaches the Minor overload level |
| Severity | Minor |
| Condition | Alert if the application overload level reaches 75% |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7017; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7016; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7015 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below Minor level. Steps: |
4.1.2.24 DRProvServiceOverload80Percent
Table 4-43 DRProvServiceOverload80Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Major overload level |
| Summary | This alert is fired when the application reaches the Major overload level |
| Severity | Major |
| Condition | Alert if the application overload level reaches 80% |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7017; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7016; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7015 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below Major level. Steps: |
4.1.2.25 DRProvServiceOverload90Percent
Table 4-44 DRProvServiceOverload90Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Critical overload level |
| Summary | This alert is fired when the application reaches the Critical overload level |
| Severity | Critical |
| Condition | Alert if the application overload level reaches 90% |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7017; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7016; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7015 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below Critical level. Steps: |
4.1.2.26 OcudrProvisioningTrafficRateAboveMajorThreshold
Table 4-45 OcudrProvisioningTrafficRateAboveMajorThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic rate is above the critical threshold, that is, 950 requests per second |
| Summary | Traffic Rate is above 95 Percent of Max requests per second (1000) |
| Severity | Critical |
| Condition | Alert if ingress traffic reaches 95% of the maximum TPS |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7020; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7018; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7017 (For EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared when the ingress traffic rate falls below the Critical threshold. Note: The threshold is configurable in UDR_Alertrules.yaml. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.27 OcudrProvisioningTrafficRateAboveCriticalThreshold
Table 4-46 OcudrProvisioningTrafficRateAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic rate is above the major threshold, that is, 900 requests per second |
| Summary | Traffic Rate is above 90 Percent of Max requests per second (1000) |
| Severity | Major |
| Condition | Alert if ingress traffic reaches 90% of the maximum TPS |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7020; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7018; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7017 (For EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared when the total ingress traffic rate falls below the Major threshold, or when it exceeds the Critical threshold, in which case the corresponding Critical severity alert is raised. Note: The threshold is configurable in UDR_Alertrules.yaml. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
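The 900 and 950 requests per second figures in the two provisioning traffic rate tables follow from the maximum of 1000 requests per second assumed in this section. The following Python sketch reproduces that arithmetic. The function names are hypothetical, and the actual thresholds are whatever is configured in UDR_Alertrules.yaml.

```python
from typing import Optional

MAX_TPS = 1000  # maximum ingress requests per second assumed in this section


def tps_threshold(percent: int, max_tps: int = MAX_TPS) -> int:
    """Return the request rate per second that corresponds to `percent` of max TPS."""
    return max_tps * percent // 100


def traffic_severity(current_tps: float, max_tps: int = MAX_TPS) -> Optional[str]:
    """Map a request rate to a severity: 90% of max TPS is Major, 95% is Critical.

    Hypothetical helper that follows the Condition rows ("reaches 90%" and
    "reaches 95%"), so a rate exactly at a threshold already counts.
    """
    if current_tps >= tps_threshold(95, max_tps):
        return "Critical"
    if current_tps >= tps_threshold(90, max_tps):
        return "Major"
    return None
```

If the maximum TPS of a deployment differs from 1000, passing a different `max_tps` value scales both thresholds accordingly.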
4.1.2.28 OcudrProvisioningTransactionErrorRateAbove25Percent
Table 4-47 OcudrProvisioningTransactionErrorRateAbove25Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Severity | Major |
| Condition | Alert if the overall error rate exceeds 25% of the total transactions |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7021; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7019; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7018 (For EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 25% of the total transactions, or when it exceeds the 50% threshold, in which case the OcudrProvisioningTransactionErrorRateAbove50Percent alert is raised. Steps: |
4.1.2.29 OcudrProvisioningTransactionErrorRateAbove50Percent
Table 4-48 OcudrProvisioningTransactionErrorRateAbove50Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Severity | Critical |
| Condition | Alert if the overall error rate exceeds 50% of the total transactions |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7021; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7019; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7018 (For EIR alert name, see Alert Details) |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 50% of the total transactions. Steps: |
4.1.2.30 PVCFullForSLFExport
Table 4-49 PVCFullForSLFExport
| Field | Details |
|---|---|
| Description | Storage for Export tool is full |
| Summary | Storage for Export tool is full |
| Severity | Critical |
| Condition | Alert if the PVC allocated for the export tool dump path is full |
| OID | UDR: NA; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7020; EIR: NA |
| Metric Used | export_tool_full_usage |
| Recommended Actions | The alert is cleared when the PVC usage is optimized. Configure maxDumps to a lower value to clear old dumps, and remove any old dumps from the export tool container. |
4.1.2.31 FailedExtractForSLFExport
Table 4-50 FailedExtractForSLFExport
| Field | Details |
|---|---|
| Description | Export tool job has failed |
| Summary | Export tool job has failed |
| Severity | Critical |
| Condition | Alert if the export operation fails |
| OID | UDR: NA; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7021; EIR: NA |
| Metric Used | export_failure |
| Recommended Actions | Check the logs for the failure. The alert is cleared when the next export job succeeds. |
4.1.2.32 BulkImportTransferInFailed
Table 4-51 BulkImportTransferInFailed
| Field | Details |
|---|---|
| Description | Transfer-in failed for bulk import |
| Summary | Transfer-in failed for bulk import |
| Severity | Major |
| Condition | Alert is raised if the transfer-in from Remote to PVC fails |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7022; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7022; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7019 |
| Metric Used | bulkimport_transfer_in_status |
| Recommended Actions | This alert is cleared when the transfer-in for bulk import succeeds. Steps: |
4.1.2.33 ExportToolTransferOutFailed
Table 4-52 ExportToolTransferOutFailed
| Field | Details |
|---|---|
| Description | Transfer-out failed for export-tool |
| Summary | Transfer-out failed for export-tool |
| Severity | Major |
| Condition | Alert is raised if the transfer-out from PVC to Remote fails |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7024; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7024; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7021 |
| Metric Used | sftp_transfer_status |
| Recommended Actions | This alert is cleared when the transfer-out from the export tool succeeds. Steps: |
4.1.2.34 BulkImportTransferOutFailed
Table 4-53 BulkImportTransferOutFailed
| Field | Details |
|---|---|
| Description | Transfer-out failed for bulk import |
| Summary | Transfer-out failed for bulk import |
| Severity | Major |
| Condition | Alert is raised if the transfer-out from PVC to Remote fails |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7023; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7023; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7020 |
| Metric Used | bulkimport_transfer_out_status |
| Recommended Actions | This alert is cleared when the transfer-out for bulk import succeeds. Steps: |
4.1.2.35 PVCFullForXMLBulkImport
Table 4-54 PVCFullForXMLBulkImport
| Field | Details |
|---|---|
| Description | Storage for XML Bulk Import tool is full |
| Summary | Storage for XML Bulk Import tool is full |
| Severity | Critical |
| Condition | Alert is raised if the PVC for the xml-csv container is full |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7025; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7025; EIR: NA |
| Metric Used | nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-xmltocsv",kubernetes_namespace="ocudr"}==1 |
| Recommended Actions | This alert will be cleared when the PVC is back to normal. Steps: |
4.1.2.36 PVCFullForBulkImport
Table 4-55 PVCFullForBulkImport
| Field | Details |
|---|---|
| Description | Storage for Bulk Import tool is full |
| Summary | Storage for Bulk Import tool is full |
| Severity | Critical |
| Condition | Alert is raised if the PVC for the bulk import container is full |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7026; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7026; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7025 |
| Metric Used | nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-bulk-import",kubernetes_namespace="ocudr"}==1 |
| Recommended Actions | This alert will be cleared when the PVC is back to normal. Steps: |
4.1.2.37 OperationalStatusCompleteShutdown
Table 4-56 OperationalStatusCompleteShutdown
| Field | Details |
|---|---|
| Description | Operational state is complete shutdown |
| Summary | Operational state is complete shutdown |
| Severity | Critical |
| Condition | Alert will be raised if the operational state of the UDR, SLF, or EIR is COMPLETE_SHUTDOWN |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7027; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7027; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7026 |
| Metric Used | nudr_config_operational_status{kubernetes_namespace="ocudr"}==1 |
| Recommended Actions | This alert will be cleared when the operational status is back to normal. Steps: |
4.1.2.38 NFScoreCalculationFailed
Table 4-57 NFScoreCalculationFailed
| Field | Details |
|---|---|
| Description | NFScoreCalculationFailed |
| Summary | NFScoreCalculationFailed |
| Severity | Major |
| Condition | Alert is raised if the NF score calculation fails for any of the scoring factors |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7028; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7028; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7027 |
| Metric Used | nfscore{kubernetes_namespace="ocudr" ,factor=~"successTPS|signallingConnections|serviceHealth|replicationHealth|localityPreference|bulkImport|bulkExport",calculatedStatus="failed"} |
| Recommended Actions | This alert is cleared when the NF score calculation is successful. Steps: |
4.1.2.39 PVCFullForUDRExport
Table 4-58 PVCFullForUDRExport
| Field | Details |
|---|---|
| Description | Storage for Export tool is full |
| Summary | Storage for Export tool is full |
| Severity | Critical |
| Condition | Alert is raised if PVC allocated for export tool dump path is full. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7030; SLF: NA; EIR: NA |
| Metric Used | export_tool_full_usage{namespace="ocudr"}==1 |
| Recommended Actions | The alert is cleared when the PVC usage is optimized. Configure maxDumps to a lower value to clear old dumps. Steps: |
4.1.2.40 UDRExportFailed
Table 4-59 UDRExportFailed
| Field | Details |
|---|---|
| Description | Export tool job has failed |
| Summary | Export tool job has failed |
| Severity | Critical |
| Condition | Alert is raised if the export operation fails for UDR Mode |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7031; SLF: NA; EIR: NA |
| Metric Used | export_failure{namespace="ocudr"}==1 |
| Recommended Actions | Check the logs for the failure. The alert is cleared when the next export job succeeds. |
4.1.2.41 IngressgatewayPodProtectionDocState
Table 4-60 IngressgatewayPodProtectionDocState
| Field | Details |
|---|---|
| Description | Ingress congestion in DOC state |
| Summary | Ingress congestion in DOC state |
| Severity | Critical |
| Condition | Alert is raised if ingress congestion is in DOC state. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7032; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7029; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7028 |
| Metric Used | oc_ingressgateway_pod_congestion_state{namespace="ocudr"}==1 |
| Recommended Actions | This alert is cleared when the ingress gateway returns to the normal state. Steps: |
4.1.2.42 IngressgatewayPodProtectionCongestedState
Table 4-61 IngressgatewayPodProtectionCongestedState
| Field | Details |
|---|---|
| Description | Ingress congestion in Congested state |
| Summary | Ingress congestion in Congested state |
| Severity | Critical |
| Condition | Alert is raised if ingress congestion is in congested state. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7033; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7030; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7029 |
| Metric Used | oc_ingressgateway_pod_congestion_state{namespace="ocudr"}==2 |
| Recommended Actions | This alert is cleared when the ingress gateway returns to the normal state. Steps: |
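The two Ingress Gateway pod protection alerts differ only in the value of oc_ingressgateway_pod_congestion_state that they match (1 for DOC, 2 for Congested). The following Python sketch shows that mapping. The value 0 for the normal state is an assumption, because it is not stated in the tables, and the helper names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class CongestionState(IntEnum):
    NORMAL = 0  # assumption: not stated in the tables
    DOC = 1  # Danger of Congestion, matches "==1" in the Metric Used row
    CONGESTED = 2  # matches "==2" in the Metric Used row


ALERT_BY_STATE = {
    CongestionState.DOC: "IngressgatewayPodProtectionDocState",
    CongestionState.CONGESTED: "IngressgatewayPodProtectionCongestedState",
}


def alert_for(metric_value: int) -> Optional[str]:
    """Return the alert name for a metric sample, or None if no alert applies."""
    try:
        state = CongestionState(metric_value)
    except ValueError:
        return None  # unknown value: no mapping is assumed
    return ALERT_BY_STATE.get(state)
```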
4.1.2.43 RetryNotificationRecordsMaxLimitExceeded
Table 4-62 RetryNotificationRecordsMaxLimitExceeded
| Field | Details |
|---|---|
| Description | Alert will be raised if the retry notifications stored in the UDR database exceed the maximum limit. |
| Summary | Alert will be raised if the retry notifications stored in the UDR database exceed the maximum limit. |
| Severity | Critical |
| Condition | Alert will be raised if the retry notifications stored in the UDR database exceed the maximum limit. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7036; SLF: NA; EIR: NA |
| Metric Used | nudr_notif_records_limit_exceeded{namespace="ocudr"}==1 |
| Recommended Actions | This alert is raised when notification failures increase and more than 50k retry notifications are stored in the database. Steps: |
4.1.2.44 UserAgentHeaderNotFoundMorethan10PercentRequest
Table 4-63 UserAgentHeaderNotFoundMorethan10PercentRequest
| Field | Details |
|---|---|
| Description | Alert will be raised if the total number of requests without a User-Agent header reaches 10% of ingress traffic when the suppress notification feature is enabled. |
| Summary | Alert will be raised if the total number of requests without a User-Agent header reaches 10% of ingress traffic when the suppress notification feature is enabled. |
| Severity | Critical |
| Condition | Alert will be raised if the total number of requests without a User-Agent header reaches 10% of ingress traffic. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7035; SLF: NA; EIR: NA |
| Metric Used | (sum by(namespace)(rate(suppress_user_agent_not_found_total{namespace="ocudr"}[5m]))/sum by(namespace)(rate(oc_ingressgateway_http_requests_total{namespace="ocudr"}[5m])))*100 >= 10 |
| Recommended Actions | This alert is cleared when the total number of requests without a User-Agent header falls below 10% of ingress traffic. Steps: |
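The expression in the Metric Used row divides the 5-minute rate of requests without a User-Agent header by the 5-minute rate of all ingress requests and compares the result, as a percentage, against 10. The following Python sketch mirrors that calculation on two counter samples taken at the start and end of the window. It is a simplification: the helper names are hypothetical, and it ignores the counter resets and extrapolation that the Prometheus rate() function handles.

```python
def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Average per-second increase of a counter between two samples."""
    return (last - first) / window_seconds


def missing_user_agent_percent(
    missing_first: float,
    missing_last: float,
    total_first: float,
    total_last: float,
    window_seconds: float = 300.0,  # 5 minutes, as in the [5m] range selector
) -> float:
    """Percentage of ingress requests that arrived without a User-Agent header."""
    missing_rate = per_second_rate(missing_first, missing_last, window_seconds)
    total_rate = per_second_rate(total_first, total_last, window_seconds)
    if total_rate == 0:
        return 0.0  # no ingress traffic in the window
    return 100.0 * missing_rate / total_rate


def alert_fires(percent: float) -> bool:
    """Apply the ">= 10" comparison from the Metric Used row."""
    return percent >= 10
```

With 3000 such requests out of 30000 in a 5-minute window the result is 10%, which meets the condition.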
4.1.2.45 EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
Table 4-64 EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
| Field | Details |
|---|---|
| Description | Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit. |
| Summary | Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit. |
| Severity | Minor |
| Condition | Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7034; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7034; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7034 |
| Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1300000000 |
| Recommended Actions | This alert is cleared if the egress gateway JVM buffer memory is below the minor threshold limit. Steps: |
4.1.2.46 EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
Table 4-65 EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
| Field | Details |
|---|---|
| Description | Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit. |
| Summary | Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit. |
| Severity | Major |
| Condition | Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7034; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7034; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7034 |
| Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1500000000 |
| Recommended Actions | This alert is cleared if the egress gateway JVM buffer memory is below the major threshold limit. Steps: |
4.1.2.47 EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
Table 4-66 EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit. |
| Summary | Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit. |
| Severity | Critical |
| Condition | Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7034; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7034; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7034 |
| Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1800000000 |
| Recommended Actions | This alert is cleared if the egress gateway JVM buffer memory is below the critical threshold limit. Steps: |
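The three Egress Gateway JVM buffer memory alerts use byte thresholds of 1300000000, 1500000000, and 1800000000 in their Metric Used rows. The following Python sketch classifies a single buffer memory value against those limits. The function name is hypothetical, and the sketch evaluates one value rather than the per-pod sums that the expressions compute.

```python
from typing import Optional

# Byte thresholds from the Metric Used rows, highest first.
THRESHOLDS_BYTES = [
    (1_800_000_000, "Critical"),
    (1_500_000_000, "Major"),
    (1_300_000_000, "Minor"),
]


def jvm_buffer_severity(used_bytes: int) -> Optional[str]:
    """Return the highest severity whose threshold is met, or None."""
    for limit, severity in THRESHOLDS_BYTES:
        if used_bytes >= limit:  # the expressions use ">="
            return severity
    return None
```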
4.1.2.48 NudrDiameterGatewayDown
Table 4-67 NudrDiameterGatewayDown
| Field | Details |
|---|---|
| Description | Alert will be raised if Nudr-diam-gateway service is down. |
| Summary | Alert will be raised if Nudr-diam-gateway service is down. |
| Severity | Critical |
| Condition | Alert will be raised if Nudr-diam-gateway service is down. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7037; SLF: NA; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7037 |
| Metric Used | absent(up{container="nudr-diam-gateway",namespace="ocudr"}) or up{container="nudr-diam-gateway",namespace="ocudr"} == 0 |
| Recommended Actions | This alert is cleared when the NudrDiamGateway service is available. Steps: |
4.1.2.49 DiameterPeerConnectionsDropped
Table 4-68 DiameterPeerConnectionsDropped
| Field | Details |
|---|---|
| Description | Alert will be raised if there are no connections between diameter peer and diameter gateway. |
| Summary | Alert will be raised if there are no connections between diameter peer and diameter gateway. |
| Severity | Major |
| Condition | Alert will be raised if there are no connections between diameter peer and diameter gateway. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7029; SLF: NA; EIR: NA |
| Metric Used | sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0))< 2 or sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) < 2 or (sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",kubernetes_namespace="ocudr"} or vector(0)) + sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"}) or vector(0)) < 5 |
| Recommended Actions | This alert is cleared when the NudrDiamGateway service is available. Steps: |
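The expression in the Metric Used row combines three checks: fewer than 2 connections from peers whose origHost matches CHI, fewer than 2 from peers whose origHost matches IND, or fewer than 5 connections in total. The following Python sketch restates that logic with plain connection counts. Reading CHI and IND as two peer groups is an interpretation of the example expression, and the function name is hypothetical.

```python
def peer_connections_alert(chi_connections: int, ind_connections: int) -> bool:
    """Return True when the connection counts satisfy the alert expression.

    A missing series is treated as 0 connections, which mirrors the
    "or vector(0)" fallbacks in the expression.
    """
    return (
        chi_connections < 2
        or ind_connections < 2
        or (chi_connections + ind_connections) < 5
    )
```

For example, 3 CHI connections and 2 IND connections do not raise the alert, while 2 and 2 do because the total is below 5.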
4.1.2.50 IGWSignallingPodProtectionDOCState
Table 4-69 IGWSignallingPodProtectionDOCState
| Field | Details |
|---|---|
| Description | Alert will be raised when the ingress gateway signaling traffic is in the DOC state. |
| Summary | Alert will be raised when the ingress gateway signaling traffic is in the DOC state. |
| Severity | Major |
| Condition | Alert will be raised when the ingress gateway signaling traffic is in the DOC state. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7038; SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7038; EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7038 |
| Metric Used | sum({namespace="ocudr",container="ingressgateway-sig"}) by (pod) == 2 |
| Recommended Actions | This alert is cleared when the signaling traffic reaches NORMAL state. Steps: |
4.1.2.51 IGWSignallingPodProtectionCongestedState
Table 4-70 IGWSignallingPodProtectionCongestedState
| Field | Details |
|---|---|
| Description | Alert will be raised when the ingress gateway signaling traffic is in the Congested state. |
| Summary | Alert will be raised when the ingress gateway signaling traffic is in the Congested state. |
| Severity | Critical |
| Condition | Alert will be raised when the ingress gateway signaling traffic is in the Congested state. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7038 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7038 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7038 |
| Metric Used | sum(oc_ingressgateway_congestion_system_state{namespace="ocudr",container="ingressgateway-sig"}) by (pod) == 3 |
| Recommended Actions | This alert is cleared when the signaling traffic reaches the NORMAL or DOC state. Steps: |
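The two pod protection alerts in Tables 4-69 and 4-70 compare a per-pod state value against 2 (DOC) and 3 (Congested). The following minimal sketch shows that mapping in Python. It assumes the summed state value for one pod is already available as a number. The function name is hypothetical, and any value other than 2 or 3 is simply treated as "no alert" because the tables do not define other values.

```python
from typing import Optional, Tuple

# Comparison values taken from the "Metric Used" rows of Tables 4-69 and 4-70.
DOC_STATE = 2
CONGESTED_STATE = 3


def pod_protection_alert(state_value: float) -> Optional[Tuple[str, str]]:
    """Map one pod's state value to (alert name, severity), or None."""
    if state_value == DOC_STATE:
        return ("IGWSignallingPodProtectionDOCState", "Major")
    if state_value == CONGESTED_STATE:
        return ("IGWSignallingPodProtectionCongestedState", "Critical")
    # Any other value raises neither alert in this sketch.
    return None
```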
4.1.2.52 IGWSignallingPodProtectionByRateLimitRejectedRequest
Table 4-71 IGWSignallingPodProtectionByRateLimitRejectedRequest
| Field | Details |
|---|---|
| Description | Alert will be raised when total rejections exceed 1% of the total incoming traffic. |
| Summary | Alert will be raised when total rejections exceed 1% of the total incoming traffic. |
| Severity | Critical |
| Condition | Alert will be raised when total rejections exceed 1% of the total incoming traffic. |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7039 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7039 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7039 |
| Metric Used | (sum (rate(oc_ingressgateway_http_request_ratelimit_denied_count_total{Action="REJECT",namespace="ocudr"}[2m]) or (up * 0 ) ) )/ sum(rate(oc_ingressgateway_http_requests_total{container="ingressgateway-sig",namespace="ocudr"}[2m])) * 100 >= 1 |
| Recommended Actions | This alert is cleared when rejections fall below 1% of the total traffic. Steps: |
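The expression in Table 4-71 divides the rate of rate-limit rejections by the rate of all incoming requests and compares the result, as a percentage, with 1. The sketch below works through that arithmetic in Python. It assumes both rates are already available as requests per second. The function names are hypothetical, and returning 0% when there is no traffic is a choice made for this sketch only.

```python
def rejection_percent(denied_per_sec: float, total_per_sec: float) -> float:
    """Percentage of incoming requests rejected by rate limiting."""
    if total_per_sec == 0:
        # Assumption for this sketch: with no incoming traffic, report 0%.
        # (PromQL itself would yield NaN or +Inf for a zero denominator.)
        return 0.0
    return 100 * denied_per_sec / total_per_sec


def rate_limit_alert(denied_per_sec: float, total_per_sec: float) -> bool:
    """True when rejections are at least 1% of traffic (the '>= 1' check)."""
    return rejection_percent(denied_per_sec, total_per_sec) >= 1
```

At the maximum ingress rate of 1000 requests per second considered in this section, the threshold corresponds to 10 rejected requests per second.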
4.1.2.53 DRServiceRequestLatencyMajor
Table 4-72 DRServiceRequestLatencyMajor
| Field | Details |
|---|---|
| Description | DR service request latency is more than 100ms |
| Summary | DR service request latency is above 100ms |
| Severity | Major |
| Condition | Alert will be raised when DR service request latency is at least 100ms and below 250ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7046 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7046 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7046 |
| Metric Used | histogram_quantile(95 / 100, sum by(le) (rate(udr_request_processing_time_seconds_bucket{namespace="ocudr",container="nudr-drservice"}[5m])))*1000 >= 100 < 250 |
| Recommended Actions | The alert is cleared when DR service latency falls below 100ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
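The expression in Table 4-72 uses `histogram_quantile(95 / 100, ...)`, which estimates the 95th percentile from cumulative histogram buckets, and then multiplies by 1000 to convert seconds to milliseconds. The following simplified Python sketch shows the linear interpolation involved. It is not the Prometheus implementation: it assumes 0 < q <= 1, positive bucket bounds, at least one finite bucket, and a final +Inf bucket. The function name and the sample bucket values are made up for illustration.

```python
import math
from typing import List, Tuple


def histogram_quantile_sketch(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, below = buckets[idx - 1] if idx > 0 else (0.0, 0.0)
    # Interpolate linearly between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical buckets in seconds (the real expression uses per-second rates,
# but the interpolation is the same).
sample = [(0.05, 60), (0.1, 90), (0.25, 99), (0.5, 100), (math.inf, 100)]
p95_ms = histogram_quantile_sketch(95 / 100, sample) * 1000
```

For the sample buckets, the estimate is about 183 ms, which lies in the range checked by `>= 100 < 250`.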
4.1.2.54 DRServiceRequestLatencyCritical
Table 4-73 DRServiceRequestLatencyCritical
| Field | Details |
|---|---|
| Description | DR service request latency is more than 250ms |
| Summary | DR service request latency is above 250ms |
| Severity | Critical |
| Condition | Alert will be raised when DR service request latency exceeds 250ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7046 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7046 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7046 |
| Metric Used | histogram_quantile(95 / 100, sum by(le) (rate(udr_request_processing_time_seconds_bucket{namespace="ocudr",container="nudr-drservice"}[5m])))*1000 >= 250 |
| Recommended Actions | The alert is cleared when DR service latency falls below 250ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.55 DRServiceDBLatencyMajor
Table 4-74 DRServiceDBLatencyMajor
| Field | Details |
|---|---|
| Description | DR service DB latency is more than 25ms |
| Summary | DR service DB latency is above 25ms |
| Severity | Major |
| Condition | Alert will be raised when DR service DB latency is at least 25ms and below 50ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7047 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7047 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7047 |
| Metric Used | histogram_quantile(95 / 100, sum by(le) (rate(udr_db_processing_time_seconds_bucket{namespace="ocudr",container="nudr-drservice"}[5m])))*1000 >= 25 < 50 |
| Recommended Actions | The alert is cleared when DR service DB latency falls below 25ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.56 DRServiceDBLatencyCritical
Table 4-75 DRServiceDBLatencyCritical
| Field | Details |
|---|---|
| Description | DR service DB latency is more than 50ms |
| Summary | DR service DB latency is above 50ms |
| Severity | Critical |
| Condition | Alert will be raised when DR service DB latency exceeds 50ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7047 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7047 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7047 |
| Metric Used | histogram_quantile(95 / 100, sum by(le) (rate(udr_db_processing_time_seconds_bucket{namespace="ocudr",container="nudr-drservice"}[5m])))*1000 >= 50 |
| Recommended Actions | The alert is cleared when DR service DB latency falls below 50ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.57 IGWSignallingTotalAvgLatencyMajor
Table 4-76 IGWSignallingTotalAvgLatencyMajor
| Field | Details |
|---|---|
| Description | IGW signalling average latency is more than 250ms |
| Summary | IGW signalling average latency is above 250ms |
| Severity | Major |
| Condition | Alert will be fired when IGW signalling average latency is at least 250ms and below 500ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7048 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7048 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7048 |
| Metric Used | ((sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ) + (sum(irate(oc_ingressgateway_request_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_request_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ) + (sum(irate(oc_ingressgateway_response_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_response_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ))*1000 >= 250 < 500 |
| Recommended Actions | The alert is cleared when IGW signalling average latency falls below 250ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
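The expression in Table 4-76 adds three averages, each computed as a `_sum` rate divided by a `_count` rate: backend invocation latency, request processing latency, and response processing latency. The total is then multiplied by 1000 to give milliseconds. The sketch below shows that arithmetic in Python, assuming the per-stage rates are already available and every count is nonzero. The function name and the sample figures are hypothetical.

```python
from typing import Tuple

# (latency seconds accumulated per second, requests counted per second)
Stage = Tuple[float, float]


def total_avg_latency_ms(backend: Stage, request: Stage, response: Stage) -> float:
    """Sum the three per-stage averages (sum / count) and convert to ms."""
    return sum(s / c for s, c in (backend, request, response)) * 1000


# Hypothetical rates: 0.2 s + 0.03 s + 0.04 s average per request.
latency_ms = total_avg_latency_ms((20.0, 100.0), (3.0, 100.0), (4.0, 100.0))
```

With these sample rates the total is approximately 270 ms, which lies in the range checked by `>= 250 < 500`.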
4.1.2.58 IGWSignallingTotalAvgLatencyCritical
Table 4-77 IGWSignallingTotalAvgLatencyCritical
| Field | Details |
|---|---|
| Description | IGW signalling average latency is more than 500ms |
| Summary | IGW signalling average latency is above 500ms |
| Severity | Critical |
| Condition | Alert will be fired when IGW signalling average latency exceeds 500ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7048 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7048 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7048 |
| Metric Used | ((sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ) + (sum(irate(oc_ingressgateway_request_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_request_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ) + (sum(irate(oc_ingressgateway_response_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-sig"}[2m])) / sum(irate(oc_ingressgateway_response_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-sig"}[2m])) ))*1000 >= 500 |
| Recommended Actions | The alert is cleared when IGW signalling average latency falls below 500ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.59 DRProvServiceRequestLatencyMajor
Table 4-78 DRProvServiceRequestLatencyMajor
| Field | Details |
|---|---|
| Description | DR provisioning service request latency is more than 100ms |
| Summary | DR provisioning service request latency is above 100ms |
| Severity | Major |
| Condition | Alert will be raised when DR provisioning service request latency is at least 100ms and below 250ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7049 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7049 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7049 |
| Metric Used | histogram_quantile(95 / 100, sum by(le) (rate(udr_request_processing_time_seconds_bucket{namespace="ocudr",container="nudr-dr-provservice"}[5m])))*1000 >= 100 < 250 |
| Recommended Actions | The alert is cleared when DR provisioning service latency falls below 100ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.60 DRProvServiceRequestLatencyCritical
Table 4-79 DRProvServiceRequestLatencyCritical
| Field | Details |
|---|---|
| Description | DR provisioning service request latency is more than 250ms |
| Summary | DR provisioning service request latency is above 250ms |
| Severity | Critical |
| Condition | Alert will be raised when DR provisioning service request latency exceeds 250ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7049 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7049 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7049 |
| Metric Used | histogram_quantile(95 / 100, sum by(le) (rate(udr_request_processing_time_seconds_bucket{namespace="ocudr",container="nudr-dr-provservice"}[5m])))*1000 >= 250 |
| Recommended Actions | The alert is cleared when DR provisioning service latency falls below 250ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.61 DRProvServiceDBLatencyMajor
Table 4-80 DRProvServiceDBLatencyMajor
| Field | Details |
|---|---|
| Description | DR provisioning service DB latency is more than 25ms |
| Summary | DR provisioning service DB latency is above 25ms |
| Severity | Major |
| Condition | Alert will be raised when DR provisioning service DB latency is at least 25ms and below 50ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7050 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7050 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7050 |
| Metric Used | histogram_quantile(95 / 100, sum by(le) (rate(udr_db_processing_time_seconds_bucket{namespace="ocudr",container="nudr-dr-provservice"}[5m])))*1000 >= 25 < 50 |
| Recommended Actions | The alert is cleared when DR provisioning service DB latency falls below 25ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.62 DRProvServiceDBLatencyCritical
Table 4-81 DRProvServiceDBLatencyCritical
| Field | Details |
|---|---|
| Description | DR provisioning service DB latency is more than 50ms |
| Summary | DR provisioning service DB latency is above 50ms |
| Severity | Critical |
| Condition | Alert will be raised when DR provisioning service DB latency exceeds 50ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7050 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7050 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7050 |
| Metric Used | histogram_quantile(95 / 100, sum by(le) (rate(udr_db_processing_time_seconds_bucket{namespace="ocudr",container="nudr-dr-provservice"}[5m])))*1000 >= 50 |
| Recommended Actions | The alert is cleared when DR provisioning service DB latency falls below 50ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.63 IGWProvisioningTotalAvgLatencyMajor
Table 4-82 IGWProvisioningTotalAvgLatencyMajor
| Field | Details |
|---|---|
| Description | IGW provisioning average latency is more than 250ms |
| Summary | IGW provisioning average latency is above 250ms |
| Severity | Major |
| Condition | Alert will be fired when IGW provisioning average latency is at least 250ms and below 500ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7051 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7051 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7051 |
| Metric Used | ((sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ) + (sum(irate(oc_ingressgateway_request_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_request_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ) + (sum(irate(oc_ingressgateway_response_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_response_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ))*1000 >= 250 < 500 |
| Recommended Actions | The alert is cleared when IGW provisioning average latency falls below 250ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
4.1.2.64 IGWProvisioningTotalAvgLatencyCritical
Table 4-83 IGWProvisioningTotalAvgLatencyCritical
| Field | Details |
|---|---|
| Description | IGW provisioning average latency is more than 500ms |
| Summary | IGW provisioning average latency is above 500ms |
| Severity | Critical |
| Condition | Alert will be fired when IGW provisioning average latency exceeds 500ms |
| OID | UDR: 1.3.6.1.4.1.323.5.3.43.1.2.7051 SLF: 1.3.6.1.4.1.323.5.3.43.1.2.7051 EIR: 1.3.6.1.4.1.323.5.3.43.1.2.7051 |
| Metric Used | ((sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_backend_invocation_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ) + (sum(irate(oc_ingressgateway_request_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_request_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ) + (sum(irate(oc_ingressgateway_response_processing_latency_seconds_sum{namespace="ocudr",container="ingressgateway-prov"}[2m])) / sum(irate(oc_ingressgateway_response_processing_latency_seconds_count{namespace="ocudr",container="ingressgateway-prov"}[2m])) ))*1000 >= 500 |
| Recommended Actions | The alert is cleared when IGW provisioning average latency falls below 500ms. Steps: Check the service-specific metrics to understand the specific service request errors. If guidance is required, contact My Oracle Support. |
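The latency alerts in Tables 4-72 through 4-83 come in Major and Critical pairs that share a metric and differ only in their thresholds. The Major expression applies a lower and an upper bound (for example, `>= 100 < 250`), so only one alert of a pair holds at a time. The sketch below summarizes the pairs in Python. The thresholds are copied from the "Metric Used" rows, while the dictionary and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (major_ms, critical_ms) per alert pair, copied from Tables 4-72 to 4-83.
LATENCY_THRESHOLDS_MS: Dict[str, Tuple[int, int]] = {
    "DRServiceRequestLatency": (100, 250),
    "DRServiceDBLatency": (25, 50),
    "IGWSignallingTotalAvgLatency": (250, 500),
    "DRProvServiceRequestLatency": (100, 250),
    "DRProvServiceDBLatency": (25, 50),
    "IGWProvisioningTotalAvgLatency": (250, 500),
}


def latency_alert(family: str, latency_ms: float) -> Optional[str]:
    """Return the alert name that would hold for this latency, or None."""
    major, critical = LATENCY_THRESHOLDS_MS[family]
    if latency_ms >= critical:
        return family + "Critical"
    if latency_ms >= major:
        # Between the Major and Critical thresholds.
        return family + "Major"
    return None
```

For example, `latency_alert("DRServiceDBLatency", 30)` returns `"DRServiceDBLatencyMajor"`, because 30 ms is at least 25 ms and below 50 ms.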