4 Alert Configuration
This section describes how to configure alert rules for the UDR. Alert rules are measurement-based: the alerting system evaluates the metrics reported by UDR microservices against the conditions defined in each rule and issues a notification when a condition is met. For more information about configuring UDR alerts in Prometheus, see the “Alert Configuration” section in Oracle Communications Cloud Native Core, Unified Data Repository Installation, Upgrade, and Fault Recovery Guide.
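As a rough illustration of what a measurement-based rule looks like, the sketch below assembles one alerting rule in the general layout that Prometheus rule files use (`groups`, `rules`, `alert`, `expr`, `for`, `labels`, `annotations`). The expression, the 2-minute rate window, the `1m` hold time, and the group name are assumptions made for the example and are not taken from the shipped UDR rule file. The alert name, summary text, and the 900 requests per second figure come from the OcudrTrafficRateAboveMajorThreshold table in this chapter.

```python
import json


def build_alert_rule(name, expr, severity, summary, hold="1m"):
    """Return one alerting rule using the field names of a Prometheus rule file."""
    return {
        "alert": name,
        "expr": expr,
        "for": hold,  # how long the condition must hold before the alert fires
        "labels": {"severity": severity},
        "annotations": {"summary": summary},
    }


# Hypothetical expression and hold time; the real values live in the UDR alert rules file.
rule = build_alert_rule(
    name="OcudrTrafficRateAboveMajorThreshold",
    expr="sum(rate(oc_ingressgateway_http_requests_total[2m])) >= 900",
    severity="major",
    summary="Traffic Rate is above 90 Percent of Max requests per second(1000)",
)

# Hypothetical group name. JSON output is also valid YAML.
rule_file = {"groups": [{"name": "example-udr-alerts", "rules": [rule]}]}
print(json.dumps(rule_file, indent=2))
```

The printed structure only shows where the metric, condition, and severity of an alert end up in a rule; consult the installation guide referenced above for the actual rule file.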
4.1 Alert Details
This section describes alerts in detail.
Note:
The maximum Ingress request rate considered is 1000 requests per second.
Table 4-1 Alert Levels or Severity Types
| Alert Levels / Severity Types | Definition |
|---|---|
| Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the UDR service. |
| Major | Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the UDR service. |
| Minor | Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the UDR service. |
| Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of UDR. |
The following table lists the alert names for UDR/SLF and EIR.
Table 4-2 Alert names for UDR/SLF and EIR
| UDR/SLF | EIR |
|---|---|
| OcudrTrafficRateAboveMajorThreshold | OceirTrafficRateAboveMajorThreshold |
| OcudrTrafficRateAboveMinorThreshold | OceirTrafficRateAboveMinorThreshold |
| OcudrTrafficRateAboveCriticalThreshold | OceirTrafficRateAboveCriticalThreshold |
| OcudrTransactionErrorRateAbove0.1Percent | OceirTransactionErrorRateAbove0.1Percent |
| OcudrTransactionErrorRateAbove1Percent | OceirTransactionErrorRateAbove1Percent |
| OcudrTransactionErrorRateAbove10Percent | OceirTransactionErrorRateAbove10Percent |
| OcudrTransactionErrorRateAbove25Percent | OceirTransactionErrorRateAbove25Percent |
| OcudrTransactionErrorRateAbove50Percent | OceirTransactionErrorRateAbove50Percent |
| OcudrSubscriberNotFoundAbove1Percent | OceirSubscriberNotFoundAbove1Percent |
| OcudrSubscriberNotFoundAbove10Percent | OceirSubscriberNotFoundAbove10Percent |
| OcudrSubscriberNotFoundAbove25Percent | OceirSubscriberNotFoundAbove25Percent |
| OcudrSubscriberNotFoundAbove50Percent | OceirSubscriberNotFoundAbove50Percent |
| OcudrPodsRestart | OceirPodsRestart |
| NudrServiceDown | NudrServiceDown |
| NudrProvServiceDown | NudrProvServiceDown |
| NudrNotifyServiceServiceDown | NA |
| NudrNRFClientServiceDown | NudrNRFClientServiceDown |
| NudrConfigServiceDown | NudrConfigServiceDown |
| NudrDiameterProxyServiceDown | NudrDiameterProxyServiceDown |
| NudrOnDemandMigrationServiceDown | NA |
| OcudrIngressGatewayServiceDown | OceirIngressGatewayServiceDown |
| OcudrEgressGatewayServiceDown | OceirEgressGatewayServiceDown |
| OcudrDbServiceDown | OceirDbServiceDown |
| OcudrXFCCValidationFailureAbove10Percent | OceirXFCCValidationFailureAbove10Percent |
| OcudrXFCCValidationFailureAbove20Percent | OceirXFCCValidationFailureAbove20Percent |
| OcudrXFCCValidationFailureAbove50Percent | OceirXFCCValidationFailureAbove50Percent |
| DRServiceOverload60Percent | DRServiceOverload60Percent |
| DRServiceOverload75Percent | DRServiceOverload75Percent |
| DRServiceOverload80Percent | DRServiceOverload80Percent |
| DRServiceOverload90Percent | DRServiceOverload90Percent |
| SLFSucessTxnDefaultGroupIdRateAbove1Percent | NA |
| SLFSucessTxnDefaultGroupIdRateAbove10Percent | NA |
| SLFSucessTxnDefaultGroupIdRateAbove25Percent | NA |
| SLFSucessTxnDefaultGroupIdRateAbove50Percent | NA |
| OcudrDiameterCongestionCongestedState | OceirDiameterCongestionCongestedState |
| OcudrDiameterCongestionDocState | OceirDiameterCongestionDocState |
| DRProvServiceOverload60Percent | DRProvServiceOverload60Percent |
| DRProvServiceOverload75Percent | DRProvServiceOverload75Percent |
| DRProvServiceOverload80Percent | DRProvServiceOverload80Percent |
| DRProvServiceOverload90Percent | DRProvServiceOverload90Percent |
| OcudrIngressGatewayProvServiceDown | OceirIngressGatewayProvServiceDown |
| OcudrProvisioningTrafficRateAboveMajorThreshold | OceirProvisioningTrafficRateAboveMajorThreshold |
| OcudrProvisioningTrafficRateAboveCriticalThreshold | OceirProvisioningTrafficRateAboveCriticalThreshold |
| OcudrProvisioningTransactionErrorRateAbove25Percent | OceirProvisioningTransactionErrorRateAbove25Percent |
| OcudrProvisioningTransactionErrorRateAbove50Percent | OceirProvisioningTransactionErrorRateAbove50Percent |
| PVCFullForSLFExport | NA |
| FailedExtractForSLFExport | NA |
| BulkImportTransferInFailed | BulkImportTransferInFailed |
| BulkImportTransferOutFailed | BulkImportTransferOutFailed |
| ExportToolTransferOutFailed | ExportToolTransferOutFailed |
| PVCFullForXMLBulkImport | PVCFullForXMLBulkImport |
| PVCFullForBulkImport | PVCFullForBulkImport |
| OperationalStatusCompleteShutdown | OperationalStatusCompleteShutdown |
| NFScoreCalculationFailed | NFScoreCalculationFailed |
| PVCFullForUDRExport | NA |
| UDRExportFailed | NA |
| IngressgatewayPodProtectionDocState | IngressgatewayPodProtectionDocState |
| IngressgatewayPodProtectionCongestedState | IngressgatewayPodProtectionCongestedState |
| RetryNotificationRecordsMaxLimitExceeded | RetryNotificationRecordsMaxLimitExceeded |
| UserAgentHeaderNotFoundMorethan10PercentRequest | NA |
| EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold | EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold |
| EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold | EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold |
| EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold | EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold |
| NudrDiameterGatewayDown | NudrDiameterGatewayDown |
| DiameterPeerConnectionsDropped | DiameterPeerConnectionsDropped |
| IGWSignallingPodProtectionDOCState | NA |
| IGWSignallingPodProtectionCongestedState | NA |
| IGWSignallingPodProtectionByRateLimitRejectedRequest | NA |
Note:
For the following alert details, only UDR alert names are provided. For the corresponding EIR alert names, see Table 4-2.
4.1.1 System Level Alerts
This section lists the system level alerts.
- OcudrSubscriberNotFoundAbove1Percent
- OcudrSubscriberNotFoundAbove10Percent
- OcudrSubscriberNotFoundAbove25Percent
- OcudrSubscriberNotFoundAbove50Percent
- OcudrPodsRestart
- NudrServiceDown
- NudrProvServiceDown
- NudrNotifyServiceServiceDown
- NudrNRFClientServiceDown
- NudrConfigServiceDown
- NudrDiameterProxyServiceDown
- NudrOnDemandMigrationServiceDown
- OcudrIngressGatewayServiceDown
- OcudrEgressGatewayServiceDown
- OcudrDbServiceDown
- OcudrIngressGatewayProvServiceDown
4.1.1.1 OcudrSubscriberNotFoundAbove1Percent
Table 4-3 OcudrSubscriberNotFoundAbove1Percent
| Field | Details |
|---|---|
| Description | Total number of response if subscriber not found is about 1% of ingress traffic |
| Summary | Total number of response if subscriber not found is about 1% of ingress traffic |
| Severity | Warning |
| Condition | Alert if the number of Subscriber Not Found responses exceeds 1% of all ingress traffic |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7009 |
| Metric Used | udr_subscriber_not_found_total |
| Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 1% of the total. |
4.1.1.2 OcudrSubscriberNotFoundAbove10Percent
Table 4-4 OcudrSubscriberNotFoundAbove10Percent
| Field | Details |
|---|---|
| Description | Total number of response if subscriber not found is about 10% of ingress traffic |
| Summary | Total number of response if subscriber not found is about 10% of ingress traffic |
| Severity | Minor |
| Condition | Alert if the number of Subscriber Not Found responses exceeds 10% of all ingress traffic |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7010 |
| Metric Used | udr_subscriber_not_found_total |
| Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 10% of the total. |
4.1.1.3 OcudrSubscriberNotFoundAbove25Percent
Table 4-5 OcudrSubscriberNotFoundAbove25Percent
| Field | Details |
|---|---|
| Description | Total number of response if subscriber not found is about 25% of ingress traffic |
| Summary | Total number of response if subscriber not found is about 25% of ingress traffic |
| Severity | Major |
| Condition | Alert if the number of Subscriber Not Found responses exceeds 25% of all ingress traffic |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7011 |
| Metric Used | udr_subscriber_not_found_total |
| Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 25% of the total. |
4.1.1.4 OcudrSubscriberNotFoundAbove50Percent
Table 4-6 OcudrSubscriberNotFoundAbove50Percent
| Field | Details |
|---|---|
| Description | Total number of response if subscriber not found is about 50% of ingress traffic |
| Summary | Total number of response if subscriber not found is about 50% of ingress traffic |
| Severity | Critical |
| Condition | Alert if the number of Subscriber Not Found responses exceeds 50% of all ingress traffic |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7012 |
| Metric Used | udr_subscriber_not_found_total |
| Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 50% of the total. |
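The four Subscriber Not Found alerts compare one counter against ingress traffic. The sketch below shows one plausible way to derive that percentage from two samples of cumulative counters and map it to the highest threshold crossed. The thresholds and severities come from Tables 4-3 to 4-6; the sampling approach, the counter-reset handling, and the function names are assumptions for illustration, not the expressions used in the UDR rule file.

```python
# (threshold %, alert name, severity), ordered from highest to lowest threshold.
SUBSCRIBER_NOT_FOUND_LEVELS = [
    (50.0, "OcudrSubscriberNotFoundAbove50Percent", "Critical"),
    (25.0, "OcudrSubscriberNotFoundAbove25Percent", "Major"),
    (10.0, "OcudrSubscriberNotFoundAbove10Percent", "Minor"),
    (1.0, "OcudrSubscriberNotFoundAbove1Percent", "Warning"),
]


def counter_increase(earlier, later):
    """Growth of a cumulative counter between two samples.

    A lower later value is treated as a counter reset (for example after a
    pod restart), in which case the later value itself is the growth.
    """
    return later if later < earlier else later - earlier


def not_found_percent(nf_start, nf_end, req_start, req_end):
    """Subscriber Not Found responses as a percentage of ingress requests."""
    requests = counter_increase(req_start, req_end)
    if requests == 0:
        return 0.0
    return 100.0 * counter_increase(nf_start, nf_end) / requests


def highest_alert(percent):
    """Return (alert name, severity) for the highest threshold exceeded, or None."""
    for threshold, name, severity in SUBSCRIBER_NOT_FOUND_LEVELS:
        if percent > threshold:
            return name, severity
    return None


# 60 not-found responses out of 2000 requests in the window.
print(highest_alert(not_found_percent(100, 160, 1000, 3000)))
```

Whether the lower-severity alerts remain active while a higher one fires depends on the rule expressions, which this sketch does not model.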
4.1.1.5 OcudrPodsRestart
Table 4-7 OcudrPodsRestart
| Field | Details |
|---|---|
| Description | Pod {{$labels.pod}} has restarted. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted |
| Severity | Major |
| Condition | Alert if any pod has restarted |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7014 |
| Metric Used | kube_pod_container_status_restarts_total |
| Recommended Actions | The alert is cleared automatically when the affected pod is up. |
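The restart condition relies on a per-pod restart counter. As a minimal sketch, assuming two snapshots of `kube_pod_container_status_restarts_total` keyed by pod name, a restart shows up as a counter that grew between the snapshots. The pod names and the snapshot format are hypothetical.

```python
def restarted_pods(previous, current):
    """Return the pods whose restart counter grew between two snapshots."""
    return sorted(
        pod for pod, count in current.items() if count > previous.get(pod, 0)
    )


# Hypothetical snapshots: pod name -> restart count.
before = {"nudr-drservice-0": 1, "nudr-config-0": 0}
after = {"nudr-drservice-0": 2, "nudr-config-0": 0}
print(restarted_pods(before, after))
```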
4.1.1.6 NudrServiceDown
Table 4-8 NudrServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Nudr_DRService {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : DR Service is down |
| Severity | Critical |
| Condition | Alert if Nudr-dr service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7015 |
| Metric Used | app_kubernetes_io_name="nudr-drservice" |
| Recommended Actions | The alert is cleared when the NudrService service is available. |
4.1.1.7 NudrProvServiceDown
Table 4-9 NudrProvServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Nudr_DR_PROVService {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : DR Prov Service is down |
| Severity | Critical |
| Condition | Alert if Nudr-dr-prov service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7015 |
| Metric Used | app_kubernetes_io_name="nudr-dr-provservice" |
| Recommended Actions | The alert is cleared when the NudrProvService service is available. |
4.1.1.8 NudrNotifyServiceServiceDown
Table 4-10 NudrNotifyServiceServiceDown
| Field | Details |
|---|---|
| Description | OCUDR NudrNotifyServiceService {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Nudr Notify Service down. |
| Severity | Critical |
| Condition | Alert if Nudr Notify service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7016 |
| Metric Used | app_kubernetes_io_name="nudr-notify-service" |
| Recommended Actions | The alert is cleared when the NotifyService service is available. |
4.1.1.9 NudrNRFClientServiceDown
Table 4-11 NudrNRFClientServiceDown
| Field | Details |
|---|---|
| Description | OCUDR NRFClient service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NRF Client service down |
| Severity | Critical |
| Condition | Alert if Nudr Nrf Client service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7017 |
| Metric Used | app_kubernetes_io_name="nrf-client-nfmanagement" |
| Recommended Actions | The alert is cleared when the NRFClientService service is available. |
4.1.1.10 NudrConfigServiceDown
Table 4-12 NudrConfigServiceDown
| Field | Details |
|---|---|
| Description | OCUDR config service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : nudr-config service down |
| Severity | Critical |
| Condition | Alert if Nudr Config service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7020 |
| Metric Used | app_kubernetes_io_name="nudr-config" |
| Recommended Actions | The alert is cleared when the ConfigService service is available. |
4.1.1.11 NudrDiameterProxyServiceDown
Table 4-13 NudrDiameterProxyServiceDown
| Field | Details |
|---|---|
| Description | OCUDR diameterproxy service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : nudr-diameterproxy service is down |
| Severity | Critical |
| Condition | Alert if Nudr Diameter Proxy is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7018 |
| Metric Used | app_kubernetes_io_name="nudr-diameterproxy" |
| Recommended Actions | The alert is cleared when the DiameterProxyService service is available. |
4.1.1.12 NudrOnDemandMigrationServiceDown
Table 4-14 NudrOnDemandMigrationServiceDown
| Field | Details |
|---|---|
| Description | OCUDR ondemand-migration service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down |
| Severity | Critical |
| Condition | Alert if Nudr On Demand Migration is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7019 |
| Metric Used | app_kubernetes_io_name="nudr-ondemand-migration" |
| Recommended Actions | The alert is cleared when the OnDemandMigrationService service is available. |
4.1.1.13 OcudrIngressGatewayServiceDown
Table 4-15 OcudrIngressGatewayServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down |
| Severity | Critical |
| Condition | Alert if Ingress Service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7021 |
| Metric Used | app_kubernetes_io_name="ingressgateway" |
| Recommended Actions | The alert is cleared when the ingressgateway service is available. |
4.1.1.14 OcudrEgressGatewayServiceDown
Table 4-16 OcudrEgressGatewayServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Egress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down |
| Severity | Critical |
| Condition | Alert if Egress Service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7022 |
| Metric Used | app_kubernetes_io_name="egressgateway" |
| Recommended Actions | The alert is cleared when the egressgateway service is available. Note: The threshold is configurable in the UDR_Alertrules.yaml file. |
4.1.1.15 OcudrDbServiceDown
Table 4-17 OcudrDbServiceDown
| Field | Details |
|---|---|
| Description | Mysql connectivity service is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : MySQL connectivity service down |
| Severity | Critical |
| Condition | Alert if MySQL connectivity is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7023 |
| Metric Used | appinfo_service_running |
| Recommended Actions | This alert clears when the microservice nudr-drservice is up and running. |
4.1.1.16 OcudrIngressGatewayProvServiceDown
Table 4-18 OcudrIngressGatewayProvServiceDown
| Field | Details |
|---|---|
| Description | OCUDR Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
| Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down |
| Severity | Critical |
| Condition | Alert if Ingressgateway-prov service is down |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7043 |
| Metric Used | app_kubernetes_io_name="ingressgateway-prov" |
| Recommended Actions | The alert is cleared when the ingress-gateway service is available. |
4.1.2 Application Level Alerts
This section lists the application level alerts.
- OcudrTrafficRateAboveMajorThreshold
- OcudrTrafficRateAboveMinorThreshold
- OcudrTrafficRateAboveCriticalThreshold
- OcudrTransactionErrorRateAbove0.1Percent
- OcudrTransactionErrorRateAbove1Percent
- OcudrTransactionErrorRateAbove10Percent
- OcudrTransactionErrorRateAbove25Percent
- OcudrTransactionErrorRateAbove50Percent
- OcudrXFCCValidationFailureAbove10Percent
- OcudrXFCCValidationFailureAbove20Percent
- OcudrXFCCValidationFailureAbove50Percent
- DRServiceOverload60Percent
- DRServiceOverload75Percent
- DRServiceOverload80Percent
- DRServiceOverload90Percent
- SLFSucessTxnDefaultGroupIdRateAbove1Percent
- SLFSucessTxnDefaultGroupIdRateAbove10Percent
- SLFSucessTxnDefaultGroupIdRateAbove25Percent
- SLFSucessTxnDefaultGroupIdRateAbove50Percent
- OcudrDiameterCongestionCongestedState
- OcudrDiameterCongestionDocState
- DRProvServiceOverload60Percent
- DRProvServiceOverload75Percent
- DRProvServiceOverload80Percent
- DRProvServiceOverload90Percent
- Diameter-Gateway pod congestion Danger of congestion state
- Diameter-Gateway pod CONGESTED state
- OcudrProvisioningTrafficRateAboveMajorThreshold
- OcudrProvisioningTrafficRateAboveCriticalThreshold
- OcudrProvisioningTransactionErrorRateAbove25Percent
- OcudrProvisioningTransactionErrorRateAbove50Percent
- PVCFullForSLFExport
- FailedExtractForSLFExport
- BulkImportTransferInFailed
- ExportToolTransferOutFailed
- BulkImportTransferOutFailed
- PVCFullForXMLBulkImport
- PVCFullForBulkImport
- OperationalStatusCompleteShutdown
- NFScoreCalculationFailed
- PVCFullForUDRExport
- UDRExportFailed
- IngressgatewayPodProtectionDocState
- IngressgatewayPodProtectionCongestedState
- RetryNotificationRecordsMaxLimitExceeded
- UserAgentHeaderNotFoundMorethan10PercentRequest
- EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
- EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
- EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
- NudrDiameterGatewayDown
- DiameterPeerConnectionsDropped
- IGWSignallingPodProtectionDOCState
- IGWSignallingPodProtectionCongestedState
- IGWSignallingPodProtectionByRateLimitRejectedRequest
4.1.2.1 OcudrTrafficRateAboveMajorThreshold
Table 4-19 OcudrTrafficRateAboveMajorThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic Rate is above major threshold i.e. 900 requests per second |
| Summary | Traffic Rate is above 90 Percent of Max requests per second(1000) |
| Severity | Major |
| Condition | Alert if Ingress traffic reaches 90% of max TPS |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7002 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared when the Ingress Traffic rate falls below the Major threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.2 OcudrTrafficRateAboveMinorThreshold
Table 4-20 OcudrTrafficRateAboveMinorThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic rate is above minor threshold i.e. 800 requests per second |
| Summary | Traffic rate is above 80 Percent of Max requests per second(1000) |
| Severity | Minor |
| Condition | Alert if Ingress traffic reaches 80% of max TPS |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7001 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the OcudrTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.3 OcudrTrafficRateAboveCriticalThreshold
Table 4-21 OcudrTrafficRateAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic Rate is above critical threshold i.e. 950 requests per second |
| Summary | Traffic Rate is above 95 Percent of Max requests per second(1000) |
| Severity | Critical |
| Condition | Alert if Ingress traffic reaches 95% of max TPS |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7003 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
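The traffic-rate thresholds are percentages of the maximum ingress rate of 1000 requests per second stated in the note in section 4.1. The following sketch works through that arithmetic (80%, 90%, and 95% give 800, 900, and 950 requests per second) and picks the alert for a measured rate. Treating "reaches" as greater than or equal to, and the function names, are assumptions made for the example.

```python
MAX_INGRESS_RPS = 1000  # maximum ingress requests per second considered in this chapter

# (percent of maximum, alert name), ordered from highest to lowest threshold.
TRAFFIC_LEVELS = [
    (95, "OcudrTrafficRateAboveCriticalThreshold"),
    (90, "OcudrTrafficRateAboveMajorThreshold"),
    (80, "OcudrTrafficRateAboveMinorThreshold"),
]


def threshold_rps(percent, max_rps=MAX_INGRESS_RPS):
    """Convert a percentage of the maximum rate into requests per second."""
    return max_rps * percent / 100


def traffic_alert(rate_rps, max_rps=MAX_INGRESS_RPS):
    """Return the alert for the highest threshold reached, or None below Minor."""
    for percent, name in TRAFFIC_LEVELS:
        if rate_rps >= threshold_rps(percent, max_rps):
            return name
    return None


for percent, name in TRAFFIC_LEVELS:
    print(name, threshold_rps(percent))
print(traffic_alert(920))
```

If the maximum rate for a deployment differs from 1000, passing a different `max_rps` shows how the absolute thresholds would shift; the values actually enforced are those in the alert rules file.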
4.1.2.4 OcudrTransactionErrorRateAbove0.1Percent
Table 4-22 OcudrTransactionErrorRateAbove0.1Percent
| Field | Details |
|---|---|
| Description | Transaction error rate is above 0.1 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 0.1 Percent of Total Transactions |
| Severity | Warning |
| Condition | Alert if the overall error rate exceeds 0.1% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7004 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 0.1% of the total transactions or when it crosses the 1% threshold, in which case the OcudrTransactionErrorRateAbove1Percent alert is raised. |
4.1.2.5 OcudrTransactionErrorRateAbove1Percent
Table 4-23 OcudrTransactionErrorRateAbove1Percent
| Field | Details |
|---|---|
| Description | Transaction Error rate is above 1 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | Warning |
| Condition | Alert if the overall error rate exceeds 1% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7005 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 1% of the total transactions or when it crosses the 10% threshold, in which case the OcudrTransactionErrorRateAbove10Percent alert is raised. |
4.1.2.6 OcudrTransactionErrorRateAbove10Percent
Table 4-24 OcudrTransactionErrorRateAbove10Percent
| Field | Details |
|---|---|
| Description | Transaction error rate is above 10 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | Minor |
| Condition | Alert if the overall error rate exceeds 10% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7006 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 10% of the total transactions or when it crosses the 25% threshold, in which case the OcudrTransactionErrorRateAbove25Percent alert is raised. |
4.1.2.7 OcudrTrafficRateAboveCriticalThreshold
Table 4-25 OcudrTrafficRateAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic rate is above critical threshold i.e. 950 requests per second |
| Summary | Traffic rate is above 95 Percent of Max requests per second(1000) |
| Severity | Critical |
| Condition | Alert if Ingress traffic reaches 95% of max TPS |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7003 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.8 OcudrTrafficRateAboveMajorThreshold
Table 4-26 OcudrTrafficRateAboveMajorThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic rate is above major threshold i.e. 900 requests per second |
| Summary | Traffic rate is above 90 Percent of Max requests per second(1000) |
| Severity | Major |
| Condition | Alert if Ingress traffic reaches 90% of max TPS |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7002 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared when the Ingress Traffic rate falls below the Major threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.9 OcudrTrafficRateAboveMinorThreshold
Table 4-27 OcudrTrafficRateAboveMinorThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic Rate is above minor threshold i.e. 800 requests per second |
| Summary | Traffic Rate is above 80 Percent of Max requests per second (1000) |
| Severity | Minor |
| Condition | Alert if Ingress traffic reaches 80% of max TPS |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7001 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions | The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the OcudrTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why the OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.10 OcudrTransactionErrorRateAbove0.1Percent
Table 4-28 OcudrTransactionErrorRateAbove0.1Percent
| Field | Details |
|---|---|
| Description | Transaction Error rate is above 0.1 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 0.1 Percent of Total Transactions |
| Severity | Warning |
| Condition | Alert if the overall error rate exceeds 0.1% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7004 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 0.1% of the total transactions or when it crosses the 1% threshold, in which case the OcudrTransactionErrorRateAbove1Percent alert is raised. |
4.1.2.11 OcudrTransactionErrorRateAbove1Percent
Table 4-29 OcudrTransactionErrorRateAbove1Percent
| Field | Details |
|---|---|
| Description | Transaction Error rate is above 1 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | Warning |
| Condition | Alert if the overall error rate exceeds 1% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7005 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 1% of the total transactions or when it crosses the 10% threshold, in which case the OcudrTransactionErrorRateAbove10Percent alert is raised. |
4.1.2.12 OcudrTransactionErrorRateAbove10Percent
Table 4-30 OcudrTransactionErrorRateAbove10Percent
| Field | Details |
|---|---|
| Description | Transaction Error rate is above 10 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | Minor |
| Condition | Alert if all error rate exceeds 10% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7006 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions |
The alert is cleared when the number of failure transactions are below 10% of the total transactions or when the number of failure transactions cross the 25% threshold in which case the OcnrfTransactionErrorRateAbove25Percent shall be raised. Steps:
|
4.1.2.13 OcudrTransactionErrorRateAbove25Percent
Table 4-31 OcudrTransactionErrorRateAbove25Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Severity | Major |
| Condition | Alert if all error rate exceeds 25% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7007 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions |
The alert is cleared when the number of failed transactions falls below 25% of the total transactions, or when the number of failed transactions crosses the 50% threshold, in which case the OcudrTransactionErrorRateAbove50Percent alert is raised. Steps:
|
4.1.2.14 OcudrTransactionErrorRateAbove50Percent
Table 4-32 OcudrTransactionErrorRateAbove50Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Severity | Critical |
| Condition | Alert if all error rate exceeds 50% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7008 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions |
The alert is cleared when the number of failed transactions falls below 50% of the total transactions. Steps:
|
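The transaction error rate alerts above form a ladder: each one clears either when the rate drops below its own threshold or when the next threshold is crossed and the next alert takes over. The following Python sketch illustrates that selection logic only. The function name and the use of plain request counts are assumptions made for illustration; the actual rules are evaluated by the alerting system against oc_ingressgateway_http_responses_total.

```python
from typing import Optional

# Thresholds (percent of total transactions) paired with the alert raised above
# each one, ordered from the highest threshold to the lowest.
ERROR_RATE_LADDER = [
    (50.0, "OcudrTransactionErrorRateAbove50Percent"),
    (25.0, "OcudrTransactionErrorRateAbove25Percent"),
    (10.0, "OcudrTransactionErrorRateAbove10Percent"),
    (1.0, "OcudrTransactionErrorRateAbove1Percent"),
    (0.1, "OcudrTransactionErrorRateAbove0.1Percent"),
]


def active_error_rate_alert(failed: int, total: int) -> Optional[str]:
    """Return the alert expected for the given counts, or None if no threshold is exceeded.

    Hypothetical helper: `failed` and `total` stand in for the failed and total
    transaction counts over the evaluation window.
    """
    if total <= 0:
        return None  # assumption: with no traffic there is nothing to evaluate
    rate_percent = 100.0 * failed / total
    for threshold, alert in ERROR_RATE_LADDER:
        if rate_percent > threshold:
            return alert
    return None
```

For example, 15 failures out of 1000 transactions (1.5%) selects OcudrTransactionErrorRateAbove1Percent, and 120 failures out of 1000 (12%) selects OcudrTransactionErrorRateAbove10Percent.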
4.1.2.15 OcudrXFCCValidationFailureAbove10Percent
Table 4-33 OcudrXFCCValidationFailureAbove10Percent
| Field | Details |
|---|---|
| Description | Total number of responses with XFCC validation failure is above 10% of ingress traffic |
| Summary | Total number of responses with XFCC validation failure is above 10% of ingress traffic |
| Severity | Minor |
| Condition | Alert if XFCC validation failure is 10% of the total XFCC validations |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7024 |
| Metric Used | oc_ingressgateway_xfcc_header_validate_total |
| Recommended Actions |
The alert is cleared when the number of XFCC validation failures falls below 10% of the total. Steps:
|
4.1.2.16 OcudrXFCCValidationFailureAbove20Percent
Table 4-34 OcudrXFCCValidationFailureAbove20Percent
| Field | Details |
|---|---|
| Description | Total number of responses with XFCC validation failure is above 20% of ingress traffic |
| Summary | Total number of responses with XFCC validation failure is above 20% of ingress traffic |
| Severity | Major |
| Condition | Alert if XFCC validation failure is 20% of the total XFCC validations |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7025 |
| Metric Used | oc_ingressgateway_xfcc_header_validate_total |
| Recommended Actions |
The alert is cleared when the number of XFCC validation failures falls below 20% of the total. Steps:
|
4.1.2.17 OcudrXFCCValidationFailureAbove50Percent
Table 4-35 OcudrXFCCValidationFailureAbove50Percent
| Field | Details |
|---|---|
| Description | Total number of responses with XFCC validation failure is above 50% of ingress traffic |
| Summary | Total number of responses with XFCC validation failure is above 50% of ingress traffic |
| Severity | Critical |
| Condition | Alert if XFCC validation failure is 50% of the total XFCC validations |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7026 |
| Metric Used | oc_ingressgateway_xfcc_header_validate_total |
| Recommended Actions |
The alert is cleared when the number of XFCC validation failures falls below 50% of the total. Steps:
|
4.1.2.18 DRServiceOverload60Percent
Table 4-36 DRServiceOverload60Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Warn overload level |
| Summary | This alert is fired when the application reaches the Warn overload level |
| Severity | Warning |
| Condition | Alert if the application overloads at 60% |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7027 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Warn level. Steps:
|
4.1.2.19 DRServiceOverload75Percent
Table 4-37 DRServiceOverload75Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Minor overload level |
| Summary | This alert is fired when the application reaches the Minor overload level |
| Severity | Minor |
| Condition | Alert if the application overloads at 75% |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7028 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Minor level. Steps:
|
4.1.2.20 DRServiceOverload80Percent
Table 4-38 DRServiceOverload80Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Major overload level |
| Summary | This alert is fired when the application reaches the Major overload level |
| Severity | Major |
| Condition | Alert if the application overloads at 80% |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7029 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Major level. Steps:
|
4.1.2.21 DRServiceOverload90Percent
Table 4-39 DRServiceOverload90Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Critical overload level |
| Summary | This alert is fired when the application reaches the Critical overload level |
| Severity | Critical |
| Condition | Alert if the application overloads at 90% |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7030 |
| Metric Used | load_level |
| Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Critical level. Steps:
|
4.1.2.22 SLFSucessTxnDefaultGroupIdRateAbove1Percent
Table 4-40 SLFSucessTxnDefaultGroupIdRateAbove1Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Summary | Transaction Error rate is above 1 Percent of Total Transactions |
| Severity | Warning |
| Condition | Alert if number of SLF Lookup requests responded with default Group ID exceeds 1% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7031 |
| Metric Used | slf_sucess_txn_default_grp_id_total |
| Recommended Actions |
This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received for Lookup and make sure that no unexpected out-of-range subscribers are being requested. |
4.1.2.23 SLFSucessTxnDefaultGroupIdRateAbove10Percent
Table 4-41 SLFSucessTxnDefaultGroupIdRateAbove10Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Summary | Transaction Error rate is above 10 Percent of Total Transactions |
| Severity | Minor |
| Condition | Alert if number of SLF Lookup requests responded with default Group ID exceeds 10% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7032 |
| Metric Used | slf_sucess_txn_default_grp_id_total |
| Recommended Actions |
This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received for Lookup and make sure that no unexpected out-of-range subscribers are being requested. |
4.1.2.24 SLFSucessTxnDefaultGroupIdRateAbove25Percent
Table 4-42 SLFSucessTxnDefaultGroupIdRateAbove25Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Summary | Transaction Error rate is above 25 Percent of Total Transactions |
| Severity | Major |
| Condition | Alert if number of SLF Lookup requests responded with default Group ID exceeds 25% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7033 |
| Metric Used | slf_sucess_txn_default_grp_id_total |
| Recommended Actions |
This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received for Lookup and make sure that no unexpected out-of-range subscribers are being requested. |
4.1.2.25 SLFSucessTxnDefaultGroupIdRateAbove50Percent
Table 4-43 SLFSucessTxnDefaultGroupIdRateAbove50Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Summary | Transaction Error rate is above 50 Percent of Total Transactions |
| Severity | Critical |
| Condition | Alert if number of SLF Lookup requests responded with default Group ID exceeds 50% of the total responses. |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7034 |
| Metric Used | slf_sucess_txn_default_grp_id_total |
| Recommended Actions |
This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received for Lookup and make sure that no unexpected out-of-range subscribers are being requested. |
4.1.2.26 OcudrDiameterCongestionCongestedState
Table 4-44 OcudrDiameterCongestionCongestedState
| Field | Details |
|---|---|
| Description | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
| Summary | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
| Severity | Critical |
| Condition | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
| Metric Used | ocudr_pod_congestion_state == 2 |
| Recommended Actions |
This alert is raised when the Diameter Gateway pod congestion level is set to the CONGESTED state. Steps:
|
4.1.2.27 OcudrDiameterCongestionDocState
Table 4-45 OcudrDiameterCongestionDocState
| Field | Details |
|---|---|
| Description | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
| Summary | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
| Severity | Major |
| Condition | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
| Metric Used | ocudr_pod_congestion_state == 1 |
| Recommended Actions |
This alert is raised when the Diameter Gateway pod congestion level is set to the Danger of Congestion (DOC) state. Steps:
|
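The two Diameter Gateway congestion alerts above depend on the numeric value of ocudr_pod_congestion_state: 1 for Danger of Congestion (DOC) and 2 for CONGESTED. As a minimal sketch, the mapping from metric value to alert name and severity can be expressed as follows. The enum and function names are hypothetical, and treating every other value as "no alert" is an assumption, because the tables do not list other states.

```python
from enum import IntEnum
from typing import Optional, Tuple


class DiameterCongestionState(IntEnum):
    """Metric values named in the two tables above."""
    DANGER_OF_CONGESTION = 1
    CONGESTED = 2


# Alert name and severity per state, as listed in the tables.
ALERT_FOR_STATE = {
    DiameterCongestionState.DANGER_OF_CONGESTION: ("OcudrDiameterCongestionDocState", "Major"),
    DiameterCongestionState.CONGESTED: ("OcudrDiameterCongestionCongestedState", "Critical"),
}


def congestion_alert(metric_value: int) -> Optional[Tuple[str, str]]:
    """Return (alert name, severity) for a metric value, or None for any other value."""
    try:
        state = DiameterCongestionState(metric_value)
    except ValueError:
        return None  # assumption: values other than 1 and 2 raise no alert
    return ALERT_FOR_STATE[state]
```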
4.1.2.28 DRProvServiceOverload60Percent
Table 4-46 DRProvServiceOverload60Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Warn overload level |
| Summary | This alert is fired when the application reaches the Warn overload level |
| Severity | Warning |
| Condition | Alert if the application overloads at 60% |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7036 |
| Metric Used | load_level |
| Recommended Actions |
This alert is cleared when the incoming traffic is reduced to below the Warn level. Steps:
|
4.1.2.29 DRProvServiceOverload75Percent
Table 4-47 DRProvServiceOverload75Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Minor overload level |
| Summary | This alert is fired when the application reaches the Minor overload level |
| Severity | Minor |
| Condition | Alert if the application overloads at 75% |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7037 |
| Metric Used | load_level |
| Recommended Actions |
This alert is cleared when the incoming traffic is reduced to below the Minor level. Steps:
|
4.1.2.30 DRProvServiceOverload80Percent
Table 4-48 DRProvServiceOverload80Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Major overload level |
| Summary | This alert is fired when the application reaches the Major overload level |
| Severity | Major |
| Condition | Alert if the application overloads at 80% |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7038 |
| Metric Used | load_level |
| Recommended Actions |
This alert is cleared when the incoming traffic is reduced to below the Major level. Steps:
|
4.1.2.31 DRProvServiceOverload90Percent
Table 4-49 DRProvServiceOverload90Percent
| Field | Details |
|---|---|
| Description | This alert is fired when the application reaches the Critical overload level |
| Summary | This alert is fired when the application reaches the Critical overload level |
| Severity | Critical |
| Condition | Alert if the application overloads at 90% |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7039 |
| Metric Used | load_level |
| Recommended Actions |
This alert is cleared when the incoming traffic is reduced to below the Critical level. Steps:
|
4.1.2.32 Diameter-Gateway pod congestion Danger of congestion state
Table 4-50 Diameter-Gateway pod congestion Danger of congestion state
| Field | Details |
|---|---|
| Description | DiameterGateway pod at Danger of Congestion state |
| Summary | DiameterGateway pod at Danger of Congestion state |
| Severity | Major |
| Condition | Alert if the diameter gateway pod is in Danger of Congestion (DOC) state |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7041 |
| Metric Used | occnp_pod_congestion_state==1 |
| Recommended Actions |
This alert is raised when the diameter gateway pod congestion level is set to the Danger of Congestion (DOC) state. Steps:
|
4.1.2.33 Diameter-Gateway pod CONGESTED state
Table 4-51 Diameter-Gateway pod CONGESTED state
| Field | Details |
|---|---|
| Description | DiameterGateway pod at Congested state |
| Summary | DiameterGateway pod at Congested state |
| Severity | Critical |
| Condition | Alert if the diameter gateway pod is in CONGESTED state |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7042 |
| Metric Used | occnp_pod_congestion_state==2 |
| Recommended Actions |
This alert is raised when the diameter gateway pod congestion level is set to the CONGESTED state. Steps:
|
4.1.2.34 OcudrProvisioningTrafficRateAboveMajorThreshold
Table 4-52 OcudrProvisioningTrafficRateAboveMajorThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic Rate is above major threshold, that is, 900 requests per second |
| Summary | Traffic Rate is above 90 Percent of Max requests per second (1000) |
| Severity | Major |
| Condition | Alert if Ingress traffic reaches 90% of max TPS |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7044 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions |
The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate exceeds the Critical threshold, in which case the OcudrProvisioningTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in UDR_Alertrules.yaml.
Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support.
|
4.1.2.35 OcudrProvisioningTrafficRateAboveCriticalThreshold
Table 4-53 OcudrProvisioningTrafficRateAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | Ingress traffic Rate is above critical threshold, that is, 950 requests per second |
| Summary | Traffic Rate is above 95 Percent of Max requests per second (1000) |
| Severity | Critical |
| Condition | Alert if Ingress traffic reaches 95% of max TPS |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7045 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Actions |
The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in UDR_Alertrules.yaml.
Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support.
|
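The provisioning traffic rate conditions above are percentages of the maximum ingress rate: 90% for Major severity and 95% for Critical severity, which correspond to 900 and 950 requests per second at a maximum of 1000. The following sketch works through that arithmetic. The function names are hypothetical, and the configurable maximum is modelled here as a plain parameter rather than as a value read from UDR_Alertrules.yaml.

```python
from typing import Optional

MAX_INGRESS_TPS = 1000  # maximum ingress requests per second considered in this section

# Percent of the maximum rate at which each severity applies, highest first.
TRAFFIC_THRESHOLDS_PERCENT = [(95.0, "Critical"), (90.0, "Major")]


def threshold_tps(percent: float, max_tps: int = MAX_INGRESS_TPS) -> float:
    """Convert a percentage of the maximum rate into requests per second."""
    return max_tps * percent / 100.0


def provisioning_traffic_severity(current_tps: float,
                                  max_tps: int = MAX_INGRESS_TPS) -> Optional[str]:
    """Return the highest severity whose threshold the current rate has reached."""
    for percent, severity in TRAFFIC_THRESHOLDS_PERCENT:
        if current_tps >= threshold_tps(percent, max_tps):
            return severity
    return None
```

With the default maximum, a rate of 920 requests per second maps to Major and a rate of 950 maps to Critical. Passing a different `max_tps` shows how the absolute thresholds move when the maximum is changed.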
4.1.2.36 OcudrProvisioningTransactionErrorRateAbove25Percent
Table 4-54 OcudrProvisioningTransactionErrorRateAbove25Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 25 Percent of Total Transactions |
| Severity | Major |
| Condition | Alert if all error rate exceeds 25% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7046 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions |
The alert is cleared when the number of failed transactions falls below 25% of the total transactions, or when the number of failed transactions exceeds the 50% threshold, in which case the OcudrProvisioningTransactionErrorRateAbove50Percent alert is raised. Steps:
|
4.1.2.37 OcudrProvisioningTransactionErrorRateAbove50Percent
Table 4-55 OcudrProvisioningTransactionErrorRateAbove50Percent
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Summary | Transaction Error Rate detected above 50 Percent of Total Transactions |
| Severity | Critical |
| Condition | Alert if all error rate exceeds 50% of the total transactions |
| OID | 1.3.6.1.4.1.323.5.3.43.1.2.7047 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 50% of the total transactions. Steps:
|
4.1.2.38 PVCFullForSLFExport
Table 4-56 PVCFullForSLFExport
| Field | Details |
|---|---|
| Description | Storage for Export tool is full |
| Summary | Storage for Export tool is full |
| Severity | Critical |
| Condition | Alert if PVC allocated for export tool dump path is full |
| Metric Used | export_tool_full_usage |
| Recommended Actions | The alert is cleared when the PVC usage is optimized. Configure maxDumps to a lower value to clear old dumps. Remove old dumps, if any, from the export tool container. |
4.1.2.39 FailedExtractForSLFExport
Table 4-57 FailedExtractForSLFExport
| Field | Details |
|---|---|
| Description | Export tool job is failed |
| Summary | Export tool job is failed |
| Severity | Critical |
| Condition | Alert if the export operation fails |
| Metric Used | export_failure |
| Recommended Actions | Check logs for failure. The alert will be cleared when the export job succeeds next time. |
4.1.2.40 BulkImportTransferInFailed
Table 4-58 BulkImportTransferInFailed
| Field | Details |
|---|---|
| Description | Transfer-in failed for bulk import |
| Summary | Transfer-in failed for bulk import |
| Severity | Major |
| Condition | Alert will be raised, if Transfer-In failed from Remote to PVC |
| Metric Used | bulkimport_transfer_in_status |
| Recommended Actions | This alert is cleared when the transfer-in for bulk import succeeds. Steps:
|
4.1.2.41 ExportToolTransferOutFailed
Table 4-59 ExportToolTransferOutFailed
| Field | Details |
|---|---|
| Description | Transfer-out failed for export-tool |
| Summary | Transfer-out failed for export-tool |
| Severity | Major |
| Condition | Alert will be raised if Transfer-Out failed from PVC to Remote |
| Metric Used | sftp_transfer_status |
| Recommended Actions | This alert is cleared when the transfer-out from the export tool succeeds. Steps:
|
4.1.2.42 BulkImportTransferOutFailed
Table 4-60 BulkImportTransferOutFailed
| Field | Details |
|---|---|
| Description | Transfer-out failed for bulk import |
| Summary | Transfer-out failed for bulk import |
| Severity | Major |
| Condition | Alert will be raised if Transfer-Out failed from PVC to Remote |
| Metric Used | bulkimport_transfer_out_status |
| Recommended Actions | This alert is cleared when the transfer-out for bulk import succeeds. Steps:
|
4.1.2.43 PVCFullForXMLBulkImport
Table 4-61 PVCFullForXMLBulkImport
| Field | Details |
|---|---|
| Description | Storage for XML Bulk Import tool is full |
| Summary | Storage for XML Bulk Import tool is full |
| Severity | Critical |
| Condition | Alert will be raised if the PVC is full for xml-csv container |
| Metric Used | nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-xmltocsv",kubernetes_namespace="ocudr"}==1 |
| Recommended Actions | This alert will be cleared when the PVC is back to normal. Steps:
|
4.1.2.44 PVCFullForBulkImport
Table 4-62 PVCFullForBulkImport
| Field | Details |
|---|---|
| Description | Storage for Bulk Import tool is full |
| Summary | Storage for Bulk Import tool is full |
| Severity | Critical |
| Condition | Alert will be raised if the PVC is full for bulk import container |
| Metric Used | nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-bulk-import",kubernetes_namespace="ocudr"}==1 |
| Recommended Actions | This alert will be cleared when the PVC is back to normal. Steps:
|
4.1.2.45 OperationalStatusCompleteShutdown
Table 4-63 OperationalStatusCompleteShutdown
| Field | Details |
|---|---|
| Description | Operational state is complete shutdown |
| Summary | Operational state is complete shutdown |
| Severity | Critical |
| Condition | Alert will be raised if the operational state of the UDR, SLF, or EIR is COMPLETE_SHUTDOWN |
| Metric Used | nudr_config_operational_status{kubernetes_namespace="ocudr"}==1 |
| Recommended Actions | This alert will be cleared when the operational status is back to normal. Steps:
|
4.1.2.46 NFScoreCalculationFailed
Table 4-64 NFScoreCalculationFailed
| Field | Details |
|---|---|
| Description | NFScoreCalculationFailed |
| Summary | NFScoreCalculationFailed |
| Severity | Major |
| Condition | Alert is raised if the NF Score calculation fails for any of the scoring factors |
| Metric Used | nfscore{kubernetes_namespace="ocudr" ,factor=~"successTPS|signallingConnections|serviceHealth|replicationHealth|localityPreference|bulkImport|bulkExport",calculatedStatus="failed"} |
| Recommended Actions |
This alert is cleared when the NF score calculation is successful. Steps:
|
4.1.2.47 PVCFullForUDRExport
Table 4-65 PVCFullForUDRExport
| Field | Details |
|---|---|
| Description | Storage for Export tool is full |
| Summary | Storage for Export tool is full |
| Severity | Critical |
| Condition | Alert is raised if PVC allocated for export tool dump path is full. |
| Metric Used | export_tool_full_usage{namespace="ocudr"}==1 |
| Recommended Actions |
Alert is cleared when the PVC usage is optimized. You must configure maxDumps to a lower value to clear old dumps. Steps:
|
4.1.2.48 UDRExportFailed
Table 4-66 UDRExportFailed
| Field | Details |
|---|---|
| Description | Export tool job is failed |
| Summary | Export tool job is failed |
| Severity | Critical |
| Condition | Alert is raised if the export operation fails for UDR Mode |
| Metric Used | export_failure{namespace="ocudr"}== 1 |
| Recommended Actions |
You must check the logs for failure. When the next export job is successful the alert is cleared. |
4.1.2.49 IngressgatewayPodProtectionDocState
Table 4-67 IngressgatewayPodProtectionDocState
| Field | Details |
|---|---|
| Description | Ingress congestion in Doc state |
| Summary | Ingress congestion Doc state |
| Severity | Critical |
| Condition | Alert is raised if Ingress congestion is in doc state. |
| Metric Used | oc_ingressgateway_pod_congestion_state{namespace="ocudr"}==1 |
| Recommended Actions | This alert will be cleared when the ingress gateway returns to the normal state. Steps:
|
4.1.2.50 IngressgatewayPodProtectionCongestedState
Table 4-68 IngressgatewayPodProtectionCongestedState
| Field | Details |
|---|---|
| Description | Ingress congestion in Congested state |
| Summary | Ingress congestion in Congested state |
| Severity | Critical |
| Condition | Alert is raised if ingress congestion is in congested state. |
| Metric Used | oc_ingressgateway_pod_congestion_state{namespace="ocudr"}==2 |
| Recommended Actions | This alert will be cleared when the ingress gateway returns to the normal state. Steps:
|
4.1.2.51 RetryNotificationRecordsMaxLimitExceeded
Table 4-69 RetryNotificationRecordsMaxLimitExceeded
| Field | Details |
|---|---|
| Description | Alert will be raised if the number of retry notifications stored in the UDR database exceeds the maximum limit. |
| Summary | Alert will be raised if the number of retry notifications stored in the UDR database exceeds the maximum limit. |
| Severity | Critical |
| Condition | Alert will be raised if the number of retry notifications stored in the UDR database exceeds the maximum limit. |
| Metric Used | nudr_notif_records_limit_exceeded{namespace="ocudr"}==1 |
| Recommended Actions |
This alert is raised when notification failures increase and more than 50k retry notifications are stored in the database. Steps:
|
4.1.2.52 UserAgentHeaderNotFoundMorethan10PercentRequest
Table 4-70 UserAgentHeaderNotFoundMorethan10PercentRequest
| Field | Details |
|---|---|
| Description | Alert will be raised if the total number of requests not having User-Agent header is 10% of ingress traffic when suppress notification feature is enabled. |
| Summary | Alert will be raised if the total number of requests not having User-Agent header is 10% of ingress traffic when suppress notification feature is enabled. |
| Severity | Critical |
| Condition | Alert will be raised if the total number of requests not having User-Agent header is 10% of ingress traffic. |
| Metric Used | (sum by(namespace)(rate(suppress_user_agent_not_found_total{namespace="ocudr"}[5m]))/sum by(namespace)(rate(oc_ingressgateway_http_requests_total{namespace="ocudr"}[5m])))*100 >= 10 |
| Recommended Actions |
This alert is cleared if the total number of requests not having User-Agent header is less than 10% of ingress traffic. Steps:
|
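The expression in the Metric Used row above divides the per-second rate of requests without a User-Agent header by the per-second rate of all ingress requests over a 5-minute window, then compares the result with 10 percent. The sketch below reproduces that arithmetic from two counter readings per metric. It is a simplification: unlike the PromQL rate() function, it ignores counter resets and extrapolation, and all function and parameter names are hypothetical.

```python
from typing import Optional


def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Simplified stand-in for rate(): counter increase divided by the window length."""
    return (last - first) / window_seconds


def user_agent_missing_percent(missing_first: float, missing_last: float,
                               total_first: float, total_last: float,
                               window_seconds: float = 300.0) -> Optional[float]:
    """Percentage of ingress requests that arrived without a User-Agent header."""
    total_rate = per_second_rate(total_first, total_last, window_seconds)
    if total_rate <= 0:
        return None  # assumption: no ingress traffic means nothing to evaluate
    missing_rate = per_second_rate(missing_first, missing_last, window_seconds)
    return 100.0 * missing_rate / total_rate


def should_alert(percent: Optional[float]) -> bool:
    """Apply the >= 10 comparison from the alert expression."""
    return percent is not None and percent >= 10.0
```

For example, 300 requests without the header out of 3000 requests in a 5-minute window gives exactly 10 percent, which meets the condition.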
4.1.2.53 EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
Table 4-71 EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
| Field | Details |
|---|---|
| Description | Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit. |
| Summary | Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit. |
| Severity | Minor |
| Condition | Alert will be raised if egress gateway JVM buffer memory is above the minor threshold limit. |
| Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1300000000 |
| Recommended Actions |
This alert is cleared if the egress gateway JVM buffer memory is below the minor threshold limit. Steps:
|
4.1.2.54 EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
Table 4-72 EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
| Field | Details |
|---|---|
| Description | Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit. |
| Summary | Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit. |
| Severity | Major |
| Condition | Alert will be raised if egress gateway JVM buffer memory is above the major threshold limit. |
| Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1500000000 |
| Recommended Actions |
This alert is cleared if the egress gateway JVM buffer memory is below the major threshold limit. Steps:
|
4.1.2.55 EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
Table 4-73 EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit. |
| Summary | Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit. |
| Severity | Critical |
| Condition | Alert will be raised if egress gateway JVM buffer memory is above the critical threshold limit. |
| Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1800000000 |
| Recommended Actions |
This alert is cleared if the egress gateway JVM buffer memory is below the critical threshold limit. Steps:
|
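The three egress gateway JVM buffer alerts above share one expression shape: samples are summed per (id, pod) pair and each sum is compared with a byte threshold (1300000000 for Minor, 1500000000 for Major, 1800000000 for Critical). The sketch below mimics the grouping and reports the highest matching severity for a group. The sample buffer IDs and pod names in the usage note are hypothetical, and reporting only the highest level is a simplification chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Byte thresholds from the three tables above, highest first.
JVM_BUFFER_THRESHOLDS_BYTES = [
    (1_800_000_000, "Critical"),
    (1_500_000_000, "Major"),
    (1_300_000_000, "Minor"),
]


def sum_by_id_and_pod(samples: Iterable[Tuple[str, str, int]]) -> Dict[Tuple[str, str], int]:
    """Mimic `sum by (id, pod)`: add up samples that share the same (id, pod) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for buffer_id, pod, used_bytes in samples:
        totals[(buffer_id, pod)] += used_bytes
    return dict(totals)


def highest_severity(used_bytes: int) -> Optional[str]:
    """Return the highest severity whose threshold is reached, or None."""
    for threshold, severity in JVM_BUFFER_THRESHOLDS_BYTES:
        if used_bytes >= threshold:
            return severity
    return None
```

Two samples of 700000000 bytes for the same hypothetical pair ("direct", "egress-0") add up to 1400000000 bytes, which reaches the Minor threshold but not the Major one.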
4.1.2.56 NudrDiameterGatewayDown
Table 4-74 NudrDiameterGatewayDown
| Field | Details |
|---|---|
| Description | Alert will be raised if Nudr-diam-gateway service is down. |
| Summary | Alert will be raised if Nudr-diam-gateway service is down. |
| Severity | Critical |
| Condition | Alert will be raised if Nudr-diam-gateway service is down. |
| Metric Used | absent(up{container="nudr-diam-gateway",namespace="ocudr"}) or up{container="nudr-diam-gateway",namespace="ocudr"} == 0 |
| Recommended Actions |
This alert is cleared when the NudrDiamGateway service is available. Steps:
|
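The expression in the Metric Used row above covers two cases: no `up` series exists at all for the nudr-diam-gateway container (the absent() branch), or a series exists and reports 0. A minimal sketch of the same decision, assuming the current `up` values have already been collected into a list (the function name is hypothetical):

```python
from typing import Sequence


def diameter_gateway_down(up_samples: Sequence[float]) -> bool:
    """Return True if the alert condition holds for the given `up` values.

    An empty sequence models the absent() branch: no target is being scraped.
    A value of 0 models a target that exists but failed its last scrape.
    """
    if not up_samples:
        return True
    return any(value == 0 for value in up_samples)
```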
4.1.2.57 DiameterPeerConnectionsDropped
Table 4-75 DiameterPeerConnectionsDropped
| Field | Details |
|---|---|
| Description | Alert will be raised if there are no connections between diameter peer and diameter gateway. |
| Summary | Alert will be raised if there are no connections between diameter peer and diameter gateway. |
| Severity | Major |
| Condition | Alert will be raised if there are no connections between diameter peer and diameter gateway. |
| Metric Used | sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0))< 2 or sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) < 2 or (sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",kubernetes_namespace="ocudr"} or vector(0)) + sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"}) or vector(0)) < 5 |
| Recommended Actions |
This alert is cleared when the NudrDiamGateway service is available. Steps:
|
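The expression in the Metric Used row above combines three checks: fewer than 2 connections from peers whose origHost matches `.*CHI.*`, fewer than 2 from peers matching `.*IND.*`, or fewer than 5 connections across both groups. Missing series count as zero because of the `or vector(0)` fallback. The sketch below restates that logic. The host names in the usage note are hypothetical, and the CHI and IND patterns are taken as they appear in the expression.

```python
import re
from typing import Mapping


def count_connections(connections_by_orig_host: Mapping[str, float], pattern: str) -> float:
    """Sum connection counts for hosts that fully match the pattern.

    re.fullmatch is used because label regular expressions in PromQL are anchored.
    """
    return sum(count for host, count in connections_by_orig_host.items()
               if re.fullmatch(pattern, host))


def peer_connections_dropped(connections_by_orig_host: Mapping[str, float]) -> bool:
    """Return True if any of the three conditions in the expression holds."""
    chi = count_connections(connections_by_orig_host, r".*CHI.*")
    ind = count_connections(connections_by_orig_host, r".*IND.*")
    return chi < 2 or ind < 2 or (chi + ind) < 5
```

With 3 connections from a hypothetical host `peer1.CHI.example` and 2 from `peer2.IND.example`, none of the three conditions holds, so the function returns False. An empty mapping returns True, which matches the zero fallback.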
4.1.2.58 IGWSignallingPodProtectionDOCState
Table 4-76 IGWSignallingPodProtectionDOCState
| Field | Details |
|---|---|
| Description | Alert will be raised when the ingress gateway signaling traffic is in the DOC state. |
| Summary | Alert will be raised when the ingress gateway signaling traffic is in the DOC state. |
| Severity | Major |
| Condition | Alert will be raised when the ingress gateway signaling traffic is in the DOC state. |
| Metric Used | sum({namespace="ocudr",container="ingressgateway-sig"}) by (pod) == 2 |
| Recommended Actions |
This alert is cleared when the signaling traffic reaches NORMAL state. Steps:
|
4.1.2.59 IGWSignallingPodProtectionCongestedState
Table 4-77 IGWSignallingPodProtectionCongestedState
| Field | Details |
|---|---|
| Description | Alert will be raised when the ingress gateway signaling traffic is in the Congested state. |
| Summary | Alert will be raised when the ingress gateway signaling traffic is in the Congested state. |
| Severity | Critical |
| Condition | Alert will be raised when the ingress gateway signaling traffic is in the Congested state. |
| Metric Used | sum(oc_ingressgateway_congestion_system_state{namespace="ocudr",container="ingressgateway-sig"}) by (pod) == 3 |
| Recommended Actions |
This alert is cleared when the signaling traffic reaches NORMAL or DOC state. Steps:
|
4.1.2.60 IGWSignallingPodProtectionByRateLimitRejectedRequest
Table 4-78 IGWSignallingPodProtectionByRateLimitRejectedRequest
| Field | Details |
|---|---|
| Description | Alert will be raised when total rejections exceed 1% of the total incoming traffic. |
| Summary | Alert will be raised when total rejections exceed 1% of the total incoming traffic. |
| Severity | Critical |
| Condition | Alert will be raised when total rejections exceed 1% of the total incoming traffic. |
| Metric Used | (sum (rate(oc_ingressgateway_http_request_ratelimit_denied_count_total{Action="REJECT",namespace="ocudr"}[2m]) or (up * 0 ) ) )/ sum(rate(oc_ingressgateway_http_requests_total{container="ingressgateway-sig",namespace="ocudr"}[2m])) * 100 >= 1 |
| Recommended Actions |
This alert is cleared when rejections fall below 1% of the total traffic. Steps:
|
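In the expression in the Metric Used row above, the `or (up * 0)` term appears to supply a zero numerator when no rate-limit rejection series exists yet, so the ratio evaluates to 0 rather than to no result. The sketch below illustrates that fallback together with the `>= 1` comparison. It takes precomputed per-second rates as input; the function names are hypothetical and None stands for a missing rejection series.

```python
from typing import Optional


def rejection_percent(rejected_rate: Optional[float], total_rate: float) -> Optional[float]:
    """Percentage of signaling requests rejected by rate limiting.

    rejected_rate is None when no rejection series exists; it is then treated
    as zero, mirroring the `or (up * 0)` fallback.
    """
    if total_rate <= 0:
        return None  # assumption: no incoming traffic means nothing to evaluate
    numerator = 0.0 if rejected_rate is None else rejected_rate
    return 100.0 * numerator / total_rate


def rate_limit_alert(rejected_rate: Optional[float], total_rate: float) -> bool:
    """Apply the >= 1 comparison from the alert expression."""
    percent = rejection_percent(rejected_rate, total_rate)
    return percent is not None and percent >= 1.0
```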