4 Alerts
- Pod not running or down
- Pod restarts
- Traffic reaches the maximum threshold
- Subscriber not found
- XFCC validation failure rate
- Invalid user agent
If any of the above scenarios occurs, an alert is triggered in Prometheus. Alerts help you handle a scenario before it results in a failure.
Note:
The performance and capacity of the UDR system may vary based on the call model, feature or interface configuration, and underlying CNE and hardware environment, including but not limited to the size of the JSON payload, the operation type, and the traffic model.
To Configure
- ocudr_alerts_haprom.yaml to configure alerts in UDR when using CNE 25.1.1xx version.
- ocudr_alerts_non_haprom.yaml to configure alerts in UDR when using an environment other than CNE.
- ocslf_alerts_non_haprom.yaml to configure alerts in SLF when using OSO 1.6.x and OSO 25.1.1xx versions.
- ocslf_alerts_haprom.yaml to configure alerts in SLF when using CNE 25.1.1xx version.
- oceir_alerts_haprom.yaml to configure alerts in EIR when using CNE 25.1.1xx version.
- oceir_alerts_non_haprom.yaml to configure alerts in EIR when using an environment other than CNE.
The above files are shared as part of the UDR custom templates.
In the SLF alert file (ocslf_alerts_non_haprom.yaml), update the namespace, and then follow the steps in the section on configuring alerts in SLF when using OSO.
In the UDR, SLF, and EIR alert files for CNE (ocudr_alerts_haprom.yaml, ocslf_alerts_haprom.yaml, and oceir_alerts_haprom.yaml), update the namespace, and then follow the steps in the section on configuring alerts in UDR, SLF, and EIR when using CNE.
To Configure Alerts in SLF when using OSO 1.6.x and OSO 25.1.1xx versions:
- Run the following command to back up the current Prometheus configuration map:
kubectl get configmaps occne-prometheus-server -o yaml -n occne-infra > /tmp/tempConfig.yaml
- Run the following commands to add the UDR alerts file to the Prometheus configmap YAML file:
sed -i '/etc\/config\/alertsudr/d' /tmp/tempConfig.yaml
sed -i '/rule_files:/a\ \ - /etc/config/alertsudr' /tmp/tempConfig.yaml
- Run the following command to update the configuration map with the updated alert file name:
kubectl replace configmap occne-prometheus-server -f /tmp/tempConfig.yaml
- Run the following command to add the SLF alert rules to the configuration map under the alert file name:
kubectl patch configmap occne-prometheus-server -n occne-infra --type merge --patch "$(cat ~/ocslf_alerts_non_haprom_25.1.100.yaml)"
Note:
The Prometheus server automatically reloads the updated configmap after some time (approximately 20 seconds).
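The final `kubectl patch` step merges the content of the alert file into the Prometheus configmap. The sketch below shows one possible shape for such a merge-patch file. This section does not confirm the structure, so two assumptions apply. First, the rules are stored under a configmap key named `alertsudr`, matching the `/etc/config/alertsudr` entry that the `sed` commands add to `rule_files`. Second, the configmap is mounted at `/etc/config` in the Prometheus pod. The group name is hypothetical, and the rule is the sample from the To disable section. The shipped ocslf_alerts_non_haprom_25.1.100.yaml file may be structured differently.

```yaml
# Hypothetical merge-patch file for the occne-prometheus-server configmap.
# Assumption: the "alertsudr" key is exposed to Prometheus as /etc/config/alertsudr.
data:
  alertsudr: |
    groups:
      - name: ocudr-alerts-sketch    # hypothetical group name
        rules:
          - alert: OcudrTrafficRateAboveMinorThreshold
            expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocudr"}[2m])) >= 800 < 900
            labels:
              severity: Minor
```

The patch uses `--type merge`, so only the keys listed under `data` are added or overwritten. The other keys of the configmap, including the main Prometheus configuration edited in the earlier steps, are left unchanged.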
To Configure Alerts in UDR, SLF, and EIR when using CNE 25.1.1xx version:
Run the following command for each network function:
UDR:
kubectl create -f ocudr_alerts_haprom_25.1.100.yaml -n <namespace>
SLF:
kubectl create -f ocslf_alerts_haprom_25.1.100.yaml -n <namespace>
EIR:
kubectl create -f oceir_alerts_haprom_25.1.100.yaml -n <namespace>
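The CNE files are applied with `kubectl create -n <namespace>`, so they are most likely Prometheus Operator `PrometheusRule` resources rather than configmap patches. This section does not say so explicitly, so treat it as an assumption. The sketch below shows the general structure of such a resource. The resource name, the group name, and the `role` label value are placeholders. The Prometheus Operator loads a `PrometheusRule` only if its labels match the `ruleSelector` of the Prometheus instance, so check that selector in your CNE deployment.

```yaml
# Sketch of a Prometheus Operator PrometheusRule resource (placeholder names).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ocudr-alerting-rules          # hypothetical resource name
  labels:
    role: cnc-alerting-rules          # assumption: must match the Prometheus ruleSelector
spec:
  groups:
    - name: ocudr-alerts-sketch       # hypothetical group name
      rules:
        - alert: OcudrTrafficRateAboveMinorThreshold
          expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocudr"}[2m])) >= 800 < 900
          labels:
            severity: Minor
```

If the files are `PrometheusRule` resources, `kubectl get prometheusrules -n <namespace>` lists them after creation.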
To disable
To disable alerts in Prometheus:
- Edit the ocslf_alerts_non_haprom.yaml file to remove the specific alert. For example, to disable the OcudrTrafficRateAboveMinorThreshold alert, locate the following rule:
## ALERT SAMPLE START##
- alert: OcudrTrafficRateAboveMinorThreshold
  annotations:
    description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
    summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
  expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocudr"}[2m])) >= 800 < 900
  labels:
    severity: Minor
## ALERT SAMPLE END##
- Remove the complete rule of the alert that you want to disable.
- Configure the alerts again as described in the To Configure section above.
To Observe
For more information about metrics and KPIs, see the UDR Metrics and UDR KPIs sections, respectively.
4.1 Alert Details
This section describes alerts in detail.
Note:
The maximum ingress request rate considered is 1000 requests per second.
Table 4-1 Alerts Levels or Severity Types
Alerts Levels / Severity Types | Definition |
---|---|
Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of UDR. |
Major | Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the service of UDR. |
Minor | Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the service of UDR. |
Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of UDR. |
The following table lists the alert names for UDR and EIR.
Table 4-2 Alert names for UDR and EIR
UDR | EIR |
---|---|
OcudrTrafficRateAboveMajorThreshold | OceirTrafficRateAboveMajorThreshold |
OcudrTrafficRateAboveMinorThreshold | OceirTrafficRateAboveMinorThreshold |
OcudrTrafficRateAboveCriticalThreshold | OceirTrafficRateAboveCriticalThreshold |
OcudrTransactionErrorRateAbove0.1Percent | OceirTransactionErrorRateAbove0.1Percent |
OcudrTransactionErrorRateAbove1Percent | OceirTransactionErrorRateAbove1Percent |
OcudrTransactionErrorRateAbove10Percent | OceirTransactionErrorRateAbove10Percent |
OcudrTransactionErrorRateAbove25Percent | OceirTransactionErrorRateAbove25Percent |
OcudrTransactionErrorRateAbove50Percent | OceirTransactionErrorRateAbove50Percent |
OcudrSubscriberNotFoundAbove1Percent | OceirSubscriberNotFoundAbove1Percent |
OcudrSubscriberNotFoundAbove10Percent | OceirSubscriberNotFoundAbove10Percent |
OcudrSubscriberNotFoundAbove25Percent | OceirSubscriberNotFoundAbove25Percent |
OcudrSubscriberNotFoundAbove50Percent | OceirSubscriberNotFoundAbove50Percent |
OcudrPodsRestart | OceirPodsRestart |
NudrServiceDown | NudrServiceDown |
NudrProvServiceDown | NudrProvServiceDown |
NudrNotifyServiceServiceDown | NA |
NudrNRFClientServiceDown | NudrNRFClientServiceDown |
NudrConfigServiceDown | NudrConfigServiceDown |
NudrDiameterProxyServiceDown | NudrDiameterProxyServiceDown |
NudrOnDemandMigrationServiceDown | NA |
OcudrIngressGatewayServiceDown | OceirIngressGatewayServiceDown |
OcudrEgressGatewayServiceDown | OceirEgressGatewayServiceDown |
OcudrDbServiceDown | OceirDbServiceDown |
OcudrXFCCValidationFailureAbove10Percent | OceirXFCCValidationFailureAbove10Percent |
OcudrXFCCValidationFailureAbove20Percent | OceirXFCCValidationFailureAbove20Percent |
OcudrXFCCValidationFailureAbove50Percent | OceirXFCCValidationFailureAbove50Percent |
DRServiceOverload60Percent | DRServiceOverload60Percent |
DRServiceOverload75Percent | DRServiceOverload75Percent |
DRServiceOverload80Percent | DRServiceOverload80Percent |
DRServiceOverload90Percent | DRServiceOverload90Percent |
SLFSucessTxnDefaultGroupIdRateAbove1Percent | NA |
SLFSucessTxnDefaultGroupIdRateAbove10Percent | NA |
SLFSucessTxnDefaultGroupIdRateAbove25Percent | NA |
SLFSucessTxnDefaultGroupIdRateAbove50Percent | NA |
OcudrDiameterCongestionCongestedState | OceirDiameterCongestionCongestedState |
OcudrDiameterCongestionDocState | OceirDiameterCongestionDocState |
DRProvServiceOverload60Percent | DRProvServiceOverload60Percent |
DRProvServiceOverload75Percent | DRProvServiceOverload75Percent |
DRProvServiceOverload80Percent | DRProvServiceOverload80Percent |
DRProvServiceOverload90Percent | DRProvServiceOverload90Percent |
OcudrIngressGatewayProvServiceDown | OceirIngressGatewayProvServiceDown |
OcudrProvisioningTrafficRateAboveMajorThreshold | OceirProvisioningTrafficRateAboveMajorThreshold |
OcudrProvisioningTrafficRateAboveCriticalThreshold | OceirProvisioningTrafficRateAboveCriticalThreshold |
OcudrProvisioningTransactionErrorRateAbove25Percent | OceirProvisioningTransactionErrorRateAbove25Percent |
OcudrProvisioningTransactionErrorRateAbove50Percent | OceirProvisioningTransactionErrorRateAbove50Percent |
PVCFullForSLFExport | NA |
FailedExtractForSLFExport | NA |
BulkImportTransferInFailed | BulkImportTransferInFailed |
BulkImportTransferOutFailed | BulkImportTransferOutFailed |
ExportToolTransferOutFailed | ExportToolTransferOutFailed |
PVCFullForXMLBulkImport | PVCFullForXMLBulkImport |
PVCFullForBulkImport | PVCFullForBulkImport |
OperationalStatusCompleteShutdown | OperationalStatusCompleteShutdown |
NFScoreCalculationFailed | NFScoreCalculationFailed |
PVCFullForEXMLExport | NA |
EXMLExportFailed | NA |
IngressgatewayPodProtectionDocState | IngressgatewayPodProtectionDocState |
IngressgatewayPodProtectionCongestedState | IngressgatewayPodProtectionCongestedState |
RetryNotificationRecordsMaxLimitExceeded | RetryNotificationRecordsMaxLimitExceeded |
UserAgentHeaderNotFoundMorethan10PercentRequest | NA |
EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold | EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold |
EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold | EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold |
EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold | EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold |
NudrDiameterGatewayDown | NudrDiameterGatewayDown |
DiameterPeerConnectionsDropped | DiameterPeerConnectionsDropped |
Note:
For the following alert details, only UDR alert names are provided. The corresponding EIR alert names are listed in Table 4-2.
4.1.1 System Level Alerts
This section lists the system level alerts.
4.1.1.1 OcudrSubscriberNotFoundAbove1Percent
Table 4-3 OcudrSubscriberNotFoundAbove1Percent
Field | Details |
---|---|
Description | Total number of Subscriber Not Found responses is about 1% of ingress traffic |
Summary | Total number of Subscriber Not Found responses is about 1% of ingress traffic |
Severity | Warning |
Condition | Alert if the number of Subscriber Not Found responses reaches 1% of all ingress traffic |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7009 |
Metric Used | udr_subscriber_not_found_total |
Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 1% of the total. Steps: |
4.1.1.2 OcudrSubscriberNotFoundAbove10Percent
Table 4-4 OcudrSubscriberNotFoundAbove10Percent
Field | Details |
---|---|
Description | Total number of Subscriber Not Found responses is about 10% of ingress traffic |
Summary | Total number of Subscriber Not Found responses is about 10% of ingress traffic |
Severity | Minor |
Condition | Alert if the number of Subscriber Not Found responses reaches 10% of all ingress traffic |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7010 |
Metric Used | udr_subscriber_not_found_total |
Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 10% of the total. Steps: |
4.1.1.3 OcudrSubscriberNotFoundAbove25Percent
Table 4-5 OcudrSubscriberNotFoundAbove25Percent
Field | Details |
---|---|
Description | Total number of Subscriber Not Found responses is about 25% of ingress traffic |
Summary | Total number of Subscriber Not Found responses is about 25% of ingress traffic |
Severity | Major |
Condition | Alert if the number of Subscriber Not Found responses reaches 25% of all ingress traffic |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7011 |
Metric Used | udr_subscriber_not_found_total |
Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 25% of the total. Steps: |
4.1.1.4 OcudrSubscriberNotFoundAbove50Percent
Table 4-6 OcudrSubscriberNotFoundAbove50Percent
Field | Details |
---|---|
Description | Total number of Subscriber Not Found responses is about 50% of ingress traffic |
Summary | Total number of Subscriber Not Found responses is about 50% of ingress traffic |
Severity | Critical |
Condition | Alert if the number of Subscriber Not Found responses reaches 50% of all ingress traffic |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7012 |
Metric Used | udr_subscriber_not_found_total |
Recommended Actions | The alert is cleared when the number of Subscriber Not Found failures falls below 50% of the total. Steps: |
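The four Subscriber Not Found alerts differ only in their percentage band and severity. The sketch below shows how the 1% alert could be expressed as a Prometheus rule. Three parts of it are assumptions modeled on the traffic-rate sample in the To disable section: the denominator (ingress requests counted by oc_ingressgateway_http_requests_total), the label values, and the upper bound of 10%. The shipped rule file may compute the ratio differently.

```yaml
# Sketch only: denominator, labels, and the < 10 upper bound are assumptions.
- alert: OcudrSubscriberNotFoundAbove1Percent
  expr: |
    sum(rate(udr_subscriber_not_found_total{kubernetes_namespace="ocudr"}[2m]))
      / sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocudr"}[2m]))
      * 100 >= 1 < 10
  labels:
    severity: Warning
```

Under the same assumptions, the other alerts would use these bands:
- 10% alert: `>= 10 < 25` (Minor)
- 25% alert: `>= 25 < 50` (Major)
- 50% alert: `>= 50` (Critical)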
4.1.1.5 OcudrPodsRestart
Table 4-7 OcudrPodsRestart
Field | Details |
---|---|
Description | Pod {{$labels.pod}} has restarted. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted |
Severity | Major |
Condition | Alert if any pod restarts |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7014 |
Metric Used | kube_pod_container_status_restarts_total |
Recommended Actions | The alert is cleared automatically when the specific pod is up. Steps: |
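The sketch below shows how a pod-restart condition is commonly written against the kube-state-metrics counter kube_pod_container_status_restarts_total. The 5-minute window and the namespace value are assumptions, not values taken from the UDR alert files.

```yaml
# Sketch only: fires when any container in the namespace restarted in the last 5 minutes.
- alert: OcudrPodsRestart
  expr: increase(kube_pod_container_status_restarts_total{namespace="ocudr"}[5m]) > 0
  labels:
    severity: Major
  annotations:
    description: 'Pod {{$labels.pod}} has restarted.'
```

With this construction, the alert resolves by itself once no further restart is counted within the window. This matches the automatic clearing described in Table 4-7.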
4.1.1.6 NudrServiceDown
Table 4-8 NudrServiceDown
Field | Details |
---|---|
Description | OCUDR Nudr_DRService {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : DR Service is down |
Severity | Critical |
Condition | Alert if the DR service (nudr-drservice) is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7015 |
Metric Used | app_kubernetes_io_name="nudr-drservice" |
Recommended Actions | The alert is cleared when the DR service (nudr-drservice) is available. Steps: |
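The service-down alerts in this section identify each microservice by its app_kubernetes_io_name label. A common way to express such a condition is to compare the built-in `up` metric with 0. The sketch below assumes that the scrape configuration attaches the kubernetes_namespace, app_kubernetes_io_name, and kubernetes_pod_name labels to `up`. It is not the expression from the shipped rule file.

```yaml
# Sketch only: one series per scraped pod; fires when a nudr-drservice target is down.
- alert: NudrServiceDown
  expr: up{kubernetes_namespace="ocudr", app_kubernetes_io_name="nudr-drservice"} == 0
  labels:
    severity: Critical
  annotations:
    description: 'OCUDR Nudr_DRService {{$labels.app_kubernetes_io_name}} is down'
```

`up == 0` only fires while Prometheus still has a scrape target for the pod. If all pods of the service disappear, there is no series to compare. For that case, `absent(up{kubernetes_namespace="ocudr", app_kubernetes_io_name="nudr-drservice"})` is often added as a complementary rule. The other service-down alerts follow the same pattern, using the app_kubernetes_io_name value from their Metric Used row.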
4.1.1.7 NudrProvServiceDown
Table 4-9 NudrProvServiceDown
Field | Details |
---|---|
Description | OCUDR Nudr_DR_PROVService {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : DR Prov Service is down |
Severity | Critical |
Condition | Alert if the DR provisioning service (nudr-dr-provservice) is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7015 |
Metric Used | app_kubernetes_io_name="nudr-dr-provservice" |
Recommended Actions | The alert is cleared when the DR provisioning service (nudr-dr-provservice) is available. Steps: |
4.1.1.8 NudrNotifyServiceServiceDown
Table 4-10 NudrNotifyServiceServiceDown
Field | Details |
---|---|
Description | OCUDR NudrNotifyServiceService {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Nudr Notify Service down. |
Severity | Critical |
Condition | Alert if Nudr Notify service is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7016 |
Metric Used | app_kubernetes_io_name="nudr-notify-service" |
Recommended Actions | The alert is cleared when the Notify service (nudr-notify-service) is available. Steps: |
4.1.1.9 NudrNRFClientServiceDown
Table 4-11 NudrNRFClientServiceDown
Field | Details |
---|---|
Description | OCUDR NRFClient service {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NRF Client service down |
Severity | Critical |
Condition | Alert if the NRF Client service is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7017 |
Metric Used | app_kubernetes_io_name="nrf-client-nfmanagement" |
Recommended Actions | The alert is cleared when the NRF Client service (nrf-client-nfmanagement) is available. Steps: |
4.1.1.10 NudrConfigServiceDown
Table 4-12 NudrConfigServiceDown
Field | Details |
---|---|
Description | OCUDR config service {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : nudr-config service down |
Severity | Critical |
Condition | Alert if Nudr Config service is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7020 |
Metric Used | app_kubernetes_io_name="nudr-config" |
Recommended Actions | The alert is cleared when the Config service (nudr-config) is available. Steps: |
4.1.1.11 NudrDiameterProxyServiceDown
Table 4-13 NudrDiameterProxyServiceDown
Field | Details |
---|---|
Description | OCUDR diameterproxy service {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : nudr-diameterproxy service is down |
Severity | Critical |
Condition | Alert if Nudr Diameter Proxy is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7018 |
Metric Used | app_kubernetes_io_name="nudr-diameterproxy" |
Recommended Actions | The alert is cleared when the Diameter Proxy service (nudr-diameterproxy) is available. Steps: |
4.1.1.12 NudrOnDemandMigrationServiceDown
Table 4-14 NudrOnDemandMigrationServiceDown
Field | Details |
---|---|
Description | OCUDR ondemand-migration service {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down |
Severity | Critical |
Condition | Alert if Nudr On Demand Migration is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7019 |
Metric Used | app_kubernetes_io_name="nudr-ondemand-migration" |
Recommended Actions | The alert is cleared when the On-Demand Migration service (nudr-ondemand-migration) is available. Steps: |
4.1.1.13 OcudrIngressGatewayServiceDown
Table 4-15 OcudrIngressGatewayServiceDown
Field | Details |
---|---|
Description | OCUDR Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down |
Severity | Critical |
Condition | Alert if the Ingress Gateway service is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7021 |
Metric Used | app_kubernetes_io_name="ingressgateway" |
Recommended Actions | The alert is cleared when the Ingress Gateway service (ingressgateway) is available. Steps: |
4.1.1.14 OcudrEgressGatewayServiceDown
Table 4-16 OcudrEgressGatewayServiceDown
Field | Details |
---|---|
Description | OCUDR Egress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down |
Severity | Critical |
Condition | Alert if the Egress Gateway service is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7022 |
Metric Used | app_kubernetes_io_name="egressgateway" |
Recommended Actions | The alert is cleared when the Egress Gateway service (egressgateway) is available. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: |
4.1.1.15 OcudrDbServiceDown
Table 4-17 OcudrDbServiceDown
Field | Details |
---|---|
Description | MySQL connectivity service is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : MySQL connectivity service down |
Severity | Critical |
Condition | Alert if MySQL connectivity is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7023 |
Metric Used | appinfo_service_running |
Recommended Actions | This alert clears when the microservice nudr-drservice is up and running. |
4.1.1.16 OcudrIngressGatewayProvServiceDown
Table 4-18 OcudrIngressGatewayProvServiceDown
Field | Details |
---|---|
Description | OCUDR Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down |
Summary | namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down |
Severity | Critical |
Condition | Alert if the Ingress Gateway provisioning service (ingressgateway-prov) is down |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7043 |
Metric Used | app_kubernetes_io_name="ingressgateway-prov" |
Recommended Actions | The alert is cleared when the Ingress Gateway provisioning service (ingressgateway-prov) is available. Steps: |
4.1.2 Application Level Alerts
This section lists the application level alerts.
4.1.2.1 OcudrTrafficRateAboveMajorThreshold
Table 4-19 OcudrTrafficRateAboveMajorThreshold
Field | Details |
---|---|
Description | Ingress traffic rate is above the major threshold, that is, 900 requests per second |
Summary | Traffic rate is above 90 percent of max requests per second (1000) |
Severity | Major |
Condition | Alert if ingress traffic reaches 90% of max TPS |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7002 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared either when the total ingress traffic rate falls below the Major threshold or when it crosses the Critical threshold, in which case the OcudrTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated-site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and: |
4.1.2.2 OcudrTrafficRateAboveMinorThreshold
Table 4-20 OcudrTrafficRateAboveMinorThreshold
Field | Details |
---|---|
Description | Ingress traffic rate is above the minor threshold, that is, 800 requests per second |
Summary | Traffic rate is above 80 percent of max requests per second (1000) |
Severity | Minor |
Condition | Alert if ingress traffic reaches 80% of max TPS |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7001 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared either when the total ingress traffic rate falls below the Minor threshold or when it crosses the Major threshold, in which case the OcudrTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated-site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and: |
4.1.2.3 OcudrTrafficRateAboveCriticalThreshold
Table 4-21 OcudrTrafficRateAboveCriticalThreshold
Field | Details |
---|---|
Description | Ingress traffic rate is above the critical threshold, that is, 950 requests per second |
Summary | Traffic rate is above 95 percent of max requests per second (1000) |
Severity | Critical |
Condition | Alert if ingress traffic reaches 95% of max TPS |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7003 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared when the ingress traffic rate falls below the Critical threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated-site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and: |
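The three traffic-rate alerts can be written as non-overlapping bands. In the sketch below, the Minor rule is copied from the sample in the To disable section. The Major and Critical bounds are inferred from the 900 and 950 requests-per-second thresholds in Tables 4-19 and 4-21; they are not copied from the shipped rule file.

```yaml
# Minor rule from the documented sample; Major and Critical bounds are inferred.
- alert: OcudrTrafficRateAboveMinorThreshold
  expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocudr"}[2m])) >= 800 < 900
  labels:
    severity: Minor
- alert: OcudrTrafficRateAboveMajorThreshold
  expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocudr"}[2m])) >= 900 < 950
  labels:
    severity: Major
- alert: OcudrTrafficRateAboveCriticalThreshold
  expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocudr"}[2m])) >= 950
  labels:
    severity: Critical
```

PromQL comparison operators without the `bool` modifier act as filters, so `>= 800 < 900` keeps the value only when it lies in the range [800, 900). Because the bands do not overlap, at most one traffic-rate alert is active at a time. As the rate rises, the lower alert clears when the next one is raised.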
4.1.2.4 OcudrTransactionErrorRateAbove0.1Percent
Table 4-22 OcudrTransactionErrorRateAbove0.1Percent
Field | Details |
---|---|
Description | Transaction error rate is above 0.1 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 0.1 Percent of Total Transactions |
Severity | Warning |
Condition | Alert if the error rate exceeds 0.1% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7004 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared either when the number of failed transactions falls below 0.1% of the total transactions or when it crosses the 1% threshold, in which case the OcudrTransactionErrorRateAbove1Percent alert is raised. Steps: |
4.1.2.5 OcudrTransactionErrorRateAbove1Percent
Table 4-23 OcudrTransactionErrorRateAbove1Percent
Field | Details |
---|---|
Description | Transaction error rate is above 1 percent of total transactions |
Summary | Transaction error rate detected above 1 percent of total transactions |
Severity | Warning |
Condition | Alert if the error rate exceeds 1% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7005 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared either when the number of failed transactions falls below 1% of the total transactions or when it crosses the 10% threshold, in which case the OcudrTransactionErrorRateAbove10Percent alert is raised. Steps: |
4.1.2.6 OcudrTransactionErrorRateAbove10Percent
Table 4-24 OcudrTransactionErrorRateAbove10Percent
Field | Details |
---|---|
Description | Transaction error rate is above 10 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
Severity | Minor |
Condition | Alert if the error rate exceeds 10% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7006 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared either when the number of failed transactions falls below 10% of the total transactions or when it crosses the 25% threshold, in which case the OcudrTransactionErrorRateAbove25Percent alert is raised. Steps: |
4.1.2.7 OcudrTrafficRateAboveCriticalThreshold
Table 4-25 OcudrTrafficRateAboveCriticalThreshold
Field | Details |
---|---|
Description | Ingress traffic rate is above critical threshold i.e. 950 requests per second |
Summary | Traffic rate is above 95 percent of max requests per second (1000) |
Severity | Critical |
Condition | Alert if ingress traffic reaches 95% of max TPS |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7003 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared when the ingress traffic rate falls below the Critical threshold. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated-site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and: |
4.1.2.8 OcudrTrafficRateAboveMajorThreshold
Table 4-26 OcudrTrafficRateAboveMajorThreshold
Field | Details |
---|---|
Description | Ingress traffic rate is above major threshold i.e. 900 requests per second |
Summary | Traffic rate is above 90 percent of max requests per second (1000) |
Severity | Major |
Condition | Alert if ingress traffic reaches 90% of max TPS |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7002 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared either when the total ingress traffic rate falls below the Major threshold or when it crosses the Critical threshold, in which case the OcudrTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated-site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and: |
4.1.2.9 OcudrTrafficRateAboveMinorThreshold
Table 4-27 OcudrTrafficRateAboveMinorThreshold
Field | Details |
---|---|
Description | Ingress traffic rate is above the minor threshold, that is, 800 requests per second |
Summary | Traffic rate is above 80 percent of max requests per second (1000) |
Severity | Minor |
Condition | Alert if ingress traffic reaches 80% of max TPS |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7001 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared either when the total ingress traffic rate falls below the Minor threshold or when it crosses the Major threshold, in which case the OcudrTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the UDR_Alertrules.yaml file. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated-site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support and: |
4.1.2.10 OcudrTransactionErrorRateAbove0.1Percent
Table 4-28 OcudrTransactionErrorRateAbove0.1Percent
Field | Details |
---|---|
Description | Transaction Error rate is above 0.1 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 0.1 Percent of Total Transactions |
Severity | Warning |
Condition | Alert if the error rate exceeds 0.1% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7004 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared either when the number of failed transactions falls below 0.1% of the total transactions or when it crosses the 1% threshold, in which case the OcudrTransactionErrorRateAbove1Percent alert is raised. Steps: |
4.1.2.11 OcudrTransactionErrorRateAbove1Percent
Table 4-29 OcudrTransactionErrorRateAbove1Percent
Field | Details |
---|---|
Description | Transaction Error rate is above 1 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
Severity | Warning |
Condition | Alert if the error rate exceeds 1% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7005 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared either when the number of failed transactions falls below 1% of the total transactions or when it crosses the 10% threshold, in which case the OcudrTransactionErrorRateAbove10Percent alert is raised. Steps: |
4.1.2.12 OcudrTransactionErrorRateAbove10Percent
Table 4-30 OcudrTransactionErrorRateAbove10Percent
Field | Details |
---|---|
Description | Transaction Error rate is above 10 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
Severity | Minor |
Condition | Alert if the error rate exceeds 10% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7006 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared either when the number of failed transactions falls below 10% of the total transactions or when it crosses the 25% threshold, in which case the OcudrTransactionErrorRateAbove25Percent alert is raised. Steps: |
4.1.2.13 OcudrTransactionErrorRateAbove25Percent
Table 4-31 OcudrTransactionErrorRateAbove25Percent
Field | Details |
---|---|
Description | Transaction Error Rate detected above 25 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 25 Percent of Total Transactions |
Severity | Major |
Condition | Alert if the error rate exceeds 25% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7007 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared either when the number of failed transactions falls below 25% of the total transactions or when it crosses the 50% threshold, in which case the OcudrTransactionErrorRateAbove50Percent alert is raised. Steps: |
4.1.2.14 OcudrTransactionErrorRateAbove50Percent
Table 4-32 OcudrTransactionErrorRateAbove50Percent
Field | Details |
---|---|
Description | Transaction Error Rate detected above 50 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 50 Percent of Total Transactions |
Severity | Critical |
Condition | Alert if the error rate exceeds 50% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7008 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed transactions falls below 50% of the total transactions. Steps: |
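The transaction error-rate alerts compare failed responses with all responses counted by oc_ingressgateway_http_responses_total. The sketch below shows the 1% alert; its upper bound of 10% follows the clearing behavior in Table 4-23. The `Status` label used to select failed responses is hypothetical, and so is the choice of non-2xx codes as failures. Check which labels the metric carries in your deployment before using a similar expression.

```yaml
# Sketch only: "Status" is a hypothetical label name; non-2xx responses are treated as errors.
- alert: OcudrTransactionErrorRateAbove1Percent
  expr: |
    sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocudr",Status!~"2.*"}[2m]))
      / sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocudr"}[2m]))
      * 100 >= 1 < 10
  labels:
    severity: Warning
```

In PromQL, `/` and `*` bind more tightly than the comparison operators. The ratio is therefore converted to a percentage before it is filtered to the [1, 10) band. When there is no traffic, the division returns NaN, which fails both comparisons, so the alert does not fire.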
4.1.2.15 OcudrXFCCValidationFailureAbove10Percent
Table 4-33 OcudrXFCCValidationFailureAbove10Percent
Field | Details |
---|---|
Description | The number of responses with XFCC validation failures reaches 10% of ingress traffic |
Summary | The number of responses with XFCC validation failures reaches 10% of ingress traffic |
Severity | Minor |
Condition | Alert if XFCC validation failures reach 10% of the total XFCC validations |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7024 |
Metric Used | oc_ingressgateway_xfcc_header_validate_total |
Recommended Actions | The alert is cleared when XFCC validation failures fall below 10% of the total XFCC validations. Steps: |
4.1.2.16 OcudrXFCCValidationFailureAbove20Percent
Table 4-34 OcudrXFCCValidationFailureAbove20Percent
Field | Details |
---|---|
Description | The number of responses with XFCC validation failures reaches 20% of ingress traffic |
Summary | The number of responses with XFCC validation failures reaches 20% of ingress traffic |
Severity | Major |
Condition | Alert if XFCC validation failures reach 20% of the total XFCC validations |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7025 |
Metric Used | oc_ingressgateway_xfcc_header_validate_total |
Recommended Actions | The alert is cleared when XFCC validation failures fall below 20% of the total XFCC validations. Steps: |
4.1.2.17 OcudrXFCCValidationFailureAbove50Percent
Table 4-35 OcudrXFCCValidationFailureAbove50Percent
Field | Details |
---|---|
Description | The number of responses with XFCC validation failures reaches 50% of ingress traffic |
Summary | The number of responses with XFCC validation failures reaches 50% of ingress traffic |
Severity | Critical |
Condition | Alert if XFCC validation failures reach 50% of the total XFCC validations |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7026 |
Metric Used | oc_ingressgateway_xfcc_header_validate_total |
Recommended Actions | The alert is cleared when XFCC validation failures fall below 50% of the total XFCC validations. Steps: |
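Unlike the transaction error rate alerts, each XFCC tier is cleared only when the failure ratio drops below its own threshold, so the three alerts can fire together. One way to express that, sketched here under stated assumptions, is a recording rule that computes the ratio once and three alerts that compare it against plain lower bounds. The recording rule name, the group name, and the `Status="failure"` label on `oc_ingressgateway_xfcc_header_validate_total` are hypothetical; check the labels your Ingress Gateway actually exports.

```yaml
# Sketch only: rule names and the Status label value are assumptions.
groups:
  - name: ocudr-xfcc-sketch
    rules:
      # Percentage of XFCC header validations that failed over the last 5 minutes.
      - record: ocudr:xfcc_validation_failure:percent5m
        expr: |
          sum by (namespace) (rate(oc_ingressgateway_xfcc_header_validate_total{namespace="ocudr",Status="failure"}[5m]))
            /
          sum by (namespace) (rate(oc_ingressgateway_xfcc_header_validate_total{namespace="ocudr"}[5m]))
            * 100
      # No upper bounds: a higher tier does not clear a lower one.
      - alert: OcudrXFCCValidationFailureAbove10Percent
        expr: ocudr:xfcc_validation_failure:percent5m >= 10
        labels:
          severity: minor
      - alert: OcudrXFCCValidationFailureAbove20Percent
        expr: ocudr:xfcc_validation_failure:percent5m >= 20
        labels:
          severity: major
      - alert: OcudrXFCCValidationFailureAbove50Percent
        expr: ocudr:xfcc_validation_failure:percent5m >= 50
        labels:
          severity: critical
```

Precomputing the ratio keeps the three thresholds consistent and avoids evaluating the same rate queries three times per rule interval.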
4.1.2.18 DRServiceOverload60Percent
Table 4-36 DRServiceOverload60Percent
Field | Details |
---|---|
Description | This alert is fired when the application reaches the Warn overload level |
Summary | This alert is fired when the application reaches the Warn overload level |
Severity | Warning |
Condition | Alert if the application reaches the 60% overload level |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7027 |
Metric Used | load_level |
Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Warn level. Steps: |
4.1.2.19 DRServiceOverload75Percent
Table 4-37 DRServiceOverload75Percent
Field | Details |
---|---|
Description | This alert is fired when the application reaches the Minor overload level |
Summary | This alert is fired when the application reaches the Minor overload level |
Severity | Minor |
Condition | Alert if the application reaches the 75% overload level |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7028 |
Metric Used | load_level |
Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Minor level. Steps: |
4.1.2.20 DRServiceOverload80Percent
Table 4-38 DRServiceOverload80Percent
Field | Details |
---|---|
Description | This alert is fired when the application reaches the Major overload level |
Summary | This alert is fired when the application reaches the Major overload level |
Severity | Major |
Condition | Alert if the application reaches the 80% overload level |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7029 |
Metric Used | load_level |
Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Major level. Steps: |
4.1.2.21 DRServiceOverload90Percent
Table 4-39 DRServiceOverload90Percent
Field | Details |
---|---|
Description | This alert is fired when the application reaches the Critical overload level |
Summary | This alert is fired when the application reaches the Critical overload level |
Severity | Critical |
Condition | Alert if the application reaches the 90% overload level |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7030 |
Metric Used | load_level |
Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Critical level. Steps: |
4.1.2.22 SLFSucessTxnDefaultGroupIdRateAbove1Percent
Table 4-40 SLFSucessTxnDefaultGroupIdRateAbove1Percent
Field | Details |
---|---|
Description | SLF Lookup responses with the default Group ID detected above 1 Percent of Total Responses |
Summary | SLF Lookup responses with the default Group ID are above 1 Percent of Total Responses |
Severity | Warning |
Condition | Alert if number of SLF Lookup requests responded with default Group ID exceeds 1% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7031 |
Metric Used | slf_sucess_txn_default_grp_id_total |
Recommended Actions | This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received in Lookup requests and make sure that no unexpected out-of-range subscribers are being looked up. |
4.1.2.23 SLFSucessTxnDefaultGroupIdRateAbove10Percent
Table 4-41 SLFSucessTxnDefaultGroupIdRateAbove10Percent
Field | Details |
---|---|
Description | SLF Lookup responses with the default Group ID detected above 10 Percent of Total Responses |
Summary | SLF Lookup responses with the default Group ID are above 10 Percent of Total Responses |
Severity | Minor |
Condition | Alert if number of SLF Lookup requests responded with default Group ID exceeds 10% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7032 |
Metric Used | slf_sucess_txn_default_grp_id_total |
Recommended Actions | This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received in Lookup requests and make sure that no unexpected out-of-range subscribers are being looked up. |
4.1.2.24 SLFSucessTxnDefaultGroupIdRateAbove25Percent
Table 4-42 SLFSucessTxnDefaultGroupIdRateAbove25Percent
Field | Details |
---|---|
Description | SLF Lookup responses with the default Group ID detected above 25 Percent of Total Responses |
Summary | SLF Lookup responses with the default Group ID are above 25 Percent of Total Responses |
Severity | Major |
Condition | Alert if number of SLF Lookup requests responded with default Group ID exceeds 25% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7033 |
Metric Used | slf_sucess_txn_default_grp_id_total |
Recommended Actions | This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received in Lookup requests and make sure that no unexpected out-of-range subscribers are being looked up. |
4.1.2.25 SLFSucessTxnDefaultGroupIdRateAbove50Percent
Table 4-43 SLFSucessTxnDefaultGroupIdRateAbove50Percent
Field | Details |
---|---|
Description | SLF Lookup responses with the default Group ID detected above 50 Percent of Total Responses |
Summary | SLF Lookup responses with the default Group ID are above 50 Percent of Total Responses |
Severity | Critical |
Condition | Alert if number of SLF Lookup requests responded with default Group ID exceeds 50% of the total responses. |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7034 |
Metric Used | slf_sucess_txn_default_grp_id_total |
Recommended Actions | This alert is cleared when the number of SLF Lookup requests for subscribers that are not provisioned decreases. Steps: Check the subscriber range received in Lookup requests and make sure that no unexpected out-of-range subscribers are being looked up. |
4.1.2.26 OcudrDiameterCongestionCongestedState
Table 4-44 OcudrDiameterCongestionCongestedState
Field | Details |
---|---|
Description | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
Summary | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
Severity | Critical |
Condition | Alert will be raised if the diameter gateway pod is in CONGESTED state. |
Metric Used | ocudr_pod_congestion_state == 2 |
Recommended Actions | This alert is raised when the Diameter Gateway pod congestion level is set to the CONGESTED state. Steps: |
4.1.2.27 OcudrDiameterCongestionDocState
Table 4-45 OcudrDiameterCongestionDocState
Field | Details |
---|---|
Description | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
Summary | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
Severity | Major |
Condition | Alert will be raised if the diameter gateway pod is in Danger of Congestion (DOC) state. |
Metric Used | ocudr_pod_congestion_state == 1 |
Recommended Actions | This alert is raised when the Diameter Gateway pod congestion level is set to the Danger of Congestion (DOC) state. Steps: |
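The two Diameter Gateway congestion alerts differ only in the value of `ocudr_pod_congestion_state` they match: 1 for Danger of Congestion (DOC) and 2 for CONGESTED. The following sketch writes both as Prometheus rules. It assumes that the scraped series carry `namespace` and `pod` labels, which is typical for Kubernetes service discovery but not confirmed here; the namespace selector and the group name are hypothetical.

```yaml
# Sketch only: namespace/pod labels and the group name are assumptions.
groups:
  - name: ocudr-diameter-congestion-sketch
    rules:
      # State 1 = Danger of Congestion (DOC).
      - alert: OcudrDiameterCongestionDocState
        expr: ocudr_pod_congestion_state{namespace="ocudr"} == 1
        labels:
          severity: major
        annotations:
          summary: "Diameter Gateway pod {{ $labels.pod }} is in Danger of Congestion (DOC) state"
      # State 2 = CONGESTED.
      - alert: OcudrDiameterCongestionCongestedState
        expr: ocudr_pod_congestion_state{namespace="ocudr"} == 2
        labels:
          severity: critical
        annotations:
          summary: "Diameter Gateway pod {{ $labels.pod }} is in CONGESTED state"
```

Because the comparison is evaluated per series, each congested pod raises its own alert instance, and the `{{ $labels.pod }}` template names that pod in the alert text.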
4.1.2.28 DRProvServiceOverload60Percent
Table 4-46 DRProvServiceOverload60Percent
Field | Details |
---|---|
Description | This alert is fired when the application reaches the Warn overload level |
Summary | This alert is fired when the application reaches the Warn overload level |
Severity | Warning |
Condition | Alert if the application reaches the 60% overload level |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7036 |
Metric Used | load_level |
Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Warn level. Steps: |
4.1.2.29 DRProvServiceOverload75Percent
Table 4-47 DRProvServiceOverload75Percent
Field | Details |
---|---|
Description | This alert is fired when the application reaches the Minor overload level |
Summary | This alert is fired when the application reaches the Minor overload level |
Severity | Minor |
Condition | Alert if the application reaches the 75% overload level |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7037 |
Metric Used | load_level |
Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Minor level. Steps: |
4.1.2.30 DRProvServiceOverload80Percent
Table 4-48 DRProvServiceOverload80Percent
Field | Details |
---|---|
Description | This alert is fired when the application reaches the Major overload level |
Summary | This alert is fired when the application reaches the Major overload level |
Severity | Major |
Condition | Alert if the application reaches the 80% overload level |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7038 |
Metric Used | load_level |
Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Major level. Steps: |
4.1.2.31 DRProvServiceOverload90Percent
Table 4-49 DRProvServiceOverload90Percent
Field | Details |
---|---|
Description | This alert is fired when the application reaches the Critical overload level |
Summary | This alert is fired when the application reaches the Critical overload level |
Severity | Critical |
Condition | Alert if the application reaches the 90% overload level |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7039 |
Metric Used | load_level |
Recommended Actions | This alert is cleared when the incoming traffic is reduced to below the Critical level. Steps: |
4.1.2.32 Diameter-Gateway pod congestion Danger of congestion state
Table 4-50 Diameter-Gateway pod congestion Danger of congestion state
Field | Details |
---|---|
Description | DiameterGateway pod at Danger of Congestion state |
Summary | DiameterGateway pod at Danger of Congestion state |
Severity | Major |
Condition | Alert if the diameter gateway pod is in Danger of Congestion (DOC) state |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7041 |
Metric Used | occnp_pod_congestion_state==1 |
Recommended Actions | This alert is raised when the Diameter Gateway pod congestion level is set to the Danger of Congestion (DOC) state. Steps: |
4.1.2.33 Diameter-Gateway pod CONGESTED state
Table 4-51 Diameter-Gateway pod CONGESTED state
Field | Details |
---|---|
Description | DiameterGateway pod at Congested state |
Summary | DiameterGateway pod at Congested state |
Severity | Critical |
Condition | Alert if the diameter gateway pod is in CONGESTED state |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7042 |
Metric Used | occnp_pod_congestion_state==2 |
Recommended Actions | This alert is raised when the Diameter Gateway pod congestion level is set to the CONGESTED state. Steps: |
4.1.2.34 OcudrProvisioningTrafficRateAboveMajorThreshold
Table 4-52 OcudrProvisioningTrafficRateAboveMajorThreshold
Field | Details |
---|---|
Description | Ingress traffic rate is above the critical threshold, that is, 950 requests per second |
Summary | Traffic Rate is above 95 Percent of Max requests per second (1000) |
Severity | Critical |
Condition | Alert if Ingress traffic reaches 95% of max TPS |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7044 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared when the ingress traffic rate falls below the Critical threshold. Note: The threshold is configurable in UDR_Alertrules.yaml. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated-site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
4.1.2.35 OcudrProvisioningTrafficRateAboveCriticalThreshold
Table 4-53 OcudrProvisioningTrafficRateAboveCriticalThreshold
Field | Details |
---|---|
Description | Ingress traffic rate is above the major threshold, that is, 900 requests per second |
Summary | Traffic Rate is above 90 Percent of Max requests per second (1000) |
Severity | Major |
Condition | Alert if Ingress traffic reaches 90% of max TPS |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7045 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared when the total ingress traffic rate falls below the Major threshold, or when it exceeds the Critical threshold, in which case the OcudrTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in UDR_Alertrules.yaml. Steps: Reassess why OCUDR is receiving additional traffic (for example, the mated-site OCUDR is unavailable in a georedundancy scenario). If this is unexpected, contact My Oracle Support. |
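Both provisioning traffic rate alerts compare the ingress request rate against fixed fractions of a maximum of 1000 requests per second (950 and 900 requests per second). The sketch below shows where such thresholds sit in a Prometheus rule, which is the value to edit when reconfiguring them. It is not the content of UDR_Alertrules.yaml: the `app_kubernetes_io_name="prov-ingressgateway"` selector for the provisioning Ingress Gateway, the 2-minute window, and the group name are hypothetical. The alert names and severities are copied from the two tables as written.

```yaml
# Sketch only: the provisioning gateway selector, window, and group name are hypothetical.
groups:
  - name: ocudr-provisioning-traffic-sketch
    rules:
      - alert: OcudrProvisioningTrafficRateAboveCriticalThreshold
        # Major severity at 90% (900 requests/s); hands over to the 95% alert above 950.
        expr: |
          sum by (namespace) (rate(oc_ingressgateway_http_requests_total{namespace="ocudr",app_kubernetes_io_name="prov-ingressgateway"}[2m]))
            >= 900 < 950
        labels:
          severity: major
      - alert: OcudrProvisioningTrafficRateAboveMajorThreshold
        # Critical severity at 95% (950 requests/s).
        expr: |
          sum by (namespace) (rate(oc_ingressgateway_http_requests_total{namespace="ocudr",app_kubernetes_io_name="prov-ingressgateway"}[2m]))
            >= 950
        labels:
          severity: critical
```

If the maximum supported rate differs from 1000 requests per second, both literal thresholds (and the upper bound of the banded rule) must be changed together to keep the hand-over consistent.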
4.1.2.36 OcudrProvisioningTransactionErrorRateAbove25Percent
Table 4-54 OcudrProvisioningTransactionErrorRateAbove25Percent
Field | Details |
---|---|
Description | Transaction Error Rate detected above 25 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 25 Percent of Total Transactions |
Severity | Major |
Condition | Alert if the error rate exceeds 25% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7046 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when failed transactions fall below 25% of the total transactions, or when they exceed the 50% threshold, in which case the OcudrProvisioningTransactionErrorRateAbove50Percent alert is raised. Steps: |
4.1.2.37 OcudrProvisioningTransactionErrorRateAbove50Percent
Table 4-55 OcudrProvisioningTransactionErrorRateAbove50Percent
Field | Details |
---|---|
Description | Transaction Error Rate detected above 50 Percent of Total Transactions |
Summary | Transaction Error Rate detected above 50 Percent of Total Transactions |
Severity | Critical |
Condition | Alert if the error rate exceeds 50% of the total transactions |
OID | 1.3.6.1.4.1.323.5.3.43.1.2.7047 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when failed transactions fall below 50% of the total transactions. Steps: |
4.1.2.38 PVCFullForSLFExport
Table 4-56 PVCFullForSLFExport
Field | Details |
---|---|
Description | Storage for Export tool is full |
Summary | Storage for Export tool is full |
Severity | Critical |
Condition | Alert if the PVC allocated for the export tool dump path is full |
Metric Used | export_tool_full_usage |
Recommended Actions | The alert is cleared when PVC usage is reduced. Configure maxDumps to a lower value to clear old dumps, and remove any old dumps from the export tool container. |
4.1.2.39 FailedExtractForSLFExport
Table 4-57 FailedExtractForSLFExport
Field | Details |
---|---|
Description | Export tool job failed |
Summary | Export tool job failed |
Severity | Critical |
Condition | Alert if the export operation fails |
Metric Used | export_failure |
Recommended Actions | Check the logs for the cause of the failure. The alert is cleared when the next export job succeeds. |
4.1.2.40 BulkImportTransferInFailed
Table 4-58 BulkImportTransferInFailed
Field | Details |
---|---|
Description | Transfer-in failed for bulk import |
Summary | Transfer-in failed for bulk import |
Severity | Major |
Condition | Alert is raised if the transfer-in from the remote location to the PVC fails |
Metric Used | bulkimport_transfer_in_status |
Recommended Actions | This alert is cleared when a bulk import transfer-in succeeds. Steps: |
4.1.2.41 ExportToolTransferOutFailed
Table 4-59 ExportToolTransferOutFailed
Field | Details |
---|---|
Description | Transfer-out failed for export-tool |
Summary | Transfer-out failed for export-tool |
Severity | Major |
Condition | Alert is raised if the transfer-out from the PVC to the remote location fails |
Metric Used | sftp_transfer_status |
Recommended Actions | This alert is cleared when an export tool transfer-out succeeds. Steps: |
4.1.2.42 BulkImportTransferOutFailed
Table 4-60 BulkImportTransferOutFailed
Field | Details |
---|---|
Description | Transfer-out failed for bulk import |
Summary | Transfer-out failed for bulk import |
Severity | Major |
Condition | Alert is raised if the transfer-out from the PVC to the remote location fails |
Metric Used | bulkimport_transfer_out_status |
Recommended Actions | This alert is cleared when a bulk import transfer-out succeeds. Steps: |
4.1.2.43 PVCFullForXMLBulkImport
Table 4-61 PVCFullForXMLBulkImport
Field | Details |
---|---|
Description | Storage for XML Bulk Import tool is full |
Summary | Storage for XML Bulk Import tool is full |
Severity | Critical |
Condition | Alert is raised if the PVC for the xml-csv container is full |
Metric Used | nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-xmltocsv",kubernetes_namespace="ocudr"}==1 |
Recommended Actions | This alert is cleared when PVC usage is back to normal. Steps: |
4.1.2.44 PVCFullForBulkImport
Table 4-62 PVCFullForBulkImport
Field | Details |
---|---|
Description | Storage for Bulk Import tool is full |
Summary | Storage for Bulk Import tool is full |
Severity | Critical |
Condition | Alert is raised if the PVC for the bulk import container is full |
Metric Used | nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-bulk-import",kubernetes_namespace="ocudr"}==1 |
Recommended Actions | This alert is cleared when PVC usage is back to normal. Steps: |
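The two bulk import PVC alerts share one metric, `nudr_bulk_import_tool_pvc_full_usage`, and are told apart only by the `app_kubernetes_io_name` label. The following sketch wraps the documented expressions into Prometheus rules. The value of `kubernetes_namespace` must match the namespace of your deployment; the group name is hypothetical.

```yaml
# Sketch only: expressions copied from the tables; the group name is hypothetical.
groups:
  - name: ocudr-bulk-import-pvc-sketch
    rules:
      # Bulk import container PVC is full.
      - alert: PVCFullForBulkImport
        expr: nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-bulk-import",kubernetes_namespace="ocudr"} == 1
        labels:
          severity: critical
        annotations:
          summary: Storage for Bulk Import tool is full
      # xml-csv (nudr-xmltocsv) container PVC is full.
      - alert: PVCFullForXMLBulkImport
        expr: nudr_bulk_import_tool_pvc_full_usage{app_kubernetes_io_name="nudr-xmltocsv",kubernetes_namespace="ocudr"} == 1
        labels:
          severity: critical
        annotations:
          summary: Storage for XML Bulk Import tool is full
```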
4.1.2.45 OperationalStatusCompleteShutdown
Table 4-63 OperationalStatusCompleteShutdown
Field | Details |
---|---|
Description | Operational state is COMPLETE_SHUTDOWN (controlled shutdown) |
Summary | Operational state is COMPLETE_SHUTDOWN (controlled shutdown) |
Severity | Critical |
Condition | Alert is raised if the operational state of UDR, SLF, or EIR is COMPLETE_SHUTDOWN |
Metric Used | nudr_config_operational_status{kubernetes_namespace="ocudr"}==1 |
Recommended Actions | This alert is cleared when the operational status is back to normal. Steps: |
4.1.2.46 NFScoreCalculationFailed
Table 4-64 NFScoreCalculationFailed
Field | Details |
---|---|
Description | NF score calculation failed |
Summary | NF score calculation failed |
Severity | Major |
Condition | Alert is raised if the NF score calculation fails for any of the scoring factors |
Metric Used | nfscore{kubernetes_namespace="ocudr",factor=~"successTPS|signallingConnections|serviceHealth|replicationHealth|localityPreference|bulkImport|bulkExport",calculatedStatus="failed"} |
Recommended Actions | This alert is cleared when the NF score calculation succeeds. Steps: |
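The NFScoreCalculationFailed expression has no comparison operator, so the alert fires for every `nfscore` series whose `calculatedStatus` label is `failed`, with one alert instance per failing series. The following sketch keeps the documented expression and surfaces the failing factor in the alert text through the `$labels` template variable; the annotation wording and the group name are assumptions.

```yaml
# Sketch only: annotation text and group name are assumptions.
groups:
  - name: ocudr-nf-score-sketch
    rules:
      - alert: NFScoreCalculationFailed
        # Any series returned by this selector is an active alert instance.
        expr: nfscore{kubernetes_namespace="ocudr",factor=~"successTPS|signallingConnections|serviceHealth|replicationHealth|localityPreference|bulkImport|bulkExport",calculatedStatus="failed"}
        labels:
          severity: major
        annotations:
          description: "NF score calculation failed for factor {{ $labels.factor }}"
```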
4.1.2.47 PVCFullForEXMLExport
Table 4-65 PVCFullForEXMLExport
Field | Details |
---|---|
Description | Storage for Export tool is full |
Summary | Storage for Export tool is full |
Severity | Critical |
Condition | Alert is raised if the PVC allocated for the export tool dump path is full. |
Metric Used | export_tool_full_usage{namespace="ocudr"}==1 |
Recommended Actions | The alert is cleared when PVC usage is reduced. Configure maxDumps to a lower value to clear old dumps. Steps: |
4.1.2.48 EXMLExportFailed
Table 4-66 EXMLExportFailed
Field | Details |
---|---|
Description | Export tool job failed |
Summary | Export tool job failed |
Severity | Critical |
Condition | Alert is raised if the export operation fails in EXML mode |
Metric Used | export_failure{namespace="ocudr"} == 1 |
Recommended Actions | Check the logs for the cause of the failure. The alert is cleared when the next export job succeeds. |
4.1.2.49 IngressgatewayPodProtectionDocState
Table 4-67 IngressgatewayPodProtectionDocState
Field | Details |
---|---|
Description | Ingress Gateway pod in Danger of Congestion (DOC) state |
Summary | Ingress Gateway pod in Danger of Congestion (DOC) state |
Severity | Critical |
Condition | Alert is raised if the Ingress Gateway pod is in the DOC state. |
Metric Used | oc_ingressgateway_pod_congestion_state{namespace="ocudr"}==1 |
Recommended Actions | This alert is cleared when the Ingress Gateway returns to the normal state. Steps: |
4.1.2.50 IngressgatewayPodProtectionCongestedState
Table 4-68 IngressgatewayPodProtectionCongestedState
Field | Details |
---|---|
Description | Ingress Gateway pod in CONGESTED state |
Summary | Ingress Gateway pod in CONGESTED state |
Severity | Critical |
Condition | Alert is raised if the Ingress Gateway pod is in the CONGESTED state. |
Metric Used | oc_ingressgateway_pod_congestion_state{namespace="ocudr"}==2 |
Recommended Actions | This alert is cleared when the Ingress Gateway returns to the normal state. Steps: |
4.1.2.51 RetryNotificationRecordsMaxLimitExceeded
Table 4-69 RetryNotificationRecordsMaxLimitExceeded
Field | Details |
---|---|
Description | Alert is raised if the number of retry notifications stored in the UDR database exceeds the maximum limit. |
Summary | Alert is raised if the number of retry notifications stored in the UDR database exceeds the maximum limit. |
Severity | Critical |
Condition | Alert is raised if the number of retry notifications stored in the UDR database exceeds the maximum limit. |
Metric Used | nudr_notif_records_limit_exceeded{namespace="ocudr"}==1 |
Recommended Actions | This alert is raised when notification failures accumulate and the number of retry notifications stored in the database exceeds 50,000. Steps: |
4.1.2.52 UserAgentHeaderNotFoundMorethan10PercentRequest
Table 4-70 UserAgentHeaderNotFoundMorethan10PercentRequest
Field | Details |
---|---|
Description | Alert is raised if requests without a User-Agent header make up 10% or more of ingress traffic while the suppress notification feature is enabled. |
Summary | Alert is raised if requests without a User-Agent header make up 10% or more of ingress traffic while the suppress notification feature is enabled. |
Severity | Critical |
Condition | Alert is raised if requests without a User-Agent header make up 10% or more of ingress traffic. |
Metric Used | (sum by(namespace)(rate(suppress_user_agent_not_found_total{namespace="ocudr"}[5m]))/sum by(namespace)(rate(oc_ingressgateway_http_requests_total{namespace="ocudr"}[5m])))*100 >= 10 |
Recommended Actions | This alert is cleared when requests without a User-Agent header make up less than 10% of ingress traffic. Steps: |
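The User-Agent expression already returns the percentage of requests without the header, so Prometheus exposes that percentage as `$value` when the alert fires. The following sketch reuses the documented expression and prints the current percentage with the `humanize` template function. The `for` period, the annotation text, and the group name are assumptions.

```yaml
# Sketch only: for period, annotation text, and group name are assumptions.
groups:
  - name: ocudr-user-agent-sketch
    rules:
      - alert: UserAgentHeaderNotFoundMorethan10PercentRequest
        expr: |
          (
            sum by (namespace) (rate(suppress_user_agent_not_found_total{namespace="ocudr"}[5m]))
            /
            sum by (namespace) (rate(oc_ingressgateway_http_requests_total{namespace="ocudr"}[5m]))
          ) * 100 >= 10
        for: 5m   # ignore short bursts of header-less requests
        labels:
          severity: critical
        annotations:
          summary: "{{ $value | humanize }}% of ingress requests have no User-Agent header"
```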
4.1.2.53 EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
Table 4-71 EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold
Field | Details |
---|---|
Description | Alert is raised if the Egress Gateway JVM buffer memory usage is above the minor threshold (1,300,000,000 bytes). |
Summary | Alert is raised if the Egress Gateway JVM buffer memory usage is above the minor threshold (1,300,000,000 bytes). |
Severity | Minor |
Condition | Alert is raised if the Egress Gateway JVM buffer memory usage is above the minor threshold (1,300,000,000 bytes). |
Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1300000000 |
Recommended Actions | This alert is cleared when the Egress Gateway JVM buffer memory usage falls below the minor threshold. Steps: |
4.1.2.54 EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
Table 4-72 EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold
Field | Details |
---|---|
Description | Alert is raised if the Egress Gateway JVM buffer memory usage is above the major threshold (1,500,000,000 bytes). |
Summary | Alert is raised if the Egress Gateway JVM buffer memory usage is above the major threshold (1,500,000,000 bytes). |
Severity | Major |
Condition | Alert is raised if the Egress Gateway JVM buffer memory usage is above the major threshold (1,500,000,000 bytes). |
Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1500000000 |
Recommended Actions | This alert is cleared when the Egress Gateway JVM buffer memory usage falls below the major threshold. Steps: |
4.1.2.55 EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
Table 4-73 EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold
Field | Details |
---|---|
Description | Alert is raised if the Egress Gateway JVM buffer memory usage is above the critical threshold (1,800,000,000 bytes). |
Summary | Alert is raised if the Egress Gateway JVM buffer memory usage is above the critical threshold (1,800,000,000 bytes). |
Severity | Critical |
Condition | Alert is raised if the Egress Gateway JVM buffer memory usage is above the critical threshold (1,800,000,000 bytes). |
Metric Used | sum by (id, pod) (jvm_buffer_memory_used_bytes{namespace="ocudr",pod=~".*egress.*"}) >= 1800000000 |
Recommended Actions | This alert is cleared when the Egress Gateway JVM buffer memory usage falls below the critical threshold. Steps: |
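Because each Egress Gateway JVM buffer alert is cleared only when usage drops below its own threshold, a pod above 1,800,000,000 bytes raises the minor, major, and critical alerts at the same time. If alerts are routed through Alertmanager, inhibition rules can suppress the lower tiers while a higher one is active for the same pod. The fragment below is a sketch for the Alertmanager configuration file, not part of the UDR alert files, and it assumes an Alertmanager release that supports `source_matchers` and `target_matchers` (introduced in 0.22).

```yaml
# Alertmanager configuration fragment (sketch); not a Prometheus rule file.
inhibit_rules:
  # While the critical alert fires for a pod, mute the major and minor alerts for that pod.
  - source_matchers:
      - alertname="EgressGatewayJVMBufferMemoryUsedAboveCriticalThreshold"
    target_matchers:
      - alertname=~"EgressGatewayJVMBufferMemoryUsedAbove(Major|Minor)Threshold"
    equal: ['pod']
  # While the major alert fires for a pod, mute the minor alert for that pod.
  - source_matchers:
      - alertname="EgressGatewayJVMBufferMemoryUsedAboveMajorThreshold"
    target_matchers:
      - alertname="EgressGatewayJVMBufferMemoryUsedAboveMinorThreshold"
    equal: ['pod']
```

Inhibition only affects notifications; the lower-tier alerts remain active in Prometheus, so their clearing conditions are unchanged.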
4.1.2.56 NudrDiameterGatewayDown
Table 4-74 NudrDiameterGatewayDown
Field | Details |
---|---|
Description | Alert is raised if the nudr-diam-gateway service is down. |
Summary | Alert is raised if the nudr-diam-gateway service is down. |
Severity | Critical |
Condition | Alert is raised if the nudr-diam-gateway service is down. |
Metric Used | absent(up{container="nudr-diam-gateway",namespace="ocudr"}) or up{container="nudr-diam-gateway",namespace="ocudr"} == 0 |
Recommended Actions | This alert is cleared when the nudr-diam-gateway service is available again. Steps: |
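The NudrDiameterGatewayDown expression combines two checks. `up == 0` catches a target that Prometheus still discovers but cannot scrape, while `absent(...)` catches the case where the target has disappeared entirely (for example, when no nudr-diam-gateway pod is running), in which `up == 0` alone would return nothing and the alert would never fire. A sketch of the rule follows; the `for` period and the group name are assumptions.

```yaml
# Sketch only: for period and group name are assumptions.
groups:
  - name: ocudr-diameter-gateway-sketch
    rules:
      - alert: NudrDiameterGatewayDown
        expr: |
          absent(up{container="nudr-diam-gateway",namespace="ocudr"})
            or
          up{container="nudr-diam-gateway",namespace="ocudr"} == 0
        for: 1m   # ride out brief pod restarts
        labels:
          severity: critical
```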
4.1.2.57 DiameterPeerConnectionsDropped
Table 4-75 DiameterPeerConnectionsDropped
Field | Details |
---|---|
Description | Alert is raised if connections between the Diameter peers and the Diameter Gateway are dropped. |
Summary | Alert is raised if connections between the Diameter peers and the Diameter Gateway are dropped. |
Severity | Major |
Condition | Alert is raised if the number of connections between the Diameter peers and the Diameter Gateway falls below the minimum in the Metric Used expression: fewer than 2 for either origin-host group, or fewer than 5 in total. |
Metric Used | sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) < 2 or sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) < 2 or (sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) + sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0))) < 5 |
Recommended Actions | This alert is cleared when the connections between the Diameter peers and the Diameter Gateway are restored. Steps: |
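The DiameterPeerConnectionsDropped condition checks two peer groups, selected by origin-host patterns (`.*CHI.*` and `.*IND.*`, which look like site-specific examples), against per-group and combined minimums. The sketch below writes that condition as a Prometheus rule. It places `or vector(0)` inside each `sum` so that a group with no series counts as zero connections, and it uses the `namespace` label throughout. The patterns, the minimums of 2 and 5, and the group name are assumptions to adapt to your peer topology.

```yaml
# Sketch only: origin-host patterns, minimums, and group name are assumptions.
groups:
  - name: ocudr-diameter-peer-sketch
    rules:
      - alert: DiameterPeerConnectionsDropped
        # Each sum falls back to 0 when its peer group has no connection series at all.
        expr: |
          sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) < 2
          or
          sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0)) < 2
          or
          (
            sum(ocudr_diam_conn_network{origHost=~".*CHI.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0))
            +
            sum(ocudr_diam_conn_network{origHost=~".*IND.*",container="nudr-diam-gateway",namespace="ocudr"} or vector(0))
          ) < 5
        labels:
          severity: major
```

Since `or` has the lowest precedence in PromQL, each comparison is evaluated before the results are combined, so the alert fires if any one of the three minimums is violated.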