5 BSF Alerts

This section provides information on Oracle Communications Cloud Native Core, Binding Support Function (BSF) alerts and their configuration.

Note:

The performance and capacity of the BSF system may vary based on the call model, Feature/Interface configuration, and underlying CNE and hardware environment.

You can configure alerts in Prometheus and in the Alertrules.yaml file.

The following table describes the various severity types of alerts generated by BSF:

Table 5-1 Alerts Levels or Severity Types

Alerts Levels / Severity Types Definition
Critical Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires immediate response to address the situation and prevent serious consequences. Raised for conditions that can affect the service of BSF.
Major Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that can affect the service of BSF.
Minor Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that can affect the service of BSF.
Info or Warn (Informational) Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of BSF.

5.1 Configuring BSF Alerts

This section describes how to configure alerts for Oracle Communications Cloud Native Core, Binding Support Function. The Alertmanager uses the Prometheus measurement values reported by microservices in the conditions under alert rules to trigger alerts.

Note:

  • The Alertmanager and Prometheus tools must run in the CNE namespace, for example, occne-infra.
  • The alert file is packaged with BSF Custom Templates. Download the BSF Custom Templates.zip file from MOS and unzip it to get the BSF_Alertrules.yaml file. This file must be readily available before you configure alerts in Prometheus.

Configuring Alerts for CNE versions prior to 1.5

To configure BSF alerts in Prometheus:
  1. Run the following command to find the configmap and configure alerts in the Prometheus server:
    kubectl get configmap -n <Namespace>
    Where:

    <Namespace> is the Prometheus server namespace used in the Helm install command.

    For example, assuming the Prometheus server is under the occne-infra namespace, run the following command to find the configmap:
    kubectl get configmaps -n occne-infra | grep prometheus-server
    Output: occne-prometheus-server 4 46d
  2. Run the following command to take a backup of the current Prometheus server configmap:
    kubectl get configmaps <Name> -o yaml -n <Namespace> > /tmp/t_mapConfig.yaml
    where, <Name> is the Prometheus configmap name used in Helm install command.
  3. Check if alertsbsf is present in the t_mapConfig.yaml file by running the following command:
    cat /tmp/t_mapConfig.yaml  | grep alertsbsf
    Depending on the outcome of the previous step, perform any of the following:
    • If alertsbsf is present, delete the alertsbsf entry from the t_mapConfig.yaml file, by running the following command:
      sed -i '/etc\/config\/alertsbsf/d' /tmp/t_mapConfig.yaml
      

      Note:

      Run this command only once.
    • If alertsbsf is not present, add the alertsbsf entry in the t_mapConfig.yaml file by running the following command:
      sed -i '/rule_files:/a\    \- /etc/config/alertsbsf'  /tmp/t_mapConfig.yaml

      Note:

      Run this command only once.
  4. Run the following command to reload the configmap with the modified file:
    kubectl replace configmap <Name> -f /tmp/t_mapConfig.yaml

    Note:

    It is not required for AlertRules.
  5. Add the BSF_Alertrules.yaml file to the Prometheus server configmap by running the following command:
    kubectl patch configmap <Name> -n <Namespace> --type merge --patch "$(cat <PATH>/BSF_Alertrules.yaml)"
    where, <PATH> is the location of the BSF_Alertrules.yaml file.
  6. Restart the prometheus-server pod.
  7. Verify the alerts in the Prometheus GUI.

The following image shows the BSF Alerts:

Screen capture of BSF Alerts
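
The check-then-edit logic of step 3 can also be expressed as one idempotent operation, which removes the need to remember whether the sed command has already been run. The following Python sketch is only an illustration and is not part of the BSF tooling. The function name ensure_rule_file_entry and the sample configmap text are hypothetical, and the sketch assumes that the rule_files: key sits on its own line in the backed-up file.

```python
def ensure_rule_file_entry(config_text: str,
                           entry: str = "/etc/config/alertsbsf") -> str:
    """Return config_text with exactly one rule_files entry for `entry`."""
    result = []
    for line in config_text.splitlines():
        # Same effect as the sed delete command: drop any existing entry line.
        if entry in line:
            continue
        result.append(line)
        # Same effect as the sed append command: add the entry after rule_files:.
        if line.strip() == "rule_files:":
            indent = line[: len(line) - len(line.lstrip())]
            result.append(f"{indent}- {entry}")
    return "\n".join(result) + "\n"


# Hypothetical excerpt of a backed-up Prometheus server configmap.
sample = (
    "data:\n"
    "  prometheus.yml: |\n"
    "    rule_files:\n"
    "    - /etc/config/rules\n"
)

once = ensure_rule_file_entry(sample)
twice = ensure_rule_file_entry(once)
print(once)
print("idempotent:", once == twice)
```

Running the function a second time leaves the text unchanged, which is the property that the "Run this command only once" notes protect when the sed commands are used directly.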

Configuring Alerts for CNE versions from 1.5.0 up to 1.8.x

To configure BSF alerts in Prometheus:
  1. Copy the BSF_Alertrules.yaml file to the Bastion Host. Place this file in the /var/occne/cluster/<cluster-name>/artifacts/alerts directory on the OCCNE Bastion Host.
    $ pwd
    /var/occne/cluster/stark/artifacts/alerts
    $ ls
    occne_alerts.yaml
    $ vi BSF_Alertrules.yaml
    $ ls
    BSF_Alertrules.yaml occne_alerts.yaml
  2. To set the correct file permissions, run the following command:
    $ chmod 644 BSF_Alertrules.yaml
  3. To load the updated rules from the file on the Bastion Host into the existing occne-prometheus-alerts configmap, run the following command:
    $ kubectl create configmap occne-prometheus-alerts --from-file=/var/occne/cluster/<cluster-name>/artifacts/alerts -o yaml --dry-run -n occne-infra | kubectl replace -f -
    $ kubectl get configmap -n occne-infra
  4. To verify the alerts in the Prometheus GUI, select the Alerts tab and view alert details by selecting any individual rule from the list of configured rules.

Configuring Alerts for CNE 1.9.0 and later versions

To configure BSF alerts in Prometheus for CNE 1.9.0, perform the following steps:
  1. Copy the BSF_Alertrules.yaml file to the Bastion Host.
  2. To create or replace the PrometheusRule CRD, run the following command:
    $ kubectl apply -f ocbsf-alerting-rules.yaml -n <namespace>
    To verify if the CRD is created, run the following command:
    kubectl get prometheusrule -n <namespace>
  3. To verify the alerts in the Prometheus GUI, select the Alerts tab and view alert details by selecting any individual rule from the list of configured alerts.
The following screen capture shows the Prometheus dashboard with BSF alerts configured for CNE 1.9.0:

Figure 5-1 BSF alerts on Prometheus Dashboard


Note:

  1. For upgrading to BSF 1.11.0 from a previous supported version on CNE 1.8.x, use the BSF_Alertrules_cne1.5+.yaml file. On the Prometheus dashboard, configure both old and new alert rules.
  2. For installing BSF 1.11.0 on CNE 1.9.0 and later versions, use the BSF_Alertrules_cne1.9+.yaml file. On the Prometheus dashboard, configure only the new alert rules.
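
For orientation, the general shape of a PrometheusRule resource, as applied with kubectl apply on CNE 1.9.0 and later, can be sketched as follows. This is a minimal illustration and not the content of the shipped alert file. The alert name and expression are taken from the SCP_PEER_UNAVAILABLE entry in this chapter, while the resource name, the ocbsf namespace, the group name, and the label and annotation layout are assumptions. The shipped file may carry additional labels, for example labels that the Prometheus Operator uses to select rules.

```python
import json


def build_prometheus_rule(name: str, namespace: str, rules: list) -> dict:
    """Build a PrometheusRule manifest (monitoring.coreos.com/v1) as a dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"groups": [{"name": f"{name}-group", "rules": rules}]},
    }


# Alert name and expression come from this chapter; the rest is assumed.
scp_peer_rule = {
    "alert": "SCP_PEER_UNAVAILABLE",
    "expr": "ocbsf_oc_egressgateway_peer_health_status != 0",
    "labels": {"severity": "major"},
    "annotations": {"summary": "Configured SCP peer is unavailable."},
}

manifest = build_prometheus_rule("ocbsf-alerting-rules", "ocbsf", [scp_peer_rule])

# kubectl accepts JSON as well as YAML, so the printed text can be saved to a file.
print(json.dumps(manifest, indent=2))
```

Comparing this skeleton with the file delivered in BSF Custom Templates is a quick way to confirm that the file being applied is the PrometheusRule variant rather than the configmap variant used on earlier CNE versions.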

5.2 List of Alerts

This section lists the alerts available for Oracle Communications Cloud Native Core, Binding Support Function (BSF).

5.2.1 AAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 5-2 AAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field Details
Description AAA Rx fail count exceeds the critical threshold limit.
Summary AAA Rx fail count exceeds the critical threshold limit.
Severity CRITICAL
Condition sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 90
OID 1.3.6.1.4.1.323.5.3.37.1.2.40
Metric Used ocbsf_diam_response_network_total
Recommended Actions For any additional guidance, contact My Oracle Support.

5.2.2 AAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 5-3 AAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field Details
Description AAA Rx fail count exceeds the major threshold limit.
Summary AAA Rx fail count exceeds the major threshold limit.
Severity MAJOR
Condition sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 <=90 and sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 80
OID 1.3.6.1.4.1.323.5.3.37.1.2.40
Metric Used ocbsf_diam_response_network_total
Recommended Actions For any additional guidance, contact My Oracle Support.

5.2.3 AAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 5-4 AAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field Details
Description AAA Rx fail count exceeds the minor threshold limit.
Summary AAA Rx fail count exceeds the minor threshold limit.
Severity MINOR
Condition sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 <=80 and sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 60
OID 1.3.6.1.4.1.323.5.3.37.1.2.40
Metric Used ocbsf_diam_response_network_total
Recommended Actions For any additional guidance, contact My Oracle Support.
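
The three AAA Rx conditions above split the failure percentage into non-overlapping bands, so at most one of the three alerts is active for a namespace at a time. The following Python sketch illustrates that banding. The function name is hypothetical, and the thresholds are copied from the Condition rows above.

```python
from typing import Optional


def aaa_rx_fail_severity(fail_percent: float) -> Optional[str]:
    """Map an AAA Rx failure percentage to the severity band it falls in."""
    if fail_percent > 90:
        return "CRITICAL"   # > 90
    if fail_percent > 80:
        return "MAJOR"      # > 80 and <= 90
    if fail_percent > 60:
        return "MINOR"      # > 60 and <= 80
    return None             # <= 60: no AAA Rx fail count alert


for value in (95, 90, 85, 80, 61, 60):
    print(value, aaa_rx_fail_severity(value))
```

Note that the boundaries are inclusive on the upper side of each band: exactly 90 percent raises the MAJOR alert, and exactly 80 percent raises the MINOR alert.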

5.2.4 SCP_PEER_UNAVAILABLE

Table 5-5 SCP_PEER_UNAVAILABLE

Field Details
Description Configured SCP peer is unavailable.
Summary Configured SCP peer is unavailable.
Severity Major
Condition ocbsf_oc_egressgateway_peer_health_status != 0. SCP peer [ {{$labels.peer}} ] is unavailable.
OID 1.3.6.1.4.1.323.5.3.37.1.2.38
Metric Used ocbsf_oc_egressgateway_peer_health_status
Recommended Actions

This alert gets cleared when unavailable SCPs become available.

For any additional guidance, contact My Oracle Support.

5.2.5 SCP_PEER_SET_UNAVAILABLE

Table 5-6 SCP_PEER_SET_UNAVAILABLE

Field Details
Description None of the SCP peers is available for the configured peer set.
Summary (ocbsf_oc_egressgateway_peer_count - ocbsf_oc_egressgateway_peer_available_count) !=0 and (ocbsf_oc_egressgateway_peer_count) > 0. {{ $value }} SCP peers under peer set {{$labels.peerset}} are currently available.
Severity Critical
Condition One of the SCPs has been marked unhealthy.
OID 1.3.6.1.4.1.323.5.3.37.1.2.39
Metric Used ocbsf_oc_egressgateway_peer_count and ocbsf_oc_egressgateway_peer_available_count
Recommended Actions

The NF clears the critical alarm when at least one SCP peer in a peerset becomes available such that all other SCP peers in the given peerset are still unavailable.

For any additional guidance, contact My Oracle Support.

5.2.6 STALE_CONFIGURATION

Table 5-7 STALE_CONFIGURATION

Field Details
Description In the last 10 minutes, the current service config_level does not match the config_level from the config-server.
Summary In the last 10 minutes, the current service config_level does not match the config_level from the config-server.
Severity Major
Condition (sum by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name="config-server",topicName="config.level"})) != (sum by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"})) / (count by(namespace) (topic_version{app_kubernetes_io_name!="config-server",topicName="config.level"}))
OID 1.3.6.1.4.1.323.5.3.37.1.2.40
Metric Used topic_version
Recommended Actions

For any additional guidance, contact My Oracle Support.
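
The STALE_CONFIGURATION condition divides a sum by a count on each side, which means it compares the average config.level version reported by the config-server pods with the average reported by all other pods in the namespace. The following Python sketch mirrors that comparison. The function name and the sample version numbers are hypothetical.

```python
from typing import List


def is_config_stale(config_server_levels: List[int],
                    service_levels: List[int]) -> bool:
    """Compare the average config level of config-server with other services."""
    # With no samples on either side the expression returns nothing, so no alert.
    if not config_server_levels or not service_levels:
        return False
    server_avg = sum(config_server_levels) / len(config_server_levels)
    services_avg = sum(service_levels) / len(service_levels)
    return server_avg != services_avg


print(is_config_stale([12], [12, 12, 12]))  # all services are up to date
print(is_config_stale([12], [12, 11, 12]))  # one service lags behind
```

Because averages are compared, the expression as written would not detect the unusual case where one service is behind and another is ahead by the same amount.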

5.2.7 BSF_SERVICES_DOWN

Table 5-8 BSF_SERVICES_DOWN

Field Details
Description {{$labels.microservice}} service is not running!
Summary {{$labels.microservice}} is not running!
Severity Critical
Condition None of the pods of the Binding Support Function (BSF) application is available.
OID 1.3.6.1.4.1.323.5.3.37.1.2.1
Metric Used appinfo_service_running
Recommended Actions Perform the following steps:
  • Check for service specific alerts that may be causing the issues with service exposure.
  • Verify if the pod is in the Running state by using the following command:
    kubectl -n <namespace> get pod
    If the output shows any pod that is not running, copy the pod name and run the following command:
    kubectl describe pod <podname> -n <namespace>
  • Check the application logs on Kibana and look for database related failures such as connectivity, invalid secrets, and so on. The logs can be easily filtered for different services.
  • Check for Helm status to ensure no errors are present by using the following command:
    helm status <release-name> -n <namespace>

    If it is not in STATUS: DEPLOYED, capture the logs and events again.

In case the issue persists, capture the outputs for the preceding steps and contact My Oracle Support.
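
When a namespace contains many pods, the output of the kubectl get pod command can be filtered programmatically to find the pods to describe. The following Python sketch assumes the default column layout of that command (NAME, READY, STATUS, RESTARTS, AGE). The function name and the pod names in the sample output are hypothetical.

```python
from typing import List


def pods_not_running(kubectl_output: str) -> List[str]:
    """Return names of pods whose STATUS column is not 'Running'."""
    lines = kubectl_output.strip().splitlines()
    not_running = []
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # STATUS is the third column in the default output.
        if len(fields) >= 3 and fields[2] != "Running":
            not_running.append(fields[0])
    return not_running


# Hypothetical output of: kubectl -n <namespace> get pod
sample_output = """\
NAME                                   READY   STATUS             RESTARTS   AGE
ocbsf-bsf-management-6d9f7c5b8-abcde   1/1     Running            0          3d
ocbsf-config-server-7f6b9d4c5-fghij    0/1     CrashLoopBackOff   12         3d
"""

print(pods_not_running(sample_output))
```

Each returned name can then be passed to the kubectl describe pod command shown above. Pods from finished jobs report a status such as Completed and would also be listed by this sketch, so review the result before treating every entry as a failure.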

5.2.8 BSFTrafficRateAboveMinorThreshold

Table 5-9 BSFTrafficRateAboveMinorThreshold

Field Details
Description BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }})
Summary Traffic Rate is above 70 Percent of Max requests per second(1000)
Severity Minor
Condition The total Binding Management service Ingress traffic rate has crossed the configured threshold of 700 TPS.

The default value of this alert trigger point in the BSF_Alertrules.yaml file is when the Binding management service Ingress Rate crosses 70% of maximum ingress requests per second.

OID 1.3.6.1.4.1.323.5.3.37.1.2.2
Metric Used ocbsf_ingress_request_total
Recommended Actions The alert gets cleared when the Ingress traffic rate falls below the threshold.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
  1. Refer to the Ingress Gateway section in Grafana to determine whether there is an increase in 4xx and 5xx error response codes.
  2. Check Ingress Gateway logs on Kibana to determine the reason for the errors.

For any assistance, contact My Oracle Support.

5.2.9 BSFTrafficRateAboveMajorThreshold

Table 5-10 BSFTrafficRateAboveMajorThreshold

Field Details
Description BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }})
Summary Traffic Rate is above 80 Percent of Max requests per second(1000)
Severity Major
Condition The total Binding Management service Ingress traffic rate has crossed the configured threshold of 800 TPS.

The default value of this alert trigger point in the BSF_Alertrules.yaml file is when the Binding management service Ingress Rate crosses 80% of maximum ingress requests per second.

OID 1.3.6.1.4.1.323.5.3.37.1.2.2
Metric Used ocbsf_ingress_request_total
Recommended Actions The alert gets cleared when the Ingress traffic rate falls below the threshold.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
  1. Refer to the Ingress Gateway section in Grafana to determine whether there is an increase in 4xx and 5xx error response codes.
  2. Check Ingress Gateway logs on Kibana to determine the reason for the errors.

For any assistance, contact My Oracle Support.

5.2.10 BSFTrafficRateAboveCriticalThreshold

Table 5-11 BSFTrafficRateAboveCriticalThreshold

Field Details
Description BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }})
Summary Traffic Rate is above 90 Percent of Max requests per second(1000)
Severity Critical
Condition The total Binding Management service Ingress traffic rate has crossed the configured threshold of 900 TPS.

The default value of this alert trigger point in the BSF_Alertrules.yaml file is when the Binding management service Ingress Rate crosses 90% of maximum ingress requests per second.

OID 1.3.6.1.4.1.323.5.3.37.1.2.2
Metric Used ocbsf_ingress_request_total
Recommended Actions The alert gets cleared when the Ingress traffic rate falls below the threshold.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic:
  1. Refer to the Ingress Gateway section in Grafana to determine whether there is an increase in 4xx and 5xx error response codes.
  2. Check Ingress Gateway logs on Kibana to determine the reason for the errors.

For any assistance, contact My Oracle Support.
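
The three traffic-rate alerts are defined as percentages (70, 80, and 90) of the maximum ingress rate, which is 1000 requests per second by default. If a deployment is dimensioned for a different maximum, the absolute trigger points to set in the BSF_Alertrules.yaml file can be derived as in the following Python sketch. The function name and the example maximum of 2500 are hypothetical.

```python
from typing import Dict


def traffic_trigger_points(max_mps: int = 1000) -> Dict[str, int]:
    """Return the ingress rate at which each traffic-rate alert triggers."""
    percentages = {"Minor": 70, "Major": 80, "Critical": 90}
    # Integer arithmetic keeps the trigger points exact.
    return {severity: max_mps * pct // 100
            for severity, pct in percentages.items()}


print(traffic_trigger_points())       # default maximum of 1000
print(traffic_trigger_points(2500))   # hypothetical larger deployment
```

With the default maximum, the result matches the 700, 800, and 900 TPS values given in the Condition rows of the three alerts.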

5.2.11 BINDING_QUERY_RESPONSE_ERROR_MINOR

Table 5-12 BINDING_QUERY_RESPONSE_ERROR_MINOR

Field Details
Description At least 30% of the Binding Query connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary At least 30% of the Binding Query connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity Minor
Condition BSF can raise threshold-based alerts for duplicate Binding requests received and handled at BSF. If 30% of the requests fail for 10 minutes, BSF raises a Minor alert indicating that duplicate Binding requests are being detected at BSF.

(sum(rate(ocbsf_bindingQuery_response_total {response_code!~"2.*"} [10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 30

OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Metric Used ocbsf_bindingQuery_response_total
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.12 BINDING_QUERY_RESPONSE_ERROR_MAJOR

Table 5-13 BINDING_QUERY_RESPONSE_ERROR_MAJOR

Field Details
Description At least 50% of the Binding Query connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary At least 50% of the Binding Query connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity Major
Condition BSF can raise threshold-based alerts for duplicate Binding requests received and handled at BSF. If 50% of the requests fail for 10 minutes, BSF raises a Major alert indicating that duplicate Binding requests are being detected at BSF.

(sum(rate(ocbsf_bindingQuery_response_total {response_code!~"2.*"} [10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 50

OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Metric Used ocbsf_bindingQuery_response_total
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.13 BINDING_QUERY_RESPONSE_ERROR_CRITICAL

Table 5-14 BINDING_QUERY_RESPONSE_ERROR_CRITICAL

Field Details
Description At least 70% of the Binding Query connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary At least 70% of the Binding Query connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity Critical
OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Condition BSF can raise threshold-based alerts for duplicate Binding requests received and handled at BSF. If 70% of the requests fail for 10 minutes, BSF raises a Critical alert indicating that duplicate Binding requests are being detected at BSF.

(sum(rate(ocbsf_bindingQuery_response_total {response_code!~"2.*"} [10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 70

Metric Used ocbsf_bindingQuery_response_total
Recommended Actions

For any assistance, contact My Oracle Support.
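
The Binding Query expressions above share one structure: the rate of non-2xx responses is divided by the rate of all responses over 10 minutes. The "or (appinfo_service_running * 0)" term adds zero-valued series, so the numerator still exists when no failed responses were recorded. The following Python sketch illustrates that calculation and the three thresholds. The function names and sample rates are hypothetical.

```python
from typing import List, Optional


def binding_query_error_percent(failed_rates: List[float],
                                total_rates: List[float]) -> Optional[float]:
    """Percentage of Binding Query responses with a non-2xx response code."""
    total = sum(total_rates)
    if total == 0:
        return None  # no traffic: the comparison cannot be satisfied
    # The appended 0.0 plays the role of the "appinfo_service_running * 0" term.
    failed = sum(failed_rates + [0.0])
    return 100 * failed / total


def firing_alerts(percent: Optional[float]) -> List[str]:
    """List the Binding Query alerts whose >= threshold is met."""
    if percent is None:
        return []
    thresholds = [("MINOR", 30), ("MAJOR", 50), ("CRITICAL", 70)]
    return [f"BINDING_QUERY_RESPONSE_ERROR_{severity}"
            for severity, limit in thresholds if percent >= limit]


percent = binding_query_error_percent([6.0], [10.0])
print(percent, firing_alerts(percent))
```

Unlike the AAA Rx conditions, these expressions use only a lower bound, so as written a value that meets the Major or Critical threshold also meets the lower thresholds.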

5.2.14 DIAM_RESPONSE_NETWORK_ERROR_MINOR

Table 5-15 DIAM_RESPONSE_NETWORK_ERROR_MINOR

Field Details
Description At least 20% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary At least 20% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity Minor
Condition BSF can raise threshold-based alerts for message or service request failures. When message failures such as Binding Registration or deregistration request failures, or Diameter request failures with the error "DIAMETER_UNABLE_TO_DELIVER", are observed, BSF raises alerts.

If 20% of the requests fail for 10 minutes, BSF raises a Minor alert indicating the procedure or service that is failing.

OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Metric Used ocbsf_diam_response_network_total
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.15 DIAM_RESPONSE_NETWORK_ERROR_MAJOR

Table 5-16 DIAM_RESPONSE_NETWORK_ERROR_MAJOR

Field Details
Description At least 50% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary At least 50% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity Major
Condition BSF can raise threshold-based alerts for message or service request failures. When message failures such as Binding Registration or deregistration request failures, or Diameter request failures with the error "DIAMETER_UNABLE_TO_DELIVER", are observed, BSF raises alerts.

If 50% of the requests fail for 10 minutes, BSF raises a Major alert indicating the procedure or service that is failing.

OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Metric Used ocbsf_diam_response_network_total
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.16 DIAM_RESPONSE_NETWORK_ERROR_CRITICAL

Table 5-17 DIAM_RESPONSE_NETWORK_ERROR_CRITICAL

Field Details
Description At least 70% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary At least 70% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity Critical
Condition BSF can raise threshold-based alerts for message or service request failures. When message failures such as Binding Registration or deregistration request failures, or Diameter request failures with the error "DIAMETER_UNABLE_TO_DELIVER", are observed, BSF raises alerts.

If 75% of the requests fail for 10 minutes, BSF raises a Critical alert indicating the procedure or service that is failing.

OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Metric Used ocbsf_diam_response_network_total
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.17 DUPLICATE_BINDING_REQUEST_ERROR_MINOR

Table 5-18 DUPLICATE_BINDING_REQUEST_ERROR_MINOR

Field Details
Description At least 30% of the Binding Registration requests failed were duplicate failures.
Summary At least 30% of the Binding Registration requests failed were duplicate failures.
Severity Minor
Condition

If 30% of the requests fail for 10 minutes, BSF raises a Minor alert indicating that duplicate Binding requests are being detected at BSF.

(sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total {operation_type="register"} [10m]))) * 100 >= 30

OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Metric Used ocbsf_ingress_request_total
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.18 DUPLICATE_BINDING_REQUEST_ERROR_MAJOR

Table 5-19 DUPLICATE_BINDING_REQUEST_ERROR_MAJOR

Field Details
Description At least 50% of the Binding Registration requests failed were duplicate failures.
Summary At least 50% of the Binding Registration requests failed were duplicate failures.
Severity Major
Condition

If 50% of the requests fail for 10 minutes, BSF raises a Major alert indicating that duplicate Binding requests are being detected at BSF.

(sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total {operation_type="register"} [10m]))) * 100 >= 50

OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Metric Used ocbsf_ingress_request_total
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.19 DUPLICATE_BINDING_REQUEST_ERROR_CRITICAL

Table 5-20 DUPLICATE_BINDING_REQUEST_ERROR_CRITICAL

Field Details
Description At least 70% of the Binding Registration requests failed were duplicate failures.
Summary At least 70% of the Binding Registration requests failed were duplicate failures.
Severity Critical
Condition

If 70% of the requests fail for 10 minutes, BSF raises a Critical alert indicating that duplicate Binding requests are being detected at BSF.

(sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total {operation_type="register"} [10m]))) * 100 >= 70

OID 1.3.6.1.4.1.323.5.3.37.1.2.24
Metric Used ocbsf_ingress_request_total
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.20 IngressTotalErrorRateAboveMinorThreshold

Table 5-21 IngressTotalErrorRateAboveMinorThreshold

Field Details
Description Transaction Error Rate detected above 1 Percent of Total on BSF service (current value is: {{ $value }})
Summary Transaction Error Rate detected above 1 Percent of Total Transactions
Severity Minor
Condition The total number of failed transactions for BSF service is above 1 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.3
Metric Used ocbsf_ingress_response_total
Recommended Actions The alert gets cleared when the number of failed transactions is below 1% of the total transactions.

For any assistance, contact My Oracle Support.

5.2.21 IngressTotalErrorRateAboveMajorThreshold

Table 5-22 IngressTotalErrorRateAboveMajorThreshold

Field Details
Description Transaction Error Rate detected above 5 Percent of Total on BSF service (current value is: {{ $value }})
Summary Transaction Error Rate detected above 5 Percent of Total Transactions
Severity Major
Condition The total number of failed transactions for BSF service is above 5 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.3
Metric Used ocbsf_ingress_response_total
Recommended Actions The alert gets cleared when the number of failed transactions is below 5% of the total transactions.

For any assistance, contact My Oracle Support.

5.2.22 IngressTotalErrorRateAboveCriticalThreshold

Table 5-23 IngressTotalErrorRateAboveCriticalThreshold

Field Details
Description Transaction Error Rate detected above 10 Percent of Total on BSF service (current value is: {{ $value }})
Summary Transaction Error Rate detected above 10 Percent of Total Transactions
Severity Critical
Condition The total number of failed transactions for BSF service is above 10 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.3
Metric Used ocbsf_ingress_response_total
Recommended Actions The alert gets cleared when the number of failed transactions is below 10% of the total transactions.

For any assistance, contact My Oracle Support.

5.2.23 PCFBindingErrorRateAboveMinorThreshold

Table 5-24 PCFBindingErrorRateAboveMinorThreshold

Field Details
Description PCF Binding Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Minor
Condition The total number of failed transactions for retrieving PCF Bindings is above 1 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.5
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 1% of the total transactions.

To assess the reason for failed transactions, check the service specific metrics for the GET method.

For any assistance, contact My Oracle Support.

5.2.24 PCFBindingErrorRateAboveMajorThreshold

Table 5-25 PCFBindingErrorRateAboveMajorThreshold

Field Details
Description PCF Binding Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Major
Condition The total number of failed transactions for retrieving PCF Bindings is above 5 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.5
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 5% of the total transactions.

To assess the reason for failed transactions, check the service specific metrics for the GET method.

For any assistance, contact My Oracle Support.

5.2.25 PCFBindingErrorRateAboveCriticalThreshold

Table 5-26 PCFBindingErrorRateAboveCriticalThreshold

Field Details
Description PCF Binding Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Critical
Condition The total number of failed transactions for retrieving PCF Bindings is above 10 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.5
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 10% of the total transactions.

To assess the reason for failed transactions, check the service specific metrics for the GET method.

For any assistance, contact My Oracle Support.

5.2.26 IngressCreateErrorRateAboveMinorThreshold

Table 5-27 IngressCreateErrorRateAboveMinorThreshold

Field Details
Description BSF Ingress Create Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Minor
Condition The total number of failed transactions for creating requests (POST method of operation) for BSF service is above 1 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.4
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 1% of the total transactions.

To assess the reason for failed transactions, check the service specific metrics for the POST method.

For any assistance, contact My Oracle Support.

5.2.27 IngressCreateErrorRateAboveCriticalThreshold

Table 5-28 IngressCreateErrorRateAboveCriticalThreshold

Field Details
Description BSF Ingress Create Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Critical
Condition The total number of failed transactions for creating requests (POST method of operation) for BSF service is above 10 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.4
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 10% of the total transactions.

To assess the reason for failed transactions, check the service specific metrics for the POST method.

For any assistance, contact My Oracle Support.

5.2.28 IngressCreateErrorRateAboveMajorThreshold

Table 5-29 IngressCreateErrorRateAboveMajorThreshold

Field Details
Description BSF Ingress Create Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Major
Condition The total number of failed transactions for creating requests (POST method of operation) for BSF service is above 5 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.4
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 5% of the total transactions.

To assess the reason for failed transactions, check the service specific metrics for the POST method.

For any assistance, contact My Oracle Support.

5.2.29 IngressDeleteErrorRateAboveMinorThreshold

Table 5-30 IngressDeleteErrorRateAboveMinorThreshold

Field Details
Description Ingress Delete Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Minor
Condition The total number of failed transactions for delete requests (DELETE method of operation) for BSF service is above 1 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.6
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 1% of the total transactions.

To assess the reason for failed transactions, check the service specific metrics for the DELETE method.

For any assistance, contact My Oracle Support.

5.2.30 IngressDeleteErrorRateAboveMajorThreshold

Table 5-31 IngressDeleteErrorRateAboveMajorThreshold

Field Details
Description Ingress Delete Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Major
Condition The total number of failed transactions for delete requests (DELETE method of operation) for BSF service is above 5 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.6
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 5% of the total transactions.

To assess the reason for failed transactions, check the service-specific metrics for the DELETE method.

For any assistance, contact My Oracle Support.

5.2.31 IngressDeleteErrorRateAboveCriticalThreshold

Table 5-32 IngressDeleteErrorRateAboveCriticalThreshold

Field Details
Description Ingress Delete Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity Critical
Condition The total number of failed transactions for delete requests (DELETE method of operation) for BSF service is above 10 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.37.1.2.6
Metric Used http_server_requests_seconds_count
Recommended Actions The alert gets cleared when the number of failed transactions is below 10% of the total transactions.

To assess the reason for failed transactions, check the service-specific metrics for the DELETE method.

For any assistance, contact My Oracle Support.
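The threshold arithmetic behind the ingress error-rate alerts can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the 1, 5, and 10 percent limits are the values quoted in the tables above, and the actual rules are PromQL expressions in the BSF_Alertrules.yaml file. The sketch reports only the highest threshold exceeded.

```python
# Limits (percent of total transactions) as quoted in the alert tables above.
THRESHOLDS = [(10.0, "Critical"), (5.0, "Major"), (1.0, "Minor")]


def error_rate_percent(failed, total):
    """Return failed transactions as a percentage of all transactions."""
    if total <= 0:
        return 0.0  # no traffic, so there is nothing to evaluate
    return 100.0 * failed / total


def highest_severity(failed, total):
    """Return the most severe threshold exceeded, or None if none is exceeded."""
    rate = error_rate_percent(failed, total)
    for limit, severity in THRESHOLDS:
        if rate > limit:
            return severity
    return None


print(highest_severity(30, 1000))   # 3% of requests failed
print(highest_severity(120, 1000))  # 12% of requests failed
```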

5.2.32 DBTierDownAlert

Table 5-33 DBTierDownAlert

Field Details
Description DB cannot be reachable!
Summary DB cannot be reachable!
Severity Critical
Condition The database is not available.
OID 1.3.6.1.4.1.323.5.3.37.1.2.7
Metric Used appinfo_category_running
Recommended Actions

Check whether the database service is up.

Check the status or age of the MySQL pod by using the following command:
kubectl get pods -n <namespace>

where <namespace> is the namespace used to deploy the MySQL pod.

This alert is cleared automatically when the DB service is up and running.
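When many pods are listed, the output of the command above can be filtered for pods that are not in the Running state. The following is a small sketch under stated assumptions: it expects the default column layout of kubectl get pods (NAME, READY, STATUS, ...), and the pod names in the sample text are invented for illustration.

```python
def pods_not_running(kubectl_output):
    """Return the names of pods whose STATUS column is not 'Running'."""
    lines = kubectl_output.strip().splitlines()
    problems = []
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # Default layout: NAME READY STATUS RESTARTS AGE
        if len(fields) >= 3 and fields[2] != "Running":
            problems.append(fields[0])
    return problems


# Hypothetical sample output; real pod names depend on the deployment.
SAMPLE = """\
NAME            READY   STATUS             RESTARTS   AGE
mysql-0         1/1     Running            0          12d
mysql-1         0/1     CrashLoopBackOff   7          12d
"""

print(pods_not_running(SAMPLE))
```

Pods in states such as Completed are also reported by this sketch, so the result is a starting point for inspection rather than a definitive fault list.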

5.2.33 CPUUsagePerServiceAboveMinorThreshold

Table 5-34 CPUUsagePerServiceAboveMinorThreshold

Field Details
Description CPU usage for {{$labels.microservice}} service is above 60
Summary CPU usage for {{$labels.microservice}} service is above 60
Severity Minor
Condition A service pod has reached the configured minor threshold (60%) of its CPU usage limits.
OID 1.3.6.1.4.1.323.5.3.37.1.2.8
Metric Used cgroup_cpu_usage
Recommended Actions The alert gets cleared when the CPU utilization falls below the minor threshold or crosses the major threshold, in which case the CPUUsagePerServiceAboveMajorThreshold alert is raised.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

For any assistance, contact My Oracle Support.

5.2.34 CPUUsagePerServiceAboveMajorThreshold

Table 5-35 CPUUsagePerServiceAboveMajorThreshold

Field Details
Description CPU usage for {{$labels.microservice}} service is above 80
Summary CPU usage for {{$labels.microservice}} service is above 80
Severity Major
Condition A service pod has reached the configured major threshold (80%) of its CPU usage limits.
OID 1.3.6.1.4.1.323.5.3.37.1.2.9
Metric Used cgroup_cpu_usage
Recommended Actions The alert gets cleared when the CPU utilization falls below the major threshold or crosses the critical threshold, in which case the CPUUsagePerServiceAboveCriticalThreshold alert is raised.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

For any assistance, contact My Oracle Support.

5.2.35 CPUUsagePerServiceAboveCriticalThreshold

Table 5-36 CPUUsagePerServiceAboveCriticalThreshold

Field Details
Description CPU usage for {{$labels.microservice}} service is above 90
Summary CPU usage for {{$labels.microservice}} service is above 90
Severity Critical
Condition A service pod has reached the configured critical threshold (90%) of its CPU usage limits.
OID 1.3.6.1.4.1.323.5.3.37.1.2.10
Metric Used cgroup_cpu_usage
Recommended Actions The alert gets cleared when the CPU utilization falls below the critical threshold.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

For any assistance, contact My Oracle Support.

5.2.36 MemoryUsagePerServiceAboveMinorThreshold

Table 5-37 MemoryUsagePerServiceAboveMinorThreshold

Field Details
Description Memory usage for {{$labels.microservice}} service is above 60
Summary Memory usage for {{$labels.microservice}} service is above 60
Severity Minor
Condition A service pod has reached the configured minor threshold (60%) of its memory usage limits.
OID 1.3.6.1.4.1.323.5.3.37.1.2.11
Metric Used cgroup_memory_usage
Recommended Actions The alert gets cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the MemoryUsagePerServiceAboveMajorThreshold alert is raised.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

For any assistance, contact My Oracle Support.

5.2.37 MemoryUsagePerServiceAboveMajorThreshold

Table 5-38 MemoryUsagePerServiceAboveMajorThreshold

Field Details
Description Memory usage for {{$labels.microservice}} service is above 80
Summary Memory usage for {{$labels.microservice}} service is above 80
Severity Major
Condition A service pod has reached the configured major threshold (80%) of its memory usage limits.
OID 1.3.6.1.4.1.323.5.3.37.1.2.12
Metric Used cgroup_memory_usage
Recommended Actions The alert gets cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the MemoryUsagePerServiceAboveCriticalThreshold alert is raised.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

For any additional guidance, contact My Oracle Support.

5.2.38 MemoryUsagePerServiceAboveCriticalThreshold

Table 5-39 MemoryUsagePerServiceAboveCriticalThreshold

Field Details
Description Memory usage for {{$labels.microservice}} service is above 90
Summary Memory usage for {{$labels.microservice}} service is above 90
Severity Critical
Condition A service pod has reached the configured critical threshold (90%) of its memory usage limits.
OID 1.3.6.1.4.1.323.5.3.37.1.2.13
Metric Used cgroup_memory_usage
Recommended Actions The alert gets cleared when the memory utilization falls below the critical threshold.

Note: Threshold levels can be configured using the BSF_Alertrules.yaml file.

For any assistance, contact My Oracle Support.
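The CPU and memory usage alerts hand over to one another as usage crosses each threshold, so at most one alert per resource is expected at a time. A minimal sketch of that behavior follows; the function name is hypothetical, and the 60, 80, and 90 percent values are the defaults quoted in the tables above, which may differ in a customized BSF_Alertrules.yaml file.

```python
# Default thresholds (percent of the usage limit) from the tables above.
LEVELS = [(90, "Critical"), (80, "Major"), (60, "Minor")]


def active_usage_alert(resource, usage_percent):
    """Return the single alert name expected for a usage value, or None.

    resource is "CPU" or "Memory", matching the alert name prefixes.
    """
    for threshold, level in LEVELS:
        if usage_percent > threshold:
            return f"{resource}UsagePerServiceAbove{level}Threshold"
    return None


print(active_usage_alert("CPU", 72))
print(active_usage_alert("Memory", 95))
```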

5.2.39 NRF_COMMUNICATION_FAILURE

Table 5-40 NRF_COMMUNICATION_FAILURE

Field Details
Description There has been a external failure communication error with NRF.
Summary There has been a external failure communication error with NRF.
Severity Info
Condition BSF raises and clears this alert on failure of external communication, that is, when the producer NRF is unavailable.
  • Raise alert if: ocbsf_nrfclient_nrf_operative_status == 0
  • Clear alert if: ocbsf_nrfclient_nrf_operative_status == 1
OID 1.3.6.1.4.1.323.5.3.37.1.2.18
Metric Used ocbsf_nrfclient_nrf_operative_status
Recommended Actions For any assistance, contact My Oracle Support.

5.2.40 NRF_SERVICE_REQUEST_FAILURE

Table 5-41 NRF_SERVICE_REQUEST_FAILURE

Field Details
Description There has been a Service Request Failure with NRF, either due to Registration failure or Profile update failure.
Summary There has been a Service Request Failure with NRF, either a Registration failure, Heartbeat failure, or Profile Update Failure.
Severity Info
Condition BSF raises and clears this alert for Service Request failures with NRF, such as Registration failure, Heartbeat failure, or Profile Update failure.
  • Raise alert if: ocbsf_nrfclient_nfUpdate_status == 0
  • Clear alert if: ocbsf_nrfclient_nfUpdate_status == 1
OID 1.3.6.1.4.1.323.5.3.37.1.2.19
Metric Used ocbsf_nrfclient_nfUpdate_status
Recommended Actions

For any assistance, contact My Oracle Support.
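Both NRF alerts follow the same raise-on-0, clear-on-1 pattern on a status metric. The following sketch shows how a sequence of sampled status values would translate into raise and clear events. It is an illustration only: the function name is hypothetical, and it ignores evaluation intervals and any hold duration configured in the actual alert rules.

```python
def alert_events(samples):
    """Yield ("raise" or "clear", sample index) when the 0/1 status changes state."""
    alerting = False
    for index, status in enumerate(samples):
        if status == 0 and not alerting:
            alerting = True
            yield ("raise", index)
        elif status == 1 and alerting:
            alerting = False
            yield ("clear", index)


# Status is healthy, fails at sample 2, recovers at sample 4, fails again at 5.
print(list(alert_events([1, 1, 0, 0, 1, 0])))
```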

5.2.41 PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED

Table 5-42 PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED

Field Details
Description The application fails to get the current active overload level threshold data.
Summary The application raises PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED alert when it fails to fetch the current active overload level threshold data and active_overload_threshold_fetch_failed == 1.
Severity Major
Condition active_overload_threshold_fetch_failed == 1
OID 1.3.6.1.4.1.323.5.3.37.1.2.20
Metric Used active_overload_threshold_fetch_failed
Recommended Actions

The alert gets cleared when the application fetches the current active overload level threshold data.

For any additional guidance, contact My Oracle Support.

5.2.42 PodDoc

Table 5-43 PodDoc

Field Details
Description Pod Congestion status of {{$labels.microservice}} service is DoC
Summary Pod Congestion status of {{$labels.microservice}} service is DoC
Severity Major
Condition The pod congestion status is set to Danger of Congestion.
OID 1.3.6.1.4.1.323.5.3.37.1.2.25
Metric Used ocbsf_pod_congestion_state
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.43 PodCongested

Table 5-44 PodCongested

Field Details
Description Pod Congestion status of {{$labels.microservice}} service is congested
Summary Pod Congestion status of {{$labels.microservice}} service is congested
Severity Critical
Condition The pod congestion status is set to congested.
OID 1.3.6.1.4.1.323.5.3.37.1.2.26
Metric Used ocbsf_pod_congestion_state
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.44 PodPendingRequestDoC

Table 5-45 PodPendingRequestDoC

Field Details
Description Pod Resource Congestion status of {{$labels.microservice}} service is DoC for PendingRequest type
Summary Pod Resource Congestion status of {{$labels.microservice}} service is DoC for PendingRequest type
Severity Major
Condition The pod congestion status is set to DoC for pending requests.
OID 1.3.6.1.4.1.323.5.3.37.1.2.27
Metric Used ocbsf_pod_resource_congestion_state{type="queue"}
Recommended Actions The alert gets cleared when the number of pending requests in the queue falls below the configured threshold value.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.45 PodPendingRequestCongested

Table 5-46 PodPendingRequestCongested

Field Details
Description Pod Resource Congestion status of {{$labels.microservice}} service is congested for PendingRequest type
Summary Pod Resource Congestion status of {{$labels.microservice}} service is congested for PendingRequest type
Severity Critical
Condition The pod congestion status is set to congested for PendingRequest.
OID 1.3.6.1.4.1.323.5.3.37.1.2.28
Metric Used ocbsf_pod_resource_congestion_state{type="queue"}
Recommended Actions The alert gets cleared when the number of pending requests in the queue falls below the configured threshold value.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.46 PodCPUDoC

Table 5-47 PodCPUDoC

Field Details
Description Pod Resource Congestion status of {{$labels.microservice}} service is DoC for CPU type
Summary Pod Resource Congestion status of {{$labels.microservice}} service is DoC for CPU type
Severity Major
Condition The pod congestion status is set to DoC for CPU.
OID 1.3.6.1.4.1.323.5.3.37.1.2.29
Metric Used ocbsf_pod_resource_congestion_state{type="cpu"}
Recommended Actions The alert gets cleared when the system CPU usage falls below the configured threshold value.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.47 PodCPUCongested

Table 5-48 PodCPUCongested

Field Details
Description Pod Resource Congestion status of {{$labels.microservice}} service is congested for CPU type
Summary Pod Resource Congestion status of {{$labels.microservice}} service is congested for CPU type
Severity Critical
Condition The pod congestion status is set to congested for CPU.
OID 1.3.6.1.4.1.323.5.3.37.1.2.30
Metric Used ocbsf_pod_resource_congestion_state{type="cpu"}
Recommended Actions The alert gets cleared when the system CPU usage falls below the configured threshold value.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.48 PodMemoryDoC

Table 5-49 PodMemoryDoC

Field Details
Description Pod Resource Congestion status of {{$labels.microservice}} service is DoC for Memory type
Summary Pod Resource Congestion status of {{$labels.microservice}} service is DoC for Memory type
Severity Major
Condition The pod congestion status is set to DoC for memory.
OID 1.3.6.1.4.1.323.5.3.37.1.2.31
Metric Used ocbsf_pod_resource_congestion_state{type="memory"}
Recommended Actions The alert gets cleared when the system memory usage falls below the configured threshold value.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.49 PodMemoryCongested

Table 5-50 PodMemoryCongested

Field Details
Description Pod Resource Congestion status of {{$labels.microservice}} service is congested for Memory type
Summary Pod Resource Congestion status of {{$labels.microservice}} service is congested for Memory type
Severity Critical
Condition The pod congestion status is set to congested for memory.
OID 1.3.6.1.4.1.323.5.3.37.1.2.32
Metric Used ocbsf_pod_resource_congestion_state{type="memory"}
Recommended Actions The alert gets cleared when the system memory usage falls below the configured threshold value.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.50 ServiceOverloaded

Table 5-51 ServiceOverloaded-Minor

Field Details
Description Overload Level of {{$labels.microservice}} service is L1
Summary Overload Level of {{$labels.microservice}} service is L1
Severity Minor
Condition The overload level of the service is L1.
OID 1.3.6.1.4.1.323.5.3.37.1.2.14
Metric Used load_level
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-52 ServiceOverloaded-Major

Field Details
Description Overload Level of {{$labels.microservice}} service is L2
Summary Overload Level of {{$labels.microservice}} service is L2
Severity Major
Condition The overload level of the service is L2.
OID 1.3.6.1.4.1.323.5.3.37.1.2.14
Metric Used load_level
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-53 ServiceOverloaded-Critical

Field Details
Description Overload Level of {{$labels.service}} service is L3
Summary Overload Level of {{$labels.service}} service is L3
Severity Critical
Condition The overload level of the service is L3.
OID 1.3.6.1.4.1.323.5.3.37.1.2.14
Metric Used load_level
Recommended Actions The alert gets cleared when the system is back to normal state.

For any additional guidance, contact My Oracle Support.

5.2.51 ServiceResourceOverloaded

Alerts when service is in overload state due to memory usage

Table 5-54 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L1 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L1 for {{$labels.type}} type
Severity Minor
Condition The overload level of the service is L1 due to memory usage.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="memory"}
Recommended Actions The alert gets cleared when the memory usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-55 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L2 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L2 for {{$labels.type}} type
Severity Major
Condition The overload level of the service is L2 due to memory usage.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="memory"}
Recommended Actions The alert gets cleared when the memory usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-56 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L3 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L3 for {{$labels.type}} type
Severity Critical
Condition The overload level of the service is L3 due to memory usage.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="memory"}
Recommended Actions The alert gets cleared when the memory usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to CPU usage

Table 5-57 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L1 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L1 for {{$labels.type}} type
Severity Minor
Condition The overload level of the service is L1 due to CPU usage.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="cpu"}
Recommended Actions The alert gets cleared when the CPU usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-58 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L2 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L2 for {{$labels.type}} type
Severity Major
Condition The overload level of the service is L2 due to CPU usage.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="cpu"}
Recommended Actions The alert gets cleared when the CPU usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-59 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L3 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L3 for {{$labels.type}} type
Severity Critical
Condition The overload level of the service is L3 due to CPU usage.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="cpu"}
Recommended Actions The alert gets cleared when the CPU usage of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to number of pending messages

Table 5-60 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L1 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L1 for {{$labels.type}} type
Severity Minor
Condition The overload level of the service is L1 due to number of pending messages.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="svc_pending_count"}
Recommended Actions The alert gets cleared when the number of pending messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-61 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L2 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L2 for {{$labels.type}} type
Severity Major
Condition The overload level of the service is L2 due to number of pending messages.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="svc_pending_count"}
Recommended Actions The alert gets cleared when the number of pending messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-62 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L3 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L3 for {{$labels.type}} type
Severity Critical
Condition The overload level of the service is L3 due to number of pending messages.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="svc_pending_count"}
Recommended Actions The alert gets cleared when the number of pending messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Alerts when service is in overload state due to number of failed requests

Table 5-63 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L1 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L1 for {{$labels.type}} type
Severity Minor
Condition The overload level of the service is L1 due to number of failed requests.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="svc_failure_count"}
Recommended Actions The alert gets cleared when the number of failed messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-64 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L2 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L2 for {{$labels.type}} type
Severity Major
Condition The overload level of the service is L2 due to number of failed requests.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="svc_failure_count"}
Recommended Actions The alert gets cleared when the number of failed messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.

Table 5-65 ServiceResourceOverloaded

Field Details
Description {{$labels.microservice}} service is L3 for {{$labels.type}} type
Summary {{$labels.microservice}} service is L3 for {{$labels.type}} type
Severity Critical
Condition The overload level of the service is L3 due to number of failed requests.
OID 1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used service_resource_overload_level{type="svc_failure_count"}
Recommended Actions The alert gets cleared when the number of failed messages of the service is back to normal state.

For any additional guidance, contact My Oracle Support.
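The ServiceResourceOverloaded tables above share one pattern: the overload level sets the severity, and the type label names the resource that caused it. A compact sketch of that mapping follows. The function name and the microservice name in the example are hypothetical; the levels, severities, and type values are those listed in the tables.

```python
SEVERITY_BY_LEVEL = {"L1": "Minor", "L2": "Major", "L3": "Critical"}
RESOURCE_TYPES = {"memory", "cpu", "svc_pending_count", "svc_failure_count"}


def describe_resource_overload(microservice, level, resource_type):
    """Return (severity, description) for a level/type pair, or None if unknown."""
    severity = SEVERITY_BY_LEVEL.get(level)
    if severity is None or resource_type not in RESOURCE_TYPES:
        return None
    # Mirrors the Description template used in the tables above.
    return (severity, f"{microservice} service is {level} for {resource_type} type")


# "bsf-management" is an invented microservice name for illustration.
print(describe_resource_overload("bsf-management", "L2", "cpu"))
```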

5.2.52 SYSTEM_IMPAIRMENT_MAJOR

Table 5-66 SYSTEM_IMPAIRMENT_MAJOR

Field Details
Description Major impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Summary Major impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Severity Major
Condition Major Impairment alert
OID 1.3.6.1.4.1.323.5.3.36.1.2.43
Metric Used db_tier_replication_status
Recommended Actions

For any additional guidance, contact My Oracle Support.

5.2.53 SYSTEM_IMPAIRMENT_CRITICAL

Table 5-67 SYSTEM_IMPAIRMENT_CRITICAL

Field Details
Description Critical Impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Summary Critical Impairment alert raised for REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN or BINLOG_STORAGE usage
Severity Critical
Condition Critical Impairment alert
OID 1.3.6.1.4.1.323.5.3.36.1.2.43
Metric Used db_tier_replication_status
Recommended Actions

For any additional guidance, contact My Oracle Support.

5.2.54 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN

Table 5-68 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN

Field Details
Description System Operational State is now in partial shutdown state.
Summary System Operational State is now in partial shutdown state.
Severity Major
Condition System Operational State is now in partial shutdown state
OID 1.3.6.1.4.1.323.5.3.36.1.2.44
Metric Used system_operational_state == 2
Recommended Actions

For any additional guidance, contact My Oracle Support.

5.2.55 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN

Table 5-69 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN

Field Details
Description System Operational State is now in complete shutdown state
Summary System Operational State is now in complete shutdown state
Severity Critical
Condition System Operational State is now in complete shutdown state
OID 1.3.6.1.4.1.323.5.3.36.1.2.44
Metric Used system_operational_state == 3
Recommended Actions

For any additional guidance, contact My Oracle Support.

5.2.56 DIAM_CONN_PEER_DOWN

Table 5-70 DIAM_CONN_PEER_DOWN

Field Details
Description Diameter connection to peer {{ $labels.peerHost }} is down.
Summary Diameter connection to peer down.
Severity Major
Condition The Diameter connection to the peer identified by peerHost in the given namespace is down.
OID 1.3.6.1.4.1.323.5.3.37.1.2.18
Metric Used ocbsf_diam_conn_network
Recommended Actions For any assistance, contact My Oracle Support.

5.2.57 DIAM_CONN_NETWORK_DOWN

Table 5-71 DIAM_CONN_NETWORK_DOWN

Field Details
Description All diameter network connections are down.
Summary All diameter network connections are down.
Severity Critical
Condition All Diameter network connections in a Kubernetes namespace are down.
OID 1.3.6.1.4.1.323.5.3.37.1.2.19
Metric Used ocbsf_diam_conn_network
Recommended Actions

For any assistance, contact My Oracle Support.

5.2.58 DIAM_RESPONSE_REALM_VALIDATION_ERROR_CRITICAL

Table 5-72 DIAM_RESPONSE_REALM_VALIDATION_ERROR_CRITICAL

Field Details
Description At least 75% of the Diam Response failed with error 'DIAMETER_REALM_NOT_SERVED', either of BSF realm or PCF Realm doesn't match with received destination realm in diameter message.
Summary {{ $value }}% of the Diam Response failed with error 'DIAMETER_REALM_NOT_SERVED'.
Severity Critical
Condition (sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 75
OID 1.3.6.1.4.1.323.5.3.37.1.2.41
Metric Used ocbsf_diam_realm_validation_failed_total
Recommended Actions
  1. Check whether the following keys under Advanced settings on the Diameter settings page are set to true:
    • DIAMETER.Enable.Validate.Realm
    • DIAMETER.BSF.Enable.Validate.Binding.Realm
  2. Check the destination-realm received in the Diameter request.

5.2.59 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MAJOR

Table 5-73 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MAJOR

Field Details
Description At least 50% of the Diam Response failed with error 'DIAMETER_REALM_NOT_SERVED', either of BSF realm or PCF Realm doesn't match with received destination realm in diameter message.
Summary {{ $value }}% of the Diam Response failed with error 'DIAMETER_REALM_NOT_SERVED'.
Severity Major
Condition (sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 50
OID 1.3.6.1.4.1.323.5.3.37.1.2.41
Metric Used ocbsf_diam_realm_validation_failed_total
Recommended Actions
  1. Check whether the following keys under Advanced settings on the Diameter settings page are set to true:
    • DIAMETER.Enable.Validate.Realm
    • DIAMETER.BSF.Enable.Validate.Binding.Realm
  2. Check the destination-realm received in the Diameter request.

5.2.60 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MINOR

Table 5-74 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MINOR

Field Details
Description At least 20% of the Diam Response failed with error 'DIAMETER_REALM_NOT_SERVED', either of BSF realm or PCF Realm doesn't match with received destination realm in diameter message.
Summary {{ $value }}% of the Diam Response failed with error 'DIAMETER_REALM_NOT_SERVED'.
Severity Minor
Condition (sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 20
OID 1.3.6.1.4.1.323.5.3.37.1.2.41
Metric Used ocbsf_diam_realm_validation_failed_total
Recommended Actions
  1. Check whether the following keys under Advanced settings on the Diameter settings page are set to true:
    • DIAMETER.Enable.Validate.Realm
    • DIAMETER.BSF.Enable.Validate.Binding.Realm
  2. Check the destination-realm received in the Diameter request.
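The three realm validation alerts evaluate the same ratio against different limits: responses that failed with result code 3003 divided by all responses, over a 10-minute window. The sketch below reproduces that arithmetic. The function name is hypothetical, the 75, 50, and 20 percent limits come from the Condition expressions above, and the zero-traffic handling is an assumption made for this example.

```python
# Limits (percent) from the Condition expressions above, most severe first.
THRESHOLDS = [(75.0, "Critical"), (50.0, "Major"), (20.0, "Minor")]


def realm_failure_severity(failed_3003, total_responses):
    """Return the most severe limit reached by the failure percentage, or None."""
    if total_responses <= 0:
        return None  # assumption: no responses in the window means no alert
    percent = 100.0 * failed_3003 / total_responses
    for limit, severity in THRESHOLDS:
        if percent >= limit:  # the rules use >=, so the limit itself triggers
            return severity
    return None


print(realm_failure_severity(80, 100))
print(realm_failure_severity(50, 100))
```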

5.2.61 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MINOR

Table 5-75 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MINOR

Field Details
Description

At least 20% of the BSF Notification Request for Audit have responded with a 5xx or 4xx (not 404) Status in the last 24 hours.

Summary At least 20% of the BSF Notification Request for Audit have responded with a 5xx or 4xx (not 404) Status in the last 24 hours.
Severity Minor
Condition

The alert is triggered when 20% or more of the BSF Notification Requests for Audit to PCF (or its respective NF) fail.

The default threshold value is defined in the BSF_Alertrules.yaml file.

Expression (sum(increase(ocbsf_query_response_count_total{response_code=~"5..|4..",response_code!="404"}[24h])) / sum(increase(ocbsf_query_response_count_total[24h]))) * 100 >= 20
OID 1.3.6.1.4.1.323.5.3.37.1.2.42
Metric Used ocbsf_query_response_count_total
Recommended Actions Determine why these notification requests are failing. This alert indicates a potential issue with either the network communication or the NF to which the audit notifications point.

5.2.62 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MAJOR

Table 5-76 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MAJOR

Field Details
Description

At least 40% of the BSF Notification Request for Audit have responded with a 5xx or 4xx (not 404) Status in the last 24 hours.

Summary At least 40% of the BSF Notification Request for Audit have responded with a 5xx or 4xx (not 404) Status in the last 24 hours.
Severity Major
Condition

The alert is triggered when 40% or more of the BSF Notification Requests for Audit to PCF (or its respective NF) fail.

The default threshold value is defined in the BSF_Alertrules.yaml file.

Expression (sum(increase(ocbsf_query_response_count_total{response_code=~"5..|4..",response_code!="404"}[24h])) / sum(increase(ocbsf_query_response_count_total[24h]))) * 100 >= 40
OID 1.3.6.1.4.1.323.5.3.37.1.2.42
Metric Used ocbsf_query_response_count_total
Recommended Actions Determine why these notification requests are failing. This alert indicates an issue, with either the network communication or the NF to which the audit notifications point, that needs to be addressed as soon as possible.

5.2.63 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_CRITICAL

Table 5-77 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_CRITICAL

Field Details
Description

At least 60% of the BSF Notification Request for Audit have responded with a 5xx or 4xx (not 404) Status in the last 24 hours.

Summary At least 60% of the BSF Notification Request for Audit have responded with a 5xx or 4xx (not 404) Status in the last 24 hours.
Severity Critical
Condition

The alert is triggered when 60% or more of the BSF Notification Requests for Audit to PCF (or its respective NF) fail.

The default threshold value is defined in the BSF_Alertrules.yaml file.

Expression (sum(increase(ocbsf_query_response_count_total{response_code=~"5..|4..",response_code!="404"}[24h])) / sum(increase(ocbsf_query_response_count_total[24h]))) * 100 >= 60
OID 1.3.6.1.4.1.323.5.3.37.1.2.42
Metric Used ocbsf_query_response_count_total
Recommended Actions Determine why these notification requests are failing. This alert indicates a critical issue, with either the network communication or the NF to which the audit notifications point, that needs to be addressed immediately.
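The audit alert expressions count a response as failed when its response_code label matches the pattern 5..|4.. and is not 404. The sketch below applies the same selection to a set of per-code counts. The function names and the sample counts are invented for illustration, and the label values are assumed to be three-digit strings. Prometheus regex matchers are fully anchored, which is why the sketch uses a full match.

```python
import re

# Same pattern as the response_code=~"5..|4.." matcher in the expressions above.
ERROR_CODE = re.compile(r"5..|4..")


def is_counted_failure(response_code):
    """True for 4xx and 5xx codes, except 404, which the rule excludes."""
    return ERROR_CODE.fullmatch(response_code) is not None and response_code != "404"


def audit_failure_percent(counts):
    """Return counted failures as a percentage of all responses."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    failed = sum(n for code, n in counts.items() if is_counted_failure(code))
    return 100.0 * failed / total


# Hypothetical 24-hour totals per response code.
counts = {"204": 70, "404": 10, "500": 15, "400": 5}
print(audit_failure_percent(counts))
```

With these sample counts, the 404 responses add to the total but not to the failures, so the result sits exactly on the 20% limit of the minor alert.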

5.2.64 CERTIFICATE_EXPIRY

Table 5-78 CERTIFICATE_EXPIRY

Field Details
Description TLS certificate to expire in 6 months.
Summary security_cert_x509_expiration_seconds - time() <= 15724800
Severity Minor
Condition This alert is raised when the TLS certificate is about to expire in six months.
OID 1.3.6.1.4.1.323.5.3.37.1.2.44
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

Table 5-79 CERTIFICATE_EXPIRY

Field Details
Description TLS certificate to expire in 3 months.
Summary security_cert_x509_expiration_seconds - time() <= 7862400
Severity Major
Condition This alert is raised when the TLS certificate expires within three months (7862400 seconds, or 91 days).
OID 1.3.6.1.4.1.323.5.3.37.1.2.44
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

Table 5-80 CERTIFICATE_EXPIRY

Field Details
Description TLS certificate to expire in 1 month.
Summary security_cert_x509_expiration_seconds - time() <= 2592000
Severity Critical
Condition This alert is raised when the TLS certificate expires within one month (2592000 seconds, or 30 days).
OID 1.3.6.1.4.1.323.5.3.37.1.2.44
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).
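The three CERTIFICATE_EXPIRY thresholds are expressed in seconds. The following Python sketch converts them to days and shows how a remaining lifetime maps to a severity. The function name and the choice to return only the most urgent severity are assumptions of this sketch. In Prometheus, every rule whose condition holds can fire independently.

```python
SECONDS_PER_DAY = 86_400

# Thresholds (seconds until expiry) taken from the three tables above,
# ordered from most to least urgent.
THRESHOLDS = [
    ("Critical", 2_592_000),
    ("Major", 7_862_400),
    ("Minor", 15_724_800),
]


def certificate_expiry_severity(expiration_epoch, now_epoch):
    """Return the most urgent severity whose threshold is met, or None.

    expiration_epoch stands in for security_cert_x509_expiration_seconds
    and now_epoch for time(); both are Unix timestamps in seconds.
    """
    remaining = expiration_epoch - now_epoch
    for severity, limit in THRESHOLDS:
        if remaining <= limit:
            return severity
    return None


for severity, limit in THRESHOLDS:
    print(severity, limit // SECONDS_PER_DAY, "days")
```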

5.2.65 BSF_CONNECTION_FAILURE

Table 5-81 BSF_CONNECTION_FAILURE

Field Details
Description Connection failure on Egress and Ingress Gateways for incoming and outgoing connections.
Summary sum(increase(ocbsf_oc_ingressgateway_connection_failure_total[5m]) >0 or (ocbsf_oc_ingressgateway_connection_failure_total unless ocbsf_oc_ingressgateway_connection_failure_total offset 5m )) by (namespace,app, error_reason) > 0 or sum(increase(ocbsf_oc_egressgateway_connection_failure_total[5m]) >0 or (ocbsf_oc_egressgateway_connection_failure_total unless ocbsf_oc_egressgateway_connection_failure_total offset 5m )) by (namespace,app, error_reason) > 0
Severity Major
Condition This alert is raised when a connection failure is recorded on the Ingress Gateway or the Egress Gateway in the last 5 minutes.
OID 1.3.6.1.4.1.323.5.3.37.1.2.43
Metric Used ocbsf_oc_ingressgateway_connection_failure_total, ocbsf_oc_egressgateway_connection_failure_total
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).
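The expression in the Summary field combines two checks per gateway: a counter that grew in the last 5 minutes, or a counter series that exists now but did not exist 5 minutes ago. The following Python sketch approximates that logic for a single series. The function name and its inputs are hypothetical, and the sketch ignores counter resets and the extrapolation that the Prometheus increase() function performs.

```python
def connection_failure_fires(current, five_minutes_ago):
    """Approximate the per-series check in the alert expression.

    current and five_minutes_ago are counter values, or None when the
    series has no sample at that time (hypothetical inputs).
    """
    if current is None:
        return False  # no series, nothing to alert on
    if five_minutes_ago is None:
        # Mirrors `metric unless metric offset 5m`: the series is new,
        # so its raw value is used in place of an increase.
        return current > 0
    # Mirrors `increase(metric[5m]) > 0` for a counter without resets.
    return current - five_minutes_ago > 0


print(connection_failure_fires(3, None))
print(connection_failure_fires(3, 3))
print(connection_failure_fires(5, 3))
```

The second branch matters because a failure counter often appears for the first time with a nonzero value, in which case there is no earlier sample to compute an increase from.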

5.2.66 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Table 5-82 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Field Details
Description This alarm is raised when Oracle Communications Network Analytics Data Director (OCNADD) is not reachable.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Ingress Gateway Data Director unreachable'
Severity Major
Condition This alarm is raised when data director is not reachable from Ingress Gateway.
OID 1.3.6.1.4.1.323.5.3.37.1.2.47
Metric Used oc_ingressgateway_dd_unreachable
Recommended Actions The alert clears automatically when the connection with the Data Director is established.

5.2.67 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Table 5-83 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Field Details
Description This alarm is raised when OCNADD is not reachable.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Egress Gateway Data Director unreachable'
Severity Major
Condition This alarm is raised when data director is not reachable from Egress Gateway.
OID 1.3.6.1.4.1.323.5.3.37.1.2.48
Metric Used oc_egressgateway_dd_unreachable
Recommended Actions The alert clears automatically when the connection with the Data Director is established.

5.2.68 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Table 5-84 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Field Details
Description Certificate expiry in less than 6 months.
Summary Certificate expiry in less than 6 months.
Severity Minor
Condition dgw_tls_cert_expiration_seconds - time() <= 15724800
OID 1.3.6.1.4.1.323.5.3.37.1.2.75
Metric Used dgw_tls_cert_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.69 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Table 5-85 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Field Details
Description Certificate expiry in less than 3 months.
Summary Certificate expiry in less than 3 months.
Severity Major
Condition dgw_tls_cert_expiration_seconds - time() <= 7862400
OID 1.3.6.1.4.1.323.5.3.37.1.2.75
Metric Used dgw_tls_cert_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.70 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Table 5-86 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Field Details
Description Certificate expiry in less than 1 month.
Summary Certificate expiry in less than 1 month.
Severity Critical
Condition dgw_tls_cert_expiration_seconds - time() <= 2592000
OID 1.3.6.1.4.1.323.5.3.37.1.2.75
Metric Used dgw_tls_cert_expiration_seconds
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.71 DGW_TLS_CONNECTION_FAILURE

Table 5-87 DGW_TLS_CONNECTION_FAILURE

Field Details
Description Alert for TLS connection establishment failure.
Summary TLS Connection failure when Diam gateway is an initiator.
Severity Major
Condition sum by (namespace,reason)(ocbsf_diam_failed_conn_network) > 0
OID 1.3.6.1.4.1.323.5.3.37.1.2.81
Metric Used ocbsf_diam_failed_conn_network
Recommended Actions

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.72 BINDING_REVALIDATION_PCF_BINDING_MISSING_MINOR

Table 5-88 BINDING_REVALIDATION_PCF_BINDING_MISSING_MINOR

Field Details
Description At least 30% but less than 50% of the PCF bindings are missing among all binding revalidation records in the last 5 minutes.
Summary At least 30% but less than 50% of the PCF BINDING missing among all Binding Revalidation records in the last 5 minutes.
Severity Minor
Condition (sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 30 < 50
OID 1.3.6.1.4.1.323.5.3.37.1.2.51
Metric Used ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total
Recommended Actions

Check BSF Management service health history. Increase binding audit frequency.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.73 BINDING_REVALIDATION_PCF_BINDING_MISSING_MAJOR

Table 5-89 BINDING_REVALIDATION_PCF_BINDING_MISSING_MAJOR

Field Details
Description At least 50% but less than 70% of the PCF bindings are missing among all binding revalidation records in the last 5 minutes.
Summary At least 50% but less than 70% of the PCF BINDING missing among all binding revalidation records in the last 5 minutes.
Severity Major
Condition (sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 50 < 70
OID 1.3.6.1.4.1.323.5.3.37.1.2.51
Metric Used ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total
Recommended Actions

Check BSF Management service health history. Increase binding audit frequency.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.2.74 BINDING_REVALIDATION_PCF_BINDING_MISSING_CRITICAL

Table 5-90 BINDING_REVALIDATION_PCF_BINDING_MISSING_CRITICAL

Field Details
Description At least 70% of the PCF bindings are missing among all binding revalidation records in the last 5 minutes.
Summary At least 70% of the PCF BINDING missing among all binding revalidation records in the last 5 minutes.
Severity Critical
Condition (sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 70
OID 1.3.6.1.4.1.323.5.3.37.1.2.51
Metric Used ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total
Recommended Actions

Check BSF Management service health history. Increase binding audit frequency.

For any additional guidance, contact My Oracle Support (https://support.oracle.com).
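The three BINDING_REVALIDATION_PCF_BINDING_MISSING alerts share one ratio and differ only in the percentage band. The following Python sketch shows how the bands partition the result. The function name and the rate inputs are hypothetical, and returning None when there are no revalidation responses is an assumption of this sketch.

```python
def binding_missing_severity(missing_rate, response_rate):
    """Map the missing-binding ratio to a severity band, or None.

    missing_rate stands in for the 5-minute rate of
    ocbsf_binding_revalidation_pcfBinding_missing_total and
    response_rate for ocbsf_binding_revalidation_response_total.
    """
    if response_rate <= 0:
        return None  # no revalidation responses, so no ratio to evaluate
    percent = missing_rate * 100 / response_rate
    if percent >= 70:
        return "Critical"
    if percent >= 50:
        return "Major"
    if percent >= 30:
        return "Minor"
    return None


for missing in (2, 3, 5, 7):
    print(missing, binding_missing_severity(missing, 10))
```

Because the bands do not overlap, at most one of the three alerts is active for a given namespace at any time.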