5 BSF Alerts
This section provides information on Oracle Communications Cloud Native Core, Binding Support Function (BSF) alerts and their configuration.
Note:
The performance and capacity of the BSF system may vary based on the call model, Feature/Interface configuration, and underlying CNE and hardware environment.
You can configure alerts in Prometheus and in the Alertrules.yaml file.
The following table describes the various severity types of alerts generated by BSF:
Table 5-1 Alerts Levels or Severity Types
| Alerts Levels / Severity Types | Definition |
|---|---|
| Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that can affect the service of BSF. |
| Major | Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that can affect the service of BSF. |
| Minor | Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that can affect the service of BSF. |
| Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of BSF. |
For more details on how to configure alerts, see the Configuring BSF Alerts section in Oracle Communications Cloud Native Core, Binding Support Function Installation, Upgrade, and Fault Recovery Guide.
5.1 List of Alerts
This section lists the alerts available for Oracle Communications Cloud Native Core, Binding Support Function (BSF).
5.1.1 AAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
Table 5-2 AAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | AAA Rx fail count exceeds the critical threshold limit. |
| Summary | AAA Rx fail count exceeds the critical threshold limit. |
| Severity | CRITICAL |
| Expression | sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 90 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.40 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
5.1.2 AAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
Table 5-3 AAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | AAA Rx fail count exceeds the major threshold limit. |
| Summary | AAA Rx fail count exceeds the major threshold limit. |
| Severity | MAJOR |
| Expression | sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 <=90 and sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 80 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.40 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
5.1.3 AAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
Table 5-4 AAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | AAA Rx fail count exceeds the minor threshold limit. |
| Summary | AAA Rx fail count exceeds the minor threshold limit. |
| Severity | MINOR |
| Expression | sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 <=80 and sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 60 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.40 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
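
The three AAA_RX_FAIL_COUNT alerts use non-overlapping bands of the same failure percentage: MINOR for values above 60 and up to 80, MAJOR above 80 and up to 90, and CRITICAL above 90. The following Python sketch reproduces that banding from a count of failed (non-2xx) AAA answers and a count of all AAA answers in the same window. The function names are hypothetical, and the code only illustrates the default thresholds; it is not part of BSF.

```python
from typing import Optional


def aaa_rx_failure_percent(failed: float, total: float) -> Optional[float]:
    """Percentage of non-2xx AAA answers, or None when there was no AAA traffic."""
    if total == 0:
        # PromQL evaluates 0/0 to NaN, and NaN never crosses a threshold.
        return None
    return failed / total * 100


def aaa_rx_severity(percent: Optional[float]) -> Optional[str]:
    """Return the AAA_RX_FAIL_COUNT band a failure percentage falls into.

    Bands mirror the default expressions: > 90 is CRITICAL, (80, 90] is MAJOR,
    (60, 80] is MINOR. At most one band applies to any value.
    """
    if percent is None:
        return None
    if percent > 90:
        return "CRITICAL"
    if percent > 80:
        return "MAJOR"
    if percent > 60:
        return "MINOR"
    return None
```

For example, 85 failed answers out of 100 fall in the MAJOR band, and a value of exactly 90 percent is still MAJOR because the critical expression uses a strict `> 90`.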
5.1.4 SCP_PEER_UNAVAILABLE
Table 5-5 SCP_PEER_UNAVAILABLE
| Field | Details |
|---|---|
| Description | Configured SCP peer is unavailable. |
| Summary | SCP peer [ {{$labels.peer}} ] is unavailable. |
| Severity | Major |
| Expression | ocbsf_oc_egressgateway_peer_health_status == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.38 |
| Metric Used | ocbsf_oc_egressgateway_peer_health_status |
| Recommended Actions | This alert gets cleared when unavailable SCPs become available. For any additional guidance, contact My Oracle Support. |
5.1.5 SCP_PEER_SET_UNAVAILABLE
Table 5-6 SCP_PEER_SET_UNAVAILABLE
| Field | Details |
|---|---|
| Description | None of the SCP peers are available for the configured peer set. |
| Summary | {{ $value }} SCP peers under peer set {{$labels.peerset}} are currently unavailable. |
| Severity | Critical |
| Expression | (ocbsf_oc_egressgateway_peer_count > 0 and (ocbsf_oc_egressgateway_peer_available_count) == 0) |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.39 |
| Metric Used | ocbsf_oc_egressgateway_peer_count and ocbsf_oc_egressgateway_peer_available_count |
| Recommended Actions | NF clears the critical alarm as soon as at least one SCP peer in the peer set becomes available, even if all other SCP peers in that peer set are still unavailable. For any additional guidance, contact My Oracle Support. |
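
The SCP_PEER_SET_UNAVAILABLE condition is true only while a peer set has at least one configured peer and no available peers, which is why a single recovered peer clears the alarm. A minimal Python sketch of that condition follows, assuming a hypothetical mapping from peer name to availability; it is an illustration, not the Egress Gateway implementation.

```python
from typing import Dict


def peer_set_unavailable(peer_health: Dict[str, bool]) -> bool:
    """Evaluate the SCP_PEER_SET_UNAVAILABLE condition for one peer set.

    peer_health maps a (hypothetical) peer name to True when the peer is
    available. Mirrors:
      ocbsf_oc_egressgateway_peer_count > 0
      and ocbsf_oc_egressgateway_peer_available_count == 0
    """
    peer_count = len(peer_health)
    available_count = sum(1 for available in peer_health.values() if available)
    return peer_count > 0 and available_count == 0
```

With every peer down the function returns True; marking any one peer available makes it return False, matching the clearing behavior described above. An empty peer set never raises the alert because of the `peer_count > 0` guard.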
5.1.6 BSF_SERVICES_DOWN
Table 5-7 BSF_SERVICES_DOWN
| Field | Details |
|---|---|
| Description | {{$labels.service}} service is not running! |
| Summary | {{$labels.service}} is not running! |
| Severity | Critical |
| Expression | appinfo_service_running{application="ocbsf"} != 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.1 |
| Metric Used | appinfo_service_running |
| Recommended Actions | Perform the following steps: In case the issue persists, capture the outputs for the preceding steps and contact My Oracle Support. |
5.1.7 BSF_TRAFFIC_RATE_ABOVE_MINOR_THRESHOLD
Table 5-8 BSF_TRAFFIC_RATE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }}). The total Binding Management service Ingress traffic rate has crossed the configured threshold of 700 TPS. The default trigger point for this alert is 70 percent of the maximum rate of 1000 requests per second. |
| Summary | Traffic Rate is above 70 Percent of Max requests per second(1000) |
| Severity | Minor |
| Expression | sum(rate(ocbsf_ingress_request_total[2m])) >= 700 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.2 |
| Metric Used | ocbsf_ingress_request_total |
| Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels are configurable. It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic: For any assistance, contact My Oracle Support. |
5.1.8 BSF_TRAFFIC_RATE_ABOVE_MAJOR_THRESHOLD
Table 5-9 BSF_TRAFFIC_RATE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }}) |
| Summary | Traffic Rate is above 80 Percent of Max requests per second(1000) |
| Severity | Major |
| Expression | sum(rate(ocbsf_ingress_request_total[2m])) >= 800 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.2 |
| Metric Used | ocbsf_ingress_request_total |
| Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels are configurable. It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic: For any assistance, contact My Oracle Support. |
5.1.9 BSF_TRAFFIC_RATE_ABOVE_CRITICAL_THRESHOLD
Table 5-10 BSF_TRAFFIC_RATE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }}) |
| Summary | Traffic Rate is above 90 Percent of Max requests per second(1000) |
| Severity | Critical |
| Expression | sum(rate(ocbsf_ingress_request_total[2m])) >= 900 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.2 |
| Metric Used | ocbsf_ingress_request_total |
| Recommended Actions | The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels are configurable. It is recommended to assess the reason for additional traffic. Perform the following steps to analyze the cause of increased traffic: For any assistance, contact My Oracle Support. |
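
As written, the three traffic-rate expressions overlap: a rate that satisfies `>= 900` also satisfies `>= 800` and `>= 700`, so all three rules evaluate to true at the same time. The following Python sketch lists which rules hold for a given rate. The `approx_rate` helper is a simplified stand-in for PromQL `rate()` and is an assumption of this sketch; whether lower-severity notifications are suppressed (for example, through Alertmanager inhibition rules) depends on the deployment and is not described in this section.

```python
from typing import List, Tuple

# Default thresholds from the three BSF_TRAFFIC_RATE expressions, in requests
# per second (70, 80, and 90 percent of a 1000 requests-per-second maximum).
TRAFFIC_RATE_RULES: List[Tuple[str, float]] = [
    ("BSF_TRAFFIC_RATE_ABOVE_MINOR_THRESHOLD", 700),
    ("BSF_TRAFFIC_RATE_ABOVE_MAJOR_THRESHOLD", 800),
    ("BSF_TRAFFIC_RATE_ABOVE_CRITICAL_THRESHOLD", 900),
]


def approx_rate(count_start: float, count_end: float, window_seconds: float) -> float:
    """Per-second increase of a counter over a window.

    A simplification of PromQL rate(): it ignores counter resets and the
    extrapolation Prometheus applies at the edges of the window.
    """
    return (count_end - count_start) / window_seconds


def firing_traffic_alerts(ingress_rate: float) -> List[str]:
    """Names of every traffic-rate rule whose expression (rate >= threshold) holds."""
    return [name for name, threshold in TRAFFIC_RATE_RULES if ingress_rate >= threshold]
```

For example, a counter that grows by 102,000 requests over the 2-minute window gives 850 requests per second, which satisfies the MINOR and MAJOR expressions but not the CRITICAL one.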
5.1.10 BINDING_QUERY_RESPONSE_ERROR_MINOR
Table 5-11 BINDING_QUERY_RESPONSE_ERROR_MINOR
| Field | Details |
|---|---|
| Description | At least 30% of the Binding Query connection requests failed. |
| Summary | At least 30% of the Binding Query requests failed. |
| Severity | Minor |
| Expression | (sum(rate(ocbsf_bindingQuery_response_total{response_code!~"2.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.36 |
| Metric Used | ocbsf_bindingQuery_response_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.11 BINDING_QUERY_RESPONSE_ERROR_MAJOR
Table 5-12 BINDING_QUERY_RESPONSE_ERROR_MAJOR
| Field | Details |
|---|---|
| Description | At least 50% of the Binding Query connection requests failed. |
| Summary | At least 50% of the Binding Query requests failed. |
| Severity | Major |
| Expression | (sum(rate(ocbsf_bindingQuery_response_total{response_code!~"2.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 50 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.36 |
| Metric Used | ocbsf_bindingQuery_response_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.12 BINDING_QUERY_RESPONSE_ERROR_CRITICAL
Table 5-13 BINDING_QUERY_RESPONSE_ERROR_CRITICAL
| Field | Details |
|---|---|
| Description | At least 70% of the Binding Query connection requests failed. |
| Summary | At least 70% of the Binding Query requests failed. |
| Severity | Critical |
| Expression | (sum(rate(ocbsf_bindingQuery_response_total{response_code!~"2.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.36 |
| Metric Used | ocbsf_bindingQuery_response_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
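
The BINDING_QUERY_RESPONSE_ERROR expressions (like the DIAM_RESPONSE_NETWORK_ERROR and DUPLICATE_BINDING_REQUEST_ERROR expressions) add `or (appinfo_service_running * 0)` to the numerator. In PromQL, `sum()` over an empty vector returns no result, so without this term the ratio would produce no value whenever no non-2xx responses exist; with it, the numerator evaluates to 0 as long as `appinfo_service_running` is being scraped. The Python sketch below mirrors that behavior; `failure_percent` and its arguments are hypothetical.

```python
from typing import Dict, Optional


def failure_percent(error_rates: Dict[str, float], total_rate: float) -> Optional[float]:
    """Approximate (sum(errors or zero_filler) / sum(total)) * 100.

    error_rates maps an illustrative series key (for example, a response code)
    to its per-second rate and is empty when no non-2xx responses occurred.
    In PromQL, sum() over an empty vector returns no result at all; the
    `or (appinfo_service_running * 0)` term adds a zero-valued series so the
    numerator becomes 0 instead. Python's sum() already returns 0 for an
    empty input, which gives the same effect here.
    """
    if total_rate == 0:
        return None  # no traffic in the window: the ratio is undefined
    numerator = sum(error_rates.values())
    return numerator / total_rate * 100
```

A result of 50.0, for instance, satisfies both the MINOR (`>= 30`) and MAJOR (`>= 50`) expressions, because these bands overlap as written.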
5.1.13 DIAM_RESPONSE_NETWORK_ERROR_MINOR
Table 5-14 DIAM_RESPONSE_NETWORK_ERROR_MINOR
| Field | Details |
|---|---|
| Description | At least 20% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'. |
| Summary | At least 20% of the Diam Response requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'. |
| Severity | Minor |
| Expression | (sum(rate(ocbsf_diam_response_network_total{responseCode="3002"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_diam_response_network_total[10m]))) * 100 >= 20 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.35 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.14 DIAM_RESPONSE_NETWORK_ERROR_MAJOR
Table 5-15 DIAM_RESPONSE_NETWORK_ERROR_MAJOR
| Field | Details |
|---|---|
| Description | At least 50% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'. |
| Summary | At least 50% of the Diam Response requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'. |
| Severity | Major |
| Expression | (sum(rate(ocbsf_diam_response_network_total{responseCode="3002"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_diam_response_network_total[10m]))) * 100 >= 50 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.35 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.15 DIAM_RESPONSE_NETWORK_ERROR_CRITICAL
Table 5-16 DIAM_RESPONSE_NETWORK_ERROR_CRITICAL
| Field | Details |
|---|---|
| Description | At least 75% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'. |
| Summary | At least 75% of the Diam Response requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'. |
| Severity | Critical |
| Expression | (sum(rate(ocbsf_diam_response_network_total{responseCode="3002"}[10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_diam_response_network_total[10m]))) * 100 >= 75 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.35 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.16 DUPLICATE_BINDING_REQUEST_ERROR_MINOR
Table 5-17 DUPLICATE_BINDING_REQUEST_ERROR_MINOR
| Field | Details |
|---|---|
| Description | At least 30% of the Binding Registration requests failed because they were duplicates. |
| Summary | At least 30% of the Binding Registration requests failed were duplicate failures. |
| Severity | Minor |
| Expression | (sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total{operation_type="register"}[10m]))) * 100 >= 30 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.37 |
| Metric Used | ocbsf_ingress_request_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.17 DUPLICATE_BINDING_REQUEST_ERROR_MAJOR
Table 5-18 DUPLICATE_BINDING_REQUEST_ERROR_MAJOR
| Field | Details |
|---|---|
| Description | At least 50% of the Binding Registration requests failed because they were duplicates. |
| Summary | At least 50% of the Binding Registration requests failed were duplicate failures. |
| Severity | Major |
| Expression | (sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total{operation_type="register"}[10m]))) * 100 >= 50 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.37 |
| Metric Used | ocbsf_ingress_request_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.18 DUPLICATE_BINDING_REQUEST_ERROR_CRITICAL
Table 5-19 DUPLICATE_BINDING_REQUEST_ERROR_CRITICAL
| Field | Details |
|---|---|
| Description | At least 70% of the Binding Registration requests failed because they were duplicates. |
| Summary | At least 70% of the Binding Registration requests failed were duplicate failures. |
| Severity | Critical |
| Expression | (sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total{operation_type="register"}[10m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.37 |
| Metric Used | ocbsf_ingress_request_total |
| Recommended Actions | For any assistance, contact My Oracle Support. |
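
The DUPLICATE_BINDING_REQUEST_ERROR expressions select their numerator by metric name with `{__name__=~"ocbsf_collision_detection.*"}`, so every metric whose name starts with that prefix contributes, and they divide by the rate of `ocbsf_ingress_request_total{operation_type="register"}`. Prometheus anchors regex matchers at both ends, which Python's `re.fullmatch` reproduces. In this sketch, the input dictionary and the example metric names used with it are hypothetical; the actual names of the collision-detection metrics are not listed in this section.

```python
import re
from typing import Dict, Optional

# Same pattern as the {__name__=~"ocbsf_collision_detection.*"} selector.
# Prometheus anchors regex matchers at both ends, so use fullmatch, not search.
COLLISION_METRIC = re.compile(r"ocbsf_collision_detection.*")


def duplicate_binding_percent(rates_by_metric: Dict[str, float],
                              register_rate: float) -> Optional[float]:
    """Collision-detection rate as a percentage of the register request rate."""
    if register_rate == 0:
        return None  # no registrations in the window: the ratio is undefined
    collisions = sum(rate for name, rate in rates_by_metric.items()
                     if COLLISION_METRIC.fullmatch(name))
    return collisions / register_rate * 100
```

With collision metrics contributing 5 per second against 10 register requests per second, the function returns 50.0. A name such as `x_ocbsf_collision_detection` is not counted, because the anchored pattern must match from the first character.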
5.1.19 INGRESS_TOTAL_ERROR_RATE_ABOVE_MINOR_THRESHOLD
Table 5-20 INGRESS_TOTAL_ERROR_RATE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 1 Percent of Total on BSF service (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | Minor |
| Expression | (sum(rate(ocbsf_ingress_response_total{response_code!~"2.*"}[24h])) / sum(rate(ocbsf_ingress_response_total[24h]))) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.3 |
| Metric Used | ocbsf_ingress_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 1% of the total transactions. For any assistance, contact My Oracle Support. |
5.1.20 INGRESS_TOTAL_ERROR_RATE_ABOVE_MAJOR_THRESHOLD
Table 5-21 INGRESS_TOTAL_ERROR_RATE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 5 Percent of Total on BSF service (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 5 Percent of Total Transactions |
| Severity | Major |
| Expression | (sum(rate(ocbsf_ingress_response_total{response_code!~"2.*"}[24h])) / sum(rate(ocbsf_ingress_response_total[24h]))) * 100 >= 5 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.3 |
| Metric Used | ocbsf_ingress_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 5% of the total transactions. For any assistance, contact My Oracle Support. |
5.1.21 INGRESS_TOTAL_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD
Table 5-22 INGRESS_TOTAL_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | Transaction Error Rate detected above 10 Percent of Total on BSF service (current value is: {{ $value }}) |
| Summary | Transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | Critical |
| Expression | (sum(rate(ocbsf_ingress_response_total{response_code!~"2.*"}[24h])) / sum(rate(ocbsf_ingress_response_total[24h]))) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.3 |
| Metric Used | ocbsf_ingress_response_total |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 10% of the total transactions. For any assistance, contact My Oracle Support. |
5.1.22 PCF_BINDING_ERROR_RATE_ABOVE_MINOR_THRESHOLD
Table 5-23 PCF_BINDING_ERROR_RATE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | PCF Binding Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Minor |
| Expression | (sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|204",method="GET"}[24h])) / sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="GET"}[24h]))) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.5 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the GET method. For any assistance, contact My Oracle Support. |
5.1.23 PCF_BINDING_ERROR_RATE_ABOVE_MAJOR_THRESHOLD
Table 5-24 PCF_BINDING_ERROR_RATE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | PCF Binding Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Major |
| Expression | (sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|204",method="GET"}[24h])) / sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="GET"}[24h]))) * 100 >= 5 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.5 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 5% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the GET method. For any assistance, contact My Oracle Support. |
5.1.24 PCF_BINDING_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD
Table 5-25 PCF_BINDING_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | PCF Binding Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Critical |
| Expression | (sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|204",method="GET"}[24h])) / sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="GET"}[24h]))) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.5 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the GET method. For any assistance, contact My Oracle Support. |
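
The PCF_BINDING_ERROR_RATE expressions aggregate with `sum by (microservice, namespace)`, so the error percentage is computed separately for each microservice and namespace pair rather than across the whole deployment. The Python sketch below shows that grouping over a list of series, each represented as a hypothetical (labels, per-second rate) pair. It illustrates the aggregation only and is not BSF code.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# (labels, per-second rate) for one http_server_requests_seconds_count series.
Series = Tuple[Dict[str, str], float]

GET_SUCCESS = re.compile(r"200|204")  # status!~"200|204": every other status is an error


def pcf_binding_error_percent(series: List[Series]) -> Dict[Tuple[str, str], float]:
    """Error percentage per (microservice, namespace), like `sum by (microservice, namespace)`."""
    errors: Dict[Tuple[str, str], float] = defaultdict(float)
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for labels, rate in series:
        if labels.get("microservice") != "bsf-management-service" or labels.get("method") != "GET":
            continue  # outside the selector used by the alert
        group = (labels["microservice"], labels.get("namespace", ""))
        totals[group] += rate
        if not GET_SUCCESS.fullmatch(labels.get("status", "")):
            errors[group] += rate
    # Unlike PromQL, where a group with no error series yields no result at all,
    # this sketch reports 0.0 for such groups.
    return {group: errors[group] / total * 100
            for group, total in totals.items() if total > 0}
```

Each namespace gets its own percentage, so a 5 percent GET error rate in one namespace can trigger the MAJOR alert even if another namespace reports no errors.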
5.1.25 INGRESS_CREATE_ERROR_RATE_ABOVE_MINOR_THRESHOLD
Table 5-26 INGRESS_CREATE_ERROR_RATE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | BSF Ingress Create Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Minor |
| Expression | sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|201",method="POST"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="POST"}[24h])) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.4 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the POST method. For any assistance, contact My Oracle Support. |
5.1.26 INGRESS_CREATE_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD
Table 5-27 INGRESS_CREATE_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | BSF Ingress Create Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Critical |
| Expression | sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|201",method="POST"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="POST"}[24h])) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.4 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the POST method. For any assistance, contact My Oracle Support. |
5.1.27 INGRESS_CREATE_ERROR_RATE_ABOVE_MAJOR_THRESHOLD
Table 5-28 INGRESS_CREATE_ERROR_RATE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | BSF Ingress Create Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Major |
| Expression | sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|201",method="POST"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="POST"}[24h])) * 100 >= 5 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.4 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 5% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the POST method. For any assistance, contact My Oracle Support. |
5.1.28 INGRESS_DELETE_ERROR_RATE_ABOVE_MINOR_THRESHOLD
Table 5-29 INGRESS_DELETE_ERROR_RATE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Ingress Delete Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Minor |
| Expression | sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!="204",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h])) * 100 >= 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.6 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the DELETE method. For any assistance, contact My Oracle Support. |
5.1.29 INGRESS_DELETE_ERROR_RATE_ABOVE_MAJOR_THRESHOLD
Table 5-30 INGRESS_DELETE_ERROR_RATE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Ingress Delete Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Major |
| Expression | sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!="204",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h])) * 100 >= 5 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.6 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 5% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the DELETE method. For any assistance, contact My Oracle Support. |
5.1.30 INGRESS_DELETE_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD
Table 5-31 INGRESS_DELETE_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | Ingress Delete Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}} |
| Summary | Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }}) |
| Severity | Critical |
| Expression | sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!="204",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h])) * 100 >= 10 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.6 |
| Metric Used | http_server_requests_seconds_count |
| Recommended Actions | The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, check the service-specific metrics for the DELETE method. For any assistance, contact My Oracle Support. |
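
The ingress error-rate alerts do not share one definition of success. The ingress total alerts treat any `2xx` response code as success, the PCF binding (GET) alerts treat only 200 and 204 as success, the create (POST) alerts only 200 and 201, and the delete (DELETE) alerts only 204. The Python sketch below encodes those matchers; the dictionary keys are illustrative names for each alert family, not configuration keys used by BSF.

```python
import re
from typing import Dict

# Success patterns taken from the negated status matchers in the alert
# expressions; the keys are illustrative labels for each alert family.
SUCCESS_PATTERNS: Dict[str, str] = {
    "INGRESS_TOTAL_ERROR_RATE": r"2.*",       # response_code!~"2.*"
    "PCF_BINDING_ERROR_RATE": r"200|204",     # status!~"200|204", method GET
    "INGRESS_CREATE_ERROR_RATE": r"200|201",  # status!~"200|201", method POST
    "INGRESS_DELETE_ERROR_RATE": r"204",      # status!="204", method DELETE
}


def counts_as_error(alert_family: str, code: str) -> bool:
    """True if a response with this status code lands in the alert's error numerator.

    Prometheus anchors regex matchers, so fullmatch (not search) is the right
    equivalent: "2.*" matches "204" but not "502".
    """
    return re.fullmatch(SUCCESS_PATTERNS[alert_family], code) is None
```

For example, a 200 response to a DELETE request is counted as an error by the INGRESS_DELETE_ERROR_RATE alerts, because only 204 counts as success there.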
5.1.31 DB_TIER_DOWN_ALERT
Table 5-32 DB_TIER_DOWN_ALERT
| Field | Details |
|---|---|
| Description | DB is not reachable! |
| Summary | DB cannot be reachable! |
| Severity | Critical |
| Expression | appinfo_category_running{category="database", application="ocbsf"} != 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.7 |
| Metric Used | appinfo_category_running |
| Recommended Actions | Check whether the database service is up. Check the status or age of the MySQL pod by using the following command: where <namespace> is the namespace used to deploy the MySQL pod. This alert is cleared automatically when the DB service is up and running. |
5.1.32 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
Table 5-33 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | CPU usage for {{$labels.microservice}} service is above 60 |
| Summary | CPU usage for {{$labels.microservice}} service is above 60 |
| Severity | Minor |
| Expression | sum(rate(cgroup_cpu_usage{application="ocbsf"}[2m])) >= 60 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.8 |
| Metric Used | cgroup_cpu_usage |
| Recommended Actions | The alert gets cleared when the CPU utilization falls below the minor threshold or crosses the major threshold, in which case the CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD alert is raised. Note: Threshold levels are configurable. For any assistance, contact My Oracle Support. |
5.1.33 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
Table 5-34 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | CPU usage for {{$labels.microservice}} service is above 80 |
| Summary | CPU usage for {{$labels.microservice}} service is above 80 |
| Severity | Major |
| Expression | sum(rate(cgroup_cpu_usage{application="ocbsf"}[2m])) >= 80 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.9 |
| Metric Used | cgroup_cpu_usage |
| Recommended Actions | The alert gets cleared when the CPU utilization falls below the major threshold or crosses the critical threshold, in which case the CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD alert is raised. Note: Threshold levels are configurable. For any assistance, contact My Oracle Support. |
5.1.34 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
Table 5-35 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | CPU usage for {{$labels.microservice}} service is above 90 |
| Summary | CPU usage for {{$labels.microservice}} service is above 90 |
| Severity | Critical |
| Expression | sum(rate(cgroup_cpu_usage{application="ocbsf"}[2m])) >= 90 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.10 |
| Metric Used | cgroup_cpu_usage |
| Recommended Actions | The alert gets cleared when the CPU utilization falls below the critical threshold. Note: Threshold levels are configurable. For any assistance, contact My Oracle Support. |
5.1.35 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
Table 5-36 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Memory usage for {{$labels.microservice}} service is above 60 |
| Summary | Memory usage for {{$labels.microservice}} service is above 60 |
| Severity | Minor |
| Expression | sum(rate(cgroup_memory_usage{application="ocbsf"}[2m])) >= 60 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.11 |
| Metric Used | cgroup_memory_usage |
| Recommended Actions | The alert gets cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD alert is raised. Note: Threshold levels are configurable. For any assistance, contact My Oracle Support. |
5.1.36 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
Table 5-37 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD
| Field | Details |
|---|---|
| Description | Memory usage for {{$labels.microservice}} service is above 80 |
| Summary | Memory usage for {{$labels.microservice}} service is above 80 |
| Severity | Major |
| Expression | sum(rate(cgroup_memory_usage{application="ocbsf"}[2m])) >= 80 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.12 |
| Metric Used | cgroup_memory_usage |
| Recommended Actions | The alert gets cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD alert is raised. Note: Threshold levels are configurable. For any additional guidance, contact My Oracle Support. |
5.1.37 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
Table 5-38 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD
| Field | Details |
|---|---|
| Description | Memory usage for {{$labels.microservice}} service is above 90 |
| Summary | Memory usage for {{$labels.microservice}} service is above 90 |
| Severity | Critical |
| Expression | sum(rate(cgroup_memory_usage{application="ocbsf"}[2m])) >= 90 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.13 |
| Metric Used | cgroup_memory_usage |
| Recommended Actions | The alert gets cleared when the memory utilization falls below the critical threshold. Note: Threshold levels can be configured. For any assistance, contact My Oracle Support. |
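Because each memory-usage expression is a plain `>=` comparison, a value at or above 90 satisfies the Minor, Major, and Critical expressions at the same time. The following Python sketch shows which of the three expressions evaluate to true for a given value and which one is the most severe. It assumes the value is a percentage; whether lower-severity alerts are suppressed in practice (for example, through Alertmanager inhibition rules) depends on the deployment and is not described here. The function names are illustrative.

```python
from typing import List, Optional

# (threshold, severity) pairs from the Expression and Severity rows of the
# memory-usage alerts, ordered from the highest threshold to the lowest.
MEMORY_THRESHOLDS = [(90.0, "Critical"), (80.0, "Major"), (60.0, "Minor")]


def firing_severities(usage: float) -> List[str]:
    """Severities whose expression (usage >= threshold) evaluates to true."""
    return [severity for threshold, severity in MEMORY_THRESHOLDS if usage >= threshold]


def highest_severity(usage: float) -> Optional[str]:
    """Most severe alert that applies, or None below the Minor threshold."""
    firing = firing_severities(usage)
    return firing[0] if firing else None


if __name__ == "__main__":
    for value in (55, 60, 85, 95):
        print(value, firing_severities(value), highest_severity(value))
```

Ordering the list from highest to lowest threshold lets `highest_severity` simply take the first match.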
5.1.38 NRF_COMMUNICATION_FAILURE
Table 5-39 NRF_COMMUNICATION_FAILURE
| Field | Details |
|---|---|
| Description | There has been an external communication failure with NRF. |
| Summary | There has been an external communication failure with NRF. |
| Severity | Critical |
| Expression | ocbsf_nrfclient_nrf_operative_status == 0 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.33 |
| Metric Used | ocbsf_nrfclient_nrf_operative_status |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.39 NRF_SERVICE_REQUEST_FAILURE
Table 5-40 NRF_SERVICE_REQUEST_FAILURE
| Field | Details |
|---|---|
| Description | There has been a Service Request Failure with NRF, either a Registration failure, Heartbeat failure, or Profile Update Failure. |
| Summary | There has been a Service Request Failure with NRF, either a Registration failure, Heartbeat failure, or Profile Update Failure. |
| Severity | Critical |
| Expression | ocbsf_nrfclient_nfUpdate_status == 0 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.34 |
| Metric Used | ocbsf_nrfclient_nfUpdate_status |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.40 PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED
Table 5-41 PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED
| Field | Details |
|---|---|
| Description | The application fails to get the current active overload level threshold data. |
| Summary | The application raises the PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED alert when it fails to fetch the current active overload level threshold data, that is, when active_overload_threshold_fetch_failed == 1. |
| Severity | Major |
| Expression | active_overload_threshold_fetch_failed == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.20 |
| Metric Used | active_overload_threshold_fetch_failed |
| Recommended Actions | The alert gets cleared when the application fetches the current active overload level threshold data. For any additional guidance, contact My Oracle Support. |
5.1.41 POD_DOC
Table 5-42 POD_DOC
| Field | Details |
|---|---|
| Description | Pod Congestion status of {{$labels.microservice}} service is DoC |
| Summary | Pod Congestion status of {{$labels.microservice}} service is DoC |
| Severity | Major |
| Expression | ocbsf_pod_congestion_state == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.25 |
| Metric Used | ocbsf_pod_congestion_state |
| Recommended Actions | Cause: The pod entered DANGER_OF_CONGESTION (DOC) because CPU usage and/or the pending request queue is approaching the configured limits. Diagnostic Information: Check pod_congestion_state == 1; review pod_resource_stress (cpu and queue) and pod_cong_state_report_total to see recent transitions. Recovery: Confirm the DOC thresholds and the active Load Shedding rule. If DOC is triggered by brief spikes, increase stateChangeSampleCount or the calculation interval. If the condition is sustained, consider slightly increasing discard aggressiveness for low-value calls, in line with policy. |
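The POD_DOC recovery step suggests increasing stateChangeSampleCount so that brief spikes do not flip the pod into DOC. The following Python sketch shows that kind of debouncing. It assumes, without confirmation from this document, that stateChangeSampleCount is the number of consecutive samples that must report a new state before the pod adopts it, and that 0 represents the normal state; the class name is hypothetical.

```python
class DebouncedCongestionState:
    """Hypothetical debouncer: a new congestion state is adopted only after it
    has been reported by `sample_count` consecutive samples (assumed meaning of
    stateChangeSampleCount). State 0 is assumed to mean normal."""

    def __init__(self, sample_count: int, initial_state: int = 0) -> None:
        if sample_count < 1:
            raise ValueError("sample_count must be at least 1")
        self.sample_count = sample_count
        self.state = initial_state
        self._candidate = initial_state
        self._streak = 0

    def observe(self, raw_state: int) -> int:
        """Feed one raw sample and return the (possibly unchanged) reported state."""
        if raw_state == self.state:
            # The sample agrees with the current state: drop any pending change.
            self._candidate, self._streak = raw_state, 0
        elif raw_state == self._candidate:
            # Same candidate as the previous sample: extend the streak.
            self._streak += 1
        else:
            # A different candidate state: start a new streak.
            self._candidate, self._streak = raw_state, 1
        if self._streak >= self.sample_count:
            self.state, self._streak = self._candidate, 0
        return self.state


if __name__ == "__main__":
    spiky = [1, 0, 1, 1, 1]  # one brief spike, then a sustained DOC reading
    for count in (1, 3):
        tracker = DebouncedCongestionState(count)
        print(count, [tracker.observe(s) for s in spiky])
```

With a sample count of 1 the reported state follows every spike; with 3 it changes only after the sustained DOC readings.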
5.1.42 POD_CONGESTED
Table 5-43 POD_CONGESTED
| Field | Details |
|---|---|
| Description | Pod Congestion status of {{$labels.microservice}} service is congested |
| Summary | Pod Congestion status of {{$labels.microservice}} service is congested |
| Severity | Critical |
| Expression | ocbsf_pod_congestion_state == 4 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.26 |
| Metric Used | ocbsf_pod_congestion_state |
| Recommended Actions | Cause: The pod has reached the CONGESTED state because CPU consumption and/or the pending request queue exceeds the active threshold profile. Diagnostic Information: Check pod_congestion_state (expected value 4), pod_resource_congestion_state for cpu and queue, pod_resource_stress, and ocbsf_http_congestion_message_reject_total (filter by congestionState, requestUri, requestMethod, and priority). Recovery: In CNC Console (BSF → Overload and Congestion Control → Congestion Control), ensure that the feature is enabled and that the intended Thresholds and Load Shedding profiles are active. If rejections are excessive, raise the discard priority for the current state or relax thresholds based on performance baselines. To reduce flapping caused by short spikes, consider increasing stateChangeSampleCount and/or the calculation interval. |
5.1.43 POD_CONGESTION_L1
Table 5-44 POD_CONGESTION_L1
| Field | Details |
|---|---|
| Description | Pod Congestion status of {{$labels.microservice}} service is Congestion_L1. |
| Summary | Pod Congestion status of {{$labels.microservice}} service is Congestion_L1. |
| Severity | Critical |
| Expression | ocbsf_pod_congestion_state == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.52 |
| Metric Used | ocbsf_pod_congestion_state |
| Recommended Actions | Cause: The pod reached CONGESTION_L1 based on the CPU and/or queue thresholds in the active profile. Diagnostic Information: Check pod_congestion_state == 2; use pod_resource_congestion_state to identify whether CPU or the queue is the driver; review ocbsf_http_congestion_message_reject_total with congestionState=CONGESTION_L1. Recovery: Confirm the L1 discard priority (default 24) and thresholds. If important calls are being dropped, adjust the discard priority or tune the thresholds to match the expected load profile. |
5.1.44 POD_CPU_CONGESTION_L1
Table 5-45 POD_CPU_CONGESTION_L1
| Field | Details |
|---|---|
| Description | Pod resource is in Congestion_L1 for CPU type. |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L1 for CPU type. |
| Severity | Critical |
| Expression | ocbsf_pod_resource_congestion_state{type="cpu"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.54 |
| Metric Used | ocbsf_pod_resource_congestion_state |
| Recommended Actions | Cause: CPU utilization reached CONGESTION_L1. Diagnostic Information: Check that pod_resource_congestion_state{resourceType="cpu"} == 2, and review CPU stress and recent state transitions. Recovery: Validate the L1 CPU thresholds and discard priority. If brief spikes cause churn, increase stateChangeSampleCount; otherwise, increase shedding at L1. |
5.1.45 POD_CONGESTION_L2
Table 5-46 POD_CONGESTION_L2
| Field | Details |
|---|---|
| Description | Pod Congestion status of {{$labels.microservice}} service is Congestion_L2. |
| Summary | Pod Congestion status of {{$labels.microservice}} service is Congestion_L2. |
| Severity | Critical |
| Expression | ocbsf_pod_congestion_state == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.53 |
| Metric Used | ocbsf_pod_congestion_state |
| Recommended Actions | Cause: The pod reached CONGESTION_L2, indicating higher stress than L1. Diagnostic Information: Check pod_congestion_state == 3; validate the resource-specific states and stress metrics; inspect the rejection counters at L2. Recovery: Use the L2 discard priority (default 18) to shed more low-priority traffic; consider tuning thresholds and sample counts to balance protection against availability. |
5.1.46 POD_CPU_CONGESTION_L2
Table 5-47 POD_CPU_CONGESTION_L2
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L2 for CPU type. |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L2 for CPU type. |
| Severity | Critical |
| Expression | ocbsf_pod_resource_congestion_state{type="cpu"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.55 |
| Metric Used | ocbsf_pod_resource_congestion_state |
| Recommended Actions | Cause: CPU utilization reached CONGESTION_L2. Diagnostic Information: Check that pod_resource_congestion_state{resourceType="cpu"} == 3; check CPU stress, the EMA interval and ratio, and the L2 rejection counters. Recovery: Raise the L2 discard priority to protect the pod; tune CPU thresholds or EMA cadence only after comparing them with test baselines. |
5.1.47 POD_PENDING_REQUEST_DOC
Table 5-48 POD_PENDING_REQUEST_DOC
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is DoC for PendingRequest type |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is DoC for PendingRequest type |
| Severity | Major |
| Expression | ocbsf_pod_resource_congestion_state{type="queue"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.27 |
| Metric Used | ocbsf_pod_resource_congestion_state{type="queue"} |
| Recommended Actions | Cause: The pending request queue is in DANGER_OF_CONGESTION. Diagnostic Information: Validate that pod_resource_congestion_state{resourceType="queue"} == 1 and check the queue-related pod_resource_stress. Recovery: Review the queue DOC thresholds; if early protection is desired, allow gentle shedding of the lowest-priority traffic at DOC; otherwise, tune the thresholds to match the observed load. |
5.1.48 POD_PENDING_REQUEST_CONGESTED
Table 5-49 POD_PENDING_REQUEST_CONGESTED
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is congested for PendingRequest type |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is congested for PendingRequest type |
| Severity | Critical |
| Expression | ocbsf_pod_resource_congestion_state{type="queue"} == 4 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.28 |
| Metric Used | ocbsf_pod_resource_congestion_state{type="queue"} |
| Recommended Actions | Cause: The pending HTTP request queue is in the CONGESTED state. Diagnostic Information: Verify that pod_resource_congestion_state{resourceType="queue"} == 4 and check pod_resource_stress{resourceType="queue"}; review ocbsf_http_congestion_message_reject_total for low-priority discards at this level. Recovery: Validate the queue thresholds in the active profile and the CONGESTED discard priority. If the backlog persists, increase shedding (raise the discard priority) so that lower-priority (higher-number) requests are rejected earlier. |
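The congestion actions describe request priority as a number in which a higher value means a lower-priority request. The Python sketch below assumes that a request is rejected when its priority number is greater than or equal to the discard priority of the current congestion state. Under that assumption, a smaller discard-priority number sheds more traffic, which is consistent with the documented defaults of 24 for CONGESTION_L1 and 18 for CONGESTION_L2. The rejection rule, the dictionary, and the function name are assumptions for illustration.

```python
from typing import Dict

# Default discard priorities quoted in the POD_CONGESTION_L1 and
# POD_CONGESTION_L2 recommended actions. Other states are omitted because
# their defaults are not given in this section.
DEFAULT_DISCARD_PRIORITY: Dict[str, int] = {"CONGESTION_L1": 24, "CONGESTION_L2": 18}


def is_rejected(request_priority: int, discard_priority: int) -> bool:
    """Assumed rule: higher number = lower priority; requests whose number is
    at or above the discard priority are shed."""
    return request_priority >= discard_priority


if __name__ == "__main__":
    for state, discard in DEFAULT_DISCARD_PRIORITY.items():
        shed = [p for p in (10, 18, 20, 24, 30) if is_rejected(p, discard)]
        print(state, "rejects priorities", shed)
```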
5.1.49 POD_CPU_DOC
Table 5-50 POD_CPU_DOC
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is DoC for CPU type |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is DoC for CPU type |
| Severity | Major |
| Expression | ocbsf_pod_resource_congestion_state{type="cpu"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.29 |
| Metric Used | ocbsf_pod_resource_congestion_state{type="cpu"} |
| Recommended Actions | Cause: CPU utilization is in DANGER_OF_CONGESTION according to the current thresholds and EMA settings. Diagnostic Information: Check that pod_resource_congestion_state{resourceType="cpu"} == 1 and review pod_resource_stress{resourceType="cpu"}; confirm the EMA parameters (interval and 70:30 ratio). Recovery: If the condition is transient, increase stateChangeSampleCount to avoid oscillation; if it is sustained, adjust the CPU DOC threshold or enable mild shedding for non-critical, low-priority requests. |
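The POD_CPU_DOC actions refer to EMA (exponential moving average) settings with a 70:30 ratio. The following Python sketch shows EMA smoothing of CPU samples and how it damps a single spike. This section does not state which side of the 70:30 ratio applies to the newest sample, so the weight is a parameter; the default of 0.3 for the newest sample and seeding the average with the first sample are assumptions.

```python
from typing import Iterable, List


def ema(samples: Iterable[float], new_weight: float = 0.3) -> List[float]:
    """Exponential moving average: each output is
    new_weight * sample + (1 - new_weight) * previous_output.
    The first sample seeds the average (an assumption)."""
    if not 0.0 < new_weight <= 1.0:
        raise ValueError("new_weight must be in (0, 1]")
    smoothed: List[float] = []
    for sample in samples:
        if not smoothed:
            smoothed.append(float(sample))
        else:
            smoothed.append(new_weight * sample + (1.0 - new_weight) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    # A one-sample CPU spike to 100% is damped to about 65% with a 0.3 weight.
    print(ema([50, 100, 50]))
```

A smaller weight on the newest sample makes the state less sensitive to short spikes at the cost of reacting more slowly to sustained load.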
5.1.50 POD_CPU_CONGESTED
Table 5-51 POD_CPU_CONGESTED
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is congested for CPU type |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is congested for CPU type |
| Severity | Critical |
| Expression | ocbsf_pod_resource_congestion_state{type="cpu"} == 4 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.30 |
| Metric Used | ocbsf_pod_resource_congestion_state |
| Recommended Actions | Cause: CPU utilization reached the CONGESTED state. Diagnostic Information: Validate that pod_resource_congestion_state{resourceType="cpu"} == 4, as in the alert expression, and check pod_resource_stress{resourceType="cpu"}; check message rejections at this congestion level. Recovery: Tighten protection by raising the discard priority at this state so that more low-priority requests are dropped. Reassess CPU thresholds and EMA intervals only after reviewing benchmarks. |
5.1.51 POD_MEMORY_DOC
Table 5-52 POD_MEMORY_DOC
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is DoC for Memory type |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is DoC for Memory type |
| Severity | Major |
| Expression | ocbsf_pod_resource_congestion_state{type="memory"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.31 |
| Metric Used | ocbsf_pod_resource_congestion_state{type="memory"} |
| Recommended Actions | The alert gets cleared when the system memory falls below the configured threshold value. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.52 POD_MEMORY_CONGESTED
Table 5-53 POD_MEMORY_CONGESTED
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is congested for Memory type |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is congested for Memory type |
| Severity | Critical |
| Expression | ocbsf_pod_resource_congestion_state{type="memory"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.32 |
| Metric Used | ocbsf_pod_resource_congestion_state{type="memory"} |
| Recommended Actions | The alert gets cleared when the system memory falls below the configured threshold value. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
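The pod congestion alerts in Tables 5-42 to 5-53 differ only in the metric, the type label, and the value that each expression compares against. The following Python lookup table transcribes those expressions, which can help when reading raw metric values during troubleshooting. As written in Table 5-53, the memory CONGESTED alert compares against 2, whereas the CPU and queue CONGESTED alerts compare against 4. The dictionary and function names are illustrative.

```python
from typing import Dict, Optional, Tuple

AlertInfo = Tuple[str, str]  # (alert name, severity)

# (metric, type label) -> {expression value: (alert, severity)},
# transcribed from the Expression and Severity rows of Tables 5-42 to 5-53.
CONGESTION_ALERTS: Dict[Tuple[str, Optional[str]], Dict[int, AlertInfo]] = {
    ("ocbsf_pod_congestion_state", None): {
        1: ("POD_DOC", "Major"),
        2: ("POD_CONGESTION_L1", "Critical"),
        3: ("POD_CONGESTION_L2", "Critical"),
        4: ("POD_CONGESTED", "Critical"),
    },
    ("ocbsf_pod_resource_congestion_state", "cpu"): {
        1: ("POD_CPU_DOC", "Major"),
        2: ("POD_CPU_CONGESTION_L1", "Critical"),
        3: ("POD_CPU_CONGESTION_L2", "Critical"),
        4: ("POD_CPU_CONGESTED", "Critical"),
    },
    ("ocbsf_pod_resource_congestion_state", "queue"): {
        1: ("POD_PENDING_REQUEST_DOC", "Major"),
        4: ("POD_PENDING_REQUEST_CONGESTED", "Critical"),
    },
    ("ocbsf_pod_resource_congestion_state", "memory"): {
        1: ("POD_MEMORY_DOC", "Major"),
        2: ("POD_MEMORY_CONGESTED", "Critical"),  # 2, not 4, per Table 5-53
    },
}


def congestion_alert(metric: str, type_label: Optional[str], value: int) -> Optional[AlertInfo]:
    """Return the alert whose expression matches, or None if no rule covers the value."""
    return CONGESTION_ALERTS.get((metric, type_label), {}).get(value)


if __name__ == "__main__":
    print(congestion_alert("ocbsf_pod_resource_congestion_state", "queue", 2))
```

For example, a queue state of 2 matches no rule in these tables, so the lookup returns `None`.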
5.1.53 SERVICE_OVERLOADED
Table 5-54 SERVICE_OVERLOADED
| Field | Details |
|---|---|
| Description | Overload Level of {{$labels.microservice}} service is L1 |
| Summary | Overload Level of {{$labels.microservice}} service is L1 |
| Severity | Minor |
| Expression | load_level == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.14 |
| Metric Used | load_level |
| Recommended Actions | The alert gets cleared when the system returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-55 SERVICE_OVERLOADED
| Field | Details |
|---|---|
| Description | Overload Level of {{$labels.microservice}} service is L2 |
| Summary | Overload Level of {{$labels.microservice}} service is L2 |
| Severity | Major |
| Expression | load_level == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.14 |
| Metric Used | load_level |
| Recommended Actions | The alert gets cleared when the system returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-56 SERVICE_OVERLOADED
| Field | Details |
|---|---|
| Description | Overload Level of {{$labels.service}} service is L3 |
| Summary | Overload Level of {{$labels.service}} service is L3 |
| Severity | Critical |
| Expression | load_level == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.14 |
| Metric Used | load_level |
| Recommended Actions | The alert gets cleared when the system returns to the normal state. For any additional guidance, contact My Oracle Support. |
5.1.54 SERVICE_RESOURCE_OVERLOADED
Alerts raised when the service is in an overload state due to memory usage
Table 5-57 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L1 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L1 for {{$labels.type}} type |
| Severity | Minor |
| Expression | service_resource_overload_level{type="memory"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="memory"} |
| Recommended Actions | The alert gets cleared when the memory usage of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-58 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L2 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L2 for {{$labels.type}} type |
| Severity | Major |
| Expression | service_resource_overload_level{type="memory"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="memory"} |
| Recommended Actions | The alert gets cleared when the memory usage of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-59 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L3 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L3 for {{$labels.type}} type |
| Severity | Critical |
| Expression | service_resource_overload_level{type="memory"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="memory"} |
| Recommended Actions | The alert gets cleared when the memory usage of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Alerts raised when the service is in an overload state due to CPU usage
Table 5-60 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L1 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L1 for {{$labels.type}} type |
| Severity | Minor |
| Expression | service_resource_overload_level{type="cpu"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="cpu"} |
| Recommended Actions | The alert gets cleared when the CPU usage of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-61 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L2 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L2 for {{$labels.type}} type |
| Severity | Major |
| Expression | service_resource_overload_level{type="cpu"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="cpu"} |
| Recommended Actions | The alert gets cleared when the CPU usage of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-62 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L3 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L3 for {{$labels.type}} type |
| Severity | Critical |
| Expression | service_resource_overload_level{type="cpu"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="cpu"} |
| Recommended Actions | The alert gets cleared when the CPU usage of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Alerts raised when the service is in an overload state due to the number of pending messages
Table 5-63 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L1 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L1 for {{$labels.type}} type |
| Severity | Minor |
| Expression | service_resource_overload_level{type="svc_pending_count"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="svc_pending_count"} |
| Recommended Actions | The alert gets cleared when the number of pending messages of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-64 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L2 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L2 for {{$labels.type}} type |
| Severity | Major |
| Expression | service_resource_overload_level{type="svc_pending_count"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="svc_pending_count"} |
| Recommended Actions | The alert gets cleared when the number of pending messages of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-65 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L3 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L3 for {{$labels.type}} type |
| Severity | Critical |
| Expression | service_resource_overload_level{type="svc_pending_count"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="svc_pending_count"} |
| Recommended Actions | The alert gets cleared when the number of pending messages of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Alerts raised when the service is in an overload state due to the number of failed requests
Table 5-66 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L1 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L1 for {{$labels.type}} type |
| Severity | Minor |
| Expression | service_resource_overload_level{type="svc_failure_count"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="svc_failure_count"} |
| Recommended Actions | The alert gets cleared when the number of failed messages of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-67 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L2 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L2 for {{$labels.type}} type |
| Severity | Major |
| Expression | service_resource_overload_level{type="svc_failure_count"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="svc_failure_count"} |
| Recommended Actions | The alert gets cleared when the number of failed messages of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
Table 5-68 SERVICE_RESOURCE_OVERLOADED
| Field | Details |
|---|---|
| Description | {{$labels.microservice}} service is L3 for {{$labels.type}} type |
| Summary | {{$labels.microservice}} service is L3 for {{$labels.type}} type |
| Severity | Critical |
| Expression | service_resource_overload_level{type="svc_failure_count"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.15 |
| Metric Used | service_resource_overload_level{type="svc_failure_count"} |
| Recommended Actions | The alert gets cleared when the number of failed messages of the service returns to the normal state. For any additional guidance, contact My Oracle Support. |
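All fifteen overload tables (5-54 to 5-68) use the same level-to-severity mapping: level 1 is Minor, 2 is Major, and 3 is Critical. The Python sketch below evaluates those expressions for one service given its load_level and its per-type service_resource_overload_level values. Treating a missing type as level 0 (not overloaded) is an assumption, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Level -> severity, identical for SERVICE_OVERLOADED (load_level) and every
# SERVICE_RESOURCE_OVERLOADED type in Tables 5-54 to 5-68.
LEVEL_SEVERITY: Dict[int, str] = {1: "Minor", 2: "Major", 3: "Critical"}
RESOURCE_TYPES = ("memory", "cpu", "svc_pending_count", "svc_failure_count")


def overload_alerts(load_level: int, resource_levels: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """Return (alert, detail, severity) for every overload expression that matches."""
    alerts: List[Tuple[str, str, str]] = []
    if load_level in LEVEL_SEVERITY:
        alerts.append(("SERVICE_OVERLOADED", f"L{load_level}", LEVEL_SEVERITY[load_level]))
    for resource in RESOURCE_TYPES:
        level = resource_levels.get(resource, 0)  # missing type treated as not overloaded
        if level in LEVEL_SEVERITY:
            alerts.append(("SERVICE_RESOURCE_OVERLOADED", f"{resource} L{level}", LEVEL_SEVERITY[level]))
    return alerts


if __name__ == "__main__":
    print(overload_alerts(2, {"cpu": 3, "memory": 0}))
```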
5.1.55 SYSTEM_IMPAIRMENT_MAJOR
Table 5-69 SYSTEM_IMPAIRMENT_MAJOR
| Field | Details |
|---|---|
| Description | Major impairment alert raised for REPLICATION_FAILED, REPLICATION_CHANNEL_DOWN, or BINLOG_STORAGE usage above 80% for 10 minutes. |
| Summary | Major impairment alert raised for REPLICATION_FAILED, REPLICATION_CHANNEL_DOWN, or BINLOG_STORAGE usage above 80% for 10 minutes. |
| Severity | Major |
| Expression | (db_tier_replication_status{role="failed"} == 0) or (db_tier_replication_status{role="active"} == 0) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="standby"})) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="failed"})) or (avg_over_time(db_tier_binlog_used_bytes_percentage[5m])>= 80) |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.16 |
| Metric Used | db_tier_replication_status, db_tier_binlog_used_bytes_percentage |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
5.1.56 SYSTEM_IMPAIRMENT_CRITICAL
Table 5-70 SYSTEM_IMPAIRMENT_CRITICAL
| Field | Details |
|---|---|
| Description | Critical impairment alert raised for REPLICATION_FAILED, REPLICATION_CHANNEL_DOWN, or BINLOG_STORAGE usage above 80% for 30 minutes. |
| Summary | Critical impairment alert raised for REPLICATION_FAILED, REPLICATION_CHANNEL_DOWN, or BINLOG_STORAGE usage above 80% for 30 minutes. |
| Severity | Critical |
| Expression | (db_tier_replication_status{role="failed"} == 0) or (db_tier_replication_status{role="active"} == 0) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="standby"})) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="failed"})) or (avg_over_time(db_tier_binlog_used_bytes_percentage[5m])>= 80) |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.16 |
| Metric Used | db_tier_replication_status, db_tier_binlog_used_bytes_percentage |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
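The SYSTEM_IMPAIRMENT_MAJOR and SYSTEM_IMPAIRMENT_CRITICAL expressions are identical; only the 10-minute and 30-minute durations in their descriptions differ. Those durations are presumably implemented with the `for` clause of each Prometheus alerting rule, which this section does not show. In Prometheus, `for` keeps an alert pending until its expression has been true at each evaluation for at least that duration. The following simplified Python model of that behavior ignores evaluation-interval details; the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fires_after(samples: Sequence[Tuple[float, bool]], for_seconds: float) -> bool:
    """samples: (timestamp in seconds, expression result) pairs, oldest first.
    True if, at the last sample, the expression has been true continuously
    for at least `for_seconds`."""
    streak_start: Optional[float] = None
    for timestamp, condition in samples:
        if condition:
            if streak_start is None:
                streak_start = timestamp  # start of the current true streak
        else:
            streak_start = None  # any false evaluation resets the pending timer
    if streak_start is None:
        return False
    return samples[-1][0] - streak_start >= for_seconds


if __name__ == "__main__":
    # Impairment condition true at every 1-minute evaluation for 10 minutes.
    ten_minutes = [(t, True) for t in range(0, 601, 60)]
    print(fires_after(ten_minutes, 600), fires_after(ten_minutes, 1800))
```

With the same input, the 10-minute rule fires while the 30-minute rule is still pending.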
5.1.57 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN
Table 5-71 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN
| Field | Details |
|---|---|
| Description | System Operational State is now in partial shutdown state. |
| Summary | System Operational State is now in partial shutdown state. |
| Severity | Major |
| Expression | system_operational_state == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.17 |
| Metric Used | system_operational_state |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
5.1.58 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN
Table 5-72 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN
| Field | Details |
|---|---|
| Description | System Operational State is now in complete shutdown state. |
| Summary | System Operational State is now in complete shutdown state. |
| Severity | Critical |
| Expression | system_operational_state == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.17 |
| Metric Used | system_operational_state |
| Recommended Actions | For any additional guidance, contact My Oracle Support. |
5.1.59 DIAM_CONN_PEER_DOWN
Table 5-73 DIAM_CONN_PEER_DOWN
| Field | Details |
|---|---|
| Description | Diameter connection to peer {{ $labels.peerHost }} is down. |
| Summary | Diameter connection to peer down. |
| Severity | Major |
| Expression | (sum by (namespace,peerHost)(ocbsf_diam_conn_network) == 0) and (sum by (namespace,peerHost)(max_over_time(ocbsf_diam_conn_network[24h])) != 0) |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.18 |
| Metric Used | ocbsf_diam_conn_network |
| Recommended Actions | For any assistance, contact My Oracle Support. |
5.1.60 DIAM_CONN_NETWORK_DOWN
Table 5-74 DIAM_CONN_NETWORK_DOWN
| Field | Details |
|---|---|
| Description | All Diameter network connections are down. |
| Summary | All Diameter network connections are down. |
| Severity | Critical |
| Expression | sum by (namespace)(ocbsf_diam_conn_network) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.19 |
| Metric Used | ocbsf_diam_conn_network |
| Recommended Actions | For any assistance, contact My Oracle Support. |
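DIAM_CONN_PEER_DOWN combines two conditions: the peer's connection gauge is 0 now, and its maximum over the last 24 hours is not 0. As a result, a peer that had no connection at any point in the last 24 hours does not raise the alert. DIAM_CONN_NETWORK_DOWN only checks that the sum across all peers in the namespace is 0. The Python sketch below restates both expressions for a simplified case in which each peer has a single ocbsf_diam_conn_network series; that simplification and the function names are assumptions.

```python
from typing import Dict, List


def peer_down(samples_24h: List[int]) -> bool:
    """DIAM_CONN_PEER_DOWN logic for one peer: the current value (last sample)
    is 0, but the peer was connected at some point in the last 24 hours."""
    return bool(samples_24h) and samples_24h[-1] == 0 and max(samples_24h) != 0


def network_down(current_by_peer: Dict[str, int]) -> bool:
    """DIAM_CONN_NETWORK_DOWN logic: the sum over all peers is 0.
    With no series at all, the PromQL sum returns nothing, so no alert."""
    return bool(current_by_peer) and sum(current_by_peer.values()) == 0


if __name__ == "__main__":
    print(peer_down([1, 1, 0]), peer_down([0, 0, 0]))
    print(network_down({"peerA": 0, "peerB": 0}))
```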
5.1.61 DIAM_RESPONSE_REALM_VALIDATION_ERROR_CRITICAL
Table 5-75 DIAM_RESPONSE_REALM_VALIDATION_ERROR_CRITICAL
| Field | Details |
|---|---|
| Description | At least 75% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED' because either the BSF realm or the PCF realm does not match the destination realm received in the Diameter message. |
| Summary | {{ $value }}% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED'. |
| Severity | Critical |
| Expression | (sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 75 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.41 |
| Metric Used | ocbsf_diam_realm_validation_failed_total, ocbsf_diam_response_network_total |
| Recommended Actions | |
5.1.62 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MAJOR
Table 5-76 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MAJOR
| Field | Details |
|---|---|
| Description | At least 50% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED' because either the BSF realm or the PCF realm does not match the destination realm received in the Diameter message. |
| Summary | {{ $value }}% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED'. |
| Severity | Major |
| Expression | (sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 50 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.41 |
| Metric Used | ocbsf_diam_realm_validation_failed_total, ocbsf_diam_response_network_total |
| Recommended Actions | |
5.1.63 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MINOR
Table 5-77 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MINOR
| Field | Details |
|---|---|
| Description | At least 20% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED' because either the BSF realm or the PCF realm does not match the destination realm received in the Diameter message. |
| Summary | {{ $value }}% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED'. |
| Severity | Minor |
| Expression | (sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 20 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.41 |
| Metric Used | ocbsf_diam_realm_validation_failed_total, ocbsf_diam_response_network_total |
| Recommended Actions | |
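The three DIAM_RESPONSE_REALM_VALIDATION_ERROR alerts share one ratio and differ only in the threshold (75, 50, and 20 percent). The Python sketch below computes that percentage from the two 10-minute counter increases and returns the most severe rule whose threshold is met; at that point the lower-severity expressions are also true. Returning no result when there were no responses in the window is an assumption, and the names are illustrative.

```python
from typing import Optional, Tuple

# (threshold percentage, severity) from Tables 5-75 to 5-77, highest first.
REALM_THRESHOLDS = ((75.0, "Critical"), (50.0, "Major"), (20.0, "Minor"))


def realm_validation_alert(failed_3003: float, total_responses: float) -> Tuple[Optional[float], Optional[str]]:
    """failed_3003 and total_responses are 10-minute increases of
    ocbsf_diam_realm_validation_failed_total{responseCode="3003"} and
    ocbsf_diam_response_network_total for appId 16777236.
    Returns (percentage, severity of the most severe matching rule)."""
    if total_responses <= 0:
        return None, None  # no responses in the window: nothing to evaluate
    # Same value as (failed / total) * 100 in the expression.
    percentage = failed_3003 * 100 / total_responses
    for threshold, severity in REALM_THRESHOLDS:
        if percentage >= threshold:
            return percentage, severity
    return percentage, None


if __name__ == "__main__":
    print(realm_validation_alert(30, 100), realm_validation_alert(80, 100))
```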
5.1.64 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MINOR
Table 5-78 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MINOR
| Field | Details |
|---|---|
| Description | At least 20% of the BSF Audit Notification Requests sent to PCF to check for Suspected Stale Sessions have responded with a 5xx or 4xx (excluding 404) Status in the last 24 hours. |
| Summary | At least 20% of the BSF Notification Requests for Audit have responded with a 5xx or 4xx (not 404) Status in the last 24 hours. |
| Severity | Minor |
| Expression | (sum by (namespace, microservice) (increase(ocbsf_query_response_count_total{response_code=~"5..|4..|timeout",response_code!="404"}[24h])) / sum by (namespace, microservice) (increase(ocbsf_query_response_count_total[24h]))) * 100 >= 20 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.42 |
| Metric Used | ocbsf_query_response_count_total |
| Recommended Actions | Determine why these notification requests are failing. This alert indicates a potential issue either with network communications or with the NF that the audit notifications are sent to. |
5.1.65 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MAJOR
Table 5-79 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MAJOR
| Field | Details |
|---|---|
| Description | At least 40% of the BSF Audit Notification Requests sent to PCF to check for Suspected Stale Sessions have responded with a 5xx or 4xx (excluding 404) Status in the last 24 hours. |
| Summary | {{ $value }}% of the BSF Audit Notification Requests sent to PCF to check for Suspected Stale Sessions have responded with a 5xx or 4xx (excluding 404) Status in the last 24 hours. |
| Severity | Major |
| Expression | (sum by (namespace, microservice) (increase(ocbsf_query_response_count_total{response_code=~"5..|4..|timeout",response_code!="404"}[24h])) / sum by (namespace, microservice) (increase(ocbsf_query_response_count_total[24h]))) * 100 >= 40 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.42 |
| Metric Used | ocbsf_query_response_count_total |
| Recommended Actions | Determine why these notification requests are failing. This alert indicates an issue, either with network communications or with the NF that the audit notifications are sent to, that needs to be addressed as soon as possible. |
5.1.66 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_CRITICAL
Table 5-80 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_CRITICAL
| Field | Details |
|---|---|
| Description | At least 60% of the BSF Audit Notification Requests sent to PCF to check for Suspected Stale Sessions have responded with a 5xx or 4xx (excluding 404) Status in the last 24 hours. |
| Summary | At least 60% of the BSF Notification Requests for Audit sent to PCF (or its respective NF) failed with a 5xx or 4xx (not 404) Status in the last 24 hours. The threshold default value is defined at … |
| Severity | Critical |
| Expression | (sum by (namespace, microservice) (increase(ocbsf_query_response_count_total{response_code=~"5..|4..|timeout",response_code!="404"}[24h])) / sum by (namespace, microservice) (increase(ocbsf_query_response_count_total[24h]))) * 100 >= 60 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.42 |
| Metric Used | ocbsf_query_response_count_total |
| Recommended Actions | Determine the reason why these notification requests are failing. This alert indicates that there is a critical issue either with the network communications, or the NF where the audit notifications point to, that needs to be addressed immediately. |
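The three AUDIT_STALE_NOTIFY_ERROR_RESPONSE alerts compute the share of audit notification responses, over 24 hours, whose response_code is a 5xx, a 4xx other than 404, or a timeout. The Python sketch below reproduces that classification and percentage. It uses the 20, 40, and 60 percent thresholds stated in the Minor, Major, and Critical summaries; the sample counts and function names are illustrative assumptions.

```python
import re
from typing import Dict, Optional

# Same matcher as the PromQL selector response_code=~"5..|4..|timeout".
# PromQL regex matchers are fully anchored, hence fullmatch below.
ERROR_CODE = re.compile(r"5..|4..|timeout")
# (threshold percentage, severity) as stated in the Minor, Major and Critical summaries.
AUDIT_THRESHOLDS = ((60.0, "Critical"), (40.0, "Major"), (20.0, "Minor"))


def is_error(code: str) -> bool:
    """5xx, 4xx other than 404, or a timeout."""
    return code != "404" and ERROR_CODE.fullmatch(code) is not None


def audit_error_percentage(counts: Dict[str, float]) -> float:
    """counts: 24-hour increase of ocbsf_query_response_count_total per response_code."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    errors = sum(value for code, value in counts.items() if is_error(code))
    return errors * 100 / total


def audit_severity(percentage: float) -> Optional[str]:
    """Most severe alert whose threshold the percentage meets, if any."""
    for threshold, severity in AUDIT_THRESHOLDS:
        if percentage >= threshold:
            return severity
    return None


if __name__ == "__main__":
    day = {"204": 50, "404": 10, "500": 20, "timeout": 10, "429": 10}
    pct = audit_error_percentage(day)
    print(pct, audit_severity(pct))
```

In the sample counts, the 404 responses add to the total but are not treated as errors, matching the `response_code!="404"` matcher.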
5.1.67 BSF_CONNECTION_FAILURE
Table 5-81 BSF_CONNECTION_FAILURE
| Field | Details |
|---|---|
| Description | Connection failure on Egress and Ingress Gateways for incoming and outgoing connections. |
| Summary | Connection failure on Egress and Ingress Gateways for incoming and outgoing connections. |
| Severity | Major |
| Expression | sum(increase(ocbsf_oc_ingressgateway_connection_failure_total[5m]) >0 or (ocbsf_oc_ingressgateway_connection_failure_total unless ocbsf_oc_ingressgateway_connection_failure_total offset 5m )) by (namespace,app, error_reason) > 0 or sum(increase(ocbsf_oc_egressgateway_connection_failure_total[5m]) >0 or (ocbsf_oc_egressgateway_connection_failure_total unless ocbsf_oc_egressgateway_connection_failure_total offset 5m )) by (namespace,app, error_reason) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.43 |
| Metric Used | ocbsf_oc_ingressgateway_connection_failure_total, ocbsf_oc_egressgateway_connection_failure_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
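The BSF_CONNECTION_FAILURE expression combines two conditions for each (namespace, app, error_reason) group: `increase(...[5m]) > 0` catches a counter that grew during the last 5 minutes, and `x unless x offset 5m` keeps a series that did not exist 5 minutes earlier. The second branch covers a failure counter that appears for the first time, because increase() only measures growth between samples inside the window and does not count the first value of a new series. The Python sketch below restates that logic for a single series; it is a simplification that ignores counter resets and the extrapolation increase() performs, and the function name is hypothetical.

```python
def connection_failure_detected(value_now, value_5m_ago):
    """Simplified per-series view of the BSF_CONNECTION_FAILURE condition.

    value_now:    current counter value of the series.
    value_5m_ago: counter value 5 minutes earlier, or None if the series
                  did not exist yet (the "unless ... offset 5m" branch).
    Counter resets and increase() extrapolation are deliberately ignored.
    """
    if value_5m_ago is None:
        # New series: "x unless x offset 5m" returns the current value,
        # and the outer "sum(...) > 0" keeps it only if it is positive.
        return value_now > 0
    # Existing series: "increase(x[5m]) > 0".
    return value_now - value_5m_ago > 0


print(connection_failure_detected(3, None))  # new series that already counts failures
print(connection_failure_detected(5, 5))     # existing series with no new failures
```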
5.1.68 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
Table 5-82 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
| Field | Details |
|---|---|
| Description | 'BSF Ingress Gateway Data Director unreachable for {{$labels.namespace}}' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Ingress Gateway Data Director unreachable' |
| Severity | Major |
| Expression | sum(oc_ingressgateway_dd_unreachable) by(namespace,container) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.48 |
| Metric Used | oc_ingressgateway_dd_unreachable |
| Recommended Actions | The alert clears automatically when the connection with the Data Director is established. |
5.1.69 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
Table 5-83 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR
| Field | Details |
|---|---|
| Description | 'BSF Egress Gateway Data Director unreachable for {{$labels.namespace}}' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Egress Gateway Data Director unreachable' |
| Severity | Major |
| Expression | sum(oc_egressgateway_dd_unreachable) by(namespace,container) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.49 |
| Metric Used | oc_egressgateway_dd_unreachable |
| Recommended Actions | The alert clears automatically when the connection with the Data Director is established. |
5.1.70 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
Table 5-84 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR
| Field | Details |
|---|---|
| Description | Diam-gw certificate expiry in less than 6 months for {{$labels.namespace}} |
| Summary | Diam-gw certificate expiry in less than 6 months |
| Severity | Minor |
| Expression | dgw_tls_cert_expiration_seconds - time() <= 15724800 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.47 |
| Metric Used | dgw_tls_cert_expiration_seconds |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.71 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
Table 5-85 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR
| Field | Details |
|---|---|
| Description | Diam-gw certificate expiry in less than 3 months for {{$labels.namespace}}. |
| Summary | Diam-gw certificate expiry in less than 3 months. |
| Severity | Major |
| Expression | dgw_tls_cert_expiration_seconds - time() <= 7862400 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.47 |
| Metric Used | dgw_tls_cert_expiration_seconds |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.72 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
Table 5-86 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL
| Field | Details |
|---|---|
| Description | Diam-gw certificate expiry in less than a month for {{$labels.namespace}}. |
| Summary | Diam-gw certificate expiry in less than a month. |
| Severity | Critical |
| Expression | dgw_tls_cert_expiration_seconds - time() <= 2592000 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.47 |
| Metric Used | dgw_tls_cert_expiration_seconds |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
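The thresholds in the DIAM_GATEWAY_CERTIFICATE_EXPIRY expressions are given in seconds: 15724800 s is 182 days, 7862400 s is 91 days, and 2592000 s is 30 days. The CERTIFICATE_EXPIRY alerts later in this section use the same three values with security_cert_x509_expiration_seconds. Because each severity is a separate expression, a certificate with 30 days or less remaining satisfies all three. The Python sketch below, with a hypothetical function name, illustrates the arithmetic by returning the most severe matching level for a given expiry timestamp; it is not BSF code.

```python
SECONDS_PER_DAY = 86_400

# (severity, threshold in seconds) from the documented expressions,
# ordered from most to least severe.
EXPIRY_THRESHOLDS = [
    ("CRITICAL", 2_592_000),   # 30 days
    ("MAJOR", 7_862_400),      # 91 days
    ("MINOR", 15_724_800),     # 182 days
]


def expiry_severity(expiration_epoch_s, now_epoch_s):
    """Most severe level for which `expiration - now <= threshold`, or None."""
    remaining = expiration_epoch_s - now_epoch_s
    for severity, threshold in EXPIRY_THRESHOLDS:
        if remaining <= threshold:
            return severity
    return None


for _, threshold in EXPIRY_THRESHOLDS:
    print(threshold // SECONDS_PER_DAY)  # 30, 91, 182

now = 1_700_000_000
print(expiry_severity(now + 60 * SECONDS_PER_DAY, now))  # MAJOR
```

An already expired certificate has a negative remaining time and therefore still matches every threshold.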
5.1.73 DGW_TLS_CONNECTION_FAILURE
Table 5-87 DGW_TLS_CONNECTION_FAILURE
| Field | Details |
|---|---|
| Description | Alert for TLS connection establishment failures. |
| Summary | TLS connection failure when the Diameter Gateway is the initiator. |
| Severity | Major |
| Expression | sum by (namespace,reason)(ocbsf_diam_failed_conn_network) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.81 |
| Metric Used | ocbsf_diam_failed_conn_network |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.74 BINDING_REVALIDATION_PCF_BINDING_MISSING_MINOR
Table 5-88 BINDING_REVALIDATION_PCF_BINDING_MISSING_MINOR
| Field | Details |
|---|---|
| Description | At least 30% but less than 50% of all binding revalidation records in the last 5 minutes have a missing PCF binding. |
| Summary | At least 30% but less than 50% of all binding revalidation records in the last 5 minutes have a missing PCF binding. |
| Severity | Minor |
| Expression | (sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 30 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.51 |
| Metric Used | ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total |
| Recommended Actions | Check BSF Management service health history. Increase binding audit frequency. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.75 BINDING_REVALIDATION_PCF_BINDING_MISSING_MAJOR
Table 5-89 BINDING_REVALIDATION_PCF_BINDING_MISSING_MAJOR
| Field | Details |
|---|---|
| Description | At least 50% but less than 70% of all binding revalidation records in the last 5 minutes have a missing PCF binding. |
| Summary | At least 50% but less than 70% of all binding revalidation records in the last 5 minutes have a missing PCF binding. |
| Severity | Major |
| Expression | (sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 50 < 70 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.51 |
| Metric Used | ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total |
| Recommended Actions | Check BSF Management service health history. Increase binding audit frequency. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.76 BINDING_REVALIDATION_PCF_BINDING_MISSING_CRITICAL
Table 5-90 BINDING_REVALIDATION_PCF_BINDING_MISSING_CRITICAL
| Field | Details |
|---|---|
| Description | At least 70% of all binding revalidation records in the last 5 minutes have a missing PCF binding. |
| Summary | At least 70% of all binding revalidation records in the last 5 minutes have a missing PCF binding. |
| Severity | Critical |
| Expression | (sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 70 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.51 |
| Metric Used | ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total |
| Recommended Actions | Check BSF Management service health history. Increase binding audit frequency. For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
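The three BINDING_REVALIDATION_PCF_BINDING_MISSING expressions define non-overlapping bands. PromQL comparison operators without the `bool` modifier act as filters, so `... >= 30 < 50` keeps only values in the range [30, 50), and at most one of the three alerts is active for a namespace at a time. The Python sketch below restates the banding with a hypothetical function name, assuming the two inputs are the per-namespace rates (or counts over the same 5-minute window) of the two metrics in the expression.

```python
def revalidation_missing_severity(missing, responses):
    """Map the PCF-binding-missing percentage to the alert band it falls in.

    missing:   rate (or 5m count) of ocbsf_binding_revalidation_pcfBinding_missing_total
    responses: rate (or 5m count) of ocbsf_binding_revalidation_response_total
    """
    if responses == 0:
        # PromQL: x/0 is +Inf (matches >= 70) when x > 0, and NaN when x == 0.
        return "CRITICAL" if missing > 0 else None
    pct = missing * 100 / responses
    if pct >= 70:
        return "CRITICAL"  # >= 70
    if pct >= 50:
        return "MAJOR"     # >= 50 < 70
    if pct >= 30:
        return "MINOR"     # >= 30 < 50
    return None


print(revalidation_missing_severity(45, 100))  # MINOR
```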
5.1.77 BSF_STATE_NON_FUNCTIONAL_CRITICAL
Table 5-91 BSF_STATE_NON_FUNCTIONAL_CRITICAL
| Field | Details |
|---|---|
| Description | BSF is in a non-functional state because the DB cluster is down. |
| Summary | BSF is in a non-functional state because the DB cluster is down. |
| Severity | Critical |
| Expression | appinfo_nfDbFunctionalState_current{nfDbFunctionalState="Not_Running"} == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.56 |
| Metric Used | appinfo_nfDbFunctionalState_current |
| Recommended Actions | Cause: The alert is raised because the BSF network function is non-functional due to the database cluster being down. Diagnostic Information: System monitoring indicates that the database cluster state is "Not Running" and the cluster is unreachable, preventing the BSF network function from operating normally. Recovery: Check and restore the database cluster to a running state. After recovery, verify that the BSF network function returns to operational status. Escalate to database administration if the issue persists. |
5.1.78 POD_PENDING_REQUEST_CONGESTION_L1
Table 5-92 POD_PENDING_REQUEST_CONGESTION_L1
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L1 for resource type queue. |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L1 for resource type queue. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="queue"} == 2 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.54 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | Cause: Pending HTTP request queue reached CONGESTION_L1. Diagnostic Information: pod_resource_congestion_state{resourceType="queue"} == 2; verify pod_resource_stress queue values; review rejections by requestUri/requestMethod at L1. Recovery: Ensure L1 queue thresholds are correct; if queues grow, raise the L1 discard priority to reject lower-priority requests earlier. |
5.1.79 POD_PENDING_REQUEST_CONGESTION_L2
Table 5-93 POD_PENDING_REQUEST_CONGESTION_L2
| Field | Details |
|---|---|
| Description | Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L2 for resource type queue. |
| Summary | Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L2 for resource type queue. |
| Severity | Critical |
| Expression | occnp_pod_resource_congestion_state{type="queue"} == 3 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.55 |
| Metric Used | occnp_pod_resource_congestion_state |
| Recommended Actions | Cause: Pending HTTP request queue reached CONGESTION_L2. Diagnostic Information: pod_resource_congestion_state{resourceType="queue"} == 3; examine queue stress trends and rejection counters for low-priority traffic at L2. Recovery: Increase shedding at L2 (raise discard priority), and review queue thresholds to prevent saturation. |
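The two POD_PENDING_REQUEST_CONGESTION expressions compare occnp_pod_resource_congestion_state{type="queue"} with exact values (2 for L1, 3 for L2) rather than with `>=`, so a pod in L2 raises only the L2 alert. The Python sketch below, with hypothetical names, shows that mapping; other state values are not described in this section and are left unmapped.

```python
# Exact state values used by the documented expressions.
QUEUE_CONGESTION_ALERTS = {
    2: "POD_PENDING_REQUEST_CONGESTION_L1",
    3: "POD_PENDING_REQUEST_CONGESTION_L2",
}


def queue_congestion_alert(state, resource_type):
    """Alert name raised for one congestion-state sample, or None."""
    if resource_type != "queue":
        return None  # both expressions filter on type="queue"
    return QUEUE_CONGESTION_ALERTS.get(state)


print(queue_congestion_alert(3, "queue"))  # POD_PENDING_REQUEST_CONGESTION_L2
```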
5.1.80 AUDIT_NOT_RUNNING
Table 5-94 AUDIT_NOT_RUNNING
| Field | Details |
|---|---|
| Description | Audit has not been running for at least 1 hour in pod {{$labels.pod}}. |
| Summary | Audit has been stuck in an unhealthy state for over 1 hour. |
| Severity | Critical |
| Expression | (increase(data_repository_invocations_seconds_count{method="getQueuedTablesToAudit",state="SUCCESS"}[1h])) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.45 |
| Metric Used | data_repository_invocations_seconds_count |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.81 DIAMETER_POD_ERROR_RESPONSE_MINOR
Table 5-95 DIAMETER_POD_ERROR_RESPONSE_MINOR
| Field | Details |
|---|---|
| Description | At least 1% of the Diameter responses in pod {{$labels.pod}} failed with error DIAMETER_UNABLE_TO_DELIVER in the last 2 minutes. |
| Summary | At least 1% of the Diameter responses failed with error DIAMETER_UNABLE_TO_DELIVER. |
| Severity | Minor |
| Expression | (sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m]))) * 100 >=1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.46 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.82 DIAMETER_POD_ERROR_RESPONSE_MAJOR
Table 5-96 DIAMETER_POD_ERROR_RESPONSE_MAJOR
| Field | Details |
|---|---|
| Description | At least 5% of the Diameter responses in pod {{$labels.pod}} failed with error DIAMETER_UNABLE_TO_DELIVER in the last 2 minutes. |
| Summary | At least 5% of the Diameter responses failed with error DIAMETER_UNABLE_TO_DELIVER. |
| Severity | Major |
| Expression | (sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m]))) * 100 >=5 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.46 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.83 DIAMETER_POD_ERROR_RESPONSE_CRITICAL
Table 5-97 DIAMETER_POD_ERROR_RESPONSE_CRITICAL
| Field | Details |
|---|---|
| Description | At least 10% of the Diameter responses in pod {{$labels.pod}} failed with error DIAMETER_UNABLE_TO_DELIVER in the last 2 minutes. |
| Summary | At least 10% of the Diameter responses failed with error DIAMETER_UNABLE_TO_DELIVER. |
| Severity | Critical |
| Expression | (sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m]))) * 100 >=10 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.46 |
| Metric Used | ocbsf_diam_response_network_total |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
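The three DIAMETER_POD_ERROR_RESPONSE expressions share one ratio, the per-pod share of Diameter responses with responseCode 3002 over a 2-minute window, and differ only in the threshold (1, 5, and 10 percent). Because each threshold is a plain `>=` comparison with no upper bound, a pod at 10% or more raises all three alerts at once. The Python sketch below illustrates this; the function name is hypothetical and the input is assumed to be one pod's response counts keyed by result code.

```python
# DIAMETER_UNABLE_TO_DELIVER result code, as used in the expressions.
UNABLE_TO_DELIVER = "3002"
THRESHOLDS = {"MINOR": 1, "MAJOR": 5, "CRITICAL": 10}


def firing_diameter_alerts(responses_by_code):
    """Severities whose expression holds for one pod's response counts."""
    total = sum(responses_by_code.values())
    if total == 0:
        return []  # 0/0 is NaN in PromQL, so no expression matches
    pct = responses_by_code.get(UNABLE_TO_DELIVER, 0) * 100 / total
    return [sev for sev, limit in THRESHOLDS.items() if pct >= limit]


print(firing_diameter_alerts({"2001": 94, "3002": 6}))   # ['MINOR', 'MAJOR']
print(firing_diameter_alerts({"2001": 90, "3002": 10}))  # ['MINOR', 'MAJOR', 'CRITICAL']
```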
5.1.84 BSF_PCF_BINDING_TABLE_MIGRATED_PERCENTAGE
Table 5-98 BSF_PCF_BINDING_TABLE_MIGRATED_PERCENTAGE
| Field | Details |
|---|---|
| Description | The PCF binding table migration configuration should be updated to use only the PCF binding v2 table. |
| Summary | The PCF binding table migration configuration should be updated to use only the PCF binding v2 table. |
| Severity | Minor |
| Expression | sum by (siteId) (ocbsf_binding_record_migrated_percentage{microservice="bsf-management-service"}) / count by (siteId) (ocbsf_binding_record_migrated_percentage{microservice="bsf-management-service"}) == 100 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.57 |
| Metric Used | ocbsf_binding_record_migrated_percentage |
| Recommended Actions | Cause: The alert is raised because all BSF PCF binding records in the legacy v1 table on the current site have been migrated to the PCF binding v2 table. Diagnostic Information: Verify that the pcf_binding table is empty and change the Advanced Settings value PCF_BINDING_TABLE_LOOKUP to 3. Recovery: No recovery steps are needed; the alert only indicates that the migration is complete. The alert is cleared after 24 hours. |
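The BSF_PCF_BINDING_TABLE_MIGRATED_PERCENTAGE expression divides the sum of ocbsf_binding_record_migrated_percentage by the number of reporting series for each siteId, which is the average across the bsf-management-service series on that site. Assuming each series reports a value between 0 and 100, the average equals 100 only when every series reports 100. A small Python sketch with a hypothetical function name and input format:

```python
from collections import defaultdict


def fully_migrated_sites(samples):
    """Sites whose average migrated percentage is exactly 100.

    samples: iterable of (site_id, migrated_percentage) pairs, one per
    bsf-management-service series (hypothetical input format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for site_id, percentage in samples:
        totals[site_id] += percentage
        counts[site_id] += 1
    # Mirrors: sum by (siteId) (...) / count by (siteId) (...) == 100
    return sorted(site for site in totals if totals[site] / counts[site] == 100)


print(fully_migrated_sites([("site1", 100), ("site1", 100), ("site2", 100), ("site2", 80)]))
# ['site1']
```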
5.1.85 BSF_PCF_BINDING_TABLE_MIGRATION_INVALID_CONFIGURATION
Table 5-99 BSF_PCF_BINDING_TABLE_MIGRATION_INVALID_CONFIGURATION
| Field | Details |
|---|---|
| Description | The PCF binding table migration configuration should be reviewed and updated to a valid configuration. Invalid configurations: {{$labels.incompatibleFeatures}}. |
| Summary | An invalid PCF binding table migration configuration was set; the latest valid values are used. |
| Severity | Critical |
| Expression | ocbsf_feature_incompatibility == 1 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.58 |
| Metric Used | ocbsf_feature_incompatibility |
| Recommended Actions | Cause: The alert is raised because the current configuration for the Remove Index Based Lookup feature has an incorrect combination of ENABLE_PCF_BINDING_TABLE_MIGRATION and PCF_BINDING_TABLE_LOOKUP_VALUE. Diagnostic Information: Verify that the configuration is valid according to the following rules: Recovery: The alert is cleared once the configuration is updated to a valid configuration. |
5.1.86 CERTIFICATE_EXPIRY_MINOR
Table 5-100 CERTIFICATE_EXPIRY_MINOR
| Field | Details |
|---|---|
| Description | Certificate expiry in less than 6 months for {{$labels.namespace}} |
| Summary | Certificate expiry in less than 6 months |
| Severity | Minor |
| Expression | security_cert_x509_expiration_seconds - time() <= 15724800 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.44 |
| Metric Used | security_cert_x509_expiration_seconds |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.87 CERTIFICATE_EXPIRY_MAJOR
Table 5-101 CERTIFICATE_EXPIRY_MAJOR
| Field | Details |
|---|---|
| Description | Certificate expiry in less than 3 months for {{$labels.namespace}} |
| Summary | Certificate expiry in less than 3 months. |
| Severity | Major |
| Expression | security_cert_x509_expiration_seconds - time() <= 7862400 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.44 |
| Metric Used | security_cert_x509_expiration_seconds |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |
5.1.88 CERTIFICATE_EXPIRY_CRITICAL
Table 5-102 CERTIFICATE_EXPIRY_CRITICAL
| Field | Details |
|---|---|
| Description | Certificate expiry in less than a month for {{$labels.namespace}} |
| Summary | Certificate expiry in less than a month. |
| Severity | Critical |
| Expression | security_cert_x509_expiration_seconds - time() <= 2592000 |
| OID | 1.3.6.1.4.1.323.5.3.37.1.2.44 |
| Metric Used | security_cert_x509_expiration_seconds |
| Recommended Actions | For any additional guidance, contact My Oracle Support (https://support.oracle.com). |