BSF Alerts

Alert Level / Severity Type	Definition
Critical	Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that can affect the BSF service.
Major	Indicates a significant issue that affects operations or poses a moderate risk. It requires prompt attention and action to prevent potential escalation. Raised for conditions that can affect the BSF service.
Minor	Indicates a low-severity situation that does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that can affect the BSF service.
Info or Warn (Informational)	Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of BSF.

5.1.1 AAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Table 5-2 AAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD

Field	Details
Description	AAA Rx fail count exceeds the critical threshold limit.
Summary	AAA Rx fail count exceeds the critical threshold limit.
Severity	CRITICAL
Expression	sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 90
OID	1.3.6.1.4.1.323.5.3.37.1.2.40
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

5.1.2 AAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Table 5-3 AAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD

Field	Details
Description	AAA Rx fail count exceeds the major threshold limit.
Summary	AAA Rx fail count exceeds the major threshold limit.
Severity	MAJOR
Expression	sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 <= 90 and sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 80
OID	1.3.6.1.4.1.323.5.3.37.1.2.40
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support.

5.1.3 AAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Table 5-4 AAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD

Field	Details
Description	AAA Rx fail count exceeds the minor threshold limit.
Summary	AAA Rx fail count exceeds the minor threshold limit.
Severity	MINOR
Expression	sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 <= 80 and sum by(namespace)(rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236", responseCode!~"2.*"}[5m]) / rate(ocbsf_diam_response_network_total{msgType="AAA", appId="16777236"}[5m])) * 100 > 60
OID	1.3.6.1.4.1.323.5.3.37.1.2.40
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support.
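The three AAA Rx expressions use non-overlapping ranges (above 90, above 80 up to 90, above 60 up to 80), so at most one of them fires for a namespace at a time. The following Python sketch illustrates that banding, assuming each expression evaluates to the percentage of AAA answers whose result code does not start with 2. The function names are hypothetical and are not part of BSF.

```python
from typing import Optional


def aaa_rx_fail_percentage(failed_rate: float, total_rate: float) -> Optional[float]:
    """Return the failed share of AAA answers as a percentage.

    Returns None when there is no AAA traffic: PromQL evaluates 0/0 as NaN,
    and NaN never satisfies a comparison, so no alert fires in that case.
    """
    if total_rate == 0:
        return None
    return failed_rate * 100 / total_rate


def aaa_rx_alert(percentage: Optional[float]) -> Optional[str]:
    """Return the AAA Rx alert whose threshold band contains the percentage."""
    if percentage is None:
        return None
    if percentage > 90:
        return "AAA_RX_FAIL_COUNT_EXCEEDS_CRITICAL_THRESHOLD"
    if percentage > 80:  # 80 < percentage <= 90
        return "AAA_RX_FAIL_COUNT_EXCEEDS_MAJOR_THRESHOLD"
    if percentage > 60:  # 60 < percentage <= 80
        return "AAA_RX_FAIL_COUNT_EXCEEDS_MINOR_THRESHOLD"
    return None
```

For example, 45 failed answers per second out of 50 gives exactly 90%, which falls in the major band because the critical expression uses a strict `> 90`.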

5.1.4 SCP_PEER_UNAVAILABLE

Table 5-5 SCP_PEER_UNAVAILABLE

Field	Details
Description	Configured SCP peer is unavailable.
Summary	SCP peer [ {{$labels.peer}} ] is unavailable.
Severity	Major
Expression	ocbsf_oc_egressgateway_peer_health_status == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.38
Metric Used	ocbsf_oc_egressgateway_peer_health_status
Recommended Actions	This alert gets cleared when unavailable SCPs become available. For any additional guidance, contact My Oracle Support.

5.1.5 SCP_PEER_SET_UNAVAILABLE

Table 5-6 SCP_PEER_SET_UNAVAILABLE

Field	Details
Description	None of the SCP peers in the configured peer set is available.
Summary	{{ $value }} SCP peers under peer set {{$labels.peerset}} are currently unavailable.
Severity	Critical
Expression	(ocbsf_oc_egressgateway_peer_count > 0 and (ocbsf_oc_egressgateway_peer_available_count) == 0)
OID	1.3.6.1.4.1.323.5.3.37.1.2.39
Metric Used	ocbsf_oc_egressgateway_peer_count and ocbsf_oc_egressgateway_peer_available_count
Recommended Actions	The NF clears the critical alarm as soon as at least one SCP peer in the peer set becomes available, even if all the other SCP peers in that peer set are still unavailable. For any additional guidance, contact My Oracle Support.
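The two SCP expressions can be read as simple predicates. This Python sketch restates them, assuming each peer set is reported as one `ocbsf_oc_egressgateway_peer_count` series and one `ocbsf_oc_egressgateway_peer_available_count` series with otherwise identical labels (PromQL `and` only keeps left-hand series that have a matching right-hand series). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeerSetSample:
    """One evaluation of the egress gateway peer metrics for a peer set."""
    peerset: str
    peer_count: int       # value of ocbsf_oc_egressgateway_peer_count
    available_count: int  # value of ocbsf_oc_egressgateway_peer_available_count


def scp_peer_unavailable(health_status: int) -> bool:
    """SCP_PEER_UNAVAILABLE condition: peer_health_status == 1."""
    return health_status == 1


def scp_peer_set_unavailable(sample: PeerSetSample) -> bool:
    """SCP_PEER_SET_UNAVAILABLE condition: peers configured, none available."""
    return sample.peer_count > 0 and sample.available_count == 0
```

A peer set with three configured peers and one available peer does not satisfy the condition, which is why the alarm clears as soon as any single peer recovers.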

5.1.6 BSF_SERVICES_DOWN

Table 5-7 BSF_SERVICES_DOWN

Field	Details
Description	{{$labels.service}} service is not running!
Summary	{{$labels.service}} is not running!
Severity	Critical
Expression	appinfo_service_running{application="ocbsf"} != 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.1
Metric Used	appinfo_service_running
Recommended Actions	Perform the following steps: (1) Check for service-specific alerts that may be causing issues with service exposure. (2) Verify that the pod is in the Running state by using the following command: `kubectl -n <namespace> get pod` (3) If the output shows any pod that is not running, copy the pod name and run the following command: `kubectl describe pod <podname> -n <namespace>` (4) Check the application logs on Kibana for database-related failures, such as connectivity problems or invalid secrets. The logs can be filtered by service. (5) Check the Helm status to ensure that no errors are present by using the following command: `helm status <release-name> -n <namespace>` If the release is not in `STATUS: DEPLOYED`, capture the logs and events again. (6) If the issue persists, capture the outputs of the preceding steps and contact My Oracle Support.
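In namespaces with many pods, the pod check described above can be scripted. The following Python sketch parses the text printed by `kubectl -n <namespace> get pod` and lists pods whose STATUS is not `Running` or whose READY count is incomplete. It assumes kubectl's default column layout (NAME, READY, and STATUS before any column that can contain spaces); the function name and the pod names in the test data are hypothetical.

```python
from typing import List, Tuple


def pods_needing_attention(kubectl_output: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for pods that are not fully Running.

    Assumes the default table printed by `kubectl -n <namespace> get pod`,
    whose header contains the NAME, READY and STATUS columns.
    """
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_idx = header.index("NAME")
    ready_idx = header.index("READY")
    status_idx = header.index("STATUS")
    flagged = []
    for line in lines[1:]:
        cols = line.split()
        name, ready, status = cols[name_idx], cols[ready_idx], cols[status_idx]
        ready_now, ready_total = ready.split("/")  # e.g. "0/1"
        if status != "Running" or ready_now != ready_total:
            flagged.append((name, ready, status))
    return flagged
```

Pods flagged this way are the candidates for `kubectl describe pod <podname> -n <namespace>`. Pods in the `Completed` state are also flagged, so review the list before acting on it.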

5.1.7 BSF_TRAFFIC_RATE_ABOVE_MINOR_THRESHOLD

Table 5-8 BSF_TRAFFIC_RATE_ABOVE_MINOR_THRESHOLD

Field	Details
Description	BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }}). The total Binding Management service Ingress traffic rate has crossed the configured threshold of 700 TPS. By default, the `BSF_Alertrules.yaml` file triggers this alert when the Binding Management service Ingress rate crosses 70% of the maximum ingress requests per second.
Summary	Traffic Rate is above 70 Percent of Max requests per second(1000)
Severity	Minor
Expression	sum(rate(ocbsf_ingress_request_total[2m])) >= 700
OID	1.3.6.1.4.1.323.5.3.37.1.2.2
Metric Used	ocbsf_ingress_request_total
Recommended Actions	The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. It is recommended to assess the reason for the additional traffic. Perform the following steps to analyze the cause of increased traffic: Refer to the Ingress Gateway section in Grafana to check for an increase in 4xx and 5xx error response codes. Check the Ingress Gateway logs on Kibana to determine the reason for the errors. For any assistance, contact My Oracle Support.

5.1.8 BSF_TRAFFIC_RATE_ABOVE_MAJOR_THRESHOLD

Table 5-9 BSF_TRAFFIC_RATE_ABOVE_MAJOR_THRESHOLD

Field	Details
Description	BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }})
Summary	Traffic Rate is above 80 Percent of Max requests per second(1000)
Severity	Major
Expression	sum(rate(ocbsf_ingress_request_total[2m])) >= 800
OID	1.3.6.1.4.1.323.5.3.37.1.2.2
Metric Used	ocbsf_ingress_request_total
Recommended Actions	The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. It is recommended to assess the reason for the additional traffic. Perform the following steps to analyze the cause of increased traffic: Refer to the Ingress Gateway section in Grafana to check for an increase in 4xx and 5xx error response codes. Check the Ingress Gateway logs on Kibana to determine the reason for the errors. For any assistance, contact My Oracle Support.

5.1.9 BSF_TRAFFIC_RATE_ABOVE_CRITICAL_THRESHOLD

Table 5-10 BSF_TRAFFIC_RATE_ABOVE_CRITICAL_THRESHOLD

Field	Details
Description	BSF service Ingress traffic Rate is above threshold of Max MPS(1000) (current value is: {{ $value }})
Summary	Traffic Rate is above 90 Percent of Max requests per second(1000)
Severity	Critical
Expression	sum(rate(ocbsf_ingress_request_total[2m])) >= 900
OID	1.3.6.1.4.1.323.5.3.37.1.2.2
Metric Used	ocbsf_ingress_request_total
Recommended Actions	The alert gets cleared when the Ingress traffic rate falls below the threshold. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. It is recommended to assess the reason for the additional traffic. Perform the following steps to analyze the cause of increased traffic: Refer to the Ingress Gateway section in Grafana to check for an increase in 4xx and 5xx error response codes. Check the Ingress Gateway logs on Kibana to determine the reason for the errors. For any assistance, contact My Oracle Support.
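The traffic-rate thresholds are fixed fractions of the 1000 requests per second quoted in the alert text: 70% gives 700, 80% gives 800, and 90% gives 900. Unlike the AAA Rx expressions, these are not bounded above, so several can be true at once. The Python sketch below shows which expressions hold for a given value of `sum(rate(ocbsf_ingress_request_total[2m]))`. The names `MAX_MPS` and `firing_traffic_alerts` are hypothetical, and whether lower-severity alerts are suppressed depends on your own Alertmanager configuration.

```python
from typing import List

MAX_MPS = 1000  # maximum ingress requests per second quoted in the alert text

# (alert name, percentage of MAX_MPS) as used by the three expressions
TRAFFIC_RATE_ALERTS = [
    ("BSF_TRAFFIC_RATE_ABOVE_MINOR_THRESHOLD", 70),
    ("BSF_TRAFFIC_RATE_ABOVE_MAJOR_THRESHOLD", 80),
    ("BSF_TRAFFIC_RATE_ABOVE_CRITICAL_THRESHOLD", 90),
]


def firing_traffic_alerts(ingress_rate: float, max_mps: int = MAX_MPS) -> List[str]:
    """Return every traffic-rate alert whose `>=` condition holds.

    Each rule is evaluated independently, so a rate above the critical
    threshold also satisfies the major and minor expressions.
    """
    return [
        name
        for name, percent in TRAFFIC_RATE_ALERTS
        if ingress_rate >= max_mps * percent / 100
    ]
```

At 950 requests per second all three expressions are true; at 650 none is.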

5.1.10 BINDING_QUERY_RESPONSE_ERROR_MINOR

Table 5-11 BINDING_QUERY_RESPONSE_ERROR_MINOR

Field	Details
Description	At least 30% of the Binding Query connection requests failed.
Summary	At least 30% of the Binding Query requests failed.
Severity	Minor
Expression	(sum(rate(ocbsf_bindingQuery_response_total{response_code!~"2.*"}[10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 30
OID	1.3.6.1.4.1.323.5.3.37.1.2.36
Metric Used	ocbsf_bindingQuery_response_total
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.11 BINDING_QUERY_RESPONSE_ERROR_MAJOR

Table 5-12 BINDING_QUERY_RESPONSE_ERROR_MAJOR

Field	Details
Description	At least 50% of the Binding Query connection requests failed.
Summary	At least 50% of the Binding Query requests failed.
Severity	Major
Expression	(sum(rate(ocbsf_bindingQuery_response_total{response_code!~"2.*"}[10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 50
OID	1.3.6.1.4.1.323.5.3.37.1.2.36
Metric Used	ocbsf_bindingQuery_response_total
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.12 BINDING_QUERY_RESPONSE_ERROR_CRITICAL

Table 5-13 BINDING_QUERY_RESPONSE_ERROR_CRITICAL

Field	Details
Description	At least 70% of the Binding Query connection requests failed.
Summary	At least 70% of the Binding Query requests failed.
Severity	Critical
Expression	(sum(rate(ocbsf_bindingQuery_response_total{response_code!~"2.*"}[10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_bindingQuery_response_total[10m]))) * 100 >= 70
OID	1.3.6.1.4.1.323.5.3.37.1.2.36
Metric Used	ocbsf_bindingQuery_response_total
Recommended Actions	For any assistance, contact My Oracle Support.
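The `or (appinfo_service_running * 0)` term appears intended to keep the numerator defined when no non-2xx Binding Query responses have been recorded: if the error selector returns no series, `or` falls back to zero-valued series (as long as at least one `appinfo_service_running` series exists), so the ratio evaluates to 0 instead of returning no data. The Python sketch below models that behavior with an empty list and then applies the 30/50/70 thresholds; the function names are hypothetical.

```python
from typing import List, Optional

BINDING_QUERY_ALERTS = [
    ("BINDING_QUERY_RESPONSE_ERROR_MINOR", 30),
    ("BINDING_QUERY_RESPONSE_ERROR_MAJOR", 50),
    ("BINDING_QUERY_RESPONSE_ERROR_CRITICAL", 70),
]


def binding_query_error_percentage(
    non_2xx_rates: List[float], total_rate: float
) -> Optional[float]:
    """Percentage of Binding Query responses with a non-2xx response_code.

    An empty non_2xx_rates list models "no error series yet"; sum([]) == 0
    plays the role of the zero-valued fallback series.
    """
    if total_rate == 0:
        return None  # 0/0 is NaN in PromQL, so no threshold comparison is true
    return sum(non_2xx_rates) * 100 / total_rate


def firing_binding_query_alerts(percentage: Optional[float]) -> List[str]:
    """Each rule uses `>=`, so higher percentages satisfy several rules."""
    if percentage is None:
        return []
    return [name for name, limit in BINDING_QUERY_ALERTS if percentage >= limit]
```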

5.1.13 DIAM_RESPONSE_NETWORK_ERROR_MINOR

Table 5-14 DIAM_RESPONSE_NETWORK_ERROR_MINOR

Field	Details
Description	At least 20% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary	At least 20% of the Diam Response requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity	Minor
Expression	(sum(rate(ocbsf_diam_response_network_total{responseCode="3002"}[10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_diam_response_network_total[10m]))) * 100 >= 20
OID	1.3.6.1.4.1.323.5.3.37.1.2.35
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.14 DIAM_RESPONSE_NETWORK_ERROR_MAJOR

Table 5-15 DIAM_RESPONSE_NETWORK_ERROR_MAJOR

Field	Details
Description	At least 50% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary	At least 50% of the Diam Response requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity	Major
Expression	(sum(rate(ocbsf_diam_response_network_total{responseCode="3002"}[10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_diam_response_network_total[10m]))) * 100 >= 50
OID	1.3.6.1.4.1.323.5.3.37.1.2.35
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.15 DIAM_RESPONSE_NETWORK_ERROR_CRITICAL

Table 5-16 DIAM_RESPONSE_NETWORK_ERROR_CRITICAL

Field	Details
Description	At least 75% of the Diam Response connection requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Summary	At least 75% of the Diam Response requests failed with error 'DIAMETER_UNABLE_TO_DELIVER'.
Severity	Critical
Expression	(sum(rate(ocbsf_diam_response_network_total{responseCode="3002"}[10m]) or (appinfo_service_running * 0 ) ) / sum(rate(ocbsf_diam_response_network_total[10m]))) * 100 >= 75
OID	1.3.6.1.4.1.323.5.3.37.1.2.35
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any assistance, contact My Oracle Support.
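The denominator of the DIAM_RESPONSE_NETWORK_ERROR expressions is the rate of all Diameter responses, whatever their result code or message type, so the percentage is the share of 3002 (DIAMETER_UNABLE_TO_DELIVER) answers in the total. The Python sketch below computes that share from per-result-code rates; the dictionary keys mirror the `responseCode` label values, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

DIAMETER_UNABLE_TO_DELIVER = "3002"  # responseCode label value in the expression

DIAM_NETWORK_ALERTS = [
    ("DIAM_RESPONSE_NETWORK_ERROR_MINOR", 20),
    ("DIAM_RESPONSE_NETWORK_ERROR_MAJOR", 50),
    ("DIAM_RESPONSE_NETWORK_ERROR_CRITICAL", 75),
]


def unable_to_deliver_percentage(rates_by_code: Dict[str, float]) -> Optional[float]:
    """Share of all Diameter responses (every responseCode) that carry 3002."""
    total = sum(rates_by_code.values())
    if total == 0:
        return None
    return rates_by_code.get(DIAMETER_UNABLE_TO_DELIVER, 0.0) * 100 / total


def firing_diam_alerts(percentage: Optional[float]) -> List[str]:
    if percentage is None:
        return []
    return [name for name, limit in DIAM_NETWORK_ALERTS if percentage >= limit]
```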

5.1.16 DUPLICATE_BINDING_REQUEST_ERROR_MINOR

Table 5-17 DUPLICATE_BINDING_REQUEST_ERROR_MINOR

Field	Details
Description	At least 30% of the Binding Registration requests failed because they were duplicates.
Summary	At least 30% of the Binding Registration requests failed were duplicate failures.
Severity	Minor
Expression	(sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total{operation_type="register"}[10m]))) * 100 >= 30
OID	1.3.6.1.4.1.323.5.3.37.1.2.37
Metric Used	Metrics matching ocbsf_collision_detection.* and ocbsf_ingress_request_total
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.17 DUPLICATE_BINDING_REQUEST_ERROR_MAJOR

Table 5-18 DUPLICATE_BINDING_REQUEST_ERROR_MAJOR

Field	Details
Description	At least 50% of the Binding Registration requests failed because they were duplicates.
Summary	At least 50% of the Binding Registration requests failed were duplicate failures.
Severity	Major
Expression	(sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total{operation_type="register"}[10m]))) * 100 >= 50
OID	1.3.6.1.4.1.323.5.3.37.1.2.37
Metric Used	Metrics matching ocbsf_collision_detection.* and ocbsf_ingress_request_total
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.18 DUPLICATE_BINDING_REQUEST_ERROR_CRITICAL

Table 5-19 DUPLICATE_BINDING_REQUEST_ERROR_CRITICAL

Field	Details
Description	At least 70% of the Binding Registration requests failed because they were duplicates.
Summary	At least 70% of the Binding Registration requests failed were duplicate failures.
Severity	Critical
Expression	(sum(rate({__name__=~"ocbsf_collision_detection.*"}[10m]) or (appinfo_service_running * 0)) / sum(rate(ocbsf_ingress_request_total{operation_type="register"}[10m]))) * 100 >= 70
OID	1.3.6.1.4.1.323.5.3.37.1.2.37
Metric Used	Metrics matching ocbsf_collision_detection.* and ocbsf_ingress_request_total
Recommended Actions	For any assistance, contact My Oracle Support.
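The numerator of the DUPLICATE_BINDING_REQUEST_ERROR expressions selects series by metric name through Prometheus's special `__name__` label, so every metric whose name starts with `ocbsf_collision_detection` is included. Prometheus anchors label regular expressions at both ends, which Python's `re.fullmatch` reproduces. In this sketch the function names and the metric names in the test data are hypothetical placeholders, not documented BSF metrics.

```python
import re
from typing import Dict, Optional

COLLISION_METRIC_PATTERN = "ocbsf_collision_detection.*"


def matches_metric_name(pattern: str, metric_name: str) -> bool:
    """PromQL regex matchers are fully anchored, hence re.fullmatch."""
    return re.fullmatch(pattern, metric_name) is not None


def duplicate_binding_percentage(
    rates_by_metric: Dict[str, float], register_rate: float
) -> Optional[float]:
    """Collision-detection rate as a percentage of register requests."""
    if register_rate == 0:
        return None
    collisions = sum(
        rate
        for name, rate in rates_by_metric.items()
        if matches_metric_name(COLLISION_METRIC_PATTERN, name)
    )
    return collisions * 100 / register_rate
```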

5.1.19 INGRESS_TOTAL_ERROR_RATE_ABOVE_MINOR_THRESHOLD

Table 5-20 INGRESS_TOTAL_ERROR_RATE_ABOVE_MINOR_THRESHOLD

Field	Details
Description	Transaction Error Rate detected above 1 Percent of Total on BSF service (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 1 Percent of Total Transactions
Severity	Minor
Expression	(sum(rate(ocbsf_ingress_response_total{response_code!~"2.*"}[24h])) / sum(rate(ocbsf_ingress_response_total[24h]))) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.3
Metric Used	ocbsf_ingress_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 1% of the total transactions. For any assistance, contact My Oracle Support.

5.1.20 INGRESS_TOTAL_ERROR_RATE_ABOVE_MAJOR_THRESHOLD

Table 5-21 INGRESS_TOTAL_ERROR_RATE_ABOVE_MAJOR_THRESHOLD

Field	Details
Description	Transaction Error Rate detected above 5 Percent of Total on BSF service (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 5 Percent of Total Transactions
Severity	Major
Expression	(sum(rate(ocbsf_ingress_response_total{response_code!~"2.*"}[24h])) / sum(rate(ocbsf_ingress_response_total[24h]))) * 100 >= 5
OID	1.3.6.1.4.1.323.5.3.37.1.2.3
Metric Used	ocbsf_ingress_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 5% of the total transactions. For any assistance, contact My Oracle Support.

5.1.21 INGRESS_TOTAL_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD

Table 5-22 INGRESS_TOTAL_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD

Field	Details
Description	Transaction Error Rate detected above 10 Percent of Total on BSF service (current value is: {{ $value }})
Summary	Transaction Error Rate detected above 10 Percent of Total Transactions
Severity	Critical
Expression	(sum(rate(ocbsf_ingress_response_total{response_code!~"2.*"}[24h])) / sum(rate(ocbsf_ingress_response_total[24h]))) * 100 >= 10
OID	1.3.6.1.4.1.323.5.3.37.1.2.3
Metric Used	ocbsf_ingress_response_total
Recommended Actions	The alert gets cleared when the number of failed transactions is below 10% of the total transactions. For any assistance, contact My Oracle Support.
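The INGRESS_TOTAL_ERROR_RATE expressions treat every response whose `response_code` does not match `2.*` as an error. Because Prometheus regex matchers are fully anchored, `2.*` means "starts with 2", so 200, 201, and 204 are successes and everything else counts as a failure. The Python sketch below applies the same classification to per-code rates; the function names are hypothetical.

```python
import re
from typing import Dict, Optional


def is_error_response(response_code: str) -> bool:
    """response_code!~"2.*": any code that does not start with 2 is an error."""
    return re.fullmatch("2.*", response_code) is None


def ingress_error_percentage(rates_by_code: Dict[str, float]) -> Optional[float]:
    """Error responses as a percentage of all ingress responses."""
    total = sum(rates_by_code.values())
    if total == 0:
        return None
    errors = sum(rate for code, rate in rates_by_code.items() if is_error_response(code))
    return errors * 100 / total
```

Because the expressions use a `[24h]` range, `rate()` averages over the whole day: a short burst of errors keeps influencing the percentage for up to 24 hours, so these alerts can take a long time to clear.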

5.1.22 PCF_BINDING_ERROR_RATE_ABOVE_MINOR_THRESHOLD

Table 5-23 PCF_BINDING_ERROR_RATE_ABOVE_MINOR_THRESHOLD

Field	Details
Description	PCF Binding Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Minor
Expression	(sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|204",method="GET"}[24h])) / sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="GET"}[24h]))) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.5
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the GET method. For any assistance, contact My Oracle Support.

5.1.23 PCF_BINDING_ERROR_RATE_ABOVE_MAJOR_THRESHOLD

Table 5-24 PCF_BINDING_ERROR_RATE_ABOVE_MAJOR_THRESHOLD

Field	Details
Description	PCF Binding Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Major
Expression	(sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|204",method="GET"}[24h])) / sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="GET"}[24h]))) * 100 >= 5
OID	1.3.6.1.4.1.323.5.3.37.1.2.5
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 5% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the GET method. For any assistance, contact My Oracle Support.

5.1.24 PCF_BINDING_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD

Table 5-25 PCF_BINDING_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD

Field	Details
Description	PCF Binding Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	PCF Binding Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Critical
Expression	(sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|204",method="GET"}[24h])) / sum by (microservice,namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="GET"}[24h]))) * 100 >= 10
OID	1.3.6.1.4.1.323.5.3.37.1.2.5
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the GET method. For any assistance, contact My Oracle Support.

5.1.25 INGRESS_CREATE_ERROR_RATE_ABOVE_MINOR_THRESHOLD

Table 5-26 INGRESS_CREATE_ERROR_RATE_ABOVE_MINOR_THRESHOLD

Field	Details
Description	BSF Ingress Create Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Minor
Expression	sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|201",method="POST"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="POST"}[24h])) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.4
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the POST method. For any assistance, contact My Oracle Support.

5.1.26 INGRESS_CREATE_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD

Table 5-27 INGRESS_CREATE_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD

Field	Details
Description	BSF Ingress Create Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Critical
Expression	sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|201",method="POST"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="POST"}[24h])) * 100 >= 10
OID	1.3.6.1.4.1.323.5.3.37.1.2.4
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the POST method. For any assistance, contact My Oracle Support.

5.1.27 INGRESS_CREATE_ERROR_RATE_ABOVE_MAJOR_THRESHOLD

Table 5-28 INGRESS_CREATE_ERROR_RATE_ABOVE_MAJOR_THRESHOLD

Field	Details
Description	BSF Ingress Create Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	Transaction Create Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Major
Expression	sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!~"200|201",method="POST"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="POST"}[24h])) * 100 >= 5
OID	1.3.6.1.4.1.323.5.3.37.1.2.4
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 5% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the POST method. For any assistance, contact My Oracle Support.

5.1.28 INGRESS_DELETE_ERROR_RATE_ABOVE_MINOR_THRESHOLD

Table 5-29 INGRESS_DELETE_ERROR_RATE_ABOVE_MINOR_THRESHOLD

Field	Details
Description	Ingress Delete Error Rate above 1 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Minor
Expression	sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!="204",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h])) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.6
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 1% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the DELETE method. For any assistance, contact My Oracle Support.

5.1.29 INGRESS_DELETE_ERROR_RATE_ABOVE_MAJOR_THRESHOLD

Table 5-30 INGRESS_DELETE_ERROR_RATE_ABOVE_MAJOR_THRESHOLD

Field	Details
Description	Ingress Delete Error Rate above 5 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Major
Expression	sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!="204",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h])) * 100 >= 5
OID	1.3.6.1.4.1.323.5.3.37.1.2.6
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 5% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the DELETE method. For any assistance, contact My Oracle Support.

5.1.30 INGRESS_DELETE_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD

Table 5-31 INGRESS_DELETE_ERROR_RATE_ABOVE_CRITICAL_THRESHOLD

Field	Details
Description	Ingress Delete Error Rate above 10 Percent in {{$labels.microservice}} in {{$labels.namespace}}
Summary	Ingress Delete Error Rate in {{$labels.kubernetes_node}} (current value is: {{ $value }})
Severity	Critical
Expression	sum by(namespace)(rate(http_server_requests_seconds_count{microservice="bsf-management-service", status!="204",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h]) / rate(http_server_requests_seconds_count{microservice="bsf-management-service",method="DELETE", uri="/nbsf-management/v1/pcfBindings/{bindingId}"}[24h])) * 100 >= 10
OID	1.3.6.1.4.1.323.5.3.37.1.2.6
Metric Used	http_server_requests_seconds_count
Recommended Actions	The alert gets cleared when the number of failed transactions is below 10% of the total transactions. To assess the reason for failed transactions, check the service specific metrics for the DELETE method. For any assistance, contact My Oracle Support.
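The PCF binding, create, and delete error-rate alerts differ mainly in which HTTP statuses count as success for each method: 200 or 204 for GET, 200 or 201 for POST, and only 204 for DELETE. The Python sketch below captures that mapping and the shared 1/5/10 percent severity levels. It computes the error share from aggregated rates and does not reproduce the exact PromQL aggregation order, which differs between these rules; the names are hypothetical.

```python
import re
from typing import Dict, Optional

# Statuses treated as success by the expressions for each method.
SUCCESS_STATUS_PATTERN = {
    "GET": "200|204",   # PCF_BINDING_ERROR_RATE_*:    status!~"200|204"
    "POST": "200|201",  # INGRESS_CREATE_ERROR_RATE_*: status!~"200|201"
    "DELETE": "204",    # INGRESS_DELETE_ERROR_RATE_*: status!="204"
}

# Checked from highest to lowest so the most severe matching level wins.
SEVERITY_LIMITS = (("Critical", 10), ("Major", 5), ("Minor", 1))


def method_error_percentage(method: str, rates_by_status: Dict[str, float]) -> Optional[float]:
    """Percentage of requests for `method` whose status is not a success."""
    total = sum(rates_by_status.values())
    if total == 0:
        return None
    pattern = SUCCESS_STATUS_PATTERN[method]
    errors = sum(
        rate
        for status, rate in rates_by_status.items()
        if re.fullmatch(pattern, status) is None  # anchored, as in PromQL
    )
    return errors * 100 / total


def highest_severity(percentage: Optional[float]) -> Optional[str]:
    if percentage is None:
        return None
    for severity, limit in SEVERITY_LIMITS:
        if percentage >= limit:
            return severity
    return None
```

Note that a DELETE answered with 200 counts as an error under these rules, because only 204 is treated as success.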

5.1.31 DB_TIER_DOWN_ALERT

Table 5-32 DB_TIER_DOWN_ALERT

Field	Details
Description	DB is not reachable!
Summary	DB is not reachable!
Severity	Critical
Expression	appinfo_category_running{category="database", application="ocbsf"} != 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.7
Metric Used	appinfo_category_running
Recommended Actions	Check whether the database service is up. Check the status and age of the MySQL pod by using the following command: `kubectl get pods -n <namespace>`, where `<namespace>` is the namespace in which the MySQL pod is deployed. This alert is cleared automatically when the DB service is up and running.

5.1.32 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD

Table 5-33 CPU_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD

Field	Details
Description	CPU usage for {{$labels.microservice}} service is above 60
Summary	CPU usage for {{$labels.microservice}} service is above 60
Severity	Minor
Expression	sum(rate(cgroup_cpu_usage{application="ocbsf"}[2m])) >= 60
OID	1.3.6.1.4.1.323.5.3.37.1.2.8
Metric Used	cgroup_cpu_usage
Recommended Actions	The alert gets cleared when the CPU utilization falls below the minor threshold or crosses the major threshold, in which case the CPUUsagePerServiceAboveMajorThreshold alert is raised. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. For any assistance, contact My Oracle Support.

5.1.33 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD

Table 5-34 CPU_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD

Field	Details
Description	CPU usage for {{$labels.microservice}} service is above 80
Summary	CPU usage for {{$labels.microservice}} service is above 80
Severity	Major
Expression	sum(rate(cgroup_cpu_usage{application="ocbsf"}[2m])) >= 80
OID	1.3.6.1.4.1.323.5.3.37.1.2.9
Metric Used	cgroup_cpu_usage
Recommended Actions	The alert gets cleared when the CPU utilization falls below the major threshold or crosses the critical threshold, in which case the CPUUsagePerServiceAboveCriticalThreshold alert is raised. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. For any assistance, contact My Oracle Support.

5.1.34 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD

Table 5-35 CPU_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD

Field	Details
Description	CPU usage for {{$labels.microservice}} service is above 90
Summary	CPU usage for {{$labels.microservice}} service is above 90
Severity	Critical
Expression	sum(rate(cgroup_cpu_usage{application="ocbsf"}[2m])) >= 90
OID	1.3.6.1.4.1.323.5.3.37.1.2.10
Metric Used	cgroup_cpu_usage
Recommended Actions	The alert gets cleared when the CPU utilization falls below the critical threshold. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. For any assistance, contact My Oracle Support.

5.1.35 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD

Table 5-36 MEMORY_USAGE_PER_SERVICE_ABOVE_MINOR_THRESHOLD

Field	Details
Description	Memory usage for {{$labels.microservice}} service is above 60
Summary	Memory usage for {{$labels.microservice}} service is above 60
Severity	Minor
Expression	sum(rate(cgroup_memory_usage{application="ocbsf"}[2m])) >= 60
OID	1.3.6.1.4.1.323.5.3.37.1.2.11
Metric Used	cgroup_memory_usage
Recommended Actions	The alert gets cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the MemoryUsagePerServiceAboveMajorThreshold alert is raised. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. For any assistance, contact My Oracle Support.

5.1.36 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD

Table 5-37 MEMORY_USAGE_PER_SERVICE_ABOVE_MAJOR_THRESHOLD

Field	Details
Description	Memory usage for {{$labels.microservice}} service is above 80
Summary	Memory usage for {{$labels.microservice}} service is above 80
Severity	Major
Expression	sum(rate(cgroup_memory_usage{application="ocbsf"}[2m])) >= 80
OID	1.3.6.1.4.1.323.5.3.37.1.2.12
Metric Used	cgroup_memory_usage
Recommended Actions	The alert gets cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the MemoryUsagePerServiceAboveCriticalThreshold alert is raised. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. For any additional guidance, contact My Oracle Support.

5.1.37 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD

Table 5-38 MEMORY_USAGE_PER_SERVICE_ABOVE_CRITICAL_THRESHOLD

Field	Details
Description	Memory usage for {{$labels.microservice}} service is above 90
Summary	Memory usage for {{$labels.microservice}} service is above 90
Severity	Critical
Expression	sum(rate(cgroup_memory_usage{application="ocbsf"}[2m])) >= 90
OID	1.3.6.1.4.1.323.5.3.37.1.2.13
Metric Used	cgroup_memory_usage
Recommended Actions	The alert gets cleared when the memory utilization falls below the critical threshold. Note: Threshold levels can be configured using the `BSF_Alertrules.yaml` file. For any assistance, contact My Oracle Support.

5.1.38 NRF_COMMUNICATION_FAILURE

Table 5-39 NRF_COMMUNICATION_FAILURE

Field	Details
Description	There has been an external communication failure with NRF.
Summary	There has been an external communication failure with NRF.
Severity	Critical
Expression	ocbsf_nrfclient_nrf_operative_status == 0
OID	1.3.6.1.4.1.323.5.3.37.1.2.33
Metric Used	ocbsf_nrfclient_nrf_operative_status
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.39 NRF_SERVICE_REQUEST_FAILURE

Table 5-40 NRF_SERVICE_REQUEST_FAILURE

Field	Details
Description	There has been a Service Request Failure with NRF, either a Registration failure, Heartbeat failure, or Profile Update Failure.
Summary	There has been a Service Request Failure with NRF, either a Registration failure, Heartbeat failure, or Profile Update Failure.
Severity	Critical
Expression	ocbsf_nrfclient_nfUpdate_status == 0
OID	1.3.6.1.4.1.323.5.3.37.1.2.34
Metric Used	ocbsf_nrfclient_nfUpdate_status
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.40 PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED

Table 5-41 PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED

Field	Details
Description	The application fails to get the current active overload level threshold data.
Summary	The application raises `PERF_INFO_ACTIVE_OVERLOAD_THRESHOLD_FETCH_FAILED` alert when it fails to fetch the current active overload level threshold data and `active_overload_threshold_fetch_failed == 1`.
Severity	Major
Expression	active_overload_threshold_fetch_failed == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.20
Metric Used	active_overload_threshold_fetch_failed
Recommended Actions	The alert gets cleared when the application fetches the current active overload level threshold data. For any additional guidance, contact My Oracle Support.

5.1.41 POD_DOC

Table 5-42 POD_DOC

Field	Details
Description	Pod Congestion status of {{$labels.microservice}} service is DoC
Summary	Pod Congestion status of {{$labels.microservice}} service is DoC
Severity	Major
Expression	ocbsf_pod_congestion_state == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.25
Metric Used	ocbsf_pod_congestion_state
Recommended Actions	Cause: The pod entered DANGER_OF_CONGESTION (DOC) because CPU usage and/or the pending request queue rose close to the configured limits. Diagnostic Information: Check that ocbsf_pod_congestion_state == 1; review pod_resource_stress (cpu and queue) and pod_cong_state_report_total to see recent transitions. Recovery: Confirm the DOC thresholds and the active Load Shedding rule. If DOC is triggered by brief spikes, increase stateChangeSampleCount or the calculation interval. If the condition is sustained, consider slightly increasing discard aggressiveness for low-value calls, as permitted by policy.
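The advice to increase stateChangeSampleCount relies on a state change being confirmed over several samples before it takes effect. The following Python sketch illustrates that idea under an assumption: that stateChangeSampleCount is the number of consecutive samples that must report the new state before the pod's congestion state changes. The function debounce_states and its parameters are hypothetical and not part of BSF.

```python
from typing import Iterable, List


def debounce_states(samples: Iterable[int], state_change_sample_count: int,
                    initial_state: int = 0) -> List[int]:
    """Return the committed congestion state after each raw sample.

    Hypothetical model: a new state is committed only after it has been
    reported by `state_change_sample_count` consecutive raw samples.
    """
    committed = initial_state
    candidate = initial_state
    streak = 0
    history: List[int] = []
    for raw in samples:
        if raw == committed:
            # Back at the committed state: drop any pending transition.
            candidate, streak = committed, 0
        elif raw == candidate:
            # Same pending target as the previous sample: extend the streak.
            streak += 1
        else:
            # A different target state starts a new streak.
            candidate, streak = raw, 1
        if candidate != committed and streak >= state_change_sample_count:
            committed, streak = candidate, 0
        history.append(committed)
    return history


# A one-sample spike to DOC (1) is ignored; three consecutive DOC samples commit it.
print(debounce_states([0, 1, 0, 1, 1, 1, 0], state_change_sample_count=3))
# [0, 0, 0, 0, 0, 1, 1]
```

With state_change_sample_count=1 the same input is passed through unchanged, which is the flapping behavior a larger sample count is meant to suppress.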

5.1.42 POD_CONGESTED

Table 5-43 POD_CONGESTED

Field	Details
Description	Pod Congestion status of {{$labels.microservice}} service is congested
Summary	Pod Congestion status of {{$labels.microservice}} service is congested
Severity	Critical
Expression	ocbsf_pod_congestion_state==4
OID	1.3.6.1.4.1.323.5.3.37.1.2.26
Metric Used	ocbsf_pod_congestion_state
Recommended Actions	Cause: The pod has reached the CONGESTED state because CPU consumption and/or the pending request queue exceeded the thresholds in the active profile. Diagnostic Information: Check ocbsf_pod_congestion_state (expect 4), ocbsf_pod_resource_congestion_state for cpu and queue, pod_resource_stress, and ocbsf_http_congestion_message_reject_total (filter by congestionState, requestUri, requestMethod, priority). Recovery: In CNC Console (BSF → Overload and Congestion Control → Congestion Control), ensure that the feature is enabled and the intended Thresholds and Load Shedding profiles are active. If rejections are excessive, raise the discard priority for the current state or relax thresholds based on performance baselines. Consider increasing stateChangeSampleCount and/or the calculation interval to reduce flapping due to short spikes.

5.1.43 POD_CONGESTION_L1

Table 5-44 POD_CONGESTION_L1

Field	Details
Description	Pod Congestion status of {{$labels.microservice}} service is Congestion_L1.
Summary	Pod Congestion status of {{$labels.microservice}} service is Congestion_L1.
Severity	Critical
Expression	ocbsf_pod_congestion_state == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.52
Metric Used	ocbsf_pod_congestion_state
Recommended Actions	Cause: The pod reached CONGESTION_L1 based on the CPU and/or queue thresholds in the active profile. Diagnostic Information: Check that ocbsf_pod_congestion_state == 2; identify the driver (CPU or queue) via ocbsf_pod_resource_congestion_state; review ocbsf_http_congestion_message_reject_total with congestionState=CONGESTION_L1. Recovery: Confirm the L1 discard priority (default 24) and thresholds. If important calls are being dropped, adjust the discard priority or tune thresholds to match the expected load profile.

5.1.44 POD_CPU_CONGESTION_L1

Table 5-45 POD_CPU_CONGESTION_L1

Field	Details
Description	Pod resource is in Congestion_L1 for CPU type.
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L1 for CPU type.
Severity	Critical
Expression	ocbsf_pod_resource_congestion_state{type="cpu"} == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.54
Metric Used	ocbsf_pod_resource_congestion_state
Recommended Actions	Cause: CPU utilization reached CONGESTION_L1. Diagnostic Information: Check that ocbsf_pod_resource_congestion_state{type="cpu"} == 2; check CPU stress and state transitions. Recovery: Validate the L1 CPU thresholds and discard priority. If brief spikes cause churn, increase stateChangeSampleCount; otherwise, increase shedding at L1.

5.1.45 POD_CONGESTION_L2

Table 5-46 POD_CONGESTION_L2

Field	Details
Description	Pod Congestion status of {{$labels.microservice}} service is Congestion_L2
Summary	Pod Congestion status of {{$labels.microservice}} service is Congestion_L2.
Severity	Critical
Expression	ocbsf_pod_congestion_state == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.53
Metric Used	ocbsf_pod_congestion_state
Recommended Actions	Cause: The pod reached CONGESTION_L2, indicating higher stress than L1. Diagnostic Information: Check that ocbsf_pod_congestion_state == 3; validate the resource-specific states and stress metrics; inspect the rejection counters at L2. Recovery: Use the L2 discard priority (default 18) to shed more low-priority traffic; consider tuning thresholds and sample counts to balance protection against availability.
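Taken together, the pod-level alerts above each match one value of ocbsf_pod_congestion_state: 1 (DoC), 2 (CONGESTION_L1), 3 (CONGESTION_L2), and 4 (CONGESTED). The Python lookup below summarizes that mapping from the Expression and Severity fields; treating 0 as the normal, alert-free state is an assumption, and the helper name is hypothetical.

```python
from typing import Optional, Tuple

# Values and severities taken from the Expression and Severity fields of
# the POD_DOC, POD_CONGESTION_L1, POD_CONGESTION_L2, and POD_CONGESTED tables.
# Any other value (assumed: 0 = NORMAL) raises no pod-level congestion alert.
POD_CONGESTION_ALERTS = {
    1: ("POD_DOC", "Major"),
    2: ("POD_CONGESTION_L1", "Critical"),
    3: ("POD_CONGESTION_L2", "Critical"),
    4: ("POD_CONGESTED", "Critical"),
}


def pod_congestion_alert(state: int) -> Optional[Tuple[str, str]]:
    """Return (alert name, severity) for a state value, or None if no alert fires."""
    return POD_CONGESTION_ALERTS.get(state)
```

For example, `pod_congestion_alert(3)` returns `("POD_CONGESTION_L2", "Critical")`.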

5.1.46 POD_CPU_CONGESTION_L2

Table 5-47 POD_CPU_CONGESTION_L2

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L2 for CPU type.
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L2 for CPU type.
Severity	Critical
Expression	ocbsf_pod_resource_congestion_state{type="cpu"} == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.55
Metric Used	ocbsf_pod_resource_congestion_state
Recommended Actions	Cause: CPU utilization reached CONGESTION_L2. Diagnostic Information: Check that ocbsf_pod_resource_congestion_state{type="cpu"} == 3; check CPU stress, the EMA interval and ratio, and the L2 rejection counters. Recovery: Raise the L2 discard priority to protect the pod; tune CPU thresholds or EMA cadence only after comparing with test baselines.

5.1.47 POD_PENDING_REQUEST_DOC

Table 5-48 POD_PENDING_REQUEST_DOC

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is DoC for PendingRequest type
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is DoC for PendingRequest type
Severity	Major
Expression	ocbsf_pod_resource_congestion_state{type="queue"} == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.27
Metric Used	ocbsf_pod_resource_congestion_state{type="queue"}
Recommended Actions	Cause: The pending request queue is in DANGER_OF_CONGESTION. Diagnostic Information: Validate that ocbsf_pod_resource_congestion_state{type="queue"} == 1 and check the queue-related pod_resource_stress. Recovery: Review the queue DOC thresholds. If early protection is desired, allow gentle shedding of the lowest-priority traffic at DOC; otherwise, tune thresholds to match the observed load.

5.1.48 POD_PENDING_REQUEST_CONGESTED

Table 5-49 POD_PENDING_REQUEST_CONGESTED

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is congested for PendingRequest type
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is congested for PendingRequest type
Severity	Critical
Expression	ocbsf_pod_resource_congestion_state{type="queue"} == 4
OID	1.3.6.1.4.1.323.5.3.37.1.2.28
Metric Used	ocbsf_pod_resource_congestion_state{type="queue"}
Recommended Actions	Cause: The pending HTTP request queue is in the CONGESTED state. Diagnostic Information: Verify that ocbsf_pod_resource_congestion_state{type="queue"} == 4 and check pod_resource_stress{resourceType="queue"}; review ocbsf_http_congestion_message_reject_total for low-priority discards at this level. Recovery: Validate the queue thresholds in the active profile and the CONGESTED discard priority. If the backlog persists, increase shedding (raise the discard priority) so that lower-priority (higher-number) requests are rejected earlier.

5.1.49 POD_CPU_DOC

Table 5-50 POD_CPU_DOC

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is DoC for CPU type
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is DoC for CPU type
Severity	Major
Expression	ocbsf_pod_resource_congestion_state{type="cpu"} == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.29
Metric Used	ocbsf_pod_resource_congestion_state{type="cpu"}
Recommended Actions	Cause: CPU utilization is in DANGER_OF_CONGESTION according to the current thresholds and EMA settings. Diagnostic Information: Check that ocbsf_pod_resource_congestion_state{type="cpu"} == 1 and check pod_resource_stress{resourceType="cpu"}; confirm the EMA parameters (interval and 70:30 ratio). Recovery: If the condition is transient, increase stateChangeSampleCount to avoid oscillation; if it is sustained, adjust the CPU DOC threshold or enable mild shedding for non-critical, low-priority requests.
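The Diagnostic Information above refers to EMA parameters with a 70:30 ratio. As an illustration only, the Python sketch below computes an exponential moving average of CPU samples. Assigning 70% of the weight to the previous average and 30% to the newest sample, and the function name ema_series, are assumptions rather than documented BSF behavior.

```python
from typing import Iterable, List


def ema_series(samples: Iterable[float], previous_weight: float = 0.7) -> List[float]:
    """Exponentially smooth CPU samples.

    Assumption: `previous_weight` (70%) applies to the previous average and the
    remaining 30% to the newest sample; the first sample seeds the average.
    """
    smoothed: List[float] = []
    for value in samples:
        if not smoothed:
            smoothed.append(value)
        else:
            smoothed.append(previous_weight * smoothed[-1]
                            + (1 - previous_weight) * value)
    return smoothed


# A single 100% CPU spike after steady 50% load lifts the average to about 65%.
print(ema_series([50.0, 100.0, 50.0]))
```

Under this weighting, a larger previous_weight smooths short spikes more strongly, so a brief burst is less likely to push the smoothed CPU value across a DOC threshold.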

5.1.50 POD_CPU_CONGESTED

Table 5-51 POD_CPU_CONGESTED

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is congested for CPU type
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is congested for CPU type
Severity	Critical
Expression	ocbsf_pod_resource_congestion_state{type="cpu"} == 4
OID	1.3.6.1.4.1.323.5.3.37.1.2.30
Metric Used	ocbsf_pod_resource_congestion_state
Recommended Actions	Cause: CPU utilization reached the CONGESTED state. Diagnostic Information: Check that ocbsf_pod_resource_congestion_state{type="cpu"} == 4 (CONGESTED), as in the alert expression, and check pod_resource_stress{resourceType="cpu"}; check message rejections at this congestion level. Recovery: Tighten protection by raising the discard priority at this state so that more low-priority requests are dropped. Reassess CPU thresholds and EMA intervals only after reviewing benchmarks.

5.1.51 POD_MEMORY_DOC

Table 5-52 POD_MEMORY_DOC

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is DoC for Memory type
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is DoC for Memory type
Severity	Major
Expression	ocbsf_pod_resource_congestion_state{type="memory"} == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.31
Metric Used	ocbsf_pod_resource_congestion_state{type="memory"}
Recommended Actions	The alert gets cleared when the system memory falls below the configured threshold value. For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.52 POD_MEMORY_CONGESTED

Table 5-53 POD_MEMORY_CONGESTED

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is congested for Memory type
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is congested for Memory type
Severity	Critical
Expression	ocbsf_pod_resource_congestion_state{type="memory"} == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.32
Metric Used	ocbsf_pod_resource_congestion_state{type="memory"}
Recommended Actions	The alert gets cleared when the system memory falls below the configured threshold value. For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.53 SERVICE_OVERLOADED

Table 5-54 SERVICE_OVERLOADED

Field	Details
Description	Overload Level of {{$labels.microservice}} service is L1
Summary	Overload Level of {{$labels.microservice}} service is L1
Severity	Minor
Expression	load_level == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.14
Metric Used	load_level
Recommended Actions	The alert gets cleared when the system is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-55 SERVICE_OVERLOADED

Field	Details
Description	Overload Level of {{$labels.microservice}} service is L2
Summary	Overload Level of {{$labels.microservice}} service is L2
Severity	Major
Expression	load_level == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.14
Metric Used	load_level
Recommended Actions	The alert gets cleared when the system is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-56 SERVICE_OVERLOADED

Field	Details
Description	Overload Level of {{$labels.microservice}} service is L3
Summary	Overload Level of {{$labels.microservice}} service is L3
Severity	Critical
Expression	load_level == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.14
Metric Used	load_level
Recommended Actions	The alert gets cleared when the system is back to normal state. For any additional guidance, contact My Oracle Support.

5.1.54 SERVICE_RESOURCE_OVERLOADED

The following alerts are raised when the service is in an overload state due to memory usage.

Table 5-57 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L1 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L1 for {{$labels.type}} type
Severity	Minor
Expression	service_resource_overload_level{type="memory"} == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="memory"}
Recommended Actions	The alert gets cleared when the memory usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-58 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L2 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L2 for {{$labels.type}} type
Severity	Major
Expression	service_resource_overload_level{type="memory"} == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="memory"}
Recommended Actions	The alert gets cleared when the memory usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-59 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L3 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L3 for {{$labels.type}} type
Severity	Critical
Expression	service_resource_overload_level{type="memory"} == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="memory"}
Recommended Actions	The alert gets cleared when the memory usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

The following alerts are raised when the service is in an overload state due to CPU usage.

Table 5-60 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L1 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L1 for {{$labels.type}} type
Severity	Minor
Expression	service_resource_overload_level{type="cpu"} == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="cpu"}
Recommended Actions	The alert gets cleared when the CPU usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-61 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L2 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L2 for {{$labels.type}} type
Severity	Major
Expression	service_resource_overload_level{type="cpu"} == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="cpu"}
Recommended Actions	The alert gets cleared when the CPU usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-62 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L3 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L3 for {{$labels.type}} type
Severity	Critical
Expression	service_resource_overload_level{type="cpu"} == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="cpu"}
Recommended Actions	The alert gets cleared when the CPU usage of the service is back to normal state. For any additional guidance, contact My Oracle Support.

The following alerts are raised when the service is in an overload state due to the number of pending messages.

Table 5-63 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L1 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L1 for {{$labels.type}} type
Severity	Minor
Expression	service_resource_overload_level{type="svc_pending_count"} == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="svc_pending_count"}
Recommended Actions	The alert gets cleared when the number of pending messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-64 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L2 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L2 for {{$labels.type}} type
Severity	Major
Expression	service_resource_overload_level{type="svc_pending_count"} == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="svc_pending_count"}
Recommended Actions	The alert gets cleared when the number of pending messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-65 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L3 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L3 for {{$labels.type}} type
Severity	Critical
Expression	service_resource_overload_level{type="svc_pending_count"} == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="svc_pending_count"}
Recommended Actions	The alert gets cleared when the number of pending messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

The following alerts are raised when the service is in an overload state due to the number of failed requests.

Table 5-66 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L1 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L1 for {{$labels.type}} type
Severity	Minor
Expression	service_resource_overload_level{type="svc_failure_count"} == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="svc_failure_count"}
Recommended Actions	The alert gets cleared when the number of failed messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-67 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L2 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L2 for {{$labels.type}} type
Severity	Major
Expression	service_resource_overload_level{type="svc_failure_count"} == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="svc_failure_count"}
Recommended Actions	The alert gets cleared when the number of failed messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.

Table 5-68 SERVICE_RESOURCE_OVERLOADED

Field	Details
Description	{{$labels.microservice}} service is L3 for {{$labels.type}} type
Summary	{{$labels.microservice}} service is L3 for {{$labels.type}} type
Severity	Critical
Expression	service_resource_overload_level{type="svc_failure_count"} == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.15
Metric Used	service_resource_overload_level{type="svc_failure_count"}
Recommended Actions	The alert gets cleared when the number of failed messages of the service is back to normal state. For any additional guidance, contact My Oracle Support.
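The twelve SERVICE_RESOURCE_OVERLOADED tables differ only in the resource type and the overload level, with L1, L2, and L3 mapping to Minor, Major, and Critical. The Python sketch below regenerates the Expression and Severity pairs from those two dimensions, which can help when checking a customized rule file for gaps. The helper name is hypothetical, and the output is not a complete alert rule (it omits OID, labels, and annotations).

```python
from typing import List, Tuple

# Resource types and level-to-severity mapping taken from Tables 5-57 to 5-68.
RESOURCE_TYPES = ["memory", "cpu", "svc_pending_count", "svc_failure_count"]
LEVEL_SEVERITY = {1: "Minor", 2: "Major", 3: "Critical"}


def service_resource_overload_rules() -> List[Tuple[str, str]]:
    """Return (PromQL expression, severity) for every type/level combination."""
    rules = []
    for resource_type in RESOURCE_TYPES:
        for level, severity in LEVEL_SEVERITY.items():
            # Doubled braces emit literal { } around the label matcher.
            expr = f'service_resource_overload_level{{type="{resource_type}"}} == {level}'
            rules.append((expr, severity))
    return rules


for expression, severity in service_resource_overload_rules()[:3]:
    print(severity, expression)
```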

5.1.55 SYSTEM_IMPAIRMENT_MAJOR

Table 5-69 SYSTEM_IMPAIRMENT_MAJOR

Field	Details
Description	Major impairment alert raised when a REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN condition, or BINLOG_STORAGE usage of 80% or more, persists for 10 minutes.
Summary	Major impairment alert raised when a REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN condition, or BINLOG_STORAGE usage of 80% or more, persists for 10 minutes.
Severity	Major
Expression	(db_tier_replication_status{role="failed"} == 0) or (db_tier_replication_status{role="active"} == 0) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="standby"})) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="failed"})) or (avg_over_time(db_tier_binlog_used_bytes_percentage[5m])>= 80)
OID	1.3.6.1.4.1.323.5.3.37.1.2.16
Metric Used	db_tier_replication_status, db_tier_binlog_used_bytes_percentage
Recommended Actions	For any additional guidance, contact My Oracle Support.

5.1.56 SYSTEM_IMPAIRMENT_CRITICAL

Table 5-70 SYSTEM_IMPAIRMENT_CRITICAL

Field	Details
Description	Critical impairment alert raised when a REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN condition, or BINLOG_STORAGE usage of 80% or more, persists for 30 minutes.
Summary	Critical impairment alert raised when a REPLICATION_FAILED or REPLICATION_CHANNEL_DOWN condition, or BINLOG_STORAGE usage of 80% or more, persists for 30 minutes.
Severity	Critical
Expression	(db_tier_replication_status{role="failed"} == 0) or (db_tier_replication_status{role="active"} == 0) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="standby"})) or (count by (site_name) (db_tier_replication_status) == count by (site_name) (db_tier_replication_status{role="failed"})) or (avg_over_time(db_tier_binlog_used_bytes_percentage[5m])>= 80)
OID	1.3.6.1.4.1.323.5.3.37.1.2.16
Metric Used	db_tier_replication_status, db_tier_binlog_used_bytes_percentage
Recommended Actions	For any additional guidance, contact My Oracle Support.
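The SYSTEM_IMPAIRMENT expressions combine five conditions with `or`. The Python sketch below restates them over already-collected samples so the logic can be read step by step. It is not a PromQL engine: the ReplicationSample type and impairment_detected function are hypothetical, and the sketch does not model the 10-minute or 30-minute duration that separates the Major and Critical alerts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ReplicationSample:
    site_name: str
    role: str     # role label, for example "active", "standby", or "failed"
    value: float  # current value of db_tier_replication_status


def impairment_detected(samples: Iterable[ReplicationSample],
                        binlog_used_pct_avg_5m: float) -> bool:
    sample_list = list(samples)
    # Conditions 1 and 2: a series with role "failed" or "active" reports 0.
    if any(s.role in ("failed", "active") and s.value == 0 for s in sample_list):
        return True
    # Conditions 3 and 4: for some site, every series has role "standby",
    # or every series has role "failed" (count by site_name equality).
    roles_by_site: Dict[str, List[str]] = defaultdict(list)
    for s in sample_list:
        roles_by_site[s.site_name].append(s.role)
    for roles in roles_by_site.values():
        if all(r == "standby" for r in roles) or all(r == "failed" for r in roles):
            return True
    # Condition 5: 5-minute average binlog usage is 80% or more.
    return binlog_used_pct_avg_5m >= 80
```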

5.1.57 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN

Table 5-71 SYSTEM_OPERATIONAL_STATE_PARTIAL_SHUTDOWN

Field	Details
Description	System Operational State is now in partial shutdown state.
Summary	System Operational State is now in partial shutdown state.
Severity	Major
Expression	system_operational_state == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.17
Metric Used	system_operational_state
Recommended Actions	For any additional guidance, contact My Oracle Support.

5.1.58 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN

Table 5-72 SYSTEM_OPERATIONAL_STATE_COMPLETE_SHUTDOWN

Field	Details
Description	System Operational State is now in complete shutdown state
Summary	System Operational State is now in complete shutdown state
Severity	Critical
Expression	system_operational_state == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.17
Metric Used	system_operational_state
Recommended Actions	For any additional guidance, contact My Oracle Support.

5.1.59 DIAM_CONN_PEER_DOWN

Table 5-73 DIAM_CONN_PEER_DOWN

Field	Details
Description	Diameter connection to peer {{ $labels.peerHost }} is down.
Summary	Diameter connection to peer down.
Severity	Major
Expression	(sum by (namespace,peerHost)(ocbsf_diam_conn_network) == 0) and (sum by (namespace,peerHost)(max_over_time(ocbsf_diam_conn_network[24h])) != 0)
OID	1.3.6.1.4.1.323.5.3.37.1.2.18
Metric Used	ocbsf_diam_conn_network
Recommended Actions	For any assistance, contact My Oracle Support.
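The second clause of the DIAM_CONN_PEER_DOWN expression suppresses the alert for peers that have not had a connection at any point in the last 24 hours, so only peers that were previously connected are reported. The following Python sketch shows the same check over pre-aggregated connection counts; the peers_down function, the peer names, and the input layout are hypothetical.

```python
from typing import Dict, List


def peers_down(current: Dict[str, int], history_24h: Dict[str, List[int]]) -> List[str]:
    """Return peers with no connection now that had at least one in the last 24h.

    current:     peerHost -> current sum of ocbsf_diam_conn_network
    history_24h: peerHost -> samples of that sum over the last 24 hours
    """
    down = []
    for peer, count in current.items():
        # max(...) != 0 mirrors max_over_time(ocbsf_diam_conn_network[24h]) != 0.
        if count == 0 and max(history_24h.get(peer) or [0]) != 0:
            down.append(peer)
    return sorted(down)


print(peers_down({"pcrf1": 0, "pcrf2": 2, "pcrf3": 0},
                 {"pcrf1": [1, 1, 0], "pcrf2": [2], "pcrf3": [0, 0]}))
# ['pcrf1']
```

Here pcrf3 is not reported because it never had a connection in the window, which is the case that DIAM_CONN_NETWORK_DOWN or configuration review would cover instead.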

5.1.60 DIAM_CONN_NETWORK_DOWN

Table 5-74 DIAM_CONN_NETWORK_DOWN

Field	Details
Description	All diameter network connections are down.
Summary	All diameter network connections are down.
Severity	Critical
Expression	sum by (namespace)(ocbsf_diam_conn_network) == 0
OID	1.3.6.1.4.1.323.5.3.37.1.2.19
Metric Used	ocbsf_diam_conn_network
Recommended Actions	For any assistance, contact My Oracle Support.

5.1.61 DIAM_RESPONSE_REALM_VALIDATION_ERROR_CRITICAL

Table 5-75 DIAM_RESPONSE_REALM_VALIDATION_ERROR_CRITICAL

Field	Details
Description	At least 75% of the Diameter responses in the last 10 minutes failed with error 'DIAMETER_REALM_NOT_SERVED' because either the BSF realm or the PCF realm does not match the destination realm received in the Diameter message.
Summary	{{ $value }}% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED'.
Severity	CRITICAL
Expression	(sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 75
OID	1.3.6.1.4.1.323.5.3.37.1.2.41
Metric Used	ocbsf_diam_realm_validation_failed_total
Recommended Actions	Check whether the following keys under Advanced Settings on the Diameter Settings page are set to true: DIAMETER.Enable.Validate.Realm and DIAMETER.BSF.Enable.Validate.Binding.Realm. Also check the Destination-Realm received in the Diameter request.

5.1.62 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MAJOR

Table 5-76 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MAJOR

Field	Details
Description	At least 50% of the Diameter responses in the last 10 minutes failed with error 'DIAMETER_REALM_NOT_SERVED' because either the BSF realm or the PCF realm does not match the destination realm received in the Diameter message.
Summary	{{ $value }}% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED'.
Severity	MAJOR
Expression	(sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 50
OID	1.3.6.1.4.1.323.5.3.37.1.2.41
Metric Used	ocbsf_diam_realm_validation_failed_total
Recommended Actions	Check whether the following keys under Advanced Settings on the Diameter Settings page are set to true: DIAMETER.Enable.Validate.Realm and DIAMETER.BSF.Enable.Validate.Binding.Realm. Also check the Destination-Realm received in the Diameter request.

5.1.63 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MINOR

Table 5-77 DIAM_RESPONSE_REALM_VALIDATION_ERROR_MINOR

Field	Details
Description	At least 20% of the Diameter responses in the last 10 minutes failed with error 'DIAMETER_REALM_NOT_SERVED' because either the BSF realm or the PCF realm does not match the destination realm received in the Diameter message.
Summary	{{ $value }}% of the Diameter responses failed with error 'DIAMETER_REALM_NOT_SERVED'.
Severity	MINOR
Expression	(sum(increase(ocbsf_diam_realm_validation_failed_total{responseCode="3003", appId="16777236"}[10m])) / sum(increase(ocbsf_diam_response_network_total{appId="16777236"}[10m]))) * 100 >= 20
OID	1.3.6.1.4.1.323.5.3.37.1.2.41
Metric Used	ocbsf_diam_realm_validation_failed_total
Recommended Actions	Check whether the following keys under Advanced Settings on the Diameter Settings page are set to true: DIAMETER.Enable.Validate.Realm and DIAMETER.BSF.Enable.Validate.Binding.Realm. Also check the Destination-Realm received in the Diameter request.
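Because the three DIAM_RESPONSE_REALM_VALIDATION_ERROR expressions use the same ratio with thresholds of 75%, 50%, and 20%, a ratio of 80% makes all three expressions true at once. The Python sketch below computes the ratio from the two 10-minute increases and returns only the highest matching severity; the function name is hypothetical.

```python
from typing import Optional


def realm_validation_severity(failed_3003: float, total_responses: float) -> Optional[str]:
    """Highest DIAM_RESPONSE_REALM_VALIDATION_ERROR severity for one 10-minute window.

    failed_3003:     increase of ocbsf_diam_realm_validation_failed_total
                     (responseCode="3003", appId="16777236") over 10m
    total_responses: increase of ocbsf_diam_response_network_total
                     (appId="16777236") over 10m
    """
    if total_responses <= 0:
        # No responses in the window: nothing to evaluate, so no alert.
        return None
    pct = failed_3003 / total_responses * 100
    if pct >= 75:
        return "CRITICAL"
    if pct >= 50:
        return "MAJOR"
    if pct >= 20:
        return "MINOR"
    return None
```

For example, 80 realm failures out of 100 responses returns "CRITICAL", while 10 out of 100 returns None.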

5.1.64 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MINOR

Table 5-78 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MINOR

Field	Details
Description	At least 20% of the BSF Audit Notification Requests sent to PCF to check for suspected stale sessions have responded with a 5xx or 4xx (excluding 404) status, or timed out, in the last 24 hours.
Summary	At least 20% of the BSF Audit Notification Requests have responded with a 5xx or 4xx (excluding 404) status, or timed out, in the last 24 hours.
Severity	MINOR
Expression	(sum by (namespace, microservice) (increase(ocbsf_query_response_count_total{response_code=~"5..|4..|timeout",response_code!="404"}[24h])) / sum by (namespace, microservice) (increase(ocbsf_query_response_count_total[24h]))) * 100 >= 20
OID	1.3.6.1.4.1.323.5.3.37.1.2.42
Metric Used	ocbsf_query_response_count_total
Recommended Actions	Determine the reason why these notification requests are failing. This alert indicates that there is a potential issue either with the network communications, or the NF where the audit notifications point to.

5.1.65 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MAJOR

Table 5-79 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_MAJOR

Field	Details
Description	At least 40% of the BSF Audit Notification Requests sent to PCF to check for suspected stale sessions have responded with a 5xx or 4xx (excluding 404) status, or timed out, in the last 24 hours.
Summary	{{ $value }}% of the BSF Audit Notification Requests sent to PCF to check for suspected stale sessions have responded with a 5xx or 4xx (excluding 404) status, or timed out, in the last 24 hours.
Severity	MAJOR
Expression	(sum by (namespace, microservice) (increase(ocbsf_query_response_count_total{response_code=~"5..|4..|timeout",response_code!="404"}[24h])) / sum by (namespace, microservice) (increase(ocbsf_query_response_count_total[24h]))) * 100 >= 40
OID	1.3.6.1.4.1.323.5.3.37.1.2.42
Metric Used	ocbsf_query_response_count_total
Recommended Actions	Determine the reason why these notification requests are failing. This alert indicates that there is an issue either with the network communications, or the NF where the audit notifications point to, that needs to be addressed as soon as possible.

5.1.66 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_CRITICAL

Table 5-80 AUDIT_STALE_NOTIFY_ERROR_RESPONSE_CRITICAL

Field	Details
Description	At least 60% of the BSF Audit Notification Requests sent to PCF to check for suspected stale sessions have responded with a 5xx or 4xx (excluding 404) status, or timed out, in the last 24 hours.
Summary	At least 60% of the BSF Audit Notification Requests sent to PCF (or its respective NF) have failed with a 5xx or 4xx (excluding 404) status, or timed out, in the last 24 hours. The default threshold value is defined in `BSF_Alertrules.yaml`.
Severity	CRITICAL
Expression	(sum by (namespace, microservice) (increase(ocbsf_query_response_count_total{response_code=~"5..|4..|timeout",response_code!="404"}[24h])) / sum by (namespace, microservice) (increase(ocbsf_query_response_count_total[24h]))) * 100 >= 60
OID	1.3.6.1.4.1.323.5.3.37.1.2.42
Metric Used	ocbsf_query_response_count_total
Recommended Actions	Determine the reason why these notification requests are failing. This alert indicates that there is a critical issue either with the network communications, or the NF where the audit notifications point to, that needs to be addressed immediately.
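The AUDIT_STALE_NOTIFY_ERROR_RESPONSE expressions count 5xx codes, 4xx codes other than 404, and the response_code value timeout as failures. Prometheus anchors regular-expression label matchers, so Python's re.fullmatch is the closer equivalent. The sketch below computes the 24-hour failure percentage from per-code increases and maps it to the Minor (20%), Major (40%), and Critical (60%, as stated in the Summary of Table 5-80) thresholds; the function names and input layout are hypothetical.

```python
import re
from typing import Dict, Optional

# Mirrors response_code=~"5..|4..|timeout"; fullmatch gives Prometheus-style anchoring.
ERROR_CODE_PATTERN = re.compile(r"5..|4..|timeout")


def audit_error_percentage(responses_24h: Dict[str, float]) -> Optional[float]:
    """Failure percentage from response_code -> 24h increase of ocbsf_query_response_count_total."""
    total = sum(responses_24h.values())
    if total == 0:
        return None
    failed = sum(count for code, count in responses_24h.items()
                 if ERROR_CODE_PATTERN.fullmatch(code) and code != "404")
    return failed * 100 / total


def audit_severity(percentage: Optional[float]) -> Optional[str]:
    """Highest AUDIT_STALE_NOTIFY_ERROR_RESPONSE severity for a percentage."""
    if percentage is None:
        return None
    for threshold, severity in ((60, "CRITICAL"), (40, "MAJOR"), (20, "MINOR")):
        if percentage >= threshold:
            return severity
    return None


pct = audit_error_percentage({"200": 70, "404": 10, "503": 15, "timeout": 5})
print(pct, audit_severity(pct))
```

In this example the 404 responses are excluded, so only the 503 and timeout counts (20 of 100) are treated as failures.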

5.1.67 BSF_CONNECTION_FAILURE

Table 5-81 BSF_CONNECTION_FAILURE

Field	Details
Description	Connection failure on Egress and Ingress Gateways for incoming and outgoing connections.
Summary	Connection failure on Egress and Ingress Gateways for incoming and outgoing connections.
Severity	Major
Expression	sum(increase(ocbsf_oc_ingressgateway_connection_failure_total[5m]) >0 or (ocbsf_oc_ingressgateway_connection_failure_total unless ocbsf_oc_ingressgateway_connection_failure_total offset 5m )) by (namespace,app, error_reason) > 0 or sum(increase(ocbsf_oc_egressgateway_connection_failure_total[5m]) >0 or (ocbsf_oc_egressgateway_connection_failure_total unless ocbsf_oc_egressgateway_connection_failure_total offset 5m )) by (namespace,app, error_reason) > 0
OID	1.3.6.1.4.1.323.5.3.37.1.2.43
Metric Used	ocbsf_oc_ingressgateway_connection_failure_total, ocbsf_oc_egressgateway_connection_failure_total
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).
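The BSF_CONNECTION_FAILURE expression applies two checks to each gateway counter: increase(...[5m]) > 0 catches counters that grew during the window, and the `unless ... offset 5m` clause catches series that did not exist five minutes earlier. The second check matters because increase() needs at least two samples in the window and does not count the jump from a missing series to its first value. The simplified Python sketch below applies the same two checks to counter snapshots; it ignores counter resets and the extrapolation that increase() performs, and the function name and key layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Key: (namespace, app, error_reason) label values of one counter series.
SeriesKey = Tuple[str, str, str]


def failing_series(now: Dict[SeriesKey, float],
                   five_min_ago: Dict[SeriesKey, float]) -> List[SeriesKey]:
    """Series whose failure counter grew in the window or appeared within it."""
    result = []
    for key, value in now.items():
        if key not in five_min_ago:
            # New series (the `unless ... offset 5m` branch): report it if the
            # outer `> 0` filter would keep its current value.
            if value > 0:
                result.append(key)
        elif value - five_min_ago[key] > 0:
            # Existing series whose counter increased (the increase() branch).
            result.append(key)
    return sorted(result)
```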

5.1.68 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Table 5-82 INGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Field	Details
Description	'BSF Ingress Gateway Data Director unreachable for {{$labels.namespace}}'
Summary	'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Ingress Gateway Data Director unreachable'
Severity	Major
Expression	sum(oc_ingressgateway_dd_unreachable) by(namespace,container) > 0
OID	1.3.6.1.4.1.323.5.3.37.1.2.48
Metric Used	oc_ingressgateway_dd_unreachable
Recommended Actions	The alert is cleared automatically when the connection with the Data Director is established.

5.1.69 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Table 5-83 EGRESS_GATEWAY_DD_UNREACHABLE_MAJOR

Field	Details
Description	'BSF Egress Gateway Data Director unreachable for {{$labels.namespace}}'
Summary	'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} BSF Egress Gateway Data Director unreachable'
Severity	Major
Expression	sum(oc_egressgateway_dd_unreachable) by(namespace,container) > 0
OID	1.3.6.1.4.1.323.5.3.37.1.2.49
Metric Used	oc_egressgateway_dd_unreachable
Recommended Actions	The alert is cleared automatically when the connection with the Data Director is established.

5.1.70 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Table 5-84 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MINOR

Field	Details
Description	Diam-gw certificate expires in less than 6 months for {{$labels.namespace}}.
Summary	Diam-gw certificate expires in less than 6 months.
Severity	Minor
Expression	dgw_tls_cert_expiration_seconds - time() <= 15724800
OID	1.3.6.1.4.1.323.5.3.37.1.2.47
Metric Used	dgw_tls_cert_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.71 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Table 5-85 DIAM_GATEWAY_CERTIFICATE_EXPIRY_MAJOR

Field	Details
Description	Diam-gw certificate expiry in less than 3 months for {{$labels.namespace}}.
Summary	Diam-gw certificate expiry in less than 3 months.
Severity	Major
Expression	dgw_tls_cert_expiration_seconds - time() <= 7862400
OID	1.3.6.1.4.1.323.5.3.37.1.2.47
Metric Used	dgw_tls_cert_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.72 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Table 5-86 DIAM_GATEWAY_CERTIFICATE_EXPIRY_CRITICAL

Field	Details
Description	Diam-gw certificate expiry in less than a month for {{$labels.namespace}}.
Summary	Diam-gw certificate expiry in less than a month.
Severity	Critical
Expression	dgw_tls_cert_expiration_seconds - time() <= 2592000
OID	1.3.6.1.4.1.323.5.3.37.1.2.47
Metric Used	dgw_tls_cert_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).
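
The three DIAM_GATEWAY_CERTIFICATE_EXPIRY thresholds are given in seconds: 15724800 s is 182 days, 7862400 s is 91 days, and 2592000 s is 30 days. Because none of the three expressions has a lower bound, a certificate that expires in 20 days matches the Minor, Major, and Critical rules at the same time. The following Python sketch shows that arithmetic and reports only the most severe matching rule; the function name and the "most severe only" reporting are illustrative assumptions, not how Prometheus or BSF evaluate the rules.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400

# Thresholds copied from the alert expressions (seconds remaining until expiry).
MINOR_THRESHOLD = 182 * SECONDS_PER_DAY     # 15724800 s, "less than 6 months"
MAJOR_THRESHOLD = 91 * SECONDS_PER_DAY      # 7862400 s, "less than 3 months"
CRITICAL_THRESHOLD = 30 * SECONDS_PER_DAY   # 2592000 s, "less than a month"


def expiry_severity(expiration_epoch: float, now_epoch: float) -> Optional[str]:
    """Return the most severe rule whose condition
    dgw_tls_cert_expiration_seconds - time() <= threshold holds, or None."""
    remaining = expiration_epoch - now_epoch
    if remaining <= CRITICAL_THRESHOLD:
        return "CRITICAL"
    if remaining <= MAJOR_THRESHOLD:
        return "MAJOR"
    if remaining <= MINOR_THRESHOLD:
        return "MINOR"
    return None


if __name__ == "__main__":
    now = 1_700_000_000  # arbitrary example "current time"
    for days in (200, 120, 60, 20):
        # Prints: 200 None / 120 MINOR / 60 MAJOR / 20 CRITICAL
        print(days, expiry_severity(now + days * SECONDS_PER_DAY, now))
```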

5.1.73 DGW_TLS_CONNECTION_FAILURE

Table 5-87 DGW_TLS_CONNECTION_FAILURE

Field	Details
Description	TLS connection establishment failed.
Summary	TLS connection failure when the Diameter Gateway is the initiator.
Severity	Major
Expression	sum by (namespace,reason)(ocbsf_diam_failed_conn_network) > 0
OID	1.3.6.1.4.1.323.5.3.37.1.2.81
Metric Used	ocbsf_diam_failed_conn_network
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.74 BINDING_REVALIDATION_PCF_BINDING_MISSING_MINOR

Table 5-88 BINDING_REVALIDATION_PCF_BINDING_MISSING_MINOR

Field	Details
Description	At least 30% but less than 50% of all binding revalidation records in the last 5 minutes had a missing PCF binding.
Summary	At least 30% but less than 50% of all binding revalidation records in the last 5 minutes had a missing PCF binding.
Severity	Minor
Expression	(sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 30 < 50
OID	1.3.6.1.4.1.323.5.3.37.1.2.51
Metric Used	ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total
Recommended Actions	Check BSF Management service health history. Increase binding audit frequency. For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.75 BINDING_REVALIDATION_PCF_BINDING_MISSING_MAJOR

Table 5-89 BINDING_REVALIDATION_PCF_BINDING_MISSING_MAJOR

Field	Details
Description	At least 50% but less than 70% of all binding revalidation records in the last 5 minutes had a missing PCF binding.
Summary	At least 50% but less than 70% of all binding revalidation records in the last 5 minutes had a missing PCF binding.
Severity	Major
Expression	(sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 50 < 70
OID	1.3.6.1.4.1.323.5.3.37.1.2.51
Metric Used	ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total
Recommended Actions	Check BSF Management service health history. Increase binding audit frequency. For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.76 BINDING_REVALIDATION_PCF_BINDING_MISSING_CRITICAL

Table 5-90 BINDING_REVALIDATION_PCF_BINDING_MISSING_CRITICAL

Field	Details
Description	At least 70% of all binding revalidation records in the last 5 minutes had a missing PCF binding.
Summary	At least 70% of all binding revalidation records in the last 5 minutes had a missing PCF binding.
Severity	Critical
Expression	(sum by (namespace) (rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m])) / sum by (namespace) (rate(ocbsf_binding_revalidation_response_total[5m]))) * 100 >= 70
OID	1.3.6.1.4.1.323.5.3.37.1.2.51
Metric Used	ocbsf_binding_revalidation_pcfBinding_missing_total, ocbsf_binding_revalidation_response_total
Recommended Actions	Check BSF Management service health history. Increase binding audit frequency. For any additional guidance, contact My Oracle Support (https://support.oracle.com).
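
The three BINDING_REVALIDATION_PCF_BINDING_MISSING alerts split one ratio into bands: the per-namespace rate of PCF-binding-missing responses divided by the rate of all binding revalidation responses, times 100. Because the Minor and Major expressions carry upper bounds (< 50 and < 70), at most one of the three alerts matches at a time. The Python sketch below reproduces the arithmetic for a single namespace; the two per-second rates are assumed inputs (in Prometheus they come from rate(...[5m])), and the function names are hypothetical.

```python
from typing import Optional


def missing_binding_percentage(missing_rate: float, response_rate: float) -> Optional[float]:
    """Share of binding revalidation responses that reported a missing PCF binding.

    Inputs are per-second rates for one namespace, as produced by
    rate(ocbsf_binding_revalidation_pcfBinding_missing_total[5m]) and
    rate(ocbsf_binding_revalidation_response_total[5m]).
    Assumes every missing-binding response is also counted in the response total.
    Returns None when no responses were recorded; in PromQL the 0/0 division
    yields NaN, which no comparison matches.
    """
    if response_rate == 0:
        return None
    return 100.0 * missing_rate / response_rate


def severity_for(percentage: Optional[float]) -> Optional[str]:
    """Map the percentage onto the Minor/Major/Critical bands of the alert rules."""
    if percentage is None:
        return None
    if percentage >= 70:
        return "CRITICAL"  # >= 70
    if percentage >= 50:
        return "MAJOR"     # >= 50 < 70
    if percentage >= 30:
        return "MINOR"     # >= 30 < 50
    return None
```

For example, 1 missing-binding response per second out of 2 responses per second gives 50%, which falls in the Major band only.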

5.1.77 BSF_STATE_NON_FUNCTIONAL_CRITICAL

Table 5-91 BSF_STATE_NON_FUNCTIONAL_CRITICAL

Field	Details
Description	BSF is in a non-functional state because the DB cluster is down.
Summary	BSF is in a non-functional state because the DB cluster is down.
Severity	Critical
Expression	appinfo_nfDbFunctionalState_current{nfDbFunctionalState="Not_Running"} == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.56
Metric Used	appinfo_dbmonitorclusterDbState_current
Recommended Actions	Cause: The alert is raised because the BSF network function is non-functional due to the database cluster being down. Diagnostic Information: System monitoring indicates that the database cluster state is "Not Running" and is unreachable, preventing the BSF network function from operating normally. Recovery: Check and restore the database cluster to a running state. After recovery, verify that the BSF network function returns to operational status. Escalate to database administration if the issue persists.

5.1.78 POD_PENDING_REQUEST_CONGESTION_L1

Table 5-92 POD_PENDING_REQUEST_CONGESTION_L1

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L1 for resource type queue.
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L1 for resource type queue.
Severity	Critical
Expression	occnp_pod_resource_congestion_state{type="queue"} == 2
OID	1.3.6.1.4.1.323.5.3.37.1.2.54
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	Cause: Pending HTTP request queue reached CONGESTION_L1. Diagnostic Information: pod_resource_congestion_state{resourceType="queue"} == 2; verify pod_resource_stress queue values; review rejections by requestUri/requestMethod at L1. Recovery: Ensure L1 queue thresholds are correct; if queues grow, raise the L1 discard priority to reject lower-priority requests earlier.

5.1.79 POD_PENDING_REQUEST_CONGESTION_L2

Table 5-93 POD_PENDING_REQUEST_CONGESTION_L2

Field	Details
Description	Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L2 for resource type queue.
Summary	Pod Resource Congestion status of {{$labels.microservice}} service is Congestion_L2 for resource type queue.
Severity	Critical
Expression	occnp_pod_resource_congestion_state{type="queue"} == 3
OID	1.3.6.1.4.1.323.5.3.37.1.2.55
Metric Used	occnp_pod_resource_congestion_state
Recommended Actions	Cause: Pending HTTP request queue reached CONGESTION_L2. Diagnostic Information: pod_resource_congestion_state{resourceType="queue"} == 3; examine queue stress trends and rejection counters for low-priority traffic at L2. Recovery: Increase shedding at L2 (raise discard priority), and review queue thresholds to prevent saturation.

5.1.80 AUDIT_NOT_RUNNING

Table 5-94 AUDIT_NOT_RUNNING

Field	Details
Description	Audit has not been running for at least 1 hour in pod {{$labels.pod}}.
Summary	Audit has been stuck in an unhealthy state for over 1 hour.
Severity	Critical
Expression	(increase(data_repository_invocations_seconds_count{method="getQueuedTablesToAudit",state="SUCCESS"}[1h])) == 0
OID	1.3.6.1.4.1.323.5.3.37.1.2.45
Metric Used	data_repository_invocations_seconds_count
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).
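
AUDIT_NOT_RUNNING fires when the counter of successful getQueuedTablesToAudit invocations did not grow at all during the last hour. The Python sketch below is a simplified version of that check over raw counter samples: like Prometheus, it treats a drop in value as a counter reset, but it omits the extrapolation to the window edges that PromQL increase() performs. The (timestamp, value) sample format and the function names are assumptions for illustration.

```python
from typing import List, Tuple

# (unix_timestamp_seconds, counter_value) samples of
# data_repository_invocations_seconds_count{method="getQueuedTablesToAudit",state="SUCCESS"}
# taken within the last hour. This input format is a hypothetical illustration.
Sample = Tuple[float, float]


def simple_increase(samples: List[Sample]) -> float:
    """Growth of a counter across consecutive samples.

    A decrease is treated as a counter reset (for example, a pod restart), so
    the post-reset value is counted as growth. Unlike PromQL increase(), this
    does not extrapolate to the edges of the window.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def audit_not_running(samples_last_hour: List[Sample]) -> bool:
    """True when the 1h increase is exactly zero.

    With fewer than two samples, increase() returns no value and the alert
    expression cannot match, so this returns False.
    """
    if len(samples_last_hour) < 2:
        return False
    return simple_increase(samples_last_hour) == 0
```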

5.1.81 DIAMETER_POD_ERROR_RESPONSE_MINOR

Table 5-95 DIAMETER_POD_ERROR_RESPONSE_MINOR

Field	Details
Description	At least 1% of the Diameter responses in pod {{$labels.pod}} failed with result code DIAMETER_UNABLE_TO_DELIVER (3002).
Summary	At least 1% of the Diameter responses failed with result code DIAMETER_UNABLE_TO_DELIVER (3002).
Severity	Minor
Expression	(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m]))) * 100 >=1
OID	1.3.6.1.4.1.323.5.3.37.1.2.46
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.82 DIAMETER_POD_ERROR_RESPONSE_MAJOR

Table 5-96 DIAMETER_POD_ERROR_RESPONSE_MAJOR

Field	Details
Description	At least 5% of the Diameter responses in pod {{$labels.pod}} failed with result code DIAMETER_UNABLE_TO_DELIVER (3002).
Summary	At least 5% of the Diameter responses failed with result code DIAMETER_UNABLE_TO_DELIVER (3002).
Severity	Major
Expression	(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m]))) * 100 >=5
OID	1.3.6.1.4.1.323.5.3.37.1.2.46
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.83 DIAMETER_POD_ERROR_RESPONSE_CRITICAL

Table 5-97 DIAMETER_POD_ERROR_RESPONSE_CRITICAL

Field	Details
Description	At least 10% of the Diameter responses in pod {{$labels.pod}} failed with result code DIAMETER_UNABLE_TO_DELIVER (3002).
Summary	At least 10% of the Diameter responses failed with result code DIAMETER_UNABLE_TO_DELIVER (3002).
Severity	Critical
Expression	(sum by (pod) (rate(ocbsf_diam_response_network_total{responseCode="3002"}[2m])))/ (sum by (pod) (rate(ocbsf_diam_response_network_total[2m]))) * 100 >=10
OID	1.3.6.1.4.1.323.5.3.37.1.2.46
Metric Used	ocbsf_diam_response_network_total
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).
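
The three DIAMETER_POD_ERROR_RESPONSE alerts compute, per pod, the share of Diameter responses with result code 3002 (DIAMETER_UNABLE_TO_DELIVER) over a 2-minute rate window. The expressions have no upper bound, so a pod at 12% matches the Minor, Major, and Critical rules together. The Python sketch below aggregates per-series rates by pod and lists every rule that would match; the (pod, responseCode, rate) input tuples, the pod names, and the rate values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (pod, responseCode, per-second rate) for one ocbsf_diam_response_network_total series.
# Hypothetical input shape; in Prometheus the rates come from rate(...[2m]).
SeriesRate = Tuple[str, str, float]

DIAM_THRESHOLDS = (("MINOR", 1.0), ("MAJOR", 5.0), ("CRITICAL", 10.0))


def unable_to_deliver_percentage(series: Iterable[SeriesRate]) -> Dict[str, float]:
    """sum by (pod) of responseCode="3002" rates / sum by (pod) of all rates * 100."""
    errors: Dict[str, float] = defaultdict(float)
    totals: Dict[str, float] = defaultdict(float)
    for pod, code, rate in series:
        totals[pod] += rate
        if code == "3002":
            errors[pod] += rate
    # Pods with a zero total rate are skipped (no meaningful percentage).
    return {pod: 100.0 * errors[pod] / total for pod, total in totals.items() if total > 0}


def matching_alerts(percentages: Dict[str, float]) -> Dict[str, List[str]]:
    """Every rule whose '>= threshold' condition holds, per pod."""
    return {
        pod: [name for name, limit in DIAM_THRESHOLDS if pct >= limit]
        for pod, pct in percentages.items()
    }
```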

5.1.84 BSF_PCF_BINDING_TABLE_MIGRATED_PERCENTAGE

Table 5-98 BSF_PCF_BINDING_TABLE_MIGRATED_PERCENTAGE

Field	Details
Description	The PCF binding table migration configuration should be updated to use only the PCF binding v2 table.
Summary	The PCF binding table migration configuration should be updated to use only the PCF binding v2 table.
Severity	Minor
Expression	sum by (siteId) (ocbsf_binding_record_migrated_percentage{microservice="bsf-management-service"}) / count by (siteId) (ocbsf_binding_record_migrated_percentage{microservice="bsf-management-service"}) == 100
OID	1.3.6.1.4.1.323.5.3.37.1.2.57
Metric Used	ocbsf_binding_record_migrated_percentage
Recommended Actions	Cause: The alert is raised because all PCF binding records in the legacy v1 table on the current site have been migrated to the PCF binding v2 table. Diagnostic Information: Verify that the pcf_binding table is empty, and then change the Advanced Settings parameter PCF_BINDING_TABLE_LOOKUP to 3. Recovery: No recovery steps are needed; the alert only indicates that the migration is complete. The alert is cleared after 24 hours.
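
BSF_PCF_BINDING_TABLE_MIGRATED_PERCENTAGE averages ocbsf_binding_record_migrated_percentage across the bsf-management-service series of each site (sum divided by count) and fires only when that average is exactly 100. Assuming no series reports more than 100, this means every reporting series shows a fully migrated state. A minimal Python sketch of the aggregation, assuming the per-series values are already grouped by siteId (the function name and input shape are hypothetical):

```python
from typing import Dict, List


def sites_fully_migrated(values_by_site: Dict[str, List[float]]) -> List[str]:
    """Return the siteIds whose average migrated percentage equals 100.

    Mirrors: sum by (siteId)(m) / count by (siteId)(m) == 100,
    where m is ocbsf_binding_record_migrated_percentage{microservice="bsf-management-service"}.
    """
    result = []
    for site, values in values_by_site.items():
        # A site with no series produces no result in PromQL, so it is skipped here.
        if values and sum(values) / len(values) == 100:
            result.append(site)
    return result
```

For example, a site whose two series report 100 and 98.5 averages 99.25 and does not raise the alert.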

5.1.85 BSF_PCF_BINDING_TABLE_MIGRATION_INVALID_CONFIGURATION

Table 5-99 BSF_PCF_BINDING_TABLE_MIGRATION_INVALID_CONFIGURATION

Field	Details
Description	The PCF binding table migration configuration is invalid and should be reviewed and updated to a valid configuration. Invalid configurations: {{$labels.incompatibleFeatures}}.
Summary	An invalid PCF binding table migration configuration was set; the latest valid values are used.
Severity	Critical
Expression	ocbsf_feature_incompatibility == 1
OID	1.3.6.1.4.1.323.5.3.37.1.2.58
Metric Used	ocbsf_feature_incompatibility
Recommended Actions	Cause: The alert is raised because the current configuration of the Remove Index Based Lookup feature has an invalid combination of ENABLE_PCF_BINDING_TABLE_MIGRATION and PCF_BINDING_TABLE_LOOKUP_VALUE. Diagnostic Information: Verify that the configuration follows these rules: if ENABLE_PCF_BINDING_TABLE_MIGRATION is false, PCF_BINDING_TABLE_LOOKUP_VALUE can only be 0; if ENABLE_PCF_BINDING_TABLE_MIGRATION is true, PCF_BINDING_TABLE_LOOKUP_VALUE can only be a value from 1 to 3. Recovery: The alert is cleared once the configuration is updated to a valid combination.
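
The validity rules for the Remove Index Based Lookup configuration can be written as a small check. The following Python sketch is illustrative only: the function name, and the representation of the two Advanced Settings values as a bool and an integer, are assumptions. BSF evaluates these rules internally, and the alert itself is driven by ocbsf_feature_incompatibility.

```python
def is_valid_migration_config(enable_migration: bool, lookup_value: int) -> bool:
    """Check ENABLE_PCF_BINDING_TABLE_MIGRATION / PCF_BINDING_TABLE_LOOKUP_VALUE.

    Rules from the Recommended Actions (integer lookup values assumed):
      - migration disabled -> lookup value must be 0
      - migration enabled  -> lookup value must be 1, 2, or 3
    """
    if enable_migration:
        return 1 <= lookup_value <= 3
    return lookup_value == 0
```

For example, enabling the migration while leaving the lookup value at 0 is an invalid combination and would keep the alert raised.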

5.1.86 CERTIFICATE_EXPIRY_MINOR

Table 5-100 CERTIFICATE_EXPIRY_MINOR

Field	Details
Description	Certificate expiry in less than 6 months for {{$labels.namespace}}
Summary	Certificate expiry in less than 6 months
Severity	Minor
Expression	security_cert_x509_expiration_seconds - time() <= 15724800
OID	1.3.6.1.4.1.323.5.3.37.1.2.44
Metric Used	security_cert_x509_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.87 CERTIFICATE_EXPIRY_MAJOR

Table 5-101 CERTIFICATE_EXPIRY_MAJOR

Field	Details
Description	Certificate expiry in less than 3 months for {{$labels.namespace}}
Summary	Certificate expiry in less than 3 months.
Severity	Major
Expression	security_cert_x509_expiration_seconds - time() <= 7862400
OID	1.3.6.1.4.1.323.5.3.37.1.2.44
Metric Used	security_cert_x509_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).

5.1.88 CERTIFICATE_EXPIRY_CRITICAL

Table 5-102 CERTIFICATE_EXPIRY_CRITICAL

Field	Details
Description	Certificate expiry in less than a month for {{$labels.namespace}}
Summary	Certificate expiry in less than a month.
Severity	Critical
Expression	security_cert_x509_expiration_seconds - time() <= 2592000
OID	1.3.6.1.4.1.323.5.3.37.1.2.44
Metric Used	security_cert_x509_expiration_seconds
Recommended Actions	For any additional guidance, contact My Oracle Support (https://support.oracle.com).
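
The CERTIFICATE_EXPIRY expressions subtract time() from security_cert_x509_expiration_seconds, which implies that the metric holds the certificate's expiry as a Unix timestamp (an inference from the expressions, not a documented definition). To cross-check a certificate by hand, a notAfter string in the format returned by Python's ssl module can be converted with ssl.cert_time_to_seconds() and compared against the same thresholds (2592000, 7862400, and 15724800 seconds). The notAfter date and the helper names below are made-up examples.

```python
import ssl
import time
from typing import Optional

# Thresholds from the CERTIFICATE_EXPIRY_* expressions, most severe first (seconds).
CERT_THRESHOLDS = (("CRITICAL", 2592000), ("MAJOR", 7862400), ("MINOR", 15724800))


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Seconds from `now` until a certificate's notAfter date.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # Unix timestamp, interpreted as UTC
    current = time.time() if now is None else now
    return expires_at - current


def most_severe_alert(remaining: float) -> Optional[str]:
    """Most severe CERTIFICATE_EXPIRY rule whose '<= threshold' condition holds."""
    for name, limit in CERT_THRESHOLDS:
        if remaining <= limit:
            return name
    return None
```

Checking the thresholds from most to least severe returns the single most urgent rule, although in Prometheus all matching rules fire independently.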

5 BSF Alerts

5.1 List of Alerts