6 CAPIF Alerts
Note:
- The performance and capacity of the CAPIF system may vary based on the call model, feature or interface configuration, and underlying CNE and hardware environment.
- Due to the unavailability of metrics and/or MQL queries, the following alerts are not supported for OCI:
- OccapifNfStatusUnavailable
- OccapifPodsRestart
- OccapifEgressGatewayServiceDown
- OccapifIngressGatewayServiceDown
- OccapifAfManagerServiceDown
- OccapifAPIManagerServiceDown
- OccapifEventManagerServiceDown
6.1 System Level Alerts
This section lists the system level alerts for CAPIF.
6.1.1 OccapifNfStatusUnavailable
Table 6-1 OccapifNfStatusUnavailable
Field | Details |
---|---|
Description | 'CAPIF services unavailable' |
Summary | "namespace: {{$labels.namespace}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All OCCAPIF services are unavailable." |
Severity | Critical |
Condition | All the CAPIF services are unavailable. |
Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Recommended Actions | The alert is cleared automatically when the CAPIF services restart. Steps: |
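As a rough illustration of how a condition like this could be expressed, the following Prometheus rule is a minimal sketch, not the rule shipped with CAPIF. The group name, the `occapif-ns` namespace value, and the `for` duration are assumptions made for this example; only the alert name, the `up` metric, the severity, and the annotation text come from Table 6-1.

```yaml
groups:
  - name: occapif-system-alerts-sketch    # hypothetical group name
    rules:
      - alert: OccapifNfStatusUnavailable
        # Fires when no scrape target in the namespace reports up == 1.
        # "occapif-ns" is a placeholder namespace, not a product default.
        expr: sum by (namespace) (up{namespace="occapif-ns"}) == 0
        for: 30s                           # assumed hold time
        labels:
          severity: critical
        annotations:
          description: 'CAPIF services unavailable'
          summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All OCCAPIF services are unavailable.'
```

One caveat with this form: if the targets disappear from service discovery altogether, the expression returns no series and the rule stays silent, so a production rule may need an additional `absent()` check.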
6.1.2 OccapifPodsRestart
Table 6-2 OccapifPodsRestart
Field | Details |
---|---|
Description | 'Pod <Pod Name> has restarted.' |
Summary | "namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted" |
Severity | Major |
Condition | A pod belonging to any of the CAPIF services has restarted. |
Metric Used | kube_pod_container_status_restarts_total |
Recommended Actions | The alert is cleared automatically once the affected pod is up again. Steps: |
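A restart condition of this kind is commonly written as an increase over the restart counter. The rule below is a sketch under stated assumptions: the group name, the `occapif-ns` namespace, the 5-minute window, and the use of `{{$labels.pod}}` in place of `<Pod Name>` are all hypothetical choices for illustration, not values taken from the CAPIF alert file.

```yaml
groups:
  - name: occapif-pod-restart-sketch      # hypothetical group name
    rules:
      - alert: OccapifPodsRestart
        # increase() over a counter is positive whenever at least one
        # container restart was recorded inside the window.
        # Namespace and window length are assumptions.
        expr: increase(kube_pod_container_status_restarts_total{namespace="occapif-ns"}[5m]) > 0
        labels:
          severity: major
        annotations:
          description: 'Pod {{$labels.pod}} has restarted.'
          summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted'
```

With this shape the alert clears by itself once the window no longer contains a restart, which matches the automatic clearing described in the table.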
6.1.3 OccapifTotalExternalIngressTrafficRateAboveMinorThreshold
Table 6-3 OccapifTotalExternalIngressTrafficRateAboveMinorThreshold
Field | Details |
---|---|
Description | "OCCAPIF External Ingress traffic rate is above the configured minor threshold i.e. 800 TPS (current value is: {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic rate is above 80 percent of max TPS (1000)" |
Severity | Minor |
Condition | The total CAPIF External Ingress traffic rate has crossed the configured minor threshold of 800 TPS. The default trigger point for this alert in Occapif Alert.yaml is 80% of 1000 TPS (the maximum ingress request rate). |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5003 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared either when the External Ingress traffic rate falls below the minor threshold or when it crosses the major threshold, in which case the OccapifTotalExternalIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the Occapif Alert.yaml alert file. Steps: Reassess why CAPIF is receiving additional traffic. If this alert is unexpected, contact My Oracle Support. |
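The threshold arithmetic in Table 6-3 (80% of a 1000 TPS maximum gives 800 TPS) can be sketched as a Prometheus rule. This is an illustrative sketch only: the group name, the `app_kubernetes_io_name` value used to select the External Ingress Gateway, and the 2-minute rate window are assumptions, and the actual expression in the CAPIF alert file may differ.

```yaml
groups:
  - name: occapif-ingress-traffic-sketch  # hypothetical group name
    rules:
      - alert: OccapifTotalExternalIngressTrafficRateAboveMinorThreshold
        # 800 = 80% of the 1000 TPS maximum ingress request rate.
        # The label matcher is a placeholder for whatever selects the
        # External Ingress Gateway in a given deployment.
        expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="external-ingress-gateway"}[2m])) > 800
        labels:
          severity: minor
        annotations:
          description: 'OCCAPIF External Ingress traffic rate is above the configured minor threshold i.e. 800 TPS (current value is: {{ $value }})'
          summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic rate is above 80 percent of max TPS (1000)'
```

The major and critical variants would follow the same pattern with 900 and 950 as the comparison values. Changing the maximum rate means recomputing each value by hand, since the percentages are not evaluated by Prometheus in this form.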
6.1.4 OccapifTotalNetworkIngressTrafficRateAboveMinorThreshold
Table 6-4 OccapifTotalNetworkIngressTrafficRateAboveMinorThreshold
Field | Details |
---|---|
Description | "OCCAPIF Network Ingress traffic rate is above the configured minor threshold i.e. 800 TPS (current value is: {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic rate is above 80 percent of max TPS (1000)" |
Severity | Minor |
Condition | The total CAPIF Network Ingress traffic rate has crossed the configured minor threshold of 800 TPS. The default trigger point for this alert in Occapif Alert.yaml is 80% of 1000 TPS (the maximum ingress request rate). |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5004 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared either when the Network Ingress traffic rate falls below the minor threshold or when it crosses the major threshold, in which case the OccapifTotalNetworkIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the Occapif Alert.yaml alert file. Steps: Reassess why CAPIF is receiving additional traffic. If this alert is unexpected, contact My Oracle Support. |
6.1.5 OccapifTotalExternalIngressTrafficRateAboveMajorThreshold
Table 6-5 OccapifTotalExternalIngressTrafficRateAboveMajorThreshold
Field | Details |
---|---|
Description | "OCCAPIF External Ingress traffic rate is above the configured major threshold i.e. 900 TPS (current value is: {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic rate is above 90 percent of max TPS (1000)" |
Severity | Major |
Condition | The total CAPIF External Ingress traffic rate has crossed the configured major threshold of 900 TPS. The default trigger point for this alert in Occapif Alert.yaml is 90% of 1000 TPS (the maximum ingress request rate). |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5005 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared either when the External Ingress traffic rate falls below the major threshold or when it crosses the critical threshold, in which case the OccapifTotalExternalIngressTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the Occapif Alert.yaml alert file. Steps: Reassess why CAPIF is receiving additional traffic. If this alert is unexpected, contact My Oracle Support. |
6.1.6 OccapifTotalNetworkIngressTrafficRateAboveMajorThreshold
Table 6-6 OccapifTotalNetworkIngressTrafficRateAboveMajorThreshold
Field | Details |
---|---|
Description | "OCCAPIF Network Ingress traffic rate is above the configured major threshold i.e. 900 TPS (current value is: {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 percent of max TPS (1000)" |
Severity | Major |
Condition | The total CAPIF Network Ingress traffic rate has crossed the configured major threshold of 900 TPS. The default trigger point for this alert in Occapif Alert.yaml is 90% of 1000 TPS (the maximum ingress request rate). |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5006 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared either when the Network Ingress traffic rate falls below the major threshold or when it crosses the critical threshold, in which case the OccapifTotalNetworkIngressTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the Occapif Alert.yaml alert file. Steps: Reassess why CAPIF is receiving additional traffic. If this alert is unexpected, contact My Oracle Support. |
6.1.7 OccapifTotalExternalIngressTrafficRateAboveCriticalThreshold
Table 6-7 OccapifTotalExternalIngressTrafficRateAboveCriticalThreshold
Field | Details |
---|---|
Description | "OCCAPIF External Ingress traffic rate is above the configured critical threshold i.e. 950 TPS (current value is: {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic rate is above 95 percent of max TPS (1000)" |
Severity | Critical |
Condition | The total CAPIF External Ingress traffic rate has crossed the configured critical threshold of 950 TPS. The default trigger point for this alert in Occapif Alert.yaml is 95% of 1000 TPS (the maximum ingress request rate). |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5007 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared when the External Ingress traffic rate falls below the critical threshold. Note: The threshold is configurable in the Occapif Alert.yaml alert file. Steps: Reassess why CAPIF is receiving additional traffic. If this alert is unexpected, contact My Oracle Support. |
6.1.8 OccapifTotalNetworkIngressTrafficRateAboveCriticalThreshold
Table 6-8 OccapifTotalNetworkIngressTrafficRateAboveCriticalThreshold
Field | Details |
---|---|
Description | "OCCAPIF Network Ingress traffic rate is above the configured critical threshold i.e. 950 TPS (current value is: {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic rate is above 95 percent of max TPS (1000)" |
Severity | Critical |
Condition | The total CAPIF Network Ingress traffic rate has crossed the configured critical threshold of 950 TPS. The default trigger point for this alert in Occapif Alert.yaml is 95% of 1000 TPS (the maximum ingress request rate). |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5008 |
Metric Used | oc_ingressgateway_http_requests_total |
Recommended Actions | The alert is cleared when the Network Ingress traffic rate falls below the critical threshold. Note: The threshold is configurable in the Occapif Alert.yaml alert file. Steps: Reassess why CAPIF is receiving additional traffic. If this alert is unexpected, contact My Oracle Support. |
6.1.9 OccapifExternalIngressTransactionErrorRateAboveZeroPointOnePercent
Table 6-9 OccapifExternalIngressTransactionErrorRateAboveZeroPointOnePercent
Field | Details |
---|---|
Description | "OCCAPIF External Ingress transaction error rate is above 0.1 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 0.1 percent of total transactions" |
Severity | Warning |
Condition | The number of failed External Ingress transactions is above 0.1 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5009 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed External Ingress transactions falls below 0.1 percent of the total transactions, or when the number of failed transactions crosses the 1 percent threshold, in which case the OccapifExternalIngressTransactionErrorRateAbove1Percent alert is raised. Steps: |
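The error-rate condition in Table 6-9 is a ratio of failed responses to all responses, bounded above by the next threshold. The rule below is a minimal sketch of that idea. The group name, the gateway label matcher, the `Status` label name, the assumption that it holds a three-digit code, the choice to treat every non-2xx response as failed, and the 5-minute window are all hypothetical and should be checked against the deployed alert file.

```yaml
groups:
  - name: occapif-ingress-error-rate-sketch   # hypothetical group name
    rules:
      - alert: OccapifExternalIngressTransactionErrorRateAboveZeroPointOnePercent
        # Failed responses as a percentage of all responses. The "< 1"
        # upper bound lets the 1 percent alert take over, as the table
        # describes. "Status" and the gateway matcher are assumed labels.
        expr: |
          (
            sum(rate(oc_ingressgateway_http_responses_total{app_kubernetes_io_name="external-ingress-gateway", Status!~"2.."}[5m]))
              /
            sum(rate(oc_ingressgateway_http_responses_total{app_kubernetes_io_name="external-ingress-gateway"}[5m]))
          ) * 100 >= 0.1 < 1
        labels:
          severity: warning
        annotations:
          description: 'OCCAPIF External Ingress transaction error rate is above 0.1 percent of total transactions (current value is {{ $value }})'
          summary: 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 0.1 percent of total transactions'
```

The chained comparison works because PromQL comparisons act as filters on instant vectors: the series survives only while its value is at least 0.1 and below 1.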
6.1.10 OccapifNetworkIngressTransactionErrorRateAboveZeroPointOnePercent
Table 6-10 OccapifNetworkIngressTransactionErrorRateAboveZeroPointOnePercent
Field | Details |
---|---|
Description | "OCCAPIF Network Ingress transaction error rate is above 0.1 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 0.1 percent of total transactions" |
Severity | Warning |
Condition | The number of failed Network Ingress transactions is above 0.1 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5010 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed Network Ingress transactions falls below 0.1 percent of the total transactions, or when the number of failed transactions crosses the 1 percent threshold, in which case the OccapifNetworkIngressTransactionErrorRateAbove1Percent alert is raised. Steps: |
6.1.11 OccapifExternalIngressTransactionErrorRateAbove1Percent
Table 6-11 OccapifExternalIngressTransactionErrorRateAbove1Percent
Field | Details |
---|---|
Description | "OCCAPIF External Ingress transaction error rate is above 1 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 1 percent of total transactions" |
Severity | Warning |
Condition | The number of failed External Ingress transactions is above 1 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5011 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed External Ingress transactions falls below 1 percent of the total transactions. Steps: |
6.1.12 OccapifNetworkIngressTransactionErrorRateAbove1Percent
Table 6-12 OccapifNetworkIngressTransactionErrorRateAbove1Percent
Field | Details |
---|---|
Description | "OCCAPIF Network Ingress transaction error rate is above 1 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 1 percent of total transactions" |
Severity | Warning |
Condition | The number of failed Network Ingress transactions is above 1 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5012 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed Network Ingress transactions falls below 1 percent of the total transactions. Steps: |
6.1.13 OccapifExternalIngressTransactionErrorRateAbove10Percent
Table 6-13 OccapifExternalIngressTransactionErrorRateAbove10Percent
Field | Details |
---|---|
Description | "OCCAPIF External Ingress transaction error rate is above 10 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 10 percent of total transactions" |
Severity | Minor |
Condition | The number of failed External Ingress transactions is above 10 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5013 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed External Ingress transactions falls below 10 percent of the total transactions. Steps: |
6.1.14 OccapifNetworkIngressTransactionErrorRateAbove10Percent
Table 6-14 OccapifNetworkIngressTransactionErrorRateAbove10Percent
Field | Details |
---|---|
Description | "OCCAPIF Network Ingress transaction error rate is above 10 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 10 percent of total transactions" |
Severity | Minor |
Condition | The number of failed Network Ingress transactions is above 10 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5014 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed Network Ingress transactions falls below 10 percent of the total transactions. Steps: |
6.1.15 OccapifExternalIngressTransactionErrorRateAbove25Percent
Table 6-15 OccapifExternalIngressTransactionErrorRateAbove25Percent
Field | Details |
---|---|
Description | "OCCAPIF External Ingress transaction error rate detected above 25 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 25 percent of total transactions" |
Severity | Major |
Condition | The number of failed External Ingress transactions is above 25 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5015 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed External Ingress transactions falls below 25 percent of the total transactions. Steps: |
6.1.16 OccapifNetworkIngressTransactionErrorRateAbove25Percent
Table 6-16 OccapifNetworkIngressTransactionErrorRateAbove25Percent
Field | Details |
---|---|
Description | "OCCAPIF Network Ingress transaction error rate detected above 25 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 25 percent of total transactions" |
Severity | Major |
Condition | The number of failed Network Ingress transactions is above 25 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5016 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed Network Ingress transactions falls below 25 percent of the total transactions. Steps: |
6.1.17 OccapifExternalIngressTransactionErrorRateAbove50Percent
Table 6-17 OccapifExternalIngressTransactionErrorRateAbove50Percent
Field | Details |
---|---|
Description | "OCCAPIF External Ingress transaction error rate detected above 50 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 50 percent of total transactions" |
Severity | Critical |
Condition | The number of failed External Ingress transactions is above 50 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5017 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed External Ingress transactions falls below 50 percent of the total transactions. Steps: |
6.1.18 OccapifNetworkIngressTransactionErrorRateAbove50Percent
Table 6-18 OccapifNetworkIngressTransactionErrorRateAbove50Percent
Field | Details |
---|---|
Description | "OCCAPIF Network Ingress transaction error rate detected above 50 percent of total transactions (current value is {{ $value }})" |
Summary | "timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction error rate detected above 50 percent of total transactions" |
Severity | Critical |
Condition | The number of failed Network Ingress transactions is above 50 percent of the total transactions. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5018 |
Metric Used | oc_ingressgateway_http_responses_total |
Recommended Actions | The alert is cleared when the number of failed Network Ingress transactions falls below 50 percent of the total transactions. Steps: |
6.1.19 OccapifEgressGatewayServiceDown
Table 6-19 OccapifEgressGatewayServiceDown
Field | Details |
---|---|
Description | "CAPIF Egress-Gateway service {{$labels.app_kubernetes_io_name}} is down" |
Summary | "kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down" |
Severity | Critical |
Condition | None of the pods of the Egress Gateway microservice is available. |
Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the Egress Gateway service is available. Note: The threshold is configurable in the NefAlertrules alert file. Steps: |
6.1.20 OccapifMemoryUsageCrossedMinorThreshold
Table 6-20 OccapifMemoryUsageCrossedMinorThreshold
Field | Details |
---|---|
Description | "CAPIF Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (50%) (value={{ $value }}) of its limit." |
Summary | "namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 50% of its limit." |
Severity | Minor |
Condition | A pod has reached the configured minor threshold (50%) of its memory resource limits. |
Metric Used | 'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: These are Kubernetes metrics used for instance availability monitoring. If the metrics are not available, use similar metrics as exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the OccapifMemoryUsageCrossedMajorThreshold alert is raised. Note: The threshold is configurable in the NefAlertrules alert file. If the issue persists, capture all the outputs for the above steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide. |
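The memory condition compares a pod's usage with its configured limit. The following rule is a sketch of one way to express that with the two metrics named in Table 6-20. The group name, the `occapif-ns` namespace, the `container!=""` filter, and the per-pod aggregation are assumptions for illustration, not the contents of the shipped alert file.

```yaml
groups:
  - name: occapif-memory-usage-sketch     # hypothetical group name
    rules:
      - alert: OccapifMemoryUsageCrossedMinorThreshold
        # Usage divided by limit, per pod, as a percentage. The "< 60"
        # bound hands over to the major alert, as the table describes.
        # Namespace and container filter are assumptions.
        expr: |
          (
            sum by (namespace, pod) (container_memory_usage_bytes{namespace="occapif-ns", container!=""})
              /
            sum by (namespace, pod) (container_spec_memory_limit_bytes{namespace="occapif-ns", container!=""})
          ) * 100 >= 50 < 60
        labels:
          severity: minor
        annotations:
          description: 'CAPIF Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (50%) (value={{ $value }}) of its limit.'
          summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 50% of its limit.'
```

Containers that have no memory limit report a limit of 0, so a pod made up only of such containers produces a division by zero. The upper bound filters that result out here, but a rule without the bound would need to exclude those pods explicitly.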
6.1.21 OccapifMemoryUsageCrossedMajorThreshold
Table 6-21 OccapifMemoryUsageCrossedMajorThreshold
Field | Details |
---|---|
Description | "CAPIF Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (60%) (value = {{ $value }}) of its limit." |
Summary | "namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 60% of its limit." |
Severity | Major |
Condition | A pod has reached the configured major threshold (60%) of its memory resource limits. |
OID | 1.3.6.1.4.1.323.5.3.39.1.3.5021 |
Metric Used | 'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: These are Kubernetes metrics used for instance availability monitoring. If the metrics are not available, use similar metrics as exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the OccapifMemoryUsageCrossedCriticalThreshold alert is raised. Note: The threshold is configurable in the NefAlertrules alert file. If the issue persists, capture all the outputs for the above steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide. |
6.1.22 OccapifMemoryUsageCrossedCriticalThreshold
Table 6-22 OccapifMemoryUsageCrossedCriticalThreshold
Field | Details |
---|---|
Description | "CAPIF Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (70%) (value = {{ $value }}) of its limit." |
Summary | "namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit." |
Severity | Critical |
Condition | A pod has reached the configured critical threshold (70%) of its memory resource limits. |
Metric Used | 'container_memory_usage_bytes' 'container_spec_memory_limit_bytes' Note: These are Kubernetes metrics used for instance availability monitoring. If the metrics are not available, use similar metrics as exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the memory utilization falls below the critical threshold. Note: The threshold is configurable in the NefAlertrules alert file. If the issue persists, capture all the outputs for the above steps and contact My Oracle Support. Note: Use the CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide. |
6.1.23 OccapifIngressGatewayServiceDown
Table 6-23 OccapifIngressGatewayServiceDown
Field | Details |
---|---|
Description | "CAPIF Ingress-Gateway service {{$labels.app_kubernetes_io_name}} is down" |
Summary | "kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down" |
Severity | Critical |
Condition | None of the pods of the Ingress-Gateway microservice is available. |
Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the Ingress Gateway service is available. Steps: |
6.1.24 OccapifApiManagerServiceDown
Table 6-24 OccapifApiManagerServiceDown
Field | Details |
---|---|
Description | "CAPIF API Manager service {{$labels.app_kubernetes_io_name}} is down" |
Summary | "namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : API Manager service down" |
Severity | Critical |
Condition | The API Manager service is down. |
Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the CAPIF API Manager service is available. Steps: |
6.1.25 OcapifAfManagerServiceDown
Table 6-25 OcapifAfManagerServiceDown
Field | Details |
---|---|
Description | "CAPIF AF Manager service {{$labels.app_kubernetes_io_name}} is down" |
Summary | "kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : AF Manager service down" |
Severity | Critical |
Condition | The AF Manager service is down. |
Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the CAPIF AF Manager service is available. Steps: |
6.1.26 OccapifEventManagerServiceDown
Table 6-26 OccapifEventManagerServiceDown
Field | Details |
---|---|
Description | "CAPIF Event Manager service {{$labels.app_kubernetes_io_name}} is down" |
Summary | "kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Event Manager service down" |
Severity | Critical |
Condition | The Event Manager service is down. |
Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the CAPIF Event Manager service is available. Steps: |
6.2 Application Level Alerts
This section lists the application level alerts for CAPIF.
6.2.1 AfMgrOnboardingOauthValidationFailureRateCrossedThreshold
Table 6-27 AfMgrOnboardingOauthValidationFailureRateCrossedThreshold
Field | Details |
---|---|
Description | "Failure Rate of AI Onboarding Oauth Validation Is Crossing the Threshold (10%)" |
Summary | "namespace: {{$labels.namespace}}, timestamp: {{ with query \"time()\" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Failure Rate Of Onboarding is above 10 percent of total requests." |
Severity | Error |
Condition | The failure rate of API Invoker onboarding has crossed the threshold value (10%). |
Metric Used | occapif_afmgr_resp_total |
Recommended Actions | The alert is cleared when the failure rate of API invoker onboarding is below the threshold. Steps: |