8 CNC Console Alerts

This section provides information about CNC Console Alerts.

Note:

  • In a multicluster deployment, use the updated occncc_agent_alertrules_<version>.yaml file in the agent clusters.
  • Use the occncc_manager_alertrules_<version>.yaml file for a single cluster deployment, and in the manager cluster of a multicluster deployment.

Table 8-1 Alert Levels or Severity Types

Alert Levels / Severity Types Definition
Critical Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of CNC Console.
Major Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the service of CNC Console.
Minor Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the service of CNC Console.
Info or Warn (Informational) Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require a specific response. WARN and INFO alerts may not impact the service of CNC Console.

8.1 CNC Console IAM Alerts

This section provides information about CNC Console IAM Alerts.

8.1.1 CnccIamTotalIngressTrafficRateAboveMinorThreshold

Table 8-2 CnccIamTotalIngressTrafficRateAboveMinorThreshold

Field Details
Description This alert notifies that the CNCC IAM Ingress message rate has crossed the configured minor threshold of 700 TPS (and is below the major threshold of 800 TPS).
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)
Severity minor
Condition sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[2m])) >= 700 < 800
OID 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Recommended Action

Cause:

  • This alert triggers when the CNC Console IAM Ingress receives more traffic than expected.
  • The traffic is primarily authentication, authorization, or user management requests.
  • For example, an Integrated NF or a common service may be sending an unusually high volume of such requests.

Diagnostic Information:

  • Monitor ingress traffic to the pod using the KPI Dashboard.
  • Review iam-ingress pod logs for any irregularities or anomalies, especially spikes in authentication, authorization, or user management operations.

Recovery:

The alert is cleared automatically when ingress traffic drops below the minor threshold or exceeds the major threshold.

If the alert does not clear:

  • Check if an Integrated NF or a common service is generating unexpectedly high volumes of authentication, authorization, or user management requests.
  • Analyze logs and metrics for unusual traffic patterns.
  • Take action to block or limit any unauthorized or unexpected traffic if necessary.

For any assistance, contact My Oracle Support.

Make sure to capture iam-ingress pod logs and relevant metrics to help Support analyze the issue.
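The Condition, Severity, and Summary fields in Table 8-2 combine into a single Prometheus alerting rule. The following is an illustrative sketch only, assuming the standard Prometheus rule file format; the group name, annotation keys, and namespace value in the shipped occncc_*_alertrules_<version>.yaml file may differ.

```yaml
groups:
  - name: cncc-iam-alerts            # illustrative group name
    rules:
      - alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
        # Condition from Table 8-2: 2-minute request rate per pod is in the
        # minor band (>= 700 and < 800 requests per second).
        expr: >
          sum by(namespace,pod)
          (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[2m]))
          >= 700 < 800
        labels:
          severity: minor
        annotations:
          summary: >
            namespace: {{$labels.namespace}}, podname: {{$labels.pod}}:
            Traffic Rate is above 70 Percent of Max requests per second(1000)
```

The major and critical variants in the following sections differ only in the threshold band (>= 800 < 900 and >= 900, respectively) and the severity label.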

8.1.2 CnccIamTotalIngressTrafficRateAboveMajorThreshold

Table 8-3 CnccIamTotalIngressTrafficRateAboveMajorThreshold

Field Details
Description This alert notifies that the CNCC IAM Ingress message rate has crossed the configured major threshold of 800 TPS (and is below the critical threshold of 900 TPS).
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)
Severity major
Condition sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[2m])) >= 800 < 900
OID 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Recommended Action Cause:
  • This alert triggers when the CNC Console IAM Ingress receives more traffic than expected.
  • The traffic primarily consists of authentication, authorization, or user management requests.
  • For example, an integrated NF or a common service may be sending an unusually high volume of such requests.
Diagnostic Information:
  • Monitor ingress traffic to the pod using the KPI Dashboard.
  • Review the iam-ingress pod logs for any irregularities or anomalies, especially spikes in authentication, authorization, or user management operations.
Recovery: The alert clears automatically when ingress traffic drops below the major threshold or exceeds the critical threshold. If the alert does not clear:
  • Check if an integrated NF or a common service is generating unexpectedly high volumes of authentication, authorization, or user management requests.
  • Analyze logs and metrics for unusual traffic patterns.
  • Take action to block or limit any unauthorized or unexpected traffic, if necessary.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.

8.1.3 CnccIamTotalIngressTrafficRateAboveCriticalThreshold

Table 8-4 CnccIamTotalIngressTrafficRateAboveCriticalThreshold

Field Details
Description This alert notifies that the CNCC IAM Ingress message rate has crossed the configured critical threshold of 900 TPS.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)
Severity critical
Condition sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[2m])) >= 900
OID 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Recommended Action Cause:
  • This alert triggers when the CNC Console IAM Ingress receives more traffic than expected.
  • The traffic primarily consists of authentication, authorization, or user management requests.
  • For example, an integrated NF or a common service may be sending an unusually high volume of such requests.
Diagnostic Information:
  • Monitor ingress traffic to the pod using the KPI Dashboard.
  • Review the iam-ingress pod logs for any irregularities or anomalies, especially spikes in authentication, authorization, or user management operations.
Recovery: The alert clears automatically when ingress traffic drops below the critical threshold. If the alert does not clear:
  • Check if an integrated NF or a common service is generating unexpectedly high volumes of authentication, authorization, or user management requests.
  • Analyze logs and metrics for unusual traffic patterns.
  • Take action to block or limit any unauthorized or unexpected traffic, if necessary.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.

8.1.4 CnccIamMemoryUsageCrossedMinorThreshold

Table 8-5 CnccIamMemoryUsageCrossedMinorThreshold

Field Details
Description This alert notifies that the CNCC IAM Ingress pod has reached the configured minor threshold (70%) of its memory resource limits.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.
Severity minor
Condition sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*iam-ingress-gateway.*|.*iam-kc.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*iam-ingress-gateway.*|.*iam-kc.*",resource="memory"}) * 100 >= 70 < 80
OID 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Action Cause:
  • This alert triggers when the CNC Console IAM Ingress pod’s memory usage reaches the configured minor threshold (70%) of its resource limits.
  • Higher memory consumption can result from increased authentication, authorization, or user management requests. For example, if an integrated NF or a common service is generating more traffic than expected.
Diagnostic Information:
  • Monitor memory usage metrics (container_memory_usage_bytes and kube_pod_container_resource_limits) using the KPI Dashboard.
  • Review the iam-ingress pod logs for any irregularities, spikes in memory usage, or high volumes of authentication, authorization, or user management activity.
Recovery: The alert clears automatically when memory utilization drops below the minor threshold or exceeds the major threshold. If the alert does not clear:
  • Check if an integrated NF or a common service is causing consistently high levels of authentication, authorization, or user management requests, leading to increased memory usage.
  • Analyze logs and metrics to identify and address the source.
  • Take steps to optimize usage or resolve issues as needed.
  • Also, double-check the resource limits and requests configuration to ensure it is aligned with CNC Console recommendations.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.
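The memory Conditions in Tables 8-5 through 8-7 all follow the same shape: per-pod memory usage divided by the configured resource limit, expressed as a percentage. A sketch of the minor-threshold rule, assuming the standard Prometheus rule format (surrounding group structure omitted):

```yaml
- alert: CnccIamMemoryUsageCrossedMinorThreshold
  # Usage (container_memory_usage_bytes) as a percentage of the configured
  # limit (kube_pod_container_resource_limits{resource="memory"}); fires
  # while utilization is in the 70-80% band.
  expr: >
    sum by(namespace,pod) (container_memory_usage_bytes{container!="",
      namespace="cncc-ns", pod=~".*iam-ingress-gateway.*|.*iam-kc.*"})
    / sum by(namespace,pod) (kube_pod_container_resource_limits{namespace="cncc-ns",
      pod=~".*iam-ingress-gateway.*|.*iam-kc.*", resource="memory"})
    * 100 >= 70 < 80
  labels:
    severity: minor
```

The major (>= 80 < 90) and critical (>= 90) variants change only the band and severity.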

8.1.5 CnccIamMemoryUsageCrossedMajorThreshold

Table 8-6 CnccIamMemoryUsageCrossedMajorThreshold

Field Details
Description This alert notifies that the CNC Console IAM Ingress pod has reached the configured major threshold (80%) of its memory resource limits.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.
Severity major
Condition sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*iam-ingress-gateway.*|.*iam-kc.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*iam-ingress-gateway.*|.*iam-kc.*",resource="memory"}) * 100 >= 80 < 90
OID 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.
Recommended Action Cause:
  • This alert triggers when the CNC Console IAM Ingress pod’s memory usage reaches the configured major threshold (80%) of its resource limits.
  • Higher memory consumption can result from increased authentication, authorization, or user management requests. For example, if an integrated NF or a common service is generating more traffic than expected.
Diagnostic Information:
  • Monitor memory usage metrics (container_memory_usage_bytes and kube_pod_container_resource_limits) using the KPI Dashboard.
  • Review the iam-ingress pod logs for any irregularities, spikes in memory usage, or high volumes of authentication, authorization, or user management activity.
Recovery: The alert clears automatically when memory utilization drops below the major threshold or exceeds the critical threshold. If the alert does not clear:
  • Check if an integrated NF or a common service is causing consistently high levels of authentication, authorization, or user management requests, leading to increased memory usage.
  • Analyze logs and metrics to identify and address the source.
  • Take steps to optimize usage or resolve issues as needed.
  • Also, double-check the resource limits and requests configuration to ensure it is aligned with CNC Console recommendations.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.

8.1.6 CnccIamMemoryUsageCrossedCriticalThreshold

Table 8-7 CnccIamMemoryUsageCrossedCriticalThreshold

Field Details
Description This alert notifies that the CNC Console IAM Ingress pod has reached the configured critical threshold (90%) of its memory resource limits.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.
Severity critical
Condition sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*iam-ingress-gateway.*|.*iam-kc.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*iam-ingress-gateway.*|.*iam-kc.*",resource="memory"}) * 100 >= 90
OID 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.
Recommended Action Cause:
  • This alert triggers when the CNC Console IAM Ingress pod’s memory usage reaches the configured critical threshold (90%) of its resource limits.
  • Higher memory consumption can result from increased authentication, authorization, or user management requests. For example, if an integrated NF or a common service is generating more traffic than expected.
Diagnostic Information:
  • Monitor memory usage metrics (container_memory_usage_bytes and kube_pod_container_resource_limits) using the KPI Dashboard.
  • Review the iam-ingress pod logs for any irregularities, spikes in memory usage, or high volumes of authentication, authorization, or user management activity.
Recovery: The alert clears automatically when memory utilization drops below the critical threshold. If the alert does not clear:
  • Check if an integrated NF or a common service is causing consistently high levels of authentication, authorization, or user management requests, leading to increased memory usage.
  • Analyze logs and metrics to identify and address the source.
  • Take steps to optimize usage or resolve issues as needed.
  • Also, double-check the resource limits and requests configuration to ensure it is aligned with CNC Console recommendations.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.

8.1.7 CnccIamTransactionErrorRateAbove0.1Percent

Table 8-8 CnccIamTransactionErrorRateAbove0.1Percent

Field Details
Description This alert notifies that the number of CNC Console IAM failed transactions is above 0.1 percent of the total transactions.
Summary CNC Console IAM transaction Error Rate detected above 0.1 Percent of Total Transactions
Severity warning
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 0.1 < 1
OID 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause: This alert triggers when CNC Console IAM failed transactions exceed 0.1% of total transactions. 5xx errors typically indicate server-side issues, such as:
  • The IAM service or its dependencies (like databases or external services) are down or unreachable.
  • Internal errors due to misconfiguration in the CNC Console custom_values.yaml file.
  • Backend processing errors, resource exhaustion (CPU or memory), or temporary overload.
  • Issues during authentication, authorization, or user management requests.
Diagnostic Information:
  • Check the health and status of all IAM pods and their dependencies (e.g., databases, storage, or external services).
  • Review the iam-ingress pod logs for error details and stack traces, especially around the time of the alert.
  • Monitor service-specific metrics to isolate which operation or endpoint is generating errors.
  • Review recent configuration changes that may have caused backend errors or instability.
Recovery: The alert clears automatically when the CNC Console IAM 5xx error rate drops below 0.1% or exceeds the 1% threshold. If the alert does not clear:
  • Investigate and resolve any backend, database, or resource issues causing 5xx errors.
  • Roll back recent configuration changes if a misconfiguration is suspected.
  • If this level of error rate is expected for your workload, note that the threshold is configurable and can be adjusted as per operational requirements.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.
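Note the `or (up * 0)` term in the Condition: when no 5xx responses have been recorded, the numerator's rate vector is empty, and a plain division would return no data instead of 0%. `or (up * 0)` substitutes zero-valued series in that case so the error rate evaluates cleanly to zero. A sketch of the expression pattern shared by the error-rate alerts (Tables 8-8 through 8-12):

```yaml
# Error-rate pattern: 5xx responses as a percentage of all responses over 5m.
# The "or (up * 0)" guard keeps the numerator non-empty when no 5xx series
# exist, so the whole ratio evaluates to 0% rather than to no data.
expr: >
  (sum by(namespace,pod)
     (rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m])
      or (up * 0)))
  /
  (sum by(namespace,pod)
     (rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m])))
  * 100 >= 0.1 < 1
```

The subsequent error-rate alerts reuse this expression with the bands >= 1 < 10, >= 10 < 25, >= 25 < 50, and >= 50.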

8.1.8 CnccIamTransactionErrorRateAbove1Percent

Table 8-9 CnccIamTransactionErrorRateAbove1Percent

Field Details
Description This alert notifies that the number of CNC Console IAM failed transactions is above 1 percent of the total transactions.
Summary CNC Console IAM transaction Error Rate detected above 1 Percent of Total Transactions
Severity warning
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 1 < 10
OID 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause: This alert triggers when CNC Console IAM failed transactions exceed 1% of total transactions. 5xx errors typically indicate server-side issues, such as:
  • The IAM service or its dependencies (like databases or external services) are down or unreachable.
  • Internal errors due to misconfiguration in the CNC Console custom_values.yaml file.
  • Backend processing errors, resource exhaustion (CPU or memory), or temporary overload.
  • Issues during authentication, authorization, or user management requests.

Diagnostic Information:
  • Check the health and status of all IAM pods and their dependencies (e.g., databases, storage, or external services).
  • Review the iam-ingress pod logs for error details and stack traces, especially around the time of the alert.
  • Monitor service-specific metrics to isolate which operation or endpoint is generating errors.
  • Review recent configuration changes that may have caused backend errors or instability.
Recovery: The alert clears automatically when the CNC Console IAM 5xx error rate drops below 1% or exceeds the 10% threshold. If the alert does not clear:
  • Investigate and resolve any backend, database, or resource issues causing 5xx errors.
  • Roll back recent configuration changes if a misconfiguration is suspected.
  • If this level of error rate is expected for your workload, note that the threshold is configurable and can be adjusted as per operational requirements.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.

8.1.9 CnccIamTransactionErrorRateAbove10Percent

Table 8-10 CnccIamTransactionErrorRateAbove10Percent

Field Details
Description This alert notifies that the number of CNC Console IAM failed transactions is above 10 percent of the total transactions.
Summary CNC Console IAM transaction error rate detected above 10 percent of total transactions
Severity minor
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 10 < 25
OID 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause: This alert triggers when CNC Console IAM failed transactions exceed 10% of total transactions. 5xx errors typically indicate server-side issues, such as:
  • The IAM service or its dependencies (like databases or external services) are down or unreachable.
  • Internal errors due to misconfiguration in the CNC Console custom_values.yaml file.
  • Backend processing errors, resource exhaustion (CPU or memory), or temporary overload.
  • Issues during authentication, authorization, or user management requests.

Diagnostic Information:
  • Check the health and status of all IAM pods and their dependencies (e.g., databases, storage, or external services).
  • Review the iam-ingress pod logs for error details and stack traces, especially around the time of the alert.
  • Monitor service-specific metrics to isolate which operation or endpoint is generating errors.
  • Review recent configuration changes that may have caused backend errors or instability.
Recovery: The alert clears automatically when the CNC Console IAM 5xx error rate drops below 10% or exceeds the 25% threshold. If the alert does not clear:
  • Investigate and resolve any backend, database, or resource issues causing 5xx errors.
  • Roll back recent configuration changes if a misconfiguration is suspected.
  • If this level of error rate is expected for your workload, note that the threshold is configurable and can be adjusted as per operational requirements.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.

8.1.10 CnccIamTransactionErrorRateAbove25Percent

Table 8-11 CnccIamTransactionErrorRateAbove25Percent

Field Details
Description This alert notifies that the number of CNCC IAM failed transactions is above 25 percent of the total transactions.
Summary CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions
Severity major
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 25 < 50
OID 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause: This alert triggers when CNCC IAM failed transactions exceed 25% of total transactions. 5xx errors typically indicate server-side issues, such as:
  • The IAM service or its dependencies (like databases or external services) are down or unreachable.
  • Internal errors due to misconfiguration in the CNC Console custom_values.yaml file.
  • Backend processing errors, resource exhaustion (CPU or memory), or temporary overload.
  • Issues during authentication, authorization, or user management requests.

Diagnostic Information:
  • Check the health and status of all IAM pods and their dependencies (e.g., databases, storage, or external services).
  • Review the iam-ingress pod logs for error details and stack traces, especially around the time of the alert.
  • Monitor service-specific metrics to isolate which operation or endpoint is generating errors.
  • Review recent configuration changes that may have caused backend errors or instability.
Recovery: The alert clears automatically when the CNCC IAM 5xx error rate drops below 25% or exceeds the 50% threshold. If the alert does not clear:
  • Investigate and resolve any backend, database, or resource issues causing 5xx errors.
  • Roll back recent configuration changes if a misconfiguration is suspected.
  • If this level of error rate is expected for your workload, note that the threshold is configurable and can be adjusted as per operational requirements.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.

8.1.11 CnccIamTransactionErrorRateAbove50Percent

Table 8-12 CnccIamTransactionErrorRateAbove50Percent

Field Details
Description This alert notifies that the number of CNCC IAM failed transactions is above 50 percent of the total transactions.
Summary CNCC IAM transaction Error Rate detected above 50 Percent of Total Transactions
Severity critical
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 50
OID 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause: This alert triggers when CNCC IAM failed transactions exceed 50% of total transactions. 5xx errors typically indicate server-side issues, such as:
  • The IAM service or its dependencies (like databases or external services) are down or unreachable.
  • Internal errors due to misconfiguration in the CNC Console custom_values.yaml file.
  • Backend processing errors, resource exhaustion (CPU or memory), or temporary overload.
  • Issues during authentication, authorization, or user management requests.

Diagnostic Information:
  • Check the health and status of all IAM pods and their dependencies (e.g., databases, storage, or external services).
  • Review the iam-ingress pod logs for error details and stack traces, especially around the time of the alert.
  • Monitor service-specific metrics to isolate which operation or endpoint is generating errors.
  • Review recent configuration changes that may have caused backend errors or instability.
Recovery: The alert clears automatically when the CNCC IAM 5xx error rate drops below 50%. If the alert does not clear:
  • Investigate and resolve any backend, database, or resource issues causing 5xx errors.
  • Roll back recent configuration changes if a misconfiguration is suspected.
  • If this level of error rate is expected for your workload, note that the threshold is configurable and can be adjusted as per operational requirements.
For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue.

8.1.12 CnccIamIngressGatewayServiceDown

Table 8-13 CnccIamIngressGatewayServiceDown

Field Details
Description This alert notifies that the CNCC IAM Ingress Gateway pod is down.
Summary namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : cncc-iam-ingress-gateway service down
Severity critical
Condition absent(up{pod=~".*iam-ingress-gateway.*", namespace="cncc-ns"}) or (up{pod=~".*iam-ingress-gateway.*", namespace="cncc-ns"}) == 0
OID 1.3.6.1.4.1.323.5.3.51.1.2.7004
Metric Used up

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Action Cause: This alert triggers when the CNCC IAM Ingress Gateway pod or service is down.

Diagnostic Information:
  • Check the orchestration platform (e.g., Kubernetes) logs for the cncc-iam-ingress-gateway pod to identify liveness or readiness probe failures.
  • Review application logs for the cncc-iam-ingress-gateway service, filtering for error or warning messages, or recent crash loops.
  • Verify recent configuration or deployment changes that might have impacted pod availability.
  • Check for resource issues (CPU, memory, disk) or dependency/service failures.
Recovery: The alert clears automatically when the cncc-iam-ingress-gateway service becomes available again. If the alert does not clear:
  • Continue to review logs, resource allocations, and configuration for possible causes of downtime.
  • Address any identified issues to restore service availability.
  • If this downtime is expected (e.g., for planned maintenance), you may adjust the alerting threshold as needed.
For any assistance, contact My Oracle Support. Make sure to capture pod logs, orchestration event logs, and relevant metrics to help Support analyze the issue.
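The Condition in Table 8-13 combines two checks because `up == 0` alone is not sufficient: if the pod is deleted rather than merely unhealthy, its `up` series disappears entirely and `up == 0` matches nothing. `absent()` covers that case by returning a value when no matching series exists. A sketch, assuming the standard Prometheus rule format:

```yaml
- alert: CnccIamIngressGatewayServiceDown
  # absent(...) fires when the up series has vanished (pod deleted or never
  # scraped); up == 0 fires while the target exists but is failing scrapes.
  expr: >
    absent(up{pod=~".*iam-ingress-gateway.*", namespace="cncc-ns"})
    or up{pod=~".*iam-ingress-gateway.*", namespace="cncc-ns"} == 0
  labels:
    severity: critical
```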

8.1.13 CnccIamFailedLogin

Table 8-14 CnccIamFailedLogin

Field Details
Description This alert notifies you if there are more than 3 failed login attempts in CNCC IAM for a user within 5 minutes.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value
Severity warning
Condition sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/master/login-actions/authenticate",Method="POST",Status="200 OK"})- sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/master/login-actions/authenticate",Method="POST",Status="200 OK"} offset 5m) > 3
OID 1.3.6.1.4.1.323.5.3.51.1.2.7005
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause: This alert triggers when there are more than 3 failed login attempts in CNCC IAM for a user within 5 minutes. This may be due to users entering incorrect credentials, potential automated attacks (such as brute force attempts), or issues with the login process.

Diagnostic Information:
  • Verify if the affected user(s) are entering the correct username and password.
  • Review login logs, the iam-kc container, and iam-ingress pod logs for patterns, such as repeated failures from the same source.
  • Check for potential account lockouts or signs of automated login attempts.
  • Confirm there have not been any recent configuration changes that could impact the authentication process.
Recovery: The alert is cleared automatically when the number of failed login attempts for a user falls below the configured threshold (default is 3) within the last 5 minutes. If the alert does not clear:
  • Investigate for possible brute-force activity, misconfigurations, or issues causing repeated login failures.
  • Verify the username being used and, if an incorrect password is suspected, reset the password and attempt to log in again.
  • If this level of failed attempts is expected for your use case, note that the threshold is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the iam-kc container, iam-ingress pod logs, and relevant metrics to help Support analyze the issue.
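The Condition in Table 8-14 uses the `offset` pattern: subtracting the counter's value 5 minutes ago from its current value gives the number of matching requests in the last 5 minutes (roughly equivalent to `increase(...[5m])`, but as an exact delta between the two samples). A sketch with a shortened label set for readability; the shipped rule carries the full matchers shown in the table:

```yaml
- alert: CnccIamFailedLogin
  # Counter now minus counter 5 minutes ago = login authentication requests
  # for this user in the last 5 minutes; fires when the count exceeds 3.
  expr: >
    sum by(namespace,pod,UserName)
      (oc_ingressgateway_http_responses_total{ResourcePath="/cncc/auth/realms/master/login-actions/authenticate",Method="POST"})
    - sum by(namespace,pod,UserName)
      (oc_ingressgateway_http_responses_total{ResourcePath="/cncc/auth/realms/master/login-actions/authenticate",Method="POST"} offset 5m)
    > 3
  labels:
    severity: warning
```

The AdminUserCreation alert in the next section applies the same offset-delta pattern with a threshold of > 0.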

8.1.14 AdminUserCreation

Table 8-15 AdminUserCreation

Field Details
Description This alert notifies you when a new admin user is created in CNCC IAM within the last 5 minutes.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created
Severity warning
Condition sum by(namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/admin/realms/master/users",Method="POST"}) - sum by(namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/admin/realms/master/users",Method="POST"} offset 5m) > 0
OID 1.3.6.1.4.1.323.5.3.51.1.2.7006
Metric Used oc_ingressgateway_http_requests_total
Recommended Action Cause: This alert triggers when a new admin user is created in CNCC IAM within the last 5 minutes. This may be the result of a legitimate administrative action or an unauthorized attempt to gain privileged access.

Diagnostic Information:
  • Verify whether the creation of the new admin user was authorized and expected.
  • Review the iam-kc container, iam-ingress pod logs, and audit logs for information about who initiated the action, including source IP and request details.
  • Monitor for any unusual or suspicious activity related to admin user creation.
Recovery: The alert is cleared automatically after the admin user creation event is processed, or when subsequent checks do not detect any new admin user creation within the last 5 minutes.
  • Log in to the admin GUI and review the details of the newly created admin user.
  • If the user is legitimate, no further action is required. If the creation was unauthorized, take appropriate steps to delete or disable the admin account, and investigate further for potential security issues.
  • Update policies or controls, if necessary, to prevent unauthorized admin user creation in the future.
For any assistance, contact My Oracle Support. Make sure to capture the iam-kc container, iam-ingress pod logs, and relevant audit details to help Support analyze the issue.
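
The condition above uses a counter-delta pattern: the current value of the request counter is compared with its value 5 minutes earlier (`offset 5m`), so any increase indicates at least one POST to the admin users endpoint in that window. As a sketch only (the shipped occncc alert-rules file is authoritative; the group name shown here is illustrative), such a rule could be written as:

```yaml
groups:
  - name: cncc-iam-alerts        # illustrative group name
    rules:
      - alert: AdminUserCreation
        # Counter delta over the last 5 minutes: any increase means at least
        # one POST to the admin users endpoint, i.e. a new admin user.
        expr: |
          sum by(namespace,ResourcePath,Method,UserName,pod) (
            oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",
              namespace="cncc-ns",
              ResourcePath="/cncc/auth/admin/realms/master/users",Method="POST"}
          )
          -
          sum by(namespace,ResourcePath,Method,UserName,pod) (
            oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",
              namespace="cncc-ns",
              ResourcePath="/cncc/auth/admin/realms/master/users",Method="POST"} offset 5m
          ) > 0
        labels:
          severity: warning
```

Because the expression subtracts raw counter values rather than using `rate()`, it detects any increment within the window, which suits a discrete event such as user creation.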

8.1.15 CnccIamAccessTokenFailure

Table 8-16 CnccIamAccessTokenFailure

Field Details
Description This alert notifies you if there are more than 3 failed access token requests in CNCC IAM within 5 minutes.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value
Severity warning
Condition sum by(Status,namespace,ResourcePath,Method,UserName,UserId,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/master/protocol/openid-connect/token",Method="POST",Status=~"4.*|5.*"}) - sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/master/protocol/openid-connect/token",Method="POST",Status=~"4.*|5.*"} offset 5m) > 3
OID 1.3.6.1.4.1.323.5.3.51.1.2.7007
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause: This alert triggers when there are more than 3 failed access token requests in CNCC IAM within 5 minutes. This may be due to incorrect user credentials, misconfigured client applications, or backend errors affecting the token generation process.

Diagnostic Information:
  • Check if users or client applications are sending correct credentials and valid request parameters.
  • Review the iam-kc container and iam-ingress pod logs for error messages related to access token requests.
  • Verify if there are recent changes or configuration issues impacting the token endpoint.
  • Look for patterns such as repeated failures from a specific user or application.
Recovery: The alert clears automatically when the number of failed access token requests for a user drops below the configured threshold (default: 3) within the last 5 minutes. If the alert does not clear:
  • Verify the credentials and parameters being used for the access token request.
  • If the failure is caused by expired or invalid credentials, advise users to refresh or reset their credentials and attempt to obtain a new token.
  • If this level of failed requests is expected, note that the threshold is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the iam-kc container, iam-ingress pod logs, and relevant metrics to help Support analyze the issue.

8.2 CNC Console Core Alerts

This section provides information about CNC Console Core Alerts.

8.2.1 CnccCoreTotalIngressTrafficRateAboveMinorThreshold

Table 8-17 CnccCoreTotalIngressTrafficRateAboveMinorThreshold

Field Details
Description This alert notifies that the CNCC Core Ingress message rate has crossed the configured minor threshold of 700 to 800 TPS.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)
Severity minor
Condition sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[2m])) >= 700 < 800
OID 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Recommended Action Cause:
  • This alert triggers when the CNCC Core Ingress receives more traffic than expected, primarily consisting of NF resource requests or common service requests.
  • For example, an integrated NF or a common service may be sending an unusually high volume of requests to the CNCC Core.
Diagnostic Information:
  • Monitor ingress traffic to the pod using the KPI Dashboard.
  • Review the core-ingress pod logs for any irregularities or anomalies, especially spikes in NF resource or common service operations.
Recovery: The alert clears automatically when ingress traffic drops below the minor threshold or exceeds the major threshold. If the alert does not clear:
  • Check if an integrated NF or a common service is generating unexpectedly high volumes of traffic.
  • Analyze logs and metrics for unusual patterns or possible misconfigurations leading to increased traffic.
  • Take action, as needed, to block or limit any unauthorized or unexpected traffic.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.
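
The banded condition (`>= 700 < 800`) ensures that only one of the minor, major, or critical traffic alerts fires at a time for a given rate. As an illustrative sketch only (the expression and severity are from the table; the group name is an assumption), the rule could look like this in a Prometheus alert-rules file:

```yaml
groups:
  - name: cncc-core-alerts       # illustrative group name
    rules:
      - alert: CnccCoreTotalIngressTrafficRateAboveMinorThreshold
        # Per-pod request rate over a 2-minute window; fires only in the
        # 700-800 TPS band (70-80% of the 1000 TPS maximum), leaving the
        # higher bands to the major and critical rules.
        expr: |
          sum by(namespace,pod) (
            rate(oc_ingressgateway_http_requests_total{
              InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[2m])
          ) >= 700 < 800
        labels:
          severity: minor
```

The chained comparison (`>= 700 < 800`) is standard PromQL filter semantics: each comparison drops series that fail it, so the result is non-empty only inside the band.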

8.2.2 CnccCoreTotalIngressTrafficRateAboveMajorThreshold

Table 8-18 CnccCoreTotalIngressTrafficRateAboveMajorThreshold

Field Details
Description This alert notifies that the CNCC Core Ingress message rate has crossed the configured major threshold of 800 to 900 TPS.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)
Severity major
Condition sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[2m])) >= 800 < 900
OID 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Recommended Action Cause:
  • This alert triggers when the CNCC Core Ingress receives more traffic than expected, primarily consisting of NF resource requests or common service requests.
  • For example, an integrated NF or a common service may be sending an unusually high volume of requests to the CNCC Core.
Diagnostic Information:
  • Monitor ingress traffic to the pod using the KPI Dashboard.
  • Review the core-ingress pod logs for any irregularities or anomalies, especially spikes in NF resource or common service operations.
Recovery: The alert clears automatically when ingress traffic drops below the major threshold or exceeds the critical threshold. If the alert does not clear:
  • Check if an integrated NF or a common service is generating unexpectedly high volumes of traffic.
  • Analyze logs and metrics for unusual patterns or possible misconfigurations leading to increased traffic.
  • Take action, as needed, to block or limit any unauthorized or unexpected traffic.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.

8.2.3 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold

Table 8-19 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold

Field Details
Description This alert notifies that the CNCC Core Ingress message rate has crossed the configured critical threshold of 900 TPS.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)
Severity critical
Condition sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[2m])) >= 900
OID 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Recommended Action Cause:
  • This alert triggers when the CNCC Core Ingress receives more traffic than expected, primarily consisting of NF resource requests or common service requests.
  • For example, an integrated NF or a common service may be sending an unusually high volume of requests to the CNCC Core.
Diagnostic Information:
  • Monitor ingress traffic to the pod using the KPI Dashboard.
  • Review the core-ingress pod logs for any irregularities or anomalies, especially spikes in NF resource or common service operations.
Recovery: The alert clears automatically when ingress traffic drops below the critical threshold. If the alert does not clear:
  • Check if an integrated NF or a common service is generating unexpectedly high volumes of traffic.
  • Analyze logs and metrics for unusual patterns or possible misconfigurations leading to increased traffic.
  • Take action, as needed, to block or limit any unauthorized or unexpected traffic.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.

8.2.4 CnccCoreMemoryUsageCrossedMinorThreshold

Table 8-20 CnccCoreMemoryUsageCrossedMinorThreshold

Field Details
Description This alert notifies that the CNCC Core Ingress pod has reached the configured minor threshold (70%) of its memory resource limits.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.
Severity minor
Condition sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*core-cmservice.*|.*core-ingress-gateway.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*core-cmservice.*|.*core-ingress-gateway.*",resource="memory"}) * 100 >= 70 < 80
OID 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.
Recommended Action Cause:
  • This alert triggers when the CNCC Core Ingress pod’s memory usage reaches the configured minor threshold (70%) of its resource limits.
  • Higher memory consumption can result from increased NF resource or common service requests. For example, if a network function or common service is generating more traffic than expected.
Diagnostic Information:
  • Monitor memory usage metrics (container_memory_usage_bytes and kube_pod_container_resource_limits) using the KPI Dashboard.
  • Review the core-ingress pod logs for any irregularities, spikes in memory usage, or high volumes of NF resource or common service activity.
Recovery: The alert clears automatically when memory utilization drops below the minor threshold or exceeds the major threshold. If the alert does not clear:
  • Check if any network function or common service is causing consistently high resource requests, leading to increased memory usage.
  • Analyze logs and metrics to identify and address the source of high usage.
  • Take steps to optimize memory usage or resolve any configuration issues.
  • Also, double-check the resource limits and requests configuration to ensure it is aligned with CNC Console recommendations.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.
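
The condition expresses memory usage as a percentage of the configured limit by dividing two metrics aggregated to the same labels. A sketch of the rule shape (expression from the table; group name illustrative), which may help when adapting the check to a monitoring system that exposes equivalent metrics under different names, as the table's note allows:

```yaml
groups:
  - name: cncc-core-resource-alerts   # illustrative group name
    rules:
      - alert: CnccCoreMemoryUsageCrossedMinorThreshold
        # Memory used by the cmservice / ingress-gateway pods divided by
        # their configured memory limits; fires only in the 70-80% band.
        expr: |
          sum by(namespace,pod) (container_memory_usage_bytes{container!="",
            namespace="cncc-ns", pod=~".*core-cmservice.*|.*core-ingress-gateway.*"})
          /
          sum by(namespace,pod) (kube_pod_container_resource_limits{
            namespace="cncc-ns", pod=~".*core-cmservice.*|.*core-ingress-gateway.*",
            resource="memory"})
          * 100 >= 70 < 80
        labels:
          severity: minor
```

Aggregating both sides with the same `sum by(namespace,pod)` grouping is what makes the division well-defined: PromQL matches the numerator and denominator series on their shared labels.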

8.2.5 CnccCoreMemoryUsageCrossedMajorThreshold

Table 8-21 CnccCoreMemoryUsageCrossedMajorThreshold

Field Details
Description This alert notifies that the CNCC Core Ingress pod has reached the configured major threshold (80%) of its memory resource limits.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.
Severity major
Condition sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*core-cmservice.*|.*core-ingress-gateway.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*core-cmservice.*|.*core-ingress-gateway.*",resource="memory"}) * 100 >= 80 < 90
OID 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.
Recommended Action Cause:
  • This alert triggers when the CNCC Core Ingress pod’s memory usage reaches the configured major threshold (80%) of its resource limits.
  • Higher memory consumption can result from increased NF resource or common service requests. For example, if a network function or common service is generating more traffic than expected.
Diagnostic Information:
  • Monitor memory usage metrics (container_memory_usage_bytes and kube_pod_container_resource_limits) using the KPI Dashboard.
  • Review the core-ingress pod logs for any irregularities, spikes in memory usage, or high volumes of NF resource or common service activity.
Recovery: The alert clears automatically when memory utilization drops below the major threshold or exceeds the critical threshold. If the alert does not clear:
  • Check if any network function or common service is causing consistently high resource requests, leading to increased memory usage.
  • Analyze logs and metrics to identify and address the source of high usage.
  • Take steps to optimize memory usage or resolve any configuration issues.
  • Also, double-check the resource limits and requests configuration to ensure it is aligned with CNC Console recommendations.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.

8.2.6 CnccCoreMemoryUsageCrossedCriticalThreshold

Table 8-22 CnccCoreMemoryUsageCrossedCriticalThreshold

Field Details
Description This alert notifies that the CNCC Core Ingress pod has reached the configured critical threshold (90%) of its memory resource limits.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.
Severity critical
Condition sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*core-cmservice.*|.*core-ingress-gateway.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*core-cmservice.*|.*core-ingress-gateway.*",resource="memory"}) * 100 >= 90
OID 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.
Recommended Action Cause:
  • This alert triggers when the CNCC Core Ingress pod’s memory usage reaches the configured critical threshold (90%) of its resource limits.
  • Higher memory consumption can result from increased NF resource or common service requests. For example, if a network function or common service is generating more traffic than expected.
Diagnostic Information:
  • Monitor memory usage metrics (container_memory_usage_bytes and kube_pod_container_resource_limits) using the KPI Dashboard.
  • Review the core-ingress pod logs for any irregularities, spikes in memory usage, or high volumes of NF resource or common service activity.
Recovery: The alert clears automatically when memory utilization drops below the critical threshold. If the alert does not clear:
  • Check if any network function or common service is causing consistently high resource requests, leading to increased memory usage.
  • Analyze logs and metrics to identify and address the source of high usage.
  • Take steps to optimize memory usage or resolve any configuration issues.
  • Also, double-check the resource limits and requests configuration to ensure it is aligned with CNC Console recommendations.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.

8.2.7 CnccCoreTransactionErrorRateAbove0.1Percent

Table 8-23 CnccCoreTransactionErrorRateAbove0.1Percent

Field Details
Description This alert notifies that the number of CNCC Core failed transactions is above 0.1 percent of the total transactions.
Summary CNCC Core transaction Error Rate detected above 0.1 Percent of Total Transactions
Severity warning
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 0.1 < 1
OID 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause:
  • This alert triggers when CNCC Core failed transactions exceed 0.1% of total transactions.
  • 5xx errors typically indicate server-side issues, such as the Core service or its dependencies (e.g., NFs, databases, or common services) being down, unreachable, overloaded, or misconfigured.
  • Unexpected spikes may also result from backend processing errors, resource exhaustion, or recent configuration changes.
  • Misconfiguration of any network function (NF) instance in the CNCC custom_values.yaml file can also lead to increased failure rates.
Diagnostic Information:
  • Monitor the health and status of all Core pods and their dependencies (such as databases, network functions, and external services).
  • Review the core-ingress pod logs for error messages and stack traces, especially around the time of the alert.
  • Examine service-specific and application metrics to pinpoint which operations or endpoints are failing.
  • Check for recent configuration changes or deployment updates that could impact backend stability.
  • Review the CNCC custom_values.yaml file for any misconfiguration in network function (NF) instances.
Recovery: The alert is cleared automatically when the CNCC Core 5xx error rate drops below 0.1% or exceeds the 1% threshold. If the alert does not clear:
  • Investigate and address any backend, resource, or configuration issues causing errors.
  • Review the CNCC custom_values.yaml file for any misconfiguration in NF instances and correct them if necessary.
  • Coordinate with relevant teams to resolve dependency or service interruptions.
  • If this error level is expected for your workload, note that the threshold is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.
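
One detail worth noting in the condition: the `or (up * 0)` term in the numerator supplies a zero-valued series when no 5xx responses have been recorded, so the division still returns a value (0%) instead of an empty result. A sketch of the rule shape (expression from the table; the alert name here is simplified for illustration, since Prometheus alert names must be valid metric identifiers and cannot contain a dot):

```yaml
groups:
  - name: cncc-core-error-rate-alerts   # illustrative group name
    rules:
      - alert: CnccCoreTransactionErrorRateWarning   # illustrative name
        # Numerator: 5xx response rate. The "or (up * 0)" fallback yields a
        # zero-valued series when there are no 5xx responses at all, keeping
        # the ratio defined. Denominator: total response rate.
        expr: |
          (
            sum by(namespace,pod) (
              rate(oc_ingressgateway_http_responses_total{Status=~"5.*",
                InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m])
            ) or (up * 0)
          )
          /
          sum by(namespace,pod) (
            rate(oc_ingressgateway_http_responses_total{
              InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m])
          ) * 100 >= 0.1 < 1
        labels:
          severity: warning
```

The same expression, with the band bounds changed (>= 1 < 10, >= 10 < 25, >= 25 < 50, >= 50), produces the escalating rules described in the following sections.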

8.2.8 CnccCoreTransactionErrorRateAbove1Percent

Table 8-24 CnccCoreTransactionErrorRateAbove1Percent

Field Details
Description This alert notifies that the number of CNCC Core failed transactions is above 1 percent of the total transactions.
Summary CNCC Core transaction Error Rate detected above 1 Percent of Total Transactions
Severity warning
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 1 < 10
OID 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause:
  • This alert triggers when CNCC Core failed transactions exceed 1% of total transactions.
  • 5xx errors typically indicate server-side issues, such as the Core service or its dependencies (e.g., NFs, databases, or common services) being down, unreachable, overloaded, or misconfigured.
  • Unexpected spikes may also result from backend processing errors, resource exhaustion, or recent configuration changes.
  • Misconfiguration of any network function (NF) instance in the CNCC custom_values.yaml file can also lead to increased failure rates.
Diagnostic Information:
  • Monitor the health and status of all Core pods and their dependencies (such as databases, network functions, and external services).
  • Review the core-ingress pod logs for error messages and stack traces, especially around the time of the alert.
  • Examine service-specific and application metrics to pinpoint which operations or endpoints are failing.
  • Check for recent configuration changes or deployment updates that could impact backend stability.
  • Review the CNCC custom_values.yaml file for any misconfiguration in network function (NF) instances.
Recovery: The alert is cleared automatically when the CNCC Core 5xx error rate drops below 1% or exceeds the 10% threshold. If the alert does not clear:
  • Investigate and address any backend, resource, or configuration issues causing errors.
  • Review the CNCC custom_values.yaml file for any misconfiguration in NF instances and correct if necessary.
  • Coordinate with relevant teams to resolve dependency or service interruptions.
  • If this error level is expected for your workload, note that the threshold is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.

8.2.9 CnccCoreTransactionErrorRateAbove10Percent

Table 8-25 CnccCoreTransactionErrorRateAbove10Percent

Field Details
Description This alert notifies that the number of CNCC Core failed transactions is above 10 percent of the total transactions.
Summary CNCC Core transaction Error Rate detected above 10 Percent of Total Transactions
Severity minor
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 10 < 25
OID 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause:
  • This alert triggers when CNCC Core failed transactions exceed 10% of total transactions.
  • 5xx errors typically indicate server-side issues, such as the Core service or its dependencies (e.g., NFs, databases, or common services) being down, unreachable, overloaded, or misconfigured.
  • Unexpected spikes may also result from backend processing errors, resource exhaustion, or recent configuration changes.
  • Misconfiguration of any network function (NF) instance in the CNCC custom_values.yaml file can also lead to increased failure rates.
Diagnostic Information:
  • Monitor the health and status of all Core pods and their dependencies (such as databases, network functions, and external services).
  • Review the core-ingress pod logs for error messages and stack traces, especially around the time of the alert.
  • Examine service-specific and application metrics to pinpoint which operations or endpoints are failing.
  • Check for recent configuration changes or deployment updates that could impact backend stability.
  • Review the CNCC custom_values.yaml file for any misconfiguration in network function (NF) instances.
Recovery: The alert is cleared automatically when the CNCC Core 5xx error rate drops below 10% or exceeds the 25% threshold. If the alert does not clear:
  • Investigate and address any backend, resource, or configuration issues causing errors.
  • Review the CNCC custom_values.yaml file for any misconfiguration in NF instances and correct if necessary.
  • Coordinate with relevant teams to resolve dependency or service interruptions.
  • If this error level is expected for your workload, note that the threshold is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.

8.2.10 CnccCoreTransactionErrorRateAbove25Percent

Table 8-26 CnccCoreTransactionErrorRateAbove25Percent

Field Details
Description This alert notifies that the number of CNCC Core failed transactions is above 25 percent of the total transactions.
Summary CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions
Severity major
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 25 < 50
OID 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause:
  • This alert triggers when CNCC Core failed transactions exceed 25% of total transactions.
  • 5xx errors typically indicate server-side issues, such as the Core service or its dependencies (e.g., NFs, databases, or common services) being down, unreachable, overloaded, or misconfigured.
  • Unexpected spikes may also result from backend processing errors, resource exhaustion, or recent configuration changes.
  • Misconfiguration of any network function (NF) instance in the CNCC custom_values.yaml file can also lead to increased failure rates.
Diagnostic Information:
  • Monitor the health and status of all Core pods and their dependencies (such as databases, network functions, and external services).
  • Review the core-ingress pod logs for error messages and stack traces, especially around the time of the alert.
  • Examine service-specific and application metrics to pinpoint which operations or endpoints are failing.
  • Check for recent configuration changes or deployment updates that could impact backend stability.
  • Review the CNCC custom_values.yaml file for any misconfiguration in network function (NF) instances.
Recovery: The alert is cleared automatically when the CNCC Core 5xx error rate drops below 25% or exceeds the 50% threshold. If the alert does not clear:
  • Investigate and address any backend, resource, or configuration issues causing errors.
  • Review the CNCC custom_values.yaml file for any misconfiguration in NF instances and correct if necessary.
  • Coordinate with relevant teams to resolve dependency or service interruptions.
  • If this error level is expected for your workload, note that the threshold is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.

8.2.11 CnccCoreTransactionErrorRateAbove50Percent

Table 8-27 CnccCoreTransactionErrorRateAbove50Percent

Field Details
Description This alert notifies that the number of CNCC Core failed transactions is above 50 percent of the total transactions.
Summary CNCC Core transaction error rate detected above 50 percent of total transactions
Severity critical
Condition (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 50
OID 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause:
  • This alert triggers when CNCC Core failed transactions exceed 50% of total transactions.
  • 5xx errors typically indicate server-side issues, such as the Core service or its dependencies (e.g., NFs, databases, or common services) being down, unreachable, overloaded, or misconfigured.
  • Unexpected spikes may also result from backend processing errors, resource exhaustion, or recent configuration changes.
  • Misconfiguration of any network function (NF) instance in the CNCC custom_values.yaml file can also lead to increased failure rates.
Diagnostic Information:
  • Monitor the health and status of all Core pods and their dependencies (such as databases, network functions, and external services).
  • Review the core-ingress pod logs for error messages and stack traces, especially around the time of the alert.
  • Examine service-specific and application metrics to pinpoint which operations or endpoints are failing.
  • Check for recent configuration changes or deployment updates that could impact backend stability.
  • Review the CNCC custom_values.yaml file for any misconfiguration in network function (NF) instances.
Recovery: The alert is cleared automatically when the CNCC Core 5xx error rate drops below the 50% threshold. If the alert does not clear:
  • Investigate and address any backend, resource, or configuration issues causing errors.
  • Review the CNCC custom_values.yaml file for any misconfiguration in NF instances and correct if necessary.
  • Coordinate with relevant teams to resolve dependency or service interruptions.
  • If this error level is expected for your workload, note that the threshold is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.

8.2.12 CnccCoreIngressGatewayServiceDown

Table 8-28 CnccCoreIngressGatewayServiceDown

Field Details
Description This alert notifies that the CNCC Core Ingress Gateway pod is down.
Summary namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : cncc-core-ingress-gateway service down
Severity critical
Condition absent(up{pod=~".*core-ingress-gateway.*", namespace="cncc-ns"}) or (up{pod=~".*core-ingress-gateway.*", namespace="cncc-ns"}) == 0
OID 1.3.6.1.4.1.323.5.3.51.1.2.8004
Metric Used up

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.
Recommended Action Cause: This alert triggers when the CNCC Core Ingress Gateway pod or service is down.

Diagnostic Information:
  • Check the orchestration platform (e.g., Kubernetes) logs for the cncc-core-ingress-gateway pod to identify liveness or readiness probe failures.
  • Review application logs for the cncc-core-ingress-gateway service, filtering for error or warning messages, or recent crash loops.
  • Verify recent configuration or deployment changes that might have impacted pod availability.
  • Check for resource issues (CPU, memory, disk) or dependency/service failures.
Recovery: The alert clears automatically when the cncc-core-ingress-gateway service becomes available again. If the alert does not clear:
  • Continue to review logs, resource allocations, and configuration for possible causes of downtime.
  • Address any identified issues to restore service availability.
  • If this downtime is expected (e.g., for planned maintenance), you may adjust the alerting threshold as needed.
For any assistance, contact My Oracle Support. Make sure to capture pod logs, orchestration event logs, and relevant metrics to help Support analyze the issue.
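The diagnostic steps above can be sketched with kubectl. A minimal example, assuming the cncc-ns namespace used in the alert condition; the deployment name is an assumption, so adjust names for your deployment:

```shell
# Namespace from the alert condition; adjust for your deployment.
NS="${NS:-cncc-ns}"

# Check pod status and restart counts for the ingress gateway.
kubectl get pods -n "$NS" | grep core-ingress-gateway

# Inspect events and probe failures for the first matching pod.
POD="$(kubectl get pods -n "$NS" -o name | grep core-ingress-gateway | head -1)"
kubectl describe -n "$NS" "$POD"

# Capture recent application logs for Support (deployment name assumed).
kubectl logs -n "$NS" deploy/cncc-core-ingress-gateway --tail=200
```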

8.2.13 CnccCoreFailedLogin

Table 8-29 CnccCoreFailedLogin

Field Details
Description This alert notifies you if there are more than 3 failed login attempts in CNCC Core for a user within 5 minutes.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value
Severity warning
Condition sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/cncc/login-actions/authenticate",Method="POST",Status="200 OK"}) - sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/cncc/login-actions/authenticate",Method="POST",Status="200 OK"} offset 5m) > 3
OID 1.3.6.1.4.1.323.5.3.51.1.2.8005
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause:
  • This alert triggers when there are more than 3 failed login attempts for a user in CNCC Core within 5 minutes.
  • This may be caused by users entering incorrect credentials, automated login attempts (such as brute-force attacks), or issues with the authentication process.
Diagnostic Information:
  • Check if the affected user(s) are entering the correct username and password.
  • Review authentication and core-ingress pod logs for patterns such as repeated failures from the same source.
  • Investigate any potential account lockouts, recent configuration changes, or signs of automated or scripted login attempts.
  • Ensure the authentication service and related configurations are working as expected.
Recovery: The alert is cleared automatically when the number of failed login attempts for a user drops below the configured threshold (default: 3) within the last 5 minutes. If the alert does not clear:
  • Investigate for possible brute-force activity, misconfigurations, or issues causing repeated login failures.
  • Verify the username being used. If the password is suspected to be incorrect, reset the password and attempt to log in again.
  • If this level of failed attempts is expected for your use case, note that the threshold is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.
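To see the per-user counts behind this alert, the same 5-minute delta can be evaluated directly through the Prometheus HTTP API. A sketch, grouped only by UserName for readability; the Prometheus address is a placeholder:

```shell
# Replace with your Prometheus server's <IP>:<Port>.
PROM_URL="${PROM_URL:-http://prometheus.example.com:9090}"

# Same shape as the alert condition: current counter value minus its value
# 5 minutes ago, i.e. failed login attempts in the last 5 minutes, per user.
QUERY='sum by(UserName) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",ResourcePath="/cncc/auth/realms/cncc/login-actions/authenticate",Method="POST",Status="200 OK"}) - sum by(UserName) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",ResourcePath="/cncc/auth/realms/cncc/login-actions/authenticate",Method="POST",Status="200 OK"} offset 5m)'

# --data-urlencode takes care of encoding the PromQL expression.
curl -sG --max-time 5 "$PROM_URL/api/v1/query" --data-urlencode "query=$QUERY"
```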

8.2.14 CnccCoreUnauthorizedAccess

Table 8-30 CnccCoreUnauthorizedAccess

Field Details
Description This alert notifies you if there are more than 3 unauthorized access (403 Forbidden) attempts in CNCC Core within 5 minutes.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value
Severity warning
Condition sum by(Status,Method,namespace,ResourceType,UserId,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns",Status="403 FORBIDDEN", ResourceType!="UNKNOWN"}) - sum by(Status,Method,namespace,ResourceType,UserId,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns",Status="403 FORBIDDEN",ResourceType!="UNKNOWN"} offset 5m) > 3
OID 1.3.6.1.4.1.323.5.3.51.1.2.8006
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause:
  • This alert triggers when there are more than 3 unauthorized access (403 Forbidden) attempts in CNCC Core within 5 minutes.
  • Unauthorized access can occur when users or services attempt to access resources without sufficient permissions.
  • Misconfigured access controls or recent changes in user roles or policies may also lead to repeated 403 errors.
Diagnostic Information:
  • Check if the users or services encountering these errors have the necessary permissions for the requested resources.
  • Review core-ingress pod logs and access logs for repeated failed attempts from particular users, roles, services, or source IPs.
  • Examine recent changes in role assignments, access policies, or configurations that might have resulted in new authorization failures.
  • Investigate for potential automated or scripted attempts to access restricted resources.
Recovery: The alert clears automatically when unauthorized access attempts for a particular user or service drop below the configured threshold (default: 3) within the last 5 minutes. If the alert does not clear:
  • Verify that user and service permissions are configured appropriately.
  • Correct any misconfigured access controls, roles, or policies as needed.
  • If these attempts are expected for your environment, note that the threshold is configurable and can be adjusted as required.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.
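To spot repeated offenders, the core-ingress logs can be scanned for 403 responses from the command line. A rough sketch; the namespace and deployment name are assumptions, adjust for your deployment:

```shell
NS="${NS:-cncc-ns}"   # namespace from the alert condition

# Count 403 lines in recent logs and list the most frequent ones first,
# to surface repeated users, paths, or source IPs.
kubectl logs -n "$NS" deploy/cncc-core-ingress-gateway --tail=1000 \
  | grep '403' | sort | uniq -c | sort -rn | head
```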

8.2.15 CnccCoreAccessTokenFailure

Table 8-31 CnccCoreAccessTokenFailure

Field Details
Description This alert notifies you if there are more than 3 failed access token requests in CNCC Core within 5 minutes.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value
Severity warning
Condition sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/cncc/protocol/openid-connect/token",Method="POST",Status=~"4.*|5.*"}) - sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/cncc/protocol/openid-connect/token",Method="POST",Status=~"4.*|5.*"} offset 5m) > 3
OID 1.3.6.1.4.1.323.5.3.51.1.2.8007
Metric Used oc_ingressgateway_http_responses_total
Recommended Action Cause:
  • This alert triggers when there are more than 3 failed access token requests in CNCC Core within 5 minutes.
  • Failures may occur due to incorrect credentials, misconfigured client applications, or backend errors affecting token generation.
  • Configuration issues or recent changes in the token endpoint can also cause failures.
Diagnostic Information:
  • Check if users or client applications are using the correct credentials and valid request parameters.
  • Review the core-ingress pod logs for errors related to access token requests.
  • Look for repeated failures from specific users or applications, or patterns suggesting misconfiguration.
  • Verify whether there have been any recent changes to the token endpoint or related configurations.
Recovery: The alert clears automatically when the number of failed access token requests for a user drops below the configured threshold (default: 3) within the last 5 minutes. If the alert does not clear:
  • Verify the credentials and parameters being used by users.
  • If requests are failing due to expired or incorrect credentials, reset the credentials and attempt to generate a new token.
  • Address any misconfigurations or backend issues found during diagnostics.
  • If this failure rate is expected, note that the threshold for the alert is configurable and can be adjusted as needed.
For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue.
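To reproduce a token request against the endpoint named in the alert condition and inspect the HTTP status, something like the following can be used. The host, client id, and credentials are placeholders; check your IAM configuration for the actual client id:

```shell
# Placeholder values; replace with your CNC Console address and credentials.
CNCC_URL="${CNCC_URL:-https://cncc.example.com}"
CLIENT_ID="${CLIENT_ID:-cncc}"   # assumed client id
USERNAME="testuser"
PASSWORD="changeme"

# POST to the token endpoint from the alert condition and print only the
# HTTP status code; a 4xx/5xx here is what the alert counts as a failure.
curl -sk -o /dev/null -w '%{http_code}\n' --max-time 5 \
  -X POST "$CNCC_URL/cncc/auth/realms/cncc/protocol/openid-connect/token" \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d "grant_type=password&client_id=$CLIENT_ID&username=$USERNAME&password=$PASSWORD"
```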

8.3 Validating Alerts

Configure and validate alerts in the Prometheus server. Refer to the CNCC Alert Configuration section for the procedure to configure the alerts.

After configuring the alerts in the Prometheus server, you can verify them by performing the following steps:
  1. Open the Prometheus server from your browser using <IP>:<Port>.
  2. Navigate to Status, and then Rules.
  3. Search for CNCC. The list of CNCC alerts is displayed.

Note:

If you are unable to see the alerts, the alert file is not loaded in a format that the Prometheus server accepts. Modify the file and try again.
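The loaded rules can also be confirmed from the command line through the Prometheus HTTP API. A sketch, assuming the server address from step 1 (placeholder shown):

```shell
# Replace with your Prometheus server's <IP>:<Port>.
PROM_URL="${PROM_URL:-http://prometheus.example.com:9090}"

# List the names of the loaded CNCC alert rules. If nothing is printed,
# the alert file was not accepted by the Prometheus server.
curl -s --max-time 5 "$PROM_URL/api/v1/rules" \
  | grep -o '"name":"Cncc[A-Za-z]*"' | sort -u
```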

8.4 Disabling Alerts

This section explains how to disable the alerts in CNC Console.

  1. Edit the manager or agent alert file to remove a specific alert.
  2. Remove the complete content of the specific alert from the manager or agent alert file.

    For example, to remove the CnccIamTotalIngressTrafficRateAboveMinorThreshold alert from the occncc_manager_alerting_rules_promha_<version>.yaml file, remove the complete alert definition, as shown below:

    cncc_alert_rules_<version>.yaml
    
    
    
    ## ALERT SAMPLE START##
    - alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
      annotations:
        description: 'CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'
        summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'
      expr: sum(rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*cncc-iam_ingressgateway",kubernetes_namespace="cncc"}[2m])) > 0
      labels:
        severity: minor
        oid: "1.3.6.1.4.1.323.5.3.51.1.2.7001"
        namespace: ' {{ $labels.kubernetes_namespace }} '
        podname: ' {{$labels.kubernetes_pod_name}} '
    ## ALERT SAMPLE END##
    
  3. Perform Alert configuration. See CNCC Alert Configuration section for details.

8.5 Configuring SNMP-Notifier

Configure the IP and port of the SNMP trap receiver in the SNMP notifier using the following procedure:

  1. Run the following command to edit the deployment:
     kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
    Example:
     $ kubectl edit deploy occne-snmp-notifier -n occne-infra
  2. Edit the destination as follows:
      --snmp.destination=<destination_ip>:<destination_port>
    Example:
    --snmp.destination=10.75.203.94:162
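After editing the deployment, you can confirm that the new destination is in place; a quick check using the example names from this procedure:

```shell
# Deployment and namespace from the example above; adjust as needed.
kubectl -n occne-infra get deploy occne-snmp-notifier -o yaml \
  | grep -- '--snmp.destination'
```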

8.6 CNC Console MIB Files

Two MIB files are used to generate the traps. The user needs to update these files, along with the alert file, to fetch the traps in their environment.

occncc_mib_tc_<version>.mib

This file is the CNCC top-level MIB file, where the objects and their data types are defined.

occncc_mib_<version>.mib

This file imports the objects from the top-level MIB file; based on the alert notification, these objects can be selected for display.