8 CNC Console Alerts
This section provides information about CNC Console Alerts.
Note:
- Use the updated occncc_agent_alertrules_<version>.yaml file for the agent cluster in a multicluster deployment.
- Use the occncc_manager_alertrules_<version>.yaml file for a single-cluster deployment, and for the manager cluster in a multicluster deployment.
Table 8-1 Alerts Levels or Severity Types
| Alerts Levels / Severity Types | Definition |
|---|---|
| Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of OCCM. |
| Major | Indicates a significant issue that affects operations or poses a moderate risk. It requires prompt attention and action to prevent escalation. Raised for conditions that may affect the service of OCCM. |
| Minor | Indicates a low-severity situation that does not pose an immediate risk to safety, security, or operations. It requires attention but not urgent action. Raised for conditions that may affect the service of OCCM. |
| Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of OCCM. |
8.1 CNC Console IAM Alerts
This section provides information about CNC Console IAM Alerts.
8.1.1 CnccIamTotalIngressTrafficRateAboveMinorThreshold
Table 8-2 CnccIamTotalIngressTrafficRateAboveMinorThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that the CNCC IAM Ingress message rate has crossed the configured minor threshold of 700 TPS. The alert remains active while the rate is between 700 and 800 TPS. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000) |
| Severity | minor |
| Condition | sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[2m])) >= 700 < 800 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Action | Cause: … Diagnostic Information: … Recovery: The alert is cleared automatically when ingress traffic drops below the minor threshold or exceeds the major threshold. If the alert does not clear: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.1.2 CnccIamTotalIngressTrafficRateAboveMajorThreshold
Table 8-3 CnccIamTotalIngressTrafficRateAboveMajorThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that the CNCC IAM Ingress message rate has crossed the configured major threshold of 800 TPS. The alert remains active while the rate is between 800 and 900 TPS. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000) |
| Severity | major |
| Condition | sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[2m])) >= 800 < 900 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.1.3 CnccIamTotalIngressTrafficRateAboveCriticalThreshold
Table 8-4 CnccIamTotalIngressTrafficRateAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that the CNCC IAM Ingress message rate has crossed the configured critical threshold of 900 TPS. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000) |
| Severity | critical |
| Condition | sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[2m])) >= 900 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
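The three traffic-rate conditions above split the per-pod ingress rate into non-overlapping bands: a PromQL filter such as >= 700 < 800 keeps a series only while 700 <= rate < 800, so at most one of the three traffic alerts is active for a pod at any time. The following Python sketch reproduces that band logic offline, for example to check which alert a measured rate would raise. It assumes the per-pod rate has already been computed; the function and constant names are illustrative and are not part of the CNC Console alert rules.

```python
from typing import Optional

# Per-pod ingress thresholds in TPS, from the Condition rows of Tables 8-2 to 8-4.
# Each entry is (severity, inclusive lower bound, exclusive upper bound or None).
TRAFFIC_BANDS = [
    ("minor", 700.0, 800.0),
    ("major", 800.0, 900.0),
    ("critical", 900.0, None),
]


def traffic_severity(rate_tps: float) -> Optional[str]:
    """Return the severity whose band contains rate_tps, or None below 700 TPS.

    rate_tps stands for one pod's value of
    sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{...}[2m])).
    """
    for severity, lower, upper in TRAFFIC_BANDS:
        # PromQL "x >= lower < upper" keeps x only when lower <= x < upper.
        if rate_tps >= lower and (upper is None or rate_tps < upper):
            return severity
    return None
```

For example, traffic_severity(850) returns "major", and any rate below 700 TPS returns None, which corresponds to no traffic alert.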
8.1.4 CnccIamMemoryUsageCrossedMinorThreshold
Table 8-5 CnccIamMemoryUsageCrossedMinorThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that a CNCC IAM Ingress Gateway or iam-kc pod has reached the configured minor threshold (70%) of its memory resource limits. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit. |
| Severity | minor |
| Condition | sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*iam-ingress-gateway.*|.*iam-kc.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*iam-ingress-gateway.*|.*iam-kc.*",resource="memory"}) * 100 >= 70 < 80 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
| Metric Used | container_memory_usage_bytes. Note: This is a Kubernetes metric used for container memory usage monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.1.5 CnccIamMemoryUsageCrossedMajorThreshold
Table 8-6 CnccIamMemoryUsageCrossedMajorThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that a CNC Console IAM Ingress Gateway or iam-kc pod has reached the configured major threshold (80%) of its memory resource limits. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit. |
| Severity | major |
| Condition | sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*iam-ingress-gateway.*|.*iam-kc.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*iam-ingress-gateway.*|.*iam-kc.*",resource="memory"}) * 100 >= 80 < 90 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
| Metric Used | container_memory_usage_bytes |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.1.6 CnccIamMemoryUsageCrossedCriticalThreshold
Table 8-7 CnccIamMemoryUsageCrossedCriticalThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that a CNC Console IAM Ingress Gateway or iam-kc pod has reached the configured critical threshold (90%) of its memory resource limits. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit. |
| Severity | critical |
| Condition | sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*iam-ingress-gateway.*|.*iam-kc.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*iam-ingress-gateway.*|.*iam-kc.*",resource="memory"}) * 100 >= 90 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
| Metric Used | container_memory_usage_bytes |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
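Each memory condition divides the summed container_memory_usage_bytes of a pod by the summed memory limits that kube_pod_container_resource_limits reports for that pod, multiplies by 100, and compares the result with the 70, 80, and 90 percent bands. (The container!="" matcher typically drops the pod-level aggregate series that cAdvisor also exports, so usage is not double counted.) The Python sketch below shows the same arithmetic, assuming the per-container byte values are already available; the function names are hypothetical.

```python
from typing import Optional, Sequence

# Memory thresholds (percent of the pod's memory limit) from Tables 8-5 to 8-7.
MEMORY_BANDS = [
    ("minor", 70.0, 80.0),
    ("major", 80.0, 90.0),
    ("critical", 90.0, None),
]


def memory_usage_percent(usage_bytes: Sequence[float], limit_bytes: Sequence[float]) -> float:
    """Pod memory usage as a percentage of its memory limit.

    usage_bytes: container_memory_usage_bytes for each container of the pod.
    limit_bytes: kube_pod_container_resource_limits{resource="memory"} for each container.
    """
    total_limit = sum(limit_bytes)
    if total_limit <= 0:
        # Without memory limits the PromQL division has no usable result.
        raise ValueError("no memory limit configured for this pod")
    return sum(usage_bytes) / total_limit * 100


def memory_severity(percent: float) -> Optional[str]:
    """Return the severity whose band contains percent, or None below 70 percent."""
    for severity, lower, upper in MEMORY_BANDS:
        if percent >= lower and (upper is None or percent < upper):
            return severity
    return None
```

With two containers using 300 MiB and 500 MiB against two 512 MiB limits, the pod is at 78.125 percent of its limit, which falls in the minor band.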
8.1.7 CnccIamTransactionErrorRateAbove0.1Percent
Table 8-8 CnccIamTransactionErrorRateAbove0.1Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNC Console IAM failed transactions is above 0.1 percent of the total transactions. |
| Summary | CNC Console IAM transaction Error Rate detected above 0.1 Percent of Total Transactions |
| Severity | warning |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 0.1 < 1 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: This alert triggers when CNC Console IAM failed transactions exceed 0.1% of total transactions. 5xx errors typically indicate server-side issues, such as: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.1.8 CnccIamTransactionErrorRateAbove1Percent
Table 8-9 CnccIamTransactionErrorRateAbove1Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNC Console IAM failed transactions is above 1 percent of the total transactions. |
| Summary | CNC Console IAM transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | warning |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 1 < 10 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: This alert triggers when CNC Console IAM failed transactions exceed 1% of total transactions. 5xx errors typically indicate server-side issues, such as: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.1.9 CnccIamTransactionErrorRateAbove10Percent
Table 8-10 CnccIamTransactionErrorRateAbove10Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNC Console IAM failed transactions is above 10 percent of the total transactions. |
| Summary | CNC Console IAM transaction error rate detected above 10 percent of total transactions |
| Severity | minor |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 10 < 25 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: This alert triggers when CNC Console IAM failed transactions exceed 10% of total transactions. 5xx errors typically indicate server-side issues, such as: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.1.10 CnccIamTransactionErrorRateAbove25Percent
Table 8-11 CnccIamTransactionErrorRateAbove25Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNCC IAM failed transactions is above 25 percent of the total transactions. |
| Summary | CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions |
| Severity | major |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 25 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: This alert triggers when CNCC IAM failed transactions exceed 25% of total transactions. 5xx errors typically indicate server-side issues, such as: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.1.11 CnccIamTransactionErrorRateAbove50Percent
Table 8-12 CnccIamTransactionErrorRateAbove50Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNCC IAM failed transactions is above 50 percent of the total transactions. |
| Summary | CNCC IAM transaction Error Rate detected above 50 Percent of Total Transactions |
| Severity | critical |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns"}[5m]))) * 100 >= 50 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: This alert triggers when CNCC IAM failed transactions exceed 50% of total transactions. 5xx errors typically indicate server-side issues, such as: … For any assistance, contact My Oracle Support. Make sure to capture the iam-ingress pod logs and relevant metrics to help Support analyze the issue. |
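In the error-rate conditions, the numerator sums the rate of responses whose Status label matches 5.* and falls back to (up * 0), so a pod that returned no 5xx responses still contributes an error rate of 0 instead of disappearing from the result. The denominator is the rate of all responses from the same gateway; when it is 0, PromQL yields NaN, which fails every comparison, so no alert fires. The Python sketch below mirrors that calculation and the five bands for one pod. It assumes the per-status rates are already known, and the status strings used with it are hypothetical examples.

```python
import re
from typing import Dict, Optional, Tuple

# (lower %, upper % or None, severity) from the Condition and Severity rows of Tables 8-8 to 8-12.
ERROR_RATE_BANDS = [
    (0.1, 1.0, "warning"),
    (1.0, 10.0, "warning"),
    (10.0, 25.0, "minor"),
    (25.0, 50.0, "major"),
    (50.0, None, "critical"),
]

# PromQL regex matchers are fully anchored, so Status=~"5.*" means "starts with 5".
FIVE_XX = re.compile(r"5.*")


def error_rate_percent(rates_by_status: Dict[str, float]) -> Optional[float]:
    """Percentage of 5xx responses for one pod, or None when there is no traffic."""
    total = sum(rates_by_status.values())
    if total == 0:
        return None  # PromQL would produce NaN here, which never satisfies a threshold.
    # An empty sum is 0, which plays the role of "or (up * 0)" in the condition.
    errors = sum(r for status, r in rates_by_status.items() if FIVE_XX.fullmatch(status))
    return errors / total * 100


def error_rate_band(percent: Optional[float]) -> Optional[Tuple[float, str]]:
    """Return (lower threshold, severity) of the matching alert, or None."""
    if percent is None:
        return None
    for lower, upper, severity in ERROR_RATE_BANDS:
        if percent >= lower and (upper is None or percent < upper):
            return lower, severity
    return None
```

Because the 0.1 percent and 1 percent alerts share the warning severity, the sketch returns the lower threshold together with the severity so that the two can be told apart.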
8.1.12 CnccIamIngressGatewayServiceDown
Table 8-13 CnccIamIngressGatewayServiceDown
| Field | Details |
|---|---|
| Description | This alert notifies that the CNCC IAM Ingress Gateway pod is down. |
| Summary | namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : cncc-iam-ingress-gateway service down |
| Severity | critical |
| Condition | absent(up{pod=~".*iam-ingress-gateway.*", namespace="cncc-ns"}) or (up{pod=~".*iam-ingress-gateway.*", namespace="cncc-ns"}) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7004 |
| Metric Used | up. Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
| Recommended Action | Cause: This alert triggers when the CNCC IAM Ingress Gateway pod or service is down. Diagnostic Information: … Recovery: The alert is cleared automatically when the cncc-iam-ingress-gateway service becomes available again. If the alert does not clear: … |
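The service-down condition fires in two situations: absent(...) returns a series when no up sample matches the pod pattern at all (for example, when no gateway pod exists), and the second operand returns any matching target whose last scrape failed (up == 0). The following Python sketch expresses the same logic over a hypothetical mapping of pod names to up values; the same function applies to CnccCoreIngressGatewayServiceDown with the .*core-ingress-gateway.* pattern.

```python
import re
from typing import Mapping


def service_down(up_by_pod: Mapping[str, float], pod_regex: str) -> bool:
    """True when absent(up{pod=~RE}) or (up{pod=~RE}) == 0 would return a series.

    up_by_pod maps a pod name to its current up value
    (1 = last scrape succeeded, 0 = last scrape failed).
    """
    pattern = re.compile(pod_regex)
    # PromQL regex matchers are fully anchored, hence fullmatch rather than search.
    matching = [value for pod, value in up_by_pod.items() if pattern.fullmatch(pod)]
    if not matching:
        # absent(...): no up series exists for any gateway pod.
        return True
    # (up{...}) == 0: at least one gateway target failed its last scrape.
    return any(value == 0 for value in matching)
```

The pod names passed to this function are whatever your Prometheus targets carry in the pod label; a name such as cncc-iam-ingress-gateway-7d9f is only an illustration.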
8.1.13 CnccIamFailedLogin
Table 8-14 CnccIamFailedLogin
| Field | Details |
|---|---|
| Description | This alert notifies you if there are more than 3 failed login attempts in CNCC IAM for a user within 5 minutes. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value |
| Severity | warning |
| Condition | sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/master/login-actions/authenticate",Method="POST",Status="200 OK"})- sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/master/login-actions/authenticate",Method="POST",Status="200 OK"} offset 5m) > 3 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7005 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: This alert triggers when there are more than 3 failed login attempts in CNCC IAM for a user within 5 minutes. This may be due to users entering incorrect credentials, potential automated attacks (such as brute-force attempts), or issues with the login process. Diagnostic Information: … For any assistance, contact My Oracle Support. Make sure to capture the iam-kc container logs, iam-ingress pod logs, and relevant metrics to help Support analyze the issue. |
8.1.14 AdminUserCreation
Table 8-15 AdminUserCreation
| Field | Details |
|---|---|
| Description | This alert notifies you when a new admin user is created in CNCC IAM within the last 5 minutes. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created |
| Severity | warning |
| Condition | sum by(namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/admin/realms/master/users",Method="POST"}) - sum by(namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/admin/realms/master/users",Method="POST"} offset 5m) > 0 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7006 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Action | Cause: This alert triggers when a new admin user is created in CNCC IAM within the last 5 minutes. This may be the result of a legitimate administrative action or an unauthorized attempt to gain privileged access. Diagnostic Information: … For any assistance, contact My Oracle Support. Make sure to capture the iam-kc container logs, iam-ingress pod logs, and relevant audit details to help Support analyze the issue. |
8.1.15 CnccIamAccessTokenFailure
Table 8-16 CnccIamAccessTokenFailure
| Field | Details |
|---|---|
| Description | This alert notifies you if there are more than 3 failed access token requests in CNCC IAM within 5 minutes. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value |
| Severity | warning |
| Condition | sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/master/protocol/openid-connect/token",Method="POST",Status=~"4.*|5.*"}) - sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/master/protocol/openid-connect/token",Method="POST",Status=~"4.*|5.*"} offset 5m) > 3 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.7007 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: This alert triggers when there are more than 3 failed access token requests in CNCC IAM within 5 minutes. This may be due to incorrect user credentials, misconfigured client applications, or backend errors affecting the token generation process. Diagnostic Information: … For any assistance, contact My Oracle Support. Make sure to capture the iam-kc container logs, iam-ingress pod logs, and relevant metrics to help Support analyze the issue. |
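CnccIamFailedLogin, AdminUserCreation, and CnccIamAccessTokenFailure (and their CNCC Core counterparts) all subtract the value a cumulative counter had five minutes ago (offset 5m) from its current value and compare the difference with a threshold (3, 0, and 3 respectively). Because binary arithmetic in PromQL matches series one-to-one on their label sets, a label combination with no sample five minutes earlier produces no result, and, unlike increase(), plain subtraction does not compensate for counter resets such as a gateway pod restart. The Python sketch below models this behavior with label sets represented as frozensets; the label values used with it are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# A series is identified by its full label set, e.g. {("UserName", "user1"), ("pod", "p1")}.
Labels = FrozenSet[Tuple[str, str]]


def offset_delta_alerts(
    now: Dict[Labels, float],
    five_minutes_ago: Dict[Labels, float],
    threshold: float,
) -> Dict[Labels, float]:
    """Series for which "counter - counter offset 5m > threshold" would return a value."""
    firing = {}
    for labels, current in now.items():
        previous = five_minutes_ago.get(labels)
        if previous is None:
            # No series with identical labels 5 minutes ago: one-to-one matching drops it.
            continue
        delta = current - previous
        # After a counter reset (for example, a pod restart) delta can be negative,
        # so the series is filtered out even if new events occurred after the restart.
        if delta > threshold:
            firing[labels] = delta
    return firing
```

For instance, a per-user counter that moved from 7 to 12 failed logins within five minutes yields a delta of 5, which exceeds the threshold of 3 used by the failed-login alerts.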
8.2 CNC Console Core Alerts
This section provides information about CNC Console Core Alerts.
8.2.1 CnccCoreTotalIngressTrafficRateAboveMinorThreshold
Table 8-17 CnccCoreTotalIngressTrafficRateAboveMinorThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that the CNCC Core Ingress message rate has crossed the configured minor threshold of 700 TPS. The alert remains active while the rate is between 700 and 800 TPS. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000) |
| Severity | minor |
| Condition | sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[2m])) >= 700 < 800 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.2 CnccCoreTotalIngressTrafficRateAboveMajorThreshold
Table 8-18 CnccCoreTotalIngressTrafficRateAboveMajorThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that the CNCC Core Ingress message rate has crossed the configured major threshold of 800 TPS. The alert remains active while the rate is between 800 and 900 TPS. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000) |
| Severity | major |
| Condition | sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[2m])) >= 800 < 900 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.3 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold
Table 8-19 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that the CNCC Core Ingress message rate has crossed the configured critical threshold of 900 TPS. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000) |
| Severity | critical |
| Condition | sum by(namespace,pod) (rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[2m])) >= 900 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
| Metric Used | oc_ingressgateway_http_requests_total |
| Recommended Action | Cause: … |
8.2.4 CnccCoreMemoryUsageCrossedMinorThreshold
Table 8-20 CnccCoreMemoryUsageCrossedMinorThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that a CNCC Core Ingress Gateway or core-cmservice pod has reached the configured minor threshold (70%) of its memory resource limits. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit. |
| Severity | minor |
| Condition | sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*core-cmservice.*|.*core-ingress-gateway.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*core-cmservice.*|.*core-ingress-gateway.*",resource="memory"}) * 100 >= 70 < 80 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
| Metric Used | container_memory_usage_bytes |
| Recommended Action | Cause: … |
8.2.5 CnccCoreMemoryUsageCrossedMajorThreshold
Table 8-21 CnccCoreMemoryUsageCrossedMajorThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that a CNCC Core Ingress Gateway or core-cmservice pod has reached the configured major threshold (80%) of its memory resource limits. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit. |
| Severity | major |
| Condition | sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*core-cmservice.*|.*core-ingress-gateway.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*core-cmservice.*|.*core-ingress-gateway.*",resource="memory"}) * 100 >= 80 < 90 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
| Metric Used | container_memory_usage_bytes |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.6 CnccCoreMemoryUsageCrossedCriticalThreshold
Table 8-22 CnccCoreMemoryUsageCrossedCriticalThreshold
| Field | Details |
|---|---|
| Description | This alert notifies that a CNCC Core Ingress Gateway or core-cmservice pod has reached the configured critical threshold (90%) of its memory resource limits. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit. |
| Severity | critical |
| Condition | sum by(namespace,pod) (container_memory_usage_bytes{container!="", namespace="cncc-ns", pod=~".*core-cmservice.*|.*core-ingress-gateway.*"}) / sum by(namespace, pod) (kube_pod_container_resource_limits{namespace="cncc-ns",pod=~".*core-cmservice.*|.*core-ingress-gateway.*",resource="memory"}) * 100 >= 90 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
| Metric Used | container_memory_usage_bytes |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.7 CnccCoreTransactionErrorRateAbove0.1Percent
Table 8-23 CnccCoreTransactionErrorRateAbove0.1Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNCC Core failed transactions is above 0.1 percent of the total transactions. |
| Summary | CNCC Core transaction Error Rate detected above 0.1 Percent of Total Transactions |
| Severity | warning |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 0.1 < 1 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: … |
8.2.8 CnccCoreTransactionErrorRateAbove1Percent
Table 8-24 CnccCoreTransactionErrorRateAbove1Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNCC Core failed transactions is above 1 percent of the total transactions. |
| Summary | CNCC Core transaction Error Rate detected above 1 Percent of Total Transactions |
| Severity | warning |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 1 < 10 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.9 CnccCoreTransactionErrorRateAbove10Percent
Table 8-25 CnccCoreTransactionErrorRateAbove10Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNCC Core failed transactions is above 10 percent of the total transactions. |
| Summary | CNCC Core transaction Error Rate detected above 10 Percent of Total Transactions |
| Severity | minor |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 10 < 25 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.10 CnccCoreTransactionErrorRateAbove25Percent
Table 8-26 CnccCoreTransactionErrorRateAbove25Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNCC Core failed transactions is above 25 percent of the total transactions. |
| Summary | CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions |
| Severity | major |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 25 < 50 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.11 CnccCoreTransactionErrorRateAbove50Percent
Table 8-27 CnccCoreTransactionErrorRateAbove50Percent
| Field | Details |
|---|---|
| Description | This alert notifies that the number of CNCC Core failed transactions is above 50 percent of the total transactions. |
| Summary | CNCC Core transaction error rate detected above 50 percent of total transactions |
| Severity | critical |
| Condition | (sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{Status=~"5.*",InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]) or (up * 0 ) ) )/(sum by(namespace,pod)(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns"}[5m]))) *100 >= 50 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.12 CnccCoreIngressGatewayServiceDown
Table 8-28 CnccCoreIngressGatewayServiceDown
| Field | Details |
|---|---|
| Description | This alert notifies that the CNCC Core Ingress Gateway pod is down. |
| Summary | namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : cncc-core-ingress-gateway service down |
| Severity | critical |
| Condition | absent(up{pod=~".*core-ingress-gateway.*", namespace="cncc-ns"}) or (up{pod=~".*core-ingress-gateway.*", namespace="cncc-ns"}) == 0 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8004 |
| Metric Used | up. Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
| Recommended Action | Cause: This alert triggers when the CNCC Core Ingress Gateway pod or service is down. Diagnostic Information: … Recovery: The alert is cleared automatically when the cncc-core-ingress-gateway service becomes available again. If the alert does not clear: … |
8.2.13 CnccCoreFailedLogin
Table 8-29 CnccCoreFailedLogin
| Field | Details |
|---|---|
| Description | This alert notifies you if there are more than 3 failed login attempts in CNCC Core for a user within 5 minutes. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value |
| Severity | warning |
| Condition | sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/cncc/login-actions/authenticate",Method="POST",Status="200 OK"}) - sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/cncc/login-actions/authenticate",Method="POST",Status="200 OK"} offset 5m) > 3 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8005 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: … For any assistance, contact My Oracle Support. Make sure to capture the core-ingress pod logs and relevant metrics to help Support analyze the issue. |
8.2.14 CnccCoreUnauthorizedAccess
Table 8-30 CnccCoreUnauthorizedAccess
| Field | Details |
|---|---|
| Description | This alert notifies you if there are more than 3 unauthorized access (403 Forbidden) attempts in CNCC Core within 5 minutes. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value |
| Severity | warning |
| Condition | sum by(Status,Method,namespace,ResourceType,UserId,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns",Status="403 FORBIDDEN", ResourceType!="UNKNOWN"}) - sum by(Status,Method,namespace,ResourceType,UserId,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*core_ingressgateway",namespace="cncc-ns",Status="403 FORBIDDEN",ResourceType!="UNKNOWN"} offset 5m) > 3 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8006 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: core-ingress pod logs and relevant metrics to help Support analyze the issue. |
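The `sum by(...)` clause in the Condition keeps one result per combination of the listed labels (for example, per `UserName` and `pod`), and the subtraction pairs only results whose label sets match exactly. The Python sketch below mimics that grouping and one-to-one matching. The label values, and the extra `instance` label that gets summed away, are hypothetical.

```python
from collections import defaultdict

# Labels kept by `sum by(...)` in the CnccCoreUnauthorizedAccess condition.
BY = ("Status", "Method", "namespace", "ResourceType", "UserId", "UserName", "pod")


def sum_by(series, by=BY):
    """Aggregate (labels, value) pairs like PromQL `sum by(<labels>)`.

    Labels not listed in `by` are dropped; a missing label counts as "".
    """
    totals = defaultdict(float)
    for labels, value in series:
        key = tuple(labels.get(name, "") for name in by)
        totals[key] += value
    return dict(totals)


def subtract_above(now, earlier, threshold=3):
    """Mimic `A - B > threshold`: compare only label sets present on both sides."""
    result = {}
    for key in now.keys() & earlier.keys():
        delta = now[key] - earlier[key]
        if delta > threshold:
            result[key] = delta
    return result


# Hypothetical counter values now and 5 minutes earlier.
alice = {"Status": "403 FORBIDDEN", "Method": "GET", "namespace": "cncc-ns",
         "ResourceType": "example", "UserId": "u1", "UserName": "alice",
         "pod": "core-ingress-0"}
bob = {**alice, "UserId": "u2", "UserName": "bob"}

now_series = [({**alice, "instance": "a"}, 6), ({**alice, "instance": "b"}, 2), (bob, 3)]
earlier_series = [({**alice, "instance": "a"}, 1), ({**alice, "instance": "b"}, 1), (bob, 1)]

fired = subtract_above(sum_by(now_series), sum_by(earlier_series))
for key, delta in fired.items():
    print(dict(zip(BY, key))["UserName"], delta)  # only alice: 8 - 2 = 6 > 3
```

Grouping by `UserName` and `pod` is what lets a single alert identify which user triggered it; bob's increase of 2 stays below the threshold and produces no result.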
8.2.15 CnccCoreAccessTokenFailure
Table 8-31 CnccCoreAccessTokenFailure
| Field | Details |
|---|---|
| Description | This alert notifies you if there are more than 3 failed access token requests in CNCC Core within 5 minutes. |
| Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value |
| Severity | warning |
| Condition | sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/cncc/protocol/openid-connect/token",Method="POST",Status=~"4.*|5.*"}) - sum by(Status,namespace,ResourcePath,Method,UserName,pod) (oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*iam_ingressgateway",namespace="cncc-ns",ResourcePath="/cncc/auth/realms/cncc/protocol/openid-connect/token",Method="POST",Status=~"4.*|5.*"} offset 5m) > 3 |
| OID | 1.3.6.1.4.1.323.5.3.51.1.2.8007 |
| Metric Used | oc_ingressgateway_http_responses_total |
| Recommended Action | Cause: core-ingress pod logs and relevant metrics to help Support analyze the issue. |
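The Condition selects error responses with the regular-expression matcher `Status=~"4.*|5.*"`. PromQL regex matchers are fully anchored, so the pattern must match the whole label value, which is why `.*` follows the leading digit. The Python sketch below imitates that anchoring with `re.fullmatch`; Prometheus uses RE2 syntax, which agrees with Python's `re` for simple patterns like these. Status strings other than `200 OK` and `403 FORBIDDEN` are hypothetical examples.

```python
import re


def promql_match(pattern: str, value: str) -> bool:
    """Imitate PromQL's `=~`: the pattern must match the entire label value."""
    return re.fullmatch(f"(?:{pattern})", value) is not None


STATUS_PATTERN = "4.*|5.*"  # from the CnccCoreAccessTokenFailure condition

for status in ("200 OK", "403 FORBIDDEN", "401 UNAUTHORIZED", "500 INTERNAL_SERVER_ERROR"):
    print(status, promql_match(STATUS_PATTERN, status))

# Anchoring in action: "4" alone matches no status value, even though
# an unanchored search finds it in "403 FORBIDDEN".
print(promql_match("4", "403 FORBIDDEN"))           # False
print(re.search("4", "403 FORBIDDEN") is not None)  # True
```

The same anchoring explains the leading `.*` in `InstanceIdentifier=~".*iam_ingressgateway"`: it lets the matcher accept any prefix before `iam_ingressgateway`.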
8.3 Validating Alerts
Configure and validate the alerts in the Prometheus server. For the procedure to configure the alerts, see the CNCC Alert Configuration section. To validate the alerts:
- Open the Prometheus server in your browser using <IP>:<Port>.
- Navigate to Status > Rules.
- Search for CNCC. The list of CNCC alerts is displayed.
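The rule list shown on the Rules page is also available from the Prometheus HTTP API at `/api/v1/rules`. As a sketch, assuming the API is reachable at the same `<IP>:<Port>`, the following Python script lists the alerting rules whose names contain `cncc`, a scripted equivalent of the search in the last step. The sample response in the code is hypothetical and trimmed to the fields the sketch reads.

```python
import json
from urllib.request import urlopen


def fetch_rules(base_url: str) -> dict:
    """GET <base_url>/api/v1/rules and decode the JSON body."""
    with urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)


def cncc_alert_names(rules_response: dict) -> list:
    """Return the sorted names of alerting rules whose name contains 'cncc'."""
    names = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            name = rule.get("name", "")
            if rule.get("type") == "alerting" and "cncc" in name.lower():
                names.append(name)
    return sorted(names)


# Hypothetical, trimmed response used instead of a live server.
sample = {"status": "success", "data": {"groups": [{"name": "example-group", "rules": [
    {"name": "CnccCoreUnauthorizedAccess", "type": "alerting"},
    {"name": "CnccCoreAccessTokenFailure", "type": "alerting"},
    {"name": "example:recording_rule", "type": "recording"},
]}]}}
print(cncc_alert_names(sample))
```

Against a live server, call `cncc_alert_names(fetch_rules("http://<IP>:<Port>"))`. An empty list corresponds to the case in which no CNCC alerts appear on the Rules page.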
Note:
If you are unable to see the alerts, the alert file was not loaded because it is not in a format that the Prometheus server accepts. Correct the file and try again.
8.4 Disabling Alerts
This section explains how to disable the alerts in CNC Console.
- Edit the manager or agent alert file to remove the specific alert.
- Remove the complete content of the specific alert from the manager or agent alert file.
  For example, to remove the CnccIamTotalIngressTrafficRateAboveMinorThreshold alert from the alert rules file (cncc_alert_rules_<version>.yaml or occncc_manager_alerting_rules_promha_<version>.yaml), remove its complete content, as shown below:
  ```yaml
  ## ALERT SAMPLE START##
  - alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
    annotations:
      description: 'CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'
      summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'
    expr: sum(rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*cncc-iam_ingressgateway",kubernetes_namespace="cncc"}[2m])) > 0
    labels:
      severity: minor
      oid: "1.3.6.1.4.1.323.5.3.51.1.2.7001"
      namespace: ' {{ $labels.kubernetes_namespace }} '
      podname: ' {{$labels.kubernetes_pod_name}} '
  ## ALERT SAMPLE END##
  ```
- Perform the alert configuration. For details, see the CNCC Alert Configuration section.
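Removing an alert by hand means deleting its `- alert:` line and every line indented under it. The following Python sketch does the same on the text of a rule file. It assumes each entry starts with a line of the form `- alert: <name>` and that all of the entry's lines are indented deeper than that dash; it does not parse YAML, so keep a backup and review the result before performing the alert configuration. The rule file content in the code is a hypothetical, shortened example.

```python
def remove_alert(text: str, alert_name: str) -> str:
    """Drop the `- alert: <alert_name>` entry and its indented body from rule-file text."""
    out = []
    skip_indent = None  # indentation of the '- alert:' line being removed
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" ")
        indent = len(line) - len(body)
        if skip_indent is not None:
            # Lines indented deeper than the dash (and blank lines) belong to the entry.
            if not body.strip() or indent > skip_indent:
                continue
            skip_indent = None  # reached the next sibling or parent: stop skipping
        if body.rstrip() == f"- alert: {alert_name}":
            skip_indent = indent
            continue
        out.append(line)
    return "".join(out)


# Hypothetical, shortened rule file.
rules = """groups:
- name: example-group
  rules:
  - alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
    expr: vector(1) > 0
    labels:
      severity: minor
  - alert: CnccCoreUnauthorizedAccess
    expr: vector(1) > 0
"""
print(remove_alert(rules, "CnccIamTotalIngressTrafficRateAboveMinorThreshold"))
```

Matching the whole `- alert: <name>` line, rather than a prefix, avoids also deleting an alert whose name merely starts with the same text.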
8.5 Configuring SNMP-Notifier
Configure the IP address and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:
- Run the following command to edit the deployment:
  ```shell
  kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
  ```
  Example:
  ```shell
  $ kubectl edit deploy occne-snmp-notifier -n occne-infra
  ```
- Edit the destination as follows:
  ```
  --snmp.destination=<destination_ip>:<destination_port>
  ```
  Example:
  ```
  --snmp.destination=10.75.203.94:162
  ```
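With `kubectl edit`, the change amounts to rewriting one entry in the SNMP Notifier container's argument list. The Python sketch below shows that rewrite as a pure function: it validates an IPv4 address and port, then replaces the existing `--snmp.destination` argument or appends one. It assumes the flag is passed as a single `--snmp.destination=<ip>:<port>` argument, as in the example above; it does not contact the Kubernetes API, and the other argument in the sample is a placeholder.

```python
import ipaddress

FLAG = "--snmp.destination="


def set_snmp_destination(args, ip, port):
    """Return a copy of `args` with --snmp.destination set to ip:port (IPv4 only)."""
    ipaddress.IPv4Address(ip)  # raises ValueError (AddressValueError) if invalid
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    new_flag = f"{FLAG}{ip}:{port}"
    if any(a.startswith(FLAG) for a in args):
        return [new_flag if a.startswith(FLAG) else a for a in args]
    return [*args, new_flag]


# Hypothetical container args; "--example.flag" is a placeholder, not a real option.
args = ["--snmp.destination=127.0.0.1:162", "--example.flag"]
print(set_snmp_destination(args, "10.75.203.94", 162))
```

Validating before editing catches a mistyped trap receiver address early, instead of after the SNMP Notifier pod restarts with an unusable destination.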
8.6 CNC Console MIB Files
Two MIB files are used to generate the traps. Update these files along with the alert file to receive the traps in your environment.
- occncc_mib_tc_<version>.mib: This file is the CNCC top-level MIB file, where the objects and their data types are defined.
- occncc_mib_<version>.mib: This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.