8 CNC Console Alerts
This section provides information about CNC Console Alerts.
Note:
- In a multi-cluster deployment, use the updated occncc_agent_alertrules_<version>.yaml file for the agent cluster.
- Use the occncc_manager_alertrules_<version>.yaml file for a single-cluster deployment and, in a multi-cluster deployment, for the manager cluster.
Note:
The following four sample alert files are provided:
- CNCC alert rules files for CNE without Prometheus Operator: occncc_manager_alertrules_<version>.yaml for manager-only or manager plus agent deployments, and occncc_agent_alertrules_<version>.yaml for agent-only deployments.
- CNCC alert rules files for CNE with Prometheus HA Operator: occncc_manager_alerting_rules_promha_<version>.yaml for deployments in the manager cluster, and occncc_agent_alertingrules_promha_<version>.yaml for deployments in an agent-only cluster.
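The file selection described in the notes above can be sketched as a small lookup. This is only an illustration: the function name, the `cluster_role` values, and the version string used in the example are assumptions, while the file name patterns are copied from the notes.

```python
def select_alert_rules_file(cluster_role: str, prometheus_ha_operator: bool, version: str) -> str:
    """Return the sample alert rules file name for a deployment.

    cluster_role (hypothetical values): "manager" for single-cluster,
    manager-only, or manager plus agent deployments; "agent" for
    agent-only clusters.
    """
    # File name patterns as listed in the notes; {v} stands for <version>.
    files = {
        ("manager", False): "occncc_manager_alertrules_{v}.yaml",
        ("agent", False): "occncc_agent_alertrules_{v}.yaml",
        ("manager", True): "occncc_manager_alerting_rules_promha_{v}.yaml",
        ("agent", True): "occncc_agent_alertingrules_promha_{v}.yaml",
    }
    try:
        template = files[(cluster_role, prometheus_ha_operator)]
    except KeyError:
        raise ValueError(f"unknown cluster role: {cluster_role!r}") from None
    return template.format(v=version)


# Example with a placeholder version string (not a real release number).
print(select_alert_rules_file("agent", True, "x.y.z"))
```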
Table 8-1 Alerts Levels or Severity Types
Alerts Levels / Severity Types | Definition |
---|---|
Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of OCCM. |
Major | Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the service of OCCM. |
Minor | Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the service of OCCM. |
Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of OCCM. |
8.1 CNC Console IAM Alerts
This section provides information about CNC Console IAM Alerts.
8.1.1 CnccIamTotalIngressTrafficRateAboveMinorThreshold
Table 8-2 CnccIamTotalIngressTrafficRateAboveMinorThreshold
Trigger Condition |
The total CNCC IAM Ingress Message rate has crossed the configured minor threshold of 700 TPS. By default, this alert is triggered in occncc_alertrules_<version>.yaml when the CNCC IAM Ingress Rate crosses 70% of 1000 (the maximum ingress request rate). |
Severity | Minor |
Alert details provided |
Description: CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }}) For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)' For CNE without Prometheus Operator : summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the CnccIamTotalIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
8.1.2 CnccIamTotalIngressTrafficRateAboveMajorThreshold
Table 8-3 CnccIamTotalIngressTrafficRateAboveMajorThreshold
Trigger Condition |
The total CNCC IAM Ingress Message rate has crossed the configured major threshold of 800 TPS. By default, this alert is triggered in occncc_alertrules_<version>.yaml when the CNCC IAM Ingress Rate crosses 80% of 1000 (the maximum ingress request rate). |
Severity | Major |
Alert details provided |
Description: 'CNCC IAM Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second (current value is: {{ $value }})' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' For CNE without Prometheus Operator : summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccIamTotalIngressTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
8.1.3 CnccIamTotalIngressTrafficRateAboveCriticalThreshold
Table 8-4 CnccIamTotalIngressTrafficRateAboveCriticalThreshold
Trigger Condition |
The total CNCC IAM Ingress Message rate has crossed the configured critical threshold of 900 TPS. By default, this alert is triggered in occncc_alertrules_<version>.yaml when the CNCC IAM Ingress Rate crosses 90% of 1000 (the maximum ingress request rate). |
Severity | Critical |
Alert details provided |
Description: CNCC IAM Ingress traffic Rate is above the configured critical threshold, that is, 900 requests per second (current value is: {{ $value }}) For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' For CNE without Prometheus Operator: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
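The relationship between the three ingress traffic alerts (each lower-severity alert clears when the next threshold is crossed) can be sketched as follows. This is a minimal illustration, not the actual alert rule expression: the function and constant names are hypothetical, and treating "crossed" as strictly greater than the threshold is an assumption.

```python
MAX_INGRESS_RATE = 1000  # maximum ingress request rate (requests per second)

# (severity, percent of the maximum rate), checked from highest to lowest
TRAFFIC_BANDS = [("Critical", 90), ("Major", 80), ("Minor", 70)]


def active_traffic_alert(rate_tps, max_rate=MAX_INGRESS_RATE):
    """Return the single ingress traffic alert expected to be active, or None.

    Assumption: an alert fires when the rate is strictly above its threshold.
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    for severity, percent in TRAFFIC_BANDS:
        if rate_tps * 100 > percent * max_rate:
            return f"CnccIamTotalIngressTrafficRateAbove{severity}Threshold"
    return None


for rate in (650, 750, 850, 950):
    print(rate, active_traffic_alert(rate))
```

With the default values, 750 TPS falls in the Minor band, 850 TPS in the Major band, and 950 TPS in the Critical band.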
8.1.4 CnccIamMemoryUsageCrossedMinorThreshold
Table 8-5 CnccIamMemoryUsageCrossedMinorThreshold
Trigger Condition | A pod has reached the configured minor threshold (70%) of its memory resource limits. |
Severity | Minor |
Alert details provided |
Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
Metric Used |
container_memory_usage_bytes, kube_pod_container_resource_limits Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case CnccIamMemoryUsageCrossedMajorThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
8.1.5 CnccIamMemoryUsageCrossedMajorThreshold
Table 8-6 CnccIamMemoryUsageCrossedMajorThreshold
Trigger Condition | A pod has reached the configured major threshold (80%) of its memory resource limits. |
Severity | Major |
Alert details provided |
Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value = {{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
Metric Used |
container_memory_usage_bytes, kube_pod_container_resource_limits Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution | The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case the CnccIamMemoryUsageCrossedCriticalThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
8.1.6 CnccIamMemoryUsageCrossedCriticalThreshold
Table 8-7 CnccIamMemoryUsageCrossedCriticalThreshold
Trigger Condition | A pod has reached the configured critical threshold (90%) of its memory resource limits. |
Severity | Critical |
Alert details provided |
Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value = {{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
Metric Used |
container_memory_usage_bytes, kube_pod_container_resource_limits Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution | The alert gets cleared when the memory utilization falls below the Critical Threshold. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
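The memory alerts compare a pod's memory usage (container_memory_usage_bytes) against its configured limit (kube_pod_container_resource_limits). The sketch below illustrates that comparison for the 70%, 80%, and 90% thresholds. It is not the rule expression from the alert file: the function name and the `component` parameter are hypothetical, and strict "greater than" is an assumption.

```python
# (severity, percent of the memory limit), checked from highest to lowest
MEMORY_BANDS = [("Critical", 90), ("Major", 80), ("Minor", 70)]


def active_memory_alert(usage_bytes, limit_bytes, component="Iam"):
    """Return the memory alert expected to be active for a pod, or None.

    usage_bytes corresponds to container_memory_usage_bytes and limit_bytes
    to the pod's memory limit. component is "Iam" or "Core".
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    for severity, percent in MEMORY_BANDS:
        # Integer comparison of usage/limit against the percentage threshold.
        if usage_bytes * 100 > percent * limit_bytes:
            return f"Cncc{component}MemoryUsageCrossed{severity}Threshold"
    return None


one_gib = 1024 ** 3
print(active_memory_alert(int(0.85 * one_gib), one_gib))
```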
8.1.7 CnccIamTransactionErrorRateAbove0.1Percent
Table 8-8 CnccIamTransactionErrorRateAbove0.1Percent
Trigger Condition | The number of failed transactions is above 0.1 percent of the total transactions. |
Severity | Warning |
Alert details provided | Description: 'CNCC IAM transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC IAM transaction Error Rate detected above 0.1 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the CnccIamTransactionErrorRateAbove1Percent alert is raised.
Steps:
|
8.1.8 CnccIamTransactionErrorRateAbove1Percent
Table 8-9 CnccIamTransactionErrorRateAbove1Percent
Trigger Condition | The number of failed transactions is above 1 percent of the total transactions. |
Severity | Warning |
Alert details provided | Description: 'CNCC IAM transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC IAM transaction Error Rate detected above 1 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the CnccIamTransactionErrorRateAbove10Percent alert is raised.
Steps:
|
8.1.9 CnccIamTransactionErrorRateAbove10Percent
Table 8-10 CnccIamTransactionErrorRateAbove10Percent
Trigger Condition | The number of failed transactions is above 10 percent of the total transactions. |
Severity | Minor |
Alert details provided |
Description: 'CNCC IAM transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC IAM transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the CnccIamTransactionErrorRateAbove25Percent alert is raised.
Steps:
|
8.1.10 CnccIamTransactionErrorRateAbove25Percent
Table 8-11 CnccIamTransactionErrorRateAbove25Percent
Trigger Condition |
The number of failed transactions is above 25 percent of the total transactions. |
Severity | Major |
Alert details provided |
Description: 'CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the CnccIamTransactionErrorRateAbove50Percent alert is raised. Steps:
|
8.1.11 CnccIamTransactionErrorRateAbove50Percent
Table 8-12 CnccIamTransactionErrorRateAbove50Percent
Trigger Condition |
The number of failed transactions is above 50 percent of the total transactions. |
Severity | Critical |
Alert details provided |
Description: The number of failed transactions is above 50 percent of the total transactions. Summary: 'CNCC IAM transaction Error Rate detected above 50 Percent of Total Transactions'. |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
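The transaction error rate alerts form a ladder: 0.1% and 1% (Warning), 10% (Minor), 25% (Major), and 50% (Critical), with each lower alert clearing when the next threshold is crossed. A minimal sketch of that ladder is shown below. The function name and parameters are hypothetical, and how failed transactions are counted from oc_ingressgateway_http_responses_total is not modeled here.

```python
# (alert name suffix, threshold in tenths of a percent, severity), highest first
ERROR_RATE_BANDS = [
    ("50", 500, "Critical"),
    ("25", 250, "Major"),
    ("10", 100, "Minor"),
    ("1", 10, "Warning"),
    ("0.1", 1, "Warning"),
]


def active_error_rate_alert(failed, total, component="Iam"):
    """Return (alert name, severity) for the active error-rate alert, or None.

    failed and total are transaction counts over the same interval.
    Assumption: an alert fires when the rate is strictly above its threshold.
    """
    if total <= 0:
        return None  # no traffic, so no error rate to evaluate
    for suffix, per_mille, severity in ERROR_RATE_BANDS:
        # failed / total > per_mille / 1000, written without division
        if failed * 1000 > per_mille * total:
            name = f"Cncc{component}TransactionErrorRateAbove{suffix}Percent"
            return (name, severity)
    return None


print(active_error_rate_alert(300, 1000))
```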
8.1.12 CnccIamIngressGatewayServiceDown
Table 8-13 CnccIamIngressGatewayServiceDown
Trigger Condition |
The pods of the CNCC IAM Ingress Gateway microservice are unavailable. |
Severity | Critical |
Alert details provided |
Description: 'CNCC IAM Ingress-Gateway service InstanceIdentifier=~".*cncc-iam_ingressgateway" is down' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' For CNE without Prometheus Operator: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7004 |
Metric Used |
'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert is cleared when the cncc-iam_ingressgateway service is available. Steps:
|
8.1.13 CnccIamFailedLogin
Table 8-14 CnccIamFailedLogin
Trigger Condition |
The count of failed login attempts in CNCC-IAM by a user goes above '3'. |
Severity | Warning |
Alert details provided |
Description: '{{ $value }} failed Login attempts have been detected in CNCC IAM for user {{$labels.UserName}}, the configured threshold value is 3 failed login attempts for every 5 min' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value' For CNE without Prometheus Operator: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7005 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert gets cleared when the total failed login attempts for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m). Note: The threshold and time are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
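The failed-login condition (more than 3 failed attempts by one user within 5 minutes) can be illustrated with a simple sliding-window count. This is only a sketch of the logic described above, not the Prometheus rule itself; the function name, its parameters, and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def failed_login_alert_firing(failure_times, now, threshold=3,
                              window=timedelta(minutes=5)):
    """Return True if one user's failed logins in the window exceed the threshold.

    failure_times: datetimes of failed login attempts for a single user.
    Defaults mirror the documented values: threshold '3', window 5 min.
    """
    recent = [t for t in failure_times if now - window < t <= now]
    return len(recent) > threshold


now = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary example time
attempts = [now - timedelta(seconds=s) for s in (10, 60, 120, 200)]
print(failed_login_alert_firing(attempts, now))
```

Once enough attempts age out of the 5-minute window, the count drops to the threshold or below and the condition no longer holds, which matches the clearing behavior described in the Resolution row.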
8.1.14 AdminUserCreation
Table 8-15 AdminUserCreation
Trigger Condition |
A new admin account has been created in the last 5 min. |
Severity | Warning |
Alert details provided |
For CNE with Prometheus HA Operator: Description: '{{ $value }} admin users have been created by {{$labels.UserName}} ' summary: 'namespace: {{$labels.namespace}} summary: {{$labels.pod}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created ' For CNE without Prometheus Operator: Description: '{{ $value }} admin users have been created by {{$labels.UserName}} ' summary: 'namespace: {{$labels.kubernetes_namespace}} summary: {{$labels.kubernetes_pod_name}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created ' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7006 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert gets cleared when the total failed login attempts for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m). Note: The threshold and time are configurable in the occncc_alertrules_<version>.yaml file. Log in to the admin GUI and review the user created. If guidance is required, contact My Oracle Support. |
8.1.15 CnccIamAccessTokenFailure
Table 8-16 CnccIamAccessTokenFailure
Trigger Condition |
The count of failed tokens for CNCC-IAM goes above the configured value of '3'. |
Severity | Warning |
Alert details provided |
Description: 'CNCC Iam Access Token Failure count is above the configured value i.e. 3 for every 5 min. Failed access token request count per second is (current value is: {{ $value }})' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value' For CNE without Prometheus Operator: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7007 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert gets cleared when the total failed tokens for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m). Note: The threshold and time are configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
8.2 CNC Console Core Alerts
This section provides information about CNC Console Core Alerts.
8.2.1 CnccCoreTotalIngressTrafficRateAboveMinorThreshold
Table 8-17 CnccCoreTotalIngressTrafficRateAboveMinorThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured minor threshold of 700 TPS. By default, this alert is triggered in cncc_alert_rules.yaml when the CNCC Core Ingress Rate crosses 70% of 1000 (the maximum ingress request rate). |
Severity | Minor |
Alert details provided |
Description: 'CNCC Core Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)' For CNE without Prometheus Operator: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared either when the total Ingress traffic rate falls below the minor threshold or when the total traffic rate crosses the major threshold, in which case the CnccCoreTotalIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
8.2.2 CnccCoreTotalIngressTrafficRateAboveMajorThreshold
Table 8-18 CnccCoreTotalIngressTrafficRateAboveMajorThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured major threshold of 800 TPS. By default, this alert is triggered in cncc_alert_rules.yaml when the CNCC Core Ingress Rate crosses 80% of 1000 (the maximum ingress request rate). |
Severity | Major |
Alert details provided |
Description: 'CNCC Core Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second (current value is: {{ $value }})' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' For CNE without Prometheus Operator : summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution | The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccCoreTotalIngressTrafficRateAboveCriticalThreshold alert is raised.
Note: The threshold is configurable in the alerts.yaml file. Steps:
|
8.2.3 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold
Table 8-19 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured critical threshold of 900 TPS. By default, this alert is triggered in cncc_alert_rules.yaml when the CNCC Core Ingress Rate crosses 90% of 1000 (the maximum ingress request rate). |
Severity | Critical |
Alert details provided |
Description: 'CNCC Core Ingress traffic Rate is above the configured critical threshold i.e. 900 requests per second (current value is: {{ $value }})' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' For CNE without Prometheus Operator : summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
8.2.4 CnccCoreMemoryUsageCrossedMinorThreshold
Table 8-20 CnccCoreMemoryUsageCrossedMinorThreshold
Trigger Condition |
A pod has reached the configured minor threshold (70%) of its memory resource limits. |
Severity | Minor |
Alert details provided |
Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
Metric Used |
container_memory_usage_bytes, kube_pod_container_resource_limits Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case CnccCoreMemoryUsageCrossedMajorThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
8.2.5 CnccCoreMemoryUsageCrossedMajorThreshold
Table 8-21 CnccCoreMemoryUsageCrossedMajorThreshold
Trigger Condition |
A pod has reached the configured major threshold (80%) of its memory resource limits. |
Severity | Major |
Alert details provided |
Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value = {{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
Metric Used |
container_memory_usage_bytes, kube_pod_container_resource_limits Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case the CnccCoreMemoryUsageCrossedCriticalThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
8.2.6 CnccCoreMemoryUsageCrossedCriticalThreshold
Table 8-22 CnccCoreMemoryUsageCrossedCriticalThreshold
Trigger Condition |
A pod has reached the configured critical threshold (90%) of its memory resource limits. |
Severity | Critical |
Alert details provided |
Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value = {{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
Metric Used |
container_memory_usage_bytes, kube_pod_container_resource_limits Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert gets cleared when the memory utilization falls below the Critical Threshold. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
8.2.7 CnccCoreTransactionErrorRateAbove0.1Percent
Table 8-23 CnccCoreTransactionErrorRateAbove0.1Percent
Trigger Condition |
The number of failed transactions is above 0.1 percent of the total transactions. |
Severity | Warning |
Alert details provided |
Description: 'CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the CnccCoreTransactionErrorRateAbove1Percent alert is raised. The threshold is configurable in the alerts.yaml file. Steps:
|
8.2.8 CnccCoreTransactionErrorRateAbove1Percent
Table 8-24 CnccCoreTransactionErrorRateAbove1Percent
Trigger Condition | The number of failed transactions is above 1 percent of the total transactions. |
Severity | Warning |
Alert details provided |
Description: 'CNCC Core transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error Rate detected above 1 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the CnccCoreTransactionErrorRateAbove10Percent alert is raised. Steps:
|
8.2.9 CnccCoreTransactionErrorRateAbove10Percent
Table 8-25 CnccCoreTransactionErrorRateAbove10Percent
Trigger Condition | The number of failed transactions is above 10 percent of the total transactions. |
Severity | Minor |
Alert details provided |
Description: 'CNCC Core transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error Rate detected above 10 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the CnccCoreTransactionErrorRateAbove25Percent alert is raised. Steps:
|
8.2.10 CnccCoreTransactionErrorRateAbove25Percent
Table 8-26 CnccCoreTransactionErrorRateAbove25Percent
Trigger Condition | The number of failed transactions is above 25 percent of the total transactions. |
Severity | Major |
Alert details provided |
Description: 'CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the CnccCoreTransactionErrorRateAbove50Percent alert is raised. Steps:
|
8.2.11 CnccCoreTransactionErrorRateAbove50Percent
Table 8-27 CnccCoreTransactionErrorRateAbove50Percent
Trigger Condition | The number of failed transactions is above 50 percent of the total transactions. |
Severity | Critical |
Alert details provided |
Description: 'CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. Steps:
|
8.2.12 CnccCoreIngressGatewayServiceDown
Table 8-28 CnccCoreIngressGatewayServiceDown
Trigger Condition | The CNCC Core Ingress Gateway service is down. |
Severity | Critical |
Alert details provided |
Description: 'CNCC Core Ingress-Gateway service InstanceIdentifier=~".*core_ingressgateway" is down' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' For CNE without Prometheus Operator: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8004 |
Metric Used |
'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert is cleared when the cncc-core_ingressgateway service is available. Steps:
|
8.2.13 CnccCoreFailedLogin
Table 8-29 CnccCoreFailedLogin
Trigger Condition |
The count of failed login attempts in CNCC-Core by a user goes above '3' |
Severity | Warning |
Alert details provided |
Description:'{{ $value }} failed Login attempts have been detected in CNCC Core for user {{$labels.UserName}}, the configured threshold value is 3 failed login attempts for every 5 min' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value' For CNE without Prometheus Operator : summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8005 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the total failed login attempts for a particular user go below the threshold value (default value is '3') in the last 5 minutes (default value is 5m). Note: The threshold and time are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
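The threshold logic in this table can be illustrated with a small Python sketch. It is not the Prometheus rule delivered with CNC Console; the function name `failed_logins_exceed` and the in-memory list of timestamps are assumptions made purely for illustration. Only the default values (3 attempts, 5 minutes) come from the table above.

```python
from datetime import datetime, timedelta

def failed_logins_exceed(timestamps, now, threshold=3, window=timedelta(minutes=5)):
    """Return True when more than `threshold` failed logins fall inside the window.

    `timestamps` holds the failed-login times of a single user (hypothetical input).
    """
    # Keep only the attempts that happened within the last `window` before `now`.
    recent = [t for t in timestamps if now - window < t <= now]
    return len(recent) > threshold

# Example data: four attempts inside the window and one older attempt.
now = datetime(2024, 1, 1, 12, 0)
attempts = [now - timedelta(minutes=m) for m in (1, 2, 3, 4, 10)]
alert_active = failed_logins_exceed(attempts, now)
```

With this reading, the alert condition holds only while the count inside the window stays above the threshold, which matches the clearing behavior described in the Resolution row.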
8.2.14 CnccCoreUnauthorizedAccess
Table 8-30 CnccCoreUnauthorizedAccess
Trigger Condition |
The count of unauthorized accesses in CNCC-Core goes above the configured value of '3' |
Severity | Warning |
Alert details provided |
Description:'{{ $value }} Unauthorized Accesses have been detected in CNCC-Core for {{$labels.ResourceType}} for {{$labels.Method}} request. The configured threshold value is 3 for every 5 min' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value' For CNE without Prometheus Operator : summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8006 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the total unauthorized accesses for a particular user go below the threshold value (default value is '3') in the last 5 minutes (default value is 5m). Note: The threshold and time are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
8.2.15 CnccCoreAccessTokenFailure
Table 8-31 CnccCoreAccessTokenFailure
Trigger Condition |
The count of failed access token requests for CNCC-Core goes above the configured value of '3' |
Severity | Warning |
Alert details provided |
Description: 'CNCC Core Access Token Failure count is above the configured value i.e. 3 for every 5 min. Failed access token request count per second is (current value is: {{ $value }})' For CNE with Prometheus HA Operator: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value' For CNE without Prometheus Operator : summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8007 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the total failed tokens for a particular user go below the threshold value (default value is '3') in the last 5 minutes (default value is 5m). Note: The threshold and time are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
8.3 CNC Console Alert configuration in Prometheus
Applying Alerts Rule to CNE without Prometheus Operator:
This section describes the measurement-based alert rules configuration for CNCC in Prometheus. Use the updated occncc_agent_alertrules_<version>.yaml file for agent only deployments and the occncc_manager_alertrules_<version>.yaml file for manager only as well as manager plus agent deployments.
_NAME_: Helm release name of Prometheus
_Namespace_: Kubernetes namespace in which Prometheus is installed
Configuration
- Run the following command to take a backup of the current config map of the Prometheus server:
kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
where _Namespace_ is the Prometheus server namespace used in the helm install command.
For example, assuming the chart name is "prometheus-alert", "_NAME_-server" becomes "prometheus-alert-server". Run the following command to back up the config map:
kubectl get configmaps prometheus-alert-server -o yaml -n prometheus-alert2 > /tmp/tempConfig.yaml
- Check whether the CNCC alert file name is present in the Prometheus config map and add it:
- If alertscncc is present, delete the alertscncc entry from the tempConfig.yaml file by running the following command:
sed -i '/etc\/config\/alertscncc/d' /tmp/tempConfig.yaml
- If alertscncc is not present, add the alertscncc entry to the tempConfig.yaml file by running the following command:
sed -i '/rule_files:/a\ \- /etc/config/alertscncc' /tmp/tempConfig.yaml
- Run the following command to update the config map with the updated file name of CNCC:
kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
- Run the following commands to add cnccAlertsRules in the config map under the file name of the CNCC alert file:
- For single cluster and manager only or manager plus agent namespace in multi cluster deployment:
kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/occncc_manager_alertrules_<version>.yaml)"
- For agent only namespace in multi cluster deployment:
kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/occncc_agent_alertrules_<version>.yaml)"
- Restart the prometheus-server pod.
- Verify the alerts in the Prometheus GUI.
Note:
The Prometheus server automatically reloads the updated config map after some time (approximately 20 seconds).
$ kubectl get configmaps occne-prometheus-server -o yaml -n occne-infra > /tmp/tempConfig.yaml
$ sed -i '/etc\/config\/alertscncc/d' /tmp/tempConfig.yaml
$ sed -i '/rule_files:/a\ \- /etc/config/alertscncc' /tmp/tempConfig.yaml
$ kubectl replace configmap occne-prometheus-server -f /tmp/tempConfig.yaml
$ kubectl patch configmap occne-prometheus-server -n occne-infra --type merge --patch "$(cat ~/occncc_alertrules_<version>.yaml)"
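The intent of the two sed commands above (drop any existing alertscncc entry, then add exactly one entry under rule_files:) can be sketched in Python as follows. This is an illustrative sketch, not part of the CNC Console tooling: the function name `ensure_rule_file` is hypothetical, and reusing the indentation of the rule_files: line for the new entry is an assumption about the layout of the config map.

```python
def ensure_rule_file(config_text: str, entry: str = "/etc/config/alertscncc") -> str:
    """Return config text with exactly one `entry` line right after each rule_files: line."""
    # Step 1: delete every line that mentions the entry (like the sed 'd' command).
    lines = [ln for ln in config_text.splitlines() if entry not in ln]
    out = []
    for ln in lines:
        out.append(ln)
        # Step 2: append the entry after any line containing rule_files:.
        if "rule_files:" in ln:
            indent = ln[: len(ln) - len(ln.lstrip())]  # assumed indentation
            out.append(f"{indent}- {entry}")
    return "\n".join(out) + "\n"

sample = "rule_files:\n- /etc/config/rules\n- /etc/config/alertscncc\n"
updated = ensure_rule_file(sample)
```

Because the delete step always runs first, applying the function twice gives the same result as applying it once, which is why the example sequence above can be rerun safely.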
Note:
The Prometheus server automatically reloads the updated configuration map after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the CNCC Alerts have been reloaded.
Applying Alerts Rule to CNE with Prometheus HA Operator:
This section describes the measurement-based alert rules configuration for CNCC in Prometheus. Use the updated occncc_agent_alerting_rules_promha_<version>.yaml file for agent only deployments and the occncc_manager_alerting_rules_promha_<version>.yaml file for manager only as well as manager plus agent deployments.
Note:
The default namespace configured in occncc_manager_alerting_rules_promha_<version>.yaml and occncc_agent_alerting_rules_promha_<version>.yaml is "cncc-ns". The user must update the namespace as per the deployment.
$ kubectl apply -f <file_name> -n <cncc namespace>
Where <file_name> is the CNCC alerts file and <cncc namespace> is the CNCC namespace.
Example:
$ kubectl apply -f occncc_manager_alerting_rules_promha_<version>.yaml -n cncc
The following sample file is delivered with the CNC Console package: occncc_manager_alerting_rules_promha_<version>.yaml
8.4 Validating Alerts
Configure and validate alerts in the Prometheus server. Refer to the CNCC Alert Configuration section for the procedure to configure the alerts.
- Open the Prometheus server from your browser using <IP>:<Port>.
- Navigate to Status and then to Rules.
- Search for CNCC. The list of CNCC alerts is displayed.
Note:
If you are unable to see the alerts, the alert file is not loaded in a format that the Prometheus server accepts. Modify the file and try again.
8.5 Disabling Alerts
This section explains how to disable the alerts in CNCC.
- Edit the occncc_alerting_rules_promha_<version>.yaml file to remove the specific alert.
- Remove the complete content of the specific alert from the occncc_alerting_rules_promha_<version>.yaml file.
cncc_alert_rules_<version>.yaml
For example, to remove the CnccIamTotalIngressTrafficRateAboveMinorThreshold alert, remove the complete content:
## ALERT SAMPLE START##
- alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
  annotations:
    description: 'CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'
    summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'
  expr: sum(rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*cncc-iam_ingressgateway",kubernetes_namespace="cncc"}[2m])) > 0
  labels:
    severity: minor
    oid: "1.3.6.1.4.1.323.5.3.51.1.2.7001"
    namespace: ' {{ $labels.kubernetes_namespace }} '
    podname: ' {{$labels.kubernetes_pod_name}} '
## ALERT SAMPLE END##
- Perform Alert configuration. See CNCC Alert Configuration section for details.
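Removing the complete content of one alert, as described in the steps above, can also be scripted. The following Python sketch is one possible approach and makes several assumptions: the helper name `remove_alert` is hypothetical, each alert is assumed to start with a "- alert: <name>" line, and an alert block is assumed to end at the next non-empty line that is indented no deeper than that "- alert:" line. It works on plain text and does not validate the YAML.

```python
def remove_alert(rules_text: str, alert_name: str) -> str:
    """Drop the block of the named alert from an alert rules file (text-based sketch)."""
    kept = []
    skipping = False
    skip_indent = 0
    for line in rules_text.splitlines():
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if stripped.startswith("- alert:"):
            # A new alert starts: skip it only if it is the one to remove.
            name = stripped[len("- alert:"):].strip()
            skipping = name == alert_name
            skip_indent = indent
        elif skipping and stripped and indent <= skip_indent:
            # A line at the same or shallower indentation ends the skipped block.
            skipping = False
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"

# Hypothetical two-alert rules file used only to exercise the helper.
rules = (
    "groups:\n"
    "- name: demo\n"
    "  rules:\n"
    "  - alert: AlertA\n"
    "    expr: up == 0\n"
    "    labels:\n"
    "      severity: minor\n"
    "  - alert: AlertB\n"
    "    expr: up == 1\n"
)
trimmed = remove_alert(rules, "AlertA")
```

After editing the file this way, the alert configuration still has to be reapplied as described in the CNCC Alert Configuration section.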
8.6 Configuring SNMP-Notifier
Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:
- Run the following command to edit the deployment:
kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
Example:
$ kubectl edit deploy occne-snmp-notifier -n occne-infra
- Edit the destination as follows:
--snmp.destination=<destination_ip>:<destination_port>
Example:
--snmp.destination=10.75.203.94:162
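Before editing the deployment, it can help to check that the destination value is well formed. The Python sketch below builds the `--snmp.destination` argument shown above from an IPv4 address and a port. The helper name `snmp_destination_flag` is hypothetical, and restricting the input to IPv4 addresses and ports 1 to 65535 is an assumption made for this example.

```python
import ipaddress

def snmp_destination_flag(ip: str, port: int) -> str:
    """Build the --snmp.destination argument after basic validation."""
    # IPv4Address raises a ValueError subclass when the address is malformed.
    ipaddress.IPv4Address(ip)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"--snmp.destination={ip}:{port}"

flag = snmp_destination_flag("10.75.203.94", 162)
```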
8.7 CNC Console MIB Files
There are two MIB files that are used to generate the traps. The user needs to update these files along with the alert file in order to fetch the traps in their environment.
- occncc_mib_tc_<version>.mib: This file is considered the CNCC top-level MIB file, where the objects and their data types are defined.
- occncc_mib_<version>.mib: This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.
8.8 CNC Console Alerts on OCI
This section provides information about CNC Console Alerts on OCI:
8.8.1 Configure CNC Console Alerts on OCI
Note:
You must ensure that the user has the required roles and permissions to view or modify the alerts. For more information, see the Creating OCI User Management section in the Oracle Communications Cloud Native Core OCI Adaptor, NF Deployment on OCI Guide.
- The CNC Console CSAR package occncc_csar_<version>.zip includes alarm files specific to OCI deployment. These files are zipped as occncc_oci_alertrules_<version>.zip and placed in the Scripts directory of the CSAR package.
- Unzip the file to get /occncc_oci_alertrules_<version>/occncc_oci and /occncc_oci_alertrules_<version>/occncc_oci_resources. These are the Terraform folders to be applied in the OCI console to create CNC Console related alarms.
- Review the /occncc_oci_alertrules_<version>/occncc_oci/alarms.tf and /occncc_oci_alertrules_<version>/occncc_oci_resources/alarms.tf files. If any parameter needs to be changed from its default value, edit it before configuring the alerts.
- In /occncc_oci_alertrules_<version>/occncc_oci/alarms.tf, k8Namespace is configured as the Kubernetes namespace in which CNC Console is deployed. The default value is cncc. Update the /occncc_oci/alarms.tf file to reflect the correct CNC Console Kubernetes namespace.
- In /occncc_oci_alertrules_<version>/occncc_oci_resources/alarms.tf, namespace is configured as the Kubernetes namespace in which CNC Console is deployed. The default value is cncc. Update the /occncc_oci_resources/alarms.tf file to reflect the correct CNC Console Kubernetes namespace.
- In /occncc_oci_alertrules_<version>/occncc_oci/notifications.tf and /occncc_oci_alertrules_<version>/occncc_oci_resources/notifications.tf, update the subscription email: set the endpoint parameter of the resource "email_subscription_1" to the email address to which alarm trigger notifications are sent.
Example:
resource "oci_ons_notification_topic" "notification_topic" {
  compartment_id = var.compartment_id
  name           = var.topic_name
  description    = "Send an email to the subscribed stakeholders"
}
resource "oci_ons_subscription" "email_subscription_1" {
  compartment_id = var.compartment_id
  endpoint       = "xxx.oracle.com"
  protocol       = "EMAIL"
  topic_id       = oci_ons_notification_topic.notification_topic.id
}
To configure two or more email subscriptions, add new entries. For example:
resource "oci_ons_notification_topic" "notification_topic" {
  compartment_id = var.compartment_id
  name           = var.topic_name
  description    = "Send an email to the subscribed stakeholders"
}
resource "oci_ons_subscription" "email_subscription_1" {
  compartment_id = var.compartment_id
  endpoint       = "xxx.oracle.com"
  protocol       = "EMAIL"
  topic_id       = oci_ons_notification_topic.notification_topic.id
}
resource "oci_ons_subscription" "email_subscription_2" {
  compartment_id = var.compartment_id
  endpoint       = "yyy.oracle.com"
  protocol       = "EMAIL"
  topic_id       = oci_ons_notification_topic.notification_topic.id
}
- Once the alarms.tf and notifications.tf files are updated, the /occncc_oci and /occncc_oci_resources Terraform folders can be applied in the OCI console to create alarms.
- Log in to the OCI Console GUI and search for "Stacks", or click the Hamburger menu, click Developer Services, and then click Stacks in the Resource Manager section. A page opens containing a table of existing stacks, if any, along with the Create stack button.
- To apply the updated Terraform folders /occncc_oci and /occncc_oci_resources, click the Create stack button. This opens the Create stack configuration page, which includes three steps:
- Stack information: Set the origin of the Terraform configuration to "My configuration" if it is not already selected. In Stack Configuration, select "Folder" as the Terraform configuration source, then browse and upload the updated Terraform folders (one at a time: /occncc_oci or /occncc_oci_resources). Provide a Name and Description (optional), select in the Create in compartment parameter the compartment in which the Terraform is to be applied and the alarms created, and click Next.
- Configure variables:
- Compartment Name: Choose the compartment in which the metric namespace is present (the namespace created for metrics as part of the adapters installation).
- Metric namespace: Enter the namespace in which your metrics will be populated.
Note 1: For /occncc_oci_alertrules_<version>/occncc_oci, enter the metric namespace created for metrics as part of the adapters installation.
Note 2: For /occncc_oci_alertrules_<version>/occncc_oci_resources, the metric namespace is prepopulated.
- Topic Name: Enter the name of the topic to be created.
Note:
- The topic name must contain fewer than 256 characters. Only alphanumeric characters plus hyphens (-) and underscores (_) are allowed.
- The topic name must be different for the two Terraform folders /occncc_oci_alertrules_<version>/occncc_oci and /occncc_oci_alertrules_<version>/occncc_oci_resources when applied on the OCI console.
- Message Format: Select the format in which the notification will be sent.
- Click Next.
- Review: Verify your configuration variables and then click Create.
- A stack is created when you click Create. Click the Plan button and wait for the Plan job to succeed.
- Once the Plan job has succeeded, click the Apply button. In the right pane that opens, select the previously run Plan job under "Apply job plan resolution" and then click Apply.
- Wait for the Apply job to succeed. Once the Apply job has succeeded, the alerts are created.
- Once the Plan and Apply jobs have completed successfully, check for the email notification sent to confirm the topic subscription. Click the "Confirm subscription" link to confirm.
- To check the created alarms, click the Hamburger menu, click "Observability and Management", then click "Alarm Definitions" in the Monitoring section to view all the alarms.
For more details, see Managing alarms in Oracle Cloud Infrastructure Documentation
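The topic name constraints stated in the Configure variables step can be checked before creating the stacks. The following Python sketch encodes those two notes. The function names are hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption; the OCI console remains the authority on what it accepts.

```python
import re

# Letters, digits, hyphens and underscores only (ASCII assumed for this sketch).
_TOPIC_NAME = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_topic_name(name: str) -> bool:
    """Check the 'fewer than 256 characters' and allowed-character rules."""
    return len(name) < 256 and _TOPIC_NAME.fullmatch(name) is not None

def topic_names_ok(name_oci: str, name_oci_resources: str) -> bool:
    """Both names must be valid and must differ between the two Terraform folders."""
    return (
        is_valid_topic_name(name_oci)
        and is_valid_topic_name(name_oci_resources)
        and name_oci != name_oci_resources
    )

ok = topic_names_ok("cncc-alarms", "cncc-alarms_resources")
```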
Delete Created Alarms and Stack (terraforms applied for Alarms creation)
To delete created alarms:
- Log in to the OCI Console GUI, click the Hamburger menu, click "Observability and Management", then click "Alarm Definitions" under the Monitoring section. Make sure the right compartment is selected in the left pane under "List scope". Select the alarms to be deleted, click "Actions", and then click "Delete alarms".
- Once all the alarms are deleted, search for "Stacks" in the OCI console search bar, or click the Hamburger menu, click Developer Services, and then click Stacks in the Resource Manager section. A page opens containing a table of existing stacks.
- Click the name of the stack that was applied to create the alarms and then click "Destroy".
- Wait for the "Destroy" job to complete, return to Resource Manager → Stacks, click more options (three dots) for the stack on which the destroy job was run, and click "Delete" to delete the stack.
8.8.2 CnccCoreTotalIngressTrafficRateAboveMinorThreshold
Table 8-32 CnccCoreTotalIngressTrafficRateAboveMinorThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured minor threshold of 700 TPS. Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 70 % of 1000 (Maximum ingress request rate) |
Severity | minor |
Alert details provided | CNCC Core Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution | The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the CnccCoreTotalIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the alerts.yaml file. Steps: Reassess why the CNCC Core is receiving additional traffic. If this is unexpected, contact My Oracle Support. |
8.8.3 CnccCoreTotalIngressTrafficRateAboveMajorThreshold
Table 8-33 CnccCoreTotalIngressTrafficRateAboveMajorThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured major threshold of 800 TPS. Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 80 % of 1000 (Maximum ingress request rate) |
Severity | major |
Alert details provided | CNCC Core Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution | The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccCoreTotalIngressTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the alerts.yaml file. Steps: Reassess why the CNCC Core is receiving additional traffic. If this is unexpected, contact My Oracle Support. |
8.8.4 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold
Table 8-34 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured critical threshold of 900 TPS. Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 90% of 1000 (Maximum ingress request rate) |
Severity | critical |
Alert details provided | CNCC Core Ingress traffic Rate is above the configured critical threshold i.e. 900 requests per second |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution | The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the alerts.yaml file. Steps: Reassess why the CNCC Core is receiving additional traffic. If this is unexpected, contact My Oracle Support. |
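The three ingress traffic alerts above form a ladder: 70%, 80%, and 90% of the maximum ingress request rate of 1000, which gives 700, 800, and 900 TPS. The Python sketch below shows how a measured rate maps to at most one severity, which is also why a lower alert clears when the next threshold is crossed. The function and constant names are hypothetical; the real evaluation is done by the alarm definitions, not by this code.

```python
MAX_INGRESS_TPS = 1000  # maximum ingress request rate stated in the tables above

# (severity, percent of maximum), checked from the highest threshold down.
TRAFFIC_THRESHOLDS = [("critical", 90), ("major", 80), ("minor", 70)]

def traffic_severity(tps, max_tps=MAX_INGRESS_TPS):
    """Return the severity whose threshold the rate has crossed, or None."""
    for severity, percent in TRAFFIC_THRESHOLDS:
        # Integer arithmetic: tps > percent% of max_tps.
        if tps * 100 > percent * max_tps:
            return severity
    return None

levels = [traffic_severity(tps) for tps in (650, 750, 850, 950)]
```

Checking the highest threshold first ensures that only one of the three alerts is reported for a given rate.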
8.8.5 CnccCoreMemoryUsageCrossedMinorThreshold
Table 8-35 CnccCoreMemoryUsageCrossedMinorThreshold
Trigger Condition | A pod has reached the configured minor threshold( 70%) of its memory resource limits. |
Severity | minor |
Alert details provided | CNCC Core Memory Usage for pod has crossed the configured minor threshold (70%) of its limit. |
Metric Used |
container_memory_usage_bytes Note: This is a Kubernetes metric used for monitoring container memory usage. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert is cleared when the memory utilization falls below the Minor threshold or crosses the Major threshold, in which case the CnccCoreMemoryUsageCrossedMajorThreshold alert is raised. Note: The threshold is configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
8.8.6 CnccCoreMemoryUsageCrossedMajorThreshold
Table 8-36 CnccCoreMemoryUsageCrossedMajorThreshold
Trigger Condition | A pod has reached the configured major threshold( 80%) of its memory resource limits. |
Severity | major |
Alert details provided | CNCC Core Memory Usage for pod has crossed the configured major threshold (80%) of its limit. |
Metric Used |
container_memory_usage_bytes Note: This is a Kubernetes metric used for monitoring container memory usage. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert is cleared when the memory utilization falls below the Major threshold or crosses the Critical threshold, in which case the CnccCoreMemoryUsageCrossedCriticalThreshold alert is raised. Note: The threshold is configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
8.8.7 CnccCoreMemoryUsageCrossedCriticalThreshold
Table 8-37 CnccCoreMemoryUsageCrossedCriticalThreshold
Trigger Condition | A pod has reached the configured critical threshold ( 90% ) of its memory resource limits |
Severity | critical |
Alert details provided | CNCC Core Memory Usage for pod has crossed the configured critical threshold (90%) of its limit. |
Metric Used |
container_memory_usage_bytes Note: This is a Kubernetes metric used for monitoring container memory usage. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert is cleared when the memory utilization falls below the Critical threshold. Note: The threshold is configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
8.8.8 CnccCoreTransactionErrorRateAbovePointOnePercent
Table 8-38 CnccCoreTransactionErrorRateAbovePointOnePercent
Trigger Condition | The number of failed transactions is above 0.1 percent of the total transactions. |
Severity | warning |
Alert details provided | CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 0.1% of the total transactions, or when it crosses the 1% threshold, in which case the CnccCoreTransactionErrorRateAboveOnePercent alert is raised. Steps: 1. Check the service specific metrics to understand the specific service request errors. 2. If guidance is required, contact My Oracle Support. |
8.8.9 CnccCoreTransactionErrorRateAboveOnePercent
Table 8-39 CnccCoreTransactionErrorRateAboveOnePercent
Trigger Condition | The number of failed transactions is above 1 percent of the total transactions. |
Severity | warning |
Alert details provided | CNCC Core transaction Error rate is above 1 Percent of Total Transactions |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 1% of the total transactions, or when it crosses the 10% threshold, in which case the CnccCoreTransactionErrorRateAboveTenPercent alert is raised. Steps: 1. Check the service specific metrics to understand the specific service request errors. 2. If guidance is required, contact My Oracle Support. |
8.8.10 CnccCoreTransactionErrorRateAboveTenPercent
Table 8-40 CnccCoreTransactionErrorRateAboveTenPercent
Trigger Condition | The number of failed transactions is above 10 percent of the total transactions. |
Severity | minor |
Alert details provided | CNCC Core transaction Error rate is above 10 Percent of Total Transactions |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 10% of the total transactions, or when it crosses the 25% threshold, in which case the CnccCoreTransactionErrorRateAboveTwentyFivePercent alert is raised. Steps: 1. Check the service specific metrics to understand the specific service request errors. 2. If guidance is required, contact My Oracle Support. |
8.8.11 CnccCoreTransactionErrorRateAboveTwentyFivePercent
Table 8-41 CnccCoreTransactionErrorRateAboveTwentyFivePercent
Trigger Condition | The number of failed transactions is above 25 percent of the total transactions. |
Severity | major |
Alert details provided | CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 25% of the total transactions, or when it crosses the 50% threshold, in which case the CnccCoreTransactionErrorRateAboveFiftyPercent alert is raised. Steps: 1. Check the service specific metrics to understand the specific service request errors. 2. If guidance is required, contact My Oracle Support. |
8.8.12 CnccCoreTransactionErrorRateAboveFiftyPercent
Table 8-42 CnccCoreTransactionErrorRateAboveFiftyPercent
Trigger Condition | The number of failed transactions is above 50 percent of the total transactions. |
Severity | critical |
Alert details provided | CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. Steps: 1. Check the service specific metrics to understand the specific service request errors. 2. If guidance is required, contact My Oracle Support. |
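The five transaction error rate alerts in sections 8.8.8 to 8.8.12 escalate through the thresholds 0.1%, 1%, 10%, 25%, and 50%. As a minimal sketch of that ladder, the Python function below maps failed and total transaction counts to the alert name and severity taken from the tables above. The function name `error_rate_alert` and the use of raw counts as input are assumptions for illustration; the deployed alarms evaluate the oc_ingressgateway_http_responses_total metric instead.

```python
# (threshold in percent, alert name, severity), checked from the highest down.
ERROR_RATE_TIERS = [
    (50, "CnccCoreTransactionErrorRateAboveFiftyPercent", "critical"),
    (25, "CnccCoreTransactionErrorRateAboveTwentyFivePercent", "major"),
    (10, "CnccCoreTransactionErrorRateAboveTenPercent", "minor"),
    (1, "CnccCoreTransactionErrorRateAboveOnePercent", "warning"),
    (0.1, "CnccCoreTransactionErrorRateAbovePointOnePercent", "warning"),
]

def error_rate_alert(failed: int, total: int):
    """Return (alert name, severity) for the highest threshold exceeded, or None."""
    if total == 0:
        return None  # no traffic, so no error rate can be computed
    rate_percent = 100.0 * failed / total
    for threshold, name, severity in ERROR_RATE_TIERS:
        if rate_percent > threshold:
            return name, severity
    return None

result = error_rate_alert(120, 1000)  # 12% of transactions failed
```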
8.8.13 CnccCoreUnauthorizedAccess
Table 8-43 CnccCoreUnauthorizedAccess
Trigger Condition | If the count of unauthorized access goes above the configured value of '3' |
Severity | warning |
Alert details provided | Unauthorized Accesses have been detected in CNCC-Core for request. The configured threshold value is 3 for every 5 min |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the total unauthorized accesses for a particular user go below the threshold value (default value is '3') in the last 5 minutes (default value is 5m). Note: The threshold and time are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |