8 CNC Console Alerts
This section provides information about CNC Console Alerts.
Note:
Alert file is packaged with CNCC custom templates. The occncc_custom_configtemplates_<version>.zip file can be downloaded from MOS. Unzip the file to get occncc_alerting_rules_promha_<version>.yaml file.
- Review the occncc_alerting_rules_promha_<version>.yaml file and, if any parameter needs to differ from its default value, edit it in that file before configuring the alerts.
- kubernetes_namespace is configured as kubernetes namespace in which CNCC is deployed. Default value is cncc. Update the occncc_alertrules_<version>.yaml file to reflect the correct CNCC kubernetes namespace.
Two sample alert files are provided: one supporting CNE 1.8 or lower, and one supporting CNE Prometheus HA.
- CNCC Alert Rules file: occncc_alertrules_<version>.yaml file.
- CNCC Alert Rules file supporting CNE Prometheus HA: occncc_alerting_rules_promha_<version>.yaml file.
CNC Console IAM Alerts
This section provides information about CNC Console IAM Alerts.
CnccIamTotalIngressTrafficRateAboveMinorThreshold
Table 8-1 CnccIamTotalIngressTrafficRateAboveMinorThreshold
Trigger Condition |
The total CNCC IAM Ingress Message rate has crossed the configured minor threshold of 700 TPS. By default, the trigger point in occncc_alertrules_<version>.yaml is reached when the CNCC IAM Ingress Rate crosses 70% of 1000 (the maximum ingress request rate). |
Severity | Minor |
Alert details provided |
Description: CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }}) For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the CnccIamTotalIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
CnccIamTotalIngressTrafficRateAboveMajorThreshold
Table 8-2 CnccIamTotalIngressTrafficRateAboveMajorThreshold
Trigger Condition |
The total CNCC IAM Ingress Message rate has crossed the configured major threshold of 800 TPS. By default, the trigger point in occncc_alertrules_<version>.yaml is reached when the CNCC IAM Ingress Rate crosses 80% of 1000 (the maximum ingress request rate). |
Severity | Major |
Alert details provided |
Description: 'CNCC IAM Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second (current value is: {{ $value }})' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccIamTotalIngressTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
CnccIamTotalIngressTrafficRateAboveCriticalThreshold
Table 8-3 CnccIamTotalIngressTrafficRateAboveCriticalThreshold
Trigger Condition |
The total CNCC IAM Ingress Message rate has crossed the configured critical threshold of 900 TPS. By default, the trigger point in occncc_alertrules_<version>.yaml is reached when the CNCC IAM Ingress Rate crosses 90% of 1000 (the maximum ingress request rate). |
Severity | Critical |
Alert details provided |
Description: CNCC IAM Ingress traffic Rate is above the configured critical threshold, that is, 900 requests per second (current value is: {{ $value }}) For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
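The three ingress traffic alerts above form a ladder over the same rate. The following is a minimal Python sketch of that ladder, using the default thresholds of 70%, 80%, and 90% of the maximum rate of 1000 requests per second. The function and constant names are hypothetical and are not taken from the alert rules file. Treating "crosses" as strictly greater than the threshold is an assumption.

```python
# Hypothetical helper: names are illustrative, not taken from the CNCC alert rules file.
MAX_INGRESS_RATE = 1000  # maximum ingress request rate in requests per second

# Default thresholds as a percentage of the maximum rate, highest severity first.
THRESHOLDS = [("Critical", 90), ("Major", 80), ("Minor", 70)]


def traffic_severity(rate_per_second, max_rate=MAX_INGRESS_RATE):
    """Return the highest severity whose threshold the rate exceeds, or None."""
    for severity, percent in THRESHOLDS:
        # Assumption: the alert fires only when the rate is strictly above the threshold.
        if rate_per_second > max_rate * percent / 100:
            return severity
    return None
```

Because the list is checked from the highest threshold down, only one severity is returned at a time, which mirrors how a lower alert clears when the next higher one is raised.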
CnccIamMemoryUsageCrossedMinorThreshold
Table 8-4 CnccIamMemoryUsageCrossedMinorThreshold
Trigger Condition | A pod has reached the configured minor threshold (70%) of its memory resource limits. |
Severity | Minor |
Alert details provided |
Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
Metric Used |
container_memory_usage_bytes, container_spec_memory_limit_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case CnccIamMemoryUsageCrossedMajorThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
CnccIamMemoryUsageCrossedMajorThreshold
Table 8-5 CnccIamMemoryUsageCrossedMajorThreshold
Trigger Condition | A pod has reached the configured major threshold (80%) of its memory resource limits. |
Severity | Major |
Alert details provided |
Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value = {{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
Metric Used |
container_memory_usage_bytes, container_spec_memory_limit_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution | The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case CnccIamMemoryUsageCrossedCriticalThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
CnccIamMemoryUsageCrossedCriticalThreshold
Table 8-6 CnccIamMemoryUsageCrossedCriticalThreshold
Trigger Condition | A pod has reached the configured critical threshold (90%) of its memory resource limits. |
Severity | Critical |
Alert details provided |
Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value = {{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7002 |
Metric Used |
container_memory_usage_bytes, container_spec_memory_limit_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution | The alert gets cleared when the memory utilization falls below the Critical Threshold. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
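The memory alerts compare a pod's memory usage against its limit. A minimal Python sketch of that comparison follows, assuming the two inputs correspond to the values of container_memory_usage_bytes and container_spec_memory_limit_bytes for the same pod. The function names are hypothetical. Treating a limit of zero as "no limit configured" and using a strict comparison are assumptions.

```python
def memory_usage_percent(usage_bytes, limit_bytes):
    """Return memory usage as a percentage of the limit, or None if no limit is set."""
    # Assumption: a limit of 0 (or less) means the container has no memory limit.
    if limit_bytes <= 0:
        return None
    return usage_bytes / limit_bytes * 100


def memory_severity(usage_bytes, limit_bytes):
    """Map usage against the default 70/80/90 percent thresholds to a severity."""
    percent = memory_usage_percent(usage_bytes, limit_bytes)
    if percent is None:
        return None
    if percent > 90:
        return "Critical"
    if percent > 80:
        return "Major"
    if percent > 70:
        return "Minor"
    return None
```

For example, a pod using 850 MiB of a 1000 MiB limit is at 85% and maps to the Major severity.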
CnccIamTransactionErrorRateAbove0.1Percent
Table 8-7 CnccIamTransactionErrorRateAbove0.1Percent
Trigger Condition | The number of failed transactions is above 0.1 percent of the total transactions. |
Severity | Warning |
Alert details provided | Description: 'CNCC IAM transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC IAM transaction Error Rate detected above 0.1 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the CnccIamTransactionErrorRateAbove1Percent alert is raised. Steps:
|
CnccIamTransactionErrorRateAbove1Percent
Table 8-8 CnccIamTransactionErrorRateAbove1Percent
Trigger Condition | The number of failed transactions is above 1 percent of the total transactions. |
Severity | Warning |
Alert details provided | Description: 'CNCC IAM transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC IAM transaction Error Rate detected above 1 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the CnccIamTransactionErrorRateAbove10Percent alert is raised. Steps:
|
CnccIamTransactionErrorRateAbove10Percent
Table 8-9 CnccIamTransactionErrorRateAbove10Percent
Trigger Condition | The number of failed transactions is above 10 percent of the total transactions. |
Severity | Minor |
Alert details provided |
Description: 'CNCC IAM transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC IAM transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution | The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the CnccIamTransactionErrorRateAbove25Percent alert is raised. Steps:
|
CnccIamTransactionErrorRateAbove25Percent
Table 8-10 CnccIamTransactionErrorRateAbove25Percent
Trigger Condition |
The number of failed transactions is above 25 percent of the total transactions. |
Severity | Major |
Alert details provided |
Description: 'CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the CnccIamTransactionErrorRateAbove50Percent alert is raised. Steps:
|
CnccIamTransactionErrorRateAbove50Percent
Table 8-11 CnccIamTransactionErrorRateAbove50Percent
Trigger Condition |
The number of failed transactions is above 50 percent of the total transactions. |
Severity | Critical |
Alert details provided |
Description: The number of failed transactions is above 50 percent of the total transactions. Summary: 'CNCC IAM transaction Error Rate detected above 50 Percent of Total Transactions'. |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
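The five transaction error rate alerts above share one ratio, failed transactions over total transactions, checked against increasing thresholds. The following Python sketch shows that tiering with the severities listed in the tables. How a transaction is counted as failed is not stated in this section, so the sketch takes the failed and total counts as inputs. The function name is hypothetical, and the strict comparison is an assumption.

```python
# (threshold percent, label used in the alert name, severity), highest tier first.
ERROR_RATE_TIERS = [
    (50, "50", "Critical"),
    (25, "25", "Major"),
    (10, "10", "Minor"),
    (1, "1", "Warning"),
    (0.1, "0.1", "Warning"),
]


def error_rate_alert(failed, total, prefix="CnccIam"):
    """Return (alert name, severity) for the highest tier exceeded, or None."""
    if total == 0:
        return None  # no traffic, so no error rate can be computed
    percent = failed / total * 100
    for threshold, label, severity in ERROR_RATE_TIERS:
        # Assumption: the alert fires when the rate is strictly above the threshold.
        if percent > threshold:
            return (prefix + "TransactionErrorRateAbove" + label + "Percent", severity)
    return None
```

Passing prefix="CnccCore" yields the corresponding CNC Console Core alert names, which use the same thresholds.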
CnccIamIngressGatewayServiceDown
Table 8-12 CnccIamIngressGatewayServiceDown
Trigger Condition |
The pods of the CNCC IAM Ingress Gateway microservice are unavailable. |
Severity | Critical |
Alert details provided |
Description: 'CNCC IAM Ingress-Gateway service InstanceIdentifier=~".*cncc-iam_ingressgateway" is down' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7004 |
Metric Used |
'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert is cleared when the cncc-iam_ingressgateway service is available. Steps:
|
CnccIamFailedLogin
Table 8-13 CnccIamFailedLogin
Trigger Condition |
The count of failed login attempts in CNCC-IAM by a user goes above '3' |
Severity | Warning |
Alert details provided |
Description:'{{ $value }} failed Login attempts have been detected in CNCC IAM for user {{$labels.UserName}}, the configured threshold value is 3 failed login attempts for every 5 min' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7005 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the total number of failed login attempts for a particular user in the last 5 minutes (the default window) falls below the threshold value (default 3). Note: The threshold and the time window are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
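The failed login alert is a count over a time window: more than 3 failures for one user within 5 minutes. The following Python sketch illustrates that windowed count in isolation. It is not how Prometheus evaluates the rule, and the class and method names are hypothetical. Timestamps are plain seconds.

```python
from collections import deque


class FailedLoginWindow:
    """Track failed logins per user over a sliding time window."""

    def __init__(self, threshold=3, window_seconds=300):
        self.threshold = threshold            # default 3 failed attempts
        self.window_seconds = window_seconds  # default 5 minutes
        self._events = {}                     # user name -> deque of failure timestamps

    def record_failure(self, user, timestamp):
        """Store the time of one failed login for the given user."""
        self._events.setdefault(user, deque()).append(timestamp)

    def is_alerting(self, user, now):
        """Return True if failures inside the window exceed the threshold."""
        events = self._events.get(user, deque())
        # Drop failures that have aged out of the window.
        while events and events[0] <= now - self.window_seconds:
            events.popleft()
        return len(events) > self.threshold
```

With the defaults, a fourth failure inside 5 minutes makes is_alerting return True, and it returns False again once enough failures age out of the window.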
AdminUserCreation
Table 8-14 AdminUserCreation
Trigger Condition |
A new admin account has been created in the last 5 minutes. |
Severity | Warning |
Alert details provided |
For CNE 1.9.0 or later versions: Description: '{{ $value }} admin users have been created by {{$labels.UserName}} ' summary: 'namespace: {{$labels.namespace}} summary: {{$labels.pod}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created ' For CNE 1.8.x or previous versions: Description: '{{ $value }} admin users have been created by {{$labels.UserName}} ' summary: 'namespace: {{$labels.kubernetes_namespace}} summary: {{$labels.kubernetes_pod_name}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created ' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7006 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert gets cleared when the total failed login attempts for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m). Note: The threshold and the time window are configurable in the occncc_alertrules_<version>.yaml file. Log in to the admin GUI and review the user that was created. If guidance is required, contact My Oracle Support. |
CnccIamAccessTokenFailure
Table 8-15 CnccIamAccessTokenFailure
Trigger Condition |
The count of failed tokens for CNCC-IAM goes above the configured value of '3'. |
Severity | Warning |
Alert details provided |
Description: 'CNCC Iam Access Token Failure count is above the configured value i.e. 3 for every 5 min. Failed access token request count per second is (current value is: {{ $value }})' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.7007 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert gets cleared when the total failed tokens for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m). Note: The threshold and the time window are configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
CNC Console Core Alerts
This section provides information about CNC Console Core Alerts.
CnccCoreTotalIngressTrafficRateAboveMinorThreshold
Table 8-16 CnccCoreTotalIngressTrafficRateAboveMinorThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured minor threshold of 700 TPS. Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 70 % of 1000 (Maximum ingress request rate) |
Severity | Minor |
Alert details provided |
Description: 'CNCC Core Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared either when the total Ingress traffic rate falls below the minor threshold or when the total traffic rate crosses the major threshold, in which case the CnccCoreTotalIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
CnccCoreTotalIngressTrafficRateAboveMajorThreshold
Table 8-17 CnccCoreTotalIngressTrafficRateAboveMajorThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured major threshold of 800 TPS. Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 80 % of 1000 (Maximum ingress request rate) |
Severity | Major |
Alert details provided |
Description: 'CNCC Core Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second (current value is: {{ $value }})' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution | The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccCoreTotalIngressTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the alerts.yaml file. Steps:
|
CnccCoreTotalIngressTrafficRateAboveCriticalThreshold
Table 8-18 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold
Trigger Condition |
The total CNCC Core Ingress Message rate has crossed the configured critical threshold of 900 TPS. Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 90 % of 1000 (Maximum ingress request rate) |
Severity | Critical |
Alert details provided |
Description: 'CNCC Core Ingress traffic Rate is above the configured critical threshold i.e. 900 requests per second (current value is: {{ $value }})' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8001 |
Metric Used | oc_ingressgateway_http_requests_total |
Resolution |
The alert is cleared when the Ingress Traffic rate falls below the Critical threshold. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. Steps:
|
CnccCoreMemoryUsageCrossedMinorThreshold
Table 8-19 CnccCoreMemoryUsageCrossedMinorThreshold
Trigger Condition |
A pod has reached the configured minor threshold (70%) of its memory resource limits. |
Severity | Minor |
Alert details provided |
Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
Metric Used |
container_memory_usage_bytes, container_spec_memory_limit_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case CnccCoreMemoryUsageCrossedMajorThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
CnccCoreMemoryUsageCrossedMajorThreshold
Table 8-20 CnccCoreMemoryUsageCrossedMajorThreshold
Trigger Condition |
A pod has reached the configured major threshold (80%) of its memory resource limits. |
Severity | Major |
Alert details provided |
Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value = {{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
Metric Used |
container_memory_usage_bytes, container_spec_memory_limit_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case CnccCoreMemoryUsageCrossedCriticalThreshold alert is raised. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
CnccCoreMemoryUsageCrossedCriticalThreshold
Table 8-21 CnccCoreMemoryUsageCrossedCriticalThreshold
Trigger Condition |
A pod has reached the configured critical threshold (90%) of its memory resource limits. |
Severity | Critical |
Alert details provided |
Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value = {{ $value }}) of its limit.' Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8002 |
Metric Used |
container_memory_usage_bytes, container_spec_memory_limit_bytes Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert gets cleared when the memory utilization falls below the Critical Threshold. Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file. If guidance is required, contact My Oracle Support. |
CnccCoreTransactionErrorRateAbove0.1Percent
Table 8-22 CnccCoreTransactionErrorRateAbove0.1Percent
Trigger Condition |
The number of failed transactions is above 0.1 percent of the total transactions. |
Severity | Warning |
Alert details provided |
Description: 'CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the CnccCoreTransactionErrorRateAbove1Percent alert is raised. The threshold is configurable in the alerts.yaml file. Steps:
|
CnccCoreTransactionErrorRateAbove1Percent
Table 8-23 CnccCoreTransactionErrorRateAbove1Percent
Trigger Condition | The number of failed transactions is above 1 percent of the total transactions. |
Severity | Warning |
Alert details provided |
Description: 'CNCC Core transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error Rate detected above 1 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the CnccCoreTransactionErrorRateAbove10Percent alert is raised. Steps:
|
CnccCoreTransactionErrorRateAbove10Percent
Table 8-24 CnccCoreTransactionErrorRateAbove10Percent
Trigger Condition | The number of failed transactions is above 10 percent of the total transactions. |
Severity | Minor |
Alert details provided |
Description: 'CNCC Core transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' summary: 'CNCC Core transaction Error Rate detected above 10 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the CnccCoreTransactionErrorRateAbove25Percent alert is raised. Steps:
|
CnccCoreTransactionErrorRateAbove25Percent
Table 8-25 CnccCoreTransactionErrorRateAbove25Percent
Trigger Condition | The number of failed transactions is above 25 percent of the total transactions. |
Severity | Major |
Alert details provided |
Description: 'CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the CnccCoreTransactionErrorRateAbove50Percent alert is raised. Steps:
|
CnccCoreTransactionErrorRateAbove50Percent
Table 8-26 CnccCoreTransactionErrorRateAbove50Percent
Trigger Condition | The number of failed transactions is above 50 percent of the total transactions. |
Severity | Critical |
Alert details provided |
Description: 'CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions (current value is {{ $value }})' Summary: 'CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8003 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. Steps:
|
CnccCoreIngressGatewayServiceDown
Table 8-27 CnccCoreIngressGatewayServiceDown
Trigger Condition | The CNCC Core Ingress Gateway service is down. |
Severity | Critical |
Alert details provided |
Description: 'CNCC Core Ingress-Gateway service InstanceIdentifier=~".*core_ingressgateway" is down' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8004 |
Metric Used |
'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system. |
Resolution |
The alert is cleared when the cncc-core_ingressgateway service is available. Steps:
|
CnccCoreFailedLogin
Table 8-28 CnccCoreFailedLogin
Trigger Condition |
The count of failed login attempts in CNCC-Core by a user goes above '3' |
Severity | Warning |
Alert details provided |
Description:'{{ $value }} failed Login attempts have been detected in CNCC Core for user {{$labels.UserName}}, the configured threshold value is 3 failed login attempts for every 5 min' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8005 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the total failed login attempts for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m). Note: The threshold and time are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
CnccCoreUnauthorizedAccess
Table 8-29 CnccCoreUnauthorizedAccess
Trigger Condition |
The count of unauthorized accesses in CNCC-Core goes above '3' |
Severity | Warning |
Alert details provided |
Description:'{{ $value }} Unauthorized Accesses have been detected in CNCC-Core for {{$labels.ResourceType}} for {{$labels.Method}} request. The configured threshold value is 3 for every 5 min' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8006 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the total unauthorized accesses go below the threshold value (default value is '3') in the last 5 min (default value is 5 m). Note: The threshold and time are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
CnccCoreAccessTokenFailure
Table 8-30 CnccCoreAccessTokenFailure
Trigger Condition |
The count of failed access token requests for CNCC-Core goes above the configured value of '3' |
Severity | Warning |
Alert details provided |
Description: 'CNCC Core Access Token Failure count is above the configured value i.e. 3 for every 5 min. Failed access token request count per second is (current value is: {{ $value }})' For CNE 1.8.x or previous versions: summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value' For CNE 1.9.0 or later versions: summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value' |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.51.1.2.8007 |
Metric Used | oc_ingressgateway_http_responses_total |
Resolution |
The alert is cleared when the total failed tokens for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m). Note: The threshold and time are configurable in the alerts.yaml file. If guidance is required, contact My Oracle Support. |
CNCC Alert configuration in Prometheus
CNE 1.8 or previous versions:
This section describes the measurement based Alert rules configuration for CNCC in Prometheus. Use the occncc_alertrules_<version>.yaml file updated in CNCC Alert configuration section.
_NAME_ :- Helm Release of Prometheus
_Namespace_ :- Kubernetes NameSpace in which Prometheus is installed
Configuration
- Run the following command to take a backup of the current config map of the Prometheus server:
kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
where _Namespace_ is the Prometheus server namespace used in the helm install command.
For example, assuming the chart name is "prometheus-alert", "_NAME_-server" becomes "prometheus-alert-server". Run the following command to back up the config map:
kubectl get configmaps prometheus-alert-server -o yaml -n prometheus-alert2 > /tmp/tempConfig.yaml
- Check for the CNCC alert file name inside the Prometheus config map and add it:
  - If alertscncc is present, delete the alertscncc entry from the tempConfig.yaml file by running the following command:
  sed -i '/etc\/config\/alertscncc/d' /tmp/tempConfig.yaml
  - If alertscncc is not present, add the alertscncc entry in the tempConfig.yaml file by running the following command:
  sed -i '/rule_files:/a\ \- /etc/config/alertscncc' /tmp/tempConfig.yaml
- Run the following command to update the config map with the updated file name of CNCC:
kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
- Run the following command to add the CNCC alert rules file (occncc_alertrules_<version>.yaml) into the Prometheus config map under the file name of the CNCC alert file:
kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/occncc_alertrules_<version>.yaml)"
- Restart prometheus-server pod.
- Verify the alerts in prometheus GUI.
Note:
The Prometheus server automatically reloads the updated config map after some time (approximately 20 seconds).
$ kubectl get configmaps occne-prometheus-server -o yaml -n occne-infra > /tmp/tempConfig.yaml
$ sed -i '/etc\/config\/alertscncc/d' /tmp/tempConfig.yaml
$ sed -i '/rule_files:/a\ \- /etc/config/alertscncc' /tmp/tempConfig.yaml
$ kubectl replace configmap occne-prometheus-server -f /tmp/tempConfig.yaml
$ kubectl patch configmap occne-prometheus-server -n occne-infra --type merge --patch "$(cat ~/occncc_alertrules_<version>.yaml)"
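Running the delete command and then the add command keeps the edit repeatable: however often the pair runs, the file ends up with a single alertscncc entry. The following is a minimal sketch of that idea as one helper, not the packaged procedure. It uses awk instead of the sed commands above, and the function name ensure_cncc_rule_entry and the four-space indent of the inserted entry are assumptions for illustration. Match the indent to the rule_files list in your own config map.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CNCC package): make sure the file
# contains exactly one alertscncc entry directly after the rule_files: line.
ensure_cncc_rule_entry() {
  cfg="$1"
  awk '
    /\/etc\/config\/alertscncc/ { next }   # drop any existing entry
    { print }                              # keep every other line
    /rule_files:/ { print "    - /etc/config/alertscncc" }  # assumed indent
  ' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"
}

# Usage: ensure_cncc_rule_entry /tmp/tempConfig.yaml
```

Because any existing entry is dropped before the new one is written, calling the helper twice on the same file leaves it unchanged after the first call.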
Note:
The Prometheus server automatically reloads an updated configuration map after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the CNCC Alerts have been reloaded.
CNE 1.9.0 or later versions:
This section describes the measurement based Alert rules configuration for CNCC in Prometheus. Use an updated occncc_alerting_rules_promha_<version>.yaml file.
The PrometheusRule resource specified in the alert YAML file can be created or updated by using the following command:
$ kubectl apply -f occncc_alerting_rules_promha_<version>.yaml
Validating Alerts
Configure and Validate Alerts in Prometheus Server. Refer to CNCC Alert Configuration section for procedure to configure the alerts.
- Open the Prometheus server from your browser using <IP>:<Port>.
- Navigate to Status, and then to Rules.
- Search for CNCC. The list of CNCC Alerts is displayed.
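The same check can be done from a terminal, because Prometheus also exposes its loaded rules through the HTTP API endpoint /api/v1/rules. The following is a minimal sketch, assuming the CNCC alert names all start with Cncc as they do in this section. The function name list_cncc_alerts is hypothetical, and the pattern matching is a simple text filter rather than a full JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: read a Prometheus /api/v1/rules JSON response on stdin
# and print the unique rule names that start with "Cncc".
list_cncc_alerts() {
  grep -o '"name": *"Cncc[A-Za-z0-9]*"' |
    sed 's/.*"\(Cncc[A-Za-z0-9]*\)"$/\1/' |
    sort -u
}

# Usage (replace <IP> and <Port> with the Prometheus server address):
#   curl -s "http://<IP>:<Port>/api/v1/rules" | list_cncc_alerts
```

An empty result suggests that the CNCC rules were not loaded.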
Note:
If you are unable to see the alerts, it means the alert file is not loaded in a proper format which the Prometheus server accepts. Modify the file and try again.
Disabling Alerts
This section explains how to disable the alerts in CNCC.
- Edit the occncc_alerting_rules_promha_<version>.yaml file to remove the specific alert.
- Remove the complete content of the specific alert from the occncc_alerting_rules_promha_<version>.yaml file.
For example, to remove the CnccIamTotalIngressTrafficRateAboveMinorThreshold alert, remove the complete content:
## ALERT SAMPLE START##
- alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
  annotations:
    description: 'CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'
    summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'
  expr: sum(rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*cncc-iam_ingressgateway",kubernetes_namespace="cncc"}[2m])) > 0
  labels:
    severity: minor
    oid: "1.3.6.1.4.1.323.5.3.51.1.2.7001"
    namespace: ' {{ $labels.kubernetes_namespace }} '
    podname: ' {{$labels.kubernetes_pod_name}} '
## ALERT SAMPLE END##
- Perform Alert configuration. See CNCC Alert Configuration section for details.
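Removing an alert by hand means deleting every line from its "- alert:" entry up to the next rule. The following is a minimal sketch of automating that, assuming each rule starts with a "- alert: <name>" line and that its fields are indented deeper than that line. The function name remove_alert is hypothetical. Comment markers around the rule, such as the ALERT SAMPLE lines, are left in place. Review the output before applying it.

```shell
#!/bin/sh
# Hypothetical helper: print a rules file without the named alert.
# Usage: remove_alert <AlertName> <rules_file> > <new_rules_file>
remove_alert() {
  awk -v name="$1" '
    {
      match($0, /^ */)                 # RLENGTH = number of leading spaces
      indent = RLENGTH
      if ($0 ~ /^ *- alert:/) {
        skip = ($3 == name)            # start skipping only the named alert
        drop_indent = indent
      } else if (skip && NF > 0 && indent <= drop_indent) {
        skip = 0                       # reached content outside the alert
      }
      if (!skip) print
    }
  ' "$2"
}
```

Writing the result to a new file instead of editing in place keeps the original alert file available if the alert needs to be restored.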
Configuring SNMP-Notifier
Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:
- Run the following command to edit the deployment:
kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
Example:
$ kubectl edit deploy occne-snmp-notifier -n occne-infra
- Edit the destination as follows:
--snmp.destination=<destination_ip>:<destination_port>
Example:
--snmp.destination=10.75.203.94:162
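After saving the deployment, it may be worth confirming that the new destination is in place. The following is a minimal sketch, assuming the destination appears as a --snmp.destination argument in the deployment manifest as shown above. The function name show_snmp_destination is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a deployment manifest on stdin and print each
# --snmp.destination argument it contains.
show_snmp_destination() {
  grep -o -- '--snmp\.destination=[^"[:space:]]*'
}

# Usage:
#   kubectl get deploy <snmp_notifier_deployment_name> -n <namespace> -o yaml | show_snmp_destination
```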
MIB Files for CNCC
Two MIB files are used to generate the traps. The user needs to update these files along with the Alert file in order to fetch the traps in their environment.
- occncc_mib_tc_<version>.mib: This file is considered the CNCC top level MIB file, where the Objects and their data types are defined.
- occncc_mib_<version>.mib: This file fetches the Objects from the top level MIB file and, based on the Alert notification, these objects can be selected for display.
Note:
MIB files are packaged along with CNCC Custom Templates. Download the file from MOS. Refer to CNCC Installation guide for more details.