8 CNC Console Alerts

This section provides information about CNC Console Alerts.

Note:

The alert file is packaged with the CNCC custom templates. Download the occncc_custom_configtemplates_<version>.zip file from MOS and unzip it to get the occncc_alerting_rules_promha_<version>.yaml file.

  • Review the occncc_alerting_rules_promha_<version>.yaml file and, if the default values need to change, edit the parameter values in it before configuring the alerts.
  • kubernetes_namespace is set to the Kubernetes namespace in which CNCC is deployed. The default value is cncc. Update the occncc_alertrules_<version>.yaml file to reflect the correct CNCC Kubernetes namespace.

Two sample alert files are provided: one supporting CNE 1.8 or lower, and one supporting CNE Prometheus HA.

  • CNCC Alert Rules file: occncc_alertrules_<version>.yaml file.
  • CNCC Alert Rules file supporting CNE Prometheus HA: occncc_alerting_rules_promha_<version>.yaml file.

CNC Console IAM Alerts

This section provides information about CNC Console IAM Alerts.

CnccIamTotalIngressTrafficRateAboveMinorThreshold

Table 8-1 CnccIamTotalIngressTrafficRateAboveMinorThreshold

Trigger Condition

The total CNCC IAM Ingress Message rate has crossed the configured minor threshold of 700 TPS.

The default trigger point for this alert in occncc_alertrules_<version>.yaml is when the CNCC IAM Ingress Rate crosses 70% of 1000 (the maximum ingress request rate).

Severity Minor
Alert details provided

Description: 'CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the CnccIamTotalIngressTrafficRateAboveMajorThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC IAM is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

CnccIamTotalIngressTrafficRateAboveMajorThreshold

Table 8-2 CnccIamTotalIngressTrafficRateAboveMajorThreshold

Trigger Condition

The total CNCC IAM Ingress Message rate has crossed the configured major threshold of 800 TPS.

The default trigger point for this alert in occncc_alertrules_<version>.yaml is when the CNCC IAM Ingress Rate crosses 80% of 1000 (the maximum ingress request rate).

Severity Major
Alert details provided

Description: 'CNCC IAM Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second (current value is: {{ $value }})'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccIamTotalIngressTrafficRateAboveCriticalThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC IAM is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

CnccIamTotalIngressTrafficRateAboveCriticalThreshold

Table 8-3 CnccIamTotalIngressTrafficRateAboveCriticalThreshold

Trigger Condition

The total CNCC IAM Ingress Message rate has crossed the configured critical threshold of 900 TPS.

The default trigger point for this alert in occncc_alertrules_<version>.yaml is when the CNCC IAM Ingress Rate crosses 90% of 1000 (the maximum ingress request rate).

Severity Critical
Alert details provided

Description: 'CNCC IAM Ingress traffic Rate is above the configured critical threshold, that is, 900 requests per second (current value is: {{ $value }})'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared when the Ingress Traffic rate falls below the Critical threshold.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC IAM is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.
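The three traffic-rate alerts above form a ladder: each one clears when the next threshold is crossed, so at most one of them should be active at a time. The following is a minimal Python sketch of that behavior, not the rule logic shipped in the alert file. The function name, the component argument, and the reading of "crossed" as strictly greater than the threshold are assumptions made for illustration; the percentages and the maximum rate of 1000 come from the tables above.

```python
from typing import Optional

MAX_INGRESS_TPS = 1000  # maximum ingress request rate stated in the tables

# (severity used in the alert name, threshold as a percentage of the maximum),
# ordered from highest to lowest so the first match wins.
THRESHOLDS = [("Critical", 90), ("Major", 80), ("Minor", 70)]


def active_traffic_alert(rate_tps: float, component: str = "Iam") -> Optional[str]:
    """Return the one traffic-rate alert expected to be active, or None."""
    for severity, percent in THRESHOLDS:
        # Assumption: "crossed" means strictly greater than the threshold.
        if rate_tps > MAX_INGRESS_TPS * percent / 100:
            return f"Cncc{component}TotalIngressTrafficRateAbove{severity}Threshold"
    return None


if __name__ == "__main__":
    for rate in (650, 750, 850, 950):
        print(rate, active_traffic_alert(rate))
```

Checking the highest threshold first mirrors the resolution text: a rate of 850 TPS reports only the Major alert, because the Minor alert is considered cleared once the Major threshold is crossed.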

CnccIamMemoryUsageCrossedMinorThreshold

Table 8-4 CnccIamMemoryUsageCrossedMinorThreshold

Trigger Condition A pod has reached the configured minor threshold (70%) of its memory resource limits.
Severity Minor
Alert details provided

Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used

container_memory_usage_bytes,

container_spec_memory_limit_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case CnccIamMemoryUsageCrossedMajorThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

CnccIamMemoryUsageCrossedMajorThreshold

Table 8-5 CnccIamMemoryUsageCrossedMajorThreshold

Trigger Condition A pod has reached the configured major threshold (80%) of its memory resource limits.
Severity Major
Alert details provided

Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value = {{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used

container_memory_usage_bytes,

container_spec_memory_limit_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case the CnccIamMemoryUsageCrossedCriticalThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

CnccIamMemoryUsageCrossedCriticalThreshold

Table 8-6 CnccIamMemoryUsageCrossedCriticalThreshold

Trigger Condition A pod has reached the configured critical threshold (90%) of its memory resource limits.
Severity Critical
Alert details provided

Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value = {{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used

container_memory_usage_bytes,

container_spec_memory_limit_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution The alert gets cleared when the memory utilization falls below the Critical Threshold.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.
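The memory alerts compare the two listed metrics, container_memory_usage_bytes and container_spec_memory_limit_bytes, as a percentage. The sketch below shows that arithmetic in Python under stated assumptions: the function names are hypothetical, the comparison is taken to be strictly greater than the threshold, and a limit of 0 is treated as "no limit configured", in which case no percentage can be computed. It illustrates the calculation only and is not the expression used in the alert file.

```python
from typing import Optional

# (severity used in the alert name, percent of the pod memory limit), highest first
MEMORY_THRESHOLDS = [("Critical", 90), ("Major", 80), ("Minor", 70)]


def memory_usage_percent(usage_bytes: int, limit_bytes: int) -> Optional[float]:
    """Usage as a percentage of the limit, or None when no limit is set."""
    if limit_bytes <= 0:
        # Assumption: a limit of 0 means the container has no memory limit.
        return None
    return 100.0 * usage_bytes / limit_bytes


def active_memory_alert(usage_bytes: int, limit_bytes: int,
                        component: str = "Iam") -> Optional[str]:
    """Return the one memory alert expected to be active for a pod, or None."""
    percent = memory_usage_percent(usage_bytes, limit_bytes)
    if percent is None:
        return None
    for severity, threshold in MEMORY_THRESHOLDS:
        if percent > threshold:
            return f"Cncc{component}MemoryUsageCrossed{severity}Threshold"
    return None


if __name__ == "__main__":
    gib = 1024 ** 3
    # 768 MiB used out of a 1 GiB limit is 75%, which falls in the Minor band.
    print(memory_usage_percent(768 * 1024 ** 2, gib))
    print(active_memory_alert(768 * 1024 ** 2, gib))
```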

CnccIamTransactionErrorRateAbove0.1Percent

Table 8-7 CnccIamTransactionErrorRateAbove0.1Percent

Trigger Condition The number of failed transactions is above 0.1 percent of the total transactions.
Severity Warning
Alert details provided Description: 'CNCC IAM transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC IAM transaction Error Rate detected above 0.1 Percent of Total Transactions'
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution The alert is cleared when the number of failed transactions is below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold in which case the CnccIamTransactionErrorRateAbove1Percent is raised.

Steps:
  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

CnccIamTransactionErrorRateAbove1Percent

Table 8-8 CnccIamTransactionErrorRateAbove1Percent

Trigger Condition The number of failed transactions is above 1 percent of the total transactions.
Severity Warning
Alert details provided Description: 'CNCC IAM transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC IAM transaction Error Rate detected above 1 Percent of Total Transactions'
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold in which case the CnccIamTransactionErrorRateAbove10Percent is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

CnccIamTransactionErrorRateAbove10Percent

Table 8-9 CnccIamTransactionErrorRateAbove10Percent

Trigger Condition The number of failed transactions is above 10 percent of the total transactions.
Severity Minor
Alert details provided

Description: 'CNCC IAM transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC IAM transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold in which case the CnccIamTransactionErrorRateAbove25Percent is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

CnccIamTransactionErrorRateAbove25Percent

Table 8-10 CnccIamTransactionErrorRateAbove25Percent

Trigger Condition

The number of failed transactions is above 25 percent of the total transactions.

Severity Major
Alert details provided

Description: 'CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the CnccIamTransactionErrorRateAbove50Percent alert is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

CnccIamTransactionErrorRateAbove50Percent

Table 8-11 CnccIamTransactionErrorRateAbove50Percent

Trigger Condition

The number of failed transactions is above 50 percent of the total transactions.

Severity Critical
Alert details provided

Description: The number of failed transactions is above 50 percent of the total transactions.

Summary: 'CNCC IAM transaction Error Rate detected above 50 Percent of Total Transactions'.

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 50 percent of the total transactions.

The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.
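The transaction error-rate alerts above use five bands (0.1, 1, 10, 25, and 50 percent), each with its own severity. The following Python sketch maps a failed and total transaction count to the band that would be active. It is an illustration under assumptions: the function name is hypothetical, the caller decides which responses count as failed (the tables do not define this), and "above" is read as strictly greater than the threshold.

```python
from typing import Optional, Tuple

# (label used in the alert name, threshold in percent, severity), highest first.
# Thresholds and severities are taken from Tables 8-7 through 8-11.
ERROR_BANDS = [
    ("50", 50.0, "Critical"),
    ("25", 25.0, "Major"),
    ("10", 10.0, "Minor"),
    ("1", 1.0, "Warning"),
    ("0.1", 0.1, "Warning"),
]


def active_error_rate_alert(failed: int, total: int,
                            component: str = "Iam") -> Optional[Tuple[str, str]]:
    """Return (alert name, severity) for the active error-rate band, or None."""
    if total <= 0:
        return None  # no traffic, so no error rate can be computed
    error_percent = 100.0 * failed / total
    for label, threshold, severity in ERROR_BANDS:
        if error_percent > threshold:
            name = f"Cncc{component}TransactionErrorRateAbove{label}Percent"
            return name, severity
    return None


if __name__ == "__main__":
    print(active_error_rate_alert(5, 1000))    # 0.5% of transactions failed
    print(active_error_rate_alert(300, 1000))  # 30% of transactions failed
```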

CnccIamIngressGatewayServiceDown

Table 8-12 CnccIamIngressGatewayServiceDown

Trigger Condition

The pods of the CNCC IAM Ingress Gateway microservice are unavailable.

Severity Critical
Alert details provided

Description: 'CNCC IAM Ingress-Gateway service InstanceIdentifier=~".*cncc-iam_ingressgateway" is down'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7004
Metric Used

'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert is cleared when the cncc-iam_ingressgateway service is available.

Steps:

  1. Check the orchestration logs of cncc-iam_ingressgateway service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on cncc-iam_ingressgateway service names. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.

CnccIamFailedLogin

Table 8-13 CnccIamFailedLogin

Trigger Condition

The count of failed login attempts in CNCC-IAM by a user goes above '3'.

Severity Warning
Alert details provided

Description: '{{ $value }} failed Login attempts have been detected in CNCC IAM for user {{$labels.UserName}}, the configured threshold value is 3 failed login attempts for every 5 min'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7005
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert gets cleared when the total failed login attempts for a particular user goes below the threshold value (default value is '3') in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.
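The failed-login alert is a count per user over a time window: more than 3 failures within the last 5 minutes. The Python sketch below shows that windowed count. The event format (user name, timestamp) and the function name are assumptions for illustration; the real alert is evaluated by Prometheus against oc_ingressgateway_http_responses_total, not against a list of events.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

WINDOW = timedelta(minutes=5)  # default window from the table above
THRESHOLD = 3                  # default failed-login threshold


def users_over_threshold(failed_logins: Iterable[Tuple[str, datetime]],
                         now: datetime,
                         window: timedelta = WINDOW,
                         threshold: int = THRESHOLD) -> Dict[str, int]:
    """Count failed logins per user inside the window; keep users above the threshold."""
    counts: Dict[str, int] = defaultdict(int)
    for user, timestamp in failed_logins:
        # Only events in the half-open interval (now - window, now] are counted.
        if now - window < timestamp <= now:
            counts[user] += 1
    return {user: count for user, count in counts.items() if count > threshold}


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    events = [("alice", now - timedelta(seconds=30 * i)) for i in range(4)]
    events += [("bob", now - timedelta(minutes=1)), ("bob", now - timedelta(minutes=2))]
    print(users_over_threshold(events, now))
```

Once older failures fall outside the window, the count for that user drops back to the threshold or below and the user no longer appears in the result, which corresponds to the clearing condition described in the Resolution row.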

AdminUserCreation

Table 8-14 AdminUserCreation

Trigger Condition

A new admin account has been created in the last 5 min.

Severity Warning
Alert details provided

For CNE 1.9.0 or later versions:

Description: '{{ $value }} admin users have been created by {{$labels.UserName}} '

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created '

For CNE 1.8.x or previous versions:

Description: '{{ $value }} admin users have been created by {{$labels.UserName}} '

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created '

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7006
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert gets cleared when no new admin account has been created in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the occncc_alertrules_<version>.yaml file.

Log in to the admin GUI and review the created user.

If guidance is required, contact My Oracle Support.

CnccIamAccessTokenFailure

Table 8-15 CnccIamAccessTokenFailure

Trigger Condition

The count of failed token requests for CNCC-IAM goes above the configured value of '3'.

Severity Warning
Alert details provided

Description: 'CNCC Iam Access Token Failure count is above the configured value i.e. 3 for every 5 min. Failed access token request count per second is (current value is: {{ $value }})'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7007
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert gets cleared when the total failed tokens for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

CNC Console Core Alerts

This section provides information about CNC Console Core Alerts.

CnccCoreTotalIngressTrafficRateAboveMinorThreshold

Table 8-16 CnccCoreTotalIngressTrafficRateAboveMinorThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured minor threshold of 700 TPS.

The default trigger point for this alert in cncc_alert_rules.yaml is when the CNCC Core Ingress Rate crosses 70% of 1000 (the maximum ingress request rate).

Severity Minor
Alert details provided

Description: 'CNCC Core Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared either when the total Ingress traffic rate falls below the minor threshold or when the total traffic rate crosses the major threshold, in which case the CnccCoreTotalIngressTrafficRateAboveMajorThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC Core is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

CnccCoreTotalIngressTrafficRateAboveMajorThreshold

Table 8-17 CnccCoreTotalIngressTrafficRateAboveMajorThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured major threshold of 800 TPS.

The default trigger point for this alert in cncc_alert_rules.yaml is when the CNCC Core Ingress Rate crosses 80% of 1000 (the maximum ingress request rate).

Severity Major
Alert details provided

Description: 'CNCC Core Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second (current value is: {{ $value }})'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Resolution The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccCoreTotalIngressTrafficRateAboveCriticalThreshold alert is raised.

Note: The threshold is configurable in the alerts.yaml file.

Steps:

  1. Reassess why the CNCC Core is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

CnccCoreTotalIngressTrafficRateAboveCriticalThreshold

Table 8-18 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured critical threshold of 900 TPS.

The default trigger point for this alert in cncc_alert_rules.yaml is when the CNCC Core Ingress Rate crosses 90% of 1000 (the maximum ingress request rate).

Severity Critical
Alert details provided

Description: 'CNCC Core Ingress traffic Rate is above the configured critical threshold i.e. 900 requests per second (current value is: {{ $value }})'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared when the Ingress Traffic rate falls below the Critical threshold.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC Core is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

CnccCoreMemoryUsageCrossedMinorThreshold

Table 8-19 CnccCoreMemoryUsageCrossedMinorThreshold

Trigger Condition

A pod has reached the configured minor threshold (70%) of its memory resource limits.

Severity Minor
Alert details provided

Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used

container_memory_usage_bytes

container_spec_memory_limit_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case CnccCoreMemoryUsageCrossedMajorThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

CnccCoreMemoryUsageCrossedMajorThreshold

Table 8-20 CnccCoreMemoryUsageCrossedMajorThreshold

Trigger Condition

A pod has reached the configured major threshold (80%) of its memory resource limits.

Severity Major
Alert details provided

Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value = {{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used

container_memory_usage_bytes

container_spec_memory_limit_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case the CnccCoreMemoryUsageCrossedCriticalThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

CnccCoreMemoryUsageCrossedCriticalThreshold

Table 8-21 CnccCoreMemoryUsageCrossedCriticalThreshold

Trigger Condition

A pod has reached the configured critical threshold (90%) of its memory resource limits.

Severity Critical
Alert details provided

Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value = {{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used

container_memory_usage_bytes

container_spec_memory_limit_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the Critical Threshold.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

CnccCoreTransactionErrorRateAbove0.1Percent

Table 8-22 CnccCoreTransactionErrorRateAbove0.1Percent

Trigger Condition

The number of failed transactions is above 0.1 percent of the total transactions.

Severity Warning
Alert details provided

Description: 'CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the CnccCoreTransactionErrorRateAbove1Percent alert is raised.

The threshold is configurable in the alerts.yaml file.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

CnccCoreTransactionErrorRateAbove1Percent

Table 8-23 CnccCoreTransactionErrorRateAbove1Percent

Trigger Condition The number of failed transactions is above 1 percent of the total transactions.
Severity Warning
Alert details provided

Description: 'CNCC Core transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error Rate detected above 1 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the CnccCoreTransactionErrorRateAbove10Percent alert is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

CnccCoreTransactionErrorRateAbove10Percent

Table 8-24 CnccCoreTransactionErrorRateAbove10Percent

Trigger Condition The number of failed transactions is above 10 percent of the total transactions.
Severity Minor
Alert details provided

Description: 'CNCC Core transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'

summary: 'CNCC Core transaction Error Rate detected above 10 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold, in which case the CnccCoreTransactionErrorRateAbove25Percent alert is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

CnccCoreTransactionErrorRateAbove25Percent

Table 8-25 CnccCoreTransactionErrorRateAbove25Percent

Trigger Condition The number of failed transactions is above 25 percent of the total transactions.
Severity Major
Alert details provided

Description: 'CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 25% of the total transactions, or when it crosses the 50% threshold, in which case the CnccCoreTransactionErrorRateAbove50Percent alert is raised.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

CnccCoreTransactionErrorRateAbove50Percent

Table 8-26 CnccCoreTransactionErrorRateAbove50Percent

Trigger Condition The number of failed transactions is above 50 percent of the total transactions.
Severity Critical
Alert details provided

Description: 'CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 50% of the total transactions.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.
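
The four transaction error rate alerts form a ladder: each alert clears either when the error rate drops below its own threshold or when the next alert in the ladder takes over. The following shell sketch illustrates that ladder for a given pair of counts. The function name active_error_alert is hypothetical, the handling of a rate exactly equal to a threshold is an assumption, and the shipped alert rules evaluate a rate over time in Prometheus rather than raw counts.

```shell
# Hypothetical helper: prints the name of the error-rate alert that would be
# expected to be active for the given failed and total transaction counts.
active_error_alert() {
  failed=$1
  total=$2
  # Without traffic there is no error rate to evaluate.
  if [ "$total" -le 0 ]; then
    echo "none"
    return 0
  fi
  # failed/total > T/100 is equivalent to failed*100 > total*T,
  # which keeps the comparison in integer arithmetic.
  for threshold in 50 25 10 1; do
    if [ $((failed * 100)) -gt $((total * threshold)) ]; then
      echo "CnccCoreTransactionErrorRateAbove${threshold}Percent"
      return 0
    fi
  done
  echo "none"
}

active_error_alert 30 1000    # 3% of transactions failed
active_error_alert 300 1000   # 30% of transactions failed
```

Checking the highest threshold first mirrors the resolution text: a lower alert gives way as soon as the next threshold is crossed.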

CnccCoreIngressGatewayServiceDown

Table 8-27 CnccCoreIngressGatewayServiceDown

Trigger Condition The CNCC Core Ingress Gateway service is down.
Severity Critical
Alert details provided

Description: 'CNCC Core Ingress-Gateway service InstanceIdentifier=~".*core_ingressgateway" is down'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8004
Metric Used

'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert is cleared when the cncc-core_ingressgateway service is available.

Steps:

  1. Check the orchestration logs of cncc-core_ingressgateway service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter by the cncc-core_ingressgateway service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the appropriate resolution steps.
  4. In case the issue persists, contact My Oracle Support.

CnccCoreFailedLogin

Table 8-28 CnccCoreFailedLogin

Trigger Condition

The count of failed login attempts in CNCC-Core by a user goes above '3'.

Severity Warning
Alert details provided

Description: '{{ $value }} failed Login attempts have been detected in CNCC Core for user {{$labels.UserName}}, the configured threshold value is 3 failed login attempts for every 5 min'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8005
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the total number of failed login attempts for a particular user in the configured time window (default: 5 minutes) falls below the threshold value (default: 3).

Note: The threshold and time window are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

CnccCoreUnauthorizedAccess

Table 8-29 CnccCoreUnauthorizedAccess

Trigger Condition

The count of unauthorized accesses in CNCC-Core goes above '3'.

Severity Warning
Alert details provided

Description: '{{ $value }} Unauthorized Accesses have been detected in CNCC-Core for {{$labels.ResourceType}} for {{$labels.Method}} request. The configured threshold value is 3 for every 5 min'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8006
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the total number of unauthorized accesses in the configured time window (default: 5 minutes) falls below the threshold value (default: 3).

Note: The threshold and time window are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

CnccCoreAccessTokenFailure

Table 8-30 CnccCoreAccessTokenFailure

Trigger Condition

The count of failed access token requests for CNCC-Core goes above the configured value of '3'.

Severity Warning
Alert details provided

Description: 'CNCC Core Access Token Failure count is above the configured value i.e. 3 for every 5 min. Failed access token request count per second is (current value is: {{ $value }})'

For CNE 1.8.x or previous versions:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value'

For CNE 1.9.0 or later versions:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8007
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the total number of failed access token requests for a particular user in the configured time window (default: 5 minutes) falls below the threshold value (default: 3).

Note: The threshold and time window are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

CNCC Alert configuration in Prometheus

CNE 1.8 or previous versions:

This section describes the measurement-based alert rules configuration for CNCC in Prometheus. Use the occncc_alertrules_<version>.yaml file updated as described in the CNCC Alert configuration section.

_NAME_ :- Helm Release of Prometheus

_Namespace_ :- Kubernetes NameSpace in which Prometheus is installed

Configuration

  1. Run the following command to back up the current config map of the Prometheus server:
    kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
    where _Namespace_ is the Prometheus server namespace used in the helm install command.
    For example, if the chart name is "prometheus-alert", "_NAME_-server" becomes "prometheus-alert-server". Run the following command to retrieve the config map:
    kubectl get configmaps prometheus-alert-server -o yaml -n prometheus-alert2 > /tmp/tempConfig.yaml
  2. Check whether the CNCC alert file name is present in the Prometheus config map, and add it as follows:
    1. If alertscncc is present, delete the alertscncc entry from the tempConfig.yaml file by running the following command:
      sed -i '/etc\/config\/alertscncc/d' /tmp/tempConfig.yaml

    2. If alertscncc is not present, add the alertscncc entry to the tempConfig.yaml file by running the following command:
      sed -i '/rule_files:/a\    \- /etc/config/alertscncc'  /tmp/tempConfig.yaml
  3. Run the following command to update the config map with the updated file name of CNCC:
    kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
  4. Run the following command to add the CNCC alert rules file (occncc_alertrules_<version>.yaml) to the Prometheus config map under the file name of the CNCC alert file:
    kubectl patch  configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/occncc_alertrules_<version>.yaml)"
  5. Restart the prometheus-server pod.
  6. Verify the alerts in the Prometheus GUI.

Note:

The Prometheus server automatically reloads the updated config map after some time (approximately 20 seconds).
Example:

$ kubectl get configmaps occne-prometheus-server -o yaml -n occne-infra > /tmp/tempConfig.yaml
$ sed -i '/etc\/config\/alertscncc/d' /tmp/tempConfig.yaml
$ sed -i '/rule_files:/a\    \- /etc/config/alertscncc' /tmp/tempConfig.yaml
$ kubectl replace configmap occne-prometheus-server -f /tmp/tempConfig.yaml
$ kubectl patch configmap occne-prometheus-server -n occne-infra --type merge --patch "$(cat ~/occncc_alertrules_<version>.yaml)"
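
The two sed commands in the procedure are meant to leave exactly one alertscncc entry under rule_files, regardless of whether one was already present. The following sketch runs the same commands against a throwaway file so that the effect can be observed without touching a cluster. It assumes GNU sed (as the procedure does); the helper name add_cncc_rule_file and the sample file content are illustrative only.

```shell
# Throwaway file that mimics the rule_files section of a Prometheus config.
# The entries are made up; a real config map contains much more.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
    rule_files:
    - /etc/config/rules
    - /etc/config/alerts
EOF

# Hypothetical helper wrapping the two sed commands from the procedure.
add_cncc_rule_file() {
  # Remove any existing alertscncc entry so it is never duplicated.
  sed -i '/etc\/config\/alertscncc/d' "$1"
  # Append the alertscncc entry directly after the rule_files: line.
  sed -i '/rule_files:/a\    \- /etc/config/alertscncc' "$1"
}

# Run the helper twice to show that repeating the procedure is harmless.
add_cncc_rule_file "$cfg"
add_cncc_rule_file "$cfg"

# Count the alertscncc entries that remain, then clean up.
count=$(grep -c 'alertscncc' "$cfg")
echo "alertscncc entries: $count"
rm -f "$cfg"
```

Deleting before appending is what makes the sequence safe to repeat: a second run removes the entry added by the first run before adding it again.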

Note:

The Prometheus server takes an updated configuration map that is automatically reloaded after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the CNCC Alerts have been reloaded.

CNE supporting Prometheus HA (CNE 1.9)

This section describes the measurement-based alert rules configuration for CNCC in Prometheus. Use an updated occncc_alerting_rules_promha_<version>.yaml file.

The PrometheusRule resource specified in the alert YAML file can be created or updated using the following command:

$ kubectl apply -f occncc_alerting_rules_promha_<version>.yaml
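
As a cautious variant of the command above, the manifest can be checked with a client-side dry run before it is applied. The sketch below is an assumption about workflow rather than part of the documented procedure: the function name apply_cncc_rules is hypothetical, and the --dry-run=client option requires a reasonably recent kubectl.

```shell
# Hypothetical wrapper around the apply command.
# $1 = path to the occncc_alerting_rules_promha_<version>.yaml file.
apply_cncc_rules() {
  rules_file=$1
  # Client-side dry run: checks that the manifest can be processed
  # without persisting any change. Stop if it fails.
  kubectl apply --dry-run=client -f "$rules_file" || return 1
  # Create or update the PrometheusRule resource.
  kubectl apply -f "$rules_file"
}

# Intended use (not run here):
#   apply_cncc_rules occncc_alerting_rules_promha_<version>.yaml
```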

Validating Alerts

Configure and validate the alerts in the Prometheus server. Refer to the CNCC Alert Configuration section for the procedure to configure the alerts.

After configuring the alerts in the Prometheus server, verify them using the following steps:
  1. Open the Prometheus server from your browser using the <IP>:<Port>.
  2. Navigate to Status, and then to Rules.
  3. Search for CNCC. The list of CNCC alerts is displayed.
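
The same check can be approximated from a terminal, because Prometheus exposes its loaded rules through the /api/v1/rules HTTP endpoint. The sketch below extracts rule names that start with "Cncc" from such a response. The helper name list_cncc_rule_names and the sample response are illustrative assumptions, and the plain-text matching is a simplification that would also list a rule group whose name starts with "Cncc".

```shell
# Hypothetical helper: reads the JSON returned by /api/v1/rules on stdin
# and prints every "name" value that starts with "Cncc", one per line.
list_cncc_rule_names() {
  grep -oE '"name" *: *"Cncc[^"]*"' \
    | sed -E 's/^"name" *: *"//; s/"$//' \
    | sort -u
}

# Intended use against a live server (not run here; <IP> and <Port> are
# the same placeholders as in step 1):
#   curl -s "http://<IP>:<Port>/api/v1/rules" | list_cncc_rule_names

# Offline demonstration with a trimmed, made-up response body.
sample='{"status":"success","data":{"groups":[{"name":"example","rules":[{"name":"CnccCoreFailedLogin","type":"alerting"},{"name":"CnccCoreAccessTokenFailure","type":"alerting"}]}]}}'
printf '%s\n' "$sample" | list_cncc_rule_names
```

An empty result against a live server would point to the same problem as an empty list in the GUI: the alert file was not loaded.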

Note:

If you are unable to see the alerts, the alert file was not loaded in a format that the Prometheus server accepts. Modify the file and try again.

Disabling Alerts

This section explains how to disable the alerts in CNCC.

  1. Open the occncc_alerting_rules_promha_<version>.yaml file for editing.
  2. Remove the complete content of the specific alert from the occncc_alerting_rules_promha_<version>.yaml file.

    cncc_alert_rules_<version>.yaml
    
    For example, to remove the CnccIamTotalIngressTrafficRateAboveMinorThreshold alert, remove its complete content:
    
    ## ALERT SAMPLE START##
    - alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
      annotations:
        description: 'CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'
        summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'
      expr: sum(rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*cncc-iam_ingressgateway",kubernetes_namespace="cncc"}[2m])) > 0
      labels:
        severity: minor
        oid: "1.3.6.1.4.1.323.5.3.51.1.2.7001"
        namespace: ' {{ $labels.kubernetes_namespace }} '
        podname: ' {{$labels.kubernetes_pod_name}} '
    ## ALERT SAMPLE END##
    
  3. Perform Alert configuration. See CNCC Alert Configuration section for details.
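
Removing an alert by hand is error-prone in a long rules file, so the removal can be sketched as a small filter. The example below assumes that every rule starts with a line of the form "- alert: <name>" and that the rule to remove is followed by another rule. If it is the last rule before unrelated content, that content would be dropped as well, so the output should be reviewed before use. The helper name remove_alert and the sample rules are hypothetical, and the expr values are placeholders rather than the shipped expressions.

```shell
# Hypothetical helper: prints a rules file without the named alert.
# $1 = alert name, $2 = path to the rules file. The input is not modified.
remove_alert() {
  awk -v name="$1" '
    # A line of the form "- alert: <name>" starts a new rule.
    /^[ \t]*- alert:/ { skip = ($3 == name) }
    # Print every line that is not part of the rule being removed.
    !skip { print }
  ' "$2"
}

# Demonstration on a throwaway file with two made-up rules.
rules=$(mktemp)
cat > "$rules" <<'EOF'
    - alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
      expr: vector(1)
      labels:
        severity: minor
    - alert: CnccCoreFailedLogin
      expr: vector(1)
      labels:
        severity: warning
EOF

remaining=$(remove_alert CnccIamTotalIngressTrafficRateAboveMinorThreshold "$rules")
printf '%s\n' "$remaining"
rm -f "$rules"
```

Writing the result to standard output instead of editing in place leaves the original file intact, which makes it easy to compare both versions before running the alert configuration.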

Configuring SNMP-Notifier

Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:

  1. Run the following command to edit the deployment:
     kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
    Example:
     $ kubectl edit deploy occne-snmp-notifier -n occne-infra
  2. Edit the destination as follows:
      --snmp.destination=<destination_ip>:<destination_port>
    Example:
    --snmp.destination=10.75.203.94:162
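
A mistyped destination is easy to miss inside the deployment editor, so the value can be sanity-checked beforehand. The sketch below checks only the IPv4 <destination_ip>:<destination_port> form used in the example above. The helper name valid_snmp_destination is hypothetical, and the check is an illustration rather than part of the SNMP Notifier.

```shell
# Hypothetical helper: succeeds only if $1 looks like <ipv4>:<port>.
valid_snmp_destination() {
  dest=$1
  # Require the overall shape <digits>.<digits>.<digits>.<digits>:<digits>.
  printf '%s\n' "$dest" \
    | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}$' || return 1
  ip=${dest%:*}
  port=${dest##*:}
  # The port must be in the range 1-65535.
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    return 1
  fi
  # Every octet of the address must be in the range 0-255.
  for octet in $(printf '%s' "$ip" | tr '.' ' '); do
    if [ "$octet" -gt 255 ]; then
      return 1
    fi
  done
  return 0
}

if valid_snmp_destination "10.75.203.94:162"; then
  echo "destination looks valid"
fi
```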

MIB Files for CNCC

Two MIB files are used to generate the traps. The user needs to update these files along with the alert file in order to fetch the traps in their environment.

occncc_mib_tc_<version>.mib

This file is the CNCC top-level MIB file, where the objects and their data types are defined.

occncc_mib_<version>.mib

This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.

Note:

MIB files are packaged along with the CNCC Custom Templates. Download the file from MOS. Refer to the CNCC Installation guide for more details.