8 CNC Console Alerts

This section provides information about CNC Console Alerts.

Note:

For OCI:

The only section applicable for OCI is CNC Console Alerts on OCI.

Note:

  • In a multi-cluster deployment, use the updated occncc_agent_alertrules_<version>.yaml file for the agent cluster.
  • Use the occncc_manager_alertrules_<version>.yaml file for a single-cluster deployment and for the manager cluster in a multi-cluster deployment.

Note:

The following four sample alert files are provided:

  • CNCC alert rules file for CNE without Prometheus Operator: occncc_manager_alertrules_<version>.yaml file for manager only or manager plus agent deployments and occncc_agent_alertrules_<version>.yaml file for agent only deployments.
  • CNCC alert rules file supporting CNE with Prometheus HA Operator: occncc_manager_alerting_rules_promha_<version>.yaml file for deployments in manager cluster and occncc_agent_alertingrules_promha_<version>.yaml file for deployments in agent only cluster.

Table 8-1 Alert Levels or Severity Types

Alert Levels / Severity Types Definition
Critical Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of OCCM.
Major Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the service of OCCM.
Minor Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the service of OCCM.
Info or Warn (Informational) Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of OCCM.

8.1 CNC Console IAM Alerts

This section provides information about CNC Console IAM Alerts.

8.1.1 CnccIamTotalIngressTrafficRateAboveMinorThreshold

Table 8-2 CnccIamTotalIngressTrafficRateAboveMinorThreshold

Trigger Condition

The total CNCC IAM Ingress Message rate has crossed the configured minor threshold of 700 TPS.

The default trigger point for this alert in occncc_alertrules_<version>.yaml is when the CNCC IAM ingress rate crosses 70% of 1000 (the maximum ingress request rate).

Severity Minor
Alert details provided

Description: CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the CnccIamTotalIngressTrafficRateAboveMajorThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC IAM is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.
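
For illustration only, a Prometheus alerting rule for this condition might look like the sketch below. The alert name, metric, OID, description text, and the 700 and 800 requests-per-second boundaries come from this section, and the InstanceIdentifier matcher is taken from the CnccIamIngressGatewayServiceDown description. The 2m rate window, the oid label name, the severity label value, and the overall shape of the expression are assumptions; the rule shipped in occncc_alertrules_<version>.yaml may differ.

```yaml
# Hypothetical sketch, not the shipped rule.
- alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
  # Fires from 700 req/s (70% of 1000) and stops matching at 800 req/s,
  # where the Major threshold alert is expected to take over.
  # The 2m rate window is an assumption.
  expr: sum(rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*cncc-iam_ingressgateway"}[2m])) >= 700 < 800
  labels:
    severity: minor                          # label value assumed
    oid: "1.3.6.1.4.1.323.5.3.51.1.2.7001"   # label name assumed
  annotations:
    description: 'CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'
```

In a rule of this shape, changing the threshold means editing the two numeric bounds in expr and keeping the upper bound aligned with the lower bound of the Major rule.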

8.1.2 CnccIamTotalIngressTrafficRateAboveMajorThreshold

Table 8-3 CnccIamTotalIngressTrafficRateAboveMajorThreshold

Trigger Condition

The total CNCC IAM Ingress Message rate has crossed the configured major threshold of 800 TPS.

The default trigger point for this alert in occncc_alertrules_<version>.yaml is when the CNCC IAM ingress rate crosses 80% of 1000 (the maximum ingress request rate).

Severity Major
Alert details provided

Description: 'CNCC IAM Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second (current value is: {{ $value }})'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccIamTotalIngressTrafficRateAboveCriticalThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC IAM is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

8.1.3 CnccIamTotalIngressTrafficRateAboveCriticalThreshold

Table 8-4 CnccIamTotalIngressTrafficRateAboveCriticalThreshold

Trigger Condition

The total CNCC IAM Ingress Message rate has crossed the configured critical threshold of 900 TPS. The default trigger point for this alert in occncc_alertrules_<version>.yaml is when the CNCC IAM ingress rate crosses 90% of 1000 (the maximum ingress request rate).

Severity Critical
Alert details provided

Description: CNCC IAM Ingress traffic Rate is above the configured critical threshold, that is, 900 requests per second (current value is: {{ $value }})

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared when the Ingress Traffic rate falls below the Critical threshold.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC IAM is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

8.1.4 CnccIamMemoryUsageCrossedMinorThreshold

Table 8-5 CnccIamMemoryUsageCrossedMinorThreshold

Trigger Condition A pod has reached the configured minor threshold (70%) of its memory resource limits.
Severity Minor
Alert details provided

Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used

container_memory_usage_bytes,

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case CnccIamMemoryUsageCrossedMajorThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.
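
As a rough sketch of how the two metrics listed above can be combined, the following rule divides per-pod memory usage by the per-pod memory limit and compares the percentage against the 70% and 80% boundaries. The alert name, metric names, OID, and thresholds come from this section. The pod name pattern, the container filters, the resource="memory" selector, and the label names under labels are assumptions and should be checked against the monitoring system and the shipped rules file.

```yaml
# Hypothetical sketch, not the shipped rule.
- alert: CnccIamMemoryUsageCrossedMinorThreshold
  # Percentage of the memory limit in use, per namespace and pod.
  # The pod pattern ".*cncc-iam.*" is a placeholder for the real pod names.
  expr: |
    sum by (namespace, pod) (
      container_memory_usage_bytes{pod=~".*cncc-iam.*", container!="", container!="POD"}
    )
    /
    sum by (namespace, pod) (
      kube_pod_container_resource_limits{pod=~".*cncc-iam.*", resource="memory"}
    )
    * 100 >= 70 < 80
  labels:
    severity: minor                          # label value assumed
    oid: "1.3.6.1.4.1.323.5.3.51.1.2.7002"   # label name assumed
  annotations:
    description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.'
```

Aggregating both sides by the same namespace and pod labels lets the division match one usage series to one limit series for each pod.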

8.1.5 CnccIamMemoryUsageCrossedMajorThreshold

Table 8-6 CnccIamMemoryUsageCrossedMajorThreshold

Trigger Condition A pod has reached the configured major threshold (80%) of its memory resource limits.
Severity Major
Alert details provided

Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value = {{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used

container_memory_usage_bytes,

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case CnccIamMemoryUsageCrossedCriticalThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

8.1.6 CnccIamMemoryUsageCrossedCriticalThreshold

Table 8-7 CnccIamMemoryUsageCrossedCriticalThreshold

Trigger Condition A pod has reached the configured critical threshold (90%) of its memory resource limits.
Severity Critical
Alert details provided

Description: 'CNCC IAM Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value = {{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7002
Metric Used

container_memory_usage_bytes,

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution The alert gets cleared when the memory utilization falls below the Critical Threshold.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

8.1.7 CnccIamTransactionErrorRateAbove0.1Percent

Table 8-8 CnccIamTransactionErrorRateAbove0.1Percent

Trigger Condition The number of failed transactions is above 0.1 percent of the total transactions.
Severity Warning
Alert details provided Description: 'CNCC IAM transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC IAM transaction Error Rate detected above 0.1 Percent of Total Transactions'
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution The alert is cleared when the number of failed transactions is below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold in which case the CnccIamTransactionErrorRateAbove1Percent is raised.

Steps:
  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.
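
The failed-transaction percentage described above can be expressed as a ratio of two rates over the same metric. The sketch below is illustrative only: the alert name, metric, OID, description text, and the 0.1 and 1 percent boundaries come from this section, while the Status label used to select failed responses, its "5.*" pattern, the 5m window, and the label names under labels are assumptions that must be replaced with whatever the gateway actually exposes.

```yaml
# Hypothetical sketch, not the shipped rule.
- alert: CnccIamTransactionErrorRateAbove0.1Percent
  # Failed responses as a percentage of all responses.
  # The Status label and its "5.*" pattern are assumptions.
  expr: |
    sum(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*cncc-iam_ingressgateway", Status=~"5.*"}[5m]))
    /
    sum(rate(oc_ingressgateway_http_responses_total{InstanceIdentifier=~".*cncc-iam_ingressgateway"}[5m]))
    * 100 >= 0.1 < 1
  labels:
    severity: warning                        # label value assumed
    oid: "1.3.6.1.4.1.323.5.3.51.1.2.7003"   # label name assumed
  annotations:
    description: 'CNCC IAM transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'
```

The upper bound of 1 mirrors the resolution text: once the error rate reaches 1 percent, this rule stops matching and the CnccIamTransactionErrorRateAbove1Percent rule is expected to fire instead.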

8.1.8 CnccIamTransactionErrorRateAbove1Percent

Table 8-9 CnccIamTransactionErrorRateAbove1Percent

Trigger Condition The number of failed transactions is above 1 percent of the total transactions.
Severity Warning
Alert details provided Description: 'CNCC IAM transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC IAM transaction Error Rate detected above 1 Percent of Total Transactions'
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold in which case the CnccIamTransactionErrorRateAbove10Percent is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.1.9 CnccIamTransactionErrorRateAbove10Percent

Table 8-10 CnccIamTransactionErrorRateAbove10Percent

Trigger Condition The number of failed transactions is above 10 percent of the total transactions.
Severity Minor
Alert details provided

Description: 'CNCC IAM transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC IAM transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution The alert is cleared when the number of failed transactions is below 10% of the total transactions or when the number of failed transactions crosses the 25% threshold in which case the CnccIamTransactionErrorRateAbove25Percent is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.1.10 CnccIamTransactionErrorRateAbove25Percent

Table 8-11 CnccIamTransactionErrorRateAbove25Percent

Trigger Condition

The number of failed transactions is above 25 percent of the total transactions.

Severity Major
Alert details provided

Description: 'CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC IAM transaction Error Rate detected above 25 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 25% of the total transactions or when the number of failed transactions crosses the 50% threshold, in which case the CnccIamTransactionErrorRateAbove50Percent alert is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.1.11 CnccIamTransactionErrorRateAbove50Percent

Table 8-12 CnccIamTransactionErrorRateAbove50Percent

Trigger Condition

The number of failed transactions is above 50 percent of the total transactions.

Severity Critical
Alert details provided

Description: The number of failed transactions is above 50 percent of the total transactions.

Summary: 'CNCC IAM transaction Error Rate detected above 50 Percent of Total Transactions'.

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 50 percent of the total transactions.

The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.1.12 CnccIamIngressGatewayServiceDown

Table 8-13 CnccIamIngressGatewayServiceDown

Trigger Condition

The pods of the CNCC IAM Ingress Gateway microservice are not available.

Severity Critical
Alert details provided

Description: 'CNCC IAM Ingress-Gateway service InstanceIdentifier=~".*cncc-iam_ingressgateway" is down'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7004
Metric Used

'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert is cleared when the cncc-iam_ingressgateway service is available.

Steps:

  1. Check the orchestration logs of cncc-iam_ingressgateway service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on cncc-iam_ingressgateway service names. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the appropriate resolution steps.
  4. In case the issue persists, contact My Oracle Support.
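
A service-down condition based on the up metric is commonly written so that it fires both when the target reports 0 and when the series is missing altogether. The sketch below follows that pattern and is not the shipped rule: the alert name, metric, OID, and description come from this section, while the use of InstanceIdentifier as a label on up (which depends on the scrape configuration), the absent() clause, and the label names under labels are assumptions.

```yaml
# Hypothetical sketch, not the shipped rule.
- alert: CnccIamIngressGatewayServiceDown
  # Fires when no matching "up" series exists or when a matching target is down.
  # Whether "up" carries InstanceIdentifier depends on the scrape configuration.
  expr: |
    absent(up{InstanceIdentifier=~".*cncc-iam_ingressgateway"})
    or
    up{InstanceIdentifier=~".*cncc-iam_ingressgateway"} == 0
  labels:
    severity: critical                       # label value assumed
    oid: "1.3.6.1.4.1.323.5.3.51.1.2.7004"   # label name assumed
  annotations:
    description: 'CNCC IAM Ingress-Gateway service InstanceIdentifier=~".*cncc-iam_ingressgateway" is down'
```

If the monitoring system does not attach InstanceIdentifier to up, a namespace and pod-name selector would play the same role.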

8.1.13 CnccIamFailedLogin

Table 8-14 CnccIamFailedLogin

Trigger Condition

The count of failed login attempts in CNCC IAM by a user goes above '3'.

Severity Warning
Alert details provided

Description: '{{ $value }} failed Login attempts have been detected in CNCC IAM for user {{$labels.UserName}}, the configured threshold value is 3 failed login attempts for every 5 min'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7005
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert gets cleared when the total failed login attempts for a particular user goes below the threshold value (default value is '3') in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

8.1.14 AdminUserCreation

Table 8-15 AdminUserCreation

Trigger Condition

A new admin account was created in the last 5 minutes.

Severity Warning
Alert details provided

For CNE with Prometheus HA Operator:

Description: '{{ $value }} admin users have been created by {{$labels.UserName}} '

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created '

For CNE without Prometheus Operator:

Description: '{{ $value }} admin users have been created by {{$labels.UserName}} '

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, user: {{$labels.UserName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Admin users have been created '

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7006
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert gets cleared when no new admin account has been created in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the occncc_alertrules_<version>.yaml file.

Log in to the admin GUI and review the created user.

If guidance is required, contact My Oracle Support.

8.1.15 CnccIamAccessTokenFailure

Table 8-16 CnccIamAccessTokenFailure

Trigger Condition

The count of failed token requests for CNCC IAM goes above the configured value of '3'.

Severity Warning
Alert details provided

Description: 'CNCC Iam Access Token Failure count is above the configured value i.e. 3 for every 5 min. Failed access token request count per second is (current value is: {{ $value }})'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.7007
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert gets cleared when the total failed tokens for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

8.2 CNC Console Core Alerts

This section provides information about CNC Console Core Alerts.

8.2.1 CnccCoreTotalIngressTrafficRateAboveMinorThreshold

Table 8-17 CnccCoreTotalIngressTrafficRateAboveMinorThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured minor threshold of 700 TPS.

The default trigger point for this alert in cncc_alert_rules.yaml is when the CNCC Core ingress rate crosses 70% of 1000 (the maximum ingress request rate).

Severity Minor
Alert details provided

Description: 'CNCC Core Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared either when the total Ingress traffic rate falls below the minor threshold or when the total traffic rate crosses the major threshold, in which case the CnccCoreTotalIngressTrafficRateAboveMajorThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC Core is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

8.2.2 CnccCoreTotalIngressTrafficRateAboveMajorThreshold

Table 8-18 CnccCoreTotalIngressTrafficRateAboveMajorThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured major threshold of 800 TPS.

The default trigger point for this alert in cncc_alert_rules.yaml is when the CNCC Core ingress rate crosses 80% of 1000 (the maximum ingress request rate).

Severity Major
Alert details provided

Description: 'CNCC Core Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second (current value is: {{ $value }})'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Resolution The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccCoreTotalIngressTrafficRateAboveCriticalThreshold alert is raised.

Note: The threshold is configurable in the alerts.yaml file.

Steps:

  1. Reassess why the CNCC Core is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

8.2.3 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold

Table 8-19 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured critical threshold of 900 TPS.

The default trigger point for this alert in cncc_alert_rules.yaml is when the CNCC Core ingress rate crosses 90% of 1000 (the maximum ingress request rate).

Severity Critical
Alert details provided

Description: 'CNCC Core Ingress traffic Rate is above the configured critical threshold i.e. 900 requests per second (current value is: {{ $value }})'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8001
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared when the Ingress Traffic rate falls below the Critical threshold.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

Steps:

  1. Reassess why the CNCC Core is receiving additional traffic.
  2. If this is unexpected, contact My Oracle Support.

8.2.4 CnccCoreMemoryUsageCrossedMinorThreshold

Table 8-20 CnccCoreMemoryUsageCrossedMinorThreshold

Trigger Condition

A pod has reached the configured minor threshold (70%) of its memory resource limits.

Severity Minor
Alert details provided

Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used

container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the Minor Threshold or crosses the major threshold, in which case CnccCoreMemoryUsageCrossedMajorThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

8.2.5 CnccCoreMemoryUsageCrossedMajorThreshold

Table 8-21 CnccCoreMemoryUsageCrossedMajorThreshold

Trigger Condition

A pod has reached the configured major threshold (80%) of its memory resource limits.

Severity Major
Alert details provided

Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value = {{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 80% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used

container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the Major Threshold or crosses the critical threshold, in which case CnccCoreMemoryUsageCrossedCriticalThreshold alert is raised.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

8.2.6 CnccCoreMemoryUsageCrossedCriticalThreshold

Table 8-22 CnccCoreMemoryUsageCrossedCriticalThreshold

Trigger Condition

A pod has reached the configured critical threshold (90%) of its memory resource limits.

Severity Critical
Alert details provided

Description: 'CNCC Core Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value = {{ $value }}) of its limit.'

Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 90% of its limit.'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8002
Metric Used

container_memory_usage_bytes

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the Critical Threshold.

Note: The threshold is configurable in the occncc_alertrules_<version>.yaml file.

If guidance is required, contact My Oracle Support.

8.2.7 CnccCoreTransactionErrorRateAbove0.1Percent

Table 8-23 CnccCoreTransactionErrorRateAbove0.1Percent

Trigger Condition

The number of failed transactions is above 0.1 percent of the total transactions.

Severity Warning
Alert details provided

Description: 'CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 0.1% of the total transactions or when the number of failed transactions crosses the 1% threshold, in which case the CnccCoreTransactionErrorRateAbove1Percent alert is raised.

The threshold is configurable in the alerts.yaml file.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.2.8 CnccCoreTransactionErrorRateAbove1Percent

Table 8-24 CnccCoreTransactionErrorRateAbove1Percent

Trigger Condition The number of failed transactions is above 1 percent of the total transactions.
Severity Warning
Alert details provided

Description: 'CNCC Core transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error Rate detected above 1 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions is below 1% of the total transactions or when the number of failed transactions crosses the 10% threshold, in which case the CnccCoreTransactionErrorRateAbove10Percent alert is raised.

Steps:

  1. Check the Service specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.2.9 CnccCoreTransactionErrorRateAbove10Percent

Table 8-25 CnccCoreTransactionErrorRateAbove10Percent

Trigger Condition The number of failed transactions is above 10 percent of the total transactions.
Severity Minor
Alert details provided

Description: 'CNCC Core transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error Rate detected above 10 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 10% of the total transactions, or when the number of failed transactions crosses the 25% threshold, in which case the CnccCoreTransactionErrorRateAbove25Percent alert is raised.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.2.10 CnccCoreTransactionErrorRateAbove25Percent

Table 8-26 CnccCoreTransactionErrorRateAbove25Percent

Trigger Condition The number of failed transactions is above 25 percent of the total transactions.
Severity Major
Alert details provided

Description: 'CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 25% of the total transactions, or when the number of failed transactions crosses the 50% threshold, in which case the CnccCoreTransactionErrorRateAbove50Percent alert is raised.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.2.11 CnccCoreTransactionErrorRateAbove50Percent

Table 8-27 CnccCoreTransactionErrorRateAbove50Percent

Trigger Condition The number of failed transactions is above 50 percent of the total transactions.
Severity Critical
Alert details provided

Description: 'CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions (current value is {{ $value }})'

Summary: 'CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8003
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 50 percent of the total transactions.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.
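
The five transaction error rate alerts in Tables 8-23 to 8-27 form a ladder in which a lower alert clears once the next threshold is crossed. The following Python sketch illustrates that ladder only. It is not the expression used in the delivered alert rules file; the function name and the use of strict "greater than" comparisons at each boundary are assumptions.

```python
from typing import Optional, Tuple

# (lower bound in percent, alert name, severity), highest tier first.
# Names and severities are copied from Tables 8-23 to 8-27.
ERROR_RATE_TIERS = [
    (50.0, "CnccCoreTransactionErrorRateAbove50Percent", "Critical"),
    (25.0, "CnccCoreTransactionErrorRateAbove25Percent", "Major"),
    (10.0, "CnccCoreTransactionErrorRateAbove10Percent", "Minor"),
    (1.0, "CnccCoreTransactionErrorRateAbove1Percent", "Warning"),
    (0.1, "CnccCoreTransactionErrorRateAbove0.1Percent", "Warning"),
]


def active_error_rate_alert(failed: float, total: float) -> Optional[Tuple[str, str]]:
    """Return (alert name, severity) for the tier that applies, or None.

    Hypothetical helper: only the highest tier exceeded is reported,
    mirroring the rule that a lower alert clears when the next one is raised.
    """
    if total <= 0:
        return None  # no traffic, so no error rate can be computed
    percent = 100.0 * failed / total
    for lower_bound, name, severity in ERROR_RATE_TIERS:
        if percent > lower_bound:
            return (name, severity)
    return None
```

For example, 30 failed transactions out of 100 fall in the 25 percent tier, so only the Major alert would be active.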

8.2.12 CnccCoreIngressGatewayServiceDown

Table 8-28 CnccCoreIngressGatewayServiceDown

Trigger Condition CNCC Core Ingress Gateway service is down.
Severity Critical
Alert details provided

Description: 'CNCC Core Ingress-Gateway service InstanceIdentifier=~".*core_ingressgateway" is down'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8004
Metric Used

'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Resolution

The alert is cleared when the cncc-core_ingressgateway service is available.

Steps:

  1. Check the orchestration logs of the cncc-core_ingressgateway service and check for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter based on the cncc-core_ingressgateway service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the appropriate resolution steps.
  4. If the issue persists, contact My Oracle Support.

8.2.13 CnccCoreFailedLogin

Table 8-29 CnccCoreFailedLogin

Trigger Condition

The count of failed login attempts in CNCC-Core by a user goes above '3'

Severity Warning
Alert details provided

Description: '{{ $value }} failed Login attempts have been detected in CNCC Core for user {{$labels.UserName}}, the configured threshold value is 3 failed login attempts for every 5 min'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: failed login attempts are more than the configured threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8005
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the total failed login attempts for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.
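
The trigger and clear conditions above amount to counting failed logins per user inside a time window. The following Python sketch illustrates that logic with the documented defaults (threshold 3, window 5 minutes). It works on a plain list of events rather than on the Prometheus metric, and the function name and input shape are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # default window from the alert description
THRESHOLD = 3            # default threshold from the alert description


def users_over_threshold(failed_logins, now, window=WINDOW_SECONDS, threshold=THRESHOLD):
    """Return {user: count} for users above the failed-login threshold.

    failed_logins is an iterable of (timestamp_seconds, user_name) pairs.
    Only events inside the window that ends at `now` are counted.
    """
    counts = defaultdict(int)
    for timestamp, user in failed_logins:
        if now - window < timestamp <= now:
            counts[user] += 1
    # The alert condition is "goes above" the threshold, so use a strict comparison.
    return {user: n for user, n in counts.items() if n > threshold}
```

A user with exactly three failed attempts in the window is not reported. Once older events slide out of the window, the user drops out of the result, which corresponds to the alert clearing.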

8.2.14 CnccCoreUnauthorizedAccess

Table 8-30 CnccCoreUnauthorizedAccess

Trigger Condition

The count of unauthorized accesses in CNCC-Core goes above the configured value of '3'.

Severity Warning
Alert details provided

Description: '{{ $value }} Unauthorized Accesses have been detected in CNCC-Core for {{$labels.ResourceType}} for {{$labels.Method}} request. The configured threshold value is 3 for every 5 min'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Unauthorized Access for CNCC-Core are more than threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8006
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the total unauthorized accesses for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

8.2.15 CnccCoreAccessTokenFailure

Table 8-31 CnccCoreAccessTokenFailure

Trigger Condition

The count of failed tokens for CNCC-Core goes above the configured value of '3'.

Severity Warning
Alert details provided

Description: 'CNCC Core Access Token Failure count is above the configured value i.e. 3 for every 5 min. Failed access token request count per second is (current value is: {{ $value }})'

For CNE with Prometheus HA Operator:

summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value'

For CNE without Prometheus Operator:

summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Access Token Failure count is above the configured threshold value'

OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.51.1.2.8007
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the total failed tokens for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

8.3 CNC Console Alert configuration in Prometheus

Applying Alerts Rule to CNE without Prometheus Operator:

This section describes the measurement-based alert rules configuration for CNCC in Prometheus. Use the updated occncc_agent_alertrules_<version>.yaml file for agent only deployments and the occncc_manager_alertrules_<version>.yaml file for manager only as well as manager plus agent deployments.

_NAME_ :- Helm Release of Prometheus

_Namespace_ :- Kubernetes NameSpace in which Prometheus is installed

Configuration

  1. Run the following command to take a backup of the current config map of the Prometheus server:
    kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
    where _Namespace_ is the Prometheus server namespace used in the helm install command.
    For example, if the chart name is "prometheus-alert", "_NAME_-server" becomes "prometheus-alert-server", and the backup command is:
    kubectl get configmaps prometheus-alert-server -o yaml -n prometheus-alert2 > /tmp/tempConfig.yaml
  2. Check for the CNCC alert file name inside the Prometheus config map and add it:
    1. If alertscncc is present, delete the alertscncc entry from the tempConfig.yaml file by running the following command:
      sed -i '/etc\/config\/alertscncc/d' /tmp/tempConfig.yaml
      
    2. If alertscncc is not present, add the alertscncc entry to the tempConfig.yaml file by running the following command:
      sed -i '/rule_files:/a\    \- /etc/config/alertscncc' /tmp/tempConfig.yaml
  3. Run the following command to update the config map with the updated file name of CNCC:
    kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
  4. Run the following commands to add cnccAlertsRules in the config map under the file name of the CNCC alert file:
    1. For single cluster, and for manager only or manager plus agent namespace in multi cluster deployment:
       kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/occncc_manager_alertrules_<version>.yaml)"
    2. For agent only namespace in multi cluster deployment:
       kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/occncc_agent_alertrules_<version>.yaml)"
  5. Restart the prometheus-server pod.
  6. Verify the alerts in the Prometheus GUI.

Note:

The Prometheus server automatically reloads the updated configmap after some time (approximately 20 seconds).

Example:

$ kubectl get configmaps occne-prometheus-server -o yaml -n occne-infra > /tmp/tempConfig.yaml
$ sed -i '/etc\/config\/alertscncc/d' /tmp/tempConfig.yaml
$ sed -i '/rule_files:/a\    \- /etc/config/alertscncc' /tmp/tempConfig.yaml
$ kubectl replace configmap occne-prometheus-server -f /tmp/tempConfig.yaml
$ kubectl patch configmap occne-prometheus-server -n occne-infra --type merge --patch "$(cat ~/occncc_manager_alertrules_<version>.yaml)"
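
The two sed commands in the example delete any existing alertscncc entry and then append a fresh one directly under rule_files:. The following Python sketch reproduces that text edit so its effect can be checked on a copy of the file before running kubectl replace. The function name is hypothetical, and the four-space indent is an assumption that must match the indentation used in your config map.

```python
def ensure_rule_file(config_text, rule_path="/etc/config/alertscncc", indent="    "):
    """Return config_text with exactly one rule_path entry under rule_files:.

    Hypothetical helper mirroring the two sed edits: drop every line that
    mentions rule_path, then add one entry after each 'rule_files:' line.
    """
    result = []
    for line in config_text.splitlines():
        if rule_path in line:
            continue  # roughly: sed '/etc\/config\/alertscncc/d'
        result.append(line)
        if "rule_files:" in line:
            # roughly: sed '/rule_files:/a\    \- /etc/config/alertscncc'
            result.append(f"{indent}- {rule_path}")
    return "\n".join(result) + "\n"
```

Because the existing entry is removed before the new one is added, running the edit twice leaves the text unchanged, which is the reason the example runs the delete command unconditionally.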

Note:

The Prometheus server takes an updated configuration map that is automatically reloaded after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the CNCC Alerts have been reloaded.

Applying Alerts Rule to CNE with Prometheus HA Operator:

This section describes the measurement-based alert rules configuration for CNCC in Prometheus. Use the updated occncc_agent_alerting_rules_promha_<version>.yaml file for agent only deployments and the occncc_manager_alerting_rules_promha_<version>.yaml file for manager only as well as manager plus agent deployments.

Note:

The default namespace configured in occncc_manager_alerting_rules_promha_<version>.yaml and occncc_agent_alerting_rules_promha_<version>.yaml is "cncc-ns". Update the namespace as per the deployment.

Run the following command to apply CNCC alerts file to create Prometheus rules Custom Resource Definition (CRD):
$ kubectl apply -f <file_name> -n <cncc namespace>

Where, <file_name> is the CNCC alerts file and <cncc namespace> is the CNCC namespace.

Example:
$ kubectl apply -f occncc_manager_alerting_rules_promha_<version>.yaml -n cncc

The following sample file is delivered with the CNC Console package:
occncc_manager_alerting_rules_promha_<version>.yaml

8.4 Validating Alerts

Configure and validate alerts in the Prometheus server. Refer to the CNCC Alert Configuration section for the procedure to configure the alerts.

After configuring the alerts in the Prometheus server, verify them using the following steps:
  1. Open the Prometheus server from your browser using <IP>:<Port>.
  2. Navigate to Status, and then to Rules.
  3. Search for CNCC. The list of CNCC alerts is displayed.

Note:

If you are unable to see the alerts, the alert file is not loaded in a format that the Prometheus server accepts. Modify the file and try again.
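
The same check can be scripted against the Prometheus HTTP API, which lists loaded rules at /api/v1/rules. The following Python sketch is one way to do it. The function names are hypothetical, the "Cncc" name prefix is an assumption based on the alert names in this chapter, and base_url stands for the http://<IP>:<Port> address of your Prometheus server.

```python
import json
import urllib.request


def cncc_alert_names(rules_payload, prefix="Cncc"):
    """Return sorted names of alerting rules whose name starts with prefix.

    rules_payload is the decoded JSON body returned by /api/v1/rules.
    """
    names = set()
    for group in rules_payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting" and rule.get("name", "").startswith(prefix):
                names.add(rule["name"])
    return sorted(names)


def fetch_rules(base_url):
    """GET <base_url>/api/v1/rules and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as response:
        return json.load(response)
```

Calling cncc_alert_names(fetch_rules(base_url)) returns an empty list when no CNCC alert rules were loaded, which corresponds to the situation described in the note above.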

8.5 Disabling Alerts

This section explains how to disable the alerts in CNCC.

  1. Edit the occncc_alerting_rules_promha_<version>.yaml file to remove the specific alert.
  2. Remove the complete content of the specific alert from the occncc_alerting_rules_promha_<version>.yaml file.

    cncc_alert_rules_<version>.yaml
    
    For example: If you want to remove CnccIamTotalIngressTrafficRateAboveMinorThreshold alert, remove the complete content:
    
    ## ALERT SAMPLE START##
    - alert: CnccIamTotalIngressTrafficRateAboveMinorThreshold
      annotations:
        description: 'CNCC IAM Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second (current value is: {{ $value }})'
        summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 70 Percent of Max requests per second(1000)'
      expr: sum(rate(oc_ingressgateway_http_requests_total{InstanceIdentifier=~".*cncc-iam_ingressgateway",kubernetes_namespace="cncc"}[2m])) > 0
      labels:
        severity: minor
        oid: "1.3.6.1.4.1.323.5.3.51.1.2.7001"
        namespace: ' {{ $labels.kubernetes_namespace }} '
        podname: ' {{$labels.kubernetes_pod_name}} '
    ## ALERT SAMPLE END##
    
  3. Perform Alert configuration. See CNCC Alert Configuration section for details.
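
Removing the complete content of one alert, as described in step 2, can also be done with a small text filter. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes that each alert starts with a "- alert: <name>" line and that every line belonging to it is indented deeper than that dash. Review the result before applying the file.

```python
def remove_alert(rules_text, alert_name):
    """Return rules_text without the '- alert: <alert_name>' list item.

    The item is assumed to run from its '- alert:' line up to the next
    line that is indented no deeper than the dash, or to the end of the text.
    """
    kept = []
    skip_indent = None  # indentation of the item currently being dropped
    for line in rules_text.splitlines():
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if skip_indent is not None:
            if stripped == "" or indent > skip_indent:
                continue  # still inside the removed item
            skip_indent = None  # reached the next item or an outer key
        if stripped.split("#")[0].strip() == f"- alert: {alert_name}":
            skip_indent = indent
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

If the named alert is not found, the text is returned unchanged, so the filter is safe to run more than once.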

8.6 Configuring SNMP-Notifier

Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:

  1. Run the following command to edit the deployment:
    kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
    Example:
    $ kubectl edit deploy occne-snmp-notifier -n occne-infra
  2. Edit the destination as follows:
    --snmp.destination=<destination_ip>:<destination_port>
    Example:
    --snmp.destination=10.75.203.94:162

8.7 CNC Console MIB Files

Two MIB files are used to generate the traps. The user must update these files along with the alert file in order to fetch the traps in their environment.

occncc_mib_tc_<version>.mib

This file is the CNCC top-level MIB file, where the objects and their data types are defined.

occncc_mib_<version>.mib

This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.

8.8 CNC Console Alerts on OCI

This section provides information about CNC Console Alerts on OCI.

8.8.1 Configure CNC Console Alerts on OCI

Note:

You must ensure that the user has the required roles and permissions to view or modify the alerts. For more information, see the Creating OCI User Management section in the Oracle Communications Cloud Native Core OCI Adaptor, NF Deployment on OCI Guide.

To configure CNC Console Alerts on OCI:
  1. The CNC Console CSAR package occncc_csar_<version>.zip includes alarm files specific to OCI deployment. These files are zipped as occncc_oci_alertrules_<version>.zip and placed in the Scripts directory of the CSAR package.
  2. Unzip the file to get /occncc_oci_alertrules_<version>/occncc_oci and /occncc_oci_alertrules_<version>/occncc_oci_resources. These are the terraform folders to be applied in the OCI console to create CNC Console related alarms.
  3. Review the /occncc_oci_alertrules_<version>/occncc_oci/alarms.tf and /occncc_oci_alertrules_<version>/occncc_oci_resources/alarms.tf files. Before configuring the alerts, edit any parameter values that need to differ from the defaults.
  4. In /occncc_oci_alertrules_<version>/occncc_oci/alarms.tf, k8Namespace is configured as the kubernetes namespace in which CNC Console is deployed. The default value is cncc. Update the /occncc_oci/alarms.tf file to reflect the correct CNC Console kubernetes namespace.
  5. In /occncc_oci_alertrules_<version>/occncc_oci_resources/alarms.tf, namespace is configured as the kubernetes namespace in which CNC Console is deployed. The default value is cncc. Update the /occncc_oci_resources/alarms.tf file to reflect the correct CNC Console kubernetes namespace.
  6. In /occncc_oci_alertrules_<version>/occncc_oci/notifications.tf and /occncc_oci_alertrules_<version>/occncc_oci_resources/notifications.tf, update the subscription email: set the endpoint parameter of resource "email_subscription_1" to the user email address to which alarm trigger notifications are to be sent.

    Example:

    resource "oci_ons_notification_topic" "notification_topic" {
      compartment_id = var.compartment_id
      name = var.topic_name
      description = "Send an email to the subscribed stakeholders"
    }
     
    resource "oci_ons_subscription" "email_subscription_1" {
      compartment_id = var.compartment_id
      endpoint = "xxx.oracle.com"
      protocol = "EMAIL"
      topic_id = oci_ons_notification_topic.notification_topic.id
    }

    To configure two or more email subscriptions, add new entries. For example:

    resource "oci_ons_notification_topic" "notification_topic" {
      compartment_id = var.compartment_id
      name = var.topic_name
      description = "Send an email to the subscribed stakeholders"
    }
     
    resource "oci_ons_subscription" "email_subscription_1" {
      compartment_id = var.compartment_id
      endpoint = "xxx.oracle.com"
      protocol = "EMAIL"
      topic_id = oci_ons_notification_topic.notification_topic.id
    }
     
    resource "oci_ons_subscription" "email_subscription_2" {
      compartment_id = var.compartment_id
      endpoint = "yyy.oracle.com"
      protocol = "EMAIL"
      topic_id = oci_ons_notification_topic.notification_topic.id
    }
  7. After the alarms.tf and notifications.tf files are updated, the /occncc_oci and /occncc_oci_resources terraform folders can be applied in the OCI console to create alarms.
  8. Log in to the OCI Console GUI and search for "Stacks", or click the Hamburger menu, click Developer Services, and then click Stacks in the Resource Manager section. A page opens containing a table with the existing stacks, if any, along with the Create stack button.
  9. To apply the updated terraform folders /occncc_oci and /occncc_oci_resources, click the Create stack button. This opens the Create stack configuration page, which includes three steps:
    1. Stack information: Set the origin of the Terraform configuration to "My configuration" if it is not already selected. In Stack Configuration, select "Folder" as the terraform configuration source, then browse and upload the updated terraform folders (one at a time: /occncc_oci or /occncc_oci_resources). Provide a Name and Description (optional), select in the Create in compartment parameter the compartment in which the terraform needs to be applied and the alarms created, and click Next.
    2. Configure variables:
      1. Compartment Name: Choose the compartment in which metric namespace is present (namespace created for metrics as part of adapters installation).
      2. Metric namespace: Input Namespace in which your Metrics will be populated.

        Note1: In case of /occncc_oci_alertrules_<version>/occncc_oci user should input the metric namespace created for metrics as part of adapters installation.

        Note2: In case of /occncc_oci_alertrules_<version>/occncc_oci_resources metric namespace is prepopulated.

      3. Topic Name: Input name of the topic to be created.

        Note:

        • Topic name must contain fewer than 256 characters. Only alphanumeric characters plus hyphens (-) and underscores (_) are allowed.
        • Topic name should be different for both the terraforms /occncc_oci_alertrules_<version>/occncc_oci and /occncc_oci_alertrules_<version>/occncc_oci_resources when applied on OCI console.
      4. Message Format: Select the format in which the notification will be sent.
      5. Click Next.
    3. Review: Verify your configuration variables and then click Create.
  10. A stack is created when you click Create. Click the Plan button and wait for the Plan job to succeed.
  11. After the Plan job succeeds, click the Apply button. In the pane that opens on the right, select the previously run Plan job under "Apply job plan resolution", and then click Apply.
  12. Wait for the Apply job to succeed. After the Apply job succeeds, the alerts are created.
  13. After the Plan and Apply jobs complete successfully, check for the email notification sent to confirm the topic subscription. Click the "Confirm subscription" link to confirm.
  14. To check the created alarms, click the Hamburger menu, click "Observability and Management", and then click "Alarm Definitions" in the Monitoring section to view all the alarms.

For more details, see Managing alarms in Oracle Cloud Infrastructure Documentation.
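
The topic name constraints given in the Configure variables step (fewer than 256 characters, only alphanumeric characters, hyphens, and underscores, and a different name for each of the two terraform folders) can be checked before creating the stacks. The following Python sketch illustrates such a check. The function names are hypothetical, and the sketch does not call any OCI API.

```python
import re

# Fewer than 256 characters; letters, digits, hyphens, and underscores only.
_TOPIC_NAME = re.compile(r"[A-Za-z0-9_-]{1,255}")


def is_valid_topic_name(name):
    """Return True when name satisfies the documented topic name rules."""
    return _TOPIC_NAME.fullmatch(name) is not None


def check_topic_names(oci_topic, resources_topic):
    """Return a list of problems; an empty list means both names pass.

    oci_topic is the name planned for the occncc_oci stack and
    resources_topic the name planned for the occncc_oci_resources stack.
    """
    problems = []
    for label, name in (("occncc_oci", oci_topic), ("occncc_oci_resources", resources_topic)):
        if not is_valid_topic_name(name):
            problems.append(f"{label}: invalid topic name {name!r}")
    if oci_topic == resources_topic:
        problems.append("both stacks use the same topic name")
    return problems
```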

Delete Created Alarms and Stack (terraforms applied for Alarms creation)

To delete created alarms:
  1. Log in to the OCI Console GUI, click the Hamburger menu, click "Observability and Management", and then click "Alarm Definitions" under the Monitoring section. Make sure the right compartment is selected in the left pane under "List scope". Select the alarms to be deleted, click "Actions", and then click "Delete alarms".
  2. After all the alarms are deleted, search for "Stacks" in the OCI console search bar, or click the Hamburger menu, click Developer Services, and then click Stacks in the Resource Manager section. A page opens containing a table with the existing stacks.
  3. Click the name of each stack that was applied to create the alarms, and then click "Destroy".
  4. Wait for the "Destroy" job to complete and go back to Resource Manager, then Stacks. Click more options (three dots) for the stack on which the destroy job was run, and select "Delete" to delete the stack.

8.8.2 CnccCoreTotalIngressTrafficRateAboveMinorThreshold

Table 8-32 CnccCoreTotalIngressTrafficRateAboveMinorThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured minor threshold of 700 TPS.

Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 70 % of 1000 (Maximum ingress request rate)

Severity minor
Alert details provided CNCC Core Ingress traffic Rate is above the configured minor threshold i.e. 700 requests per second
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared either when the total Ingress Traffic rate falls below the Minor threshold or when the total traffic rate crosses the Major threshold, in which case the CnccCoreTotalIngressTrafficRateAboveMajorThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml file.

Steps:

Reassess why the CNCC Core is receiving additional traffic. If this is unexpected, contact My Oracle Support.

8.8.3 CnccCoreTotalIngressTrafficRateAboveMajorThreshold

Table 8-33 CnccCoreTotalIngressTrafficRateAboveMajorThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured major threshold of 800 TPS.

Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 80 % of 1000 (Maximum ingress request rate)

Severity major
Alert details provided CNCC Core Ingress traffic Rate is above the configured major threshold i.e. 800 requests per second
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared when the total Ingress Traffic rate falls below the Major threshold or when the total traffic rate crosses the Critical threshold, in which case the CnccCoreTotalIngressTrafficRateAboveCriticalThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml file.

Steps:

Reassess why the CNCC Core is receiving additional traffic. If this is unexpected, contact My Oracle Support.

8.8.4 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold

Table 8-34 CnccCoreTotalIngressTrafficRateAboveCriticalThreshold

Trigger Condition

The total CNCC Core Ingress Message rate has crossed the configured critical threshold of 900 TPS.

Default value of this alert trigger point in cncc_alert_rules.yaml is when CNCC Core Ingress Rate crosses 90 % of 1000 (Maximum ingress request rate)

Severity critical
Alert details provided CNCC Core Ingress traffic Rate is above the configured critical threshold i.e. 900 requests per second
Metric Used oc_ingressgateway_http_requests_total
Resolution

The alert is cleared when the Ingress Traffic rate falls below the Critical threshold.

Note: The threshold is configurable in the alerts.yaml file.

Steps:

Reassess why the CNCC Core is receiving additional traffic. If this is unexpected, contact My Oracle Support.
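
The three ingress traffic thresholds are stated as percentages of the maximum ingress request rate (70, 80, and 90 percent of 1000), which gives 700, 800, and 900 TPS. The following Python sketch works through that arithmetic and picks the alert for a given rate. It illustrates the default values only, it is not the OCI alarm query, and the function names are assumptions.

```python
MAX_INGRESS_RATE = 1000  # maximum ingress request rate, in requests per second

# (percent of the maximum, alert name, severity), highest tier first.
TRAFFIC_TIERS = [
    (90, "CnccCoreTotalIngressTrafficRateAboveCriticalThreshold", "critical"),
    (80, "CnccCoreTotalIngressTrafficRateAboveMajorThreshold", "major"),
    (70, "CnccCoreTotalIngressTrafficRateAboveMinorThreshold", "minor"),
]


def threshold_tps(percent, max_rate=MAX_INGRESS_RATE):
    """Convert a percentage of the maximum rate into a TPS threshold."""
    return percent * max_rate // 100


def active_traffic_alert(current_tps, max_rate=MAX_INGRESS_RATE):
    """Return (alert name, severity) for the highest tier exceeded, or None."""
    for percent, name, severity in TRAFFIC_TIERS:
        if current_tps > threshold_tps(percent, max_rate):
            return (name, severity)
    return None
```

If the maximum ingress rate of a deployment differs from 1000, passing it as max_rate shows which TPS values the same percentages correspond to.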

8.8.5 CnccCoreMemoryUsageCrossedMinorThreshold

Table 8-35 CnccCoreMemoryUsageCrossedMinorThreshold

Trigger Condition A pod has reached the configured minor threshold (70%) of its memory resource limits.
Severity minor
Alert details provided CNCC Core Memory Usage for pod has crossed the configured minor threshold (70%) of its limit.
Metric Used

container_memory_usage_bytes

Note: This is a Kubernetes metric used for monitoring container memory usage. If the metric is not available, use a similar metric exposed by the monitoring system.

Resolution

The alert is cleared when the memory utilization falls below the Minor threshold or crosses the Major threshold, in which case the CnccCoreMemoryUsageCrossedMajorThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

8.8.6 CnccCoreMemoryUsageCrossedMajorThreshold

Table 8-36 CnccCoreMemoryUsageCrossedMajorThreshold

Trigger Condition A pod has reached the configured major threshold (80%) of its memory resource limits.
Severity major
Alert details provided CNCC Core Memory Usage for pod has crossed the configured major threshold (80%) of its limit.
Metric Used

container_memory_usage_bytes

Note: This is a Kubernetes metric used for monitoring container memory usage. If the metric is not available, use a similar metric exposed by the monitoring system.

Resolution

The alert is cleared when the memory utilization falls below the Major threshold or crosses the Critical threshold, in which case the CnccCoreMemoryUsageCrossedCriticalThreshold alert shall be raised.

Note: The threshold is configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

8.8.7 CnccCoreMemoryUsageCrossedCriticalThreshold

Table 8-37 CnccCoreMemoryUsageCrossedCriticalThreshold

Trigger Condition A pod has reached the configured critical threshold (90%) of its memory resource limits.
Severity critical
Alert details provided CNCC Core Memory Usage for pod has crossed the configured critical threshold (90%) of its limit.
Metric Used

container_memory_usage_bytes

Note: This is a Kubernetes metric used for monitoring container memory usage. If the metric is not available, use a similar metric exposed by the monitoring system.

Resolution

The alert is cleared when the memory utilization falls below the Critical threshold.

Note: The threshold is configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.

8.8.8 CnccCoreTransactionErrorRateAbovePointOnePercent

Table 8-38 CnccCoreTransactionErrorRateAbovePointOnePercent

Trigger Condition The number of failed transactions is above 0.1 percent of the total transactions.
Severity warning
Alert details provided CNCC Core transaction Error rate is above 0.1 Percent of Total Transactions
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 0.1% of the total transactions, or when the number of failed transactions crosses the 1% threshold, in which case the CnccCoreTransactionErrorRateAboveOnePercent alert shall be raised.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.8.9 CnccCoreTransactionErrorRateAboveOnePercent

Table 8-39 CnccCoreTransactionErrorRateAboveOnePercent

Trigger Condition The number of failed transactions is above 1 percent of the total transactions.
Severity warning
Alert details provided CNCC Core transaction Error rate is above 1 Percent of Total Transactions
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 1% of the total transactions, or when the number of failed transactions crosses the 10% threshold, in which case the CnccCoreTransactionErrorRateAboveTenPercent alert shall be raised.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.8.10 CnccCoreTransactionErrorRateAboveTenPercent

Table 8-40 CnccCoreTransactionErrorRateAboveTenPercent

Trigger Condition The number of failed transactions is above 10 percent of the total transactions.
Severity minor
Alert details provided CNCC Core transaction Error rate is above 10 Percent of Total Transactions
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 10% of the total transactions, or when the number of failed transactions crosses the 25% threshold, in which case the CnccCoreTransactionErrorRateAboveTwentyFivePercent alert shall be raised.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.8.11 CnccCoreTransactionErrorRateAboveTwentyFivePercent

Table 8-41 CnccCoreTransactionErrorRateAboveTwentyFivePercent

Trigger Condition The number of failed transactions is above 25 percent of the total transactions.
Severity major
Alert details provided CNCC Core transaction Error Rate detected above 25 Percent of Total Transactions
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 25% of the total transactions, or when the number of failed transactions crosses the 50% threshold, in which case the CnccCoreTransactionErrorRateAboveFiftyPercent alert shall be raised.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.8.12 CnccCoreTransactionErrorRateAboveFiftyPercent

Table 8-42 CnccCoreTransactionErrorRateAboveFiftyPercent

Trigger Condition The number of failed transactions is above 50 percent of the total transactions.
Severity critical
Alert details provided CNCC Core transaction Error Rate detected above 50 Percent of Total Transactions
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the number of failed transactions falls below 50 percent of the total transactions.

Steps:

  1. Check the service-specific metrics to understand the specific service request errors.
  2. If guidance is required, contact My Oracle Support.

8.8.13 CnccCoreUnauthorizedAccess

Table 8-43 CnccCoreUnauthorizedAccess

Trigger Condition The count of unauthorized accesses goes above the configured value of '3'.
Severity warning
Alert details provided Unauthorized Accesses have been detected in CNCC-Core for request. The configured threshold value is 3 for every 5 min
Metric Used oc_ingressgateway_http_responses_total
Resolution

The alert is cleared when the total unauthorized accesses for a particular user go below the threshold value (default value is '3') in the last 5 min (default value is 5 m).

Note: The threshold and time are configurable in the alerts.yaml file.

If guidance is required, contact My Oracle Support.