OCNADD_POD_CPU_USAGE_ALERT
Table 5-1 OCNADD_POD_CPU_USAGE_ALERT
| Field | Details |
|---|---|
| Triggering Condition | Pod CPU usage is above the configured threshold (default 70%) |
| Severity | Major |
| Description | OCNADD pod high CPU usage detected for a continuous period of 5 minutes |
| Alert Details |
Summary: 'namespace: {{ "{{" }}$labels.namespace}}, podname: {{ "{{" }}$labels.pod}}, timestamp: {{ "{{" }} with query "time()" }}{{ "{{" }} . | first | value | humanizeTimestamp }}{{ "{{" }} end }}: CPU usage is {{ "{{" }} $value | printf "%.2f" }} which is above threshold {{ .Values.global.cluster.cpu_threshold }} % '
Expression: expr: (sum(rate(container_cpu_usage_seconds_total{image!="" , pod=~".*ocnadd.*"}[5m])) by (pod,namespace) > {{ .Values.global.cluster.cpu_threshold }}*2) or (sum(rate(container_cpu_usage_seconds_total{image!="" , pod=~".*kafka.*"}[5m])) by (pod,namespace) > {{ .Values.global.cluster.cpu_threshold }}*2) or (sum(rate(container_cpu_usage_seconds_total{image!="" , pod=~".*zookeeper.*"}[5m])) by (pod,namespace) > {{ .Values.global.cluster.cpu_threshold }}) or (sum(rate(container_cpu_usage_seconds_total{image!="" , pod=~".*egw.*"}[5m])) by (pod,namespace) > {{ .Values.global.cluster.cpu_threshold }}*2) or (sum(rate(container_cpu_usage_seconds_total{image!="" , pod=~".*adapter.*"}[5m])) by (pod,namespace) > {{ .Values.global.cluster.cpu_threshold }}*2) |
| OID | 1.3.6.1.4.1.323.5.3.51.29.4002 |
| Metric Used |
container_cpu_usage_seconds_total. Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric exposed by the monitoring system. |
| Resolution |
The alert is cleared when CPU utilization falls below the critical threshold. Note: The threshold is configurable in the
|
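
The fields in Table 5-1 can be read together as a single Prometheus alerting rule. The following is a minimal sketch of how the rule might look after Helm rendering, assuming the default `cpu_threshold` of 70 and showing only the `.*ocnadd.*` clause of the expression. In the chart source, `{{ "{{" }}` renders to a literal `{{`, so that Prometheus rather than Helm evaluates `$labels` and `$value`. The group name, the `for` duration, and the label and annotation keys are assumptions made for illustration and are not taken from the product chart; the timestamp portion of the summary is omitted for brevity.

```yaml
# Illustrative sketch only, not the shipped rule file.
# Assumes cpu_threshold renders to its default value of 70.
groups:
  - name: ocnadd-alert-sketch          # hypothetical group name
    rules:
      - alert: OCNADD_POD_CPU_USAGE_ALERT
        # Only the ocnadd clause is shown; the full expression ORs similar
        # clauses for the kafka, zookeeper, egw, and adapter pods.
        expr: |
          sum(rate(container_cpu_usage_seconds_total{image!="", pod=~".*ocnadd.*"}[5m]))
            by (pod, namespace) > 70*2
        for: 5m                        # assumed from the 5-minute description
        labels:
          severity: major              # assumed label key
          oid: "1.3.6.1.4.1.323.5.3.51.29.4002"   # assumed label key
        annotations:
          summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}: CPU usage is {{ $value | printf "%.2f" }} which is above threshold 70 %'
```

Rendering the chart with a different `global.cluster.cpu_threshold` value would change only the numeric literal in the comparison and in the summary text.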