7 OCCM Alerts
This section describes the alerts available for OCCM.
Note:
The alert file is packaged with the OCCM CSAR package.
- Review the occm_alerting_rules_promha_<version>.yaml file and, if any parameter needs to differ from its default value, edit it in the file before configuring the alerts. See the table above for details.
- kubernetes_namespace must be set to the Kubernetes namespace in which OCCM is deployed. The default value is occm. Update the occm_alerting_rules_promha_<version>.yaml file to reflect the correct OCCM Kubernetes namespace.
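The namespace update can be sketched as a plain text substitution. This is only an illustration: it assumes the namespace appears in the rule expressions as a `namespace="..."` label matcher, as in the sample OccmServiceDown rule shown later in this chapter, and the function name `set_rule_namespace` is hypothetical.

```python
import re


def set_rule_namespace(rules_text: str, old_ns: str, new_ns: str) -> str:
    """Replace namespace="<old_ns>" label matchers with namespace="<new_ns>".

    Assumption: the rule file references the namespace as a quoted label
    matcher inside PromQL expressions. Other occurrences are left untouched.
    """
    pattern = re.compile(r'namespace="' + re.escape(old_ns) + r'"')
    # A callable replacement avoids any escape processing of new_ns.
    return pattern.sub(lambda _match: f'namespace="{new_ns}"', rules_text)


# Expression copied from the sample rule in this chapter.
sample = (
    'expr: absent(up{pod=~".*occm.*", namespace="occm-ns"}) '
    'or (up{pod=~".*occm.*", namespace="occm-ns"}) == 0'
)
updated = set_rule_namespace(sample, "occm-ns", "my-occm")
print(updated)
```

Review the resulting file before applying it, because a text substitution cannot tell whether every match is one you intend to change.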
Table 7-1 Alert Levels or Severity Types
Alert Level or Severity Type | Definition |
---|---|
Critical | Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the OCCM service. |
Major | Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the OCCM service. |
Minor | Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the OCCM service. |
Info or Warn (Informational) | Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the OCCM service. |
7.1 OccmCmpIdentityCertExpirationMinor
Table 7-2 OccmCmpIdentityCertExpirationMinor
Field | Details |
---|---|
Description | CMP Identity (OCCM) certificate is about to expire.
The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within 90 days. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 90 days |
Severity | Minor |
Condition | The CMP Identity (OCCM) certificate will expire within 90 days. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7001 |
Metric Used | occm_cmp_identity_cert_expiration_seconds |
Recommended Actions |
This alert indicates that the certificate will expire within 90 days. The alert is cleared when the certificate is renewed so that its remaining validity is outside the minor threshold, or when the remaining validity crosses the major threshold, in which case the OccmCmpIdentityCertExpirationMajor alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.2 OccmCmpIdentityCertExpirationMajor
Table 7-3 OccmCmpIdentityCertExpirationMajor
Field | Details |
---|---|
Description | CMP Identity (OCCM) certificate is about to expire.
The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within 30 days. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 30 days |
Severity | Major |
Condition | The CMP Identity (OCCM) certificate will expire within 30 days. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7001 |
Metric Used | occm_cmp_identity_cert_expiration_seconds |
Recommended Actions |
This alert indicates that the certificate will expire within 30 days. The alert is cleared when the certificate is renewed or when the remaining validity crosses the critical threshold, in which case the OccmCmpIdentityCertExpirationCritical alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.3 OccmCmpIdentityCertExpirationCritical
Table 7-4 OccmCmpIdentityCertExpirationCritical
Field | Details |
---|---|
Description | CMP Identity (OCCM) certificate is about to expire.
The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within one week. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 1 week |
Severity | Critical |
Condition | The CMP Identity (OCCM) certificate will expire within one week. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7001 |
Metric Used | occm_cmp_identity_cert_expiration_seconds |
Recommended Actions |
This alert indicates that the certificate will expire within one week. The alert is cleared when the certificate is renewed. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.4 OccmCmpIdentityCertExpired
Table 7-5 OccmCmpIdentityCertExpired
Field | Details |
---|---|
Description | The alert is raised when the certificate expires, which also triggers certificate recreation. If the recreation succeeds, the alert is cleared automatically; otherwise, the operator has to clear the alert manually.
The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} is expired. |
Summary | 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} is expired' |
Severity | Critical |
Condition | The CMP Identity (OCCM) certificate has expired. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7002 |
Metric Used | occm_cmp_identity_cert_expiration_seconds |
Recommended Actions |
This alert indicates that the certificate has expired. The alert is cleared when the certificate is recreated. Steps:
|
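The four CMP Identity certificate alerts form a tiered scheme in which at most one alert applies at a time. The sketch below illustrates that tiering under stated assumptions: it takes the remaining validity in seconds as input (how occm_cmp_identity_cert_expiration_seconds encodes expiry is not described here), it uses the 90-day, 30-day, and one-week values from the tables above, and the boundary handling (`<=`) is a guess. The actual thresholds come from the alert rules file.

```python
from typing import Optional

DAY = 24 * 60 * 60  # seconds in one day

# Values quoted in the alert tables; the rule file may override them.
MINOR_WINDOW = 90 * DAY
MAJOR_WINDOW = 30 * DAY
CRITICAL_WINDOW = 7 * DAY


def cert_expiry_alert(seconds_remaining: float) -> Optional[str]:
    """Return the name of the alert expected for a remaining validity.

    Assumption: seconds_remaining is the time left before expiry, and a
    value of zero or less means the certificate has already expired.
    """
    if seconds_remaining <= 0:
        return "OccmCmpIdentityCertExpired"
    if seconds_remaining <= CRITICAL_WINDOW:
        return "OccmCmpIdentityCertExpirationCritical"
    if seconds_remaining <= MAJOR_WINDOW:
        return "OccmCmpIdentityCertExpirationMajor"
    if seconds_remaining <= MINOR_WINDOW:
        return "OccmCmpIdentityCertExpirationMinor"
    return None  # more than 90 days left: no alert


for days in (120, 60, 20, 3, -1):
    print(days, cert_expiry_alert(days * DAY))
```

Checking the most severe tier first is what makes a lower-severity alert clear when the next threshold is crossed, as the Recommended Actions describe.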
7.5 OccmEndEntityCertExpirationMinor
Table 7-6 OccmEndEntityCertExpirationMinor
Field | Details |
---|---|
Description | End Entity (NF) certificate is about to expire.
The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within 90 days. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 90 days |
Severity | Minor |
Condition | The End Entity (NF) certificate will expire within 90 days. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7003 |
Metric Used | occm_end_entity_cert_expiration_seconds |
Recommended Actions |
This alert indicates that the certificate will expire within 90 days. The alert is cleared when the certificate is renewed so that its remaining validity is outside the minor threshold, or when the remaining validity crosses the major threshold, in which case the OccmEndEntityCertExpirationMajor alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.6 OccmEndEntityCertExpirationMajor
Table 7-7 OccmEndEntityCertExpirationMajor
Field | Details |
---|---|
Description | End Entity (NF) certificate is about to expire.
The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within 30 days. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 30 days. |
Severity | Major |
Condition | The End Entity (NF) certificate will expire within 30 days. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7003 |
Metric Used | occm_end_entity_cert_expiration_seconds |
Recommended Actions |
This alert indicates that the certificate will expire within 30 days. The alert is cleared when the certificate is renewed or when the remaining validity crosses the critical threshold, in which case the OccmEndEntityCertExpirationCritical alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.7 OccmEndEntityCertExpirationCritical
Table 7-8 OccmEndEntityCertExpirationCritical
Field | Details |
---|---|
Description | End Entity (NF) certificate is about to expire.
The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within one week. |
Summary | 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 1 week' |
Severity | Critical |
Condition | The End Entity (NF) certificate will expire within one week. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7003 |
Metric Used | occm_end_entity_cert_expiration_seconds |
Recommended Actions |
This alert indicates that the certificate will expire within one week. The alert is cleared when the certificate is renewed. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.8 OccmEndEntityCertExpired
Table 7-9 OccmEndEntityCertExpired
Field | Details |
---|---|
Description | The alert is raised when the certificate expires, which also triggers certificate recreation. If the recreation succeeds, the alert is cleared automatically; otherwise, the operator has to clear the alert manually.
The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} is expired. |
Summary | 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} is expired' |
Severity | Critical |
Condition | End Entity (NF) certificate has expired. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7009 |
Metric Used | occm_end_entity_cert_expiration_seconds |
Recommended Actions |
This alert indicates that the certificate has expired. The alert is cleared when the certificate is recreated. Steps:
|
7.9 OccmServiceDown
Table 7-10 OccmServiceDown
Field | Details |
---|---|
Description | OCCM Service Down Alert
New certificates will not be created, and existing ones can not be renewed until OCCM is back |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: OCCM service is down |
Severity | Critical |
Condition | The pods of the OCCM service are unavailable. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7004 |
Metric Used | up
Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system. |
Recommended Actions |
The alert is cleared when the OCCM service is available. Steps:
|
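The sample OccmServiceDown rule in the Disabling Alerts section uses the expression `absent(up{...}) or (up{...}) == 0`. The sketch below mirrors that condition in plain Python to show when it holds: either no `up` series exists for the OCCM pods, or at least one series reports 0. The function name and the list-of-samples input are illustrative assumptions, not part of OCCM or Prometheus.

```python
from typing import Sequence


def occm_service_down(up_samples: Sequence[float]) -> bool:
    """Mirror of `absent(up{...}) or up{...} == 0` for a set of samples.

    up_samples holds the current `up` value of every series matching the
    OCCM pod selector (1 means the scrape succeeded, 0 means it failed).
    """
    if not up_samples:
        # absent(...): no matching series at all, so the alert fires.
        return True
    # up == 0: the alert fires for any series whose value is 0.
    return any(value == 0 for value in up_samples)


print(occm_service_down([]))      # no series scraped
print(occm_service_down([1, 1]))  # all pods up
print(occm_service_down([1, 0]))  # one pod down
```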
7.10 OccmMemoryUsageMinorThreshold
Table 7-11 OccmMemoryUsageMinorThreshold
Field | Details |
---|---|
Description | OCCM Memory Usage Alert
OCCM Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory Usage of pod exceeded 70% of its limit. |
Severity | Minor |
Condition | A pod has reached the configured minor threshold (70%) of its memory resource limits. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7005 |
Metric Used |
container_memory_usage_bytes
Note: This is a Kubernetes metric that reports container memory usage. If the metric is not available, use a similar metric exposed by the monitoring system. |
Recommended Actions |
The alert is cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the OccmMemoryUsageMajorThreshold alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.11 OccmMemoryUsageMajorThreshold
Table 7-12 OccmMemoryUsageMajorThreshold
Field | Details |
---|---|
Description | OCCM Memory Usage Alert
OCCM Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value={{ $value }}) of its limit. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory Usage of pod exceeded 80% of its limit. |
Severity | Major |
Condition | A pod has reached the configured major threshold (80%) of its memory resource limits. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7005 |
Metric Used |
container_memory_usage_bytes
Note: This is a Kubernetes metric that reports container memory usage. If the metric is not available, use a similar metric exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the OccmMemoryUsageCriticalThreshold alert is raised.
Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.12 OccmMemoryUsageCriticalThreshold
Table 7-13 OccmMemoryUsageCriticalThreshold
Field | Details |
---|---|
Description | OCCM Memory Usage Alert
OCCM Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value={{ $value }}) of its limit. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory Usage of pod exceeded 90% of its limit. |
Severity | Critical |
Condition | A pod has reached the configured critical threshold (90%) of its memory resource limits. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7005 |
Metric Used |
container_memory_usage_bytes
Note: This is a Kubernetes metric that reports container memory usage. If the metric is not available, use a similar metric exposed by the monitoring system. |
Recommended Actions | The alert is cleared when the memory utilization falls below the critical threshold.
Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
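The three memory alerts compare a pod's memory usage with its limit at 70%, 80%, and 90%. The following sketch illustrates how those tiers relate. It is not the rule expression itself: the function name is hypothetical, the limit is passed in as a parameter because the metric that supplies it is not named here, and the strict `>` comparison is an assumption based on the word "exceeded" in the summaries.

```python
from typing import Optional


def memory_alert(
    usage_bytes: float,
    limit_bytes: float,
    minor: float = 70.0,
    major: float = 80.0,
    critical: float = 90.0,
) -> Optional[str]:
    """Return the memory alert expected for a pod, or None if below 70%.

    usage_bytes corresponds to container_memory_usage_bytes; limit_bytes is
    the pod's configured memory limit (source of this value is assumed).
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    percent = usage_bytes / limit_bytes * 100
    # Check the highest tier first so that only one alert applies at a time.
    if percent > critical:
        return "OccmMemoryUsageCriticalThreshold"
    if percent > major:
        return "OccmMemoryUsageMajorThreshold"
    if percent > minor:
        return "OccmMemoryUsageMinorThreshold"
    return None


GIB = 1024 ** 3
print(memory_alert(0.75 * GIB, 1 * GIB))
print(memory_alert(0.95 * GIB, 1 * GIB))
```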
7.13 OccmCPUUsageMinorThreshold
Table 7-14 OccmCPUUsageMinorThreshold
Field | Details |
---|---|
Description | OCCM CPU Usage Alert
OCCM Pod {{$labels.pod}} has high CPU usage detected. |
Summary | namespace: {{ $labels.namespace}}, podname: {{ $labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: CPU usage is {{ $value | printf "%.2f" }} which is usage is above 70% (current value is: {{ $value }}) |
Severity | Minor |
Condition | CPU usage is above 70% |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7006 |
Metric Used | container_cpu_usage_seconds_total |
Recommended Actions |
This alert indicates that CPU usage is above 70%. The alert is cleared when the CPU usage falls below the minor threshold. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.14 OccmCMPFailureMinor
Table 7-15 OccmCMPFailureMinor
Field | Details |
---|---|
Description | OCCM CMP Command Execution Failure Alert
The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}. |
Severity | Minor |
Condition | The certificate request has failed while executing CMP commands. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7007 |
Metric Used | occm_cmp_responses_total |
Recommended Actions |
This alert indicates that the rate of certificate failures caused by CMP command execution errors has crossed the threshold. The alert is cleared when this rate falls below the minor threshold or crosses the major threshold, in which case the OccmCMPFailureMajor alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.15 OccmCMPFailureMajor
Table 7-16 OccmCMPFailureMajor
Field | Details |
---|---|
Description | OCCM CMP Command Execution Failure Alert
The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}. |
Severity | Major |
Condition | The certificate request has failed while executing CMP commands. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7007 |
Metric Used | occm_cmp_responses_total |
Recommended Actions |
This alert indicates that the rate of certificate failures caused by CMP command execution errors has crossed the threshold. The alert is cleared when this rate falls below the major threshold or crosses the critical threshold, in which case the OccmCMPFailureCritical alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.16 OccmCMPFailureCritical
Table 7-17 OccmCMPFailureCritical
Field | Details |
---|---|
Description | OCCM CMP Command Execution Failure Alert
The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}. |
Severity | Critical |
Condition | The certificate request has failed while executing CMP commands. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7007 |
Metric Used | occm_cmp_responses_total |
Recommended Actions |
This alert indicates that the rate of certificate failures caused by CMP command execution errors has crossed the threshold. The alert is cleared when this rate falls below the critical threshold. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.17 OccmFailureMinor
Table 7-18 OccmFailureMinor
Field | Details |
---|---|
Description | OCCM Internal Failure Alert
The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}. |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}. |
Severity | Minor |
Condition | Certificate creation has failed. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7008 |
Metric Used | occm_cert_request_status_total |
Recommended Actions |
This alert indicates that the rate of OCCM errors has crossed the threshold. The alert is cleared when the rate of OCCM errors falls below the minor threshold or crosses the major threshold, in which case the OccmFailureMajor alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.18 OccmFailureMajor
Table 7-19 OccmFailureMajor
Field | Details |
---|---|
Description | OCCM Internal Failure Alert
The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}. |
Summary |
namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}. |
Severity | Major |
Condition | Certificate creation has failed. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7008 |
Metric Used | occm_cert_request_status_total |
Recommended Actions |
This alert indicates that the rate of OCCM errors has crossed the threshold. The alert is cleared when the rate of OCCM errors falls below the major threshold or crosses the critical threshold, in which case the OccmFailureCritical alert is raised. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.19 OccmFailureCritical
Table 7-20 OccmFailureCritical
Field | Details |
---|---|
Description | OCCM Internal Failure Alert
The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}. |
Summary |
namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}. |
Severity | Critical |
Condition | Certificate creation has failed. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7008 |
Metric Used | occm_cert_request_status_total |
Recommended Actions |
This alert indicates that the rate of OCCM errors has crossed the threshold. The alert is cleared when the rate of OCCM errors falls below the critical threshold. Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file. Steps:
|
7.20 OccmInputSecretModifyMajor
Table 7-21 OccmInputSecretModifyMajor
Field | Details |
---|---|
Description | Input secret is modified by non-OCCM user
The Secret {{$labels.secret}} in {{$labels.secretNamespace}} is modified by non-occm user, which is used by {{$labels.name}}. |
Summary | 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The Secret {{$labels.secret}} in {{$labels.secretNamespace}} is modified by non-occm user, which is used by {{$labels.name}} and {{$labels.type}}.' |
Severity | Major |
Condition | Input secrets are modified by non-OCCM users or by the operator manually. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7010 |
Metric Used | occm_secret_event_total |
Recommended Actions |
This alert indicates that the input secret was modified by a non-OCCM user. Steps:
|
7.21 OccmOutputSecretModifyMinor
Table 7-22 OccmOutputSecretModifyMinor
Field | Details |
---|---|
Description | Output secret is modified by non-OCCM user
The Secret {{$labels.secret}} in {{$labels.secretNamespace}} is modified by non-occm user, which is used by {{$labels.name}}. |
Summary | 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The Secret {{$labels.secret}} in {{$labels.secretNamespace}} is modified by non-occm user, which is used by {{$labels.name}} and {{$labels.type}}.' |
Severity | Minor |
Condition | Output secrets are modified by non-OCCM users or by the operator manually. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7011 |
Metric Used | occm_secret_event_total |
Recommended Actions |
This alert indicates that the output secret was modified by a non-OCCM user. Steps:
|
7.22 OccmK8sResourceDeleteMajor
Table 7-23 OccmK8sResourceDeleteMajor
Field | Details |
---|---|
Description | Kubernetes resource (secret or namespace) is deleted by non-OCCM user
The Kubernetes resource is deleted, which is used in {{$labels.name}} of type {{$labels.type}}. K8s resources, secretNamespace: {{$labels.secretNamespace}} and secret: {{$labels.secret}} |
Summary | namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The k8s resource is deleted, which is used in {{$labels.name}} of type {{$labels.type}}. K8s resources, namespace: {{$labels.secretNamespace}} and secret: {{$labels.secret}}. |
Severity | Major |
Condition | Kubernetes resources (secret or namespace) are deleted by non-OCCM users or by the operator manually. |
OID | 1.3.6.1.4.1.323.5.3.54.1.2.7012 |
Metric Used | occm_secret_event_total |
Recommended Actions |
This alert indicates that Kubernetes resources (secret or namespace) were deleted by a non-OCCM user. Steps:
|
7.23 OCCM Alert and MIB Configuration in Prometheus
CNE supporting Prometheus HA
This section describes the measurement-based alert rules configuration for OCCM in Prometheus. You must use the updated occm_alerting_rules_promha_<version>.yaml file.
$ kubectl apply -f occm_alerting_rules_promha_<version>.yaml
Disabling Alerts
- Edit the occm_alerting_rules_promha_<version>.yaml file to remove the specific alert.
- Remove the complete content of the specific alert from the occm_alerting_rules_promha_<version>.yaml file. For example, if you want to remove the OccmServiceDown alert, remove the complete content:
  ## ALERT SAMPLE START##
  - alert: OccmServiceDown
    annotations:
      description: 'New certificates will not be created, and existing ones can not be renewed until OCCM is back'
      summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: OCCM service is down'
    expr: absent(up{pod=~".*occm.*", namespace="occm-ns"}) or (up{pod=~".*occm.*", namespace="occm-ns"}) == 0
    labels:
      severity: critical
      oid: "1.3.6.1.4.1.323.5.3.54.1.2.7004"
      namespace: ' {{ $labels.namespace }} '
      podname: ' {{$labels.pod}} '
  ## ALERT SAMPLE END##
- Perform Alert configuration.
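Removing an alert amounts to dropping one entry from the list of rules. The sketch below shows that operation on an already parsed rule list. It is an illustration only: the `drop_alert` helper is hypothetical, and loading and saving the YAML file (for example with a YAML library) is deliberately left out because the surrounding file layout is not shown in this section.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]


def drop_alert(rules: List[Rule], alert_name: str) -> List[Rule]:
    """Return a copy of the rule list without the named alert.

    Each rule is expected to be a mapping with an "alert" key, as in the
    OccmServiceDown sample above. Other entries are kept unchanged.
    """
    return [rule for rule in rules if rule.get("alert") != alert_name]


# Abbreviated stand-ins for parsed rules; real entries carry more fields.
rules = [
    {"alert": "OccmServiceDown", "labels": {"severity": "critical"}},
    {"alert": "OccmMemoryUsageMinorThreshold", "labels": {"severity": "minor"}},
]
remaining = drop_alert(rules, "OccmServiceDown")
print([rule["alert"] for rule in remaining])
```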
Validating Alerts
Configure and validate the alerts in the Prometheus server. Refer to OCCM Alert Configuration for the procedure to configure the alerts.
After configuring the alerts in the Prometheus server, verify them using the following steps:
- Open the Prometheus server from your browser using <IP>:<Port>.
- Navigate to Status and then Rules.
- Search for OCCM. The list of OCCM alerts is displayed.
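The same check can be scripted against the Prometheus HTTP API, which lists loaded rules at `/api/v1/rules`. The sketch below only parses a response of that shape and works on an inline sample, so that it runs without a server. The function name is hypothetical, and the assumption that OCCM alert names start with "Occm" is taken from the alert names in this chapter.

```python
from typing import Any, Dict, List


def loaded_occm_alerts(rules_response: Dict[str, Any]) -> List[str]:
    """List alerting rules whose name starts with "occm" (case-insensitive).

    rules_response is the decoded JSON body of GET /api/v1/rules.
    """
    names = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            name = rule.get("name", "")
            if rule.get("type") == "alerting" and name.lower().startswith("occm"):
                names.append(name)
    return sorted(names)


# Trimmed sample in the shape of a /api/v1/rules response.
sample_response = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "example-group",  # hypothetical group name
                "rules": [
                    {"name": "OccmServiceDown", "type": "alerting"},
                    {"name": "OccmCmpIdentityCertExpired", "type": "alerting"},
                    {"name": "some_recording_rule", "type": "recording"},
                ],
            }
        ]
    },
}
print(loaded_occm_alerts(sample_response))
```

An empty result suggests that the rule file was not loaded.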
Note:
If you are unable to see the alerts, the alert file has not loaded in a format that the Prometheus server accepts. Modify the file and try again.
Configuring SNMP-Notifier
- Run the following command to edit the deployment:
$ kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
Example:
$ kubectl edit deploy occne-snmp-notifier -n occne-infra
- Edit the destination as follows:
--snmp.destination=<destination_ip>:<destination_port>
Example:
--snmp.destination=10.75.203.94:162
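Before editing the deployment, it can help to validate the destination value. The sketch below builds the `--snmp.destination` argument after checking the address and port. The helper name is hypothetical, and restricting the address to IPv4 is an assumption based on the example above.

```python
import ipaddress


def snmp_destination_arg(ip: str, port: int) -> str:
    """Build the --snmp.destination argument after basic validation."""
    # Raises ValueError if ip is not a valid IPv4 address.
    ipaddress.IPv4Address(ip)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"--snmp.destination={ip}:{port}"


print(snmp_destination_arg("10.75.203.94", 162))
```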
MIB Files for OCCM
Two MIB files are used to generate the traps. You must update these files along with the alert file in order to fetch the traps in your environment.
- occm_mib_tc_<version>.mib: This is the OCCM top-level MIB file, where the objects and their data types are defined.
- occm_mib_<version>.mib: This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.
Note:
The MIB files are packaged with the OCCM CSAR package. Download the files from MOS. For more information, see Oracle Communications Cloud Native Core, Certificate Management Installation, Upgrade, and Fault Recovery Guide.