7 OCCM Alerts

This section describes the alerts available for OCCM.

Note:

The alert file is packaged with the OCCM CSAR package.

  • Review the occm_alerting_rules_promha_<version>.yaml file and, if any default values need to change, edit the corresponding parameters in the file before configuring the alerts. See the table above for details.
  • kubernetes_namespace must be set to the Kubernetes namespace in which OCCM is deployed. The default value is occm. Update the occm_alerting_rules_promha_<version>.yaml file to reflect the correct OCCM Kubernetes namespace.
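As an illustration of the namespace update described above, the following Python sketch rewrites equality matchers such as namespace="occm-ns" in the text of the rules file. The function name set_namespace is hypothetical, and the sketch assumes that the namespace appears as a label="value" matcher inside each rule expression; verify this against the packaged file before relying on it.

```python
import re


def set_namespace(rules_text: str, new_namespace: str, label: str = "namespace") -> str:
    """Return rules_text with every `<label>="..."` equality matcher set to new_namespace.

    Assumption: the namespace is written as a PromQL equality matcher,
    for example namespace="occm-ns". Other matcher types are left untouched.
    """
    # \b keeps "namespace" from matching inside a longer label name
    # such as kubernetes_namespace.
    pattern = re.compile(r'\b(' + re.escape(label) + r')="[^"]*"')
    return pattern.sub(lambda m: f'{m.group(1)}="{new_namespace}"', rules_text)


if __name__ == "__main__":
    expr = 'absent(up{pod=~".*occm.*", namespace="occm-ns"}) or (up{pod=~".*occm.*", namespace="occm-ns"}) == 0'
    print(set_namespace(expr, "occm"))
```

Pass label="kubernetes_namespace" if that is the label name used in your copy of the file.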

Table 7-1 Alert Levels or Severity Types

Alert Level / Severity Type Definition
Critical Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the OCCM service.
Major Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the OCCM service.
Minor Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the OCCM service.
Info or Warn (Informational) Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the OCCM service.

7.1 OccmCmpIdentityCertExpirationMinor

Table 7-2 OccmCmpIdentityCertExpirationMinor

Field Details
Description CMP Identity (OCCM) certificate is about to expire.

The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within 90 days.

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 90 days
Severity Minor
Condition The CMP Identity (OCCM) certificate will expire within 90 days.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7001
Metric Used occm_cmp_identity_cert_expiration_seconds
Recommended Actions

Indicates that the certificate will expire within 90 days. The alert is cleared when the certificate is renewed so that its expiry no longer falls within the minor threshold, or when the time to expiry crosses the major threshold, in which case the OccmCmpIdentityCertExpirationMajor alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Check the certificate configuration and renew the certificate before the expiry date.
  2. If this is unexpected, contact My Oracle Support.

7.2 OccmCmpIdentityCertExpirationMajor

Table 7-3 OccmCmpIdentityCertExpirationMajor

Field Details
Description CMP Identity (OCCM) certificate is about to expire.

The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within 30 days.

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 30 days
Severity Major
Condition The CMP Identity (OCCM) certificate will expire within 30 days.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7001
Metric Used occm_cmp_identity_cert_expiration_seconds
Recommended Actions

Indicates that the certificate will expire within 30 days. The alert is cleared when the certificate is renewed or when the time to expiry crosses the critical threshold, in which case the OccmCmpIdentityCertExpirationCritical alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Check the certificate configuration and renew the certificate before the expiry date.
  2. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Perform the resolution steps depending on the failure reason.
  4. If this is unexpected, contact My Oracle Support.

7.3 OccmCmpIdentityCertExpirationCritical

Table 7-4 OccmCmpIdentityCertExpirationCritical

Field Details
Description CMP Identity (OCCM) certificate is about to expire.

The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within one week.

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 1 week
Severity Critical
Condition The CMP Identity (OCCM) certificate will expire within one week.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7001
Metric Used occm_cmp_identity_cert_expiration_seconds
Recommended Actions

Indicates that the certificate will expire within one week. The alert is cleared when the certificate is renewed.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Check the certificate configuration and renew the certificate before the expiry date.
  2. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Perform the resolution steps depending on the failure reason.
  4. If this is unexpected, contact My Oracle Support.

7.4 OccmCmpIdentityCertExpired

Table 7-5 OccmCmpIdentityCertExpired

Field Details
Description This alert is raised when the certificate expires, after which recreation is triggered. If the recreation succeeds, the alert is cleared automatically; otherwise, the operator must clear the alert manually.

The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} is expired.

Summary 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} is expired'
Severity Critical
Condition The CMP Identity (OCCM) certificate has expired.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7002
Metric Used occm_cmp_identity_cert_expiration_seconds
Recommended Actions

Indicates that the certificate has expired. The alert is cleared when the certificate is recreated.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Perform the following steps if recreation fails and the certificates have expired:
    1. Check the logs to identify the root cause. A possible cause is a CA connection failure. In this case, the operator must manually configure the CMP Identity certificate.
    2. Get the Kubernetes secret name corresponding to the OCCM key and certificate location from the mapped issuer. This information is present under CMP client authentication options for Other Cert section of the issuer.
    3. Manually create the CMP Identity (OCCM) certificate and update the secret.
    4. Manual recreation of the certificate can be triggered when the CA connection resumes.
    5. To renew an expired certificate, see the "Expired Certificate Detection" section in Oracle Communications Cloud Native Core, Certificate Management Troubleshooting Guide.
  3. If this is unexpected, contact My Oracle Support.
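The four CMP Identity alerts in sections 7.1 to 7.4 form a ladder based on the time remaining until the certificate expires. The following Python sketch shows that ladder using the default thresholds stated in the alert conditions (90 days, 30 days, one week, expired). It is only an illustration: the function name expiry_alert is hypothetical, the input is assumed to be the remaining validity in seconds, and the exact comparison (inclusive or exclusive) used by the packaged rules is not stated in this section.

```python
from datetime import timedelta
from typing import Optional

# Default thresholds taken from the alert conditions in sections 7.1 to 7.3.
MINOR_SECONDS = timedelta(days=90).total_seconds()
MAJOR_SECONDS = timedelta(days=30).total_seconds()
CRITICAL_SECONDS = timedelta(weeks=1).total_seconds()


def expiry_alert(seconds_to_expiry: float) -> Optional[str]:
    """Return the alert expected for the given remaining validity, or None.

    Assumption: boundaries are treated as inclusive; the real rule
    expressions may differ.
    """
    if seconds_to_expiry <= 0:
        return "OccmCmpIdentityCertExpired"
    if seconds_to_expiry <= CRITICAL_SECONDS:
        return "OccmCmpIdentityCertExpirationCritical"
    if seconds_to_expiry <= MAJOR_SECONDS:
        return "OccmCmpIdentityCertExpirationMajor"
    if seconds_to_expiry <= MINOR_SECONDS:
        return "OccmCmpIdentityCertExpirationMinor"
    return None


if __name__ == "__main__":
    day = 86400
    for days_left in (120, 60, 10, 3, -1):
        print(days_left, expiry_alert(days_left * day))
```

Only one tier is active at a time, which matches the clearing behavior described for the Minor and Major alerts: when a higher tier is reached, the lower alert is cleared and the higher one is raised.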

7.5 OccmEndEntityCertExpirationMinor

Table 7-6 OccmEndEntityCertExpirationMinor

Field Details
Description End Entity (NF) certificate is about to expire.

The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within 90 days.

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 90 days
Severity Minor
Condition The End Entity (NF) certificate will expire within 90 days.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7003
Metric Used occm_end_entity_cert_expiration_seconds
Recommended Actions

Indicates that the certificate will expire within 90 days. The alert is cleared when the certificate is renewed so that its expiry no longer falls within the minor threshold, or when the time to expiry crosses the major threshold, in which case the OccmEndEntityCertExpirationMajor alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Check the certificate configuration and renew the certificate before the expiry date.
  2. If this is unexpected, contact My Oracle Support.

7.6 OccmEndEntityCertExpirationMajor

Table 7-7 OccmEndEntityCertExpirationMajor

Field Details
Description End Entity (NF) certificate is about to expire.

The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within 30 days.

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 30 days.
Severity Major
Condition The End Entity (NF) certificate will expire within 30 days.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7003
Metric Used occm_end_entity_cert_expiration_seconds
Recommended Actions

Indicates that the certificate will expire within 30 days. The alert is cleared when the certificate is renewed or when the time to expiry crosses the critical threshold, in which case the OccmEndEntityCertExpirationCritical alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Check the certificate configuration and renew the certificate before the expiry date.
  2. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Perform the resolution steps depending on the failure reason.
  4. If this is unexpected, contact My Oracle Support.

7.7 OccmEndEntityCertExpirationCritical

Table 7-8 OccmEndEntityCertExpirationCritical

Field Details
Description End Entity (NF) certificate is about to expire.

The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire within one week.

Summary 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} will expire soon within 1 week'
Severity Critical
Condition The End Entity (NF) certificate will expire within one week.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7003
Metric Used occm_end_entity_cert_expiration_seconds
Recommended Actions

Indicates that the certificate will expire within one week. The alert is cleared when the certificate is renewed.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Check the certificate configuration and renew the certificate before the expiry date.
  2. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Perform the resolution steps depending on the failure reason.
  4. If this is unexpected, contact My Oracle Support.

7.8 OccmEndEntityCertExpired

Table 7-9 OccmEndEntityCertExpired

Field Details
Description This alert is raised when the certificate expires, after which recreation is triggered. If the recreation succeeds, the alert is cleared automatically; otherwise, the operator must clear the alert manually.

The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} is expired

Summary 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} for {{$labels.certPurpose}} is expired'
Severity Critical
Condition The End Entity (NF) certificate has expired.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7009
Metric Used occm_end_entity_cert_expiration_seconds
Recommended Actions

Indicates that the certificate has expired. The alert is cleared when the certificate is recreated.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Perform the following steps if recreation fails and the certificates have expired:
    1. Check the logs to identify the root cause. A possible cause is a CA connection failure.
    2. As a resolution, perform the recreate operation when the CA is accessible. The alert is cleared once recreation is successful.
    3. If the CA is still down, manually create the End Entity (NF) certificate and update the details in the secret, which is automatically monitored by OCCM.
    4. Manual recreation of the certificate can be triggered when the CA connection resumes.
    5. To renew an expired certificate, see the "Expired Certificate Detection" section in Oracle Communications Cloud Native Core, Certificate Management Troubleshooting Guide.
  3. If this is unexpected, contact My Oracle Support.

7.9 OccmServiceDown

Table 7-10 OccmServiceDown

Field Details
Description OCCM Service Down Alert

New certificates will not be created, and existing ones can not be renewed until OCCM is back
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: OCCM service is down
Severity Critical
Condition The pods of the occm service are unavailable.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7004
Metric Used up

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system.

Recommended Actions

The alert is cleared when the occm service is available.

Steps:

  1. Check the orchestration logs of the occm service for liveness or readiness probe failures.
  2. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, contact My Oracle Support.

7.10 OccmMemoryUsageMinorThreshold

Table 7-11 OccmMemoryUsageMinorThreshold

Field Details
Description OCCM Memory Usage Alert

OCCM Memory Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (70%) (value={{ $value }}) of its limit.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory Usage of pod exceeded 70% of its limit.
Severity Minor
Condition A pod has reached the configured minor threshold (70%) of its memory resource limits.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7005
Metric Used container_memory_usage_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric exposed by the monitoring system.

Recommended Actions

The alert is cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the OccmMemoryUsageMajorThreshold alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.

7.11 OccmMemoryUsageMajorThreshold

Table 7-12 OccmMemoryUsageMajorThreshold

Field Details
Description OCCM Memory Usage Alert

OCCM Memory Usage for pod {{ $labels.pod }} has crossed the configured major threshold (80%) (value={{ $value }}) of its limit.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory Usage of pod exceeded 80% of its limit.
Severity Major
Condition A pod has reached the configured major threshold (80%) of its memory resource limits.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7005
Metric Used container_memory_usage_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric exposed by the monitoring system.

Recommended Actions

The alert is cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the OccmMemoryUsageCriticalThreshold alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.

7.12 OccmMemoryUsageCriticalThreshold

Table 7-13 OccmMemoryUsageCriticalThreshold

Field Details
Description OCCM Memory Usage Alert

OCCM Memory Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (90%) (value={{ $value }}) of its limit.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory Usage of pod exceeded 90% of its limit.
Severity Critical
Condition A pod has reached the configured critical threshold (90%) of its memory resource limits.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7005
Metric Used container_memory_usage_bytes

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric exposed by the monitoring system.

Recommended Actions

The alert is cleared when the memory utilization falls below the critical threshold.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.
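The three memory alerts compare a pod's memory usage with its configured limit. The following Python sketch shows the arithmetic with the default thresholds from sections 7.10 to 7.12 (70%, 80%, 90%). The function name memory_alert and the limit_bytes input are hypothetical; this section names only container_memory_usage_bytes and does not state which metric supplies the limit in the packaged rules.

```python
from typing import Optional

# (percent of memory limit, alert name), highest tier first.
# Percentages are the defaults quoted in sections 7.10 to 7.12.
MEMORY_THRESHOLDS = [
    (90.0, "OccmMemoryUsageCriticalThreshold"),
    (80.0, "OccmMemoryUsageMajorThreshold"),
    (70.0, "OccmMemoryUsageMinorThreshold"),
]


def memory_alert(usage_bytes: float, limit_bytes: float) -> Optional[str]:
    """Return the highest memory alert whose threshold is exceeded, or None."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    percent = 100.0 * usage_bytes / limit_bytes
    for threshold, alert_name in MEMORY_THRESHOLDS:
        if percent > threshold:
            return alert_name
    return None


if __name__ == "__main__":
    limit = 2 * 1024 ** 3  # hypothetical 2 GiB memory limit
    for usage_fraction in (0.5, 0.75, 0.85, 0.95):
        print(usage_fraction, memory_alert(usage_fraction * limit, limit))
```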

7.13 OccmCPUUsageMinorThreshold

Table 7-14 OccmCPUUsageMinorThreshold

Field Details
Description OCCM CPU Usage Alert

OCCM Pod {{$labels.pod}} has high CPU usage detected.
Summary namespace: {{ $labels.namespace}}, podname: {{ $labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: CPU usage is {{ $value | printf "%.2f" }} which is usage is above 70% (current value is: {{ $value }})
Severity Minor
Condition CPU usage is above 70%
OID 1.3.6.1.4.1.323.5.3.54.1.2.7006
Metric Used container_cpu_usage_seconds_total
Recommended Actions

Indicates that CPU usage is above 70%.

The alert is cleared when the CPU usage falls below the minor threshold.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.
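container_cpu_usage_seconds_total is a cumulative counter of CPU seconds, so a usage percentage has to be derived from the change in the counter over a time window. The following Python sketch shows that calculation. It is a sketch under assumptions: the function name cpu_usage_percent is hypothetical, and expressing usage as a share of the pod's CPU limit is an assumption, because the rule expression itself is not shown in this section.

```python
def cpu_usage_percent(
    counter_start: float,
    counter_end: float,
    elapsed_seconds: float,
    cpu_limit_cores: float,
) -> float:
    """Convert two samples of a CPU-seconds counter into percent of the CPU limit.

    counter_start, counter_end: counter values at the window boundaries.
    elapsed_seconds: wall-clock length of the window.
    cpu_limit_cores: CPU limit of the pod in cores (assumed reference value).
    """
    if elapsed_seconds <= 0 or cpu_limit_cores <= 0:
        raise ValueError("elapsed_seconds and cpu_limit_cores must be positive")
    if counter_end < counter_start:
        raise ValueError("counter decreased; it was probably reset during the window")
    # CPU seconds consumed per wall-clock second equals average cores in use.
    cores_used = (counter_end - counter_start) / elapsed_seconds
    return 100.0 * cores_used / cpu_limit_cores


if __name__ == "__main__":
    # 90 CPU seconds consumed in 60 seconds on a pod limited to 2 cores.
    usage = cpu_usage_percent(1000.0, 1090.0, 60.0, 2.0)
    print(f"{usage:.2f}% of limit, above minor threshold: {usage > 70.0}")
```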

7.14 OccmCMPFailureMinor

Table 7-15 OccmCMPFailureMinor

Field Details
Description OCCM CMP Command Execution Failure Alert

The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}.
Severity Minor
Condition A certificate has failed while executing CMP commands.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7007
Metric Used occm_cmp_responses_total
Recommended Actions

Indicates that the rate of certificate failures caused by CMP command execution errors has crossed the threshold. The alert is cleared when this rate falls below the minor threshold or crosses the major threshold, in which case the OccmCMPFailureMajor alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.

7.15 OccmCMPFailureMajor

Table 7-16 OccmCMPFailureMajor

Field Details
Description OCCM CMP Command Execution Failure Alert

The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}.
Severity Major
Condition A certificate has failed while executing CMP commands.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7007
Metric Used occm_cmp_responses_total
Recommended Actions

Indicates that the rate of certificate failures caused by CMP command execution errors has crossed the threshold. The alert is cleared when this rate falls below the major threshold or crosses the critical threshold, in which case the OccmCMPFailureCritical alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.

7.16 OccmCMPFailureCritical

Table 7-17 OccmCMPFailureCritical

Field Details
Description OCCM CMP Command Execution Failure Alert

The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while executing CMP cmd with {{$labels.statusCode}}.
Severity Critical
Condition A certificate has failed while executing CMP commands.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7007
Metric Used occm_cmp_responses_total
Recommended Actions

Indicates that the rate of certificate failures caused by CMP command execution errors has crossed the threshold. The alert is cleared when this rate falls below the critical threshold.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.

7.17 OccmFailureMinor

Table 7-18 OccmFailureMinor

Field Details
Description OCCM Internal Failure Alert

The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}.
Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}.
Severity Minor
Condition Certificate creation has failed.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7008
Metric Used occm_cert_request_status_total
Recommended Actions

Indicates that the rate of OCCM errors has crossed the threshold. The alert is cleared when the rate of OCCM errors falls below the minor threshold or crosses the major threshold, in which case the OccmFailureMajor alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.

7.18 OccmFailureMajor

Table 7-19 OccmFailureMajor

Field Details
Description OCCM Internal Failure Alert

The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}.

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}.
Severity Major
Condition Certificate creation has failed.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7008
Metric Used occm_cert_request_status_total
Recommended Actions

Indicates that the rate of OCCM errors has crossed the threshold. The alert is cleared when the rate of OCCM errors falls below the major threshold or crosses the critical threshold, in which case the OccmFailureCritical alert is raised.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.

7.19 OccmFailureCritical

Table 7-20 OccmFailureCritical

Field Details
Description OCCM Internal Failure Alert

The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}.

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The certificate {{$labels.certName}} used by {{$labels.nfType}} has failed while creating cert with {{$labels.errorReason}}.
Severity Critical
Condition Certificate creation has failed.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7008
Metric Used occm_cert_request_status_total
Recommended Actions

Indicates that the rate of OCCM errors has crossed the threshold. The alert is cleared when the rate of OCCM errors falls below the critical threshold.

Note: The threshold is configurable in the occm_alertingrules_<version>.yaml file.

Steps:

  1. Refer to the application logs on Kibana and filter by the occm service name. Check for ERROR and WARNING logs related to thread exceptions.
  2. Depending on the failure reason, take the resolution steps.
  3. If this is unexpected, contact My Oracle Support.

7.20 OccmInputSecretModifyMajor

Table 7-21 OccmInputSecretModifyMajor

Field Details
Description Input secret is modified by non-OCCM user

The Secret {{$labels.secret}} in {{$labels.secretNamespace}} is modified by non-occm user, which is used by {{$labels.name}}.

Summary 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The Secret {{$labels.secret}} in {{$labels.secretNamespace}} is modified by non-occm user, which is used by {{$labels.name}} and {{$labels.type}}.'
Severity Major
Condition Input secrets are modified by non-OCCM users or by the operator manually.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7010
Metric Used occm_secret_event_total
Recommended Actions

Indicates that the input secret was modified by a non-OCCM user.

Steps:

  1. Check input secrets for any modifications.
  2. Check the alert labels to identify the namespace and the secret for which the alert was triggered.
  3. Update the input secrets with the correct data, if required.
  4. If this is unexpected, contact My Oracle Support.

7.21 OccmOutputSecretModifyMinor

Table 7-22 OccmOutputSecretModifyMinor

Field Details
Description Output secret is modified by non-OCCM user

The Secret {{$labels.secret}} in {{$labels.secretNamespace}} is modified by non-occm user, which is used by {{$labels.name}}.

Summary 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The Secret {{$labels.secret}} in {{$labels.secretNamespace}} is modified by non-occm user, which is used by {{$labels.name}} and {{$labels.type}}.'
Severity Minor
Condition Output secrets are modified by non-OCCM user or by operator manually
OID 1.3.6.1.4.1.323.5.3.54.1.2.7011
Metric Used occm_secret_event_total
Recommended Actions

Indicates that the output secret was modified by a non-OCCM user.

Steps:

  1. Check output secrets for any modifications.
  2. If the modified certificate does not match the certificate configuration, automatic recreation is triggered.
  3. If the modified certificate is validated successfully against the certificate configuration, only its validity is updated. No recreation is triggered in this case.
  4. If this is unexpected, contact My Oracle Support.

7.22 OccmK8sResourceDeleteMajor

Table 7-23 OccmK8sResourceDeleteMajor

Field Details
Description Kubernetes resource (secret or namespace) is deleted by non-OCCM user

The Kubernetes resource is deleted, which is used in {{$labels.name}} of type {{$labels.type}}. K8s resources, secretNamespace: {{$labels.secretNamespace}} and secret: {{$labels.secret}}

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The k8s resource is deleted, which is used in {{$labels.name}} of type {{$labels.type}}. K8s resources, namespace: {{$labels.secretNamespace}} and secret: {{$labels.secret}}.
Severity Major
Condition Kubernetes resources (secret or namespace) are deleted by non-OCCM user or by operator manually.
OID 1.3.6.1.4.1.323.5.3.54.1.2.7012
Metric Used occm_secret_event_total
Recommended Actions

Indicates that Kubernetes resources (secret or namespace) were deleted by a non-OCCM user.

Steps:

  1. Check output secrets for any deletion.
  2. If a secret is deleted, automatic recreation of the certificate is triggered.
  3. If a namespace is deleted, automatic recreation of the certificate does not happen, and the operator must delete from OCCM the certificate configurations that are associated with that namespace.
  4. If this is unexpected, contact My Oracle Support.

7.23 OCCM Alert and MIB Configuration in Prometheus

CNE supporting Prometheus HA

This section describes the measurement-based alert rules configuration for OCCM in Prometheus. You must use the updated occm_alerting_rules_promha_<version>.yaml file.

Run the following command to create or update the PrometheusRule resource specified in the alert YAML file:
$ kubectl apply -f occm_alerting_rules_promha_<version>.yaml

Disabling Alerts

This section describes the procedure to disable alerts in OCCM. To disable alerts:
  1. Edit the occm_alerting_rules_promha_<version>.yaml file to remove the specific alert.
  2. Remove the complete content of the specific alert from the occm_alerting_rules_promha_<version>.yaml file.
    For example, if you want to remove the OccmServiceDown alert, remove the following content:
    
    ## ALERT SAMPLE START##
    - alert: OccmServiceDown
      annotations:
        description: 'New certificates will not be created, and existing ones can not be renewed until OCCM is back'
        summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: OCCM service is down'
      expr: absent(up{pod=~".*occm.*", namespace="occm-ns"}) or (up{pod=~".*occm.*", namespace="occm-ns"}) == 0
      labels:
        severity: critical
        oid: "1.3.6.1.4.1.323.5.3.54.1.2.7004"
        namespace: ' {{ $labels.namespace }} '
        podname: ' {{$labels.pod}} '
    ## ALERT SAMPLE END##
  3. Perform Alert configuration.
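The removal step above can also be scripted. The following Python sketch deletes one alert entry from the text of a rules file by dropping the "- alert: <name>" line and every more deeply indented line that follows it. The function name remove_alert is hypothetical, and the sketch assumes the file follows the layout of the sample above (alert name unquoted, nested keys indented deeper than the list dash). It deliberately avoids a YAML parser to stay dependency-free, so review the result before applying it.

```python
def remove_alert(rules_text: str, alert_name: str) -> str:
    """Return rules_text without the rule whose first line is '- alert: <alert_name>'."""
    kept = []
    skipping = False
    item_indent = 0
    for line in rules_text.splitlines(keepends=True):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if skipping:
            # Blank lines and lines indented deeper than the list dash
            # still belong to the rule being removed.
            if not stripped.strip() or indent > item_indent:
                continue
            skipping = False
        if stripped.rstrip() == f"- alert: {alert_name}":
            skipping = True
            item_indent = indent
            continue
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    sample = (
        "  - alert: OccmServiceDown\n"
        "    expr: up == 0\n"
        "    labels:\n"
        "      severity: critical\n"
        "  - alert: OccmCPUUsageMinorThreshold\n"
        "    expr: vector(1)\n"
    )
    print(remove_alert(sample, "OccmServiceDown"))
```

After writing the edited text back to the file, apply it with the kubectl apply command shown earlier in this section.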

Validating Alerts

Configure and validate the alerts in the Prometheus server. Refer to OCCM Alert Configuration for the procedure to configure the alerts.

After configuring the alerts in the Prometheus server, verify them using the following steps:

  1. Open the Prometheus server from your browser using <IP>:<Port>.
  2. Navigate to Status and then Rules.
  3. Search for OCCM. The list of OCCM alerts is displayed.

Note:

If you are unable to see the alerts, it means that the alert file has not loaded in a format which the Prometheus server accepts. Modify the file and try again.
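The same check can be done without the browser, because Prometheus exposes the loaded rules as JSON at /api/v1/rules. The following Python sketch extracts the names of loaded alerting rules that start with "Occm" from such a response. The function name occm_alert_names is hypothetical, and the "Occm" prefix is taken from the alert names in this chapter.

```python
import json
from typing import List


def occm_alert_names(rules_response: dict) -> List[str]:
    """Return the sorted names of alerting rules whose name starts with 'Occm'.

    rules_response is the parsed JSON body returned by the Prometheus
    /api/v1/rules endpoint.
    """
    names = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting" and rule.get("name", "").startswith("Occm"):
                names.append(rule["name"])
    return sorted(names)


if __name__ == "__main__":
    # Trimmed example of the response shape; a real response has more fields.
    body = json.loads(
        '{"status": "success", "data": {"groups": [{"name": "occm", "rules": ['
        '{"type": "alerting", "name": "OccmServiceDown"},'
        '{"type": "recording", "name": "job:up:sum"}]}]}}'
    )
    print(occm_alert_names(body))
```

To use it against a live server, fetch http://<IP>:<Port>/api/v1/rules (for example with urllib.request.urlopen), parse the body with json.load, and pass the result to the function. An empty list corresponds to the situation described in the note above.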

Configuring SNMP-Notifier

Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:
  1. Run the following command to edit the deployment:
    kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>
    Example:
    $ kubectl edit deploy occne-snmp-notifier -n occne-infra
  2. Edit the destination as follows:
    --snmp.destination=<destination_ip>:<destination_port>
    Example:
    --snmp.destination=10.75.203.94:162

MIB Files for OCCM

Two MIB files are used to generate the traps. The user needs to update these files along with the alert file in order to fetch the traps in their environment.

  • occm_mib_tc_<version>.mib: This is the OCCM top-level MIB file, where the objects and their data types are defined.
  • occm_mib_<version>.mib: This file fetches the objects from the top-level MIB file, and based on the alert notification, these objects can be selected for display.

Note:

MIB files are packaged along with OCCM CSAR package. Download the file from MOS. For more information, see Oracle Communications Cloud Native Core, Certificate Management Installation, Upgrade, and Fault Recovery Guide.