OCNADD_ADMIN_SVC_DOWN
Table 5-9 OCNADD_ADMIN_SVC_DOWN
Field | Details |
---|---|
Triggering Condition | The OCNADD Admin service is down or not accessible |
Severity | Critical |
Description | The OCNADD Admin service has been unavailable for more than 2 minutes |
Alert Details |
Summary: 'namespace: {{ "{{" }}$labels.namespace}}, podname: {{ "{{" }}$labels.pod}}, timestamp: {{ "{{" }} with query "time()" }}{{ "{{" }} . | first | value | humanizeTimestamp }}{{ "{{" }} end }}: ocnaddadminservice service is down' Expression: expr: up{service="ocnaddadminservice"} != 1 |
OID | 1.3.6.1.4.1.323.5.3.51.30.2002 |
Metric Used |
'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric exposed by the monitoring system. |
Resolution |
The alert clears automatically when the OCNADD Admin service becomes available again.
Steps:
1. Check for service-specific alerts that may be causing issues with service exposure.
2. Run the following command to check whether the pod is in the “Running” state:
   kubectl -n <namespace> get pod
   If the pod is not in the “Running” state, capture the pod logs and events. Run the following command to fetch the events:
   kubectl get events --sort-by=.metadata.creationTimestamp -n <namespace>
3. Refer to the application logs and check for database-related failures such as connectivity issues, invalid secrets, and so on.
4. Run the following command to check the Helm status and make sure there are no errors:
   helm status <helm release name of data director> -n <namespace>
   If the status is not “STATUS: DEPLOYED”, capture the logs and events again.
5. If the issue persists, capture all the outputs from the above steps and contact My Oracle Support if guidance is required. |
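
The resolution steps above can be collected into a small helper script. The following is a minimal sketch, not part of the product: the function names (`pods_not_running`, `helm_is_deployed`, `triage`) and the 200-line log tail are assumptions chosen for illustration. It also assumes the default `kubectl get pod` column layout (NAME, READY, STATUS, RESTARTS, AGE) and compares the Helm status case-insensitively, since the exact capitalization may differ between Helm versions.

```shell
#!/bin/sh
# Triage sketch for OCNADD_ADMIN_SVC_DOWN. All function names are hypothetical.

# Print the names of pods whose STATUS column is not "Running".
# Reads default `kubectl get pod` table output on stdin; the header row is skipped.
# Note: pods of finished jobs (STATUS "Completed") are also listed.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Succeed if `helm status` output on stdin contains a "STATUS: deployed" line
# (matched case-insensitively).
helm_is_deployed() {
  grep -qiE '^STATUS:[[:space:]]*deployed'
}

# Run steps 2 and 4 for a namespace ($1) and a Helm release name ($2).
triage() {
  ns="$1"
  release="$2"

  # Step 2: find pods that are not Running and capture their logs.
  kubectl -n "$ns" get pod | pods_not_running | while read -r pod; do
    echo "== $pod is not Running; collecting logs =="
    kubectl -n "$ns" logs "$pod" --all-containers --tail=200  # tail size is an assumption
  done

  # Step 2 (continued): events in chronological order.
  kubectl get events --sort-by=.metadata.creationTimestamp -n "$ns"

  # Step 4: verify the Helm release state.
  if ! helm status "$release" -n "$ns" | helm_is_deployed; then
    echo "Helm release $release is not in the deployed state" >&2
  fi
}
```

After sourcing the script, calling `triage <namespace> <helm release name of data director>` prints the collected logs and events to standard output, which can be redirected to a file and attached when escalating the issue. Steps 1 and 3 still require manual review of the other alerts and of the application logs.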