Serviceability

Serviceability refers to the ability to detect, diagnose and correct issues that occur in an operational system. Its primary requirement is the collection of system data: general hardware telemetry details, log files from all system components, and results from system and configuration health checks. For a more detailed description of monitoring, system health and logging, refer to Status and Health Monitoring.

As an engineered system, Private Cloud Appliance is designed to process the collected data in a structured manner. To provide real-time status information, the system collects and stores metric data from all components and services in a unified way using Prometheus. Centrally aggregated data from Prometheus is visualized through metric panels in a Grafana dashboard, which permits an administrator to check overall system status at a glance. Logs are captured across the appliance using Fluentd, and collected in Loki for diagnostic purposes.
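As an illustration of how centrally aggregated metrics can be consumed outside Grafana, the following minimal sketch queries the standard Prometheus HTTP API (the /api/v1/query endpoint) for the built-in up metric and lists the scrape targets that are reported as down. The host name in PROMETHEUS_URL and the helper names are hypothetical. How the appliance exposes its Prometheus endpoint, and which credentials it requires, is not covered here, so treat this as a sketch under those assumptions rather than a supported procedure.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address: where the appliance exposes Prometheus is an assumption.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def down_targets(response):
    """Return sorted (job, instance) pairs whose `up` sample is 0.

    `response` is the decoded JSON body of an instant query for `up`.
    """
    if response.get("status") != "success":
        raise ValueError("query failed: %s" % response.get("error", "unknown error"))
    down = []
    for series in response["data"]["result"]:
        # Each instant-vector sample is [timestamp, "value-as-string"].
        _timestamp, value = series["value"]
        if float(value) == 0.0:
            labels = series["metric"]
            down.append((labels.get("job", "?"), labels.get("instance", "?")))
    return sorted(down)


def fetch_down_targets(base_url=PROMETHEUS_URL, timeout=10):
    """Query `up` on a live Prometheus server and list the targets that are down."""
    url = build_query_url(base_url, "up")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return down_targets(json.load(resp))
```

Parsing is kept separate from the network call so that the logic can be checked against a saved response without access to a running system.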

When the status of a component changes from healthy to unhealthy, the alerting mechanism can be configured to send notifications that initiate a service workflow. If support from Oracle is required, the first step is for an administrator to open a service request and provide a problem description.
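The idea of alerting on a status change, rather than on every poll, can be sketched as follows. This is only an illustration of the transition logic, not a description of how the appliance implements alerting. The function names, the dictionary-based status snapshots, and the send callback are all hypothetical.

```python
def unhealthy_transitions(previous, current):
    """Return the sorted names of components that have just stopped being healthy.

    `previous` and `current` map a component name to True (healthy) or
    False (not healthy). A component that is absent from `previous` is
    assumed to have been healthy, so a new unhealthy component is reported.
    """
    return sorted(
        name
        for name, healthy in current.items()
        if not healthy and previous.get(name, True)
    )


def notify(components, send):
    """Call `send` once per component with a short notification message."""
    for name in components:
        send("Component %s is no longer healthy" % name)
```

For example, with previous = {"switch-1": True} and current = {"switch-1": False}, unhealthy_transitions returns ["switch-1"], and calling notify with that list and print as the callback emits one message. A component that was already unhealthy in the previous snapshot is not reported again.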

If the Private Cloud Appliance is registered for Oracle Auto Service Request (ASR), certain hardware failures cause a service request and diagnostic data to be sent to Oracle support automatically. This collection of diagnostic data is called a support bundle. A Service Enclave administrator can also create and send a service request with supporting diagnostic data independently of ASR. For more information about ASR and support bundles, see Status and Health Monitoring in the Oracle Private Cloud Appliance Administrator Guide.

To resolve the reported issue, Oracle may need access to the appliance infrastructure. For this purpose, a dedicated service account is configured during system initialization. For security reasons, this non-root account has no password. You must generate and provide a service key to allow the engineer to work on the system on your behalf. Activity related to the service account leaves an audit trail and is clearly separated from other user activity.

Most service scenarios for Oracle Engineered Systems are covered by detailed action plans, which are executed by the assigned field engineer. When the issue is resolved, or if a component has been repaired or replaced, the engineer validates that the system is healthy again before the service request is closed.

This structured approach to problem detection, diagnosis and resolution is designed to deliver high-quality service efficiently, with minimal operational impact, delay and cost.