Viewing and Interpreting Monitoring Data in Grafana

The infrastructure services layer of Private Cloud Appliance, which is built on top of the platform and enables all the cloud user and administrator functionality, can be monitored through an extensive collection of Grafana dashboards.

The microservices that make up this layer run in Kubernetes containers across the three management nodes, so their monitoring is largely based on Kubernetes node and pod metrics. The Kubernetes cluster also extends onto the compute nodes, where Kubernetes worker nodes collect additional data that is vital for system operation and monitoring.

The dashboards described in this section provide a good starting point for microservices health monitoring. You might prefer to use different dashboards, metrics and visualizations instead. The necessary data, collected across the entire system, is stored in Prometheus, and can be queried and presented through Grafana in many ways.
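As a rough illustration of querying the stored data outside of the prepared dashboards, the following sketch builds a request for the standard Prometheus instant-query endpoint (`/api/v1/query`) and unpacks the JSON response into label and value pairs. The base URL and the helper names (`build_query_url`, `parse_vector`, `run_query`) are assumptions made for this example. How the Prometheus endpoint is exposed and authenticated on your appliance is not covered here, so treat this as a starting point rather than a supported procedure.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {labels}, "value": [timestamp, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def run_query(base_url: str, promql: str, timeout: float = 10.0) -> list[tuple[dict, float]]:
    """Send the query and return the parsed samples (no authentication handled)."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `run_query("http://prometheus.example:9090", "up")` would return one sample per scrape target, where the host name is a placeholder and `up` is the built-in Prometheus metric that reports whether a target was reachable.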

Grafana Folder

Dashboard

Description

Service Monitoring

ClusterLabs HA Cluster Details

This dashboard uses a bespoke Prometheus exporter to display data for HA clusters based on Pacemaker. On each HTTP request, the exporter inspects the cluster status locally by parsing existing distributed data provided by the cluster components' tools.

The monitoring data includes Pacemaker cluster summary, nodes and resource stats, and Corosync ring errors and quorum votes.

Service Monitoring

MySQL Cluster Exporter

This dashboard displays performance details for the MySQL database cluster. Data includes database service metrics such as uptime, connection statistics, and table lock counts, as well as more general information about MySQL objects, connections, network traffic, memory and CPU usage, and so on.

Service Monitoring

Service Level

This dashboard displays detailed information about RabbitMQ requests that are received by the fundamental appliance services. It allows you to monitor the number of requests, request latency, and any requests that caused an error.

Service Monitoring

VM Stats

This comprehensive dashboard displays resource consumption information across the compute instances in your environment. It includes graphs for CPU and memory utilization, disk activity, network traffic, and so on.

The panels in this dashboard display a large number of time series in a single graph. You can click to display a single time series, or hover over the graph to view detailed data at a specific point on the time axis.

PCA 3.0 Service Advisor

Kube Endpoint

This dashboard focuses specifically on the Kubernetes endpoints and provides endpoint alerts. These alerts can be sent to a notification channel of your choice.

PCA 3.0 Service Advisor

Kube Ingress

This dashboard provides data about ingress traffic to the Kubernetes services and their pods. Two alerts are built in and can be sent to a notification channel of your choice.

PCA 3.0 Service Advisor

Kube Node

This dashboard displays metric data for all the server nodes, both management and compute nodes, that belong to the Kubernetes cluster and host microservices pods. You can monitor pod count, CPU and memory usage, and so on. The metric panels display information for all nodes. In the graph-based panels, you can click to view information for just a single node.

PCA 3.0 Service Advisor

Kube Pod

This dashboard displays metric data at the level of the microservices pods, allowing you to view the total number of pods overall and how they are distributed across the nodes. You can monitor their status per namespace and per service, and check if they have triggered any alerts.

PCA 3.0 Service Advisor

Kube Service

This dashboard displays metric data at the Kubernetes service level. By default, data is displayed for all services, but you can filter it for specific services. Two alerts are built in and can be sent to a notification channel of your choice.

Kubernetes Monitoring

Kubernetes Monitoring Containers

Kubernetes Monitoring Node

(all)

These folders contain a large and diverse collection of dashboards with a wide range of monitoring data that covers most of the operations of the Private Cloud Appliance system Kubernetes cluster. For example, these metrics provide information about deployment, ingress, and usage of CPU, disk, memory, and network resources.

OKE Monitoring

CAPOCI

This dashboard shows metrics from the Cluster API Provider for OCI, which is a component of the Private Cloud Appliance Kubernetes Engine (OKE). This dashboard monitors request status codes and response times for resources used by OKE such as compute instances and load balancers.

The information about controller reconciliation is intended for use by Oracle Support.

OKE Monitoring

Cluster Time Monitoring

This dashboard shows the time taken for operations such as creating or updating a particular OKE cluster or node pool. The average time for these operations across all clusters and node pools is also shown.

OKE Monitoring

Metrics Meter

This dashboard shows the health of various targets used by the OKE service, such as the Cluster API Provider, the Cluster API Provider for OCI, OKE, and prometheus-k8s.

OKE Monitoring

OKE Service

This dashboard shows the service-level metrics for OKE. Examples of metrics on this dashboard include counts of requests such as cluster and node pool create, update, and delete, and counts of exception codes for various requests. The exception code counts help expose any patterns in request failures.
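Several of the dashboards listed above summarize counters by a label, for example pods per node or exception codes per request type. The sketch below shows one way such a summary could be reproduced from samples exported as label and value pairs. The helper name `totals_by_label`, the label names `operation` and `code`, and the sample values are all made up for illustration and are not taken from the actual OKE metrics.

```python
from collections import defaultdict


def totals_by_label(samples, label):
    """Sum sample values per value of `label`, largest total first."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Samples that lack the label are grouped under a placeholder key.
        totals[labels.get(label, "<none>")] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples shaped like (labels, value) pairs.
samples = [
    ({"operation": "create_cluster", "code": "409"}, 2.0),
    ({"operation": "create_node_pool", "code": "500"}, 5.0),
    ({"operation": "update_cluster", "code": "409"}, 1.0),
]

for code, count in totals_by_label(samples, "code"):
    print(f"{code}: {count:g}")
```

Grouping the same samples by `operation` instead of `code` would show which request types fail most often, which is the kind of pattern the exception code counts are meant to expose.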