Kubernetes Monitoring Solution

Use the Kubernetes Monitoring Solution in Oracle Log Analytics to monitor and generate insights into your Kubernetes clusters deployed in OCI, third-party public clouds, private clouds, or on-premises, including managed Kubernetes deployments.

Telemetry data, such as metrics, Kubernetes state in the form of object information, and the various logs in the Kubernetes environment, is collected for analysis.

For the permissions required to perform all the operations in Kubernetes Monitoring Solution, see Allow All Kubernetes Solution Operations.

The following types of logs are collected from the Kubernetes environment:

Type	Component: Log Source Mapping	Metadata Enrichment
Kubernetes Component/System Logs	Kube Proxy: Kubernetes Proxy Logs; Kube Flannel: Kubernetes Flannel Logs; Kube DNS Autoscaler: Kubernetes DNS Autoscaler Logs; Core DNS: Kubernetes Core DNS Logs; CSI Node Driver: Kubernetes CSI Node Driver Logs; Proxymux: OKE Proxymux Client Logs; Autoscaler: Kubernetes Autoscaler Logs; Kubelet: Kubernetes Kubelet Logs	Kubernetes Cluster Name; Kubernetes Cluster ID; Node; Namespace; Pod; Container; Container Image Name
OS Logs	Syslog Logs: Linux Syslog Logs; Cron Logs: Linux Cron Logs; Secure Logs: Linux Secure Logs; Mail Logs: Linux Mail Delivery Logs; Audit Log: Linux Audit Logs; Ksplice / Uptrack Logs: Ksplice Logs; YUM Logs: Linux YUM Logs	Kubernetes Cluster Name; Kubernetes Cluster ID; Node
Kubernetes Pod/Container Logs	Kubernetes Container Generic Logs	Kubernetes Cluster Name; Kubernetes Cluster ID; Node; Namespace; Pod; Container; Container Image Name
Kubernetes Control Plane Logs	Scheduler: Kubernetes Scheduler Logs; API Server: Kubernetes API Server Logs; Controller Manager: Kubernetes Controller Manager Logs; API Server Audit: Kubernetes Audit Logs; Etcd: Kubernetes etcd Logs	Kubernetes Cluster Name; Kubernetes Cluster ID; Node

The following object information is collected from the Kubernetes environment:

Object	Log Source	Metadata Enrichment
Node	Kubernetes Node Object Logs	Kubernetes Cluster Name; Kubernetes Cluster ID
Pod	Kubernetes Pod Object Logs	Kubernetes Cluster Name; Kubernetes Cluster ID
Deployment (Workload)	Kubernetes Deployment Object Logs	Kubernetes Cluster Name; Kubernetes Cluster ID
DaemonSet (Workload)	Kubernetes DaemonSet Object Logs	Kubernetes Cluster Name; Kubernetes Cluster ID
StatefulSet (Workload)	Kubernetes StatefulSet Object Logs	Kubernetes Cluster Name; Kubernetes Cluster ID
Job (Workload)	Kubernetes Job Object Logs	Kubernetes Cluster Name; Kubernetes Cluster ID
CronJob (Workload)	Kubernetes CronJob Object Logs	Kubernetes Cluster Name; Kubernetes Cluster ID

Steps to Use the Solution

Also, see Queries Used in the Kubernetes Solution.

Connect Your Oracle OKE Kubernetes Cluster with Log Analytics

Ensure that you have gathered the necessary information about the Kubernetes cluster in your tenancy and have the necessary privileges in place to connect the cluster. Oracle recommends that a user with Administrator privileges performs this operation. After a successful connection, the logs, metrics, and object information from the related Kubernetes components and compute nodes are collected from this cluster.

Open the navigation menu and click Observability & Management. Under Log Analytics, click Solutions, and click Kubernetes. The Kubernetes Monitoring Solution page opens.
In the Kubernetes Monitoring Solution page, click Connect clusters. The Add Data wizard opens. Here, the Monitor Kubernetes section is already expanded. Click Oracle OKE. The Configure OKE environment monitoring page opens.
Select the OKE cluster that you want to connect with Oracle Log Analytics by clicking on the corresponding row in the table of clusters. Use the details in the table to identify the right OKE cluster. Click Next.
From the menu, select the compartment to store the telemetry data and related monitoring resources.
Optionally, the required Policies and dynamic groups are created. You can disable the check box if you have already created them. For the required policies, see Allow All Kubernetes Solution Operations.
Optionally, the metrics server is installed for the collection of usage metrics. You can disable the check box if you have already installed it.
Select the Solution deployment option:
- Enable the above clusters automatically: Select this option to allow Oracle Log Analytics to automatically create all the required resources.
  Note
  
  This deployment option is not recommended for OKE clusters that don't have a public API server endpoint. In such cases, use the deployment option I will manually deploy the above clusters. For details, see OCI Kubernetes Monitoring GitHub Page.
  
  The automatic log collection configuration creates or updates the following resources:
  - IAM Policy and Dynamic Groups
  - Oracle Log Analytics Log Groups and Entities
  - Management Agent key
  - Metric namespace
  - Management Agent configuration
  - Fluentd configuration
  - Kubernetes manifests and helm chart
- I will manually deploy the above clusters: Select this option to have Oracle Log Analytics create all the Oracle Cloud Infrastructure resources while you manage the deployment of Fluentd and other configuration into your cluster through Helm or Kubernetes manifests. The installation instructions are provided at the end of the connect workflow. This option allows you to customize the default configuration and other collection parameters used in automatic deployment.
Click Configure log collection to confirm the configuration that you specified.

The Oracle Cloud Infrastructure resources are now created.
If you select the manual deployment option for the solution, then follow the installation instructions provided at the end of the connect workflow for Helm chart deployment.

The configuration to collect data from your Kubernetes cluster is now complete. Go to the Kubernetes Monitoring Solution page and wait a few minutes for the data collection to complete. While the data collection is in progress, the Latest Telemetry of the cluster is Unknown. You can view the solution after this status changes.

Monitor Your Kubernetes Clusters

The telemetry data collected from your Kubernetes cluster is presented in multiple views to help you obtain insights into the environment and its performance.

To view the solution for your Kubernetes cluster:

Open the navigation menu and click Observability & Management. Under Log Analytics, click Solutions, and click Kubernetes. The Kubernetes Monitoring Solution page opens. The clusters that are already connected with Oracle Log Analytics are listed in the Monitored clusters tab.
Under Monitored clusters, click the name of the cluster that you want to monitor and analyze. The solution for the selected cluster opens with the default Cluster view.

Now explore the solution and the various views available to traverse the tiers of the topology and obtain details at each level. Note that the filter context is sustained between the different views.

Kubernetes Solution- Cluster View

The Cluster tab provides a comprehensive view of the entire Kubernetes cluster, summarizing its health, resource utilization, and key metrics for the overall cluster performance. It aggregates insights on cluster-wide events, capacity, workloads, and system components helping administrators to assess the overall status, spot anomalies, or resource constraints, and make informed operational decisions for the whole environment.

An example Kubernetes solution cluster view:

The following sections are displayed in the cluster view, listed in the same order as their numbering in the above image:

Filters:
- Namespaces Filter: To filter the view by Kubernetes namespace.
Time selector: There are two time range options, Last 60 Minutes (default) and Last 24 Hours. Any changes you make in the time range will impact the Events and Right Panel Widgets.
Topology: The objects data collected from the Kubernetes environment is displayed in this section. Right click on a namespace to add it to the filter. Then the topology view changes to reflect the objects in the namespace which includes workloads and nodes. The topology is based on current time and is not affected by the time range settings.

The color of each object in the topology indicates its status, which is derived from the active warning events associated with the object or its children. For example, if a pod has one or more warning events, then the pod color changes to RED, and the corresponding workload (which owns the pod) and the namespace also reflect the same status.

The topology can be rendered in three views. Platform View focuses on Kubernetes logical resources, Network View on communication and network flows, and Infrastructure View on the physical or virtual resources supporting your Kubernetes environment.
- Platform View: The Platform View gives a high-level, logical representation of your Kubernetes environment, focusing on clusters, namespaces, nodes, and workloads. It helps visualize how resources are organized and related, making it easier for platform operators and DevOps teams to understand application structures. This view offers quick insight into relationships and dependencies across your Kubernetes resources.
- Network View: The Network View visualizes the networking aspects of your Kubernetes deployment, displaying how components like pods, nodes, workloads, services, and load balancers communicate. It illustrates network paths, policies, and traffic flows, helping you spot connectivity issues or misconfiguration. This view benefits network and security teams monitoring or troubleshooting cluster communications.
- Infrastructure View: The Infrastructure View highlights the physical or virtual infrastructure resources such as compute, storage, and network (including subnets and load balancers) supporting your Kubernetes deployment. It maps Kubernetes nodes to underlying hardware or virtualization platforms, aiding in correlating platform issues with infrastructure components. This view is valuable for administrators managing the health and performance of cluster infrastructure.
Left Panel Summary: The left panel provides a summary of key cluster metrics, showing the number of namespaces, services, workloads, nodes, and pods in the environment. It highlights how many of each metric need attention versus those that are operating normally, helping you to assess cluster health and focus on potential issues. The Left Panel Summary is based on current time and is not affected by the time range settings.
Pods by namespace: The pods available in the topology grouped by the namespace. For details about the color of each pod, see the topology description.
Right-click on any of the pods or a namespace and click View logs to view the log details associated with it. See View Log Details for Your Namespace, Node, or Pod.
Right Panel Widgets: These widgets help you to monitor the key parameters that determine the overall health of the cluster. The widgets available through the rotating scroll bar are CPU core (used/allocatable) in %, CPU core used, Memory (used/allocatable) in %, Memory used, Kubernetes system, OS health, Total API server requests, API server request duration, API response size, API request execution duration, etcd request duration, Network: bytes rx, Network: bytes tx, Network: Packet Rx Rate, Network: Packet Tx Rate, Network: Packet Rx Dropped Rate, and Network: Packet Tx Dropped Rate.
For more detailed information about each right panel widget, see Queries for the Right Panel Widgets in Kubernetes Solution.
Events: This section displays the state changes occurring in the Kubernetes cluster in the form of events. You can further filter the events by Warnings Only or All.
You can expand the events section to view the table in the center of the page.

For more details about the query used in the Events section, see Queries for the Events Table in Kubernetes Solution.

You can expand each section to view a larger visualization and do a mouse-over to view more details.
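
The status coloring described for the topology (a warning event on a pod also marks its owning workload and namespace) can be pictured as a roll-up over a tree of objects. The following is a minimal sketch of that idea only, not the service's implementation. The class `TopologyObject`, the function `status`, the object names, and the "NORMAL" label for objects without warnings are all hypothetical and chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopologyObject:
    """Hypothetical topology entry: a namespace, workload, or pod."""
    name: str
    warning_events: int = 0  # active warning events on this object itself
    children: list[TopologyObject] = field(default_factory=list)


def status(obj: TopologyObject) -> str:
    """Return "RED" if the object or any of its descendants has an active warning event."""
    if obj.warning_events > 0:
        return "RED"
    if any(status(child) == "RED" for child in obj.children):
        return "RED"
    return "NORMAL"  # assumed label; the source only names the RED status


# Example hierarchy: namespace -> workload -> pods (names are made up).
pod_a = TopologyObject("pod-a", warning_events=2)
pod_b = TopologyObject("pod-b")
workload = TopologyObject("example-deployment", children=[pod_a, pod_b])
namespace = TopologyObject("example-namespace", children=[workload])

for obj in (pod_a, pod_b, workload, namespace):
    print(obj.name, status(obj))
```

In this sketch, `pod-b` stays normal while the workload and namespace inherit the status of `pod-a`, which matches the behavior described for the topology.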

Kubernetes Solution- Workload View

The Workload tab focuses on the applications and services running within your cluster, including Deployments, StatefulSets, DaemonSets, and other workload resources. It offers visibility into the workload state, health, and performance, enabling you to monitor application behavior, failures, and performance bottlenecks, and to drill down into specific workloads for troubleshooting or optimization.

An example Kubernetes solution workload view:

The sections Time selector, Events, Left Panel Summary, and Right Panel Widgets are the same as in the cluster view. The Namespace Filter context is retained from the cluster view, and an additional filter for workloads is also available in this view. The Pods by workload section shows the pods grouped by the workload that they belong to. Additionally, the view includes the Workload details section, where you can expand each type of workload to view its namespace, workload name, status, and age.

For more information about the Issues visualization in the Workload details section, see How to Use Issues Visualization in Kubernetes Monitoring Solution?.

In the Pods by workload section, right-click on a workload name and click View insights to view details about the workload. See View Workload Insights.

Right-click on any of the pods and click View logs to view the log details associated with it. See View Log Details for Your Namespace, Node, or Pod.

For more detailed information about each right panel widget, see Queries for the Right Panel Widgets in Kubernetes Solution.

For more details about the query used in the Events section, see Queries for the Events Table in Kubernetes Solution.

Kubernetes Solution- Node View

The Node tab centers on the individual machines (virtual or physical) within your cluster. It provides information about each node’s health, resource usage, network performance, and system status. This view is important for managing node-level issues, capacity planning, and ensuring the nodes remain healthy with adequate resources to support workloads.

An example Kubernetes solution node view:

The sections Time selector, Events, Left Panel Summary, and Right Panel Widgets are the same as in the cluster view. The Namespace Filter and Workloads Filter context are retained from the Workload view, and an additional filter for Nodes is also available in this view. The Pods by node section shows the pods grouped by the node that they belong to. Additionally, the view includes the Node status section, where you can expand each node to view details such as status, issues, age, OS, container runtime, kubelet / kubeproxy versions, CPU, memory (capacity), and memory (allocatable). You can also selectively view the status of only those nodes that have issues or are not ready.

For more information about the tiles in the Node status section, see How to Use Issues Visualization in Kubernetes Monitoring Solution?.

Right-click on any of the pods and click View logs to view the log details associated with it. See View Log Details for Your Namespace, Node, or Pod.

For more detailed information about each right panel widget, see Queries for the Right Panel Widgets in Kubernetes Solution.

For more details about the query used in the Events section, see Queries for the Events Table in Kubernetes Solution.

Kubernetes Solution- Pod View

The Pod tab provides granularity at the lowest deployable unit in Kubernetes, the pod. It displays real-time data on pod status, resource consumption, networking, restarts, and failure events. This is essential to troubleshoot specific containers, understand scaling and scheduling decisions, and present issues affecting application components at the pod level.

An example Kubernetes solution pod view:

The sections Time selector, Events, Left Panel Summary, and Right Panel Widgets are the same as in the cluster view. The Namespace Filter, Workloads Filter, and Nodes Filter context are retained from the Nodes view. The Pods section displays the pods and their status based on the filter selection. Additionally, the view includes the Pods status. In this section, you can expand each pod to view the detailed information like status, node, namespace, pod IP, controller, controller kind, and scheduler. You can also selectively view the details of the pods based on their current status like running, failed, succeeded, and pending.

For more information about the tiles in the Pods status section, see How to Use Issues Visualization in Kubernetes Monitoring Solution?.

Right-click on any of the pods and click View logs to view the log details associated with it. See View Log Details for Your Namespace, Node, or Pod.

For more detailed information about each right panel widget, see Queries for the Right Panel Widgets in Kubernetes Solution.

For more details about the query used in the Events section, see Queries for the Events Table in Kubernetes Solution.

How to Use Issues Visualization in Kubernetes Monitoring Solution?

The Issues visualization analyzes the logs by creating clusters of log records, that is, by grouping similar log records. The following tiles are displayed in the Issues visualization in Workload, Node and Pod tabs of the Kubernetes Monitoring solution:

New Issues: The New Issues tile displays the number of new issues detected within the selected time range, such as errors or failures in workloads, nodes, or pods. These are the issues found in the selected time range but are not present in the baseline time range, typically the previous day, specified for the analysis. This helps you to identify and prioritize recent incidents that need attention for cluster health and stability.
New Outliers: New Outliers are the log records that occurred only once in the current time range and did not occur in the baseline. An outlier may or may not be an issue. Detecting outliers allows proactive investigation and troubleshooting of potential issues before they escalate.
Total Records: The Total Records tile shows the total number of log records that match the current filters in Oracle Log Analytics in the selected time range. This metric provides insight into the volume of collected log data, helping you to assess activity levels and data trends in your environment.
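
The relationship between the three tiles can be illustrated with a small sketch. It assumes each cluster of similar log records is already available with its message signature, its record count in the selected time range, and a flag saying whether it looks like an issue. How the service forms clusters and decides that flag is not described here, so `LogCluster`, `is_issue`, `summarize_tiles`, and the sample signatures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogCluster:
    """Hypothetical cluster of similar log records in the selected time range."""
    signature: str   # log message signature shared by the grouped records
    count: int       # number of records in the selected time range
    is_issue: bool   # assumed to be provided by the analysis (error, failure, ...)


def summarize_tiles(current: list[LogCluster], baseline_signatures: set[str]) -> dict:
    """Compute tile values from current clusters and the signatures seen in the baseline."""
    # Only clusters absent from the baseline range can be "new".
    fresh = [c for c in current if c.signature not in baseline_signatures]
    return {
        "new_issues": [c.signature for c in fresh if c.is_issue],
        # An outlier occurred exactly once; it may or may not also be an issue.
        "new_outliers": [c.signature for c in fresh if c.count == 1],
        "total_records": sum(c.count for c in current),
    }


# Made-up sample data for illustration only.
current = [
    LogCluster("Failed to pull image <*>", 14, True),
    LogCluster("Unexpected EOF from <*>", 1, True),
    LogCluster("Lease renewed for <*>", 1, False),
    LogCluster("Started container <*>", 320, False),
]
baseline = {"Started container <*>"}

print(summarize_tiles(current, baseline))
```

Note that a signature can appear under both New Issues and New Outliers in this sketch, which reflects the statement that an outlier may or may not be an issue.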

For more information on Issues visualization and baseline time, see Issues Visualization.

Expand the section under the Issues visualization tiles to view the tabulated information about the issues and outliers. Expand the row in the table to view the histogram. The cluster sample provides the sample log record from the log message signature for the cluster in which the issue is detected.

View Workload Insights

The Workload Insights feature in Oracle Log Analytics offers a detailed, interactive view into a selected workload. The interface presents tabs for Workload Summary, Pod Summary, and individual Containers. This enables you to assess workload health, configuration, related issues, and changes, as well as drill down into resource use and container-level details.

The following Insights dialog box is for an example workload oci-onm-logan and is displaying the workload summary tab:

Workload insights dialog box showing the workload summary tab

Details in the Insights Dialog Box

The following information is common to all the tabs in the Insights dialog box:

A top-level overview is available with key health indicators in the Issues visualization, including New Issues, New Outliers, and Total Records related to the workload, pod, or container within the selected time window. The new issue and outlier details are available for further investigation in the table below. Expand the row in the table to view the histogram. The cluster sample provides the sample log record from the log message signature for the cluster in which the issue is detected. For more information on how to leverage the analysis in the Issues visualization, see How to Use Issues Visualization in Kubernetes Monitoring Solution?.
Another table (below the issues and outliers table) lists the problem labels, their priority, and the timestamp of when each problem label was first observed.

In the issues and outliers table, if you notice a cluster sample that can be used for further analysis, then you can add a conditional label to that log record. This problem label can be used for searching and filtering log records. To add a conditional label, in the row corresponding to the cluster sample, click the Actions menu and click Add conditional label. The dialog box to add the conditional label opens. Specify the conditions under which the label is added to the corresponding log record. For more information about adding the conditional label, see Use Labels in Sources.
A changes section outlines any specification-level modifications made since the last recorded timestamp. This helps the platform teams to keep track of configuration drift and audit recent adjustments directly from the insights view.

Workload Summary

The Workload Summary tab displays essential metadata such as workload name, type (for example, DaemonSet), namespace, creation timestamp, labels, selectors, and the current status of scheduled and ready pods.

It summarizes the deployment strategy, such as the type of updates (for example, RollingUpdate) and its properties, and gives visibility into recent changes, annotations, and related objects within the Kubernetes cluster. Quick access to Show all related objects further enables contextual investigation of the infrastructure related to the workload.

Pod Summary

The Pod Summary tab focuses on pods created by the workload.

Summarized pod configuration details are displayed, including labels, annotations, security context, affinity, volumes, restart policy, tolerations, and node selector details. This enables a consolidated view of how pods are configured and scheduled.

Container

The Container tab drills into the configuration and state of a specific container, including image details, pull policy, environment variables, volume mounts, resource limits and requests, and probes (liveness/readiness).

Security, command, and argument configurations are displayed, along with the resources and mounts for each container instance, aiding in rapid diagnosis of container-level resource issues or misconfiguration. This information is vital for teams managing multi-container pods or optimizing individual container performance.

A separate tab displays the container metadata information for each container in case of multi-container configuration.

Full Spec

Click View full spec in the Workload Insights dialog box to view the full specification of the workload. The interface displays the complete raw JSON specification for the selected workload, showing all metadata, annotations, labels, managed fields, and spec details, including template, selectors, and volumes. This provides a transparent reference for administrators or developers to audit the workload setup, debug intricate issues, or compare versions after recent changes, all without leaving the analytics interface.
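
If you copy the raw JSON specification out of this view at two points in time, comparing the two versions can be scripted. The following is a minimal sketch that walks two parsed JSON documents and reports the leaf values that differ. The function `diff_specs` and the two tiny sample specifications are hypothetical; a real workload specification contains many more fields, and lists are compared as whole values here.

```python
import json


def diff_specs(old, new, path=""):
    """Yield (path, old_value, new_value) for every value that differs between two specs."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child_path = f"{path}.{key}" if path else key
            # A key missing on one side is reported with None for that side.
            yield from diff_specs(old.get(key), new.get(key), child_path)
    elif old != new:
        yield (path, old, new)


# Made-up, heavily shortened DaemonSet specifications for illustration.
old_spec = json.loads("""
{"spec": {"updateStrategy": {"type": "RollingUpdate"},
          "template": {"spec": {"restartPolicy": "Always"}}}}
""")
new_spec = json.loads("""
{"spec": {"updateStrategy": {"type": "OnDelete"},
          "template": {"spec": {"restartPolicy": "Always"}}}}
""")

for path, before, after in diff_specs(old_spec, new_spec):
    print(f"{path}: {before!r} -> {after!r}")
```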

View Log Details for Your Namespace, Node, or Pod

You can view the log details by right-clicking on the pod, node, or namespace and selecting View logs in any of the following interfaces in the Kubernetes solution:

In the Cluster tab, under the Pods by namespace section
In the Workload tab, under the Pods by workload section
In the Node tab, under the Pods by node section
In the Pod tab, under the Pods section

Then a pop-up window opens which displays the details of the logs for that pod, node, or namespace.

The following image displays the log details pop-up window for the namespace otel-demo-app:

The log details window provides a summary and deep dive into log events for a selected node, namespace, or pod. The window displays the Issues visualization tiles New Issues, New Outliers, Total Records, Total Clusters, and Problem Sources. The Issues visualization analyzes the logs by creating clusters of log records, that is, by grouping similar log records. For more information on Issues visualization and baseline time, see Issues Visualization.

Below these Issues visualization tiles, the window features a table that lists the Issues and Outliers Cluster Samples and their details. Expand the row in the table to view the histogram. The cluster sample provides the sample log record from the log message signature for the cluster in which the issue is detected. The View menu allows you to customize which columns appear in the table, such as count, log source, problem priority, and label.

Allow All Kubernetes Solution Operations

Create a dynamic group to allow collection of logs, metrics, and object information:

ALL {instance.compartment.id = '<OKE_COMPARTMENT_OCID>'}
ALL {resource.type='managementagent', resource.compartment.id='<TELEMETRY_COMPARTMENT_OCID>'}

Create policies to allow the dynamic group to perform the data collection operations:

allow dynamic-group <dynamic_group_name> to {LOG_ANALYTICS_LOG_GROUP_UPLOAD_LOGS} in compartment id <TELEMETRY_COMPARTMENT_OCID>
allow dynamic-group <dynamic_group_name> to use METRICS in compartment id <TELEMETRY_COMPARTMENT_OCID> WHERE target.metrics.namespace = 'mgmtagent_kubernetes_metrics'
allow dynamic-group <dynamic_group_name> to {LOG_ANALYTICS_DISCOVERY_UPLOAD} in tenancy
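
When you prepare these statements for more than one cluster, filling in the placeholders by hand is error-prone. The following is a minimal sketch that only performs text substitution on the three policy statements above and does not call any OCI API. The function name `render_policies` and the example group name and OCID are hypothetical values used for illustration.

```python
def render_policies(dynamic_group_name: str, telemetry_compartment_ocid: str) -> list[str]:
    """Return the three policy statements with the placeholders filled in."""
    # Doubled braces produce the literal braces required by the policy syntax.
    templates = [
        "allow dynamic-group {dg} to {{LOG_ANALYTICS_LOG_GROUP_UPLOAD_LOGS}} "
        "in compartment id {ocid}",
        "allow dynamic-group {dg} to use METRICS in compartment id {ocid} "
        "WHERE target.metrics.namespace = 'mgmtagent_kubernetes_metrics'",
        "allow dynamic-group {dg} to {{LOG_ANALYTICS_DISCOVERY_UPLOAD}} in tenancy",
    ]
    return [t.format(dg=dynamic_group_name, ocid=telemetry_compartment_ocid) for t in templates]


# Hypothetical example values; replace them with your own group name and compartment OCID.
for statement in render_policies("example-k8s-dg", "ocid1.compartment.oc1..example"):
    print(statement)
```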

For information about dynamic groups and IAM policies, see OCI Documentation: Managing Dynamic Groups and OCI Documentation: Managing Policies.

Stop Collecting Logs from Your Kubernetes Clusters

To stop collecting logs from your Kubernetes clusters in Oracle Log Analytics, you must disconnect them.

Open the navigation menu and click Observability & Management. Under Log Analytics, click Solutions, and click Kubernetes. The Kubernetes Monitoring Solution page opens. The clusters that are already connected with Oracle Log Analytics are listed in the Monitored clusters tab.
Under Monitored clusters, in the row that corresponds to your cluster, click the Actions menu and select Disconnect Cluster.