Introduction

Oracle Cloud Infrastructure (OCI) Observability and Management (O&M) services provide a robust framework for monitoring and managing cloud-native and on-premises applications. By integrating with the OpenTelemetry ecosystem, Oracle demonstrates its commitment to open standards, enabling seamless telemetry data collection and deep insights into application performance. The OpenTelemetry Astronomy Shop Demo application showcases how the OCI O&M services Application Performance Monitoring (APM), Logging Analytics (LA), and Monitoring provide observability insights into architectures such as Kubernetes.

The OpenTelemetry Demo app is composed of microservices written in different programming languages that communicate with each other over gRPC and HTTP. It also includes a load generator that simulates user traffic.

OpenTelemetry Demo App Architecture - Image credits: OpenTelemetry Demo contributors

This playbook walks you through installing the Astronomy Shop Demo App and sending its telemetry data to the O&M services.


OCI Application Performance Monitoring

OCI Application Performance Monitoring (APM) provides end-to-end visibility into the Astronomy Shop application. It receives trace data from the application through the OpenTelemetry ecosystem and allows users to:

  • View calls made by the application
  • Understand the user experience
  • Triage errors and latency
  • Test application availability
APM allows users to quickly identify and resolve application bottlenecks, ensuring optimal performance and a seamless user experience.


OCI Logging Analytics

OCI Logging Analytics (LA) provides powerful exploration and analysis capabilities, enabling users to efficiently monitor and troubleshoot Kubernetes infrastructure logs and application logs. Once logs are ingested and indexed, LA can help:

  • Detect patterns and anomalies in log data
  • Correlate logs across multiple services for faster troubleshooting
  • Perform advanced searches and visualisations using a rich query language
  • Set up alerts and automated responses based on log insights


Benefits of using OpenTelemetry ecosystem and O&M Services

Instrumenting applications with OpenTelemetry enables the collection of critical telemetry data, including traces, metrics, and logs, from all components within a Kubernetes cluster. This ensures comprehensive visibility and prevents any part of the system from remaining unobserved.

Adopting OpenTelemetry as a vendor-neutral, open-source observability framework standardises how telemetry data is collected and exported, and avoids dependency on a single provider. By instrumenting applications once, organisations can switch between observability platforms, reducing the risk of vendor lock-in and simplifying future transitions. OCI’s support for OpenTelemetry lets OpenTelemetry-instrumented applications integrate with its O&M services.

Because OCI runs on a highly scalable cloud platform, its Observability and Management services are designed to process large volumes of telemetry data without performance degradation.

OCI’s LA service uses machine learning to detect anomalies, which can significantly reduce troubleshooting time.


Implementation Steps

Prerequisites

Before getting into the implementation, ensure you have the following in place:

  • An active OCI tenancy with access to OKE and O&M services
  • Appropriate policies to deploy resources
  • An APM domain. See Create APM Domain.
  • Data Upload Endpoint URL and the private data key for the APM domain

Deploying the Astronomy App to send data to APM services

Follow these steps to send telemetry data to O&M services from the OpenTelemetry Demo App in Kubernetes. For detailed technical instructions, refer to the GitHub Repository.

Start by creating a dedicated namespace for the demo application using the following command:

kubectl create namespace otel-demo-app

Next, create a secret named oci-apm-secret to store the OCI APM Data Upload Endpoint and Private Data Key:

kubectl create secret generic oci-apm-secret -n otel-demo-app --from-literal="OCI_APM_ENDPOINT=<Data Upload Endpoint>" --from-literal="OCI_APM_DATAKEY=<Private Data Key>"
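
To see what this command stores, the following Python sketch builds the equivalent Secret object. The function name apm_secret_manifest and the placeholder endpoint and key are hypothetical and only for illustration. Each --from-literal pair becomes a key under data with a base64-encoded value (base64 is an encoding, not encryption). Because the collector later loads this secret through extraEnvsFrom, the key names double as the environment variable names that my-values.yaml references.

```python
import base64
import json


def apm_secret_manifest(endpoint: str, data_key: str,
                        name: str = "oci-apm-secret",
                        namespace: str = "otel-demo-app") -> dict:
    """Build the Secret object equivalent to the kubectl command (hypothetical helper).

    Each --from-literal KEY=VALUE pair becomes an entry under .data,
    with the value base64-encoded as the Kubernetes API expects.
    """
    literals = {"OCI_APM_ENDPOINT": endpoint, "OCI_APM_DATAKEY": data_key}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }


if __name__ == "__main__":
    # Placeholder values only; use your APM domain's real endpoint and key.
    manifest = apm_secret_manifest(
        "https://example.apm-agt.us-ashburn-1.oci.oraclecloud.com",
        "EXAMPLEKEY",
    )
    print(json.dumps(manifest, indent=2))
```

The base64 values decode straight back to the original key, so any printed manifest that contains real values should be handled as sensitive.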

To deploy the OpenTelemetry Demo, add the OpenTelemetry Helm charts repository to your Helm configuration:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

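Then create a custom my-values.yaml file with the required configuration settings. It loads the secret into the demo's OpenTelemetry Collector, defines one exporter for spans and one for metrics that both point at the APM domain, and adds these exporters to the traces and metrics pipelines. Content of my-values.yaml: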

opentelemetry-collector:
  # Load the keys of oci-apm-secret as environment variables in the collector pod
  extraEnvsFrom:
    - secretRef:
        name: oci-apm-secret

  config:
    exporters:
      # Base endpoint: the otlphttp exporter appends v1/traces to it
      otlphttp/oci_spans:
        endpoint: "${OCI_APM_ENDPOINT}/20200101/opentelemetry/"
        headers:
          authorization: "dataKey ${OCI_APM_DATAKEY}"
        tls:
          insecure: false
      # metrics_endpoint is used as-is, so no v1/metrics path is appended after the query string
      otlphttp/oci_metrics:
        metrics_endpoint: "${OCI_APM_ENDPOINT}/20200101/observations/metric?dataFormat=otlp-metric&dataFormatVersion=1"
        headers:
          authorization: "dataKey ${OCI_APM_DATAKEY}"
        tls:
          insecure: false

  service:
    pipelines:
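      # Helm replaces lists instead of merging them, so the demo's default
      # exporters are listed again alongside the OCI exporters
      traces:
        exporters: [otlp, debug, spanmetrics, otlphttp/oci_spans]
      metrics:
        exporters: [otlphttp/prometheus, debug, otlphttp/oci_metrics]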
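
A quick way to sanity-check this configuration is to resolve the placeholders and derive the URL that the spans exporter will call. The Python sketch below approximates the collector's environment-variable expansion with string.Template. It then applies the otlphttp exporter's documented behaviour of appending the signal path (v1/traces) to its base endpoint. The helper names and example values are hypothetical, and the sketch does not reproduce the collector's own expansion logic exactly.

```python
from string import Template

# Values copied from my-values.yaml, with the ${...} placeholders left in place.
SPANS_ENDPOINT = "${OCI_APM_ENDPOINT}/20200101/opentelemetry/"
AUTH_HEADER = "dataKey ${OCI_APM_DATAKEY}"


def resolve(value: str, env: dict) -> str:
    """Expand ${VAR} placeholders (an approximation, not the collector's code).

    Template.substitute raises KeyError for a missing variable, which makes
    a secret that was never loaded into the environment easy to spot.
    """
    return Template(value).substitute(env)


def otlphttp_signal_url(base_endpoint: str, signal: str) -> str:
    """Append the OTLP signal path to an otlphttp base `endpoint`."""
    if base_endpoint.endswith("/"):
        return f"{base_endpoint}v1/{signal}"
    return f"{base_endpoint}/v1/{signal}"


if __name__ == "__main__":
    env = {  # placeholder values
        "OCI_APM_ENDPOINT": "https://example.apm-agt.us-ashburn-1.oci.oraclecloud.com",
        "OCI_APM_DATAKEY": "EXAMPLEKEY",
    }
    print(otlphttp_signal_url(resolve(SPANS_ENDPOINT, env), "traces"))
    print(resolve(AUTH_HEADER, env))
```

With the example values, spans go to https://example.apm-agt.us-ashburn-1.oci.oraclecloud.com/20200101/opentelemetry/v1/traces with the header value dataKey EXAMPLEKEY.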

Finally, deploy the OpenTelemetry Demo application with Helm using the following command:

helm install otel-demo-app open-telemetry/opentelemetry-demo --values my-values.yaml -n otel-demo-app

Upon successful deployment, Helm displays output similar to the following:

OTel successful Install

Generate some traffic with the demo app (its built-in load generator also simulates user traffic), then open the OCI APM service to confirm that telemetry data is flowing correctly.

Collecting Logs from the Kubernetes Infrastructure

Refer to the Kubernetes Solution documentation for full details on monitoring the Kubernetes infrastructure and generating insights from it. The process involves the following steps:

  1. Access Logging Analytics Administration
    Log into the OCI console and navigate to Logging Analytics Administration.
  2. Select Kubernetes Monitoring Solution
    Under Solutions, select Kubernetes, then navigate to Connect Clusters, Monitor Kubernetes, and then Oracle OKE. This guide assumes the Kubernetes cluster is running in OCI.
  3. Configure Cluster and Compartment
    Select the target Kubernetes cluster and click Next. Choose the appropriate compartment for telemetry data and related monitoring resources.
  4. Enable Log Collection
    Click Configure log collection, which automatically creates the necessary dynamic groups and policies. These configurations allow the collection of logs, metrics, and object information from Kubernetes components, including compute nodes, subnets, and load balancers.

The deployed solution will create StatefulSets, DaemonSets, and CronJobs within the oci-onm namespace to manage log collection and monitoring across the Kubernetes infrastructure.

Collecting Metrics from the Kubernetes Infrastructure

Once the APM and LA services are configured, metrics are collected automatically. Metrics from the OpenTelemetry Demo Application are routed through the OpenTelemetry Collector to the OCI APM service, which then forwards them to the OCI Monitoring service. Simultaneously, infrastructure metrics from the Kubernetes cluster are collected by the OCI Management Agent, which runs as a StatefulSet within the cluster.

The collected metrics can be analysed using the Metrics Explorer within the OCI Monitoring service.

Metrics collected by the Management Agent can be found in metric namespace mgmtagent_kubernetes_metrics:

Namespace: mgmtagent_kubernetes_metrics

Metrics sent by the OpenTelemetry Collector can be found in metric namespace oracle_apm_monitoring:

Namespace: oracle_apm_monitoring


O&M - OpenTelemetry use cases

APM - Understanding and improving the app's health and performance

The Service Overview within the APM dashboard provides a powerful resource for understanding and improving your application’s health and performance. The dashboard provides:

  • Metrics such as throughput, errors, and response times that give you a comprehensive view of your service’s health in a single pane of glass
  • Real-time monitoring that helps detect anomalies early, such as spikes in 4xx errors or dips in Apdex, which serve as immediate signals to investigate
  • The Top Services Request widget, which shows the highest-traffic endpoints so you can focus your optimisation efforts on the most frequently accessed parts of the application

Service Overview Dashboard
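
Apdex condenses response times into a single score between 0 and 1. The Python sketch below implements the standard Apdex formula with a single threshold T. Requests at or below T count as satisfied. Requests above T and up to 4T count as tolerating, at half weight. Slower requests count as frustrated and contribute nothing. The threshold and sample values are assumptions. OCI APM applies its own configured thresholds, so this only illustrates the arithmetic.

```python
from typing import Iterable


def apdex(response_times_ms: Iterable[float], threshold_ms: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total samples.

    satisfied:  t <= T
    tolerating: T < t <= 4T
    frustrated: t > 4T (contributes 0)
    """
    times = list(response_times_ms)
    if not times:
        raise ValueError("Apdex is undefined for an empty sample")
    satisfied = sum(1 for t in times if t <= threshold_ms)
    tolerating = sum(1 for t in times if threshold_ms < t <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(times)


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, with an assumed T of 500 ms.
    samples = [120, 300, 480, 650, 900, 1800, 2500, 400]
    print(apdex(samples, 500))
```

In this sample, 4 requests are satisfied, 3 are tolerating, and 1 is frustrated, which gives (4 + 3/2) / 8 = 0.6875. A falling score means more requests are drifting past the threshold.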

APM - Application availability monitoring

The Availability Monitoring Overview dashboard provides critical insights into your application’s uptime and performance from various vantage points, so you can detect issues before they impact end users. To gain the best insights, use APM Span Enrichment as described in the GitHub repository README.
The dashboard provides:

  • Synthetic checks that can run at regular intervals from various locations, helping you spot outages or performance issues before your users do
  • Detailed breakdowns, such as load time and connect time, that guide you directly to the layer or component causing issues
  • Consistent monitoring of availability and load times, which helps maintain a positive user experience by catching issues early
Availability Monitoring Overview Dashboard

Logging Analytics - Kubernetes Environment Health

When running applications in a Kubernetes cluster, monitoring cluster and node health is crucial for ensuring smooth operation, scalability, and high availability. The OKE Health dashboard shows metrics such as node count, pod count, CPU/memory usage, and pod distribution, which lets you quickly detect and troubleshoot issues. Below are some reasons why this use case is important:

  • Having a single pane of glass view of nodes, pods, and resource utilisation helps you grasp the entire cluster’s state at a glance.
  • Monitoring CPU and memory usage at both cluster and node levels allows you to forecast capacity needs and auto-scale effectively.
  • If a service degrades, this dashboard quickly reveals whether the issue stems from node failure, resource exhaustion, or a misconfigured pod.
LA Health Overview Dashboard
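
A little arithmetic shows why both levels matter. In the Python sketch below, a comfortable cluster-wide average hides a single saturated node. The sketch uses hypothetical node names and figures rather than the dashboard's actual data model, and the 80% threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class NodeCpu:
    """Hypothetical per-node CPU figures, in millicores."""
    name: str
    used_millicores: int
    allocatable_millicores: int


def cpu_utilisation(nodes: list[NodeCpu]) -> tuple[float, dict[str, float]]:
    """Return (cluster-wide utilisation, per-node utilisation), each as 0..1."""
    used = sum(n.used_millicores for n in nodes)
    allocatable = sum(n.allocatable_millicores for n in nodes)
    per_node = {n.name: n.used_millicores / n.allocatable_millicores for n in nodes}
    return used / allocatable, per_node


def hot_nodes(nodes: list[NodeCpu], threshold: float = 0.8) -> list[str]:
    """Names of nodes at or above the utilisation threshold, sorted."""
    _, per_node = cpu_utilisation(nodes)
    return sorted(name for name, u in per_node.items() if u >= threshold)


if __name__ == "__main__":
    nodes = [
        NodeCpu("node-a", 1800, 2000),
        NodeCpu("node-b", 400, 2000),
        NodeCpu("node-c", 500, 2000),
    ]
    cluster, per_node = cpu_utilisation(nodes)
    print(f"cluster: {cluster:.0%}")
    print("hot nodes:", hot_nodes(nodes))
```

Here the cluster as a whole sits at 45% CPU while node-a runs at 90%. A cluster-only view would hide that node.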

Business Analytics

Unlike the earlier use cases that focus on application and infrastructure performance, this Business Analytics Dashboard is purpose-built to reveal how your organisation is performing from a commercial standpoint. By aggregating and visualising data on orders, payment success rates, customer details (such as email addresses), and purchase specifics (cost, type, and quantity), it offers a real-time snapshot of revenue-driving activities. You can easily import and customise this dashboard, along with the log source and the lookup table, to gain actionable insights into sales and customer behaviour. Follow the instructions in the README.
Below are the key benefits of this dashboard and the reasons it is valuable for day-to-day operations and decision-making:

  • Having the latest data on orders, payments, and purchase details allows teams to respond quickly to changing market conditions and customer behaviors.
  • It helps business stakeholders make more informed decisions about inventory, marketing spend, and operational adjustments.
  • Monitoring the volume of orders and the success rate of payments provides visibility into revenue trends.
  • Tracking costs and quantities can reveal inefficiencies in sourcing.
LA Business Analytics Dashboard
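
The dashboard itself computes its figures with Logging Analytics queries over the demo's logs. The Python sketch below is only a rough illustration of the arithmetic behind two of the KPIs above, payment success rate and revenue. It aggregates a list of payment events. The PaymentEvent fields are hypothetical and do not describe the actual log source or lookup table.

```python
from dataclasses import dataclass


@dataclass
class PaymentEvent:
    """Hypothetical record shape; the real log source fields may differ."""
    order_id: str
    succeeded: bool
    amount: float  # currency handling omitted for brevity


def payment_summary(events: list[PaymentEvent]) -> dict[str, float]:
    """Aggregate payment attempts into headline business KPIs."""
    attempts = len(events)
    successful = sum(1 for e in events if e.succeeded)
    revenue = sum(e.amount for e in events if e.succeeded)
    return {
        "attempts": attempts,
        "successful": successful,
        # Guard against division by zero when no payments were logged.
        "success_rate": successful / attempts if attempts else 0.0,
        "revenue": revenue,
    }


if __name__ == "__main__":
    events = [
        PaymentEvent("o-1", True, 10.0),
        PaymentEvent("o-2", False, 5.0),
        PaymentEvent("o-3", True, 2.5),
        PaymentEvent("o-4", True, 7.5),
    ]
    print(payment_summary(events))
```

For the sample events, three of four payments succeed, so the success rate is 0.75 and revenue is 20.0.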

Next steps

Follow the instructions in the Installing the OpenTelemetry Astronomy Shop demo app and configuring O&M Services GitHub repository to install the OTel Astronomy App and send telemetry data to the O&M services.

Create your own dashboards and learn how to get deep insights into your applications.



Acknowledgements

  • Primary Authors: Mahesh Sharma and Dr. Juergen Fleischer

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.