Monitor and Trace Microservices With Application Performance Monitoring on Oracle Cloud

Monitoring microservices can be challenging because of the complexity of their architecture and because of their distributed deployment model.

Microservices applications can consist of thousands of independent services deployed across many different systems, with each service running in its own process. Traditional monitoring approaches that focus on specific resources and health monitors are no longer effective for tracing the transaction flow or debugging problems in microservices applications.

Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM) enables automatic OpenTracing instrumentation on microservices and captures full end-to-end user transactions to help you understand both the user experience and application performance. APM includes an implementation of a distributed tracing system that enables instance-level, end-to-end transaction tracing. It also enables app server and business metric monitoring.

Architecture

This reference architecture demonstrates how to use Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM) to monitor a microservice application deployed on the OCI Kubernetes Engine (OKE) cluster.

The architecture diagrams below show a microservices application deployed in a Kubernetes cluster. The application is hosted on an application server with two deployment replicas and connects to an Oracle Autonomous Database by using JDBC. End users access the application through a web interface hosted on Oracle Cloud. Each diagram shows a different approach:

OpenTelemetry Operator: The first approach uses a Kubernetes operator to simplify the automatic injection of Java agents into Java virtual machines (JVMs) running in Kubernetes pods.
Shared file system: The second approach uses a shared file system to provision and deploy an APM agent.

Both methods can monitor Java application servers or frameworks, such as Oracle WebLogic Server or Spring Boot, deployed on Kubernetes.

The following diagram shows the OpenTelemetry Operator implementation:

[Figure: apm-microservices-open-telemetry.png]

apm-microservices-open-telemetry-oracle.zip

Data flows in this architecture as follows:

A: Deploy the APM Java Agent to Kubernetes pods.

Create an APM domain in OCI and obtain the data upload endpoint URL and private and public data keys of the domain.
Create a shared file system in OCI and create Kubernetes storage objects such as a persistent volume in the OKE cluster.
Download the APM Java agent from the APM domain and provision it in the mounted volume.
Deploy the APM agent to the application by updating the YAML files for the Kubernetes deployments or StatefulSets.

B: The OpenTelemetry Operator automatically downloads the Java agent to the local cache. The Java agent is installed locally on each of the replicas in the OKE cluster.

C: Server and JDBC traces, spans, and metrics are sent to the APM domain. After the Kubernetes pods are restarted, traces and spans from the server are sent to the data upload endpoint URL of the APM domain.

D: Browser traces, spans, and metrics are sent to the APM domain. Collected data can be visualized in the APM dashboards and Trace Explorer for performance and availability analysis.
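
As a rough illustration of the operator-based flow above, the following Python sketch adds the OpenTelemetry Operator's Java injection annotation to the pod template of a Deployment manifest held as a dictionary. The annotation key `instrumentation.opentelemetry.io/inject-java` is the one the operator's admission webhook reads; the function name, the `wstore` application name, and the container image are hypothetical, and the sketch assumes that an Instrumentation resource pointing at the APM Java agent already exists in the namespace.

```python
import copy

# Annotation read by the OpenTelemetry Operator when a pod is created.
INJECT_JAVA_ANNOTATION = "instrumentation.opentelemetry.io/inject-java"


def enable_java_injection(workload, instrumentation="true"):
    """Return a copy of a workload manifest whose pod template opts in to
    Java agent injection. The input manifest is left unchanged."""
    patched = copy.deepcopy(workload)
    template_meta = patched["spec"]["template"].setdefault("metadata", {})
    annotations = template_meta.setdefault("annotations", {})
    # "true" selects the Instrumentation resource in the pod's namespace;
    # the name of a specific Instrumentation resource can be given instead.
    annotations[INJECT_JAVA_ANNOTATION] = instrumentation
    return patched


# Hypothetical two-replica Deployment, mirroring the diagram.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "wstore"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "wstore"}},
        "template": {
            "metadata": {"labels": {"app": "wstore"}},
            "spec": {"containers": [{"name": "app", "image": "example/app:1.0"}]},
        },
    },
}

patched = enable_java_injection(deployment)
print(patched["spec"]["template"]["metadata"]["annotations"])
```

The annotation goes on the pod template rather than on the Deployment's own metadata, because the operator acts when pods are created. Existing pods pick up the agent only after they are restarted.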

The following diagram shows the shared file system implementation:

[Figure: apm-microservices-arc.png]

apm-microservices-arc-oracle.zip

Data flows in this architecture as follows:

A: Deploy the APM Java Agent to Kubernetes pods.

Create an APM domain in OCI and obtain the data upload endpoint URL and private and public data keys of the domain.
Create a shared file system in OCI and create Kubernetes storage objects such as a persistent volume in the OKE cluster.
Download the APM Java agent from the APM domain and provision it in the mounted volume.
Deploy the APM agent to the application by updating the YAML files for the Kubernetes deployments or StatefulSets.

B: Server and JDBC traces, spans, and metrics are sent to the APM domain. After the Kubernetes pods are restarted, traces and spans from the server are sent to the data upload endpoint URL of the APM domain.

C: Browser traces, spans, and metrics are sent to the APM domain. Collected data can be visualized in the APM dashboards and Trace Explorer for performance and availability analysis.
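
The last part of step A, updating the YAML for a Deployment or StatefulSet, can be sketched in Python as a transformation on the manifest. This is a minimal sketch under stated assumptions, not the documented procedure: the volume name, the claim name `apm-agent-pvc`, the mount path `/apmagent`, and the jar location `oracle-apm-agent/bootstrap/ApmAgent.jar` are assumptions about how the agent was provisioned in the shared volume. `JAVA_TOOL_OPTIONS` is the standard environment variable that the JVM reads at startup.

```python
import copy

# Assumed location of the agent jar inside the provisioned volume.
AGENT_JAR = "oracle-apm-agent/bootstrap/ApmAgent.jar"


def attach_apm_agent(workload, claim_name, mount_path="/apmagent"):
    """Return a copy of a Deployment or StatefulSet manifest that mounts the
    shared volume and starts each JVM with the agent attached."""
    patched = copy.deepcopy(workload)
    pod_spec = patched["spec"]["template"]["spec"]
    # Expose the persistent volume claim that holds the provisioned agent.
    pod_spec.setdefault("volumes", []).append(
        {"name": "apm-agent", "persistentVolumeClaim": {"claimName": claim_name}}
    )
    for container in pod_spec["containers"]:
        container.setdefault("volumeMounts", []).append(
            {"name": "apm-agent", "mountPath": mount_path}
        )
        # The JVM picks up -javaagent from JAVA_TOOL_OPTIONS at startup.
        container.setdefault("env", []).append(
            {
                "name": "JAVA_TOOL_OPTIONS",
                "value": f"-javaagent:{mount_path}/{AGENT_JAR}",
            }
        )
    return patched


# Hypothetical workload with a single application container.
workload = {
    "kind": "StatefulSet",
    "metadata": {"name": "wstore"},
    "spec": {
        "template": {
            "spec": {"containers": [{"name": "app", "image": "example/app:1.0"}]}
        }
    },
}

patched = attach_apm_agent(workload, claim_name="apm-agent-pvc")
print(patched["spec"]["template"]["spec"]["containers"][0]["env"])
```

The sketch appends a new `JAVA_TOOL_OPTIONS` entry without checking for an existing one, so a container that already sets that variable would need its value merged by hand.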

This architecture has the following components:

Tenancy
A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in OCI within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.
Region
An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, hosting availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
Compartment
Compartments are cross-regional logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize, control access, and set usage quotas for your Oracle Cloud resources. In a given compartment, you define policies that control access and set privileges for resources.
Virtual cloud network (VCN) and subnets
A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

In this architecture, all the compute instances hosting the Redis cluster are attached to a single regional subnet.
Security lists
For each subnet, you can create security rules that specify the source, destination, and type of traffic that is allowed in and out of the subnet.

This architecture adds ingress rules for TCP ports 16379 and 6379. Port 6379 serves Redis clients, and port 16379 is used by the Redis cluster bus.
Kubernetes Engine
Oracle Cloud Infrastructure Kubernetes Engine (OCI Kubernetes Engine or OKE) is a fully managed, scalable, and highly available service that you can use to deploy your containerized applications to the cloud. You specify the compute resources that your applications require, and Kubernetes Engine provisions them on Oracle Cloud Infrastructure in an existing tenancy. OKE uses Kubernetes to automate the deployment, scaling, and management of containerized applications across clusters of hosts.
Autonomous Transaction Processing
Oracle Autonomous Transaction Processing is a self-driving, self-securing, self-repairing database service that is optimized for transaction processing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating, backing up, patching, upgrading, and tuning the database.
Application Performance Monitoring
Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM) is a platform as a service (PaaS) based solution that provides deep visibility into the performance of your application, from end user to application logs. The service integrates user experience information, application metrics, and log data analytics.
- APM domain
  An APM domain is an Oracle Cloud Infrastructure (OCI) resource type that contains the systems that APM monitors.
  
  Each APM domain is created in a standard OCI compartment, and you can define OCI access control policies to grant access to the APM domain to a specific set of users.
- Data keys
  Data keys are required to ensure that APM accepts the observations gathered by the data sources. The data keys are generated when an APM domain is created, and are of two types: the public data key and the private data key.
- Data upload endpoint URL
  The data upload endpoint is the URL to which a data source sends observations. The data upload endpoint is generated when an APM domain is created, and each APM domain has its own data upload endpoint.
- OpenTelemetry Operator
  The OpenTelemetry Operator is a Kubernetes operator designed to simplify the automatic injection of Java agents into JVMs running in Kubernetes pods.
- APM Java agents
  APM Java agents record spans and metrics from application servers and send them to APM.
- APM browser agents
  APM browser agents record user interactions with websites and send spans and metrics to APM.
- Trace and Span
  A trace is the complete flow of a request as it passes through all the components of a distributed system in a given period of time. A span represents an operation or logical unit of work within a trace, and has a name, a start time, and a duration.
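
To make the trace and span definitions concrete, here is a small Python sketch of the two as data structures. It is a generic model for illustration only and not the APM data format: the class names, field names, millisecond units, and sample values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One operation or logical unit of work within a trace."""
    name: str
    start_ms: int
    duration_ms: int

    @property
    def end_ms(self):
        return self.start_ms + self.duration_ms


@dataclass
class Trace:
    """All spans recorded for a single request."""
    trace_id: str
    spans: list = field(default_factory=list)

    def duration_ms(self):
        """Elapsed time from the earliest span start to the latest span end."""
        if not self.spans:
            return 0
        start = min(span.start_ms for span in self.spans)
        end = max(span.end_ms for span in self.spans)
        return end - start


# Hypothetical request: a browser click, the server call, and a JDBC query.
trace = Trace(
    trace_id="example-trace",
    spans=[
        Span("browser: click checkout", start_ms=0, duration_ms=120),
        Span("server: POST /checkout", start_ms=10, duration_ms=90),
        Span("jdbc: INSERT orders", start_ms=30, duration_ms=40),
    ],
)
print(trace.duration_ms())  # 120
```

The trace duration is derived from its spans rather than stored, which reflects the definition above: the trace is the whole flow, and each span contributes a named interval to it.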

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point for Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM).

APM browser agent
Deploy the APM browser agent to the application’s web interface to enable end-user monitoring. Traces begin with the user action on the browser.
Synthetic monitoring
Configure synthetic monitoring to monitor the application user interface and API endpoints to proactively detect availability and performance issues. You can create browser-based or REST-based monitors and schedule them to run periodically from global locations or from within the tenant virtual cloud network (VCN).

Considerations

When using Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM), consider the following.

Data keys
Use custom data keys to manage the data sent to APM. In addition to the default data keys, you can create your own data keys for specific purposes. In the case of large deployments where many people use APM, provide different keys to different project owners so that you, as the APM domain administrator, can easily control which data comes into APM and which data does not.

For example, suppose APM receives data from an abandoned project and you want to turn off the data collection. Because the project owner has left the organization, it is hard to identify which agents to turn off. Instead, you can delete the data key used by the project so that APM ignores the data associated with that data key.
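
The idea behind per-project data keys can be modeled in a few lines of Python. This is a conceptual sketch only and does not call any APM API: the class, its method names, and the key strings are hypothetical, and the model simply treats data as accepted while its key exists and ignored once the key is deleted.

```python
class DataKeyRegistry:
    """Toy model of an APM domain's data keys, one key per project owner."""

    def __init__(self):
        self._owner_by_key = {}

    def issue(self, key, owner):
        """Record a custom data key handed to a project owner."""
        self._owner_by_key[key] = owner

    def delete(self, key):
        """Delete a data key; data sent with it is no longer accepted."""
        self._owner_by_key.pop(key, None)

    def accepts(self, key):
        """Return True if data uploaded with this key would be accepted."""
        return key in self._owner_by_key


registry = DataKeyRegistry()
registry.issue("key-storefront", owner="storefront team")
registry.issue("key-legacy", owner="abandoned project")

# Turn off the abandoned project's data without locating its agents.
registry.delete("key-legacy")

print(registry.accepts("key-storefront"))  # True
print(registry.accepts("key-legacy"))      # False
```

Because each project has its own key, deleting one key affects only that project's agents and leaves the other data sources untouched.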
Kubernetes resources
When you configure Kubernetes pod resources for microservices, consider using StatefulSets instead of Deployments to better track history in APM. Deployment pods change their IDs whenever they are regenerated, whereas StatefulSet pods are regenerated with the same ID index (for example, ss-0, ss-1, ss-2), which simplifies tracing the history with APM.
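
The difference in pod identity can be shown with a short Python sketch. Kubernetes names StatefulSet pods `<name>-<ordinal>`, so the names are the same after every restart, while Deployment pods get a fresh random suffix each time. The function names and the `wstore` workload name are hypothetical, and the Deployment naming is only approximated here with a random five-character suffix.

```python
import random
import string


def statefulset_pod_names(name, replicas):
    """StatefulSet pods are named <name>-<ordinal>, starting at 0."""
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]


def deployment_pod_names(name, replicas, rng):
    """Approximation: Deployment pods get a new random suffix on each restart."""
    alphabet = string.ascii_lowercase + string.digits
    return [
        f"{name}-" + "".join(rng.choice(alphabet) for _ in range(5))
        for _ in range(replicas)
    ]


rng = random.Random(0)

# StatefulSet: identical names before and after a restart.
before = statefulset_pod_names("wstore", 3)
after = statefulset_pod_names("wstore", 3)
print(before == after)

# Deployment: names differ after a restart, so history is split across IDs.
print(deployment_pod_names("wstore", 3, rng))
print(deployment_pod_names("wstore", 3, rng))
```

With stable names, the data recorded for one replica before and after a restart can be grouped under the same pod identity.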
APM service name
When provisioning the APM Java agent, you need to specify the service name used in APM. Consider using the Deployment/StatefulSet name as the APM serviceName for consistent tracing in APM.
Sampling
Use sampling in high-volume applications (for example, 1 million transactions per second). APM collects all spans by default and lets you track all your application's transactions. However, this can generate many spans in the case of high-volume applications. In such cases, consider explicitly specifying the sampling configuration for better cost efficiency and to reduce the amount of trace data.
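
As an illustration of what a sampling configuration achieves, the following Python sketch implements a generic ratio-based sampler that decides per trace ID. It is not the APM agent's sampling implementation or configuration syntax; the function name and the hashing scheme are assumptions chosen to show the technique.

```python
import hashlib


def keep_trace(trace_id, sample_rate):
    """Decide whether to keep a trace, given a rate between 0.0 and 1.0.

    The decision is a deterministic function of the trace ID, so every
    service that sees the same trace makes the same choice and a sampled
    trace keeps all of its spans.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Keep roughly 10% of 10,000 hypothetical traces.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(keep_trace(trace_id, 0.10) for trace_id in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Hashing the trace ID, rather than drawing a random number per span, avoids partial traces: either all spans of a request are kept or none are.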

Deploy

Refer to the step-by-step instructions in the following Oracle LiveLabs workshops when deploying the APM Java agent described in this reference architecture.

The following labs use applications with Spring Boot and Oracle WebLogic Server as examples.

Explore More

Learn more about the features of this architecture and about related information.

To learn more about Oracle Application Performance Monitoring, review these additional resources:

Acknowledgments

Author: Yutaka Takatsu

Contributors: Avi Huber, Robert Lies