Monitor and Trace Microservices With Application Performance Monitoring on Oracle Cloud

Monitoring microservices can be challenging because of the complexity of their architecture and because of their distributed deployment model.

Microservices applications can consist of thousands of independent services deployed across many different systems, with each service running in its own process. Traditional monitoring approaches that focus on specific resources and health monitors are not effective for tracing transaction flow or debugging problems in microservices applications.

Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM) enables automatic OpenTracing instrumentation on microservices and captures full end-to-end user transactions to help you understand both the user experience and application performance. APM includes an implementation of a distributed tracing system that enables instance-level, end-to-end, and transaction tracing. It also enables app server and business metric monitoring.

Architecture

This reference architecture shows how you can use Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM) to monitor a microservices application that is deployed on an Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE) cluster.

The following architecture diagram shows a microservices application running in a Kubernetes cluster. The application is hosted in an application server, with two replicas of the deployment. The application connects to an Oracle Autonomous Database by using JDBC. End users of the application use a web user interface to connect to the server on Oracle Cloud.

The approach demonstrated in the architecture uses a shared file system to provision and deploy an APM agent. The same approach can be used to configure monitoring for any Java application server or Java framework, such as Oracle WebLogic Server or Spring Boot, deployed on Kubernetes.
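
The shared-file-system approach can be sketched as a small Python helper that patches a Kubernetes container definition. This is only an illustration under stated assumptions: the mount path `/apmagent`, the claim name `apm-agent-pvc`, the volume name `apm-agent`, and the agent jar location under the mount are all hypothetical and should be replaced with the values from your own provisioning step. It relies only on the standard JVM `-javaagent` option and the `JAVA_TOOL_OPTIONS` environment variable.

```python
import json


def with_apm_agent(container, mount_path="/apmagent", claim_name="apm-agent-pvc"):
    """Return (container, volume) dicts that attach a Java agent from a shared volume.

    The mount path, claim name, and jar location are illustrative assumptions,
    not values taken from the reference architecture.
    """
    # Assumed location of the agent jar inside the shared file system.
    agent_jar = f"{mount_path}/oracle-apm-agent/bootstrap/ApmAgent.jar"

    patched = dict(container)
    # Mount the shared file system that holds the provisioned agent.
    patched["volumeMounts"] = list(container.get("volumeMounts", [])) + [
        {"name": "apm-agent", "mountPath": mount_path}
    ]
    # The JVM reads JAVA_TOOL_OPTIONS at startup; -javaagent loads the agent jar.
    patched["env"] = list(container.get("env", [])) + [
        {"name": "JAVA_TOOL_OPTIONS", "value": f"-javaagent:{agent_jar}"}
    ]
    volume = {
        "name": "apm-agent",
        "persistentVolumeClaim": {"claimName": claim_name},
    }
    return patched, volume


if __name__ == "__main__":
    container, volume = with_apm_agent({"name": "app", "image": "example/app:1.0"})
    # Print the fragment that would go under spec.template.spec of a workload.
    print(json.dumps({"containers": [container], "volumes": [volume]}, indent=2))
```

Because the agent lives on the shared volume rather than in the application image, the same patch could in principle be applied to any Java workload in the cluster without rebuilding its image.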



apm-microservices-arc-oracle.zip

This architecture has the following components:

  • Tenancy

    A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in Oracle Cloud within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Compartment

    Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

    In this architecture, all the compute instances hosting the Redis cluster are attached to a single regional subnet.

  • Security lists

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

    This architecture adds ingress rules for TCP ports 16379 and 6379. Port 6379 serves Redis clients, and port 16379 is used by the Redis cluster bus.

  • Container Engine for Kubernetes

    Oracle Cloud Infrastructure Container Engine for Kubernetes is a fully managed, scalable, and highly available service that you can use to deploy your containerized applications to the cloud. You specify the compute resources that your applications require, and Container Engine for Kubernetes provisions them on Oracle Cloud Infrastructure in an existing tenancy. Container Engine for Kubernetes uses Kubernetes to automate the deployment, scaling, and management of containerized applications across clusters of hosts.

  • Autonomous Transaction Processing

    Oracle Autonomous Transaction Processing is a self-driving, self-securing, self-repairing database service that is optimized for transaction processing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.

  • Application Performance Monitoring

    Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM) is a platform as a service (PaaS) based solution that provides deep visibility into the performance of your application, from end user to application logs. The service integrates user experience information, application metrics, and log data analytics.

    • APM domain

      An APM domain is an Oracle Cloud Infrastructure (OCI) resource type that contains the systems that APM monitors.

      Each APM domain is created in a standard OCI compartment, and you can define OCI access control policies to grant access to the APM domain to a specific set of users.

    • Data keys

      Data keys are required to ensure that APM accepts the observations gathered by the data sources. The data keys are generated when an APM domain is created, and are of two types: public data keys and private data keys.

    • Data upload endpoint URL

      The data upload endpoint is the URL to which a data source sends observations. The data upload endpoint is generated when an APM domain is created, and each APM domain has its own data upload endpoint.

    • APM Java agents

      APM Java agents record spans and metrics from application servers and send them to APM.

    • APM browser agents

      APM browser agents record user interactions with websites and send spans and metrics to APM.

    • Trace and Span

      A trace is the complete flow of a request as it passes through all the components of a distributed system in a given period of time. A span represents an operation or logical unit of work within a trace, and has a name, a start time, and a duration.
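
The relationship between traces and spans described above can be sketched with a small data model. This is a generic illustration, not APM's internal representation: the class names, the `span_id` and `parent_id` fields, and the millisecond units are assumptions made for the example. Only the name, start time, and duration attributes come from the definition above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One operation or logical unit of work within a trace."""
    span_id: str                      # hypothetical identifier
    name: str
    start_ms: int                     # start time, in milliseconds
    duration_ms: int
    parent_id: Optional[str] = None   # None marks the root span

    @property
    def end_ms(self) -> int:
        return self.start_ms + self.duration_ms


@dataclass
class Trace:
    """The complete flow of one request through the system."""
    trace_id: str
    spans: List[Span] = field(default_factory=list)

    def duration_ms(self) -> int:
        """Time from the earliest span start to the latest span end."""
        if not self.spans:
            return 0
        start = min(s.start_ms for s in self.spans)
        end = max(s.end_ms for s in self.spans)
        return end - start

    def children(self, span_id: str) -> List[Span]:
        """Spans whose direct parent is the given span."""
        return [s for s in self.spans if s.parent_id == span_id]


if __name__ == "__main__":
    trace = Trace("t1", [
        Span("a", "browser: click checkout", start_ms=0, duration_ms=120),
        Span("b", "server: POST /checkout", start_ms=10, duration_ms=90, parent_id="a"),
        Span("c", "jdbc: INSERT order", start_ms=30, duration_ms=40, parent_id="b"),
    ])
    print(trace.duration_ms())                    # 120
    print([s.name for s in trace.children("a")])  # ['server: POST /checkout']
```

In this example the browser span is the root, which mirrors how a trace can begin with a user action in the browser and continue through the application server to the database.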

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point for Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM).

  • APM browser agent

    Deploy the APM browser agent to the application’s web interface to enable end-user monitoring. Traces begin with the user action on the browser.

  • Synthetic monitoring

    Configure synthetic monitoring to monitor the application user interface and API endpoints to proactively detect availability and performance issues. You can create browser-based or REST-based monitors and schedule them to run periodically from global locations or from within the tenant virtual cloud network (VCN).

Considerations

When using Oracle Cloud Infrastructure Application Performance Monitoring Cloud Service (APM), consider the following.

  • Data keys

    Use custom data keys to manage the data sent to APM. In addition to the default data keys, you can create your own data keys for specific purposes. In the case of large deployments where many people use APM, provide different keys to different project owners so that you, as the APM domain administrator, can easily control which data comes into APM and which data does not.

    For example, suppose that APM receives data from an abandoned project and you want to turn off the data collection. Because the project owner left the organization, it is hard to identify which agents to turn off. Instead, you can delete the data key used by the project so that APM ignores the data associated with that data key.

  • Kubernetes resources

    When you configure Kubernetes pod resources for microservices, consider using StatefulSets instead of Deployments to better track history in APM. Deployment pods change their IDs whenever they are regenerated, whereas StatefulSet pods are regenerated with the same ID index (for example, SS_0, SS_1, SS_2), which simplifies tracing the history with APM.

  • APM service name

    When provisioning the APM Java agent, you need to specify the service name used in APM. Consider using the Deployment/StatefulSet name as the APM serviceName for consistent tracing in APM.

  • Sampling

    Use sampling in high-volume applications (for example, 1 million transactions per second). APM collects all spans by default and lets you track all your application's transactions. However, this can generate many spans in the case of high-volume applications. In such cases, consider explicitly specifying the sampling configuration for better cost efficiency and to reduce the amount of trace data.
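
To make the sampling consideration concrete, the following is a minimal sketch of one common, generic technique: deterministic sampling keyed on the trace ID. It is not the APM agent's implementation or its configuration format, and the function name `keep_trace` and the `sample_rate` parameter are hypothetical. Refer to the APM documentation for the actual sampling settings.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide whether to keep a trace, using only its ID.

    Hashing the trace ID maps it to a stable value in [0, 1), so every
    service that sees the same trace ID makes the same decision and a
    trace is kept or dropped as a whole.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


if __name__ == "__main__":
    ids = [f"trace-{i}" for i in range(10_000)]
    kept = sum(keep_trace(t, 0.10) for t in ids)
    print(f"kept {kept} of {len(ids)} traces")
```

Keying the decision on the trace ID rather than on a random draw per span avoids partial traces, at the cost of not being able to favor slow or failed requests. Whether that trade-off is acceptable depends on how the trace data is used.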

Deploy

Refer to the step-by-step instructions in the following Oracle LiveLabs workshops when deploying the APM Java agent described in this reference architecture.

Acknowledgments

  • Author: Yutaka Takatsu
  • Contributors: Avi Huber, Robert Lies