Enable Oracle Cloud Infrastructure Service Mesh on your Kubernetes applications
Oracle Cloud Infrastructure (OCI) customers have increasingly moved toward a microservices architecture, which brings new challenges along with its benefits. In a microservices architecture, monolithic applications are broken down into smaller microservices that communicate with each other over the network through APIs. This increases network traffic, adds complexity, and enlarges the overall attack surface of the architecture. Oracle Cloud Infrastructure Service Mesh (OCI Service Mesh) helps address these challenges. It:
- Allows you to control the microservices traffic flow.
- Provides visibility into your applications.
- Enables microservices to connect securely without any changes to the application code.
OCI Service Mesh Capabilities
- Enforcement of security-related policies
OCI Service Mesh uses access policies to define access rules. Access policies govern communication between microservices and allow only validated requests, whether they originate inside or outside the application. Access policies also define which communication with external services is permitted.
- Zero Trust
OCI Service Mesh automatically implements a zero-trust security architecture across all microservices. Data exchanged between microservices is encrypted, and the microservices must identify each other at the start of every connection: the two parties exchange credentials that carry their identity information, so that each service can identify the other and determine whether they are authorized to interact. OCI Service Mesh implements this with mutual TLS (mTLS) and automated certificate and key rotation, using the Oracle Cloud Infrastructure Certificates (OCI Certificates) service and Oracle Key Management Cloud Service to manage the certificates and keys.
- Traffic Shifting
OCI Service Mesh supports canary deployments. When you publish a new version of your code to production, you allow only a portion of the traffic to reach it, which lets you deploy quickly while minimizing disruption to your application. You define routing rules that govern all inter-microservice communication inside the mesh; for example, you can route a fixed percentage of the traffic to a specific version of a service.
- Monitoring and Logging
OCI Service Mesh is uniquely positioned to provide telemetry because all inter-microservice communication passes through it. The service mesh can therefore capture telemetry data such as source, destination, protocol, URL, duration, status code, latency, and other detailed statistics, along with logs. You can export logging information to the Oracle Cloud Infrastructure Logging (OCI Logging) service. OCI Service Mesh provides two types of logs: error logs and traffic logs. You can use these logs to debug 404 or 505 issues or to generate log-based statistics. You can export metrics and telemetry data to Prometheus and visualize them with Grafana; both can be deployed directly into an OCI Kubernetes Engine (OKE) cluster.
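The access-policy and zero-trust behavior described above comes down to deny-by-default rule evaluation: a request passes only if a rule explicitly allows it. The following Python sketch models that idea only; it is not the OCI Service Mesh API. The `AccessRule` type, the `*` wildcard, and the service names are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical access rule: traffic from `source` to `destination` is allowed.
# "*" is an illustrative wildcard that matches any service.
@dataclass(frozen=True)
class AccessRule:
    source: str
    destination: str


def is_allowed(rules: list[AccessRule], source: str, destination: str) -> bool:
    """Deny by default: allow a request only if some rule explicitly matches it."""
    for rule in rules:
        source_ok = rule.source in ("*", source)
        destination_ok = rule.destination in ("*", destination)
        if source_ok and destination_ok:
            return True
    return False


# Hypothetical policy for a three-service application.
policy = [
    AccessRule(source="ingress-gateway", destination="frontend"),
    AccessRule(source="frontend", destination="catalog"),
    AccessRule(source="catalog", destination="external:payments.example.com"),
]

print(is_allowed(policy, "frontend", "catalog"))         # True
print(is_allowed(policy, "ingress-gateway", "catalog"))  # False: no matching rule
```

Because `is_allowed` returns `False` whenever no rule matches, any communication path that is not listed explicitly is blocked, which is the property a zero-trust setup depends on.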
Architecture
Oracle Cloud Infrastructure Service Mesh (OCI Service Mesh) uses a sidecar model. In this model, the code that implements network functionality is encapsulated in a network proxy, and all traffic to and from each service is redirected through that proxy. The proxy is called a sidecar because one is attached to each application, much like a sidecar attached to a motorbike. In OKE, the application container runs alongside the sidecar proxy container in the same pod. Because both containers are in the same pod, they share the same network namespace and IP address, so they can communicate via “localhost.”
- Control plane
The OCI Service Mesh control plane manages and configures all of the proxies that route traffic. It handles forwarding, health checking, load balancing, authentication, authorization, and telemetry aggregation. The control plane interacts with the OCI Certificates service and Oracle Key Management Cloud Service to provide each proxy with its certificate.
- Data plane
The data plane consists of the sidecar proxies deployed in the environment and is responsible for the security, network functions, and observability of the application. The proxies also collect and report telemetry on all mesh traffic. OCI Service Mesh uses the Envoy proxy for its data plane.
The following diagram illustrates this reference architecture.
This reference architecture shows an application with three services deployed in an OKE cluster. The namespace in which the application is deployed has been “meshified”: services deployed within a meshified namespace are part of a service mesh, and each new pod deployed in it is injected with an Envoy proxy container. As each pod is deployed, the OCI Service Mesh control plane sends configuration and certificates to its proxy container. The control plane communicates with the OCI Certificates service and Oracle Key Management Cloud Service to obtain a certificate for each proxy.
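Whether a pod receives a proxy depends on the namespace it is deployed into. The following Python sketch models that admission-time decision. The label key and value are assumptions; check the OCI Service Mesh documentation for the exact label that marks a namespace as meshified. The container name `envoy-proxy` is hypothetical.

```python
# Assumed label that marks a namespace as "meshified"; verify the exact key
# and value in the OCI Service Mesh documentation before relying on them.
INJECTION_LABEL = "servicemesh.oci.oraclecloud.com/sidecar-injection"
INJECTION_ENABLED = "enabled"


def should_inject_proxy(namespace_labels: dict[str, str]) -> bool:
    """Return True if pods created in a namespace with these labels get a sidecar."""
    return namespace_labels.get(INJECTION_LABEL) == INJECTION_ENABLED


def pod_containers(app_containers: list[str], namespace_labels: dict[str, str]) -> list[str]:
    """Model a pod's container list after admission into the namespace."""
    containers = list(app_containers)
    if should_inject_proxy(namespace_labels):
        # The proxy runs in the same pod, sharing its network namespace and IP.
        containers.append("envoy-proxy")  # hypothetical container name
    return containers


meshified = {INJECTION_LABEL: INJECTION_ENABLED}
print(pod_containers(["catalog"], meshified))  # ['catalog', 'envoy-proxy']
print(pod_containers(["catalog"], {}))         # ['catalog']
```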
An ingress gateway is deployed to provide external access to the application. The ingress gateway is part of the OCI Service Mesh data plane; it is also an Envoy proxy and receives configuration and certificates from the OCI Service Mesh control plane.
Each proxy container is responsible for service discovery, traffic encryption, and authentication with the destination service. The proxy containers also apply network policies, such as how traffic is distributed between different service versions, and enforce access policies. The ingress gateway performs the same functions for traffic entering from outside the service mesh.
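In OCI Service Mesh, the Envoy proxies handle encryption and mutual authentication for you, using certificates supplied by the control plane. Purely to illustrate what mutual TLS requires of each side, the following sketch builds server and client contexts with Python's standard `ssl` module. The file-path parameters are placeholders and are optional so the sketch runs without certificates.

```python
import ssl


def mtls_server_context(cert_file=None, key_file=None, ca_file=None) -> ssl.SSLContext:
    """Server side of mutual TLS: present our certificate and require the client's."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any client that does not present a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # this service's identity
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that issued the peers' certificates
    return ctx


def mtls_client_context(cert_file=None, key_file=None, ca_file=None) -> ssl.SSLContext:
    """Client side of mutual TLS: verify the server and present our own certificate."""
    # PROTOCOL_TLS_CLIENT verifies the server certificate and hostname by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

The symmetry is the point: in plain TLS only the client verifies the server, while in mutual TLS the server also sets `verify_mode = ssl.CERT_REQUIRED`, so both sides prove their identity before any application data flows.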
Prometheus and Grafana are deployed in the OKE cluster in a separate namespace that is not part of the service mesh. The service mesh data plane reports key operating statistics, such as latency, failures, requests, and other telemetry, to the Prometheus deployment. Grafana queries the Prometheus deployment, and you can use that data to create visualization dashboards.
OCI Service Mesh is integrated with the OCI Logging service, and logging can be enabled when the service mesh is created. OCI Service Mesh provides two types of logs: error logs and traffic logs. These logs can be used to debug 404 or 505 issues or generate log-based statistics.
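As an example of a log-based statistic, the following Python sketch counts status codes per destination and computes an error rate. It assumes, hypothetically, that each traffic log entry is a JSON object with `destination` and `status` fields; the actual OCI Service Mesh log schema may use different field names.

```python
import json
from collections import Counter


def status_counts(log_lines):
    """Count (destination, status code) pairs from JSON-formatted traffic log lines."""
    counts = Counter()
    for line in log_lines:
        record = json.loads(line)
        # Hypothetical field names; adapt them to the real log schema.
        counts[(record["destination"], record["status"])] += 1
    return counts


def error_rate(counts, destination):
    """Fraction of requests to `destination` that returned a 4xx or 5xx status."""
    total = sum(n for (dest, _), n in counts.items() if dest == destination)
    errors = sum(
        n for (dest, status), n in counts.items() if dest == destination and status >= 400
    )
    return errors / total if total else 0.0


sample = [
    '{"destination": "catalog", "status": 200}',
    '{"destination": "catalog", "status": 404}',
    '{"destination": "catalog", "status": 200}',
    '{"destination": "frontend", "status": 505}',
]
counts = status_counts(sample)
print(counts[("catalog", 404)])                 # 1
print(round(error_rate(counts, "catalog"), 2))  # 0.33
```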
This architecture uses the following OCI services:
- OCI Kubernetes Engine (OKE)
Delivers a highly available, scalable production-ready Kubernetes cluster to deploy your containerized applications in the cloud.
- Load Balancer
Provides access to the ingress gateway in the OKE cluster. The ingress gateway directs traffic to the requested services in the OKE cluster.
- Certificate Authority
Manages the TLS certificates for the OCI Service Mesh service.
- Key Management
Manages the keys used by the Certificate Authority service.
The architecture has the following components:
- Region
An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
- Virtual cloud network (VCN) and subnets
A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.
- Security list
For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.
OCI Service Mesh Resources
| Kubernetes resources | Description |
|---|---|
| Mesh | The mesh is the top-level logical container for all microservices in your application. |
| Virtual service | A virtual service is a logical representation of a microservice in a service mesh. |
| Virtual deployments | A virtual deployment is a logical representation of a deployment that can be grouped into a virtual service. A Kubernetes microservice can have multiple versions that run as separate deployments in the cluster; virtual deployments let a virtual service distribute traffic between those versions. |
| Virtual deployment bindings | Virtual deployment bindings associate a set of pods with a virtual deployment. |
| Virtual service route tables | A virtual service route table defines how traffic is distributed between the virtual deployments in a virtual service, based on protocol and path. Each virtual service has a virtual service route table. |
| Ingress gateway deployment | An ingress gateway deployment creates ingress pods that act as load balancers for incoming traffic to the mesh. |
| Ingress gateways | Ingress gateways manage ingress traffic into the mesh. An ingress gateway defines a set of rules for how external traffic communicates with the mesh, such as whether encryption is required on all incoming and outgoing traffic. These rules are applied to the ingress gateway deployment. |
| Ingress gateway route tables | An ingress gateway route table is associated with an ingress gateway deployment and defines which virtual services within the mesh are accessible from that deployment. |
| Access policies | Access policies are a set of rules that define allowed communication between microservices in the mesh, as well as with external applications. |
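A virtual service route table, combined with multiple virtual deployments, is what makes the canary deployments described earlier possible. The following Python sketch shows weighted selection among versions for a matching path. The `Route` type, the path prefixes, the deployment names, and the longest-prefix matching rule are illustrative choices, not the OCI Service Mesh route-table semantics.

```python
import random
from dataclasses import dataclass


@dataclass
class Route:
    path_prefix: str
    # (virtual deployment name, relative weight) pairs.
    targets: list[tuple[str, int]]


def pick_deployment(routes, path, rng=random):
    """Return the virtual deployment that should serve `path`, or None if no route matches."""
    matching = [r for r in routes if path.startswith(r.path_prefix)]
    if not matching:
        return None
    # Illustrative choice: the most specific (longest) prefix wins.
    route = max(matching, key=lambda r: len(r.path_prefix))
    names = [name for name, _ in route.targets]
    weights = [weight for _, weight in route.targets]
    # Weighted random choice: each deployment gets traffic in proportion to its weight.
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical canary: 90 percent of /catalog traffic to v1, 10 percent to v2.
routes = [Route("/catalog", [("catalog-v1", 90), ("catalog-v2", 10)])]
rng = random.Random(42)
picks = [pick_deployment(routes, "/catalog/items", rng) for _ in range(10_000)]
print(picks.count("catalog-v2") / len(picks))  # close to 0.1
```

Promoting the canary is then only a matter of changing the weights (for example to 50/50, then 0/100); the application deployments themselves stay untouched.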
The following diagram depicts how the configured OCI Service Mesh resources (access policies, ingress gateway, virtual service, and virtual deployment) map to your application resources (Kubernetes service, Kubernetes service load balancer, deployments, and pods).
Recommendations
- VCN
When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.
Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.
After you create a VCN, you can change, add, and remove its CIDR blocks.
When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.
- Load balancer bandwidth
While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth, or specify a custom (flexible) shape where you set a bandwidth range and let the service scale the bandwidth automatically based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.
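The VCN recommendations above can be checked mechanically before you create the network. The following Python sketch uses the standard `ipaddress` module to confirm that a proposed IPv4 VCN CIDR block lies within the RFC 1918 private address space and does not overlap networks you plan to connect privately, and then carves non-overlapping subnets from it. All addresses are hypothetical.

```python
import ipaddress

# RFC 1918 private IPv4 ranges.
PRIVATE_RANGES = [
    ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def check_vcn_cidr(candidate: str, connected_networks: list[str]) -> list[str]:
    """Return problems with a proposed IPv4 VCN CIDR block; an empty list means none found."""
    problems = []
    vcn = ipaddress.ip_network(candidate)
    if not any(vcn.subnet_of(r) for r in PRIVATE_RANGES):
        problems.append(f"{vcn} is outside the RFC 1918 private address space")
    for other in connected_networks:
        if vcn.overlaps(ipaddress.ip_network(other)):
            problems.append(f"{vcn} overlaps {other}")
    return problems


# Hypothetical on-premises network reached over a private connection.
on_prem = ["192.168.0.0/16"]
print(check_vcn_cidr("10.0.0.0/16", on_prem))            # []
print(check_vcn_cidr("10.0.0.0/16", ["10.0.128.0/20"]))  # ['10.0.0.0/16 overlaps 10.0.128.0/20']

# Carve non-overlapping /24 subnets, for example one for the load balancer
# and one for the OKE worker nodes.
lb_subnet, worker_subnet = list(ipaddress.ip_network("10.0.0.0/16").subnets(new_prefix=24))[:2]
print(lb_subnet, worker_subnet)  # 10.0.0.0/24 10.0.1.0/24
```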
Considerations
Consider the following options when deploying this reference architecture.
- Cost
There is no charge for the OCI Service Mesh control plane on the OKE cluster. You are charged for the resources consumed by the proxy containers of the service mesh data plane. In practice, however, you already pay for the node pool resources in your OKE cluster, so adding OCI Service Mesh to your microservice architecture incurs no additional charge unless the proxy containers push the node pool's resource utilization beyond 100 percent.
- Availability
The control plane of the OCI Service Mesh is always deployed with high availability.
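The cost consideration above comes down to whether the proxies' extra resource requests still fit in the node pool you already pay for. The following Python sketch does that arithmetic with CPU requests expressed in Kubernetes millicores. All numbers, including the 100m CPU request per proxy, are hypothetical, and the sketch deliberately ignores memory, per-node scheduling, and system-reserved resources.

```python
import math


def extra_nodes_needed(nodes: int, node_millicores: int, app_millicores: int,
                       pods: int, proxy_millicores_per_pod: int) -> int:
    """Estimate how many nodes to add once every pod carries a sidecar proxy.

    Simplification: compares total CPU requests with total node capacity and
    ignores bin packing across nodes, memory, and system-reserved resources.
    """
    capacity = nodes * node_millicores
    demand = app_millicores + pods * proxy_millicores_per_pod
    shortfall = max(0, demand - capacity)
    return math.ceil(shortfall / node_millicores)


# Hypothetical pool: 3 nodes with 4 CPU cores (4000 millicores) each, and an
# assumed 100m CPU request per proxy container.
print(extra_nodes_needed(3, 4000, app_millicores=8000, pods=20, proxy_millicores_per_pod=100))   # 0
print(extra_nodes_needed(3, 4000, app_millicores=10000, pods=40, proxy_millicores_per_pod=100))  # 1
```

In the first case the proxies fit into the existing headroom, so the mesh adds no cost; in the second, the proxies push demand past capacity and one more node is needed.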