Design a secure Observability and Management compartment structure on Oracle Cloud

Observability and Management services are the backbone of cloud infrastructure solutions, providing critical monitoring and observability insights into system availability, performance, and security posture.

The design of a secure and effective Observability and Management compartment organization within Oracle Cloud Infrastructure is a strategic endeavor that addresses these imperatives. The compartment structure serves as the foundation for organizing cloud resources and data and for governing access control. This reference architecture follows Oracle Cloud best practices to deliver a holistic view of the system's health, behavior, and risks. It aims to empower stakeholders with actionable intelligence, enabling prompt and informed decision making.

Architecture

This reference architecture outlines the critical components and methodologies for designing an Oracle Cloud Observability and Management Platform compartment which provides security, resilience, and operational dexterity in the cloud ecosystem.

The following diagram illustrates this Oracle Cloud Observability and Management Platform compartment reference architecture.



oracle-cloud-observability-arch-oracle.zip

This compartment design reflects a basic functional structure observed across different organizations, where IT responsibilities are typically separated among network, security, application development, and database administration teams. The resources in this reference architecture are provisioned in the following compartments:
  • An Oracle Cloud Observability and Management Platform Compartment for all Observability and Management services metrics and resources, and the metrics namespace repository.
  • A Network compartment for all the networking resources, including the required network gateways and network related log data.
  • A Security compartment for security and events logging, key management, and security related logs.
  • An Application compartment for application-related services, including compute, storage, functions, streams, Kubernetes nodes, API gateway, and application related logs.
  • A Database compartment for all database resources, and database related logs.
  • An optional enclosing compartment containing all the above compartments.
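
The compartment layout listed above can be sketched as a small tree. This is a minimal illustration only: the `Compartment` class and all compartment names are hypothetical and are not part of any OCI SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Compartment:
    """A node in a compartment hierarchy (illustrative, not an OCI SDK type)."""

    name: str
    purpose: str
    children: list[Compartment] = field(default_factory=list)

    def paths(self, prefix: str = "") -> list[str]:
        """Return the slash-separated path of this compartment and its descendants."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        result = [path]
        for child in self.children:
            result.extend(child.paths(path))
        return result


# Hypothetical names; the enclosing compartment is optional in this architecture.
enclosing = Compartment(
    "enclosing-cmp",
    "Optional parent of all compartments below",
    [
        Compartment("observability-cmp", "O&M metrics, resources, metric namespaces"),
        Compartment("network-cmp", "Networking resources, gateways, network logs"),
        Compartment("security-cmp", "Security and events logging, keys, security logs"),
        Compartment("application-cmp", "Application services and application logs"),
        Compartment("database-cmp", "Database resources and database logs"),
    ],
)

for path in enclosing.paths():
    print(path)
```

Keeping the five functional compartments as siblings under one optional parent mirrors the separation of team responsibilities described above.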

The architecture has the following components:

  • Tenancy

    A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in Oracle Cloud within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.

  • Policy

    An Oracle Cloud Infrastructure Identity and Access Management policy specifies who can access which resources, and how. Access is granted at the group and compartment level, which means you can write a policy that gives a group a specific type of access within a specific compartment, or to the tenancy.

  • Compartment

    Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.

  • Monitoring

    The Oracle Cloud Infrastructure Monitoring service actively and passively monitors your cloud resources, using metrics to track those resources and alarms to notify you when these metrics meet alarm-specified triggers.

  • Alarms

    The Alarms feature of the Monitoring service works with the configured destination service to notify you when metrics meet alarm-specified triggers.

  • Logging
    Logging is a highly scalable and fully managed service that provides access to the following types of logs from your resources in the cloud:
    • Audit logs: Logs related to events emitted by the Audit service.
    • Service logs: Logs emitted by individual services such as API Gateway, Events, Functions, Load Balancing, Object Storage, and VCN flow logs.
    • Custom logs: Logs that contain diagnostic information from custom applications, other cloud providers, or an on-premises environment.
  • Events

    Oracle Cloud Infrastructure services emit events, which are structured messages that describe the changes in resources. Events are emitted for create, read, update, or delete (CRUD) operations, resource lifecycle state changes, and system events that affect cloud resources.

  • Service connectors

    Oracle Cloud Infrastructure Service Connector Hub is a cloud message bus platform that orchestrates data movement between services in OCI. You can use service connectors to move data from a source service to a target service. Service connectors also enable you to optionally specify a task (such as a function) to perform on the data before it is delivered to the target service.

    You can use Oracle Cloud Infrastructure Service Connector Hub to quickly build a logging aggregation framework for security information and event management (SIEM) systems.

  • Notifications

    The Oracle Cloud Infrastructure Notifications service broadcasts messages to distributed components through a publish-subscribe pattern, delivering secure, highly reliable, low latency, and durable messages for applications hosted on Oracle Cloud Infrastructure.

  • Streaming

    Oracle Cloud Infrastructure Streaming provides a fully managed, scalable, and durable storage solution for ingesting continuous, high-volume streams of data that you can consume and process in real time. You can use Streaming for ingesting high-volume data, such as application logs, operational telemetry, web click-stream data; or for other use cases where data is produced and processed continually and sequentially in a publish-subscribe messaging model.

  • Database management

    Database Management provides comprehensive database performance diagnostics and management capabilities for Oracle Databases and MySQL HeatWave DB systems. In addition, you can use Database Management to discover and monitor on-premises Oracle Database System (External Database System) components and Exadata Storage Infrastructure.

  • Operations Insights
    Oracle Cloud Infrastructure Operations Insights is an OCI native service that provides holistic insight into database and host resource utilization and capacity. With Operations Insights you can:
    • Analyze resource usage of databases and hosts across the enterprise.
    • Forecast future demand for resources based on historical trends.
    • Compare SQL Performance across databases and identify common patterns.
    • Identify SQL performance trends across enterprise-wide databases.
    • Analyze AWR statistics for database performance, diagnostics, and tuning across a fleet of databases.
    • Create and receive weekly News Reports giving you breakdowns of new utilization highs, big utilization changes and inventory changes across your fleet of databases, hosts, and Exadata systems.
  • Application Performance Monitoring

    Oracle Cloud Infrastructure Application Performance Monitoring provides deep visibility into the performance of applications and provides the ability to diagnose issues quickly in order to deliver a consistent level of service. This includes the monitoring of the multiple components and application logic spread across clients, third-party services, and back-end computing tiers, on premises or in the cloud.

  • Stack Monitoring

    Stack Monitoring lets you proactively monitor an application and its underlying application stack, including application servers and databases. It starts by discovering all components of the application, including the application topology. Once discovered, it automatically collects status, load, response, error, and utilization metrics for all application components.

  • Logging Analytics

    Oracle Logging Analytics is a cloud solution in Oracle Cloud Infrastructure that lets you index, enrich, aggregate, explore, search, analyze, correlate, visualize and monitor all log data from your applications and system infrastructure on cloud or on-premises.

  • Management Agent

    A Management Agent (agent) allows a service plug-in to collect data from the host where you install the Management Agent. It can connect to Oracle Cloud Infrastructure directly using the Management Agent cloud service. The Management Agent is installed on a host. It monitors and collects data from the sources that reside on hosts or virtual hosts.

  • Management Agent Gateway

    The Management Agent Cloud Service (MACS), also known as Management Agent service, is a cloud service from Oracle Cloud Infrastructure. It manages the Management Agents and their lifecycle. Management Agents allow Oracle Cloud services to interact and collect data from entities that are managed by them.

  • Management Dashboard
    Management Dashboard allows you to build performance monitoring, diagnosis and data analysis solutions on Oracle Cloud Infrastructure platform, infrastructure, and application resources. It has powerful data visualization options that gather real-time and historic data and display it in widgets. Management Dashboard is available as part of the following Oracle Cloud Infrastructure Observability and Management services:
    • Application Performance Monitoring
    • Database Management
    • Logging Analytics
    • Management Agent
    • Operations Insights
  • Virtual cloud network (VCN) and subnet

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Internet gateway

    The internet gateway allows traffic between the public subnets in a VCN and the public internet.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • Network address translation (NAT) gateway

    A NAT gateway enables private resources in a VCN to access hosts on the internet, without exposing those resources to incoming internet connections.

  • Service gateway

    The service gateway provides access from a VCN to other services, such as Oracle Cloud Infrastructure Object Storage. The traffic from the VCN to the Oracle service travels over the Oracle network fabric and does not traverse the internet.

  • Oracle Services Network

    The Oracle Services Network (OSN) is a conceptual network in Oracle Cloud Infrastructure that is reserved for Oracle services. These services have public IP addresses that you can reach over the internet. Hosts outside Oracle Cloud can access the OSN privately by using Oracle Cloud Infrastructure FastConnect or VPN Connect. Hosts in your VCNs can access the OSN privately through a service gateway.

  • Network security group (NSG)

    A network security group (NSG) acts as a virtual firewall for your cloud resources. With the zero-trust security model of Oracle Cloud Infrastructure, all traffic is denied, and you can control the network traffic inside a VCN. An NSG consists of a set of ingress and egress security rules that apply to only a specified set of VNICs in a single VCN.

  • Vault

    Oracle Cloud Infrastructure Vault enables you to centrally manage the encryption keys that protect your data and the secret credentials that you use to secure access to your resources in the cloud. You can use the Vault service to create and manage vaults, keys, and secrets.

  • Cloud Guard

    You can use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for certain risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

  • Security zone

    Security zones ensure Oracle's security best practices from the start by enforcing policies such as encrypting data and preventing public access to networks for an entire compartment. A security zone is associated with a compartment of the same name and includes security zone policies or a "recipe" that applies to the compartment and its sub-compartments. You can't add or move a standard compartment to a security zone compartment.

  • Vulnerability Scanning Service

    Oracle Cloud Infrastructure Vulnerability Scanning Service helps improve the security posture in Oracle Cloud by routinely checking ports and hosts for potential vulnerabilities. The service generates reports with metrics and details about these vulnerabilities.

  • Bastion service

    Oracle Cloud Infrastructure Bastion provides restricted and time-limited secure access to resources that don't have public endpoints and that require strict resource access controls, such as bare metal and virtual machines, Oracle MySQL Database Service, Autonomous Transaction Processing (ATP), Oracle Container Engine for Kubernetes (OKE), and any other resource that allows Secure Shell Protocol (SSH) access. With Oracle Cloud Infrastructure Bastion service, you can enable access to private hosts without deploying and maintaining a jump host. In addition, you gain improved security posture with identity-based permissions and a centralized, audited, and time-bound SSH session. Oracle Cloud Infrastructure Bastion removes the need for a public IP for bastion access, eliminating the hassle and potential attack surface when providing remote access.

  • Object storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.

  • Container Engine for Kubernetes

    Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE) is a fully managed, scalable, and highly available service that you can use to deploy your containerized applications to the cloud. You specify the compute resources that your applications require, and Container Engine for Kubernetes provisions them on Oracle Cloud Infrastructure in an existing tenancy. Container Engine for Kubernetes uses Kubernetes to automate the deployment, scaling, and management of containerized applications across clusters of hosts.

  • Compute

    The Oracle Cloud Infrastructure Compute service enables you to provision and manage compute hosts in the cloud. You can launch compute instances with shapes that meet your resource requirements for CPU, memory, network bandwidth, and storage. After creating a compute instance, you can access it securely, restart it, attach and detach volumes, and terminate it when you no longer need it.

  • Autonomous database

    Oracle Cloud Infrastructure autonomous databases are fully managed, preconfigured database environments that you can use for transaction processing and data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.

  • Exadata Database on Dedicated Infrastructure

    Exadata Cloud Infrastructure allows you to leverage the power of Exadata in the cloud. You can provision flexible X8M and X9M systems that allow you to add database compute servers and storage servers to your system as your needs grow. X8M and X9M systems offer RDMA over Converged Ethernet (RoCE) networking for high bandwidth and low latency, persistent memory (PMEM) modules, and intelligent Exadata software. X8M and X9M systems can be provisioned using a shape equivalent to a quarter rack X8 or X9M system, and then database and storage servers can be added at any time after provisioning.

  • Oracle Base Database Service

    Oracle Base Database Service enables you to maintain absolute control over your data while using the combined capabilities of Oracle Database and Oracle Cloud Infrastructure. Oracle Base Database Service offers database systems (DB systems) on virtual machines. They are available as single-node DB systems and multi-node RAC DB systems on Oracle Cloud Infrastructure (OCI).

  • File storage

    The Oracle Cloud Infrastructure File Storage service provides a durable, scalable, secure, enterprise-grade network file system. You can connect to a File Storage service file system from any bare metal, virtual machine, or container instance in a VCN. You can also access a file system from outside the VCN by using Oracle Cloud Infrastructure FastConnect and IPSec VPN.
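
As an illustration of the service connector component described in the list above (a source, an optional task, and a target), the following sketch models only the data flow. It does not call Oracle Cloud Infrastructure Service Connector Hub; the function names, the `origin` field, and the in-memory lists standing in for a log source and a SIEM-bound stream are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional

Record = dict[str, str]


def run_connector(
    source: Iterable[Record],
    target: Callable[[Record], None],
    task: Optional[Callable[[Record], Record]] = None,
) -> int:
    """Move every record from source to target, applying the optional task first.

    Returns the number of records delivered.
    """
    delivered = 0
    for record in source:
        if task is not None:
            record = task(record)
        target(record)
        delivered += 1
    return delivered


def tag_origin(record: Record) -> Record:
    """Hypothetical task: annotate a record with the compartment it came from."""
    return {**record, "origin": "security-cmp"}


# Stand-ins: a list of log records as the source, a list as the target stream.
audit_logs = [{"event": "login"}, {"event": "policy-update"}]
siem_stream: list[Record] = []

count = run_connector(audit_logs, siem_stream.append, task=tag_origin)
print(count, siem_stream)
```

The task returns a new record rather than mutating its input, so the source data stays unchanged while the target receives the enriched copy.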

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.
  • Observability and Management compartment for metrics data
    • Create a dedicated Oracle Cloud Observability and Management Platform compartment under tenancy (root compartment).
    • Store all custom metrics-related resources, such as Oracle Cloud Observability and Management Platform advanced services’ metric namespaces, metrics for Stack Monitoring, Database Management Service, Operations Insights, as well as user defined custom metrics, alarms and notifications in the Oracle Cloud Observability and Management Platform compartment.
    • Define policies to grant appropriate read and write access to the relevant teams, ensuring they can access metrics data and metric namespaces while adhering to the principle of least privilege. For example, the Observability and Management admin team has manage permission on the metrics data in the Observability and Management compartment, whereas the DBA and Application teams have read or use permissions on the metrics in the Observability and Management compartment.
  • Oracle Cloud Observability and Management Platform compartment for log data
    • Store each cloud resource's log data in the existing compartment of that resource so that each support team or business unit can access its own log data.
    • Create Logging and Logging Analytics log groups in the same compartment as cloud resources.
    • Store all log-related resources, including Logging Analytics log groups, log entities, saved searches, dashboards, and retention policies, within these compartments.
    • Define strict access policies to ensure each team can access their own logs while restricting access to other teams' log data.
    • Define log compartments within the Network, Security, Application, and Database compartments, respectively.
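
The access recommendations above can be expressed as IAM policy statements. The sketch below only builds statement strings in the general form `Allow group <group> to <verb> <resource-type> in compartment <compartment>`. The group and compartment names are hypothetical, and the resource types used here (`metrics`, `alarms`, `log-groups`, `log-content`) are assumptions that should be checked against the OCI policy reference before use.

```python
# OCI policy verbs, ordered from least to most privileged.
VERBS = ("inspect", "read", "use", "manage")


def allow(group: str, verb: str, resource_type: str, compartment: str) -> str:
    """Build one policy statement string; reject verbs outside the four OCI verbs."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    return f"Allow group {group} to {verb} {resource_type} in compartment {compartment}"


OM_COMPARTMENT = "observability-cmp"  # hypothetical compartment name

# Metrics: the O&M admin team manages, the DBA and application teams only read.
metric_policies = [
    allow("om-admins", "manage", "metrics", OM_COMPARTMENT),
    allow("om-admins", "manage", "alarms", OM_COMPARTMENT),
    allow("dba-team", "read", "metrics", OM_COMPARTMENT),
    allow("app-team", "read", "metrics", OM_COMPARTMENT),
]

# Logs: each team reads log data only in the compartment that holds its resources.
TEAM_COMPARTMENTS = {  # hypothetical group-to-compartment mapping
    "network-team": "network-cmp",
    "security-team": "security-cmp",
    "app-team": "application-cmp",
    "dba-team": "database-cmp",
}

log_policies = [
    allow(team, "read", resource_type, compartment)
    for team, compartment in TEAM_COMPARTMENTS.items()
    for resource_type in ("log-groups", "log-content")
]

for statement in metric_policies + log_policies:
    print(statement)
```

Because no statement grants a team access to another team's compartment, log data stays isolated by default, which matches the least-privilege intent of the recommendations.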

Considerations

When implementing this reference architecture, consider these options.

  • Oracle Cloud Observability and Management Platform related compartments hierarchy design
    The Oracle Cloud Observability and Management Platform compartment hierarchy is designed to help operations and engineering teams simplify cloud operations while maintaining proper cloud governance and security posture. OCI distinguishes two kinds of monitoring and logging data:
    • Metric Data: Aggregated data points, such as the comprehensive availability and performance metrics that advanced Observability and Management services collect from cloud resources.
    • Logging Data: Raw log data from the host, database, middleware, or application, such as syslog and database alert logs. Raw log data may contain sensitive data.
    • Treat log data as potentially containing sensitive application or user namespace information.
    • Define the log data security perimeter in the same compartment as the cloud resources.
    • The support team or business unit that needs access to the log data should have the same level of access to the cloud resources and their underlying configuration.
  • Other Oracle Cloud Observability and Management Platform resources in OCI
    • Service Metrics: OCI cloud resources provide basic service metrics by default in the OCI Monitoring Service. The out-of-the-box service metrics are stored in the designated metric namespaces in the same compartment as the cloud resources. The cloud resources service metrics are free and you cannot change the compartment of the service metrics.
    • Logging Analytics log sources, parsers, and fields are tenancy-level resources and are associated with the tenancy (root compartment).
    • Logs that Oracle Cloud Observability and Management Platform compartment resources generate for diagnostics, such as Management Agent logs and Service Connector logs, should be stored within the Oracle Cloud Observability and Management Platform compartment.
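
The placement rules from the recommendations and considerations above can be summarized as a small lookup. This is a sketch under stated assumptions: the `DataKind` categories, the `placement` function, and the compartment names are invented for illustration and do not correspond to an OCI API.

```python
from enum import Enum


class DataKind(Enum):
    ADVANCED_SERVICE_METRIC = "metrics from advanced O&M services"
    CUSTOM_METRIC = "user-defined custom metrics"
    SERVICE_METRIC = "out-of-the-box service metrics"
    RESOURCE_LOG = "log data from cloud resources"
    OM_DIAGNOSTIC_LOG = "diagnostic logs of O&M compartment resources"
    LA_SOURCE_PARSER_FIELD = "Logging Analytics log sources, parsers, fields"


def placement(
    kind: DataKind,
    resource_compartment: str,
    om_compartment: str = "observability-cmp",  # hypothetical name
    root: str = "tenancy-root",  # stands in for the tenancy (root compartment)
) -> str:
    """Return the compartment where data of the given kind belongs."""
    if kind in (
        DataKind.ADVANCED_SERVICE_METRIC,
        DataKind.CUSTOM_METRIC,
        DataKind.OM_DIAGNOSTIC_LOG,
    ):
        return om_compartment
    if kind in (DataKind.SERVICE_METRIC, DataKind.RESOURCE_LOG):
        # Service metrics cannot be moved; logs stay with the resource for access control.
        return resource_compartment
    if kind is DataKind.LA_SOURCE_PARSER_FIELD:
        return root
    raise ValueError(f"unhandled kind: {kind}")


for kind in DataKind:
    print(f"{kind.value}: {placement(kind, 'database-cmp')}")
```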

Acknowledgments

  • Author: Royce Fu
  • Contributors: Leon Shaner, Sriram Vrinda