Monitor Your Workload

Understand and adopt a monitoring process for all Oracle Cloud Infrastructure services. The Monitoring service uses metrics to monitor resources and alarms to notify you when metrics meet alarm-specified triggers.

Analyze Operation Metrics

Network Architect, Cloud Operations Manager, Security Architect

Define, capture, and analyze operation metrics to gain visibility into events.

Oracle Cloud Infrastructure Monitoring service delivers the insight needed to understand the health of your resources, optimize the performance of your applications, and respond to anomalies in real time. You can set alarms to alert you in real time to important changes across your cloud infrastructure and services, enabling you to quickly take appropriate actions.
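As an illustration of how an alarm trigger can be expressed, the following minimal sketch assembles a Monitoring Query Language (MQL) expression from its parts. The `MetricQuery` class is a hypothetical helper written for this example and is not part of any OCI SDK; the metric names and thresholds are sample values only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricQuery:
    """Hypothetical helper that holds the parts of an MQL alarm expression."""
    metric: str       # metric name, for example "CpuUtilization"
    interval: str     # aggregation window, for example "1m"
    statistic: str    # aggregation function, for example "mean"
    operator: str     # comparison used by the alarm trigger
    threshold: float  # value the aggregated metric is compared against

    def to_mql(self) -> str:
        """Render the expression as: metric[interval].statistic() operator threshold"""
        return (f"{self.metric}[{self.interval}].{self.statistic}() "
                f"{self.operator} {self.threshold:g}")


# Sample values only; choose thresholds that fit your own workload.
query = MetricQuery("CpuUtilization", "1m", "mean", ">", 80)
print(query.to_mql())
```

The rendered string can then be supplied as the query when you define an alarm in the Console, CLI, or SDK.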

Define Health Checks

Cloud Architect, Cloud Operations Manager, Security Architect

The Oracle Cloud Infrastructure Health Checks service provides high-frequency external monitoring to determine the availability and performance of any public-facing service, including hosted websites, API endpoints, and externally facing load balancers.

Use Oracle Cloud Infrastructure Health Checks to ensure that you're immediately aware of any availability issue affecting your customers.
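To show how external probe results might be turned into an availability signal, here is a minimal sketch. It does not call the Health Checks service; `ProbeResult`, its fields, the vantage point labels, and the 500 ms latency limit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProbeResult:
    """Hypothetical record of one external probe against an endpoint."""
    vantage_point: str  # illustrative label for where the probe ran
    status_code: int    # HTTP status returned (0 means no response)
    latency_ms: float   # total response time in milliseconds


def is_healthy(result: ProbeResult, max_latency_ms: float) -> bool:
    """A probe is healthy if it got a 2xx/3xx response within the latency limit."""
    return 200 <= result.status_code < 400 and result.latency_ms <= max_latency_ms


def summarize(results: List[ProbeResult], max_latency_ms: float = 500.0) -> Dict[str, object]:
    """Return the fraction of healthy probes and the vantage points that failed."""
    failing = sorted(r.vantage_point for r in results
                     if not is_healthy(r, max_latency_ms))
    availability = (len(results) - len(failing)) / len(results) if results else 0.0
    return {"availability": availability, "failing": failing}


# Sample data only.
probes = [
    ProbeResult("vp-a", 200, 120.0),
    ProbeResult("vp-b", 200, 310.0),
    ProbeResult("vp-c", 503, 95.0),
    ProbeResult("vp-d", 301, 480.0),
]
print(summarize(probes))
```

An alarm on a value like `availability` lets you distinguish an outage seen from every vantage point from a problem that affects only one location.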

Monitor Compute Services

Cloud Operations Manager, Security Architect

Ensure that your operations team uses and applies compute services metrics.

Use metrics, alarms, and notifications to monitor the following:

  • Compute Instance: Monitor the health, capacity, and performance of your compute instances.
  • Infrastructure Health: Monitor the health, capacity, and performance of your compute bare metal instances.
  • Oracle Cloud Infrastructure Functions: Monitor the health, capacity, and performance of functions that you've deployed to Oracle Cloud Infrastructure Functions.
  • Database Health: Monitor the health, capacity, and performance of your database services. Oracle Cloud Observability and Management Platform provides unified database monitoring and administration capabilities for cloud databases.
  • Operating System Health: Implement OS-level logging tools, such as auditd.

Monitor Your Networks

Network Architect, Cloud Operations Manager, Security Architect

Apply metrics at different network endpoints. You can collect metrics for the internal virtual cloud network (VCN), for connectivity (Oracle Cloud Infrastructure FastConnect and IPsec VPN), and for load balancers.

Use metrics, alarms, and notifications to monitor the following:

  • VNIC Metrics: Monitor the health, capacity, and performance of your Networking service VNICs (virtual network interface cards).
  • FastConnect Metrics: Monitor the health, capacity, and performance of the connection between your on-premises network and VCN (Oracle Cloud Infrastructure FastConnect connection).
  • VPN Connect Metrics: Monitor the health, capacity, and performance of the connection between your on-premises network and VCN (also known as IPsec VPN).
  • Service Gateway Metrics: Monitor the health, capacity, and performance of your service gateways, which enable on-premises hosts or VCN hosts to privately access Oracle services (such as Object Storage and Autonomous Database) without exposing the resources to the public internet.
  • Load Balancing Metrics: Monitor the health, capacity, and performance of your load balancers, which act as an intermediary for data traffic between clients and your application servers.
  • Customer Premises Equipment: Monitor the health, capacity, and performance of the border equipment on your network that connects to Oracle Cloud Infrastructure (OCI).
  • VCN Flow Logs: Enable VCN flow logs and ingest them with Oracle Cloud Infrastructure Logging Analytics to identify patterns and gain insights, as needed.

Use the OCI Network Command Center and the available tools to monitor and observe your network. The OCI Network Command Center offers the following observability tools to support various operations use cases:

  • Network Visualizer

    Offers intuitive topology visualization to understand connections and relationships between your virtual network resources, inspect the configuration from one place, and visually troubleshoot any configuration issues.

  • Network Path Analyzer

    Allows you to troubleshoot complex virtual network configurations when you have reachability problems. Provides automated configuration analysis to determine the network path the traffic takes, identify routing and security configuration issues, and provide configuration information along the path.

  • Inter-Region Latency

    Provides real-time and historical latency information between OCI regions.

  • VCN Flow Logs

    Offers network traffic telemetry, critical to support your security and network operations use cases. You can gain extensive insights on the network traffic, stream the flow logs to your chosen tool using standard protocols such as Kafka, and archive the flow logs in OCI Object Storage for compliance purposes. VCN flow logs can be sent to Oracle Cloud Infrastructure Logging Analytics, OCI Object Storage, or to a third-party system.

  • Virtual Test Access Point (VTAP)

    Offers traffic mirroring capabilities that enable full packet capture for security analysis, troubleshooting applications, or network performance issues. VTAP is also useful for troubleshooting complex network problems by analyzing the packet contents and headers.
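As a small illustration of the kind of analysis that VCN flow logs make possible, the following sketch counts rejected flows per source address. It assumes the flow log records have already been parsed into dictionaries; the key names `source_address` and `action` are placeholders chosen for this example, so map them to the fields in your own log export.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_rejected_sources(records: Iterable[Dict[str, str]],
                         limit: int = 3) -> List[Tuple[str, int]]:
    """Return the source addresses with the most rejected flows, highest first."""
    counts = Counter(
        record["source_address"]          # assumed key name
        for record in records
        if record["action"] == "REJECT"   # assumed key name and value
    )
    return counts.most_common(limit)


# Sample records only; addresses are from documentation ranges.
flows = [
    {"source_address": "192.0.2.10", "action": "REJECT"},
    {"source_address": "192.0.2.10", "action": "REJECT"},
    {"source_address": "192.0.2.10", "action": "ACCEPT"},
    {"source_address": "198.51.100.7", "action": "REJECT"},
    {"source_address": "203.0.113.5", "action": "ACCEPT"},
]
print(top_rejected_sources(flows))
```

A recurring source with many rejected flows can point to a misconfigured security rule or to unwanted traffic that merits further investigation.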

Monitor Data

Cloud Architect, Cloud Operations Manager, Security Architect

Use metrics, alarms, and notifications to monitor the storage services that you use to store data:

  • Block Volume Metrics: Monitor the throughput and operations of Block Volumes and Boot Volumes.
  • Object Storage Metrics: Monitor the size and number of objects of your Object Storage buckets.
  • File System Metrics: Monitor the health, throughput, requests, and latency of your file systems and mount targets.

Create a Set of Alarms for Each Metric

DevOps Architect, Cloud Operations Manager, Security Architect

Create a set of alarms for your relevant service metrics. For each metric emitted by your resources, create alarms that detect the following resource behaviors:

  • At risk. The resource is at risk of becoming inoperable, as indicated by metric values.
  • Non-optimal. The resource is performing at non-optimal levels, as indicated by metric values.
  • Resource is up or down. The resource is either not reachable or not operating.
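The three behaviors listed above can be sketched as a simple classification of a metric value. This is a minimal illustration only: the `classify` function, the threshold values, and the convention that a missing datapoint means the resource is down are all assumptions made for this example.

```python
from typing import Optional


def classify(value: Optional[float], non_optimal_at: float, at_risk_at: float) -> str:
    """Map one metric datapoint to a resource behavior.

    value          -- latest metric value, or None if no datapoint was emitted
    non_optimal_at -- value at which performance is considered degraded
    at_risk_at     -- value at which the resource may become inoperable
    """
    if value is None:
        # Assumption: an absent datapoint means the resource is unreachable.
        return "down"
    if value >= at_risk_at:
        return "at risk"
    if value >= non_optimal_at:
        return "non-optimal"
    return "ok"


# Sample thresholds for a utilization-style metric, in percent.
for sample in (None, 95.0, 75.0, 10.0):
    print(sample, "->", classify(sample, non_optimal_at=70.0, at_risk_at=90.0))
```

In practice, each behavior would typically be a separate alarm on the same metric, each with its own severity and notification destination.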

Tune Your Alarms

DevOps Architect, Cloud Operations Manager, Security Architect

Review your alarms on a regular basis, such as weekly, to ensure optimal configuration. Calibrate each alarm's threshold, severity, and notification details, including method, frequency, and targeted audience.
Thresholds set too close to normal operating values generate unnecessary alerts, whereas thresholds set too far from them reduce the time available to take corrective action before an outage.

An optimal alarm configuration addresses the following factors:

  • Criticality of the resource.
  • Appropriate resource behavior. Assess the behavior individually and within the context of the service ecosystem. Review metric value fluctuations for a given period of time and then adjust thresholds as needed.
  • Acceptable notification noise. Assess the notification method (for example, email or PagerDuty), the appropriate recipients, and the frequency of repeated notifications.
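One way to review metric fluctuations before adjusting a threshold is to replay recent datapoints against candidate settings and count how often the alarm would have fired. The sketch below does this under simplified assumptions: the function names are hypothetical, the history is sample data, and `sustain` stands in for requiring the condition to hold for several consecutive datapoints before the alarm fires.

```python
from typing import Dict, Sequence


def count_firings(values: Sequence[float], threshold: float, sustain: int = 1) -> int:
    """Count how many times the metric exceeds the threshold
    for `sustain` consecutive datapoints."""
    firings = 0
    run = 0  # length of the current streak above the threshold
    for value in values:
        if value > threshold:
            run += 1
            if run == sustain:  # fire once per streak, when it first qualifies
                firings += 1
        else:
            run = 0
    return firings


def compare_thresholds(values: Sequence[float], candidates: Sequence[float],
                       sustain: int = 1) -> Dict[float, int]:
    """Return the number of firings for each candidate threshold."""
    return {t: count_firings(values, t, sustain) for t in candidates}


# Sample utilization history, in percent.
history = [40, 85, 60, 92, 95, 97, 50, 88, 91, 45]
print(compare_thresholds(history, [80, 90]))             # brief spikes count
print(compare_thresholds(history, [80, 90], sustain=2))  # only sustained breaches count
```

Comparing the counts helps you weigh notification noise against the lead time that an earlier trigger provides.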

Enable Service Logging

DevOps Architect, Cloud Operations Manager, Security Architect

Service logs are logs that are emitted by Oracle Cloud Infrastructure (OCI) services, such as API Gateway, Events, Functions, Load Balancing, Object Storage, and VCN Flow Logs. Each of these supported services has a Logs resource that allows you to enable or disable logging for that service.

Enable service logging to capture critical diagnostic information that describes how resources are performing and being accessed.

Consider ingesting service logs with Oracle Cloud Infrastructure Logging Analytics for better insight and detailed analysis of patterns and trends, as needed.

Create and maintain log retention policies.

Enable Oracle Cloud Infrastructure Ops Insights

Cloud Operations Manager, Security Architect

Oracle Cloud Infrastructure Ops Insights is an OCI native service that provides holistic insight into database and host resource utilization and capacity.

Oracle Cloud Infrastructure Ops Insights consists of the following integrated applications:

  • Capacity Planning
  • Oracle SQL Warehouse

Enable Oracle Cloud Guard

DevOps Architect, Cloud Operations Manager, Security Architect

Oracle Cloud Guard is a service that helps customers monitor, identify, achieve, and maintain a strong security posture on Oracle Cloud. Use the service to examine your Oracle Cloud Infrastructure resources for security weaknesses related to configuration, and your operators and users for risky activities.

Ensure that Oracle Cloud Guard is enabled at the root level of your tenancy to monitor all of your compartments.

Configure Auditing

DevOps Architect, Cloud Operations Manager, Security Architect

The Oracle Cloud Infrastructure Audit service automatically records calls to all supported Oracle Cloud Infrastructure (OCI) public application programming interface (API) endpoints as log events. Currently, all services support logging by Audit.

Review the following when configuring auditing:

  • Review audit retention duration. The default is set to 365 days.
  • If you have third-party tools that must access OCI Audit data, configure a Service Connector to copy the OCI Audit data to Oracle Cloud Infrastructure Object Storage.
  • Ensure the retention period on the storage bucket is appropriately configured.
  • Consider ingesting a subset of audit logs with Oracle Cloud Infrastructure Logging Analytics for better insight and detailed analysis of patterns and trends, as needed.