Monitor Your Workload

Understand and adopt a monitoring process for all Oracle Cloud Infrastructure services. The Monitoring service uses metrics to monitor resources and alarms to notify you when metrics meet alarm-specified triggers.

Analyze Operation Metrics

Network Architect, Infrastructure Architect

Define, capture, and analyze operation metrics to gain visibility into events.

Oracle Cloud Infrastructure Monitoring service delivers the insight needed to understand the health of your resources, optimize the performance of your applications, and respond to anomalies in real time. You can set alarms to alert you in real time to important changes across your cloud infrastructure and services, enabling you to quickly take appropriate actions.
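To make the relationship between a metric and an alarm trigger concrete, the following minimal sketch assembles query strings in the style of the Monitoring Query Language (MQL). The helper name build_mql is hypothetical, and the metric name, the placeholder OCID, and the 80 percent threshold are example values only, not recommendations.

```python
def build_mql(metric, interval, statistic, dimensions=None, trigger=None):
    """Assemble an MQL-style expression.

    The result has the shape: Metric[interval]{dimension filters}.statistic() trigger
    """
    query = f"{metric}[{interval}]"
    if dimensions:
        # Dimension filters narrow the query to specific resources.
        filters = ", ".join(f'{key} = "{value}"' for key, value in sorted(dimensions.items()))
        query += "{" + filters + "}"
    query += f".{statistic}()"
    if trigger:
        # A trailing comparison turns a metric query into an alarm condition.
        query += f" {trigger}"
    return query


# Example values only: a plain metric query and an alarm-style query.
metric_query = build_mql("CpuUtilization", "1m", "mean")
alarm_query = build_mql(
    "CpuUtilization", "1m", "mean",
    dimensions={"resourceId": "<instance-ocid>"},
    trigger="> 80",
)
print(metric_query)
print(alarm_query)
```

The first string describes what to chart. The second adds a dimension filter and a comparison, which is the form an alarm evaluates.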

Define Health Checks

Cloud Architect, Infrastructure Architect

The Oracle Cloud Infrastructure Health Checks service provides high-frequency external monitoring to determine the availability and performance of any public-facing service, including hosted websites, API endpoints, and externally facing load balancers.
Use Oracle Cloud Infrastructure Health Checks to ensure that you're immediately aware of any availability issue affecting your customers.
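The sketch below illustrates what an external probe measures: whether an endpoint answers and how long it takes. It is not the Health Checks service API. It is a local, standard-library approximation, and the function names probe and availability are hypothetical.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (is_healthy, latency_seconds) for one HTTP GET probe."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            # Treat 2xx and 3xx responses as healthy.
            healthy = 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and HTTP error codes count as unhealthy.
        healthy = False
    return healthy, time.monotonic() - start


def availability(results):
    """Fraction of healthy probes in a list of probe() results."""
    if not results:
        return 0.0
    return sum(1 for ok, _ in results if ok) / len(results)
```

Collecting probe results at a fixed interval and passing them to availability gives a simple availability ratio. The opener parameter exists only so that the probe can be exercised without network access.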

Monitor Compute Services

Infrastructure Architect

Ensure that your operations team uses and applies compute services metrics.

Use metrics, alarms, and notifications to monitor the following:

  • Compute Instance: Monitor the health, capacity, and performance of your compute instances.
  • Infrastructure Health: Monitor the health, capacity, and performance of your bare metal compute instances.
  • Oracle Cloud Infrastructure Functions: Monitor the health, capacity, and performance of functions that you've deployed to Oracle Cloud Infrastructure Functions.
  • Database Health: Monitor the health, capacity, and performance of your database services.

Monitor Your Networks

Network Architect, Infrastructure Architect

Apply metrics at different network endpoints. You can collect metrics for the internal virtual cloud network (VCN), for connectivity (Oracle Cloud Infrastructure FastConnect and IPsec VPN), and for load balancers.

Use metrics, alarms, and notifications to monitor the following:

  • VNIC Metrics: Monitor the health, capacity, and performance of your Networking service VNICs (virtual network interface cards).
  • FastConnect Metrics: Monitor the health, capacity, and performance of the connection between your on-premises network and VCN (Oracle Cloud Infrastructure FastConnect connection).
  • VPN Connect Metrics: Monitor the health, capacity, and performance of the connection between your on-premises network and VCN (also known as IPsec VPN).
  • Service Gateway Metrics: Monitor the health, capacity, and performance of your service gateways, which enable on-premises hosts or VCN hosts to privately access Oracle services (such as Object Storage and Autonomous Database) without exposing the resources to the public internet.
  • Load Balancing Metrics: Monitor the health, capacity, and performance of your load balancers, which act as an intermediary for data traffic between clients and your application servers.
  • Customer Premises Equipment: Monitor the health, capacity, and performance of the border equipment on your network that connects to Oracle Cloud Infrastructure (OCI).
  • VCN Flow Logs: Enable VCN flow logs and ingest them with Oracle Cloud Logging Analytics to identify patterns and gain insights, as needed.

Monitor Data

Cloud Architect, Infrastructure Architect, Security Architect

Use metrics, alarms, and notifications to monitor the storage services that hold your data:
  • Block Volume Metrics: Monitor the throughput and operations of Block Volumes and Boot Volumes.
  • Object Storage Metrics: Monitor the size and number of objects of your Object Storage buckets.
  • File System Metrics: Monitor the health, throughput, requests, and latency of your file systems and mount targets.

Create a Set of Alarms for Each Metric

DevOps Architect, Infrastructure Architect

Create a set of alarms for your relevant service metrics. For each metric emitted by your resources, create alarms that define the following resource behaviors:
  • At risk. The resource is at risk of becoming inoperable, as indicated by metric values.
  • Non-optimal. The resource is performing at non-optimal levels, as indicated by metric values.
  • Resource is up or down. The resource is either not reachable or not operating.
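One way to picture these behaviors is as bands over a metric's value. The sketch below is an illustration only: the classify function, the ResourceState names, and the 75 and 90 percent boundaries are assumptions, and it treats a missing data point as a resource that is down.

```python
from enum import Enum


class ResourceState(Enum):
    OK = "ok"
    NON_OPTIMAL = "non-optimal"
    AT_RISK = "at risk"
    DOWN = "down"


def classify(value, non_optimal_at, at_risk_at):
    """Map one metric data point to a resource state.

    value is None when the resource emitted no data point for the interval,
    which this sketch treats as down.
    """
    if value is None:
        return ResourceState.DOWN
    if value >= at_risk_at:
        return ResourceState.AT_RISK
    if value >= non_optimal_at:
        return ResourceState.NON_OPTIMAL
    return ResourceState.OK


# Hypothetical CPU utilization samples (percent); None marks a missing data point.
samples = [42.0, 78.5, 93.1, None]
states = [classify(v, non_optimal_at=75.0, at_risk_at=90.0) for v in samples]
print([state.value for state in states])
```

In practice each band would become its own alarm with its own severity and notification destination, so that an at-risk condition reaches a different audience than a non-optimal one.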

Tune Your Alarms

DevOps Architect, Infrastructure Architect

Review your alarms on a regular basis, such as weekly, to ensure optimal configuration. Calibrate each alarm's threshold, severity, and notification details, including method, frequency, and targeted audience.
Thresholds that are too sensitive generate unnecessary alerts, whereas thresholds that are too lenient reduce the time available for corrective action before an outage.

An optimal alarm configuration addresses the following factors:

  • Criticality of the resource.
  • Appropriate resource behavior. Assess the behavior individually and within the context of the service ecosystem. Review metric value fluctuations for a given period of time and then adjust thresholds as needed.
  • Acceptable notification noise. Assess the notification method (for example, email or PagerDuty), the appropriate recipients, and the frequency of repeated notifications.
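Reviewing metric fluctuations against candidate settings can be done offline before an alarm is changed. The sketch below replays a series of readings and counts how often an alarm would start firing for a given threshold and a required number of consecutive breaching data points. The function name count_firings, the pending_points parameter, and the sample readings are all hypothetical.

```python
def count_firings(values, threshold, pending_points=1):
    """Count how often an alarm would start firing over a series.

    The alarm starts firing once `pending_points` consecutive values exceed
    `threshold`, and resets when a value falls back to or below it.
    """
    firings = 0
    streak = 0
    firing = False
    for value in values:
        if value > threshold:
            streak += 1
            if not firing and streak >= pending_points:
                firing = True
                firings += 1
        else:
            streak = 0
            firing = False
    return firings


# Hypothetical one-minute CPU utilization readings (percent).
history = [55, 82, 60, 85, 88, 91, 70, 81, 65, 95, 96, 97, 50]

for threshold in (80, 90):
    for pending in (1, 3):
        n = count_firings(history, threshold, pending)
        print(f"threshold={threshold} pending={pending} firings={n}")
```

Comparing the counts across settings shows how a higher threshold or a longer breach requirement reduces notification noise, at the cost of a later warning.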

Enable Service Logging

DevOps Architect, Infrastructure Architect, Security Architect

Service logs are emitted by Oracle Cloud Infrastructure (OCI) services, such as API Gateway, Events, Functions, Load Balancing, Object Storage, and VCN Flow Logs. Each of these supported services has a Logs resource that allows you to enable or disable logging for that service.
Enable logging for a service only if you need critical diagnostic information that describes how its resources are performing and being accessed.

Consider ingesting service logs with Oracle Cloud Logging Analytics for better insight and detailed analysis of patterns and trends, as needed.

Enable OCI Operations Insights

Infrastructure Architect

Oracle Cloud Infrastructure (OCI) Operations Insights is an OCI native service that provides 360-degree insight into the resource utilization and capacity of autonomous databases.

Operations Insights consists of the following integrated applications:

  • Capacity Planning
  • Oracle SQL Warehouse

Enable Oracle Cloud Guard

DevOps Architect, Infrastructure Architect, Security Architect

Oracle Cloud Guard is a service that helps customers monitor, identify, achieve, and maintain a strong security posture on Oracle Cloud. Use the service to examine your Oracle Cloud Infrastructure resources for security weaknesses related to configuration, and to monitor your operators and users for risky activities.

Ensure that Oracle Cloud Guard is enabled at the root level of your tenancy to monitor all of your compartments.

Configure Auditing

DevOps Architect, Infrastructure Architect, Security Architect

The Oracle Cloud Infrastructure Audit service automatically records calls to all supported Oracle Cloud Infrastructure (OCI) public application programming interface (API) endpoints as log events. Currently, all services support logging by Audit.

Review the following when configuring auditing:

  • Ensure audit retention is set to 365 days.
  • If you have third-party tools that must access OCI Audit data, configure a Service Connector to copy the OCI Audit data to Oracle Cloud Infrastructure Object Storage.
  • Ensure the retention period on the storage bucket is appropriately configured.
  • Consider ingesting a subset of audit logs with Oracle Cloud Logging Analytics for better insight and detailed analysis of patterns and trends, as needed.