Observability and Management in the Cloud

Use Oracle Cloud Infrastructure (OCI) Observability and Management services to gain visibility and actionable insights that help you manage your cloud environment.

OCI services related to observability and management let you monitor, audit, and alert on changes in your cloud environment. Insights driven by machine learning help you manage resources that are deployed on a variety of technologies across all layers of the stack.

A top priority is to increase automation that enables scalable, predictable results. Use integrated functionality and automation for DevOps monitoring and IT operations management to prevent and solve IT problems.

Observability and Management in OCI includes the following services:

Application Performance Monitoring

Application Performance Monitoring provides deep visibility into application performance and enables rapid diagnostics for delivering a consistent level of service. This includes monitoring of multiple components and application logic spread across clients, third-party services, and back-end computing tiers, whether on premises or in the cloud. For an overview, see the Application Performance Monitoring product page.

Management Agent

Management Agent is a service that provides low-latency interactive communication and data collection between OCI and other targets.

Database Management

Database Management provides comprehensive database performance diagnostics and management capabilities to monitor and manage Oracle databases. For an overview, see the Database Management product page.

Logging

Logging lets you enable, view, and manage all the logs in your tenancy, and provides access to logs from OCI resources. These logs include critical diagnostic information that describes how resources are performing and being accessed. For an overview, see the Logging product page.

Logging Analytics

Logging Analytics is a unified, integrated cloud solution that lets you monitor, aggregate, index, analyze, search, explore, and correlate all log data from your applications and system infrastructure. For an overview, see the Logging Analytics product page.

Java Management

Java Management is a reporting and management infrastructure within OCI. It lets you observe and manage the use of Java in your enterprise.

Monitoring

Use Monitoring to query metrics and manage alarms. Metrics and alarms help monitor the health, capacity, and performance of your cloud resources.

Ops Insights

Ops Insights provides comprehensive information about the resource use and capacity of databases and hosts. Use this service to analyze CPU and storage resources, forecast and plan capacity, and proactively identify SQL performance issues across a database fleet. For an overview, see the Ops Insights product page.

Service Connector Hub

Service Connector Hub is a cloud message bus platform that offers a single pane of glass for describing, running, and monitoring interactions when moving data between OCI services. For an overview, see the Service Connector Hub product page.

Stack Monitoring

Stack Monitoring enables proactive monitoring of applications and their underlying stack, including application servers and databases. By discovering all components of an application, including the application topology, Stack Monitoring automatically collects status, load, response, error, and utilization metrics for all application components. Each component of the application stack is referred to as a resource.

For an overview, see the Stack Monitoring product page.

To gain comprehensive visibility into your newly deployed cloud environment, use the Observability and Management services that meet your organization's needs.

Monitoring

Use metrics and alarms to monitor the health, capacity, and performance of your cloud resources.
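Both metric queries and alarm conditions are written in Monitoring Query Language (MQL). As a minimal sketch, the following hypothetical helper assembles an expression of the form `Metric[interval]{dimension = "value"}.statistic()`, optionally followed by an alarm threshold. The helper name, the spacing it emits, and the example metric and dimension values are assumptions; confirm the syntax against the MQL reference before using a generated query in an alarm.

```python
from typing import Dict, Optional


def build_mql(metric: str, interval: str, statistic: str,
              dimensions: Optional[Dict[str, str]] = None,
              threshold: Optional[float] = None) -> str:
    """Hypothetical helper: compose an MQL expression such as
    CpuUtilization[1m]{resourceDisplayName = "web-1"}.mean() > 80."""
    dims = ""
    if dimensions:
        # Sort the dimension filters so the output is deterministic.
        pairs = ", ".join(f'{key} = "{value}"' for key, value in sorted(dimensions.items()))
        dims = "{" + pairs + "}"
    query = f"{metric}[{interval}]{dims}.{statistic}()"
    if threshold is not None:
        # An alarm query appends a trigger condition to the metric expression.
        query += f" > {threshold:g}"
    return query
```

For example, `build_mql("CpuUtilization", "1m", "mean", threshold=80)` produces `CpuUtilization[1m].mean() > 80`.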

The following table provides some key areas to consider when defining your organization's monitoring strategy.

Area                     Data to Monitor
-----------------------  ---------------------------------------------------
Accounts                 Account management
                         Subscription extension to other regions
                         Creation and deletion of administrative accounts
                         Quota breaches
Usage of cloud services  Number of instances
                         Storage, including latest, maximum, and average use
                         Object count, including procedures and views
                         Number of compartments
                         Resources overutilized or underutilized
                         Monthly or yearly utilization of services
Metrics                  Business metrics
                         Security metrics
                         Performance metrics
                         Financial metrics
Invoicing                Budget
                         Billing
                         Compartment quotas
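The "latest, maximum, and average use" and "overutilized or underutilized" rows can be combined into one check. The sketch below assumes you already have utilization samples in percent (for example, datapoints returned by a metric query), oldest first. The 20% and 80% thresholds and the `summarize_usage` name are illustrative assumptions, not OCI defaults.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class UsageSummary:
    latest: float
    maximum: float
    average: float
    status: str  # "overutilized", "underutilized", or "normal"


def summarize_usage(samples: List[float], low: float = 20.0,
                    high: float = 80.0) -> UsageSummary:
    """Summarize utilization samples (percent, oldest first) and classify the resource."""
    if not samples:
        raise ValueError("at least one sample is required")
    avg = mean(samples)
    # Classify on the average so a single spike does not flag the resource.
    if avg >= high:
        status = "overutilized"
    elif avg <= low:
        status = "underutilized"
    else:
        status = "normal"
    return UsageSummary(latest=samples[-1], maximum=max(samples),
                        average=avg, status=status)
```

Classifying on the average rather than the maximum is a design choice; a workload with short, critical peaks might be better judged on the maximum.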

Operations

Define operational activities, that is, the common tasks to be performed periodically.

Your operations strategy should include the following recommended activities:

  • Define operational procedures
  • Establish a maintenance schedule
  • Use configuration management utilities
  • Back up data in storage and databases
  • Verify backup integrity and process
  • Validate backup security and encryption
  • Replicate your data for disaster recovery
  • Automate OS management by using the OS Management service
  • Automate patching and maintenance
  • Stay up to date with security patches, bug fixes, and enhancement updates
  • Manage service limits and be aware of fixed service limits
  • Factor failover capacity into your service limits
  • Set compartment quotas
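To factor failover into service limits, compare the limit in the region that would absorb a failover with that region's current usage plus the capacity the failed-over workload would need. This minimal sketch assumes usage and limits are already available as plain dictionaries; the function name and the resource keys in the usage example are hypothetical, not actual OCI limit names.

```python
from typing import Dict


def limit_shortfalls(limits: Dict[str, int], usage: Dict[str, int],
                     failover_usage: Dict[str, int]) -> Dict[str, int]:
    """Return, per resource, how far current plus failover usage exceeds the limit.

    Resources that fit within their limit are omitted. A resource with no
    recorded limit is treated as having a limit of 0, so it is flagged.
    """
    shortfalls: Dict[str, int] = {}
    for resource in set(usage) | set(failover_usage):
        needed = usage.get(resource, 0) + failover_usage.get(resource, 0)
        available = limits.get(resource, 0)
        if needed > available:
            shortfalls[resource] = needed - available
    return shortfalls
```

For example, with a limit of 100 OCPUs, 60 in use, and 60 more needed during failover, the function reports a shortfall of 20 for that resource.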

Auditing

Use the Audit service to gain visibility into activities related to your OCI resources and tenancy.

Audit log events can be used for security audits, to track usage of and changes to OCI resources, and to help ensure compliance with standards or regulations.
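As an illustration of tracking changes from audit log events, the sketch below counts non-read operations per principal. It assumes each event is a dictionary shaped like the Audit event JSON, with the operation name in `data.eventName` and the caller in `data.identity.principalName`, and it assumes read-only operations can be recognized by a `Get` or `List` prefix. Verify both assumptions against the Audit event reference.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: read-only API operations can be recognized by these name prefixes.
READ_ONLY_PREFIXES = ("Get", "List")


def changes_by_principal(events: Iterable[Dict]) -> Counter:
    """Count non-read operations per principal in a batch of audit events."""
    counts: Counter = Counter()
    for event in events:
        data = event.get("data") or {}
        name = data.get("eventName", "")
        if name.startswith(READ_ONLY_PREFIXES):
            continue  # skip reads; keep creates, updates, deletes, and so on
        principal = (data.get("identity") or {}).get("principalName") or "unknown"
        counts[principal] += 1
    return counts
```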

Your audit strategy should include the following recommended activities:

  • Configure auditing
  • Conduct audits
  • Audit your policies. For example:
    • Where are your policies defined, and do they comply with your organization's standards for compartment usage?
    • How are dynamic groups used? Do these groups grant excess privileges?
    • What services are configured and where are they located? Should any services be limited to certain compartments or groups?
    • Are there any duplicate statements that should be removed?
    • Are there policies that grant privileges to the whole tenancy?
    • Are there groups that have more privileges than they need?
  • Check long-running workflows
  • Maintain system logs, application logs, and audit logs
  • Continuously scan for vulnerabilities
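Some of the policy questions above can be partly automated. This minimal sketch assumes you have exported policy statements as plain strings in the `Allow <subject> to <verb> <resource-type> in <location>` form. It flags duplicates, tenancy-wide grants, `manage all-resources` grants, and statements for dynamic groups. The matching is textual and deliberately simple (case-insensitive, whitespace-normalized), and the function name is hypothetical, so treat its output as a starting point for review rather than a verdict.

```python
import re
from collections import Counter
from typing import Dict, List


def audit_policy_statements(statements: List[str]) -> Dict[str, List[str]]:
    """Flag policy statements that deserve a closer look.

    Statements are lowercased and whitespace-collapsed before comparison,
    so the returned strings are in that normalized form.
    """
    normalized = [" ".join(s.lower().split()) for s in statements]
    unique = list(dict.fromkeys(normalized))  # de-duplicate, keep first-seen order
    counts = Counter(normalized)
    return {
        "duplicates": [s for s in unique if counts[s] > 1],
        "tenancy_wide": [s for s in unique if re.search(r"\bin tenancy\b", s)],
        "manage_all": [s for s in unique if "manage all-resources" in s],
        "dynamic_groups": [s for s in unique if s.startswith("allow dynamic-group ")],
    }
```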

Events and Notifications

Use the Events service to create automation in your tenancy. Use the Notifications service to get messages whenever alarms, service connectors, and event rules are triggered.

Events are structured messages that indicate changes in resources. Rules that match these events trigger actions such as notifications. Because a rule applies to events in the compartment in which you create it and in all of that compartment's child compartments, we recommend that you create rules at the root compartment level.
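The recommendation to create rules at the root follows from how rule scope is inherited. The sketch below models the compartment tree as a child-to-parent mapping and checks whether a rule's compartment is the event's compartment or one of its ancestors. The function name and the compartment names are hypothetical placeholders; real compartments are identified by OCIDs.

```python
from typing import Dict, Optional


def rule_covers(rule_compartment: str, event_compartment: str,
                parents: Dict[str, Optional[str]]) -> bool:
    """Return True if a rule in rule_compartment applies to an event in event_compartment.

    parents maps each compartment to its parent (None for the root compartment).
    A rule applies to its own compartment and to every descendant compartment.
    """
    current: Optional[str] = event_compartment
    while current is not None:
        if current == rule_compartment:
            return True
        current = parents.get(current)  # walk up toward the root
    return False
```

With the tree `root → {network, prod → prod-app}`, a rule created in `root` covers every compartment, while a rule in `prod` misses events from `network`.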

The Notifications service is a multichannel messaging service that broadcasts messages to users and applications when events of interest occur within OCI. Messages can be delivered through various subscription protocols, including email, HTTPS, PagerDuty, Slack, and OCI Functions. Some protocols require the subscription to be confirmed before it becomes active.

We recommend that you create at least one notification topic and subscription to receive messages related to Monitoring metrics.
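The following is a conceptual model of the topic and subscription behavior described above, not the OCI SDK: a topic fans a message out to its subscriptions, and a subscription that requires confirmation receives nothing until it is confirmed. The class names and the `needs_confirmation` flag are hypothetical; which protocols actually require confirmation is defined by the Notifications service.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Subscription:
    protocol: str   # for example "EMAIL" or "HTTPS"
    endpoint: str
    confirmed: bool = False


@dataclass
class Topic:
    name: str
    subscriptions: List[Subscription] = field(default_factory=list)

    def subscribe(self, protocol: str, endpoint: str,
                  needs_confirmation: bool = True) -> Subscription:
        # A subscription that needs confirmation starts out inactive.
        sub = Subscription(protocol, endpoint, confirmed=not needs_confirmation)
        self.subscriptions.append(sub)
        return sub

    def publish(self, message: str) -> List[Tuple[str, str]]:
        """Return (endpoint, message) pairs for every active subscription."""
        return [(s.endpoint, message) for s in self.subscriptions if s.confirmed]
```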

Also configure notifications to be triggered by changes to the following resources:

  • Identity provider (IdP)
  • IdP group mapping
  • OCI Identity and Access Management (IAM) group
  • IAM policy
  • Users
  • Virtual cloud networks (VCNs)
  • Route tables
  • Security lists
  • Network security groups
  • Network gateways
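Rules for these resources match on event type. The sketch below builds a rule condition in the `{"eventType": [...]}` JSON form for two of the listed resources (IAM policies and VCNs) and includes a simplified matcher. The event type strings are assumptions based on the `com.oraclecloud.<service>.<action>` naming pattern and should be checked against the Events service reference; the helper names are hypothetical, and the real rule engine also supports attribute filters that this matcher ignores.

```python
import json
from typing import Dict, List

# Assumed event type names; verify them against the Events service reference.
EVENT_TYPES: Dict[str, List[str]] = {
    "iam_policy": [
        "com.oraclecloud.identitycontrolplane.createpolicy",
        "com.oraclecloud.identitycontrolplane.updatepolicy",
        "com.oraclecloud.identitycontrolplane.deletepolicy",
    ],
    "vcn": [
        "com.oraclecloud.virtualnetwork.createvcn",
        "com.oraclecloud.virtualnetwork.updatevcn",
        "com.oraclecloud.virtualnetwork.deletevcn",
    ],
}


def build_condition(categories: List[str]) -> str:
    """Build a rule condition that matches any event type in the given categories."""
    types = [t for category in categories for t in EVENT_TYPES[category]]
    return json.dumps({"eventType": types})


def matches(condition: str, event: Dict) -> bool:
    """Simplified matcher: exact eventType match only (no wildcards or attribute filters)."""
    return event.get("eventType") in json.loads(condition)["eventType"]
```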