Monitoring and Observability
The cloud has revolutionized the way businesses consume technology. In the past, businesses assumed ownership of and responsibility for all levels of technology, from infrastructure to software. Now, the cloud offers the potential for businesses to provision and consume resources as needed. Although the benefit is increased efficiency and productivity, the cloud introduces additional changes to operational models. Changes include:
- The shared responsibility model between the business and cloud providers
- The need for the business to maintain applications on premises and in multiple clouds
- IT team requirements to integrate existing toolsets with new cloud platform tools
Oracle Cloud Infrastructure (OCI) uses best-in-class operational processes to secure and monitor the underlying cloud infrastructure, such as data center facilities, hardware, and software systems. OCI provides tools that let you securely run your workloads and monitor your cloud resources, such as compute, network, storage, database, and their end-to-end applications.
What is Monitoring and Observability?
Monitoring is a tool or a service that watches a system's state and triggers a notification when a predefined condition is met.
Observability is a tool or a solution that uses a system's telemetry data, such as metrics, logs, and traces, to debug problems and improve performance.
How to Monitor OCI Services
OCI offers predefined sets of metrics, logs, and events to provide visibility into internal infrastructure and services. OCI also provides integrations with Grafana, PagerDuty, and Slack, in addition to supporting standards from the Cloud Native Computing Foundation (CNCF), such as CloudEvents and OpenTracing.
Metrics: You can see a comprehensive view of the metrics that are emitted by OCI services by using Metrics Explorer in the Console. For more information about OCI Monitoring and a list of services that emit metrics, see Overview of Monitoring.
Monitoring lets you define thresholds on resource metrics to generate alarms. Alarms can feed into the OCI Notifications service. You can also access metrics for integration with third-party tools that are cloud-vendor agnostic, such as Grafana, an open source platform for monitoring and analytics.
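To make the threshold-and-alarm idea concrete, here is a minimal sketch in plain Python. It does not use the OCI SDK or any OCI API: the `ThresholdAlarm` class, its fields, and the `notify` callback are hypothetical names chosen for illustration, and the callback merely stands in for a notification channel. The sketch assumes an alarm that fires when a metric value rises above a fixed threshold and that sends a message only when its state changes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ThresholdAlarm:
    """Hypothetical alarm: fires while a metric value exceeds a threshold."""
    name: str
    threshold: float
    notify: Callable[[str], None]  # stand-in for a notification channel
    firing: bool = False

    def evaluate(self, value: float) -> None:
        """Compare one metric sample to the threshold; notify on state change only."""
        breached = value > self.threshold
        if breached and not self.firing:
            self.notify(f"{self.name}: FIRING (value={value}, threshold={self.threshold})")
        elif not breached and self.firing:
            self.notify(f"{self.name}: OK (value={value})")
        self.firing = breached


# Collect notifications in a list instead of sending them anywhere.
messages: List[str] = []
alarm = ThresholdAlarm(name="high-cpu", threshold=80.0, notify=messages.append)

# Illustrative CPU utilization samples (percent); not real measurements.
for cpu_percent in [42.0, 85.5, 91.0, 60.0]:
    alarm.evaluate(cpu_percent)

for message in messages:
    print(message)
```

Notifying only on transitions, rather than on every breaching sample, is one common design choice to avoid flooding a channel while a condition persists. A real alarm definition would typically also specify an evaluation window and a metric query.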
Logs: OCI Logging provides access to logs from OCI resources. Logs include critical diagnostic information that describes how resources are performing and being accessed. For more information, see Logging Overview.
Events: OCI services emit events. Events are structured messages that indicate a state change in OCI resources. Examples of events include:
- Creating an instance
- Deleting an instance
- Creating, updating, or deleting a resource
Events can be routed by the Notifications service to appropriate channels. Events can also feed into OCI Functions for actionable items, such as notifying a specific team about the launch of an instance. For more information about services that emit events, see Services that Produce Events.
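The event flow described above, in which a state change produces a structured message that is matched and routed to a handler, can be sketched as follows. This is a conceptual illustration in plain Python, not the OCI Events or Functions API. The `EventRouter` class and the event type strings are hypothetical. The event fields (`specversion`, `id`, `source`, `type`, `data`) are borrowed from the CloudEvents specification as an assumption about shape only, so the exact envelope that OCI emits may differ.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventRouter:
    """Hypothetical router: delivers an event to every handler whose rule matches its type."""

    def __init__(self) -> None:
        self._rules: List[Tuple[str, Handler]] = []

    def add_rule(self, event_type: str, handler: Handler) -> None:
        """Register a handler for events of exactly this type."""
        self._rules.append((event_type, handler))

    def route(self, event: Event) -> int:
        """Invoke matching handlers and return how many were called."""
        delivered = 0
        for event_type, handler in self._rules:
            if event.get("type") == event_type:
                handler(event)
                delivered += 1
        return delivered


# Stand-in for a team notification channel.
team_inbox: List[str] = []


def notify_team(event: Event) -> None:
    """Record a human-readable message about a newly created instance."""
    team_inbox.append(f"Instance created by {event['source']}: {event['data']}")


router = EventRouter()
# Hypothetical event type name, used for illustration only.
router.add_rule("example.compute.instance.created", notify_team)

created_event: Event = {
    "specversion": "1.0",
    "id": "evt-001",
    "source": "example-compute-service",
    "type": "example.compute.instance.created",
    "data": {"instanceName": "web-01"},
}
deleted_event: Event = {
    "specversion": "1.0",
    "id": "evt-002",
    "source": "example-compute-service",
    "type": "example.compute.instance.deleted",
    "data": {"instanceName": "web-01"},
}

created_count = router.route(created_event)
deleted_count = router.route(deleted_event)
```

Only the creation event matches the registered rule here, so the deletion event is ignored. In a managed setup, the rule and the target would be configured in the service rather than written by hand.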
Observability and Management Platform
OCI observability and management services are designed to meet the challenges of modern applications and solutions consisting of many components that use different technologies. This collection of services provides visibility and insight across cloud native and traditional technology, cloud providers, and on-premises environments, in addition to broad standards-based ecosystem support. The platform is designed to help you manage increasingly diverse and distributed IT portfolios, while reducing troubleshooting time, preventing outages, and enabling IT to manage applications from a business perspective. The services include metrics, events, logs, and beyond, providing flexibility depending on your need for customization. Services and features include:
Monitoring: Enables OCI services and customers to emit metrics about OCI customer resources. Monitoring capabilities include service metrics, Metrics Explorer, and alarm status and definition. You can configure alarms with thresholds to detect and respond to infrastructure and application anomalies.
Health Checks: Provides high-frequency external monitoring to determine the availability and performance of any public-facing service, including hosted websites, API endpoints, and externally facing load balancers.
Application Performance Monitoring (APM): Provides deep visibility into the performance of applications and enables DevOps professionals to diagnose issues quickly. APM is compatible with OpenTracing and OpenMetrics for distributed tracing, and combines end-user monitoring with synthetic monitoring. It can also ingest telemetry from microservices deployed in Kubernetes or Docker containers.
Database Management: Provides comprehensive database performance and management capability for each type of Oracle Database, including OCI and on-premises. This capability significantly reduces the burden on database administrators by providing a full-lifecycle solution encompassing monitoring, performance management, tuning, and database administration.
Java Management Service: Discovers, monitors, and manages your Java environment. Once deployed, the service discovers which versions of Java you are running and where, which ones require updates, and which applications are using them. This service is included with your Java SE Subscription.
Logging: Provides easy ingestion of log data and analysis to diagnose issues. You can integrate Logging with OCI services such as Streaming, Monitoring, OCI Functions, and Notifications. Logging uses the CloudEvents standard by the CNCF and uses CNCF Fluentd to ingest logs from hundreds of sources.
Logging Analytics: Machine learning-based cloud solution that monitors, aggregates, indexes, and analyzes all log data from your on-premises and multicloud environments.
Notifications: Highly available, low-latency publish and subscribe (pub/sub) service that sends alerts and messages to OCI Functions, email, and message delivery partners, including Slack and PagerDuty.
Operations Insights: Capacity planning tool that enables administrators to uncover performance issues, forecast consumption, and plan capacity by using machine learning-based analytics on historical and SQL data. Use these capabilities to make data-driven decisions to optimize resource use, proactively avoid outages, and improve performance.
Resource Manager: Terraform-based cloud infrastructure automation tool that provides infrastructure-as-code service capability.
Service Connector Hub: Helps cloud engineers manage and move data between OCI services and from OCI to third-party services.
Stack Monitoring: Enables proactive monitoring of applications and their underlying stack, including application servers and databases.
Enterprise Manager: Provides comprehensive monitoring and management for Oracle Applications, Middleware, Database, and Engineered Systems deployed in hybrid clouds.
Governance: Provides a comprehensive array of services to help you optimize costs, maximize utilization, and ensure adherence to corporate standards and legislative compliance for assets deployed in OCI.