Overview of Monitoring

The Oracle Cloud Infrastructure Monitoring service enables you to actively and passively monitor your cloud resources using the Metrics and Alarms features. Learn how Monitoring works.

This image shows metrics and alarms as used in the Monitoring service.

How Monitoring Works

The Monitoring service uses metrics to monitor resources and alarms to notify you when these metrics meet alarm-specified triggers.

Metrics are emitted to the Monitoring service as raw data points, or timestamp-value pairs, along with dimensions and metadata. Metrics come from a variety of sources:

You can transfer metrics from the Monitoring service using Service Connector Hub. For more information, see details for setting up a Monitoring source for a service connector.

Metric data posted to the Monitoring service is only presented to you or consumed by the Oracle Cloud Infrastructure features that you enable to use metric data.

When you query a metric, the Monitoring service returns aggregated data according to the specified parameters. You can specify a range (such as the last 24 hours), statistic, and interval. The Console displays one monitoring chart per metric for selected resources. The aggregated data in each chart reflects your selected statistic and interval. API requests can optionally filter by dimension and specify a resolution. API responses include the metric name along with its source compartment and metric namespace. You can feed the aggregated data into a visualization or graphing library.

Metric and alarm data is accessible via the Console, CLI, and API. For retention periods, see Storage Limits.

The Alarms feature of the Monitoring service publishes alarm messages to configured destinations, such as topics in Notifications and streams in Streaming.

Metrics Feature Overview

The Metrics feature relays metric data about the health, capacity, and performance of your cloud resources.

A metric is a measurement related to health, capacity, or performance of a given resource. Resources, services, and applications emit metrics to the Monitoring service. Common metrics reflect data related to:

  • Availability and latency
  • Application uptime and downtime
  • Completed transactions
  • Failed and successful operations
  • Key performance indicators (KPIs), such as sales and engagement quantifiers

By querying Monitoring for this data, you can understand how well the systems and processes are working to achieve the service levels you commit to your customers. For example, you can monitor the CPU utilization and disk reads of your compute instances. You can then use this data to determine when to launch more instances to handle increased load, troubleshoot issues with your instance, or better understand system behavior.

Example Metric: Failure Rate

For application health, one of the common KPIs is failure rate, for which a common definition is the number of failed transactions divided by total transactions. This KPI is usually delivered through application monitoring and management software.

As a developer, you can capture this KPI from your applications using custom metrics. Simply record observations every time an application transaction takes place and then post that data to the Monitoring service. In this case, set up metrics to capture failed transactions, successful transactions, and transaction latency (time spent per completed transaction).
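The failure rate definition above can be sketched in a few lines of Python. This is a minimal illustration only: the TransactionCounts class and its method names are hypothetical and are not part of any Oracle SDK, and posting the resulting values to the Monitoring service is left out.

```python
from dataclasses import dataclass


@dataclass
class TransactionCounts:
    """Hypothetical tally of transaction outcomes for one reporting period."""
    failed: int = 0
    succeeded: int = 0

    def record(self, ok: bool) -> None:
        # Record one observation each time an application transaction completes.
        if ok:
            self.succeeded += 1
        else:
            self.failed += 1

    def failure_rate(self) -> float:
        # Failure rate = failed transactions / total transactions.
        total = self.failed + self.succeeded
        return self.failed / total if total else 0.0


counts = TransactionCounts()
for ok in [True, True, False, True]:
    counts.record(ok)
print(counts.failure_rate())  # 1 failed out of 4 transactions -> 0.25
```

Keeping failed and successful counts as separate values, rather than storing only the ratio, mirrors the suggestion above to capture each of them as its own metric.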

Alarms Feature Overview

Use alarms to monitor the health, capacity, and performance of your cloud resources.

Resources emit metric data points to Monitoring. When triggered, alarms send messages to the configured destination. For Notifications, messages are sent to subscriptions in the configured topic. For Streaming, messages are sent to the configured stream.

The Alarms feature of the Monitoring service works with the configured destination service to notify you when metrics meet alarm-specified triggers. The previous illustration depicts the flow, starting with resources emitting metric data points to Monitoring. When triggered, an alarm sends an alarm message to the configured destination. For Notifications, messages are sent to subscriptions in the configured topic. For Streaming, messages are sent to the configured stream. (This illustration does not cover raw and aggregated metric data. For these details, see the "Monitoring Overview" illustration at the top of this page.)

When configured, repeat notifications remind you of a continued firing state at the configured repeat interval. You are also notified when an alarm transitions back to the OK state, or when an alarm is reset.

Alarm Evaluations

Monitoring evaluates alarms once per minute to determine alarm status.

When the alarm splits notifications, Monitoring evaluates each tracked metric stream. If the evaluation of that metric stream indicates a new FIRING status or other qualifying event, then Monitoring sends an alarm message.

Monitoring tracks metric streams per alarm for qualifying events, but messages are subject to the destination service limits.

Time Needed to Reflect Alarm Updates

Updates to alarms take up to five minutes to be reflected everywhere.

For example, if you update an alarm to split notifications, then it might take up to five minutes for metric stream status to be populated in the Console.

Searching for Alarms

Search for alarms using supported attributes.

For more information about Search, see Overview of Search. For attribute descriptions, see Alarm Reference.

Search-Supported Attributes for Alarms
  • id
  • displayName
  • compartmentId
  • metricCompartmentId
  • namespace
  • query
  • severity
  • destinations
  • suppression
  • isEnabled
  • lifecycleState
  • timeCreated
  • timeUpdated
  • tags

Message Types

The message type indicates the reason that the message was sent.

  • OK_TO_FIRING: The alarm changed from OK status to FIRING status.
  • FIRING_TO_OK: The alarm changed from FIRING status to OK status.
  • REPEAT: The alarm is maintaining a FIRING status and repeat notifications are configured.
  • RESET: The alarm is not detecting the metric firing; the metric is no longer being emitted. The resource that was emitting the metric might have been moved or terminated. Monitoring stops tracking metric streams associated with RESET messages.

    Important

    When a RESET status change occurs, determine the health of the resource.

Message Format and Examples

Review parameter descriptions for alarm messages and example messages.

Alarm Message Format

Review parameter descriptions for alarm messages.

Parameter (requirement, type): Description

  • dedupeKey (Required, string): Unique identifier for all the alarm messages of the alarm. Use for de-duplication. Note: To de-duplicate multiple occurrences of the same message, use dedupeKey and timestamp.
  • title (Required, string): The alarm's configured display name.
  • body (string): The alarm's configured message body.
  • type (Required, string): The reason for sending the notification message. Valid values: See Message Types.
  • severity (Required, string): The highest severity level of the listed alarms. Valid values: CRITICAL, ERROR, WARNING, and INFO.
  • timestampEpochMillis (Required, long): The time when the alarm was triggered, in milliseconds since epoch time.
  • timestamp (Required, string): The time when the alarm was triggered, in ISO-8601 format. Same information as in timestampEpochMillis.
  • alarmMetaData (Required, array of objects): List of alarms related to this notification message. See alarmMetaData format following this table.
  • version (Required, int): The version of the alarm message format.
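Because a destination can deliver the same message more than once, a consumer might de-duplicate on the pair of values named in the note above. The following Python sketch assumes messages have already been parsed into dictionaries and uses the field name dedupeKey as it is spelled in the example messages on this page. The deduplicate function is a hypothetical helper, not part of any Oracle SDK.

```python
def deduplicate(messages):
    """Yield each alarm message once, keyed on (dedupeKey, timestamp)."""
    seen = set()
    for msg in messages:
        key = (msg["dedupeKey"], msg["timestamp"])
        if key in seen:
            continue  # A repeated delivery of a message already handled.
        seen.add(key)
        yield msg


received = [
    {"dedupeKey": "exampleuniqueID", "timestamp": "2018-11-16T22:12:00Z", "type": "REPEAT"},
    {"dedupeKey": "exampleuniqueID", "timestamp": "2018-11-16T22:12:00Z", "type": "REPEAT"},
    {"dedupeKey": "exampleuniqueID", "timestamp": "2018-11-16T22:13:00Z", "type": "REPEAT"},
]
unique = list(deduplicate(received))
print(len(unique))  # 2
```

The in-memory set grows without bound in this sketch. A long-running consumer would likely need to expire old keys.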

alarmMetaData format
Parameter (requirement, type): Description

  • id (Required, string): The alarm OCID.
  • status (Required, string): The alarm state. Valid values: OK, FIRING
  • severity (Required, string): The alarm severity level. Valid values: CRITICAL, ERROR, WARNING, INFO
  • query (Required, string): The alarm's configured query. Example: CpuUtilization[1m]{availabilityDomain="cumS:PHX-AD-1"}.absent()
  • totalMetricsFiring (Required, int): The number of metric streams represented in this notification message.
  • dimensions (array of objects): List of dimension key-value pairs that identify each metric stream. The list is limited to a hundred entries. Empty for an alarm with a status of OK.

Example Alarm Messages

Review examples of alarm messages by destination.

Notifications Destination

Review example messages, by subscription protocol and message format, for an alarm titled "High CPU Utilization" that is continuing to be in the FIRING state. In this example, the alarm destination is a topic (Notifications service). The message includes two (grouped) metric streams: one for "myinstance1" and another for "myinstance2."

Note

You can optionally split messages by metric stream.
Email (Formatted)

The following example shows an alarm message sent to an Email subscription when the alarm is configured for Send formatted messages. Email is available when the alarm destination is a topic (Notifications service).

For supported subscription protocols and message types, see Friendly Formatting.

Example of a formatted email message sent by an alarm.
Email (Pretty JSON)

The following example shows an alarm message sent to an Email subscription when the alarm is configured for Send Pretty JSON messages (raw text with line breaks). Email is available when the alarm destination is a topic (Notifications service).

For supported subscription protocols and message types, see Friendly Formatting.

{
  "dedupeKey": "exampleuniqueID",
  "title": "High CPU Utilization",
  "body": "Follow runbook at http://example.com/runbooks",
  "type": "REPEAT",
  "severity": "CRITICAL",
  "timestampEpochMillis": 1542406320000,
  "timestamp": "2018-11-16T22:12:00Z",
  "alarmMetaData": [
    {
      "id": "ocid1.alarm.oc1.iad.exampleuniqueID",
      "status": "FIRING",
      "severity": "CRITICAL",
      "query": "CpuUtilization[1m].mean() > 0",
      "totalMetricsFiring": 2,
      "dimensions": [
        {
          "instancePoolId": "Default",
          "resourceDisplayName": "myinstance1",
          "faultDomain": "FAULT-DOMAIN-1",
          "resourceId": "ocid1.instance.oc1.iad.exampleuniqueID",
          "imageId": "ocid1.image.oc1.iad.exampleuniqueID",
          "availabilityDomain": "szYB:US-ASHBURN-AD-1",
          "shape": "VM.Standard2.1",
          "region": "us-ashburn-1"
        },
        {
          "instancePoolId": "Default",
          "resourceDisplayName": "myinstance2",
          "faultDomain": "FAULT-DOMAIN-3",
          "resourceId": "ocid1.instance.oc1.iad.exampleuniqueID",
          "imageId": "ocid1.image.oc1.iad.exampleuniqueID",
          "availabilityDomain": "szYB:US-ASHBURN-AD-1",
          "shape": "VM.Standard2.1",
          "region": "us-ashburn-1"
        }
      ]
    }
  ],
  "version": 1.1
}
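A message in this format can be processed with any JSON parser. The following Python sketch, using only the standard library, reads a shortened copy of the message above and lists the resources behind the firing metric streams. The summarize function is a hypothetical helper, and the choice of resourceDisplayName as the label assumes that dimension is present, as it is in this example.

```python
import json

raw = """{"title": "High CPU Utilization", "type": "REPEAT", "severity": "CRITICAL",
 "alarmMetaData": [{"status": "FIRING", "totalMetricsFiring": 2,
   "dimensions": [{"resourceDisplayName": "myinstance1"},
                  {"resourceDisplayName": "myinstance2"}]}]}"""


def summarize(message_text):
    """Return a one-line summary of an alarm message (hypothetical helper)."""
    msg = json.loads(message_text)
    names = [
        dims.get("resourceDisplayName", "<unknown>")
        for alarm in msg["alarmMetaData"]
        for dims in alarm.get("dimensions", [])  # Can be empty for status OK.
    ]
    return f'[{msg["severity"]}] {msg["title"]} ({msg["type"]}): {", ".join(names)}'


print(summarize(raw))
# [CRITICAL] High CPU Utilization (REPEAT): myinstance1, myinstance2
```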
Email (Raw)

The following example shows an alarm message sent to an Email subscription when the alarm is configured for Send raw messages. Email is available when the alarm destination is a topic (Notifications service).

{"dedupeKey":"exampleuniqueID","title":"High CPU Utilization","body":"Follow runbook at http://example.com/runbooks","type":"REPEAT","severity":"CRITICAL","timestampEpochMillis":1542406320000,"timestamp":"2018-11-16T22:12:00Z","alarmMetaData":[{"id":"ocid1.alarm.oc1.iad.exampleuniqueID","status":"FIRING","severity":"CRITICAL","query":"CpuUtilization[1m].mean()>0","totalMetricsFiring":2,"dimensions":[{"instancePoolId":"Default","resourceDisplayName":"myinstance1","faultDomain":"FAULT-DOMAIN-1","resourceId":"ocid1.instance.oc1.iad.exampleuniqueID","imageId":"ocid1.image.oc1.iad.exampleuniqueID","availabilityDomain":"szYB:US-ASHBURN-AD-1","shape":"VM.Standard2.1","region":"us-ashburn-1"},{"instancePoolId":"Default","resourceDisplayName":"myinstance2","faultDomain":"FAULT-DOMAIN-3","resourceId":"ocid1.instance.oc1.iad.exampleuniqueID","imageId":"ocid1.image.oc1.iad.exampleuniqueID","availabilityDomain":"szYB:US-ASHBURN-AD-1","shape":"VM.Standard2.1","region":"us-ashburn-1"}]}],"version":1.1}
Slack

The following example shows an alarm message sent to a Slack subscription. Slack is available when the alarm destination is a topic (Notifications service).

{"dedupeKey":"exampleuniqueID","title":"High CPU Utilization","body":"Follow runbook at http://example.com/runbooks","type":"REPEAT","severity":"CRITICAL","timestampEpochMillis":1542406320000,"timestamp":"2018-11-16T22:12:00Z","alarmMetaData":[{"id":"ocid1.alarm.oc1.iad.exampleuniqueID","status":"FIRING","severity":"CRITICAL","query":"CpuUtilization[1m].mean()>0","totalMetricsFiring":2,"dimensions":[{"instancePoolId":"Default","resourceDisplayName":"myinstance1","faultDomain":"FAULT-DOMAIN-1","resourceId":"ocid1.instance.oc1.iad.exampleuniqueID","imageId":"ocid1.image.oc1.iad.exampleuniqueID","availabilityDomain":"szYB:US-ASHBURN-AD-1","shape":"VM.Standard2.1","region":"us-ashburn-1"},{"instancePoolId":"Default","resourceDisplayName":"myinstance2","faultDomain":"FAULT-DOMAIN-3","resourceId":"ocid1.instance.oc1.iad.exampleuniqueID","imageId":"ocid1.image.oc1.iad.exampleuniqueID","availabilityDomain":"szYB:US-ASHBURN-AD-1","shape":"VM.Standard2.1","region":"us-ashburn-1"}]}],"version":1.1}
SMS

The following example shows an alarm message sent to an SMS subscription. SMS is available when the alarm destination is a topic (Notifications service).

Example of an SMS alarm message.

Text in example SMS alarm message:

119L3T: [CRITICAL] "High CPU Utilization" has transitioned to OK_TO_FIRING at 2021-02-10T05:52:00Z
https://cloud.oracle.com/monitoring/alarms/status
Streaming Destination

The following example shows an alarm message sent when the alarm destination is a stream (Streaming service). In this example, the alarm is titled "HighCPUUtilization" and has transitioned from the OK state to the FIRING state.

While the example shows line breaks, messages sent to streams are in raw JSON format (no line breaks).

{
  "destinationOcid": "ocid1.stream.oc1.phx.exampleuniqueID",
  "onsMessageFormat": "OnsOptimized",
  "message": {
    "dedupeKey": "some-dedupe-key",
    "title": "HighCPUUtilization",
    "type": "OK_TO_FIRING",
    "severity": "CRITICAL",
    "timestampEpochMillis": 1612936320000,
    "timestamp": "2021-02-10T05:52:00Z",
    "alarmMetaData": [
      {
        "id": "ocid1.alarm.oc1.phx.exampleuniqueID",
        "status": "FIRING",
        "severity": "CRITICAL",
        "query": "CPUUtil[1m].groupBy(CLUSTER).mean()>80",
        "totalMetricsFiring": 1,
        "dimensions": [
          {
            "CLUSTER": "ecejaIDCS1221x1-es"
          }
        ]
      }
    ],
    "version": 1.1
  },
  "tenantOcid": "ocid1.tenancy.oc1..exampleuniqueID",
  "ingestionTimestamp": 1611189126005,
  "evaluationTimestamp": 1612936320000
}

Monitoring Concepts

The following concepts are essential to working with Monitoring.

aggregated data
The result of applying a statistic and interval to a selection of raw data points for a given metric. For example, you can apply the statistic max and interval 1h (one hour) to the last 24 hours of raw data points for the metric CpuUtilization. Aggregated data is displayed in default metric charts in the Console. You can also build metric queries for specific sets of aggregated data. For instructions, see Viewing Default Metric Charts and Building Metric Queries.
alarm
The alarm query to evaluate and the notification destination to use when the alarm is in the firing state, along with other alarm properties. For instructions on managing alarms, see Managing Alarms.
alarm query
The Monitoring Query Language (MQL) expression to evaluate for the alarm. An alarm query must specify a metric, statistic, interval, and a trigger rule (threshold or absence). The Alarms feature of the Monitoring service interprets results for each returned time series as a Boolean value, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. For more information, see Building Metric Queries and the query attribute description in the Alarm API reference.
data point
A timestamp-value pair for the specified metric. Example: 2018-05-10T22:19:00Z, 10.4
A data point is either raw or aggregated. Raw data points are posted by the metric namespace to the Monitoring service using the PostMetricData operation. The frequency of the data points posted varies by metric namespace. For example, your custom namespace might send data points for a given metric at a 20-second frequency.
Aggregated data points are the result of applying a statistic and interval to raw data points. The interval of the aggregated data points is determined by the SummarizeMetricsData request. For example, a request specifying the statistic sum and interval 1h (one hour) returns a sum value for each hour of available raw data points for the given metric.
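The relationship between raw and aggregated data points can be sketched in Python. This is a local illustration of applying the statistic sum with a 1h interval, not a call to the Monitoring service. The aggregate_sum function is hypothetical, and treating a point that falls exactly on a window boundary as belonging to the earlier window is an assumption of the sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def aggregate_sum(points, interval):
    """Sum raw (timestamp, value) pairs into fixed windows of length `interval`.

    Each aggregated point is labeled with the END of its window. A point that
    falls exactly on a boundary is counted in the window ending at that
    boundary (an assumption of this sketch).
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    buckets = defaultdict(float)
    for ts, value in points:
        # Ceiling division finds the window end at or after the timestamp.
        n = -((epoch - ts) // interval)
        buckets[epoch + n * interval] += value
    return sorted(buckets.items())


raw = [
    (datetime(2018, 5, 10, 22, 19, tzinfo=timezone.utc), 10.5),
    (datetime(2018, 5, 10, 22, 40, tzinfo=timezone.utc), 4.5),
    (datetime(2018, 5, 10, 23, 5, tzinfo=timezone.utc), 1.0),
]
for window_end, total in aggregate_sum(raw, timedelta(hours=1)):
    print(window_end.isoformat(), total)
```

The first two raw data points fall in the window ending at 23:00 and are summed into one aggregated data point. The third falls in the next window.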
dimension
A qualifier provided in a metric definition. Example: Resource identifier (resourceId), provided in the definitions of oci_computeagent metrics. Use dimensions to filter or group metric data. Example dimension name-value pair for filtering by availability domain: availabilityDomain = "VeBZ:PHX-AD-1"
frequency
The time period between each posted raw data point for a given metric. (Raw data points are posted by the metric namespace to the Monitoring service.) While frequency varies by metric, default service metrics typically have a frequency of 60 seconds (that is, one data point posted per minute). See also resolution.
interval
The time window used to convert the given set of raw data points.
The timestamp of the aggregated data point corresponds to the end of the time window during which raw data points are assessed. For example, for a five-minute interval, the timestamp "2:05" corresponds to the five-minute time window from 2:00:n to 2:05:00.
This image shows how the timestamp of an aggregated data point corresponds to the interval.
The following example query specifies a 5-minute interval:
CpuUtilization[5m].max()
For supported values, see Monitoring Query Language (MQL) Reference.
Note

Supported values for interval depend on the specified time range in the metric query (not applicable to alarm queries). More interval values are supported for smaller time ranges. For example, if you select one hour for the time range, then all interval values are supported. If you select 90 days for the time range, then only the 1h or 1d interval values are supported.
To specify an interval value that is not available in Basic mode in the Console, such as 12 hours, switch to Advanced mode.
See also resolution.
message
The content that the Alarms feature of the Monitoring service publishes to topics in the alarm’s configured notification destinations. A message is sent when the alarm transitions to another state, such as from "OK" to "FIRING".
For more information about alarm messages, see Message Format and Examples.
metadata
A reference provided in a metric definition. Example: unit (bytes), provided in the definition of the oci_computeagent metric DiskBytesRead. Use metadata to determine additional information about a given metric. For metric definitions, see Supported Services.
metric
A measurement related to health, capacity, or performance of a given resource. Example: The oci_computeagent metric CpuUtilization, which measures usage of a compute instance. For metric definitions, see Supported Services.
Note

Metric resources do not have OCIDs.
metric definition
A set of references, qualifiers, and other information provided by a metric namespace for a given metric. For example, the oci_computeagent metric DiskBytesRead is defined by dimensions (such as resource identifier) and metadata (specifying bytes for unit) as well as identification of its metric namespace (oci_computeagent). Each posted set of data points carries this information. Use the ListMetrics API operation to get metric definitions. For metric definitions, see Supported Services.
metric namespace
Indicator of the resource, service, or application that emits the metric. Provided in the metric definition. For example, the CpuUtilization metric definition emitted by the Oracle Cloud Agent software on compute instances lists the metric namespace oci_computeagent as the source of the CpuUtilization metric. For metric definitions, see Supported Services.
metric stream
An individual set of aggregated data for a metric and zero or more dimension values.
In the Metric streams status page, each metric stream corresponds to a set of dimension key-value pairs.
In metric charts (in the Console), each metric stream is depicted as a line (unless you aggregate all metric streams).
The following image depicts metric streams in a chart. Each line in the chart corresponds to a metric stream.
For example, consider a compartment containing three compute instances in the AD-1 availability domain (including two in the ipexample instance pool) and a fourth instance in the AD-2 availability domain. In this example, the CPU Utilization metric chart shows four lines (one per instance). When filtered by the AD-1 availability domain, the chart shows three lines. When further filtered by the ipexample instance pool, the chart shows two lines.
For steps to set up an alarm for notifications per metric stream, see Scenario: Split Messages by Metric Stream.
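The filtering example above can be modeled as a small Python sketch. The instance names are hypothetical, and the pool value "Default" for the two instances outside the ipexample pool is an assumption. Each dictionary stands for the dimension key-value pairs of one metric stream.

```python
# Hypothetical dimension sets, one per compute instance (one metric stream each).
instances = [
    {"resourceDisplayName": "instance1", "availabilityDomain": "AD-1", "instancePoolId": "ipexample"},
    {"resourceDisplayName": "instance2", "availabilityDomain": "AD-1", "instancePoolId": "ipexample"},
    {"resourceDisplayName": "instance3", "availabilityDomain": "AD-1", "instancePoolId": "Default"},
    {"resourceDisplayName": "instance4", "availabilityDomain": "AD-2", "instancePoolId": "Default"},
]


def metric_streams(streams, **filters):
    """Return the streams whose dimensions match every given filter."""
    return [s for s in streams if all(s.get(k) == v for k, v in filters.items())]


print(len(metric_streams(instances)))  # 4
print(len(metric_streams(instances, availabilityDomain="AD-1")))  # 3
print(len(metric_streams(instances, availabilityDomain="AD-1", instancePoolId="ipexample")))  # 2
```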
notification destination
Details for sending messages when the alarm transitions to another state, such as from "OK" to "FIRING". The details and setup might vary by destination service. Available destination services include Notifications and Streaming.
For the Notifications service, specify a topic. If you're creating the topic for the alarm, also specify one or more subscription protocols (such as PagerDuty).
For the Streaming service, specify a stream.
For examples of alarm messages sent to topics and streams, see Example Alarm Messages.
Oracle Cloud Agent software
Software that allows a compute instance to post raw data points to the Monitoring service. Automatically installed with the latest versions of supported images. See Enabling Monitoring for Compute Instances.
query
The Monitoring Query Language (MQL) expression to evaluate for returning aggregated data. The query must specify a metric, statistic, and interval. For more information, see Building Metric Queries.
resolution
The period between time windows, or the regularity at which time windows shift. For example, use a resolution of 1m to retrieve aggregations every minute.
To specify a non-default resolution that differs from the interval, use the SummarizeMetricsData operation.
Note

For metric queries, the interval you select drives the default resolution of the request, which determines the maximum time range of data returned.

For more information about the resolution parameter as used in metric queries, see SummarizeMetricsData.

Maximum time range returned for a query

The maximum time range returned for a metric query depends on the resolution. By default, for metric queries, the resolution is the same as the query interval.

The maximum time range is calculated using the current time, regardless of any specified end time. Following are the maximum time ranges returned for each interval selection available in the Console (Basic mode). To specify an interval value that is not available in Basic mode in the Console, such as 12 hours, switch to Advanced mode.

Maximum time range returned, by interval and default resolution (metric queries):

  • Interval: 1 minute (Service Metrics page); 1m (Create Alarm and Metrics Explorer pages); or Auto (Service Metrics page)*, when the selected period of time is 6 hours or less. Default resolution: 1 minute. Maximum time range returned: 7 days.
  • Interval: 5 minutes (Service Metrics page); 5m (Create Alarm and Metrics Explorer pages); or Auto (Service Metrics page)*, when the selected period of time is more than 6 hours and less than 36 hours. Default resolution: 5 minutes. Maximum time range returned: 30 days.
  • Interval: 1 hour (Service Metrics page); 1h (Create Alarm and Metrics Explorer pages); or Auto (Service Metrics page)*, when the selected period of time is more than 36 hours. Default resolution: 1 hour. Maximum time range returned: 90 days.
  • Interval: 1 day (Service Metrics page); 1d (Create Alarm and Metrics Explorer pages). Default resolution: 1 day. Maximum time range returned: 90 days.

* The maximum time range returned when Auto is selected for Interval (Service Metrics page only) is determined by the automatic interval selection. The automatic interval selection is based on the selected period of time.

To specify a non-default resolution that differs from the interval, use the SummarizeMetricsData operation.

See examples of returned data

Example 1: One-minute interval and resolution up to the current time, sent at 10:00 on January 8th. No resolution or end time is specified, so the resolution defaults to the interval value of 1m, and the end time defaults to the current time (2019-01-08T10:00:00.789Z). This request returns a maximum of 7 days of metric data points. The earliest data point possible within this seven-day period would be 10:00 on January 1st (2019-01-01T10:00:00.789Z).

Example 2: Five-minute interval with one-minute resolution up to two days ago, sent at 10:00 on January 8th. Because the resolution drives the maximum time range, a maximum of 7 days of metric data points is returned. While the end time specified was 10:00 on January 6th (2019-01-06T10:00:00.789Z), the earliest data point possible within this seven-day period would be 10:00 on January 1st (2019-01-01T10:00:00.789Z). Therefore, only 5 days of metric data points can be returned in this example.
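The arithmetic in both examples can be checked with a short Python sketch. The returned_range function and the MAX_RANGE mapping are hypothetical helpers built from the limits listed above. They only reproduce the date calculation and do not query the service.

```python
from datetime import datetime, timedelta, timezone

# Maximum time range returned per resolution, as listed above.
MAX_RANGE = {
    "1m": timedelta(days=7),
    "5m": timedelta(days=30),
    "1h": timedelta(days=90),
    "1d": timedelta(days=90),
}


def returned_range(now, resolution, end=None):
    """Return (earliest, end) for a metric query sent at `now`."""
    end = end or now
    # The limit is counted back from the current time, not from `end`.
    earliest = now - MAX_RANGE[resolution]
    return earliest, end


now = datetime(2019, 1, 8, 10, 0, 0, tzinfo=timezone.utc)

# Example 1: resolution 1m, end time defaults to the current time.
earliest, end = returned_range(now, "1m")
print(earliest.date(), (end - earliest).days)  # 2019-01-01 7

# Example 2: resolution 1m, end time two days ago.
earliest, end = returned_range(now, "1m", end=now - timedelta(days=2))
print(earliest.date(), (end - earliest).days)  # 2019-01-01 5
```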

For alarm queries, the specified interval has no effect on the resolution of the request. The only valid value of the resolution for an alarm query request is 1m. For more information about the resolution parameter as used in alarm queries, see Alarm.

As shown in the following illustration, resolution controls the start time of each aggregation window relative to the previous window while interval controls the length of the windows. Both requests apply the statistic max to the data within each five-minute window (from the interval), resulting in a single aggregated data point representing the highest CPU utilization counter for that window. Only the resolution value differs. This resolution changes the regularity at which the aggregation windows shift, or the start times of successive aggregation windows. Request A does not specify a resolution and thus uses the default value equal to the interval (5 minutes). This request's five-minute aggregation windows are thus taken from the sets of data points emitted from 0:n to 5:00, 5:n to 10:00, and so forth. Request B specifies a 1-minute resolution, so its five-minute aggregation windows are taken from the set of data points emitted every minute from 0:n to 5:00, 1:n to 6:00, and so forth.
This image shows how aggregation windows start according to the resolution.
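The difference between Request A and Request B can also be expressed as a short Python sketch that lists the start and end of successive aggregation windows, in minutes. The window_bounds function is a hypothetical helper for illustration and ignores how data points on a window boundary are assigned.

```python
def window_bounds(interval_min, resolution_min, count):
    """Return the first `count` aggregation windows as (start, end) minutes.

    The interval sets the length of each window. The resolution sets how far
    each window's start is shifted from the previous one.
    """
    return [
        (i * resolution_min, i * resolution_min + interval_min)
        for i in range(count)
    ]


# Request A: 5-minute interval, default resolution (equal to the interval).
print(window_bounds(5, 5, 3))  # [(0, 5), (5, 10), (10, 15)]

# Request B: 5-minute interval, 1-minute resolution.
print(window_bounds(5, 1, 3))  # [(0, 5), (1, 6), (2, 7)]
```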
resource group
A custom string provided with a custom metric that can be used as a filter or to aggregate results. The resource group must exist in the definition of the posted metric. Only one resource group can be applied per metric.
statistic
The aggregation function applied to the given set of raw data points. For supported statistics, see Monitoring Query Language (MQL) Reference.
suppression
A configuration to avoid publishing messages during the specified time range. Useful for suspending alarm notifications during system maintenance. Each suppression applies to a single alarm. In the Console, you can apply one definition of a suppression to multiple alarms. The result is an individual suppression for each alarm. For instructions on suppressing alarms, see To suppress alarms.
trigger rule
The condition that must be met for the alarm to be in the firing state. A trigger rule can be based on a threshold or absence of a metric.

Availability

The Monitoring service is available in all Oracle Cloud Infrastructure commercial regions. See About Regions and Availability Domains for the list of available regions, along with associated locations, region identifiers, region keys, and availability domains.

Supported Services

The following services have resources or components that can emit metrics to Monitoring:

Resource Identifiers

Most types of Oracle Cloud Infrastructure resources have a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID). For information about the OCID format and other ways to identify your resources, see Resource Identifiers.

Note

Metric resources do not have OCIDs.

Ways to Access Monitoring

You can access the Monitoring service using the Console (a browser-based interface), the CLI, or the Monitoring REST API. Instructions for the Console and API are included in topics throughout this guide. For a list of available SDKs, see Software Development Kits and Command Line Interface.

Console: To access Monitoring using the Console, you must use a supported browser. To go to the Console sign-in page, open the navigation menu at the top of this page and click Infrastructure Console. You are prompted to enter your cloud tenant, your user name, and your password. Open the navigation menu and click Observability & Management. Under Monitoring, click Service Metrics.

API: To access Monitoring through APIs, use Monitoring API for metrics and alarms and Notifications API for notifications (used with alarms).

CLI: See Command Line Reference for Monitoring and Command Line Reference for Notifications.

Moving Alarms to a Different Compartment

You can move alarms from one compartment to another. When you move an alarm to a new compartment, its associated metrics remain where they are. After you move the alarm to the new compartment, inherent policies apply immediately and affect access to the alarm through the Console. For more information on moving resources to other compartments, see To move a resource to a different compartment.

Important

To move resources between compartments, resource users must have sufficient access permissions on the compartment that the resource is being moved to, as well as the current compartment. For more information about permissions for Monitoring resources, see Details for Monitoring.

Authentication and Authorization

Each service in Oracle Cloud Infrastructure integrates with IAM for authentication and authorization, for all interfaces (the Console, SDK or CLI, and REST API).

An administrator in your organization needs to set up groups, compartments, and policies that control which users can access which services, which resources, and the type of access. For example, the policies control who can create new users, create and manage the cloud network, launch instances, create buckets, download objects, etc. For more information, see Getting Started with Policies. For specific details about writing policies for each of the different services, see Policy Reference.

If you’re a regular user (not an administrator) who needs to use the Oracle Cloud Infrastructure resources that your company owns, contact your administrator to set up a user ID for you. The administrator can confirm which compartment or compartments you should be using.

For more information about user authorizations for monitoring, see IAM Policies (Monitoring).

Administrators: For common policies that give groups access to metrics, see Let users view metric definitions in a compartment and Restrict user access to a specific metric namespace. For a common alarms policy, see Let users view alarms. To authorize resources, such as instances, to make API calls, add the resources to a dynamic group. Use the dynamic group's matching rules to add the resources, and then create a policy that allows that dynamic group access to metrics. See Let instances make API calls to access monitoring metrics in the tenancy.

Limits on Monitoring

See Monitoring Limits for a list of applicable limits and instructions for requesting a limit increase.

Other limits include the following.

Storage Limits

Item Time range stored
Metric definitions 90 days
Alarm history entries 90 days

Alarm Message Limits

The maximum number of messages allowed per alarm evaluation depends on the alarm destination. Limits are associated with the Oracle Cloud Infrastructure service used for the destination.

Monitoring tracks 200,000 metric streams per alarm for qualifying events. For more information about alarm evaluations, see Alarm Evaluations on this page.

Alarm destination Delivery Maximum alarm messages per evaluation
topic (Notifications) At least once 60
stream (Streaming) At least once 200,000

For example, consider the following evaluations of an alarm that splits notifications among 200 metric streams, using a topic as its destination.

Alarm evaluation (time) Metric stream transition Generated messages Sent messages Dropped messages
00:01:00 110 metric streams transition from OK to FIRING. 110 60 50
00:02:00 90 metric streams transition from OK to FIRING. 90 60 30
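The sent and dropped counts in this example follow directly from the per-evaluation limit for topics. A minimal Python sketch of that arithmetic follows. The deliver function is a hypothetical helper and does not model how the service selects which messages to drop.

```python
TOPIC_LIMIT = 60  # Maximum alarm messages per evaluation for a topic.


def deliver(generated, limit=TOPIC_LIMIT):
    """Return (sent, dropped) message counts for one alarm evaluation."""
    sent = min(generated, limit)
    return sent, generated - sent


for time, transitions in [("00:01:00", 110), ("00:02:00", 90)]:
    sent, dropped = deliver(transitions)
    print(time, transitions, sent, dropped)
# 00:01:00 110 60 50
# 00:02:00 90 60 30
```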

When a topic or stream is overused, it can result in delayed alarm notifications. Overuse can occur when multiple resources are using that topic or stream.

Best Practices to Work Within Limits

When you expect a high volume of alarm notifications, follow these best practices to help prevent exceeding alarm message limits and associated delays.

  • Reserve a single topic or stream for use with a high-volume alarm. Don't use one topic or stream for multiple high-volume alarms.
  • If you expect more than 60 messages per minute, specify Streaming as the alarm destination.
  • Streams:
    • Create partitions based on expected load. See Limits on Streaming Resources.
    • If alarm messages exceed the stream space, then update the alarm to use a different stream that has more partitions. For example, if the original stream contains five partitions, create a stream with ten partitions and then update the alarm to use the new stream.
      Note

      To avoid missing messages, continue consuming the original stream until no more messages are received.
  • Increase limits for your tenancy:

Troubleshooting Limits

If you see an error that the query has exceeded the maximum number of metric streams, then update the query to evaluate a number of metric streams that is within the limit. For example, you can reduce the metric streams by specifying dimensions. You can continue to evaluate all metric streams that were in the original query by spreading the metric streams across multiple queries (or alarms).

Security

This topic describes security for Monitoring.

For information about how to secure Monitoring, including security information and recommendations, see Securing Monitoring.