Resource Monitoring
You can monitor the health, capacity, and performance of your Oracle Cloud Infrastructure resources on demand using queries, or passively using alarms. Queries and alarms rely on metrics emitted by your resources to the Monitoring service.
Prerequisites
- IAM policies: To monitor resources, you must be given the required type of access in a policy written by an administrator, whether you're using the Console or the REST API with an SDK, CLI, or other tool. The policy must give you access to the monitoring services as well as the resources being monitored. If you try to perform an action and get a message that you don’t have permission or are unauthorized, confirm with your administrator the type of access you've been granted and which compartment you should work in. For more information on user authorizations for monitoring, see the Authentication and Authorization section for the related service: Monitoring or Notifications.
- Metrics exist in Monitoring: The resources that you want to monitor must emit metrics to the Monitoring service.
- Compute instances: To emit metrics, the Compute Instance Monitoring plugin must be enabled on the instance, and plugins must be running. The instance must also have either a service gateway or a public IP address to send metrics to the Monitoring service. For more information, see Enabling Monitoring for Compute Instances.
Working with Resource Monitoring
Not all resources support monitoring. See Supported Services for the list of resources that support the Monitoring service, which is required for queries and alarms used in monitoring.
The Monitoring service works with the Notifications service to notify you when a metric breaches an alarm-specified trigger. For more information about these services, see Monitoring and Notifications.
On the page for the resource of interest, under Resources, click Metrics.
For example, to view metric data for a compute instance:
- Open the navigation menu and click Compute. Under Compute, click Instances.
- Click the instance you're interested in.
- On the instance details page, under Resources, click Metrics.
A chart is shown for each metric. For a list of metrics related to Compute instances, see Compute Instance Metrics.
The Console displays the last hour of metric data for the selected resource. A chart is shown for each metric emitted by the selected resource.
For a list of metrics emitted by your resource, see Supported Services.
- Open the navigation menu and click Observability & Management. Under Monitoring, click Service Metrics.
- Choose a Compartment you have permission to work in.
The list of metric namespaces is updated for the selected compartment.
- Choose the Metric namespace for the resource types of interest in the selected compartment.
For example, choose oci_lbaas to see metrics for load balancers.
Default charts are displayed for all resources in the selected Metric namespace and Compartment. Very small or large values are indicated by International System of Units (SI units), such as M for mega (10 to the sixth power).
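As a rough illustration of the SI-unit labels described above, the following sketch formats a number with an SI prefix. The helper name `format_si` and the prefix table are assumptions made for this example; the Console's exact rounding rules are not documented here.

```python
# Hypothetical helper: label a value with an SI prefix, as a chart axis might.
SI_PREFIXES = [
    (1e12, "T"),
    (1e9, "G"),
    (1e6, "M"),   # mega, 10 to the sixth power
    (1e3, "k"),
    (1.0, ""),
    (1e-3, "m"),
]


def format_si(value: float) -> str:
    """Return value scaled to the largest SI prefix that does not exceed it."""
    if value == 0:
        return "0"
    magnitude = abs(value)
    for factor, prefix in SI_PREFIXES:
        if magnitude >= factor:
            scaled = value / factor
            # Two decimals at most, with trailing zeros removed.
            text = f"{scaled:.2f}".rstrip("0").rstrip(".")
            return f"{text}{prefix}"
    # Smaller than the smallest prefix in the table: plain notation.
    return f"{value:g}"


print(format_si(2_500_000))
print(format_si(1500))
```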
- Try a different time range.
- Make sure the correct Compartment is selected.
Metric namespaces are shown only when associated resources exist in the selected compartment. For example, the oci_autonomous_database namespace is shown only when Autonomous Databases exist in the selected compartment.
- Confirm that the missing resources are emitting metrics. See Enabling Monitoring for Compute Instances.
- Review limits information. Limits information for returned data includes the 100,000 data point maximum and time range maximums (determined by resolution, which relates to interval). See MetricData Reference.
- Open the navigation menu and click Observability & Management. Under Monitoring, click Metrics Explorer.
The Metrics Explorer page displays an empty chart with fields to build a query.
- Fill in the fields for a new query.
- Compartment: The compartment containing the resources that you want to monitor. By default, the first accessible compartment is selected.
- Metric namespace: The service or application emitting metrics for the resources that you want to monitor.
- Resource group (optional): The group that the metric belongs to. A resource group is a custom string provided with a custom metric. Not applicable to service metrics.
- Metric name: The name of the metric. Only one metric can be specified. Metric selections depend on the selected compartment and metric namespace. Example: CpuUtilization
- Interval: The aggregation window.
Interval values
Supported values for interval depend on the specified time range in the metric query (not applicable to alarm queries). More interval values are supported for smaller time ranges. For example, if you select one hour for the time range, then all interval values are supported. If you select 90 days for the time range, then only the 1h or 1d interval values are supported.
- 1m - 1 minute
- 5m - 5 minutes
- 1h - 1 hour
- 1d - 1 day
Note
For metric queries, the interval you select drives the default resolution of the request, which determines the maximum time range of data returned.
For more information about the resolution parameter as used in metric queries, see SummarizeMetricsData.
Maximum time range returned for a query
The maximum time range returned for a metric query depends on the resolution. By default, for metric queries, the resolution is the same as the query interval.
The maximum time range is calculated using the current time, regardless of any specified end time. Following are the maximum time ranges returned for each interval selection available in the Console (Basic mode). To specify an interval value that is not available in Basic mode in the Console, such as 12 hours, switch to Advanced mode.
- Interval: 1 minute (Service Metrics page); 1m (Create Alarm and Metrics Explorer pages); Auto (Service Metrics page)*, when the selected period of time is 6 hours or less
Default resolution (metric queries): 1 minute
Maximum time range returned: 7 days
- Interval: 5 minutes (Service Metrics page); 5m (Create Alarm and Metrics Explorer pages); Auto (Service Metrics page)*, when the selected period of time is more than 6 hours and less than 36 hours
Default resolution (metric queries): 5 minutes
Maximum time range returned: 30 days
- Interval: 1 hour (Service Metrics page); 1h (Create Alarm and Metrics Explorer pages); Auto (Service Metrics page)*, when the selected period of time is more than 36 hours
Default resolution (metric queries): 1 hour
Maximum time range returned: 90 days
- Interval: 1 day (Service Metrics page); 1d (Create Alarm and Metrics Explorer pages)
Default resolution (metric queries): 1 day
Maximum time range returned: 90 days
* The maximum time range returned when Auto is selected for Interval (Service Metrics page only) is determined by the automatic interval selection. The automatic interval selection is based on the selected period of time.
To specify a non-default resolution that differs from the interval, use the SummarizeMetricsData operation.
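The relationship between resolution and maximum time range listed above can be sketched as follows. This is a minimal illustration, not the service's implementation: the function name `effective_window` is hypothetical, and the only behavior it models is that the maximum range is counted back from the current time rather than from the end time.

```python
from datetime import datetime, timedelta, timezone

# Maximum time range returned per resolution, as listed above.
MAX_RANGE_BY_RESOLUTION = {
    "1m": timedelta(days=7),
    "5m": timedelta(days=30),
    "1h": timedelta(days=90),
    "1d": timedelta(days=90),
}


def effective_window(now, resolution, end_time=None):
    """Return (earliest, latest) timestamps for which data can be returned."""
    # With no end time specified, the end time defaults to the current time.
    latest = end_time if end_time is not None else now
    # The maximum range is measured from the current time, not the end time.
    earliest = now - MAX_RANGE_BY_RESOLUTION[resolution]
    return earliest, latest


# Request sent at 10:00 on January 8th with an end time two days earlier.
now = datetime(2019, 1, 8, 10, 0, 0, tzinfo=timezone.utc)
end = datetime(2019, 1, 6, 10, 0, 0, tzinfo=timezone.utc)
earliest, latest = effective_window(now, "1m", end_time=end)
print(earliest.isoformat(), latest.isoformat(), (latest - earliest).days)
```

With a one-minute resolution and an end time two days in the past, the usable window shrinks to five days even though the nominal maximum is seven.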
See examples of returned data
Example 1: One-minute interval and resolution up to the current time, sent at 10:00 on January 8th. No resolution or end time is specified, so the resolution defaults to the interval value of 1m, and the end time defaults to the current time (2019-01-08T10:00:00.789Z). This request returns a maximum of 7 days of metric data points. The earliest data point possible within this seven-day period would be 10:00 on January 1st (2019-01-01T10:00:00.789Z).
Example 2: Five-minute interval with one-minute resolution up to two days ago, sent at 10:00 on January 8th. Because the resolution drives the maximum time range, a maximum of 7 days of metric data points is returned. While the end time specified was 10:00 on January 6th (2019-01-06T10:00:00.789Z), the earliest data point possible within this seven-day period would be 10:00 on January 1st (2019-01-01T10:00:00.789Z). Therefore, only 5 days of metric data points can be returned in this example.
- Statistic: The aggregation function.
Statistic values
- Count - The number of observations received in the specified time period.
- Max - The highest value observed during the specified time period.
- Mean - The value of Sum divided by Count during the specified time period.
- Min - The lowest value observed during the specified time period.
- P50 - The value of the 50th percentile.
- P90 - The value of the 90th percentile.
- P95 - The value of the 95th percentile.
- P99 - The value of the 99th percentile.
- Rate - The per-interval average rate of change.
- Sum - All values added together.
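To make the statistic definitions above concrete, here is a small sketch that aggregates the observations from a single interval. The function name `summarize` is hypothetical, and the percentile calculation uses the nearest-rank method as an assumption; the service may compute percentiles differently. Rate is left out because it depends on values from neighboring intervals.

```python
import math


def summarize(values):
    """Aggregate the observations from one interval (assumed non-empty)."""
    ordered = sorted(values)
    count = len(ordered)
    total = sum(ordered)

    def percentile(p):
        # Nearest-rank method: an assumption, the service may interpolate.
        rank = max(1, math.ceil(p * count))
        return ordered[rank - 1]

    return {
        "Count": count,
        "Max": ordered[-1],
        "Mean": total / count,  # Sum divided by Count
        "Min": ordered[0],
        "P50": percentile(0.50),
        "P90": percentile(0.90),
        "P95": percentile(0.95),
        "P99": percentile(0.99),
        "Sum": total,
    }


stats = summarize([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(stats["Count"], stats["Mean"], stats["P50"], stats["Sum"])
```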
- Metric dimensions: Optional filters to narrow the metric data evaluated.
Dimension fields
Note
Additional dimension fields appear for some metric namespaces. See the service-specific documentation for details. For example, a deployment type field appears for the metric namespace oci_autonomous_database; for more information about this field, see To view default metric charts for multiple Autonomous Databases.
- Dimension name: A qualifier specified in the metric definition. For example, the dimension resourceId is specified in the metric definition for CpuUtilization.
Note
Long lists of dimensions are trimmed.
- To view dimensions by name, type one or more characters in the box. A refreshed (trimmed) list shows matching dimension names.
- To retrieve all dimensions for a given metric, use the following API operation: ListMetrics
- Dimension value: The value you want to use for the specified dimension. For example, the resource identifier for your instance of interest.
- + Additional dimension: Adds another name-value pair for a dimension.
- Aggregate metric streams: Plots a single line on the metric chart to represent the combined value of all metric streams for the selected statistic.
- Click Update Chart.
The chart shows the results of your new query. Very small or large values are indicated by International System of Units (SI units), such as M for mega (10 to the sixth power). Units correspond to the selected metric and do not change by statistic.
Troubleshooting Errors and Query Limits
If you see an error that the query has exceeded the maximum number of metric streams, then update the query to evaluate a number of metric streams that is within the limit. For example, you can reduce the metric streams by specifying dimensions. You can continue to evaluate all metric streams that were in the original query by spreading the metric streams across multiple queries (or alarms).
Limits information for returned data includes the 100,000 data point maximum and time range maximums (determined by resolution, which relates to interval). See MetricData Reference.
- To change the view of the query results, click the appropriate option above the results, on the right:
- Show Data Table: Lists data points, indicating time stamp and bytes for each.
- Show Graph (default): Plots data points on a graph.
- To customize the y-axis label or range, type the label you want into Y-Axis Label or type the minimum and maximum values you want into Y-Axis Min value and Y-Axis Max value.
Only numeric characters are allowed for custom ranges. Custom labels and ranges are not persisted in shared queries (MQL).
- To view the query as a Monitoring Query Language (MQL) expression, select Advanced mode.
Advanced mode is located on the right, under the chart.
Use Advanced mode to edit your query using MQL syntax to aggregate results by group. The MQL syntax also supports additional parameter values. For more information about query parameters in Basic and Advanced modes, see Monitoring Query Language (MQL) Reference.
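The way the Basic mode fields map onto an MQL expression can be sketched roughly as below. The helper `build_mql` is hypothetical, and the quoting of dimension values is an assumption (the example alarm query elsewhere in this topic shows an unquoted value); check the Monitoring Query Language (MQL) Reference for the exact syntax.

```python
def build_mql(metric, interval, statistic, dimensions=None, group_by=None,
              aggregate=False):
    """Compose an MQL query string from the Basic mode fields.

    statistic is passed with its call syntax, for example "mean()"
    or "percentile(0.9)".
    """
    query = f"{metric}[{interval}]"
    if dimensions:
        # Assumption: dimension values are written as quoted strings.
        pairs = ", ".join(
            f'{name} = "{value}"' for name, value in dimensions.items()
        )
        query += "{" + pairs + "}"
    if group_by:
        # Aggregate results by group (Advanced mode only).
        query += f".groupBy({', '.join(group_by)})"
    elif aggregate:
        # Corresponds to the Aggregate metric streams option.
        query += ".grouping()"
    query += f".{statistic}"
    return query


print(build_mql("CpuUtilization", "1m", "mean()"))
print(build_mql("CpuUtilization", "1m", "percentile(0.9)", group_by=["poolId"]))
```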
- To create another query, click Add Query below the chart.
- Open the navigation menu and click Observability & Management. Under Monitoring, click Alarm Definitions.
- Click Create Alarm.
Note
You can also create an alarm from a predefined query on the Service Metrics page. Expand Options and click Create an Alarm on this Query. For more information about service metrics, see Viewing Default Metric Charts.
- On the Create Alarm page, under Define alarm, fill in or update the alarm settings:
Note
To toggle between Basic Mode and Advanced Mode, click Switch to Advanced Mode or Switch to Basic Mode (to the right of Define Alarm).
Basic Mode (default)
By default, this page uses Basic Mode, which separates the metric from its dimensions and its trigger rule.
- Alarm name: User-friendly name for the new alarm. This name is sent as the title for notifications related to this alarm. Avoid entering confidential information.
Rendering of the title by protocol
- Email: Subject line of the email message.
- HTTPS (Custom URL): Not rendered.
- PagerDuty: Title field of the published message.
- Slack: Not rendered.
- SMS: Not rendered.
- Alarm severity: The perceived type of response required when the alarm is in the firing state.
- Alarm body: The human-readable content of the notification delivered. Oracle recommends providing guidance to operators for resolving the alarm condition. Consider adding links to standard runbook practices. Example: "High CPU usage alert. Follow runbook instructions for resolution."
- Tags (optional): If you have permissions to create a resource, then you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. For more information about tagging, see Resource Tags. If you are not sure whether to apply tags, skip this option (you can apply tags later) or ask your administrator.
- Metric description: The metric to evaluate for the alarm condition.
- Compartment: The compartment containing the resources that emit the metrics evaluated by the alarm. The selected compartment is also the storage location of the alarm. By default, the first accessible compartment is selected.
- Metric namespace: The service or application emitting metrics for the resources that you want to monitor.
- Resource group (optional): The group that the metric belongs to. A resource group is a custom string provided with a custom metric. Not applicable to service metrics.
- Metric name: The name of the metric. Only one metric can be specified. Example: CpuUtilization
- Interval: The aggregation window, or the frequency at which data points are aggregated.
Interval values
Note
Valid alarm intervals depend on the frequency at which the metric is emitted. For example, a metric emitted every five minutes requires a 5-minute alarm interval or higher. Most metrics are emitted every minute, which means most metrics support any alarm interval. To determine valid alarm intervals for a given metric, check the relevant service's metric reference.
- 1m - 1 minute
- 5m - 5 minutes
- 1h - 1 hour
- 1d - 1 day
Note
For alarm queries, the specified interval has no effect on the resolution of the request. The only valid value of the resolution for an alarm query request is 1m. For more information about the resolution parameter as used in alarm queries, see Alarm.
- Statistic: The aggregation function.
Statistic values
- Count - The number of observations received in the specified time period.
- Max - The highest value observed during the specified time period.
- Mean - The value of Sum divided by Count during the specified time period.
- Min - The lowest value observed during the specified time period.
- P50 - The value of the 50th percentile.
- P90 - The value of the 90th percentile.
- P95 - The value of the 95th percentile.
- P99 - The value of the 99th percentile.
- Rate - The per-interval average rate of change.
- Sum - All values added together.
- Metric dimensions: Optional filters to narrow the metric data evaluated.
Dimension fields
Note
Additional dimension fields appear for some metric namespaces. See the service-specific documentation for details. For example, a deployment type field appears for the metric namespace oci_autonomous_database; for more information about this field, see To view default metric charts for multiple Autonomous Databases.
- Dimension name: A qualifier specified in the metric definition. For example, the dimension resourceId is specified in the metric definition for CpuUtilization.
Note
Long lists of dimensions are trimmed.
- To view dimensions by name, type one or more characters in the box. A refreshed (trimmed) list shows matching dimension names.
- To retrieve all dimensions for a given metric, use the following API operation: ListMetrics
- Dimension value: The value you want to use for the specified dimension. For example, the resource identifier for your instance of interest.
- + Additional dimension: Adds another name-value pair for a dimension.
- Aggregate metric streams: Returns the combined value of all metric streams for the selected statistic.
The Aggregate metric streams option is equivalent to the grouping() query component.
- Trigger rule: The condition that must be satisfied for the alarm to be in the firing state. The condition can specify a threshold, such as 90% for CPU Utilization, or an absence.
- Operator: The operator used in the condition threshold.
Operator values
- greater than
- greater than or equal to
- equal to
- less than
- less than or equal to
- between (inclusive of specified values)
- outside (inclusive of specified values)
- absent
- Value: The value to use for the condition threshold.
- Trigger delay minutes: The number of minutes that the condition must be maintained before the alarm is in firing state.
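The trigger rule fields above can be read as a small evaluation procedure. The sketch below is an interpretation for illustration only: the function names are hypothetical, "outside (inclusive)" is assumed to mean at or beyond either bound, and the trigger delay is modeled as the condition holding for the most recent consecutive one-minute evaluations.

```python
import operator

# Comparison operators that take a single threshold value.
OPERATORS = {
    "greater than": operator.gt,
    "greater than or equal to": operator.ge,
    "equal to": operator.eq,
    "less than": operator.lt,
    "less than or equal to": operator.le,
}


def condition_met(op, value, threshold=None, low=None, high=None):
    """Evaluate one aggregated data point; value is None when no data arrived."""
    if op == "absent":
        return value is None
    if value is None:
        return False
    if op == "between":
        return low <= value <= high
    if op == "outside":
        # Assumption: inclusive means the bounds themselves count as outside.
        return value <= low or value >= high
    return OPERATORS[op](value, threshold)


def is_firing(per_minute_results, trigger_delay_minutes):
    """True if the condition held for the most recent N consecutive minutes."""
    # Assumption: a delay of zero still requires the latest minute to breach.
    needed = max(1, trigger_delay_minutes)
    recent = per_minute_results[-needed:]
    return len(recent) == needed and all(recent)


cpu = [70, 92, 95, 97]
results = [condition_met("greater than", v, threshold=90) for v in cpu]
print(results, is_firing(results, trigger_delay_minutes=3))
```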
Advanced Mode
Click Switch to Advanced Mode to view the alarm query as a Monitoring Query Language (MQL) expression. Edit your query using MQL syntax to aggregate results by group or for additional parameter values. See Monitoring Query Language (MQL) Reference.
- Alarm name: User-friendly name for the new alarm. This name is sent as the title for notifications related to this alarm. Avoid entering confidential information.
Rendering of the title by protocol
- Email: Subject line of the email message.
- HTTPS (Custom URL): Not rendered.
- PagerDuty: Title field of the published message.
- Slack: Not rendered.
- SMS: Not rendered.
- Alarm severity: The perceived type of response required when the alarm is in the firing state.
- Alarm body: The human-readable content of the notification delivered. Oracle recommends providing guidance to operators for resolving the alarm condition. Consider adding links to standard runbook practices. Example: "High CPU usage alert. Follow runbook instructions for resolution."
- Tags (optional): If you have permissions to create a resource, then you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. For more information about tagging, see Resource Tags. If you are not sure whether to apply tags, skip this option (you can apply tags later) or ask your administrator.
- Metric description, dimensions, and trigger rule: The metric to evaluate for the alarm condition, including dimensions and the trigger rule.
- Compartment: The compartment containing the resources that emit the metrics evaluated by the alarm. The selected compartment is also the storage location of the alarm. By default, the first accessible compartment is selected.
- Metric namespace: The service or application emitting metrics for the resources that you want to monitor.
- Resource group (optional): The group that the metric belongs to. A resource group is a custom string provided with a custom metric. Not applicable to service metrics.
- Query code editor box: The alarm query as a Monitoring Query Language (MQL) expression.
Note
Valid alarm intervals depend on the frequency at which the metric is emitted. For example, a metric emitted every five minutes requires a 5-minute alarm interval or higher. Most metrics are emitted every minute, which means most metrics support any alarm interval. To determine valid alarm intervals for a given metric, check the relevant service's metric reference.
Example alarm query:
CpuUtilization[1m]{availabilityDomain=AD1}.groupBy(poolId).percentile(0.9) > 85
For query syntax and examples, see Working with Metric Queries.
- Trigger delay minutes: The number of minutes that the condition must be maintained before the alarm is in firing state.
The chart below the Define alarm section dynamically displays the last six hours of emitted metrics according to currently selected fields for the query. Very small or large values are indicated by International System of Units (SI units), such as M for mega (10 to the sixth power).
- To change the view of the query results, click the appropriate option above the results, on the right:
- Show Data Table: Lists data points, indicating time stamp and bytes for each.
- Show Graph (default): Plots data points on a graph.
- Set up notifications: Under Notifications, fill in the fields.
- Destinations
- Destination service: The provider of the destination to use for notifications.
Available options:
- Compartment: The compartment storing the topic to be used for notifications. Can be a different compartment from the alarm and metric. By default, the first accessible compartment is selected.
- Topic: The topic to use for notifications. Each topic supports a subscription protocol, such as PagerDuty.
- Create a topic: Sets up a topic and subscription protocol in the selected compartment, using the specified destination service.
- Topic name: User-friendly name for the new topic. Example: "Operations Team" for a topic used to notify operations staff of firing alarms. Avoid entering confidential information.
- Topic description: Description of the new topic.
- Subscription protocol: Medium of communication to use for the new topic. Configure your subscription for the protocol you want:
Email subscription
Sends an email message when you publish a message to the subscription's parent topic.
Note
Follow best practices for integrating with Email Delivery. See Maintain a Positive Email Sender Reputation and Set Up Custom Domains for Email.
Message contents and appearance vary by message type. See alarm messages, event messages, and service connector messages. Some message types allow friendly formatting.
- Subscription protocol: Select Email.
- Subscription Email: Type an email address.
Function subscription
Runs the specified function when you publish a message to the subscription's parent topic. For example, runs a function to resize VMs when an associated alarm is triggered.
Note
You must have FN_INVOCATION permission against the function to be able to add the function as a subscription to a topic.
The Notifications service has no information about a function after it's invoked. For more details, see the troubleshooting information at Function not invoked or run.
Confirmation is not required for function subscriptions.
- Subscription protocol: Select Function.
- Function Compartment: Select the compartment containing your function.
- Function Application: Select the application containing your function.
- Function: Select your function.
HTTPS (Custom URL) subscription
Note
The client service must be able to support the HTTP/1.1 401 Unauthorized header response. For more information, see HTTPS (Custom URL) subscription.
Sends specified information when you publish a message to the subscription's parent topic.
Endpoint format (URL using HTTPS protocol):
https://<anyvalidURL>
Authentication: Only Basic Access Authentication is supported. For more information, see RFC-2617: HTTP Authentication: Basic and Digest Access Authentication. You can specify a username and password in the URL, as in https://user:password@domain.com or https://user@domain.com. In the URL, encode (escape) the characters noted at RFC-3986: Uniform Resource Identifier (URI): Generic Syntax.
Certificates: Only valid certificate authority (CA) certificates are trusted. No self-signed certificates are allowed.
Encryption: As with any subscription protocol, data in the endpoint (including username and password if supplied in the URL) is encrypted in transit over the SSL connection established when using HTTPS, and at rest in the service database.
POST calls: The endpoint that you provide must accept POST calls. The Notifications service uses POST calls to send messages to HTTPS (custom URL) endpoints.
Not supported: Query parameters are not allowed in URLs. Custom HTTP header parameters are not supported. When sending a message to the URL endpoint, the Notifications service adds standard metadata to the HTTP request in the header.
- Subscription protocol: Select HTTPS (Custom URL).
- Subscription URL: Type (or copy and paste) the URL you want to use as the endpoint.
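Before entering an HTTPS (Custom URL) endpoint, it can help to check it against the constraints listed above. The following sketch uses only the Python standard library; the helper names `with_basic_auth` and `check_https_endpoint` are hypothetical, and the checks cover only the HTTPS scheme, the presence of a host, and the ban on query parameters. Certificate validity and POST support still have to be verified against the real endpoint.

```python
from urllib.parse import quote, urlsplit


def with_basic_auth(host_and_path, user, password):
    """Build an endpoint URL with percent-encoded Basic Access credentials."""
    # safe="" also escapes characters such as "/" and "@" in the credentials.
    return (
        f"https://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host_and_path}"
    )


def check_https_endpoint(url):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("endpoint must use the HTTPS protocol")
    if not parts.hostname:
        problems.append("endpoint must include a host")
    if parts.query:
        problems.append("query parameters are not allowed")
    return problems


endpoint = with_basic_auth("domain.com/hook", "user", "p@ss/word")
print(endpoint)
print(check_https_endpoint("http://domain.com/hook?team=ops"))
```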
PagerDuty subscription
Creates a PagerDuty incident by default when you publish a message to the subscription's parent topic.
Endpoint format (URL):
https://events.pagerduty.com/integration/<integrationkey>/enqueue
Query parameters are not allowed in URLs.
To create an endpoint for a PagerDuty subscription (set up and retrieve an integration key), see To create a PagerDuty endpoint.
- Subscription protocol: Select PagerDuty.
- Subscription URL: Type (or copy and paste) the integration key portion of the URL for your PagerDuty subscription. (The other portions of the URL are hard-coded.)
Slack subscription
Sends a message to the specified Slack channel by default when you publish a message to the subscription's parent topic.
Message contents and appearance vary by message type. See alarm messages, event messages, and service connector messages.
Endpoint format (URL):
https://hooks.slack.com/services/<webhook-token>
The <webhook-token> portion of the URL contains two slashes (/).
Query parameters are not allowed in URLs.
To create an endpoint for a Slack subscription (using a webhook for your Slack channel), see the Slack documentation.
- Subscription protocol: Select Slack.
- Subscription URL: Type (or copy and paste) the Slack endpoint, including your webhook token.
SMS subscription
Sends a text message using Short Message Service (SMS) to the specified phone number when you publish a message to the subscription's parent topic. Supported endpoint formats: E.164 format.
Note
International SMS capabilities are required if SMS messages come from a phone number in another country. We continuously add support for more countries so that more users can receive SMS messages from local phone numbers.
SMS subscriptions are enabled only for messages sent by the following Oracle Cloud Infrastructure services: Monitoring, Service Connector Hub. SMS messages sent by unsupported services are dropped. Troubleshoot dropped messages.
The Notifications service delivers SMS messages from a preconfigured pool of numbers. You might receive SMS messages from multiple numbers.
Message contents and appearance vary by message type. See alarm messages, event messages, and service connector messages.
Available Countries and Regions
You can use Notifications to send SMS messages to the following countries and regions (ISO code in parentheses):
- Australia (AU)
- Brazil (BR)
- Canada (CA)
- Chile (CL)
- China (CN)
- Costa Rica (CR)
- Croatia (HR)
- Czechia (CZ)
- France (FR)
- Germany (DE)
- Hungary (HU)
- India (IN)
- Ireland (IE)
- Israel (IL)
- Japan (JP)
- Lithuania (LT)
- Mexico (MX)
- Netherlands (NL)
- New Zealand (NZ)
- Norway (NO)
- Philippines (PH)
- Poland (PL)
- Portugal (PT)
- Romania (RO)
- Saudi Arabia (SA)
- Singapore (SG)
- South Africa (ZA)
- South Korea (KR)
- Spain (ES)
- Sweden (SE)
- Switzerland (CH)
- Ukraine (UA)
- United Arab Emirates (AE)
- United Kingdom (GB)
- United States (US)
- Subscription protocol: Select SMS.
- Country: Select the country for the phone number.
- Phone Number: Enter the phone number, using E.164 format. Example: +14255550100
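A quick way to check that a phone number is in E.164 format before entering it is a regular expression such as the one below. This is a format check only, based on the general E.164 shape (a plus sign, a nonzero leading digit, and at most 15 digits in total); it does not confirm that the number exists or that its country is supported. The helper name `is_e164` is hypothetical.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if number has the general shape of an E.164 phone number."""
    return E164_PATTERN.fullmatch(number) is not None


print(is_e164("+14255550100"))
print(is_e164("425-555-0100"))
```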
- Message Format: Determines the appearance of the messages you receive from this alarm:
- Send formatted messages: Simplified, user-friendly layout.
Note
To view supported subscription protocols and message types for formatted messages (options other than Raw), see Friendly Formatting.
- Send Pretty JSON messages (raw text with line breaks): JSON with new lines and indents.
- Send raw messages: Raw JSON blob.
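The difference between the raw and Pretty JSON options is only in whitespace, which the following sketch illustrates with the standard `json` module. The message fields shown are hypothetical placeholders, not the actual alarm message schema.

```python
import json

# Hypothetical message content; real alarm messages carry different fields.
message = {
    "title": "High CPU usage alert",
    "body": "Follow runbook instructions for resolution.",
    "severity": "CRITICAL",
}

# Raw: a single-line JSON blob.
raw = json.dumps(message, separators=(",", ":"))

# Pretty: the same data with new lines and indents.
pretty = json.dumps(message, indent=2)

print(raw)
print(pretty)
```

Both strings parse back to the same data, so a consumer that reads JSON does not need to care which option is selected.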
- + Additional destination service: Adds another destination service and topic to use for notifications.
Note
Each alarm is limited to one destination per supported destination service.
- Repeat notification?: While the alarm is in the firing state, resends notifications at the specified interval.
- Notification frequency: The period of time to wait before resending the notification.
- Suppress notifications: Sets up a suppression time window during which to suspend evaluations and notifications. Useful for avoiding alarm notifications during system maintenance periods.
- Suppression description
- Start time
- End time
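Conceptually, a suppression window is a time interval check. The sketch below shows one way to express it; the function name `is_suppressed` is hypothetical, and treating the start as inclusive and the end as exclusive is an assumption rather than documented behavior.

```python
from datetime import datetime, timezone


def is_suppressed(now, start_time, end_time):
    """Return True if now falls inside the suppression window."""
    # Assumption: start is inclusive and end is exclusive.
    return start_time <= now < end_time


# Example maintenance window of two hours.
window_start = datetime(2019, 1, 8, 2, 0, tzinfo=timezone.utc)
window_end = datetime(2019, 1, 8, 4, 0, tzinfo=timezone.utc)

print(is_suppressed(datetime(2019, 1, 8, 3, 0, tzinfo=timezone.utc),
                    window_start, window_end))
```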
- If you want to disable the new alarm, clear Enable this alarm?.
- Click Save alarm.
The new alarm is listed on the Alarm Definitions page.
Using the API
For information about using the API and signing requests, see REST APIs and Security Credentials. For information about SDKs, see Software Development Kits and Command Line Interface.
To create a query, use the SummarizeMetricsData operation.
To create an alarm, use the CreateAlarm operation.
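As a rough guide to what these two operations expect, the sketch below assembles JSON request bodies as plain dictionaries. The field names are written from memory of the REST API reference for SummarizeMetricsData and CreateAlarm and should be verified there before use; the helper names and all OCIDs are placeholders, and request signing is not shown.

```python
import json


def summarize_metrics_body(namespace, query, start_time=None, end_time=None,
                           resolution=None):
    """Request body for a metric query (field names to be verified)."""
    body = {"namespace": namespace, "query": query}
    optional = {
        "startTime": start_time,
        "endTime": end_time,
        "resolution": resolution,
    }
    # Omit anything not specified so that service defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def create_alarm_body(display_name, compartment_id, namespace, query,
                      severity, topic_ids, body=None, pending_duration=None):
    """Request body for a new alarm (field names to be verified)."""
    alarm = {
        "displayName": display_name,
        "compartmentId": compartment_id,
        # Assumption: the metric lives in the same compartment as the alarm.
        "metricCompartmentId": compartment_id,
        "namespace": namespace,
        "query": query,
        "severity": severity,
        "destinations": list(topic_ids),
        "isEnabled": True,
    }
    if body is not None:
        alarm["body"] = body
    if pending_duration is not None:
        # Trigger delay as an ISO 8601 duration, for example "PT5M".
        alarm["pendingDuration"] = pending_duration
    return alarm


query_body = summarize_metrics_body(
    "oci_computeagent", "CpuUtilization[1m].mean()", resolution="1m")
alarm_body = create_alarm_body(
    "High CPU", "ocid1.compartment.oc1..example", "oci_computeagent",
    "CpuUtilization[1m].mean() > 90", "CRITICAL",
    ["ocid1.onstopic.oc1..example"], pending_duration="PT5M")
print(json.dumps(query_body, indent=2))
print(json.dumps(alarm_body, indent=2))
```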