Monitoring Oracle NoSQL Database Cloud Service

The Oracle Cloud Infrastructure Monitoring service enables you to actively and passively monitor your cloud resources using the Metrics and Alarms features. The Monitoring service uses metrics to monitor resources and alarms to notify you when these metrics meet alarm-specified triggers.

A metric is a measurement related to the health, capacity, or performance of a given resource. An alarm is a trigger rule and query. Alarms passively monitor your cloud resources by using metrics. You can configure notification settings when creating an alarm.

Metrics are emitted to the Monitoring service as raw data points (a timestamp-value pair for a specified metric) along with dimensions (a resource identifier provided in the metric definition) and metadata. The Monitoring service publishes alarm messages to configured destinations managed by the Notifications service.

When you query a metric, the Monitoring service returns aggregated data according to the specified parameters. You can specify a range (such as the last 24 hours), a statistic, and an interval. A statistic is the aggregation function applied to the raw data points; sum is one example. An interval is the time window, such as 5 minutes, over which a set of raw data points is aggregated into a single value.
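To make range, statistic, and interval concrete, the following minimal sketch buckets raw (timestamp, value) data points into 5-minute windows and applies a statistic to each window. The function name `aggregate` and the sample values are hypothetical, and aligning windows to the Unix epoch is an assumption of this sketch rather than documented service behavior.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def aggregate(points, interval=timedelta(minutes=5), statistic=sum):
    """Group raw (timestamp, value) points into fixed windows and apply a statistic.

    Windows are aligned to the Unix epoch here; that alignment is an
    assumption made for this sketch, not documented service behavior.
    """
    step = interval.total_seconds()
    buckets = defaultdict(list)
    for ts, value in points:
        # Round each timestamp down to the start of its window.
        start = ts.timestamp() // step * step
        buckets[start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): statistic(values)
        for start, values in sorted(buckets.items())
    }


# Ten hypothetical one-minute raw points with values 0.0 .. 9.0.
base = datetime(2022, 2, 17, 11, 0, tzinfo=timezone.utc)
raw = [(base + timedelta(minutes=m), float(m)) for m in range(10)]

print(aggregate(raw))                  # sum per 5-minute window
print(aggregate(raw, statistic=mean))  # mean per 5-minute window
```

Changing only the statistic (sum versus mean) changes the aggregated values, while the interval decides how many raw points feed each value.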

The Console displays one monitoring chart per metric for selected resources. The aggregated data in each chart reflects your selected statistic and interval. API requests can optionally filter by dimension and specify a resolution. API responses include the metric name along with its source compartment and metric namespace (which indicates the resource, service, or application that emits a metric). The namespace is provided in the metric definition. For example, the CpuUtilization metric definition emitted by Oracle Cloud lists the oci_computeagent metric namespace as the source of the metric.

Metric and alarm data is accessible via the Console, CLI, and API. For more information about OCI monitoring service concepts, see Monitoring Concepts.


Oracle NoSQL Database Cloud Service Metrics

Oracle NoSQL Database Cloud Service emits metrics using the metric namespace oci_nosql.

Metrics for Oracle NoSQL Database Cloud Service include the following dimensions:
  • RESOURCEID
    The OCID of the NoSQL table in the Oracle NoSQL Database Cloud Service.

    Note:

    OCID is an Oracle-assigned unique ID that is included as part of the resource's information in both the Console and API.
  • TABLENAME

    The name of the NoSQL table in the Oracle NoSQL Database Cloud Service.

  • REPLICA

    The name of the region that receives the table update from another region.

Oracle NoSQL Database Cloud Service sends metrics to the Oracle Cloud Infrastructure Monitoring Service. You can view or create alarms on these metrics using the Oracle Cloud Infrastructure Console, SDKs, or CLI.

Table - Oracle NoSQL Database Cloud Service Metrics

| Metric | Metric Display Name | Unit | Description | Dimensions |
|---|---|---|---|---|
| ReadUnits | Read Units | Units | The number of read units consumed during this period. | resourceId, tableName |
| WriteUnits | Write Units | Units | The number of write units consumed during this period. | resourceId, tableName |
| StorageGB | Storage Size | GB | The maximum amount of storage consumed by the table. As this information is generated hourly, you may see values that are out of date in between the refresh points. | resourceId, tableName |
| ReadThrottleCount | Read Throttle Count | Count | The number of read throttling exceptions on this table in the time period. | resourceId, tableName |
| WriteThrottleCount | Write Throttle Count | Count | The number of write throttling exceptions on this table in the time period. | resourceId, tableName |
| StorageThrottleCount | Storage Throttle Count | Count | The number of storage throttling exceptions on this table in the time period. | resourceId, tableName |
| MaxShardSizeUsagePercent | Maximum Shard Size Usage | Percentage | The ratio of the space used in the shard over the total space allocated to the shard. This is specific to a table and is the highest value across all shards. | resourceId, tableName |
| Replica Lag | Replica Lag | Millisecond | A time lag in replicating the data changes of a Global Active table from a sender region to a receiver region. | resourceId, tableName, replica |

Additionally, you can publish custom metrics to suit your requirements. For example, you can set up metrics to capture application transaction latency (time spent per completed transaction) and then post that data to the Monitoring service.
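As a rough sketch of the latency example, the snippet below computes the mean time per completed transaction and wraps it in a request body. The field names mirror the Monitoring PostMetricData request as we understand it (a metricData list whose entries carry namespace, compartmentId, name, dimensions, and datapoints); treat that shape as an assumption and confirm it against the Monitoring API reference. The namespace `custom_app_metrics`, the metric name `TransactionLatency`, and the `appName` dimension are hypothetical. The snippet only builds the payload and does not send it.

```python
import json
from datetime import datetime, timezone


def latency_metric_payload(compartment_id, latencies_ms, when=None):
    """Build a PostMetricData-style body (assumed shape) for one custom latency metric."""
    when = when or datetime.now(timezone.utc)
    # Mean time spent per completed transaction in this batch.
    mean_latency = sum(latencies_ms) / len(latencies_ms)
    return {
        "metricData": [
            {
                "namespace": "custom_app_metrics",     # hypothetical custom namespace
                "compartmentId": compartment_id,
                "name": "TransactionLatency",          # hypothetical metric name
                "dimensions": {"appName": "orders"},   # hypothetical dimension
                "datapoints": [
                    {"timestamp": when.isoformat(), "value": mean_latency}
                ],
            }
        ]
    }


body = latency_metric_payload("<Compartment_OCID>", [120.0, 80.0, 100.0])
print(json.dumps(body, indent=2))
```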

Oracle NoSQL Database Cloud Service (NDCS) Metrics Explained

Oracle NoSQL Database Cloud Service sends metrics to the Oracle Cloud Infrastructure Monitoring Service.

Read Units:

The number of read units consumed during this period. One read unit provides throughput of up to 1 KB of data per second for an eventually consistent read operation, so reading data larger than 1 KB requires multiple read units. The Read Units metric chart for a table is shown below. The metric is taken every minute, and the metric charts are plotted for an interval of 5 minutes by default.


Write Units:

The number of write units consumed during this period. One write unit provides throughput of up to 1 KB of data per second for a write operation. Write operations are triggered by insert, update, and delete operations. Writing data larger than 1 KB requires multiple write units. The Write Units metric chart for a table is shown below. The metric is taken every minute, and the metric charts are plotted for an interval of 5 minutes by default.
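The 1 KB rule for read and write units can be expressed as a ceiling division. This is a minimal sketch that assumes 1 KB means 1024 bytes and that every operation consumes at least one unit; both are assumptions for illustration, and the function name `units_needed` is hypothetical.

```python
import math

KB = 1024  # assumption: 1 KB = 1024 bytes


def units_needed(item_size_bytes):
    """Units for one eventually consistent read, or one write, of an item of this size.

    Each unit covers up to 1 KB, so the size is rounded up to the next whole KB.
    Assumption: any operation consumes at least one unit.
    """
    return max(1, math.ceil(item_size_bytes / KB))


for size in (200, 1024, 1025, 3500):
    print(size, "bytes ->", units_needed(size), "unit(s)")
```

An item of 1025 bytes already needs two units, which is why many slightly-over-1 KB rows can consume noticeably more throughput than expected.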


StorageGB:

The maximum amount of storage consumed by the table. The Storage metric chart for a table is shown below. The metric is taken every minute and the metric charts are plotted for an interval of 5 minutes by default.

Note:

It takes one hour after table creation to seed the beginning of storage size tracking. After the initial hour, storage statistics are updated every 5 minutes.


Note:

The StorageGB metric value is truncated to a whole number of gigabytes. Therefore, storage usage of less than 1 GB is displayed as 0, and the chart begins to display storage only when usage reaches 1 GB.
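Truncation (rather than rounding) explains why small tables show 0 on the StorageGB chart. The sketch below illustrates this; whether the service counts a GB as 10^9 or 2^30 bytes is not stated here, so the `GIB` constant and the function name are assumptions.

```python
GIB = 1024 ** 3  # assumption: 1 GB is counted as 2**30 bytes


def displayed_storage_gb(used_bytes: int) -> int:
    """Whole gigabytes left after truncating (not rounding) the usage."""
    return used_bytes // GIB


for used in (750_000_000, GIB, 3_000_000_000):
    print(f"{used:>13,} bytes -> {displayed_storage_gb(used)} GB")
```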

ReadThrottleCount:

This gives a count of the number of read throttling exceptions on the given table in the time period. A throttling exception usually indicates that the provisioned read throughput has been exceeded. If you get these frequently, then you should consider increasing the Read Units on your table. The Read throttle count metric chart for a table is shown below. The metric is taken every minute and the metric charts are plotted for an interval of 5 minutes by default.


WriteThrottleCount:

This gives a count of the number of write throttling exceptions on the given table in the time period. A throttling exception usually indicates that the provisioned write throughput has been exceeded. If you get these frequently, then you should consider increasing the Write Units on your table. The Write throttle count metric chart for a table is shown below. The metric is taken every minute and the metric charts are plotted for an interval of 5 minutes by default.


StorageThrottleCount:

This gives a count of the number of storage throttling exceptions on the given table in the time period. A throttling exception usually indicates that the provisioned storage capacity has been exceeded. If you get these frequently, then you should consider increasing the storage capacity of your table. The Storage throttle count metric chart for a table is shown below. The metric is taken every minute and the metric charts are plotted for an interval of 5 minutes by default.
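"Frequently" is a judgment call. One way to make it explicit is to flag a table when throttling appears in more than a chosen fraction of the sampled minutes. In this hypothetical sketch, `min_fraction` is your own policy threshold, not a service setting, and the same check applies to ReadThrottleCount, WriteThrottleCount, or StorageThrottleCount values.

```python
def should_raise_capacity(throttle_counts, min_fraction=0.1):
    """Flag a table when throttling shows up in many of the sampled intervals.

    throttle_counts: per-minute throttle count values for one table.
    min_fraction: hypothetical policy threshold for what "frequently" means.
    """
    if not throttle_counts:
        return False
    throttled = sum(1 for count in throttle_counts if count > 0)
    return throttled / len(throttle_counts) >= min_fraction


print(should_raise_capacity([0, 0, 3, 0, 0, 0, 0, 0, 0, 0]))  # 1 of 10 minutes throttled
print(should_raise_capacity([0] * 60))                        # no throttling in an hour
```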


MaxShardSizeUsagePercent:

The highest space usage among the shards that store data for a specific table, expressed as the percentage of the allocated space that is used in that shard.

Note:

Oracle NoSQL Database Cloud Service hashes keys to shards to distribute data over a collection of storage nodes that provide storage for the tables. Although not directly visible to you, Oracle NoSQL Database Cloud Service tables are sharded and replicated for availability and performance. A shard key is either the entire primary key or a subset of the primary key. All records sharing a shard key are co-located to achieve data locality.

When MaxShardSizeUsagePercent reaches 100, you can no longer write to the table; you have to increase the table's storage capacity before you can write to it again. This metric helps you determine whether a storage hotspot exists for your NoSQL table.

This scenario happens because of an imbalance in how the table data is stored across shards. An imbalance can occur when a majority of the table data is stored in a subset of the shards. Storage in a NoSQL database is sharded, and the shard key is part of the table definition. In hierarchical tables, parent and child tables share the same shard key, so a parent row and all of its child rows are stored together in the same shard. A parent row with many child rows therefore occupies much more space in its shard than a parent row with few child rows. Due to this imbalance, certain shards can contain much more data than other shards.

At any given time, one shard has the highest space usage for a specific table, and the percentage of space used in that shard is the value of MaxShardSizeUsagePercent. The MaxShardSizeUsagePercent metric chart for a table is shown below. The metric is taken every minute, and the metric charts are plotted for an interval of 5 minutes by default.
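The following sketch shows how a skewed table can report a high MaxShardSizeUsagePercent while its overall usage is low. The shard layout (four shards with 5 GB allocated each) and the usage figures are hypothetical, since the real shard count and per-shard allocation are not visible to you.

```python
def max_shard_size_usage_percent(shards):
    """shards: list of (used_gb, allocated_gb) pairs for the shards holding one table."""
    return max(100.0 * used / allocated for used, allocated in shards)


# Hypothetical table: 4 shards with 5 GB allocated each (20 GB total), skewed data.
shards = [(4.9, 5.0), (1.0, 5.0), (0.8, 5.0), (1.3, 5.0)]
total_used = sum(used for used, _ in shards)
total_alloc = sum(alloc for _, alloc in shards)

print(f"overall table usage:      {100 * total_used / total_alloc:.0f}%")
print(f"MaxShardSizeUsagePercent: {max_shard_size_usage_percent(shards):.0f}%")
```

Here the table has used only about 40 percent of its storage, yet one shard is at 98 percent, so writes will be blocked long before the overall allocation is consumed.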


In addition to viewing the chart for a metric, you can switch to the table view to check the value of a metric at a given point in time.


Monitoring the MaxShardSizeUsagePercent metric

Monitor this chart periodically to check whether MaxShardSizeUsagePercent is approaching 100. To be warned proactively, create an alarm for this metric.

For example, configure the alarm to fire when the metric reaches a particular value, such as 90 percent.

OCI alarms use the OCI Notifications service to send notifications. Typically, the alarm is configured to send notifications by email, so when MaxShardSizeUsagePercent reaches 90 percent, an email notification is sent.


See Managing Alarms and Notifications for more details.

When your table data is distributed unevenly across shards, you cannot use the full storage capacity allocated to your table. In this scenario, MaxShardSizeUsagePercent reaches 100 even though the table has not used all of its allocated storage, and you must add more storage to continue writing to the table. You can avoid this scenario by following these guidelines when designing your table:
  • Decide on the correct shard key for your table. The attributes with high cardinality are a good choice for shard keys.
  • Limit the number of child tables to avoid a potential shard storage imbalance situation.
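To see why high-cardinality shard keys help, the sketch below hashes shard keys onto a handful of shards and reports the share of rows in the busiest shard. The shard count and the SHA-256 mapping are stand-ins chosen for illustration; the service's actual hashing scheme and shard count are not exposed.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical; the real shard count is not visible to you


def shard_for(shard_key: str) -> int:
    """Map a shard key to a shard with a stable hash (illustrative, not the service's hash)."""
    digest = hashlib.sha256(shard_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def max_shard_share(keys):
    """Fraction of all rows that land in the most loaded shard."""
    counts = Counter(shard_for(key) for key in keys)
    return max(counts.values()) / len(keys)


# Low cardinality: 10,000 rows but only 3 distinct shard key values.
low = [f"country-{i % 3}" for i in range(10_000)]
# High cardinality: 10,000 rows with 10,000 distinct shard key values.
high = [f"user-{i}" for i in range(10_000)]

print(f"low cardinality:  {max_shard_share(low):.2f} of rows in the busiest shard")
print(f"high cardinality: {max_shard_share(high):.2f} of rows in the busiest shard")
```

With only three distinct values, at least a third of the rows share one shard no matter how many shards exist; with unique keys, rows spread close to evenly.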

Replica Lag

A time lag in replicating the data changes (INSERT, UPDATE, or DELETE) of a Global Active table from a sender region to a receiver region. A write operation performed in the sender region of a Global Active table is reflected in the receiver region after a time lag. This time lag is expressed as a metric called Replica Lag. Replica lag is a measure of how current the table data in the receiver region is, relative to the data in the sender region's table: the table in the receiver region has not yet received the updates made in the sender region during the lag period. If there have been no application writes to the table in the sender region, the service uses a ping mechanism to approximate the lag, so the lag statistic is still available in the receiver region.

Obtain info on Replica lag:

In the receiver region, click your Global Active table to view the table information. Under Resources, click Metrics. The Replica Lag metric displays the replication lag in milliseconds. In the example chart below, the Replica Lag metric is captured in the Canada Southeast (Toronto) region, which is the receiver region. This Global Active table has two other regional table replicas, one in Canada Southeast (Montreal) and one in US East (Ashburn). The chart therefore has two lines, one for each of these regional table replicas.

In the chart below, Interval indicates the time window used to plot the chart. The available interval options are 1 minute, 5 minutes, 1 hour, and 1 day. By default, the replica lag is measured every minute, and the chart is plotted at a 5-minute interval. You can select different statistics for the Replica Lag metric.

Example 1: Replica lag with Canada Southeast (Toronto) as the receiver region and Canada Southeast (Montreal) and US East (Ashburn) as sender regions.

The chart below is plotted for the Mean statistic at a 5-minute interval.


In this example, Montreal and Ashburn are two sender regions and Toronto is the receiver region where the metric is captured. Consider the value of Replica Lag at 12:25 UTC for Montreal: it is 2020 milliseconds. This means the receiver region Canada Southeast (Toronto) has not received updates that happened in the sender region Canada Southeast (Montreal) in the last 2020 milliseconds. Similarly, the value of Replica Lag at 12:25 UTC for Ashburn is 2954 milliseconds. This means the receiver region Canada Southeast (Toronto) has not received updates that happened in the sender region US East (Ashburn) in the last 2954 milliseconds.
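Reading a Replica Lag value amounts to subtracting the lag from the observation time: updates made in the sender region before that point are reflected in the receiver region, and later ones may not be. This sketch applies that arithmetic to the 12:25 UTC values above; the calendar date is hypothetical because the example gives only the time, and it assumes the regions' clocks are comparable.

```python
from datetime import datetime, timedelta, timezone


def reflected_up_to(observed_at, replica_lag_ms):
    """Sender-region time up to which updates are reflected in the receiver region.

    Updates made in the sender region after this point may not yet be visible.
    """
    return observed_at - timedelta(milliseconds=replica_lag_ms)


observed = datetime(2024, 1, 1, 12, 25, tzinfo=timezone.utc)  # date is hypothetical
for sender, lag_ms in (("Montreal", 2020), ("Ashburn", 2954)):
    print(sender, reflected_up_to(observed, lag_ms).time())
```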

Example 2: Replica lag with US East (Ashburn) as the receiver region and Canada Southeast (Montreal) and Canada Southeast (Toronto) as sender regions.

In this example, Montreal and Toronto are two sender regions and Ashburn is the receiver region where the metric is captured.


Example 3: Replica lag with Canada Southeast (Montreal) as the receiver region and US East (Ashburn) and Canada Southeast (Toronto) as sender regions.

In this example, Ashburn and Toronto are two sender regions and Montreal is the receiver region where the metric is captured.


In addition to viewing the chart for Replica Lag, you can switch to the table view to check the value of Replica Lag at a given point in time.


Viewing or Listing Oracle NoSQL Database Cloud Service Metrics

You can view the metrics available for Oracle NoSQL Database Cloud Service from the Console. You can also list the available metrics using OCI CLI commands.

  1. Open the navigation menu and click Observability & Management. Under Monitoring, click Service Metrics.
  2. Select the Compartment and Metric namespace (oci_nosql).

From the Cloud Shell, run the following command. It returns metric definitions that match the criteria specified in the request. The compartment OCID is required. For more information about the OPTIONS available with the list command, see List Metrics.

oci monitoring metric list --compartment-id <Compartment_OCID> --namespace oci_nosql

For example:
oci monitoring metric list --compartment-id ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya --namespace oci_nosql
Example response:
{
  "data": [
    {
      "compartment-id": "ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya",
      "dimensions": {
        "resourceId": "ocid1_nosqltable_oc1_phx_amaaaaaau7x7rfyasvdkoclhgryulgzox3nvlxb2bqtlxxsrvrc4zxr6lo4a",
        "tableName": "demo"
      },
      "name": "ReadThrottleCount",
      "namespace": "oci_nosql",
      "resource-group": null
    },
    {
      "compartment-id": "ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya",
      "dimensions": {
        "resourceId": "ocid1_nosqltable_oc1_phx_amaaaaaau7x7rfyasvdkoclhgryulgzox3nvlxb2bqtlxxsrvrc4zxr6lo4a",
        "tableName": "demo"
      },
      "name": "ReadUnits",
      "namespace": "oci_nosql",
      "resource-group": null
    },
    {
      "compartment-id": "ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya",
      "dimensions": {
        "resourceId": "ocid1_nosqltable_oc1_phx_amaaaaaau7x7rfyasvdkoclhgryulgzox3nvlxb2bqtlxxsrvrc4zxr6lo4a",
        "tableName": "demo"
      },
      "name": "StorageGB",
      "namespace": "oci_nosql",
      "resource-group": null
    },
    {
      "compartment-id": "ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya",
      "dimensions": {
        "resourceId": "ocid1_nosqltable_oc1_phx_amaaaaaau7x7rfyasvdkoclhgryulgzox3nvlxb2bqtlxxsrvrc4zxr6lo4a",
        "tableName": "demo"
      },
      "name": "StorageThrottleCount",
      "namespace": "oci_nosql",
      "resource-group": null
    },
    {
      "compartment-id": "ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya",
      "dimensions": {
        "resourceId": "ocid1_nosqltable_oc1_phx_amaaaaaau7x7rfyasvdkoclhgryulgzox3nvlxb2bqtlxxsrvrc4zxr6lo4a",
        "tableName": "demo"
      },
      "name": "WriteThrottleCount",
      "namespace": "oci_nosql",
      "resource-group": null
    },
    {
      "compartment-id": "ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya",
      "dimensions": {
        "resourceId": "ocid1_nosqltable_oc1_phx_amaaaaaau7x7rfyasvdkoclhgryulgzox3nvlxb2bqtlxxsrvrc4zxr6lo4a",
        "tableName": "demo"
      },
      "name": "WriteUnits",
      "namespace": "oci_nosql",
      "resource-group": null
    }
  ]
}
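If you save the CLI output to a file (for example, by redirecting it with `> metrics.json`), a few lines of Python can summarize which metrics each table emits. This sketch works on a trimmed, inline copy of the response above (two of the six entries, with only the fields it reads); the helper name `metrics_by_table` is hypothetical.

```python
import json

# Trimmed copy of the response above: only two entries and the fields used below.
response_text = """
{
  "data": [
    {"dimensions": {"tableName": "demo"}, "name": "ReadThrottleCount", "namespace": "oci_nosql"},
    {"dimensions": {"tableName": "demo"}, "name": "ReadUnits", "namespace": "oci_nosql"}
  ]
}
"""


def metrics_by_table(text):
    """Map each tableName dimension value to the sorted metric names reported for it."""
    result = {}
    for entry in json.loads(text)["data"]:
        table = entry["dimensions"].get("tableName", "<none>")
        result.setdefault(table, []).append(entry["name"])
    return {table: sorted(names) for table, names in result.items()}


print(metrics_by_table(response_text))
```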

How to Collect Oracle NoSQL Database Cloud Service Metrics?

You can build metric queries for collecting specific sets of metrics (aggregated data). A metric query contains the Monitoring Query Language (MQL) expression to evaluate for returning aggregated data. The query must specify a metric, statistic, and interval.

You can use metric queries to actively and passively monitor your cloud resources. Actively monitor with metric queries that you generate spontaneously, on-demand. In the Console, update a chart to show data from multiple queries. Store queries you want to reuse. Passively monitor with alarms that add a condition, or trigger rule, to a metric query.

Metric query syntax (metric, interval, and statistic are required):
metric[interval]{dimensionname="dimensionvalue"}.groupingfunction.statistic
Threshold alarm query syntax (metric, interval, statistic, alarm operator, and alarm value are required):
metric[interval]{dimensionname="dimensionvalue"}.groupingfunction.statistic alarmoperator alarmvalue

For supported parameter values, see Monitoring Query Language (MQL) Reference.
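If you generate queries from code, a small helper that follows the syntax above keeps the pieces in the right order. The helper `mql` is hypothetical and not part of any OCI SDK; it does not validate metric names, statistics, or operators, so check the MQL reference for supported values.

```python
def mql(metric, interval, statistic, dimensions=None, grouping=None,
        operator=None, value=None):
    """Assemble an MQL string in the order given by the syntax above (hypothetical helper)."""
    query = f"{metric}[{interval}]"
    if dimensions:
        # Dimension values are written as quoted strings.
        query += "{" + ", ".join(f'{k} = "{v}"' for k, v in dimensions.items()) + "}"
    if grouping:
        query += f".{grouping}"
    query += f".{statistic}"
    if operator is not None:
        # Threshold alarm queries append an operator and a value.
        query += f" {operator} {value}"
    return query


print(mql("StorageThrottleCount", "1m", "sum()"))
print(mql("StorageThrottleCount", "1m", "sum()", {"tableName": "demoKeyVal"}))
print(mql("MaxShardSizeUsagePercent", "1m", "max()", {"tableName": "demo"},
          operator=">=", value=90))
```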

Example Queries

Simple metric query

Sum of Storage Throttle counts for all the tables in a compartment at a one-minute interval.

The number of lines displayed in the metric chart (Console): 1 per table.

StorageThrottleCount[1m].sum()
Filtered metric query

Sum of Storage Throttle counts in a compartment at a one-minute interval, filtered to a single table.

The number of lines displayed in the metric chart (Console): 1.

StorageThrottleCount[1m]{tableName = "demoKeyVal"}.sum()
Aggregated metric query

Average of the read units consumed at a sixty-minute interval, filtered to a compartment and aggregated across all tables into a single value.

The number of lines displayed in the metric chart (Console): 1.

ReadUnits[60m]{compartmentId="ocid1.compartment.oc1.phx..exampleuniqueID"}.grouping().mean()
Group-aggregated metric query

Average Read Throttle Count at a sixty-minute interval, grouped by table name. The groupBy() function groups by a dimension (resourceId or tableName for these metrics), not by another metric.

The number of lines displayed in the metric chart (Console): 1 per table.

ReadThrottleCount[60m].groupBy(tableName).mean()
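The line counts in these examples follow from how streams are combined: with no grouping function you get one result per metric stream, grouping() merges every stream into one result, and groupBy(dimension) merges streams that share a dimension value. This sketch models that on hypothetical values for one window; pooling raw values before averaging is a simplifying assumption, and the service's exact cross-stream aggregation may differ.

```python
from statistics import mean

# Hypothetical ReadUnits values in one 60-minute window, one stream per table.
streams = [
    ({"tableName": "demo", "resourceId": "table-1"}, [10, 20, 30]),
    ({"tableName": "demoKeyVal", "resourceId": "table-2"}, [5, 5, 5]),
]


def per_stream(streams):
    """No grouping function: one result (one chart line) per stream."""
    return {dims["resourceId"]: mean(vals) for dims, vals in streams}


def grouping(streams):
    """grouping(): all streams combined into a single result (one chart line)."""
    return mean(v for _, vals in streams for v in vals)


def group_by(streams, dimension):
    """groupBy(dimension): streams combined per distinct value of the dimension."""
    groups = {}
    for dims, vals in streams:
        groups.setdefault(dims[dimension], []).extend(vals)
    return {key: mean(vals) for key, vals in groups.items()}


print(per_stream(streams))
print(grouping(streams))
print(group_by(streams, "tableName"))
```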

Creating a Metric Query

You can create a metric query using either the Console or the OCI CLI.

  1. Open the navigation menu and click Observability & Management. Under Monitoring, click Metrics Explorer.

    The Metrics Explorer page displays an empty chart with fields to build a query.

  2. Fill in the fields for a new query.
    • Compartment: The compartment containing the Oracle NoSQL Database Cloud Service tables that you want to monitor. By default, the first accessible compartment is selected.
    • Metric namespace: The namespace of the service emitting the metrics for the tables that you want to monitor. For Oracle NoSQL Database Cloud Service, the namespace is oci_nosql.
    • Resource group (optional): The group that the metric belongs to. A resource group is a custom string provided with a custom metric. Not applicable to service metrics.
    • Metric name: The name of the metric. Only one metric can be specified. Metric selections depend on the selected compartment and metric namespace. Example: ReadUnits
    • Interval: The aggregation window.
    • Statistic: The aggregation function.
    • Metric dimensions: Optional filters to narrow the metric data evaluated.
      • Dimension fields: For Oracle NoSQL Database Cloud Service metrics, you can select either resourceId or tableName as Dimension name and Dimension value pair.
    • Aggregate metric streams: Plots a single line on the metric chart to represent the combined value of all metric streams for the selected statistic.
  3. Click Update Chart.

    The chart shows the results of your new query. Very small or large values are indicated by the International System of Units (SI units), such as M for mega (10 to the sixth power). Units correspond to the selected metric and do not change with the statistic.

  4. To view the query as a Monitoring Query Language (MQL) expression, select Advanced mode.
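SI prefixes scale a value by powers of 1000, so 2,500,000 read units would appear as 2.5M. The function below is a hypothetical illustration of that scaling, not the Console's actual formatting code.

```python
def si_format(value):
    """Format a value with an SI prefix, as a chart axis might (illustrative only)."""
    prefixes = [(1e12, "T"), (1e9, "G"), (1e6, "M"), (1e3, "k"),
                (1.0, ""), (1e-3, "m"), (1e-6, "µ")]
    if value == 0:
        return "0"
    for factor, prefix in prefixes:
        if abs(value) >= factor:
            return f"{value / factor:g}{prefix}"
    # Smaller than the smallest prefix handled here: fall back to plain notation.
    return f"{value:g}"


for v in (2_500_000, 1500, 42, 0.002):
    print(v, "->", si_format(v))
```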

From the Cloud Shell, run the following command. It returns aggregated data that matches the criteria specified in the request. The compartment OCID is required.

oci monitoring metric-data summarize-metrics-data --compartment-id <Compartment_OCID> --namespace oci_nosql --query-text [text]

--query-text is the Monitoring Query Language (MQL) expression to use when searching for metric data points to aggregate. The query must specify a metric, statistic, and interval. Supported values for interval: 1m-60m (also 1h). You can optionally specify dimensions and grouping functions. Supported grouping functions: grouping(), groupBy(). For more information about the OPTIONS available with the summarize-metrics-data command, see Summarize Metrics Data. In the example below, we are creating a filtered metric query to get the Sum of Read Units in a compartment at a one-minute interval, filtered to a single table.

For example:
oci monitoring metric-data summarize-metrics-data --compartment-id ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya \
--namespace oci_nosql --query-text 'ReadUnits[1m]{tableName="demo"}.sum()'
Example response:
{
  "data": [
    {
      "aggregated-datapoints": [
        {
          "timestamp": "2022-02-17T11:03:00+00:00",
          "value": 0.0
        },
        {
          "timestamp": "2022-02-17T11:04:00+00:00",
          "value": 0.0
        },
        {
          "timestamp": "2022-02-17T11:05:00+00:00",
          "value": 0.0
        },

        ...
        ...
        ...

        {
          "timestamp": "2022-02-17T13:59:00+00:00",
         "value": 0.0
        },
        {
          "timestamp": "2022-02-17T14:00:00+00:00",
          "value": 0.0
        },
        {
          "timestamp": "2022-02-17T14:01:00+00:00",
          "value": 0.0
        }
      ],
      "compartment-id": "ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya",
      "dimensions": {
        "resourceId": "ocid1_nosqltable_oc1_phx_amaaaaaau7x7rfyav7f67yuj3t2q6rk7lp2a2obfdxa6hg2ho2ea7qabin4q",
        "tableName": "demo"
      },
      "metadata": {},
      "name": "ReadUnits",
      "namespace": "oci_nosql",
      "resolution": null,
      "resource-group": null
    }
  ]
}
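To post-process summarize-metrics-data output, for example to find the peak minute for each table, read the aggregated-datapoints list from each series. This sketch uses a trimmed inline response shaped like the one above; the non-zero values are hypothetical and were changed from the example so that the peak is visible.

```python
import json

# Trimmed response shaped like the example above (values changed, hypothetical).
response_text = """
{"data": [{"name": "ReadUnits",
           "dimensions": {"tableName": "demo"},
           "aggregated-datapoints": [
             {"timestamp": "2022-02-17T11:03:00+00:00", "value": 0.0},
             {"timestamp": "2022-02-17T11:04:00+00:00", "value": 4.0},
             {"timestamp": "2022-02-17T11:05:00+00:00", "value": 2.0}]}]}
"""


def peak(text):
    """Return {tableName: (timestamp, value)} for the highest aggregated data point."""
    peaks = {}
    for series in json.loads(text)["data"]:
        top = max(series["aggregated-datapoints"], key=lambda p: p["value"])
        peaks[series["dimensions"]["tableName"]] = (top["timestamp"], top["value"])
    return peaks


print(peak(response_text))
```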

Creating Alarms

An alarm evaluates its alarm query and sends a notification when the alarm is in the firing state; you configure this behavior along with the alarm's other properties. When triggered, an alarm sends an alarm message to the configured topic in the Notifications service, which then sends the message on to all of the topic's subscriptions. Subscriptions can deliver messages through channels such as Slack, email, SMS, and PagerDuty.

When configured, repeat notifications remind you of a continued firing state at the configured repeat interval. You are also notified when an alarm transitions back to the OK state, or when an alarm is reset.

An alarm query contains the Monitoring Query Language (MQL) expression to evaluate for returning aggregated data. The query must specify a metric, statistic, and interval.

You can create an alarm using either the Console or the OCI CLI.

  1. Open the navigation menu and click Observability & Management. Under Monitoring, click Alarm Definitions.
  2. Click Create Alarm.

    Note:

    You can also create an alarm from a predefined query on the Service Metrics page. Expand Options and click Create an Alarm on this Query. For more information about service metrics, see Viewing or Listing Oracle NoSQL Database Cloud Service Metrics.
  3. On the Create Alarm page, under Define alarm, fill in or update the alarm settings:

    Note:

    To toggle between Basic Mode and Advanced Mode, click Switch to Advanced Mode or Switch to Basic Mode (to the right of Define Alarm).
    • Alarm name: The user-friendly name for the new alarm. This name is sent as the title for notifications related to this alarm. Avoid entering confidential information.
    • Alarm severity: The perceived type of response required when the alarm is in the firing state.
    • Alarm body: The human-readable content of the notification delivered. Oracle recommends providing guidance to operators for resolving the alarm condition. Example: "High Read Throttle Count".
    • Tags (optional): If you have permission to create a resource, then you also have permission to apply free-form tags to that resource. To apply a defined tag, you must have permission to use the tag namespace. For more information about tagging, see Resource Tags. If you are not sure whether to apply tags, skip this option (you can apply tags later) or ask your administrator.
    • Metric description: The metric to evaluate for the alarm condition.
      • Compartment: The compartment containing the Oracle NoSQL Database Cloud Service tables that you want to monitor. By default, the first accessible compartment is selected.
      • Metric namespace: The namespace of the service emitting the metrics for the tables that you want to monitor. For Oracle NoSQL Database Cloud Service, the namespace is oci_nosql.
      • Resource group (optional): The group that the metric belongs to. A resource group is a custom string provided with a custom metric. Not applicable to service metrics.
      • Metric name: The name of the metric. Only one metric can be specified. Metric selections depend on the selected compartment and metric namespace. Example: ReadUnits
      • Interval: The aggregation window.
      • Statistic: The aggregation function.
    • Metric dimensions: Optional filters to narrow the metric data evaluated.
      • Dimension fields: For Oracle NoSQL Database Cloud Service metrics, you can select either resourceId or tableName as Dimension name and Dimension value pair.
    • Aggregate metric streams: Plots a single line on the metric chart to represent the combined value of all metric streams for the selected statistic.
    • Trigger rule: The condition that must be satisfied for the alarm to be in the firing state. The condition can specify a threshold, such as 90 for MaxShardSizeUsagePercent.
      • Operator: The operator used in the condition threshold.
      • Value: The value to use for the condition threshold.
      • Trigger delay minutes: The number of minutes that the condition must be maintained before the alarm is in a firing state.
  4. To change the view of the query results, click the appropriate option above the results, on the right:
    • Show Data Table: Lists data points, indicating the timestamp and value for each.
    • Show Graph (default): Plots data points on a graph.
  5. Set up notifications: Under Notifications, fill in the fields.
    • Destinations: The topic to be used for notifications.
    • Repeat notification?: While the alarm is in the firing state, resends notifications at the specified interval.
    • Notification frequency: The period of time to wait before resending the notification.
    • Suppress notifications: Set up a suppression time window during which to suspend evaluations and notifications. Useful for avoiding alarm notifications during system maintenance periods.
  6. If you want to disable the new alarm, clear Enable this alarm?
  7. Click Save alarm.
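The trigger rule and trigger delay work together: the condition must hold for the configured number of minutes before the alarm fires. The following is a simplified, hypothetical model of that behavior with one evaluation per minute; the service's actual evaluation schedule and state transitions may differ.

```python
import operator

OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
             "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def alarm_states(values, op, threshold, trigger_delay_minutes):
    """Simplified model of a threshold alarm evaluated once per minute.

    The alarm reports FIRING only after the condition has held for
    trigger_delay_minutes consecutive evaluations, and OK as soon as it stops holding.
    """
    compare = OPERATORS[op]
    states, streak = [], 0
    for v in values:
        streak = streak + 1 if compare(v, threshold) else 0
        states.append("FIRING" if streak >= max(1, trigger_delay_minutes) else "OK")
    return states


usage = [80, 91, 92, 93, 89, 95]  # hypothetical MaxShardSizeUsagePercent samples
print(alarm_states(usage, ">=", 90, trigger_delay_minutes=2))
```

A longer trigger delay filters out brief spikes at the cost of a later notification.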

From the Cloud Shell, run the following command to create a new alarm in the specified compartment. The compartment OCID is required.

oci monitoring alarm create --compartment-id <Compartment_OCID> --namespace oci_nosql --query-text [text] --destinations [complex type] --display-name [text] --is-enabled [boolean] --metric-compartment-id [text] --severity [text]

--query-text is the Monitoring Query Language (MQL) expression to use when searching for metric data points to aggregate. The query must specify a metric, statistic, and interval. Supported values for interval: 1m-60m (also 1h). You can optionally specify dimensions and grouping functions. Supported grouping functions: grouping(), groupBy(). For more information about the OPTIONS available with the create alarm command, see create - alarm. The example below creates an alarm that fires when the 90th percentile of StorageGB for a single table, evaluated at a one-minute interval, is greater than 85 GB.

Example of threshold alarm:
oci monitoring alarm create --compartment-id ocid1.compartment.oc1..aaaaaaaawrmvqjzoegxbsixp5k3b5554vlv2kxukobw3drjho3f7nf5ca3ya \
--namespace oci_nosql --query-text 'StorageGB[1m]{tableName="demo"}.percentile(0.9) > 85' \
--display-name HighStorageConsumption --metric-compartment-id <Metric_Compartment_OCID> --severity CRITICAL --is-enabled true \
--destinations '["<Topic_OCID>"]'
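If you create alarms from a script, building the command as an argument list avoids shell-quoting mistakes in the MQL text. This sketch only assembles the arguments for `oci monitoring alarm create` using the options shown in the syntax above; it does not run the CLI. The helper name, the placeholder OCIDs, and the MaxShardSizeUsagePercent query are hypothetical.

```python
import json
import shlex


def alarm_create_args(compartment_id, metric_compartment_id, topic_id,
                      query, display_name, severity="CRITICAL"):
    """Argument list for `oci monitoring alarm create` (built here, not executed)."""
    return [
        "oci", "monitoring", "alarm", "create",
        "--compartment-id", compartment_id,
        "--metric-compartment-id", metric_compartment_id,
        "--namespace", "oci_nosql",
        "--query-text", query,
        "--display-name", display_name,
        "--severity", severity,
        "--is-enabled", "true",
        # --destinations takes a JSON list of Notifications topic OCIDs.
        "--destinations", json.dumps([topic_id]),
    ]


args = alarm_create_args("<Compartment_OCID>", "<Compartment_OCID>", "<Topic_OCID>",
                         'MaxShardSizeUsagePercent[1m]{tableName = "demo"}.max() >= 90',
                         "HighShardUsage")
print(shlex.join(args))  # paste into Cloud Shell, or pass args to subprocess.run
```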

Managing Alarms

You can follow these guidelines on how to manage your alarms.

  • Create a Set of Alarms for Each Metric.
    For each metric emitted by Oracle NoSQL Database Cloud Service table, create alarms that define the following resource behaviors:
    • At risk - The Oracle NoSQL Database Cloud Service table is at risk of becoming inoperable, as indicated by metric values. For example, the storage size of a table is approaching its limit.
    • Non-optimal - The Oracle NoSQL Database Cloud Service table is performing at non-optimal levels, as indicated by metric values. For example, read or write operations on the table show high latency.
    • Resource is up or down - The Oracle NoSQL Database Cloud Service table is either not reachable or not operating. For example, ReadThrottleCount or WriteThrottleCount is high.
  • Set up a process for responding to alarms.
    Based on the severity of the alarm, you can choose to respond to the alarms in the following different ways:
    • For Critical or At-Risk alarms, you can decide to notify the operations team immediately because repair is required to bring the table back to optimal operational levels. You configure alarm notifications to the responsible team by both PagerDuty and email, requesting an investigation and appropriate fixes before the table becomes inoperable. You set repeat notifications every minute. When someone responds to the alarm notifications, you temporarily stop notifications by suppressing the alarm. Once metrics return to optimal values, you remove the suppression.
    • For Warning or Non-Optimal alarms, you can decide to notify the appropriate individual or team that an Oracle NoSQL Database Cloud Service table is consuming more storage than usual. You configure a threshold alarm to notify the appropriate contacts, because no immediate action is required to investigate and reduce the storage size. You set notifications to email only, directed to the appropriate developer or team, with repeat notifications every 24 hours to reduce email notification noise.
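The two response processes above can be captured as a small policy table, which also makes the notification volume easy to estimate. This hypothetical sketch encodes the channels and repeat intervals from the guidelines and counts how many notifications an unsuppressed alarm would send; it is a planning aid, not an OCI configuration format.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class NotificationPolicy:
    channels: tuple          # subscription types used for the topic
    repeat_every: timedelta  # repeat-notification interval while firing


# Values taken from the guidelines above; the mapping itself is illustrative.
POLICIES = {
    "CRITICAL": NotificationPolicy(("pagerduty", "email"), timedelta(minutes=1)),
    "WARNING": NotificationPolicy(("email",), timedelta(hours=24)),
}


def notifications_sent(severity, firing_for):
    """Initial notification plus one repeat per elapsed repeat interval.

    Assumes nobody suppresses the alarm while it is firing.
    """
    policy = POLICIES[severity]
    return 1 + int(firing_for / policy.repeat_every)


print(notifications_sent("CRITICAL", timedelta(minutes=10)))
print(notifications_sent("WARNING", timedelta(days=3)))
```

The critical policy produces many messages quickly, which is why suppressing the alarm once someone responds matters.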