17.1.1 CloudWatch Monitoring Metrics

CloudWatch metrics can be used to monitor resources, diagnose issues, and troubleshoot your DB Systems and their HeatWave Clusters. After you enable CloudWatch monitoring on a DB System, you can select the set of metrics that are emitted to CloudWatch for the DB System and its HeatWave Cluster.

The metrics are emitted as CloudWatch embedded metric format (EMF) logs. EMF logs are JSON objects that embed metric data along with metadata and contextual information. For more information about the format, see the EMF specification published by AWS. Here is a sample EMF log snippet:

{
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "OracleHeatWave",
                "Dimensions": [
                    [
                        "dbSystemId"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "mysql.stats.threads_connected",
                        "Unit": "Count",
                        "StorageResolution": 60
                    }
                ]
            }
        ],
        "Timestamp": 1750074660311
    },
    "dbSystemId": "example-dbsystem-id",
    "mysql.stats.threads_connected": 1
}
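
To illustrate how the metadata and the values in such a record fit together, the following Python sketch reads one EMF log record and pairs each declared metric with its value and its dimension values. The function name extract_metrics is hypothetical and is not part of any AWS or Oracle API; the sketch assumes the record follows the layout of the sample above.

```python
import json

# The sample record from this section, reproduced so the sketch is self-contained.
SAMPLE = """
{
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "OracleHeatWave",
                "Dimensions": [["dbSystemId"]],
                "Metrics": [
                    {
                        "Name": "mysql.stats.threads_connected",
                        "Unit": "Count",
                        "StorageResolution": 60
                    }
                ]
            }
        ],
        "Timestamp": 1750074660311
    },
    "dbSystemId": "example-dbsystem-id",
    "mysql.stats.threads_connected": 1
}
"""


def extract_metrics(emf_record: str) -> list[dict]:
    """Return one entry per metric declared in an EMF log record (hypothetical helper)."""
    doc = json.loads(emf_record)
    results = []
    for directive in doc["_aws"]["CloudWatchMetrics"]:
        # Each inner list under "Dimensions" names one set of dimension keys;
        # the corresponding values are top-level members of the record.
        dimension_sets = [
            {key: doc[key] for key in keys} for keys in directive["Dimensions"]
        ]
        for metric in directive["Metrics"]:
            results.append({
                "namespace": directive["Namespace"],
                "name": metric["Name"],
                "unit": metric.get("Unit"),
                # The metric value is also a top-level member, keyed by the metric name.
                "value": doc[metric["Name"]],
                "dimensions": dimension_sets,
                "timestamp_ms": doc["_aws"]["Timestamp"],
            })
    return results


for row in extract_metrics(SAMPLE):
    print(row["name"], row["value"], row["dimensions"])
```

The metadata under _aws only declares which top-level members are metrics and which are dimensions; the values themselves always sit at the top level of the record.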

Here are explanations for some of the objects in an EMF log:

  • Namespace: Metrics are emitted to the OracleHeatWave namespace in CloudWatch. The EMF logs are streamed to the /OracleHeatWave/metrics log group.
  • Dimensions:
    • dbSystemId: Specifies the unique ID of the DB System.
    • resourceType: Specifies the resource type the metric originated from. It can be one of:
      • mysql: DB System
      • heatWave: HeatWave Cluster
    • heatWaveNode: Specifies the HeatWave node index the metric originated from.
  • Metrics: Table 17-1 lists the metrics for monitoring your DB Systems and HeatWave Clusters.

    Metrics included in the Basic metric template in the Monitoring Configurations are marked with an asterisk (*) beside their names in the table. All the listed metrics are included in the Detailed metric template.

    Table 17-1 CloudWatch Metrics for HeatWave on AWS

    Metric Unit Description Dimensions
    heatwave.auto_ml_operation_count Count The number of AutoML operations that have been executed. This number resets on HeatWave restart.
    • dbSystemId
    heatwave.change_propagation_lag Count The number of transactions that are not propagated to HeatWave yet.
    • dbSystemId
    heatwave.dataset_bytes Bytes The amount of data loaded in the HeatWave cluster.
    • dbSystemId
    heatwave.heap_usage Bytes The size of the HeatWave data dictionary, which consumes heap space on the DB system nodes.
    • dbSystemId
    heatwave.lakehouse_total_loaded_bytes Bytes The total size of all Lakehouse tables loaded into HeatWave.
    • dbSystemId
    heatwave.loaded_tables_available Count The number of tables loaded into HeatWave that are currently available for use.
    • dbSystemId
    heatwave.loaded_tables_unusable Count The number of tables loaded into HeatWave that are currently not available for use.
    • dbSystemId
    heatwave.health* Count HeatWave cluster health status.
    • dbSystemId
    heatwave.load_progress* Percent Progress of data load into HeatWave cluster memory.
    • dbSystemId
    heatwave.query_offload_count* Count The number of statements executed against the DB System that were executed on the HeatWave Cluster.
    • dbSystemId
    mysql.inno.buffer_pool.pages_dirty Count Buffer pages currently dirty.
    • dbSystemId
    mysql.inno.buffer_pool.pages_free Count Buffer pages currently free.
    • dbSystemId
    mysql.inno.buffer_pool.pages_total Count Total buffer pool size in pages.
    • dbSystemId
    mysql.stats.aborted_clients Count The number of connections that were aborted because the client died without closing the connection properly.
    • dbSystemId
    mysql.stats.aborted_connects Count The number of failed attempts to connect to the MySQL server.
    • dbSystemId
    mysql.stats.connection.total Count Cumulative count of total connections created.
    • dbSystemId
    mysql.stats.created.tmp_disk_tables Count The number of internal on-disk temporary tables created by the server while executing statements.
    • dbSystemId
    mysql.stats.created.tmp_tables Count The number of internal temporary tables created by the server while executing statements.
    • dbSystemId
    mysql.stats.handler.delete Count The number of times that rows have been deleted from tables.
    • dbSystemId
    mysql.stats.handler.read_first Count The number of times the first entry in an index was read.
    • dbSystemId
    mysql.stats.handler.read_key Count The number of requests to read a row based on a key.
    • dbSystemId
    mysql.stats.handler.read_last Count The number of requests to read the last key in an index.
    • dbSystemId
    mysql.stats.handler.read_next Count The number of requests to read the next row in key order.
    • dbSystemId
    mysql.stats.handler.read_prev Count The number of requests to read the previous row in key order.
    • dbSystemId
    mysql.stats.handler.read_rnd Count The number of requests to read a row based on a fixed position.
    • dbSystemId
    mysql.stats.handler.read_rnd_next Count The number of requests to read the next row in the data file.
    • dbSystemId
    mysql.stats.handler.update Count The number of requests to update a row in a table.
    • dbSystemId
    mysql.stats.handler.write Count The number of requests to insert a row in a table.
    • dbSystemId
    mysql.stats.select_full_join Count The number of joins that perform table scans because they do not use indexes.
    • dbSystemId
    mysql.stats.select_full_range_join Count The number of joins that used a range search on a reference table.
    • dbSystemId
    mysql.stats.select_range Count The number of joins that used ranges on the first table.
    • dbSystemId
    mysql.stats.select_range_check Count The number of joins without keys that check for key usage after each row.
    • dbSystemId
    mysql.stats.select_scan Count The number of joins that did a full scan of the first table.
    • dbSystemId
    mysql.stats.sort_merge_passes Count The number of merge passes that the sort algorithm has had to do.
    • dbSystemId
    mysql.stats.sort_range Count The number of sorts that were done using ranges.
    • dbSystemId
    mysql.stats.sort_rows Count The number of sorted rows.
    • dbSystemId
    mysql.stats.sort_scan Count The number of sorts that were done by scanning the table.
    • dbSystemId
    mysql.stats.threads_connected* Count The number of current connections to the DB system.
    • dbSystemId
    mysql.stats.threads_running* Count The number of connections actively executing statements against the DB system.
    • dbSystemId
    system.cpu.utilization* Percent The CPU utilization for the DB system host.
    • dbSystemId
    • resourceType
    system.dbvolume.utilization* Percent The total space utilization of the DB System volume.
    • dbSystemId
    • resourceType
    system.dbvolume.size* Bytes The maximum amount of space allocated to the DB System during the interval.
    • dbSystemId
    • resourceType
    system.dbvolume.usage* Bytes The maximum amount of space used during the interval.
    • dbSystemId
    • resourceType
    system.memory.utilization* Percent Memory utilization for the DB System host.
    • dbSystemId
    • resourceType
    • heatWaveNode
    system.memory.size* Bytes The total amount of memory allocated during the selected interval.
    • dbSystemId
    • resourceType
    • heatWaveNode
    system.memory.usage* Bytes The maximum amount of memory used during the selected interval.
    • dbSystemId
    • resourceType
    • heatWaveNode
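
As one possible way to read these metrics back, the sketch below builds the request parameters for the CloudWatch GetMetricStatistics operation from a metric name and the dimensions listed in Table 17-1. The helper names build_query and fetch_datapoints are hypothetical. The sketch assumes the third-party boto3 library and configured AWS credentials for the actual call, and it assumes that dimension values are passed as strings; check the exact dimension values (in particular the form of heatWaveNode) against your own EMF logs, because CloudWatch only returns data when the dimension set matches the emitted one.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "OracleHeatWave"  # namespace named in this section


def build_query(metric_name, db_system_id, resource_type=None,
                heatwave_node=None, minutes=60, now=None):
    """Build keyword arguments for get_metric_statistics (hypothetical helper)."""
    dimensions = [{"Name": "dbSystemId", "Value": db_system_id}]
    # Add the optional dimensions only for metrics that list them in Table 17-1.
    if resource_type is not None:
        dimensions.append({"Name": "resourceType", "Value": resource_type})
    if heatwave_node is not None:
        # Assumption: the node index is sent as a string value.
        dimensions.append({"Name": "heatWaveNode", "Value": str(heatwave_node)})
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # matches the 60-second emission interval
        "Statistics": ["Average", "Maximum"],
    }


def fetch_datapoints(params):
    """Run the query; requires boto3 and AWS credentials (not exercised here)."""
    import boto3  # third-party dependency, imported lazily

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**params)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


params = build_query("system.cpu.utilization", "example-dbsystem-id",
                     resource_type="mysql")
print(params["MetricName"], params["Dimensions"])
```

Keeping the parameter construction separate from the network call makes the dimension handling easy to check without AWS access.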

How metrics are sent:

  • Metrics are emitted every 60 seconds.
  • During maintenance or other actions against the DB System (for example, a restart or an update), metric emissions might be interrupted.
  • Metrics for highly available DB Systems are emitted only for the primary instance; secondary instances do not emit metrics. If quorum is lost because both secondaries are unreachable (for example, due to network partitioning), no metrics are sent until a majority is restored for the DB System.
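
Because emission can pause during maintenance or a loss of quorum, it can be useful to distinguish an interruption from a genuine zero value. The sketch below is one simple way to do that, assuming you have already collected the millisecond timestamps of the records for one metric (as in the Timestamp member of an EMF log). The function name find_gaps is hypothetical, and the approach is an illustration rather than a feature of the service.

```python
EXPECTED_INTERVAL_MS = 60_000  # metrics are emitted every 60 seconds


def find_gaps(timestamps_ms):
    """Return (gap_start, gap_end, missed_count) for each interruption found."""
    ordered = sorted(timestamps_ms)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        # Round to whole intervals so small timing jitter is not reported.
        missed = round((current - previous) / EXPECTED_INTERVAL_MS) - 1
        if missed >= 1:
            gaps.append((previous, current, missed))
    return gaps


base = 1750074660311
samples = [base, base + 60_000, base + 120_000, base + 360_000]
for start, end, missed in find_gaps(samples):
    print(f"gap from {start} to {end}: about {missed} emissions missing")
```

In this example the last two timestamps are four minutes apart, so the sketch reports a single gap with three missing emissions.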