Health Monitor

Health Monitor collects statistics for system CPU usage, disk usage, memory usage, and the MySQL server process (mysqld). The statistics are logged in history tables, for which Health Monitor takes samples once a minute, and retains a day's worth of data. You can also access these statistics in tables that are populated with current readings on demand when you query them.

Overview of Health Monitor

Health Monitor samples disk space information, system memory information, and process memory information every sixty seconds, and stores a day's worth of data in tables in the Performance Schema of the MySQL Server. If the available disk space or available memory falls below certain preset thresholds, Health Monitor issues warnings and logs it to the performance_schema.error_log table.

Health Monitor uses MiB/GiB/TiB storage units where 100 GB = 93 GiB.

The total storage available to a DB system contains the following:

  • Available storage: The disk space after the storage reserve is excluded from the total storage.
  • Storage reserve: The disk space that is lesser of the disk_low_space_level and disk_low_space_percent configuration variables. The default value of disk_low_space_level is 5,000 MiB, and the default value of disk_low_space_percent is 4% of the total storage.
  • Critical storage: The disk space in the storage reserve below the critical threshold, which is defined by the disk_low_space_critical_level configuration variable. This is the minimum viable disk space for the system to avoid disk exhaustion. The default critical storage space is 2,000 MiB.

Disk Space Warnings

When the disk space falls below certain percentage of the available storage, Health Monitor issues the following warnings and logs it to the performance_schema.error_log table:

  • WARNING_DISK_USAGE_LEVEL_1: When the disk space falls below 20% of the available storage.
  • WARNING_DISK_USAGE_LEVEL_2: When the disk space falls below 10% of the available storage.
  • WARNING_DISK_USAGE_LEVEL_3: When the disk space falls below 5% of the available storage.

For example, if the total storage is 100,000 MiB, and the storage reserve is lesser of the following two configuration variables:

  • disk_low_space_level : The default value is 5,000 MiB.
  • disk_low_space_percent: It is 4% of the total storage, that is, 4% of 100,000 MiB = 4,000 MiB.

The storage reserve is 4,000 MiB, which is lesser of the two configuration variables.

You can find the available storage by excluding the storage reserve from the total storage:

100,000 - 4,000 = 96,000 MiB

Health Monitor issues the following warnings:

  • WARNING_DISK_USAGE_LEVEL_1: When the disk space falls below 20% of the available storage, which is 20/100*96,000 = 19,200 MiB.
  • WARNING_DISK_USAGE_LEVEL_2: When the disk space falls below 10% of the available storage, which is 10/100*96,000 = 9,600 MiB.
  • WARNING_DISK_USAGE_LEVEL_3: When the disk space falls below 5% of the available storage, which is 5/100*96,000 = 4,800 MiB.

Significant or Sustained Shortage of Disk Space

Health Monitor defines significant or sustained shortage of disk space when the following conditions are met:

  • Significant shortage: The disk space drops below the critical threshold defined by disk_low_space_critical_level.
  • Sustained shortage: The disk space drops below the storage reserve, and then remains in the storage reserve for the duration defined by the disk_low_space_duration configuration variable.

When there is a significant or sustained shortage of disk space, Health Monitor does the following:

  • SUPER_READ_ONLY=ON: Sets the system variables, SUPER_READ_ONLY, to ON in the MySQL Server. The MySQL Server rejects all new incoming SQL write statements (UPDATE, INSERT, DELETE, and DDL), regardless of users and privileges. Running transactions are allowed to complete, but new writes are prohibited until the disk space is available again. You cannot load data into HeatWave when the server is in SUPER_READ_ONLY mode.

    In some cases, global read locks, ongoing commits, or metadata locks blocks the Health Monitor from setting the SUPER_READ_ONLY system variable. In such cases, if you set the health_monitor.disk_fallback_force variable, the Health Monitor identifies and terminates active queries holding global locks to set the SUPER_READ_ONLY system variable.

  • OFFLINE_MODE=ON: Sets the system variable, OFFLINE_MODE, to ON in the MySQL Server. The MySQL Server disconnects client users who do not have the CONNECTION_ADMIN privilege, terminates running statements and releases locks, and blocks new connections with an appropriate error.
  • super_read_only_disk_full=ON: Sets the status variable, super_read_only_disk_full, to ON indicating that read-only mode was triggered due to low disk space. You can see the value of super_read_only_disk_full status variable with either of the following command:
    • SHOW GLOBAL STATUS WHERE variable_name = 'super_read_only_disk_full';
    • SELECT * FROM performance_schema.global_status WHERE variable_name = 'super_read_only_disk_full';

System variables change behavior of the MySQL Server while status variables are static indicators of the MySQL Server state.

Recovery to Normal Operation

Once the SUPER_READ_ONLY and OFFLINE_MODE are set, Health Monitor does the following:

  • Check the disk space every minute .
  • Set OFFLINE_MODE=OFF if the available disk space recovers and remains above disk_recovery_level for disk_recovery_time_1 seconds or more.
  • Set SUPER_READ_ONLY=OFF if the available disk space recovers and remains above disk_recovery_level for an additional disk_recovery_time_2 seconds or more.

Normal operations are resumed once the disk space remains above disk_recovery_level for disk_recovery_time_1 + disk_recovery_time_2 seconds.

Once the disk space recovers, Health Monitor clears the system variables, SUPER_READ_ONLY, and OFFLINE_MODE, and the status variable, super_read_only_disk_full. You do not have to restart the MySQL instance to clear the variables.

If there is a significant or sustained shortage of disk space again, then the Health Monitor again sets the system variables, SUPER_READ_ONLY and OFFLINE_MODE, and the status variable, super_read_only_disk_full, to ON in the MySQL Server.

Before returning a primary instance of a high availability DB system to normal operating mode, the Health Monitor verifies that the server is online with the group majority. The Health Monitor never sets SUPER_READ_ONLY or OFFLINE_MODE on the secondary instances.

Recovery Increment, Recovery Max Cycle, and Recovery Window

Each time the MySQL Server recovers from a significant or sustained shortage of disk space, the Health Monitor increases the disk recovery time by disk_recovery_increment seconds.

For example, if disk_recovery_time_1 = 300 seconds, and disk_recovery_time_2 = 600 seconds, and disk_recovery_increment = 400 seconds, then the disk recovery time for each cycle is as following:

  • Cycle 1: disk_recovery_time_1 = 300 seconds, disk_recovery_time_2 = 600 seconds
  • Cycle 2: disk_recovery_time_1 = (300 + 400) seconds, disk_recovery_time_2 = (600+400) seconds
  • Cycle 3: disk_recovery_time_1 = (300+400+400) seconds, disk_recovery_time_2 = (600+400+400) seconds

The Health Monitor restores to the normal operating mode a maximum of disk_recovery_max_cycles times within a fixed window of disk_recovery_window seconds, from the last recovery cycle. For example, if disk_recovery_max_cycles is set to 5, and disk_recovery_window is set to 86,400 seconds (1 day), then the Health Monitor can restore to the normal operating mode five times in a window of 86,400 seconds. If a significant or sustained shortage happens the sixth time in the same window, the system variables, SUPER_READ_ONLY and OFFLINE_MODE are set to ON.

The recovery count is reset after disk_recovery_window seconds from the last recovery.

Memory Warnings

When the available memory falls below certain thresholds, Health Monitor issues the following warnings:

  • WARNING_MEMORY_USAGE_LEVEL_1: When the available memory falls below 1024 MiB.
  • WARNING_MEMORY_USAGE_LEVEL_2: When the available memory falls below 500 MiB.
  • WARNING_MEMORY_USAGE_LEVEL_3: When the available memory falls below 100 MiB.

Health Monitor logs the information regarding mysql version, global memory usage, and various configuration settings in the error log. The information is logged as comma separated values for easy transfer into a spreadsheet for further analysis.

Viewing Health Monitor Tables

Health Monitor tables store monitoring data and statistics in MySQL Server's Performance Schema.

The health tables contain disk storage information, system memory information, and process memory information.

The statistics tables contain statistics for system CPU usage, disk usage, memory usage, and the MySQL server process (mysqld). Each statistics table has two versions, an on-demand version, which is populated with current readings when you query it, and a history version for which Health Monitor takes samples once a minute, and retains a day's worth of data. Both versions of the table have the same columns.

Using a Command-Line Client

Use a command-line client such as MySQL Client or MySQL Shell to view the Health Monitor tables.

This task requires the following:
  1. For MySQL Shell, switch to SQL mode by entering the \sql command.
  2. Enter the following SQL statement, specifying the table you want to view:
    SELECT * FROM performance_schema.<health_monitor_table>;

    For example, to view the disk storage status, use the following SQL command:

    SELECT * FROM performance_schema.health_block_device;
    Note

    To view the complete list of tables, you can specify in <health_monitor_table>, see Health Monitor Tables.
You get a response similar to the following, which shows the disk storage status information:

+-------------------------+---------------------+-------------+-----------------+-------------+-------------+
| DEVICE                  | TIMESTAMP           | TOTAL_BYTES | AVAILABLE_BYTES | USE_PERCENT | MOUNT_POINT |
+-------------------------+---------------------+-------------+-----------------+-------------+-------------+
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:12:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:13:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:14:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:15:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:16:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:17:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:18:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:19:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:20:08 | 53656686592 |     51364134912 |        4.27 | /db         |
| /dev/mapper/vg_db-lv_db | 2021-08-08 16:21:08 | 53656686592 |     51364134912 |        4.27 | /db         |

Health Monitor Tables

Refer to the Health Monitor Tables to see the disk storage information, system memory information, and process memory information.

The complete list of Health Monitor tables is as following:

performance_schema.health_block_device Table

Health Monitor uses the performance_schema.health_block_device table in MySQL Server's Performance Schema to store data on used and available disk space in a DB system.

Table 17-1 health_block_device Performance Schema Table

Column Description
DEVICE The device name for the disk.
TIMESTAMP The time the data sample in this row was collected.
TOTAL_BYTES The total amount of storage on the device, in bytes.
AVAILABLE_BYTES The amount of available storage on the device, in bytes.
USE_PERCENT The percentage of disk space in use.
MOUNT_POINT The device's mount point.
performance_schema.health_system_memory Table

Health Monitor uses the performance_schema.health_system_memory table in MySQL Server's Performance Schema to store data on memory usage by the DB system

Table 17-2 health_system_memory Performance Schema Table

Column Description
TIMESTAMP The time the data sample in this row was collected.
TOTAL_MEMORY The total amount of memory for the system, in bytes.
AVAILABLE The memory available for starting new applications without swapping, in bytes.
USE_PERCENT The percentage of memory in use.
MEMORY_FREE The amount of unused memory.
MEMORY_FS_CACHE The filesystem page cache.
SWAP_TOTAL The total amount of swap memory.
SWAP_FREE The total amount of free swap memory.
performance_schema.health_process_memory Table

Health Monitor uses the performance_schema.health_process_memory table in MySQL Server's Performance Schema to store data on memory usage by the MySQL Server process (mysqld) in the DB system.

Table 17-3 health_process_memory Performance Schema Table

Column Description
TIMESTAMP The time the data sample in this row was collected.
PROCESS_NAME The name of the MySQL Server process.
PID The system identifier for the MySQL Server process.
VM_RSS The resident set size of the MySQL server process, in bytes.
VM_DATA The size of the MySQL Server process data segment, in bytes.
VM_SWAP The size of the MySQL server process swap segment, in bytes.
PAGE_FAULTS The number of page faults requiring disk I/O.
performance_schema.system_cpu_stats Table

Health Monitor collects statistics for CPU usage by the DB system. The statistics are available in two tables in MySQL Server's Performance Schema, an on-demand table and a corresponding history table. Both tables have the same columns.

The performance_schema.system_cpu_stats table in MySQL Server's Performance Schema is populated with a data sample when you query it. The performance_schema.system_cpu_stats_history table stores data samples regularly.

The statistics in these tables are sourced from the Linux system file /proc/stat.

Table 17-4 system_cpu_stats and system_cpu_stats_history Performance Schema Tables

Column Description
TIMESTAMP The time the data sample in this row was collected.
CPU The cpu row shows the aggregated CPU usage data, and each numbered row (for example, cpu0, cpu1) shows data for an individual process. All times are in milliseconds.
USER_MS The time spent in user mode.
NICE_MS The time spent in user mode with low priority.
SYSTEM_MS The time spent in system mode.
IDLE_MS The time spent in the idle task.
IOWAIT_MS The time spent waiting for I/O to complete.
IRQ_MS The time spent servicing hardware interrupts.
SOFTIRQ_MS The time spent servicing software interrupts.
STEAL_MS The time spent running other operating systems in a virtualized environment.
GUEST_MS The time spent running a virtual CPU with normal priority.
GUEST_NICE_MS The time spent running a virtual CPU with low priority.
performance_schema.system_disk_stats Table

Health Monitor collects statistics for disk usage by the DB system. The statistics are available in two tables in MySQL Server's Performance Schema, an on-demand table and a corresponding history table. Both tables have the same columns.

The performance_schema.system_disk_stats table in MySQL Server's Performance Schema is populated with a data sample when you query it. The performance_schema.system_disk_stats_history table stores data samples regularly.

The statistics in these tables are sourced from the Linux system file /proc/diskstats.

Table 17-5 system_disk_stats and system_disk_stats_history Performance Schema Tables

Column Description
TIMESTAMP The time the data sample in this row was collected.
DEVICE The device name for this disk.
READS The number of reads completed successfully.
READ_BYTES The number of bytes read.
READ_TIME_MS The time spent reading, in milliseconds.
WRITES The number of writes completed successfully.
WRITE_BYTES The number of bytes written.
WRITE_TIME_MS The time spent writing, in milliseconds.
FLUSHES The number of flush requests completed successfully.
FLUSH_TIME_MS The time spent flushing, in milliseconds.
performance_schema.system_memory_stats Table

Health Monitor collects statistics for memory usage by the DB system. The statistics are available in two tables in MySQL Server's Performance Schema, an on-demand table and a corresponding history table. Both tables have the same columns.

The performance_schema.system_memory_stats table in MySQL Server's Performance Schema is populated with a data sample when you query it. The performance_schema.system_memory_stats_history table stores data samples regularly.

The statistics in these tables are sourced from the Linux system file /proc/meminfo.

Table 17-6 system_memory_stats and system_memory_stats_history Performance Schema Tables

Column Description
TIMESTAMP The time the data sample in this row was collected.
TOTAL_BYTES The total usable system memory, in bytes.
FREE_BYTES The unused memory.
USED_BYTES The total used memory.
AVAILABLE_BYTES The estimated memory available for starting new applications, in bytes.
BUFFER_BYTES The amount of memory used for temporary storage for raw disk blocks.
CACHED_BYTES The amount of memory used to cache files read from disk (the page cache).
SLAB_BYTES The amount of memory used to cache in-kernel data structures.
SWAP_TOTAL_BYTES The total amount of swap space available on disk.
SWAP_FREE_BYTES The total amount of swap space that is currently unused.
SWAP_USED_BYTES The total amount of memory that has been swapped out from RAM and is temporarily on disk.
SWAP_IN The amount of memory swapped back into RAM from disk, in KB per second.
SWAP_OUT The amount of memory swapped out to disk from RAM, in KB per second.
performance_schema.system_process_stats Table

Health Monitor collects statistics for each thread used by the MySQL Server process in the DB system. The statistics are available in two tables in MySQL Server's Performance Schema, an on-demand table and a corresponding history table. Both tables have the same columns.

The performance_schema.system_process_stats table in MySQL Server's Performance Schema is populated with a data sample when you query it. The performance_schema.system_process_stats_history table stores data samples regularly.

For process statistics, the on-demand table returns one row for each thread used by the mysqld process. However, the history table stores a single row for each sample, with the aggregated totals for all the threads used by the mysqld process.

The statistics in these tables are sourced from the Linux system file /proc/self/task/[pid]/stat.

Table 17-7 system_process_stats and system_process_stats_history Performance Schema Tables

Column Description
TIMESTAMP The time the data sample in this row was collected.
PID The process ID.
PROCESS_NAME The process name, which is mysqld.
TID The thread ID for the individual thread shown in this row.
THREAD_NAME The instrumented thread name for the individual thread shown in this row.
STATE A character showing the thread state at the time of the sample: running (R), sleeping in an interruptible wait (S), waiting in an uninterruptible disk sleep (D), zombie thread (Z), stopped on a signal or trace stopped (T), paging (W), dead (X), wakekill (K), waking (W), or parked (P).
UTIME_MS The time this thread spent in user mode, in milliseconds.
STIME_MS The time this thread spent in kernel mode, in milliseconds.
CUTIME_MS The time that child processes of this process spent in user mode, in milliseconds.
CSTIME_MS The time that child processes of this process spent in kernel mode, in milliseconds.
NUM_THREADS The nunmber of threads in this process.
VSIZE_BYTES The virtual memory size in bytes.
RSS_BYTES The number of bytes the process has in real memory (the resident set size).
RSSLIM_BYTES The limit in bytes on the resident set size of the process.
PROCESSOR The CPU number on which the thread last executed.
DELAYACCT_BLKIO_MS The aggregated block I/O delays, in milliseconds.
GUEST_TIME_MS The guest time for the process, in milliseconds.
CGUEST_TIME_MS The guest time for the child processes of the process, in milliseconds.
READ_BYTES The number of bytes that this process fetched from the storage layer.
WRITE_BYTES The number of bytes that this process sent to the storage layer.

Viewing Health Monitor Variables

The Health Monitor variables are configuration settings for the monitoring activity in the DB system. You cannot change the values of these variables.

Using a Command-Line Client

Use a command-line client such as MySQL Client or MySQL Shell to view the Health Monitor variables and their values.

This task requires the following:
  • A running DB system.
  • A command-line client such as MySQL Client or MySQL Shell that is connected to the DB system. See Connecting to a DB System.
Do the following to view the Health Monitor variables:
  1. For MySQL Shell, switch to SQL mode by entering the \sql command.
  2. Enter the following SQL statement:
    show variables like 'health_monitor%';
You get a response similar to the following:
+----------------------------------------------+-------+
| Variable_name | Value |
+----------------------------------------------+-------+
| health_monitor.disk_fallback_enable | ON |
| health_monitor.disk_fallback_force | OFF |
| health_monitor.disk_low_space_critical_level | 2000 |
| health_monitor.disk_low_space_duration | 300 |
| health_monitor.disk_low_space_level | 5000 |
| health_monitor.disk_low_space_percent | 4 |
| health_monitor.disk_monitored | /db |
| health_monitor.disk_recovery_enable | ON |
| health_monitor.disk_recovery_increment | 300 |
| health_monitor.disk_recovery_level | 5 |
| health_monitor.disk_recovery_max_cycles | 3 |
| health_monitor.disk_recovery_time_1 | 300 |
| health_monitor.disk_recovery_time_2 | 300 |
| health_monitor.disk_recovery_window | 86400 |
| health_monitor.disk_retention | 86400 |
| health_monitor.disk_running | ON |
| health_monitor.disk_sample_rate | 60 |
| health_monitor.disk_usage_warning_level_1 | 20 |
| health_monitor.disk_usage_warning_level_2 | 10 |
| health_monitor.disk_usage_warning_level_3 | 5 |
...
+----------------------------------------------+-------+
35 rows in set (0.01 sec)

Health Monitor Variables

All Health Monitor system variables are prefixed with health_monitor. You cannot alter the values of the variables from their defaults in a DB system.

Note

The health_monitor. prefix is omitted from the table below. For example, disk_fallback_enable is actually health_monitor.disk_fallback_enable.

Table 17-8 Health Monitor System Variables

System Variable Default Value Description
disk_fallback_enable ON If there is a significant or sustained shortage of disk space, Health Monitor sets the system variable, SUPER_READ_ONLY and OFFLINE_MODE to ON in the MySQL Server.
disk_fallback_force OFF Terminate foreground queries that are holding global metadata locks and retry setting SUPER_READ_ONLY.
disk_low_space_critical_level 2000 The critical low disk space threshold in MiB. If the available disk space falls below this value for any duration, Health Monitor sets the SUPER_READ_ONLY and OFFLINE_MODE modes on the MySQL Server.
disk_low_space_duration 300 The duration (in seconds) for which the available disk space can remain below either disk_low_space_level or disk_low_space_percent (whichever is lower) before Health Monitor sets the SUPER_READ_ONLY and OFFLINE_MODE modes on the MySQL Server.
disk_low_space_level 5000 The low space threshold for the storage reserve, in MiB. If the available disk space drops below this percentage of total available storage for longer than the value of disk_low_space_duration, Health Monitor sets the SUPER_READ_ONLY and OFFLINE_MODE modes on the MySQL Server.
disk_low_space_percent 4 The low space threshold for the storage reserve, expressed as a percentage. If the available disk space drops below this percentage of total available storage for longer than the value of disk_low_space_duration, Health Monitor sets the SUPER_READ_ONLY and OFFLINE_MODE modes on the MySQL Server.
disk_monitored /db The mount point for disk statistics.
disk_retention 86400

How many seconds each data sample is retained for in the history tables for disk usage data.

The minimum value is 1 second, the default is 86400 seconds (1 day), and the maximum value is 864000 seconds (10 days).

If memory_sample_rate=60 and memory_retention=86400, 1440 samples are retained, with the oldest sample deleted every 60 seconds.

disk_recovery_enable ON Set the variable to ON to enable recovery from a sustained or significant shortage and to OFF to disable recovery from a sustained or significant shortage.
disk_recovery_level 5% The percentage of total disk space above the storage reserve.
  • If the available storage remains above the disk_recovery_level level for disk_recovery_time_1 seconds, then the Health Monitor sets OFFLINE_MODE = OFF.
  • If storage remains above the disk_recovery_level level for an additional disk_recovery_time_2 seconds, then the Health Monitor sets SUPER_READ_ONLY = OFF.
disk_recovery_time_1 300 seconds The amount of time (in seconds) that the available storage must remain above disk_recovery_level before the Health Monitor sets OFFLINE_MODE = OFF.
disk_recovery_time_2 300 seconds The additional time (in seconds) that the available storage must remain above disk_recovery_level after disk_recovery_time_1 before the Health Monitor sets SUPER_READ_ONLY = OFF.
disk_recovery_max_cycles 3 cycles The maximum number of times that the MySQL server can be placed back into normal operating mode within the disk_recovery_cycle_window timeframe.
disk_recovery_window 86400 seconds The amount of time (in seconds) following the last successful recovery cycle after which the recovery count is reset. A value of 0 means that the recovery count is never reset.
disk_recovery_increment 300 seconds The time (in seconds) to increase the disk_recovery_time_1 and disk_recovery_time_2 recovery timers for each recovery cycle.
disk_running ON The active state of the Health Monitor's disk monitor.
disk_sample_rate 60 The frequency of disk data collection, in seconds. By default, data is collected every 60 seconds. The minimum sample rate is 1 (every second), and the maximum is 86400 (once a day).
disk_usage_warning_level_1 20 A percentage of available disk space above the defined disk_low_space_level. If the available disk space falls below this level, the warning WARNING_DISK_USAGE_LEVEL_1 is raised.
disk_usage_warning_level_2 10 A percentage of available disk space, above the defined disk_low_space_level. If the available disk space falls below this level, the warning WARNING_DISK_USAGE_LEVEL_2 is raised.
disk_usage_warning_level_3 5 A percentage of available disk space, above the defined disk_low_space_level. If the available disk space falls below this level, the warning WARNING_DISK_USAGE_LEVEL_3 is raised.
memory_reporting ON Enables reporting of memory data.
memory_retention 86400

How many seconds each data sample is retained for in the history tables for memory usage data.

The minimum value is 1 second, the default is 86400 seconds (1 day), and the maximum value is 864000 seconds (10 days).

memory_running ON The active state of the Health Monitor's memory monitor.
memory_sample_rate 60 The frequency of memory data collection, in seconds. By default, data is collected every 60 seconds. The minimum sample rate is 1 (every second), and the maximum is 86400 (once a day).
memory_usage_warning_level_1 1024 An amount of available memory (MiB). If the available memory falls below this level, the warning WARNING_MEMORY_USAGE_LEVEL_1 is raised.
memory_usage_warning_level_2 500 An amount of available memory (MiB). If the available memory falls below this level, the warning WARNING_MEMORY_USAGE_LEVEL_2 is raised.
memory_usage_warning_level_3 100 An amount of available memory (MiB). If the available memory falls below this level, the warning WARNING_MEMORY_USAGE_LEVEL_3 is raised.
status_interval 10 The frequency (in multiples of disk_sample_rate) of sending the status messages to the error log table. For example, if status_interval = 10 and disk_sample_rate = 60 seconds, the Health Monitor sends a status message to the error log table every 600 seconds.
system_cpu_stats_history ON Whether system CPU usage statistics are collected and stored in the Performance Schema table system_cpu_stats_history. If these statistics are not stored, you can still access a snapshot on demand by querying the system_cpu_stats table.
system_disk_stats_history ON Whether system disk usage statistics are collected and stored in the Performance Schema table system_memory_stats_history. If these statistics are not stored, you can still access a snapshot on demand by querying the system_memory_stats table.
system_memory_stats_history ON Whether system memory usage statistics are collected and stored in the Performance Schema table system_disk_stats_history. If these statistics are not stored, you can still access a snapshot on demand by querying the system_disk_stats table.
system_process_stats_history ON Whether MySQL server process (mysqld) statistics are collected and stored in the Performance Schema table system_process_stats_history. If these statistics are not stored, you can still access a snapshot on demand by querying the system_process_stats table.
system_retention 86400

How many seconds each data sample is retained for in the history tables for system and process statistics.

The minimum value is 1 second, the default is 86400 seconds (1 day), and the maximum value is 864000 seconds (10 days).

system_running ON Whether system statistics are collected by Health Monitor. When this is set to ON, the statistics tables for system CPU usage, disk usage, memory usage, and the MySQL server process (mysqld) are available, and the corresponding history tables are populated if their system variables are set to ON.
system_sample_rate 60 The frequency of system statistics collection, in seconds. By default, data is collected every 60 seconds. The minimum sample rate is 1 (every second), and the maximum is 86400 (once a day).

Health Monitor Messages

The Health Monitor errors, warnings, and status updates are logged in performance_schema.error_log. Health Monitor abbreviates TiB as T, GiB as G and MiB as M in the error log. For example, 306.6G = 306.6 GiB = 329.2 GB.

Table 17-9 Health Monitor Messages

Message Type Description
Status Update Health Monitor issues a routine status update according to the number of seconds defined by sample_rate * status_interval. For example:
[Note] [MY-013694] [Health] Disk Collector: Started: running=ON, monitored='/db', 
sample rate=30s, retention=86400s, max samples=2880, status interval=1, low level=100 MiB, 
low percent=4%, critical level=10 MiB, warning levels=20/10/5%, fallback (enable=ON, force=ON), 
low space duration=300s, recovery (enable=ON, level=5%, time 1=300s, time 2=300s, max cycles=3,
increment=10, window=86400)
Threshold Warning If the amount of available space falls below one of the thresholds defined by disk_usage_warning_level_*, the Health Monitor issues a warning. The following example shows the warning for disk_usage_warning_level_2, defined as 4% of the total space above the low space limit:
[Warning] [MY-013695] [Health] Disk: Warning Level 2 (62M): monitored='/db', 
available=59M, total=466M, used=87.2%, low limit=18M, critical=10M, recovery=40M, 
warnings=107M/62M/40M
disk_low_space_duration Warning If the available disk space drops below the low space limit of 18 GiB, the Health Monitor starts the fallback timer and issues a warning. For example:
[Warning] [MY-013695] [Health] Disk: Low Level (18M): Fallback Timer Running: 300 seconds 
until SUPER_READ_ONLY: monitored='/db', available=15M, total=466M, used=96.6%, low limit=18M, 
critical=10M, recovery=40M, warnings=107M/62M/40M
OFFLINE/SUPER_READ_ONLY Mode Warning If the available disk space remains below the low space limit for more than disk_low_space_duration seconds, the Health Monitor puts the server into SUPER_READ_ONLY and OFFLINE_MODE. For example:
[Warning] [MY-013695] [Health] Disk: Low Level (18M): Fallback Timer Running: 30 seconds until 
SUPER_READ_ONLY: monitored='/db', available=15M, total=466M, used=96.6%, low limit=18M, critical=10M, 
recovery=40M, warnings=107M/62M/40M
[Warning] [MY-013695] [Health] Disk: Low Level (18M): Fallback Timer Expired
[Warning] [MY-013695] [Health] Disk: Low Level (18M): OFFLINE_MODE = ON
[Warning] [MY-013695] [Health] Disk: Low Level (18M): SUPER_READ_ONLY = ON
See Resolving SUPER_READ_ONLY and OFFLINE_MODE Issue.
Critical Warning If the available disk space drops below disk_low_space_critical_level for any length of time, the Health Monitor puts the server into SUPER_READ_ONLY mode. For example:
[Warning] [MY-013695] [Health] Disk Collector: CRITICAL LEVEL (2G): 
OFFLINE_MODE=ON: mount point='/db', available=1.9G, total=1024G, 
used=99.1%, low limit=5.0G, critical=2.0G, warnings=204G/102G/51G 
[Warning] [MY-013695] [Health] Disk Collector: CRITICAL LEVEL (2G): 
SUPER_READ_ONLY=ON: mount point='/db', available=1.9G, total=1024G, 
used=99.1%, low limit=5.0G, critical=2.0G, warnings=204G/102G/51G
Timer Canceled Warning If the available space rises above the low space limit before disk_low_space_duration seconds completes, the fallback mode timer is canceled and the Health Monitor issues either a status message or a warning message if the available space is still within a warning range. For example:
[Warning] [MY-013695] [Health] Disk: Low Level (18M): Fallback Timer Canceled
[Warning] [MY-013695] [Health] Disk: Warning Level 1 (107M): monitored='/db', available=79M, 
total=466M, used=82.9%, low limit=18M, critical=10M, recovery=40M, warnings=107M/62M/40M
Memory Warning If the available memory falls below one of the thresholds defined by memory_usage_warning_level_*, the Health Monitor issues a warning which includes the memory diagnostics details in comma separated values (csv). The csv data in the following example has been removed due to space constraint.
[Warning] [MY-013695] [Health] MEMORY: WARNING LEVEL 1 (1.0G): available=950M, total=7.5G, used=87.6%, 
mysqld=5.9G, warnings=1.0G/500M/100M
[Warning] [MY-013695] [Health]  
[Warning] [MY-013695] [Health] ===== MEMORY DIAGNOSTICS BEGIN (csv) =====

---------------------------------------
TOTAL ALLOCATED MEMORY AND BUFFER SIZES
---------------------------------------
keyword,value
mysql_version,8.0.33
total_allocated,15121493496
innodb_buffer_pool_size,13958643712
join_buffer_size,262144
----------------------------------------
MEMORY AGGREGATION PER LOGICAL COMPONENT
----------------------------------------
component,current_count,current_alloc,max_alloc,min_alloc
...<csv data>...

-----------------------
MEMORY USAGE PER THREAD
-----------------------
thread_id,user,current_count_used,current_allocated,current_avg_alloc,current_max_alloc,total_allocated
...<csv data>...

-----------------------
MEMORY USAGE PER USER
-----------------------
user,current_count_used,current_allocated,current_avg_alloc,current_max_alloc,total_allocated
...<csv data>...

-------------------
PROCESSLIST DETAILS
-------------------
thd_id,conn_id,user,db,command,state,time,current_statement,execution_engine,statement_latency,progress,
lock_latency,cpu_latency,rows_examined,rows_sent,rows_affected,tmp_tables,tmp_disk_tables,full_scan,
last_statement,last_statement_latency,current_memory,last_wait,last_wait_latency,source,trx_latency,
trx_state,trx_autocommit,pid,program_name
...<csv data>...

---------------
SESSION DETAILS
---------------
thd_id,conn_id,user,db,command,state,time,current_statement,execution_engine,statement_latency,progress,
lock_latency,cpu_latency,rows_examined,rows_sent,rows_affected,tmp_tables,tmp_disk_tables,full_scan,
last_statement,last_statement_latency,current_memory,last_wait,last_wait_latency,source,trx_latency,
trx_state,trx_autocommit,pid,program_name
...<csv data>...

--------------------------------------------------------
CURRENT STATEMENT PER THREAD ORDERED BY STATEMENT MEMORY
--------------------------------------------------------
THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,TIMER_START,TIMER_END,TIMER_WAIT,LOCK_TIME,SQL_TEXT,
DIGEST,DIGEST_TEXT,CURRENT_SCHEMA,OBJECT_TYPE,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_INSTANCE_BEGIN,MYSQL_ERRNO,
RETURNED_SQLSTATE,MESSAGE_TEXT,ERRORS,WARNINGS,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED,CREATED_TMP_DISK_TABLES,
CREATED_TMP_TABLES,SELECT_FULL_JOIN,SELECT_FULL_RANGE_JOIN,SELECT_RANGE,SELECT_RANGE_CHECK,SELECT_SCAN,
SORT_MERGE_PASSES,SORT_RANGE,SORT_ROWS,SORT_SCAN,NO_INDEX_USED,NO_GOOD_INDEX_USED,NESTING_EVENT_ID,
NESTING_EVENT_TYPE,NESTING_EVENT_LEVEL,STATEMENT_ID,CPU_TIME,MAX_CONTROLLED_MEMORY,MAX_TOTAL_MEMORY,
EXECUTION_ENGINE
...<csv data>...

----------------------------------------------------------
CURRENT STATEMENT PER THREAD FOR THREADS USING MOST MEMORY
----------------------------------------------------------
THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,TIMER_START,TIMER_END,TIMER_WAIT,LOCK_TIME,SQL_TEXT,
DIGEST,DIGEST_TEXT,CURRENT_SCHEMA,OBJECT_TYPE,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_INSTANCE_BEGIN,MYSQL_ERRNO,
RETURNED_SQLSTATE,MESSAGE_TEXT,ERRORS,WARNINGS,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED,CREATED_TMP_DISK_TABLES,
CREATED_TMP_TABLES,SELECT_FULL_JOIN,SELECT_FULL_RANGE_JOIN,SELECT_RANGE,SELECT_RANGE_CHECK,SELECT_SCAN,
SORT_MERGE_PASSES,SORT_RANGE,SORT_ROWS,SORT_SCAN,NO_INDEX_USED,NO_GOOD_INDEX_USED,NESTING_EVENT_ID,
NESTING_EVENT_TYPE,NESTING_EVENT_LEVEL,STATEMENT_ID,CPU_TIME,MAX_CONTROLLED_MEMORY,MAX_TOTAL_MEMORY,
EXECUTION_ENGINE,thread_id
...<csv data>...

----------------------------------------------------------
STATEMENT HISTORY PER THREAD FOR THREADS USING MOST MEMORY
----------------------------------------------------------
THREAD_ID,EVENT_ID,END_EVENT_ID,EVENT_NAME,SOURCE,TIMER_START,TIMER_END,TIMER_WAIT,LOCK_TIME,SQL_TEXT,
DIGEST,DIGEST_TEXT,CURRENT_SCHEMA,OBJECT_TYPE,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_INSTANCE_BEGIN,MYSQL_ERRNO,
RETURNED_SQLSTATE,MESSAGE_TEXT,ERRORS,WARNINGS,ROWS_AFFECTED,ROWS_SENT,ROWS_EXAMINED,CREATED_TMP_DISK_TABLES,
CREATED_TMP_TABLES,SELECT_FULL_JOIN,SELECT_FULL_RANGE_JOIN,SELECT_RANGE,SELECT_RANGE_CHECK,SELECT_SCAN,
SORT_MERGE_PASSES,SORT_RANGE,SORT_ROWS,SORT_SCAN,NO_INDEX_USED,NO_GOOD_INDEX_USED,NESTING_EVENT_ID,
NESTING_EVENT_TYPE,NESTING_EVENT_LEVEL,STATEMENT_ID,CPU_TIME,MAX_CONTROLLED_MEMORY,MAX_TOTAL_MEMORY,
EXECUTION_ENGINE,thread_id
...<csv data>...

-----------------------------
EVENTS ALLOCATING MOST MEMORY
-----------------------------
processlist_id,processlist_user,thread_id,event_name,count_alloc,count_free,current_count_used,
current_number_of_bytes_used
...<csv data>...

----------------------
BUFFER POOL STATISTICS
----------------------
POOL_ID,POOL_SIZE,FREE_BUFFERS,DATABASE_PAGES,OLD_DATABASE_PAGES,MODIFIED_DATABASE_PAGES,PENDING_DECOMPRESS,
PENDING_READS,PENDING_FLUSH_LRU,PENDING_FLUSH_LIST,PAGES_MADE_YOUNG,PAGES_NOT_MADE_YOUNG,PAGES_MADE_YOUNG_RATE,
PAGES_MADE_NOT_YOUNG_RATE,NUMBER_PAGES_READ,NUMBER_PAGES_CREATED,NUMBER_PAGES_WRITTEN,PAGES_READ_RATE,
PAGES_CREATE_RATE,PAGES_WRITTEN_RATE,NUMBER_PAGES_GET,HIT_RATE,YOUNG_MAKE_PER_THOUSAND_GETS,
NOT_YOUNG_MAKE_PER_THOUSAND_GETS,NUMBER_PAGES_READ_AHEAD,NUMBER_READ_AHEAD_EVICTED,READ_AHEAD_RATE,
READ_AHEAD_EVICTED_RATE,LRU_IO_TOTAL,LRU_IO_CURRENT,UNCOMPRESS_TOTAL,UNCOMPRESS_CURRENT
...<csv data>...

Related Topics