MySQL HeatWave User Guide
MySQL 9.2.2 introduces the ability to detect anomalies in log data. To perform anomaly detection on logs, log data is cleaned, segemented, and encoded before running anomaly detection. This feature leverages the log template miner Drain3.
Consider the following when running anomaly detection on logs.
The input table can only have the following columns:
The column containing the logs.
If including logs from different sources, a column containing the source of each log. The values in this column contain the names of the sources that each log belongs to. These values are used to group each host's logs together. If this column is not present, it is assumed that all logs originate from the same source.
If including labeled data, a column identifying the labeled log lines. See Semi-supervised Anomaly Detection to learn more.
At least one column must act as the primary key to
establish the temporal order of logs. If the primary
key column (or columns) is not one of the previous
required columns (log data, source of log, or label),
then you must use the
exclude_column_list
option when
running ML_TRAIN
to
exclude all primary key columns that don't include
required data. See
Syntax Examples for Anomaly Detection Training
to review relevant examples.
If the input table has additional columns to the ones
permitted, you must use the
exclude_column_list
option when running
ML_TRAIN
to exclude
irrelevant columns.
The data collected for anomaly detection can be unsupervised or semi-supervised. To run semi-supervised anomaly detection, you can provide a separate column in the input table with labels for the labeled log lines. This column labels identified anomalous logs with a value of 1, non-anomalous logs with 0, and unlabeled logs with NULL. See Semi-supervised Anomaly Detection to learn more.
In addition to the anomaly scores included in the output table, you have the option to leverage MySQL HeatWave GenAI to provide textual log summaries.
By default the following parameters are masked in the
input data (training or test data): IP, DATETIME, TIME,
HEX, IPPORT, and OCID. You have the option to mask
additional regex patterns with the
additional_masking_regex
option.
Learn more about the following:
Learn how to Prepare Data for an Anomaly Detection Model.