MySQL HeatWave User Guide
This topic describes how to prepare the data to use for two anomaly detection machine learning models: a semi-supervised anomaly detection model, and an unsupervised anomaly detection model for logs. It uses data samples generated by OCI GenAI. To prepare the data for this use case, you set up a training dataset and a testing dataset. In a real-life use case, you should prepare a larger amount of records than these data samples for training and testing, and ensure the predictions are valid and reliable before testing on unlabeled data. To ensure reliable predictions, you should create an additional validation dataset. You can reserve 20% of the records in the training dataset to create the validation dataset.
This topic has the following sections.
Learn how to Prepare Data.
The semi-supervised anomaly detection model (available as of
MySQL 9.0.1-u1) looks for unusual patterns in credit card
transactions. The data has a column,
target
, that has three possible values: 0
for normal, 1 for anomalous, and NULL for unlabeled.
To prepare the data for the semi-supervised anomaly detection model:
Create and use the database to store the data.
mysql>CREATE DATABASE anomaly_data;
mysql>USE anomaly_data;
Create the table to insert the sample data into. This is the training dataset.
mysql> CREATE TABLE credit_card_train (
transaction_id INT AUTO_INCREMENT PRIMARY KEY,
home_address VARCHAR(100),
purchase_location VARCHAR(100),
purchase_amount DECIMAL(10, 2),
purchase_time DATETIME,
target INT
);
Insert the sample data to train into the table. Copy and paste the following commands.
INSERT INTO credit_card_train (home_address, purchase_location, purchase_amount, purchase_time, target)
VALUES
('123 Main St, City A', 'Store X, City A', 50.75, '2023-08-01 14:30:00', 0),
('456 Elm St, City B', 'Cafe B, City B', 15.20, '2023-08-02 09:45:00', 1),
('789 Oak Ave, City C', 'Online Shop', 250.00, '2023-08-03 18:10:00', 0),
('222 Maple Lane, City A', 'Grocery Store A', 35.50, '2023-08-04 11:00:00', NULL),
('555 River Rd, City D', 'Electronics Store, City D', 800.50, '2023-08-05 16:20:00', 1),
('1010 Mountain View, City E', 'Boutique, City E', 120.30, '2023-08-06 10:35:00', 0),
('333 Ocean Blvd, City F', 'Convenience Store, City F', 20.15, '2023-08-07 19:50:00', NULL),
('666 Sky St, City G', 'Luxury Store, City G', 1500.00, '2023-08-08 12:00:00', 1),
('999 Green Valley, City H', 'Hardware Store, City H', 75.90, '2023-08-09 08:40:00', 0),
('111 Sunset Ave, City A', 'Store X, City A', 60.40, '2023-08-10 15:10:00', NULL),
('2222 Country Road, City B', 'Cafe B, City B', 28.75, '2023-08-11 07:30:00', 0),
('3333 Lakeside, City C', 'Online Shop', 180.25, '2023-08-12 13:20:00', 1),
('4444 Forest Glade, City D', 'Grocery Store, City D', 45.60, '2023-08-13 09:50:00', 0),
('5555 Meadow Lane, City E', 'Electronics Store, City E', 300.75, '2023-08-14 17:40:00', NULL),
('6666 Creekside, City F', 'Boutique, City F', 95.50, '2023-08-15 11:30:00', 1),
('7777 Hillcrest, City G', 'Convenience Store, City G', 12.80, '2023-08-16 18:50:00', 0),
('8888 Riverbank, City H', 'Luxury Store, City H', 2200.00, '2023-08-17 14:10:00', NULL),
('9999 Sunrise Blvd, City A', 'Hardware Store, City A', 55.25, '2023-08-18 09:30:00', 0),
('101010 Ocean View, City B', 'Store X, City B', 70.50, '2023-08-19 16:40:00', 1),
('111111 Mountain Rd, City C', 'Cafe C, City C', 32.90, '2023-08-20 11:20:00', NULL),
('121212 Downtown, City D', 'Online Shop', 450.00, '2023-08-21 17:50:00', 0),
('131313 Lakeside Ave, City E', 'Grocery Store, City E', 28.50, '2023-08-22 10:10:00', 1),
('141414 Green Park, City F', 'Electronics Store, City F', 650.75, '2023-08-23 15:30:00', 0),
('151515 Skyway, City G', 'Boutique, City G', 180.40, '2023-08-24 08:50:00', NULL),
('161616 Meadow View, City H', 'Convenience Store, City H', 35.10, '2023-08-25 13:40:00', 0),
('171717 River Rd, City A', 'Luxury Store, City A', 1300.50, '2023-08-26 19:20:00', 1),
('181818 Sunset Blvd, City B', 'Hardware Store, City B', 85.60, '2023-08-27 12:30:00', NULL),
('191919 Country Lane, City C', 'Store Y, City C', 150.20, '2023-08-28 07:40:00', 0),
('202020 Forest Edge, City D', 'Cafe D, City D', 42.75, '2023-08-29 14:50:00', 1),
('212121 Lakeside View, City E', 'Online Shop', 220.50, '2023-08-30 09:20:00', 0),
('222222 Creekside Ave, City F', 'Grocery Store, City F', 55.90, '2023-08-31 16:10:00', NULL);
Create the table to use for generating predictions. This
is the test dataset. It has the same columns as the
training dataset. The target column,
target
, is used for the
sem-supervised component of the training.
mysql> CREATE TABLE credit_card_test (
transaction_id INT AUTO_INCREMENT PRIMARY KEY,
home_address VARCHAR(100),
purchase_location VARCHAR(100),
purchase_amount DECIMAL(10, 2),
purchase_time DATETIME,
target INT
);
Insert the sample data to test into the table. Copy and paste the following commands.
INSERT INTO credit_card_test (home_address, purchase_location, purchase_amount, purchase_time, target)
VALUES
('3030 Riverbank Dr, City I', 'Grocery Store, City I', 52.30, '2023-09-01 10:30:00', 0),
('3131 Mountain Rd, City J', 'Electronics Store, City J', 120.50, '2023-09-02 16:45:00', 0),
('3232 Ocean Ave, City K', 'Boutique, City K', 85.20, '2023-09-03 11:20:00', 1),
('3333 Green Valley, City L', 'Convenience Store, City L', 25.60, '2023-09-04 18:50:00', 0),
('3434 Sunset Blvd, City I', 'Luxury Store, City I', 1600.00, '2023-09-05 14:10:00', 1),
('3535 Country Lane, City J', 'Hardware Store, City J', 68.40, '2023-09-06 09:30:00', 0),
('3636 Lakeside View, City K', 'Store Z, City K', 135.75, '2023-09-07 17:20:00', 0),
('3737 Forest Glade, City L', 'Cafe E, City L', 38.50, '2023-09-08 12:40:00', 1),
('3838 Meadow Lane, City I', 'Online Shop', 280.50, '2023-09-09 08:50:00', 0),
('3939 Creekside Ave, City J', 'Grocery Store, City J', 48.75, '2023-09-10 15:30:00', 0),
('4040 River Rd, City K', 'Electronics Store, City K', 720.25, '2023-09-11 11:10:00', 1),
('4141 Skyway Blvd, City L', 'Boutique, City L', 165.90, '2023-09-12 17:40:00', 0),
('4242 Hillcrest Rd, City I', 'Convenience Store, City I', 22.50, '2023-09-13 10:20:00', 0),
('4343 Riverbank View, City J', 'Luxury Store, City J', 2100.75, '2023-09-14 16:50:00', 1),
('4444 Country Club, City K', 'Hardware Store, City K', 92.30, '2023-09-15 12:30:00', 0),
('4545 Lakeside Ave, City L', 'Store Alpha, City L', 145.60, '2023-09-16 08:40:00', 0),
('4646 Forest Edge, City I', 'Cafe F, City I', 55.80, '2023-09-17 15:20:00', 1),
('4747 Creekside View, City J', 'Online Shop', 320.40, '2023-09-18 11:50:00', 0),
('4848 Meadow Park, City K', 'Grocery Store, City K', 62.50, '2023-09-19 18:30:00', 0),
('4949 River Walk, City L', 'Electronics Store, City L', 550.30, '2023-09-20 14:10:00', 1);
The anomaly detection model for logs (available as of MySQL
9.2.2) looks for unusual patterns in log data. The model
uses unsupervised learning, so the target
column is excluded for training and predicting anomalies.
To prepare the data for the anomaly detection model for logs:
If not already done, create and use the database to store the data.
mysql>CREATE DATABASE anomaly_log_data;
mysql>USE anomaly_log_data;
Create the table to insert the sample data into. This is the training dataset.
mysql> CREATE TABLE training_data (
log_id INT AUTO_INCREMENT PRIMARY KEY,
log_message TEXT,
timestamp DATETIME,
target TINYINT
);
Insert the sample data to be trained into the table. Copy and paste the following commands.
INSERT INTO training_data (log_message, timestamp, target) VALUES
("User login successful: admin", "2023-08-07 09:00:00", 0),
("Database connection established", "2023-08-07 09:05:23", 0),
("Failed login attempt from IP: 192.168.1.20", "2023-08-07 09:12:15", 1),
("Server load is high: 85%", "2023-08-07 09:20:30", 1),
("Normal system behavior", "2023-08-07 09:35:00", 0),
("Anomalous CPU usage spike", "2023-08-07 10:10:45", 1),
("New user registered", "2023-08-07 10:25:00", 0),
("Error: File not found", "2023-08-07 11:02:10", 1),
("System startup completed", "2023-08-07 11:30:00", 0),
("Network packet loss detected", "2023-08-07 12:15:35", 1),
("User activity: John accessed dashboard", "2023-08-07 13:00:20", 0),
("Security alert: Brute force attack detected", "2023-08-07 13:45:55", 1),
("Log rotation completed", "2023-08-07 14:20:00", 0),
("Anomalous memory usage pattern", "2023-08-07 15:05:30", 1),
("User feedback submitted", "2023-08-07 15:40:10", 0),
("System error: Out of memory", "2023-08-07 16:15:25", 1),
("Network connectivity restored", "2023-08-07 16:50:00", 0),
("Unlabeled log entry", NULL, NULL),
("Potential intrusion detected", "2023-08-07 17:35:40", 1),
("User logout: Jane", "2023-08-07 18:10:00", 0);
Create the table to use for generating predictions. This
is the test dataset. It has the same columns as the
training dataset, but the target column,
target
, must be excluded when
generating predictions.
mysql> CREATE TABLE testing_data (
log_id INT AUTO_INCREMENT PRIMARY KEY,
log_message TEXT,
timestamp DATETIME,
target TINYINT
);
Insert the sample data to test into the table. Copy and paste the following commands.
INSERT INTO testing_data (log_message, timestamp, target) VALUES
("User login failed: Invalid credentials", "2023-08-08 10:30:00", 1),
("Server response time increased", "2023-08-08 11:15:45", 1),
("Normal database query", "2023-08-08 12:00:20", 0),
("Unusual network traffic from IP: 10.0.0.5", "2023-08-08 12:45:30", 1),
("System update completed successfully", "2023-08-08 13:30:00", 0),
("Error log: Stack trace included", "2023-08-08 14:10:50", 1),
("User activity: Admin accessed settings", "2023-08-08 15:00:10", 0),
("Unlabeled log: Further investigation needed", NULL, NULL),
("Security alert: Potential malware detected", "2023-08-08 16:25:35", 1),
("System shutdown initiated", "2023-08-08 17:10:00", 0);
Learn how to Train an Anomaly Detection Model.