MySQL HeatWave User Guide
This topic describes how to prepare the data to use for a classification machine learning model. It uses a data sample generated by OCI GenAI. The classification use-case is to approve or reject loan applications for clients based on their personal and socioeconomic status, assets, liabilities, credit rating, and past loan details. To prepare the data for this use case, you set up a training dataset and a testing dataset. The training dataset has 20 records, and the testing dataset has 10 records. In a real-life use case, you should prepare a larger amount of records for training and testing, and ensure the predictions are valid and reliable before testing on unlabeled data. To ensure reliable predictions, you should create an additional validation dataset. You can reserve 20% of the records in the training dataset to create the validation dataset.
Learn how to Prepare Data.
To prepare the data for the classification model:
Create and use the database to store the data.
mysql>CREATE DATABASE classification_data;
mysql>USE classification_data;
Create the table to insert the sample data into. This is the training dataset.
mysql> CREATE TABLE Loan_Training (
ClientID INT PRIMARY KEY,
ClientAge INT NOT NULL,
Gender VARCHAR(10) NOT NULL,
Education VARCHAR(50) NOT NULL,
Occupation VARCHAR(50) NOT NULL,
Income REAL NOT NULL,
Debt REAL NOT NULL,
CreditScore INT NOT NULL,
Assets REAL NOT NULL,
Liabilities REAL NOT NULL,
LoanType VARCHAR(20) NOT NULL,
LoanAmount REAL NOT NULL,
Approved VARCHAR(10) NOT NULL
);
Insert the sample data into the table. Copy and paste the following commands.
INSERT INTO Loan_Training (ClientID, ClientAge, Gender, Education, Occupation, Income, Debt, CreditScore, Assets, Liabilities, LoanType, LoanAmount, Approved) VALUES
(101, 30, 'Male', 'Bachelor''s', 'Engineer', 75000, 15000, 700, 300000, 80000, 'Home', 250000, 'Approved'),
(102, 25, 'Female', 'Master''s', 'Analyst', 60000, 10000, 680, 200000, 50000, 'Personal', 120000, 'Rejected'),
(103, 40, 'Male', 'High School', 'Manager', 80000, 20000, 650, 450000, 120000, 'Business', 150000, 'Approved'),
(104, 35, 'Female', 'PhD', 'Doctor', 120000, 30000, 750, 600000, 250000, 'Car', 30000, 'Approved'),
(105, 28, 'Male', 'College', 'IT Specialist', 55000, 8000, 620, 280000, 90000, 'Education', 80000, 'Rejected'),
(106, 45, 'Female', 'Bachelor''s', 'Teacher', 70000, 15000, 720, 500000, 180000, 'Home', 200000, 'Approved'),
(107, 32, 'Male', 'Associate', 'Sales', 65000, 12000, 670, 350000, 100000, 'Vacation', 18000, 'Rejected'),
(108, 22, 'Female', 'College', 'Student', 30000, 5000, 660, 150000, 40000, 'Education', 10000, 'Approved'),
(109, 50, 'Male', 'Master''s', 'Lawyer', 110000, 40000, 780, 700000, 350000, 'Investment', 500000, 'Rejected'),
(110, 38, 'Female', 'High School', 'Nurse', 52000, 18000, 640, 220000, 120000, 'Medical', 35000, 'Approved'),
(111, 48, 'Male', 'Diploma', 'Plumber', 48000, 10000, 600, 180000, 70000, 'Home Improvement', 25000, 'Rejected'),
(112, 55, 'Female', 'Bachelor''s', 'Writer', 90000, 25000, 760, 400000, 200000, 'Retirement', 150000, 'Approved'),
(113, 36, 'Male', 'Master''s', 'Accountant', 78000, 22000, 740, 380000, 150000, 'Refinance', 200000, 'Approved'),
(114, 24, 'Female', 'College', 'Designer', 45000, 7000, 610, 250000, 100000, 'Startup', 50000, 'Rejected'),
(115, 42, 'Male', 'PhD', 'Scientist', 130000, 50000, 800, 550000, 300000, 'Research', 400000, 'Approved'),
(116, 52, 'Female', 'Master''s', 'Marketer', 85000, 35000, 770, 480000, 280000, 'Marketing', 120000, 'Rejected'),
(117, 34, 'Male', 'Bachelor''s', 'Programmer', 68000, 16000, 690, 320000, 110000, 'Equipment', 85000, 'Approved'),
(118, 26, 'Female', 'Associate', 'Retail', 42000, 6000, 630, 200000, 70000, 'Wedding', 28000, 'Rejected'),
(119, 46, 'Male', 'College', 'Pilot', 100000, 45000, 710, 520000, 250000, 'Boat', 350000, 'Approved'),
(120, 58, 'Female', 'PhD', 'Professor', 140000, 60000, 820, 650000, 450000, 'Real Estate', 550000, 'Rejected');
Create the table to use for generating predictions and
explanations. This is the test dataset. It has the same
columns as the training dataset, but the target column,
Approved
, is not considered when
generating predictions or explanations.
mysql> CREATE TABLE Loan_Testing (
ClientID INT PRIMARY KEY,
ClientAge INT NOT NULL,
Gender VARCHAR(10) NOT NULL,
Education VARCHAR(50) NOT NULL,
Occupation VARCHAR(50) NOT NULL,
Income REAL NOT NULL,
Debt REAL NOT NULL,
CreditScore INT NOT NULL,
Assets REAL NOT NULL,
Liabilities REAL NOT NULL,
LoanType VARCHAR(20) NOT NULL,
LoanAmount REAL NOT NULL,
Approved VARCHAR(10) NOT NULL
);
Insert the sample data into the table. Copy and paste the following commands.
INSERT INTO Loan_Testing (ClientID, ClientAge, Gender, Education, Occupation, Income, Debt, CreditScore, Assets, Liabilities, LoanType, LoanAmount, Approved) VALUES
(201, 38, 'Male', 'College', 'Architect', 62000, 18000, 660, 380000, 160000, 'Home', 280000, 'Approved'),
(202, 29, 'Female', 'Master''s', 'HR Manager', 58000, 12000, 690, 260000, 110000, 'Personal', 150000, 'Rejected'),
(203, 44, 'Male', 'Bachelor''s', 'Chef', 72000, 25000, 730, 420000, 200000, 'Business', 180000, 'Approved'),
(204, 56, 'Female', 'PhD', 'Psychologist', 105000, 35000, 790, 580000, 320000, 'Car', 40000, 'Rejected'),
(205, 31, 'Male', 'High School', 'Carpenter', 50000, 8000, 610, 240000, 85000, 'Education', 90000, 'Approved'),
(206, 27, 'Female', 'College', 'Artist', 48000, 7000, 640, 220000, 95000, 'Art', 150000, 'Rejected'),
(207, 49, 'Male', 'Associate', 'Electrician', 55000, 15000, 670, 300000, 120000, 'Home Improvement', 20000, 'Approved'),
(208, 53, 'Female', 'Bachelor''s', 'Journalist', 88000, 30000, 750, 460000, 280000, 'Travel', 180000, 'Rejected'),
(209, 37, 'Male', 'Master''s', 'Financial Advisor', 76000, 22000, 700, 360000, 150000, 'Investment', 250000, 'Approved'),
(210, 23, 'Female', 'College', 'Intern', 35000, 5000, 600, 160000, 60000, 'Education', 20000, 'Rejected');
Learn how to Train a Classification Model.