Machine Learning
Configuring Machine Learning: Product Classification Prediction
This chapter contains information to help you better understand a machine learning model. It is organized as follows:
- Creating a Model
- Interpreting the Training Results
- Interpreting the Prediction Results
- Understanding the Event Model
Creating a Product Classification Model
How much shipment history is needed to train a model?
The amount depends on your data, but the minimum is several months of shipment history. At least one year of shipment history is generally recommended, especially if there are seasonal patterns that you want the model to capture. Some customers have seen acceptable results with only several months of data. Run a few tests with different history lengths to find the optimal setting.
How do I tune a model?
Only one Product Class Type per scenario. All items in a Machine Learning Scenario must share a single GTM_PROD_CLASS_TYPE_GID.
Minimum frequency for 6-digit HS code prefixes. Each 6-digit HS code prefix must appear at least 3 times in the dataset. Prefixes with fewer than 3 occurrences will be removed.
Sufficient items per product classification code. Product codes with only 1 associated item will be excluded.
Limit columns with high null values. Columns with more than 95% null values will be dropped. Ensure important columns have sufficient non-null entries if they are meant to influence model training.
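The three filtering rules above (minimum prefix frequency, minimum items per code, and the null-value limit) can be illustrated with a minimal sketch. This is not the product's implementation. The function names, the `hs_code` key, the row-as-dictionary layout, and the order in which the rules are applied are all assumptions made for illustration.

```python
from collections import Counter


def hs_prefix(code, length=6):
    """Return the first `length` digits of an HS code, ignoring separators."""
    digits = "".join(ch for ch in code if ch.isdigit())
    return digits[:length]


def filter_training_rows(rows, code_key="hs_code", min_prefix_count=3,
                         min_items_per_code=2, max_null_ratio=0.95):
    """Apply the three filtering rules to a list of dict rows (hypothetical layout)."""
    # Rule 1: remove rows whose 6-digit prefix appears fewer than 3 times.
    prefix_counts = Counter(hs_prefix(r[code_key]) for r in rows)
    rows = [r for r in rows
            if prefix_counts[hs_prefix(r[code_key])] >= min_prefix_count]

    # Rule 2: exclude classification codes that have only one associated item.
    code_counts = Counter(r[code_key] for r in rows)
    rows = [r for r in rows if code_counts[r[code_key]] >= min_items_per_code]
    if not rows:
        return []

    # Rule 3: drop columns in which more than 95% of the values are null.
    columns = {key for r in rows for key in r}
    dropped = {
        c for c in columns
        if c != code_key
        and sum(r.get(c) is None for r in rows) / len(rows) > max_null_ratio
    }
    return [{k: v for k, v in r.items() if k not in dropped} for r in rows]


sample_rows = [
    {"hs_code": "8471300100", "desc": "laptop", "note": None},
    {"hs_code": "8471300100", "desc": "notebook", "note": None},
    {"hs_code": "8471300200", "desc": "tablet", "note": None},
    {"hs_code": "0101210000", "desc": "horse", "note": None},
    {"hs_code": "0101210000", "desc": "pony", "note": None},
]
kept_rows = filter_training_rows(sample_rows)
```

In this sample, the two rows with prefix 010121 fail the prefix rule, the single 8471300200 row fails the items-per-code rule, and the always-empty `note` column is dropped, so two rows with the `hs_code` and `desc` columns remain.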
Train/test split with stratification (80/20). Stratified sampling is used to maintain class distribution in both training and testing datasets.
Example: For 100 items spread evenly across 5 classes (20 items per class), the training set contains 80 items (16 per class) and the test set contains 20 items (4 per class).
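The arithmetic in the example above can be reproduced with a small hand-rolled stratified split. This is a sketch for illustration only, not the splitting routine the product uses. The function name `stratified_split`, the rounding of the per-class test count, and the fixed random seed are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction from every class so the class
        # distribution is preserved in both sets.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# 100 items spread evenly across 5 classes (20 items per class).
labels = [f"class_{i % 5}" for i in range(100)]
train_idx, test_idx = stratified_split(labels)

train_per_class = Counter(labels[i] for i in train_idx)
test_per_class = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx))  # 80 20
```

Because every class contributes the same fraction to the test set, `train_per_class` holds 16 items for each class and `test_per_class` holds 4.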
Sufficient data relative to number of classes. Both the training set and the test set must contain at least as many rows as there are classes. If either set is smaller than the number of classes, add more data to meet the minimum row-to-class ratio.
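A quick way to check this requirement before training is sketched below. The function names are hypothetical, and the sketch assumes that the test set size is the row count multiplied by the test percentage, rounded down, and that the test percentage is 50 or less (so the test set is the smaller of the two). The product may round differently.

```python
def check_split_sizes(n_rows, n_classes, test_percent=20):
    """Report train/test sizes and whether both cover the number of classes."""
    # Integer arithmetic avoids floating-point rounding surprises.
    n_test = n_rows * test_percent // 100
    n_train = n_rows - n_test
    return {
        "train": n_train,
        "test": n_test,
        "ok": min(n_train, n_test) >= n_classes,
    }


def min_rows_needed(n_classes, test_percent=20):
    """Smallest row count whose test set has at least n_classes rows.

    Assumes test_percent <= 50, so the test set is the smaller set.
    """
    # Ceiling division: ceil(n_classes * 100 / test_percent).
    return -(-n_classes * 100 // test_percent)


print(check_split_sizes(100, 5))    # {'train': 80, 'test': 20, 'ok': True}
print(check_split_sizes(100, 30))   # {'train': 80, 'test': 20, 'ok': False}
print(min_rows_needed(30))          # 150
```

Under these assumptions, an 80/20 split needs at least five rows for every class in the dataset, so a scenario with 30 classes needs at least 150 rows.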