{ "cells": [ { "cell_type": "markdown", "id": "81880ec8", "metadata": {}, "source": [ "\n", "
by the Oracle AutoMLx Team
\n", "\n", "***" ] }, { "cell_type": "markdown", "id": "ed4786de", "metadata": {}, "source": [ "Recommendation Demo Notebook.\n", "\n", "Copyright © 2025, Oracle and/or its affiliates.\n", "\n", "Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/" ] }, { "cell_type": "markdown", "id": "96fd4d93", "metadata": {}, "source": [ "# Overview of this Notebook\n", "\n", "In this notebook we will build a recommender using the Oracle AutoMLx tool for the Movielens 100k dataset to predict the next item that users will most likely watch, based on their ratings history.\n", "We explore the various options provided by the Oracle AutoMLx tool, which allow the user to control the AutoMLx training process. Finally, we evaluate the different models trained by AutoMLx. Depending on the machine, running this notebook can take on the order of minutes.\n", "\n", "---\n", "## Prerequisites:\n", "\n", " - Experience level: Novice (Python and Machine Learning)\n", " - Professional experience: Some industry experience\n", "---\n", "\n", "## Business Use:\n", "\n", "Data analytics and modeling problems using Machine Learning (ML) are becoming popular and often rely on data science expertise to build accurate ML models. Such modeling tasks primarily involve the following steps:\n", "- Preprocess the dataset (clean, impute, engineer features, normalize).\n", "- Pick an appropriate model for the given dataset and prediction task at hand.\n", "- Tune the chosen model’s hyperparameters for the given dataset.\n", "\n", "All of these steps are time-consuming and rely heavily on data scientist expertise. To make the problem harder, the best feature subset, model, and hyperparameter choices vary widely with the dataset and the prediction task. Hence, there is no one-size-fits-all solution to achieve reasonably good model performance. 
Using a simple Python API, AutoML can quickly jump-start the data science process with an accurately tuned model and appropriate features for a given prediction task.\n", "\n", "## Table of Contents\n", "\n", "- Setup\n", "- Load the Movielens 100k dataset\n", " - Define the column types\n", " - Splitting the dataset\n", "- AutoML\n", " - Create an Instance of AutoMLx\n", " - Train a Model using AutoMLx\n", " - Generate recommendations \n", " - Analyze the AutoMLx optimization process \n", " - Algorithm Selection\n", " - Hyperparameter Tuning\n", " - Advanced AutoMLx Configuration\n", " - Use a custom validation set\n", " - Final evaluation of the best model\n", "\n", "\n", "## Setup\n", "\n", "Basic setup for the Notebook." ] }, { "cell_type": "code", "execution_count": 1, "id": "f21246b4", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:34:49.524994Z", "iopub.status.busy": "2025-04-25T10:34:49.524508Z", "iopub.status.idle": "2025-04-25T10:34:56.014433Z", "shell.execute_reply": "2025-04-25T10:34:56.013264Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2025-04-25 03:34:51,937] [automlx.backend] Overwriting ray session directory to /tmp/1frrc9a7/ray, which will be deleted at engine shutdown. 
If you wish to retain ray logs, provide _temp_dir in ray_setup dict of engine_opts when initializing the AutoMLx engine.\n" ] } ], "source": [ "\n", "\n", "import datetime\n", "import logging\n", "import os\n", "import time\n", "import urllib\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from automlx import AutoRecommender, init\n", "\n", "# Settings for plots\n", "plt.rcParams[\"figure.figsize\"] = [10, 7]\n", "plt.rcParams[\"font.size\"] = 15\n", "\n", "# Silence unnecessary warnings\n", "logging.getLogger(\"sanerec.autotuning.parameter\").setLevel(logging.ERROR)\n", "\n", "# Initialize the parallelization engine of AutoMLx\n", "init(engine='ray', engine_opts={\"ray_setup\": {\"log_to_driver\": False}})" ] }, { "cell_type": "markdown", "id": "60de0bc8", "metadata": {}, "source": [ "\n", "## Load Movielens 100k data\n", "The Movielens 100k dataset is one of the most common public datasets for movie recommendation. It contains 100k ratings from about 1k users on 1.6k movies, some information about user demographics, and additional movie characteristics. For more information about this dataset, you can visit the [Movielens website](https://grouplens.org/datasets/movielens/100k/).\n", "\n", "In this demo, we use the ratings to train a movie recommendation model, exploiting AutoMLx to find the best recommendation model and hyperparameters to use in terms of recommendation accuracy.\n", "We therefore start by retrieving and loading the ratings data of the Movielens 100k dataset.\n", "To make this notebook lighter and quicker, we also subsample the ratings in the dataset, keeping only 50%."
] }, { "cell_type": "code", "execution_count": 2, "id": "10d71715", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:34:56.019370Z", "iopub.status.busy": "2025-04-25T10:34:56.018522Z", "iopub.status.idle": "2025-04-25T10:35:01.532354Z", "shell.execute_reply": "2025-04-25T10:35:01.531578Z" }, "lines_to_next_cell": 2 }, "outputs": [], "source": [ "\n", "\n", "get_ipython().system(' wget https://files.grouplens.org/datasets/movielens/ml-100k/u.data --no-check-certificate -q -O ./ml100k_interactions.tsv')" ] }, { "cell_type": "code", "execution_count": 3, "id": "d5f67d34", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:35:01.534915Z", "iopub.status.busy": "2025-04-25T10:35:01.534278Z", "iopub.status.idle": "2025-04-25T10:35:01.580798Z", "shell.execute_reply": "2025-04-25T10:35:01.580270Z" } }, "outputs": [ { "data": { "text/html": [

|  | user_id | movie_id | rating | timestamp |
|---|---|---|---|---|
| 43660 | 508 | 185 | 5 | 883777430 |
| 87278 | 518 | 742 | 5 | 876823804 |
| 14317 | 178 | 28 | 5 | 882826806 |
| 81932 | 899 | 291 | 4 | 884122279 |
| 95321 | 115 | 117 | 4 | 881171009 |
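
The loading and 50% subsampling step described above can be sketched with pandas. This is a minimal sketch, not the notebook's own cell: the helper name `load_and_subsample` and the `seed` value are hypothetical, and the column names are taken from the table shown above. The real file would be `./ml100k_interactions.tsv` (tab-separated, no header row); a tiny in-memory sample stands in for it here so the snippet runs without a download.

```python
import io

import pandas as pd

# Column names as they appear in the ratings table of this notebook.
COLUMNS = ["user_id", "movie_id", "rating", "timestamp"]


def load_and_subsample(source, frac=0.5, seed=0):
    """Read tab-separated ratings and keep a random fraction of the rows.

    `source` may be a file path (e.g. "./ml100k_interactions.tsv") or a
    file-like object. `seed` is an arbitrary value chosen for this sketch.
    """
    ratings = pd.read_csv(source, sep="\t", names=COLUMNS, header=None)
    return ratings.sample(frac=frac, random_state=seed)


# Four rows in the u.data layout (user, item, rating, timestamp), used as a stand-in.
demo_tsv = (
    "508\t185\t5\t883777430\n"
    "518\t742\t5\t876823804\n"
    "178\t28\t5\t882826806\n"
    "899\t291\t4\t884122279\n"
)
subsampled = load_and_subsample(io.StringIO(demo_tsv))
print(subsampled)
```

Because `sample` draws rows at random, the original row labels are kept but appear out of order, which matches the shuffled index values in the table above.
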

| timestamp | user_id | movie_id | rating |
|---|---|---|---|
| 883777430 | 508 | 185 | 5 |
| 876823804 | 518 | 742 | 5 |
| 882826806 | 178 | 28 | 5 |
| 884122279 | 899 | 291 | 4 |
| 881171009 | 115 | 117 | 4 |
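
For a next-item prediction task, one common way to split timestamped ratings is to hold out each user's most recent interaction and train on the rest. The sketch below illustrates that idea only; whether this notebook's split works exactly this way is an assumption, and the helper name `leave_last_out` is hypothetical.

```python
import pandas as pd


def leave_last_out(ratings, user_col="user_id", time_col="timestamp"):
    """Split ratings into (train, held_out), holding out each user's latest row."""
    ordered = ratings.sort_values([user_col, time_col], kind="stable")
    # After sorting, the last occurrence of each user is that user's latest rating.
    is_last = ~ordered.duplicated(subset=user_col, keep="last")
    return ordered[~is_last], ordered[is_last]


# Small made-up frame, used only to show the behaviour.
ratings = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2],
        "movie_id": [10, 11, 12, 10, 13],
        "rating": [5, 4, 3, 4, 5],
        "timestamp": [100, 300, 200, 50, 60],
    }
)
train, held_out = leave_last_out(ratings)
print(held_out)
```

Each user contributes exactly one held-out row, so a model trained on `train` can be scored on whether it ranks that row's movie among its recommendations.
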

|  | user_id | movie_id | score |
|---|---|---|---|
| 0 | 628 | 330 | 15.370814 |
| 1 | 628 | 286 | 15.029380 |
| 2 | 628 | 258 | 14.119169 |
| 3 | 628 | 272 | 13.703196 |
| 4 | 628 | 313 | 13.564656 |

|  |
|---|
| None |
| None |
| ManualSplit(Shuffle=False, Seed=7) |
| SanerecMetric |
| ItemKNNRecommender |
| {'n_recommendations': 10, 'num_of_neighbors': 506, 'bias': 0.0001, 'hist_len': 10, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} |
| 25.2.1 |
| 3.9.21 (main, Dec 11 2024, 16:24:11) [GCC 11.2.0] |

| Step | # Samples | # Features | Algorithm | Hyperparameters | Score (SanerecMetric) | All Metrics | Runtime (Seconds) | Memory Usage (GB) | Finished |
|---|---|---|---|---|---|---|---|---|---|
| Model Selection | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 100, 'bias': 25, 'hist_len': 20, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0882 | {'hr': 0.08820403825717323} | 1.0319 | 0.7063 | Fri Apr 25 03:35:20 2025 |
| Model Selection | 48114 | 2 | AlsRecommender | {'n_recommendations': 10, 'iterations': 10, 'factors': 16, 'regularization': 0.01, 'cache_users_states': True} | 0.0691 | {'hr': 0.06907545164718384} | 4.5845 | 0.7055 | Fri Apr 25 03:35:19 2025 |
| Model Selection | 48114 | 2 | TRexxRecommender | {'n_recommendations': 10, 'embedding_dim': 32, 'sequence_length': 5, 'num_sampled': 100, 'dropout_rate': 0.2, 'num_blocks': 2, 'num_head': 4, 'l2_reg_embedding': 1e-06, 'dnn_activation': 'tanh', 'optimizer_name': 'lazyadam', 'optimizer_learning_rate': 0.01, 'future_blinding': False, 'embeddings_on_cpu': False, 'cache_users_states': False, 'negative_sampling_method': CandidateSamplingMethod.UNIFORM_CANDIDATE_SAMPLING, 'epochs': 10, 'batch_size': 512, 'verbose': 1, 'augment_data': True, 'early_stopping_patience': -1} | 0.0531 | {'hr': 0.053134962805526036} | 35.1017 | 1.1860 | Fri Apr 25 03:35:56 2025 |
| Model Selection | 48114 | 2 | BprRecommender | {'n_recommendations': 10, 'iterations': 10, 'factors': 16, 'regularization': 0.01, 'cache_users_states': True} | 0.0372 | {'hr': 0.03719447396386823} | 0.4148 | 0.7046 | Fri Apr 25 03:35:21 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 505, 'bias': 0.0001, 'hist_len': 10, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0999 | {'hr': 0.09989373007438895} | 1.3955 | 0.6848 | Fri Apr 25 03:36:10 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 506, 'bias': 0.0001, 'hist_len': 10, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0999 | {'hr': 0.09989373007438895} | 1.0910 | 0.6842 | Fri Apr 25 03:36:12 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 506, 'bias': 0.0001, 'hist_len': 10, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0999 | {'hr': 0.09989373007438895} | 1.3582 | 0.6869 | Fri Apr 25 03:36:09 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 28.25660795027468, 'hist_len': 10, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0956 | {'hr': 0.09564293304994687} | 1.4462 | 0.6762 | Fri Apr 25 03:36:08 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 28.26160794927468, 'hist_len': 10, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0956 | {'hr': 0.09564293304994687} | 1.1324 | 0.6785 | Fri Apr 25 03:36:10 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 28.26160794927468, 'hist_len': 10, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0956 | {'hr': 0.09564293304994687} | 1.3547 | 0.6758 | Fri Apr 25 03:36:08 2025 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 0.0001, 'hist_len': 20, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.084 | {'hr': 0.08395324123273114} | 1.3191 | 0.6742 | Fri Apr 25 03:36:08 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 0.0001, 'hist_len': 21, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.084 | {'hr': 0.08395324123273114} | 1.3325 | 0.6755 | Fri Apr 25 03:36:08 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 505, 'bias': 25, 'hist_len': 20, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0829 | {'hr': 0.08289054197662062} | 1.1355 | 1.1758 | Fri Apr 25 03:36:01 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 506, 'bias': 25, 'hist_len': 20, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0829 | {'hr': 0.08289054197662062} | 1.3934 | 1.1758 | Fri Apr 25 03:36:02 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 752, 'bias': 25, 'hist_len': 20, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0818 | {'hr': 0.0818278427205101} | 1.4294 | 1.1812 | Fri Apr 25 03:36:05 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 753, 'bias': 25, 'hist_len': 20, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0818 | {'hr': 0.0818278427205101} | 1.2525 | 1.1812 | Fri Apr 25 03:36:04 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 0.0001, 'hist_len': 132, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0797 | {'hr': 0.07970244420828905} | 1.4449 | 0.6758 | Fri Apr 25 03:36:09 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 0.0001, 'hist_len': 133, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0797 | {'hr': 0.07970244420828905} | 1.3044 | 0.6773 | Fri Apr 25 03:36:08 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 0.0001, 'hist_len': 255, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0797 | {'hr': 0.07970244420828905} | 1.4471 | 0.6760 | Fri Apr 25 03:36:09 2025 |
| Model Tuning | 48114 | 2 | ItemKNNRecommender | {'n_recommendations': 10, 'num_of_neighbors': 10, 'bias': 0.0001, 'hist_len': 256, 'reciprocal_ranking': False, 'normalize_scores': False, 'cache_users_states': True} | 0.0797 | {'hr': 0.07970244420828905} | 1.4831 | 0.6730 | Fri Apr 25 03:36:09 2025 |
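
The `hr` values in the trials table are hit-rate scores. As a rough illustration of what such a metric measures, the sketch below computes the textbook hit rate at k: the fraction of users whose held-out item appears in their top-k recommendations. It is an assumption that AutoMLx's `SanerecMetric` follows this exact definition; the function name `hit_rate_at_k` and the toy data are hypothetical.

```python
def hit_rate_at_k(recommendations, held_out, k=10):
    """Fraction of users whose held-out item is in their top-k recommendations.

    recommendations: dict mapping user id -> ranked list of item ids (best first)
    held_out: dict mapping user id -> the single held-out item id
    Users without any recommendations count as misses.
    """
    if not held_out:
        return 0.0
    hits = sum(
        1
        for user, item in held_out.items()
        if item in recommendations.get(user, [])[:k]
    )
    return hits / len(held_out)


# Toy data: users 1 and 3 are hits at k=3, users 2 and 4 are misses.
recommendations = {1: [10, 11, 12], 2: [20, 21, 22], 3: [30, 31, 32], 4: [40, 41, 42]}
held_out = {1: 11, 2: 99, 3: 30, 4: 77}

print(hit_rate_at_k(recommendations, held_out, k=3))  # 0.5
print(hit_rate_at_k(recommendations, held_out, k=1))  # 0.25
```

Under this definition a score of about 0.10 with `n_recommendations` set to 10 would mean that roughly one user in ten had their held-out movie among the ten recommended.
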

|  | user_id | movie_id | score |
|---|---|---|---|
| 0 | 628 | 286 | 13.964525 |
| 1 | 628 | 330 | 13.761287 |
| 2 | 628 | 272 | 12.332191 |
| 3 | 628 | 331 | 12.210435 |
| 4 | 628 | 313 | 12.163628 |
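
The two top-5 recommendation tables shown for user 628 in this notebook can be compared directly. The sketch below measures how many movies the two lists share; the movie ids are copied from those tables, while the helper name `overlap_at_k` is hypothetical and the choice of overlap as a comparison measure is an illustration rather than part of the original notebook.

```python
# Movie ids from the two recommendation tables for user 628, in ranked order.
first_top5 = [330, 286, 258, 272, 313]
second_top5 = [286, 330, 272, 331, 313]


def overlap_at_k(list_a, list_b):
    """Share of items common to both lists, over the shorter list's length."""
    k = min(len(list_a), len(list_b))
    if k == 0:
        return 0.0
    shared = set(list_a[:k]) & set(list_b[:k])
    return len(shared) / k


shared_items = sorted(set(first_top5) & set(second_top5))
print(shared_items)                            # [272, 286, 313, 330]
print(overlap_at_k(first_top5, second_top5))   # 0.8
```

Four of the five recommended movies coincide; the lists differ in one item (258 versus 331) and in the order of the top two.
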