{ "cells": [ { "cell_type": "markdown", "id": "9e256324", "metadata": {}, "source": [ "***\n", "# Building and Explaining a Regressor using AutoMLx\n", "
by the Oracle AutoMLx Team
\n", "\n", "***" ] }, { "cell_type": "markdown", "id": "8219b040", "metadata": {}, "source": [ "Regression Demo Notebook.\n", "\n", "Copyright © 2025, Oracle and/or its affiliates.\n", "\n", "Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/" ] }, { "cell_type": "markdown", "id": "7429c73d", "metadata": {}, "source": [ "## Overview of this Notebook\n", "\n", "In this notebook, we build a regressor with the Oracle AutoMLx tool on the public California Housing dataset to predict median house prices.\n", "We explore the options Oracle AutoMLx provides for controlling its training process, then evaluate the different models it trains. Depending on the dataset size and the machine, a run can take tens of minutes, so the dataset is sampled down for a snappier demo, with an option to run on the full dataset. Finally, we give an overview of the ways Oracle AutoMLx can explain the predictions of the tuned model.\n", "\n", "---\n", "## Prerequisites:\n", "\n", " - Experience level: Novice (Python and Machine Learning)\n", " - Professional experience: Some industry experience\n", "---\n", "\n", "## Business Use:\n", "\n", "Data analytics and modeling problems using Machine Learning (ML) are becoming popular and often rely on data science expertise to build accurate ML models. Such modeling tasks primarily involve the following steps:\n", "- Preprocess the dataset (clean, impute, engineer features, normalize).\n", "- Pick an appropriate model for the given dataset and prediction task.\n", "- Tune the chosen model’s hyperparameters for the given dataset.\n", "\n", "All of these steps are time-consuming and rely heavily on data science expertise. 
To make matters harder, the best feature subset, model, and hyperparameters vary widely with the dataset and the prediction task, so there is no one-size-fits-all recipe for good model performance. Through a simple Python API, AutoMLx can quickly jump-start the data science process with an accurately tuned model and appropriate features for a given prediction task.\n", "\n", "## Table of Contents\n", "\n", "- Setup\n", "- Load California housing dataset\n", "- AutoML\n", " - Setting the execution engine\n", " - Create an Instance of AutoMLx\n", " - Train a Model using AutoMLx\n", " - Analyze the AutoMLx optimization process\n", " - Algorithm Selection\n", " - Adaptive Sampling\n", " - Feature Selection\n", " - Hyperparameter Tuning\n", " - Advanced AutoMLx Configuration\n", " - Use a custom validation set\n", "- Machine Learning Explainability (MLX)\n", " - Initialize an MLExplainer\n", " - Model Explanations (Global Feature Importance)\n", " - Feature Dependence Explanations (PDP + ICE)\n", " - Prediction Explanations (Local Feature Importance)\n", " - Aggregate Local Feature Importance\n", " - Interactive What-If Explanations\n", " - Counterfactual Explanations\n", " - Advanced Feature Importance Options\n", " - Configure prediction explanation\n", " - Explain the model or explain the world\n", " - Advanced Feature Dependence Options (ALE)\n", "- References" ] }, { "cell_type": "markdown", "id": "8e6581e4", "metadata": {}, "source": [ "\n", "## Setup\n", "\n", "Basic setup for the notebook." 
] }, { "cell_type": "code", "execution_count": 1, "id": "657a4697", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:36:33.861880Z", "iopub.status.busy": "2025-04-25T10:36:33.861443Z", "iopub.status.idle": "2025-04-25T10:36:34.477398Z", "shell.execute_reply": "2025-04-25T10:36:34.476766Z" } }, "outputs": [], "source": [ "\n", "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "id": "ec534f07", "metadata": {}, "source": [ "Load the required modules." ] }, { "cell_type": "code", "execution_count": 2, "id": "2012d12d", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:36:34.479458Z", "iopub.status.busy": "2025-04-25T10:36:34.479204Z", "iopub.status.idle": "2025-04-25T10:36:37.568973Z", "shell.execute_reply": "2025-04-25T10:36:37.568338Z" } }, "outputs": [], "source": [ "import time\n", "import datetime\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import plotly.figure_factory as ff\n", "from sklearn.datasets import fetch_california_housing\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Settings for plots\n", "plt.rcParams['figure.figsize'] = [10, 7]\n", "plt.rcParams['font.size'] = 15\n", "\n", "import automlx\n", "from automlx import init" ] }, { "cell_type": "markdown", "id": "c6587d8e", "metadata": {}, "source": [ "\n", "## Load the California housing dataset using sklearn.datasets\n", "\n", "Dataset details are available here: https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset. The goal is to predict the median house value of a California district from its features. The target is expressed in units of $100,000." 
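The loading step can be sketched as two small helpers. The first combines the features and target into one DataFrame, and the second holds out a test set. This is a minimal sketch and not the notebook's own code. The helper names `to_frame` and `split_70_30` are hypothetical. The 70/30 ratio is an assumption inferred from the training shape (14448, 8) reported in the AutoMLx summary, since 14448 is 70% of 20640.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

TARGET_COL = "Median Price"


def to_frame(X, y, feature_names, target_col=TARGET_COL):
    """Combine a feature matrix and a target vector into one DataFrame."""
    df = pd.DataFrame(np.asarray(X), columns=list(feature_names))
    df[target_col] = np.ravel(y)
    return df


def split_70_30(df, target_col=TARGET_COL, seed=0):
    """Hypothetical helper: hold out 30% of the rows as a test set.

    The seed is arbitrary; any fixed value makes the split reproducible.
    """
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return train_test_split(X, y, test_size=0.3, random_state=seed)


# Usage on the real data (downloads the dataset on first call):
# from sklearn.datasets import fetch_california_housing
# ds = fetch_california_housing()
# df = to_frame(ds.data, ds.target, ds.feature_names)
# X_train, X_test, y_train, y_test = split_70_30(df)
```

Fetching the dataset once and reusing `ds.data`, `ds.target`, and `ds.feature_names` avoids loading the same data twice.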
] }, { "cell_type": "code", "execution_count": 3, "id": "61f405a3", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:36:37.571130Z", "iopub.status.busy": "2025-04-25T10:36:37.570795Z", "iopub.status.idle": "2025-04-25T10:36:37.643865Z", "shell.execute_reply": "2025-04-25T10:36:37.643328Z" } }, "outputs": [ { "data": { "text/plain": [ "(20640, 9)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fetch the dataset once and reuse its arrays and feature names\n", "ds = fetch_california_housing()\n", "X, y = ds.data, ds.target\n", "df = pd.concat([pd.DataFrame(X, columns=ds.feature_names),\n", " pd.DataFrame(y.ravel(), columns=['Median Price'])], axis=1)\n", "\n", "target_col = 'Median Price'\n", "df.shape" ] }, { "cell_type": "code", "execution_count": 4, "id": "536b326c", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:36:37.645702Z", "iopub.status.busy": "2025-04-25T10:36:37.645227Z", "iopub.status.idle": "2025-04-25T10:36:37.679582Z", "shell.execute_reply": "2025-04-25T10:36:37.679046Z" } }, "outputs": [ { "data": { "text/html": [ "\n", " | MedInc | \n", "HouseAge | \n", "AveRooms | \n", "AveBedrms | \n", "Population | \n", "AveOccup | \n", "Latitude | \n", "Longitude | \n", "Median Price | \n", "
---|---|---|---|---|---|---|---|---|---|
0 | \n", "8.3252 | \n", "41.0 | \n", "6.984127 | \n", "1.023810 | \n", "322.0 | \n", "2.555556 | \n", "37.88 | \n", "-122.23 | \n", "4.526 | \n", "
1 | \n", "8.3014 | \n", "21.0 | \n", "6.238137 | \n", "0.971880 | \n", "2401.0 | \n", "2.109842 | \n", "37.86 | \n", "-122.22 | \n", "3.585 | \n", "
2 | \n", "7.2574 | \n", "52.0 | \n", "8.288136 | \n", "1.073446 | \n", "496.0 | \n", "2.802260 | \n", "37.85 | \n", "-122.24 | \n", "3.521 | \n", "
3 | \n", "5.6431 | \n", "52.0 | \n", "5.817352 | \n", "1.073059 | \n", "558.0 | \n", "2.547945 | \n", "37.85 | \n", "-122.25 | \n", "3.413 | \n", "
4 | \n", "3.8462 | \n", "52.0 | \n", "6.281853 | \n", "1.081081 | \n", "565.0 | \n", "2.181467 | \n", "37.85 | \n", "-122.25 | \n", "3.422 | \n", "
\n", " |
---|
(14448, 8) | \n", "
None | \n", "
RepeatedKFoldSplit(Shuffle=True, Seed=7, number of splits=5, number of repeats=2) | \n", "
neg_mean_squared_error | \n", "
LGBMRegressor | \n", "
{'num_leaves': 31, 'boosting_type': 'gbdt', 'subsample': 1, 'colsample_bytree': 0.7952797110155084, 'max_depth': 63, 'reg_alpha': 0, 'reg_lambda': 0, 'n_estimators': 376, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "
25.2.1 | \n", "
3.9.21 (main, Dec 11 2024, 16:24:11) \\n[GCC 11.2.0] | \n", "
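The summary reports repeated 5-fold cross-validation with 2 repeats and seed 7, scored with `neg_mean_squared_error`. A comparable setup can be sketched with scikit-learn's `RepeatedKFold` and `cross_val_score`. This is an analogue under stated assumptions, not AutoMLx's internal `RepeatedKFoldSplit`. The synthetic data and the `LinearRegression` model are placeholders. The setup yields 5 x 2 = 10 folds, which matches the ten fold keys (1 to 10) in the `# Samples` column of the trials table.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score


def repeated_cv_scores(model, X, y, n_splits=5, n_repeats=2, seed=7):
    """Score `model` with repeated K-fold CV; returns one score per fold."""
    # RepeatedKFold reshuffles the data differently in each repetition.
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    # scikit-learn negates MSE so that "higher is better" holds for every scorer.
    return cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")


# Synthetic stand-in data: a linear signal plus Gaussian noise (variance 0.25).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=500)

scores = repeated_cv_scores(LinearRegression(), X, y)
print(len(scores))    # one score per fold: 5 splits x 2 repeats
print(scores.mean())  # negative; closer to 0 is better
```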
Step | \n", "# Samples | \n", "# Features | \n", "Algorithm | \n", "Hyperparameters | \n", "Score (neg_mean_squared_error) | \n", "All Metrics | \n", "Runtime (Seconds) | \n", "Memory Usage (GB) | \n", "Finished | \n", "
---|---|---|---|---|---|---|---|---|---|
Model Selection | \n", "{1: 4000, 2: 4000, 3: 4000, 4: 4000, 6: 4000, 5: 4000, 7: 4000, 8: 4000, 10: 4000, 9: 4000} | \n", "8 | \n", "LGBMRegressor | \n", "{'num_leaves': 31, 'boosting_type': 'gbdt', 'subsample': 1, 'colsample_bytree': 1, 'max_depth': 63, 'reg_alpha': 0, 'reg_lambda': 0, 'n_estimators': 100, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-0.249 | \n", "{'neg_mean_squared_error': -0.2490324005700733} | \n", "4.8204 | \n", "0.3142 | \n", "Fri Apr 25 03:36:57 2025 | \n", "
Model Selection | \n", "{1: 4000, 2: 4000, 3: 4000, 4: 4000, 6: 4000, 7: 4000, 5: 4000, 8: 4000, 10: 4000, 9: 4000} | \n", "8 | \n", "XGBRegressor | \n", "{'n_estimators': 100, 'min_child_weight': 1, 'reg_alpha': 0, 'booster': 'gbtree', 'max_depth': 6, 'learning_rate': 0.1, 'reg_lambda': 1} | \n", "-0.2605 | \n", "{'neg_mean_squared_error': -0.2604646831437227} | \n", "6.7072 | \n", "0.3330 | \n", "Fri Apr 25 03:37:02 2025 | \n", "
Model Selection | \n", "{1: 4000, 2: 4000, 3: 4000, 4: 4000, 5: 4000, 6: 4000, 7: 4000, 8: 4000, 9: 4000, 10: 4000} | \n", "8 | \n", "RandomForestRegressor | \n", "{'n_estimators': 100, 'min_samples_split': 0.0003, 'min_samples_leaf': 0.00015, 'max_features': 0.777777778} | \n", "-0.3021 | \n", "{'neg_mean_squared_error': -0.3021340026929764} | \n", "12.1182 | \n", "0.3436 | \n", "Fri Apr 25 03:36:59 2025 | \n", "
Model Selection | \n", "{1: 4000, 2: 4000, 4: 4000, 3: 4000, 6: 4000, 5: 4000, 7: 4000, 8: 4000, 9: 4000, 10: 4000} | \n", "8 | \n", "ExtraTreesRegressor | \n", "{'n_estimators': 100, 'min_samples_split': 0.00125, 'min_samples_leaf': 0.000625, 'max_features': 0.777777778} | \n", "-0.3025 | \n", "{'neg_mean_squared_error': -0.3025185088921889} | \n", "4.0168 | \n", "0.2891 | \n", "Fri Apr 25 03:36:57 2025 | \n", "
Model Selection | \n", "{1: 4000, 2: 4000, 3: 4000, 5: 4000, 4: 4000, 6: 4000, 7: 4000, 8: 4000, 9: 4000, 10: 4000} | \n", "8 | \n", "DecisionTreeRegressor | \n", "{'min_samples_split': 0.004, 'min_samples_leaf': 0.002, 'max_features': 1.0} | \n", "-0.4589 | \n", "{'neg_mean_squared_error': -0.4589419999457891} | \n", "2.5777 | \n", "0.2825 | \n", "Fri Apr 25 03:36:56 2025 | \n", "
Model Selection | \n", "{3: 4000, 2: 4000, 6: 4000, 1: 4000, 4: 4000, 5: 4000, 8: 4000, 7: 4000, 9: 4000, 10: 4000} | \n", "8 | \n", "AdaBoostRegressor | \n", "{'learning_rate': 0.667, 'n_estimators': 50} | \n", "-0.6038 | \n", "{'neg_mean_squared_error': -0.6038106242123971} | \n", "6.0882 | \n", "0.2775 | \n", "Fri Apr 25 03:36:56 2025 | \n", "
Model Selection | \n", "{2: 4000, 1: 4000, 6: 4000, 4: 4000, 3: 4000, 5: 4000, 7: 4000, 9: 4000, 10: 4000, 8: 4000} | \n", "8 | \n", "TorchMLPRegressor | \n", "{'optimizer_class': 'Adam', 'shuffle_dataset_each_epoch': True, 'optimizer_params': {}, 'criterion_class': None, 'criterion_params': {}, 'scheduler_class': None, 'scheduler_params': {}, 'batch_size': 128, 'lr': 0.001, 'epochs': 18, 'input_transform': 'auto', 'tensorboard_dir': None, 'use_tqdm': None, 'prediction_batch_size': 128, 'prediction_input_transform': 'auto', 'shuffling_buffer_size': None, 'depth': 4, 'num_logits': 1000, 'div_factor': 2, 'activation': 'ReLU', 'dropout': 0.1} | \n", "-2.4536 | \n", "{'neg_mean_squared_error': -2.4536325523119644} | \n", "114.0048 | \n", "0.6130 | \n", "Fri Apr 25 03:37:10 2025 | \n", "
Model Selection | \n", "{1: 4000, 2: 4000, 3: 4000, 4: 4000, 5: 4000, 6: 4000, 7: 4000, 8: 4000, 9: 4000, 10: 4000} | \n", "8 | \n", "LinearSVR | \n", "{'C': 1.0} | \n", "-3.3478 | \n", "{'neg_mean_squared_error': -3.347771732858513} | \n", "1.4416 | \n", "0.3151 | \n", "Fri Apr 25 03:36:58 2025 | \n", "
Model Selection | \n", "{1: 4000, 2: 4000, 3: 4000, 4: 4000, 5: 4000, 6: 4000, 7: 4000, 8: 4000, 9: 4000, 10: 4000} | \n", "8 | \n", "LinearRegression | \n", "{} | \n", "-3.5573 | \n", "{'neg_mean_squared_error': -3.5573203151697577} | \n", "0.6169 | \n", "0.3101 | \n", "Fri Apr 25 03:36:57 2025 | \n", "
Adaptive Sampling | \n", "{1: 11558, 2: 11558, 3: 11558, 4: 11559, 5: 11559, 6: 11558, 7: 11558, 8: 11558, 9: 11559, 10: 11559} | \n", "8 | \n", "AdaptiveSamplingStage_LGBMRegressor | \n", "{'num_leaves': 31, 'boosting_type': 'gbdt', 'subsample': 1, 'colsample_bytree': 1, 'max_depth': 63, 'reg_alpha': 0, 'reg_lambda': 0, 'n_estimators': 100, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-0.2183 | \n", "{'neg_mean_squared_error': -0.21828611974711234} | \n", "2.7049 | \n", "0.6072 | \n", "Fri Apr 25 03:37:13 2025 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
Model Tuning | \n", "{1: 11558, 2: 11558, 3: 11558, 4: 11559, 5: 11559, 6: 11558, 7: 11558, 8: 11558, 9: 11559, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 9.999999990000003e-07, 'reg_lambda': 1e-10, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.1074975253749682} | \n", "0.9045 | \n", "0.6196 | \n", "Fri Apr 25 03:37:39 2025 | \n", "
Model Tuning | \n", "{1: 11558, 2: 11558, 3: 11558, 4: 11559, 5: 11559, 6: 11558, 7: 11558, 9: 11559, 8: 11558, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 1e-10, 'reg_lambda': 1.7784127779939314e-05, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.1074975274208296} | \n", "0.8745 | \n", "0.6196 | \n", "Fri Apr 25 03:37:41 2025 | \n", "
Model Tuning | \n", "{1: 11558, 2: 11558, 3: 11558, 4: 11559, 5: 11559, 6: 11558, 7: 11558, 8: 11558, 9: 11559, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 1e-10, 'reg_lambda': 1.8784127778939314e-05, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.1074975275461456} | \n", "0.7852 | \n", "0.6196 | \n", "Fri Apr 25 03:37:41 2025 | \n", "
Model Tuning | \n", "{1: 11558, 2: 11558, 3: 11558, 5: 11559, 4: 11559, 6: 11558, 7: 11558, 9: 11559, 8: 11558, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 1.7784127779939314e-05, 'reg_lambda': 1e-10, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.1074975277105636} | \n", "0.8380 | \n", "0.6196 | \n", "Fri Apr 25 03:37:40 2025 | \n", "
Model Tuning | \n", "{1: 11558, 2: 11558, 3: 11558, 4: 11559, 5: 11559, 6: 11558, 7: 11558, 9: 11559, 8: 11558, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 1.8784127778939314e-05, 'reg_lambda': 1e-10, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.1074975278443453} | \n", "0.9636 | \n", "0.6196 | \n", "Fri Apr 25 03:37:40 2025 | \n", "
Model Tuning | \n", "{1: 11558, 2: 11558, 3: 11558, 4: 11559, 5: 11559, 6: 11558, 7: 11558, 8: 11558, 9: 11559, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 1e-10, 'reg_lambda': 0.005623553830557401, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.107498217910651} | \n", "0.7935 | \n", "0.6196 | \n", "Fri Apr 25 03:37:41 2025 | \n", "
Model Tuning | \n", "{1: 11558, 2: 11558, 3: 11558, 4: 11559, 5: 11559, 6: 11558, 7: 11558, 8: 11558, 9: 11559, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 1e-10, 'reg_lambda': 0.0056245538305564015, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.1074982180406503} | \n", "0.8447 | \n", "0.6196 | \n", "Fri Apr 25 03:37:42 2025 | \n", "
Model Tuning | \n", "{1: 11558, 2: 11558, 3: 11558, 4: 11559, 5: 11559, 7: 11558, 6: 11558, 9: 11559, 8: 11558, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 0.005623553830557401, 'reg_lambda': 1e-10, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.1074982822779325} | \n", "0.8752 | \n", "0.6196 | \n", "Fri Apr 25 03:37:40 2025 | \n", "
Model Tuning | \n", "{2: 11558, 1: 11558, 3: 11558, 4: 11559, 5: 11559, 6: 11558, 7: 11558, 8: 11558, 9: 11559, 10: 11559} | \n", "6 | \n", "LGBMRegressor | \n", "{'num_leaves': 7, 'boosting_type': 'gbdt', 'subsample': 0.4, 'colsample_bytree': 0.4, 'max_depth': 2, 'reg_alpha': 0.0056245538305564015, 'reg_lambda': 1e-10, 'n_estimators': 5, 'learning_rate': 0.1, 'min_child_weight': 0.001} | \n", "-1.1075 | \n", "{'neg_mean_squared_error': -1.1074982824185997} | \n", "0.9160 | \n", "0.6196 | \n", "Fri Apr 25 03:37:40 2025 | \n", "
Model Tuning | \n", "None | \n", "0 | \n", "LGBMRegressor | \n", "None | \n", "-inf | \n", "None | \n", "2.0835 | \n", "0.6214 | \n", "-1 | \n", "
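Every score in the trials table is a negative MSE. Converting a score back to a root-mean-squared error gives a figure in the target's own units, which are hundreds of thousands of dollars. For example, the Adaptive Sampling score of -0.2183 corresponds to an RMSE of about 0.467, or roughly $46,700. The sketch below uses the Model Selection scores copied from the table above. The helper names are hypothetical.

```python
import math

# Model Selection scores (neg_mean_squared_error) copied from the trials table.
MODEL_SELECTION_SCORES = {
    "LGBMRegressor": -0.249,
    "XGBRegressor": -0.2605,
    "RandomForestRegressor": -0.3021,
    "ExtraTreesRegressor": -0.3025,
    "DecisionTreeRegressor": -0.4589,
    "AdaBoostRegressor": -0.6038,
    "TorchMLPRegressor": -2.4536,
    "LinearSVR": -3.3478,
    "LinearRegression": -3.5573,
}


def rank_algorithms(scores):
    """Order algorithm names from best to worst; a higher neg-MSE is better."""
    return sorted(scores, key=scores.get, reverse=True)


def neg_mse_to_rmse(score):
    """Undo scikit-learn's sign flip, then take the square root."""
    return math.sqrt(-score)


ranking = rank_algorithms(MODEL_SELECTION_SCORES)
print(ranking[0])                          # LGBMRegressor
print(round(neg_mse_to_rmse(-0.2183), 4))  # 0.4672, in units of $100,000
```

The top-ranked algorithm agrees with the `LGBMRegressor` reported as the selected algorithm in the summary.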
\n", " | Feature | \n", "Attribution | \n", "Lower Bound | \n", "Upper Bound | \n", "
---|---|---|---|---|
0 | \n", "Latitude | \n", "1.254504 | \n", "1.231705 | \n", "1.277303 | \n", "
1 | \n", "Longitude | \n", "1.074243 | \n", "1.027264 | \n", "1.121221 | \n", "
2 | \n", "MedInc | \n", "0.370338 | \n", "0.353111 | \n", "0.387564 | \n", "
3 | \n", "AveOccup | \n", "0.151174 | \n", "0.142189 | \n", "0.160159 | \n", "
4 | \n", "AveRooms | \n", "0.116548 | \n", "0.114087 | \n", "0.119010 | \n", "
5 | \n", "HouseAge | \n", "0.064133 | \n", "0.060462 | \n", "0.067804 | \n", "
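The table ranks the six selected features by attribution and gives a lower and upper bound for each. One way to produce a table of this shape is permutation importance, sketched here with scikit-learn's `permutation_importance`. This is an assumption for illustration, because the notebook does not show the exact procedure AutoMLx uses to compute the attributions and their bounds. The bounds below are a normal-approximation interval (mean ± 1.96 standard errors over the permutation repeats). The data, model, and the helper name `importance_table` are hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression


def importance_table(model, X, y, n_repeats=10, z=1.96, seed=0):
    """Permutation importance with an approximate confidence interval.

    Importance is the increase in MSE when one feature's column is shuffled.
    """
    result = permutation_importance(
        model, X, y, n_repeats=n_repeats,
        scoring="neg_mean_squared_error", random_state=seed,
    )
    mean = result.importances_mean
    sem = result.importances_std / np.sqrt(n_repeats)
    table = pd.DataFrame({
        "Feature": list(X.columns),
        "Attribution": mean,
        "Lower Bound": mean - z * sem,
        "Upper Bound": mean + z * sem,
    })
    return table.sort_values("Attribution", ascending=False, ignore_index=True)


# Synthetic data: y depends strongly on "a", weakly on "b", and not on "c".
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["a", "b", "c"])
y = 3.0 * X["a"] + 0.5 * X["b"] + rng.normal(scale=0.5, size=500)

model = LinearRegression().fit(X, y)
print(importance_table(model, X, y))
```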
\n | MedInc | \nHouseAge | \nAveRooms | \nAveBedrms | \nPopulation | \nAveOccup | \nLatitude | \nLongitude | \n
---|---|---|---|---|---|---|---|---|
Original Sample | \n2.0278 | \n31.0 | \n2.8469284994964754 | \n1.0916414904330312 | \n3107.0 | \n3.1289023162134946 | \n34.06 | \n-118.31 | \n
Modified Sample | \n2.0278 | \n31.0 | \n2.8469284994964754 | \n1.0916414904330312 | \n3107.0 | \n3.1289023162134946 | \n34.06 | \n-118.31 | \n
\n | MedInc | \nHouseAge | \nAveRooms | \nAveBedrms | \nPopulation | \nAveOccup | \nLatitude | \nLongitude | \n
---|---|---|---|---|---|---|---|---|
0 | \n3.8214 | \n36.0 | \n5.600823 | \n1.059671 | \n1390.0 | \n2.860082 | \n33.95 | \n-118.04 | \n
1 | \n2.0938 | \n16.0 | \n6.444444 | \n1.833333 | \n123.0 | \n2.277778 | \n32.75 | \n-115.72 | \n
2 | \n2.9798 | \n17.0 | \n5.530488 | \n0.969512 | \n1672.0 | \n3.398374 | \n35.02 | \n-120.48 | \n
3 | \n5.5385 | \n43.0 | \n6.972093 | \n1.093023 | \n970.0 | \n2.255814 | \n37.93 | \n-122.54 | \n
4 | \n4.9444 | \n16.0 | \n5.533981 | \n0.878641 | \n627.0 | \n3.043689 | \n34.11 | \n-117.39 | \n
5 | \n10.5793 | \n16.0 | \n7.750000 | \n1.062500 | \n567.0 | \n2.725962 | \n37.27 | \n-122.06 | \n
6 | \n8.0095 | \n52.0 | \n6.252577 | \n1.000000 | \n503.0 | \n2.592784 | \n34.06 | \n-118.39 | \n
7 | \n3.4167 | \n36.0 | \n5.196347 | \n1.041096 | \n725.0 | \n3.310502 | \n33.86 | \n-117.99 | \n
8 | \n2.0221 | \n25.0 | \n4.728597 | \n1.014572 | \n1536.0 | \n2.797814 | \n40.45 | \n-122.31 | \n
9 | \n3.8307 | \n33.0 | \n5.728440 | \n1.062385 | \n1733.0 | \n3.179817 | \n35.38 | \n-118.92 | \n
\n | Prediction (True value: 3.6) | \n
---|---|
Original Sample | \n2.8988 | \n
Modified Sample | \n2.8988 | \n