<font color=gray>Oracle Cloud Infrastructure Data Science Sample Notebook

Copyright (c) 2021 Oracle, Inc.  All rights reserved. <br>
Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.
</font>

# Deploying a Simple Sklearn Linear Regression Model

In this tutorial, we prepare and save an sklearn model artifact with the ADS generic method, then deploy the model as an HTTP endpoint.

## Pre-requisites to Running this Notebook 

* We recommend that you run this notebook in a notebook session using the **Data Science Conda Environment "General Machine Learning for CPU (v1.0)"** 
* You need access to the public internet
* **Upgrade the current version of the OCI Python SDK** (`oci`)

In [None]:
!pip install --upgrade oci

In [None]:
import oci
import ads
import json
import joblib
import logging
import os
import pandas as pd
import tempfile
import warnings
from os import path
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from ads.common.model_export_util import prepare_generic_model
import time
import cloudpickle

logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)
warnings.filterwarnings('ignore')
ads.set_documentation_mode(False)

We're going to load a simple dataset about the housing market in the US and predict the house price. 

In [None]:
ds = pd.read_csv("https://objectstorage.us-ashburn-1.oraclecloud.com/n/bigdatadatasciencelarge/b/hosted-ds-datasets/o/others%2Fusa_housing_lite.csv")
X = ds[['avg_area_income', 'avg_area_house_age', 'avg_area_number_of_rooms',
        'avg_area_number_of_bedrooms', 'area_population']]
y = ds['price']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101)

We fit the model with sklearn's `LinearRegression()` estimator and generate predictions on the test set:

In [None]:
lrm = LinearRegression().fit(X_train, y_train)
lrm.predict(X_test)
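
The cell above returns raw predictions without saying how closely they match the held-out prices. One common check is the coefficient of determination (R²), which sklearn regressors expose as `lrm.score(X_test, y_test)`. The following is a minimal pure-Python sketch of that formula. The `r_squared` helper and the numbers are made up for illustration and are not part of the original notebook.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices and predictions, for illustration only
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
print(round(r_squared(actual, predicted), 3))
```

A value close to 1 means the predictions explain most of the variance in the target. A value near 0 means the model does no better than predicting the mean.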

Here we are using the "General Machine Learning for CPU" Data Science conda environment. Since we don't modify the conda environment, we don't need to publish it, and we can use "General Machine Learning for CPU (v1.0)" for model deployment as well. We therefore set `data_science_env=True` when preparing the artifact with ADS.

We use the `prepare_generic_model()` method to prepare the artifact.

In [None]:
# Prepare the model artifact template
path_to_model_artifacts = "linear_regression_generic_artifacts"
generic_model_artifact = prepare_generic_model(
    path_to_model_artifacts, 
    force_overwrite=True,
    function_artifacts=False,
    data_science_env=True)

# Serialize the model
with open(path.join(path_to_model_artifacts, "model.pkl"), "wb") as outfile:
    cloudpickle.dump(lrm, outfile)

# List the template files
print(f"Model Artifact Path: {path_to_model_artifacts}\n\nModel Artifact Files:")
for file in os.listdir(path_to_model_artifacts):
    if path.isdir(path.join(path_to_model_artifacts, file)):
        for file2 in os.listdir(path.join(path_to_model_artifacts, file)):
            print(path.join(file, file2))
    else:
        print(file)
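
Before saving the artifact to the model catalog, it can be worth confirming that the serialized file loads back to an equivalent object. The sketch below shows that round-trip check using the standard `pickle` module and a plain dictionary as a stand-in for the fitted model. The dictionary and its values are made up. In the notebook, the analogous check would compare `lrm.predict(X_test)` with the predictions of the reloaded model.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted model: a plain dict of made-up parameters
stand_in_model = {"coef": [21.5, 165000.0], "intercept": -2640000.0}

with tempfile.TemporaryDirectory() as artifact_dir:
    model_path = os.path.join(artifact_dir, "model.pkl")

    # Serialize to disk, as the notebook does for model.pkl
    with open(model_path, "wb") as outfile:
        pickle.dump(stand_in_model, outfile)

    # Load it back the way a scoring script would
    with open(model_path, "rb") as infile:
        restored = pickle.load(infile)

print(restored == stand_in_model)
```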

We make a few changes to the `score.py` template that ADS generates. First, we uncomment the following three lines in `predict()`:

```
    # from pandas import read_json, DataFrame
    # from io import StringIO
    # X = read_json(StringIO(data)) if isinstance(data, str) else DataFrame.from_dict(data)
```
Second, we pass `X` instead of `data` to `model.predict()`.

In [None]:
%%writefile {path_to_model_artifacts}/score.py

import json
import os
from cloudpickle import cloudpickle


model_name = 'model.pkl'


"""
   Inference script. This script is used for prediction by scoring server when schema is known.
"""


def load_model(model_file_name=model_name):
    """
    Loads model from the serialized format

    Returns
    -------
    model:  a model instance on which predict API can be invoked
    """
    model_dir = os.path.dirname(os.path.realpath(__file__))
    contents = os.listdir(model_dir)
    if model_file_name in contents:
        with open(os.path.join(os.path.dirname(os.path.realpath(__file__)), model_file_name), "rb") as file:
            return cloudpickle.load(file)
    else:
        raise Exception('{0} is not found in model directory {1}'.format(model_file_name, model_dir))


def predict(data, model=load_model()):
    """
    Returns prediction given the model and data to predict

    Parameters
    ----------
    model: Model instance returned by load_model API
    data: Data format as expected by the predict API of the core estimator. For example, for scikit-learn models it could be a numpy array, a list of lists, or a pandas DataFrame

    Returns
    -------
    predictions: Output from scoring server
        Format: {'prediction':output from model.predict method}

    """
    
    from pandas import read_json, DataFrame
    from io import StringIO
    X = read_json(StringIO(data)) if isinstance(data, str) else DataFrame.from_dict(data)
    return {'prediction':model.predict(X).tolist()}

In [None]:
project_id = os.environ['PROJECT_OCID'] 
compartment_id = os.environ['NB_SESSION_COMPARTMENT_OCID']

mc_model = generic_model_artifact.save(project_id=project_id,
                                       compartment_id=compartment_id,
                                       display_name="USA Housing Lin Reg (Model Deployment Test) - Generic",
                                       description="Testing USA Housing Lin Reg model (Generic) deployment",
                                       ignore_pending_changes=True)

In [None]:
# Print published model information
mc_model

## Deploying the model with Model Deployment

We are ready to deploy `mc_model`. We authenticate with the user principal (config+key) method. Alternatively, you can use resource principal authentication.

In [None]:
# Getting OCI config information
# oci_config = pd.read_csv("/home/datascience/.oci/config", delimiter="=", header = 0).to_dict()['[DEFAULT]']
oci_config = oci.config.from_file("~/.oci/config", "DEFAULT")
# Setting up DataScience instance
data_science = oci.data_science.DataScienceClient(oci_config)
# Setting up data science composite client to unlock wait_for_state operations
data_science_composite = oci.data_science.DataScienceClientCompositeOperations(data_science)
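
User principal authentication depends on a complete OCI config file. The sketch below is one way to check a profile for the keys the SDK needs (`user`, `fingerprint`, `tenancy`, `region`, `key_file`) before creating a client. The `missing_config_keys` helper is hypothetical and the sample values are placeholders, so this illustrates the file layout rather than the SDK's own validation.

```python
import configparser

REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def missing_config_keys(config_text, profile="DEFAULT"):
    """Return the required keys that are absent from the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if profile not in parser:
        # Unknown profile: every required key is missing
        return list(REQUIRED_KEYS)
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if key not in section]


# Placeholder config contents; key_file is deliberately left out
sample_config = """
[DEFAULT]
user=ocid1.user.oc1..example
fingerprint=aa:bb:cc
tenancy=ocid1.tenancy.oc1..example
region=us-ashburn-1
"""

print(missing_config_keys(sample_config))
```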

In [None]:
# Preparing model deployment data
model_deployment_details = {
    "displayName": "Model Deployment NB Test USA Housing Lin Reg - PKL",
    "projectId": mc_model.project_id,
    "compartmentId": mc_model.compartment_id,
    "modelDeploymentConfigurationDetails": {
        "deploymentType": "SINGLE_MODEL",
        "modelConfigurationDetails": {
            "modelId": mc_model.id,
            "instanceConfiguration": {
                "instanceShapeName": "VM.Standard2.4"
            },
            "scalingPolicy": {
                "policyType": "FIXED_SIZE",
                "instanceCount": 1
            },
            "bandwidthMbps": 10
        }
    },
    "categoryLogDetails": None
}
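
If you plan to create several deployments, the same structure can be produced by a small helper so that only the values that change are passed in. This is a sketch: `build_deployment_details` is a hypothetical function, its defaults mirror the values used above, and the OCIDs in the usage lines are placeholders.

```python
def build_deployment_details(model_id, project_id, compartment_id, display_name,
                             shape="VM.Standard2.4", instance_count=1,
                             bandwidth_mbps=10):
    """Build a single-model deployment payload with a fixed-size scaling policy."""
    return {
        "displayName": display_name,
        "projectId": project_id,
        "compartmentId": compartment_id,
        "modelDeploymentConfigurationDetails": {
            "deploymentType": "SINGLE_MODEL",
            "modelConfigurationDetails": {
                "modelId": model_id,
                "instanceConfiguration": {"instanceShapeName": shape},
                "scalingPolicy": {
                    "policyType": "FIXED_SIZE",
                    "instanceCount": instance_count,
                },
                "bandwidthMbps": bandwidth_mbps,
            },
        },
        # No logging configured, as in the cell above
        "categoryLogDetails": None,
    }


# Placeholder OCIDs, for illustration only
details = build_deployment_details(
    model_id="ocid1.datasciencemodel.oc1..example",
    project_id="ocid1.datascienceproject.oc1..example",
    compartment_id="ocid1.compartment.oc1..example",
    display_name="Example deployment",
)
print(details["modelDeploymentConfigurationDetails"]["deploymentType"])
```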

Let's deploy the model! 

In [None]:
%%time 

model_deployment = data_science_composite.create_model_deployment_and_wait_for_state(model_deployment_details,
                                                                                     wait_for_states=["SUCCEEDED",
                                                                                                      "FAILED"])

This cell extracts from the `model_deployment` object a series of useful diagnostics about the creation of the model deployment resource:

In [None]:
print("Grabbing the model deployment ocid...")
model_deployment_data = json.loads(str(model_deployment.data))
model_deployment_id = model_deployment_data['resources'][0]['identifier']
print(f"Model deployment ocid: {model_deployment_id}")

print("Checking for the correct response status code...")
if model_deployment.status == 200:
    print(f"Work request status code returned: {model_deployment.status}")
    print("Checking for non-empty response data...")
    if model_deployment.data:
        print(f"Data returned: {model_deployment.data}")
        print("Grabbing the model deployment work request status...")
        work_request_status = model_deployment_data['status']
        print("Checking for the correct work request status...")
        if work_request_status == "SUCCEEDED":
            print(f"Work request status returned: {work_request_status}")
        else:
            print(
                f"Work request returned an incorrect status of: {work_request_status}")
            print(
                f"Work requests error: {data_science.list_work_request_errors(model_deployment.data.id).data}")
            print(
                f"opc-request-id: {model_deployment.headers['opc-request-id']}")
    else:
        print("Failed to grab model deployment data.")
        print(f"opc-request-id: {model_deployment.headers['opc-request-id']}")
else:
    print(
        f"Model deployment returned an incorrect status of: {model_deployment.status}")
    print(f"opc-request-id: {model_deployment.headers['opc-request-id']}")
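
The checks above are needed because the deployment call waits for either `"SUCCEEDED"` or `"FAILED"`, so it also returns normally when the deployment fails. Conceptually, a wait-for-state operation polls a status until it reaches one of the target states. The sketch below illustrates that idea with a hypothetical `poll_until` helper and a simulated sequence of states. It is not the OCI SDK's implementation.

```python
import time


def poll_until(get_state, target_states, interval_seconds=1.0, max_attempts=60):
    """Call get_state() repeatedly until it returns one of target_states."""
    for _ in range(max_attempts):
        state = get_state()
        if state in target_states:
            return state
        time.sleep(interval_seconds)
    raise TimeoutError(f"No target state reached after {max_attempts} attempts")


# Simulated work request that moves through three states
simulated_states = iter(["ACCEPTED", "IN_PROGRESS", "SUCCEEDED"])
final_state = poll_until(
    lambda: next(simulated_states),
    target_states={"SUCCEEDED", "FAILED"},
    interval_seconds=0,
)
print(final_state)
```

Because `"FAILED"` is a valid terminal state here too, the caller still has to inspect the returned state before using the deployment.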


We are now ready to invoke the model `/predict` endpoint. 

## Invoking the Model Deployment `/predict` Endpoint 

Lastly, we invoke the `/predict` endpoint of the deployed model and make inferences on a batch of new data samples.

In [None]:
import requests
import oci
from oci.signer import Signer

Before you can execute the cell below, copy and paste the URI of your model deployment. You can find that value in the OCI console under the detail page of your model deployment. In the **Resources** menu of the detail page, click on **"Invoking Your Model"**. You will find the HTTP endpoint of the model. 

In [None]:
uri = ""  # paste the HTTP endpoint of your model deployment here
print(uri)

From a notebook session, you have two options to authenticate to the model deployment `/predict` endpoint: user principal (config+key) or resource principal. We use user principal here. To use resource principal instead, set `using_rps = True`:

In [None]:
# Set to True to authenticate to the /predict endpoint with resource principal:
using_rps = False

# payload: 
input_data = X_train[:5].to_json()

if using_rps: # using resource principal:     
    auth = oci.auth.signers.get_resource_principals_signer()
else: # using user principal (config+key): 
    config = oci.config.from_file("~/.oci/config") # replace with the location of your oci config file
    auth = Signer(
        tenancy=config['tenancy'],
        user=config['user'],
        fingerprint=config['fingerprint'],
        private_key_file_location=config['key_file'],
        pass_phrase=config['pass_phrase'])

In [None]:
%%time
    
# submit request to model endpoint: 
response = requests.post(uri, json=input_data, auth=auth)
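
Note that `to_json()` already returns a string, and the `json=` argument of `requests.post` serializes whatever it receives. The request body is therefore a JSON string literal that itself contains JSON. The sketch below reproduces that encoding with the standard `json` module and a made-up one-cell payload. It assumes the scoring server decodes the body once before calling `predict()`, in which case `data` arrives as a `str` and takes the `isinstance(data, str)` branch in `score.py`.

```python
import json

# Stand-in for X_train[:5].to_json(): a column-oriented JSON string (made-up value)
payload = json.dumps({"avg_area_income": {"0": 79545.46}})

# What requests sends when json= is given a str: the string is serialized again
body = json.dumps(payload)

# One decoding step (assumed to happen on the server) yields a str, not a dict
decoded_once = json.loads(body)
print(type(decoded_once).__name__)

# A second decoding step recovers the column-oriented mapping
decoded_twice = json.loads(decoded_once)
print(type(decoded_twice).__name__)
```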

Let's take a look at the status code: 

In [None]:
response.status_code

and the model predictions: 

In [None]:
print(json.loads(response.content))
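
The last two cells can be combined into a single check that fails loudly when the request did not succeed. The following sketch assumes the response body has the `{'prediction': [...]}` shape returned by `predict()` in `score.py`. The `extract_predictions` helper is hypothetical and the response content is made up.

```python
import json


def extract_predictions(status_code, content):
    """Return the list of predictions, or raise if the call did not succeed."""
    if status_code != 200:
        raise RuntimeError(f"Request failed with status {status_code}: {content!r}")
    body = json.loads(content)
    if "prediction" not in body:
        raise ValueError(f"Unexpected response body: {body!r}")
    return body["prediction"]


# Made-up response content, for illustration only
fake_content = b'{"prediction": [1251688.6, 873048.3]}'
print(extract_predictions(200, fake_content))
```

In the notebook, the equivalent call would be `extract_predictions(response.status_code, response.content)`.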