Creating and Saving a Model with the OCI Python SDK

Create a model with Python and save it directly to the model catalog.

To create and save a model, you must first create the model artifact.

Important

  • We recommend that you create and save models to the model catalog programmatically, either with ADS or with the OCI Python SDK.

  • You can use ADS to create large models. Large models have an artifact size limit of up to 400 GB.

  1. (Optional) Upgrade the OCI Python SDK with pip install oci --upgrade.
  2. Save a model object to disk. You can use various tools to save a model (Joblib, cloudpickle, pickle, ONNX, and so on). We recommend that you save a model object in the top-level directory of your model artifact and at the same level as the score.py and runtime.yaml files.
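    For example, here's a minimal sketch that serializes a scikit-learn estimator with joblib into the artifact directory; the estimator, training data, directory, and file names are illustrative assumptions:

    import os

    import joblib
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Illustrative training data and estimator; replace them with your own trained model object.
    X, y = make_classification(n_samples=100, n_features=4, random_state=42)
    estimator = RandomForestClassifier().fit(X, y)

    # Save the serialized model in the top-level directory of the model artifact,
    # at the same level as score.py and runtime.yaml (the path is an assumption).
    artifact_dir = "./model-artifact"
    os.makedirs(artifact_dir, exist_ok=True)
    joblib.dump(estimator, os.path.join(artifact_dir, "model.joblib"))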
  3. Change your score.py file to define the load_model() and predict() functions. Edit the body of both functions to support your model as follows (a minimal example appears after the considerations below):
    load_model()

    Reads the model file on disk, and returns the estimator object. Ensure that you use the same library for serializing and deserializing the model object.

    predict()

    Contains two parameters, data and model. The data parameter is required and represents a dataset payload, while model is optional. By default, model is the object returned by load_model(). Ensure that the data type of the data parameter matches the payload format you expect with model deployment.

    By default, model deployment assumes that data is a JSON payload (MIME type application/json). The predict() function converts the JSON payload into the model object's data format, for example a Pandas dataframe or a NumPy array, when that's the data format the model object supports. The body of predict() can include data transformations and other data manipulation tasks before a model prediction is made.

    A few more things to consider:

    • You can't edit the function signatures of load_model() and predict(). You can only edit the body of these functions to customize them.
    • Any custom Python modules can be imported in score.py if they're available in the artifact file or as part of the conda environment used for inference.
    • You can save more than one model object in your artifact. You can load more than one estimator object to memory to perform an ensemble evaluation. In this case, load_model() can return an array of model objects that predict() processes.
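
    The following is a minimal sketch of a score.py for a scikit-learn estimator serialized with joblib; the file name model.joblib and the {"input": [[...], ...]} payload format are illustrative assumptions, so adapt both to your own model:

    import json
    import os

    import joblib
    import pandas as pd

    # Assumption: the serialized model file is named model.joblib and is stored in the
    # same directory as score.py (the top level of the model artifact).
    MODEL_FILE_NAME = "model.joblib"

    def load_model():
        """Reads the model file from disk and returns the estimator object."""
        model_dir = os.path.dirname(os.path.realpath(__file__))
        return joblib.load(os.path.join(model_dir, MODEL_FILE_NAME))

    def predict(data, model=load_model()):
        """Returns predictions for a JSON payload of records.

        data is expected to be a JSON string or dict of the form {"input": [[...], ...]};
        this payload format is an assumption made for this example.
        """
        if isinstance(data, str):
            data = json.loads(data)
        # Convert the payload into the data format the estimator supports.
        features = pd.DataFrame(data["input"])
        return {"prediction": model.predict(features).tolist()}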

  4. (Optional) Test the score.predict() function.

    We recommend that you test the predict() function in your local environment before saving the model to the model catalog. The following code snippet shows how to pass a JSON payload to predict() that mimics the behavior of your model when it's deployed with model deployment. This is a good way to ensure that the model object is read by load_model() and that the predictions returned by your model are correct and in the format you expect. If you run this code snippet in a notebook session, you also get the output of any loggers that you define in score.py in the output cell.

    import sys
    from json import dumps
     
    # The local path to your model artifact directory is added to the Python path.
    # Replace <your-model-artifact-path> with the path to your artifact directory.
    sys.path.insert(0, "<your-model-artifact-path>")
     
    # Import load_model() and predict(), which are defined in score.py:
    from score import load_model, predict
     
    # Load the model to memory:
    _ = load_model()
     
    # Take a sample of your training or validation dataset and store it as data.
    # Make predictions on a JSON string object (dumps(data)). Here we assume
    # that predict() takes data in JSON format.
    predictions_test = predict(dumps(data), _)
    # Compare the predictions captured in predictions_test with what you expect for data:
    predictions_test
  5. Change the runtime.yaml file.

    This file provides a reference to the conda environment you want to use for the runtime environment for model deployment. Minimally, the file must contain the following fields for a model deployment:

    MODEL_ARTIFACT_VERSION: '3.0'
    MODEL_DEPLOYMENT:
      INFERENCE_CONDA_ENV:
        INFERENCE_ENV_SLUG: <the-slugname> # for example mlcpuv1 see: https://docs.oracle.com/en-us/iaas/data-science/using/conda-gml-fam.htm
        INFERENCE_ENV_TYPE: <env-type> # can either be "published" or "data_science"
        INFERENCE_ENV_PATH: <conda-path-on-object-storage>
        INFERENCE_PYTHON_VERSION: <python-version-of-conda-environment>

    Following is an example of a runtime.yaml file. In this example, the data scientist selects the Data Science TensorFlow 2.3 for CPU conda environment.

    MODEL_ARTIFACT_VERSION: '3.0'
    MODEL_DEPLOYMENT:
      INFERENCE_CONDA_ENV:
        INFERENCE_ENV_SLUG: tensorflow23_p37_cpu_v1
        INFERENCE_ENV_TYPE: data_science
        INFERENCE_ENV_PATH: oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/Tensorflow for CPU Python 3.7/1.0/tensorflow23_p37_cpu_v1
        INFERENCE_PYTHON_VERSION: '3.7'
  6. (Optional, but recommended) Before saving a model to the catalog, run a series of introspection tests on your model artifact.

    The purpose of these tests is to identify any errors by validating the score.py and runtime.yaml files with a set of checks to ensure that they have the right syntax, parameters, and versions. Introspection tests are defined as part of the model artifact code template.

    1. Python version 3.5 or greater is required to run the tests. Before running the tests locally on your machine, you must install the pyyaml and requests Python libraries. This installation is a one-time operation.

      Go to your artifact directory. Run the following command to install the required third-party dependencies:

      python3 -m pip install --user -r artifact-introspection-test/requirements.txt
    2. Run the tests locally by replacing <artifact-path> with the path to the model artifact directory:
      python3 artifact-introspection-test/model_artifact_validate.py --artifact <artifact-path>
    3. Inspect the test results.

      The model_artifact_validate.py script generates two output files in the top-level directory of your model artifacts:

      • test_json_output.json

      • test_html_output.html

      You can open either file to inspect the errors. If you open the HTML file, error messages are displayed with a red background.

    4. Repeat steps 2-6 until all tests run successfully. After all tests pass, the model artifact is ready to be saved to the model catalog.
  7. Create and save the model to the model catalog using the OCI SDK with an OCI configuration file, which is part of standard SDK access management.
    1. Initialize the client:
      # Create a default config using the DEFAULT profile in the default location.
      # Refer to
      # https://docs.cloud.oracle.com/en-us/iaas/Content/API/Concepts/sdkconfig.htm#SDK_and_CLI_Configuration_File
      # for more information.
       
      import oci
      from oci.data_science.models import CreateModelDetails, Metadata, CreateModelProvenanceDetails, UpdateModelDetails, UpdateModelProvenanceDetails
       
      # Initialize the service client with user principal (config file):
      config = oci.config.from_file()
      data_science_client = oci.data_science.DataScienceClient(config=config)
       
      # Alternatively, initialize the service client with resource principal (for example, in a notebook session):
      # auth = oci.auth.signers.get_resource_principals_signer()
      # data_science_client = oci.data_science.DataScienceClient({}, signer=auth)
    2. (Optional) Document the model provenance.

      For example:

      provenance_details = CreateModelProvenanceDetails(repository_url="EXAMPLE-repositoryUrl-Value",
                                                        git_branch="EXAMPLE-gitBranch-Value",
                                                        git_commit="EXAMPLE-gitCommit-Value",
                                                        script_dir="EXAMPLE-scriptDir-Value",
                                                        # OCID of the ML job Run or Notebook session on which this model was
                                                        # trained
                                                        training_id="<Notebooksession or ML Job Run OCID>"
                                                        )
    3. (Optional) Document the model taxonomy.

      For example:

      # create the list of defined metadata around model taxonomy:
      defined_metadata_list = [
          Metadata(key="UseCaseType", value="image_classification"),
          Metadata(key="Framework", value="keras"),
          Metadata(key="FrameworkVersion", value="0.2.0"),
          Metadata(key="Algorithm",value="ResNet"),
          Metadata(key="hyperparameters",value="{\"max_depth\":\"5\",\"learning_rate\":\"0.08\",\"objective\":\"gradient descent\"}")
      ]
    4. (Optional) Add your custom metadata (attributes).

      For example:

      # Adding your own custom metadata:
      custom_metadata_list = [
          Metadata(key="Image Accuracy Limit", value="70-90%", category="Performance",
                   description="Performance accuracy accepted"),
          Metadata(key="Pre-trained environment",
                   value="https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/",
                   category="Training environment", description="Environment link for pre-trained model"),
          Metadata(key="Image Sourcing", value="https://lionbridge.ai/services/image-data/", category="other",
                   description="Source for image training data")
      ]
    5. (Optional) Document the model input and output data schema definitions.
      Important

      The schema definitions for both the input feature vector and the model predictions are used for documentation purposes. This guideline applies only to tabular datasets.

      For example:

      import json
      from json import load
      # Declare the input/output schema for the model; this is optional.
      # It must be a valid JSON or YAML string.
      # Like the model artifact, the schema is immutable: it can only be provided at model creation time and can't be updated afterward.
      # A schema JSON sample is in the appendix.
      input_schema = load(open('SR_input_schema.json', 'rb'))
      input_schema_str = json.dumps(input_schema)
      output_schema = load(open('SR_output_schema.json', 'rb'))
      output_schema_str = json.dumps(output_schema)
    6. (Optional) Document the introspection test results.
      For example:
      # Provide the introspection test results
       
      test_results = load(open('test_json_output.json','rb'))
      test_results_str = json.dumps(test_results)
      defined_metadata_list.extend([Metadata(key="ArtifactTestResults", value=test_results_str)])
    7. (Optional) Set the client timeout value to avoid a Data Science service timeout error when saving large model artifacts:
      import oci
       
      config = oci.config.from_file()
      data_science_client = oci.data_science.DataScienceClient(config=config)
      # Change the timeout value to 1800 sec (30 mins)
      data_science_client.base_client.timeout =  30 * 60
    8. Create a zip archive of the model artifact:
      import zipfile
      import os
           
      def zipdir(target_zip_path, ziph, source_artifact_directory):
          ''' Creates a zip archive of a model artifact directory.
           
          Parameters:
           
          - target_zip_path: the path where you want to store the zip archive of your artifact
          - ziph: a zipfile.ZipFile object
          - source_artifact_directory: the path to the artifact directory.
       
          Returns a zip archive in the target_zip_path you specify.    
       
          '''
          for root, dirs, files in os.walk(source_artifact_directory):
              for file in files:
                  ziph.write(os.path.join(root, file),
                             os.path.relpath(os.path.join(root,file),
                                             os.path.join(target_zip_path,'.')))
             
      zipf = zipfile.ZipFile('<relpath-to-artifact-directory>.zip', 'w', zipfile.ZIP_DEFLATED)
      zipdir('.', zipf, "<relpath-to-artifact-directory>")
      zipf.close()
    9. Create (save) the model in the model catalog:
      # creating a model details object:
      model_details = CreateModelDetails(
          compartment_id='<compartment-ocid-of-model>',
          project_id='<project-ocid>',
          display_name='<display-name-of-model>',
          description='<description-of-model>',
          custom_metadata_list=custom_metadata_list,
          defined_metadata_list=defined_metadata_list,
          input_schema=input_schema_str,
          output_schema=output_schema_str)
       
      # creating the model object:
      model = data_science_client.create_model(model_details)
      # adding the provenance:
      data_science_client.create_model_provenance(model.data.id, provenance_details)
      # adding the artifact:
      with open('<relpath-to-artifact-directory>.zip','rb') as artifact_file:
          artifact_bytes = artifact_file.read()
          data_science_client.create_model_artifact(model.data.id, artifact_bytes, content_disposition='attachment; filename="<relpath-to-artifact-directory>.zip"')
  8. Now you can view the model details and information, including any optional metadata that you defined.
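    For example, here's a minimal sketch that retrieves the saved model and its provenance using the same data_science_client created earlier; which fields you inspect is up to you:

    # Retrieve the model you just created and inspect its metadata:
    saved_model = data_science_client.get_model(model.data.id)
    print(saved_model.data.display_name)
    print(saved_model.data.defined_metadata_list)
    print(saved_model.data.custom_metadata_list)

    # Retrieve the documented model provenance:
    provenance = data_science_client.get_model_provenance(model.data.id)
    print(provenance.data)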

Use these sample code files and notebook examples to further help you design a model store.