Using Notebook Sessions to Build and Train Models

Once you have a notebook session created, you can write and execute Python code using the machine learning libraries in the JupyterLab interface to build and train models.

Authenticating to the OCI APIs from a Notebook Session

When you are working within a notebook session, you are operating as the Linux user datascience. This user does not have an OCI Identity and Access Management (IAM) identity, so it has no access to the OCI API. OCI resources include Data Science projects and models and the resources of other OCI services, such as Object Storage, Functions, Vault, Data Flow, and so on. To access these resources from the notebook environment, use one of the following two authentication approaches:

(Recommended) Authenticating Using a Notebook Session's Resource Principal

A resource principal is a feature of IAM that enables resources to be authorized principal actors that can perform actions on service resources. Each resource has its own identity, and it authenticates using the certificates that are added to it. These certificates are automatically created, assigned to resources, and rotated, avoiding the need for you to store credentials in your notebook session.

The Data Science service enables you to authenticate using your notebook session's resource principal to access other OCI resources. Resource principals provide a more secure way to authenticate to resources than the OCI configuration file and API key approach.

Your tenancy administrator must write policies that grant your resource principal permission to access other OCI resources. See Configuring Your Tenancy for Data Science.

You can authenticate with resource principals in a notebook session using the following interfaces:

Oracle Accelerated Data Science SDK:

Run the following in a notebook cell:

import ads
ads.set_auth(auth='resource_principal')

For details, see the Accelerated Data Science documentation.

OCI Python SDK:

Run the following in a notebook cell:

import oci
from oci.data_science import DataScienceClient
rps = oci.auth.signers.get_resource_principals_signer()
dsc = DataScienceClient(config={}, signer=rps)

OCI CLI:

Use the `--auth=resource_principal` flag with commands.

Note

The resource principal token is cached for 15 minutes. If you change the policy or the dynamic group, you must wait 15 minutes to see the effect of your changes.

Important

If you don't explicitly specify resource principal authentication when invoking an SDK or the CLI, the configuration file and API key approach is used.

(Default) Authenticating Using OCI Configuration File and API Keys

You can operate as your own personal IAM user by setting up an OCI configuration file and API keys to access OCI resources. This is the default authentication approach.

To authenticate using the configuration file and API key approach, you must upload an OCI configuration file into the notebook session's /home/datascience/.oci/ directory. For the relevant profile defined in the OCI configuration file, you also need to upload or create the required .pem files.

Alternatively, you can use the included getting-started.ipynb notebooks to interactively create the configuration and key files. See Overview of the Notebook Examples.

You can use the api_keys.ipynb notebook to interactively create OCI configuration and API key files. To launch the api_keys.ipynb notebook, click Notebook Examples in the JupyterLab Launcher tab.
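Once the configuration file and key files are in place, a quick check can catch a missing entry or a wrong key path before any SDK call fails. The following is a minimal sketch using only the Python standard library. The function name check_oci_profile is hypothetical, and the list of required keys assumes a standard API-key profile (user, fingerprint, key_file, tenancy, region); adjust it if your profile differs.

```python
import configparser
from pathlib import Path

# Keys an API-key profile is assumed to need (adjust for your setup).
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")


def check_oci_profile(config_path, profile="DEFAULT"):
    """Return a list of problems found in one profile of an OCI config file."""
    path = Path(config_path).expanduser()
    if not path.is_file():
        return [f"config file not found: {path}"]

    # The OCI configuration file uses INI syntax, so configparser can read it.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if profile != "DEFAULT" and not parser.has_section(profile):
        return [f"profile not found: {profile}"]

    section = parser[profile]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in section]

    # The key_file entry should point at a .pem file that actually exists.
    key_file = section.get("key_file")
    if key_file and not Path(key_file).expanduser().is_file():
        problems.append(f"key file not found: {key_file}")
    return problems


if __name__ == "__main__":
    for problem in check_oci_profile("/home/datascience/.oci/config"):
        print(problem)
```

An empty list means the profile has every expected entry and the key file it references exists. The check does not validate the key itself or contact OCI.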

Working with Existing Code Files

You can create new files or work with your own existing files.

Uploading Files

Files can be uploaded from your local machine by clicking Upload in the JupyterLab interface or by dragging and dropping files.

Creating a Key Pair in a Notebook Session to Use with a Third-Party Version Control Provider
Cloning a Git Repository Without an Existing Private Key
Using Additional Terminal Commands

Installing Additional Python Libraries

You can install a library that's not preinstalled in the provided image.

Access to the public internet is required to install additional libraries. Install a library by opening a notebook session and running this command:

%%bash
pip install <library-name>==<library-version>

Important

Data Science doesn't allow root privileges in notebook sessions. You can only install libraries using yum and pip as a normal user. Attempting to use sudo results in errors.

You can install any open source package available on a publicly accessible Python Package Index (PyPI) repository. You can also install private or custom libraries from your own internal repositories.

Note

The VCN or subnet that you used to create the notebook session must have network access to the source locations for the packages you want to download and install. See Manually Configuring Your Tenancy for Data Science.
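After running pip install, you may want to confirm which version actually landed in the environment. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later). The helper name installed_version is hypothetical and is not part of the Data Science service.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None
```

For example, calling installed_version("<library-name>") in a notebook cell returns the version string if the package is installed and None otherwise.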

Using the Provided Environment Variables in Notebook Sessions

When you start up a notebook session, the service creates useful environment variables that you can use in your code:

NB_SESSION_COMPARTMENT_OCID

The compartment OCID of the current notebook session.

NB_SESSION_OCID

The OCID of the current notebook session.

PROJECT_OCID

The OCID of the project associated with the current notebook session.

USER_OCID

Your user OCID.

PROJECT_COMPARTMENT_OCID

The compartment OCID of the project associated with the current notebook session.

To access these environment variables in your notebook session, use the Python os library. For example:

import os
project_ocid = os.environ['PROJECT_OCID']
print(project_ocid)

Note

The NB_SESSION_COMPARTMENT_OCID and PROJECT_COMPARTMENT_OCID values are not updated in a running notebook session if the resource is moved to a different compartment after the notebook session was created.
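Indexing os.environ directly raises a KeyError if a variable is not set, for example when the same code runs outside a notebook session. The following is a minimal sketch that collects the variables listed above and returns None for any that are missing. The helper name session_context is hypothetical.

```python
import os

# Environment variables that the service sets in a notebook session.
SESSION_VARIABLES = (
    "NB_SESSION_COMPARTMENT_OCID",
    "NB_SESSION_OCID",
    "PROJECT_OCID",
    "USER_OCID",
    "PROJECT_COMPARTMENT_OCID",
)


def session_context(env=None):
    """Map each notebook session variable to its value, or None if unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in SESSION_VARIABLES}
```

Calling session_context() with no argument reads the real environment. Passing a plain dictionary instead makes the function easy to exercise in tests.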

Using the Oracle Accelerated Data Science SDK

The Oracle Accelerated Data Science (ADS) SDK is a Python library that is included as part of the OCI Data Science service notebook session resource. ADS offers a friendly user interface that covers many of the steps involved in the lifecycle of machine learning models, from connecting to different data sources to using AutoML for model training to model evaluation and explanation. ADS also provides a simple interface to access the OCI Data Science service model catalog and other OCI services, including Object Storage.

Note

For complete documentation on how to use the Accelerated Data Science SDK, see Accelerated Data Science Library and Accessing the Conda Environment Notebook Examples.

Connecting to Your Data

You can connect to your data in these ways:

Connecting to Data in Oracle Cloud Infrastructure Object Storage
Connecting to Data on the Autonomous Data Warehouse
Connecting to Data on OCI Streaming
Connecting to Data Using Oracle Vault