Overview of Data Science

OCI Data Science is a fully managed and serverless platform for data science teams to build, train, and manage machine learning models using Oracle Cloud Infrastructure.

The Data Science Service:

  • Provides data scientists with a collaborative, project-driven workspace.

  • Enables self-service, serverless access to infrastructure for data science workloads.

  • Includes Python-centric tools, libraries, and packages developed by the open source community and the Oracle Accelerated Data Science Library, which supports the end-to-end lifecycle of predictive models:

    • Data acquisition, profiling, preparation, and visualization.

    • Feature engineering.

    • Model training (including Oracle AutoML).

    • Model evaluation, explanation, and interpretation (including Oracle MLX).

  • Integrates with the rest of the Oracle Cloud Infrastructure stack, including Functions, Data Flow, Autonomous Data Warehouse, and Object Storage.

  • Provides model deployments, which are resources that deploy models as web applications (HTTP API endpoints).

  • Provides Data Science jobs, which let you define and run repeatable machine learning tasks on fully managed infrastructure.

  • Includes policies and vaults to control access to compartments and resources.

  • Includes metrics that provide insight into the health, availability, performance, and utilization of your Data Science resources.

  • Helps data scientists concentrate on methodology and domain expertise to deliver models to production.

Data Science Concepts

Review the following concepts and terms to help you get started with Data Science.

Accelerated Data Science SDK

The Oracle Accelerated Data Science (ADS) SDK is a Python library that is included as part of the OCI Data Science service. ADS has many functions and objects that automate or simplify the steps in the Data Science workflow, including connecting to data, exploring and visualizing data, training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to access the Data Science service model catalog and other OCI services, including Object Storage. To familiarize yourself with ADS, see the Accelerated Data Science Library.

Projects

Projects are collaborative workspaces for organizing and documenting Data Science assets, such as notebook sessions and models.

Notebook Sessions

Data Science notebook sessions are interactive coding environments for building and training models. Notebook sessions come with many preinstalled open source and Oracle developed machine learning and data science packages.

Conda Environments

Conda is an open-source environment and package management system and was created for Python programs. It quickly installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments on your local computer.

Models

Models define a mathematical representation of your data and business process. The model catalog is a place to store, track, share, and manage models.

Model Deployments

Model deployments are a managed resource in the Data Science service that lets you deploy models stored in the model catalog as HTTP endpoints. Deploying machine learning models as web applications (HTTP API endpoints) that serve predictions in real time is the most common way to productionize models. HTTP endpoints are flexible and can serve requests for model predictions.

Jobs

Data Science jobs enable you to define and run repeatable machine learning tasks on fully managed infrastructure.

You should also be familiar with the OCI Key Concepts.

Ways to Access Data Science

You access Data Science using the Console, REST API, SDKs, or CLI.

Use any of the following options, based on your preference and its suitability for the task you want to complete:

  • The OCI Console is an easy-to-use, browser-based interface. To access the Console, you must use a supported browser.
  • The REST APIs provide the most functionality, but require programming expertise. API reference and endpoints provide endpoint details and links to the available API reference documents including the Data Science REST API.
  • OCI provides SDKs that interact with Data Science without the need to create a framework.
  • The CLI provides both quick access and full functionality without the need for programming.

Creating Automation Using Events

You can create automation based on state changes for your OCI resources by using event types, rules, and actions in the Events service.
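An event rule fires when an incoming event matches the rule's condition. The following minimal local sketch illustrates that idea, assuming a simplified condition format in which each attribute maps to one accepted value or a list of accepted values, and nested objects such as `data` are matched recursively. The `matches` function and the sample rule are hypothetical illustrations; the Events service's actual matching rules may differ and are not reproduced here.

```python
from typing import Any, Mapping


def matches(condition: Mapping[str, Any], event: Mapping[str, Any]) -> bool:
    """Return True if `event` satisfies every attribute in `condition`.

    Simplified model (an assumption, not the Events service itself):
    nested mappings are matched recursively; a list is a set of accepted
    values; any other value is a single accepted value.
    """
    for key, expected in condition.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, Mapping):
            # Descend into nested objects such as "data".
            if not isinstance(actual, Mapping) or not matches(expected, actual):
                return False
        else:
            accepted = expected if isinstance(expected, list) else [expected]
            if actual not in accepted:
                return False
    return True


# Hypothetical rule: fire when a project is created, or finishes deleting,
# in one specific compartment.
rule_condition = {
    "eventType": [
        "com.oraclecloud.datascience.createproject",
        "com.oraclecloud.datascience.deleteproject.end",
    ],
    "data": {"compartmentId": "ocid1.compartment.oc1..<unique_ID>"},
}

event = {
    "eventType": "com.oraclecloud.datascience.createproject",
    "data": {
        "compartmentId": "ocid1.compartment.oc1..<unique_ID>",
        "resourceName": "example_project",
    },
}
print(matches(rule_condition, event))  # True
```

Extra attributes in the event (such as `resourceName`) are ignored; only the attributes named in the condition are checked.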

The following Data Science resources emit events that you can use to set up rules and event notifications:

Project Event Types

These are the event types that projects emit:

Friendly Name | Event Type
Create Project | com.oraclecloud.datascience.createproject
Delete Project Begin | com.oraclecloud.datascience.deleteproject.begin
Delete Project End | com.oraclecloud.datascience.deleteproject.end
Update Project | com.oraclecloud.datascience.updateproject

Project Example

This is a reference event for projects:

{
    "eventType": "com.oraclecloud.datascience.createproject",
    "cloudEventsVersion": "0.1",
    "eventTypeVersion": "2.0",
    "source": "datascience",
    "eventTime": "2019-11-22T01:43:35.246Z",
    "eventID": "<unique_ID>",
    "contentType": "application/json",
    "data": {
      "compartmentId": "ocid1.compartment.oc1..<unique_ID>",
      "compartmentName": "example_compartment",
      "resourceName": "example_project",
      "resourceId": "ocid1.datascienceproject.oc1.iad.<unique_ID>",
      "availabilityDomain": "<availability_domain>",
      "freeFormTags": {
        "Department": "Finance"
      },
      "definedTags": {
        "Operations": {
          "CostCenter": "42"
        }
      }
    },
    "extensions": {
      "compartmentId": "ocid1.compartment.oc1..<unique_ID>"
    }
}
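A handler that receives these events (for example, a function invoked by an Events rule) usually needs only a few fields. The sketch below uses only the Python standard library to parse a trimmed copy of the reference event and extract them. The `summarize` helper and the choice of fields are illustrative assumptions, not part of any OCI SDK.

```python
import json
from typing import Any, Dict, Optional

# Trimmed copy of the reference project event.
RAW_EVENT = """
{
  "eventType": "com.oraclecloud.datascience.createproject",
  "eventTime": "2019-11-22T01:43:35.246Z",
  "data": {
    "compartmentName": "example_compartment",
    "resourceName": "example_project",
    "resourceId": "ocid1.datascienceproject.oc1.iad.<unique_ID>",
    "freeFormTags": {"Department": "Finance"},
    "definedTags": {"Operations": {"CostCenter": "42"}}
  }
}
"""


def summarize(event: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pull out the fields a notification handler typically needs (hypothetical helper)."""
    data = event.get("data", {})
    defined_tags = data.get("definedTags", {})
    return {
        "eventType": event.get("eventType"),
        "resourceName": data.get("resourceName"),
        "resourceId": data.get("resourceId"),
        "compartmentName": data.get("compartmentName"),
        # Free-form tags are flat key/value pairs.
        "department": data.get("freeFormTags", {}).get("Department"),
        # Defined tags are grouped by namespace ("Operations" here).
        "costCenter": defined_tags.get("Operations", {}).get("CostCenter"),
    }


summary = summarize(json.loads(RAW_EVENT))
print(summary["resourceName"], summary["costCenter"])  # example_project 42
```

Using `.get` with empty defaults keeps the handler from failing on events that omit tags.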

Notebook Session Event Types

These are the event types that notebook sessions emit:

Friendly Name | Event Type
Activate Notebook Session Begin | com.oraclecloud.datascience.activatenotebooksession.begin
Activate Notebook Session End | com.oraclecloud.datascience.activatenotebooksession.end
Create Notebook Session Begin | com.oraclecloud.datascience.createnotebooksession.begin
Create Notebook Session End | com.oraclecloud.datascience.createnotebooksession.end
Deactivate Notebook Session Begin | com.oraclecloud.datascience.deactivatenotebooksession.begin
Deactivate Notebook Session End | com.oraclecloud.datascience.deactivatenotebooksession.end
Delete Notebook Session Begin | com.oraclecloud.datascience.deletenotebooksession.begin
Delete Notebook Session End | com.oraclecloud.datascience.deletenotebooksession.end
Update Notebook Session | com.oraclecloud.datascience.updatenotebooksession
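Most notebook session operations emit a `.begin` event when the operation starts and an `.end` event when it completes, while updates emit a single event. A small helper that splits an event type along that naming pattern makes it easy to tell the two apart. `parse_event_type` is a hypothetical helper that relies only on the naming convention visible in the table of event types.

```python
from typing import Optional, Tuple

PREFIX = "com.oraclecloud.datascience."


def parse_event_type(event_type: str) -> Tuple[str, Optional[str]]:
    """Split a Data Science event type into (operation, phase).

    The phase is "begin" or "end" for long-running operations, and None for
    single events such as updatenotebooksession.
    """
    if not event_type.startswith(PREFIX):
        raise ValueError(f"not a Data Science event type: {event_type}")
    operation, _, phase = event_type[len(PREFIX):].partition(".")
    return operation, (phase or None)


print(parse_event_type("com.oraclecloud.datascience.activatenotebooksession.begin"))
# ('activatenotebooksession', 'begin')
print(parse_event_type("com.oraclecloud.datascience.updatenotebooksession"))
# ('updatenotebooksession', None)
```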

Notebook Session Example

This is a reference event for notebook sessions:

{
    "eventType": "com.oraclecloud.datascience.updatenotebooksession",
    "cloudEventsVersion": "0.1",
    "eventTypeVersion": "2.0",
    "source": "datascience",
    "eventTime": "2019-11-22T01:43:35.246Z",
    "eventID": "<unique_ID>",
    "contentType": "application/json",
    "data": {
      "compartmentId": "ocid1.compartment.oc1..<unique_ID>",
      "compartmentName": "example_compartment",
      "resourceName": "example_notebook_session",
      "resourceId": "ocid1.datasciencenotebooksession.oc1.iad.<unique_ID>",
      "availabilityDomain": "<availability_domain>",
      "freeFormTags": {
        "Department": "Finance"
      },
      "definedTags": {
        "Operations": {
          "CostCenter": "42"
        }
      }
    },
    "extensions": {
      "compartmentId": "ocid1.compartment.oc1..<unique_ID>"
    }
}

Model Event Types

These are the event types that models emit:

Friendly Name | Event Type
Activate Model | com.oraclecloud.datascience.activatemodel
Create Model | com.oraclecloud.datascience.createmodel
Deactivate Model | com.oraclecloud.datascience.deactivatemodel
Delete Model | com.oraclecloud.datascience.deletemodel
Update Model | com.oraclecloud.datascience.updatemodel

Model Example

This is a reference event for models:

{
    "eventType": "com.oraclecloud.datascience.deletemodel",
    "cloudEventsVersion": "0.1",
    "eventTypeVersion": "2.0",
    "source": "datascience",
    "eventTime": "2019-11-22T01:43:35.246Z",
    "eventID": "<unique_ID>",
    "contentType": "application/json",
    "data": {
      "compartmentId": "ocid1.compartment.oc1..<unique_ID>",
      "compartmentName": "example_compartment",
      "resourceName": "example_model",
      "resourceId": "ocid1.datasciencemodel.oc1.iad.<unique_ID>",
      "availabilityDomain": "<availability_domain>",
      "freeFormTags": {
        "Department": "Finance"
      },
      "definedTags": {
        "Operations": {
          "CostCenter": "42"
        }
      }
    },
    "extensions": {
      "compartmentId": "ocid1.compartment.oc1..<unique_ID>"
    }
}

Model Deployment Event Types

These are the event types that model deployments emit:

Friendly Name | Event Type
Activate Model Deployment Begin | com.oraclecloud.datascience.activatemodeldeployment.begin
Activate Model Deployment End | com.oraclecloud.datascience.activatemodeldeployment.end
Create Model Deployment Begin | com.oraclecloud.datascience.createmodeldeployment.begin
Create Model Deployment End | com.oraclecloud.datascience.createmodeldeployment.end
Deactivate Model Deployment Begin | com.oraclecloud.datascience.deactivatemodeldeployment.begin
Deactivate Model Deployment End | com.oraclecloud.datascience.deactivatemodeldeployment.end
Delete Model Deployment Begin | com.oraclecloud.datascience.deletemodeldeployment.begin
Delete Model Deployment End | com.oraclecloud.datascience.deletemodeldeployment.end
Update Model Deployment Begin | com.oraclecloud.datascience.updatemodeldeployment.begin
Update Model Deployment End | com.oraclecloud.datascience.updatemodeldeployment.end
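Because model deployment operations emit paired `.begin` and `.end` events, the `eventTime` values of a pair can be used to measure how long an operation took. The sketch below assumes that both events carry the same `data.resourceId` and that `eventTime` uses the millisecond UTC format shown in the reference events. The `operation_seconds` helper and the event pair (with its 450-second gap) are made up for illustration.

```python
from datetime import datetime

# Matches timestamps such as 2021-03-03T01:43:35.246Z
# (%z accepts a trailing "Z" on Python 3.7 and later).
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def operation_seconds(begin_event: dict, end_event: dict) -> float:
    """Seconds between a .begin event and its matching .end event."""
    begin_type = begin_event["eventType"]
    end_type = end_event["eventType"]
    if not (begin_type.endswith(".begin") and end_type.endswith(".end")):
        raise ValueError("expected a .begin event and an .end event")
    if begin_type[: -len(".begin")] != end_type[: -len(".end")]:
        raise ValueError("events belong to different operations")
    if begin_event["data"]["resourceId"] != end_event["data"]["resourceId"]:
        raise ValueError("events refer to different resources")
    start = datetime.strptime(begin_event["eventTime"], TIME_FORMAT)
    end = datetime.strptime(end_event["eventTime"], TIME_FORMAT)
    return (end - start).total_seconds()


# Hypothetical pair of events for one model deployment.
deployment_id = "ocid1.datasciencemodeldeployment.oc1.iad.<unique_ID>"
begin = {"eventType": "com.oraclecloud.datascience.createmodeldeployment.begin",
         "eventTime": "2021-03-03T01:43:35.246Z",
         "data": {"resourceId": deployment_id}}
end = {"eventType": "com.oraclecloud.datascience.createmodeldeployment.end",
       "eventTime": "2021-03-03T01:51:05.246Z",
       "data": {"resourceId": deployment_id}}
print(operation_seconds(begin, end))  # 450.0
```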

Model Deployment Example

This is a reference event for model deployments:

{
    "eventType": "com.oraclecloud.datascience.createmodeldeployment.begin",
    "cloudEventsVersion": "0.1",
    "eventTypeVersion": "2.0",
    "source": "datascience",
    "eventTime": "2021-03-03T01:43:35.246Z",
    "eventID": "<unique_ID>",
    "contentType": "application/json",
    "data": {
      "compartmentId": "ocid1.compartment.oc1..<unique_ID>",
      "compartmentName": "example_compartment",
      "resourceName": "example_model_deployment",
      "resourceId": "ocid1.datasciencemodeldeployment.oc1.iad.<unique_ID>",
      "availabilityDomain": "<availability_domain>",
      "freeFormTags": {
        "Department": "Finance"
      },
      "definedTags": {
        "Operations": {
          "CostCenter": "42"
        }
      }
    },
    "extensions": {
      "compartmentId": "ocid1.compartment.oc1..<unique_ID>"
    }
}

Job and Job Run Event Types

These are the event types that jobs and job runs emit:

Friendly Name | Event Type
Job - Create | com.oraclecloud.datascience.createjob
Job - Delete begin | com.oraclecloud.datascience.deletejob.begin
Job - Delete end | com.oraclecloud.datascience.deletejob.end
Job - Update | com.oraclecloud.datascience.updatejob
Job Run - Cancel begin | com.oraclecloud.datascience.canceljobrun.begin
Job Run - Cancel end | com.oraclecloud.datascience.canceljobrun.end
Job Run - Create begin | com.oraclecloud.datascience.createjobrun.begin
Job Run - Create end | com.oraclecloud.datascience.createjobrun.end
Job Run - Delete | com.oraclecloud.datascience.deletejobrun
Job Run - Failed | com.oraclecloud.datascience.failedjobrun
Job Run - Succeeded | com.oraclecloud.datascience.succededjobrun
Job Run - Timeout | com.oraclecloud.datascience.timeoutjobrun
Job Run - Update | com.oraclecloud.datascience.updatejobrun
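A common use of these events is to notify someone when a job run finishes. The sketch below scans a sequence of events and reports the first terminal outcome for one job run. It assumes each job run event carries the job run's OCID in `data.resourceId` and that events arrive in order; the outcome labels, the `final_outcome` helper, and the `<job_run_OCID>` placeholder are hypothetical, and the event type strings are copied verbatim from the table of job run event types.

```python
from typing import Iterable, Mapping, Optional

# Terminal job run event types (copied from the table) mapped to
# hypothetical outcome labels.
TERMINAL_JOB_RUN_EVENTS = {
    "com.oraclecloud.datascience.succededjobrun": "SUCCEEDED",
    "com.oraclecloud.datascience.failedjobrun": "FAILED",
    "com.oraclecloud.datascience.timeoutjobrun": "TIMED_OUT",
    "com.oraclecloud.datascience.canceljobrun.end": "CANCELED",
}


def final_outcome(events: Iterable[Mapping], job_run_id: str) -> Optional[str]:
    """Return the outcome of the first terminal event for job_run_id, or None."""
    for event in events:
        if event.get("data", {}).get("resourceId") != job_run_id:
            continue  # event for a different resource
        outcome = TERMINAL_JOB_RUN_EVENTS.get(event.get("eventType", ""))
        if outcome is not None:
            return outcome
    return None  # the run has not finished yet


run_id = "<job_run_OCID>"  # placeholder
events = [
    {"eventType": "com.oraclecloud.datascience.createjobrun.begin",
     "data": {"resourceId": run_id}},
    {"eventType": "com.oraclecloud.datascience.createjobrun.end",
     "data": {"resourceId": run_id}},
    {"eventType": "com.oraclecloud.datascience.succededjobrun",
     "data": {"resourceId": run_id}},
]
print(final_outcome(events, run_id))  # SUCCEEDED
```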

Regions and Availability Domains

OCI services are hosted in regions and availability domains. A region is a localized geographic area, and an availability domain is one or more data centers located in that region.

Data Science is hosted in these regions:

  • Australia East (Sydney)

  • Australia Southeast (Melbourne)

  • Brazil East (Sao Paulo)

  • Brazil Southeast (Vinhedo)

  • Canada Southeast (Montreal)

  • Canada Southeast (Toronto)

  • Chile (Santiago)

  • Dedicated Region Cloud@Customer (Chiyoda)

  • Germany Central (Frankfurt)

  • India South (Hyderabad)

  • India West (Mumbai)

  • Japan Central (Osaka)

  • Japan East (Tokyo)

  • Netherlands Northwest (Amsterdam)

  • Saudi Arabia West (Jeddah)

  • South Korea Central (Seoul)

  • South Korea North (Chuncheon)

  • Switzerland North (Zurich)

  • UAE East (Dubai)

  • UK South (London)

  • UK West (Newport)

  • US East (Ashburn)

  • US West (Phoenix)

  • US West (San Jose)

  • US Gov West (Phoenix)

  • US Gov East (Ashburn)

  • US DoD North (Chicago)

  • US DoD West (Phoenix)

  • US DoD East (Ashburn)

Note

GPU regional availability for Data Science is as follows:

  • VM.GPU2—The hardware for VM.GPU2 is only found in Ashburn and Frankfurt.

  • VM.GPU3—The hardware for VM.GPU3 is only found in Ashburn, London, Tokyo, Osaka, and San Jose.

Limits on Data Science Resources

When you sign up for OCI, a set of service limits is configured for your tenancy. The service limit is the quota or allowance set on the resources.

Limits by Service lists the default limits for Data Science and other OCI services. You can request a service limit increase to change the defaults.

Tip

Watch the Increasing Data Science Service Limits video for specifics.

In addition to these service limits, note that:

  • Failed and inactive notebook sessions and models count against your service limits. A notebook session stops counting toward your quota only when you fully terminate (delete) it, and a model only when you delete it.

  • GPU limits are set to zero by default, so ask your system administrator to increase them before you try to use GPUs.

  • By default, each tenancy can create up to 1000 jobs. You can request an increase to this limit by opening a CAM service request ticket.

  • The number of simultaneous job runs is limited by your Data Science core count limits.
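Because simultaneous job runs are bounded by your core count limit, the number of runs you can start at once is simple integer arithmetic. The sketch below is a hypothetical helper; the 32-core limit and the 8 cores already in use by other resources counted against the same limit are assumed values, so check your tenancy's actual limits and what counts against them.

```python
def max_concurrent_runs(core_limit: int, ocpus_per_run: int, ocpus_in_use: int = 0) -> int:
    """How many more job runs of one shape fit under a core count limit.

    ocpus_in_use is whatever is already counted against the same limit
    (an assumption about your tenancy, not a value read from OCI).
    """
    if ocpus_per_run <= 0:
        raise ValueError("ocpus_per_run must be positive")
    available = max(core_limit - ocpus_in_use, 0)
    return available // ocpus_per_run


# Hypothetical: 32-core limit, 8 cores in use, runs on VM.Standard2.4 (4 OCPUs).
print(max_concurrent_runs(core_limit=32, ocpus_per_run=4, ocpus_in_use=8))  # 6
```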

Resource Identifiers

Most types of OCI resources have an Oracle-assigned unique ID called an OCID (Oracle Cloud Identifier).

The OCID is included as part of the resource's information in both the Console and API. For information about the OCID format and other ways to identify your resources, see Resource Identifiers.

Authentication and Authorization

Each service in OCI integrates with Identity and Access Management for access to cloud resources through all interfaces (the OCI Console, SDKs, REST APIs, or the CLI).

An administrator in your organization must set up tenancies, groups, compartments, and policies that control who can access which services and resources and the type of access. Your administrator confirms which compartments you should be using.

See About Data Science Policies for the policies you need to create and manage Data Science projects or launch notebook sessions.

Supported Compute Shapes

Data Science supports specific Compute shapes for various resources in the service.

VM Compute shapes and GPU Compute shapes describe these shapes in detail.

Tip

Sizing notebook sessions gives you tips about how to use shapes.

OCPUs represent physical CPU cores. Most CPU architectures, including x86, execute two threads per physical core, so 1 OCPU is equivalent to 2 vCPUs for x86-based compute. Accordingly, the cost per vCPU is half the cost of an OCPU; see Data Science pricing.
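The OCPU-to-vCPU relationship described above is simple arithmetic. The helpers below are a hypothetical sketch of it for x86 shapes. They take the per-OCPU price as an input instead of quoting one, because actual rates are listed on the Data Science pricing page.

```python
VCPUS_PER_OCPU_X86 = 2  # x86 cores run two hardware threads each


def ocpus_to_vcpus(ocpus: int) -> int:
    """vCPU count for an x86 shape with the given number of OCPUs."""
    return ocpus * VCPUS_PER_OCPU_X86


def price_per_vcpu(price_per_ocpu: float) -> float:
    """Per-vCPU price implied by a per-OCPU price (half, on x86)."""
    return price_per_ocpu / VCPUS_PER_OCPU_X86


print(ocpus_to_vcpus(4))  # 8, for example VM.Standard2.4
```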

Notebook Sessions

AMD CPU flex shape (VM.Standard.E3.Flex)

1-64 OCPUs

1-1024 GB memory (dependent on OCPU count)

Example memory ranges:

OCPUs | Memory (GB)
1 | 1-64
2 | 1-128
3 | 1-192
4 | 1-256
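The memory ranges for the flex shape can be checked before you create a notebook session. This sketch assumes the 64 GB-per-OCPU ceiling seen in the example rows continues linearly until it reaches the 1024 GB total cap; the helper names are hypothetical, and the VM Compute shapes documentation is the authoritative source for the limits.

```python
from typing import Tuple

MAX_OCPUS = 64
MIN_MEMORY_GB = 1
MAX_MEMORY_GB = 1024
MAX_GB_PER_OCPU = 64  # assumed from the example rows (1 -> 64 GB, 4 -> 256 GB)


def memory_range_gb(ocpus: int) -> Tuple[int, int]:
    """Allowed (min, max) memory in GB for VM.Standard.E3.Flex with `ocpus` OCPUs."""
    if not 1 <= ocpus <= MAX_OCPUS:
        raise ValueError(f"OCPU count must be between 1 and {MAX_OCPUS}")
    return MIN_MEMORY_GB, min(ocpus * MAX_GB_PER_OCPU, MAX_MEMORY_GB)


def is_valid_flex_config(ocpus: int, memory_gb: int) -> bool:
    """True if the OCPU and memory combination falls inside the ranges above."""
    if not 1 <= ocpus <= MAX_OCPUS:
        return False
    low, high = memory_range_gb(ocpus)
    return low <= memory_gb <= high


print(memory_range_gb(3))   # (1, 192)
print(memory_range_gb(32))  # (1, 1024): capped by the 1024 GB maximum
```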

Intel CPU (X7)

Shape | OCPUs | Memory (GB)
VM.Standard2.1 | 1 | 15
VM.Standard2.2 | 2 | 30
VM.Standard2.4 | 4 | 60
VM.Standard2.8 | 8 | 120
VM.Standard2.16 | 16 | 240
VM.Standard2.24 | 24 | 320

NVIDIA GPU

Shape | GPU Type | GPU Count | GPU Memory (GB, 16 GB per GPU) | OCPUs | Memory (GB)
VM.GPU2.1 (older generation) | P100 | 1 | 16 | 12 | 72
VM.GPU3.1 | V100 | 1 | 16 | 6 | 90
VM.GPU3.2 | V100 | 2 | 32 | 12 | 180
VM.GPU3.4 | V100 | 4 | 64 | 24 | 360

Jobs

Intel CPU

Shape | OCPUs | Memory (GB)
VM.Standard2.1 | 1 | 15
VM.Standard2.2 | 2 | 30
VM.Standard2.4 | 4 | 60
VM.Standard2.8 | 8 | 120
VM.Standard2.16 | 16 | 240
VM.Standard2.24 | 24 | 320

NVIDIA GPU

Shape | GPU Type | GPU Count | GPU Memory (GB, 16 GB per GPU) | OCPUs | Memory (GB)
VM.GPU2.1 (older generation) | P100 | 1 | 16 | 12 | 72
VM.GPU3.1 | V100 | 1 | 16 | 6 | 90
VM.GPU3.2 | V100 | 2 | 32 | 12 | 180
VM.GPU3.4 | V100 | 4 | 64 | 24 | 360

Model Deployments

Intel CPU

Shape | OCPUs | Memory (GB)
VM.Standard2.1 | 1 | 15
VM.Standard2.2 | 2 | 30
VM.Standard2.4 | 4 | 60
VM.Standard2.8 | 8 | 120
VM.Standard2.16 | 16 | 240
VM.Standard2.24 | 24 | 320

Provisioning and Pricing

The Data Science service offers a serverless experience for model development and deployment. When you create Data Science resources such as notebook sessions, models, model deployments, and jobs, the underlying Compute and storage infrastructure is provisioned and maintained for you.

You pay for the use of the underlying infrastructure (Block Storage, Compute, and Object Storage). Review the detailed pricing list for Data Science resources.

You only pay for the infrastructure while you are using it with Data Science resources:

Notebook Sessions
  • Notebook sessions are serverless, and all underlying infrastructure is service-managed.

  • When creating a notebook session, you select the VM shape (the machine type, CPU or GPU, and the number of OCPUs or GPUs) and the amount of Block Storage (minimum of 50 GB).

  • While a notebook session is active, you pay for Compute and Block Storage at the standard Oracle Cloud Infrastructure rates; see Deactivating Notebook Sessions.

  • You can deactivate your notebook session, which shuts down the Compute but retains the Block Storage. While it is deactivated, you are no longer charged for Compute, but you continue to pay for the Block Storage. The same applies to notebook sessions with a GPU instance: they aren't metered for Compute while deactivated.

    You can activate your notebook session to reattach this Block Storage to new Compute; see Deactivating and Activating Notebook Sessions.

  • When you delete a notebook session, you are no longer charged for Compute or Block Storage; see Deleting Notebook Sessions.
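The billing rules for notebook sessions above reduce to three states. The sketch below models them with placeholder hourly rates; the rates, the per-hour framing, and the names `SessionState` and `notebook_hourly_cost` are illustrative assumptions, not published prices or an OCI API.

```python
from enum import Enum

MIN_BLOCK_STORAGE_GB = 50  # minimum Block Storage for a notebook session


class SessionState(Enum):
    ACTIVE = "active"
    DEACTIVATED = "deactivated"
    DELETED = "deleted"


def notebook_hourly_cost(state: SessionState, compute_rate: float,
                         storage_gb: int, storage_rate_per_gb_hour: float) -> float:
    """Hourly cost of one notebook session, using placeholder rates.

    ACTIVE: Compute plus Block Storage. DEACTIVATED: Block Storage only
    (Compute is shut down, including GPU instances). DELETED: nothing.
    """
    if state is SessionState.DELETED:
        return 0.0
    if storage_gb < MIN_BLOCK_STORAGE_GB:
        raise ValueError(f"Block Storage must be at least {MIN_BLOCK_STORAGE_GB} GB")
    storage_cost = storage_gb * storage_rate_per_gb_hour
    if state is SessionState.DEACTIVATED:
        return storage_cost
    return compute_rate + storage_cost


for state in SessionState:
    print(state.value, notebook_hourly_cost(state, compute_rate=2.0,
                                            storage_gb=100,
                                            storage_rate_per_gb_hour=0.01))
```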

Models
  • When you save a model to the model catalog, you are charged for the storage of the model artifact at the standard Object Storage rates in terms of GB per month.

  • When you delete a model, you are no longer charged for it; see Deleting Models.

Model Deployments
  • When you deploy a model, you select the shape type and the number of replicas hosting the model servers. You can also select the load balancer bandwidth associated with your deployment.

  • When a model deployment is active, you pay for the VMs that are hosting the model servers and the load balancer at the standard OCI rates.

  • When you deactivate a model deployment, you are no longer charged for the VMs or the load balancer. You can activate a model deployment and billing resumes for both VMs and the load balancer.

  • When you delete a model deployment, you are no longer charged for any of the infrastructure associated with it.
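The model deployment charges described above can be sketched as replicas times the cost of each replica VM, plus the load balancer, while the deployment is active. This sketch assumes, for illustration only, that a replica's cost scales with its OCPU count; the rates and the `deployment_hourly_cost` name are hypothetical placeholders.

```python
def deployment_hourly_cost(active: bool, replicas: int, ocpus_per_replica: int,
                           ocpu_rate: float, load_balancer_rate: float) -> float:
    """Hourly cost of a model deployment, using placeholder rates.

    While active, you pay for every replica VM plus the load balancer;
    once deactivated, neither is charged.
    """
    if replicas < 1 or ocpus_per_replica < 1:
        raise ValueError("replicas and ocpus_per_replica must be at least 1")
    if not active:
        return 0.0
    return replicas * ocpus_per_replica * ocpu_rate + load_balancer_rate


# Three VM.Standard2.2 replicas (2 OCPUs each) behind one load balancer.
print(deployment_hourly_cost(True, 3, 2, ocpu_rate=0.5, load_balancer_rate=0.25))   # 3.25
print(deployment_hourly_cost(False, 3, 2, ocpu_rate=0.5, load_balancer_rate=0.25))  # 0.0
```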

Jobs
  • Jobs don't carry a premium charge for using the service. You pay only for the underlying infrastructure you use, and only while the job artifact is running.

  • Metering starts when the job artifact starts running and stops when the code exits. You don't pay for the time spent provisioning or deprovisioning the infrastructure.

    Metering covers the CPU or GPU consumption per OCPU while the job artifact runs, and the Block Storage size used for the job.

  • Using the Logging service with jobs doesn't incur an extra cost.
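The job metering rules above can be expressed as a small calculation: only the interval from the artifact starting to the code exiting is billed, for both Compute and Block Storage. This is a minimal sketch with placeholder hourly rates; `job_run_cost` is a hypothetical helper, and expressing Block Storage as a per-GB-hour rate is an assumption made to keep the arithmetic simple.

```python
from datetime import datetime


def job_run_cost(artifact_start: datetime, artifact_exit: datetime, ocpus: int,
                 ocpu_rate_per_hour: float, block_storage_gb: int,
                 storage_rate_per_gb_hour: float) -> float:
    """Metered cost of one job run, using placeholder rates.

    Only the time between the artifact starting and the code exiting is
    metered; provisioning and deprovisioning time is excluded by design.
    """
    if artifact_exit < artifact_start:
        raise ValueError("exit time precedes start time")
    hours = (artifact_exit - artifact_start).total_seconds() / 3600
    compute = ocpus * ocpu_rate_per_hour * hours
    storage = block_storage_gb * storage_rate_per_gb_hour * hours
    return compute + storage


# Provisioning (for example, 09:55 to 10:00) is not passed in and not billed;
# the artifact itself runs from 10:00 to 10:30.
cost = job_run_cost(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30),
                    ocpus=4, ocpu_rate_per_hour=1.0,
                    block_storage_gb=100, storage_rate_per_gb_hour=0.01)
print(round(cost, 2))  # 2.5
```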

Tip

You can use Checking Your Balance and Usage to review the costs associated with your account. Also, you can use the Oracle Cloud Infrastructure Billing and Payment Tools to analyze your Data Science usage and manage your costs.

Compliance

Review the standards that the Data Science service is compliant with.

The service is compliant with these standards:

HIPAA, used by healthcare companies to protect patient privacy.

PCI-DSS, used by the credit card industry to protect consumers against fraud.