Getting Started with Data Science

Learn how to get started with Oracle Cloud Infrastructure Data Science to easily create your first notebook session.

This tutorial is intended for administrators, because the steps require permissions to create groups, compartments, networks, and policies.

This tutorial covers the following tasks:

1. Creating a Data Scientists User Group.

2. Creating a Compartment for Your Work.

3. Creating a VCN and Subnet to Access the Public Internet.

4. Creating Policies.

5. Creating a Dynamic Group and Writing Policies for It.

6. Creating a Notebook Session.

Before You Begin

To perform this tutorial successfully, you must have the following:

  • An Oracle Cloud Infrastructure account with administrator privileges. See Signing Up for Oracle Cloud Infrastructure.

  • At least one user in your tenancy who needs access to the Data Science service. This user must be created in the Identity service.

1. Creating a Data Scientists User Group

First, create a user group for your data scientists. In a later step, you grant permissions to this group.

  1. Open a supported browser and enter the Console URL, https://console.<tenancy-region>.oraclecloud.com.

    The <tenancy-region> can be us-ashburn-1, us-phoenix-1, and so on. Use a region where Data Science is available; see Regions and Availability Domains.

  2. Enter your cloud tenant and click Continue.
  3. Sign in with your credentials.
  4. Open the navigation menu.
  5. Under Governance and Administration, click Identity, and then click Groups.
  6. Create a group named data-scientists and enter a description.
  7. Click Create.
    You are returned to the list of groups.
  8. Click the data-scientists group, and then add one or more data scientist users to it.
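The Console URL in step 1 depends only on the region identifier. The following Python sketch shows that substitution. The helper name console_url is hypothetical, and its regular expression is only a loose format check for identifiers such as us-ashburn-1; it does not confirm that Data Science is available in the region.

```python
import re

# Loose format check for region identifiers such as "us-ashburn-1".
# Assumption for illustration only: it does not validate against a list
# of regions where Data Science is available.
_REGION_PATTERN = re.compile(r"^[a-z]+(-[a-z]+)+-\d+$")


def console_url(region: str) -> str:
    """Return the Console URL for a tenancy region (hypothetical helper)."""
    region = region.strip().lower()
    if not _REGION_PATTERN.match(region):
        raise ValueError(f"not a region identifier: {region!r}")
    return f"https://console.{region}.oraclecloud.com"


if __name__ == "__main__":
    print(console_url("us-ashburn-1"))
```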

2. Creating a Compartment for Your Work

Now, you create a compartment for your data science resources.

  1. Click Compartments.
  2. Click Create Compartment.
  3. Name the new compartment data-science-work, and enter a description.
  4. Click Create Compartment.

3. Creating a VCN and Subnet to Access the Public Internet

You need to create a virtual cloud network (VCN) for use by the Data Science service.

Note

A NAT gateway gives instances in a private subnet access to the internet. For egress access to the public internet, we recommend that you use a private subnet with a route to a NAT gateway; without that route, a private subnet has no internet egress.

  1. Open the navigation menu.
  2. Under Core Infrastructure, click Networking, and then click Virtual Cloud Networks.
  3. Click Networking Quickstart.
  4. Make sure that VCN with Internet Connectivity is selected, and then click Start Workflow.
  5. Enter datascience-vcn as the VCN Name.
  6. Select the data-science-work compartment to contain the VCN you are creating. A new compartment can take some time to appear in the drop-down list, so refresh the page until it appears.
  7. Click Next.
  8. Enter the CIDR blocks for the VCN, the public subnet, and the private subnet. Each subnet's CIDR block must fall within the VCN's CIDR block, and the two subnet blocks must not overlap.
  9. Make sure that Use DNS Hostnames in this VCN is selected.
  10. Click Next.
  11. Click Create to create the VCN and the related resources (a public subnet, a private subnet, an internet gateway, a NAT gateway, and a service gateway).

    Use this VCN and its private subnet when you create your notebook session.

  12. Click View Virtual Cloud Network to review your VCN and subnets.
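Before you enter the CIDR blocks in step 8, you can check a plan offline. This Python sketch uses only the standard ipaddress module to confirm that each subnet lies inside the VCN and that no two subnets overlap. The helper name check_vcn_layout and the example blocks (10.0.0.0/16 for the VCN, 10.0.0.0/24 and 10.0.1.0/24 for the subnets) are illustrative assumptions, not values taken from this tutorial.

```python
import ipaddress


def check_vcn_layout(vcn_cidr: str, subnet_cidrs: list[str]) -> list[str]:
    """Return problems found in a VCN/subnet CIDR plan (empty list if none)."""
    vcn = ipaddress.ip_network(vcn_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    # Every subnet must lie inside the VCN's CIDR block.
    for net in subnets:
        if not net.subnet_of(vcn):
            problems.append(f"{net} is outside VCN {vcn}")
    # Subnets in the same VCN must not overlap each other.
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    # Hypothetical plan: one VCN block, one public and one private subnet.
    print(check_vcn_layout("10.0.0.0/16", ["10.0.0.0/24", "10.0.1.0/24"]))
```

An empty list means the plan passes both checks; ipaddress.ip_network also raises ValueError for a block with host bits set, such as 10.0.0.1/24.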

4. Creating Policies

Before you can launch a notebook session, you must create policies that grant the required permissions.

  1. Open the navigation menu.
  2. Under Governance and Administration, click Identity, and then click Policies.
  3. Click Create Policy.
  4. Enter data-science-policy for the Name.
  5. Enter Policy for data science users and service as the Description.
  6. Enter the following three policy statements, one per Statement field. Click + Another Statement to add a field for each additional statement.

    To allow users in the data-scientists group to perform all operations on projects, notebook sessions, models, and work requests in the data-science-work compartment:

    allow group data-scientists to manage data-science-family in compartment data-science-work
    

    To allow those data scientists to use the VCN you just created and attach it to their notebook session:

    allow group data-scientists to use virtual-network-family in compartment data-science-work 
    

    To allow the Data Science service to attach that VCN to your notebook session and route egress traffic from the notebook environment:

    allow service datascience to use virtual-network-family in compartment data-science-work
  7. Click Create to create your policy.
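Every statement in step 6 has the same shape: allow <subject> to <verb> <resource-type> in <location>. The following Python sketch assembles the three statements from those parts and rejects a verb outside the four OCI policy verbs (inspect, read, use, manage). The function name policy_statement is hypothetical; printing the list as a JSON array is an assumption about how you might pass it to a tool that takes statements as a list, such as the OCI CLI.

```python
import json

# The four OCI policy verbs, from least to most permissive.
VERBS = ("inspect", "read", "use", "manage")


def policy_statement(subject: str, verb: str, resource: str, location: str) -> str:
    """Build one 'allow <subject> to <verb> <resource> in <location>' statement."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb {verb!r}; expected one of {VERBS}")
    return f"allow {subject} to {verb} {resource} in {location}"


WORK = "compartment data-science-work"
STATEMENTS = [
    # Data scientists manage projects, notebook sessions, models, and work requests.
    policy_statement("group data-scientists", "manage", "data-science-family", WORK),
    # Data scientists use the VCN when they create notebook sessions.
    policy_statement("group data-scientists", "use", "virtual-network-family", WORK),
    # The Data Science service attaches the VCN and routes egress traffic.
    policy_statement("service datascience", "use", "virtual-network-family", WORK),
]

if __name__ == "__main__":
    # A JSON array of statement strings.
    print(json.dumps(STATEMENTS, indent=2))
```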

5. Creating a Dynamic Group and Writing Policies for It

To enable notebook sessions to access other Oracle Cloud Infrastructure resources, such as Object Storage or the model catalog, you must create a dynamic group and write policies that grant permissions to the notebook sessions' resource principals.

  1. Open the navigation menu.
  2. Under Governance and Administration, click Identity, and then click Compartments.
  3. Click data-science-work, the compartment that you created in section 2 of this tutorial, to view its OCID.

    The compartment details page is displayed.

  4. Click Copy to save the entire OCID to your clipboard.
  5. Click Compartments to return to the list of compartments.
  6. Click Dynamic Groups.
  7. Click Create Dynamic Group.
  8. Enter the following:
    • Name: data-science-dynamic-group
    • Description: Data Science dynamic group
  9. Enter this matching rule. Replace <compartment-ocid> with the compartment OCID you copied.
    ALL {resource.type = 'datasciencenotebooksession', resource.compartment.id = '<compartment-ocid>'}

    This matching rule means that all notebook sessions created in your compartment are added to data-science-dynamic-group.

  10. Click Create.

    Next, write a policy to enable access for this dynamic group.

  11. Click Policies.
  12. Click Create Policy.
  13. Enter the following:
    • Name: data-science-dynamic-group-policy
    • Description: Policy for the Data Science dynamic group
  14. Select the root compartment.
  15. Enter the following policy statements. Click + Another Statement to add a field for each additional statement.

    To allow notebook sessions to perform CRUD operations on entries in the model catalog, projects, and notebook session resources:

    allow dynamic-group data-science-dynamic-group to manage data-science-family in compartment data-science-work

    To allow notebook sessions to perform CRUD operations on Data Flow applications and runs:

    allow dynamic-group data-science-dynamic-group to manage dataflow-family in compartment data-science-work

    To allow notebook sessions to list and read compartments and user names in the tenancy:

    allow dynamic-group data-science-dynamic-group to read compartments in tenancy
    allow dynamic-group data-science-dynamic-group to read users in tenancy

    To allow notebook sessions to read and write files in Object Storage buckets in the data-science-work compartment:

    allow dynamic-group data-science-dynamic-group to manage object-family in compartment data-science-work
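  16. Click Create to create the policy.
    Notebook sessions in the data-science-work compartment can now use resource principals to call Oracle Cloud Infrastructure services with the permissions granted to data-science-dynamic-group.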
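The matching rule in step 9 is a template with a single placeholder. This Python sketch fills in the compartment OCID and rejects a value that does not start with the compartment OCID prefix. The helper name matching_rule and the example OCID are hypothetical, and the prefix test is only a sanity check, not full OCID validation.

```python
COMPARTMENT_PREFIX = "ocid1.compartment."


def matching_rule(compartment_ocid: str) -> str:
    """Return the dynamic-group rule that matches notebook sessions in one compartment."""
    ocid = compartment_ocid.strip()
    if not ocid.startswith(COMPARTMENT_PREFIX):
        raise ValueError(f"expected a compartment OCID, got {ocid!r}")
    # Doubled braces produce the literal { and } that the rule syntax requires.
    return (
        f"ALL {{resource.type = 'datasciencenotebooksession', "
        f"resource.compartment.id = '{ocid}'}}"
    )


if __name__ == "__main__":
    # Hypothetical OCID; paste the value you copied in step 4 instead.
    print(matching_rule("ocid1.compartment.oc1..aaaaexample"))
```

Pasting a tenancy or user OCID by mistake is a common slip, which is why the sketch checks the prefix before building the rule.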

6. Creating a Notebook Session

Finally, create a notebook session and test its access to the public internet.

  1. Open the navigation menu.
  2. Under Data and AI, click Data Science, and then click Projects.
  3. Click Create Project.
  4. Select the data-science-work compartment.
  5. (Optional) Enter Initial Project for the Name.
  6. (Optional) Enter my first project for the Description.
  7. Click Create.

    The project details page appears.

  8. Click Create Notebook Session.
  9. Make sure that the data-science-work compartment is selected.
  10. (Optional) Enter my-first-notebook-session for the Name.
  11. Enter VM.Standard2.8 for the Instance Shape.
  12. Enter 100 for the Block Storage Size (in GB) to attach to your virtual machine.
  13. Select the datascience-vcn VCN and the Private Subnet-datascience-vcn subnet to route egress traffic from your notebook session.
  14. Click Create to launch your first notebook session.

    The notebook sessions page appears. Creating the notebook session takes a few minutes. When it is complete, the status changes to Active, and you can open the notebook session.

  15. Click Open.
  16. Enter your Oracle Cloud Infrastructure credentials to access the JupyterLab UI.
  17. Click Terminal, and then run a simple test to check that your notebook session can access the public internet.
  18. Run this command:
    wget --spider https://www.oracle.com

    In the output, look for this line:

    HTTP request sent, awaiting response... 200 OK

    This line indicates that the test succeeded and that your notebook session has public internet access.
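If you prefer to run the check in steps 17 and 18 from a notebook cell rather than the terminal, the following Python sketch sends an HTTP HEAD request, similar in spirit to wget --spider, using only the standard library. The function name can_reach is hypothetical, and the sketch treats only a final 200 status as success.

```python
import http.client
import urllib.request


def can_reach(url: str = "https://www.oracle.com", timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url ends with an HTTP 200 response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses: DNS failures,
        # refused connections, timeouts, and 4xx/5xx statuses land here.
        return False


if __name__ == "__main__":
    print("public internet access:", can_reach())
```

A False result does not say why the request failed; if you need the reason, rerun the wget command, whose output shows each stage (name resolution, connection, HTTP status).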

What's Next

You have completed this basic tenancy setup.

Next, follow the remaining instructions in the getting-started.ipynb notebook in your notebook session to set up the following:

  • An Oracle Cloud Infrastructure configuration file in the notebook environment.

  • Access to the model catalog.

  • Access to Oracle Cloud Infrastructure Object Storage.

  • Access to Oracle Cloud Infrastructure Data Flow.
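The configuration file is an INI-style file, by default ~/.oci/config, whose profile lists the user OCID, the API key fingerprint, the private key path, the tenancy OCID, and the region. The following Python sketch uses the standard configparser module to report which of those keys a profile is missing. The function name missing_config_keys and the sample values are hypothetical placeholders, and the sketch checks only that each key is present, not that its value is valid.

```python
import configparser

# Keys an API-key profile needs, in the order they are reported.
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")

# Placeholder profile for illustration; none of these values are real.
SAMPLE_CONFIG = """\
[DEFAULT]
user=ocid1.user.oc1..exampleuser
fingerprint=aa:bb:cc:dd
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..exampletenancy
region=us-ashburn-1
"""


def missing_config_keys(text: str, profile: str = "DEFAULT") -> list[str]:
    """Return the required keys that are absent or empty in the given profile."""
    # interpolation=None so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if profile == "DEFAULT":
        section = parser.defaults()
    elif parser.has_section(profile):
        section = parser[profile]  # also falls back to DEFAULT values
    else:
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not section.get(key)]


if __name__ == "__main__":
    print(missing_config_keys(SAMPLE_CONFIG))
```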

For additional information, see Using Notebook Sessions to Build and Train Models.