Manually Configuring a Data Science Tenancy

In this tutorial, you set up your tenancy for Data Science and test it by creating a notebook session.

This tutorial is directed at administrator users because they are granted the required access permissions.

In this tutorial, you are:

1. Creating a Data Scientists User Group.

2. Creating a Compartment for Your Work.

3. Creating a VCN and Subnet.

4. Creating Policies.

5. Creating a Dynamic Group and Writing Policies for it.

6. Creating a Notebook Session.

Before You Begin

To perform this tutorial, you must have the following:

  • A paid Oracle Cloud Infrastructure (OCI) account, or a new account with Oracle Cloud promotions. See Request and Manage Free Oracle Cloud Promotions.

  • Administrator privilege for the OCI account.
  • At least one user in your tenancy who wants to access the Data Science service. This user must be created in IAM.

1. Creating a Data Scientists User Group

Create a user group for the data scientists to work in.

  1. Open a supported browser and enter the Console URL:
    https://cloud.oracle.com
  2. Enter your Cloud Account Name, also referred to as your tenancy name, and click Next.
  3. Sign in with your user name and password.
  4. Open the navigation menu and click Identity & Security. Under Identity, click Groups.

    A list of the groups in your tenancy displays.

  5. Click Create Group.
  6. Name the new group data-scientists, and enter a description.
  7. Click Create.
    The detail page for the data-scientists group opens.
  8. Click Add User to Group.
  9. Select a user to add, and then click Add.

    The selected user is added and appears in the group member list.

  10. Repeat the previous two steps until all of your data scientist users are added to the data-scientists group.

    A list of the users in the data-scientists group displays.

2. Creating a Compartment for Your Work

Create a compartment for your data science resources.

  1. Open the navigation menu and click Identity & Security. Under Identity, click Compartments.
  2. Click Create Compartment.
  3. Name the new compartment data-science-work, and enter a description.
  4. Click Create Compartment.

    After the data-science-work compartment is created, it appears in the compartments list.

3. Creating a VCN and Subnet

Create a virtual cloud network (VCN) for use by the Data Science service.

Note

For egress access to the public internet, we recommend that you use a private subnet with a route to a NAT gateway. A NAT gateway gives instances in a private subnet access to the internet.

  1. Open the navigation menu and click Networking, and then click Virtual Cloud Networks.
  2. Select the compartment that you want to create the VCN in.
  3. Click Start VCN Wizard.
  4. Ensure that Create VCN with Internet Connectivity is selected, and then click Start VCN Wizard.
  5. Enter datascience-vcn for the VCN Name.
  6. Select the data-science-work compartment. This compartment hosts the VCN you are creating. A new compartment can take some time to appear in the drop-down list. If it is missing, refresh the page until it appears.
  7. Click Next.
  8. Use the defaults for the Configure VCN and Subnets section.
  9. Ensure that Use DNS Hostnames in this VCN is selected.
  10. Click Next.

    A review of the VCN configuration is displayed.

  11. Click Create to create the VCN and the related resources (a public and a private subnet, an internet gateway, a NAT gateway, and a service gateway).

    Use this VCN and its private subnet when you create your notebook session.

  12. Click View Virtual Cloud Network to review your VCN and subnets.

4. Creating Policies

Before users start their notebook sessions, you have to configure the Data Science policies.

  1. Open the navigation menu and click Identity & Security. Under Identity, click Policies.
  2. Click Create Policy.
  3. Enter data-science-policy for the Name.
  4. Enter Policy for data science users and service as the Description.
  5. Select the data-science-work compartment.
  6. Click Show manual editor.
  7. Enter these five policy statements into the Policy Builder field:

    To allow the Data Science service to attach your VCN to your notebook session and route egress traffic from the notebook environment:

    allow service datascience to use virtual-network-family in compartment data-science-work

    To allow the data-scientists group to perform operations on all Data Science resources (projects, notebook sessions, models, model deployments, work requests, jobs, and job runs) that are found in the data-science-work compartment:

    allow group data-scientists to manage data-science-family in compartment data-science-work
    

    To allow those data scientists to use the VCN that you created and attach it to their notebook sessions:

    allow group data-scientists to use virtual-network-family in compartment data-science-work 
    

    To allow those data scientists to create and manage buckets, such as adding artifacts and conda environments to buckets:

    allow group data-scientists to manage buckets in compartment data-science-work 
    
    allow group data-scientists to manage objects in compartment data-science-work 
    
    Note

    Instead of specifying access to individual services, to allow data-scientists to manage all OCI resources in the data-science-work compartment, replace the preceding five policies with the following two policies:
    allow group data-scientists to manage all-resources in compartment data-science-work 
    
    allow service datascience to use virtual-network-family in compartment data-science-work
  8. Click Create to create your policy.
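
The five statements in step 7 all follow one pattern, so they can be generated from the group and compartment names. The following Python sketch is only an illustration: the function name build_policy_statements is hypothetical, and it only produces text to paste into the manual editor. It does not call any OCI API.

```python
def build_policy_statements(group: str, compartment: str) -> list[str]:
    """Return the five policy statements from this tutorial as plain strings."""
    scope = f"in compartment {compartment}"
    return [
        # Lets the Data Science service attach the VCN to notebook sessions.
        f"allow service datascience to use virtual-network-family {scope}",
        # Lets the user group work with all Data Science resources.
        f"allow group {group} to manage data-science-family {scope}",
        # Lets the user group use the VCN for their notebook sessions.
        f"allow group {group} to use virtual-network-family {scope}",
        # Lets the user group create and manage buckets and their objects.
        f"allow group {group} to manage buckets {scope}",
        f"allow group {group} to manage objects {scope}",
    ]


statements = build_policy_statements("data-scientists", "data-science-work")
for statement in statements:
    print(statement)
```

If you use different group or compartment names, pass them as arguments so that every statement stays consistent.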

5. Creating a Dynamic Group and Writing Policies for it

Create a dynamic group for Data Science resources and allow this dynamic group to access other OCI resources, such as Object Storage and Logging.

To give permission to OCI resources to access other OCI resources, first, you add the resources to a dynamic group, instead of a user group. Then you write policies to allow the dynamic group to access specified resources. Here, your dynamic group has three Data Science resources: notebook sessions, model deployments, and job runs.
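
As a rough mental model, a dynamic group is a set of resources selected by rules rather than a list of named members. The Python sketch below is a simplified illustration and not how OCI evaluates rules internally. The function names, the dictionary layout, the placeholder OCIDs, and the datascienceproject type used as a non-matching example are all assumptions made for this example. It also assumes that a resource joins the group when it satisfies any one of the rules.

```python
# Simplified model of dynamic group membership for this tutorial.
# This is NOT OCI's implementation; it only illustrates the idea.
RESOURCE_TYPES = (
    "datasciencenotebooksession",
    "datasciencemodeldeployment",
    "datasciencejobrun",
)


def matches_rule(resource: dict, resource_type: str, compartment_ocid: str) -> bool:
    """Model of an ALL {...} rule: every condition must hold for the resource."""
    return (
        resource.get("type") == resource_type
        and resource.get("compartment_id") == compartment_ocid
    )


def in_dynamic_group(resource: dict, compartment_ocid: str) -> bool:
    """A resource is a member if it satisfies at least one of the rules."""
    return any(matches_rule(resource, t, compartment_ocid) for t in RESOURCE_TYPES)


# Placeholder OCIDs, invented for this example.
WORK = "ocid1.compartment.oc1..example-work"
OTHER = "ocid1.compartment.oc1..example-other"

notebook = {"type": "datasciencenotebooksession", "compartment_id": WORK}
stray_job = {"type": "datasciencejobrun", "compartment_id": OTHER}
project = {"type": "datascienceproject", "compartment_id": WORK}

print(in_dynamic_group(notebook, WORK))   # True: type and compartment match
print(in_dynamic_group(stray_job, WORK))  # False: wrong compartment
print(in_dynamic_group(project, WORK))    # False: type not covered by any rule
```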

  1. Open the navigation menu and click Identity & Security. Under Identity, click Compartments.
  2. Click the data-science-work compartment.

    The compartment details page is displayed.

  3. Click Copy to save the entire OCID to your notepad.
  4. Click Compartments to return to the list of compartments.
  5. Click Dynamic Groups.
  6. Click Create Dynamic Group.
  7. Enter the following:
    • Name: data-science-dynamic-group
    • Description: Data Science dynamic group
  8. Enter the following three matching rules. Replace <compartment-ocid> with the compartment OCID you copied.
    ALL {resource.type='datasciencenotebooksession', resource.compartment.id='<compartment-ocid>'}

    The preceding matching rule means that all notebook sessions created in your compartment are added to data-science-dynamic-group.

    ALL {resource.type='datasciencemodeldeployment', resource.compartment.id='<compartment-ocid>'}

    The preceding matching rule means that all model deployments created in your compartment are added to data-science-dynamic-group.

    ALL {resource.type='datasciencejobrun', resource.compartment.id='<compartment-ocid>'}

    The preceding matching rule means that all job runs created in your compartment are added to data-science-dynamic-group.

  9. Click Create.

    Next, write policies to allow resources of this dynamic group to access other OCI services.

  10. Click Policies.
  11. Click Create Policy.
  12. Enter the following:
    • Name: data-science-dynamic-group-policy
    • Description: Policy for the Data Science dynamic group
  13. Instead of the data-science-work compartment, select the top-most compartment, which is your tenancy.
  14. Click Show manual editor.
  15. Enter the following policy statements into the Policy Builder field:

    To allow the notebook sessions to perform CRUD operations on entries in the model catalog, projects, and notebook session resources:

    allow dynamic-group data-science-dynamic-group to manage data-science-family in compartment data-science-work
    

    To allow notebook sessions to perform CRUD operations on Data Flow applications and runs:

    allow dynamic-group data-science-dynamic-group to manage dataflow-family in compartment data-science-work

    To allow notebook sessions to list and read compartments and user names that are in the tenancy:

    allow dynamic-group data-science-dynamic-group to read compartments in tenancy
    allow dynamic-group data-science-dynamic-group to read users in tenancy

    To allow model deployments to emit logs to the Logging service:

    allow dynamic-group data-science-dynamic-group to use log-content in compartment data-science-work

    To allow job runs to create logs and record job run details in the Logging service:

    allow dynamic-group data-science-dynamic-group to use log-groups in compartment data-science-work

    To allow notebook sessions and model deployments to read and write files to object storage buckets, in the data-science-work compartment:

    allow dynamic-group data-science-dynamic-group to manage object-family in compartment data-science-work
    Note

    • The preceding policy allows model deployments to access any bucket in the data-science-work compartment.
    • To give model deployments read access to specific buckets outside the data-science-work compartment, specify the bucket names and their compartments in your policy.
    • Example: To allow model deployments to access published conda environments from the bucket published-conda-envs, and model artifacts from the bucket model-artifacts:
      allow dynamic-group data-science-dynamic-group to read objects in compartment <another-compartment> where ANY {target.bucket.name='published-conda-envs', target.bucket.name='model-artifacts'}
    • On the Create Policy page, if your policies mention tenancy or include compartments outside the data-science-work compartment, then for Compartment, select <your-tenancy> (root).
  16. Click Create to create the policy.
    You can use this dynamic group to give the notebook sessions and model deployments in the data-science-work compartment access to other OCI resources in the tenancy.
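
The matching rules in step 8 and the bucket-scoped example in the note are easy to mistype because of the nested quotes and braces. The following Python sketch shows one way to assemble them as strings. The function names are hypothetical, the OCID is a placeholder, and the output is meant to be pasted into the Console rather than sent to any OCI API.

```python
def build_matching_rules(compartment_ocid: str) -> list[str]:
    """Return the three matching rules from step 8 for the given compartment OCID."""
    resource_types = [
        "datasciencenotebooksession",
        "datasciencemodeldeployment",
        "datasciencejobrun",
    ]
    return [
        f"ALL {{resource.type='{resource_type}', "
        f"resource.compartment.id='{compartment_ocid}'}}"
        for resource_type in resource_types
    ]


def build_bucket_read_statement(
    dynamic_group: str, compartment: str, bucket_names: list[str]
) -> str:
    """Return a read-objects statement limited to the named buckets."""
    conditions = ", ".join(f"target.bucket.name='{name}'" for name in bucket_names)
    return (
        f"allow dynamic-group {dynamic_group} to read objects "
        f"in compartment {compartment} where ANY {{{conditions}}}"
    )


# Placeholder OCID, invented for this example.
for rule in build_matching_rules("ocid1.compartment.oc1..example"):
    print(rule)

print(
    build_bucket_read_statement(
        "data-science-dynamic-group",
        "<another-compartment>",
        ["published-conda-envs", "model-artifacts"],
    )
)
```

Generating the text keeps the compartment OCID identical across all three rules, which is the detail most likely to drift when the rules are edited by hand.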

6. Creating a Notebook Session

Lastly, create a notebook session and test its access to the public internet.

  1. Open the navigation menu and click Analytics & AI. Under Machine Learning, click Data Science.
  2. Click Create Project.
  3. Select the data-science-work compartment.
  4. (Optional) Enter Initial Project for the Name.
  5. (Optional) Enter my first project for the Description.
  6. Click Create.

    The project details page appears.

  7. Click Create notebook session.
  8. Ensure that the data-science-work compartment is selected.
  9. (Optional) Enter my-first-notebook-session for the Name.
  10. Click Select, select Intel, and then select VM.Standard2.8 for the Compute shape.
  11. Enter 100 for the Block storage size to attach to your virtual machine.
  12. Click Custom networking, and select the datascience-vcn VCN and Private Subnet-datascience-vcn subnet to route egress traffic from your notebook session.
  13. Select View detail page on clicking create.
  14. Click Create to create your first notebook session.

    The notebook session page opens. Creating the notebook session takes a few minutes. When creation completes, the status changes to Active, and you can open the notebook session.

  15. Click Open.
  16. Enter your Oracle Cloud Infrastructure credentials to access the JupyterLab UI.
  17. Click Terminal to perform a simple test to check that you can access the public internet from your notebook session.
  18. Run this command:

    You should see a response similar to:

    [Image: wget command messages in a JupyterLab terminal.]

    The line HTTP request sent, awaiting response... 200 OK indicates that the test succeeded and that your notebook session has public internet access.
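
The same kind of connectivity check can also be run from a notebook cell. The Python sketch below is an illustration and not the command from this step. The function name check_http_status is hypothetical, and it uses only the Python standard library. It sends a HEAD request and returns the HTTP status code, or None if the host cannot be reached.

```python
import urllib.error
import urllib.request
from typing import Optional


def check_http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request and return the HTTP status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or no route to the internet.
        return None
```

For example, calling check_http_status with the address of any public website you choose should return 200 when egress through the NAT gateway works. A result of None suggests that the notebook session cannot reach the internet, in which case the subnet selection and route rules are a reasonable place to start looking.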