2 Get Started with Data Science Agent

This section discusses how to get started with Data Science Agent in Oracle Machine Learning UI.


2.1 Access Data Science Agent

You can access Data Science Agent directly from the Oracle Machine Learning UI home page.

To access Data Science Agent, you must first sign in to Oracle Machine Learning from Autonomous AI Database:
  1. To sign in to Oracle Machine Learning from an Autonomous AI Database instance:
    1. On the Autonomous AI Database information page, click Database actions and then click View all database actions.
    2. On the Database Actions page, go to the Development tab and click Machine Learning.

      Figure 2-2 Oracle Machine Learning option in Database Actions Launchpad



      The Oracle Machine Learning UI sign-in page opens.
    3. Enter your username and password, and click Sign in.
    This opens the Oracle Machine Learning UI home page.
  2. On your Oracle Machine Learning UI home page, click DS Agent.

    Figure 2-3 Oracle Machine Learning UI homepage



    Alternatively, click the Cloud menu icon to open the left navigation menu, and then click DS Agent.

    This opens the Data Science Conversations listing page.

    Figure 2-4 Data Science Agent Conversations listing page



2.2 Create AI Credential and AI Profile

An AI credential is a database object that stores the authentication details the database needs to access Oracle Cloud Infrastructure (OCI) resources on behalf of a user or application. It comprises values such as user_ocid, tenancy_ocid, private_key, and fingerprint. An AI profile is a named set of attributes, such as provider, credential_name, and object_list, that configures how the database interacts with an AI provider.

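Prerequisites
  • Access to the DBMS_CLOUD_AI package
  • The OCID of your OCI user (user_ocid)
  • The OCID of your OCI tenancy (tenancy_ocid)
  • The private key of your OCI API signing key pair (private_key)
  • The fingerprint of that key (fingerprint)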
You can create and manage your AI profiles using the DBMS_CLOUD_AI package.
To create an AI credential and AI Profile:
  1. Create a notebook, and in a %script paragraph, run the following script to create an AI credential:
    %script
    DECLARE
        credential_name VARCHAR2(128) := 'OCI_CRED';
    BEGIN
        -- Drop the credential if it already exists; ignore the error if it does not
        BEGIN
            dbms_cloud.drop_credential(credential_name => credential_name);
        EXCEPTION
            WHEN OTHERS THEN
                NULL;
        END;
        -- Create the credential from the details of your OCI API signing key
        dbms_cloud.create_credential(
            credential_name => credential_name,
            user_ocid       => '<ocid1.user.oc1..>',
            tenancy_ocid    => '<ocid1.tenancy.oc1..>',
            private_key     => '<private_key>',
            fingerprint     => '<fingerprint>'
        );
    END;
    /

    This PL/SQL script drops any existing credential named OCI_CRED, ignoring the error if none exists, and then calls the CREATE_CREDENTIAL procedure of the DBMS_CLOUD package to create a new credential with the given parameters.
  2. In another %script paragraph in the same notebook, run the following script to create an AI profile named NL2SQL.
    %script
    DECLARE
        profile_name VARCHAR2(128) := 'NL2SQL';
    BEGIN
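        -- Drop any existing profile with this name; TRUE (force) suppresses
        -- the error raised when the profile does not exist
        dbms_cloud_ai.drop_profile(
            profile_name,
            TRUE
        );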
        dbms_cloud_ai.create_profile(
            profile_name => profile_name,
            attributes   => '{
                "comments": false,
                "conversation": true,
                "credential_name": "OCI_CRED",
                "model": "openai.gpt-4.1",
                "provider": "oci",
                "temperature": 1,
                "max_tokens": 4096,
                "oci_compartment_id": "<ocid1.compartment.oc1..>",
                "oci_apiformat": "GENERIC"
            }'
        );
    END;
    /
    Define the following attributes for this profile:
    • profile_name: A name for the AI profile. The name must follow the naming rules for Oracle SQL identifiers, with a maximum length of 125 characters.
    • comments: Set this to false. This attribute controls whether column comments are included in the metadata sent to the AI provider; false excludes them.
    • conversation: A VARCHAR2 attribute that indicates whether conversation history is enabled for the profile. Allowed values are true and false, and they are not case sensitive. The default is false.
    • credential_name: The name of the OCI credential object used to authenticate requests. In this example, it is OCI_CRED, the AI credential you created in step 1.
    • model: The name of the AI model used to generate responses in the conversation. In this example, it is OpenAI’s GPT-4.1. For more information, see Recommended Models.
    • temperature: Controls the randomness of the generated text. Lower values produce more deterministic responses; higher values produce more varied responses.
    • provider: The provider of the model. This attribute is mandatory. Supported providers are:
      • openai
      • cohere
      • azure
      • database
      • oci
      • google
      • anthropic
      • huggingface
      • aws
    • max_tokens: The maximum number of tokens (words and pieces of words) in the response. Limiting this value prevents overly long outputs and helps control cost.
    • oci_compartment_id: The OCID of the compartment you are permitted to access when calling the OCI Generative AI service. The compartment ID can contain alphanumeric characters, hyphens, and dots.
    • oci_apiformat: The format in which the API expects data to be sent and received. Use this attribute to generate text responses. It applies to OCI Generative AI chat models in a dedicated AI cluster; specify it when you specify a model OCID in the model attribute or provide an endpoint in the oci_endpoint_id attribute. Supported values are:
      • COHERE
      • GENERIC
  3. Verify that the profile was created by running the following query:
    %sql
    SELECT * FROM user_cloud_ai_profiles;
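Optionally, you can confirm both objects at once by querying the data dictionary. The following sketch assumes the names OCI_CRED and NL2SQL used in this section, and that USER_CLOUD_AI_PROFILES exposes PROFILE_NAME and STATUS columns; adjust the column list if your release differs.

```sql
%script
-- Credential created in step 1 (unquoted names are stored in uppercase)
SELECT credential_name, enabled
FROM   user_credentials
WHERE  credential_name = 'OCI_CRED';

-- Profile created in step 2, with its current status
SELECT profile_name, status
FROM   user_cloud_ai_profiles
WHERE  profile_name = 'NL2SQL';
```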
This completes the task of creating an AI credential and AI Profile.
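After the profile exists, you can change a single attribute with the DBMS_CLOUD_AI.SET_ATTRIBUTE procedure instead of dropping and re-creating the profile. As an illustration, the following sketch restricts the NL2SQL profile to objects owned by the SH schema through the object_list attribute. The choice of schema is only an example, and whether Data Science Agent uses object_list in the same way as natural language to SQL is an assumption you should verify.

```sql
%script
BEGIN
    -- Limit the metadata the profile can use to objects owned by SH.
    -- An entry with only "owner" (no "name") includes every object in that schema.
    dbms_cloud_ai.set_attribute(
        profile_name    => 'NL2SQL',
        attribute_name  => 'object_list',
        attribute_value => '[{"owner": "SH"}]'
    );
END;
/
```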

2.3 Manage Data Science Conversations

The Data Science Agent Conversations page lists all the conversations you created. From this page, you can create and manage conversations.

Create a Data Science Agent conversation to interact with the agent in the following areas of data science and machine learning:
  • Data profiling
  • Data wrangling and transformation
  • Statistical analysis of variable relationships
  • Feature importance, classification, regression, XGBoost, clustering, and anomaly detection
  • Model training and evaluation
  • Inference on new data
The following figure shows the Data Science Agent Conversations page. Click a conversation title to open the conversation and resume it.

Figure 2-6 Data Science Conversation listing page



This page lists the following details about the conversations:
  • Title: This is the name of the conversation you provided while creating the conversation.
  • Profile Name: The AI profile you selected when creating the conversation. An AI profile contains attributes such as provider, credential_name, and object_list. You create and manage AI profiles with the DBMS_CLOUD_AI package.
  • Messages: The number of interactions with the agent in the conversation. In the figure, the conversation titled SQL has 2 messages as of the last updated date.
  • Created on: This is the date on which the conversation was first created.
  • Last Updated: This is the date on which the conversation was last used or updated.
  • Status: The status of the conversation, either ACTIVE or IDLE.

You can perform the following tasks here:

2.3.1 Create a Data Science Agent Conversation

A conversation is a set of interactions with Data Science Agent in the chat interface. Before you start a conversation with the Data Science Agent, you must create a conversation.

Note:

You can use the same database user credentials to access the same conversation in multiple browsers. However, Oracle does not recommend this because it may lead to unexpected behavior. If you attempt this, Data Science Agent displays a warning, but you can override it.
To create a Data Science Agent conversation:
  1. On the Data Science Agent Conversations listing page, click Create.

    Figure 2-7 Data Science Agent Conversations listing page



    This opens the Create DS Agent Conversation dialog.
  2. In the Create DS Agent Conversation dialog, enter the following details:

    Figure 2-8 Create Conversation dialog



  3. In the Title field, enter a name for your conversation. In this example, name the conversation SQL.
  4. From the AI Profile drop-down menu, select a profile. In this example, select NL2SQL.
  5. Click OK.
    The conversation is created and it opens in the Data Science Agent chat interface. Here, you can start chatting with Data Science Agent.
  6. In the chat interface, Data Science Agent presents tips to help you begin the conversation. In the Send a message field, type your prompt in natural language and press Enter. In this example, type Display the SQL statement to list all customers who purchased vehicles from 2024 - 2025 and press Enter.

    Figure 2-9 Chat with Data Science Agent



  7. The agent may take some time to respond. The following figure shows the agent's response to the prompt you entered in the previous step. Continue your conversation with the agent. You can also accept the suggestions the agent provides.

    Figure 2-10 Response of Data Science Agent



2.3.2 Delete a Data Science Agent Conversation

You can delete a conversation from the Data Science Agent Conversations listing page.

To delete a conversation:
  1. On the Data Science Agent Conversations listing page, select the conversation you want to delete.
  2. Click Delete.

2.4 Use Data Science Agent Chat Interface

On the Data Science Agent chat interface, you interact with the agent to ask machine learning and data science questions.

To begin a chat with Data Science Agent:
  1. On the Data Science Agent chat interface, type your question in the Ask a question field, and press enter.
  2. In the Settings pane, you can set the database service level and manually associate database objects with the conversation. Click Settings in the top right corner to open the Settings pane.

    Figure 2-11 Data Science Agent settings



  3. In the Service Level field, select a service level from the drop-down menu. The default is Low. If you are working with a large dataset, consider changing the service level to Medium or High to enable parallel processing in the database.
  4. To improve the precision and efficiency of the agent's responses, associate database objects with the conversation. In the Associated Objects field, click + to open the Object Management dialog. In the Object Management dialog:
    1. Turn on Show other schemas to view objects in schemas other than your own. Objects include tables, views, and machine learning models.

      Figure 2-12 Object Management dialog



    2. Select the objects from the drop-down list. You can select multiple objects.
    3. Click OK to associate the objects with the conversation.
  5. The associated objects are listed below the Associated Objects field on the Settings pane. To remove an associated object, click the delete icon next to it.

    Figure 2-13 Data Science Agent settings



  6. Data Science Agent provides the responses in the text area. You can continue your conversation by asking further questions.
  7. To exit the conversation, click Back in the top right corner of the page. This returns you to the Data Science Agent Conversations listing page.

2.5 Example of a Conversation with Data Science Agent

This example demonstrates how Data Science Agent supports a novice user in exploring the dataset present in the SH schema, and in building and evaluating a machine learning model. It also shows the effectiveness of Natural Language to SQL (NL2SQL) in Data Science Agent.

The user runs a complete machine learning workflow using natural language. The conversation between the user and Data Science Agent progresses from generic questions about data exploration in the SH schema to specific ones about model building, training, and scoring. Broadly, the conversation flow can be categorized into these sections:
  • User sets a goal for the conversation
  • Data Science Agent explains the dataset in the SH schema
  • Data Science Agent prepares data for model building
  • Data Science Agent trains and builds a model to predict customer income level
  • Data Science Agent computes model accuracy
  • Data Science Agent interprets the confusion matrix
  • Data Science Agent uses the trained model to predict customer income levels on new data

Highlights:

This example highlights the following capabilities of Data Science Agent:
  • Efficiency in performing machine learning tasks such as data preparation, model building, model training, model evaluation, and data analysis
  • Natural language to SQL capabilities
  • Visualization capabilities
  • Intelligence to suggest the next steps and options to the user

1. User sets the context

For every new conversation, Data Science Agent opens the chat interface with some important tips for the user. The user begins to interact with the agent with a prompt that sets the goal and context for the conversation.

Prompt 1: I'm new to machine learning and SQL. I want to begin by exploring the dataset present in the SH schema in the database.

Agent's response:

Data Science Agent responds with the following information:
  • A summary of each table present in the SH schema.
  • A detailed description of all available objects in the Catalog Discovery Results section.
  • Options to explore a specific business area (such as sales, customers, or products) or to get an overview with basic statistics on a few key tables. The agent asks in natural language which the user prefers and mentions that it will provide interactive guidance.

Figure 2-15 User sets the goal of the conversation



Catalog Discovery Results

The agent also provides a detailed description of all available objects in the SH schema in the Catalog Discovery Results, a separate section in the same response. The user clicks to expand it and view the details, as shown in the screenshot below:

Figure 2-16 Catalog Discovery Results
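To see roughly what the Catalog Discovery Results summarize, you can list the tables and views in the SH schema, with their comments, yourself. This is a manual sketch using the standard ALL_TAB_COMMENTS data dictionary view; it is not necessarily how Data Science Agent builds its catalog.

```sql
%sql
-- Tables and views owned by SH, with any comments defined on them
SELECT table_name, table_type, comments
FROM   all_tab_comments
WHERE  owner = 'SH'
ORDER  BY table_type, table_name;
```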



2. Data Science Agent explores dataset in the SH schema

Conversation progression: The user opts to explore a particular table in the SH schema.

Prompt 2: Explain the SH.CUSTOMERS table and also provide an overview of this table.

Agent's response: Data Science Agent responds to this prompt in a structured format. It provides the following information:
  • A brief summary of the table
  • Lists all the columns in the SH.CUSTOMERS table along with their meaning
  • Sample data
  • Table size
  • Conclusion and options to continue the conversation.

Figure 2-17 Data Science Agent explains the CUSTOMERS table



The following figure shows the sample data, table size, and conclusion, which continue the agent's response to Prompt 2. The agent also asks how the user wants to proceed: explore variables in more detail, run statistics, or join the CUSTOMERS table with other tables.

Figure 2-18 Data Science Agent explains the CUSTOMERS table (Contd.)
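The same overview can be reproduced manually, which is useful for checking the agent's summary. This sketch uses the ALL_TAB_COLUMNS data dictionary view for the column list and plain queries for sample rows and the row count; it is not the agent's own query.

```sql
%script
-- Columns and data types of SH.CUSTOMERS, in table order
SELECT column_name, data_type, nullable
FROM   all_tab_columns
WHERE  owner = 'SH' AND table_name = 'CUSTOMERS'
ORDER  BY column_id;

-- A few sample rows
SELECT * FROM sh.customers FETCH FIRST 5 ROWS ONLY;

-- Table size in rows
SELECT COUNT(*) AS num_rows FROM sh.customers;
```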



3. Data Science Agent provides Extended Analysis and Statistics

Conversation progression: The user now chooses to dive deeper into some important statistics. The user provides the third prompt.

Prompt 3: Give me some important statistics on customer income level.

Agent's response:

Data Science Agent analyzes data related to customer income level and provides detailed statistical information on customer income level based on the SH.CUSTOMERS table.
  • Related to customer income level, it identifies the following categories:
    • Twelve distinct income ranges
    • The most common income level
    • Very high income categories
    • Lowest income groups
    • Missing values for income level
  • Presents an extended analysis of customer income level
  • Presents a graphical representation of customer income level

Figure 2-19 Data Science Agent provides statistical information on customer income level in the SH.CUSTOMERS table



Here is a screenshot of the extended analysis of customer income level in the SH.CUSTOMERS table.

Figure 2-20 Extended Analysis of customer income level



Here is a screenshot of the graphical representation of customer income level in the SH.CUSTOMERS table.

Figure 2-21 Graphical representation of customer income level
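A simple query reproduces the core of this analysis: the number and share of customers in each income level, including missing values. This is a sketch for cross-checking the agent's figures, not the query the agent ran; the label '(missing)' for NULL values is an arbitrary choice.

```sql
%sql
-- Count and percentage of customers per income level; NULLs form their own group
SELECT NVL(cust_income_level, '(missing)')               AS income_level,
       COUNT(*)                                          AS num_customers,
       ROUND(100 * RATIO_TO_REPORT(COUNT(*)) OVER (), 2) AS pct_of_customers
FROM   sh.customers
GROUP  BY cust_income_level
ORDER  BY num_customers DESC;
```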



4. Data Science Agent prepares data

Conversation progression: After reviewing the dataset and exploring the customer income level, the user now proceeds to train a machine learning model with the help of Data Science Agent. The user wants to predict the customer income level.

Prompt 4: Now, train a model to predict the customer income level. Use all features that improve the prediction, and consider those to avoid data leakage.

Agent's response:

  • Creates a view, DSAGENT$INCOME_MODEL_DATASET_A365, by combining the following tables in the SH schema: SH.CUSTOMERS, SH.SUPPLEMENTARY_DEMOGRAPHICS, and SH.COUNTRIES.
  • Provides the next steps
  • Provides the SQL script that it used to create the view DSAGENT$INCOME_MODEL_DATASET_A365
  • Provides a visual diagram to depict the workflow to create the view.

Figure 2-22 Data Science Agent prepares data for model building



Here's the SQL code developed by Data Science Agent to define the view DSAGENT$INCOME_MODEL_DATASET_A365.

Figure 2-23 Data Science Agent provides the SQL code for defining the view



Data Science Agent also presents a visual representation of the sources (tables) from where it pulled the data to create the view DSAGENT$INCOME_MODEL_DATASET_A365.

Figure 2-24 Visual representation of how Data Science Agent created the view
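The following sketch shows the general shape of such a view. The view name INCOME_MODEL_DATA_SKETCH and the selected columns are hypothetical and illustrative; they are not necessarily the features the agent chose. The inner joins keep only customers that have a row in SH.SUPPLEMENTARY_DEMOGRAPHICS, which the agent's view may handle differently.

```sql
%script
-- Hypothetical modeling view joining customers, demographics, and countries
CREATE OR REPLACE VIEW income_model_data_sketch AS
SELECT c.cust_id,
       c.cust_gender,
       c.cust_year_of_birth,
       c.cust_marital_status,
       co.country_region,
       d.education,
       d.occupation,
       d.household_size,
       d.yrs_residence,
       c.cust_income_level            -- target column
FROM   sh.customers c
JOIN   sh.supplementary_demographics d ON d.cust_id = c.cust_id
JOIN   sh.countries co                 ON co.country_id = c.country_id;
```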



5. Data Science Agent splits data

Conversation progression: The user now proceeds to prepare the data to predict customer income level. The target column is CUST_INCOME_LEVEL in the view DSAGENT$INCOME_MODEL_DATASET_A365.

Prompt 5: Proceed to set up the data for model training and to train a model to predict CUST_INCOME_LEVEL for the customers.

Agent's response: Data Science Agent now splits the data for training, testing and validation:
  • Presents information about the data split along with a brief explanation about how this data would be used.
    • Training dataset 70%
    • Validation dataset 10%
    • Testing dataset 20%
  • Explains how the split is balanced across income levels
  • Asks whether the user wants to proceed with the 70/10/20 data split
Here is a screenshot of the data split done by Data Science Agent.

Prompt 6: Yes. Go ahead with this 70/10/20 split.

Agent's response: Data Science Agent performs data split and presents the following information to the user:
  • Overview of the data split.
  • Detailed information about how the data was split, along with the SQL code, in an expandable section.

Figure 2-26 Data Split - Overview



Agent's response:

Data Science Agent created splits from OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL, created the following views, and also presented the corresponding SQL code for each view.
  • train set: OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TRAIN
  • validation set: OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_VAL
  • test set: OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TEST
  • unlabeled: OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_UNLABELED
Here's a screenshot of the data split performed by Data Science Agent.

Figure 2-27 Data Split - Details



Here's a screenshot of the SQL code for each of the views created by Data Science Agent.

Figure 2-28 Data Split - SQL code for each view
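A stratified 70/10/20 split like the one described above can be sketched in SQL with NTILE and ORA_HASH. The sketch assumes that DSAGENT$INCOME_MODEL_DATASET_A365 has a unique CUST_ID column; all view names ending in _SKETCH are hypothetical. Within each income level, NTILE(10) assigns deciles, and deciles 1 to 7, 8, and 9 to 10 become the training, validation, and test sets. Rows without a label go to a separate view for later scoring, mirroring the agent's unlabeled view. The agent's own split logic may differ.

```sql
%script
-- Map each labeled customer to a split. Within each income level, NTILE(10)
-- assigns deciles: 1-7 train (~70%), 8 validation (~10%), 9-10 test (~20%).
-- Ordering by ORA_HASH(cust_id) makes the assignment repeatable.
CREATE OR REPLACE VIEW income_split_map_sketch AS
SELECT cust_id,
       CASE
           WHEN decile <= 7 THEN 'TRAIN'
           WHEN decile = 8  THEN 'VALIDATION'
           ELSE 'TEST'
       END AS split_set
FROM  (SELECT t.cust_id,
              NTILE(10) OVER (PARTITION BY t.cust_income_level
                              ORDER BY ORA_HASH(t.cust_id), t.cust_id) AS decile
       FROM   dsagent$income_model_dataset_a365 t
       WHERE  t.cust_income_level IS NOT NULL);

-- One view per split, exposing only the columns of the source view
CREATE OR REPLACE VIEW income_train_sketch AS
SELECT t.*
FROM   dsagent$income_model_dataset_a365 t
JOIN   income_split_map_sketch m ON m.cust_id = t.cust_id
WHERE  m.split_set = 'TRAIN';

CREATE OR REPLACE VIEW income_val_sketch AS
SELECT t.*
FROM   dsagent$income_model_dataset_a365 t
JOIN   income_split_map_sketch m ON m.cust_id = t.cust_id
WHERE  m.split_set = 'VALIDATION';

CREATE OR REPLACE VIEW income_test_sketch AS
SELECT t.*
FROM   dsagent$income_model_dataset_a365 t
JOIN   income_split_map_sketch m ON m.cust_id = t.cust_id
WHERE  m.split_set = 'TEST';

-- Rows without a label are kept aside for scoring
CREATE OR REPLACE VIEW income_unlabeled_sketch AS
SELECT *
FROM   dsagent$income_model_dataset_a365
WHERE  cust_income_level IS NULL;
```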



6. Data Science Agent trains model

Conversation progression: After the data split is complete, the user proceeds to model training using the training and validation datasets.

Prompt 7: Great. Now train a model to predict the customer income level using the training and validation sets.

Agent's response:

Data Science Agent confirms that model training is complete and provides the following additional information:
  • Algorithm used
  • Accuracy achieved
  • Model details. The model built is DSAGENT$CUST_INCOME_CLASSIFIER_A365.
  • Next steps for the user to choose from

See the following figure for more details.

Figure 2-29 Data Science Agent trains model
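For reference, the following sketch trains a comparable in-database classifier with DBMS_DATA_MINING.CREATE_MODEL2. It assumes that the training view DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TRAIN contains CUST_ID and CUST_INCOME_LEVEL columns. The model name INCOME_CLASSIFIER_SKETCH and the Decision Tree algorithm are hypothetical choices; they are not necessarily what the agent used.

```sql
%script
DECLARE
    v_settings DBMS_DATA_MINING.SETTING_LIST;
BEGIN
    -- Drop an earlier copy of the sketch model; ignore the error if it does not exist
    BEGIN
        DBMS_DATA_MINING.DROP_MODEL('INCOME_CLASSIFIER_SKETCH');
    EXCEPTION
        WHEN OTHERS THEN
            NULL;
    END;

    -- Illustrative algorithm choice, with Automatic Data Preparation turned on
    v_settings('ALGO_NAME') := 'ALGO_DECISION_TREE';
    v_settings('PREP_AUTO') := 'ON';

    DBMS_DATA_MINING.CREATE_MODEL2(
        model_name          => 'INCOME_CLASSIFIER_SKETCH',
        mining_function     => 'CLASSIFICATION',
        data_query          => 'SELECT * FROM DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TRAIN',
        set_list            => v_settings,
        case_id_column_name => 'CUST_ID',
        target_column_name  => 'CUST_INCOME_LEVEL'
    );
END;
/
```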



You can also check the model built by the agent on the Models page in Oracle Machine Learning UI, as shown in the screenshot here.

Figure 2-30 Model listed on the Models page in OML UI
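You can also list your models from a notebook by querying the USER_MINING_MODELS data dictionary view. A minimal query:

```sql
%sql
-- Most recently created models first
SELECT model_name, mining_function, algorithm, creation_date
FROM   user_mining_models
ORDER  BY creation_date DESC;
```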



7. Data Science Agent evaluates the model and computes model accuracy

Conversation progression: The user now focuses on model evaluation.

Prompt 8: Yes. Evaluate the model and show its prediction accuracy.

Agent's response:

Data Science Agent evaluates the model. It computes the overall accuracy, balanced accuracy, macro precision, macro recall, and F1 score. The following figure shows the prediction accuracy computed by Data Science Agent.

Figure 2-31 Model evaluation by Data Science Agent
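The following sketch computes comparable metrics in SQL. It assumes the test view DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TEST contains CUST_INCOME_LEVEL and the model's input columns. Balanced accuracy is computed as the mean per-class recall (which is also the macro recall), and macro precision and macro F1 are averaged over the true classes, counting a class that is never predicted as having zero precision. The agent may define these metrics differently, so small differences from its figures are possible.

```sql
%sql
WITH scored AS (
    -- Actual vs. predicted income level for each test row
    SELECT cust_income_level AS actual,
           PREDICTION(DSAGENT$CUST_INCOME_CLASSIFIER_A365 USING *) AS predicted
    FROM   DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TEST
),
overall AS (
    SELECT AVG(CASE WHEN predicted = actual THEN 1 ELSE 0 END) AS accuracy
    FROM   scored
),
recall_by_class AS (
    -- Share of each true class that was predicted correctly
    SELECT actual, AVG(CASE WHEN predicted = actual THEN 1 ELSE 0 END) AS rec
    FROM   scored
    GROUP  BY actual
),
precision_by_class AS (
    -- Share of each predicted class that was correct
    SELECT predicted, AVG(CASE WHEN predicted = actual THEN 1 ELSE 0 END) AS prec
    FROM   scored
    GROUP  BY predicted
),
per_class AS (
    -- Classes that are never predicted get zero precision
    SELECT r.actual, r.rec, NVL(p.prec, 0) AS prec
    FROM   recall_by_class r
    LEFT JOIN precision_by_class p ON p.predicted = r.actual
)
SELECT ROUND(o.accuracy, 4)   AS accuracy,
       ROUND(AVG(pc.rec), 4)  AS balanced_accuracy,
       ROUND(AVG(pc.prec), 4) AS macro_precision,
       ROUND(AVG(CASE WHEN pc.prec + pc.rec = 0 THEN 0
                      ELSE 2 * pc.prec * pc.rec / (pc.prec + pc.rec)
                 END), 4)     AS macro_f1
FROM   per_class pc
CROSS  JOIN overall o
GROUP  BY o.accuracy;
```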



Data Science Agent also provides the SQL code to run inference (scoring) with the model "OMLUSER"."DSAGENT$CUST_INCOME_CLASSIFIER_A365" on new, raw, or unlabeled data, as shown in the following figure:

Figure 2-32 Data Science Agent provides SQL query for inference
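A minimal scoring query of this kind uses the PREDICTION and PREDICTION_PROBABILITY SQL functions. This sketch assumes the unlabeled view DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_UNLABELED has a CUST_ID column and the model's input columns; it is not the agent's exact query.

```sql
%sql
-- Predicted income level and the probability of that prediction
SELECT cust_id,
       PREDICTION(DSAGENT$CUST_INCOME_CLASSIFIER_A365 USING *) AS predicted_income_level,
       ROUND(PREDICTION_PROBABILITY(DSAGENT$CUST_INCOME_CLASSIFIER_A365 USING *), 3) AS probability
FROM   DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_UNLABELED
FETCH  FIRST 10 ROWS ONLY;
```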



8. Data Science Agent interprets confusion matrix

Conversation progression: The user now dives into the analysis of the evaluation results.

Prompt 9: Provide an interpretation of the confusion matrix.

Agent's response:

Data Science Agent provides a comprehensive interpretation of the confusion matrix for the model's customer income level predictions. Data Science Agent provides the following information:
  • Provides an overview of how to interpret the confusion matrix
  • Lists the key takeaways of the results
  • Provides an interpretation of the meaning
  • Provides suggestions on how the user can improve accuracy of the model's prediction.

Figure 2-33 Data Science Agent interprets Confusion Matrix
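If you want to inspect the underlying counts yourself, you can build the confusion matrix in long form by grouping on actual and predicted classes. Rows where ACTUAL equals PREDICTED lie on the diagonal (correct predictions). The sketch assumes the test view contains CUST_INCOME_LEVEL and the model's input columns.

```sql
%sql
WITH scored AS (
    SELECT cust_income_level AS actual,
           PREDICTION(DSAGENT$CUST_INCOME_CLASSIFIER_A365 USING *) AS predicted
    FROM   DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TEST
)
-- One row per (actual, predicted) cell of the confusion matrix
SELECT actual, predicted, COUNT(*) AS num_cases
FROM   scored
GROUP  BY actual, predicted
ORDER  BY actual, predicted;
```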



9. Data Science Agent runs live scoring demonstration

Conversation progression: The user now decides to use the model to predict customer income levels on new data, and prompts Data Science Agent to demonstrate scoring with the model.

Prompt 10: Show a live demonstration of scoring using this model

Agent's response: Data Science Agent responds by explaining what information it requires for scoring with the model. It provides some options to the user. Here is the agent's response:

To run a live scoring demonstration, I need to provide a “positive” income group label for probability reporting. You can choose any group, for example: '110,000 - 129,999', '90,000 - 109,999', 'Below 30,000', etc. Would you like to use '110,000 - 129,999', the most common income class, or specify another group for probability ranking in the results?

Figure 2-34 Response of Data Science Agent for running live scoring demonstration



The user responds to the agent with Prompt 11.

Prompt 11: use '110,000 - 129,999'

Agent's response: Data Science Agent demonstrates live scoring with the model. It provides the following information:
  • Confirms the completion of the task.
  • Provides details of the task—scoring.
  • Provides the SQL code for running manual inference
  • Provides a table containing detailed results, including predictions and probabilities for each scored customer
Here's a screenshot depicting the SQL code for manual inference. This information is presented in an expandable section.

Figure 2-35 SQL code for manual inference



Here's a screenshot depicting a table containing detailed results, including predictions and probabilities for each scored customer.

Figure 2-36 Table containing predictions and probabilities of each scored customer
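To rank customers by the probability of the chosen income group, pass the class value to PREDICTION_PROBABILITY. In this sketch, the literal '110,000 - 129,999' is taken from the agent's response and must match a stored CUST_INCOME_LEVEL value exactly; the unlabeled view is assumed to have a CUST_ID column and the model's input columns.

```sql
%sql
-- Customers most likely to belong to the chosen "positive" income group
SELECT cust_id,
       PREDICTION(DSAGENT$CUST_INCOME_CLASSIFIER_A365 USING *) AS predicted_income_level,
       ROUND(PREDICTION_PROBABILITY(DSAGENT$CUST_INCOME_CLASSIFIER_A365,
                                    '110,000 - 129,999' USING *), 3) AS prob_positive
FROM   DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_UNLABELED
ORDER  BY prob_positive DESC
FETCH  FIRST 10 ROWS ONLY;
```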