2 Get Started with Data Science Agent
This section explains how to get started with Data Science Agent in the Oracle Machine Learning UI.
Topics:
- Access Data Science Agent
You can access Data Science Agent directly from Oracle Machine Learning UI home page.
- Create AI Credential and AI Profile
An AI Credential is information that contains authentication details required for the database to access Oracle Cloud Infrastructure (OCI) resources on behalf of a user or application. The AI credential comprises information such as user_ocid, tenancy_ocid, private_key and fingerprint. An AI Profile is information about a user and their attributes, such as provider, credential_name, and object_list.
- Manage Data Science Conversations
The Data Science Agent Conversations page lists all the conversations you created. Here, you create and manage conversations.
- Use Data Science Agent Chat Interface
On the Data Science Agent chat interface, you interact with the agent for all machine learning and data science related questions.
- Example of a Conversation with Data Science Agent
This example demonstrates how Data Science Agent supports a novice user in exploring the dataset present in the SH schema, and in building and evaluating a machine learning model. It also shows the effectiveness of Natural Language to SQL (NL2SQL) in Data Science Agent.
2.1 Access Data Science Agent
You can access Data Science Agent directly from the Oracle Machine Learning UI home page.
2.2 Create AI Credential and AI Profile
An AI Credential is information that contains authentication details
required for the database to access Oracle Cloud Infrastructure (OCI) resources on behalf of
a user or application. The AI credential comprises information such as
user_ocid, tenancy_ocid, private_key
and fingerprint. An AI Profile is information about a user and their
attributes, such as provider, credential_name, and
object_list.
You can create and manage your AI profiles through the DBMS_CLOUD_AI package.
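The AI Profile attributes named above (provider, credential_name, object_list) can be pictured as a small JSON document. The following Python sketch only assembles such a document; it does not connect to a database. The function name `build_profile_attributes`, the credential name `MY_AI_CRED`, and the chosen tables are hypothetical, and the assumption here is that the JSON text would then be supplied as the attributes of a profile created with the DBMS_CLOUD_AI package.

```python
import json


def build_profile_attributes(provider, credential_name, objects):
    """Return AI Profile attributes as a JSON string.

    `objects` is a list of (owner, name) pairs. A name of None means
    "every object in that schema" (an assumption of this sketch).
    """
    object_list = []
    for owner, name in objects:
        entry = {"owner": owner}
        if name is not None:
            entry["name"] = name
        object_list.append(entry)

    attributes = {
        "provider": provider,
        "credential_name": credential_name,
        "object_list": object_list,
    }
    return json.dumps(attributes, indent=2)


# Hypothetical values, for illustration only.
profile_json = build_profile_attributes(
    provider="oci",
    credential_name="MY_AI_CRED",
    objects=[("SH", "CUSTOMERS"), ("SH", "COUNTRIES")],
)
print(profile_json)
```

Keeping the attributes in one structure makes it easier to review which schema objects a profile exposes before the profile is created.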
2.3 Manage Data Science Conversations
The Data Science Agent Conversations page lists all the conversations you created. Here, you create and manage conversations.
- Data profiling
- Data wrangling and transformation
- Statistical analysis of variable relationships
- Feature Importance, Classification, Regression, XGBoost, Clustering, and Anomaly Detection.
- Model training and evaluation
- Inference on new data
Figure 2-6 Data Science Conversation listing page
- Title: This is the name of the conversation you provided while creating the conversation.
- Profile Name: This is the AI Profile you selected while creating the conversation. An AI Profile contains information about the user and their attributes, such as the provider, credential_name, and object_list. You can create and manage your AI profiles through the DBMS_CLOUD_AI package.
- Messages: This indicates the number of interactions in the conversation with the agent. As shown in the screenshot, the conversation titled SQL has 2 messages or interactions as of the last updated date.
- Created on: This is the date on which the conversation was first created.
- Last Updated: This is the date on which the conversation was last used or updated.
- Status: There are two statuses: ACTIVE and IDLE.
You can perform the following tasks here:
- Create a Data Science Agent Conversation
A conversation is a set of interactions with Data Science Agent in the chat interface. Before you start a conversation with the Data Science Agent, you must create a conversation.
- Delete a Data Science Agent Conversation
You can delete a conversation from the Data Science Agent Conversations listing page.
2.3.1 Create a Data Science Agent Conversation
A conversation is a set of interactions with Data Science Agent in the chat interface. Before you start a conversation with the Data Science Agent, you must create a conversation.
Note:
You can use the same Database user credentials to access the same conversation in multiple browsers. However, Oracle does not recommend this as it may lead to unexpected behavior. If you attempt this, Data Science Agent will display a warning, but you will have the option to override it.
2.3.2 Delete a Data Science Agent Conversation
You can delete a conversation from the Data Science Agent Conversations listing page.
2.4 Use Data Science Agent Chat Interface
On the Data Science Agent chat interface, you interact with the agent for all machine learning and data science related questions.
2.5 Example of a Conversation with Data Science Agent
This example demonstrates how Data Science Agent supports a novice user in exploring the dataset present in the SH schema, and in building and evaluating a machine learning model. It also shows the effectiveness of Natural Language to SQL (NL2SQL) in Data Science Agent.
The user's prompts range from general questions about the SH schema to specific ones on model training, model building, and scoring. Broadly, the conversation flow can be categorized into these sections:
- User sets a goal for the conversation
- Data Science Agent explains the dataset in the SH schema
- Data Science Agent prepares data for model building
- Data Science Agent trains and builds a model to predict customer income level
- Data Science Agent computes model accuracy
- Data Science Agent interprets the confusion matrix
- Data Science Agent uses the trained model to predict customer income levels on new data
Highlights:
- Efficiency in performing machine learning tasks—data preparation, model building, model training, model evaluation, data analysis and so on
- Natural language to SQL capabilities
- Visualization capabilities
- Intelligence to suggest the next steps and options to the user
1. User sets the context
For every new conversation, Data Science Agent opens the chat interface with some important tips for the user. The user begins to interact with the agent with a prompt that sets the goal and context for the conversation.
Prompt 1:
I'm new to machine learning and SQL. I want to begin by exploring the
dataset present in the SH schema in the database.
Agent's response:
- A summary of each table present in the SH schema.
- A detailed description of all available objects in the Catalog Discovery Results section.
- Options to explore a specific business area (such as sales, customers, or products), or an overview with basic statistics on a few key tables. The agent asks in natural language about the user's preference and mentions that it will provide interactive guidance.
Figure 2-15 User sets the goal of the conversation
Catalog Discovery Results
The agent also provides a detailed description of all available objects in the SH schema in the Catalog Discovery Results, a separate section in the same response. The user clicks to expand it and view the details, as shown in the screenshot below:
2. Data Science Agent explores dataset in the SH schema
Conversation progression: The user opts to explore a particular table in the SH schema.
Prompt 2:
Explain the SH.CUSTOMERS table and also provide an overview of this
table.
Agent's response:
- A brief summary of the table
- Lists all the columns in the SH.CUSTOMERS table along with their meaning
- Sample data
- Table size
- Conclusion and options to continue the conversation.
Figure 2-17 Data Science Agent explains the CUSTOMERS table
Figure 2-18 Data Science Agent explains the CUSTOMERS table (Contd.)
3. Data Science Agent provides Extended Analysis and Statistics
Conversation progression: The user now chooses to dive deeper into some important statistics. The user provides the third prompt.
Prompt 3:
Give me some important statistics on customer income level.
Agent's response:
Data Science Agent provides statistical information on customer income level in the SH.CUSTOMERS table.
- Related to customer income level, it identifies the following categories:
  - Twelve distinct income ranges
  - The most common income level
  - Very high income categories
  - Lowest income groups
  - Missing values for income level
- Presents an extended analysis of customer income level
- Presents a graphical representation of customer income level
Figure 2-19 Data Science Agent provides statistical information on customer income level in the SH.CUSTOMERS table
Figure 2-20 Extended Analysis of customer income level
Figure 2-21 Graphical representation of customer income level
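The kind of summary described in this step (number of distinct income ranges, the most common level, and missing values) can be reproduced on any list of labels. The following Python sketch is an illustration only: the sample values are made up, not taken from SH.CUSTOMERS, and the function name `income_level_summary` is hypothetical.

```python
from collections import Counter


def income_level_summary(levels):
    """Summarize a list of income-level labels; None marks a missing value."""
    present = [value for value in levels if value is not None]
    counts = Counter(present)
    most_common_level, most_common_count = counts.most_common(1)[0]
    return {
        "distinct_levels": len(counts),
        "most_common": most_common_level,
        "most_common_count": most_common_count,
        "missing": len(levels) - len(present),
    }


# Made-up sample, for illustration only.
sample = [
    "110,000 - 129,999",
    "90,000 - 109,999",
    "110,000 - 129,999",
    None,
    "Below 30,000",
    "110,000 - 129,999",
]
summary = income_level_summary(sample)
print(summary)
```

Counting missing values separately matters here because a target column with gaps cannot be used as-is for training.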
4. Data Science Agent prepares data
Conversation progression: After reviewing the dataset and exploring the customer income level, the user now proceeds to train a machine learning model with the help of Data Science Agent. The user wants to predict the customer income level.
Prompt 4:
Now, train a model to predict the customer income level. Use all features
that improve the prediction, and consider those to avoid data
leakage.
Agent's response:
- Data Science Agent creates a view DSAGENT$INCOME_MODEL_DATASET_A365 combining the following tables in the SH schema: SH.CUSTOMERS, SH.SUPPLEMENTARY_DEMOGRAPHICS, and SH.COUNTRIES.
- Provides the next steps
- Provides the SQL script that it used to create the view DSAGENT$INCOME_MODEL_DATASET_A365
- Provides a visual diagram to depict the workflow to create the view.
Figure 2-22 Data Science Agent prepares data for model building
Figure 2-23 Data Science Agent provides the SQL code for defining the view
Figure 2-24 Visual representation of how Data Science Agent created the view
5. Data Science Agent splits data
Conversation progression: The user now proceeds
to prepare the data to predict customer income level. The target column is
CUST_INCOME_LEVEL in the view
DSAGENT$INCOME_MODEL_DATASET_A365.
Prompt 5:
Proceed to set up the data for model training and to train a model to
predict CUST_INCOME_LEVEL for the customers.
Agent's response:
- Presents information about the data split along with a brief explanation of how this data would be used:
  - Training dataset 70%
  - Validation dataset 10%
  - Testing dataset 20%
- Explains how the data split is balanced by income level
- Asks if the user would like to proceed with the 70-10-20 data split.
Prompt 6:
Yes. Go ahead with this 70/10/20 split.
Agent's response:
- Overview of the data split.
- Detailed information about how the data was split, along with the SQL code in an expandable section.
From OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL, Data Science Agent created the following views, and also presented the corresponding SQL code for each view:
- train set: OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TRAIN
- validation set: OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_VAL
- test set: OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_TEST
- unlabeled: OMLUSER.DSAGENT$INCOME_MODEL_DATASET_A365_TOP_MDL_UNLABELED
Figure 2-28 Data Split - SQL code for each view
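A 70/10/20 split that is balanced by income level is commonly called a stratified split: each income level is divided in the same proportions, so no level is over-represented in one subset. The Python sketch below illustrates the general idea on made-up rows. It is not the SQL that Data Science Agent generates, and the function name `stratified_split` and the sample data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, train_pct=70, val_pct=10, seed=42):
    """Split rows into train/validation/test, keeping label proportions.

    Whatever is left after the train and validation shares goes to test.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)  # randomize order within each label
        n_train = len(group) * train_pct // 100
        n_val = len(group) * val_pct // 100
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test


# Made-up rows: (customer id, income level), 20 rows per level.
rows = [
    (i, "Below 30,000" if i % 2 else "110,000 - 129,999")
    for i in range(40)
]
train, val, test = stratified_split(rows, label_of=lambda r: r[1])
print(len(train), len(val), len(test))  # 28 4 8
```

Using a fixed seed keeps the split reproducible, which helps when comparing models trained at different times.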
6. Data Science Agent trains model
Conversation progression: After the data split is complete, the user proceeds to model training using the training and validation datasets.
Prompt 7:
Great. Now train a model to predict the customer income level using the
training and validation sets.
Agent's response:
- Algorithm used
- Accuracy achieved
- Model details. The model built is DSAGENT$CUST_INCOME_CLASSIFIER_A365.
- Next steps for the user to choose from
See screenshot for more details.
Figure 2-29 Data Science Agent trains model
Figure 2-30 Model listed on the Models page in OML UI
7. Data Science Agent evaluates the model and computes model accuracy
Conversation progression: The user now focuses on model evaluation.
Prompt 8:
Yes. Evaluate the model and show its prediction accuracy.
Agent's response:
Figure 2-31 Model evaluation by Data Science Agent
Data Science Agent also provides the SQL code to run inference or scoring
using model "OMLUSER"."DSAGENT$CUST_INCOME_CLASSIFIER_A365" on new
raw or unlabelled data. Here's a screenshot depicting the same:
Figure 2-32 Data Science Agent provides SQL query for inference
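Prediction accuracy, in general, is the share of test rows whose predicted label matches the actual label. The Python sketch below shows that calculation from pairs of actual and predicted labels. The labels are made-up placeholders, not results from the model in this example, and the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(counts):
    """Fraction of rows where the predicted label equals the actual label."""
    total = sum(counts.values())
    correct = sum(n for (a, p), n in counts.items() if a == p)
    return correct / total


# Made-up labels, for illustration only.
actual = ["low", "low", "low", "mid", "mid", "high", "high", "high"]
predicted = ["low", "low", "mid", "mid", "mid", "high", "low", "high"]

counts = confusion_counts(actual, predicted)
print(accuracy(counts))  # 0.75
```

The same pair counts are the cells of a confusion matrix, so accuracy is simply the diagonal total divided by the grand total.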
8. Data Science Agent interprets confusion matrix
Conversation progression: The user now dives into analysis of the results.
Prompt 9:
Provide an interpretation of the confusion matrix.
Agent's response:
- Provides an overview of how to interpret the confusion matrix
- Lists the key takeaways of the results
- Provides an interpretation of the meaning
- Provides suggestions on how the user can improve accuracy of the model's prediction.
Figure 2-33 Data Science Agent interprets Confusion Matrix
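One common way to read a multi-class confusion matrix is to derive precision and recall for each class. Precision asks how many rows predicted as a class really belong to it; recall asks how many rows of a class the model found. The Python sketch below assumes a made-up matrix keyed by (actual, predicted) pairs; the labels, counts, and the function name `per_class_metrics` are hypothetical and are not the agent's output.

```python
def per_class_metrics(matrix):
    """Compute precision and recall per label from (actual, predicted) counts."""
    labels = sorted({a for a, _ in matrix} | {p for _, p in matrix})
    metrics = {}
    for label in labels:
        true_positive = matrix.get((label, label), 0)
        actual_total = sum(n for (a, _), n in matrix.items() if a == label)
        predicted_total = sum(n for (_, p), n in matrix.items() if p == label)
        metrics[label] = {
            "precision": true_positive / predicted_total if predicted_total else 0.0,
            "recall": true_positive / actual_total if actual_total else 0.0,
        }
    return metrics


# Made-up counts, for illustration only.
matrix = {
    ("low", "low"): 2,
    ("low", "mid"): 1,
    ("mid", "mid"): 2,
    ("high", "high"): 2,
    ("high", "low"): 1,
}
for label, values in per_class_metrics(matrix).items():
    print(label, values)
```

Classes with low recall are the ones the model tends to miss, which is usually where suggestions for improving accuracy start.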
9. Data Science Agent runs live scoring demonstration
Conversation progression: The user now decides to use the model to predict customer income levels on new data, and prompts Data Science Agent to demonstrate how to score using the model.
Prompt 10:
Show a live demonstration of scoring using this model
Agent's response: Data Science Agent responds by explaining what information it requires for scoring with the model. It provides some options to the user. Here is the agent's response:
To run a live scoring demonstration, I need to provide a
“positive” income group label for probability reporting. You can choose any
group, for example: '110,000 - 129,999', '90,000 - 109,999', 'Below 30,000',
etc. Would you like to use '110,000 - 129,999', the most common income class, or
specify another group for probability ranking in the results?
Figure 2-34 Response of Data Science Agent for running live scoring demonstration
The user responds to the agent, and provides an input in prompt 11.
Prompt 11:
use '110,000 - 129,999'
Agent's response:
- Confirms the completion of the task.
- Provides details of the scoring task.
- Provides the SQL code for running manual inference
- Provides a table containing detailed results, including predictions and probabilities for each scored customer
Figure 2-36 Table containing predictions and probabilities of each scored customer
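Choosing a "positive" income group gives the results a single probability to rank by. The Python sketch below illustrates that idea: each scored row carries a predicted label and per-class probabilities, and rows are ordered by the probability of the chosen group. The customer IDs, probabilities, and function name are made up for illustration and are not the values shown in the figure.

```python
POSITIVE_LABEL = "110,000 - 129,999"


def rank_by_positive_probability(scored_rows, positive_label):
    """Sort scored rows by the probability of the positive label, highest first."""
    return sorted(
        scored_rows,
        key=lambda row: row["probabilities"].get(positive_label, 0.0),
        reverse=True,
    )


# Made-up scoring output, for illustration only.
scored = [
    {"cust_id": 101, "prediction": "90,000 - 109,999",
     "probabilities": {"110,000 - 129,999": 0.30, "90,000 - 109,999": 0.55}},
    {"cust_id": 102, "prediction": "110,000 - 129,999",
     "probabilities": {"110,000 - 129,999": 0.72, "90,000 - 109,999": 0.10}},
    {"cust_id": 103, "prediction": "Below 30,000",
     "probabilities": {"Below 30,000": 0.81, "110,000 - 129,999": 0.05}},
]

ranked = rank_by_positive_probability(scored, POSITIVE_LABEL)
for row in ranked:
    print(row["cust_id"], row["prediction"], row["probabilities"].get(POSITIVE_LABEL, 0.0))
```

Note that a row can rank high for the positive group without being predicted as that group, as customer 101 shows.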