7.12 Example of a Conversation with Data Science Agent

This example demonstrates how Data Science Agent supports a data analyst in exploring a dataset, and in building and evaluating a machine learning model. It also shows the effectiveness of Natural Language to SQL (NL2SQL) in Data Science Agent.

The user runs a complete machine learning workflow using natural language. The conversation between the user and Data Science Agent progresses from general questions about exploring the data in the OMLUSER schema to specific requests for model building, training, and scoring. Broadly, the conversation flow falls into these sections:
  • Goal setting for the conversation
  • Explanation of the dataset in the tables CLIENTS, CONTACTS, PAST_CAMPAIGNS, and PROSPECTS
  • Data preparation for model building
  • Training and building a model
  • Computation of model accuracy
  • Interpretation of the model's prediction
  • Making predictions by using the trained model

Highlights:

This example highlights the following capabilities of Data Science Agent:
  • Efficiency in performing machine learning tasks such as data preparation, model building, model training, model evaluation, and data analysis
  • Natural language to SQL capabilities
  • Visualization capabilities
  • Intelligence to suggest the next steps and options to the user

1. Goal and context setting

For every new conversation, Data Science Agent opens the chat interface with some important tips for you. Start the conversation with a prompt that introduces you and gives specific information to set your goal and context.

Prompt 1: I'm an analyst with no formal ML background. Using the tables added here as Associated Objects, explain the data and how to frame this as a machine learning problem.

Figure 7-16 Goal and context setting



Agent's response:

2. Data Science Agent explores the dataset

In this step, you ask the agent to explore the dataset.

Prompt 2: Show some basic statistics about the data.

Agent's response: Data Science Agent responds to this prompt in a structured format. It provides the following information:

3. Data Science Agent explains how to use the data to frame a machine learning problem

Prompt 3: Explain how to frame this as a machine learning problem to predict subscription likelihood. Explain the target variable and the possible input features.

Agent's response: Data Science Agent provides a point-by-point explanation of how to frame the question as a machine learning (ML) problem.

4. Data Science Agent creates a single modeling table

Prompt 4: Yes. Show the exact joins to create a single modeling table.

Agent's response:
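The agent's own SQL is not reproduced here. As a rough illustration of what joining the source tables into a single modeling table can look like, the following Python sketch uses an in-memory SQLite database. The key column CLIENT_ID, the target column SUBSCRIBED, and every other column name and value are hypothetical placeholders, not the actual columns of the OMLUSER tables.

```python
import sqlite3

# In-memory database with hypothetical stand-ins for three of the tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, age INTEGER, job TEXT);
    CREATE TABLE contacts (client_id INTEGER, contact_type TEXT, subscribed TEXT);
    CREATE TABLE past_campaigns (client_id INTEGER, previous_contacts INTEGER);
    INSERT INTO clients VALUES (1, 41, 'technician'), (2, 29, 'student');
    INSERT INTO contacts VALUES (1, 'cellular', 'no'), (2, 'telephone', 'yes');
    INSERT INTO past_campaigns VALUES (1, 3);
""")

# One row per client: an inner join for the contact data, and a left join
# so that clients with no past campaign are kept (with a count of 0).
rows = conn.execute("""
    SELECT c.client_id, c.age, c.job, ct.contact_type,
           COALESCE(pc.previous_contacts, 0) AS previous_contacts,
           ct.subscribed
    FROM clients c
    JOIN contacts ct ON ct.client_id = c.client_id
    LEFT JOIN past_campaigns pc ON pc.client_id = c.client_id
    ORDER BY c.client_id
""").fetchall()

for row in rows:
    print(row)
```

The left join is a design choice in this sketch: it keeps clients that have no campaign history, whereas an inner join would drop them from the modeling table.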

5. Data Science Agent performs feature validation and modeling

Prompt 5: Proceed with feature validation and modeling.

Agent's response:

  • Creates a new database view, DSAGENT$MODELING_DATA_CLEAN_A6CB
  • Makes the view DSAGENT$MODELING_DATA_CLEAN_A6CB ready for modeling
  • Provides the SQL code and a visual diagram for the view
  • Provides the next step options

    Figure 7-26 Prompt 5 and response



6. Data Science Agent performs data split, model training and model evaluation

Data Science Agent splits the data in the view DSAGENT$MODELING_DATA_CLEAN_A6CB into 70% training, 10% validation, and 20% test sets.
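The source does not describe how the agent performs the split internally. As a minimal sketch of one common approach, the following Python code assigns each row to a set by a seeded random draw. The function name split_rows, the seed, and the 10,000 row IDs are illustrative assumptions.

```python
import random

def split_rows(row_ids, train=0.70, val=0.10, seed=42):
    """Assign each row ID to 'train', 'val', or 'test' by a seeded random draw."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for row_id in row_ids:
        u = rng.random()  # uniform value in [0, 1)
        if u < train:
            splits["train"].append(row_id)
        elif u < train + val:
            splits["val"].append(row_id)
        else:
            splits["test"].append(row_id)  # remaining share, 20% by default
    return splits

# Hypothetical row IDs; the real view has its own keys.
splits = split_rows(range(10_000))
total = sum(len(ids) for ids in splits.values())
for name, ids in splits.items():
    print(f"{name}: {len(ids)} rows ({len(ids) / total:.1%})")
```

With a per-row random assignment like this, the resulting shares are close to, but usually not exactly, the nominal 70%, 10%, and 20%.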

Prompt 6: Proceed with splitting the clean view into training, validation, and test sets. Then start training the models.

Agent's response:

  • Splits the data and creates the following views: DSAGENT$MODELING_DATA_CLEAN_A6CB_TRAIN with 31,750 rows, DSAGENT$MODELING_DATA_CLEAN_A6CB_VAL with 4,461 rows, and DSAGENT$MODELING_DATA_CLEAN_A6CB_TEST with 9,000 rows
  • Provides a summary of the data split (70% training, 10% validation, and 20% test)
  • Conducts model training and selects Neural Network as the best algorithm for this machine learning problem

    Figure 7-27 Prompt 6 and response



  • Builds and evaluates the final model DSAGENT$SUBSCRIPTION_MODEL_A6CB with

    Figure 7-28 Response 6 (contd.)



  • Provides details of the data split performed on OMLUSER.DSAGENT$MODELING_DATA_CLEAN_A6CB
  • Shows the scorecard for the model OMLUSER.DSAGENT$SUBSCRIPTION_MODEL_A6CB. It contains the metrics and binary confusion matrix.

    Figure 7-29 Response 6 (concluded)
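Metrics such as accuracy, precision, and recall are commonly derived from the four counts in a binary confusion matrix. The following Python sketch shows that arithmetic. The function name binary_metrics is hypothetical, the counts are placeholders for illustration only (not the results of DSAGENT$SUBSCRIPTION_MODEL_A6CB), and the scorecard may report a different set of metrics.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion matrix counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Placeholder counts, not the model's actual results.
metrics = binary_metrics(tp=60, fp=20, fn=40, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this illustration the accuracy is high while the recall is much lower, which is typical when the positive class is rare. For that reason it helps to read the confusion matrix alongside the headline accuracy.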



The final model OMLUSER.DSAGENT$SUBSCRIPTION_MODEL_A6CB is also listed on the Models page, as shown in the screenshot here:

Figure 7-30 DSAGENT$SUBSCRIPTION_MODEL_A6CB model listed on the Models page



7. Data Science Agent performs scoring to predict likelihood of subscription

Data Science Agent uses the model DSAGENT$SUBSCRIPTION_MODEL_A6CB to predict the likelihood of subscription.

Prompt 7: Use the model to score the 100 prospects in the PROSPECTS table.

Agent's response: Data Science Agent performs scoring on 100 prospective clients in the PROSPECTS table.