Data Science Agent: Sample Prompts and Outputs

The following sample prompts and expected outputs cover the machine learning tasks that you can work through in conversation with Data Science Agent.

1. Data Discovery

Discovery is semantic and goal-driven. It works best, and most efficiently, when you state the goal and domain explicitly.

You can ask the agent to find data objects relevant to a business topic or analysis goal, such as marketing response, churn, fraud, or product demand. It can also give you a general overview of all available objects.

Example 7-1 Discover available tables, views, and models

Sample prompts

Find tables and views related to bank marketing subscriptions and campaign contacts.
What data exists related to bank marketing?
Find tables related to customer churn and retention.
What objects are available?

Outputs

Here are some expected outputs for the above prompts:

A curated set of relevant objects (tables, views, and models).
Business-oriented summaries and hints about how objects relate, for example, likely join keys.
An additional extended report with detailed information about all relevant objects.

Note:

Best results depend on meaningful metadata. Semantically clear table, view, and column names, together with well-maintained annotations, improve the quality and relevance of results.
You can manually associate database objects (tables, views, or models) with the conversation so that the agent can use them immediately. This is useful when you already know the relevant objects and discovery is unnecessary. Once the objects are associated, the agent can inspect, analyze, transform, and model from them directly. Discovery remains optional unless additional data needs to be found.

2. Inspect a Specific Object

You can ask for details about a specific table, view, or mining model.

Example 7-2 Ask questions related to specific tables, views and mining models

Sample prompts

Describe the CUSTOMERS table.
Show the columns and types in SCHEMA.SALES_TRANSACTIONS.
What attributes are used in the model CHURN_MODEL?

Outputs

Here are some expected outputs for the above prompts:

For tables and views, the agent typically retrieves row and column counts, the column list with data types, and a small data sample.
For models, the agent typically retrieves the features, the target, and algorithm details.

3. Exploratory Statistical Analysis

For exploratory statistical analysis, you can ask the agent for both single-variable analysis and pairwise relationship analysis.

Example 7-3 Single-variable analysis

You can request distribution and qualitative summaries for one or more individual columns.

Sample prompts

Describe the SALES.CUSTOMERS table and provide an overview of all its attributes.
Provide an overview of all variables in SCHEMA.CUSTOMERS_VIEW.
Analyze the distribution of AGE, INCOME, and JOB_CATEGORY.
Analyze which factors are most associated with subscription behavior.

Outputs

Here are some expected outputs for the above prompts:

Global interpretation of analysis results.
Distribution summaries for each variable, using statistics and plots appropriate to the variable type, that is, numeric versus categorical.
Percentage of missing values and number of categories, as applicable.
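The per-variable summaries listed above can be illustrated with a minimal Python sketch. It is not the agent's implementation: the function name `summarize_column`, the choice of statistics, and the sample values are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean, median

def summarize_column(values):
    """Summarize one column; None marks a missing value (illustrative helper)."""
    present = [v for v in values if v is not None]
    missing_pct = 100.0 * (len(values) - len(present)) / len(values) if values else 0.0
    summary = {"missing_pct": missing_pct}
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric column: report location and range statistics.
        summary.update(kind="numeric", min=min(present), max=max(present),
                       mean=mean(present), median=median(present))
    else:
        # Categorical column: report the number of categories and the most frequent ones.
        counts = Counter(present)
        summary.update(kind="categorical", n_categories=len(counts),
                       top=counts.most_common(3))
    return summary

# Hypothetical sample data for two columns.
age = [25, 40, None, 31, 52]
job = ["admin", "technician", "admin", None, "services"]
age_summary = summarize_column(age)
job_summary = summarize_column(job)
print(age_summary)
print(job_summary)
```

The sketch chooses the summary by variable type, which mirrors the numeric versus categorical distinction in the expected outputs.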

Example 7-4 Relationship analysis

You can request statistical analysis of the relationship between two variables, such as a predictor and an outcome. Relationship analysis is performed pairwise: each predictor is examined individually against one outcome. Data Science Agent can scan many predictors against a single outcome; however, this scan does not replace multivariate modeling.

Note:

Relationship analyses are most reliable when performed on row-level (fine-grained) datasets, rather than on heavily aggregated data.

Sample prompts

What factors are most associated with subscriptions?
How does CONTACT_CHANNEL relate to AGE?
Analyze relationships of all features versus CHURN_FLAG.

Outputs

Here are some expected outputs for the above prompts:

Global interpretation of pairwise analysis results.
Pairwise relationship summaries for each attribute (against target variable), using statistics and plots appropriate to the variable types (numeric vs. categorical).
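To make the pairwise scan in Example 7-4 concrete, here is a minimal Python sketch that examines each predictor individually against one outcome. It assumes numeric predictors and uses the Pearson correlation as a stand-in statistic; the statistics that the agent actually applies are not specified here, and the rows and the helper names `pearson` and `scan_predictors` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def scan_predictors(rows, predictors, outcome):
    """Score each predictor on its own against the outcome, strongest first."""
    ys = [r[outcome] for r in rows]
    scores = {p: pearson([r[p] for r in rows], ys) for p in predictors}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical row-level data (one row per customer).
rows = [
    {"AGE": 25, "CALLS": 1, "CHURN_FLAG": 0},
    {"AGE": 32, "CALLS": 4, "CHURN_FLAG": 1},
    {"AGE": 47, "CALLS": 2, "CHURN_FLAG": 0},
    {"AGE": 51, "CALLS": 5, "CHURN_FLAG": 1},
    {"AGE": 38, "CALLS": 1, "CHURN_FLAG": 0},
    {"AGE": 29, "CALLS": 6, "CHURN_FLAG": 1},
]
ranking = scan_predictors(rows, ["AGE", "CALLS"], "CHURN_FLAG")
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Because every predictor is scored in isolation, the ranking says nothing about interactions between predictors, which is why a scan of this kind does not replace multivariate modeling.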

4. View-Based Data Transformation and Preparation

The agent can transform and prepare data for modeling by creating new views. This is how it joins tables, filters populations, and derives new features from existing attributes. Here are some common view-building tasks:

Join customer, transaction, and interaction tables into a unified dataset.
Filter to a time window or segment, for example, the last 12 months or a specific product line.
Create derived fields, for example, date components such as year, month, day, or day of week.
Exclude unsupported or irrelevant fields from training datasets when needed.
Create a new view that joins clients, contacts, and past campaigns, and extracts the day and month from timestamps.

The agent does not run arbitrary ad hoc SQL queries or return full result sets for interactive browsing. Views are the primary mechanism for shaping data.

The agent does not directly modify base tables.

Example 7-5 View-Based Data Transformation and Preparation

Sample prompts

Join CLIENTS, CONTACTS, and PAST_CAMPAIGNS into a modeling dataset.
Make the dataset ready for modeling by extracting features from timestamps.

Outputs

Here are some expected outputs for the above prompts:

A new view in the user schema, with a name that starts with the prefix DSAGENT$.
A plain-language summary of what the view contains and how it was created.
The SQL code used to create the view.
A visual diagram to track dependencies and operations at a glance.
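As an illustration of view-based preparation, the following Python sketch builds a view that joins two tables, filters a time window, and derives date components, while leaving the base tables untouched. It uses an in-memory SQLite database purely as a stand-in; the SQL that the agent generates for your database will differ. The column names, the sample rows, and the view name after the DSAGENT$ prefix are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CLIENTS (CLIENT_ID INTEGER PRIMARY KEY, AGE INTEGER, JOB TEXT);
CREATE TABLE CONTACTS (CONTACT_ID INTEGER PRIMARY KEY, CLIENT_ID INTEGER,
                       CONTACT_TS TEXT, CHANNEL TEXT);
INSERT INTO CLIENTS VALUES (1, 34, 'admin'), (2, 51, 'technician');
INSERT INTO CONTACTS VALUES
    (10, 1, '2024-03-15 09:30:00', 'cellular'),
    (11, 2, '2023-01-02 14:00:00', 'telephone');

-- The view joins the tables, filters a time window, and derives date parts.
CREATE VIEW "DSAGENT$CONTACTS_PREPARED" AS
SELECT c.CLIENT_ID, c.AGE, c.JOB, k.CHANNEL,
       CAST(strftime('%m', k.CONTACT_TS) AS INTEGER) AS CONTACT_MONTH,
       CAST(strftime('%d', k.CONTACT_TS) AS INTEGER) AS CONTACT_DAY
FROM CLIENTS c
JOIN CONTACTS k ON k.CLIENT_ID = c.CLIENT_ID
WHERE k.CONTACT_TS >= '2024-01-01';
""")

# Query the view; the base tables themselves are not modified.
prepared = conn.execute('SELECT * FROM "DSAGENT$CONTACTS_PREPARED"').fetchall()
print(prepared)
```

Shaping data through a view keeps the transformation reproducible from its SQL definition and leaves the source tables unchanged.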

5. Feature Importance and Feature Selection

Ask which variables matter most for predicting a specific target and, optionally, reduce the dataset to the most important features.

Example 7-6 Feature Importance and Feature Selection

Sample prompts

Rank feature importance for predicting SUBSCRIBED.
Create a reduced dataset with only important features.

Outputs

Here are some expected outputs for the above prompts:

A ranked list of attributes with importance scores
Optionally, a new top-features view created from the original dataset

Note:

Feature importance can be computed using different supported algorithms. The agent can guide you on algorithm choice in business terms.

6. Dataset Splitting for Training and Evaluation

Use the agent to split a dataset into training, test, and (optionally) validation sets, each created as a database view.

Example 7-7 Dataset Splitting for Training and Evaluation

Sample prompts

Split into train/validation/test using standard percentages.
Create an 80/20 train/test split.
Split data into train, validation and test sets, then find best model optimizing Accuracy.

Outputs

Here are some expected outputs for the above prompts:

New views with suffixes such as _TRAIN, _VAL (if requested), and _TEST.
An optional _UNLABELED view, created if a target column is provided and some rows have NULL targets. You can use this view later for inference.
The SQL code used to perform the split.
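The split outputs above can be pictured with a small Python sketch. It assumes a deterministic split based on a hash of the case ID and a 60/20/20 division; both are illustrative choices, not a description of how the agent splits data or of its standard percentages. The column names and the helper names `bucket` and `split_rows` are hypothetical.

```python
import hashlib

def bucket(case_id, n_buckets=100):
    """Map a case ID to a stable bucket in [0, n_buckets)."""
    digest = hashlib.sha256(str(case_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def split_rows(rows, id_col, target_col, train_pct=60, val_pct=20):
    """Assign each row to exactly one partition; NULL targets go to UNLABELED."""
    parts = {"TRAIN": [], "VAL": [], "TEST": [], "UNLABELED": []}
    for row in rows:
        if row[target_col] is None:
            parts["UNLABELED"].append(row)  # kept aside for later inference
            continue
        b = bucket(row[id_col])
        if b < train_pct:
            parts["TRAIN"].append(row)
        elif b < train_pct + val_pct:
            parts["VAL"].append(row)
        else:
            parts["TEST"].append(row)
    return parts

# Hypothetical dataset: every tenth row has no target value.
rows = [{"ID": i, "SUBSCRIBED": None if i % 10 == 0 else i % 2}
        for i in range(1, 201)]
parts = split_rows(rows, "ID", "SUBSCRIBED")
print({name: len(members) for name, members in parts.items()})
```

Hashing the ID rather than drawing random numbers makes the assignment repeatable, so a row lands in the same partition each time the split is rebuilt.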

7. Model Training

Use Data Science Agent for model training, automated model selection (supervised model search), and model building (unsupervised learning).

Example 7-8 Supervised learning (classification and regression)

Use the following prompts to train models that predict a categorical outcome (classification) or a numeric value (regression).

Sample prompts

Train a classifier to predict SUBSCRIBED.
Train a regression model to predict CALL_DURATION.

Outputs

Here are some expected outputs for the above prompts:

A trained OML mining model stored in the database
A summary of the training run and configuration choices
SQL code to replicate the training

Example 7-9 Automated model selection (supervised model search)

You can ask the agent to evaluate multiple supervised algorithms and pick the best model based on a metric.

Note:

Automated model selection requires a validation set for comparing models against the selected metric.

After automated model selection, the winning model is retrained on the combined training and validation datasets. Therefore, the final model is not the same object as the one that scored best during comparison.

Sample prompts

Find the best model for predicting SUBSCRIBED using F1 metric.
Run an automated model search for churn prediction.

Outputs

Here are some expected outputs for the above prompts:

A best-performing model selected using the chosen metric (on validation data).
A report of validation performance across tested algorithms.
Optionally, a results table containing the benchmark metrics.
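The select-then-retrain behavior described in the note for Example 7-9 can be sketched in Python as follows. The two candidate "algorithms" are deliberately trivial stand-ins, and the data, the function names, and the use of F1 are assumptions for illustration; they do not represent the algorithms that the agent searches.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class 1; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def fit_majority(data):
    """Toy candidate: always predict the most frequent class."""
    positives = sum(y for _, y in data)
    label = 1 if 2 * positives >= len(data) else 0
    return lambda x: label

def fit_threshold(data):
    """Toy candidate: cut halfway between the class means of one feature."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= cut else 0

def select_and_retrain(candidates, train, val):
    """Compare candidates on validation data, then refit the winner on train + val."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(train)
        scores[name] = f1_score([y for _, y in val], [model(x) for x, _ in val])
    best = max(scores, key=scores.get)
    final_model = candidates[best](train + val)  # a new model object
    return best, scores, final_model

# Hypothetical (feature, label) pairs.
train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1)]
val = [(2, 0), (4, 0), (6, 1), (9, 1)]
best, scores, final_model = select_and_retrain(
    {"majority": fit_majority, "threshold": fit_threshold}, train, val)
print(best, scores)
```

Because the final model is refit on more data, the validation score in the report belongs to the earlier fit, which is one reason to evaluate the final model on a separate held-out test set.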

Example 7-10 Unsupervised learning (Clustering and Anomaly Detection)

You can use Data Science Agent to build models without a labeled target.

Note:

For unsupervised models, only model building is currently supported. Additional scoring capabilities and interpretations for clustering will be available soon.

Sample prompts

Segment customers into clusters.
Detect anomalies in transaction behavior.

Outputs

Here are some expected outputs for the above prompts:

A trained clustering or anomaly model stored in the database
A summary describing how to use the model for downstream scoring

8. Compare Models and Select a Winner

When multiple models exist, whether created by the agent or otherwise, you can ask the agent to compare them on a common validation dataset.

Example 7-11 Model comparison

Sample prompts

Compare these three models using AUC and select the best.
Rank candidate models and store results in a table.

Outputs

Here are some expected outputs for the above prompts:

A ranked comparison (best-to-worst) on the specified metric.
Optional persistence of the full ranking into a results table for auditability.
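A ranked comparison on AUC can be illustrated with the following Python sketch. It computes AUC as the fraction of positive and negative pairs in which the positive case receives the higher score, with ties counted as one half. The model names, the validation labels, and the scores are hypothetical, and the sketch does not describe how the agent computes the metric.

```python
def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels of a common validation set and each model's scores on it.
y_val = [0, 0, 1, 1, 0, 1]
model_scores = {
    "MODEL_A": [0.1, 0.4, 0.35, 0.8, 0.2, 0.9],
    "MODEL_B": [0.2, 0.3, 0.6, 0.7, 0.1, 0.9],
    "MODEL_C": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}

# Rank best-to-worst on the chosen metric.
ranking = sorted(((name, auc(y_val, s)) for name, s in model_scores.items()),
                 key=lambda item: item[1], reverse=True)
for name, value in ranking:
    print(f"{name}: AUC = {value:.3f}")
```

Scoring every model on the same validation rows is what makes the ranking comparable across models.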

9. Evaluate Models

Use Data Science Agent for an unbiased evaluation on held-out test data.

Note:

Evaluation on a held-out test set is intended to provide the most reliable estimate of a trained model's generalization performance.

Example 7-12 Model Evaluation

Sample prompts

Evaluate the selected model on the test set.
Provide Precision, Recall, F1 and a confusion matrix on test.
Evaluate regression error on test.
Evaluate the best model on the test set, then score the prospects dataset and return the highest-probability cases.

Outputs

Here are some expected outputs for the above prompts:

For classification: accuracy-family metrics and confusion-matrix reporting (binary and multiclass are supported).
For regression: fit and error metrics, for example, R², MAE, and RMSE.
A test-results table stored in the database.
SQL code for running inference with the model on arbitrary data.
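For reference, the metrics named above can be computed from test-set labels and predictions as in the following Python sketch. The formulas are the standard definitions; the function names and sample values are hypothetical, and the sketch covers only the binary classification case.

```python
from math import sqrt

def classification_report(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived metrics for a binary classifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}

def regression_report(y_true, y_pred):
    """MAE, RMSE, and R² for numeric predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {"mae": sum(abs(e) for e in errors) / n,
            "rmse": sqrt(ss_res / n),
            "r2": 1 - ss_res / ss_tot if ss_tot else 0.0}

# Hypothetical test-set labels and predictions.
cls = classification_report([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
reg = regression_report([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0])
print(cls)
print(reg)
```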

10. Inference and Scoring

Once a model exists, you can request scoring on new records. Inference requires:

A trained model
A dataset containing the full feature set expected by the model
A dataset containing the IDs to score

Note:

Inference is supported only in ID-based scoring mode, that is, you provide the IDs to score along with the full feature dataset. Broader scoring options will be available soon.

Example 7-13 Inference and Scoring

Sample prompts

Score the prospects table and return the top 500 most likely to subscribe.
Run inference for these customer IDs.

Outputs

Here are some expected outputs for the above prompts:

Predictions returned in the UI, linked back to case IDs
For Classification: predicted class and probability (based on the designated positive class)
For Regression: predicted numeric value
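ID-based scoring combines the three inputs listed above: a trained model, a dataset with the full feature set, and the IDs to score. The following Python sketch shows the idea under stated assumptions: `toy_model` is a hypothetical stand-in for a trained classifier, the feature name and IDs are invented, and skipping an ID that has no feature row is a choice made for this sketch, not documented agent behavior.

```python
def score_ids(ids_to_score, features_by_id, predict_proba, top_n):
    """Look up features for each requested ID, score them, and keep the top N."""
    scored = []
    for case_id in ids_to_score:
        features = features_by_id.get(case_id)
        if features is None:
            continue  # assumption: IDs without a feature row are skipped
        scored.append((case_id, predict_proba(features)))
    # Highest probability of the positive class first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

def toy_model(features):
    """Hypothetical stand-in for a trained model's positive-class probability."""
    return min(1.0, 0.1 + 0.2 * features["PREVIOUS_CONTACTS"])

# Hypothetical feature dataset keyed by case ID.
features_by_id = {
    101: {"PREVIOUS_CONTACTS": 0},
    102: {"PREVIOUS_CONTACTS": 3},
    103: {"PREVIOUS_CONTACTS": 1},
    104: {"PREVIOUS_CONTACTS": 2},
}
top = score_ids([101, 102, 103, 999], features_by_id, toy_model, top_n=2)
for case_id, probability in top:
    print(case_id, round(probability, 2))
```

Each prediction stays paired with its case ID, which matches the expected output of predictions linked back to case IDs.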

You can also use the agent interactively to request suggestions and interpretations.

Examples:

Suggestion request: "I want to predict clients most likely to subscribe, assist me in designing a suitable workflow"
Interpretation request: "Can you help me interpret the model metrics so that I can better assess its performance?"
