7.5 Data Science Agent: Sample Prompts and Outputs
The following sample prompts and outputs illustrate the machine learning tasks you can discuss with Data Science Agent.
1. Data Discovery
Discovery is semantic and goal-driven. It works best, and most efficiently, when the goal and domain are stated explicitly.
You can ask the agent to find data objects relevant to a business topic or analysis goal, such as marketing response, churn, fraud, or product demand. It can also help you obtain a general overview of all available objects.
Example 7-1 Discover available tables, views, and models
- Find tables and views related to bank marketing subscriptions and campaign contacts.
- What data exists related to bank marketing?
- Find tables related to customer churn and retention.
- What objects are available?
Outputs
- A curated set of relevant objects (tables, views, and models).
- Business-oriented summaries and hints about how objects relate, for example, likely join keys.
- An additional extended report with detailed information about all relevant objects.
Note:
- Best results depend on meaningful metadata. Semantically clear table, view, and column names, along with well-maintained annotations, improve the quality and relevance of results.
- You can manually associate database objects (tables, views, or models) with the conversation so that the agent can use them immediately. This is useful when the relevant objects are already known and discovery is unnecessary. Once objects are associated, the agent can inspect, analyze, transform, and model from them directly. Discovery can remain optional unless additional data needs to be found.
2. Inspect Specific Object
You can ask for details about a specific table, view, or mining model.
Example 7-2 Ask questions related to specific tables, views and mining models
- Describe the CUSTOMERS table.
- Show the columns and types in SCHEMA.SALES_TRANSACTIONS.
- What attributes are used in the model CHURN_MODEL?
Outputs
- For tables and views, the agent typically retrieves row and column counts, the column list with data types, and a small data sample.
- For models, the agent typically retrieves the features, the target, and algorithm details.
3. Exploratory Statistical Analysis
For exploratory statistical analysis, you can ask the agent for both single-variable analysis and pairwise relationship analysis.
Example 7-3 Single-variable analysis
You can request distribution and qualitative summaries for one or more individual columns.
- Describe the SALES.CUSTOMERS table and provide an overview of all its attributes.
- Provide an overview of all variables in SCHEMA.CUSTOMERS_VIEW.
- Analyze the distribution of AGE, INCOME, and JOB_CATEGORY.
- Analyze which factors are most associated with subscription behavior.
Outputs
- Global interpretation of analysis results.
- Distribution summaries for each variable, using statistics and plots appropriate to the variable type, that is, numeric versus categorical.
- Percentage of missing values and number of categories, as applicable.
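The per-variable summaries listed above can be illustrated with a minimal Python sketch. It is not the agent's implementation: the summarize_column helper and the sample values are hypothetical, and the statistics shown are only one plausible choice for numeric versus categorical variables.

```python
from collections import Counter
from statistics import mean, median

def summarize_column(values):
    """Summarize one column; None marks a missing value (hypothetical helper)."""
    if not values:
        raise ValueError("column has no rows")
    present = [v for v in values if v is not None]
    summary = {"pct_missing": 100.0 * (len(values) - len(present)) / len(values)}
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric variable: report range and central tendency.
        summary.update(kind="numeric", min=min(present), max=max(present),
                       mean=mean(present), median=median(present))
    else:
        # Categorical variable: report the category count and most frequent value.
        counts = Counter(present)
        summary.update(kind="categorical", n_categories=len(counts),
                       top=counts.most_common(1)[0][0] if counts else None)
    return summary

# Illustrative values only, not real data.
age = [25, 40, None, 35]
job = ["admin", "technician", "admin", None]
age_summary = summarize_column(age)
job_summary = summarize_column(job)
print(age_summary)
print(job_summary)
```

Choosing the statistics by variable type mirrors the distinction in the outputs above: numeric columns get range and central-tendency figures, while categorical columns get category counts.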
Example 7-4 Relationship analysis
Note:
Relationship analyses are most reliable when performed on row-level (fine-grained) datasets, rather than on heavily aggregated data.
- What factors are most associated with subscriptions?
- How does CONTACT_CHANNEL relate to AGE?
- Analyze relationships of all features versus CHURN_FLAG.
Outputs
- Global interpretation of pairwise analysis results.
- Pairwise relationship summaries for each attribute against the target variable, using statistics and plots appropriate to the variable types, that is, numeric versus categorical.
4. View-Based Data Transformation and Preparation
You can ask the agent to shape data for analysis or modeling by creating views. Typical requests include the following:
- Join customer, transaction, and interaction tables into a unified dataset.
- Filter to a time window or segment, for example, the last 12 months or a specific product line.
- Create derived fields, for example, date components such as year, month, day, or day of week.
- Exclude unsupported or irrelevant fields from training datasets when needed.
- Create a new view joining clients, contacts, and past campaigns, and extract day and month from timestamps.
The agent does not run arbitrary ad hoc SQL queries and return full result sets for interactive browsing. Views are the primary mechanism for shaping data.
The agent does not directly modify base tables.
Example 7-5 View-Based Data Transformation and Preparation
- Join CLIENTS, CONTACTS, and PAST_CAMPAIGNS into a modeling dataset.
- Make the dataset ready for modeling by extracting features from timestamps.
Outputs
- A new view in the user schema whose name starts with the prefix DSAGENT$.
- A plain-language summary of what the view contains and how it was created.
- SQL code used to create the view.
- A visual diagram to track dependencies and operations at a glance.
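To make the idea of a view-based transformation concrete, the following Python sketch builds a comparable view in an in-memory SQLite database. This is an illustration under stated assumptions, not the SQL the agent generates: SQLite stands in for the actual database, only two of the three tables are joined for brevity, and the column names (CLIENT_ID, AGE, CONTACT_TS) and the view name after the DSAGENT$ prefix are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CLIENTS (CLIENT_ID INTEGER PRIMARY KEY, AGE INTEGER);
CREATE TABLE CONTACTS (CLIENT_ID INTEGER, CONTACT_TS TEXT);
INSERT INTO CLIENTS VALUES (1, 34), (2, 51);
INSERT INTO CONTACTS VALUES (1, '2024-03-15 10:30:00'), (2, '2024-11-02 16:05:00');

-- The view shapes the data; the base tables are never modified.
CREATE VIEW "DSAGENT$CLIENT_CONTACTS" AS
SELECT c.CLIENT_ID,
       c.AGE,
       CAST(strftime('%m', k.CONTACT_TS) AS INTEGER) AS CONTACT_MONTH,
       CAST(strftime('%d', k.CONTACT_TS) AS INTEGER) AS CONTACT_DAY
FROM CLIENTS c
JOIN CONTACTS k ON k.CLIENT_ID = c.CLIENT_ID;
""")

# Query the view like any table; the derived date fields are computed on read.
rows = conn.execute(
    'SELECT * FROM "DSAGENT$CLIENT_CONTACTS" ORDER BY CLIENT_ID'
).fetchall()
print(rows)
```

Because the result is a view, the join and the derived month and day fields are recomputed from the base tables each time the view is queried.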
5. Feature Importance and Feature Selection
You can ask which variables matter most for predicting a specific target and, optionally, reduce the dataset to the most important features.
Example 7-6 Feature Importance and Feature Selection
- Rank feature importance for predicting SUBSCRIBED.
- Create a reduced dataset with only important features.
Outputs
- A ranked list of attributes with importance scores
- Optionally, a new top-features view created from the original dataset
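As a rough illustration of how a ranked list can lead to a reduced view, the Python sketch below sorts importance scores and builds a CREATE VIEW statement that keeps only the top features plus the target. The helper name, the source and view names, and the scores are all hypothetical; in practice the scores come from whichever importance algorithm is selected, and the agent's actual SQL may differ.

```python
def top_features_view_sql(source, target, importance, k, view_name):
    """Rank features by score and build SQL for a view keeping the top k plus the target."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    kept = [name for name, _ in ranked[:k]]
    columns = ", ".join(kept + [target])
    return ranked, f"CREATE VIEW {view_name} AS SELECT {columns} FROM {source}"

# Made-up scores for illustration only.
importance = {
    "AGE": 0.08,
    "CALL_DURATION": 0.41,
    "JOB_CATEGORY": 0.03,
    "CONTACT_CHANNEL": 0.17,
}
ranked, sql = top_features_view_sql(
    "BANK_MARKETING", "SUBSCRIBED", importance, 2, "BANK_MARKETING_TOP"
)
print(ranked)
print(sql)
```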
Note:
Feature importance can be computed using different supported algorithms. The agent can guide you on algorithm choice in business terms.
6. Dataset Splitting for Training and Evaluation
Use the agent to split a dataset into subsets that are created as database views.
Example 7-7 Dataset Splitting for Training and Evaluation
- Split into train/validation/test using standard percentages.
- Create an 80/20 train/test split.
- Split data into train, validation, and test sets, then find the best model, optimizing for accuracy.
Outputs
- New views with suffixes such as _TRAIN, _VAL (if requested), and _TEST.
- An optional _UNLABELED view if a target column is provided and some rows have NULL targets. You can use this view later for inference.
- SQL code used to perform the split.
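The split outputs described above can be sketched in Python against an in-memory SQLite database. This is a simplified illustration, not the agent's generated SQL: SQLite stands in for the actual database, the CUSTOMERS table and its columns are hypothetical, and bucketing by ID modulo 10 is an assumed stand-in for whatever sampling method the agent applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, SUBSCRIBED TEXT)")
# 100 labeled rows (IDs 1-100) plus 5 rows with a NULL target (IDs 101-105).
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?)",
    [(i, "yes" if i % 3 == 0 else "no") for i in range(1, 101)]
    + [(i, None) for i in range(101, 106)],
)
conn.executescript("""
-- Bucket labeled rows by ID so the split is repeatable: 8 of every 10 go to train.
CREATE VIEW CUSTOMERS_TRAIN AS
  SELECT * FROM CUSTOMERS WHERE SUBSCRIBED IS NOT NULL AND ID % 10 < 8;
CREATE VIEW CUSTOMERS_TEST AS
  SELECT * FROM CUSTOMERS WHERE SUBSCRIBED IS NOT NULL AND ID % 10 >= 8;
-- Rows without a target are set aside for later inference.
CREATE VIEW CUSTOMERS_UNLABELED AS
  SELECT * FROM CUSTOMERS WHERE SUBSCRIBED IS NULL;
""")

counts = {
    name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    for name in ("CUSTOMERS_TRAIN", "CUSTOMERS_TEST", "CUSTOMERS_UNLABELED")
}
print(counts)
```

Defining each subset as a view with a deterministic predicate means the same rows land in the same subset every time, and no data is copied.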
7. Model Training
Use Data Science Agent for model training, automated model selection (supervised model search), and model building (unsupervised learning).
Example 7-8 Supervised learning (classification and regression)
The following prompts ask Data Science Agent to train models that predict a categorical outcome (classification) or a numeric value (regression).
- Train a classifier to predict SUBSCRIBED.
- Train a regression model to predict CALL_DURATION.
Outputs
- A trained OML mining model stored in the database
- A summary of the training run and configuration choices
- SQL code to replicate the training
Example 7-9 Automated model selection (supervised model search)
Note:
Automated model selection requires a validation set for comparing models against the selected metric. After automated model selection, the winning model is retrained on the combined train and validation datasets. Therefore, the final model is not the same object as the one that scored best during comparison.
- Find the best model for predicting SUBSCRIBED using F1 metric.
- Run an automated model search for churn prediction.
Outputs
- A best-performing model selected using the chosen metric (on validation data).
- A report of validation performance across tested algorithms.
- Optionally, a results table containing the benchmark metrics.
Example 7-10 Unsupervised learning (Clustering and Anomaly Detection)
Note:
For unsupervised models, only model building is currently supported. Additional scoring capabilities and interpretations for clustering will be available soon.
- Segment customers into clusters.
- Detect anomalies in transaction behavior.
Outputs
- A trained clustering or anomaly model stored in the database
- A summary describing how to use the model for downstream scoring
8. Compare Models and Select a Winner
When multiple models exist, whether created by the agent or otherwise, you can ask the agent to compare them on a common validation dataset.
Example 7-11 Model comparison
- Compare these three models using AUC and select the best.
- Rank candidate models and store results in a table.
Outputs
- A ranked comparison (best-to-worst) on the specified metric.
- Optional persistence of the full ranking into a results table for auditability.
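A ranked comparison on a shared validation set can be sketched in plain Python. The example below computes AUC as the fraction of positive/negative pairs that a model orders correctly (ties count as half) and sorts the candidates best-to-worst. The model names, labels, and scores are hypothetical, and the agent's own metric computation may differ.

```python
def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One shared validation set: every candidate is scored on the same cases.
labels = [1, 0, 1, 0, 1]
candidates = {
    "MODEL_A": [0.9, 0.2, 0.8, 0.4, 0.7],
    "MODEL_B": [0.6, 0.5, 0.4, 0.3, 0.7],
    "MODEL_C": [0.3, 0.8, 0.4, 0.6, 0.5],
}

# Best-to-worst ranking on the chosen metric.
ranking = sorted(
    ((name, auc(labels, scores)) for name, scores in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, value in ranking:
    print(f"{name}: AUC={value:.3f}")
```

Scoring every candidate on the same cases is what makes the ranking meaningful; metrics computed on different datasets are not directly comparable.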
9. Evaluate Models
Note:
Evaluation on a held-out test set is intended as the most reliable estimate of the generalization performance of a trained model.
Example 7-12 Model Evaluation
- Evaluate the selected model on the test set.
- Provide Precision, Recall, F1 and a confusion matrix on test.
- Evaluate regression error on test.
- Evaluate the best model on the test set, then score the prospects dataset and return the highest-probability cases.
Outputs
- For Classification: accuracy-family metrics and confusion-matrix reporting (binary and multiclass supported)
- For Regression: fit and error metrics, for example, R², MAE, and RMSE.
- A test-results table stored in the database.
- SQL code for using the models for inference on arbitrary data.
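For reference, the binary classification metrics named in the prompts above follow directly from the confusion matrix. The Python sketch below shows the standard definitions on a small hypothetical test set; the binary_metrics helper and the "yes"/"no" labels are assumptions for illustration, not the agent's reporting code.

```python
def binary_metrics(actual, predicted, positive="yes"):
    """Confusion-matrix counts and derived metrics for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical held-out test labels and model predictions.
actual = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no", "yes", "no"]
metrics = binary_metrics(actual, predicted)
print(metrics)
```

Precision, recall, and F1 all depend on which class is designated as positive, so the same predictions can yield different values if that designation changes.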
10. Inference and Scoring
Inference requires the following inputs:
- A trained model
- A dataset containing the full feature set expected by the model
- A dataset containing the IDs to score
Note:
Inference is supported only in ID-based scoring mode, that is, IDs to score along with a full feature dataset. Broader scoring options will be available soon.
Example 7-13 Inference and Scoring
- Score the prospects table and return the top 500 most likely to subscribe.
- Run inference for these customer IDs.
Outputs
- Predictions returned in the UI, linked back to case IDs
- For Classification: predicted class and probability (based on the designated positive class)
- For Regression: predicted numeric value
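A request such as "return the top 500 most likely to subscribe" amounts to restricting predictions to the requested IDs and keeping the highest positive-class probabilities. The Python sketch below illustrates that step on hypothetical prediction records; the field names (case_id, predicted, probability) and the top_n helper are assumptions, not the agent's output format.

```python
import heapq

def top_n(predictions, ids_to_score, n):
    """Keep predictions for the requested IDs and return the n most probable cases."""
    selected = [row for row in predictions if row["case_id"] in ids_to_score]
    return heapq.nlargest(n, selected, key=lambda row: row["probability"])

# Hypothetical predictions; probability refers to the designated positive class.
predictions = [
    {"case_id": 101, "predicted": "no", "probability": 0.12},
    {"case_id": 102, "predicted": "yes", "probability": 0.91},
    {"case_id": 103, "predicted": "yes", "probability": 0.67},
    {"case_id": 104, "predicted": "no", "probability": 0.35},
]
best = top_n(predictions, ids_to_score={101, 103, 104}, n=2)
for row in best:
    print(row["case_id"], row["predicted"], row["probability"])
```

Each returned record keeps its case ID, so the predictions can be linked back to the original rows.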
You can also use the agent in interactive mode for suggestions and interpretations.
- Suggestion request: "I want to predict clients most likely to subscribe, assist me in designing a suitable workflow"
- Interpretation request: "Can you help me interpret the model metrics so that I can better assess its performance?"