1 About Data Science Agent
Data Science Agent is an intelligent, built-in conversational chatbot integrated with the Oracle Machine Learning UI included in your Oracle Autonomous AI Database subscription. You must provide the LLM, whether from a third-party AI provider, the OCI Generative AI service, or one you host privately. You can run complete data science workflows using natural language in the Data Science Agent chat.
Topics:
- Prerequisites to use Data Science Agent
  To use Data Science Agent, you must have the following.
- Data Science Agent Concepts
  Here is a list of key concepts and terms commonly used in Data Science Agent.
- Key Highlights of Data Science Agent
  Data Science Agent offers a range of powerful features designed to streamline data science workflows.
- Limitations of Data Science Agent
  While Data Science Agent offers numerous benefits, there are certain limitations that may impact its use in specific scenarios.
- Data Science Agent: Sample Prompts and Outputs
  Here are some sample prompts and outputs related to various machine learning domains on which you may have conversations with Data Science Agent.
1.1 Prerequisites to use Data Science Agent
To use Data Science Agent, you must have the following:
- DBMS_CLOUD_AI profile (AI profile) and DBMS_CLOUD credentials (AI credential) created. For more information, see Use DBMS_CLOUD_AI to Configure AI Profiles.
- The OML_DEVELOPER role must be granted to the OML user (OMLUSER).
  Note: If the user (OMLUSER) is created through Database Actions, the OML_DEVELOPER role is automatically granted.
- User must be added to the host ACL (Access Control List).
  Note: This is not required for OCI Generative AI.
- Select AI must be configured to use supported AI providers. For more information, see Perform Prerequisites for Select AI.
- Access to the relevant schemas and objects based on your role and privileges.

For additional information, see Manage AI Profiles.
- Use DBMS_CLOUD_AI to Configure AI Profiles
  Autonomous AI Database uses AI profiles to facilitate and configure access to an LLM and to set up generating, running, and explaining SQL based on natural language prompts. It also facilitates retrieval augmented generation using embedding models and vector indexes and allows for chatting with the LLM.
- Grant OML_DEVELOPER Role to OML User
  To use Data Science Agent, the administrator must grant the OML_DEVELOPER role to the OML user.
- Add User to the Host ACL
  For model providers like OpenAI, add users to the host ACL (Access Control List).
- Perform Prerequisites for Select AI
  Before you use Select AI, here are the steps to enable DBMS_CLOUD_AI.
Parent topic: About Data Science Agent
1.1.1 Use DBMS_CLOUD_AI to Configure AI Profiles
Autonomous AI Database uses AI profiles to facilitate and configure access to an LLM and to set up generating, running, and explaining SQL based on natural language prompts. It also facilitates retrieval augmented generation using embedding models and vector indexes and allows for chatting with the LLM.
In addition to specifying tables and views in the AI profile, you can also specify tables mapped with external tables, including those described in Query External Data with Data Catalog. This enables you to query data not just inside the database, but also data stored in a data lake's object store.
Parent topic: Prerequisites to use Data Science Agent
1.1.2 Grant OML_DEVELOPER Role to OML User
To use Data Science Agent, the administrator must grant the OML_DEVELOPER role to the OML user.
If the OML user (OMLUSER) is created through Database Actions, the OML_DEVELOPER role is automatically granted.

To grant the OML_DEVELOPER role, run the following:

GRANT OML_DEVELOPER TO OMLUSER;

Parent topic: Prerequisites to use Data Science Agent
1.1.3 Add User to the Host ACL
For model providers like OpenAI, add users to the host ACL (Access Control List).
Note: This procedure is not applicable to OCI Generative AI.

The following example adds OMLUSER to the host ACL for api.openai.com:

BEGIN
  DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
    host => 'api.openai.com',
    ace => xs$ace_type(privilege_list => xs$name_list('http'),
                       principal_name => 'OMLUSER',
                       principal_type => xs_acl.ptype_db)
  );
END;
/
The parameters are:

- host: The host, which can be the name or the IP address of the host. You can use a wildcard to specify a domain or an IP subnet. The host or domain name is not case sensitive. Hosts for the supported AI providers are:
  - OpenAI: api.openai.com
  - OpenAI-compatible providers: for example, for Fireworks AI, use api.fireworks.ai
  - Cohere: api.cohere.ai
  - Azure OpenAI Service: <azure_resource_name>.openai.azure.com (see Profile Attributes to know more about azure_resource_name)
  - Google: generativelanguage.googleapis.com
  - Anthropic: api.anthropic.com
  - Hugging Face: api-inference.huggingface.co
  - AWS: bedrock-runtime.us-east-1.amazonaws.com
- ace: The access control entries (ACE). The XS$ACE_TYPE type is provided to construct each ACE entry for the ACL. For more details, see Creating ACLs and ACEs.
Parent topic: Prerequisites to use Data Science Agent
1.1.4 Perform Prerequisites for Select AI
Before you use Select AI, here are the steps to enable DBMS_CLOUD_AI.

To use DBMS_CLOUD_AI, you must have the following:

- Access to an Oracle Cloud Infrastructure cloud account and to an Autonomous AI Database instance.
- A paid API account of a supported AI provider, one of:
  - OpenAI: see Use OpenAI to get your API keys.
  - OpenAI-compatible providers: see Use OpenAI-Compatible Providers to get your API keys and provider_endpoint.
  - Cohere: see Use Cohere to get your secret API keys.
  - Azure OpenAI Service: see Use Azure OpenAI Service for more information on how to configure Azure OpenAI Service.
  - OCI Generative AI: see Use OCI Generative AI.
  - Google: see Use Google to get your API keys.
  - Anthropic: see Use Anthropic to get your API keys.
  - Hugging Face: see Use Hugging Face to get your API keys.
  - AWS: see Use AWS to get your API keys and model ID.
- Network ACL privileges to access your external AI provider.
  Note: Network ACL privileges are not required for OCI Generative AI.
- A credential that provides access to the AI provider.

- Grant Privileges for Select AI
  To use Select AI, the administrator must grant the EXECUTE privilege on the DBMS_CLOUD_AI package. Learn about additional privileges required for Select AI and its features.
- Examples of Privileges to Run Select AI
  Review examples of privileges required to use Select AI and its features.
Parent topic: Prerequisites to use Data Science Agent
1.1.4.1 Grant Privileges for Select AI
To use Select AI, the administrator must grant the EXECUTE privilege on the DBMS_CLOUD_AI package. Learn about additional privileges required for Select AI and its features.

Here are the privileges and steps required to use DBMS_CLOUD_AI:
- Grant the EXECUTE privilege on the DBMS_CLOUD_AI package to the user who wants to use Select AI.
  By default, only the system administrator has the EXECUTE privilege. The administrator can grant the EXECUTE privilege to other users.
- Grant the EXECUTE privilege on DBMS_CLOUD_PIPELINE to the user who wants to use Select AI with RAG.
  Note: If the user already has the DWROLE role, this privilege is included and an additional grant is not required.
- Grant network ACL access to the user who wants to use Select AI and for the AI provider endpoint.
  The system administrator can grant network ACL access. See APPEND_HOST_ACE Procedure for more information.
- Create a credential to enable access to your AI provider.
  See CREATE_CREDENTIAL Procedure for more information.
- Grant quotas in tablespace to manage the amount of space in a specific tablespace to the user who wants to use Select AI with RAG.
Parent topic: Perform Prerequisites for Select AI
1.1.4.2 Examples of Privileges to Run Select AI
Review examples of privileges required to use Select AI and its features.
The following example grants the EXECUTE privilege to ADB_USER:

GRANT EXECUTE ON DBMS_CLOUD_AI TO ADB_USER;

The following example grants the EXECUTE privilege for the DBMS_CLOUD_PIPELINE package required for RAG:

GRANT EXECUTE ON DBMS_CLOUD_PIPELINE TO ADB_USER;

To check the privileges granted to a user for the DBMS_CLOUD_AI and DBMS_CLOUD_PIPELINE packages, an administrator can run the following:

SELECT table_name AS package_name, privilege
FROM DBA_TAB_PRIVS
WHERE grantee = '<username>'
AND (table_name = 'DBMS_CLOUD_PIPELINE'
     OR table_name = 'DBMS_CLOUD_AI');

The following example grants ADB_USER the privilege to use the api.openai.com endpoint.
Note: This procedure is not applicable to OCI Generative AI.

BEGIN
  DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
    host => 'api.openai.com',
    ace => xs$ace_type(privilege_list => xs$name_list('http'),
                       principal_name => 'ADB_USER',
                       principal_type => xs_acl.ptype_db)
  );
END;
/
The parameters are:
- host: The host, which can be the name or the IP address of the host. You can use a wildcard to specify a domain or an IP subnet. The host or domain name is not case sensitive. Hosts for the supported AI providers are:
  - OpenAI: api.openai.com
  - OpenAI-compatible providers: for example, for Fireworks AI, use api.fireworks.ai
  - Cohere: api.cohere.ai
  - Azure OpenAI Service: <azure_resource_name>.openai.azure.com (see Profile Attributes to know more about azure_resource_name)
  - Google: generativelanguage.googleapis.com
  - Anthropic: api.anthropic.com
  - Hugging Face: api-inference.huggingface.co
  - AWS: bedrock-runtime.us-east-1.amazonaws.com
- ace: The access control entries (ACE). The XS$ACE_TYPE type is provided to construct each ACE entry for the ACL. For more details, see Creating ACLs and ACEs.
The following example creates a credential to enable access to OpenAI:

EXEC DBMS_CLOUD.CREATE_CREDENTIAL(
  credential_name => 'OPENAI_CRED',
  username => 'OPENAI',
  password => '<your_api_token>');
The parameters are:
- credential_name: The name of the credential to be stored. The credential_name parameter must conform to Oracle object naming conventions.
- username: The username and password arguments together specify your AI provider credentials. The username is a user-specified user name.
- password: The username and password arguments together specify your AI provider credentials. The password is your AI provider secret API key and depends on the provider, that is, OpenAI, Cohere, or Azure OpenAI Service.
  - OpenAI: see Use OpenAI to get your API keys.
  - OpenAI-compatible providers: see Use OpenAI-Compatible Providers to get your API keys and provider_endpoint.
  - Cohere: see Use Cohere to get your API keys.
  - Azure OpenAI Service: see Use Azure OpenAI Service to get your API keys and to configure the service.
    Note: If you are using the Azure OpenAI Service principal to authenticate, you can skip the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Examples of Using Select AI for an example of authenticating using the Azure OpenAI Service principal.
  - OCI Generative AI: see Use OCI Generative AI to generate API signing keys.
  - Google: see Use Google to generate your API keys.
  - Anthropic: see Use Anthropic to generate your API keys.
  - Hugging Face: see Use Hugging Face to generate your API keys.
  - AWS: see Use AWS to get your API keys and model ID.
The following example grants quotas on tablespace to ADB_USER to use Select AI with RAG:

ALTER USER ADB_USER QUOTA 1T ON <tablespace_name>;

To check the tablespace quota granted to a user, run the following:
SELECT TABLESPACE_NAME, BYTES, MAX_BYTES
FROM DBA_TS_QUOTAS
WHERE USERNAME = '<username>' AND
TABLESPACE_NAME LIKE 'DATA%';
The parameters are:
- TABLESPACE_NAME: The tablespace for which the quota is assigned. In Autonomous AI Database, tablespaces are managed automatically and have DATA as a prefix.
- BYTES: The amount of space currently used by the user in the tablespace.
- MAX_BYTES: The maximum quota assigned (in bytes). If MAX_BYTES is -1, the user has unlimited quota on the tablespace. The database user creating the vector index must have MAX_BYTES sufficiently larger than BYTES to accommodate the vector index, or MAX_BYTES should be -1 for unlimited quota.
Parent topic: Perform Prerequisites for Select AI
1.2 Data Science Agent Concepts
Here is a list of key concepts and terms commonly used in Data Science Agent.
AI Credential
An AI credential is created using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. The credential comprises the following information:

- user_ocid: The unique identifier of the OCI user.
- tenancy_ocid: The unique identifier of the OCI tenancy in your cloud account.
- private_key: The private key associated with the OCI user. It is required for secure authentication.
- fingerprint: The fingerprint of the public key linked to the OCI user.
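The four fields above map onto the key-pair form of DBMS_CLOUD.CREATE_CREDENTIAL. A minimal sketch, with placeholder OCIDs and a hypothetical credential name OCI_KEY_CRED:

```sql
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OCI_KEY_CRED',                    -- hypothetical name
    user_ocid       => 'ocid1.user.oc1..<unique_id>',     -- OCI user OCID
    tenancy_ocid    => 'ocid1.tenancy.oc1..<unique_id>',  -- OCI tenancy OCID
    private_key     => '<private_key_contents>',          -- PEM key body
    fingerprint     => '<public_key_fingerprint>'
  );
END;
/
```

The same procedure also accepts the simpler username/password form shown later for third-party providers; the key-pair form is specific to OCI native authentication.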
AI Profile
An AI Profile is a named configuration that specifies how the database connects to an LLM, including the provider (for example, openai, oci), credential, model, and optional parameters such as temperature and max_tokens. You create and manage AI profiles through the DBMS_CLOUD_AI package.
For more information, see Manage AI Profiles.
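Creating a profile is a single PL/SQL call. A minimal sketch, assuming an OpenAI provider, an existing credential named OPENAI_CRED, and a hypothetical table SH.CUSTOMERS in the object list (the profile name and model shown are illustrative):

```sql
BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'OPENAI_PROFILE',  -- hypothetical profile name
    attributes   => '{"provider":        "openai",
                      "credential_name": "OPENAI_CRED",
                      "model":           "gpt-4o-mini",
                      "temperature":     0.2,
                      "object_list":     [{"owner": "SH", "name": "CUSTOMERS"}]}'
  );
END;
/
```

The attributes argument is a JSON document; which keys are valid depends on the provider, so check the Profile Attributes reference for your provider before copying this sketch.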
Conversation
The interaction with Data Science Agent takes the form of a conversation made up of alternating turns. Each turn
begins with a user prompt, followed by the agent’s response. The conversation
retains the context throughout, allowing you to refer back to previous answers in
later questions. For example, you might ask, "filter the dataset you just profiled,"
or "train a model using the training dataset we prepared".
Conversation history
Conversation history is a persistent record of past conversations with the agent. It allows you to browse through conversation history, review previous results, and continue past sessions without losing context. This ensures continuity over time, allows multiple workloads in separate chats, supports reproducibility of analyses, and provides an auditable trail of how insights were derived.
Conversation Objects Catalog
Data Science Agent operates on and produces three types of database objects while handling requests. If you associate these objects with your conversation, the agent can inspect, analyze, transform, and model from those objects directly, thereby enhancing the quality of the agent's responses. If you do not associate any object, the agent automatically scans the database for relevant objects based on your prompt.
- Tables: Source data and persisted modeling results (created by the agent).
- Views: Views may be pre-existing data sources or derived datasets created by the agent. They are used for analysis, modeling, or general data transformation. Views created by the agent use the prefix DSAGENT$ and may include a unique suffix.
- Mining models: The Oracle Machine Learning (OML) models trained by the agent.
For more information on how to associate these objects to your conversation, see .
Prompt
A prompt is your input or message that initiates an interaction. It can be a question, command, statement, or request that Data Science Agent processes in order to generate an appropriate response. Essentially, the prompt guides the agent on what information or action you are seeking.
Prompt library
A prompt library is a curated set of system, task, and tool-specific prompts that defines how the agent interprets your prompts, interprets results, and calls various tools. The prompts are designed to encode domain knowledge and ensure consistent, reliable behavior.
Service Levels
In Oracle Machine Learning (OML) on Autonomous AI Database, Service Levels refer to predefined configurations for resource allocation and workload management. Essentially, a service level determines how many OCPUs (Oracle CPUs) or ECPUs and how much memory are allocated to a session. There are three service levels: Low, Medium, and High.
These service levels help manage and prioritize workloads running on the database, ensuring appropriate performance based on the use case.
For more information on how to change the Service Level of your conversation, see Use Data Science Agent Chat Interface.
Tools
Tools are modular components that enable the agent to perform specific tasks such as profiling a data object, computing feature correlations, or training a model. Each tool has clearly defined inputs, outputs, and constraints. In short, tools serve as the building blocks of Data Science Agent's functionality. Although end users do not interact with these tools directly, the tools shape the user experience of the agent.
Parent topic: About Data Science Agent
1.3 Key Highlights of Data Science Agent
Data Science Agent offers a range of powerful features designed to streamline data science workflows. The key features include:
- Data Discovery and Inspection: Accesses and discovers data locally as well as from remote sources including non-Oracle databases in multi-cloud environments.
- Exploratory Statistical Analysis: Conducts single-variable analysis as well as relationship analysis. Relationship analysis is performed pairwise, that is, between two variables such as predictors and outcomes. This means each predictor is examined individually against one outcome. Data Science Agent can scan many predictors against a single outcome; however, this process does not replace multivariate modeling.
  Note: Relationship analyses are most reliable when performed on row-level (fine-grained) datasets, rather than on heavily aggregated data.
- View-based Data Preparation: Transforms and prepares data for modeling by creating new views. This is how it joins tables, filters populations, and derives new features from existing attributes.
- Data Analysis and Visualization: Simplifies and automates data analysis with built-in visualization for actionable insights.
- Feature Selection and Feature Engineering: Profiles datasets and performs feature selection and feature engineering.
- Model Training (supervised and unsupervised) including Automated Model Search: Handles training for both supervised and unsupervised models, providing clear explanations of metrics and results to support learning and decision-making. It supports Classification, Regression, Clustering, and Anomaly Detection. Supported algorithms include XGBoost, Random Forest, Decision Tree, Neural Network, Naive Bayes, SVM, GLM, K-Means, Expectation Maximization, and O-Cluster.
Converse with the agent to:
- Train models to predict a categorical outcome (Classification) or a numeric value (Regression)
- Evaluate multiple supervised algorithms and pick the best model based on a metric (automated model search), and
- Build models without a labeled target (Clustering and Anomaly Detection)
- Model Comparison and Evaluation: Handles model comparison and evaluation. If you have multiple models, either created by the agent or otherwise, you can request the agent for a comparison based on a common validation dataset.
- Inference (scoring) on new data: Performs inference (scoring) on new data. Inference requires a trained model, a dataset containing the full feature set expected by the model, and a dataset containing the IDs to score.
  Note: Inference is supported only in ID-based scoring mode, that is, IDs to score along with the full feature dataset. Broader scoring options will be available soon.
Parent topic: About Data Science Agent
1.4 Limitations of Data Science Agent
While Data Science Agent offers numerous benefits, there are certain limitations that may impact its use in specific scenarios.
The current limitations of Data Science Agent include:
- Ad hoc SQL queries cannot be run directly
- Algorithms supported by Oracle permitted for models
- Conversation length and scope
- Error handling
- Limitations in result visualization
- Performance and latency related limitations
- Reuse of existing objects
Parent topic: About Data Science Agent
1.4.1 Ad hoc SQL queries cannot be run directly
Data Science Agent is capable of generating SQL internally to create views. However, it does not currently support running ad hoc SQL queries or direct visualization of raw result sets.
Note: You can define arbitrary views to structure and transform data for downstream analysis and modeling.
Parent topic: Limitations of Data Science Agent
1.4.2 Algorithms supported by Oracle permitted for models
Note: Inference or scoring is not supported for Clustering and Anomaly Detection.
Parent topic: Limitations of Data Science Agent
1.4.3 Conversation length and scope
While Data Science Agent can handle extended interactions, very long conversations may accumulate context that negatively affects clarity or performance. For extended work, consider starting a new conversation after a substantial number of interactions (around 50 messages), particularly when your objectives change.
Parent topic: Limitations of Data Science Agent
1.4.4 Error handling
Note: If an error occurs, Oracle recommends refining prompts, adjusting goals, or re-running steps.
Parent topic: Limitations of Data Science Agent
1.4.5 Limitations in result visualization
Data Science Agent provides only summaries of its analysis or limited data samples. Interactive viewing of raw query results is not currently supported.
Parent topic: Limitations of Data Science Agent
1.4.6 Performance and latency related limitations
Certain operations such as data discovery, feature analysis, and model training may require a few minutes to process. Model training on very large datasets can take even longer. During these operations, the conversation may not progress until the operation is completed. If you encounter such performance or latency related issues, you can start other conversations.
Parent topic: Limitations of Data Science Agent
1.4.7 Reuse of existing objects
Note: If several similar objects are available, make sure that you specify whether to reuse or recreate the objects.
Parent topic: Limitations of Data Science Agent
1.5 Data Science Agent: Sample Prompts and Outputs
Here are some sample prompts and outputs related to various machine learning domains on which you may have conversations with Data Science Agent.
1. Data Discovery
Discovery is semantic and goal-driven. It works best and most efficiently when the goal and domain are stated explicitly.
You can ask the agent to find data objects relevant to a business topic or analysis goal, for example, marketing response, churn, fraud, or product demand. It can also help you obtain a general overview of all available objects.
Example 1-1 Discover available tables, views, and models
- Find tables and views related to bank marketing subscriptions and campaign contacts.
- What data exists related to bank marketing?
- Find tables related to customer churn and retention.
- What objects are available?
Outputs
- A curated set of relevant objects—tables, views, models.
- Business-oriented summaries and hints about how objects relate. For example, likely join keys.
- Additional extended report with detailed information about all relevant objects.
Note:
- Best results depend on meaningful metadata. Semantically clear table, view, and column names and well-maintained annotations improve the quality and relevance of results.
- You can manually associate database objects—tables, views, or models to the conversation so the agent can use them immediately. This is useful when the relevant objects are already known and discovery is unnecessary. Once associated, the agent can inspect, analyze, transform, and model from those objects directly. Discovery can remain optional unless additional data needs to be found.
2. Inspect Specific Object
You can ask for details about a specific table, view, or mining model.
Example 1-2 Ask questions related to specific tables, views and mining models
- Describe the CUSTOMERS table.
- Show the columns and types in SCHEMA.SALES_TRANSACTIONS.
- What attributes are used in the model CHURN_MODEL?
Outputs
- For tables and views, the agent typically retrieves row and column counts, the column list and data types, and a small data sample.
- For models, the agent typically retrieves information about the features, target, and algorithm.
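The model metadata the agent reports can also be queried directly from the standard Oracle data dictionary. A minimal sketch using the USER_MINING_MODEL_ATTRIBUTES view, assuming a model named CHURN_MODEL exists in your schema:

```sql
-- List the input attributes and the target column of an OML model
SELECT attribute_name, attribute_type, target
FROM   user_mining_model_attributes
WHERE  model_name = 'CHURN_MODEL'   -- hypothetical model name
ORDER  BY attribute_name;
```

The related USER_MINING_MODELS view reports the mining function and algorithm for each model, which is the remaining information the agent surfaces for model inspection.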
3. Exploratory Statistical Analysis
For exploratory statistical analysis, you can ask the agent for both single-variable analysis and relationship analysis (pair).
Example 1-3 Single-variable analysis
You can request distribution and qualitative summaries for one or more individual columns.
- Describe the SALES.CUSTOMERS table and provide an overview of all its attributes.
- Provide an overview of all variables in SCHEMA.CUSTOMERS_VIEW.
- Analyze the distribution of AGE, INCOME, and JOB_CATEGORY.
- Analyze which factors are most associated with subscription behavior.
Outputs
- Global interpretation of analysis results.
- Distribution summaries for each variable, using statistics and plots appropriate to the variable type, that is, numeric versus categorical.
- Percentage of missing values and number of categories, as applicable.
Example 1-4 Relationship analysis
Note: Relationship analyses are most reliable when performed on row-level (fine-grained) datasets, rather than on heavily aggregated data.

- What factors are most associated with subscriptions?
- How does CONTACT_CHANNEL relate to AGE?
- Analyze relationships of all features versus CHURN_FLAG.
Outputs
- Global interpretation of pairwise analysis results.
- Pairwise relationship summaries for each attribute (against target variable), using statistics and plots appropriate to the variable types (numeric vs. categorical).
4. View-Based Data Transformation and Preparation
- Join customer, transaction, and interaction tables into a unified dataset.
- Filter to a time window or segment. For example, the last 12 months or a specific product line.
- Create derived fields. For example, date components such as year/month/day or day of week.
- Exclude unsupported or irrelevant fields from training datasets when needed.
- Create a new view joining clients, contacts, and past campaigns; extract day and month from timestamps.

The agent does not run arbitrary ad hoc SQL queries or return full result sets for interactive browsing. Views are the primary mechanism for shaping data.
The agent does not directly modify base tables.
Example 1-5 View-Based Data Transformation and Preparation
- Join CLIENTS, CONTACTS, and PAST_CAMPAIGNS into a modeling dataset.
- Make the dataset ready for modeling by extracting features from timestamps.
Outputs
- A new view in the user schema starting with the prefix DSAGENT$.
- A plain-language summary of what the view contains and how it was created.
- SQL code used to create the view.
- Visual diagram to track dependencies and operations at a glance.
5. Feature Importance and Feature Selection
Ask which variables matter most for predicting a specific target and optionally reduce the dataset to most important features.
Example 1-6 Feature Importance and Feature Selection
- Rank feature importance for predicting SUBSCRIBED.
- Create a reduced dataset with only important features.
Outputs
- A ranked list of attributes with importance scores
- Optionally, a new top-features view created from the original dataset
Note: Feature importance can be computed using different supported algorithms. The agent can guide you on algorithm choice in business terms.

6. Dataset Splitting for Training and Evaluation
Use the agent to split datasets as database views.
Example 1-7 Dataset Splitting for Training and Evaluation
- Split into train/validation/test using standard percentages.
- Create an 80/20 train/test split.
- Split data into train, validation and test sets, then find best model optimizing Accuracy.
Outputs
- New views with suffixes such as _TRAIN, _VAL (if requested), and _TEST.
- An optional _UNLABELED view if a target column is provided and some rows have NULL targets. You can use this view later for inference.
- SQL code used to perform the split.
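Split views of the kind described above can be approximated by hand with deterministic hashing. A sketch for an 80/20 train/test split, assuming a hypothetical source view DSAGENT$BANK_DATA with a CUST_ID case identifier:

```sql
-- 80/20 split using a deterministic hash of the case ID,
-- so each row always lands in the same partition on re-query.
CREATE OR REPLACE VIEW bank_train AS
  SELECT * FROM dsagent$bank_data
  WHERE  ORA_HASH(cust_id, 99) < 80;   -- buckets 0-79, roughly 80%

CREATE OR REPLACE VIEW bank_test AS
  SELECT * FROM dsagent$bank_data
  WHERE  ORA_HASH(cust_id, 99) >= 80;  -- buckets 80-99, roughly 20%
```

Hashing on the case ID rather than sampling randomly keeps the split reproducible, which matters when the train and test views are queried at different times. The agent's own split SQL may differ.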
7. Model Training
Use Data Science Agent for model training, automated model selection (supervised model search), and model building (unsupervised learning).
Example 1-8 Supervised learning (classification and regression)
Here are some prompts to use Data Science Agent to train models to predict a categorical outcome (classification) or a numeric value (regression).
- Train a classifier to predict SUBSCRIBED.
- Train a regression model to predict CALL_DURATION.
Outputs
- A trained OML mining model stored in the database
- A summary of the training run and configuration choices
- SQL code to replicate the training
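In-database training of this kind is built on the OML API. A minimal classification sketch using DBMS_DATA_MINING.CREATE_MODEL2, assuming a hypothetical training view BANK_TRAIN with a CUST_ID case ID and a SUBSCRIBED target; the model name and algorithm choice are illustrative, and the SQL the agent actually generates may differ:

```sql
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  -- Choose the algorithm; all other settings keep their defaults.
  v_setlist('ALGO_NAME') := 'ALGO_RANDOM_FOREST';

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'SUBSCRIBE_CLF',   -- hypothetical model name
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM bank_train',
    set_list            => v_setlist,
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'SUBSCRIBED'
  );
END;
/
```

The resulting model is a schema object, so it appears in USER_MINING_MODELS and can be evaluated, compared, and scored with standard SQL.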
Example 1-9 Automated model selection (supervised model search)
Note: Automated model selection requires a validation set for comparing models against the selected metric. After automated model selection, the winning model is retrained on the combined train and validation dataset. Therefore, the final model is not the same object as the one that scored best during comparison.
- Find the best model for predicting SUBSCRIBED using F1 metric.
- Run an automated model search for churn prediction.
Outputs
- A best-performing model selected using the chosen metric (on validation data).
- A report of validation performance across tested algorithms.
- Optionally, a results table containing the benchmark metrics.
Example 1-10 Unsupervised learning (Clustering and Anomaly Detection)
Note: For unsupervised models, only model build is supported currently. Additional scoring capabilities and interpretations for clustering will be available soon.

- Segment customers into clusters.
- Detect anomalies in transaction behavior.
Outputs
- A trained clustering or anomaly model stored in the database
- A summary describing how to use the model for downstream scoring
8. Compare Models and Select a Winner
When multiple models exist, either created by the agent or otherwise, you can request the agent for a comparison based on a common validation dataset.
Example 1-11 Model comparison
- Compare these three models using AUC and select the best.
- Rank candidate models and store results in a table.
Outputs
- A ranked comparison (best-to-worst) on the specified metric.
- Optional persistence of the full ranking into a results table for auditability.
9. Evaluate Models
Note: Evaluation on a held-out test set is intended as the most reliable estimate of the generalization performance of a trained model.

Example 1-12 Model Evaluation
- Evaluate the selected model on the test set.
- Provide Precision, Recall, F1 and a confusion matrix on test.
- Evaluate regression error on test.
- Evaluate the best model on the test set, then score the prospects dataset and return the highest-probability cases
Outputs
- For Classification: accuracy-family metrics and confusion-matrix reporting (binary and multiclass supported)
- For Regression: fit and error metrics. For example, R², MAE, RMSE.
- A test-results table stored in the database.
- SQL code to use the models in inference on arbitrary data.
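Inference SQL of the kind listed above typically uses Oracle's built-in PREDICTION operators. A sketch, assuming a hypothetical model SUBSCRIBE_CLF, a positive class labeled 'yes', and a prospects view containing the feature columns the model expects:

```sql
-- Score each prospect and return the 500 most likely subscribers
SELECT cust_id,
       PREDICTION(subscribe_clf USING *)                    AS predicted_class,
       PREDICTION_PROBABILITY(subscribe_clf, 'yes' USING *) AS prob_yes
FROM   prospects
ORDER  BY prob_yes DESC
FETCH FIRST 500 ROWS ONLY;
```

Because scoring is just SQL, the same pattern works inside views, joins, and reports without moving data out of the database.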
10. Inference and Scoring
Inference requires the following:
- A trained model
- A dataset containing the full feature set expected by the model
- A dataset containing the IDs to score
Note: Inference is supported only in ID-based scoring mode, that is, IDs to score along with the full feature dataset. Broader scoring options will be available soon.

Example 1-13 Inference and Scoring
- Score the prospects table and return the top 500 most likely to subscribe.
- Run inference for these customer IDs.
Outputs
- Predictions returned in the UI, linked back to case IDs
- For Classification: predicted class and probability (based on the designated positive class)
- For Regression: predicted numeric value
You can also use the agent in interactive mode for suggestions and interpretations.
- Suggestion request: "I want to predict clients most likely to subscribe, assist me in designing a suitable workflow"
- Interpretation request: "Can you help me interpret the model metrics so that I can better assess its performance?"
Parent topic: About Data Science Agent