7.3 Key Highlights of Data Science Agent
Data Science Agent offers a range of powerful features designed to streamline data science workflows. The key features include:
- Data Discovery and Inspection: Discovers and accesses data stored locally as well as data in supported database objects that you have access to.
- Exploratory Statistical Analysis:
Conducts single-variable analysis as well as relationship analysis.
Relationship analysis is performed pairwise, that is, between two variables
such as a predictor and an outcome. Each predictor is therefore examined
individually against one outcome. Data Science Agent can scan many
predictors against a single outcome; however, this scan does not replace
multivariate modeling.
Note:
Relationship analyses are most reliable when performed on row-level (fine-grained) datasets, rather than on heavily aggregated data.
- View-based Data Preparation: Transforms and prepares data for modeling by creating new views. This is how it joins tables, filters populations, and derives new features from existing attributes.
- Data Analysis and Visualization: Simplifies and automates data analysis with built-in visualization for actionable insights.
- Feature Selection and Feature Engineering: Profiles datasets and performs feature selection and feature engineering.
- Model Training (supervised and unsupervised),
including Automated Model Search: Trains both supervised and
unsupervised models and provides clear explanations of metrics and
results to support learning and decision-making. It supports
Classification, Regression, Clustering, and Anomaly Detection. Supported
algorithms include XGBoost, Random Forest, Decision Tree, Neural Network,
Naive Bayes, SVM, GLM, K-Means, Expectation Maximization, and O-Cluster.
Converse with the agent to:
  - Train models to predict a categorical outcome (Classification) or a numeric value (Regression)
  - Evaluate multiple supervised algorithms and pick the best model based on a metric (automated model search)
  - Build models without a labeled target (Clustering and Anomaly Detection)
- Model Comparison and Evaluation: Compares and evaluates models. If you have multiple models, whether created by the agent or otherwise, you can ask the agent to compare them on a common validation dataset.
- Inference (scoring) on new data:
Performs inference (scoring) on new data. Inference requires a trained
model, a dataset containing the full feature set expected by the model, and
a dataset containing the IDs to score.
Note:
Inference is supported only in ID-based scoring mode, that is, you provide the IDs to score along with the full feature dataset. Broader scoring options will be available soon.
- Reuse of existing objects: Data Science Agent may reuse existing objects (views or models), although starting from scratch is also an option. If you prefer that the agent doesn't reuse previous objects, you can state so in your response. Otherwise, the agent may refer to or reuse relevant objects created earlier, including those from other conversations, when they are manually associated or automatically discovered. This saves time by avoiding repeated creation of the same objects.
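The pairwise relationship analysis described under Exploratory Statistical Analysis can be illustrated with a small sketch. This is not the agent's implementation: the choice of Pearson correlation as the pairwise statistic, the function names `pearson` and `scan_predictors`, and the sample rows are all assumptions made for illustration. The sketch only shows what "each predictor is examined individually against one outcome" means in practice.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def scan_predictors(rows, predictors, outcome):
    """Score each predictor against the outcome, one pair at a time.

    Hypothetical helper: every predictor is compared with the outcome in
    isolation, so interactions between predictors are never considered.
    """
    ys = [row[outcome] for row in rows]
    scores = {p: pearson([row[p] for row in rows], ys) for p in predictors}
    # Rank by absolute strength of the pairwise relationship.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up row-level data: one dict per observation.
rows = [
    {"age": 25, "visits": 1, "noise": 3, "spend": 10},
    {"age": 35, "visits": 2, "noise": 1, "spend": 20},
    {"age": 45, "visits": 3, "noise": 3, "spend": 30},
    {"age": 55, "visits": 4, "noise": 1, "spend": 40},
]
ranking = scan_predictors(rows, ["age", "visits", "noise"], "spend")
for name, r in ranking:
    print(f"{name}: {r:+.3f}")
```

Because each score is computed from one predictor and the outcome alone, two strongly ranked predictors may carry the same information. This is why a scan of this kind does not replace multivariate modeling.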
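The three inputs that inference requires (a trained model, a dataset with the full feature set, and a dataset of IDs to score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent's API: `score_by_id`, `toy_model`, the `id` column name, and the sample rows are hypothetical, and the "model" is a plain Python function standing in for a trained model.

```python
def score_by_id(model, expected_features, feature_rows, ids_to_score, id_column="id"):
    """Score only the requested IDs, using rows from the full feature dataset.

    Hypothetical helper: `model` is any callable that takes a dict of
    feature values and returns a prediction.
    """
    by_id = {row[id_column]: row for row in feature_rows}
    results = {}
    for record_id in ids_to_score:
        if record_id not in by_id:
            raise KeyError(f"ID {record_id!r} not found in the feature dataset")
        row = by_id[record_id]
        # The model needs its full feature set; fail early if anything is absent.
        missing = [f for f in expected_features if f not in row]
        if missing:
            raise ValueError(f"ID {record_id!r} is missing features: {missing}")
        results[record_id] = model({f: row[f] for f in expected_features})
    return results


def toy_model(features):
    """Stand-in for a trained classifier (assumption, for illustration only)."""
    return 1 if features["visits"] * 10 + features["age"] > 60 else 0


# Full feature dataset: every row carries the features the model expects.
feature_rows = [
    {"id": 101, "age": 25, "visits": 1},
    {"id": 102, "age": 35, "visits": 4},
    {"id": 103, "age": 30, "visits": 2},
]

# Dataset of IDs to score: only these records receive predictions.
scores = score_by_id(toy_model, ["age", "visits"], feature_rows, [102, 103])
print(scores)
```

Only the listed IDs are scored, and a row that lacks any expected feature is rejected rather than scored with partial input, which mirrors the requirement that the feature dataset contain the full feature set expected by the model.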