7.3 Key Highlights of Data Science Agent

Data Science Agent offers a range of powerful features designed to streamline data science workflows. The key features include:

  • Data Discovery and Inspection: Discovers and accesses data locally as well as from supported database objects that are accessible to it.
  • Exploratory Statistical Analysis: Conducts single-variable analysis as well as relationship analysis. Relationship analysis is performed pairwise, that is, between two variables at a time, such as a predictor and an outcome. Each predictor is examined individually against one outcome. Data Science Agent can scan many predictors against a single outcome; however, this process does not replace multivariate modeling.

    Note:

    Relationship analyses are most reliable when performed on row-level (fine-grained) datasets, rather than on heavily aggregated data.
  • View-based Data Preparation: Transforms and prepares data for modeling by creating new views. This is how the agent joins tables, filters populations, and derives new features from existing attributes.
  • Data Analysis and Visualization: Simplifies and automates data analysis with built-in visualization for actionable insights.
  • Feature Selection and Feature Engineering: Profiles datasets and performs feature selection and feature engineering.
  • Model Training (supervised and unsupervised) including Automated Model Search: Handles training for both supervised and unsupervised models, and provides clear explanations of metrics and results to support learning and decision-making. It supports Classification, Regression, Clustering, and Anomaly Detection. Supported algorithms include XGBoost, Random Forest, Decision Tree, Neural Network, Naive Bayes, SVM, GLM, K-Means, Expectation Maximization, and O-Cluster. Converse with the agent to:
    • Train models to predict a categorical outcome (Classification) or a numeric value (Regression)
    • Evaluate multiple supervised algorithms and pick the best model based on a metric (automated model search), and
    • Build models without a labeled target (Clustering and Anomaly Detection)
  • Model Comparison and Evaluation: Handles model comparison and evaluation. If you have multiple models, whether created by the agent or otherwise, you can ask the agent to compare them on a common validation dataset.
  • Inference (scoring) on new data: Performs inference (scoring) on new data. Inference requires a trained model, a dataset containing the full feature set expected by the model, and a dataset containing the IDs to score.

    Note:

    Inference is supported only in ID-based scoring mode, that is, the IDs to score are supplied along with the full feature dataset. Broader scoring options will be available soon.
  • Reuse of existing objects: Data Science Agent may reuse existing objects, such as views or models, although starting from scratch is also an option. If you prefer that the agent not reuse previous objects, you can say so in your response. Otherwise, the agent may refer to or reuse relevant objects created earlier, including those from other conversations, when they are manually associated or automatically discovered. Reuse saves time by avoiding repeated creation of the same objects.
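
The three inputs that ID-based scoring requires (a trained model, a dataset with the full feature set, and the IDs to score) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the agent's interface: score_by_ids, toy_model, the id column, and the sample data are all hypothetical, and the model is a stand-in function rather than a real trained model.

```python
def score_by_ids(model, expected_features, feature_rows, ids_to_score, id_col="id"):
    """Score only the requested IDs, looking each one up in the feature dataset."""
    by_id = {row[id_col]: row for row in feature_rows}
    results = {}
    for key in ids_to_score:
        row = by_id.get(key)
        if row is None:
            raise KeyError(f"ID {key!r} not found in the feature dataset")
        # The model needs every feature it was trained on.
        missing = [f for f in expected_features if f not in row]
        if missing:
            raise ValueError(f"ID {key!r} is missing features: {missing}")
        results[key] = model({f: row[f] for f in expected_features})
    return results


def toy_model(features):
    """Hypothetical stand-in for a trained classifier."""
    return 1 if features["tenure"] * 2 + features["visits"] > 8 else 0


# Hypothetical full feature dataset, keyed by "id".
feature_rows = [
    {"id": 101, "tenure": 1, "visits": 5},
    {"id": 102, "tenure": 4, "visits": 1},
    {"id": 103, "tenure": 5, "visits": 2},
]

# Only the listed IDs are scored; ID 102 is left untouched.
predictions = score_by_ids(toy_model, ["tenure", "visits"], feature_rows, [101, 103])
print(predictions)
```

The sketch fails fast when an ID is absent or a feature is missing, mirroring the requirement that the feature dataset contain the full feature set expected by the model.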