3 Best Practices for Using Data Science Agent

Follow these best practices to maximize the benefits of Data Science Agent.

Topics:

3.1 Recommended Models

Data Science Agent works with large language models accessed through the Oracle DBMS_CLOUD_AI and DBMS_CLOUD_AI_AGENT packages. The DBMS_CLOUD_AI package, with Select AI, supports translating natural language prompts to generate, run, and explain SQL statements, and also enables retrieval-augmented generation (RAG) and natural language interactions, including chats with LLMs. For more information, see DBMS_CLOUD_AI Package and DBMS_CLOUD_AI_AGENT Package.
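As an illustration of this interaction, the sketch below assumes a profile named OPENAI_GPT_4_1 (such as the one created later in this section) and a working credential; the prompt text is hypothetical:

```sql
-- Select the AI profile for the current session
EXEC DBMS_CLOUD_AI.SET_PROFILE('OPENAI_GPT_4_1');

-- Translate a natural language prompt into SQL and display the
-- generated statement without running it
SELECT DBMS_CLOUD_AI.GENERATE(
           prompt       => 'How many customers churned last quarter?',
           profile_name => 'OPENAI_GPT_4_1',
           action       => 'showsql')
FROM dual;
```

Other action values, such as narrate or chat, are also supported by DBMS_CLOUD_AI.GENERATE.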

This table lists the recommended large language models and the scenarios in which each should be used.

Note:

Oracle recommends using GPT-4.1 as the preferred model for all Data Science Agent users for reliability, responsiveness, and cost-effectiveness.

Table 3-1 Recommended models

Large Language Model: GPT-4.1
  • Provider: OpenAI or OCI GenAI
  • Scenario: Use this model for all general-purpose data science tasks.
  • Strengths: Reliable; cost-effective

Large Language Model: Grok-4
  • Provider: OCI GenAI
  • Scenario: Use this model for complex, multi-step tasks requiring advanced reasoning. The response time of Data Science Agent is expected to be slower when using Grok-4.
  • Strengths: Powerful reasoning capabilities

Profile Creation for GPT-4.1

To create an AI profile for GPT-4.1 through OpenAI, run the following script in a notebook:

DECLARE
    profile_name VARCHAR2(128) := 'OPENAI_GPT_4_1';
BEGIN
    -- Drop any existing profile with the same name; TRUE suppresses
    -- the error if the profile does not exist
    dbms_cloud_ai.drop_profile(profile_name, TRUE);
    dbms_cloud_ai.create_profile(
        profile_name => profile_name,
        attributes => '{
            "comments": false,
            "conversation": true,
            "credential_name": "OPENAI_CRED",
            "model": "gpt-4.1",
            "provider": "openai",
            "temperature": 1,
            "max_tokens": 8192
        }'
    );
END;
/

To create an AI profile for GPT-4.1 through Oracle Cloud Infrastructure (OCI), run the following script in a notebook:

DECLARE
    profile_name VARCHAR2(128) := 'OCI_GPT_4_1';
BEGIN
    -- Drop any existing profile with the same name; TRUE suppresses
    -- the error if the profile does not exist
    dbms_cloud_ai.drop_profile(profile_name, TRUE);
    dbms_cloud_ai.create_profile(
        profile_name => profile_name,
        attributes => '{
            "comments": false,
            "conversation": true,
            "credential_name": "OCI_CRED",
            "model": "openai.gpt-4.1",
            "provider": "oci",
            "temperature": 1,
            "max_tokens": 8192,
            "oci_compartment_id": "<your-compartment-ocid>",
            "oci_apiformat": "GENERIC"
        }'
    );
END;
/

Profile Creation for Grok-4

To create an AI profile for Grok-4 through Oracle Cloud Infrastructure (OCI), run the following script in a notebook:

DECLARE
    profile_name VARCHAR2(128) := 'OCI_GROK_4';
BEGIN
    -- Drop any existing profile with the same name; TRUE suppresses
    -- the error if the profile does not exist
    dbms_cloud_ai.drop_profile(profile_name, TRUE);
    dbms_cloud_ai.create_profile(
        profile_name => profile_name,
        attributes => '{
            "comments": false,
            "conversation": true,
            "credential_name": "OCI_CRED",
            "model": "xai.grok-4",
            "provider": "xAI",
            "temperature": 1,
            "max_tokens": 8192,
            "oci_compartment_id": "<your-compartment-ocid>",
            "oci_apiformat": "GENERIC"
        }'
    );
END;
/
Parameters:
  • profile_name: A name for the AI profile. The profile name must follow Oracle SQL identifier naming rules. The maximum length is 125 characters.
  • comments: Includes table and column comments in the metadata used for translating natural language prompts. Accepts the BOOLEAN data type, or the strings TRUE or FALSE with the VARCHAR2 data type. The values are not case sensitive.
  • conversation: A VARCHAR2 attribute that indicates whether conversation history is enabled for a profile. Valid values are true or false. The default value is false. The values are not case sensitive.
  • credential_name: The name of the credential to access the AI provider APIs.
  • model: The name of the AI model being used to generate responses.
  • provider: AI provider for the AI profile. This is a mandatory attribute.
  • temperature: Controls the randomness of the model's output. Lower values (for example, 0) make responses more deterministic and focused; higher values (for example, 1) make them more creative and varied. Tune it depending on your use case:
    • Lower values are generally preferred for structured or factual tasks.
    • Higher values can be useful for more open-ended generation. For Data Science Agent, temperature = 1 gives the best results.
  • max_tokens: The maximum number of tokens to generate per response. The default value of 1024 may be too low for complex or verbose answers. Setting it to 4096 is sufficient for most use cases, while 8192 provides extra headroom for longer responses.

    Note:

    This value can be any positive integer; it does not need to be a power of 2.
  • oci_compartment_id: Specifies the OCID of the compartment you are permitted to access when calling the OCI Generative AI service. The compartment ID can contain alphanumeric characters, hyphens and dots. The default is the compartment ID of the PDB.
  • oci_apiformat: Specifies the format of the API used to communicate with the OCI Generative AI service. The profiles in this section use GENERIC, which is the format used for models such as GPT-4.1 and Grok-4.

For more information, see Manage AI Profiles.
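If you later need to change a single attribute, such as max_tokens, you can update the profile in place instead of dropping and recreating it. A minimal sketch, assuming the OCI_GPT_4_1 profile created earlier:

```sql
BEGIN
    -- Raise the response length limit on an existing profile
    dbms_cloud_ai.set_attribute(
        profile_name    => 'OCI_GPT_4_1',
        attribute_name  => 'max_tokens',
        attribute_value => '4096');
END;
/
```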

3.2 Associate database objects with your conversation

Consider associating database objects such as tables, views, and mining models with a Data Science Agent conversation. Once you associate these objects, the agent can inspect, analyze, transform, and model from them directly, which enhances the quality of the agent's responses. If you do not associate any object, the agent automatically scans the database for relevant objects based on your query.

Note:

Some operations, such as feature ranking, model search, and training, can be compute-intensive and may take time.

3.3 Ask for clarification

During the course of your conversation, you can ask for clarification at any time. Some examples:
  • What was done in the previous step?
  • Why was a particular step necessary?
  • What is the next recommended step?
  • Explain the <concept>. For example, "What is unstructured data in machine learning?"

3.4 Ask in multiple iterations

If you are using Data Science Agent for extended workflows, consider asking the agent in multiple iterations. Longer workflows are generally more effective when handled iteratively. For instance, you can start by creating a dataset view, then move to validating assumptions, and finally focus on model training and evaluation.
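For example, a first iteration might define a focused dataset view before any modeling begins; the table and column names below are hypothetical:

```sql
-- First iteration: create a narrow dataset view for the agent to work with
CREATE OR REPLACE VIEW churn_dataset AS
SELECT customer_id,
       tenure_months,
       monthly_charges,
       churn_flag
FROM   customers;
```

Later iterations can then reference churn_dataset when validating assumptions and training models.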

3.5 Clarify terminology

If you use specific terms in your conversation, it is a good practice to define those terms for the agent.

3.6 Follow suggestions provided by Data Science Agent

Follow the agent's suggestions when appropriate. The agent frequently proposes the next steps of a workflow, for example, data preparation, analysis, or model training. Accept or refine these suggestions for smooth progress.

3.7 Limit your conversation length and scope

Although Data Science Agent can handle extended interactions, very long conversations may accumulate context that negatively affects clarity or performance. For extended work, consider starting a new conversation, especially if you encounter these situations:
  • Your conversation contains many messages (around 50 or more), or
  • Your objective changes

3.8 Provide context to your conversation

The interaction with Data Science Agent is structured as a conversation, consisting of alternating turns. A turn begins with your prompt, followed by the agent’s response. A Data Science Agent conversation maintains the context across turns.

Therefore, providing context to your conversation is a good practice, especially if you resume a conversation at a later time.

3.9 Specify a clear objective

Clearly state your objective at the beginning of the conversation. For example, "I want to predict customer churn" or "I want to identify the main causes of an outcome". Sharing a high-level intent early helps guide the rest of the conversation. When the agent understands your objective, it can suggest the most appropriate workflow.