Use Select AI to Generate SQL from Natural Language Prompts

Autonomous AI Database on Dedicated Exadata Infrastructure Select AI enables you to query your data using natural language.

The Select AI feature allows Autonomous AI Database to use generative AI with Large Language Models (LLMs) to convert user's input text into Oracle SQL. Select AI processes the natural language prompt, supplements the prompt with metadata, and then generates and runs a SQL query.

Supported Platforms

Select AI is supported on Autonomous AI Database Serverless and Autonomous AI Database on Dedicated Exadata Infrastructure and Cloud at Customers.

  • Autonomous AI Database Serverless
  • Autonomous AI Database on Dedicated Exadata Infrastructure
  • Autonomous AI Database on Dedicated Exadata Infrastructure Region
  • Autonomous AI Database Cloud@Customer

About SQL Generation

Using natural language to interact with your database data is now achievable with LLMs. This means you can use natural language, for example plain English, to query the database.

When you use Select AI, Autonomous AI Database manages the process of converting natural language into SQL. This means you can provide a natural language prompt instead of SQL code to interact with your data. Select AI serves as a productivity tool for SQL users and developers and enables non-expert SQL users to derive useful insights from their data, without having to understand data structures or technical languages.

The DBMS_CLOUD_AI package in Autonomous AI Database enables integration with a user-specified LLM for generating SQL code using natural language prompts. The package assists in supplying the LLM with knowledge of the database schema and instructing it to write a SQL query consistent with that schema. The DBMS_CLOUD_AI package works with AI providers like OpenAI, Cohere, Azure OpenAI Service, and Oracle Cloud Infrastructure Generative AI.

Note:

Users must have an account with the AI provider and provide their credentials through DBMS_CLOUD_AI objects that the Autonomous AI Database uses.

Terminology

It is important to understand the various terms used with Select AI before using it.

The following are the terms related to Select AI feature:

Term Definition

Database Credential

Database Credentials are authentication credentials used to access and interact with databases. They typically consist of a user name and a password, sometimes supplemented by additional authentication factors like security tokens. These credentials are used to establish a secure connection between an application or user and a database, ensuring that only authorized individuals or systems can access and manipulate the data stored within the database.

Hallucination in LLM

Hallucination in the context of Large Language Models refers to a phenomenon where the model generates text that is incorrect, nonsensical, or unrelated to the input prompt. Despite being a result of the model's attempt to generate coherent text, these instances can contain information that is fabricated, misleading, or purely imaginative. Hallucination can occur due to biases in training data, lack of proper context understanding, or limitations in the model's training process.

IAM Oracle Cloud Infrastructure Identity and Access Management (IAM) lets you control who has access to your cloud resources. You can control what type of access a group of users have and to which specific resources. To learn more, see Overview of Identity and Access Management.

Natural Language Prompts

Natural Language Prompts are human-readable instructions or requests provided to guide generative AI models, such as Large Language Models. Instead of using specific programming languages or commands, users can interact with these models by entering prompts in a more conversational or natural language form. The models then generate output based on the provided prompt.

Network Access Control List (ACL)

A Network Access Control List is a set of rules or permissions that define what network traffic is allowed to pass through a network device, such as a router, firewall, or gateway. ACLs are used to control and filter incoming and outgoing traffic based on various criteria such as IP addresses, port numbers, and protocols. They play a crucial role in network security by enabling administrators to manage and restrict network traffic to prevent unauthorized access, potential attacks, and data breaches.

Similarity Search

Similarity search identifies and retrieves data points that closely match a given query by comparing feature vectors in a vector store.

Vector Distances

Vector distances measure the similarity or dissimilarity between feature vectors by calculating the distance between them in a multidimensional space.

Vector Index

A vector index organizes and stores feature vectors to enable efficient similarity search and retrieval of related data.

Vector Store

A vector store includes systems that store, manage, and enable semantic similarity search involving vector embeddings. This includes standalone vector databases and Oracle Database 23ai AI Vector Search.

Usage Guidelines

Provides usage guidelines that ensure effective and proper usage of natural language prompts for SQL generation to ensure an enhanced user experience.

Intended Use

This feature is intended for the generation and running of SQL queries resulting from user-provided natural language prompts. It automates what a user could do manually based on their schema metadata in combination with a large language model (LLM) of their choice.

While any prompt can be provided, including those that do not relate to the production of SQL query results, Select AI focuses on SQL query generation. Select AI enables submitting general requests with the chat action.

Prompt Augmentation Data

The database augments the user-specified prompt with database metadata to mitigate hallucinations from the LLM. The augmented prompt is then sent to the user-specified LLM to produce the query.

The database augments the prompt with schema metadata only. This metadata may include schema definitions, table and column comments, and content available from the data dictionary and catalog. For the purposes of SQL generation, the database does not provide table or view contents (actual row or column values) when augmenting the prompt.

The narrate action, however, does provide the result of the query, which may contain database data, to the user-specified LLM from which to generate natural language text describing the query results.

WARNING:

Large language models (LLMs) have been trained on a broad set of text documentation and content, typically from the Internet. As a result, LLMs may have incorporated patterns from invalid or malicious content, including SQL injection. Thus, while LLMs are adept at generating useful and relevant content, they also can generate incorrect and false information including SQL queries that produce inaccurate results and/or compromise security of your data.

The queries generated on your behalf by the user-specified LLM provider will be run in your database. Your use of this feature is solely at your own risk, and, notwithstanding any other terms and conditions related to the services provided by Oracle, constitutes your acceptance of that risk and express exclusion of Oracle’s responsibility or liability for any damages resulting from that use.

Select your AI Provider and LLMs

Choose an AI provider and LLM that meets your security standards and aligns with your specific needs, like text or code generation.

Different LLMs excel at various tasks based on their training data and intended purpose. Some models are excellent for text generation but may not perform well in code generation, while others are specifically optimized for coding tasks. Choose an LLM that best suits your needs.

AI Provider LLMs Embedding Model for RAG Purpose

OCI Generative AI

  • meta.llama-3.3-70b-instruct (default)

  • meta.llama-3.2-90b-vision-instruct

  • meta.llama-3.2-11b-vision-instruct

  • meta.llama-3.1-70b-instruct

  • meta.llama-3.1-405b-instruct

  • cohere.command-r-08-2024

  • cohere.command-r-plus-08-2024

  • cohere.command-r-16k (deprecated)

  • cohere.command–r-plus (deprecated)

  • xai.grok-3

  • xai.grok-3-fast

  • xai.grok-3-mini

  • xai.grok-3-mini-fast

  • xai.grok-4

  • xai.grok-4-fast-reasoning

  • xai.grok-4-fast-non-reasoning

  • cohere.embed-english-v3.0 (default)
  • cohere.embed-multilingual-v3.0
  • cohere.embed-english-light-v3.0
  • cohere.embed-multilingual-light-v3.0

See About the Embedding Models in Generative AI.

The OCI Generative AI Chat models are supported for all SELECT AI actions such as runsql, showsql, explainsql, narrate, and chat.

The OCI Generate text models are supported only for SELECT AI chat action.

To configure your profile attributes, see .

Azure OpenAI Service

  • GPT-4o
  • GPT-4
  • GPT-4 Turbo with Vision
  • GPT-3.5-Turbo

text-embedding-ada-002

Best suited for generating SQL from natural language prompts, chat action, and Select AI RAG.

OpenAI

  • gpt-3.5-turbo (default)
  • gpt-4o
  • gpt-4o-mini
  • gpt-4
  • gpt-4-0613
  • gpt-4-32k
  • gpt-4-32k-0613
  • gpt-3.5-turbo-0613
  • gpt-3.5-turbo-16k
  • gpt-3.5-turbo-16k-0613

text-embedding-ada-002

Best suited for generating SQL from natural language prompts, chat action, and Select AI RAG.

OpenAI-Compatible

Models from OpenAI-compatible providers such as:
  • Fireworks AI
  • xAI
  • Others
Embedding models from OpenAI-compaitble providers. For example, see Fireworks AI embedding models.

Supports a wide range of use cases.

Cohere

  • command (default)
  • command-nightly (experimental)
  • command-r
  • command-r-plus
  • command-light
  • command-light-nightly (experimental)
  • custom models

embed-english-v2.0

Best suited for chat action.

Google

  • gemini-1.5-flash (default)
  • gemini-1.5-pro
  • gemini-1.0-pro
text-embedding-004 (default)

Best suited for generating SQL from natural language prompts, chat action, and Select AI RAG.

Anthropic

  • claude-3-5-sonnet-20240620 (default)
  • claude-3-opus-20240229
  • claude-3-sonnet-20240229
  • claude-3-haiku-20240307
NA

Best suited for generating SQL from natural language prompts, chat action, and Select AI RAG.

Hugging Face

  • Mixtral-8x7B-Instruct-v0.1 (default)
  • Meta-Llama-3-70B-Instruct
  • Qwen1.5-1.8B
  • other chat models
NA

Best suited for generating SQL from natural language prompts, chat action, and Select AI RAG.

AWS

  • amazon.titan-embed-text-v1
  • amazon.titan-embed-text-v2:0
  • cohere.embed-english-v3

Supports a wide range of use cases.

Note:

  • Specify OpenAI-compatible provider through provider_endpoint instead of provider parameter. See .

  • For models that accept images, use meta.llama-3.2-90b-vision-instruct. This model is specifically trained for vision and images. While it can be used for text and SQL generation, the model is best suited for images. To learn more, see Chat in OCI Generative AI.

    The meta.llama-3.2-11b-vision-instruct model provides robust multimodal capabilities.

  • Embedding models are also known as transformer models.

Select AI Concepts

Explores the concepts and terms related to Select AI.

About SQL Generation

Using natural language to interact with your database data is now achievable with LLMs. This means you can use natural language, for example plain English, to query the database.

When you use Select AI, Autonomous AI Database manages the process of converting natural language into SQL. This means you can provide a natural language prompt instead of SQL code to interact with your data. Select AI serves as a productivity tool for SQL users and developers and enables non-expert SQL users to derive useful insights from their data, without having to understand data structures or technical languages.

The DBMS_CLOUD_AI package in Autonomous AI Database enables integration with a user-specified LLM for generating SQL code using natural language prompts. The package assists in supplying the LLM with knowledge of the database schema and instructing it to write a SQL query consistent with that schema. The DBMS_CLOUD_AI package works with AI providers like OpenAI, Cohere, Azure OpenAI Service, and Oracle Cloud Infrastructure Generative AI.

Note:

Users must have an account with the AI provider and provide their credentials through DBMS_CLOUD_AI objects that the Autonomous AI Database uses.

Use DBMS_CLOUD_AI to Configure AI Profiles

Autonomous AI Database uses AI profiles to facilitate and configure access to an LLM and to setup for the generation of SQL statements from natural language prompts.

AI profiles include database objects that are the target for natural language queries. Metadata used from these targets can include database table names, column names, column data types, and comments. You create and configure AI profiles using the DBMS_CLOUD_AI.CREATE_PROFILE and DBMS_CLOUD_AI.SET_PROFILE procedures.

Requirements to Configure DBMS_CLOUD_AI Package

The following are required to run DBMS_CLOUD_AI:

  • Access to an Oracle Cloud Infrastructure cloud account and to an Autonomous AI Database instance.
  • A paid API account for a supported AI provider, one of:
    • OpenAI: To enable OpenAI to generate SQL from natural language prompts, obtain API keys from your OpenAI paid account.

      You can find your secret API key in your User settings.

    • Cohere: To enable Cohere to generate SQL from natural language prompts, obtain API keys from your Cohere paid account.

      Click Dashboard, and click API Keys on the left navigation. Copy the default API key or create another key. See API-Keys for more information.

    • Azure OpenAI Service: To enable Azure OpenAI Service to generate SQL from natural language prompts, configure and provide access to the AI provider.

      To use Azure OpenAI Service, perform the following steps:

      1. Obtain your secret API keys. You can find your API keys in the Resource Management section of your Azure portal. On your Azure OpenAI Service Resource page, click Keys and Endpoint. You can copy either KEY1 or KEY2.
      2. Create an Azure OpenAI Service resource and deploy a model: Create and deploy an Azure OpenAI Service resource.

        Tip:

        • Note the resource name and deployment name as those parameters are used to provide network access permission and creating your Azure OpenAI Service profile using the DBMS_CLOUD_AI.CREATE_PROFILE procedure.
        • To know more about rate limits for token per minute on a model, see Azure OpenAI Service quotas and limits.
      3. Allow access to Azure OpenAI Service:
        • You can use your secret API key to allow access to Azure OpenAI Service. To know more, see the example in .
    • OCI Generative AI: See How to Generate the API Signing Key.
    • Google: To enable Google AI Studio to generate SQL and text responses to your natural language prompts, obtain API keys from your Google AI Studio paid account.
      1. Go to Google AI Studio.
      2. Click Sign In to Google AI Studio.
      3. Click Get API key on the prompt screen.
      4. Select all the applicable options on the next page.
      5. Click Create API key.
      6. Click Create API key in new project.

        The screen displays the progress and generates an API key. Copy the key and save it.

    • Anthropic: To enable Anthropic Developer Console to generate SQL and text responses to your natural language prompts, obtain API keys from your Anthropic Developer Console paid account.
      1. Go to Anthropic Developer Console.
      2. Sign up for an account if you don't have one already.
      3. Once logged in, navigate to the API section or the dashboard.
      4. Look for an option to generate or view API keys.
      5. Click to create a new API key.
      6. Copy the generated API key and save it.

        The Claude API is a paid service. You'll need to add credits to your account before you can use the API key.

    • Hugging Face: To enable Hugging Face as your AI provider to generate SQL and text responses to your natural language prompts, obtain API keys from your Hugging Face paid account.
      1. Go to Hugging Face.
      2. Sign up for an account if you don't have one already.
      3. Navigate to your account settings.
      4. In the navigation menu locate the Access Tokens.
      5. Click to create a new API key.
      6. Copy the generated API key and save it.
  • Network ACL privileges to access your external AI provider.

    Note:

    Network ACL is not applicable for OCI Generative AI.
  • A credential that provides access to the AI provider.

Configure DBMS_CLOUD_AI Package

Before you use Select AI, here are the steps to enable DBMS_CLOUD_AI.

To configure DBMS_CLOUD_AI:
  1. Grant the EXECUTE privilege on the DBMS_CLOUD_AI package to the user who wants to use Select AI.

    By default, only ADMIN user is granted the EXECUTE privilege. The ADMIN user can grant EXECUTE privilege to other users.

  2. Grant network ACL access to the user who wants to use Select AI and for the AI provider endpoint.

    The ADMIN user can grant network ACL access. See APPEND_HOST_ACE Procedure in Oracle Database 19c PL/SQL Packages and Types Reference or Oracle Database 26ai PL/SQL Packages and Types Reference for more information.

  3. Create a credential to enable access to your AI provider.

    See CREATE_CREDENTIAL Procedure for more information.

The following example grants the EXECUTE privilege to ADB_USER:
grant execute on DBMS_CLOUD_AI to ADB_USER;

The following example grants ADB_USER the privilege to use the api.openai.com endpoint.

BEGIN  
    DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
         host => 'api.openai.com',
         ace  => xs$ace_type(privilege_list => xs$name_list('http'),
                             principal_name => 'ADB_USER',
                             principal_type => xs_acl.ptype_db)
   );
END;
/

APPEND_HOST_ACE Function Parameters

Parameter Description

host

The host, which can be the name or the IP address of the host. You can use a wildcard to specify a domain or a IP subnet. The host or domain name is not case sensitive.

For OpenAI, use api.openai.com.

For Cohere, use api.cohere.ai.

For Azure OpenAI Service, use <azure_resource_name>.openai.azure.com. See Profile Attributes to know more about azure_resource_name.

For Google, use generativelanguage.googleapis.com.

For Anthropic, use api.anthropic.com.

For Hugging Face, use api-inference.huggingface.co.

ace

The access control entries (ACE). The XS$ACE_TYPE type is provided to construct each ACE entry for the ACL. For more details, see Creating ACLs and ACEs in Oracle Database 19c Real Application Security Administrator's and Developer's Guide or Oracle Database 26ai Real Application Security Administrator's and Developer's Guide.

Here is an example of how to create a credential to enable access to OpenAI.

EXEC DBMS_CLOUD.CREATE_CREDENTIAL('OPENAI_CRED', 'OPENAI', 'your_api_token');

DBMS_CLOUD.CREATE_CREDENTIAL Parameters

Parameter Description

credential_name

The name of the credential to be stored. The credential_name parameter must conform to Oracle object naming conventions, which do not allow spaces or hyphens.

username

The username and password arguments together specify your AI provider credentials.

The username is a user-specified user name.

password

The username and password arguments together specify your AI provider credentials.

The password is your AI provider secret API key, and depends on the provider, that is, OpenAI, Cohere, Azure OpenAI Service, OCI, Google, Anthropic, or Hugging Face. See Requirements to Configure DBMS_CLOUD_AI Package for details.

Create and Set an AI Profile

Describes the steps to create and enable an AI profile.

Use DBMS_CLOUD_AI.CREATE_PROFILE to create an AI profile. Next start DBMS_CLOUD_AI.SET_PROFILE to enable the AI profile so that you can use SELECT AI with a natural language prompt.

Note:

You must run DBMS_CLOUD_AI.SET_PROFILE in each new database session (connection) before you use SELECT AI.

The following example with the OpenAI provider creates an AI profile called OPENAI and sets the OPENAI profile for the current user session.

-- Create AI profile
--
SQL> BEGIN
  DBMS_CLOUD_AI.create_profile(
      'OPENAI',
      '{"provider": "openai",
        "credential_name": "OPENAI_CRED",
        "object_list": [{"owner": "SH", "name": "customers"},
                        {"owner": "SH", "name": "sales"},
                        {"owner": "SH", "name": "products"},
                        {"owner": "SH", "name": "countries"}]
       }');
END;
/
 
PL/SQL procedure successfully completed.
 
--
-- Enable AI profile in current session
--
SQL> EXEC DBMS_CLOUD_AI.set_profile('OPENAI');
 
PL/SQL procedure successfully completed.

Use AI Keyword to Enter Prompts

Use AI as the keyword in a SELECT statement for interacting with the database using natural language prompts.

The AI keyword in a SELECT statement instructs the SQL execution engine to use the LLM identified in the active AI profile to process natural language and to generate SQL.

You can use the AI keyword in a query with Oracle clients such as SQL Developer, OML Notebooks, and third-party tools, to interact with database in natural language.

Note:

You cannot run PL/SQL statements, DDL statements, or DML statements using the AI keyword.

Syntax

The syntax for running AI prompt is:
SELECT AI action natural_language_prompt

Parameters

The following are the parameters available for the action parameter:
Parameter Description

runsql

Run the provided SQL command using a natural language prompt. This is the default action and it is optional to specify this parameter.

showsql

Displays the SQL statement for a natural language prompt.

narrate

The output of the prompt is explained in natural language. This option sends the SQL result to the AI provider to produce a natural language summary.

chat

Generates a response directly from the LLM based on the prompt. If conversation in the DBMS_CLOUD_AI.CREATE_PROFILE function is set to true, this option includes content from prior interactions or prompts, potentially including schema metadata.

explainsql

The SQL generated from the prompt is explained in natural language. This option sends the generated SQL to the AI provider to produce a natural language explanation.

Usage Notes

  • Select AI is not supported in Database Actions or APEX Service. You can use only DBMS_CLOUD_AI.GENERATE function.

  • The AI keyword is supported only in a SELECT statement.

  • You cannot run PL/SQL statements, DDL statements, or DML statements using the AI keyword.

  • The sequence is SELECT followed by AI. These keywords are not case-sensitive. After a DBMS_CLOUD_AI.SET_PROFILE is configured, the text after SELECT AI is a natural language prompt. If an AI profile is not set, SELECT AI reports the following error:

    ORA-00923: FROM keyword not found where expected
    00923. 00000 -  "FROM keyword not found where expected"
  • Special character usage rules apply according to Oracle guidelines. For example, use single quotes twice if you are using an apostrophe in a sentence.

    select ai how many customers in SF don''t own their own home
  • LLMs are subject to hallucinations and results are not always correct:

    • It is possible that SELECT AI may not be able to run the generated SQL for a specific natural language prompt.

    • It is possible that SELECT AI may not be able to generate SQL for a specific natural language prompt.

    In such a scenario, SELECT AI responds with information to assist you in generating valid SQL.

  • Use the chat action, with SELECT AI chat, to learn more about SQL constructs. For better results with the chat action, use database views or tables with contextual column names or consider adding column comments explaining values stored in the columns.

  • To access DBA or USER views, see DBMS_CLOUD_AI Views.

Select AI with Retrieval Augmented Generation (RAG)

Select AI with RAG augments your natural language prompt by retrieving content from your specified vector store using semantic similarity search. This reduces hallucinations by using your specific and up-to-date content and provides more relevant natural language responses to your prompts.

Select AI automates the Retrieval Augmented Generation (RAG) process. This technique retrieves data from enterprise sources using AI vector search and augments user prompts for your specified large language model (LLM). By leveraging information from enterprise data stores, RAG reduces hallucinations and generates grounded responses.

RAG uses AI vector search on a vector index to find semantically similar data for the specified question. Vector store processes vector embeddings, which are mathematical representations of various data points like text, images, and audio. These embeddings capture the meaning of the data, enabling efficient processing and analysis. For more details on vector embeddings and AI vector search, see Overview of AI Vector Search.

Select AI integrates with AI vector search available in Oracle Autonomous AI Database 26ai for similarity search using vector embeddings.

Topics

Benefits of Select AI RAG

Simplify querying, enhance response accuracy with current data, and gain transparency by reviewing sources used by the LLM.

Select AI RAG offers the following benefits:
  • Simplify data querying and increase response accuracy: Enable users to query enterprise data using natural language and provide LLMs with detailed context from enterprise data to generate more accurate and relevant responses, reducing instances of LLM hallucinations.

  • Up-to-date information: Provide LLMs access to current enterprise information using vector stores, eliminating the need for costly, time-consuming fine-tuning of LLMs trained on static data sets.

  • Seamless integration: Integrate with Oracle AI Vector Search for streamlined data handling and enhanced performance.

  • Automated data orchestration: Automate orchestration steps with a fully managed Vector Index pipeline, ensuring efficient processing of new data.

  • Understandable contextual results: Has access and retrieves the sources used by the LLM from vector stores, ensuring transparency and confidence in results. Views and extracts data in natural language text or JSON format for easier integration and application development.

Build your Vector Store

Select AI automates the creation and population of vector store by converting input documents (for example, PDF, DOC, JSON, XML, or HTML) from your object store to plain text. Oracle Text supports around 150 file types. For a complete list of all the supported document formats, see Supported Document Formats.

Select AI automatically processes documents to chunks, generates embeddings, stores them in the specified vector store, and updates the vector index as new data arrives.

Here is how the input from Object Storage is used with Select AI RAG:

  1. Input: Data is initially stored in an Object Storage.
  2. Oracle Autonomous Database retrieves the input data or the document, chunks it and sends the chunks to an embedding model.
  3. The embedding model processes the chunk data and returns vector embeddings.
  4. The vector embeddings are then stored in a vector store for use with RAG. As content is added, the vector index is automatically updated.

RAG retrieves relevant pieces of information from the enterprise database to answer a user's question. This information is provided to the specified large language model along with the user prompt. Select AI uses this additional enterprise information to enhance the prompt, improving the LLM's response. RAG can enhance response quality with update-to-date enterprise information from the vector store.



Select AI implements RAG as follows:
  1. Input: User asks a question (specifies a prompt) using Select AI narrate action.

  2. Select AI generates vector embeddings of the prompt using the embedding model specified in the AI profile.

  3. The vector search index uses the vector embedding of the question to find matching content from the customer's enterprise data (searching the vector store) which has been indexed.

  4. The vector search returns top K texts similar to the input to your Autonomous AI Database instance.
  5. Autonomous AI Database then sends these top K query results with user question to the LLM.
  6. The LLM returns its response to your Autonomous AI Database instance.
  7. Autonomous AI Database Select AI provides the response to the user.

Use DBMS_CLOUD_AI to Create and Manage Vector Indexes

Use the DBMS_CLOUD_AI package to create and manage vector indexes and configure vector database JSON parameters.

After you create credentials and provide network access to the vector database and the AI provider, your Autonomous AI Database instance uses AI profiles to configure access to LLMs. See for a complete example on setting it up and using it in Select AI statements.

Note:

If you do not want table data or vector search documents to be sent to an LLM, a user with administrator privileges can disable such access for all users of the given database. This, in effect, disables the narrate action for RAG.

You can configure AI profiles for providers listed in Select your AI Provider and LLMs through the DBMS_CLOUD_AI package.

See Also:

  • Create a vector index: .

  • Manage vector index profiles and other AI profiles:
  • Query vector index views: .

Use In-database Transformer Models

Select AI RAG enables you to use pretrained ONNX transformer models that are imported into your database in Oracle Database 23ai instance for generating embedding vectors from document chunks and user prompts.

Note:

You must import a pretrained ONNX-format transformer model into Oracle Database 23ai instance to use Select AI RAG with imported in-database transformer model. You can also use other transformer models from supported AI providers.

See Example: Select AI with In-database Transformer Models to explore the feature.

Synthetic Data Generation

Generate synthetic data using random generators, algorithms, statistical models, and Large Language Models (LLMs) to simulate real data for developing and testing solutions effectively.

Note:

Support for Synthetic Data Generation is available in Oracle Database 19c starting with version 19.29, and in Oracle Database 26ai starting with version 23.26.

Synthetic data can be a powerful tool when developing and testing solutions, especially when actual data doesn’t yet exist or isn’t allowed to be used. Synthetic, or artificially generated, data can have many of the characteristics of real data. Synthetic data is typically created using random generators, algorithms, or statistical models to simulate the characteristics and distributions of real data. However, this can be complex to produce or rely on tools with features of varying sophistication. With the availability of Large Language Models (LLMs), more relevant and schema-specific data may be generated that considers characteristics expressed in natural language.

Topics

Benefits of Synthetic Data Generation

Synthetic data generation enables populating database metadata clones, supporting development, testing, and machine learning projects without using sensitive data from original tables.

Synthetic Data Generation offers the following benefits:
  • Populating metadata clones with synthetic data: A metadata clone replicates the structure of a database or schema without including actual data. Select AI allows synthetic data generation to populate these clones, protecting sensitive data while enabling development, testing, and creating templates. This approach supports performance and scalability testing.

  • Starting new projects: When starting a new project, actual data may not be available. Synthetic data provides realistic samples to help demonstrate concepts and gain support for project proposals.

  • Validating user experience: Synthetic data aids in testing user interfaces by providing diverse data sets to uncover design flaws, performance, and scalability issues.

  • Supporting AI and machine learning projects: Synthetic data is useful for training AI and machine learning models when real data is unavailable or restricted. LLMs can generate data with specific patterns to facilitate model training and scoring.

Generate Synthetic Data

Use DBMS_CLOUD_AI.GENERATE_SYNTHETIC_DATA function to generate synthetic data and query the data using Select AI actions.

Monitor and Troubleshoot Synthetic Data Generation

When generating large amounts of data across many tables, Select AI splits synthetic data generation tasks into smaller chunks and runs tasks in parallel. The status of each chunk is tracked in the SYNTHETIC_DATA$<operation_id>_STATUS table.

Synthetic data generation operations are logged in the tables DBA_LOAD_OPERATIONS and USER_LOAD_OPERATIONS. Use these tables to monitor the DBMS_CLOUD_AI.GENERATE_SYNTHETIC_DATA operation. See Track DBMS_CLOUD Load Operations for more details.

After running synthetic data generation in a given session, you can get the latest <operation_id> from USER_LOAD_OPERATION using the following:
SELECT max(id) FROM user_load_operations;

To view the synthetic data operations running in a different session, use the DBA_LOAD_OPERATIONS view.

View Status of Synthetic Data Generation

The status table of synthetic data generation operation shows the progress of each table and its corresponding chunk. The STATUS_TABLE column in USER_LOAD_OPERATIONS or DBA_LOAD_OPERATIONS shows the status table name. The table name is SYNTHETIC_DATA$<operation_id>_STATUS and it has following columns:

Name Datatype Description

ID

NUMBER

Unique identifier of the record.

NAME

VARCHAR2

Qualified name of the table, such as "ADB_USER"."EMPLOYEES"

BYTES

NUMBER

Desired number of records for this data generation task

ROWS_LOADED

NUMBER

Actual number of records generated.

CHECKSUM

VARCHAR2

Starting value for the primary key during this data generation task.

LAST_MODIFIED

TIMESTAMP WITH TIME ZONE

Timestamp indicating when the record was last modified.

STATUS

VARCHAR2

Status of the data generation task. The valid values are:
  • COMPLETED
  • FAILED
  • PENDING
  • SKIPPED

ERROR_CODE

NUMBER

Error code, if the data generation task fails.

ERROR_MESSAGE

VARCHAR2

Error message provided if the task fails.

END_TIME

TIMESTAMP WITH TIME ZONE

Timestamp marking the end of the data generation task.
Example: Check the Number of Records Generated for Each Table

To check the number of records generated for each table, issue the following:

SELECT name, SUM(rows_loaded) FROM synthetic_data$<operation_id>_status group by name;

Query ROWS_LOADED to confirm how many number of rows are loaded for each chunk, and SUM(ROWS_LOADED) for rows for each table.

BEGIN
    DBMS_CLOUD_AI.GENERATE_SYNTHETIC_DATA(
        profile_name => 'GENAI',
        object_list => '[{"owner": "ADB_USER", "name": "DIRECTOR","record_count":150},
                         {"owner": "ADB_USER", "name": "MOVIE_ACTOR","record_count":300},
                         {"owner": "ADB_USER", "name": "CLASSES", "user_prompt":"all in fall semester","record_count":5},
                         {"owner": "ADB_USER", "name": "ACTOR","record_count":220},
                         {"owner": "ADB_USER", "name": "MOVIE","record_count":50}]'
    );
END;
/
 
 
-- Check loaded rows for each chunk
SQL> SELECT name, rows_loaded FROM synthetic_data$141_status order by name;
NAME                           ROWS_LOADED
------------------------------------------
"ADB_USER"."ACTOR"                        188
"ADB_USER"."ACTOR"                         32
"ADB_USER"."CLASSES"                        5
"ADB_USER"."DIRECTOR"                     150
"ADB_USER"."MOVIE"                         50
"ADB_USER"."MOVIE_ACTOR"                   38
"ADB_USER"."MOVIE_ACTOR"                  114
"ADB_USER"."MOVIE_ACTOR"                  148
 
 
 
-- Check loaded rows for each table
SQL> SELECT name, SUM(rows_loaded) FROM synthetic_data$141_status group by name;
 
NAME                      SUM(ROWS_LOADED)
------------------------------------------
"ADB_USER"."DIRECTOR"                     150
"ADB_USER"."MOVIE_ACTOR"                  300
"ADB_USER"."CLASSES"                        5
"ADB_USER"."ACTOR"                        220
"ADB_USER"."MOVIE"                         50