Ingesting Data Source Data in Generative AI Agents

A data ingestion job extracts data from data source documents, converts it into a structured format suitable for analysis, and then stores it in a knowledge base.

  1. On the Knowledge Bases list page, select the knowledge base whose data source you want to ingest data for.
    If you need help finding the list page, see Listing Knowledge Bases.
  2. Select the data source whose data you want to ingest.
  3. Select Create Ingestion job.
  4. Enter the following values:
    • Name: A name that starts with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be from 1 to 255 characters.
    • Description: An optional description of the ingestion job.
    • Tags: Select Show advanced options and add one or more tags to the ingestion job. If you have permissions to create a resource, then you have permission to update its tags. If you need help, see Tags and Tag Namespace Concepts.
  5. Select Create.
  6. Wait for the ingestion job status to change, and then perform any required action, as described in the following list.
    Succeeded
      Description: The job completed and processed all files successfully.
      Action: Review the status logs to confirm that all updated files were successfully ingested.
    Completed, with failures
      Description: The job completed and processed all files, but some files failed. Possible causes of a file failure:
        • The file is corrupted.
        • A PDF file is password-protected.
        • Images in a file that are too small are ignored; the rest of the file content is ingested.
        • Corrupted images in a file are ignored; the rest of the file content is ingested.
        • There is an issue processing a file's metadata attributes; the file is ingested, but without the metadata attributes.
      Action: Check the status logs to understand the reason for each file failure, address the issues, and restart the job.
    Failed, fix data source
      Description: There is an issue accessing the bucket or files that are specified in the data source configuration.
      Action: Check the status logs for suggestions on how to fix the issue or issues, then restart the job.
    Failed, needs retry
      Description: There is an issue with a dependent system such as Object Storage or OpenSearch, even after several retries.
      Action: Run the job again later.
    Failed, contact Support
      Description: There is an issue that cannot be resolved by retrying.
      Action: Contact Support.
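
The same job can also be created programmatically. The following is a minimal sketch, assuming the OCI Python SDK exposes the Generative AI Agents data ingestion operations under oci.generative_ai_agent with a CreateDataIngestionJobDetails model and a create_data_ingestion_job call; verify the exact names in the SDK reference. The data source and compartment OCIDs, the description, and the tag values are placeholders.

```python
# Minimal sketch, not a verified implementation: it assumes the OCI Python SDK
# (pip install oci) exposes the Generative AI Agents control plane as
# oci.generative_ai_agent with a create_data_ingestion_job operation and a
# CreateDataIngestionJobDetails model; check the SDK reference for exact names.
import re

import oci

# Name rule from step 4: a letter or underscore, then letters, numbers,
# hyphens, or underscores, for a total length of 1 to 255 characters.
NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]{0,254}$")


def create_ingestion_job(data_source_id: str, compartment_id: str, name: str):
    if not NAME_PATTERN.match(name):
        raise ValueError("Ingestion job name does not match the naming rules")

    config = oci.config.from_file()  # reads ~/.oci/config
    client = oci.generative_ai_agent.GenerativeAiAgentClient(config)

    details = oci.generative_ai_agent.models.CreateDataIngestionJobDetails(
        display_name=name,
        description="Nightly ingestion of the data source",  # optional
        data_source_id=data_source_id,
        compartment_id=compartment_id,
        freeform_tags={"project": "docs"},  # optional tags
    )
    return client.create_data_ingestion_job(details).data  # the new job resource
```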

Note

After Creating an Ingestion Job
  1. Review the status and status logs to confirm that all updated files were successfully ingested. If you need help getting the status logs, see Getting a Data Ingestion Job's Details.
  2. If the ingestion job fails (for example, because a file is too large), address the issue and restart the job. See Step 6 for the meaning of each job status and the action to take. A sketch for polling the job status programmatically follows this list.
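
To check the status outside the Console, a polling loop such as the following can be used. This is a sketch that assumes a get_data_ingestion_job operation on the client shown earlier and that the job's lifecycle state ends in the usual OCI terminal values; the SDK state values may not match the console status labels listed in Step 6.

```python
# Sketch only: assumes get_data_ingestion_job exists on the client created
# above and that the job's lifecycle_state ends in SUCCEEDED or FAILED; the
# SDK state values may differ from the console status labels in Step 6.
import time


def wait_for_ingestion_job(client, job_id: str, poll_seconds: int = 30):
    terminal_states = {"SUCCEEDED", "FAILED"}
    while True:
        job = client.get_data_ingestion_job(job_id).data
        if job.lifecycle_state in terminal_states:
            return job  # review the job and its status logs afterward
        time.sleep(poll_seconds)
```
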
How the Ingestion Pipeline Handles Previously Run Jobs

When you restart a previously run ingestion job, the pipeline:

  1. Detects files that were successfully ingested earlier and skips them.
  2. Ingests only the files that failed previously and have since been updated, as illustrated in the sketch below.
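
The following sketch illustrates this behavior. It is not an OCI API; the file-tracking structures are hypothetical bookkeeping used only to show which files a restarted job touches.

```python
# Hypothetical illustration of the restart behavior described above; this is
# bookkeeping for the example only, not an OCI Generative AI Agents API.
from datetime import datetime


def files_to_reprocess(files: dict, ingested_ok: set, last_run: datetime) -> list:
    """Return the files that a restarted job would ingest.

    files: file name -> last-modified time in the data source
    ingested_ok: file names that were successfully ingested in an earlier run
    last_run: when the previous ingestion job ran
    """
    to_process = []
    for name, modified in files.items():
        if name in ingested_ok:
            continue  # already ingested successfully, so the restart skips it
        if modified > last_run:
            to_process.append(name)  # failed earlier and updated since
    return to_process
```
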
Example Scenario

Suppose you have 20 files to ingest, and the initial job run results in 2 failed files. When you restart the job, the pipeline:

  1. Recognizes that 18 files have already been successfully ingested and ignores them.
  2. Ingests only the 2 files that failed earlier and have since been updated.
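
Applying the sketch above to this scenario (the file names and timestamps are invented for illustration):

```python
# 20 files, 18 ingested successfully; the 2 failed files were fixed and
# re-uploaded after the first run.
from datetime import datetime, timedelta

last_run = datetime(2025, 1, 1)
files = {f"doc{i}.pdf": last_run - timedelta(days=1) for i in range(1, 21)}
files["doc19.pdf"] = last_run + timedelta(hours=2)  # fixed after the failed run
files["doc20.pdf"] = last_run + timedelta(hours=3)  # fixed after the failed run
ingested_ok = {f"doc{i}.pdf" for i in range(1, 19)}  # the 18 successful files

print(sorted(files_to_reprocess(files, ingested_ok, last_run)))
# ['doc19.pdf', 'doc20.pdf'] -- only the 2 previously failed, updated files
```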