Ingesting Data Source Data in Generative AI Agents

A data ingestion job extracts data from data source documents, converts it into a structured format suitable for analysis, and then stores it in a knowledge base.

  1. On the Knowledge Bases list page, select the knowledge base whose data source you want to ingest data for.
    If you need help finding the list page, see Listing Knowledge Bases.
  2. Select the data source whose data you want to ingest.
  3. Select Create Ingestion job.
  4. Enter the following values:
    • Name: A name that starts with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be from 1 to 255 characters.
    • Description: An optional description of the ingestion job.
    • Tags: Select Show advanced options and add one or more tags to the ingestion job. If you have permissions to create a resource, then you have permission to update its tags. If you need help, see Tags and Tag Namespace Concepts.
  5. Select Create.
  6. Wait for the ingestion job status to change, and then perform any required action, as described in the following list.
    Succeeded
      Description: The job completed and processed all files successfully.
      Action: Review the status logs to confirm that all updated files were successfully ingested.
    Completed, with failures
      Description: The job completed and processed all files, but some files failed. Possible causes of a file failure:
        • The file is corrupted.
        • A PDF file is password-protected.
        • Images in a file that are too small are ignored; the rest of the file content is ingested.
        • Corrupted images in a file are ignored; the rest of the file content is ingested.
        • There is an issue processing a file's metadata attributes; the file is ingested, but without the metadata attributes.
      Action: Check the status logs to understand the reason for each file failure, address the issues, and restart the job.
    Failed, fix data source
      Description: There is an issue accessing the bucket or files that are specified in the data source configuration.
      Action: Check the status logs for suggestions on how to fix the issue or issues, then restart the job.
    Failed, needs retry
      Description: There is an issue with a dependent system such as Object Storage or OpenSearch, even after several retries.
      Action: Run the job again later.
    Failed, contact Support
      Description: There is an issue that cannot be resolved by retrying.
      Action: Contact Support.
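
The same job can also be created programmatically. The following is a minimal sketch, assuming the OCI Python SDK exposes the Generative AI Agents data ingestion operations under oci.generative_ai_agent with a CreateDataIngestionJobDetails model and a create_data_ingestion_job call; verify the exact names in the SDK reference. The data source and compartment OCIDs, the description, and the tag values are placeholders.

```python
# Minimal sketch, not a verified implementation: it assumes the OCI Python SDK
# (pip install oci) exposes the Generative AI Agents control plane as
# oci.generative_ai_agent with a create_data_ingestion_job operation and a
# CreateDataIngestionJobDetails model; check the SDK reference for exact names.
import re

import oci

# Name rule from step 4: a letter or underscore, then letters, numbers,
# hyphens, or underscores, for a total length of 1 to 255 characters.
NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]{0,254}$")


def create_ingestion_job(data_source_id: str, compartment_id: str, name: str):
    if not NAME_PATTERN.match(name):
        raise ValueError("Ingestion job name does not match the naming rules")

    config = oci.config.from_file()  # reads ~/.oci/config
    client = oci.generative_ai_agent.GenerativeAiAgentClient(config)

    details = oci.generative_ai_agent.models.CreateDataIngestionJobDetails(
        display_name=name,
        description="Nightly ingestion of the data source",  # optional
        data_source_id=data_source_id,
        compartment_id=compartment_id,
        freeform_tags={"project": "docs"},  # optional tags
    )
    return client.create_data_ingestion_job(details).data  # the new job resource
```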

Note

After Creating an Ingestion Job
  1. Review the status and status logs to confirm that all updated files were successfully ingested. If you need help getting the status logs, see Getting a Data Ingestion Job's Details.
  2. If the ingestion job fails (for example, because a file is too large), address the issue and restart the job. See Step 6 for the meaning of each job status and the action to take. A sketch for polling the job status programmatically follows this list.
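
To check the status outside the Console, a polling loop such as the following can be used. This is a sketch that assumes a get_data_ingestion_job operation on the client shown earlier and that the job's lifecycle state ends in the usual OCI terminal values; the SDK state values may not match the console status labels listed in Step 6.

```python
# Sketch only: assumes get_data_ingestion_job exists on the client created
# above and that the job's lifecycle_state ends in SUCCEEDED or FAILED; the
# SDK state values may differ from the console status labels in Step 6.
import time


def wait_for_ingestion_job(client, job_id: str, poll_seconds: int = 30):
    terminal_states = {"SUCCEEDED", "FAILED"}
    while True:
        job = client.get_data_ingestion_job(job_id).data
        if job.lifecycle_state in terminal_states:
            return job  # review the job and its status logs afterward
        time.sleep(poll_seconds)
```
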
How the Ingestion Pipeline Handles Previously Run Jobs

When you restart a previously run ingestion job, the pipeline:

  1. Detects files that were successfully ingested earlier and skips them.
  2. Ingests only the files that failed previously and have since been updated, as illustrated in the sketch below.
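
The following sketch illustrates this behavior. It is not an OCI API; the file-tracking structures are hypothetical bookkeeping used only to show which files a restarted job touches.

```python
# Hypothetical illustration of the restart behavior described above; this is
# bookkeeping for the example only, not an OCI Generative AI Agents API.
from datetime import datetime


def files_to_reprocess(files: dict, ingested_ok: set, last_run: datetime) -> list:
    """Return the files that a restarted job would ingest.

    files: file name -> last-modified time in the data source
    ingested_ok: file names that were successfully ingested in an earlier run
    last_run: when the previous ingestion job ran
    """
    to_process = []
    for name, modified in files.items():
        if name in ingested_ok:
            continue  # already ingested successfully, so the restart skips it
        if modified > last_run:
            to_process.append(name)  # failed earlier and updated since
    return to_process
```
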
Example Scenario

Suppose you have 20 files to ingest, and the initial job run results in 2 failed files. When you restart the job, the pipeline:

  1. Recognizes that 18 files have already been successfully ingested and ignores them.
  2. Ingests only the 2 files that failed earlier and have since been updated.
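
Applying the sketch above to this scenario (the file names and timestamps are invented for illustration):

```python
# 20 files, 18 ingested successfully; the 2 failed files were fixed and
# re-uploaded after the first run.
from datetime import datetime, timedelta

last_run = datetime(2025, 1, 1)
files = {f"doc{i}.pdf": last_run - timedelta(days=1) for i in range(1, 21)}
files["doc19.pdf"] = last_run + timedelta(hours=2)  # fixed after the failed run
files["doc20.pdf"] = last_run + timedelta(hours=3)  # fixed after the failed run
ingested_ok = {f"doc{i}.pdf" for i in range(1, 19)}  # the 18 successful files

print(sorted(files_to_reprocess(files, ingested_ok, last_run)))
# ['doc19.pdf', 'doc20.pdf'] -- only the 2 previously failed, updated files
```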