47 Introduction

Retrieval-Augmented Generation (RAG) is an approach that addresses a key limitation of large language models (LLMs): their knowledge is fixed at training time. RAG combines the generative strength of a pretrained language model with the ability to retrieve recent, accurate information from a dataset or database in real time while a response is being generated.
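The retrieve-then-generate flow can be illustrated with a minimal sketch. This is not the Coherence RAG API. The function names (`embed`, `cosine`, `retrieve`, `build_prompt`), the sample documents, and the bag-of-words "embedding" are all hypothetical stand-ins chosen to keep the example self-contained. A real deployment would use an embedding model and a vector store, and would send the resulting prompt to an LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample corpus, for illustration only.
docs = [
    "Invoices are archived after ninety days",
    "Embeddings map text to numeric vectors",
    "The cafeteria opens at noon",
]

prompt = build_prompt("how do embeddings represent text", docs, k=1)
print(prompt)
```

The prompt returned by `build_prompt` is what would be passed to the language model, so the model answers from retrieved text rather than from its training data alone.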

Coherence RAG is designed for massively scalable document ingestion and vector embedding creation on both CPU- and GPU-based hardware. It supports efficient, accurate retrieval-augmented responses by ingesting documents from any source and by working with both local and remote embedding models.

Additionally, Coherence RAG offers optional integration with external vector stores, providing users with flexibility in storage and retrieval solutions. This combination ensures high-performance document processing, retrieval, and augmentation for large-scale AI applications.