RAG (Retrieval-Augmented Generation)

An AI architecture that combines a language model with an external knowledge base, retrieving relevant documents at inference time to generate more accurate and grounded responses.

Also known as: retrieval-augmented generation, retrieval augmented generation, RAG system

Retrieval-augmented generation (RAG) is an approach to AI systems that gives a language model access to an external knowledge base at the moment of answering a question. Instead of relying only on what was learned during training, the system retrieves relevant documents or data chunks from the knowledge base and includes them in the context window before generating a response.
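To make the "includes them in the context window" step concrete, here is a minimal sketch of how retrieved chunks might be placed ahead of the user's question. The function name `build_prompt`, the prompt wording, and the sample chunks are all illustrative assumptions, not part of any particular RAG library.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Place retrieved text ahead of the question so the model can ground its answer.

    Hypothetical helper: the instruction wording and layout are assumptions.
    """
    # Number each chunk so an answer could point back to its source.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up example data, for illustration only.
prompt = build_prompt(
    "When was the library founded?",
    ["The library was founded in 1901.", "It moved to its current building in 1954."],
)
print(prompt)
```

The resulting string is what would be sent to the language model in place of the bare question.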

RAG addresses two major limitations of standard language models: knowledge cutoffs (the model cannot know about events after its training data ends) and hallucination (the model generates confident-sounding but inaccurate answers from parametric memory alone). Grounding answers in retrieved text reduces both problems but does not eliminate them.

A RAG system has three main components: a knowledge base (the collection of documents or data to be searched), a retrieval mechanism (typically vector similarity search), and a generation model (the language model). Because the model can only ground its answers in what is retrieved, the quality of the knowledge base strongly limits the quality of the system's answers.
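The three components can be sketched end to end in a few lines. This is a toy illustration under stated assumptions: word-count vectors stand in for learned embeddings, a Python list stands in for the knowledge base, and `generate` is a hypothetical placeholder for a real language model call. A production system would more likely use an embedding model and a vector index.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Assumption: bag-of-words counts stand in for a learned embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    # Retrieval mechanism: rank every document by similarity to the query.
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def generate(prompt: str) -> str:
    # Hypothetical stub: a real system would call a language model here.
    return f"(model output for a prompt of {len(prompt)} characters)"


def answer(query: str, knowledge_base: list[str], k: int = 2) -> str:
    # Retrieve first, then generate from the retrieved text plus the query.
    context = "\n".join(retrieve(query, knowledge_base, k))
    return generate(f"Context:\n{context}\n\nQuestion: {query}")


# Knowledge base: made-up documents for illustration.
knowledge_base = [
    "RAG retrieves documents at inference time.",
    "Vector similarity search ranks documents by embedding distance.",
    "Bananas are rich in potassium.",
]

top = retrieve("How does vector similarity search rank documents?", knowledge_base, k=1)
print(top)
```

If the relevant document is missing from `knowledge_base`, `retrieve` still returns its best match, which is one way a weak knowledge base leads to weak answers.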