
Retrieval-Augmented Generation

Tags: retrieval, grounding, knowledge

Overview

Retrieval-Augmented Generation (RAG) combines a language model with a retrieval system to ground outputs in specific, up-to-date documents. Instead of relying solely on parametric knowledge baked into model weights, RAG dynamically fetches relevant passages from a vector store or search index and passes them as context to the LLM.

Key Concepts

  • Query encoding: embeds the user question into a vector
  • Retrieval: finds semantically similar documents from a vector database
  • Context injection: appends retrieved passages to the LLM prompt
  • Generation: the LLM answers grounded in retrieved evidence
  • Reranking (optional): reorders retrieved passages after retrieval to improve precision
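The steps above can be sketched end to end. This is a minimal toy, not a production pipeline: the documents, the bag-of-words "embedding", and the prompt template are all stand-ins for a real encoder, vector store, and LLM call.

```python
import math
from collections import Counter

# Hypothetical corpus standing in for a vector store.
DOCS = [
    "RAG grounds language model answers in retrieved documents.",
    "Vector databases store embeddings for semantic search.",
    "BM25 is a classic keyword-based ranking function.",
]

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a learned encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Retrieval: rank documents by similarity to the encoded query, keep top-k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, passages):
    """Context injection: prepend retrieved passages to the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "How does RAG ground answers"
prompt = build_prompt(query, retrieve(query, DOCS))
# `prompt` would then be sent to the LLM for the generation step.
print(prompt)
```

In a real system, `embed` would call an embedding model and `retrieve` would query an approximate-nearest-neighbor index; only the structure of the pipeline carries over.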

Key Facts

  • RAG was introduced by Lewis et al. at Facebook AI Research in 2020
  • It reduces hallucinations by providing verifiable source material
  • Hybrid RAG combines dense retrieval with keyword search (BM25)
  • RAG is widely used in enterprise knowledge bases, legal research, and medical AI
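The hybrid approach mentioned above can be illustrated by fusing a keyword score with a dense score. Below is a minimal sketch: a pure-Python Okapi BM25, a toy token-overlap cosine standing in for embedding similarity, and a weighted-sum fusion (the `alpha` blend is one common scheme, assumed here; real systems also use reciprocal rank fusion).

```python
import math
from collections import Counter

# Hypothetical mini-corpus for illustration.
DOCS = [
    "rag combines retrieval with generation",
    "bm25 ranks documents by keyword overlap",
    "dense embeddings capture semantic similarity",
]

TOKENIZED = [d.split() for d in DOCS]
N = len(DOCS)
AVG_LEN = sum(len(d) for d in TOKENIZED) / N
DF = Counter(t for d in TOKENIZED for t in set(d))  # document frequency per term

def bm25(query, doc, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document for a query string."""
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        idf = math.log(1 + (N - DF[term] + 0.5) / (DF[term] + 0.5))
        denom = tf[term] + k1 * (1 - b + b * len(doc) / AVG_LEN)
        score += idf * tf[term] * (k1 + 1) / denom
    return score

def dense_score(query, doc):
    """Toy 'dense' score: token-overlap cosine, a stand-in for embedding similarity."""
    q, d = Counter(query.split()), Counter(doc)
    dot = sum(q[t] * d[t] for t in q)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def hybrid_rank(query, alpha=0.5):
    """Blend keyword and dense scores with weight alpha, highest first."""
    scored = [(alpha * bm25(query, d) + (1 - alpha) * dense_score(query, d), i)
              for i, d in enumerate(TOKENIZED)]
    return [DOCS[i] for _, i in sorted(scored, reverse=True)]

print(hybrid_rank("keyword overlap ranking"))
```

A weighted sum like this assumes the two score scales are roughly comparable; production systems typically normalize each score distribution (or use rank-based fusion) before blending.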