
Hallucination

When AI confidently says things that are not true

Hallucination · AI Reliability · Grounding · RAG · Factuality

Overview

Hallucination refers to confident, fluent, yet factually incorrect outputs generated by large language models. Because LLMs predict each word based on learned statistical patterns rather than accessing a ground-truth knowledge base, they can produce plausible-sounding text that is simply wrong. Hallucinations range from subtle factual errors — a wrong publication date, an invented citation — to entirely fabricated events or people. Understanding why hallucinations occur and how to reduce them is one of the most actively studied challenges in applied AI.
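The word-by-word prediction described above can be sketched with a toy softmax sampler. The vocabulary, logits, and temperature values here are hypothetical, chosen only to illustrate how sampling always yields *some* token, whether or not it has factual support:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw model scores into a probability distribution.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(vocab, probs, rng):
    """Draw one token according to the distribution."""
    r = rng.random()
    cumulative = 0.0
    for token, p in zip(vocab, probs):
        cumulative += p
        if r < cumulative:
            return token
    return vocab[-1]

# Hypothetical next-token candidates after "The capital of France is".
vocab = ["Paris", "London", "Rome", "blue"]
logits = [4.0, 2.5, 2.0, 0.5]

sharp = softmax(logits, temperature=0.5)  # more deterministic
flat = softmax(logits, temperature=2.0)   # more diverse, more invention risk

token = sample(vocab, sharp, random.Random(0))
```

Note that the sampler never checks whether a token is true, only whether it is probable; this is the mechanical root of hallucination.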

Key Concepts

  • Autoregressive generation: LLMs select each token based on probability, optimising for fluency rather than factual accuracy, which can lead the model to invent details to complete a plausible-sounding sentence.
  • Training data gaps: When a model has seen little or no training data about a specific topic, it may still generate text that looks authoritative, extrapolating patterns in a way that has no factual basis.
  • Overconfidence without calibration: Models do not automatically know when they do not know something. Without explicit uncertainty modelling, the model is as fluent and confident when wrong as when right.
  • Sycophancy and prompt bias: If a prompt implies a certain answer, the model may agree even when it is incorrect. System prompts and user framing can bias outputs toward what the model thinks is expected rather than what is true.
  • Mitigation strategies — RAG and grounding: Retrieval-Augmented Generation (RAG) supplies verified source documents at inference time, giving the model factual anchors that reduce the scope for invention. Citation requirements and low temperature settings also improve factuality.
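The RAG pattern from the last bullet can be sketched in a few lines. This is a minimal illustration, not a production retriever: the keyword-overlap scorer stands in for vector search, and the corpus and instruction wording are assumptions for the example:

```python
def score(query, doc):
    """Keyword-overlap relevance score (a stand-in for embedding similarity)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d)

def retrieve(query, corpus, k=2):
    """Return the k most relevant documents for the query."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_grounded_prompt(query, corpus):
    """Prepend retrieved passages so the model answers from evidence
    instead of pattern-completing from memory."""
    passages = retrieve(query, corpus)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below; "
        "say 'unknown' if they do not cover the question.\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Toy corpus for illustration.
corpus = [
    "The Eiffel Tower was completed in 1889 for the Paris Exposition.",
    "Mount Everest is 8848 metres tall.",
    "Photosynthesis converts light energy into chemical energy.",
]
prompt = build_grounded_prompt("When was the Eiffel Tower completed?", corpus)
```

The key design choice is the instruction to answer only from the supplied sources: it narrows the model's task from open-ended generation to evidence extraction, which is what reduces the scope for invention.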

Key Facts

  • Published evaluations report that frontier LLMs hallucinate on roughly 3–27% of queries depending on the domain, with legal and biomedical domains showing the highest rates of consequential errors.
  • Self-consistency prompting — asking the model the same question multiple times and looking for agreement — can surface uncertainty and flag likely hallucinations before answers are served to users.
  • SelfCheckGPT compares multiple independent samples from a model and scores factuality by measuring how often samples agree; inconsistent statements are flagged as likely hallucinations.
  • Hallucination rates drop significantly when models are forced to cite sources because the citation requirement shifts the task from pattern completion to evidence retrieval.
  • Human evaluation studies find that hallucinations are often convincingly formatted — correct in style, tone, and syntax — making them harder to detect than grammatically broken or obviously strange outputs.
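The sampling-agreement idea behind self-consistency checks and SelfCheckGPT can be sketched with a simple pairwise-similarity score. Word-level Jaccard similarity and the 0.6 threshold are illustrative simplifications (SelfCheckGPT itself uses stronger comparison methods such as NLI models), and the sample answers are invented for the example:

```python
from itertools import combinations

def jaccard(a, b):
    """Word-level Jaccard similarity between two answers."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def consistency_score(samples):
    """Mean pairwise similarity across independently sampled answers.
    Low agreement suggests the model is inventing rather than recalling."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical samples: the model answers the same question several times.
consistent = [
    "Marie Curie won the Nobel Prize in 1903 and 1911.",
    "Marie Curie won the Nobel Prize in 1903 and 1911.",
    "Marie Curie won the Nobel Prize in 1903 and 1911.",
]
inconsistent = [
    "The bridge opened in 1921.",
    "The bridge opened in 1954.",
    "Construction finished around 1987.",
]

THRESHOLD = 0.6  # hypothetical cutoff; tune on labelled data
flagged = consistency_score(inconsistent) < THRESHOLD  # likely hallucination
```

When the model genuinely knows a fact, repeated samples converge on the same answer; when it is fabricating, each sample tends to invent different details, which is exactly the signal this check exploits.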