# LLM Hallucinations

Hallucinations occur when an LLM generates content that is factually incorrect, nonsensical, or not grounded in the provided context, while sounding confident and fluent.
## Types of Hallucinations
### 1. Factual Hallucinations

Generated facts that are incorrect:

```text
Q: "When did the Eiffel Tower open?"
A: "The Eiffel Tower opened in 1892."
↳ Wrong! Actual: 1889
```
### 2. Fabricated References

Invented citations, papers, or sources:

```text
Q: "Cite a paper on transformer efficiency"
A: "See 'Efficient Transformers: A Survey' by Smith et al. (2021)
   in Nature Machine Intelligence."
↳ This paper may not exist
```
### 3. Context Hallucinations

Information not present in the provided context:

```text
Context: "John is a software engineer at Google."
Q: "Where did John go to college?"
A: "John graduated from Stanford University."
↳ Not in context - model invented this
```
### 4. Logical Hallucinations

Inconsistent or contradictory reasoning:

```text
"The product costs $100. With a 20% discount,
the final price is $85."
↳ Should be $80
```
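Arithmetic claims like the one above can be checked mechanically. A minimal sketch (the `check_discount` helper and its claim format are hypothetical, for illustration only):

```python
def check_discount(price, discount_pct, claimed_final):
    """Recompute a discounted price and flag claims that don't match."""
    expected = price * (1 - discount_pct / 100)
    return abs(expected - claimed_final) < 0.01, expected

ok, expected = check_discount(100, 20, 85)
# ok is False: the correct final price is 80.0, not 85
```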
## Why Hallucinations Occur
### 1. Training Objective

LLMs are trained to predict likely next tokens, not to state the truth:

```text
Objective: P(next_token | previous_tokens)
NOT:       Is this factually correct?
```
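To make this concrete, here is a toy bigram "model" estimated from counts: it assigns next-token probabilities purely from frequency in its training text, with no notion of which continuation is true (the tiny corpus is invented for illustration):

```python
from collections import Counter, defaultdict

# Toy corpus in which the wrong date appears once and the right date twice
corpus = ("the tower opened in 1889 . the tower opened in 1889 . "
          "the tower opened in 1892 .").split()

# Count next-token frequencies: P(next | prev) ∝ count(prev, next)
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_token_prob(prev, nxt):
    counts = bigrams[prev]
    return counts[nxt] / sum(counts.values())

# The model prefers the more frequent continuation, not the true one
print(next_token_prob("in", "1889"))  # 2/3
print(next_token_prob("in", "1892"))  # 1/3
```

If the corpus had contained the wrong date more often, the model would prefer it just as readily; likelihood, not truth, drives the prediction.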
### 2. Training Data Issues
- Outdated information (knowledge cutoff)
- Contradictory sources
- Errors in training data
- Underrepresented topics
### 3. Compression and Generalization

```text
Billions of facts → Fixed model parameters
Some details get "blurred" or merged
```
### 4. Prompt Pressure

```text
User: "Tell me about the Battle of Springfield in 1823"
Model: *invents details because it's expected to answer*
```
## Detection Methods
### 1. Self-Consistency Checking

Sample the same question several times; answers that vary widely suggest the model is guessing.

```python
def check_consistency(question, model, n_samples=5):
    """Sample multiple answers; high variance suggests hallucination."""
    responses = [model.generate(question, temp=0.7)
                 for _ in range(n_samples)]
    # extract_key_facts must return something hashable (e.g. a tuple)
    unique_answers = {extract_key_facts(r) for r in responses}
    consistency_score = 1 / len(unique_answers)  # 1.0 = fully consistent
    return consistency_score
```
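A self-contained usage sketch with a stub model (the stub and the simple fact extractor below are illustrative stand-ins for a real model and claim extractor):

```python
from itertools import cycle

class StubModel:
    """Toy model that cycles through canned answers."""
    def __init__(self, answers):
        self._answers = cycle(answers)
    def generate(self, question, temp=0.7):
        return next(self._answers)

def extract_key_facts(response):
    # Reduce a response to a hashable summary (here: the numbers it mentions)
    return tuple(t.strip(".") for t in response.split() if t.strip(".").isdigit())

def check_consistency(question, model, n_samples=4):
    responses = [model.generate(question, temp=0.7) for _ in range(n_samples)]
    unique_answers = {extract_key_facts(r) for r in responses}
    return 1 / len(unique_answers)

inconsistent = StubModel(["Opened in 1889.", "Opened in 1892."])
consistent = StubModel(["Opened in 1889."])
check_consistency("When did the tower open?", inconsistent)  # → 0.5
check_consistency("When did the tower open?", consistent)    # → 1.0
```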
### 2. Fact Verification

```python
def verify_facts(response, knowledge_base):
    """Check each extracted claim against a knowledge base."""
    claims = extract_claims(response)
    verified_claims = []
    for claim in claims:
        # Check the claim against retrieved evidence
        evidence = knowledge_base.search(claim)
        is_supported = evaluate_support(claim, evidence)
        verified_claims.append((claim, is_supported))
    return verified_claims
```
### 3. Entailment Checking

```python
from transformers import pipeline

# facebook/bart-large-mnli labels: contradiction / neutral / entailment
nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def check_grounded(context, claim):
    # Pass premise and hypothesis as a text pair, not string concatenation
    result = nli([{"text": context, "text_pair": claim}])[0]
    return result["label"] == "entailment"
```
### 4. Confidence Calibration

```python
import numpy as np
import torch

def get_average_confidence(model, input_ids):
    """Average max-token probability; low values may indicate uncertainty."""
    outputs = model.generate(
        input_ids,
        return_dict_in_generate=True,
        output_scores=True,
    )
    # outputs.scores holds one logits tensor per generated token
    probs = [torch.softmax(score, dim=-1) for score in outputs.scores]
    avg_confidence = np.mean([p.max().item() for p in probs])
    return avg_confidence
```
## Mitigation Strategies
### 1. Retrieval-Augmented Generation (RAG)

```python
def rag_generate(question, retriever, generator):
    # Retrieve relevant documents and ground the generation in them
    docs = retriever.search(question, k=5)
    context = "\n\n".join(docs)
    prompt = f"""Based ONLY on the following documents, answer the question.
If the answer is not in the documents, say "I don't know."

Documents:
{context}

Question: {question}"""
    return generator.generate(prompt)
```
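A usage sketch of the retrieval half with a toy keyword retriever (the `KeywordRetriever` class and the documents are illustrative; a real setup would use a vector store and an LLM client):

```python
class KeywordRetriever:
    """Toy retriever: returns documents sharing a word with the question."""
    def __init__(self, docs):
        self.docs = docs

    def search(self, question, k=5):
        words = set(question.lower().split())
        hits = [d for d in self.docs if words & set(d.lower().split())]
        return hits[:k]

docs = ["The Eiffel Tower opened in 1889.",
        "Paris is the capital of France."]
retriever = KeywordRetriever(docs)
context = "\n".join(retriever.search("When did the Eiffel Tower open?"))
# context now includes the 1889 document, grounding the eventual answer
```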
### 2. Uncertainty Expression

```python
system_prompt = """
When answering questions:
- If you're not sure, say "I'm not certain, but..."
- If you don't know, say "I don't have information about..."
- Distinguish between facts and speculation
- Cite sources when possible
"""
```
### 3. Chain of Verification

```python
def generate_with_verification(question, model):
    # Step 1: Generate an initial response
    response = model.generate(question)
    # Step 2: Extract verifiable claims (one per line, then split them)
    claims_text = model.generate(
        f"List the factual claims in this response, one per line: {response}"
    )
    claims = [c.strip() for c in claims_text.splitlines() if c.strip()]
    # Step 3: Verify each claim; flag or remove any the model cannot support
    for claim in claims:
        verification = model.generate(
            f"Is this claim accurate? Provide evidence: {claim}"
        )
        if "uncertain" in verification or "cannot verify" in verification:
            response = remove_claim(response, claim)
    return response
```
### 4. Constrained Generation

```python
# Only allow generation that mentions known entities (sketch)
allowed_entities = load_entity_database()

def constrained_generate(prompt, allowed_entities):
    response = ""
    while not is_complete(response):
        next_token_probs = model.get_next_token_probs(prompt + response)
        # Filter to allowed tokens based on entity constraints
        filtered_probs = apply_entity_constraints(
            next_token_probs,
            allowed_entities,
        )
        next_token = sample(filtered_probs)
        response += next_token
    return response
```
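The filtering step can be sketched concretely. This hypothetical `apply_entity_constraints` works on a toy token-to-probability dict (a real implementation would operate on logit tensors over a tokenizer vocabulary):

```python
def apply_entity_constraints(token_probs, allowed_tokens):
    """Zero out disallowed tokens and renormalize the distribution."""
    filtered = {t: p for t, p in token_probs.items() if t in allowed_tokens}
    total = sum(filtered.values())
    return {t: p / total for t, p in filtered.items()}

probs = {"Paris": 0.5, "Atlantis": 0.3, "London": 0.2}
apply_entity_constraints(probs, {"Paris", "London"})
# "Atlantis" is removed; remaining mass renormalizes to 5/7 and 2/7
```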
### 5. Fine-tuning for Honesty

```python
# Training examples that teach the model to decline or disclaim
honesty_examples = [
    {
        "prompt": "What is the population of Atlantis?",
        "response": "Atlantis is a mythical city and does not have "
                    "an actual population.",
    },
    {
        "prompt": "Who won the 2030 World Cup?",
        "response": "I don't have information about events after "
                    "my knowledge cutoff date.",
    },
]
```
## Measuring Hallucination

### Metrics
| Metric | What It Measures |
|---|---|
| Faithfulness | Does output match source documents? |
| Factuality | Are facts verifiably true? |
| Attribution | Are claims properly sourced? |
| Consistency | Does model give same answer repeatedly? |
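As a minimal illustration of the faithfulness and consistency rows, here are hypothetical scorers that treat each as a simple ratio (the substring-based claim matching is deliberately naive, for illustration only):

```python
def faithfulness(claims, source_text):
    """Fraction of claims literally supported by the source (naive)."""
    if not claims:
        return 1.0
    return sum(c in source_text for c in claims) / len(claims)

def consistency(answers):
    """1 / number of distinct answers across repeated sampling."""
    return 1 / len(set(answers))

faithfulness(["built in 1889", "located in Paris"],
             "The tower, built in 1889, is located in Paris.")  # → 1.0
consistency(["1889", "1889", "1892"])  # → 0.5
```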
### Evaluation Example
```python
def evaluate_hallucination_rate(model, test_set):
    """Fraction of extracted claims that fail ground-truth verification."""
    hallucinations = 0
    total = 0
    for item in test_set:
        response = model.generate(item['question'])
        claims = extract_claims(response)
        for claim in claims:
            total += 1
            if not verify_against_ground_truth(claim, item['facts']):
                hallucinations += 1
    return hallucinations / total if total else 0.0
```
## Best Practices

### For Users
- Verify important facts independently
- Ask for sources and check them
- Use specific, grounded prompts with context
- Be skeptical of confident-sounding responses
- Cross-reference multiple sources
### For Developers
- Implement RAG for factual applications
- Add retrieval before generation
- Build verification pipelines
- Log and analyze hallucination patterns
- Provide clear disclaimers to users
## Key Takeaways
- Hallucinations are fabricated content that appears plausible
- They occur because LLMs predict likely text, not truth
- Types: factual errors, fake citations, context drift, logic errors
- Detection: self-consistency, fact-checking, entailment verification
- Mitigation: RAG, uncertainty prompting, verification chains
- Critical applications need human verification of LLM outputs