Intermediate · LLMs & Generative AI

Understand why LLMs generate false or fabricated information (hallucinations) and techniques to detect and mitigate them.

Tags: hallucination, llm, reliability, safety, factuality

LLM Hallucinations

Hallucinations occur when LLMs generate content that is factually incorrect, nonsensical, or not grounded in the provided context, while appearing confident and fluent.

Types of Hallucinations

1. Factual Hallucinations

Generated facts that are incorrect:

Q: "When did the Eiffel Tower open?"
A: "The Eiffel Tower opened in 1892."  
   ↳ Wrong! Actual: 1889

2. Fabricated References

Invented citations, papers, or sources:

Q: "Cite a paper on transformer efficiency"
A: "See 'Efficient Transformers: A Survey' by Smith et al. (2021)
    in Nature Machine Intelligence."
   ↳ This paper may not exist

3. Context Hallucinations

Information not present in provided context:

Context: "John is a software engineer at Google."
Q: "Where did John go to college?"
A: "John graduated from Stanford University."
   ↳ Not in context - model invented this

4. Logical Hallucinations

Inconsistent or contradictory reasoning:

"The product costs $100. With a 20% discount, 
 the final price is $85."
 ↳ Should be $80
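
A quick arithmetic check of the example above:

price = 100
discount = 0.20
final_price = price * (1 - discount)
print(final_price)  # 80.0 - not the $85 the model claimed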

Why Hallucinations Occur

1. Training Objective

LLMs are trained to predict likely next tokens, not truth:

Objective: P(next_token | previous_tokens)
           NOT: Is this factually correct?
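
A toy illustration: decoding favors whichever continuation the model finds most likely, whether or not it is true. The distribution below is invented purely for illustration.

# Hypothetical next-token distribution for "The Eiffel Tower opened in ..."
next_token_probs = {"1889": 0.40, "1892": 0.45, "1875": 0.15}

# Greedy decoding maximizes likelihood, not factual accuracy
prediction = max(next_token_probs, key=next_token_probs.get)
print(prediction)  # "1892" - most likely under this toy model, but wrong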

2. Training Data Issues

- Outdated information (knowledge cutoff)
- Contradictory sources
- Errors in training data
- Underrepresented topics

3. Compression and Generalization

Billions of facts → Fixed model parameters
Some details get "blurred" or merged

4. Prompt Pressure

User: "Tell me about the Battle of Springfield in 1823"
Model: *invents details because it's expected to answer*

Detection Methods

1. Self-Consistency Checking

def check_consistency(question, model, n_samples=5):
    # Sample the same question several times at non-zero temperature
    responses = [model.generate(question, temperature=0.7)
                 for _ in range(n_samples)]

    # If answers vary significantly, likely hallucination.
    # extract_key_facts (assumed helper) returns the key facts of a response;
    # tuples keep them hashable so they can go in a set.
    unique_answers = set(tuple(extract_key_facts(r)) for r in responses)
    consistency_score = 1 / len(unique_answers)
    return consistency_score
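
A usage sketch, assuming `model` is the same object passed above and exposes a generate(question, temperature=...) method:

score = check_consistency("When did the Eiffel Tower open?", model)
if score < 1.0:
    print("Sampled answers disagree - treat this response with caution.")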

2. Fact Verification

def verify_facts(response, knowledge_base):
    # Extract claims from response
    claims = extract_claims(response)
    
    verified_claims = []
    for claim in claims:
        # Check against knowledge base
        evidence = knowledge_base.search(claim)
        is_supported = evaluate_support(claim, evidence)
        verified_claims.append((claim, is_supported))
    
    return verified_claims

3. Entailment Checking

from transformers import pipeline

# facebook/bart-large-mnli labels each (premise, hypothesis) pair as
# entailment, neutral, or contradiction
nli = pipeline("text-classification",
               model="facebook/bart-large-mnli")

def check_grounded(context, claim):
    # Pass the context as premise and the claim as hypothesis
    result = nli([{"text": context, "text_pair": claim}])[0]
    return result["label"] == "entailment"
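
Applied to the earlier context-hallucination example, the ungrounded claim should fail the check (a usage sketch; actual predictions depend on the model):

context = "John is a software engineer at Google."
print(check_grounded(context, "John works at Google."))          # expected: True
print(check_grounded(context, "John graduated from Stanford."))  # expected: False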

4. Confidence Calibration

import numpy as np
import torch

def get_token_probabilities(model, tokenizer, prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        return_dict_in_generate=True,
        output_scores=True
    )

    # outputs.scores holds one logit tensor per generated token;
    # consistently low top-token probabilities may indicate uncertainty
    probs = [torch.softmax(score, dim=-1) for score in outputs.scores]
    avg_confidence = np.mean([p.max().item() for p in probs])
    return avg_confidence

Mitigation Strategies

1. Retrieval Augmented Generation (RAG)

def rag_generate(question, retriever, generator):
    # Retrieve relevant documents
    docs = retriever.search(question, k=5)
    docs_text = "\n\n".join(docs)
    
    # Ground generation in retrieved context
    prompt = f"""
    Based ONLY on the following documents, answer the question.
    If the answer is not in the documents, say "I don't know."
    
    Documents:
    {docs_text}
    
    Question: {question}
    """
    
    return generator.generate(prompt)

2. Uncertainty Expression

system_prompt = """
When answering questions:
- If you're not sure, say "I'm not certain, but..."
- If you don't know, say "I don't have information about..."
- Distinguish between facts and speculation
- Cite sources when possible
"""

3. Chain of Verification

def generate_with_verification(question, model):
    # Step 1: Generate initial response
    response = model.generate(question)

    # Step 2: Extract verifiable claims (ask for one claim per line,
    # then parse the text into a list)
    claims_text = model.generate(
        f"List the factual claims in this response, one per line: {response}"
    )
    claims = [line.strip("- ").strip()
              for line in claims_text.splitlines() if line.strip()]

    # Step 3: Verify each claim
    for claim in claims:
        verification = model.generate(
            f"Is this claim accurate? Provide evidence: {claim}"
        )
        if "uncertain" in verification or "cannot verify" in verification:
            # Flag or remove unverified claims
            response = remove_claim(response, claim)

    return response

4. Constrained Generation

# Only allow generation from known entities
allowed_entities = load_entity_database()

def constrained_generate(prompt, allowed_entities):
    response = ""
    while not is_complete(response):
        next_token_probs = model.get_next_token_probs(prompt + response)
        
        # Filter to allowed tokens based on entity constraints
        filtered_probs = apply_entity_constraints(
            next_token_probs, 
            allowed_entities
        )
        
        next_token = sample(filtered_probs)
        response += next_token
    
    return response

5. Fine-tuning for Honesty

# Training data examples
honesty_examples = [
    {
        "prompt": "What is the population of Atlantis?",
        "response": "Atlantis is a mythical city and does not have "
                    "an actual population."
    },
    {
        "prompt": "Who won the 2030 World Cup?",
        "response": "I don't have information about events after "
                    "my knowledge cutoff date."
    }
]
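
A minimal sketch of writing these examples out for supervised fine-tuning; the JSONL field names are placeholders, since the exact format depends on the fine-tuning framework:

import json

with open("honesty_sft.jsonl", "w") as f:
    for ex in honesty_examples:
        record = {"prompt": ex["prompt"], "completion": ex["response"]}
        f.write(json.dumps(record) + "\n")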

Measuring Hallucination

Metrics

Metric        | What It Measures
--------------|--------------------------------------------
Faithfulness  | Does the output match the source documents?
Factuality    | Are the stated facts verifiably true?
Attribution   | Are claims properly sourced?
Consistency   | Does the model give the same answer repeatedly?

Evaluation Example

def evaluate_hallucination_rate(model, test_set):
    hallucinations = 0
    total = 0
    
    for item in test_set:
        response = model.generate(item['question'])
        claims = extract_claims(response)
        
        for claim in claims:
            total += 1
            if not verify_against_ground_truth(claim, item['facts']):
                hallucinations += 1
    
    return hallucinations / total
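
The evaluator assumes each test item carries a question plus ground-truth facts; a hypothetical item might look like this:

# Hypothetical test-set format assumed by evaluate_hallucination_rate
test_set = [
    {
        "question": "When did the Eiffel Tower open?",
        "facts": ["The Eiffel Tower opened in 1889."],
    },
]

rate = evaluate_hallucination_rate(model, test_set)
print(f"Hallucination rate: {rate:.0%}")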

Best Practices

For Users

  1. Verify important facts independently
  2. Ask for sources and check them
  3. Use specific, grounded prompts with context
  4. Be skeptical of confident-sounding responses
  5. Cross-reference multiple sources

For Developers

  1. Implement RAG for factual applications
  2. Add retrieval before generation
  3. Build verification pipelines
  4. Log and analyze hallucination patterns
  5. Provide clear disclaimers to users

Key Takeaways

  1. Hallucinations are fabricated content that appears plausible
  2. They occur because LLMs predict likely text, not truth
  3. Types: factual errors, fake citations, context drift, logic errors
  4. Detection: self-consistency, fact-checking, entailment verification
  5. Mitigation: RAG, uncertainty prompting, verification chains
  6. Critical applications need human verification of LLM outputs