
Learn how Chain of Thought prompting improves LLM reasoning by generating intermediate steps before the final answer.


Chain of Thought Prompting

Chain of Thought (CoT) prompting is a technique that improves LLM reasoning by encouraging the model to generate intermediate reasoning steps before arriving at the final answer.

The Problem with Direct Prompting

Prompt: "If John has 3 apples and buys 2 more, then gives half 
         to Mary, how many apples does John have?"

Direct answer: "3"  ← Often wrong

LLMs struggle with multi-step reasoning when asked to output only the final answer.

Chain of Thought Solution

Prompt: "If John has 3 apples and buys 2 more, then gives half 
         to Mary, how many apples does John have? 
         Let's think step by step."

CoT Response:
"Let me work through this step by step:
1. John starts with 3 apples
2. He buys 2 more: 3 + 2 = 5 apples
3. He gives half to Mary: 5 / 2 = 2.5 apples
4. John has 2.5 apples (or 2 if we round down)

Answer: 2.5 apples"

Types of Chain of Thought

1. Zero-Shot CoT

Just add "Let's think step by step":

prompt = f"""
{question}

Let's think step by step.
"""

2. Few-Shot CoT

Provide examples with reasoning:

prompt = f"""
Q: Roger has 5 tennis balls. He buys 2 more cans of 3. 
   How many tennis balls does he have?
A: Roger started with 5 balls. He bought 2 cans of 3 balls 
   each, so 2 × 3 = 6 balls. Total: 5 + 6 = 11 balls.
   The answer is 11.

Q: The cafeteria had 23 apples. They used 20 for lunch and 
   bought 6 more. How many apples do they have?
A: Started with 23 apples. Used 20: 23 - 20 = 3 remaining. 
   Bought 6 more: 3 + 6 = 9 apples.
   The answer is 9.

Q: {new_question}
A:
"""

3. Self-Consistency

Sample multiple reasoning paths, take majority vote:

import collections

def self_consistency(prompt, model, num_samples=5):
    answers = []
    for _ in range(num_samples):
        # temperature > 0 so each sample follows a different reasoning path
        response = model.generate(prompt, temperature=0.7)
        answer = extract_final_answer(response)  # helper: parse the final answer from the text
        answers.append(answer)
    
    # Return most common answer
    return collections.Counter(answers).most_common(1)[0][0]
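The snippet above leaves `extract_final_answer` undefined. A minimal sketch, assuming numeric answers and responses that end with something like "The answer is 11.", is to grab the last number in the text:

```python
import re

def extract_final_answer(response):
    """Return the last number mentioned in a response, as a string.
    A rough heuristic: CoT responses usually end with 'The answer is N.'"""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else None

# extract_final_answer("Total: 5 + 6 = 11 balls. The answer is 11.")  -> "11"
```

Real answer extraction is messier (units, fractions, multiple candidates), but a simple heuristic like this is enough for majority voting to work.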

Why CoT Works

1. Decomposition

Complex problem → Series of simple steps
"Calculate compound interest" → 
  1. Find simple interest
  2. Add to principal
  3. Repeat for each period
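The decomposition above maps directly onto code. A small illustrative sketch (the function name and per-period rate are made up for this example):

```python
def compound_interest(principal, rate, periods):
    """Compound interest as a series of simple steps, mirroring the decomposition."""
    amount = principal
    for _ in range(periods):
        interest = amount * rate   # step 1: simple interest for this period
        amount += interest         # step 2: add it to the principal
    return amount                  # step 3 is the loop: repeat for each period

# e.g. 1000 at 10% for 2 periods: 1000 -> 1100 -> 1210
```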

2. Working Memory

Intermediate results stored in generated text:
"5 + 3 = 8, now 8 × 2 = 16, finally 16 - 4 = 12"

3. Pattern Matching

Similar reasoning patterns seen in the training data are activated and followed.

Effective CoT Strategies

Template Structures

# Problem decomposition
prompt = f"""
To solve this problem, I need to:
1. Identify the key information
2. Determine the operations needed
3. Execute each step
4. Verify the answer

Problem: {problem}
"""

# Explicit reasoning
prompt = f"""
Question: {question}

Let me reason through this:
- First, I observe that...
- This means...
- Therefore...
- My final answer is...
"""

Domain-Specific Prompts

Math:

"Show your work. Write out each calculation."

Logic:

"Consider each premise. What can we deduce?"

Code:

"Trace through the code step by step with example inputs."

Tree of Thoughts

Extension of CoT that explores multiple reasoning branches:

                    Problem
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
       Path A       Path B       Path C
          │            │            │
     ┌────┴────┐      Dead       ┌──┴──┐
     ▼         ▼      End        ▼     ▼
   A.1       A.2              C.1    C.2
   (best)                            Dead

def tree_of_thoughts(problem, model, breadth=3, depth=3):
    def explore(state, depth_remaining):
        if depth_remaining == 0:
            return model.evaluate_thought(state)

        # Generate multiple candidate next steps
        candidates = model.generate_thoughts(state, n=breadth)

        # Evaluate and prune to the better half (keeping at least one)
        scored = [(c, model.evaluate_thought(c)) for c in candidates]
        best = sorted(scored, key=lambda x: x[1], reverse=True)[:max(1, breadth // 2)]

        # Recurse on the surviving candidates
        return max(explore(c, depth_remaining - 1) for c, _ in best)

    return explore(problem, depth)
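To see the search shape without a real LLM, a toy stand-in for the model interface helps; the snippet repeats the search function so it runs on its own, and the scoring rule is arbitrary:

```python
class ToyModel:
    """Stand-in for an LLM: 'thoughts' are strings, scored by their digit sum."""
    def generate_thoughts(self, state, n):
        return [state + str(i) for i in range(n)]

    def evaluate_thought(self, state):
        return sum(int(ch) for ch in state if ch.isdigit())

def tree_of_thoughts(problem, model, breadth=3, depth=3):
    def explore(state, depth_remaining):
        if depth_remaining == 0:
            return model.evaluate_thought(state)
        candidates = model.generate_thoughts(state, n=breadth)
        scored = [(c, model.evaluate_thought(c)) for c in candidates]
        best = sorted(scored, key=lambda x: x[1], reverse=True)[:max(1, breadth // 2)]
        return max(explore(c, depth_remaining - 1) for c, _ in best)
    return explore(problem, depth)

# Each level appends a digit 0..2; greedy pruning keeps the branch that appends 2,
# so depth 2 ends at "22" (score 4) and depth 3 at "222" (score 6).
```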

Benchmarks and Results

Task             Direct   Zero-Shot CoT   Few-Shot CoT
GSM8K (math)     17.1%    40.7%           58.1%
SVAMP (math)     63.4%    68.9%           79.0%
AQuA (algebra)   26.4%    39.4%           45.3%

Results for the PaLM 540B model.

When to Use CoT

Good For

  • Math word problems
  • Multi-step reasoning
  • Logical deduction
  • Code understanding
  • Complex decision making

Not Needed For

  • Simple factual questions
  • Single-step tasks
  • Classification with clear criteria
  • Tasks where reasoning doesn't help

Common Pitfalls

1. Reasoning Errors Compound

Step 1: 5 + 3 = 7  ← Error here
Step 2: 7 × 2 = 14  ← Carries forward

Solution: Use self-consistency with multiple samples

2. Verbose but Wrong

A long explanation that sounds confident but still arrives at the wrong answer.

Solution: Verify final answer, add sanity checks
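A minimal sketch of such a sanity check, assuming the reasoning writes its arithmetic as "a op b = c": re-evaluate each stated step and flag mismatches.

```python
import re

def check_arithmetic_steps(reasoning):
    """Flag steps like '5 + 3 = 7' whose stated result doesn't match.
    Returns a list of (expression, stated, actual) for each bad step."""
    errors = []
    pattern = r"(-?\d+(?:\.\d+)?)\s*([+\-×*/])\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)"
    for a, op, b, stated in re.findall(pattern, reasoning):
        x, y, s = float(a), float(b), float(stated)
        actual = {"+": x + y, "-": x - y, "×": x * y, "*": x * y, "/": x / y}[op]
        if abs(actual - s) > 1e-9:
            errors.append((f"{a} {op} {b}", s, actual))
    return errors
```

Run against the pitfall above, `check_arithmetic_steps("Step 1: 5 + 3 = 7\nStep 2: 7 × 2 = 14")` flags the first step (actual result 8) while the second, which only carried the error forward, checks out.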

3. Overthinking Simple Problems

Question: "What is 2 + 2?"
CoT: "Let me break this down into components..." (overkill)

Solution: Match complexity to problem

Implementation Example

from openai import OpenAI

client = OpenAI()

def solve_with_cot(problem, use_few_shot=True):
    few_shot_examples = """
Q: A restaurant has 20 tables. Each table has 4 chairs. 
   If 12 tables are occupied with 3 people each, how many 
   empty chairs are there?
A: Let's solve this step by step:
   - Total chairs: 20 tables × 4 chairs = 80 chairs
   - Occupied chairs: 12 tables × 3 people = 36 chairs
   - Empty chairs: 80 - 36 = 44 chairs
   The answer is 44.

""" if use_few_shot else ""
    
    prompt = f"{few_shot_examples}Q: {problem}\nA: Let's solve this step by step:"
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    
    return response.choices[0].message.content

Key Takeaways

  1. CoT improves reasoning by generating intermediate steps
  2. "Let's think step by step" enables zero-shot CoT
  3. Few-shot examples improve performance further
  4. Self-consistency (multiple samples + voting) adds robustness
  5. Tree of Thoughts explores multiple reasoning paths
  6. Best for multi-step reasoning tasks, not simple questions
