
Scenario 2: Memoization and Precomputation

Learning Objectives

  • Read cProfile output to identify redundant function calls
  • Use @functools.lru_cache for automatic memoization
  • Recognize when precomputation beats memoization
  • Understand space-time trade-offs

Files

  • fib_slow.py - Naive recursive Fibonacci (exponential time)
  • fib_cached.py - Memoized Fibonacci (linear time)
  • config_validator.py - Comparison of naive, memoized, and precomputed approaches

Exercise 1: Fibonacci

Step 1: Experience the slowness

time python3 fib_slow.py 35

This should take several seconds. Don't try n=50!
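For reference, the naive implementation in fib_slow.py presumably looks something like the sketch below (the actual file may differ in details):

```python
import sys

def fib(n):
    # Naive recursion: every call spawns two more calls,
    # so the total call count grows roughly as 2^n.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    print(fib(n))
```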

Step 2: Profile to understand why

python3 -m cProfile -s ncalls fib_slow.py 35 2>&1 | head -20

Key insight: Look at the ncalls column for the fib function. For fib(35), it is called nearly 30 million times, because the same values are recomputed over and over.

The call tree looks like:

fib(5)
├── fib(4)
│   ├── fib(3)
│   │   ├── fib(2)
│   │   └── fib(1)
│   └── fib(2)
└── fib(3)        <-- Same as above! Redundant!
    ├── fib(2)
    └── fib(1)

Step 3: Apply memoization

time python3 fib_cached.py 35

Now try a much larger value:

time python3 fib_cached.py 100
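fib_cached.py is presumably a small variation on the naive version, along these lines (a sketch assuming @functools.lru_cache, per the learning objectives):

```python
import sys
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; repeat calls hit the cache,
    # so total work drops from O(2^n) to O(n).
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    print(fib(n))
```

fib.cache_info() reports hits and misses; the misses correspond to the much smaller ncalls figure you will see in the next step.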

Step 4: Verify the improvement

python3 -m cProfile -s ncalls fib_cached.py 35 2>&1 | head -20

The call count (ncalls) for fib should now grow as O(n) instead of O(2^n).

Exercise 2: Config Validator

This example shows when precomputation is better than memoization.

Run all three strategies

python3 config_validator.py 5000

Profile to understand the differences

python3 -m cProfile -s cumtime config_validator.py 5000
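The three strategies in config_validator.py likely follow this general shape (the names and rule set here are hypothetical; check the actual file):

```python
from functools import lru_cache

RULES = {"cpu", "memory", "disk", "network"}  # hypothetical rule set

def validate_naive(event_type):
    # Recomputes the (pretend-expensive) check on every call.
    return event_type in RULES

@lru_cache(maxsize=None)
def validate_memoized(event_type):
    # Caches per input, but still pays a hash + wrapper cost per call.
    return event_type in RULES

# Precomputation: with a small, known input space, compute every
# answer once up front, then serve plain dict lookups afterwards.
EVENT_TYPES = ["cpu", "disk", "login", "memory"]  # hypothetical
PRECOMPUTED = {e: e in RULES for e in EVENT_TYPES}

def validate_precomputed(event_type):
    return PRECOMPUTED[event_type]
```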

Discussion Questions

  1. Why is precomputation faster than memoization here?

    • Hint: How many unique inputs are there?
    • Hint: What's the overhead of cache lookup vs dict lookup?
  2. When would memoization be better than precomputation?

    • Hint: What if there were 10,000 rules and 10,000 event types?
    • Hint: What if we didn't know the inputs ahead of time?
  3. What's the memory trade-off?

Key Takeaways

Approach        When to Use
No caching      Function is cheap, OR called once per input
Memoization     Unknown or large input space; function is expensive
Precomputation  Known, small input space; amortize cost over many lookups

Further Reading

  • functools.lru_cache documentation
  • functools.cache (Python 3.9+) - unbounded cache, simpler API
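For example, functools.cache is simply lru_cache(maxsize=None) under a shorter name:

```python
from functools import cache  # Python 3.9+

@cache
def square(n):
    # Unbounded cache: equivalent to @lru_cache(maxsize=None).
    return n * n

square(4)                    # computed
square(4)                    # served from the cache
print(square.cache_info())   # hits/misses, like lru_cache
```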