
Scenario 2: Memoization and Precomputation

Learning Objectives

  • Read cProfile output to identify redundant function calls
  • Use @functools.lru_cache for automatic memoization
  • Recognize when precomputation beats memoization
  • Understand space-time trade-offs

Files

  • fib_slow.py - Naive recursive Fibonacci (exponential time)
  • fib_cached.py - Memoized Fibonacci (linear time)
  • config_validator.py - Comparison of naive, memoized, and precomputed approaches

Exercise 1: Fibonacci

Step 1: Experience the slowness

time python3 fib_slow.py 35

This should take several seconds. Don't try n=50!
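For reference, the naive implementation in fib_slow.py presumably looks something like the sketch below (the actual file may differ in details):

```python
import sys

def fib(n):
    # Naive recursion: every call spawns two more calls,
    # so the total call count grows roughly as 2^n.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    print(fib(n))
```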

Step 2: Profile to understand why

python3 -m cProfile -s ncalls fib_slow.py 35 2>&1 | head -20

Key insight: Look at the ncalls column for the fib function. For fib(35), it is called nearly 30 million times, because the same values are recomputed over and over.

The call tree looks like:

fib(5)
├── fib(4)
│   ├── fib(3)
│   │   ├── fib(2)
│   │   └── fib(1)
│   └── fib(2)
└── fib(3)        <-- Same as above! Redundant!
    ├── fib(2)
    └── fib(1)

Step 3: Apply memoization

time python3 fib_cached.py 35

Now try a much larger value:

time python3 fib_cached.py 100
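fib_cached.py is presumably a small variation on the naive version, along these lines (a sketch assuming @functools.lru_cache, per the learning objectives):

```python
import sys
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; repeat calls hit the cache,
    # so total work drops from O(2^n) to O(n).
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    print(fib(n))
```

fib.cache_info() reports hits and misses; the misses correspond to the much smaller ncalls figure you will see in the next step.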

Step 4: Verify the improvement

python3 -m cProfile -s ncalls fib_cached.py 35 2>&1 | head -20

The call count (ncalls) for fib should now grow as O(n) instead of O(2^n).

Exercise 2: Config Validator

This example shows when precomputation is better than memoization.

Run all three strategies

python3 config_validator.py 5000

Profile to understand the differences

python3 -m cProfile -s cumtime config_validator.py 5000
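The three strategies in config_validator.py likely follow this general shape (the names and rule set here are hypothetical; check the actual file):

```python
from functools import lru_cache

RULES = {"cpu", "memory", "disk", "network"}  # hypothetical rule set

def validate_naive(event_type):
    # Recomputes the (pretend-expensive) check on every call.
    return event_type in RULES

@lru_cache(maxsize=None)
def validate_memoized(event_type):
    # Caches per input, but still pays a hash + wrapper cost per call.
    return event_type in RULES

# Precomputation: with a small, known input space, compute every
# answer once up front, then serve plain dict lookups afterwards.
EVENT_TYPES = ["cpu", "disk", "login", "memory"]  # hypothetical
PRECOMPUTED = {e: e in RULES for e in EVENT_TYPES}

def validate_precomputed(event_type):
    return PRECOMPUTED[event_type]
```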

Discussion Questions

  1. Why is precomputation faster than memoization here?

    • Hint: How many unique inputs are there?
    • Hint: What's the overhead of cache lookup vs dict lookup?
  2. When would memoization be better than precomputation?

    • Hint: What if there were 10,000 rules and 10,000 event types?
    • Hint: What if we didn't know the inputs ahead of time?
  3. What's the memory trade-off?

Key Takeaways

Approach        When to Use
No caching      Function is cheap, OR called once per input
Memoization     Unknown or large input space; function is expensive
Precomputation  Known, small input space; amortize cost over many lookups

Further Reading

  • functools.lru_cache documentation
  • functools.cache (Python 3.9+) - unbounded cache, simpler API
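For example, functools.cache is simply lru_cache(maxsize=None) under a shorter name:

```python
from functools import cache  # Python 3.9+

@cache
def square(n):
    # Unbounded cache: equivalent to @lru_cache(maxsize=None).
    return n * n

square(4)                    # computed
square(4)                    # served from the cache
print(square.cache_info())   # hits/misses, like lru_cache
```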