# Scenario 2: Memoization and Precomputation
## Learning Objectives
- Read cProfile output to identify redundant function calls
- Use `@functools.lru_cache` for automatic memoization
- Recognize when precomputation beats memoization
- Understand space-time trade-offs
## Files
- `fib_slow.py` - Naive recursive Fibonacci (exponential time)
- `fib_cached.py` - Memoized Fibonacci (linear time)
- `config_validator.py` - Comparison of naive, memoized, and precomputed approaches
## Exercise 1: Fibonacci
### Step 1: Experience the slowness
```bash
time python3 fib_slow.py 35
```
This should take several seconds. Don't try n=50!
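`fib_slow.py` isn't reproduced here, but a naive recursive Fibonacci presumably looks like this minimal sketch (the actual file may differ):
```python
# Sketch of a naive recursive Fibonacci (illustrative; see fib_slow.py).
import sys

def fib(n):
    # Every call spawns two more calls; no result is ever reused.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))
```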
### Step 2: Profile to understand why
```bash
python3 -m cProfile -s ncalls fib_slow.py 35 2>&1 | head -20
```
Key insight: Look at `ncalls` for the `fib` function. For fib(35), it's called
millions of times because we recompute the same values repeatedly.
The call tree looks like:
```
fib(5)
├── fib(4)
│   ├── fib(3)
│   │   ├── fib(2)
│   │   └── fib(1)
│   └── fib(2)
└── fib(3)        <-- Same as above! Redundant!
    ├── fib(2)
    └── fib(1)
```
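You can see the redundancy directly, without cProfile, by counting evaluations per argument. A self-contained sketch:
```python
# Count how many times fib() is evaluated for each argument.
from collections import Counter

calls = Counter()

def fib(n):
    calls[n] += 1
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib(20)
print(calls[1])             # 6765 -- fib(1) alone is evaluated 6,765 times
print(sum(calls.values()))  # 21891 total calls just for n=20
```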
### Step 3: Apply memoization
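`fib_cached.py` presumably wraps the same function in `functools.lru_cache`; a minimal sketch (the actual file may differ):
```python
# Sketch of a memoized Fibonacci (illustrative; see fib_cached.py).
import functools
import sys

@functools.lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; repeats are served from the cache.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))
```
With the cache in place, the same call returns almost instantly: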
```bash
time python3 fib_cached.py 35
```
Now try a much larger value (for n=100 the naive version would need on the order of 10^21 calls, while the memoized one computes each value exactly once):
```bash
time python3 fib_cached.py 100
```
### Step 4: Verify the improvement
```bash
python3 -m cProfile -s ncalls fib_cached.py 35 2>&1 | head -20
```
The `ncalls` for `fib` should now be O(n) instead of O(2^n): each distinct value is computed exactly once, and every repeat is answered from the cache.
## Exercise 2: Config Validator
This example shows when precomputation is better than memoization.
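The script isn't reproduced here, but the three strategies typically look like this sketch (the rule table and function names are invented for illustration; see `config_validator.py` for the real code):
```python
import functools
import re

# Hypothetical rules: action-name patterns mapped to allowed environments.
RULES = [(r"^deploy-.*", {"prod", "staging"}), (r"^rollback-.*", {"prod"})]

def validate_naive(action, env):
    # Scans every rule and re-runs the regex match on each call.
    return any(re.match(pat, action) and env in envs for pat, envs in RULES)

@functools.lru_cache(maxsize=None)
def validate_memoized(action, env):
    # Same work, but each distinct (action, env) pair is computed once;
    # every call still pays the cache wrapper's lookup overhead.
    return validate_naive(action, env)

def precompute(actions, environments):
    # With a known, small input space, compute every answer up front;
    # lookups are then a single dict access with no wrapper overhead.
    return {(a, e): validate_naive(a, e) for a in actions for e in environments}
```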
### Run all three strategies
```bash
python3 config_validator.py 5000
```
### Profile to understand the differences
```bash
python3 -m cProfile -s cumtime config_validator.py 5000
```
### Discussion Questions
1. Why is precomputation faster than memoization here?
- Hint: How many unique inputs are there?
- Hint: What's the overhead of cache lookup vs dict lookup?
2. When would memoization be better than precomputation?
- Hint: What if there were 10,000 rules and 10,000 event types?
- Hint: What if we didn't know the inputs ahead of time?
3. What's the memory trade-off?
## Key Takeaways
| Approach | When to Use |
|----------|-------------|
| No caching | Function is cheap OR called once per input |
| Memoization | Unknown/large input space, function is expensive |
| Precomputation | Known/small input space, amortize cost over many lookups |
## Further Reading
- `functools.lru_cache` documentation
- `functools.cache` (Python 3.9+) - unbounded cache, simpler API
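A minimal illustration of the simpler API:
```python
import functools

@functools.cache  # Python 3.9+; equivalent to lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))          # 354224848179261915075
print(fib.cache_info())  # hit/miss statistics, same as with lru_cache
```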