# Scenario 2: Memoization and Precomputation

## Learning Objectives

- Read cProfile output to identify redundant function calls
- Use `@functools.lru_cache` for automatic memoization
- Recognize when precomputation beats memoization
- Understand space-time trade-offs

## Files

- `fib_slow.py` - Naive recursive Fibonacci (exponential time)
- `fib_cached.py` - Memoized Fibonacci (linear time)
- `config_validator.py` - Comparison of naive, memoized, and precomputed approaches

## Exercise 1: Fibonacci

### Step 1: Experience the slowness

```bash
time python3 fib_slow.py 35
```

This should take several seconds. Don't try n=50! Each increment of n multiplies the runtime by roughly 1.6 (the golden ratio), so fib(50) would run for hours.

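For reference, the naive implementation in `fib_slow.py` is presumably close to this minimal sketch (the actual file may differ):

```python
import sys

def fib(n):
    # Naive recursion: every call spawns two more, and the same
    # subproblems are recomputed over and over.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))
```
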
### Step 2: Profile to understand why

```bash
python3 -m cProfile -s ncalls fib_slow.py 35 2>&1 | head -20
```

Key insight: look at `ncalls` for the `fib` function. For fib(35), the standard
naive recursion makes close to 30 million calls, because the same values are
recomputed over and over.

The call tree looks like:

```
fib(5)
├── fib(4)
│   ├── fib(3)
│   │   ├── fib(2)
│   │   └── fib(1)
│   └── fib(2)
└── fib(3)        <-- Same as above! Redundant!
    ├── fib(2)
    └── fib(1)
```

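You can confirm the blow-up without cProfile by instrumenting the function yourself; a quick sketch (the counter is ours, not part of the exercise files):

```python
calls = 0

def fib(n):
    global calls
    calls += 1  # count every invocation, including the redundant ones
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib(25)
print(calls)  # 242785 for n=25; the count grows by a factor of ~1.6 per +1 to n
```
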
### Step 3: Apply memoization

```bash
time python3 fib_cached.py 35
```

Now try a much larger value:

```bash
time python3 fib_cached.py 100
```

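`fib_cached.py` presumably wraps the same recursion in `functools.lru_cache`; a minimal sketch, assuming that approach:

```python
import sys
from functools import lru_cache

@lru_cache(maxsize=None)  # remember every distinct n, so each fib(k) is computed once
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))
```

On Python 3.9+, `@functools.cache` is equivalent to `@lru_cache(maxsize=None)` with slightly less call overhead.
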
### Step 4: Verify the improvement

```bash
python3 -m cProfile -s ncalls fib_cached.py 35 2>&1 | head -20
```

The `ncalls` count should now be O(n) instead of O(2^n): each fib(k) is computed
once, and every repeat request is answered from the cache.

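If the cached version is built on `lru_cache`, you can also verify the improvement from inside Python (assuming `fib_cached.py` exposes a function named `fib`):

```python
from fib_cached import fib  # hypothetical import; adjust to the actual module layout

fib(35)
print(fib.cache_info())  # for the sketch above: CacheInfo(hits=33, misses=36, maxsize=None, currsize=36)
```

Only 36 distinct values (fib(0) through fib(35)) are ever computed; every other call is a cache hit.
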
## Exercise 2: Config Validator

This example shows when precomputation is better than memoization.

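As a rough sketch of the shape (rules and names here are hypothetical; the real `config_validator.py` may differ): validating an event means matching it against every rule, and the three strategies differ in when that matching work happens.

```python
import re
from functools import lru_cache

RULES = [r"^app\.", r"^db\.(read|write)$", r"^cache\."]  # hypothetical rule set
EVENT_TYPES = ["app.start", "db.read", "cache.miss"]     # small, known input space

def validate_naive(event):
    # Rescans every rule on every call, even for inputs seen before.
    return any(re.match(rule, event) for rule in RULES)

@lru_cache(maxsize=None)
def validate_memoized(event):
    # Same scan, but each distinct event type is validated only once;
    # repeats pay just the cache-lookup cost.
    return any(re.match(rule, event) for rule in RULES)

# Precomputation: pay the full cost up front for every known input;
# afterwards a lookup is a plain dict access.
VALID = {e: any(re.match(rule, e) for rule in RULES) for e in EVENT_TYPES}

def validate_precomputed(event):
    return VALID[event]
```
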
### Run all three strategies

```bash
python3 config_validator.py 5000
```

### Profile to understand the differences

```bash
python3 -m cProfile -s cumtime config_validator.py 5000
```

### Discussion Questions

1. Why is precomputation faster than memoization here?
   - Hint: How many unique inputs are there?
   - Hint: What's the overhead of a cache lookup vs a plain dict lookup? (See the timing sketch after these questions.)

2. When would memoization be better than precomputation?
   - Hint: What if there were 10,000 rules and 10,000 event types?
   - Hint: What if we didn't know the inputs ahead of time?

3. What's the memory trade-off?

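For the cache-lookup vs dict-lookup hint, you can measure the overhead directly; a small sketch (the function and dict are stand-ins):

```python
from functools import lru_cache
from timeit import timeit

@lru_cache(maxsize=None)
def cached(x):
    return x * 2

table = {x: x * 2 for x in range(10)}
cached(5)  # warm the cache so we time a hit, not a miss

print(timeit(lambda: cached(5)))  # lru_cache hit: a function call plus an internal hash lookup
print(timeit(lambda: table[5]))   # plain dict lookup: usually noticeably faster
```
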
## Key Takeaways

| Approach | When to Use |
|----------|-------------|
| No caching | Function is cheap OR called once per input |
| Memoization | Unknown/large input space, function is expensive |
| Precomputation | Known/small input space, amortize cost over many lookups |

## Further Reading

- `functools.lru_cache` documentation
- `functools.cache` (Python 3.9+) - unbounded cache, simpler API