scenario 2: improve lrucache vs list bit

illustris 2026-01-10 19:50:13 +05:30
parent 7e8b1191fa
commit 596ae02dd4
Signed by: illustris
GPG Key ID: 56C8FC0B899FEFA3
8 changed files with 399 additions and 179 deletions

.gitignore

@ -1,3 +1,6 @@
*~
result
*.qcow2
events.pkl
__pycache__
*.svg

README.md

@ -1,96 +1,162 @@
# Scenario 2: Memoization and Precomputation

## Learning Objectives

- Use cProfile to identify performance bottlenecks
- Recognize when `@lru_cache` becomes a bottleneck itself
- Understand when precomputation beats memoization
- Learn to read profiler output to guide optimization decisions

## Files

### Fibonacci Example

- `fib_slow.py` - Naive recursive Fibonacci (exponential time)
- `fib_cached.py` - Memoized Fibonacci (linear time)

### Config Validator Example

- `generate_events.py` - Generate test data (run first)
- `config_validator_naive.py` - Baseline: no caching
- `config_validator_memoized.py` - Uses `@lru_cache`
- `config_validator_precomputed.py` - Uses 2D array lookup
- `config_validator.py` - Comparison runner
- `common.py` - Shared code

---

## Exercise 1: Fibonacci (Identifying Redundant Calls)

### Step 1: Experience the slowness

```bash
time python3 fib_slow.py 35
```

This should take several seconds. Don't try n=50!

### Step 2: Profile to understand why

```bash
python3 -m cProfile -s ncalls fib_slow.py 35
```

Look at `ncalls` for the `fib` function - it's called millions of times because the same values are recomputed repeatedly.

### Step 3: Apply memoization and verify

```bash
time python3 fib_cached.py 35
python3 -m cProfile -s ncalls fib_cached.py 35
```

The `ncalls` drops from millions to ~35.
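
`fib_slow.py` and `fib_cached.py` are not included in this commit; here is a minimal sketch of what `fib_cached.py` plausibly contains (an assumption - only the filename and CLI come from the steps above):

```python
# Sketch of fib_cached.py (assumed contents; the real file is not in this diff).
import sys
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; later calls are cache hits,
    # so ncalls for fib drops to roughly n instead of ~2^n.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))
```
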
---
## Exercise 2: Config Validator (When Caching Becomes the Bottleneck)

This exercise demonstrates a common pattern: you add caching, get a big speedup, but then discover the cache itself is now the bottleneck.

### Step 1: Generate test data

```bash
python3 generate_events.py 100000
```

### Step 2: Profile the naive version

```bash
python3 -m cProfile -s tottime config_validator_naive.py
```

**What to look for:** `validate_rule_slow` dominates the profile. It's called 100,000 times even though there are only 400 unique input combinations.

### Step 3: Add memoization - big improvement!

```bash
python3 -m cProfile -s tottime config_validator_memoized.py
```

**Observation:** Dramatic speedup! But look carefully at the profile...

### Step 4: Identify the new bottleneck

Compare `process_events` time between the memoized and precomputed versions:

```bash
python3 -m cProfile -s tottime config_validator_memoized.py
python3 -m cProfile -s tottime config_validator_precomputed.py
```

**Key insight:** Compare the `process_events` tottime:

- Memoized: ~0.014s
- Precomputed: ~0.004s (3.5x faster!)

The cache lookup overhead now dominates because:

- The validation function is cheap (only 50 iterations)
- But we do 100,000 cache lookups
- Each lookup involves: tuple creation for the key, hashing, dict lookup (sketched below)
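
`functools.lru_cache` is implemented in C in CPython, so the profiler mostly shows its cost as wrapper overhead. Conceptually, though, each cache hit still has to do something like this hand-rolled sketch (a model, not the real implementation):

```python
# Conceptual model of a memoizing wrapper - not CPython's actual lru_cache code.
def memoize(func):
    cache = {}

    def wrapper(rule_id, event_type):
        key = (rule_id, event_type)  # 1. build a tuple key on every call
        if key in cache:             # 2. hash the key and probe the dict
            return cache[key]        # 3. hash + lookup again on a hit
        result = func(rule_id, event_type)
        cache[key] = result
        return result

    return wrapper
```

Even on a hit, the tuple allocation and hashing happen on every call - that is the overhead the profile exposes.
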
### Step 5: Hypothesis - can we beat the cache?

When the input space is **small and bounded** (400 combinations), we can:

1. Precompute all results into a 2D array
2. Use array indexing instead of hash-based lookup

Array indexing is faster because:

- No hash computation
- Direct memory offset calculation
- Better CPU cache locality

A quick way to test the lookup cost in isolation is sketched below.
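
This micro-benchmark compares the two lookup styles directly (a rough sketch; absolute numbers vary by machine and Python version):

```python
# Tuple-keyed dict lookup vs. nested-list indexing, 1M iterations each.
import timeit

d = {(r, e): True for r in range(20) for e in range(20)}
table = [[True] * 20 for _ in range(20)]

print("dict lookup:", timeit.timeit(lambda: d[(7, 13)], number=1_000_000))
print("list index :", timeit.timeit(lambda: table[7][13], number=1_000_000))
```
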
### Step 6: Profile the precomputed version

```bash
python3 -m cProfile -s tottime config_validator_precomputed.py
```

**Observation:** No wrapper overhead. Clean array indexing in `process_events`.

### Step 7: Compare all three

```bash
python3 config_validator.py
```

Expected output shows precomputed ~2x faster than memoized.

---
## Key Profiling Techniques

### Finding where time is spent

```bash
python3 -m cProfile -s tottime script.py  # Sort by time in function itself
python3 -m cProfile -s cumtime script.py  # Sort by cumulative time (includes callees)
```

### Understanding the columns

- `ncalls`: Number of calls
- `tottime`: Time spent in function (excluding callees)
- `cumtime`: Time spent in function (including callees)
- `percall`: Time per call

The same data is also available programmatically, as shown below.
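
A minimal sketch using the standard-library `pstats` module (`profile.out` is an arbitrary filename chosen for this example):

```python
import cProfile
import pstats

# Profile a throwaway statement and dump the raw stats to a file.
cProfile.run("sum(i * i for i in range(100_000))", "profile.out")

# Load the stats, sort by tottime, and print the ten most expensive functions.
stats = pstats.Stats("profile.out")
stats.sort_stats("tottime").print_stats(10)
```
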
---
## When to Use Each Approach
| Approach | When to Use |
|----------|-------------|
| No caching | Function is cheap OR each input seen only once |
| Memoization (`@lru_cache`) | Unknown/large input space, expensive function |
| Precomputation | Known/small input space, many lookups, bounded integers |

---
## Discussion Questions

1. Why does `@lru_cache` have overhead?
   - Hint: What happens on each call even for cache hits?
2. When would memoization beat precomputation?
   - Hint: What if there were 10,000 x 10,000 possible inputs but you only see 100?
3. Could we make precomputation even faster?
   - Hint: What about a flat array with `table[rule_id * 20 + event_type]`? One possible version is sketched below.
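
For question 3, a hypothetical flat-array variant (not part of this commit) could look like this:

```python
# Flatten the 2D table into one list so each query is a single index operation.
from common import validate_rule_slow, RULES, EVENT_TYPES

N_EVENT_TYPES = len(EVENT_TYPES)  # 20

FLAT_TABLE = [
    validate_rule_slow(rule_id, event_type)
    for rule_id in RULES
    for event_type in EVENT_TYPES
]

def is_valid(rule_id, event_type):
    # rule_id selects the row block; event_type is the offset within it.
    return FLAT_TABLE[rule_id * N_EVENT_TYPES + event_type]
```
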
---
## Further Reading

- `functools.lru_cache` documentation
- `functools.cache` (Python 3.9+) - unbounded cache, slightly less overhead
- NumPy arrays for truly O(1) array access

common.py

@ -0,0 +1,35 @@
#!/usr/bin/env python3
"""
Shared code for config validator examples.
"""
import pickle
from pathlib import Path

# The set of all valid (rule_id, event_type) pairs we'll encounter
RULES = range(20)        # 0-19 (small, bounded input space)
EVENT_TYPES = range(20)  # 0-19

EVENTS_FILE = Path(__file__).parent / "events.pkl"


def validate_rule_slow(rule_id, event_type):
    """
    Simulate an expensive validation check.
    In real life, this might query a database, parse XML, etc.
    """
    total = 0
    for i in range(50):
        total += (rule_id * event_type * i) % 997
    return total % 2 == 0


def load_events():
    """Load events from the pickle file."""
    if not EVENTS_FILE.exists():
        raise FileNotFoundError(
            f"Events file not found: {EVENTS_FILE}\n"
            "Run 'python3 generate_events.py' first."
        )
    with open(EVENTS_FILE, "rb") as f:
        return pickle.load(f)

config_validator.py

@ -1,146 +1,58 @@
#!/usr/bin/env python3
"""
Config Validator Comparison
===========================

Runs all three validation strategies and compares performance.

Run generate_events.py first to create test data.

Usage:
    python3 generate_events.py 100000
    python3 config_validator.py
"""
import time

from common import load_events
import config_validator_naive
import config_validator_memoized
import config_validator_precomputed

ITERATIONS = 5


def benchmark(name, func, events, setup=None):
    """Run a function multiple times and report average timing."""
    times = []
    for i in range(ITERATIONS):
        if setup and i == 0:
            setup()
        start = time.perf_counter()
        result = func(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"{name:20s}: {avg:.3f}s avg (valid: {result})")
    return avg


def main():
    events = load_events()
    print(f"Processing {len(events)} events, {ITERATIONS} iterations each...")
    print()

    t_naive = benchmark("Naive", config_validator_naive.process_events, events)
    t_memo = benchmark(
        "Memoized",
        config_validator_memoized.process_events,
        events,
        setup=config_validator_memoized.validate_rule_cached.cache_clear,
    )
    t_pre = benchmark("Precomputed", config_validator_precomputed.process_events, events)

    print()
    print(f"Speedup (memo vs naive): {t_naive/t_memo:.1f}x")
    print(f"Speedup (pre vs memo):   {t_memo/t_pre:.1f}x")


if __name__ == "__main__":
    main()

config_validator_memoized.py

@ -0,0 +1,53 @@
#!/usr/bin/env python3
"""
Memoized config validator - uses @lru_cache.

Profile this to see the lru_cache wrapper overhead.

Usage:
    python3 config_validator_memoized.py
    python3 -m cProfile -s tottime config_validator_memoized.py
"""
import time
from functools import lru_cache

from common import validate_rule_slow, load_events


@lru_cache(maxsize=None)
def validate_rule_cached(rule_id, event_type):
    """Same validation but with caching."""
    return validate_rule_slow(rule_id, event_type)


def process_events(events):
    """Process events using memoized validation."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if validate_rule_cached(rule_id, event_type):
            valid_count += 1
    return valid_count


ITERATIONS = 5


def main():
    events = load_events()
    print(f"Processing {len(events)} events (memoized), {ITERATIONS} iterations...")
    times = []
    for i in range(ITERATIONS):
        if i == 0:
            validate_rule_cached.cache_clear()  # Cold start on first run
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")


if __name__ == "__main__":
    main()
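
The Further Reading section of the README mentions `functools.cache` (Python 3.9+). A drop-in variant of the decorator above would look like this (a sketch, not part of this commit):

```python
# Hypothetical variant: functools.cache is equivalent to lru_cache(maxsize=None).
from functools import cache

from common import validate_rule_slow

@cache
def validate_rule_cached(rule_id, event_type):
    return validate_rule_slow(rule_id, event_type)
```
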

config_validator_naive.py

@ -0,0 +1,44 @@
#!/usr/bin/env python3
"""
Naive config validator - no caching.

Profile this to see repeated validate_rule_slow calls.

Usage:
    python3 config_validator_naive.py
    python3 -m cProfile -s tottime config_validator_naive.py
"""
import time

from common import validate_rule_slow, load_events


def process_events(events):
    """Process events using naive repeated validation."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if validate_rule_slow(rule_id, event_type):
            valid_count += 1
    return valid_count


ITERATIONS = 5


def main():
    events = load_events()
    print(f"Processing {len(events)} events (naive), {ITERATIONS} iterations...")
    times = []
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")


if __name__ == "__main__":
    main()

config_validator_precomputed.py

@ -0,0 +1,65 @@
#!/usr/bin/env python3
"""
Precomputed config validator - uses 2D array lookup.

Profile this to see clean array indexing with no wrapper overhead.

Usage:
    python3 config_validator_precomputed.py
    python3 -m cProfile -s tottime config_validator_precomputed.py
"""
import time

from common import validate_rule_slow, load_events, RULES, EVENT_TYPES


def build_validation_table():
    """
    Build a 2D lookup table for all possible (rule_id, event_type) combinations.

    Array indexing is faster than hash-based lookup because:
    - No hash computation needed
    - Direct memory offset calculation
    - Better CPU cache locality
    """
    table = []
    for rule_id in range(max(RULES) + 1):
        row = []
        for event_type in range(max(EVENT_TYPES) + 1):
            row.append(validate_rule_slow(rule_id, event_type))
        table.append(row)
    return table


# Build table at module load time (simulates startup initialization)
VALIDATION_TABLE = build_validation_table()


def process_events(events):
    """Process events using precomputed 2D lookup table."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if VALIDATION_TABLE[rule_id][event_type]:
            valid_count += 1
    return valid_count


ITERATIONS = 5


def main():
    events = load_events()
    print(f"Processing {len(events)} events (precomputed), {ITERATIONS} iterations...")
    times = []
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")


if __name__ == "__main__":
    main()
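
The README's Further Reading also points at NumPy. A hypothetical NumPy version of the table (not part of this commit; assumes numpy is installed) is sketched below. Scalar indexing from a Python loop is usually no faster than nested lists; NumPy pays off when the whole event stream is validated in one vectorized operation:

```python
# Hypothetical NumPy variant of the validation table.
import numpy as np

from common import validate_rule_slow, RULES, EVENT_TYPES

NP_TABLE = np.array(
    [[validate_rule_slow(r, e) for e in EVENT_TYPES] for r in RULES],
    dtype=bool,
)

def count_valid(events):
    # One fancy-indexing operation checks every event at once.
    rule_ids = np.fromiter((e[0] for e in events), dtype=np.intp)
    event_types = np.fromiter((e[1] for e in events), dtype=np.intp)
    return int(NP_TABLE[rule_ids, event_types].sum())
```
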

generate_events.py

@ -0,0 +1,42 @@
#!/usr/bin/env python3
"""
Generate test events and save to file.

Run this before profiling the validator scripts.
"""
import pickle
import random
import sys

from common import RULES, EVENT_TYPES, EVENTS_FILE


def generate_test_events(n):
    """Generate n random test events."""
    random.seed(42)  # Reproducible
    events = []
    for i in range(n):
        rule_id = random.choice(RULES)
        event_type = random.choice(EVENT_TYPES)
        data = f"event_{i}"
        events.append((rule_id, event_type, data))
    return events


def main():
    n_events = 100000
    if len(sys.argv) > 1:
        n_events = int(sys.argv[1])
    print(f"Generating {n_events} events...")
    events = generate_test_events(n_events)
    with open(EVENTS_FILE, "wb") as f:
        pickle.dump(events, f)
    print(f"Saved to {EVENTS_FILE}")
    print(f"Unique (rule, event_type) combinations: {len(RULES) * len(EVENT_TYPES)}")


if __name__ == "__main__":
    main()