diff --git a/.gitignore b/.gitignore
index 240bdee..3bd3edd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,6 @@
 *~
 result
 *.qcow2
+events.pkl
+__pycache__
+*.svg
diff --git a/scenario2-memoization/README.md b/scenario2-memoization/README.md
index 2b2cfb5..53ba802 100644
--- a/scenario2-memoization/README.md
+++ b/scenario2-memoization/README.md
@@ -1,96 +1,162 @@
 # Scenario 2: Memoization and Precomputation
 
 ## Learning Objectives
-- Read cProfile output to identify redundant function calls
-- Use `@functools.lru_cache` for automatic memoization
-- Recognize when precomputation beats memoization
-- Understand space-time trade-offs
+- Use cProfile to identify performance bottlenecks
+- Recognize when `@lru_cache` becomes a bottleneck itself
+- Understand when precomputation beats memoization
+- Learn to read profiler output to guide optimization decisions
 
 ## Files
+
+### Fibonacci Example
 - `fib_slow.py` - Naive recursive Fibonacci (exponential time)
 - `fib_cached.py` - Memoized Fibonacci (linear time)
-- `config_validator.py` - Comparison of naive, memoized, and precomputed approaches
 
-## Exercise 1: Fibonacci
+### Config Validator Example
+- `generate_events.py` - Generate test data (run first)
+- `config_validator_naive.py` - Baseline: no caching
+- `config_validator_memoized.py` - Uses `@lru_cache`
+- `config_validator_precomputed.py` - Uses 2D array lookup
+- `config_validator.py` - Comparison runner
+- `common.py` - Shared code
+
+---
+
+## Exercise 1: Fibonacci (Identifying Redundant Calls)
 
 ### Step 1: Experience the slowness
 ```bash
 time python3 fib_slow.py 35
 ```
-
-This should take several seconds. Don't try n=50!
+This takes several seconds. Don't try n=50!
 
 ### Step 2: Profile to understand why
 ```bash
-python3 -m cProfile -s ncalls fib_slow.py 35 2>&1 | head -20
+python3 -m cProfile -s ncalls fib_slow.py 35
 ```
 
-Key insight: Look at `ncalls` for the `fib` function. For fib(35), it's called
-millions of times because we recompute the same values repeatedly.
+Look at `ncalls` for the `fib` function - it's called millions of times because
+the same values are recomputed repeatedly.
 
-The call tree looks like:
-```
-fib(5)
-├── fib(4)
-│   ├── fib(3)
-│   │   ├── fib(2)
-│   │   └── fib(1)
-│   └── fib(2)
-└── fib(3)   <-- Same as above! Redundant!
-    ├── fib(2)
-    └── fib(1)
-```
-
-### Step 3: Apply memoization
+### Step 3: Apply memoization and verify
 ```bash
 time python3 fib_cached.py 35
+python3 -m cProfile -s ncalls fib_cached.py 35
 ```
 
-Now try a much larger value:
+The `ncalls` drops from millions to ~35.
+
+---
+
+## Exercise 2: Config Validator (When Caching Becomes the Bottleneck)
+
+This exercise demonstrates a common pattern: you add caching, get a big speedup,
+but then discover the cache itself is now the bottleneck.
+
+### Step 1: Generate test data
 ```bash
-time python3 fib_cached.py 100
+python3 generate_events.py 100000
 ```
 
-### Step 4: Verify the improvement
+### Step 2: Profile the naive version
 ```bash
-python3 -m cProfile -s ncalls fib_cached.py 35 2>&1 | head -20
+python3 -m cProfile -s tottime config_validator_naive.py
 ```
 
-The `ncalls` should now be O(n) instead of O(2^n).
+**What to look for:** `validate_rule_slow` dominates the profile. It's called
+100,000 times even though there are only 400 unique input combinations.
 
-## Exercise 2: Config Validator
-
-This example shows when precomputation is better than memoization.
-
-### Run all three strategies
+### Step 3: Add memoization - big improvement!
 ```bash
-python3 config_validator.py 5000
+python3 -m cProfile -s tottime config_validator_memoized.py
 ```
 
-### Profile to understand the differences
+**Observation:** Dramatic speedup! But look carefully at the profile...
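To see why even cache hits cost something, here is a rough pure-Python equivalent of what `@lru_cache(maxsize=None)` does on every call (the real CPython implementation is in C, but the per-hit bookkeeping is similar; this sketch is illustrative, not code from this repo):

```python
def memoize(func):
    """Rough stand-in for @lru_cache(maxsize=None)."""
    cache = {}

    def wrapper(*args):
        # Even a cache hit pays for: packing args into a tuple,
        # hashing that tuple, and a dict lookup.
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result

    return wrapper


@memoize
def validate(rule_id, event_type):
    # Cheap stand-in for the real validation check.
    return (rule_id * event_type) % 2 == 0


assert validate(3, 4) is True   # first call: computed, then cached
assert validate(3, 4) is True   # second call: cache hit, but still hashed
```

When the wrapped function is cheap, this per-call overhead is what shows up in the profile.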
+
+### Step 4: Identify the new bottleneck
+
+Compare `process_events` time between memoized and precomputed:
 ```bash
-python3 -m cProfile -s cumtime config_validator.py 5000
+python3 -m cProfile -s tottime config_validator_memoized.py
+python3 -m cProfile -s tottime config_validator_precomputed.py
 ```
 
-### Discussion Questions
-1. Why is precomputation faster than memoization here?
-   - Hint: How many unique inputs are there?
-   - Hint: What's the overhead of cache lookup vs dict lookup?
+**Key insight:** Compare the `process_events` tottime:
+- Memoized: ~0.014s
+- Precomputed: ~0.004s (3.5x faster!)
 
-2. When would memoization be better than precomputation?
-   - Hint: What if there were 10,000 rules and 10,000 event types?
-   - Hint: What if we didn't know the inputs ahead of time?
+The cache lookup overhead now dominates because:
+- The validation function is cheap (only 50 iterations)
+- But we do 100,000 cache lookups
+- Each lookup involves: tuple creation for the key, hashing, dict lookup
 
-3. What's the memory trade-off?
+### Step 5: Hypothesis - can we beat the cache?
 
-## Key Takeaways
+When the input space is **small and bounded** (400 combinations), we can:
+1. Precompute all results into a 2D array
+2. Use array indexing instead of hash-based lookup
+
+Array indexing is faster because:
+- No hash computation
+- Direct memory offset calculation
+- Better CPU cache locality
+
+### Step 6: Profile the precomputed version
+```bash
+python3 -m cProfile -s tottime config_validator_precomputed.py
+```
+
+**Observation:** No wrapper overhead. Clean array indexing in `process_events`.
+
+### Step 7: Compare all three
+```bash
+python3 config_validator.py
+```
+
+Expected output shows precomputed ~2x faster than memoized.
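The array-vs-hash claim in Steps 4-6 can be checked outside the repo with a small stand-alone microbenchmark (all names here are illustrative, not files from this project):

```python
import timeit
from functools import lru_cache


@lru_cache(maxsize=None)
def validate_cached(rule_id, event_type):
    # Cheap stand-in for validate_rule_slow, so lookup cost dominates.
    return (rule_id * event_type) % 3 == 0


# Precompute the full 20x20 answer space into a nested list.
TABLE = [[validate_cached(r, e) for e in range(20)] for r in range(20)]

# 100k synthetic events cycling over the 400 input combinations.
EVENTS = [(i % 20, (i * 7) % 20) for i in range(100_000)]


def count_memoized():
    return sum(1 for r, e in EVENTS if validate_cached(r, e))


def count_precomputed():
    return sum(1 for r, e in EVENTS if TABLE[r][e])


assert count_memoized() == count_precomputed()  # same answers either way
print("memoized:   ", timeit.timeit(count_memoized, number=3))
print("precomputed:", timeit.timeit(count_precomputed, number=3))
```

The exact ratio varies by machine, but the precomputed loop avoids the tuple creation and hashing that every `lru_cache` hit performs.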
+
+---
+
+## Key Profiling Techniques
+
+### Finding where time is spent
+```bash
+python3 -m cProfile -s tottime script.py   # Sort by time in function itself
+python3 -m cProfile -s cumtime script.py   # Sort by cumulative time (includes callees)
+```
+
+### Understanding the columns
+- `ncalls`: Number of calls
+- `tottime`: Time spent in function (excluding callees)
+- `cumtime`: Time spent in function (including callees)
+- `percall`: Time per call
+
+---
+
+## When to Use Each Approach
 
 | Approach | When to Use |
 |----------|-------------|
-| No caching | Function is cheap OR called once per input |
-| Memoization | Unknown/large input space, function is expensive |
-| Precomputation | Known/small input space, amortize cost over many lookups |
+| No caching | Function is cheap OR each input seen only once |
+| Memoization (`@lru_cache`) | Unknown/large input space, expensive function |
+| Precomputation | Known/small input space, many lookups, bounded integers |
+
+---
+
+## Discussion Questions
+
+1. Why does `@lru_cache` have overhead?
+   - Hint: What happens on each call even for cache hits?
+
+2. When would memoization beat precomputation?
+   - Hint: What if there were 10,000 x 10,000 possible inputs but you only see 100?
+
+3. Could we make precomputation even faster?
+   - Hint: What about a flat array with `table[rule_id * 20 + event_type]`?
+
+---
 
 ## Further Reading
 - `functools.lru_cache` documentation
-- `functools.cache` (Python 3.9+) - unbounded cache, simpler API
+- `functools.cache` (Python 3.9+) - unbounded cache, slightly less overhead
+- NumPy arrays for lower-overhead lookups across many events at once
diff --git a/scenario2-memoization/common.py b/scenario2-memoization/common.py
new file mode 100644
index 0000000..3394907
--- /dev/null
+++ b/scenario2-memoization/common.py
@@ -0,0 +1,35 @@
+#!/usr/bin/env python3
+"""
+Shared code for config validator examples.
+"""
+
+import pickle
+from pathlib import Path
+
+# The set of all valid (rule_id, event_type) pairs we'll encounter
+RULES = range(20)        # 0-19 (small, bounded input space)
+EVENT_TYPES = range(20)  # 0-19
+
+EVENTS_FILE = Path(__file__).parent / "events.pkl"
+
+
+def validate_rule_slow(rule_id, event_type):
+    """
+    Simulate an expensive validation check.
+    In real life, this might query a database, parse XML, etc.
+    """
+    total = 0
+    for i in range(50):
+        total += (rule_id * event_type * i) % 997
+    return total % 2 == 0
+
+
+def load_events():
+    """Load events from the pickle file."""
+    if not EVENTS_FILE.exists():
+        raise FileNotFoundError(
+            f"Events file not found: {EVENTS_FILE}\n"
+            "Run 'python3 generate_events.py' first."
+        )
+    with open(EVENTS_FILE, "rb") as f:
+        return pickle.load(f)
diff --git a/scenario2-memoization/config_validator.py b/scenario2-memoization/config_validator.py
index 784db3c..ce3d800 100644
--- a/scenario2-memoization/config_validator.py
+++ b/scenario2-memoization/config_validator.py
@@ -1,147 +1,59 @@
 #!/usr/bin/env python3
 """
-Scenario 2b: The Precomputation Insight
-=======================================
-This simulates a config validator that checks rules against events.
-The "expensive" validation function is called repeatedly with the same inputs.
+Config Validator Comparison
+===========================
+Runs all three validation strategies and compares performance.
 
-This example shows three stages of optimization:
-1. Naive: call the function every time
-2. Memoized: cache results with @lru_cache
-3. Precomputed: since inputs are known ahead of time, build a lookup table
+Run generate_events.py first to create test data.
 
-EXERCISES:
-1. Run each version and compare times
-2. Profile each version - observe ncalls and cumtime
-3. Think about: when is precomputation better than memoization?
+
+Usage:
+    python3 generate_events.py 100000
+    python3 config_validator.py
 """
 
-import sys
 import time
-from functools import lru_cache
+
+from common import load_events
+
+import config_validator_naive
+import config_validator_memoized
+import config_validator_precomputed
 
 
-# Simulated "expensive" validation function
-def validate_rule_slow(rule_id, event_type):
-    """
-    Simulate an expensive validation check.
-    In real life, this might query a database, parse XML, etc.
-    """
-    # Artificial delay to simulate expensive computation
-    total = 0
-    for i in range(10000):
-        total += (rule_id * event_type * i) % 997
-    return total % 2 == 0  # Returns True or False
+ITERATIONS = 5
 
 
-# The set of all valid (rule_id, event_type) pairs we'll encounter
-RULES = [1, 2, 3, 4, 5]
-EVENT_TYPES = [10, 20, 30, 40, 50]
+def benchmark(name, func, events, setup=None):
+    """Run a function multiple times and report average timing."""
+    times = []
+    for i in range(ITERATIONS):
+        if setup and i == 0:
+            setup()
+        start = time.perf_counter()
+        result = func(events)
+        times.append(time.perf_counter() - start)
 
-
-def process_events_naive(events):
-    """Process events using naive repeated validation."""
-    valid_count = 0
-    for rule_id, event_type, data in events:
-        if validate_rule_slow(rule_id, event_type):
-            valid_count += 1
-    return valid_count
-
-
-# Memoized version
-@lru_cache(maxsize=None)
-def validate_rule_cached(rule_id, event_type):
-    """Same validation but with caching."""
-    total = 0
-    for i in range(10000):
-        total += (rule_id * event_type * i) % 997
-    return total % 2 == 0
-
-
-def process_events_memoized(events):
-    """Process events using memoized validation."""
-    valid_count = 0
-    for rule_id, event_type, data in events:
-        if validate_rule_cached(rule_id, event_type):
-            valid_count += 1
-    return valid_count
-
-
-# Precomputed version
-def build_validation_table():
-    """
-    Build a lookup table for all possible (rule_id, event_type) combinations.
-    This is O(n*m) upfront but O(1) per lookup thereafter.
-    """
-    table = {}
-    for rule_id in RULES:
-        for event_type in EVENT_TYPES:
-            table[(rule_id, event_type)] = validate_rule_slow(rule_id, event_type)
-    return table
-
-
-VALIDATION_TABLE = None  # Lazy initialization
-
-
-def process_events_precomputed(events):
-    """Process events using precomputed lookup table."""
-    global VALIDATION_TABLE
-    if VALIDATION_TABLE is None:
-        VALIDATION_TABLE = build_validation_table()
-
-    valid_count = 0
-    for rule_id, event_type, data in events:
-        if VALIDATION_TABLE[(rule_id, event_type)]:
-            valid_count += 1
-    return valid_count
-
-
-def generate_test_events(n):
-    """Generate n random test events."""
-    import random
-    random.seed(42)  # Reproducible
-    events = []
-    for i in range(n):
-        rule_id = random.choice(RULES)
-        event_type = random.choice(EVENT_TYPES)
-        data = f"event_{i}"
-        events.append((rule_id, event_type, data))
-    return events
-
-
-def benchmark(name, func, events):
-    """Run a function and report timing."""
-    start = time.perf_counter()
-    result = func(events)
-    elapsed = time.perf_counter() - start
-    print(f"{name:20s}: {elapsed:.3f}s (valid: {result})")
-    return elapsed
+    avg = sum(times) / len(times)
+    print(f"{name:20s}: {avg:.3f}s avg (valid: {result})")
+    return avg
 
 
 def main():
-    n_events = 5000
-    if len(sys.argv) > 1:
-        n_events = int(sys.argv[1])
-
-    print(f"Processing {n_events} events...")
-    print(f"Unique (rule, event_type) combinations: {len(RULES) * len(EVENT_TYPES)}")
+    events = load_events()
+    print(f"Processing {len(events)} events, {ITERATIONS} iterations each...")
     print()
-
-    events = generate_test_events(n_events)
-
-    # Reset cached function for fair comparison
-    validate_rule_cached.cache_clear()
-    global VALIDATION_TABLE
-    VALIDATION_TABLE = None
-
-    t_naive = benchmark("Naive", process_events_naive, events)
-
-    validate_rule_cached.cache_clear()
-    t_memo = benchmark("Memoized", process_events_memoized, events)
-
-    VALIDATION_TABLE = None
-    t_pre = benchmark("Precomputed", process_events_precomputed, events)
-
+
+    t_naive = benchmark("Naive", config_validator_naive.process_events, events)
+
+    t_memo = benchmark(
+        "Memoized",
+        config_validator_memoized.process_events,
+        events,
+        setup=config_validator_memoized.validate_rule_cached.cache_clear
+    )
+
+    t_pre = benchmark("Precomputed", config_validator_precomputed.process_events, events)
+
     print()
     print(f"Speedup (memo vs naive): {t_naive/t_memo:.1f}x")
     print(f"Speedup (precomp vs naive): {t_naive/t_pre:.1f}x")
diff --git a/scenario2-memoization/config_validator_memoized.py b/scenario2-memoization/config_validator_memoized.py
new file mode 100644
index 0000000..ec08a36
--- /dev/null
+++ b/scenario2-memoization/config_validator_memoized.py
@@ -0,0 +1,53 @@
+#!/usr/bin/env python3
+"""
+Memoized config validator - uses @lru_cache.
+Profile this to see the lru_cache wrapper overhead.
+
+Usage:
+    python3 config_validator_memoized.py
+    python3 -m cProfile -s tottime config_validator_memoized.py
+"""
+
+import time
+from functools import lru_cache
+
+from common import validate_rule_slow, load_events
+
+
+@lru_cache(maxsize=None)
+def validate_rule_cached(rule_id, event_type):
+    """Same validation but with caching."""
+    return validate_rule_slow(rule_id, event_type)
+
+
+def process_events(events):
+    """Process events using memoized validation."""
+    valid_count = 0
+    for rule_id, event_type, data in events:
+        if validate_rule_cached(rule_id, event_type):
+            valid_count += 1
+    return valid_count
+
+
+ITERATIONS = 5
+
+
+def main():
+    events = load_events()
+    print(f"Processing {len(events)} events (memoized), {ITERATIONS} iterations...")
+
+    times = []
+    for i in range(ITERATIONS):
+        if i == 0:
+            validate_rule_cached.cache_clear()  # Cold start on first run
+        start = time.perf_counter()
+        valid_count = process_events(events)
+        times.append(time.perf_counter() - start)
+
+    avg = sum(times) / len(times)
+    print(f"Valid: {valid_count}")
+    print(f"Avg time: {avg:.3f}s")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scenario2-memoization/config_validator_naive.py b/scenario2-memoization/config_validator_naive.py
new file mode 100644
index 0000000..53483c3
--- /dev/null
+++ b/scenario2-memoization/config_validator_naive.py
@@ -0,0 +1,44 @@
+#!/usr/bin/env python3
+"""
+Naive config validator - no caching.
+Profile this to see repeated validate_rule_slow calls.
+
+Usage:
+    python3 config_validator_naive.py
+    python3 -m cProfile -s tottime config_validator_naive.py
+"""
+
+import time
+
+from common import validate_rule_slow, load_events
+
+
+def process_events(events):
+    """Process events using naive repeated validation."""
+    valid_count = 0
+    for rule_id, event_type, data in events:
+        if validate_rule_slow(rule_id, event_type):
+            valid_count += 1
+    return valid_count
+
+
+ITERATIONS = 5
+
+
+def main():
+    events = load_events()
+    print(f"Processing {len(events)} events (naive), {ITERATIONS} iterations...")
+
+    times = []
+    for _ in range(ITERATIONS):
+        start = time.perf_counter()
+        valid_count = process_events(events)
+        times.append(time.perf_counter() - start)
+
+    avg = sum(times) / len(times)
+    print(f"Valid: {valid_count}")
+    print(f"Avg time: {avg:.3f}s")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scenario2-memoization/config_validator_precomputed.py b/scenario2-memoization/config_validator_precomputed.py
new file mode 100644
index 0000000..605b54e
--- /dev/null
+++ b/scenario2-memoization/config_validator_precomputed.py
@@ -0,0 +1,65 @@
+#!/usr/bin/env python3
+"""
+Precomputed config validator - uses 2D array lookup.
+Profile this to see clean array indexing with no wrapper overhead.
+
+Usage:
+    python3 config_validator_precomputed.py
+    python3 -m cProfile -s tottime config_validator_precomputed.py
+"""
+
+import time
+
+from common import validate_rule_slow, load_events, RULES, EVENT_TYPES
+
+
+def build_validation_table():
+    """
+    Build a 2D lookup table for all possible (rule_id, event_type) combinations.
+    Array indexing is faster than hash-based lookup because:
+    - No hash computation needed
+    - Direct memory offset calculation
+    - Better CPU cache locality
+    """
+    table = []
+    for rule_id in range(max(RULES) + 1):
+        row = []
+        for event_type in range(max(EVENT_TYPES) + 1):
+            row.append(validate_rule_slow(rule_id, event_type))
+        table.append(row)
+    return table
+
+
+# Build table at module load time (simulates startup initialization)
+VALIDATION_TABLE = build_validation_table()
+
+
+def process_events(events):
+    """Process events using precomputed 2D lookup table."""
+    valid_count = 0
+    for rule_id, event_type, data in events:
+        if VALIDATION_TABLE[rule_id][event_type]:
+            valid_count += 1
+    return valid_count
+
+
+ITERATIONS = 5
+
+
+def main():
+    events = load_events()
+    print(f"Processing {len(events)} events (precomputed), {ITERATIONS} iterations...")
+
+    times = []
+    for _ in range(ITERATIONS):
+        start = time.perf_counter()
+        valid_count = process_events(events)
+        times.append(time.perf_counter() - start)
+
+    avg = sum(times) / len(times)
+    print(f"Valid: {valid_count}")
+    print(f"Avg time: {avg:.3f}s")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scenario2-memoization/generate_events.py b/scenario2-memoization/generate_events.py
new file mode 100644
index 0000000..4c5887c
--- /dev/null
+++ b/scenario2-memoization/generate_events.py
@@ -0,0 +1,42 @@
+#!/usr/bin/env python3
+"""
+Generate test events and save to file.
+Run this before profiling the validator scripts.
+"""
+
+import pickle
+import random
+import sys
+
+from common import RULES, EVENT_TYPES, EVENTS_FILE
+
+
+def generate_test_events(n):
+    """Generate n random test events."""
+    random.seed(42)  # Reproducible
+    events = []
+    for i in range(n):
+        rule_id = random.choice(RULES)
+        event_type = random.choice(EVENT_TYPES)
+        data = f"event_{i}"
+        events.append((rule_id, event_type, data))
+    return events
+
+
+def main():
+    n_events = 100000
+    if len(sys.argv) > 1:
+        n_events = int(sys.argv[1])
+
+    print(f"Generating {n_events} events...")
+    events = generate_test_events(n_events)
+
+    with open(EVENTS_FILE, "wb") as f:
+        pickle.dump(events, f)
+
+    print(f"Saved to {EVENTS_FILE}")
+    print(f"Unique (rule, event_type) combinations: {len(RULES) * len(EVENT_TYPES)}")
+
+
+if __name__ == "__main__":
+    main()
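Discussion question 3 in the README hints at a flat-array layout. A minimal sketch of that idea, reusing the validation logic from `common.py` (the helper names `FLAT_TABLE` and `is_valid` are hypothetical, not files or functions in this repo):

```python
# Collapse the 2D table into one flat list so a lookup is a single
# multiply-add plus one indexing operation instead of two chained indexings.
N_RULES, N_EVENT_TYPES = 20, 20


def validate_rule_slow(rule_id, event_type):
    # Same expensive check as in common.py.
    total = 0
    for i in range(50):
        total += (rule_id * event_type * i) % 997
    return total % 2 == 0


# Row-major layout: entry for (r, e) lives at index r * N_EVENT_TYPES + e.
FLAT_TABLE = [
    validate_rule_slow(r, e)
    for r in range(N_RULES)
    for e in range(N_EVENT_TYPES)
]


def is_valid(rule_id, event_type):
    return FLAT_TABLE[rule_id * N_EVENT_TYPES + event_type]


assert is_valid(3, 7) == validate_rule_slow(3, 7)
```

Whether the flat layout actually beats the nested list in CPython is worth measuring rather than assuming; the index arithmetic trades one list indexing for a multiply and an add.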