scenario 2: improve lrucache vs list bit

This commit is contained in:
illustris 2026-01-10 19:50:13 +05:30
parent 7e8b1191fa
commit 596ae02dd4
Signed by: illustris
GPG Key ID: 56C8FC0B899FEFA3
8 changed files with 399 additions and 179 deletions

.gitignore

@ -1,3 +1,6 @@
*~
result
*.qcow2
events.pkl
__pycache__
*.svg


@ -1,96 +1,162 @@
# Scenario 2: Memoization and Precomputation
## Learning Objectives
- Use cProfile to identify performance bottlenecks
- Use `@functools.lru_cache` for automatic memoization
- Recognize when `@lru_cache` becomes a bottleneck itself
- Understand when precomputation beats memoization
- Understand space-time trade-offs
## Files
### Fibonacci Example
- `fib_slow.py` - Naive recursive Fibonacci (exponential time)
- `fib_cached.py` - Memoized Fibonacci (linear time)
### Config Validator Example
- `generate_events.py` - Generate test data (run first)
- `config_validator_naive.py` - Baseline: no caching
- `config_validator_memoized.py` - Uses `@lru_cache`
- `config_validator_precomputed.py` - Uses 2D array lookup
- `config_validator.py` - Comparison runner
- `common.py` - Shared code
---
## Exercise 1: Fibonacci (Identifying Redundant Calls)
### Step 1: Experience the slowness
```bash
time python3 fib_slow.py 35
```
This takes several seconds. Don't try n=50!
### Step 2: Profile to understand why
```bash
python3 -m cProfile -s ncalls fib_slow.py 35
```
Look at `ncalls` for the `fib` function - it's called millions of times because
the same values are recomputed repeatedly.
The call tree looks like:
```
fib(5)
├── fib(4)
│ ├── fib(3)
│ │ ├── fib(2)
│ │ └── fib(1)
│ └── fib(2)
└── fib(3) <-- Same as above! Redundant!
├── fib(2)
└── fib(1)
```
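`fib_slow.py` itself is not shown in this diff; a minimal sketch of the naive recursion it presumably contains, with a call counter to make the redundancy concrete:

```python
def fib(n: int) -> int:
    """Naive recursive Fibonacci: exponential time, subproblems recomputed."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# Counting calls shows the blow-up that cProfile's ncalls column reports:
calls = 0

def fib_counted(n: int) -> int:
    global calls
    calls += 1
    if n < 2:
        return n
    return fib_counted(n - 1) + fib_counted(n - 2)

fib_counted(20)
print(calls)  # 21891 calls just for n=20
```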
### Step 3: Apply memoization and verify
```bash
time python3 fib_cached.py 35
python3 -m cProfile -s ncalls fib_cached.py 35
```
The `ncalls` drops from millions to ~35.
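`fib_cached.py` is also not in the diff; presumably it is just the naive function wrapped in `@lru_cache`. `cache_info()` confirms the linear call count:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(35))           # 9227465, computed in milliseconds
print(fib.cache_info())  # 36 misses (one per distinct n from 0 to 35), the rest are hits
```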
---
## Exercise 2: Config Validator (When Caching Becomes the Bottleneck)
This exercise demonstrates a common pattern: you add caching, get a big speedup,
but then discover the cache itself is now the bottleneck.
### Step 1: Generate test data
```bash
python3 generate_events.py 100000
```
### Step 2: Profile the naive version
```bash
python3 -m cProfile -s tottime config_validator_naive.py
```
**What to look for:** `validate_rule_slow` dominates the profile. It's called
100,000 times even though there are only 400 unique input combinations.
### Step 3: Add memoization - big improvement!
```bash
python3 -m cProfile -s tottime config_validator_memoized.py
```
**Observation:** Dramatic speedup! But look carefully at the profile...
### Step 4: Identify the new bottleneck
Compare `process_events` time between memoized and precomputed:
```bash
python3 -m cProfile -s tottime config_validator_memoized.py
python3 -m cProfile -s tottime config_validator_precomputed.py
```
**Key insight:** Compare the `process_events` tottime:
- Memoized: ~0.014s
- Precomputed: ~0.004s (3.5x faster!)
The cache lookup overhead now dominates because:
- The validation function is cheap (only 50 iterations)
- But we do 100,000 cache lookups
- Each lookup involves: tuple creation for the key, hashing, dict lookup
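A micro-benchmark of that effect (names and sizes here are illustrative, not code from this repo): even when every call is a cache hit, `lru_cache` still pays for the wrapper call and argument hashing, while a nested list pays only for two index operations. Absolute numbers vary by machine.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def validate(rule_id, event_type):
    # Trivial stand-in for a validation check
    return (rule_id * event_type) % 2 == 0

# Precompute the same 20x20 answer space into a nested list
TABLE = [[validate(r, e) for e in range(20)] for r in range(20)]

pairs = [(i % 20, (i * 7) % 20) for i in range(100_000)]

start = time.perf_counter()
hits = sum(1 for r, e in pairs if validate(r, e))   # every call is a cache hit
t_cache = time.perf_counter() - start

start = time.perf_counter()
hits2 = sum(1 for r, e in pairs if TABLE[r][e])     # plain list indexing
t_table = time.perf_counter() - start

assert hits == hits2
print(f"lru_cache hits: {t_cache:.4f}s  table lookups: {t_table:.4f}s")
```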
### Step 5: Hypothesis - can we beat the cache?
When the input space is **small and bounded** (400 combinations), we can:
1. Precompute all results into a 2D array
2. Use array indexing instead of hash-based lookup
Array indexing is faster because:
- No hash computation
- Direct memory offset calculation
- Better CPU cache locality
### Step 6: Profile the precomputed version
```bash
python3 -m cProfile -s tottime config_validator_precomputed.py
```
**Observation:** No wrapper overhead. Clean array indexing in `process_events`.
### Step 7: Compare all three
```bash
python3 config_validator.py
```
Expected output shows precomputed ~2x faster than memoized.
---
## Key Profiling Techniques
### Finding where time is spent
```bash
python3 -m cProfile -s tottime script.py # Sort by time in function itself
python3 -m cProfile -s cumtime script.py # Sort by cumulative time (includes callees)
```
### Understanding the columns
- `ncalls`: Number of calls
- `tottime`: Time spent in function (excluding callees)
- `cumtime`: Time spent in function (including callees)
- `percall`: Time per call
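The same columns are available programmatically via `pstats`, which is handy when you want to profile one function instead of a whole script; a minimal sketch:

```python
import cProfile
import io
import pstats

def work():
    # Toy workload to profile
    return sum(sum(range(i)) for i in range(500))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("tottime").print_stats(5)
print(buf.getvalue())  # same ncalls/tottime/cumtime columns as the CLI
```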
---
## When to Use Each Approach
| Approach | When to Use |
|----------|-------------|
| No caching | Function is cheap OR each input seen only once |
| Memoization (`@lru_cache`) | Unknown/large input space, expensive function |
| Precomputation | Known/small input space, many lookups, bounded integers |
---
## Discussion Questions
1. Why does `@lru_cache` have overhead?
- Hint: What happens on each call even for cache hits?
2. When would memoization beat precomputation?
- Hint: What if there were 10,000 x 10,000 possible inputs but you only see 100?
3. Could we make precomputation even faster?
- Hint: What about a flat array with `table[rule_id * 20 + event_type]`?
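As a concrete take on question 3 (a hypothetical variant, not code from this repo): the flat table replaces one level of list indirection with a single index computation.

```python
N_RULES, N_EVENT_TYPES = 20, 20

def validate(rule_id, event_type):
    # Stand-in for validate_rule_slow from common.py
    total = sum((rule_id * event_type * i) % 997 for i in range(50))
    return total % 2 == 0

# One flat list; cell (r, e) lives at index r * N_EVENT_TYPES + e
FLAT = [validate(r, e) for r in range(N_RULES) for e in range(N_EVENT_TYPES)]

def lookup(rule_id, event_type):
    return FLAT[rule_id * N_EVENT_TYPES + event_type]

assert lookup(3, 7) == validate(3, 7)
```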
---
## Further Reading
- `functools.lru_cache` documentation
- `functools.cache` (Python 3.9+) - unbounded cache, slightly less overhead
- NumPy arrays for truly O(1) array access


@ -0,0 +1,35 @@
#!/usr/bin/env python3
"""
Shared code for config validator examples.
"""
import pickle
from pathlib import Path

# The set of all valid (rule_id, event_type) pairs we'll encounter
RULES = range(20)        # 0-19 (small, bounded input space)
EVENT_TYPES = range(20)  # 0-19

EVENTS_FILE = Path(__file__).parent / "events.pkl"

def validate_rule_slow(rule_id, event_type):
    """
    Simulate an expensive validation check.
    In real life, this might query a database, parse XML, etc.
    """
    total = 0
    for i in range(50):
        total += (rule_id * event_type * i) % 997
    return total % 2 == 0

def load_events():
    """Load events from the pickle file."""
    if not EVENTS_FILE.exists():
        raise FileNotFoundError(
            f"Events file not found: {EVENTS_FILE}\n"
            "Run 'python3 generate_events.py' first."
        )
    with open(EVENTS_FILE, "rb") as f:
        return pickle.load(f)


@ -1,146 +1,58 @@
#!/usr/bin/env python3
"""
Config Validator Comparison
===========================
Runs all three validation strategies and compares performance.
Run generate_events.py first to create test data.
Usage:
    python3 generate_events.py 100000
    python3 config_validator.py
"""
import time

from common import load_events
import config_validator_naive
import config_validator_memoized
import config_validator_precomputed

ITERATIONS = 5

def benchmark(name, func, events, setup=None):
    """Run a function multiple times and report average timing."""
    times = []
    for i in range(ITERATIONS):
        if setup and i == 0:
            setup()
        start = time.perf_counter()
        result = func(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"{name:20s}: {avg:.3f}s avg (valid: {result})")
    return avg

def main():
    events = load_events()
    print(f"Processing {len(events)} events, {ITERATIONS} iterations each...")
    print()
    t_naive = benchmark("Naive", config_validator_naive.process_events, events)
    t_memo = benchmark(
        "Memoized",
        config_validator_memoized.process_events,
        events,
        setup=config_validator_memoized.validate_rule_cached.cache_clear,
    )
    t_pre = benchmark("Precomputed", config_validator_precomputed.process_events, events)
    print()
    print(f"Speedup (memo vs naive): {t_naive/t_memo:.1f}x")


@ -0,0 +1,53 @@
#!/usr/bin/env python3
"""
Memoized config validator - uses @lru_cache.
Profile this to see the lru_cache wrapper overhead.
Usage:
    python3 config_validator_memoized.py
    python3 -m cProfile -s tottime config_validator_memoized.py
"""
import time
from functools import lru_cache

from common import validate_rule_slow, load_events

@lru_cache(maxsize=None)
def validate_rule_cached(rule_id, event_type):
    """Same validation but with caching."""
    return validate_rule_slow(rule_id, event_type)

def process_events(events):
    """Process events using memoized validation."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if validate_rule_cached(rule_id, event_type):
            valid_count += 1
    return valid_count

ITERATIONS = 5

def main():
    events = load_events()
    print(f"Processing {len(events)} events (memoized), {ITERATIONS} iterations...")
    times = []
    for i in range(ITERATIONS):
        if i == 0:
            validate_rule_cached.cache_clear()  # Cold start on first run
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")

if __name__ == "__main__":
    main()


@ -0,0 +1,44 @@
#!/usr/bin/env python3
"""
Naive config validator - no caching.
Profile this to see repeated validate_rule_slow calls.
Usage:
    python3 config_validator_naive.py
    python3 -m cProfile -s tottime config_validator_naive.py
"""
import time

from common import validate_rule_slow, load_events

def process_events(events):
    """Process events using naive repeated validation."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if validate_rule_slow(rule_id, event_type):
            valid_count += 1
    return valid_count

ITERATIONS = 5

def main():
    events = load_events()
    print(f"Processing {len(events)} events (naive), {ITERATIONS} iterations...")
    times = []
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")

if __name__ == "__main__":
    main()


@ -0,0 +1,65 @@
#!/usr/bin/env python3
"""
Precomputed config validator - uses 2D array lookup.
Profile this to see clean array indexing with no wrapper overhead.
Usage:
    python3 config_validator_precomputed.py
    python3 -m cProfile -s tottime config_validator_precomputed.py
"""
import time

from common import validate_rule_slow, load_events, RULES, EVENT_TYPES

def build_validation_table():
    """
    Build a 2D lookup table for all possible (rule_id, event_type) combinations.
    Array indexing is faster than hash-based lookup because:
    - No hash computation needed
    - Direct memory offset calculation
    - Better CPU cache locality
    """
    table = []
    for rule_id in range(max(RULES) + 1):
        row = []
        for event_type in range(max(EVENT_TYPES) + 1):
            row.append(validate_rule_slow(rule_id, event_type))
        table.append(row)
    return table

# Build table at module load time (simulates startup initialization)
VALIDATION_TABLE = build_validation_table()

def process_events(events):
    """Process events using precomputed 2D lookup table."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if VALIDATION_TABLE[rule_id][event_type]:
            valid_count += 1
    return valid_count

ITERATIONS = 5

def main():
    events = load_events()
    print(f"Processing {len(events)} events (precomputed), {ITERATIONS} iterations...")
    times = []
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")

if __name__ == "__main__":
    main()


@ -0,0 +1,42 @@
#!/usr/bin/env python3
"""
Generate test events and save to file.
Run this before profiling the validator scripts.
"""
import pickle
import random
import sys

from common import RULES, EVENT_TYPES, EVENTS_FILE

def generate_test_events(n):
    """Generate n random test events."""
    random.seed(42)  # Reproducible
    events = []
    for i in range(n):
        rule_id = random.choice(RULES)
        event_type = random.choice(EVENT_TYPES)
        data = f"event_{i}"
        events.append((rule_id, event_type, data))
    return events

def main():
    n_events = 100000
    if len(sys.argv) > 1:
        n_events = int(sys.argv[1])
    print(f"Generating {n_events} events...")
    events = generate_test_events(n_events)
    with open(EVENTS_FILE, "wb") as f:
        pickle.dump(events, f)
    print(f"Saved to {EVENTS_FILE}")
    print(f"Unique (rule, event_type) combinations: {len(RULES) * len(EVENT_TYPES)}")

if __name__ == "__main__":
    main()