scenario 2: improve lrucache vs list bit

illustris 2026-01-10 19:50:13 +05:30
parent 7e8b1191fa
commit 596ae02dd4
Signed by: illustris
GPG Key ID: 56C8FC0B899FEFA3
8 changed files with 399 additions and 179 deletions

.gitignore

@ -1,3 +1,6 @@
*~
result
*.qcow2
events.pkl
__pycache__
*.svg

README.md

@ -1,96 +1,162 @@
# Scenario 2: Memoization and Precomputation

## Learning Objectives

- Use cProfile to identify performance bottlenecks
- Recognize when `@lru_cache` becomes a bottleneck itself
- Understand when precomputation beats memoization
- Learn to read profiler output to guide optimization decisions

## Files

### Fibonacci Example

- `fib_slow.py` - Naive recursive Fibonacci (exponential time)
- `fib_cached.py` - Memoized Fibonacci (linear time)

### Config Validator Example

- `generate_events.py` - Generate test data (run first)
- `config_validator_naive.py` - Baseline: no caching
- `config_validator_memoized.py` - Uses `@lru_cache`
- `config_validator_precomputed.py` - Uses 2D array lookup
- `config_validator.py` - Comparison runner
- `common.py` - Shared code

---

## Exercise 1: Fibonacci (Identifying Redundant Calls)

### Step 1: Experience the slowness

```bash
time python3 fib_slow.py 35
```

This should take several seconds. Don't try n=50!

### Step 2: Profile to understand why

```bash
python3 -m cProfile -s ncalls fib_slow.py 35
```

Look at `ncalls` for the `fib` function - it's called millions of times because the same values are recomputed repeatedly.

### Step 3: Apply memoization and verify

```bash
time python3 fib_cached.py 35
python3 -m cProfile -s ncalls fib_cached.py 35
```

The `ncalls` drops from millions to ~35.
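
`fib_slow.py` and `fib_cached.py` are not included in this commit; here is a minimal sketch of what `fib_cached.py` plausibly contains (an assumption - only the filename and CLI come from the steps above):

```python
# Sketch of fib_cached.py (assumed contents; the real file is not in this diff).
import sys
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; later calls are cache hits,
    # so ncalls for fib drops to roughly n instead of ~2^n.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))
```
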
---
## Exercise 2: Config Validator (When Caching Becomes the Bottleneck)

This exercise demonstrates a common pattern: you add caching, get a big speedup, but then discover the cache itself is now the bottleneck.

### Step 1: Generate test data

```bash
python3 generate_events.py 100000
```

### Step 2: Profile the naive version

```bash
python3 -m cProfile -s tottime config_validator_naive.py
```

**What to look for:** `validate_rule_slow` dominates the profile. It's called 100,000 times even though there are only 400 unique input combinations.

### Step 3: Add memoization - big improvement!

```bash
python3 -m cProfile -s tottime config_validator_memoized.py
```

**Observation:** Dramatic speedup! But look carefully at the profile...

### Step 4: Identify the new bottleneck

Compare `process_events` time between the memoized and precomputed versions:

```bash
python3 -m cProfile -s tottime config_validator_memoized.py
python3 -m cProfile -s tottime config_validator_precomputed.py
```

**Key insight:** Compare the `process_events` tottime:

- Memoized: ~0.014s
- Precomputed: ~0.004s (3.5x faster!)

The cache lookup overhead now dominates because:

- The validation function is cheap (only 50 iterations)
- But we do 100,000 cache lookups
- Each lookup involves: tuple creation for the key, hashing, dict lookup (sketched below)
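
`functools.lru_cache` is implemented in C in CPython, so the profiler mostly shows its cost as wrapper overhead. Conceptually, though, each cache hit still has to do something like this hand-rolled sketch (a model, not the real implementation):

```python
# Conceptual model of a memoizing wrapper - not CPython's actual lru_cache code.
def memoize(func):
    cache = {}

    def wrapper(rule_id, event_type):
        key = (rule_id, event_type)  # 1. build a tuple key on every call
        if key in cache:             # 2. hash the key and probe the dict
            return cache[key]        # 3. hash + lookup again on a hit
        result = func(rule_id, event_type)
        cache[key] = result
        return result

    return wrapper
```

Even on a hit, the tuple allocation and hashing happen on every call - that is the overhead the profile exposes.
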
### Step 5: Hypothesis - can we beat the cache?

When the input space is **small and bounded** (400 combinations), we can:

1. Precompute all results into a 2D array
2. Use array indexing instead of hash-based lookup

Array indexing is faster because:

- No hash computation
- Direct memory offset calculation
- Better CPU cache locality

A quick way to test the lookup cost in isolation is sketched below.
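
This micro-benchmark compares the two lookup styles directly (a rough sketch; absolute numbers vary by machine and Python version):

```python
# Tuple-keyed dict lookup vs. nested-list indexing, 1M iterations each.
import timeit

d = {(r, e): True for r in range(20) for e in range(20)}
table = [[True] * 20 for _ in range(20)]

print("dict lookup:", timeit.timeit(lambda: d[(7, 13)], number=1_000_000))
print("list index :", timeit.timeit(lambda: table[7][13], number=1_000_000))
```
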
### Step 6: Profile the precomputed version

```bash
python3 -m cProfile -s tottime config_validator_precomputed.py
```

**Observation:** No wrapper overhead. Clean array indexing in `process_events`.

### Step 7: Compare all three

```bash
python3 config_validator.py
```

Expected output shows precomputed ~2x faster than memoized.

---
## Key Profiling Techniques

### Finding where time is spent

```bash
python3 -m cProfile -s tottime script.py  # Sort by time in function itself
python3 -m cProfile -s cumtime script.py  # Sort by cumulative time (includes callees)
```

### Understanding the columns

- `ncalls`: Number of calls
- `tottime`: Time spent in function (excluding callees)
- `cumtime`: Time spent in function (including callees)
- `percall`: Time per call

The same data is also available programmatically, as shown below.
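
A minimal sketch using the standard-library `pstats` module (`profile.out` is an arbitrary filename chosen for this example):

```python
import cProfile
import pstats

# Profile a throwaway statement and dump the raw stats to a file.
cProfile.run("sum(i * i for i in range(100_000))", "profile.out")

# Load the stats, sort by tottime, and print the ten most expensive functions.
stats = pstats.Stats("profile.out")
stats.sort_stats("tottime").print_stats(10)
```
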
---
## When to Use Each Approach
| Approach | When to Use |
|----------|-------------|
| No caching | Function is cheap OR each input seen only once |
| Memoization (`@lru_cache`) | Unknown/large input space, expensive function |
| Precomputation | Known/small input space, many lookups, bounded integers |

---
## Discussion Questions

1. Why does `@lru_cache` have overhead?
   - Hint: What happens on each call even for cache hits?
2. When would memoization beat precomputation?
   - Hint: What if there were 10,000 x 10,000 possible inputs but you only see 100?
3. Could we make precomputation even faster?
   - Hint: What about a flat array with `table[rule_id * 20 + event_type]`? One possible version is sketched below.
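
For question 3, a hypothetical flat-array variant (not part of this commit) could look like this:

```python
# Flatten the 2D table into one list so each query is a single index operation.
from common import validate_rule_slow, RULES, EVENT_TYPES

N_EVENT_TYPES = len(EVENT_TYPES)  # 20

FLAT_TABLE = [
    validate_rule_slow(rule_id, event_type)
    for rule_id in RULES
    for event_type in EVENT_TYPES
]

def is_valid(rule_id, event_type):
    # rule_id selects the row block; event_type is the offset within it.
    return FLAT_TABLE[rule_id * N_EVENT_TYPES + event_type]
```
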
---
## Further Reading

- `functools.lru_cache` documentation
- `functools.cache` (Python 3.9+) - unbounded cache, slightly less overhead
- NumPy arrays for truly O(1) array access

common.py

@ -0,0 +1,35 @@
#!/usr/bin/env python3
"""
Shared code for config validator examples.
"""
import pickle
from pathlib import Path

# The set of all valid (rule_id, event_type) pairs we'll encounter
RULES = range(20)        # 0-19 (small, bounded input space)
EVENT_TYPES = range(20)  # 0-19

EVENTS_FILE = Path(__file__).parent / "events.pkl"


def validate_rule_slow(rule_id, event_type):
    """
    Simulate an expensive validation check.
    In real life, this might query a database, parse XML, etc.
    """
    total = 0
    for i in range(50):
        total += (rule_id * event_type * i) % 997
    return total % 2 == 0


def load_events():
    """Load events from the pickle file."""
    if not EVENTS_FILE.exists():
        raise FileNotFoundError(
            f"Events file not found: {EVENTS_FILE}\n"
            "Run 'python3 generate_events.py' first."
        )
    with open(EVENTS_FILE, "rb") as f:
        return pickle.load(f)

config_validator.py

@ -1,146 +1,58 @@
#!/usr/bin/env python3
"""
Config Validator Comparison
===========================

Runs all three validation strategies and compares performance.

Run generate_events.py first to create test data.

Usage:
    python3 generate_events.py 100000
    python3 config_validator.py
"""
import time

from common import load_events
import config_validator_naive
import config_validator_memoized
import config_validator_precomputed

ITERATIONS = 5


def benchmark(name, func, events, setup=None):
    """Run a function multiple times and report average timing."""
    times = []
    for i in range(ITERATIONS):
        if setup and i == 0:
            setup()
        start = time.perf_counter()
        result = func(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"{name:20s}: {avg:.3f}s avg (valid: {result})")
    return avg


def main():
    events = load_events()
    print(f"Processing {len(events)} events, {ITERATIONS} iterations each...")
    print()

    t_naive = benchmark("Naive", config_validator_naive.process_events, events)
    t_memo = benchmark(
        "Memoized",
        config_validator_memoized.process_events,
        events,
        setup=config_validator_memoized.validate_rule_cached.cache_clear,
    )
    t_pre = benchmark("Precomputed", config_validator_precomputed.process_events, events)

    print()
    print(f"Speedup (memo vs naive): {t_naive/t_memo:.1f}x")
    print(f"Speedup (pre vs memo):   {t_memo/t_pre:.1f}x")


if __name__ == "__main__":
    main()

config_validator_memoized.py

@ -0,0 +1,53 @@
#!/usr/bin/env python3
"""
Memoized config validator - uses @lru_cache.

Profile this to see the lru_cache wrapper overhead.

Usage:
    python3 config_validator_memoized.py
    python3 -m cProfile -s tottime config_validator_memoized.py
"""
import time
from functools import lru_cache

from common import validate_rule_slow, load_events


@lru_cache(maxsize=None)
def validate_rule_cached(rule_id, event_type):
    """Same validation but with caching."""
    return validate_rule_slow(rule_id, event_type)


def process_events(events):
    """Process events using memoized validation."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if validate_rule_cached(rule_id, event_type):
            valid_count += 1
    return valid_count


ITERATIONS = 5


def main():
    events = load_events()
    print(f"Processing {len(events)} events (memoized), {ITERATIONS} iterations...")
    times = []
    for i in range(ITERATIONS):
        if i == 0:
            validate_rule_cached.cache_clear()  # Cold start on first run
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")


if __name__ == "__main__":
    main()
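
The Further Reading section of the README mentions `functools.cache` (Python 3.9+). A drop-in variant of the decorator above would look like this (a sketch, not part of this commit):

```python
# Hypothetical variant: functools.cache is equivalent to lru_cache(maxsize=None).
from functools import cache

from common import validate_rule_slow

@cache
def validate_rule_cached(rule_id, event_type):
    return validate_rule_slow(rule_id, event_type)
```
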

config_validator_naive.py

@ -0,0 +1,44 @@
#!/usr/bin/env python3
"""
Naive config validator - no caching.

Profile this to see repeated validate_rule_slow calls.

Usage:
    python3 config_validator_naive.py
    python3 -m cProfile -s tottime config_validator_naive.py
"""
import time

from common import validate_rule_slow, load_events


def process_events(events):
    """Process events using naive repeated validation."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if validate_rule_slow(rule_id, event_type):
            valid_count += 1
    return valid_count


ITERATIONS = 5


def main():
    events = load_events()
    print(f"Processing {len(events)} events (naive), {ITERATIONS} iterations...")
    times = []
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")


if __name__ == "__main__":
    main()

config_validator_precomputed.py

@ -0,0 +1,65 @@
#!/usr/bin/env python3
"""
Precomputed config validator - uses 2D array lookup.

Profile this to see clean array indexing with no wrapper overhead.

Usage:
    python3 config_validator_precomputed.py
    python3 -m cProfile -s tottime config_validator_precomputed.py
"""
import time

from common import validate_rule_slow, load_events, RULES, EVENT_TYPES


def build_validation_table():
    """
    Build a 2D lookup table for all possible (rule_id, event_type) combinations.

    Array indexing is faster than hash-based lookup because:
    - No hash computation needed
    - Direct memory offset calculation
    - Better CPU cache locality
    """
    table = []
    for rule_id in range(max(RULES) + 1):
        row = []
        for event_type in range(max(EVENT_TYPES) + 1):
            row.append(validate_rule_slow(rule_id, event_type))
        table.append(row)
    return table


# Build table at module load time (simulates startup initialization)
VALIDATION_TABLE = build_validation_table()


def process_events(events):
    """Process events using precomputed 2D lookup table."""
    valid_count = 0
    for rule_id, event_type, data in events:
        if VALIDATION_TABLE[rule_id][event_type]:
            valid_count += 1
    return valid_count


ITERATIONS = 5


def main():
    events = load_events()
    print(f"Processing {len(events)} events (precomputed), {ITERATIONS} iterations...")
    times = []
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        valid_count = process_events(events)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"Valid: {valid_count}")
    print(f"Avg time: {avg:.3f}s")


if __name__ == "__main__":
    main()
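
The README's Further Reading also points at NumPy. A hypothetical NumPy version of the table (not part of this commit; assumes numpy is installed) is sketched below. Scalar indexing from a Python loop is usually no faster than nested lists; NumPy pays off when the whole event stream is validated in one vectorized operation:

```python
# Hypothetical NumPy variant of the validation table.
import numpy as np

from common import validate_rule_slow, RULES, EVENT_TYPES

NP_TABLE = np.array(
    [[validate_rule_slow(r, e) for e in EVENT_TYPES] for r in RULES],
    dtype=bool,
)

def count_valid(events):
    # One fancy-indexing operation checks every event at once.
    rule_ids = np.fromiter((e[0] for e in events), dtype=np.intp)
    event_types = np.fromiter((e[1] for e in events), dtype=np.intp)
    return int(NP_TABLE[rule_ids, event_types].sum())
```
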

generate_events.py

@ -0,0 +1,42 @@
#!/usr/bin/env python3
"""
Generate test events and save to file.

Run this before profiling the validator scripts.
"""
import pickle
import random
import sys

from common import RULES, EVENT_TYPES, EVENTS_FILE


def generate_test_events(n):
    """Generate n random test events."""
    random.seed(42)  # Reproducible
    events = []
    for i in range(n):
        rule_id = random.choice(RULES)
        event_type = random.choice(EVENT_TYPES)
        data = f"event_{i}"
        events.append((rule_id, event_type, data))
    return events


def main():
    n_events = 100000
    if len(sys.argv) > 1:
        n_events = int(sys.argv[1])
    print(f"Generating {n_events} events...")
    events = generate_test_events(n_events)
    with open(EVENTS_FILE, "wb") as f:
        pickle.dump(events, f)
    print(f"Saved to {EVENTS_FILE}")
    print(f"Unique (rule, event_type) combinations: {len(RULES) * len(EVENT_TYPES)}")


if __name__ == "__main__":
    main()