init

2026-01-08 18:11:30 +05:30
commit 4fb1bd90db
32 changed files with 3058 additions and 0 deletions
--- a/scenario7-pyroscope/README.md
+++ b/scenario7-pyroscope/README.md
@@ -0,0 +1,195 @@
+# Scenario 7: Continuous Profiling with Pyroscope
+
+## Learning Objectives
+- Understand the difference between one-shot and continuous profiling
+- Set up and use Pyroscope for Python applications
+- Navigate the Pyroscope UI to find performance issues
+- Compare flamegraphs over time
+
+## Background
+
+**One-shot profiling** (what we've done so far):
+- Run profiler → Execute program → Stop → Analyze
+- Good for: reproducible tests, specific scenarios
+- Bad for: intermittent issues, production systems
+
+**Continuous profiling**:
+- Always running in the background
+- Low overhead: it samples stacks periodically, typically costing a few percent of CPU
+- Aggregates data over time
+- Good for: production monitoring, finding intermittent issues
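+
+To make "always running in the background" and "aggregates data over time" concrete, here is a toy sampling profiler built only on the standard library. It is a sketch, not how the Pyroscope agent is implemented. It relies on CPython's `sys._current_frames()`, samples one thread at roughly 100 Hz, and counts identical stacks in the "folded" form (`outer;inner`) that flamegraphs are built from. Unlike Pyroscope's default on-CPU mode, it does not check whether the thread is actually running, so it measures wall-clock time. `ToySampler`, `folded_stack`, and `busy` are hypothetical names.
+
+```python
+import sys
+import threading
+import time
+from collections import Counter
+
+
+def folded_stack(frame):
+    """Walk a frame chain and return 'outer;...;inner', the folded-stack format."""
+    names = []
+    while frame is not None:
+        names.append(frame.f_code.co_name)
+        frame = frame.f_back
+    return ";".join(reversed(names))
+
+
+class ToySampler:
+    """Periodically snapshot one thread's Python stack and count identical stacks."""
+
+    def __init__(self, thread_id, rate_hz=100):
+        self.thread_id = thread_id
+        self.interval = 1.0 / rate_hz  # actual rate is a bit lower (sleep + GIL waits)
+        self.counts = Counter()
+        self._stop_event = threading.Event()
+        self._thread = threading.Thread(target=self._run, daemon=True)
+
+    def _run(self):
+        while not self._stop_event.is_set():
+            # CPython-specific: maps thread id -> that thread's current frame.
+            frame = sys._current_frames().get(self.thread_id)
+            if frame is not None:
+                self.counts[folded_stack(frame)] += 1
+            time.sleep(self.interval)
+
+    def start(self):
+        self._thread.start()
+
+    def stop(self):
+        self._stop_event.set()
+        self._thread.join()
+
+
+def busy(seconds):
+    """CPU-bound loop standing in for a hot request handler."""
+    end = time.perf_counter() + seconds
+    count = 0
+    while time.perf_counter() < end:
+        count += 1
+    return count
+
+
+sampler = ToySampler(threading.get_ident(), rate_hz=100)  # sample the thread running this script
+sampler.start()
+busy(0.5)
+sampler.stop()
+
+for stack, n in sampler.counts.most_common(3):
+    print(n, stack)
+```
+
+Each printed line is a sample count followed by a stack. A real agent does the same kind of counting continuously and ships the totals to the server at intervals.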
+
+## Files
+- `app.py` - Flask web application with Pyroscope instrumentation
+- `loadgen.sh` - Script to generate traffic
+- `requirements.txt` - Python dependencies
+
+## Setup
+
+### 1. Start Pyroscope Server
+
+Option A: Docker (recommended)
+```bash
+docker run -d --name pyroscope -p 4040:4040 grafana/pyroscope
+```
+
+Option B: Binary download
+```bash
+# Download from https://github.com/grafana/pyroscope/releases
+./pyroscope  # the server listens on http://localhost:4040 by default
+```
+
+### 2. Install Python Dependencies
+```bash
+pip install -r requirements.txt
+# Or: pip install flask pyroscope-io
+```
+
+### 3. Start the Application
+```bash
+python3 app.py
+```
+
+### 4. Generate Load
+```bash
+chmod +x loadgen.sh
+./loadgen.sh http://localhost:5000 120  # 2 minutes of load
+```
+
+### 5. View Profiles
+Open http://localhost:4040 in your browser.
+
+## Exercise 1: Explore the Pyroscope UI
+
+1. Go to http://localhost:4040
+2. Select `workshop.flask.app` from the application dropdown
+3. Observe the flamegraph
+
+### UI Navigation
+- **Timeline**: Shows CPU usage over time; click and drag to select a time range
+- **Flamegraph**: Visual representation of where time is spent
+- **Table view**: Sortable list of functions by self/total time
+- **Diff view**: Compare two time ranges
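+
+The table view's two columns come from the same stack samples. A *self* count credits only the innermost (leaf) frame of each sample; a *total* count credits every function on the stack, which is also what a frame's width in the flamegraph represents. A minimal sketch over made-up folded-stack counts (the stack names and numbers are illustrative, not output from the app):
+
+```python
+from collections import Counter
+
+
+def self_and_total(folded):
+    """Per-function self and total sample counts from {'a;b;c': samples}."""
+    self_counts = Counter()
+    total_counts = Counter()
+    for stack, n in folded.items():
+        frames = stack.split(";")
+        self_counts[frames[-1]] += n   # only the leaf was executing its own code
+        for name in set(frames):       # set(): count recursive functions once per stack
+            total_counts[name] += n
+    return self_counts, total_counts
+
+
+# Made-up counts shaped like the workshop app's call paths.
+folded = {
+    "main;primes_endpoint;compute_primes_slow": 70,
+    "main;hash_endpoint;expensive_hash": 20,
+    "main;primes_endpoint": 5,
+    "main;health": 5,
+}
+self_c, total_c = self_and_total(folded)
+grand_total = sum(folded.values())
+for name, total in total_c.most_common():
+    print(f"{name:20} self={self_c[name]:3} total={total:3} width={total / grand_total:.0%}")
+```
+
+`compute_primes_slow` has high self time because it calls nothing else worth sampling, while `primes_endpoint` has a high total but low self time: most of its width is inherited from its child.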
+
+## Exercise 2: Find the Hot Function
+
+While `loadgen.sh` is running:
+
+1. Look at the flamegraph
+2. Find `compute_primes_slow`; it should be one of the widest frames
+3. Click on it to zoom in
+4. See the call stack leading to it
+
+## Exercise 3: Compare Cached vs Uncached
+
+1. Stop `loadgen.sh` so that only your requests reach the app.
+2. Send requests to the uncached endpoint, then note the current time:
+   ```bash
+   for i in $(seq 100); do
+       curl -s "localhost:5000/api/hash/test_$((i % 5))" > /dev/null
+   done
+   ```
+3. Run the same loop again, replacing `/api/hash/` with `/api/hash_cached/`.
+4. In Pyroscope, open the diff view and compare the uncached time range (baseline) with the cached one.
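+
+You can see the effect the diff view should show without Pyroscope at all. The sketch below reuses the app's repeated-SHA-256 function, wraps it with `functools.lru_cache`, and replays the same 5-key pattern as the curl loop: 100 calls produce only 5 cache misses, so only 5 of them do the expensive work. Timings vary by machine; the cache counters do not.
+
+```python
+import hashlib
+import time
+from functools import lru_cache
+
+
+def expensive_hash(data, iterations=1000):
+    """Same work as the app's expensive_hash: 1000 rounds of SHA-256."""
+    result = data.encode()
+    for _ in range(iterations):
+        result = hashlib.sha256(result).digest()
+    return result.hex()
+
+
+expensive_hash_cached = lru_cache(maxsize=1000)(expensive_hash)
+
+keys = [f"test_{i % 5}" for i in range(1, 101)]  # same 5 keys as the curl loop
+
+start = time.perf_counter()
+for key in keys:
+    expensive_hash(key)
+uncached_ms = (time.perf_counter() - start) * 1000
+
+start = time.perf_counter()
+for key in keys:
+    expensive_hash_cached(key)
+cached_ms = (time.perf_counter() - start) * 1000
+
+info = expensive_hash_cached.cache_info()
+print(f"uncached: {uncached_ms:.1f} ms, cached: {cached_ms:.1f} ms")
+print(info)  # 5 misses (real work), 95 hits (dictionary lookups)
+```
+
+One reason to keep a separately named function, as `app.py` does, is that the two versions then appear as distinct frames in the flamegraph; wrapping with `lru_cache(...)` as above keeps the original function's name.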
+
+## Exercise 4: Spot I/O-Bound Code
+
+1. Generate load to the slow_io endpoint:
+   ```bash
+   for i in $(seq 50); do curl -s localhost:5000/api/slow_io; done
+   ```
+2. Look at the flamegraph
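+3. Notice that `time.sleep` barely shows up. Why?
+   - CPU profiling only captures time spent running on a CPU
+   - I/O wait (sleeping, network, disk) consumes almost no CPU
+   - This is why I/O-bound code looks "fast" in CPU profiles even when users are waiting on it
+   - By default the pyroscope-io agent records on-CPU time only (`oncpu=True`) for threads holding the GIL (`gil_only=True`); setting both to `False` in `pyroscope.configure()` should make waiting time visible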
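+
+The same effect shows up with plain timers: `time.perf_counter()` measures wall-clock time, while `time.process_time()` only counts CPU time used by the process. A minimal sketch comparing a 100 ms sleep (mirroring `/api/slow_io`) with a short CPU-bound loop; the exact numbers depend on the machine:
+
+```python
+import time
+
+
+def measure(fn):
+    """Return (wall_seconds, cpu_seconds) spent running fn()."""
+    wall0, cpu0 = time.perf_counter(), time.process_time()
+    fn()
+    return time.perf_counter() - wall0, time.process_time() - cpu0
+
+
+def slow_io():
+    time.sleep(0.1)  # same simulated 100 ms of I/O as /api/slow_io
+
+
+def cpu_work():
+    sum(i * i for i in range(2_000_000))  # pure computation, no waiting
+
+
+io_wall, io_cpu = measure(slow_io)
+cpu_wall, cpu_cpu = measure(cpu_work)
+print(f"slow_io:  wall={io_wall:.3f}s cpu={io_cpu:.3f}s")
+print(f"cpu_work: wall={cpu_wall:.3f}s cpu={cpu_cpu:.3f}s")
+```
+
+The sleep takes about 0.1 s of wall time but close to zero CPU time, which is roughly what an on-CPU profiler sees of `/api/slow_io`.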
+
+## Exercise 5: Timeline Analysis
+
+1. Let `loadgen.sh` run for several minutes
+2. In Pyroscope, zoom out the timeline
+3. Look for patterns:
+   - Spikes in CPU usage
+   - Changes in the flamegraph shape over time
+4. Select different time ranges to compare
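+
+Conceptually, the timeline is the sample stream cut into fixed windows. The sketch below groups synthetic `(timestamp, stack)` samples into 10-second windows and flags windows whose sample count exceeds twice the median; the window size, the 2x threshold, and the data are assumptions chosen for illustration, not Pyroscope's internals.
+
+```python
+import statistics
+from collections import Counter, defaultdict
+
+
+def bucket_samples(samples, window_s=10):
+    """Group (unix_ts, stack) samples into fixed windows: {window_start: Counter}."""
+    buckets = defaultdict(Counter)
+    for ts, stack in samples:
+        window_start = int(ts // window_s) * window_s
+        buckets[window_start][stack] += 1
+    return dict(buckets)
+
+
+def spike_windows(buckets, factor=2.0):
+    """Window starts whose sample count is more than factor x the median window."""
+    totals = {start: sum(c.values()) for start, c in buckets.items()}
+    median = statistics.median(totals.values())
+    return sorted(start for start, total in totals.items() if total > factor * median)
+
+
+# Synthetic data: steady light traffic with one burst of prime computations.
+samples = []
+for window_start, n, stack in [
+    (0, 10, "main;health"),
+    (10, 10, "main;health"),
+    (20, 50, "main;primes_endpoint;compute_primes_slow"),
+    (30, 10, "main;health"),
+]:
+    samples += [(window_start + i * 0.1, stack) for i in range(n)]
+
+buckets = bucket_samples(samples)
+for start in spike_windows(buckets):
+    print(start, buckets[start].most_common(1))
+```
+
+Looking at the stacks inside a flagged window (`buckets[20].most_common()`) is the code equivalent of selecting that spike in the timeline and reading its flamegraph.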
+
+## Key Pyroscope Concepts
+
+### Flamegraph Reading
+- **Width** = proportion of total samples (time)
+- **Height** = call stack depth
+- **Color** = usually arbitrary (for differentiation)
+- **Plateaus** (wide frames with few or no child frames) = functions that spend a lot of time in their own code (high self time)
+
+### Comparing Profiles
+Pyroscope can show:
+- **Diff view**: Red = more time in the comparison range than in the baseline, Green = less time
+- Useful for before/after comparisons
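+
+One way to think about the colors: compare each function's share of samples in the two ranges, so that a range with more traffic does not look worse just because it has more samples. This is a sketch of that idea with made-up counts; Pyroscope's own diff computation and color scale may differ in detail.
+
+```python
+def diff_shares(baseline, comparison):
+    """Change in each function's share of samples, comparison minus baseline.
+
+    Inputs map function name -> sample count. Positive = larger share in the
+    comparison range (red), negative = smaller share (green).
+    """
+    base_total = sum(baseline.values()) or 1   # avoid dividing by zero
+    comp_total = sum(comparison.values()) or 1
+    names = set(baseline) | set(comparison)
+    return {
+        name: comparison.get(name, 0) / comp_total - baseline.get(name, 0) / base_total
+        for name in names
+    }
+
+
+# Made-up counts: the comparison range has half the samples but a different mix.
+before = {"expensive_hash": 80, "jsonify": 20}
+after = {"expensive_hash": 10, "jsonify": 40}
+for name, delta in sorted(diff_shares(before, after).items(), key=lambda kv: kv[1]):
+    color = "red" if delta > 0 else "green" if delta < 0 else "unchanged"
+    print(f"{name:15} {delta:+.0%} ({color})")
+```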
+
+### Tags
+The app uses tags for filtering:
+```python
+pyroscope.configure(
+    tags={"env": "workshop", "version": "1.0.0"}
+)
+```
+
+You can filter by tags in the UI.
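+
+The static tags above apply to every sample. The pyroscope-io package also provides a `tag_wrapper` context manager that tags only the samples taken inside a block. A sketch that tags the prime-counting work, assuming that API and falling back to a no-op when the agent is not installed; the `profile_tags` and `primes_handler` helpers and the `endpoint` tag key are hypothetical:
+
+```python
+import math
+from contextlib import nullcontext
+
+try:
+    import pyroscope
+except ImportError:  # same graceful fallback as app.py
+    pyroscope = None
+
+
+def profile_tags(tags):
+    """Tag samples taken inside the `with` block; a no-op without the agent."""
+    if pyroscope is None:
+        return nullcontext()
+    return pyroscope.tag_wrapper(tags)
+
+
+def count_primes(n):
+    """Trial-division prime count below n (same algorithm as compute_primes_slow)."""
+    count = 0
+    for num in range(2, n):
+        if all(num % i for i in range(2, int(math.sqrt(num)) + 1)):
+            count += 1
+    return count
+
+
+def primes_handler(n):
+    # Hypothetical tag key/value; pick whatever you want to filter on in the UI.
+    with profile_tags({"endpoint": "primes"}):
+        return count_primes(n)
+```
+
+With the agent running, the UI should then let you narrow the flamegraph to `endpoint=primes`.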
+
+## Production Considerations
+
+### Overhead
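+- Pyroscope Python agent: typically a few percent of CPU (~2-5% is a common estimate; measure it for your workload)
+- Sampling rate can be tuned with the `sample_rate` argument of `pyroscope.configure()` (default: 100 Hz)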
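+
+Sampling rate trades detail for overhead: at the default 100 Hz, one thread that is busy the whole time yields about 100 x 60 = 6,000 samples per minute; at 50 Hz it yields about 3,000. A configuration sketch that halves the rate, assuming the `sample_rate`, `oncpu`, and `enable_logging` keyword arguments of the `pyroscope-io` package (check them against the version you installed):
+
+```python
+try:
+    import pyroscope
+except ImportError:
+    pyroscope = None
+
+if pyroscope is not None:
+    pyroscope.configure(
+        application_name="workshop.flask.app",
+        server_address="http://localhost:4040",
+        sample_rate=50,        # samples per second; the default is 100
+        oncpu=True,            # on-CPU time only (the default)
+        enable_logging=True,   # surface agent errors in the app's output
+        tags={"env": "workshop", "version": "1.0.0"},
+    )
+```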
+
+### Data Volume
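+- The agent aggregates samples into periodic profiles instead of sending every individual stack sample
+- Storage is compact (roughly 10-100MB per day per app; actual volume depends on sampling rate, tag cardinality, and retention)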
+
+### Security
+- Profile data can reveal code structure
+- Consider who has access to Pyroscope
+
+### Alternatives
+- **Datadog Continuous Profiler**
+- **AWS CodeGuru Profiler**
+- **Google Cloud Profiler**
+- **Parca** (open source, eBPF-based)
+
+## Troubleshooting
+
+### "No data in Pyroscope"
+- Check if Pyroscope server is running: http://localhost:4040
+- Check app logs for connection errors
+- Verify `pyroscope-io` is installed: the app prints a warning at startup if it is not, and `/health` reports `"pyroscope": false`
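+
+The first and last checks can be scripted with the standard library. A sketch, assuming the default ports used in this scenario; `reachable` and `app_reports_pyroscope` are hypothetical helper names. Note that `/health` only tells you whether the app could import `pyroscope`, not whether profiles are arriving.
+
+```python
+import json
+import urllib.error
+import urllib.request
+
+
+def reachable(url, timeout=2.0):
+    """True if an HTTP GET to url gets any response (even an error status)."""
+    try:
+        with urllib.request.urlopen(url, timeout=timeout):
+            return True
+    except urllib.error.HTTPError:
+        return True  # the server answered, just not with a 2xx status
+    except (urllib.error.URLError, OSError):
+        return False  # refused, timed out, DNS failure, ...
+
+
+def app_reports_pyroscope(health_url="http://localhost:5000/health", timeout=2.0):
+    """True if the workshop app's /health says the Pyroscope agent was imported."""
+    try:
+        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
+            return bool(json.load(resp).get("pyroscope"))
+    except (urllib.error.URLError, OSError, ValueError):
+        return False
+
+
+if __name__ == "__main__":
+    print("Pyroscope server reachable:", reachable("http://localhost:4040"))
+    print("App imported pyroscope-io:", app_reports_pyroscope())
+```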
+
+### "Profile looks empty"
+- Generate more load
+- The endpoint might be I/O bound (not CPU)
+- Check the time range in the UI
+
+### High overhead
+- Lower `sample_rate` in `pyroscope.configure()`
+- Pass `enable_logging=True` to `pyroscope.configure()` and check the app logs for agent errors
+
+## Discussion Questions
+
+1. **When would you use continuous profiling vs one-shot?**
+   - Continuous: production, long-running apps, intermittent issues
+   - One-shot: development, benchmarking, specific scenarios
+
+2. **What can't CPU profiling show you?**
+   - I/O wait time
+   - Lock contention (mostly)
+   - Memory allocation patterns
+
+3. **How would you profile a batch job vs a web server?**
+   - Batch: one-shot profiling of the entire run
+   - Server: continuous, focus on request handling paths
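+
+For the batch case, Python's built-in `cProfile` is enough for a one-shot profile of the whole run. A minimal sketch that profiles a hypothetical `batch_job()` built on the app's prime computation and prints the top entries by cumulative time:
+
+```python
+import cProfile
+import io
+import math
+import pstats
+
+
+def compute_primes_slow(n):
+    """Same trial-division algorithm as app.py."""
+    primes = []
+    for num in range(2, n):
+        is_prime = True
+        for i in range(2, int(math.sqrt(num)) + 1):
+            if num % i == 0:
+                is_prime = False
+                break
+        if is_prime:
+            primes.append(num)
+    return primes
+
+
+def batch_job():
+    """Hypothetical batch job: one CPU-bound pass over the data."""
+    return len(compute_primes_slow(10_000))
+
+
+profiler = cProfile.Profile()
+profiler.enable()
+result = batch_job()
+profiler.disable()
+
+report = io.StringIO()
+pstats.Stats(profiler, stream=report).sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
+print("primes found:", result)
+print(report.getvalue())
+```
+
+Swap `batch_job()` for your real entry point; `python -m cProfile -s cumulative your_script.py` gives a similar report from the command line.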
+
+## Key Takeaways
+
+1. **Continuous profiling catches issues that one-shot misses**
+2. **Low overhead usually makes it practical to run in production**
+3. **Timeline view reveals patterns over time**
+4. **CPU profiling doesn't show I/O time**
--- a/scenario7-pyroscope/app.py
+++ b/scenario7-pyroscope/app.py
@@ -0,0 +1,207 @@
+#!/usr/bin/env python3
+"""
+Scenario 7: Continuous Profiling with Pyroscope
+===============================================
+A simple Flask web app instrumented with Pyroscope for continuous profiling.
+
+SETUP:
+1. Start Pyroscope: docker run -p 4040:4040 grafana/pyroscope
+2. Install deps: pip install flask pyroscope-io
+3. Run this app: python3 app.py
+4. Generate load: ./loadgen.sh (or curl in a loop)
+5. View profiles: http://localhost:4040
+
+The app has intentionally slow endpoints to demonstrate profiling.
+"""
+
+import os
+import time
+import math
+import hashlib
+from functools import lru_cache
+
+# Try to import pyroscope, gracefully handle if not installed
+try:
+    import pyroscope
+    PYROSCOPE_AVAILABLE = True
+except ImportError:
+    PYROSCOPE_AVAILABLE = False
+    print("Pyroscope not installed. Run: pip install pyroscope-io")
+    print("Continuing without profiling...\n")
+
+from flask import Flask, jsonify
+
+app = Flask(__name__)
+
+# Configure Pyroscope
+if PYROSCOPE_AVAILABLE:
+    pyroscope.configure(
+        application_name="workshop.flask.app",
+        server_address="http://localhost:4040",
+        # Static tags attached to every profile; usable as filters in the UI
+        tags={
+            "env": "workshop",
+            "version": "1.0.0",
+        }
+    )
+
+
+# ============================================================
+# Endpoint 1: CPU-intensive computation
+# ============================================================
+
+def compute_primes_slow(n):
+    """Intentionally slow prime computation."""
+    primes = []
+    for num in range(2, n):
+        is_prime = True
+        for i in range(2, int(math.sqrt(num)) + 1):
+            if num % i == 0:
+                is_prime = False
+                break
+        if is_prime:
+            primes.append(num)
+    return primes
+
+
+@app.route('/api/primes/<int:n>')
+def primes_endpoint(n):
+    """CPU-bound endpoint - compute primes up to n."""
+    n = min(n, 50000)  # Limit to prevent DoS
+    start = time.time()
+    primes = compute_primes_slow(n)
+    elapsed = time.time() - start
+    return jsonify({
+        'count': len(primes),
+        'limit': n,
+        'elapsed_ms': round(elapsed * 1000, 2)
+    })
+
+
+# ============================================================
+# Endpoint 2: Repeated expensive computation (needs caching)
+# ============================================================
+
+def expensive_hash(data, iterations=1000):
+    """Simulate expensive computation."""
+    result = data.encode()
+    for _ in range(iterations):
+        result = hashlib.sha256(result).digest()
+    return result.hex()
+
+
+@app.route('/api/hash/<data>')
+def hash_endpoint(data):
+    """
+    This endpoint recomputes the hash every time.
+    Profile will show expensive_hash taking lots of time.
+    See hash_cached endpoint for improvement.
+    """
+    start = time.time()
+    result = expensive_hash(data)
+    elapsed = time.time() - start
+    return jsonify({
+        'input': data,
+        'hash': result[:16] + '...',
+        'elapsed_ms': round(elapsed * 1000, 2)
+    })
+
+
+@lru_cache(maxsize=1000)
+def expensive_hash_cached(data, iterations=1000):
+    """Cached version of expensive_hash."""
+    result = data.encode()
+    for _ in range(iterations):
+        result = hashlib.sha256(result).digest()
+    return result.hex()
+
+
+@app.route('/api/hash_cached/<data>')
+def hash_cached_endpoint(data):
+    """Cached version - compare profile with /api/hash."""
+    start = time.time()
+    result = expensive_hash_cached(data)
+    elapsed = time.time() - start
+    return jsonify({
+        'input': data,
+        'hash': result[:16] + '...',
+        'elapsed_ms': round(elapsed * 1000, 2),
+        'cache_info': str(expensive_hash_cached.cache_info())
+    })
+
+
+# ============================================================
+# Endpoint 3: I/O simulation
+# ============================================================
+
+@app.route('/api/slow_io')
+def slow_io_endpoint():
+    """
+    Simulate slow I/O (database query, external API, etc.)
+    This won't show much in CPU profiles - it's I/O bound!
+    """
+    time.sleep(0.1)  # Simulate 100ms I/O
+    return jsonify({'status': 'ok', 'simulated_io_ms': 100})
+
+
+# ============================================================
+# Endpoint 4: Mix of work types
+# ============================================================
+
+@app.route('/api/mixed/<int:n>')
+def mixed_endpoint(n):
+    """Mixed workload: some CPU, some I/O."""
+    n = min(n, 1000)
+    
+    # CPU work
+    total = 0
+    for i in range(n * 100):
+        total += math.sin(i) * math.cos(i)
+    
+    # Simulated I/O
+    time.sleep(0.01)
+    
+    # More CPU work
+    data = str(total).encode()
+    for _ in range(100):
+        data = hashlib.md5(data).digest()
+    
+    return jsonify({
+        'n': n,
+        'result': data.hex()[:16]
+    })
+
+
+# ============================================================
+# Health check
+# ============================================================
+
+@app.route('/health')
+def health():
+    return jsonify({'status': 'healthy', 'pyroscope': PYROSCOPE_AVAILABLE})
+
+
+@app.route('/')
+def index():
+    return '''
+    <h1>Pyroscope Demo App</h1>
+    <h2>Endpoints:</h2>
+    <ul>
+        <li><a href="/api/primes/10000">/api/primes/&lt;n&gt;</a> - CPU intensive</li>
+        <li><a href="/api/hash/hello">/api/hash/&lt;data&gt;</a> - Expensive (uncached)</li>
+        <li><a href="/api/hash_cached/hello">/api/hash_cached/&lt;data&gt;</a> - Expensive (cached)</li>
+        <li><a href="/api/slow_io">/api/slow_io</a> - I/O simulation</li>
+        <li><a href="/api/mixed/100">/api/mixed/&lt;n&gt;</a> - Mixed workload</li>
+        <li><a href="/health">/health</a> - Health check</li>
+    </ul>
+    <h2>Profiling:</h2>
+    <p>View profiles at <a href="http://localhost:4040">http://localhost:4040</a></p>
+    '''
+
+
+if __name__ == '__main__':
+    print("Starting Flask app on http://localhost:5000")
+    print("Pyroscope dashboard: http://localhost:4040")
+    print("\nGenerate load with: ./loadgen.sh")
+    print("Or: for i in $(seq 100); do curl -s localhost:5000/api/primes/5000 > /dev/null; done")
+    app.run(host='0.0.0.0', port=5000, debug=False)
--- a/scenario7-pyroscope/loadgen.sh
+++ b/scenario7-pyroscope/loadgen.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+#
+# Load generator for Pyroscope demo
+# Run this to generate traffic that will show up in Pyroscope.
+# Usage: ./loadgen.sh [BASE_URL] [DURATION_SECONDS]   (defaults: http://localhost:5000, 60)
+#
+
+BASE_URL="${1:-http://localhost:5000}"
+DURATION="${2:-60}"  # seconds
+
+echo "Generating load to $BASE_URL for $DURATION seconds"
+echo "Press Ctrl+C to stop"
+echo ""
+
+end_time=$(($(date +%s) + DURATION))
+request_count=0
+
+while [ $(date +%s) -lt $end_time ]; do
+    # Mix of different endpoints
+    case $((RANDOM % 10)) in
+        0|1|2|3)
+            # 40% - CPU intensive (primes)
+            n=$((1000 + RANDOM % 4000))
+            curl -s "$BASE_URL/api/primes/$n" > /dev/null
+            ;;
+        4|5)
+            # 20% - Hash (uncached)
+            data="data_$(($RANDOM % 100))"
+            curl -s "$BASE_URL/api/hash/$data" > /dev/null
+            ;;
+        6|7)
+            # 20% - Hash (cached)
+            data="data_$(($RANDOM % 10))"  # Smaller set for better cache hits
+            curl -s "$BASE_URL/api/hash_cached/$data" > /dev/null
+            ;;
+        8)
+            # 10% - Slow I/O
+            curl -s "$BASE_URL/api/slow_io" > /dev/null
+            ;;
+        9)
+            # 10% - Mixed
+            curl -s "$BASE_URL/api/mixed/500" > /dev/null
+            ;;
+    esac
+    
+    request_count=$((request_count + 1))
+    
+    # Print progress every 10 requests
+    if [ $((request_count % 10)) -eq 0 ]; then
+        echo -ne "\rRequests: $request_count"
+    fi
+    
+    # Small delay to avoid overwhelming
+    sleep 0.1
+done
+
+echo ""
+echo "Done! Total requests: $request_count"
+echo "Check Pyroscope at http://localhost:4040"
--- a/scenario7-pyroscope/requirements.txt
+++ b/scenario7-pyroscope/requirements.txt
@@ -0,0 +1,2 @@
+flask>=2.0.0
+pyroscope-io>=0.8.0