Scenario 7: Continuous Profiling with Pyroscope

Learning Objectives

  • Understand the difference between one-shot and continuous profiling
  • Set up and use Pyroscope for Python applications
  • Navigate the Pyroscope UI to find performance issues
  • Compare flamegraphs over time

Background

One-shot profiling (what we've done so far):

  • Run profiler → Execute program → Stop → Analyze
  • Good for: reproducible tests, specific scenarios
  • Bad for: intermittent issues, production systems
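
For reference, a one-shot run in Python might look like this minimal sketch using the standard-library cProfile (work() is just a placeholder for the code under test):

import cProfile
import pstats

def work():
    # stand-in for the code being profiled
    return sum(i * i for i in range(1_000_000))

# run once, write stats to a file, then analyze and stop
cProfile.run("work()", "one_shot.prof")
pstats.Stats("one_shot.prof").sort_stats("cumulative").print_stats(10)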

Continuous profiling:

  • Always running in the background
  • Low overhead (~2-5% CPU)
  • Aggregates data over time
  • Good for: production monitoring, finding intermittent issues

Files

  • app.py - Flask web application with Pyroscope instrumentation
  • loadgen.sh - Script to generate traffic
  • requirements.txt - Python dependencies
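
app.py itself isn't reproduced in this README; its Pyroscope setup presumably looks something like the minimal sketch below (the /ping route and the run() call are placeholders, not necessarily what the real file contains):

import pyroscope
from flask import Flask

# attach the continuous profiler before the app starts serving requests
pyroscope.configure(
    application_name="workshop.flask.app",   # name shown in the Pyroscope dropdown
    server_address="http://localhost:4040",  # the Pyroscope server started in step 1
    tags={"env": "workshop", "version": "1.0.0"},
)

app = Flask(__name__)

@app.route("/ping")
def ping():
    # placeholder endpoint; the real app exposes /api/... routes
    return "ok"

if __name__ == "__main__":
    app.run(port=5000)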

Setup

1. Start Pyroscope Server

Option A: Docker (recommended)

docker run -d --name pyroscope -p 4040:4040 grafana/pyroscope

Option B: Binary download

# Download from https://github.com/grafana/pyroscope/releases
./pyroscope server

2. Install Python Dependencies

pip install -r requirements.txt
# Or: pip install flask pyroscope-io

3. Start the Application

python3 app.py

4. Generate Load

chmod +x loadgen.sh
./loadgen.sh http://localhost:5000 120  # 2 minutes of load

5. View Profiles

Open http://localhost:4040 in your browser.

Exercise 1: Explore the Pyroscope UI

  1. Go to http://localhost:4040
  2. Select workshop.flask.app from the application dropdown
  3. Observe the flamegraph

UI Navigation

  • Timeline: Shows CPU usage over time; click to select a time range
  • Flamegraph: Visual representation of where time is spent
  • Table view: Sortable list of functions by self/total time
  • Diff view: Compare two time ranges

Exercise 2: Find the Hot Function

While loadgen.sh is running:

  1. Look at the flamegraph
  2. Find compute_primes_slow - it should be prominent (a sketch of why follows this list)
  3. Click on it to zoom in
  4. See the call stack leading to it
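
The real implementation lives in app.py; the sketch below is an illustrative guess at why a function like compute_primes_slow dominates a CPU flamegraph, not the actual code:

def compute_primes_slow(limit=20000):
    # naive trial division: pure CPU work with no I/O, so nearly
    # every stack sample taken during the request lands in this loop
    primes = []
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            primes.append(n)
    return primes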

Exercise 3: Compare Cached vs Uncached

  1. Note the current time
  2. Stop loadgen.sh
  3. Modify loadgen.sh to only hit cached endpoints (or run manually):
    for i in $(seq 100); do
        curl -s "localhost:5000/api/hash_cached/test_$((i % 5))"
    done
    
  4. In Pyroscope, compare the two time periods using the diff view
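
Why does the cached run look so different in the diff? If the cached endpoint memoizes its results with something like functools.lru_cache (the actual mechanism in app.py may differ), repeated requests for the same key skip the expensive work entirely, so far fewer samples land in it:

import functools
import hashlib

@functools.lru_cache(maxsize=128)
def expensive_hash(key: str) -> str:
    # only the first request for a given key pays the CPU cost;
    # later requests return the memoized value almost for free
    digest = key.encode()
    for _ in range(100_000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()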

Exercise 4: Spot I/O-Bound Code

  1. Generate load to the slow_io endpoint:
    for i in $(seq 50); do curl -s localhost:5000/api/slow_io; done
    
  2. Look at the flamegraph
  3. Notice that time.sleep doesn't show up much - why?
    • CPU profiling only captures CPU time
    • I/O wait (sleeping, network, disk) doesn't consume CPU
    • This is why I/O-bound code looks "fast" in CPU profiles!
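
A quick way to see this distinction outside the profiler is to compare wall-clock time with CPU time around a sleep (a small sketch using only the standard library):

import time

start_wall = time.perf_counter()   # wall-clock time
start_cpu = time.process_time()    # CPU time actually consumed

time.sleep(2)                      # an I/O-style wait: no CPU used

print(f"wall: {time.perf_counter() - start_wall:.2f}s")  # ~2.00s
print(f"cpu:  {time.process_time() - start_cpu:.2f}s")   # ~0.00s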

Exercise 5: Timeline Analysis

  1. Let loadgen.sh run for several minutes
  2. In Pyroscope, zoom out the timeline
  3. Look for patterns:
    • Spikes in CPU usage
    • Changes in the flamegraph shape over time
  4. Select different time ranges to compare

Key Pyroscope Concepts

Flamegraph Reading

  • Width = proportion of total samples (time)
  • Height = call stack depth
  • Color = usually arbitrary (for differentiation)
  • Plateaus = functions that are "hot"

Comparing Profiles

Pyroscope can show:

  • Diff view: Red = more time, Green = less time
  • Useful for before/after comparisons

Tags

The app uses tags for filtering:

pyroscope.configure(
    tags={"env": "workshop", "version": "1.0.0"}
)

You can filter by tags in the UI.
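
Beyond these static tags, the pyroscope-io agent can also tag individual code regions at runtime, which is handy for slicing the flamegraph per endpoint (a sketch; the tag name here is just an example):

import time
import pyroscope

# samples collected inside the block carry the extra tag, so the UI
# can filter the flamegraph down to just this code path
with pyroscope.tag_wrapper({"endpoint": "slow_io"}):
    time.sleep(0.1)  # placeholder for the work being tagged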

Production Considerations

Overhead

  • Pyroscope Python agent: ~2-5% CPU overhead
  • Sampling rate can be tuned (default: 100Hz)
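
Tuning the rate is a one-line change in the agent configuration (sample_rate is the pyroscope-io parameter; lowering it reduces overhead at the cost of coarser data):

import pyroscope

pyroscope.configure(
    application_name="workshop.flask.app",
    server_address="http://localhost:4040",
    sample_rate=50,  # default is 100 samples per second
)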

Data Volume

  • Profiles are aggregated, not stored raw
  • Storage is efficient (10-100MB per day per app)

Security

  • Profile data can reveal code structure
  • Consider who has access to Pyroscope

Alternatives

  • Datadog Continuous Profiler
  • AWS CodeGuru Profiler
  • Google Cloud Profiler
  • Parca (open source, eBPF-based)

Troubleshooting

"No data in Pyroscope"

  • Check if Pyroscope server is running: http://localhost:4040
  • Check app logs for connection errors
  • Verify pyroscope-io is installed

"Profile looks empty"

  • Generate more load
  • The endpoint might be I/O-bound (not CPU-bound)
  • Check the time range in the UI

High overhead

  • Reduce sampling rate in pyroscope.configure()
  • Check for profiling-related exceptions

Discussion Questions

  1. When would you use continuous profiling vs one-shot?

    • Continuous: production, long-running apps, intermittent issues
    • One-shot: development, benchmarking, specific scenarios
  2. What can't CPU profiling show you?

    • I/O wait time
    • Lock contention (mostly, since threads waiting on a lock consume no CPU)
    • Memory allocation patterns
  3. How would you profile a batch job vs a web server?

    • Batch: one-shot profiling of the entire run
    • Server: continuous, focus on request handling paths

Key Takeaways

  1. Continuous profiling catches issues that one-shot misses
  2. Low overhead makes it safe for production
  3. Timeline view reveals patterns over time
  4. CPU profiling doesn't show I/O time