Scenario 7: Continuous Profiling with Pyroscope

Learning Objectives

  • Understand the difference between one-shot and continuous profiling
  • Set up and use Pyroscope for Python applications
  • Navigate the Pyroscope UI to find performance issues
  • Compare flamegraphs over time

Background

One-shot profiling (what we've done so far):

  • Run profiler → Execute program → Stop → Analyze
  • Good for: reproducible tests, specific scenarios
  • Bad for: intermittent issues, production systems
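
For reference, a one-shot run in Python might look like this minimal sketch using the standard-library cProfile (work() is just a placeholder for the code under test):

import cProfile
import pstats

def work():
    # stand-in for the code being profiled
    return sum(i * i for i in range(1_000_000))

# run once, write stats to a file, then analyze and stop
cProfile.run("work()", "one_shot.prof")
pstats.Stats("one_shot.prof").sort_stats("cumulative").print_stats(10)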

Continuous profiling:

  • Always running in the background
  • Low overhead (~2-5% CPU)
  • Aggregates data over time
  • Good for: production monitoring, finding intermittent issues

Files

  • app.py - Flask web application with Pyroscope instrumentation
  • loadgen.sh - Script to generate traffic
  • requirements.txt - Python dependencies
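
app.py itself isn't reproduced in this README; its Pyroscope setup presumably looks something like the minimal sketch below (the /ping route and the run() call are placeholders, not necessarily what the real file contains):

import pyroscope
from flask import Flask

# attach the continuous profiler before the app starts serving requests
pyroscope.configure(
    application_name="workshop.flask.app",   # name shown in the Pyroscope dropdown
    server_address="http://localhost:4040",  # the Pyroscope server started in step 1
    tags={"env": "workshop", "version": "1.0.0"},
)

app = Flask(__name__)

@app.route("/ping")
def ping():
    # placeholder endpoint; the real app exposes /api/... routes
    return "ok"

if __name__ == "__main__":
    app.run(port=5000)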

Setup

1. Start Pyroscope Server

Option A: Docker (recommended)

docker run -d --name pyroscope -p 4040:4040 grafana/pyroscope

Option B: Binary download

# Download from https://github.com/grafana/pyroscope/releases
./pyroscope server

2. Install Python Dependencies

pip install -r requirements.txt
# Or: pip install flask pyroscope-io

3. Start the Application

python3 app.py

4. Generate Load

chmod +x loadgen.sh
./loadgen.sh http://localhost:5000 120  # 2 minutes of load

5. View Profiles

Open http://localhost:4040 in your browser.

Exercise 1: Explore the Pyroscope UI

  1. Go to http://localhost:4040
  2. Select workshop.flask.app from the application dropdown
  3. Observe the flamegraph

UI Navigation

  • Timeline: Shows CPU usage over time; click to select a time range
  • Flamegraph: Visual representation of where time is spent
  • Table view: Sortable list of functions by self/total time
  • Diff view: Compare two time ranges

Exercise 2: Find the Hot Function

While loadgen.sh is running:

  1. Look at the flamegraph
  2. Find compute_primes_slow - it should be prominent (a sketch of why follows this list)
  3. Click on it to zoom in
  4. See the call stack leading to it
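
The real implementation lives in app.py; the sketch below is an illustrative guess at why a function like compute_primes_slow dominates a CPU flamegraph, not the actual code:

def compute_primes_slow(limit=20000):
    # naive trial division: pure CPU work with no I/O, so nearly
    # every stack sample taken during the request lands in this loop
    primes = []
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            primes.append(n)
    return primes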

Exercise 3: Compare Cached vs Uncached

  1. Note the current time
  2. Stop loadgen.sh
  3. Modify loadgen.sh to only hit cached endpoints (or run manually):
    for i in $(seq 100); do
        curl -s "localhost:5000/api/hash_cached/test_$((i % 5))"
    done
    
  4. In Pyroscope, compare the two time periods using the diff view
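
Why does the cached run look so different in the diff? If the cached endpoint memoizes its results with something like functools.lru_cache (the actual mechanism in app.py may differ), repeated requests for the same key skip the expensive work entirely, so far fewer samples land in it:

import functools
import hashlib

@functools.lru_cache(maxsize=128)
def expensive_hash(key: str) -> str:
    # only the first request for a given key pays the CPU cost;
    # later requests return the memoized value almost for free
    digest = key.encode()
    for _ in range(100_000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()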

Exercise 4: Spot I/O-Bound Code

  1. Generate load to the slow_io endpoint:
    for i in $(seq 50); do curl -s localhost:5000/api/slow_io; done
    
  2. Look at the flamegraph
  3. Notice that time.sleep doesn't show up much - why?
    • CPU profiling only captures CPU time
    • I/O wait (sleeping, network, disk) doesn't consume CPU
    • This is why I/O-bound code looks "fast" in CPU profiles!
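
A quick way to see this distinction outside the profiler is to compare wall-clock time with CPU time around a sleep (a small sketch using only the standard library):

import time

start_wall = time.perf_counter()   # wall-clock time
start_cpu = time.process_time()    # CPU time actually consumed

time.sleep(2)                      # an I/O-style wait: no CPU used

print(f"wall: {time.perf_counter() - start_wall:.2f}s")  # ~2.00s
print(f"cpu:  {time.process_time() - start_cpu:.2f}s")   # ~0.00s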

Exercise 5: Timeline Analysis

  1. Let loadgen.sh run for several minutes
  2. In Pyroscope, zoom out the timeline
  3. Look for patterns:
    • Spikes in CPU usage
    • Changes in the flamegraph shape over time
  4. Select different time ranges to compare

Key Pyroscope Concepts

Flamegraph Reading

  • Width = proportion of total samples (time)
  • Height = call stack depth
  • Color = usually arbitrary (for differentiation)
  • Plateaus = functions that are "hot"

Comparing Profiles

Pyroscope can show:

  • Diff view: Red = more time, Green = less time
  • Useful for before/after comparisons

Tags

The app uses tags for filtering:

pyroscope.configure(
    tags={"env": "workshop", "version": "1.0.0"}
)

You can filter by tags in the UI.
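
Beyond these static tags, the pyroscope-io agent can also tag individual code regions at runtime, which is handy for slicing the flamegraph per endpoint (a sketch; the tag name here is just an example):

import time
import pyroscope

# samples collected inside the block carry the extra tag, so the UI
# can filter the flamegraph down to just this code path
with pyroscope.tag_wrapper({"endpoint": "slow_io"}):
    time.sleep(0.1)  # placeholder for the work being tagged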

Production Considerations

Overhead

  • Pyroscope Python agent: ~2-5% CPU overhead
  • Sampling rate can be tuned (default: 100Hz)
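
Tuning the rate is a one-line change in the agent configuration (sample_rate is the pyroscope-io parameter; lowering it reduces overhead at the cost of coarser data):

import pyroscope

pyroscope.configure(
    application_name="workshop.flask.app",
    server_address="http://localhost:4040",
    sample_rate=50,  # default is 100 samples per second
)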

Data Volume

  • Profiles are aggregated, not stored raw
  • Storage is efficient (10-100MB per day per app)

Security

  • Profile data can reveal code structure
  • Consider who has access to Pyroscope

Alternatives

  • Datadog Continuous Profiler
  • AWS CodeGuru Profiler
  • Google Cloud Profiler
  • Parca (open source, eBPF-based)

Troubleshooting

"No data in Pyroscope"

  • Check if Pyroscope server is running: http://localhost:4040
  • Check app logs for connection errors
  • Verify pyroscope-io is installed

"Profile looks empty"

  • Generate more load
  • The endpoint might be I/O-bound (not CPU-bound)
  • Check the time range in the UI

High overhead

  • Reduce sampling rate in pyroscope.configure()
  • Check for profiling-related exceptions

Discussion Questions

  1. When would you use continuous profiling vs one-shot?

    • Continuous: production, long-running apps, intermittent issues
    • One-shot: development, benchmarking, specific scenarios
  2. What can't CPU profiling show you?

    • I/O wait time
    • Lock contention (mostly, since threads waiting on a lock consume no CPU)
    • Memory allocation patterns
  3. How would you profile a batch job vs a web server?

    • Batch: one-shot profiling of the entire run
    • Server: continuous, focus on request handling paths

Key Takeaways

  1. Continuous profiling catches issues that one-shot misses
  2. Low overhead makes it safe for production
  3. Timeline view reveals patterns over time
  4. CPU profiling doesn't show I/O time