# Scenario 7: Continuous Profiling with Pyroscope

## Learning Objectives
- Understand the difference between one-shot and continuous profiling
- Set up and use Pyroscope for Python applications
- Navigate the Pyroscope UI to find performance issues
- Compare flamegraphs over time
## Background
One-shot profiling (what we've done so far):
- Run profiler → Execute program → Stop → Analyze
- Good for: reproducible tests, specific scenarios
- Bad for: intermittent issues, production systems
Continuous profiling:
- Always running in the background
- Low overhead (~2-5% CPU)
- Aggregates data over time
- Good for: production monitoring, finding intermittent issues
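For reference, here is what a one-shot run looks like with the standard-library `cProfile` module (a minimal sketch; `my_workload` is a placeholder for whatever code you want to measure):

```python
import cProfile
import pstats

def my_workload():
    # Placeholder for the code under test.
    sum(i * i for i in range(1_000_000))

# One shot: run the profiler, execute the program, stop, then analyze.
cProfile.run("my_workload()", "workload.prof")
stats = pstats.Stats("workload.prof")
stats.sort_stats("cumulative").print_stats(10)
```

Continuous profiling replaces this manual loop with an always-on agent that samples the running process and ships aggregated profiles to a server.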
## Files

- `app.py` - Flask web application with Pyroscope instrumentation
- `loadgen.sh` - Script to generate traffic
- `requirements.txt` - Python dependencies
## Setup

### 1. Start Pyroscope Server

Option A: Docker (recommended)

```bash
docker run -d --name pyroscope -p 4040:4040 grafana/pyroscope
```

Option B: Binary download

```bash
# Download from https://github.com/grafana/pyroscope/releases
./pyroscope server
```

### 2. Install Python Dependencies

```bash
pip install -r requirements.txt
# Or: pip install flask pyroscope-io
```

### 3. Start the Application

```bash
python3 app.py
```

### 4. Generate Load

```bash
chmod +x loadgen.sh
./loadgen.sh http://localhost:5000 120   # 2 minutes of load
```

### 5. View Profiles

Open http://localhost:4040 in your browser.
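For orientation, the instrumentation inside `app.py` (started in step 3) looks roughly like the sketch below. This is not the exact file: the `application_name` and tags match what the exercises refer to, but the route shown is illustrative.

```python
import pyroscope
from flask import Flask

# Register this process with the Pyroscope server before serving requests.
pyroscope.configure(
    application_name="workshop.flask.app",
    server_address="http://localhost:4040",
    tags={"env": "workshop", "version": "1.0.0"},
)

app = Flask(__name__)

@app.route("/api/primes")
def primes():
    # A deliberately CPU-heavy endpoint for the load generator to hit
    # (the route path here is illustrative).
    count = sum(1 for n in range(2, 5_000) if all(n % d for d in range(2, n)))
    return {"primes_below_5000": count}

if __name__ == "__main__":
    app.run(port=5000)
```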
## Exercise 1: Explore the Pyroscope UI

- Go to http://localhost:4040
- Select `workshop.flask.app` from the application dropdown
- Observe the flamegraph
### UI Navigation
- Timeline: Shows CPU usage over time; click to select a time range
- Flamegraph: Visual representation of where time is spent
- Table view: Sortable list of functions by self/total time
- Diff view: Compare two time ranges
## Exercise 2: Find the Hot Function

While `loadgen.sh` is running:

- Look at the flamegraph
- Find `compute_primes_slow` - it should be prominent
- Click on it to zoom in
- See the call stack leading to it
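To see why it dominates: a naive trial-division loop like the one below (an assumption about what `compute_primes_slow` does in `app.py`; the real code may differ) spends essentially all of its time on the CPU, so the sampler keeps catching the process inside it.

```python
def compute_primes_slow(limit):
    # Check every candidate against every smaller divisor: O(n) work per
    # candidate, all of it on-CPU, which is why the function shows up so
    # wide in the flamegraph.
    primes = []
    for n in range(2, limit):
        if all(n % d != 0 for d in range(2, n)):
            primes.append(n)
    return primes
```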
## Exercise 3: Compare Cached vs Uncached

- Note the current time
- Stop `loadgen.sh`
- Modify `loadgen.sh` to only hit cached endpoints (or run the loop manually):

  ```bash
  for i in $(seq 100); do
    curl -s "localhost:5000/api/hash_cached/test_$((i % 5))"
  done
  ```

- In Pyroscope, compare the two time periods using the diff view
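The drop the diff view shows comes from the cached endpoint skipping the expensive work on repeat keys. A plausible shape for that endpoint's helper is `functools.lru_cache` over a CPU-heavy hash, sketched below (an assumption about `app.py`; the helper name and iteration count are illustrative):

```python
import functools
import hashlib

@functools.lru_cache(maxsize=1024)
def expensive_hash(key: str) -> str:
    # Re-hash repeatedly to burn CPU on the first call for a given key;
    # repeat calls with the same key return from the cache and contribute
    # almost no samples to the profile.
    digest = key.encode()
    for _ in range(100_000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()
```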
## Exercise 4: Spot I/O-Bound Code

- Generate load to the `slow_io` endpoint:

  ```bash
  for i in $(seq 50); do curl -s localhost:5000/api/slow_io; done
  ```

- Look at the flamegraph
- Notice that `time.sleep` doesn't show up much - why?
  - CPU profiling only captures CPU time
  - I/O wait (sleeping, network, disk) doesn't consume CPU
  - This is why I/O-bound code looks "fast" in CPU profiles!
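A stand-in for that handler makes the point concrete (an assumption about `app.py`; the sleep duration is illustrative): the thread spends its time blocked off-CPU, so a CPU sampler has almost nothing to attribute to it even though the request is slow.

```python
import time

def slow_io():
    # Simulates waiting on a network call or disk read. The thread is asleep,
    # not executing, so CPU samples land elsewhere (or nowhere) during this.
    time.sleep(0.5)
    return {"status": "done"}
```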
## Exercise 5: Timeline Analysis

- Let `loadgen.sh` run for several minutes
- In Pyroscope, zoom out the timeline
- Look for patterns:
  - Spikes in CPU usage
  - Changes in the flamegraph shape over time
- Select different time ranges to compare
## Key Pyroscope Concepts

### Flamegraph Reading
- Width = proportion of total samples (time)
- Height = call stack depth
- Color = usually arbitrary (for differentiation)
- Plateaus = functions that are "hot"
### Comparing Profiles
Pyroscope can show:
- Diff view: Red = more time, Green = less time
- Useful for before/after comparisons
### Tags

The app uses tags for filtering:

```python
pyroscope.configure(
    tags={"env": "workshop", "version": "1.0.0"}
)
```
You can filter by tags in the UI.
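Tags can also be attached dynamically around a block of code with `pyroscope.tag_wrapper`, which is useful for labeling individual endpoints. A sketch (the tag key and the helper function are illustrative):

```python
import pyroscope

def generate_report():
    # Stand-in for some expensive work (hypothetical helper).
    sum(i * i for i in range(500_000))

def handle_report_request():
    # Samples collected inside this block carry the extra tag, so the UI
    # can filter the flamegraph down to just this code path.
    with pyroscope.tag_wrapper({"endpoint": "report"}):
        generate_report()
```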
## Production Considerations

### Overhead
- Pyroscope Python agent: ~2-5% CPU overhead
- Sampling rate can be tuned (default: 100Hz)
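The sampling rate is set through `pyroscope.configure`; lowering it trades profile resolution for less overhead (the values below are illustrative):

```python
import pyroscope

pyroscope.configure(
    application_name="workshop.flask.app",
    server_address="http://localhost:4040",
    sample_rate=50,  # samples per second; the default is 100
    tags={"env": "workshop", "version": "1.0.0"},
)
```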
### Data Volume
- Profiles are aggregated, not stored raw
- Storage is efficient (10-100MB per day per app)
### Security
- Profile data can reveal code structure
- Consider who has access to Pyroscope
## Alternatives
- Datadog Continuous Profiler
- AWS CodeGuru Profiler
- Google Cloud Profiler
- Parca (open source, eBPF-based)
## Troubleshooting

### "No data in Pyroscope"
- Check if Pyroscope server is running: http://localhost:4040
- Check app logs for connection errors
- Verify `pyroscope-io` is installed
"Profile looks empty"
- Generate more load
- The endpoint might be I/O bound (not CPU)
- Check the time range in the UI
### High overhead

- Reduce the sampling rate in `pyroscope.configure()`
- Check for profiling-related exceptions
## Discussion Questions

1. When would you use continuous profiling vs one-shot?
   - Continuous: production, long-running apps, intermittent issues
   - One-shot: development, benchmarking, specific scenarios
2. What can't CPU profiling show you?
   - I/O wait time
   - Lock contention (mostly)
   - Memory allocation patterns
3. How would you profile a batch job vs a web server?
   - Batch: one-shot profiling of the entire run
   - Server: continuous, focus on request handling paths
## Key Takeaways

- Continuous profiling catches issues that one-shot profiling misses
- Low overhead makes it safe for production
- Timeline view reveals patterns over time
- CPU profiling doesn't show I/O time