perf-workshop/README.md
illustris 4fb1bd90db
init
2026-01-08 18:11:30 +05:30


# Linux Performance Engineering Workshop
## 4-Hour Hands-On Training for BITS Pilani Goa
### Prerequisites
- Basic C programming knowledge
- Basic Python knowledge
- Familiarity with command line
- Ubuntu 22.04/24.04 (or similar Linux)
---
## Workshop Overview
This workshop teaches practical performance engineering skills using libre tools on Linux.
By the end, you'll be able to identify and fix common performance problems.
### What You'll Learn
- How to measure program performance (not guess!)
- CPU profiling with `perf` and flamegraphs
- Identifying syscall overhead with `strace`
- Understanding cache behavior
- Continuous profiling for production systems
### Philosophy
> "Measure, don't guess."

Most performance intuitions are wrong. This workshop teaches you to find bottlenecks with data.

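
As a small illustration of measuring rather than guessing, the sketch below times two ways of summing integers and reports the best of several runs. The helper names (`best_of`, `sum_with_loop`, `sum_with_builtin`) and the input size are assumptions made for this example and are not part of the workshop materials. Run it and compare the numbers with what you expected before looking.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) over several runs of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


def sum_with_loop(n=200_000):
    """Sum 0..n-1 with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_with_builtin(n=200_000):
    """Sum 0..n-1 with the built-in sum()."""
    return sum(range(n))


if __name__ == "__main__":
    for candidate in (sum_with_loop, sum_with_builtin):
        elapsed_ms = best_of(candidate) * 1e3
        print(f"{candidate.__name__}: {elapsed_ms:.2f} ms")
```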
---
## Schedule
| Time | Topic | Hands-On |
|------|-------|----------|
| 0:00-0:45 | Introduction & Theory | - |
| 0:45-1:30 | Python Profiling | Scenarios 1 & 2 |
| 1:30-1:45 | Break | - |
| 1:45-2:30 | perf & Flamegraphs | Theory + Demo |
| 2:30-3:30 | Cache & Debug Symbols | Scenarios 4 & 5 |
| 3:30-4:00 | Lunch Break | - |
| 4:00-4:30 | Syscalls & I/O | Theory |
| 4:30-5:15 | Syscall Profiling | Scenario 3 |
| 5:15-5:30 | Break | - |
| 5:30-6:00 | Advanced Topics & Wrap-up | Scenarios 6 & 7 |
---
## Setup Instructions
### Install Required Packages
```bash
# Core tools
sudo apt update
sudo apt install -y \
    build-essential \
    linux-tools-common \
    linux-tools-$(uname -r) \
    strace \
    ltrace \
    htop \
    python3-pip
# Optional but recommended
sudo apt install -y \
    hyperfine \
    valgrind \
    systemtap-sdt-dev
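# Python tools
# On Ubuntu 24.04 a system-wide pip install is typically refused
# ("externally-managed-environment", PEP 668). If that happens, install
# inside a virtual environment or use `pipx install py-spy` instead.
pip3 install py-spy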
# Pyroscope (for scenario 7)
# Option A: Docker
docker pull grafana/pyroscope
# Option B: Download binary from https://github.com/grafana/pyroscope/releases
# FlameGraph scripts
cd ~
git clone https://github.com/brendangregg/FlameGraph.git
```
### Configure perf Permissions
```bash
# Allow perf for non-root users (needed for this workshop)
sudo sysctl -w kernel.perf_event_paranoid=1
# To make permanent:
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
```
### Verify Installation
```bash
# Should all work without errors:
perf --version
strace --version
py-spy --version
gcc --version
python3 --version
```
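The same check can be scripted so that missing tools are listed in one pass. This is a minimal sketch that only tests whether each command is on `PATH`. It does not confirm that the tool actually runs, and the `missing_tools` helper is a name invented for this example.

```python
import shutil

# Commands taken from the verification step above.
REQUIRED_TOOLS = ["perf", "strace", "py-spy", "gcc", "python3"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```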
---
## Directory Structure
```
perf-workshop/
├── README.md                   # This file
├── common/
│   └── CHEATSHEET.md           # Quick reference card
├── scenario1-python-to-c/
│   ├── README.md
│   ├── prime_slow.py           # Slow Python version
│   ├── prime.c                 # C implementation
│   └── prime_fast.py           # Python + C via ctypes
├── scenario2-memoization/
│   ├── README.md
│   ├── fib_slow.py             # Naive recursive Fibonacci
│   ├── fib_cached.py           # Memoized Fibonacci
│   └── config_validator.py     # Precomputation example
├── scenario3-syscall-storm/
│   ├── README.md
│   ├── Makefile
│   ├── read_slow.c             # Byte-by-byte reads
│   ├── read_fast.c             # Buffered reads
│   ├── read_stdio.c            # stdio buffering
│   └── read_python.py          # Python equivalent
├── scenario4-cache-misses/
│   ├── README.md
│   ├── Makefile
│   ├── cache_demo.c            # Row vs column major
│   └── list_vs_array.c         # Array vs linked list
├── scenario5-debug-symbols/
│   ├── README.md
│   ├── Makefile
│   └── program.c               # Multi-function program
├── scenario6-usdt-probes/
│   ├── README.md
│   ├── Makefile
│   └── server.c                # Program with USDT probes
└── scenario7-pyroscope/
    ├── README.md
    ├── requirements.txt
    ├── app.py                  # Flask app with Pyroscope
    └── loadgen.sh              # Load generator script
```
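To give a feel for the idea behind scenario 3 (byte-by-byte versus buffered reads), the sketch below counts how many `os.read` calls it takes to consume the same file with different chunk sizes. It is not the content of `read_python.py`, which is not shown here. The `count_reads` helper and the 64 KiB temporary file are assumptions made for this example. On Linux each `os.read` call corresponds to one `read` system call, so the count approximates what `strace -c` would report.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read the whole file with os.read; return (bytes_read, read_calls)."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            calls += 1  # the final call that returns b"" (EOF) is counted too
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, calls


if __name__ == "__main__":
    # Create a throwaway 64 KiB file to read back.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 65536)
    try:
        for size in (1, 4096):
            total, calls = count_reads(tmp.name, size)
            print(f"chunk={size:>4} bytes: {total} bytes in {calls} read calls")
    finally:
        os.unlink(tmp.name)
```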
---
## Quick Start
### Build Everything
```bash
# Build all C programs
for dir in scenario{3,4,5,6}*/; do
    if [ -f "$dir/Makefile" ]; then
        echo "Building $dir"
        make -C "$dir"
    fi
done
# Build scenario 1 C library
cd scenario1-python-to-c
gcc -O2 -fPIC -shared -o libprime.so prime.c
cd ..
```
### Run a Scenario
Each scenario has its own README with step-by-step instructions.
Start with:
```bash
cd scenario1-python-to-c
cat README.md
```
---
## Key Concepts Summary
### 1. Types of Bottlenecks
| Type | Symptom | Tool |
|------|---------|------|
| CPU-bound | `user` time is high | `perf record` |
| Syscall-bound | `sys` time is high | `strace -c` |
| I/O-bound | Low CPU, slow wall time | `strace`, `iostat` |
| Memory-bound | High cache misses | `perf stat` |
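
The first three rows of the table can be turned into a rough first-pass check by comparing user, sys, and wall-clock time for a command. The sketch below does this with `resource.getrusage`. The 50% threshold and the `classify` and `measure` names are illustrative assumptions, not established rules. Memory-bound behavior cannot be seen from timings alone and still needs hardware counters from `perf stat`.

```python
import resource
import subprocess
import sys
import time


def classify(user, system, wall):
    """Rough heuristic based on the table; thresholds are illustrative only."""
    if wall <= 0:
        return "too short to measure"
    if user + system < 0.5 * wall:
        return "I/O-bound or waiting (low CPU, slow wall time)"
    if system > user:
        return "syscall-bound (high sys time)"
    return "CPU-bound (high user time)"


def measure(command):
    """Run command and return (user, sys, wall) seconds for the child process."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(command, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime,
            wall)


if __name__ == "__main__":
    # Demo workload: a short CPU-heavy Python one-liner.
    demo = [sys.executable, "-c", "sum(range(2_000_000))"]
    user, system, wall = measure(demo)
    print(f"user={user:.3f}s sys={system:.3f}s wall={wall:.3f}s")
    print(classify(user, system, wall))
```

Treat the result as a hypothesis to confirm with the tool listed in the table, not as a diagnosis.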
### 2. Profiling Workflow
```
1. Measure: time ./program
2. Hypothesize: Where is time spent?
3. Profile: perf/strace/cProfile
4. Analyze: Find hot spots
5. Optimize: Fix the bottleneck
6. Verify: Re-measure
```
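For a Python program, steps 3 and 4 of this workflow might look like the sketch below, which profiles a small workload with `cProfile` and prints the functions with the highest cumulative time. The workload and the function names (`slow_square_sum`, `workload`, `profile_report`) are invented for this example.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Deliberately plain workload: sum of squares below n."""
    return sum(i * i for i in range(n))


def workload():
    """Call the hot function repeatedly so it stands out in the profile."""
    return [slow_square_sum(20_000) for _ in range(20)]


def profile_report(fn, top=10):
    """Profile fn() and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

After changing the code, re-run the same timing as in step 1 to confirm that the change actually helped.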
### 3. Tool Selection
| Task | Tool |
|------|------|
| Basic timing | `time` |
| CPU sampling | `perf record` |
| Hardware counters | `perf stat` |
| Syscall tracing | `strace -c` |
| Python profiling | `cProfile`, `py-spy` |
| Visualization | Flamegraphs |
| Continuous profiling | Pyroscope |
---
## Further Learning
### Books
- "Systems Performance" by Brendan Gregg
- "BPF Performance Tools" by Brendan Gregg
### Online Resources
- https://www.brendangregg.com/linuxperf.html
- https://perf.wiki.kernel.org/
- https://jvns.ca/blog/2016/03/12/how-does-perf-work-and-some-questions/
### Tools to Explore Later
- `bpftrace` - High-level tracing language
- `eBPF` - In-kernel programmability
- `Valgrind` - Memory profiling
- `gprof` - Traditional profiler
---
## Troubleshooting
### "perf: command not found"
```bash
sudo apt install linux-tools-common linux-tools-$(uname -r)
```
### "Access to performance monitoring operations is limited"
```bash
sudo sysctl -w kernel.perf_event_paranoid=1
```
### "py-spy: Permission denied"
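py-spy needs permission to read the target process's memory. Launching the script as a child of `py-spy` usually works without root; attaching to an already-running process with `--pid` usually requires it. The `--nonblocking` flag only avoids pausing the target while sampling and does not change the permissions needed.
```bash
# Launch the script under py-spy (usually works without root):
py-spy record -o profile.svg -- python3 script.py
# If that is still denied, or when attaching to a running process, use sudo:
sudo py-spy record -o profile.svg -- python3 script.py
```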
### "No debug symbols"
Recompile with `-g`:
```bash
gcc -O2 -g -o program program.c
```
---
## Feedback
Found an issue? Have suggestions?
Please provide feedback to your instructor!

---
*Workshop materials prepared for BITS Pilani Goa*
*Tools: All libre/open-source software*