# Linux Performance Engineering Workshop

## 4-Hour Hands-On Training for BITS Pilani Goa

### Prerequisites
- Basic C programming knowledge
- Basic Python knowledge
- Familiarity with the command line
- Ubuntu 22.04/24.04 (or similar Linux)

---

## Workshop Overview

This workshop teaches practical performance engineering skills using libre tools on Linux.
By the end, you'll be able to identify and fix common performance problems.

### What You'll Learn
- How to measure program performance (not guess!)
- CPU profiling with `perf` and flamegraphs
- Identifying syscall overhead with `strace`
- Understanding cache behavior
- Continuous profiling for production systems

### Philosophy
> "Measure, don't guess."

Intuition about where a program spends its time is often wrong. This workshop teaches you to find bottlenecks with data.

---

## Schedule

| Time | Topic | Hands-On |
|------|-------|----------|
| 0:00-0:45 | Introduction & Theory | - |
| 0:45-1:30 | Python Profiling | Scenarios 1 & 2 |
| 1:30-1:45 | Break | - |
| 1:45-2:30 | perf & Flamegraphs | Theory + Demo |
| 2:30-3:30 | Cache & Debug Symbols | Scenarios 4 & 5 |
| 3:30-4:00 | Lunch Break | - |
| 4:00-4:30 | Syscalls & I/O | Theory |
| 4:30-5:15 | Syscall Profiling | Scenario 3 |
| 5:15-5:30 | Break | - |
| 5:30-6:00 | Advanced Topics & Wrap-up | Scenarios 6 & 7 |

---

## Setup Instructions

### Install Required Packages

```bash
# Core tools
sudo apt update
sudo apt install -y \
    build-essential \
    linux-tools-common \
    linux-tools-$(uname -r) \
    strace \
    ltrace \
    htop \
    python3-pip

# Optional but recommended
sudo apt install -y \
    hyperfine \
    valgrind \
    systemtap-sdt-dev

# Python tools
# On Ubuntu 24.04, pip may refuse system-wide installs (PEP 668);
# if so, use a virtual environment or `pipx install py-spy` instead.
pip3 install py-spy

# Pyroscope (for scenario 7)
# Option A: Docker
docker pull grafana/pyroscope
# Option B: Download binary from https://github.com/grafana/pyroscope/releases

# FlameGraph scripts
cd ~
git clone https://github.com/brendangregg/FlameGraph.git
```

### Configure perf Permissions

```bash
# Allow perf for non-root users (needed for this workshop)
sudo sysctl -w kernel.perf_event_paranoid=1

# To make permanent:
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
```

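
To see what a given `kernel.perf_event_paranoid` value permits, you can read the current level from `/proc/sys/kernel/perf_event_paranoid` and map it to a short description. The sketch below is illustrative and not part of the workshop materials: the function names are hypothetical, the descriptions paraphrase the kernel's sysctl documentation, and the handling of levels above 2 is an assumption based on some distribution kernels (including Ubuntu's) that add stricter levels.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def describe_paranoid(level: int) -> str:
    """Summarize what unprivileged users may do at a given level."""
    if level <= -1:
        return "no restrictions"
    if level == 0:
        return "CPU-wide events allowed; raw tracepoints need privileges"
    if level == 1:
        return "per-process user and kernel profiling allowed"
    if level == 2:
        return "per-process user-space profiling only"
    # Assumption: levels above 2 are distro-specific and block
    # unprivileged perf use; mainline kernels treat them like 2.
    return "perf is effectively limited to privileged users"


def read_paranoid(path: Path = PARANOID_PATH) -> int:
    """Read the current level from procfs."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        level = read_paranoid()
    except OSError as exc:
        print(f"could not read {PARANOID_PATH}: {exc}")
    else:
        print(f"perf_event_paranoid={level}: {describe_paranoid(level)}")
```

If the script reports a level above 1, the `sysctl` command above should bring it down to the value this workshop expects.
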
### Verify Installation

```bash
# Should all work without errors:
perf --version
strace --version
py-spy --version
gcc --version
python3 --version
```

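
The same check can be scripted so that a missing tool is reported by name. This is a minimal sketch, not part of the workshop materials: `find_missing` and `REQUIRED_TOOLS` are hypothetical names, and the check only confirms that each command is on `PATH`. It does not prove the command runs: on Ubuntu, for example, a `perf` wrapper can be present even when the kernel-specific `linux-tools` package is missing, so the `--version` commands above remain the real test.

```python
import shutil

# Commands taken from the verification step above.
REQUIRED_TOOLS = ["perf", "strace", "py-spy", "gcc", "python3"]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```
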
---

## Directory Structure

```
perf-workshop/
├── README.md                  # This file
├── common/
│   └── CHEATSHEET.md          # Quick reference card
├── scenario1-python-to-c/
│   ├── README.md
│   ├── prime_slow.py          # Slow Python version
│   ├── prime.c                # C implementation
│   └── prime_fast.py          # Python + C via ctypes
├── scenario2-memoization/
│   ├── README.md
│   ├── fib_slow.py            # Naive recursive Fibonacci
│   ├── fib_cached.py          # Memoized Fibonacci
│   └── config_validator.py    # Precomputation example
├── scenario3-syscall-storm/
│   ├── README.md
│   ├── Makefile
│   ├── read_slow.c            # Byte-by-byte reads
│   ├── read_fast.c            # Buffered reads
│   ├── read_stdio.c           # stdio buffering
│   └── read_python.py         # Python equivalent
├── scenario4-cache-misses/
│   ├── README.md
│   ├── Makefile
│   ├── cache_demo.c           # Row vs column major
│   └── list_vs_array.c        # Array vs linked list
├── scenario5-debug-symbols/
│   ├── README.md
│   ├── Makefile
│   └── program.c              # Multi-function program
├── scenario6-usdt-probes/
│   ├── README.md
│   ├── Makefile
│   └── server.c               # Program with USDT probes
└── scenario7-pyroscope/
    ├── README.md
    ├── requirements.txt
    ├── app.py                 # Flask app with Pyroscope
    └── loadgen.sh             # Load generator script
```

---

## Quick Start

### Build Everything

```bash
# Build all C programs
for dir in scenario{3,4,5,6}*/; do
    if [ -f "$dir/Makefile" ]; then
        echo "Building $dir"
        make -C "$dir"
    fi
done

# Build scenario 1 C library
cd scenario1-python-to-c
gcc -O2 -fPIC -shared -o libprime.so prime.c
cd ..
```

### Run a Scenario

Each scenario has its own README with step-by-step instructions.
Start with:

```bash
cd scenario1-python-to-c
cat README.md
```

---

## Key Concepts Summary

### 1. Types of Bottlenecks

| Type | Symptom | Tool |
|------|---------|------|
| CPU-bound | `user` time is high | `perf record` |
| Syscall-bound | `sys` time is high | `strace -c` |
| I/O-bound | Low CPU, slow wall time | `strace`, `iostat` |
| Memory-bound | High cache misses | `perf stat` |

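
The first three rows of the table can be turned into a rough first-pass check on `time`-style numbers. The sketch below is illustrative only: `classify` and `measure` are hypothetical helpers, and the thresholds (CPU time below half of wall time, `sys` greater than `user`) are arbitrary assumptions, not rules from the workshop. Memory-bound behavior cannot be detected this way; it needs hardware counters from `perf stat`.

```python
import resource
import subprocess
import sys
import time


def classify(user: float, system: float, wall: float) -> str:
    """Rough first guess from user/sys/wall seconds; thresholds are arbitrary."""
    if wall <= 0:
        return "too short to measure"
    cpu = user + system
    if cpu < 0.5 * wall:
        return "I/O-bound or waiting (low CPU, slow wall time)"
    if system > user:
        return "syscall-bound (sys time dominates)"
    return "CPU-bound (user time dominates)"


def measure(cmd):
    """Run cmd and return (user, sys, wall) seconds for the child process."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime,
            wall)


if __name__ == "__main__":
    # Fall back to a tiny CPU-only command when no arguments are given.
    cmd = sys.argv[1:] or [sys.executable, "-c", "sum(range(10**6))"]
    user, system, wall = measure(cmd)
    print(f"user={user:.2f}s sys={system:.2f}s wall={wall:.2f}s")
    print(classify(user, system, wall))
```

Treat the result as a hypothesis to confirm with the tool listed in the table, not as a diagnosis.
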
### 2. Profiling Workflow

```
1. Measure:     time ./program
2. Hypothesize: Where is time spent?
3. Profile:     perf/strace/cProfile
4. Analyze:     Find hot spots
5. Optimize:    Fix the bottleneck
6. Verify:      Re-measure
```

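
For a Python program, steps 3 and 4 of the workflow above can be sketched with the standard-library `cProfile` and `pstats` modules. The workload here (`slow_sum_of_squares`, `workload`) and the `profile` helper are made-up names for illustration, not code from any workshop scenario.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive hot spot used as the profiling target."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    """Call the hot spot repeatedly so it dominates the profile."""
    return [slow_sum_of_squares(20_000) for _ in range(50)]


def profile(func, top=5):
    """Run func under cProfile and return a text report of the top entries."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile(workload))
```

The report lists functions by cumulative time, which points at the hot spot to fix in step 5; re-running the same script afterwards covers step 6.
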
### 3. Tool Selection

| Task | Tool |
|------|------|
| Basic timing | `time` |
| CPU sampling | `perf record` |
| Hardware counters | `perf stat` |
| Syscall tracing | `strace -c` |
| Python profiling | `cProfile`, `py-spy` |
| Visualization | Flamegraphs |
| Continuous profiling | Pyroscope |

---

## Further Learning

### Books
- "Systems Performance" by Brendan Gregg
- "BPF Performance Tools" by Brendan Gregg

### Online Resources
- https://www.brendangregg.com/linuxperf.html
- https://perf.wiki.kernel.org/
- https://jvns.ca/blog/2016/03/12/how-does-perf-work-and-some-questions/

### Tools to Explore Later
- `bpftrace` - High-level tracing language
- `eBPF` - In-kernel programmability
- `Valgrind` - Memory profiling
- `gprof` - Traditional profiler

---

## Troubleshooting

### "perf: command not found"
```bash
sudo apt install linux-tools-common linux-tools-$(uname -r)
```

### "Access to performance monitoring operations is limited"
```bash
sudo sysctl -w kernel.perf_event_paranoid=1
```

### "py-spy: Permission denied"
Either run as root or try `--nonblocking`, which samples without pausing the target process:
```bash
sudo py-spy record -o profile.svg -- python3 script.py
# Or:
py-spy record --nonblocking -o profile.svg -- python3 script.py
```

### "No debug symbols"
Recompile with `-g`:
```bash
gcc -O2 -g -o program program.c
```

---

## Feedback

Found an issue? Have suggestions?
Please provide feedback to your instructor!

---

*Workshop materials prepared for BITS Pilani Goa*
*Tools: All libre/open-source software*