# Linux Performance Engineering Workshop

## 4-Hour Hands-On Training for BITS Pilani Goa

### Prerequisites

- Basic C programming knowledge
- Basic Python knowledge
- Familiarity with command line
- Ubuntu 22.04/24.04 (or similar Linux)

---

## Workshop Overview

This workshop teaches practical performance engineering skills using libre tools on Linux. By the end, you'll be able to identify and fix common performance problems.

### What You'll Learn

- How to measure program performance (not guess!)
- CPU profiling with `perf` and flamegraphs
- Identifying syscall overhead with `strace`
- Understanding cache behavior
- Continuous profiling for production systems

### Philosophy

> "Measure, don't guess."

Most performance intuitions are wrong. This workshop teaches you to find bottlenecks with data.

---

## Schedule

| Time | Topic | Hands-On |
|------|-------|----------|
| 0:00-0:45 | Introduction & Theory | - |
| 0:45-1:30 | Python Profiling | Scenarios 1 & 2 |
| 1:30-1:45 | Break | - |
| 1:45-2:30 | perf & Flamegraphs | Theory + Demo |
| 2:30-3:30 | Cache & Debug Symbols | Scenarios 4 & 5 |
| 3:30-4:00 | Lunch Break | - |
| 4:00-4:30 | Syscalls & I/O | Theory |
| 4:30-5:15 | Syscall Profiling | Scenario 3 |
| 5:15-5:30 | Break | - |
| 5:30-6:00 | Advanced Topics & Wrap-up | Scenarios 6 & 7 |

---

## Setup Instructions

### Install Required Packages

```bash
# Core tools
sudo apt update
sudo apt install -y \
    build-essential \
    linux-tools-common \
    linux-tools-$(uname -r) \
    strace \
    ltrace \
    htop \
    python3-pip

# Optional but recommended
sudo apt install -y \
    hyperfine \
    valgrind \
    systemtap-sdt-dev

# Python tools
pip3 install py-spy

# Pyroscope (for scenario 7)
# Option A: Docker
docker pull grafana/pyroscope
# Option B: Download binary from https://github.com/grafana/pyroscope/releases

# FlameGraph scripts
cd ~
git clone https://github.com/brendangregg/FlameGraph.git
```

### Configure perf Permissions

```bash
# Allow perf for non-root users (needed for this workshop)
sudo sysctl -w kernel.perf_event_paranoid=1

# To make permanent:
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
```

### Verify Installation

```bash
# Should all work without errors:
perf --version
strace --version
py-spy --version
gcc --version
python3 --version
```

---

## Directory Structure

```
perf-workshop/
├── README.md                    # This file
├── common/
│   └── CHEATSHEET.md            # Quick reference card
├── scenario1-python-to-c/
│   ├── README.md
│   ├── prime_slow.py            # Slow Python version
│   ├── prime.c                  # C implementation
│   └── prime_fast.py            # Python + C via ctypes
├── scenario2-memoization/
│   ├── README.md
│   ├── fib_slow.py              # Naive recursive Fibonacci
│   ├── fib_cached.py            # Memoized Fibonacci
│   └── config_validator.py      # Precomputation example
├── scenario3-syscall-storm/
│   ├── README.md
│   ├── Makefile
│   ├── read_slow.c              # Byte-by-byte reads
│   ├── read_fast.c              # Buffered reads
│   ├── read_stdio.c             # stdio buffering
│   └── read_python.py           # Python equivalent
├── scenario4-cache-misses/
│   ├── README.md
│   ├── Makefile
│   ├── cache_demo.c             # Row vs column major
│   └── list_vs_array.c          # Array vs linked list
├── scenario5-debug-symbols/
│   ├── README.md
│   ├── Makefile
│   └── program.c                # Multi-function program
├── scenario6-usdt-probes/
│   ├── README.md
│   ├── Makefile
│   └── server.c                 # Program with USDT probes
└── scenario7-pyroscope/
    ├── README.md
    ├── requirements.txt
    ├── app.py                   # Flask app with Pyroscope
    └── loadgen.sh               # Load generator script
```

---

## Quick Start

### Build Everything

```bash
# Build all C programs
for dir in scenario{3,4,5,6}*/; do
    if [ -f "$dir/Makefile" ]; then
        echo "Building $dir"
        make -C "$dir"
    fi
done

# Build scenario 1 C library
cd scenario1-python-to-c
gcc -O2 -fPIC -shared -o libprime.so prime.c
cd ..
```

### Run a Scenario

Each scenario has its own README with step-by-step instructions. Start with:

```bash
cd scenario1-python-to-c
cat README.md
```

---

## Key Concepts Summary

### 1. Types of Bottlenecks

| Type | Symptom | Tool |
|------|---------|------|
| CPU-bound | `user` time is high | `perf record` |
| Syscall-bound | `sys` time is high | `strace -c` |
| I/O-bound | Low CPU, slow wall time | `strace`, `iostat` |
| Memory-bound | High cache misses | `perf stat` |

### 2. Profiling Workflow

```
1. Measure: time ./program
2. Hypothesize: Where is time spent?
3. Profile: perf/strace/cProfile
4. Analyze: Find hot spots
5. Optimize: Fix the bottleneck
6. Verify: Re-measure
```

### 3. Tool Selection

| Task | Tool |
|------|------|
| Basic timing | `time` |
| CPU sampling | `perf record` |
| Hardware counters | `perf stat` |
| Syscall tracing | `strace -c` |
| Python profiling | `cProfile`, `py-spy` |
| Visualization | Flamegraphs |
| Continuous profiling | Pyroscope |

---

## Further Learning

### Books

- "Systems Performance" by Brendan Gregg
- "BPF Performance Tools" by Brendan Gregg

### Online Resources

- https://www.brendangregg.com/linuxperf.html
- https://perf.wiki.kernel.org/
- https://jvns.ca/blog/2016/03/12/how-does-perf-work-and-some-questions/

### Tools to Explore Later

- `bpftrace` - High-level tracing language
- `eBPF` - In-kernel programmability
- `Valgrind` - Memory profiling
- `gprof` - Traditional profiler

---

## Troubleshooting

### "perf: command not found"

```bash
sudo apt install linux-tools-common linux-tools-$(uname -r)
```

### "Access to performance monitoring operations is limited"

```bash
sudo sysctl -w kernel.perf_event_paranoid=1
```

### "py-spy: Permission denied"

Either run as root or use `--nonblocking`:

```bash
sudo py-spy record -o profile.svg -- python3 script.py
# Or:
py-spy record --nonblocking -o profile.svg -- python3 script.py
```

### "No debug symbols"

Recompile with `-g`:

```bash
gcc -O2 -g -o program program.c
```

---

## Feedback

Found an issue? Have suggestions? Please provide feedback to your instructor!

---

*Workshop materials prepared for BITS Pilani Goa*
*Tools: All libre/open-source software*
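
---

## Appendix: Classifying a Bottleneck from Timings

The "Types of Bottlenecks" table and step 1 of the profiling workflow ("Measure") can be tried from inside Python before reaching for `perf` or `strace`. The following is a minimal sketch, not part of the workshop scenarios: the helper names `measure` and `classify` and the `cpu_threshold` value of 0.5 are assumptions chosen for illustration. It compares wall-clock time against the `user` and `sys` CPU times reported by `os.times()`. It cannot detect memory-bound code, which needs hardware counters from `perf stat`.

```python
import os
import time


def measure(func, *args):
    """Run func(*args) once and return (wall, user, sys) in seconds.

    user/sys come from os.times(), which reports CPU time for the
    current process at clock-tick resolution (often 10 ms on Linux).
    """
    t0 = os.times()
    w0 = time.perf_counter()
    func(*args)
    w1 = time.perf_counter()
    t1 = os.times()
    return w1 - w0, t1.user - t0.user, t1.system - t0.system


def classify(wall, user, sys_time, cpu_threshold=0.5):
    """Hypothetical heuristic mirroring the bottleneck table.

    cpu_threshold is an assumed cut-off: if less than this fraction of
    wall time was spent on the CPU, the program was mostly waiting.
    """
    cpu = user + sys_time
    if wall <= 0 or cpu / wall < cpu_threshold:
        return "I/O-bound or waiting"
    return "CPU-bound" if user >= sys_time else "syscall-bound"


def spin(n):
    """Pure-Python arithmetic loop: expected to be dominated by user time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    for name, func, arg in [("spin", spin, 2_000_000), ("sleep", time.sleep, 0.2)]:
        wall, user, sys_time = measure(func, arg)
        print(f"{name}: wall={wall:.3f}s user={user:.3f}s sys={sys_time:.3f}s "
              f"-> {classify(wall, user, sys_time)}")
```

Treat the label as the hypothesis for step 2 of the workflow, then confirm it with the matching tool from the table (`perf record`, `strace -c`, or `iostat`).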