# Linux Performance Engineering Workshop

## 4-Hour Hands-On Training for BITS Pilani Goa

### Prerequisites
- Basic C programming knowledge
- Basic Python knowledge
- Familiarity with command line
- Ubuntu 22.04/24.04 (or similar Linux)

---

## Workshop Overview

This workshop teaches practical performance engineering skills using libre tools on Linux.
By the end, you'll be able to identify and fix common performance problems.

### What You'll Learn
- How to measure program performance (not guess!)
- CPU profiling with `perf` and flamegraphs
- Identifying syscall overhead with `strace`
- Understanding cache behavior
- Continuous profiling for production systems

### Philosophy
> "Measure, don't guess."

Most performance intuitions are wrong. This workshop teaches you to find bottlenecks with data.

---

## Schedule

| Time | Topic | Hands-On |
|------|-------|----------|
| 0:00-0:45 | Introduction & Theory | - |
| 0:45-1:30 | Python Profiling | Scenarios 1 & 2 |
| 1:30-1:45 | Break | - |
| 1:45-2:30 | perf & Flamegraphs | Theory + Demo |
| 2:30-3:30 | Cache & Debug Symbols | Scenarios 4 & 5 |
| 3:30-4:00 | Lunch Break | - |
| 4:00-4:30 | Syscalls & I/O | Theory |
| 4:30-5:15 | Syscall Profiling | Scenario 3 |
| 5:15-5:30 | Break | - |
| 5:30-6:00 | Advanced Topics & Wrap-up | Scenarios 6 & 7 |

---

## Setup Instructions

### Install Required Packages

```bash
# Core tools
sudo apt update
sudo apt install -y \
    build-essential \
    linux-tools-common \
    linux-tools-$(uname -r) \
    strace \
    ltrace \
    htop \
    python3-pip

# Optional but recommended
sudo apt install -y \
    hyperfine \
    systemtap-sdt-dev

# Python tools
# (if pip refuses a system-wide install on Ubuntu 24.04, try: pipx install py-spy)
pip3 install py-spy

# Pyroscope (for scenario 7)
# Option A: Docker
docker pull grafana/pyroscope
# Option B: Download binary from https://github.com/grafana/pyroscope/releases

# FlameGraph scripts
cd ~
git clone https://github.com/brendangregg/FlameGraph.git
```

### Configure perf Permissions

```bash
# Allow perf for non-root users (needed for this workshop)
sudo sysctl -w kernel.perf_event_paranoid=1

# To make permanent:
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
```
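
To see what the setting above changes, the following sketch reads the current value and prints a short interpretation. The level descriptions paraphrase the kernel's sysctl documentation; the function names are illustrative and not part of the workshop materials, and some distributions add stricter levels above 2, which are lumped together here as an assumption.

```python
from pathlib import Path

# Standard procfs location of the sysctl on Linux.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def describe_paranoid(level: int) -> str:
    """Summarise what an unprivileged user may do at a given level."""
    if level <= -1:
        return "almost all events allowed for all users"
    if level == 0:
        return "CPU-wide events allowed; raw tracepoints restricted"
    if level == 1:
        return "own processes only; user and kernel profiling allowed"
    if level == 2:
        return "own processes only; user-space profiling only"
    # Assumption: values above 2 are distribution-specific extensions.
    return "perf likely disabled for unprivileged users (distro-specific)"


def read_paranoid(path: Path = PARANOID_PATH):
    """Return the current level, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid()
    if level is None:
        print("perf_event_paranoid not readable on this system")
    else:
        print(f"perf_event_paranoid={level}: {describe_paranoid(level)}")
```

A value of 1 is what this workshop asks for: it lets you profile your own programs, including time they spend in the kernel.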

### Verify Installation

```bash
# Should all work without errors:
perf --version
strace --version
py-spy --version
gcc --version
python3 --version
```
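
If you prefer a single check over running each command by hand, a small script along these lines reports which tools are missing from your `PATH`. It is a sketch only: the tool list mirrors the commands above, and `find_missing` is a hypothetical helper name.

```python
import shutil

# Tools checked in the "Verify Installation" step above.
WORKSHOP_TOOLS = ["perf", "strace", "py-spy", "gcc", "python3"]


def find_missing(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(WORKSHOP_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All workshop tools found on PATH")
```

This only confirms that each executable exists; it does not check versions or perf permissions.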

---

## Nix-Based Setup (Alternative)

This workshop includes a Nix flake for reproducible environments. Use this if you have Nix installed or want a pre-configured bootable image.

### Quick Reference

| Goal | Command |
|------|---------|
| Dev shell with all tools | `nix develop` |
| Apply tools to Ubuntu | `nix run github:numtide/system-manager -- switch --flake .` |
| Build bootable USB ISO | `nix build .#iso` |
| Build netboot files | `nix build .#netboot` |

### Development Shell

Get all workshop tools in your current shell without installing anything system-wide:

```bash
cd perf-workshop
nix develop

# Now you have: perf, strace, py-spy, bpftrace, hyperfine, flamegraph, pyroscope
perf --version
py-spy --help
```

### System-Manager (Ubuntu/Debian)

Install all workshop tools on an existing Ubuntu system using Nix:

```bash
# Apply the configuration (installs tools via Nix)
nix run 'github:numtide/system-manager' -- switch --flake .

# Configure perf permissions (system-manager can't do this)
sudo /etc/perf-workshop-setup.sh
```

This installs tools into `/nix/store` and adds them to your PATH without conflicting with apt packages.

### Bootable USB Image

Build a complete NixOS image with XFCE desktop, all tools pre-installed, and workshop materials:

```bash
# Build the ISO (~4-5 GB)
nix build .#iso

# Flash to USB (replace sdX with your device)
# echo expands the glob first; bash does not expand a bare if=result/iso/*.iso
sudo dd if="$(echo result/iso/*.iso)" of=/dev/sdX bs=4M status=progress conv=fsync
```

**ISO Features:**
- XFCE desktop with auto-login (user: `workshop`, password: `workshop`)
- `copytoram` enabled: boots from USB, then runs entirely from RAM (USB can be removed after boot)
- `kernel.perf_event_paranoid=1` pre-configured
- Workshop materials in `/home/workshop/perf-workshop`
- Desktop shortcut to open terminal in workshop directory
- SSH enabled for remote access

**Requirements:** 8+ GB RAM recommended (the system runs from RAM)

### Netboot over LAN

For workshops with many participants, netboot is more efficient than flashing multiple USBs.

```bash
# Build netboot bundle
nix build .#netboot
cd result

# Contents:
# - bzImage       (kernel)
# - initrd        (initrd with full system, ~2-4 GB)
# - netboot.ipxe  (iPXE boot script)
```

**Option 1: Pixiecore (easiest)**

Pixiecore is an all-in-one PXE server. Point it at the kernel and initrd:

```bash
nix shell nixpkgs#pixiecore

# Serve on your LAN (requires root for DHCP proxy)
sudo pixiecore boot bzImage initrd \
    --cmdline "$(grep -oP 'imgargs.*? \K.*' netboot.ipxe)"
```

Participants set their BIOS to network boot and get the workshop environment automatically.

**Option 2: dnsmasq + HTTP server**

For more control or integration with existing infrastructure:

```bash
# Terminal 1: Serve files over HTTP
python3 -m http.server 8080
```

Configure dnsmasq (`/etc/dnsmasq.d/workshop.conf`):
```ini
interface=eth0
dhcp-range=192.168.1.100,192.168.1.200,12h
enable-tftp
tftp-root=/path/to/result
dhcp-boot=netboot.ipxe
```

**Option 3: Existing PXE infrastructure**

Copy files to your TFTP/HTTP server and configure your DHCP server to serve `netboot.ipxe`.

### Flake Outputs Reference

```bash
# List all outputs
nix flake show

# Available outputs:
# - devShells.x86_64-linux.default          # Development shell
# - packages.x86_64-linux.iso               # Bootable ISO image
# - packages.x86_64-linux.netboot           # Netboot bundle (kernel + initrd + ipxe)
# - packages.x86_64-linux.netboot-kernel
# - packages.x86_64-linux.netboot-initrd
# - packages.x86_64-linux.netboot-ipxe
# - nixosConfigurations.workshop-iso        # NixOS config for ISO
# - nixosConfigurations.workshop-netboot    # NixOS config for netboot
# - systemConfigs.default                   # system-manager config for Ubuntu
```

---

## Directory Structure

```
perf-workshop/
├── README.md                    # This file
├── flake.nix                    # Nix flake (dev shell, ISO, netboot)
├── flake.lock                   # Locked dependencies
├── nix/
│   ├── packages.nix             # Shared package list
│   ├── common.nix               # Common NixOS configuration
│   ├── iso.nix                  # ISO-specific configuration
│   ├── netboot.nix              # Netboot-specific configuration
│   └── system-manager.nix       # Ubuntu system-manager module
├── common/
│   └── CHEATSHEET.md            # Quick reference card
├── scenario1-python-to-c/
│   ├── README.md
│   ├── prime_slow.py            # Slow Python version
│   ├── prime.c                  # C implementation
│   └── prime_fast.py            # Python + C via ctypes
├── scenario2-memoization/
│   ├── README.md
│   ├── fib_slow.py              # Naive recursive Fibonacci
│   ├── fib_cached.py            # Memoized Fibonacci
│   └── config_validator.py      # Precomputation example
├── scenario3-syscall-storm/
│   ├── README.md
│   ├── Makefile
│   ├── read_slow.c              # Byte-by-byte reads
│   ├── read_fast.c              # Buffered reads
│   ├── read_stdio.c             # stdio buffering
│   └── read_python.py           # Python equivalent
├── scenario4-cache-misses/
│   ├── README.md
│   ├── Makefile
│   ├── matrix_col_major.c       # BAD: Column-major traversal
│   ├── matrix_row_major.c       # GOOD: Row-major traversal
│   ├── list_scattered.c         # BAD: Scattered linked list
│   ├── list_sequential.c        # MEDIUM: Sequential linked list
│   └── array_sum.c              # GOOD: Contiguous array
├── scenario5-debug-symbols/
│   ├── README.md
│   ├── Makefile
│   └── program.c                # Multi-function program
├── scenario6-usdt-probes/
│   ├── README.md
│   ├── Makefile
│   └── server.c                 # Program with USDT probes
└── scenario7-pyroscope/
    ├── README.md
    ├── requirements.txt
    ├── app.py                   # Flask app with Pyroscope
    └── loadgen.sh               # Load generator script
```

---

## Quick Start

### Build Everything

```bash
# Build all C programs
for dir in scenario{3,4,5,6}*/; do
    if [ -f "$dir/Makefile" ]; then
        echo "Building $dir"
        make -C "$dir"
    fi
done

# Build scenario 1 C library
cd scenario1-python-to-c
gcc -O2 -fPIC -shared -o libprime.so prime.c
cd ..
```

### Run a Scenario

Each scenario has its own README with step-by-step instructions.
Start with:

```bash
cd scenario1-python-to-c
cat README.md
```

---

## Key Concepts Summary

### 1. Types of Bottlenecks

| Type | Symptom | Tool |
|------|---------|------|
| CPU-bound | `user` time is high | `perf record` |
| Syscall-bound | `sys` time is high | `strace -c` |
| I/O-bound | Low CPU, slow wall time | `strace`, `iostat` |
| Memory-bound | High cache misses | `perf stat` |
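
The first three rows of the table can be approximated from the same numbers that `time` reports. The sketch below measures wall, user, and sys time around a Python callable and applies a rough rule of thumb. The 50% threshold and the function names are assumptions chosen for illustration, and memory-bound behavior cannot be detected this way (it needs hardware counters from `perf stat`).

```python
import os
import time


def measure(func, *args):
    """Run func and return wall, user, and sys seconds for this process."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func(*args)
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return {
        "wall": wall_after - wall_before,
        "user": cpu_after.user - cpu_before.user,
        "sys": cpu_after.system - cpu_before.system,
    }


def classify(wall, user, system, threshold=0.5):
    """Rough first guess at the bottleneck type (threshold is arbitrary)."""
    if wall <= 0:
        return "too short to classify"
    if user + system < threshold * wall:
        return "I/O-bound or waiting"  # low CPU, slow wall time
    if system > user:
        return "syscall-bound"  # sys time dominates
    return "CPU-bound"  # user time dominates


if __name__ == "__main__":
    timings = measure(lambda: sum(i * i for i in range(2_000_000)))
    guess = classify(timings["wall"], timings["user"], timings["sys"])
    print(timings, "->", guess)
```

Treat the result as a hypothesis to confirm with the tool in the last column, not as a diagnosis.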

### 2. Profiling Workflow

```
1. Measure: time ./program
2. Hypothesize: Where is time spent?
3. Profile: perf/strace/cProfile
4. Analyze: Find hot spots
5. Optimize: Fix the bottleneck
6. Verify: Re-measure
```
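
The workflow above can be walked through in a few lines of Python. This is a minimal sketch using the standard library's `cProfile` on a naive Fibonacci function; it is not taken from the scenario files, and the helper names `timed` and `profile_top` are made up for this example.

```python
import cProfile
import io
import pstats
import time
from functools import lru_cache


def fib(n):
    """Naive recursive Fibonacci: the number of calls grows exponentially."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Same recurrence, but each value is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def timed(func, *args):
    """Steps 1 and 6: measure wall time of a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def profile_top(func, *args, limit=5):
    """Steps 3 and 4: profile a call and return the top entries as text."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


if __name__ == "__main__":
    n = 22
    value, before = timed(fib, n)             # 1. Measure
    _, report = profile_top(fib, n)           # 3. Profile
    print(report)                             # 4. Analyze: fib dominates
    cached, after = timed(fib_cached, n)      # 5. Optimize, 6. Verify
    assert cached == value
    print(f"naive: {before:.4f}s  cached: {after:.6f}s")
```

The call count in the profile report is the clue here: it shows the same function invoked tens of thousands of times, which points at repeated work rather than slow arithmetic.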

### 3. Tool Selection

| Task | Tool |
|------|------|
| Basic timing | `time` |
| CPU sampling | `perf record` |
| Hardware counters | `perf stat` |
| Syscall tracing | `strace -c` |
| Python profiling | `cProfile`, `py-spy` |
| Visualization | Flamegraphs |
| Continuous profiling | Pyroscope |
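
As an illustration of what `strace -c` makes visible, the sketch below reads the same file with two chunk sizes using unbuffered `os.read` and counts the calls. Each `os.read` maps to one `read` syscall, so the counter approximates what the syscall summary would show. The helper name and the 4096-byte file size are assumptions made for this example.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read a file with os.read and return (bytes_read, read_calls)."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    total = 0
    try:
        while True:
            data = os.read(fd, chunk_size)
            calls += 1  # counts the final empty read at EOF as well
            if not data:
                break
            total += len(data)
    finally:
        os.close(fd)
    return total, calls


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
        path = tmp.name
    try:
        print("1-byte reads:   ", count_reads(path, 1))
        print("4096-byte reads:", count_reads(path, 4096))
    finally:
        os.unlink(path)
```

Both runs read the same 4096 bytes, but the byte-at-a-time version issues thousands of calls where the chunked version issues two. That difference shows up as high `sys` time.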

---

## Further Learning

### Books
- "Systems Performance" by Brendan Gregg
- "BPF Performance Tools" by Brendan Gregg

### Online Resources
- https://www.brendangregg.com/linuxperf.html
- https://perf.wiki.kernel.org/
- https://jvns.ca/blog/2016/03/12/how-does-perf-work-and-some-questions/

### Tools to Explore Later
- `bpftrace` - High-level tracing language
- `eBPF` - In-kernel programmability
- `gprof` - Traditional profiler

---

## Troubleshooting

### "perf: command not found"
```bash
sudo apt install linux-tools-common linux-tools-$(uname -r)
```

### "Access to performance monitoring operations is limited"
```bash
sudo sysctl -w kernel.perf_event_paranoid=1
```
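
### "py-spy: Permission denied"
Launching the target as a child of py-spy usually works without root. Attaching to an already running process with `--pid` normally requires root (or a relaxed `kernel.yama.ptrace_scope` setting):
```bash
# Launch the script under py-spy
py-spy record -o profile.svg -- python3 script.py
# If that is still denied (for example, inside a container), run as root:
sudo py-spy record -o profile.svg -- python3 script.py
```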

### "No debug symbols"
Recompile with `-g`:
```bash
gcc -O2 -g -o program program.c
```

---

## Feedback

Found an issue? Have suggestions?
Please provide feedback to your instructor!

---

*Workshop materials prepared for BITS Pilani Goa*
*Tools: All libre/open-source software*