Linux Performance Engineering Workshop
4-Hour Hands-On Training for BITS Pilani Goa
Prerequisites
- Basic C programming knowledge
- Basic Python knowledge
- Familiarity with command line
- Ubuntu 22.04/24.04 (or similar Linux)
Workshop Overview
This workshop teaches practical performance engineering skills using libre tools on Linux. By the end, you'll be able to identify and fix common performance problems.
What You'll Learn
- How to measure program performance (not guess!)
- CPU profiling with perf and flamegraphs
- Identifying syscall overhead with strace
- Understanding cache behavior
- Continuous profiling for production systems
Philosophy
"Measure, don't guess."
Most performance intuitions are wrong. This workshop teaches you to find bottlenecks with data.
Schedule
| Time | Topic | Hands-On |
|---|---|---|
| 0:00-0:45 | Introduction & Theory | - |
| 0:45-1:30 | Python Profiling | Scenarios 1 & 2 |
| 1:30-1:45 | Break | - |
| 1:45-2:30 | perf & Flamegraphs | Theory + Demo |
| 2:30-3:30 | Cache & Debug Symbols | Scenarios 4 & 5 |
| 3:30-4:00 | Lunch Break | - |
| 4:00-4:30 | Syscalls & I/O | Theory |
| 4:30-5:15 | Syscall Profiling | Scenario 3 |
| 5:15-5:30 | Break | - |
| 5:30-6:00 | Advanced Topics & Wrap-up | Scenarios 6 & 7 |
Setup Instructions
Install Required Packages
# Core tools
sudo apt update
sudo apt install -y \
build-essential \
linux-tools-common \
linux-tools-$(uname -r) \
strace \
ltrace \
htop \
python3-pip
# Optional but recommended
sudo apt install -y \
hyperfine \
systemtap-sdt-dev
# Python tools
pip3 install py-spy
# Pyroscope (for scenario 7)
# Option A: Docker
docker pull grafana/pyroscope
# Option B: Download binary from https://github.com/grafana/pyroscope/releases
# FlameGraph scripts
cd ~
git clone https://github.com/brendangregg/FlameGraph.git
Configure perf Permissions
# Allow perf for non-root users (needed for this workshop)
sudo sysctl -w kernel.perf_event_paranoid=1
# To make permanent:
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
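To confirm the setting took effect, the current value can be read back from procfs. The following is a minimal sketch, assuming the standard `/proc/sys/kernel/perf_event_paranoid` path; the helper names `read_paranoid_level` and `is_workshop_ready` are illustrative and not part of the workshop materials.

```python
from pathlib import Path

# Standard procfs location of the setting changed by the sysctl command above.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path=PARANOID_PATH):
    """Return the current perf_event_paranoid level, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def is_workshop_ready(level, max_level=1):
    """True when the level is at or below the value this workshop sets (1)."""
    return level is not None and level <= max_level


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print("could not read", PARANOID_PATH)
    elif is_workshop_ready(level):
        print(f"perf_event_paranoid={level}: OK for this workshop")
    else:
        print(f"perf_event_paranoid={level}: run the sysctl command above")
```

Lower values are more permissive, so any level at or below 1 passes the check.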
Verify Installation
# Should all work without errors:
perf --version
strace --version
py-spy --version
gcc --version
python3 --version
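The same check can be scripted, which is handy when preparing many machines. This is a small sketch that only looks each command up on PATH; the `find_missing` helper is an illustrative name, and the tool list is copied from the commands above.

```python
import shutil

# Commands taken from the verification list above.
REQUIRED_TOOLS = ["perf", "strace", "py-spy", "gcc", "python3"]


def find_missing(tools=REQUIRED_TOOLS):
    """Return the subset of tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All workshop tools found.")
```

Finding a command on PATH does not prove it runs correctly, so running each `--version` command remains the real test.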
Nix-Based Setup (Alternative)
This workshop includes a Nix flake for reproducible environments. Use this if you have Nix installed or want a pre-configured bootable image.
Quick Reference
| Goal | Command |
|---|---|
| Dev shell with all tools | nix develop |
| Apply tools to Ubuntu | nix run github:numtide/system-manager -- switch --flake . |
| Build bootable USB ISO | nix build .#iso |
| Build netboot files | nix build .#netboot |
Development Shell
Get all workshop tools in your current shell without installing anything system-wide:
cd perf-workshop
nix develop
# Now you have: perf, strace, py-spy, bpftrace, hyperfine, flamegraph, pyroscope
perf --version
py-spy --help
System-Manager (Ubuntu/Debian)
Install all workshop tools on an existing Ubuntu system using Nix:
# Apply the configuration (installs tools via Nix)
nix run 'github:numtide/system-manager' -- switch --flake .
# Configure perf permissions (system-manager can't do this)
sudo /etc/perf-workshop-setup.sh
This installs tools into /nix/store and adds them to your PATH without conflicting with apt packages.
Bootable USB Image
Build a complete NixOS image with XFCE desktop, all tools pre-installed, and workshop materials:
# Build the ISO (~4-5 GB)
nix build .#iso
# Flash to USB (replace sdX with your device)
sudo dd if=result/iso/*.iso of=/dev/sdX bs=4M status=progress conv=fsync
ISO Features:
- XFCE desktop with auto-login (user: workshop, password: workshop)
- copytoram enabled: boots from USB, runs entirely from RAM (USB can be removed after boot)
- kernel.perf_event_paranoid=1 pre-configured
- Workshop materials in /home/workshop/perf-workshop
- Desktop shortcut to open terminal in workshop directory
- SSH enabled for remote access
Requirements: 8+ GB RAM recommended (the system runs from RAM)
Netboot over LAN
For workshops with many participants, netboot is more efficient than flashing multiple USBs.
# Build netboot bundle
nix build .#netboot
cd result
# Contents:
# - bzImage (kernel)
# - initrd (initrd with full system, ~2-4 GB)
# - netboot.ipxe (iPXE boot script)
Option 1: Pixiecore (easiest)
Pixiecore is an all-in-one PXE server — just point it at the files:
nix shell nixpkgs#pixiecore
# Serve on your LAN (requires root for DHCP proxy)
sudo pixiecore boot bzImage initrd \
--cmdline "$(grep -oP 'imgargs.*? \K.*' netboot.ipxe)"
Participants set their BIOS to network boot and get the workshop environment automatically.
Option 2: dnsmasq + HTTP server
For more control or integration with existing infrastructure:
# Terminal 1: Serve files over HTTP
python3 -m http.server 8080
Configure dnsmasq (/etc/dnsmasq.d/workshop.conf):
interface=eth0
dhcp-range=192.168.1.100,192.168.1.200,12h
enable-tftp
tftp-root=/path/to/result
dhcp-boot=netboot.ipxe
Option 3: Existing PXE infrastructure
Copy files to your TFTP/HTTP server and configure your DHCP server to serve netboot.ipxe.
Flake Outputs Reference
# List all outputs
nix flake show
# Available outputs:
# - devShells.x86_64-linux.default # Development shell
# - packages.x86_64-linux.iso # Bootable ISO image
# - packages.x86_64-linux.netboot # Netboot bundle (kernel + initrd + ipxe)
# - packages.x86_64-linux.netboot-kernel
# - packages.x86_64-linux.netboot-initrd
# - packages.x86_64-linux.netboot-ipxe
# - nixosConfigurations.workshop-iso # NixOS config for ISO
# - nixosConfigurations.workshop-netboot # NixOS config for netboot
# - systemConfigs.default # system-manager config for Ubuntu
Directory Structure
perf-workshop/
├── README.md # This file
├── flake.nix # Nix flake (dev shell, ISO, netboot)
├── flake.lock # Locked dependencies
├── nix/
│ ├── packages.nix # Shared package list
│ ├── common.nix # Common NixOS configuration
│ ├── iso.nix # ISO-specific configuration
│ ├── netboot.nix # Netboot-specific configuration
│ └── system-manager.nix # Ubuntu system-manager module
├── common/
│ └── CHEATSHEET.md # Quick reference card
├── scenario1-python-to-c/
│ ├── README.md
│ ├── prime_slow.py # Slow Python version
│ ├── prime.c # C implementation
│ └── prime_fast.py # Python + C via ctypes
├── scenario2-memoization/
│ ├── README.md
│ ├── fib_slow.py # Naive recursive Fibonacci
│ ├── fib_cached.py # Memoized Fibonacci
│ └── config_validator.py # Precomputation example
├── scenario3-syscall-storm/
│ ├── README.md
│ ├── Makefile
│ ├── read_slow.c # Byte-by-byte reads
│ ├── read_fast.c # Buffered reads
│ ├── read_stdio.c # stdio buffering
│ └── read_python.py # Python equivalent
├── scenario4-cache-misses/
│ ├── README.md
│ ├── Makefile
│ ├── matrix_col_major.c # BAD: Column-major traversal
│ ├── matrix_row_major.c # GOOD: Row-major traversal
│ ├── list_scattered.c # BAD: Scattered linked list
│ ├── list_sequential.c # MEDIUM: Sequential linked list
│ └── array_sum.c # GOOD: Contiguous array
├── scenario5-debug-symbols/
│ ├── README.md
│ ├── Makefile
│ └── program.c # Multi-function program
├── scenario6-usdt-probes/
│ ├── README.md
│ ├── Makefile
│ └── server.c # Program with USDT probes
└── scenario7-pyroscope/
├── README.md
├── requirements.txt
├── app.py # Flask app with Pyroscope
└── loadgen.sh # Load generator script
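The scenario3-syscall-storm entries in the tree contrast byte-by-byte reads with buffered reads. The idea can be sketched in a few lines of Python. This is an illustration only, not the contents of read_python.py; the `count_reads` helper and the 100,000-byte temporary file are assumptions made for the example.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read the whole file with os.read and return (bytes_read, read_calls)."""
    fd = os.open(path, os.O_RDONLY)
    total = calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # one read(2) syscall per call
            calls += 1
            if not data:  # empty result means end of file
                break
            total += len(data)
    finally:
        os.close(fd)
    return total, calls


if __name__ == "__main__":
    # Create a throwaway input file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 100_000)
    try:
        print("1-byte reads: ", count_reads(tmp.name, 1))
        print("64 KiB reads: ", count_reads(tmp.name, 64 * 1024))
    finally:
        os.unlink(tmp.name)
```

Both runs read the same bytes, but the 1-byte version issues roughly one syscall per byte. Running the script under `strace -c` should show the difference in the `read` count.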
Quick Start
Build Everything
# Build all C programs
for dir in scenario{3,4,5,6}*/; do
if [ -f "$dir/Makefile" ]; then
echo "Building $dir"
make -C "$dir"
fi
done
# Build scenario 1 C library
cd scenario1-python-to-c
gcc -O2 -fPIC -shared -o libprime.so prime.c
cd ..
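Once libprime.so exists, Python can load it with `ctypes`. The sketch below assumes it is run from the repository root. The symbol name `is_prime` and its `int is_prime(long)` signature are hypothetical, so check prime.c and prime_fast.py for the real interface.

```python
import ctypes
from pathlib import Path


def load_prime_library(path="scenario1-python-to-c/libprime.so"):
    """Load the shared library built above; return None if it is absent."""
    lib_path = Path(path).resolve()
    if not lib_path.exists():
        return None
    return ctypes.CDLL(str(lib_path))


if __name__ == "__main__":
    lib = load_prime_library()
    if lib is None:
        print("libprime.so not found; run the gcc command above first")
    else:
        # HYPOTHETICAL: `is_prime` and its signature are assumed for
        # illustration and are not confirmed by this README.
        func = getattr(lib, "is_prime", None)
        if func is None:
            print("no symbol named is_prime; check prime.c for the real names")
        else:
            func.argtypes = [ctypes.c_long]
            func.restype = ctypes.c_int
            print("is_prime(97) ->", func(97))
```

Declaring `argtypes` and `restype` matters because ctypes otherwise assumes C `int` for the return value and guesses argument types from the Python values passed.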
Run a Scenario
Each scenario has its own README with step-by-step instructions. Start with:
cd scenario1-python-to-c
cat README.md
Key Concepts Summary
1. Types of Bottlenecks
| Type | Symptom | Tool |
|---|---|---|
| CPU-bound | user time is high | perf record |
| Syscall-bound | sys time is high | strace -c |
| I/O-bound | Low CPU, slow wall time | strace, iostat |
| Memory-bound | High cache misses | perf stat |
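The first three rows can be turned into a rough triage helper based on wall, user, and sys time. This is a heuristic sketch: the `measure` and `classify` names and the 0.5 CPU-utilization threshold are illustrative assumptions. Memory-bound code cannot be detected from timings alone and still needs `perf stat`.

```python
import os
import time


def measure(func, *args):
    """Run func and return (wall, user, sys) seconds for this process."""
    t0, w0 = os.times(), time.perf_counter()
    func(*args)
    t1, w1 = os.times(), time.perf_counter()
    return w1 - w0, t1.user - t0.user, t1.system - t0.system


def classify(wall, user, sys_time, cpu_threshold=0.5):
    """Heuristic label; the 0.5 threshold is an illustrative assumption."""
    if wall <= 0:
        return "unknown"
    if (user + sys_time) / wall < cpu_threshold:
        return "I/O-bound (low CPU, slow wall time): try strace, iostat"
    if sys_time > user:
        return "syscall-bound (sys time is high): try strace -c"
    return "CPU-bound (user time is high): try perf record"


if __name__ == "__main__":
    wall, user, sys_time = measure(sum, range(5_000_000))
    print(f"wall={wall:.3f}s user={user:.3f}s sys={sys_time:.3f}s")
    print(classify(wall, user, sys_time))
```

The same three numbers are what `time ./program` prints as real, user, and sys.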
2. Profiling Workflow
1. Measure: time ./program
2. Hypothesize: Where is time spent?
3. Profile: perf/strace/cProfile
4. Analyze: Find hot spots
5. Optimize: Fix the bottleneck
6. Verify: Re-measure
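The six steps can be walked through on a toy Fibonacci function. This is a minimal sketch using only the standard library. `fib_slow` and `fib_cached` borrow their names from scenario 2, but they are written here for illustration and are not the workshop's actual files.

```python
import cProfile
import io
import pstats
import time
from functools import lru_cache


def fib_slow(n):
    """Naive recursion: recomputes the same subproblems many times."""
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Step 5 (optimize): same recurrence, each value computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def timed(func, n):
    """Steps 1 and 6: measure wall time for one call."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


def profile_top(func, n, limit=3):
    """Steps 3 and 4: profile one call and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, n)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    n = 24
    slow_result, slow_time = timed(fib_slow, n)      # 1. measure
    print(profile_top(fib_slow, n))                  # 3-4. profile, analyze
    fast_result, fast_time = timed(fib_cached, n)    # 6. re-measure
    assert slow_result == fast_result                # same answer after the fix
    print(f"slow: {slow_time:.4f}s  cached: {fast_time:.6f}s")
```

The profile output shows the call count for `fib_slow`, which is the data that justifies memoizing rather than guessing.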
3. Tool Selection
| Task | Tool |
|---|---|
| Basic timing | time |
| CPU sampling | perf record |
| Hardware counters | perf stat |
| Syscall tracing | strace -c |
| Python profiling | cProfile, py-spy |
| Visualization | Flamegraphs |
| Continuous profiling | Pyroscope |
Further Learning
Books
- "Systems Performance" by Brendan Gregg
- "BPF Performance Tools" by Brendan Gregg
Online Resources
- https://www.brendangregg.com/linuxperf.html
- https://perf.wiki.kernel.org/
- https://jvns.ca/blog/2016/03/12/how-does-perf-work-and-some-questions/
Tools to Explore Later
- bpftrace - High-level tracing language
- eBPF - In-kernel programmability
- gprof - Traditional profiler
Troubleshooting
"perf: command not found"
sudo apt install linux-tools-common linux-tools-$(uname -r)
"Access to performance monitoring operations is limited"
sudo sysctl -w kernel.perf_event_paranoid=1
"py-spy: Permission denied"
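Either run as root, or try --nonblocking, which samples without pausing the target process (it may still fail if the kernel restricts ptrace access):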
sudo py-spy record -o profile.svg -- python3 script.py
# Or:
py-spy record --nonblocking -o profile.svg -- python3 script.py
"No debug symbols"
Recompile with -g:
gcc -O2 -g -o program program.c
Feedback
Found an issue? Have suggestions? Please provide feedback to your instructor!
Workshop materials prepared for BITS Pilani Goa
Tools: All libre/open-source software