# Linux Performance Engineering Workshop

## 4-Hour Hands-On Training for BITS Pilani Goa

### Prerequisites

- Basic C programming knowledge
- Basic Python knowledge
- Familiarity with the command line
- Ubuntu 22.04/24.04 (or similar Linux)

---

## Workshop Overview

This workshop teaches practical performance engineering skills using libre tools on Linux. By the end, you'll be able to identify and fix common performance problems.

### What You'll Learn

- How to measure program performance (not guess!)
- CPU profiling with `perf` and flamegraphs
- Identifying syscall overhead with `strace`
- Understanding cache behavior
- Continuous profiling for production systems

### Philosophy

> "Measure, don't guess."

Most performance intuitions are wrong. This workshop teaches you to find bottlenecks with data.

---

## Schedule

| Time | Topic | Hands-On |
|------|-------|----------|
| 0:00-0:45 | Introduction & Theory | - |
| 0:45-1:30 | Python Profiling | Scenarios 1 & 2 |
| 1:30-1:45 | Break | - |
| 1:45-2:30 | perf & Flamegraphs | Theory + Demo |
| 2:30-3:30 | Cache & Debug Symbols | Scenarios 4 & 5 |
| 3:30-4:00 | Lunch Break | - |
| 4:00-4:30 | Syscalls & I/O | Theory |
| 4:30-5:15 | Syscall Profiling | Scenario 3 |
| 5:15-5:30 | Break | - |
| 5:30-6:00 | Advanced Topics & Wrap-up | Scenarios 6 & 7 |

---

## Setup Instructions

### Install Required Packages

```bash
# Core tools
sudo apt update
sudo apt install -y \
    build-essential \
    linux-tools-common \
    linux-tools-$(uname -r) \
    strace \
    ltrace \
    htop \
    python3-pip

# Optional but recommended
sudo apt install -y \
    hyperfine \
    systemtap-sdt-dev

# Python tools
# Note: on Ubuntu 24.04, pip may refuse system-wide installs (PEP 668);
# if so, use a virtual environment or pipx instead.
pip3 install py-spy

# Pyroscope (for scenario 7)
# Option A: Docker
docker pull grafana/pyroscope
# Option B: Download binary from https://github.com/grafana/pyroscope/releases

# FlameGraph scripts
cd ~
git clone https://github.com/brendangregg/FlameGraph.git
```

### Configure perf Permissions

```bash
# Allow perf for non-root users (needed for this workshop)
sudo sysctl -w kernel.perf_event_paranoid=1

# To make permanent:
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
```

### Verify Installation

```bash
# Should all work without errors:
perf --version
strace --version
py-spy --version
gcc --version
python3 --version
```

---

## Nix-Based Setup (Alternative)

This workshop includes a Nix flake for reproducible environments. Use this if you have Nix installed or want a pre-configured bootable image.

### Quick Reference

| Goal | Command |
|------|---------|
| Dev shell with all tools | `nix develop` |
| Apply tools to Ubuntu | `nix run github:numtide/system-manager -- switch --flake .` |
| Build bootable USB ISO | `nix build .#iso` |
| Build netboot files | `nix build .#netboot` |

### Development Shell

Get all workshop tools in your current shell without installing anything system-wide:

```bash
cd perf-workshop
nix develop

# Now you have: perf, strace, py-spy, bpftrace, hyperfine, flamegraph, pyroscope
perf --version
py-spy --help
```

### System-Manager (Ubuntu/Debian)

Install all workshop tools on an existing Ubuntu system using Nix:

```bash
# Apply the configuration (installs tools via Nix)
nix run 'github:numtide/system-manager' -- switch --flake .

# Configure perf permissions (system-manager can't do this)
sudo /etc/perf-workshop-setup.sh
```

This installs tools into `/nix/store` and adds them to your PATH without conflicting with apt packages.

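
Whichever setup path you use, a short script can confirm that the tools resolve on your PATH and that the perf permission level matches the `kernel.perf_event_paranoid=1` setting this README uses. The sketch below is only an illustration and is not part of the workshop repository: the names `find_missing_tools`, `read_perf_paranoid` and `perf_paranoid_ok` are hypothetical, and the tool list simply mirrors the commands in the verification step.

```python
import shutil
from pathlib import Path

# Assumption: these are the tools the README asks you to verify.
WORKSHOP_TOOLS = ["perf", "strace", "py-spy", "gcc", "python3"]
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be resolved on PATH."""
    return [tool for tool in tools if which(tool) is None]


def read_perf_paranoid(path=PARANOID_PATH):
    """Return the current perf_event_paranoid level, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def perf_paranoid_ok(level, limit=1):
    """True when the level is at or below the value this workshop sets."""
    return level is not None and level <= limit


if __name__ == "__main__":
    missing = find_missing_tools(WORKSHOP_TOOLS)
    print("missing tools:", ", ".join(missing) or "none")
    level = read_perf_paranoid()
    status = "ok" if perf_paranoid_ok(level) else "too restrictive or unreadable"
    print(f"perf_event_paranoid: {level} ({status})")
```

Run it with `python3` from any directory. It only reads state and changes nothing, so it is safe to re-run after each setup step.
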
### Bootable USB Image

Build a complete NixOS image with XFCE desktop, all tools pre-installed, and workshop materials:

```bash
# Build the ISO (~4-5 GB)
nix build .#iso

# Flash to USB (replace sdX with your device)
sudo dd if=result/iso/*.iso of=/dev/sdX bs=4M status=progress conv=fsync
```

**ISO Features:**

- XFCE desktop with auto-login (user: `workshop`, password: `workshop`)
- `copytoram` enabled: boots from USB, then runs entirely from RAM (the USB can be removed after boot)
- `kernel.perf_event_paranoid=1` pre-configured
- Workshop materials in `/home/workshop/perf-workshop`
- Desktop shortcut to open a terminal in the workshop directory
- SSH enabled for remote access

**Requirements:** 8+ GB RAM recommended (the system runs from RAM)

### Netboot over LAN

For workshops with many participants, netboot is more efficient than flashing multiple USBs.

```bash
# Build netboot bundle
nix build .#netboot
cd result

# Contents:
# - bzImage (kernel)
# - initrd (initrd with full system, ~2-4 GB)
# - netboot.ipxe (iPXE boot script)
```

**Option 1: Pixiecore (easiest)**

Pixiecore is an all-in-one PXE server; just point it at the files:

```bash
nix shell nixpkgs#pixiecore

# Serve on your LAN (requires root for DHCP proxy).
# sudo may reset PATH, so pass the full path to the binary.
sudo "$(command -v pixiecore)" boot bzImage initrd \
    --cmdline "$(grep -oP 'imgargs.*? \K.*' netboot.ipxe)"
```

Participants set their BIOS to network boot and get the workshop environment automatically.

**Option 2: dnsmasq + HTTP server**

For more control or integration with existing infrastructure:

```bash
# Terminal 1: Serve files over HTTP
python3 -m http.server 8080
```

Configure dnsmasq (`/etc/dnsmasq.d/workshop.conf`):

```ini
interface=eth0
dhcp-range=192.168.1.100,192.168.1.200,12h
enable-tftp
tftp-root=/path/to/result
dhcp-boot=netboot.ipxe
```

**Option 3: Existing PXE infrastructure**

Copy files to your TFTP/HTTP server and configure your DHCP server to serve `netboot.ipxe`.

### Flake Outputs Reference

```bash
# List all outputs
nix flake show

# Available outputs:
# - devShells.x86_64-linux.default        # Development shell
# - packages.x86_64-linux.iso             # Bootable ISO image
# - packages.x86_64-linux.netboot         # Netboot bundle (kernel + initrd + ipxe)
# - packages.x86_64-linux.netboot-kernel
# - packages.x86_64-linux.netboot-initrd
# - packages.x86_64-linux.netboot-ipxe
# - nixosConfigurations.workshop-iso      # NixOS config for ISO
# - nixosConfigurations.workshop-netboot  # NixOS config for netboot
# - systemConfigs.default                 # system-manager config for Ubuntu
```

---

## Directory Structure

```
perf-workshop/
├── README.md                    # This file
├── flake.nix                    # Nix flake (dev shell, ISO, netboot)
├── flake.lock                   # Locked dependencies
├── nix/
│   ├── packages.nix             # Shared package list
│   ├── common.nix               # Common NixOS configuration
│   ├── iso.nix                  # ISO-specific configuration
│   ├── netboot.nix              # Netboot-specific configuration
│   └── system-manager.nix       # Ubuntu system-manager module
├── common/
│   └── CHEATSHEET.md            # Quick reference card
├── scenario1-python-to-c/
│   ├── README.md
│   ├── prime_slow.py            # Slow Python version
│   ├── prime.c                  # C implementation
│   └── prime_fast.py            # Python + C via ctypes
├── scenario2-memoization/
│   ├── README.md
│   ├── fib_slow.py              # Naive recursive Fibonacci
│   ├── fib_cached.py            # Memoized Fibonacci
│   └── config_validator.py      # Precomputation example
├── scenario3-syscall-storm/
│   ├── README.md
│   ├── Makefile
│   ├── read_slow.c              # Byte-by-byte reads
│   ├── read_fast.c              # Buffered reads
│   ├── read_stdio.c             # stdio buffering
│   └── read_python.py           # Python equivalent
├── scenario4-cache-misses/
│   ├── README.md
│   ├── Makefile
│   ├── matrix_col_major.c       # BAD: Column-major traversal
│   ├── matrix_row_major.c       # GOOD: Row-major traversal
│   ├── list_scattered.c         # BAD: Scattered linked list
│   ├── list_sequential.c        # MEDIUM: Sequential linked list
│   └── array_sum.c              # GOOD: Contiguous array
├── scenario5-debug-symbols/
│   ├── README.md
│   ├── Makefile
│   └── program.c                # Multi-function program
├── scenario6-usdt-probes/
│   ├── README.md
│   ├── Makefile
│   └── server.c                 # Program with USDT probes
└── scenario7-pyroscope/
    ├── README.md
    ├── requirements.txt
    ├── app.py                   # Flask app with Pyroscope
    └── loadgen.sh               # Load generator script
```

---

## Quick Start

### Build Everything

```bash
# Build all C programs
for dir in scenario{3,4,5,6}*/; do
    if [ -f "$dir/Makefile" ]; then
        echo "Building $dir"
        make -C "$dir"
    fi
done

# Build scenario 1 C library
cd scenario1-python-to-c
gcc -O2 -fPIC -shared -o libprime.so prime.c
cd ..
```

### Run a Scenario

Each scenario has its own README with step-by-step instructions. Start with:

```bash
cd scenario1-python-to-c
cat README.md
```

---

## Key Concepts Summary

### 1. Types of Bottlenecks

| Type | Symptom | Tool |
|------|---------|------|
| CPU-bound | `user` time is high | `perf record` |
| Syscall-bound | `sys` time is high | `strace -c` |
| I/O-bound | Low CPU, slow wall time | `strace`, `iostat` |
| Memory-bound | High cache misses | `perf stat` |

### 2. Profiling Workflow

```
1. Measure: time ./program
2. Hypothesize: Where is time spent?
3. Profile: perf/strace/cProfile
4. Analyze: Find hot spots
5. Optimize: Fix the bottleneck
6. Verify: Re-measure
```

### 3. Tool Selection

| Task | Tool |
|------|------|
| Basic timing | `time` |
| CPU sampling | `perf record` |
| Hardware counters | `perf stat` |
| Syscall tracing | `strace -c` |
| Python profiling | `cProfile`, `py-spy` |
| Visualization | Flamegraphs |
| Continuous profiling | Pyroscope |

---

## Further Learning

### Books

- "Systems Performance" by Brendan Gregg
- "BPF Performance Tools" by Brendan Gregg

### Online Resources

- https://www.brendangregg.com/linuxperf.html
- https://perf.wiki.kernel.org/
- https://jvns.ca/blog/2016/03/12/how-does-perf-work-and-some-questions/

### Tools to Explore Later

- `bpftrace` - High-level tracing language
- `eBPF` - In-kernel programmability
- `gprof` - Traditional profiler

---

## Troubleshooting

### "perf: command not found"

```bash
sudo apt install linux-tools-common linux-tools-$(uname -r)
```

### "Access to performance monitoring operations is limited"

```bash
sudo sysctl -w kernel.perf_event_paranoid=1
```

### "py-spy: Permission denied"

Either run as root or try `--nonblocking`, which samples without pausing the target process:

```bash
sudo py-spy record -o profile.svg -- python3 script.py

# Or:
py-spy record --nonblocking -o profile.svg -- python3 script.py
```

### "No debug symbols"

Recompile with `-g`:

```bash
gcc -O2 -g -o program program.c
```

---

## Feedback

Found an issue? Have suggestions? Please share your feedback with your instructor!

---

*Workshop materials prepared for BITS Pilani Goa*

*Tools: All libre/open-source software*