perf-workshop/README.md
2026-01-11 10:53:20 +05:30

Linux Performance Engineering Workshop

4-Hour Hands-On Training for BITS Pilani Goa

Prerequisites

  • Basic C programming knowledge
  • Basic Python knowledge
  • Familiarity with command line
  • Ubuntu 22.04/24.04 (or similar Linux)

Workshop Overview

This workshop teaches practical performance engineering skills using libre tools on Linux. By the end, you'll be able to identify and fix common performance problems.

What You'll Learn

  • How to measure program performance (not guess!)
  • CPU profiling with perf and flamegraphs
  • Identifying syscall overhead with strace
  • Understanding cache behavior
  • Continuous profiling for production systems

Philosophy

"Measure, don't guess."

Most performance intuitions are wrong. This workshop teaches you to find bottlenecks with data.


Schedule

| Time      | Topic                     | Hands-On        |
|-----------|---------------------------|-----------------|
| 0:00-0:45 | Introduction & Theory     | -               |
| 0:45-1:30 | Python Profiling          | Scenarios 1 & 2 |
| 1:30-1:45 | Break                     | -               |
| 1:45-2:30 | perf & Flamegraphs        | Theory + Demo   |
| 2:30-3:30 | Cache & Debug Symbols     | Scenarios 4 & 5 |
| 3:30-4:00 | Lunch Break               | -               |
| 4:00-4:30 | Syscalls & I/O            | Theory          |
| 4:30-5:15 | Syscall Profiling         | Scenario 3      |
| 5:15-5:30 | Break                     | -               |
| 5:30-6:00 | Advanced Topics & Wrap-up | Scenarios 6 & 7 |

Setup Instructions

Install Required Packages

# Core tools
sudo apt update
sudo apt install -y \
    build-essential \
    linux-tools-common \
    linux-tools-$(uname -r) \
    strace \
    ltrace \
    htop \
    python3-pip

# Optional but recommended
sudo apt install -y \
    hyperfine \
    systemtap-sdt-dev

# Python tools
# (on Ubuntu 24.04, pip may refuse system-wide installs; use a virtual environment or pipx there)
pip3 install py-spy

# Pyroscope (for scenario 7)
# Option A: Docker
docker pull grafana/pyroscope
# Option B: Download binary from https://github.com/grafana/pyroscope/releases

# FlameGraph scripts
cd ~
git clone https://github.com/brendangregg/FlameGraph.git

Configure perf Permissions

# Allow perf for non-root users (needed for this workshop)
sudo sysctl -w kernel.perf_event_paranoid=1

# To make permanent:
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
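
To confirm that the setting above took effect, the current value can be read back from /proc. The following is a minimal sketch, not part of the workshop materials; the helper names `paranoid_ok` and `read_paranoid` are made up for illustration, and it assumes that a value of 1 or lower is what the workshop needs (lower values are more permissive).

```python
from pathlib import Path

# Kernel knob that the sysctl commands above modify.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def paranoid_ok(value: int, required: int = 1) -> bool:
    """Return True when the level is at least as permissive as required.

    Lower values are more permissive, so the check is value <= required.
    """
    return value <= required


def read_paranoid(path: Path = PARANOID_PATH):
    """Return the current level as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid()
    if level is None:
        print("perf_event_paranoid is not readable on this system")
    elif paranoid_ok(level):
        print(f"perf_event_paranoid={level}: OK for this workshop")
    else:
        print(f"perf_event_paranoid={level}: too restrictive, rerun the sysctl command")
```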

Verify Installation

# Should all work without errors:
perf --version
strace --version
py-spy --version
gcc --version
python3 --version
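
The same check can be scripted so that a missing tool is reported by name rather than as a shell error. This is a small sketch under the assumption that finding each command on PATH is a good enough test; `missing_tools` is a hypothetical helper, not something shipped with the workshop.

```python
import shutil

# Commands checked in the verification step above.
REQUIRED_TOOLS = ["perf", "strace", "py-spy", "gcc", "python3"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All workshop tools found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use calls `missing_tools(REQUIRED_TOOLS)`.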

Nix-Based Setup (Alternative)

This workshop includes a Nix flake for reproducible environments. Use this if you have Nix installed or want a pre-configured bootable image.

Quick Reference

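| Goal                    | Command                                                     |
|-------------------------|-------------------------------------------------------------|
| Dev shell with all tools | nix develop                                                |
| Apply tools to Ubuntu   | nix run github:numtide/system-manager -- switch --flake .   |
| Build bootable USB ISO  | nix build .#iso                                             |
| Build netboot files     | nix build .#netboot                                         |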

Development Shell

Get all workshop tools in your current shell without installing anything system-wide:

cd perf-workshop
nix develop

# Now you have: perf, strace, py-spy, bpftrace, hyperfine, flamegraph, pyroscope
perf --version
py-spy --help

System-Manager (Ubuntu/Debian)

Install all workshop tools on an existing Ubuntu system using Nix:

# Apply the configuration (installs tools via Nix)
nix run 'github:numtide/system-manager' -- switch --flake .

# Configure perf permissions (system-manager can't do this)
sudo /etc/perf-workshop-setup.sh

This installs tools into /nix/store and adds them to your PATH without conflicting with apt packages.

Bootable USB Image

Build a complete NixOS image with XFCE desktop, all tools pre-installed, and workshop materials:

# Build the ISO (~4-5 GB)
nix build .#iso

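# Flash to USB (replace sdX with your device; this erases everything on it)
# The glob is expanded via $(...) because bash does not expand it after "if=".
sudo dd if="$(ls result/iso/*.iso)" of=/dev/sdX bs=4M status=progress conv=fsync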

ISO Features:

  • XFCE desktop with auto-login (user: workshop, password: workshop)
  • copytoram enabled — boots from USB, runs entirely from RAM (USB can be removed after boot)
  • kernel.perf_event_paranoid=1 pre-configured
  • Workshop materials in /home/workshop/perf-workshop
  • Desktop shortcut to open terminal in workshop directory
  • SSH enabled for remote access

Requirements: 8+ GB RAM recommended (the system runs from RAM)

Netboot over LAN

For workshops with many participants, netboot is more efficient than flashing multiple USBs.

# Build netboot bundle
nix build .#netboot
cd result

# Contents:
# - bzImage      (kernel)
# - initrd       (initrd with full system, ~2-4 GB)
# - netboot.ipxe (iPXE boot script)

Option 1: Pixiecore (easiest)

Pixiecore is an all-in-one PXE server — just point it at the files:

nix shell nixpkgs#pixiecore

# Serve on your LAN (requires root for DHCP proxy)
sudo pixiecore boot bzImage initrd \
  --cmdline "$(grep -oP 'imgargs.*? \K.*' netboot.ipxe)"

Participants set their BIOS to network boot and get the workshop environment automatically.

Option 2: dnsmasq + HTTP server

For more control or integration with existing infrastructure:

# Terminal 1: Serve files over HTTP
python3 -m http.server 8080

Configure dnsmasq (/etc/dnsmasq.d/workshop.conf):

interface=eth0
dhcp-range=192.168.1.100,192.168.1.200,12h
enable-tftp
tftp-root=/path/to/result
dhcp-boot=netboot.ipxe

Option 3: Existing PXE infrastructure

Copy files to your TFTP/HTTP server and configure your DHCP server to serve netboot.ipxe.

Flake Outputs Reference

# List all outputs
nix flake show

# Available outputs:
# - devShells.x86_64-linux.default    # Development shell
# - packages.x86_64-linux.iso         # Bootable ISO image
# - packages.x86_64-linux.netboot     # Netboot bundle (kernel + initrd + ipxe)
# - packages.x86_64-linux.netboot-kernel
# - packages.x86_64-linux.netboot-initrd
# - packages.x86_64-linux.netboot-ipxe
# - nixosConfigurations.workshop-iso  # NixOS config for ISO
# - nixosConfigurations.workshop-netboot  # NixOS config for netboot
# - systemConfigs.default             # system-manager config for Ubuntu

Directory Structure

perf-workshop/
├── README.md                    # This file
├── flake.nix                    # Nix flake (dev shell, ISO, netboot)
├── flake.lock                   # Locked dependencies
├── nix/
│   ├── packages.nix            # Shared package list
│   ├── common.nix              # Common NixOS configuration
│   ├── iso.nix                 # ISO-specific configuration
│   ├── netboot.nix             # Netboot-specific configuration
│   └── system-manager.nix      # Ubuntu system-manager module
├── common/
│   └── CHEATSHEET.md           # Quick reference card
├── scenario1-python-to-c/
│   ├── README.md
│   ├── prime_slow.py           # Slow Python version
│   ├── prime.c                 # C implementation
│   └── prime_fast.py           # Python + C via ctypes
├── scenario2-memoization/
│   ├── README.md
│   ├── fib_slow.py             # Naive recursive Fibonacci
│   ├── fib_cached.py           # Memoized Fibonacci
│   └── config_validator.py     # Precomputation example
├── scenario3-syscall-storm/
│   ├── README.md
│   ├── Makefile
│   ├── read_slow.c             # Byte-by-byte reads
│   ├── read_fast.c             # Buffered reads
│   ├── read_stdio.c            # stdio buffering
│   └── read_python.py          # Python equivalent
├── scenario4-cache-misses/
│   ├── README.md
│   ├── Makefile
│   ├── matrix_col_major.c      # BAD: Column-major traversal
│   ├── matrix_row_major.c      # GOOD: Row-major traversal
│   ├── list_scattered.c        # BAD: Scattered linked list
│   ├── list_sequential.c       # MEDIUM: Sequential linked list
│   └── array_sum.c             # GOOD: Contiguous array
├── scenario5-debug-symbols/
│   ├── README.md
│   ├── Makefile
│   └── program.c               # Multi-function program
├── scenario6-usdt-probes/
│   ├── README.md
│   ├── Makefile
│   └── server.c                # Program with USDT probes
└── scenario7-pyroscope/
    ├── README.md
    ├── requirements.txt
    ├── app.py                  # Flask app with Pyroscope
    └── loadgen.sh              # Load generator script
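
As a preview of the idea behind scenario 2 in the tree above, here is a minimal memoization sketch. It is not the content of fib_slow.py or fib_cached.py, which may be written differently; it only assumes the standard library's `functools.lru_cache`.

```python
from functools import lru_cache


def fib_naive(n: int) -> int:
    """Naive recursion: the same subproblems are recomputed over and over."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Same recurrence, but each distinct n is computed only once."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


if __name__ == "__main__":
    print(fib_naive(20), fib_memo(20))
    # cache_info() shows how many calls were served from the cache.
    print(fib_memo.cache_info())
```

Timing both functions for the same `n` (for example with `time` or `hyperfine`) is a quick way to see the difference before reaching for a profiler.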

Quick Start

Build Everything

# Build all C programs
for dir in scenario{3,4,5,6}*/; do
    if [ -f "$dir/Makefile" ]; then
        echo "Building $dir"
        make -C "$dir"
    fi
done

# Build scenario 1 C library
cd scenario1-python-to-c
gcc -O2 -fPIC -shared -o libprime.so prime.c
cd ..
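
Once libprime.so is built, Python can call into it through ctypes. The sketch below shows the general shape only. It assumes the library exports a function `int is_prime(long n)`, which is a guess: check prime.c for the real name and signature. If the library or symbol is missing, it falls back to a pure-Python version.

```python
import ctypes
import os


def is_prime_py(n: int) -> bool:
    """Pure-Python trial division, used as the slow reference and fallback."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def load_is_prime(lib_path: str = "./libprime.so"):
    """Return a callable backed by the C library, or the Python fallback.

    ASSUMPTION: the library exports `int is_prime(long n)`. The real
    prime.c may use a different name or signature.
    """
    if not os.path.exists(lib_path):
        return is_prime_py
    try:
        lib = ctypes.CDLL(os.path.abspath(lib_path))
        c_func = lib.is_prime  # raises AttributeError if the symbol is absent
    except (OSError, AttributeError):
        return is_prime_py
    # Declare the C signature so ctypes converts arguments correctly.
    c_func.argtypes = [ctypes.c_long]
    c_func.restype = ctypes.c_int
    return lambda n: bool(c_func(n))


if __name__ == "__main__":
    is_prime = load_is_prime()
    print(sum(1 for n in range(100_000) if is_prime(n)), "primes below 100000")
```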

Run a Scenario

Each scenario has its own README with step-by-step instructions. Start with:

cd scenario1-python-to-c
cat README.md

Key Concepts Summary

1. Types of Bottlenecks

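| Type          | Symptom                 | Tool           |
|---------------|-------------------------|----------------|
| CPU-bound     | user time is high       | perf record    |
| Syscall-bound | sys time is high        | strace -c      |
| I/O-bound     | Low CPU, slow wall time | strace, iostat |
| Memory-bound  | High cache misses       | perf stat      |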
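
To make the syscall-bound case above concrete, the sketch below reads a file with one read() call per byte and again with a large buffer, then reports how user and sys CPU time differ. It is an illustration in the spirit of scenario 3, not the scenario's actual code; `count_reads` and `cpu_split` are names invented here.

```python
import os
import tempfile


def count_reads(path: str, chunk_size: int) -> int:
    """Read the whole file with os.read and return the number of read() calls."""
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)
            calls += 1
            if not data:  # an empty result signals end of file
                break
    finally:
        os.close(fd)
    return calls


def cpu_split(func, *args):
    """Run func and return (result, user_seconds, sys_seconds) for this process."""
    before = os.times()
    result = func(*args)
    after = os.times()
    return result, after.user - before.user, after.system - before.system


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 200_000)
        path = tmp.name
    try:
        for chunk in (1, 65536):
            calls, user, system = cpu_split(count_reads, path, chunk)
            print(f"chunk={chunk:>5}: {calls} reads, user={user:.3f}s sys={system:.3f}s")
    finally:
        os.unlink(path)
```

Running the script under `strace -c` should show the same read counts from the kernel's side.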

2. Profiling Workflow

1. Measure: time ./program
2. Hypothesize: Where is time spent?
3. Profile: perf/strace/cProfile
4. Analyze: Find hot spots
5. Optimize: Fix the bottleneck
6. Verify: Re-measure
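
For Python code, steps 3 and 4 of the workflow above can be sketched with the standard library's cProfile and pstats. The workload here is invented purely to give the profiler a visible hot spot.

```python
import cProfile
import io
import pstats


def slow_square_sum(n: int) -> int:
    """Deliberate hot spot: a Python-level loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload() -> int:
    """Mix of a slow function and a cheap built-in call."""
    return slow_square_sum(200_000) + sum(range(1000))


def profile_report(func, top: int = 5) -> str:
    """Profile func and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

The report should list `slow_square_sum` near the top. After optimizing, rerun the same measurement to verify the change (step 6).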

3. Tool Selection

| Task                 | Tool             |
|----------------------|------------------|
| Basic timing         | time             |
| CPU sampling         | perf record      |
| Hardware counters    | perf stat        |
| Syscall tracing      | strace -c        |
| Python profiling     | cProfile, py-spy |
| Visualization        | Flamegraphs      |
| Continuous profiling | Pyroscope        |
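
Flamegraphs are drawn from "folded" stacks: one line per unique call stack, with frames joined by semicolons and followed by a sample count. The sketch below shows that collapsing step on hand-written samples. The sample data and the `fold_stacks` helper are made up for illustration; in practice the FlameGraph repository's stackcollapse scripts produce this format from perf output.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples into 'root;...;leaf count' lines.

    Each sample is a list of frame names ordered root first, leaf last.
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical samples, as a sampling profiler might capture them.
    samples = [
        ["main", "parse"],
        ["main", "compute", "is_prime"],
        ["main", "compute", "is_prime"],
    ]
    for line in fold_stacks(samples):
        print(line)
```

The width of each box in the final flamegraph is proportional to these counts, which is why wide boxes mark where time is spent.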

Further Learning

Books

  • "Systems Performance" by Brendan Gregg
  • "BPF Performance Tools" by Brendan Gregg

Online Resources

Tools to Explore Later

  • bpftrace - High-level tracing language
  • eBPF - In-kernel programmability
  • gprof - Traditional profiler

Troubleshooting

"perf: command not found"

sudo apt install linux-tools-common linux-tools-$(uname -r)

"Access to performance monitoring operations is limited"

sudo sysctl -w kernel.perf_event_paranoid=1

"py-spy: Permission denied"

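Run py-spy as root, or let py-spy launch the program itself so that it is a child process. Attaching to an already-running process with --pid is what normally needs root. The --nonblocking flag only stops py-spy from pausing the target while sampling; it does not change the permissions required:

sudo py-spy record -o profile.svg -- python3 script.py
# Or, launched as a child process without root:
py-spy record -o profile.svg -- python3 script.py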

"No debug symbols"

Recompile with -g:

gcc -O2 -g -o program program.c

Feedback

Found an issue? Have suggestions? Please provide feedback to your instructor!


Workshop materials prepared for BITS Pilani Goa
Tools: All libre/open-source software