perf-workshop/README.md
illustris 4fb1bd90db
init
2026-01-08 18:11:30 +05:30

Linux Performance Engineering Workshop

4-Hour Hands-On Training for BITS Pilani Goa

Prerequisites

  • Basic C programming knowledge
  • Basic Python knowledge
  • Familiarity with command line
  • Ubuntu 22.04/24.04 (or similar Linux)

Workshop Overview

This workshop teaches practical performance engineering skills using libre tools on Linux. By the end, you'll be able to identify and fix common performance problems.

What You'll Learn

  • How to measure program performance (not guess!)
  • CPU profiling with perf and flamegraphs
  • Identifying syscall overhead with strace
  • Understanding cache behavior
  • Continuous profiling for production systems

Philosophy

"Measure, don't guess."

Intuition about where a program spends its time is often wrong. This workshop teaches you to locate bottlenecks with measurements instead.


Schedule

| Time      | Topic                      | Hands-On        |
|-----------|----------------------------|-----------------|
| 0:00-0:45 | Introduction & Theory      | -               |
| 0:45-1:30 | Python Profiling           | Scenarios 1 & 2 |
| 1:30-1:45 | Break                      | -               |
| 1:45-2:30 | perf & Flamegraphs         | Theory + Demo   |
| 2:30-3:30 | Cache & Debug Symbols      | Scenarios 4 & 5 |
| 3:30-4:00 | Lunch Break                | -               |
| 4:00-4:30 | Syscalls & I/O             | Theory          |
| 4:30-5:15 | Syscall Profiling          | Scenario 3      |
| 5:15-5:30 | Break                      | -               |
| 5:30-6:00 | Advanced Topics & Wrap-up  | Scenarios 6 & 7 |

Setup Instructions

Install Required Packages

# Core tools
sudo apt update
sudo apt install -y \
    build-essential \
    linux-tools-common \
    linux-tools-$(uname -r) \
    strace \
    ltrace \
    htop \
    python3-pip

# Optional but recommended
sudo apt install -y \
    hyperfine \
    valgrind \
    systemtap-sdt-dev

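# Python tools
# Note: on Ubuntu 24.04, pip refuses system-wide installs (PEP 668).
# There, use a virtual environment or `pipx install py-spy` instead.
pip3 install py-spy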

# Pyroscope (for scenario 7)
# Option A: Docker
docker pull grafana/pyroscope
# Option B: Download binary from https://github.com/grafana/pyroscope/releases

# FlameGraph scripts
cd ~
git clone https://github.com/brendangregg/FlameGraph.git

Configure perf Permissions

# Allow perf for non-root users (needed for this workshop)
sudo sysctl -w kernel.perf_event_paranoid=1

# To make permanent:
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf

Verify Installation

# Should all work without errors:
perf --version
strace --version
py-spy --version
gcc --version
python3 --version
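
The same check can be scripted. The sketch below is one possible approach, assuming only that each tool should be discoverable on PATH; the helper name `missing_tools` is illustrative and not part of the workshop materials. Finding a binary on PATH does not prove that it runs, so this complements the commands above rather than replacing them.

```python
import shutil

# Tools that the verification step above expects to find on PATH.
REQUIRED_TOOLS = ["perf", "strace", "py-spy", "gcc", "python3"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```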

Directory Structure

perf-workshop/
├── README.md                    # This file
├── common/
│   └── CHEATSHEET.md           # Quick reference card
├── scenario1-python-to-c/
│   ├── README.md
│   ├── prime_slow.py           # Slow Python version
│   ├── prime.c                 # C implementation
│   └── prime_fast.py           # Python + C via ctypes
├── scenario2-memoization/
│   ├── README.md
│   ├── fib_slow.py             # Naive recursive Fibonacci
│   ├── fib_cached.py           # Memoized Fibonacci
│   └── config_validator.py     # Precomputation example
├── scenario3-syscall-storm/
│   ├── README.md
│   ├── Makefile
│   ├── read_slow.c             # Byte-by-byte reads
│   ├── read_fast.c             # Buffered reads
│   ├── read_stdio.c            # stdio buffering
│   └── read_python.py          # Python equivalent
├── scenario4-cache-misses/
│   ├── README.md
│   ├── Makefile
│   ├── cache_demo.c            # Row vs column major
│   └── list_vs_array.c         # Array vs linked list
├── scenario5-debug-symbols/
│   ├── README.md
│   ├── Makefile
│   └── program.c               # Multi-function program
├── scenario6-usdt-probes/
│   ├── README.md
│   ├── Makefile
│   └── server.c                # Program with USDT probes
└── scenario7-pyroscope/
    ├── README.md
    ├── requirements.txt
    ├── app.py                  # Flask app with Pyroscope
    └── loadgen.sh              # Load generator script

Quick Start

Build Everything

# Build all C programs
for dir in scenario{3,4,5,6}*/; do
    if [ -f "$dir/Makefile" ]; then
        echo "Building $dir"
        make -C "$dir"
    fi
done

# Build scenario 1 C library
cd scenario1-python-to-c
gcc -O2 -fPIC -shared -o libprime.so prime.c
cd ..
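
Once libprime.so is built, Python can call into it through ctypes. The sketch below shows roughly what such a wrapper might look like. It assumes the library exports a function `int is_prime(long n)`; that symbol name and signature are hypothetical, so check prime.c and prime_fast.py for the real interface. The pure-Python `is_prime_py` is only an illustrative fallback for comparison.

```python
import ctypes
import os


def is_prime_py(n):
    """Pure-Python trial division, used as a reference implementation."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def load_is_prime(lib_path="./libprime.so"):
    """Return a callable wrapping the C function, or None if the library is missing.

    ASSUMPTION: the shared library exports `int is_prime(long)`.
    ctypes raises AttributeError if the symbol does not exist.
    """
    if not os.path.exists(lib_path):
        return None
    lib = ctypes.CDLL(os.path.abspath(lib_path))
    # Declare the C signature so ctypes converts arguments correctly.
    lib.is_prime.argtypes = [ctypes.c_long]
    lib.is_prime.restype = ctypes.c_int
    return lambda n: bool(lib.is_prime(n))


if __name__ == "__main__":
    is_prime_c = load_is_prime()
    check = is_prime_c if is_prime_c is not None else is_prime_py
    backend = "C library" if is_prime_c is not None else "pure Python"
    print(f"Using {backend}: 97 is prime ->", check(97))
```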

Run a Scenario

Each scenario has its own README with step-by-step instructions. Start with:

cd scenario1-python-to-c
cat README.md

Key Concepts Summary

1. Types of Bottlenecks

| Type          | Symptom                 | Tool           |
|---------------|-------------------------|----------------|
| CPU-bound     | High user time          | perf record    |
| Syscall-bound | High sys time           | strace -c      |
| I/O-bound     | Low CPU, slow wall time | strace, iostat |
| Memory-bound  | High cache misses       | perf stat      |
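
To make the user/sys distinction concrete, the following sketch reads a file with one `os.read` call per chunk and reports wall, user, and sys time from `resource.getrusage`. It is a minimal illustration, not the scenario 3 code; the function names (`read_with_chunks`, `measure`, `make_test_file`) are made up for this example. With a 1-byte chunk the number of read calls is roughly the file size, which is the kind of pattern that tends to show up as high sys time.

```python
import os
import resource
import tempfile
import time


def read_with_chunks(path, chunk_size):
    """Read a file using raw os.read calls; return (bytes_read, read_calls)."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # one read call per iteration
            calls += 1
            if not data:  # empty result means end of file
                break
            total += len(data)
    finally:
        os.close(fd)
    return total, calls


def measure(func, *args):
    """Run func(*args); return (result, wall, user, sys) with times in seconds."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter()
    result = func(*args)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    user = after.ru_utime - before.ru_utime
    sys_time = after.ru_stime - before.ru_stime
    return result, wall, user, sys_time


def make_test_file(size):
    """Create a temporary file of `size` bytes and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * size)
    return f.name


if __name__ == "__main__":
    path = make_test_file(200_000)
    try:
        for chunk in (1, 65536):
            (total, calls), wall, user, sys_time = measure(read_with_chunks, path, chunk)
            print(f"chunk={chunk:>6} read_calls={calls:>7} "
                  f"wall={wall:.4f}s user={user:.4f}s sys={sys_time:.4f}s")
    finally:
        os.unlink(path)
```

Running the same script under `strace -c` should show the read count dominating the syscall summary for the 1-byte case.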

2. Profiling Workflow

1. Measure: time ./program
2. Hypothesize: Where is time spent?
3. Profile: perf/strace/cProfile
4. Analyze: Find hot spots
5. Optimize: Fix the bottleneck
6. Verify: Re-measure
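
The six steps above can be walked through in a few lines of Python. This sketch uses a naive recursive Fibonacci and a memoized version in the spirit of scenario 2; the function bodies here are assumptions written for illustration, not the contents of fib_slow.py or fib_cached.py.

```python
import cProfile
import functools
import io
import pstats
import time


def fib_slow(n):
    """Naive recursion: recomputes the same subproblems many times."""
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)


@functools.lru_cache(maxsize=None)
def fib_cached(n):
    """Same recursion, but each value is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def timed(func, n):
    """Steps 1 and 6: measure wall-clock time of a single call."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


def profile_report(func, n, top=5):
    """Steps 3 and 4: profile one call and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(n)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    n = 24
    result, before = timed(fib_slow, n)          # 1. Measure
    print(profile_report(fib_slow, n))           # 3-4. Profile and analyze
    cached_result, after = timed(fib_cached, n)  # 5-6. Optimize, then verify
    assert result == cached_result
    print(f"fib_slow: {before:.4f}s  fib_cached: {after:.6f}s")
```

The call count column in the profile report is usually the giveaway here: a very large number of calls to one small function suggests repeated work rather than expensive work.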

3. Tool Selection

| Task                 | Tool             |
|----------------------|------------------|
| Basic timing         | time             |
| CPU sampling         | perf record      |
| Hardware counters    | perf stat        |
| Syscall tracing      | strace -c        |
| Python profiling     | cProfile, py-spy |
| Visualization        | Flamegraphs      |
| Continuous profiling | Pyroscope        |
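
A single timing run can be noisy, so for basic timing inside Python it often helps to repeat the measurement and look at the spread. The sketch below uses the standard library's `timeit.repeat` for that; the helper name `bench` and its default parameters are illustrative choices, not part of the workshop materials.

```python
import statistics
import timeit


def bench(func, repeat=5, number=10):
    """Time `func` over several rounds; return per-call min/median/max in seconds."""
    # Each entry in `totals` is the time for `number` consecutive calls.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "max": max(per_call),
    }


if __name__ == "__main__":
    stats = bench(lambda: sum(range(100_000)))
    print(", ".join(f"{name}={value * 1e3:.3f} ms" for name, value in stats.items()))
```

The minimum is usually the most stable figure for comparing two versions of the same code, while a large gap between minimum and maximum hints at interference from other processes.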

Further Learning

Books

  • "Systems Performance" by Brendan Gregg
  • "BPF Performance Tools" by Brendan Gregg

Online Resources

Tools to Explore Later

  • bpftrace - High-level tracing language
  • eBPF - In-kernel programmability
  • Valgrind - Memory profiling
  • gprof - Traditional profiler

Troubleshooting

"perf: command not found"

sudo apt install linux-tools-common linux-tools-$(uname -r)

"Access to performance monitoring operations is limited"

sudo sysctl -w kernel.perf_event_paranoid=1

"py-spy: Permission denied"

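py-spy needs permission to read the target process's memory. Note that --nonblocking only stops py-spy from pausing the target while it samples; it does not grant extra permissions. Either run as root or relax the ptrace restriction:

sudo py-spy record -o profile.svg -- python3 script.py
# Or allow same-user ptrace attach (this relaxes a system-wide security setting):
sudo sysctl -w kernel.yama.ptrace_scope=0
py-spy record -o profile.svg -- python3 script.py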

"No debug symbols"

Recompile with -g:

gcc -O2 -g -o program program.c

Feedback

Found an issue? Have suggestions? Please provide feedback to your instructor!


Workshop materials prepared for BITS Pilani Goa

Tools: All libre/open-source software