Linux Performance Tools Cheatsheet

Quick Reference Card for Workshop


time - Basic Timing

time ./program           # Wall clock, user, sys time
/usr/bin/time -v ./program   # Verbose (GNU time; the shell's built-in time has no -v)

Output explained:

  • real - Wall clock time (what a stopwatch would show)
  • user - CPU time in user space (your code)
  • sys - CPU time in kernel space (syscalls)
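
Comparing user + sys against real is a quick way to read these numbers: a sum well below real means the program mostly waited (I/O, sleep, locks), while a sum above real means it ran on several cores at once. A minimal sketch, assuming a hypothetical helper named cpu_share and made-up sample timings:

```shell
# cpu_share REAL USER SYS
# Prints (user + sys) / real as a percentage.
# Hypothetical helper, not part of time(1).
cpu_share() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN {
        if (r <= 0) { print "n/a"; exit }
        printf "%.0f%%\n", 100 * (u + s) / r
    }'
}

# Illustrative (made-up) timings, in seconds
cpu_share 2.00 0.40 0.10   # mostly waiting
cpu_share 1.00 3.20 0.40   # running on several cores
```

Feed it the three values printed by time; anything far from 100% suggests looking at I/O or parallelism before the code itself.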

perf stat - Hardware Counters

# Basic stats
perf stat ./program

# Specific events
perf stat -e cycles,instructions,cache-misses ./program

# Repeat for statistical accuracy
perf stat -r 5 ./program

# Common events
perf stat -e cycles,instructions,cache-references,cache-misses,branches,branch-misses ./program

Key metrics:

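  • IPC (Instructions Per Cycle): instructions / cycles. Higher is better; above 1 is usually healthy, while well below 1 often points to stalls such as cache misses
  • Cache miss ratio: cache-misses / cache-references. Lower is better
  • Branch miss ratio: branch-misses / branches. Lower is better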
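
perf stat prints raw counts, so the ratios above sometimes have to be worked out by hand. A small sketch of that arithmetic, assuming hypothetical helper names (ipc, miss_ratio) and made-up counter values:

```shell
# ipc INSTRUCTIONS CYCLES
# Instructions per cycle, two decimals. Hypothetical helper.
ipc() {
    awk -v i="$1" -v c="$2" 'BEGIN {
        if (c <= 0) { print "n/a"; exit }
        printf "%.2f\n", i / c
    }'
}

# miss_ratio MISSES REFERENCES
# Misses as a percentage of references. Works for cache or branch counters.
miss_ratio() {
    awk -v m="$1" -v r="$2" 'BEGIN {
        if (r <= 0) { print "n/a"; exit }
        printf "%.1f%%\n", 100 * m / r
    }'
}

# Illustrative (made-up) counter values
ipc 3000000 2000000        # instructions, cycles
miss_ratio 5000 200000     # cache-misses, cache-references
```

Recent perf versions often print these ratios next to the counters already; the helpers are mainly useful when post-processing saved output.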

perf record/report - CPU Sampling

# Record samples
perf record ./program
perf record -g ./program              # With call graphs
perf record -F 99 ./program           # Custom frequency (99 Hz)

# Analyze
perf report                           # Interactive TUI
perf report --stdio                   # Text output
perf report -n --stdio                # With sample counts

TUI navigation:

  • Arrow keys: Navigate
  • Enter: Zoom into function
  • a: Annotate (show source/assembly)
  • q: Quit/back

perf annotate - Source-Level View

# After perf record
perf annotate function_name           # Show hot lines
perf annotate --stdio -s function_name   # Text output; -s/--symbol names the function

Requires: Binary compiled with debug info (gcc/clang -g) for source lines; this is unrelated to perf record -g


strace - Syscall Tracing

# Summary of syscalls
strace -c ./program

# Trace specific syscalls
strace -e read,write ./program

# With timing per call
strace -T ./program

# Follow forks
strace -f ./program

# Output to file
strace -o trace.log ./program

Key columns in -c output:

  • % time: Percentage of total syscall time
  • calls: Number of times called
  • errors: Failed calls

Flamegraphs

# Clone FlameGraph repo (one time)
git clone https://github.com/brendangregg/FlameGraph.git

# Generate flamegraph
perf record -g ./program
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > profile.svg

# Open in browser
firefox profile.svg

Python Profiling

# cProfile - built-in profiler
python3 -m cProfile script.py
python3 -m cProfile -s cumtime script.py     # Sort by cumulative time
python3 -m cProfile -s ncalls script.py      # Sort by call count
python3 -m cProfile -o profile.stats script.py  # Save for later analysis

# py-spy - sampling profiler (low overhead)
pip install py-spy
py-spy record -o profile.svg -- python3 script.py   # Flamegraph
py-spy top -- python3 script.py                     # Live view

# Attach to running process
py-spy top --pid 12345
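
A profile saved with -o can be inspected later with the standard-library pstats module. A minimal sketch, assuming python3 is on the PATH; the workload script and temp directory are hypothetical stand-ins for a real script.py:

```shell
# Hypothetical workload, written to a temp dir so the example is self-contained
workdir=$(mktemp -d)
cat > "$workdir/script.py" <<'EOF'
def busy(n):
    return sum(i * i for i in range(n))

busy(200000)
EOF

# Save the profile to a file
python3 -m cProfile -o "$workdir/profile.stats" "$workdir/script.py"

# Load it with pstats and print the 5 entries with the largest cumulative time
python3 -c 'import pstats, sys
pstats.Stats(sys.argv[1]).sort_stats("cumtime").print_stats(5)' "$workdir/profile.stats"
```

Running python3 -m pstats profile.stats instead opens an interactive browser for the same file.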

/proc Filesystem

# Process info
cat /proc/<pid>/status          # Process status
cat /proc/<pid>/maps            # Memory mappings
ls -l /proc/<pid>/fd            # Open file descriptors (a directory of symlinks)
cat /proc/<pid>/smaps           # Detailed memory info

# System info
cat /proc/cpuinfo               # CPU details
cat /proc/meminfo               # Memory details
cat /proc/loadavg               # Load average
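
Because these files are plain text, single values are easy to pull out with standard tools. A short sketch, assuming Linux and two hypothetical helper names (rss_kib, fd_count) that default to the current shell's PID:

```shell
# rss_kib [PID]
# Resident set size in KiB, read from the VmRSS line of /proc/<pid>/status.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/${1:-$$}/status"
}

# fd_count [PID]
# Number of open file descriptors (entries in /proc/<pid>/fd).
fd_count() {
    ls "/proc/${1:-$$}/fd" | wc -l
}

rss_kib      # this shell's resident memory in KiB
fd_count     # this shell's open descriptors
```

Pass a PID as the first argument to inspect another process; reading another user's process may require root.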

htop - System Overview

htop                            # Interactive process viewer

Key shortcuts:

  • F6: Sort by column
  • F5: Tree view
  • F9: Kill process
  • t: Toggle tree view
  • H: Toggle user threads

hyperfine - Benchmarking

# Basic benchmark
hyperfine './program'

# Compare two programs
hyperfine './program_v1' './program_v2'

# With warmup
hyperfine --warmup 3 './program'

# Export results
hyperfine --export-markdown results.md './program'

Quick Diagnosis Flowchart

           Program is slow
                  │
                  ▼
         ┌─────────────────┐
         │ time ./program  │
         └────────┬────────┘
                  │
                  ▼
   ┌─────────────────────────────┐
   │     Is 'sys' time high?     │
   └─────┬─────────────────┬─────┘
        YES                NO
         │                 │
         ▼                 ▼
  ┌─────────────┐  ┌───────────────┐
  │  strace -c  │  │  perf record  │
  │ (syscalls)  │  │ (CPU profile) │
  └─────────────┘  └───────────────┘
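
The same decision can be sketched as a shell helper. The flowchart does not say what counts as "high", so this sketch assumes sys exceeding user as the threshold; both the threshold and the helper name next_step are assumptions:

```shell
# next_step USER SYS
# Suggests the next tool from the user and sys times reported by time.
# Assumption: 'sys' counts as high when it exceeds 'user'.
next_step() {
    awk -v u="$1" -v s="$2" 'BEGIN {
        if (s > u) print "strace -c"
        else       print "perf record"
    }'
}

# Illustrative (made-up) timings, in seconds
next_step 0.20 1.50   # kernel-heavy: look at syscalls
next_step 2.40 0.10   # user-heavy: profile the CPU
```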

Permission Issues

# Allow perf for non-root (temporary)
sudo sysctl -w kernel.perf_event_paranoid=1

# Allow perf for non-root (permanent; applies at next boot or after 'sudo sysctl -p')
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf

# Run perf as root if needed
sudo perf record ./program

Useful One-Liners

# Top 10 entries by CPU samples (header comments and blank lines filtered out)
perf report -n --stdio | grep -Ev '^(#|$)' | head -10

# Count syscalls by type
strace -c ./program 2>&1 | tail -20

# Watch cache misses in real-time
perf stat -e cache-misses -I 1000 -p <pid>

# Find memory-mapped files for a process (keeps only mappings backed by a path)
awk '$6 ~ /^\// {print $6}' /proc/<pid>/maps | sort -u

# Monitor a Python process (-n picks the newest match, since --pid takes a single PID)
py-spy top --pid "$(pgrep -n -f 'python.*myapp')"

Resources