# Linux Performance Tools Cheatsheet

## Quick Reference Card for Workshop

---

## time - Basic Timing

```bash
time ./program              # Wall clock, user, sys time
/usr/bin/time -v ./program  # Verbose report (GNU time; the shell builtin `time` has no -v)
```

**Output explained:**
- `real` - Wall clock time (what a stopwatch would show)
- `user` - CPU time in user space (your code)
- `sys` - CPU time in kernel space (syscalls)
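
Comparing `user + sys` against `real` shows whether the program was busy on the CPU or mostly waiting. The following is a minimal sketch, assuming the three values are passed in as seconds; the function name `cpu_share` is hypothetical and not part of any tool. A result well below 100 suggests waiting (I/O, sleep, locks), while a result above 100 means several threads ran in parallel.

```bash
# cpu_share REAL USER SYS -> share of wall-clock time spent on the CPU, in percent.
cpu_share() {
    awk -v real="$1" -v user="$2" -v sys="$3" 'BEGIN {
        if (real <= 0) { print "n/a"; exit }
        printf "%.0f\n", 100 * (user + sys) / real
    }'
}

# Made-up numbers for illustration: real=2.00s, user=0.30s, sys=0.10s
cpu_share 2.00 0.30 0.10    # prints 20
```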

---

## perf stat - Hardware Counters

```bash
# Basic stats
perf stat ./program

# Specific events
perf stat -e cycles,instructions,cache-misses ./program

# Repeat 5 runs and report the mean and variation
perf stat -r 5 ./program

# Common events
perf stat -e cycles,instructions,cache-references,cache-misses,branches,branch-misses ./program
```

**Key metrics:**
- **IPC** (Instructions Per Cycle): `instructions / cycles`. Higher is better; above 1 is a reasonable rule of thumb
- **Cache miss ratio**: `cache-misses / cache-references`. Lower is better
- **Branch miss ratio**: `branch-misses / branches`. Lower is better
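
These metrics are plain quotients of the raw counters that `perf stat` prints. A minimal sketch of the arithmetic, assuming you copy the counter values by hand; the helper names `ratio` and `percent` are hypothetical, and the numbers are made up for illustration:

```bash
# ratio NUMERATOR DENOMINATOR -> quotient with two decimals.
ratio() {
    awk -v n="$1" -v d="$2" 'BEGIN {
        if (d == 0) { print "n/a"; exit }
        printf "%.2f\n", n / d
    }'
}

# percent PART WHOLE -> PART as a percentage of WHOLE, one decimal.
percent() {
    awk -v p="$1" -v w="$2" 'BEGIN {
        if (w == 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * p / w
    }'
}

ratio 3000000 2000000      # IPC = instructions / cycles, prints 1.50
percent 50000 1000000      # cache misses / cache references, prints 5.0
```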

---

## perf record/report - CPU Sampling

```bash
# Record samples
perf record ./program
perf record -g ./program       # With call graphs
perf record -F 99 ./program    # Custom frequency (99 Hz)

# Analyze
perf report                    # Interactive TUI
perf report --stdio            # Text output
perf report -n --stdio         # With sample counts
```

**TUI navigation:**
- Arrow keys: Navigate
- Enter: Zoom into function
- `a`: Annotate (show source/assembly)
- `q`: Quit/back

---

## perf annotate - Source-Level View

```bash
# After perf record
perf annotate function_name               # Show hot lines for one function
perf annotate --stdio -s function_name    # Text output; -s/--symbol names the function
```

**Requires:** A binary compiled with debug info (compiler flag `-g`) for the source view

---

## strace - Syscall Tracing

```bash
# Summary of syscalls
strace -c ./program

# Trace specific syscalls
strace -e read,write ./program

# With timing per call
strace -T ./program

# Follow forks
strace -f ./program

# Output to file
strace -o trace.log ./program
```

**Key columns in `-c` output:**
- `% time`: Percentage of total syscall time
- `calls`: Number of times called
- `errors`: Failed calls
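
The `errors` column is often the quickest lead, so it can help to list only the syscalls that failed. A rough sketch, assuming the usual six-column summary layout (`% time`, `seconds`, `usecs/call`, `calls`, `errors`, `syscall`) where `errors` is left blank for calls that never failed; the function name `failed_syscalls` is hypothetical, and the layout may differ between strace versions:

```bash
# failed_syscalls: read an `strace -c` summary on stdin and print
# "syscall errors" for each row whose errors column is filled in.
# Rows without errors have 5 fields; rows with errors have 6.
failed_syscalls() {
    awk 'NF == 6 && $1 ~ /^[0-9.]+$/ && $6 != "total" { print $6, $5 }'
}

# Typical use (not run here):
#   strace -c -o summary.txt ./program
#   failed_syscalls < summary.txt
```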

---

## Flamegraphs

```bash
# Clone FlameGraph repo (one time)
git clone https://github.com/brendangregg/FlameGraph.git

# Generate flamegraph
perf record -g ./program
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > profile.svg

# Open in browser
firefox profile.svg
```

---

## Python Profiling

```bash
# cProfile - built-in profiler
python3 -m cProfile script.py
python3 -m cProfile -s cumtime script.py           # Sort by cumulative time
python3 -m cProfile -s ncalls script.py            # Sort by call count
python3 -m cProfile -o profile.stats script.py     # Save for later analysis

# py-spy - sampling profiler (low overhead)
pip install py-spy
py-spy record -o profile.svg -- python3 script.py  # Flamegraph
py-spy top -- python3 script.py                    # Live view

# Attach to running process
py-spy top --pid 12345
```

---

## /proc Filesystem

```bash
# Process info
cat /proc/<pid>/status     # Process status
cat /proc/<pid>/maps       # Memory mappings
ls -l /proc/<pid>/fd       # Open file descriptors (a directory of symlinks)
cat /proc/<pid>/smaps      # Detailed memory info

# System info
cat /proc/cpuinfo          # CPU details
cat /proc/meminfo          # Memory details
cat /proc/loadavg          # Load average
```
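
`/proc/<pid>/status` is a list of `Key: value` lines, so a single field can be picked out with `awk`. A minimal sketch; the helper name `status_field` is hypothetical, and the example reads the status of the current shell only when the file exists (Linux only):

```bash
# status_field FILE KEY -> value of the "KEY:" line in a /proc/<pid>/status style file.
status_field() {
    awk -v key="$2" -F ':[ \t]*' '$1 == key { print $2 }' "$1"
}

# Peak (VmHWM) and current (VmRSS) resident memory of this shell.
if [ -r "/proc/$$/status" ]; then
    status_field "/proc/$$/status" VmHWM
    status_field "/proc/$$/status" VmRSS
fi
```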

---

## htop - System Overview

```bash
htop # Interactive process viewer
```

**Key shortcuts:**
- `F6`: Sort by column
- `F5` or `t`: Toggle tree view
- `F9`: Kill process
- `H`: Toggle user threads

---

## hyperfine - Benchmarking

```bash
# Basic benchmark
hyperfine './program'

# Compare two programs
hyperfine './program_v1' './program_v2'

# With warmup
hyperfine --warmup 3 './program'

# Export results
hyperfine --export-markdown results.md './program'
```

---

## Quick Diagnosis Flowchart

```
  Program is slow
         │
         ▼
┌──────────────────┐
│ time ./program   │
└────────┬─────────┘
         │
         ▼
┌─────────────────────────────────┐
│ Is 'sys' time high?             │
└─────────┬───────────┬───────────┘
         YES          NO
          │           │
          ▼           ▼
    ┌───────────┐ ┌──────────────┐
    │ strace -c │ │ perf record  │
    │ (syscalls)│ │ (CPU profile)│
    └───────────┘ └──────────────┘
```
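
The same decision can be written as a small helper. This is only a sketch: the flowchart does not say what counts as "high", so the rule used here (`sys` greater than `user`) is an assumption, and the function name `next_step` is hypothetical.

```bash
# next_step USER SYS -> name the tool to try next.
# Assumption: sys time counts as "high" when it exceeds user time.
next_step() {
    awk -v user="$1" -v sys="$2" 'BEGIN {
        if (sys > user) print "strace -c"
        else            print "perf record"
    }'
}

next_step 0.20 1.50   # sys dominates, prints: strace -c
next_step 3.10 0.05   # user dominates, prints: perf record
```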

---

## Permission Issues

```bash
# Allow perf for non-root (temporary, until reboot)
sudo sysctl -w kernel.perf_event_paranoid=1

# Allow perf for non-root (permanent; run `sudo sysctl -p` to apply it now)
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf

# Run perf as root if needed
sudo perf record ./program
```
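
Before changing the setting, it can help to see what the current level allows. A rough sketch based on the level descriptions in the kernel's sysctl documentation; the function name `paranoid_meaning` is hypothetical, the wording is a paraphrase, and some distributions define extra levels that are not covered here:

```bash
# paranoid_meaning LEVEL -> what an unprivileged user may measure at that level.
paranoid_meaning() {
    case "$1" in
        -1) echo "no restrictions" ;;
        0)  echo "CPU events and kernel profiling allowed, raw tracepoints are not" ;;
        1)  echo "kernel and user profiling allowed, CPU-wide events are not" ;;
        2)  echo "user-space measurements only" ;;
        *)  echo "unknown or distribution-specific level" ;;
    esac
}

# Report the current setting when the file is readable (Linux only).
f=/proc/sys/kernel/perf_event_paranoid
if [ -r "$f" ]; then
    level=$(cat "$f")
    echo "perf_event_paranoid=$level: $(paranoid_meaning "$level")"
fi
```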

---

## Useful One-Liners

```bash
# Top 10 entries by CPU time (header and blank lines skipped)
perf report -n --stdio | grep -v -e '^#' -e '^$' | head -10

# Count syscalls by type
strace -c ./program 2>&1 | tail -20

# Watch cache misses in real-time (one line per 1000 ms)
perf stat -e cache-misses -I 1000 -p <pid>

# Find memory-mapped files for a process
awk '$6 ~ /^\// {print $6}' /proc/<pid>/maps | sort -u

# Monitor a Python process (first match if several are running)
py-spy top --pid "$(pgrep -f 'python.*myapp' | head -n 1)"
```

---

## Resources

- **Brendan Gregg's site**: https://www.brendangregg.com/linuxperf.html
- **perf wiki**: https://perf.wiki.kernel.org/
- **FlameGraph repo**: https://github.com/brendangregg/FlameGraph
- **py-spy docs**: https://github.com/benfred/py-spy