# Scenario 5: Debug Symbols - The Missing Map

## Learning Objectives

- Understand what debug symbols are and why they matter
- Compare profiling output with and without symbols
- Use `perf annotate` to see source-level hotspots
- Understand the trade-offs of shipping debug symbols

## Background

When you compile C code, the compiler translates your source into machine code. By default, the connection between source lines and machine instructions is lost.

**Debug symbols** (enabled with `-g`) preserve this mapping:

- Function names
- Source file names and line numbers
- Variable names and types
- Inline function information

## Files

- `program.c` - A program with nested function calls
- `Makefile` - Builds `nodebug` and `withdebug` versions

## Exercise 1: Build Both Versions

```bash
make all
```

Compare file sizes:

```bash
make sizes
```

The `withdebug` binary is larger due to DWARF debug sections.

## Exercise 2: Profile Without Debug Symbols

```bash
perf record ./nodebug 500 5000
perf report
```

You'll see something like:

```
  45.23%  nodebug  nodebug   [.] 0x0000000000001234
  32.11%  nodebug  nodebug   [.] 0x0000000000001456
  12.45%  nodebug  libm.so   [.] __sin_fma
```

The hex addresses (`0x...`) tell you nothing useful!

## Exercise 3: Profile With Debug Symbols

```bash
perf record ./withdebug 500 5000
perf report
```

Now you see:

```
  45.23%  withdebug  withdebug  [.] compute_inner
  32.11%  withdebug  withdebug  [.] compute_middle
  12.45%  withdebug  libm.so    [.] __sin_fma
```

Much better! You can see which functions are hot.

## Exercise 4: Source-Level Annotation

With debug symbols, you can see exactly which lines are hot:

```bash
perf record ./withdebug 500 5000
perf annotate --stdio -l compute_inner
```

The `-l` flag shows a summary of hot source lines at the top:

```
Sorted summary for file .../withdebug
----------------------------------------------
   72.75 program.c:29
   27.25 program.c:28
```

This tells you line 29 (`result = result * 1.0001 + 0.0001`) consumed ~73% of cycles!

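
`program.c` is not reproduced in this README, so the following is only a hypothetical sketch of a function shaped like `compute_inner`. It reuses the signature and the hot statement quoted above; the loop around that statement is an assumption, and the real file may differ.

```c
/* Hypothetical stand-in for compute_inner; the real program.c may differ. */
double compute_inner(double x, int iterations) {
    double result = x;
    for (int i = 0; i < iterations; i++) {
        /* One multiply and one add per iteration. When built with -g,
           the line table maps the instructions generated for this
           statement back to this source line. */
        result = result * 1.0001 + 0.0001;
    }
    return result;
}
```

In a function like this, nearly all samples land on the few instructions inside the loop body, which is why a per-line summary can point at a single assignment rather than at the function as a whole.
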
Below the summary, you see interleaved source and assembly:

```
 Percent |      Source code & Disassembly of withdebug
--------------------------------------------------
         :   9     double compute_inner(double x, int iterations) {
         :   10        double result = x;
   27.25 :   130f:   addsd  %xmm0,%xmm1    // program.c:28
   27.67 :   1313:   mulsd  ...            // program.c:29
```

**Note on line numbers**: The left margin (9, 10, etc.) shows objdump output line numbers, NOT source file lines. The actual source lines are shown in the summary and in the `// program.c:XX` comments on assembly lines.

## Exercise 5: Understanding Symbol Tables

```bash
# Show symbols in each binary
make symbols

# Or manually:
nm nodebug | head -20
nm withdebug | head -20

# Show more detail about sections
readelf -S withdebug | grep debug
```

Debug sections you might see:

- `.debug_info` - Core debug data (functions, variables, and types)
- `.debug_line` - Line number tables
- `.debug_str` - String table
- `.debug_abbrev` - Abbreviation tables

## Exercise 6: Stripping Symbols

Production binaries are often "stripped" to reduce size:

```bash
make stripped
ls -l withdebug stripped

# Try to profile the stripped binary
perf record ./stripped 500 5000
perf report
```

The stripped binary loses symbol information too: `strip` removes the symbol table along with the debug sections, so `perf report` falls back to raw addresses for the program's own functions.

## Exercise 7: Separate Debug Files

In production, you can ship stripped binaries but keep debug info separate:

```bash
# Extract debug info to separate file
objcopy --only-keep-debug withdebug withdebug.debug

# Strip the binary
strip withdebug -o withdebug.stripped

# Add a link to the debug file
objcopy --add-gnu-debuglink=withdebug.debug withdebug.stripped

# Now perf can find the debug info
perf record ./withdebug.stripped 500 5000
perf report
```

This is how Linux distros provide `-dbg` or `-debuginfo` packages.

## Discussion Questions

1. **Why don't we always compile with `-g`?**
   - Binary size (can be 2-10x larger)
   - Exposes source structure (security/IP concerns)
   - Debug info for optimized code is less precise because code is inlined and reordered (though `-O2 -g` works well)

2. **Does `-g` affect performance?**
   - Generally no: debug info is stored in separate sections
   - Those sections are not mapped into memory at run time; only tools such as debuggers and profilers read them
   - Frame pointers are a separate matter: they are controlled by `-fno-omit-frame-pointer`, not `-g`, and can have a small cost

3. **What about release vs debug builds?**
   - Debug build: `-O0 -g` (no optimization, full debug)
   - Release build: `-O2 -g` (optimized, with symbols)
   - Stripped release: `-O2` then `strip`

## Key Takeaways

1. **Always compile with `-g` during development**
2. **Debug symbols don't meaningfully affect runtime performance**
3. **Without symbols, profilers show useless hex addresses**
4. **Production: ship stripped binaries, keep debug files for crash analysis**

## Bonus: Flamegraph Generation

```bash
# Record with call graph
perf record -g ./withdebug 500 5000

# Generate flamegraph (requires FlameGraph scripts)
perf script | /path/to/FlameGraph/stackcollapse-perf.pl | /path/to/FlameGraph/flamegraph.pl > profile.svg
```