Scenario 1: Python to C Optimization

Learning Objectives

  • Use the time command to measure execution time
  • Profile Python code with cProfile
  • Generate flamegraphs with py-spy
  • Use ctypes to call C code from Python

Files

  • prime_slow.py - Pure Python implementation (slow); a rough sketch follows this list
  • prime.c - C implementation of the hot function
  • prime_fast.py - Python calling C via ctypes
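
The repo files themselves are not reproduced here. As a rough, hypothetical sketch (the real file may differ), the pure-Python hot function in prime_slow.py might look something like this:

# Hypothetical sketch of prime_slow.py's hot function; the real file may differ
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:  # trial division up to sqrt(n)
        if n % i == 0:
            return False
        i += 1
    return True

def count_primes(limit):
    # one is_prime call per candidate is what makes this the hot path
    return sum(1 for n in range(2, limit + 1) if is_prime(n))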

Exercises

Step 1: Measure the baseline

time python3 prime_slow.py 100000

Note the real time (wall clock) and user time (CPU time in user space).
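
You can also take a wall-clock measurement from inside a script with time.perf_counter; a minimal, self-contained sketch (work() is just a placeholder, not one of the exercise files):

# In-script timing with time.perf_counter (wall clock)
import time

def work():
    # placeholder for whatever you want to time
    return sum(i * i for i in range(1_000_000))

start = time.perf_counter()
work()
print(f"elapsed: {time.perf_counter() - start:.3f}s")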

Step 2: Profile with cProfile

python3 -m cProfile -s cumtime prime_slow.py 100000

Look for:

  • Which function has the highest cumtime (cumulative time)?
  • How many times (ncalls) is is_prime called?
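
cProfile can also be driven from inside a script when you only want to profile one call; a minimal sketch using only the standard library (work() is again a placeholder):

# Programmatic profiling with cProfile + pstats (standard library)
import cProfile
import pstats

def work():
    # placeholder for the code you want to profile
    return sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Top 10 entries by cumulative time, like "-s cumtime" on the command line
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)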

Step 3: Generate a flamegraph

py-spy record -o prime_slow.svg -- python3 prime_slow.py 100000

Open prime_slow.svg in a browser. The widest frames in the flamegraph show where the program spends most of its time.

Step 4: Compile and run the optimized version

# Compile the C library
gcc -O2 -fPIC -shared -o libprime.so prime.c

# Run the fast version
time python3 prime_fast.py 100000
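
The real prime_fast.py ships with the exercise; as a rough sketch of the ctypes pattern it likely uses, assuming the C side exports something like int is_prime(long n):

# Sketch of calling libprime.so via ctypes; the actual prime_fast.py may differ
import ctypes
import pathlib
import sys

# Load the shared library built by the gcc step above
lib = ctypes.CDLL(str(pathlib.Path(__file__).parent / "libprime.so"))

# Declare the assumed C signature: int is_prime(long n)
lib.is_prime.argtypes = [ctypes.c_long]
lib.is_prime.restype = ctypes.c_int

def count_primes(limit):
    # each call still crosses the Python/C boundary, but the arithmetic runs in C
    return sum(1 for n in range(2, limit + 1) if lib.is_prime(n))

if __name__ == "__main__":
    print(count_primes(int(sys.argv[1])))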

Step 5: Compare

  • How much faster is the C version?
  • Generate a flamegraph for prime_fast.py - what's different?

Discussion Questions

  1. Why is the C version faster? (Hint: interpreter overhead, type checking)
  2. When is it worth rewriting a hot path in C versus using an existing optimized library?
  3. What are the trade-offs of using ctypes compared to staying in pure Python?