Scenario 1: Python to C Optimization
Learning Objectives
- Use `time` to measure execution time
- Profile Python code with `cProfile`
- Generate flamegraphs with `py-spy`
- Use `ctypes` to call C code from Python
- Understand ctypes call overhead and when to move loops to C
Files
- `prime_slow.py` - Pure Python implementation (slow)
- `prime.c` - C implementation of the hot function
- `prime_fastish.py` - Python loop calling C `is_prime` via ctypes
- `prime_fast.py` - Entire counting loop in C via ctypes
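The actual file contents aren't reproduced here, but a minimal sketch of what `prime_slow.py` plausibly contains (hypothetical, not the real file) helps set expectations for the profiling steps below: a trial-division `is_prime` plus a pure-Python counting loop.

```python
# Hypothetical sketch of prime_slow.py -- trial-division primality test
# plus a counting loop, all in pure Python. The real file may differ.

def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_primes(limit):
    """Count primes below limit -- the hot loop the profiler will highlight."""
    return sum(1 for n in range(limit) if is_prime(n))

if __name__ == "__main__":
    print(count_primes(10_000))
```

Every `is_prime` call here goes through the interpreter's bytecode loop, which is what the C versions later eliminate.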
Exercises
Step 1: Measure the baseline
```
time python3 prime_slow.py
```
Note the real time (wall clock) and user time (CPU time in user space).
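The same wall-clock vs. CPU-time distinction can be observed from inside Python, which is handy when `time` isn't available. A small sketch using the standard library:

```python
import time

def busy_work():
    # CPU-bound work: wall time and CPU time should be close.
    return sum(i * i for i in range(200_000))

wall_start = time.perf_counter()   # wall clock, analogous to `real`
cpu_start = time.process_time()    # CPU time, roughly user + sys
busy_work()
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall: {wall:.4f}s  cpu: {cpu:.4f}s")
```

If wall time greatly exceeds CPU time, the program is waiting (I/O, sleeping, contention) rather than computing.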
Step 2: Profile with cProfile
```
python3 -m cProfile -s cumtime prime_slow.py
```
Look for:
- Which function has the highest `cumtime` (cumulative time)?
- How many times (`ncalls`) is `is_prime` called?
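cProfile can also be driven from code rather than the command line, which is useful for profiling just one section. A sketch (the `is_prime` here is a stand-in, not the scenario's actual file) that sorts by cumulative time, mirroring `-s cumtime`:

```python
import cProfile
import io
import pstats

def is_prime(n):
    # Stand-in for the is_prime in prime_slow.py (hypothetical here).
    if n < 2:
        return False
    return all(n % i for i in range(2, int(n ** 0.5) + 1))

profiler = cProfile.Profile()
profiler.enable()
count = sum(1 for n in range(10_000) if is_prime(n))
profiler.disable()

# Print the top entries sorted by cumulative time, like `-s cumtime`.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumtime").print_stats(5)
report = out.getvalue()
print(report)
```

The `ncalls` column in the report shows `is_prime` being called once per candidate, which is the clue that the loop itself is the hot path.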
Step 3: Generate a flamegraph
```
py-spy record -o prime_slow.svg -- python3 prime_slow.py
```
Open `prime_slow.svg` in a browser. The widest bars show where the most time is spent.
Step 4: Compile and run the partially optimized version
```
# Compile the C library
gcc -O2 -fPIC -shared -o libprime.so prime.c

# Run the "fastish" version (C is_prime, Python loop)
time python3 prime_fastish.py
```
This version calls the C `is_prime` function, but the counting loop is still in Python, so every candidate number crosses the Python/C boundary.
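The essential ctypes pattern is: load the shared library, declare argument and return types, then call. Since `libprime.so` may not be built in every environment, this sketch demonstrates the pattern with the C math library instead; `prime_fastish.py` presumably does the same with `ctypes.CDLL("./libprime.so")` and `c_int`.

```python
import ctypes
import ctypes.util

# Demonstrating the ctypes pattern with libm (present on any Unix-like system);
# prime_fastish.py presumably applies it to ctypes.CDLL("./libprime.so").
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Always declare argument and return types: ctypes defaults to int,
# which silently corrupts doubles and other non-int values.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))
```

Forgetting `restype` here would make `libm.sqrt(2.0)` return a garbage integer rather than a float, a classic ctypes pitfall.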
Step 5: Run the fully optimized version
```
time python3 prime_fast.py
```
This version calls `count_primes` entirely in C - no Python loop.
Step 6: Compare
- How much faster is `prime_fastish.py` vs `prime_slow.py`?
- How much faster is `prime_fast.py` vs `prime_fastish.py`?
- Generate flamegraphs for both - what's different?
Discussion Questions
- Why is the C version faster? (Hint: interpreter overhead, type checking)
- Why is `prime_fast.py` faster than `prime_fastish.py`? (Hint: ctypes call overhead)
- When is it worth rewriting in C vs. finding a library?
- What's the trade-off of using `ctypes` vs. native Python?
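To make the call-overhead question concrete, time a trivial function called through ctypes against a builtin. This sketch uses libm's `fabs` as the trivial C function (the scenario's `libprime.so` would show the same effect); exact numbers are machine-dependent.

```python
import ctypes
import ctypes.util
import time

libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.fabs.argtypes = [ctypes.c_double]
libm.fabs.restype = ctypes.c_double

N = 200_000

start = time.perf_counter()
for _ in range(N):
    libm.fabs(-1.0)          # every call crosses the ctypes boundary
ctypes_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(N):
    abs(-1.0)                # builtin: no foreign-function machinery
builtin_time = time.perf_counter() - start

print(f"ctypes fabs: {ctypes_time:.3f}s  builtin abs: {builtin_time:.3f}s")
```

The per-call gap is the reason `prime_fast.py`, which crosses the boundary once, beats `prime_fastish.py`, which crosses it once per candidate number.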