We need to talk about performance. For some reason, the standard advice for handling large datasets in Python has become “just write the logic and let it run,” but that mindset is killing your application’s responsiveness. I honestly don’t care if you are building a custom WooCommerce integration or a standalone data scraper; if your code is slow, it is effectively broken. Most developers spend hours staring at logs when they should be using Py-Spy Python Profiling to pinpoint the actual bottleneck.
In my 14 years of wrestling with complex systems, I’ve seen too many projects fail because of “silent” bottlenecks—code that produces the right result but takes three minutes to do a job that should take half a second. If you’ve read my previous thoughts on WordPress Core Performance, you know I have zero patience for inefficient scripts that hog server resources.
The Problem: Naive Loops and High Overhead
The most common trap I see is Pandas' iterrows() method. It looks clean, but it is notoriously slow because it creates a new Series object for every single row. Let's look at a "war story" example: calculating the Haversine distance for 3.5 million flight records. A naive approach might look like this:
# The Slow Approach: Using iterrows()
haversine_dists = []
for i, row in flights_df.iterrows():
    haversine_dists.append(haversine(
        lat_1=row["LATITUDE_ORIGIN"],
        lon_1=row["LONGITUDE_ORIGIN"],
        lat_2=row["LATITUDE_DEST"],
        lon_2=row["LONGITUDE_DEST"]
    ))
flights_df["Distance"] = haversine_dists
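For completeness, here is a minimal scalar haversine() helper of the kind the loop above assumes. The name and keyword signature mirror the call site; the body itself is not from the original post, just the standard great-circle formula:

```python
import math

def haversine(lat_1, lon_1, lat_2, lon_2, r=6371.0):
    # Great-circle distance between two points in kilometers
    # (r is the mean Earth radius; illustrative helper, assumed
    # by the loop above rather than shown in the original post).
    phi_1, phi_2 = math.radians(lat_1), math.radians(lat_2)
    delta_phi = math.radians(lat_2 - lat_1)
    delta_lambda = math.radians(lon_2 - lon_1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi_1) * math.cos(phi_2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

print(haversine(lat_1=0, lon_1=0, lat_2=0, lon_2=1))  # ~111.19 km along the equator
```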
On a dataset of this size, this script takes nearly 170 seconds. That is nearly three minutes of idling. In a production environment, that's a request timeout waiting to happen, or a worker process getting killed before the job finishes.
Why Py-Spy Python Profiling?
Traditional tracing profilers like cProfile instrument every function call, which often adds enough overhead to skew the results. Py-Spy Python Profiling is different. Py-spy is a sampling profiler: it runs as a separate process, attaches to your script from the outside, and takes snapshots of the call stack (100 times a second by default). The overhead on your program is negligible, and it gives you a brutal, honest look at where the CPU is actually spending its time.
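To make the sampling idea concrete, here is a toy in-process sampler. To be clear, this is not how py-spy is built (py-spy runs out-of-process, reading the interpreter's memory from Rust); it only illustrates the principle that snapshotting the stack at intervals reveals where time goes:

```python
import collections
import sys
import threading
import time

def sample_stacks(target_tid, counts, stop, interval=0.01):
    # Periodically snapshot the target thread's stack and tally the
    # function on top -- the core idea behind sampling profilers.
    while not stop.is_set():
        frame = sys._current_frames().get(target_tid)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)

def busy_work():
    # Deliberately CPU-bound so samples land here.
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total

counts = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.get_ident(), counts, stop),
    daemon=True,
)
sampler.start()
busy_work()
stop.set()
sampler.join()
print(counts.most_common(3))  # busy_work dominates the sample counts
```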
To get started, you just install it via pip and run the recorder:
pip install py-spy
py-spy record -o profile.svg -r 100 -- python main.py
When you open the resulting SVG in a browser, you'll see an interactive flame graph. If a bar representing iterrows() takes up 70% of the width, that is your target for refactoring.
The Fix: Vectorization Over Iteration
Once you've identified the bottleneck using Py-Spy Python Profiling, the solution is almost always to move away from Python-level loops and toward C-level vectorized operations using NumPy. Here, that means refactoring the logic to operate on whole arrays instead of individual coordinates.
# The Optimized Approach: Vectorized NumPy
import numpy as np

def haversine_vectorized(lat_1, lon_1, lat_2, lon_2, r=6371.0):
    lat_1_rad, lon_1_rad = np.radians(lat_1), np.radians(lon_1)
    lat_2_rad, lon_2_rad = np.radians(lat_2), np.radians(lon_2)
    # Logic is the same as the scalar version, but every operation
    # now runs on whole arrays in C instead of one row at a time
    a = (np.sin((lat_2_rad - lat_1_rad) / 2) ** 2
         + np.cos(lat_1_rad) * np.cos(lat_2_rad)
         * np.sin((lon_2_rad - lon_1_rad) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))  # distance in kilometers

flights_df["Distance"] = haversine_vectorized(
    flights_df["LATITUDE_ORIGIN"],
    flights_df["LONGITUDE_ORIGIN"],
    flights_df["LATITUDE_DEST"],
    flights_df["LONGITUDE_DEST"]
)
The result? The execution time drops from 169 seconds to 0.56 seconds, a roughly 300x speedup. If you want more details on the internals, check the official Py-Spy documentation on GitHub.
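You can reproduce the shape of that speedup yourself on synthetic data, without the flight dataset. The sketch below (illustrative names, random coordinates, pure NumPy so it doesn't need Pandas) times a per-row Python loop against the vectorized equivalent; absolute numbers will vary with hardware, but the loop should lose badly:

```python
import math
import time
import numpy as np

def haversine_loop(lats1, lons1, lats2, lons2, r=6371.0):
    # The slow pattern: one Python-level computation per row.
    out = []
    for la1, lo1, la2, lo2 in zip(lats1, lons1, lats2, lons2):
        p1, p2 = math.radians(la1), math.radians(la2)
        a = (math.sin(math.radians(la2 - la1) / 2) ** 2
             + math.cos(p1) * math.cos(p2)
             * math.sin(math.radians(lo2 - lo1) / 2) ** 2)
        a = min(a, 1.0)  # guard against tiny floating-point overshoot
        out.append(2 * r * math.asin(math.sqrt(a)))
    return out

def haversine_np(lats1, lons1, lats2, lons2, r=6371.0):
    # The fast pattern: identical math on whole arrays at once.
    p1, p2 = np.radians(lats1), np.radians(lats2)
    a = (np.sin(np.radians(lats2 - lats1) / 2) ** 2
         + np.cos(p1) * np.cos(p2)
         * np.sin(np.radians(lons2 - lons1) / 2) ** 2)
    a = np.clip(a, 0.0, 1.0)  # same floating-point guard as above
    return 2 * r * np.arcsin(np.sqrt(a))

rng = np.random.default_rng(0)
n = 200_000
lats1, lats2 = rng.uniform(-90, 90, n), rng.uniform(-90, 90, n)
lons1, lons2 = rng.uniform(-180, 180, n), rng.uniform(-180, 180, n)

t0 = time.perf_counter()
slow = haversine_loop(lats1, lons1, lats2, lons2)
t1 = time.perf_counter()
fast = haversine_np(lats1, lons1, lats2, lons2)
t2 = time.perf_counter()
print(f"loop: {t1 - t0:.3f}s  numpy: {t2 - t1:.3f}s")
```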
Look, if this performance optimization stuff is eating up your dev hours, let me handle it. I’ve been wrestling with WordPress and high-load backend scripts since the 4.x days.
Takeaway: Stop Guessing
Stop assuming you know where the lag is coming from. Use Py-Spy Python Profiling to get hard data. Whether you are dealing with a slow WordPress build or a massive data migration, the workflow is always the same: profile, identify the widest bar in the graph, and refactor for efficiency. Ship it.