AI Coding Agents: Why Your Data Science Codebase Smells Sour

We need to talk about AI coding agents. For some reason, the standard advice in the data science world has become “just prompt it and ship it,” and it’s killing the long-term health of our codebases.

I’ve spent 14+ years wrestling with WordPress and WooCommerce sites that were “built by a guy who knew a guy.” Today, that guy is an LLM. While AI coding agents can spit out perfect syntax in seconds, they have zero intuition for architecture. Consequently, if you don’t know how to steer the ship, you’re just accelerating toward a technical debt iceberg.

The Shift From Writer to Reviewer

The manual labor of writing code has been offloaded. Therefore, your primary responsibility has shifted from writing code to reviewing it. You’ve effectively become a senior developer guiding a very fast, very obedient, but often misguided junior (the AI). If you want to remain relevant, you need the structural intuition of an architect.

In my experience, AI coding agents are prone to two specific “code smells” that can turn a simple script into a maintenance nightmare. Specifically, we’re talking about Divergent Change and Speculative Generality.

1. Divergent Change: The “Do-It-All” Class

Divergent Change happens when a single class or module is doing too many things at once. In software engineering, we call this a violation of the Single Responsibility Principle. Worse, if you ask an agent to “add functionality to handle X,” it will usually cram that code into your existing class without a second thought.

Consider this common (and smelly) ML pipeline example:

class ModelPipeline:
    def __init__(self, data_path):
        self.data_path = data_path

    def load_from_s3(self):
        print(f"Connecting to S3 to get {self.data_path}")
        return "raw_data"

    def clean_txn_data(self, data):
        print("Cleaning specific transaction JSON format")
        return "cleaned_data"

    def train_xgboost(self, data):
        print("Running XGBoost trainer")
        return "model"

This class is wearing three hats: infrastructure (S3), data engineering (cleaning), and ML research (training). Consequently, it has three times the reasons to break. Every time the S3 bucket permissions change or you want to try a new model, you’re touching this same file. That’s a recipe for bugs.

The Refactor: Decoupling for AI Coding Agents

To fix this, we need to separate the concerns. This makes the code easier for you—and your AI coding agents—to manage. Check out the cleaner, “contract-based” approach below:

class S3DataLoader:
    def load(self, path):
        # Only handles S3 logic
        return "raw_data"

class TransactionsCleaner:
    def clean(self, data):
        # Only handles domain cleaning
        return "cleaned_data"

class XGBoostTrainer:
    def train(self, data):
        # Only handles the model
        return "model"

class ModelPipeline:
    def __init__(self, loader, cleaner, trainer):
        self.loader = loader
        self.cleaner = cleaner
        self.trainer = trainer

    def run(self, path):
        data = self.loader.load(path)
        cleaned = self.cleaner.clean(data)
        return self.trainer.train(cleaned)

Now, if you want to swap S3 for Azure, you just drop in a new loader. The orchestrator doesn’t care. This is how you build production-grade code that doesn’t collapse under its own weight.
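To make the swap concrete, here’s a minimal sketch. The AzureDataLoader name and its connection message are illustrative stand-ins, not real SDK calls, and the other classes are restated so the snippet runs on its own:

```python
class AzureDataLoader:
    # Hypothetical drop-in replacement for S3DataLoader;
    # it honors the same .load(path) contract.
    def load(self, path):
        print(f"Connecting to Azure Blob Storage to get {path}")
        return "raw_data"

# Repeated here only so the snippet is standalone; in the article's
# codebase these are the classes defined above.
class TransactionsCleaner:
    def clean(self, data):
        return "cleaned_data"

class XGBoostTrainer:
    def train(self, data):
        return "model"

class ModelPipeline:
    def __init__(self, loader, cleaner, trainer):
        self.loader = loader
        self.cleaner = cleaner
        self.trainer = trainer

    def run(self, path):
        return self.trainer.train(self.cleaner.clean(self.loader.load(path)))

# Only the first argument changes; the orchestrator is untouched.
pipeline = ModelPipeline(AzureDataLoader(), TransactionsCleaner(), XGBoostTrainer())
```

Because the pipeline depends on the contract (a `.load()` method) rather than a concrete class, the storage backend becomes a one-line decision at wiring time.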

2. Speculative Generality: The YAGNI Trap

While Divergent Change happens in old code, Speculative Generality happens at the start. It’s when you (or the AI) try to future-proof a project by guessing features you might need. I call this the “monster project” smell. AI coding agents are especially prone to it: a vague prompt like “make it scalable” will produce hundreds of lines of useless abstract classes.

The mantra here is YAGNI (You Ain’t Gonna Need It). Instead of building a “universal model trainer,” start with the simplest thing that works. Organic growth is always more stable than speculative architecture. If you’re struggling with AI hallucinations during this phase, you might want to check my guide on AI coding agent context.
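Here’s a side-by-side sketch of the trap versus the fix. The “universal” hierarchy below is the kind of thing a vague prompt tends to generate; all names here are my own illustration:

```python
from abc import ABC, abstractmethod

# Speculative Generality: abstractions for trainers that don't exist yet.
class AbstractTrainer(ABC):
    @abstractmethod
    def train(self, data): ...

class TrainerRegistry:
    # A plugin registry with exactly one plugin -- pure overhead today.
    _trainers = {}

    @classmethod
    def register(cls, name, trainer_cls):
        cls._trainers[name] = trainer_cls

    @classmethod
    def create(cls, name):
        return cls._trainers[name]()

# YAGNI: one concrete function that does today's job. Generalize only
# when a second trainer actually shows up in the requirements.
def train_xgboost(data):
    print("Running XGBoost trainer")
    return "model"
```

The abstract base class and registry aren’t wrong in principle; they’re wrong *now*, because every layer of indirection is code you have to read, test, and maintain before it earns its keep.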

Context is the Real North Star

Software engineering is rarely about “correct” code; it’s about context. An agent doesn’t know if you’re building a throwaway MVP or a multi-million dollar revenue engine. Therefore, your value as a data scientist isn’t in your ability to prompt—it’s in your structural intuition. You have to know when to refactor and when to keep it simple.

For more on this, I’ve written about the cost of vibe coding and how it impacts site stability.

Look, if this AI coding agent stuff is eating up your dev hours, let me handle it. I’ve been wrestling with WordPress since the 4.x days, and I know how to keep a codebase clean.

The Senior Dev Takeaway

Don’t be a data scientist who blindly copy-pastes. Instead, master the concepts of code smells and abstraction. Specifically, use AI coding agents to handle the boilerplate, but you must remain the architect. If you don’t steer the ship, the AI will happily drive you into a wall of unmaintainable legacy code. Ship it, but ship it right.

Ahmad Wael
I'm a WordPress and WooCommerce developer with 15+ years of experience building custom e-commerce solutions and plugins. I specialize in PHP development, following WordPress coding standards to deliver clean, maintainable code. Currently, I'm exploring AI and e-commerce by building multi-agent systems and SaaS products that integrate technologies like Google Gemini API with WordPress platforms, approaching every project with a commitment to performance, security, and exceptional user experience.
