A Practical Guide to WordPress Regex That Won’t Break Your Site

Got a call from a client last week. They’d just moved their big WooCommerce site to SSL, did the whole search-and-replace for the new URL, but they were still getting hammered with mixed-content warnings. A bunch of images and old links inside posts were still pointing to http. Turns out, the standard plugins missed a ton of URLs hardcoded inside old shortcodes and inline styles. Total mess. And it’s a classic case where you have to roll up your sleeves and use some WordPress regex.

Let’s be clear: regular expressions look like a cat walked across your keyboard. They can be a nightmare. But for problems like this, they’re often the only practical tool for the job. You can’t manually fix 5,000 posts, and a blunt search-and-replace will break serialized data faster than you can say “fatal error.”
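Here’s a minimal sketch of why the blunt approach bites serialized data. PHP’s serialized format records the byte length of every string, so growing a URL by one character without updating that length makes `unserialize` fail. The `logo` key and the example.com URL are made-up placeholders, not values from the client site.

```php
<?php
// Hypothetical stored value: an array serialized the way PHP does it by default.
$stored = serialize(['logo' => 'http://example.com/logo.png']);

// Blunt replace on the raw string: the URL grows by one byte,
// but the length prefix recorded in front of it still describes the old string.
$blunt = str_replace('http://', 'https://', $stored);
$blunt_result = @unserialize($blunt); // false, because the length no longer matches

// Safer: unserialize first, replace inside the value, then serialize again
// so PHP recalculates the length for us.
$data = unserialize($stored);
$data['logo'] = str_replace('http://', 'https://', $data['logo']);
$safe_result = unserialize(serialize($data));

var_dump($blunt_result === false);
var_dump($safe_result['logo']);
```

The same reasoning applies to any regex replacement: run it on the unserialized values, not on the raw serialized string.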

The Obvious Fix That Broke Everything

My first thought was, “No problem, five-minute fix.” I jumped into the database and cooked up a quick `preg_replace` function to find image sources. My pattern was something like `/src="http:\/\/their-domain.com([^"]*)"/`. Simple enough, right? Find the `src` attribute and grab everything until the next double quote.

And yeah, that worked… until it didn’t. I ran it on a test post, and it absolutely mangled the content. It choked on single quotes, ignored attributes that had extra spaces, and completely fell apart on some weirdly formatted captions from 2010. It was a good reminder: when it comes to regex, the “quick and dirty” solution is usually just dirty. Trust me on this.
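To make those failure modes concrete, here’s a rough sketch that runs the quick pattern against a few made-up snippets. The sample markup is invented for illustration, and it only reproduces the misses (single quotes, extra spaces, `href`), not the exact mangling from the client’s content. The last sample shows one more problem I’m adding myself: the unescaped dot in the domain matches any character, so a lookalike host gets rewritten too.

```php
<?php
// The quick pattern described above (their-domain.com is the placeholder from the text).
$naive = '/src="http:\/\/their-domain.com([^"]*)"/';
$replacement = 'src="https://their-domain.com$1"';

// Invented sample markup, one variation per entry.
$samples = [
    'double'    => '<img src="http://their-domain.com/a.jpg">',            // matched
    'single'    => "<img src='http://their-domain.com/a.jpg'>",            // missed
    'spaced'    => '<img src = "http://their-domain.com/a.jpg">',          // missed
    'href'      => '<a href="http://their-domain.com/page">link</a>',      // missed
    'lookalike' => '<img src="http://their-domainxcom.example/a.jpg">',    // wrongly rewritten
];

$results = [];
foreach ($samples as $name => $html) {
    // The fifth argument receives the number of replacements made.
    $results[$name] = preg_replace($naive, $replacement, $html, -1, $count);
    echo $name, ': ', $count, " replacement(s)\n";
}
```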

Doing WordPress Regex the Right Way

The real fix had to be more specific. Instead of a broad, hopeful pattern, you need to account for the variations. You have to handle both single and double quotes and make sure you’re only changing your *actual* domain, not some other external link that happens to be http. This is the kind of pattern that actually works, without destroying the content.

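function replace_hardcoded_http_urls($content) {
    // This pattern looks for src or href attributes, tolerates spaces around "=",
    // handles ' or " quotes, and only targets your specific domain. Much safer.
    $pattern = '/(src|href)\s*=\s*([\'"])(http:\/\/your-old-domain\.com)(.*?)\2/i';

    // The replacement uses backreferences ($1, $2, $4) to rebuild the attribute correctly.
    // $3 (the old http URL) is deliberately dropped and written out again with https.
    $replacement = '$1=$2https://your-old-domain.com$4$2';

    // preg_replace returns null on a regex error; fall back to the untouched content.
    $result = preg_replace($pattern, $replacement, $content);
    return $result === null ? $content : $result;
}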

See the difference? We’re capturing the attribute type (`src` or `href`), the quote type, the old URL, and the path separately. Then we rebuild it properly with `https`. It’s precise, and it doesn’t make dangerous assumptions. For a deep dive into every symbol, a site I stumbled on, carlalexander.ca, has a very thorough guide, but this practical approach is what you need in the trenches.
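If the one-line pattern is hard to read, here’s the same idea spelled out with the `x` (extended) flag, which lets each capture group carry its own comment. Treat it as a sketch: it keeps the same four capture groups, allows optional whitespace around `=`, and uses `~` as the delimiter so the slashes need no escaping. The delimiter and the sample link are my choices for illustration, and `your-old-domain.com` is still a placeholder.

```php
<?php
// Nowdoc string, so quotes and backslashes inside the pattern need no PHP escaping.
$pattern = <<<'REGEX'
~
  (src|href)                      # $1: attribute name
  \s* = \s*                       # equals sign, with optional spaces around it
  (['"])                          # $2: opening quote, single or double
  (http://your-old-domain\.com)   # $3: old scheme and domain
  (.*?)                           # $4: rest of the URL, as short as possible
  \2                              # closing quote must match the opening one
~ix
REGEX;

// Invented sample link using single quotes.
$html = "<a href='http://your-old-domain.com/shop/?p=1'>Shop</a>";

// Inspect what each group captured.
preg_match($pattern, $html, $m);
print_r($m);

// Rebuild the attribute with https, reusing groups 1, 2 and 4.
$fixed = preg_replace($pattern, '$1=$2https://your-old-domain.com$4$2', $html);
echo $fixed, "\n";
```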

So, What’s the Point?

Regular expressions are a scalpel, not a hammer. A poorly written pattern can cause more damage than the problem you’re trying to fix. The key is to be explicit:

  • Always account for variations, like single vs. double quotes.
  • Be as specific as possible. Don’t just match `http://`, match `http://your-domain.com`.
  • Test. For the love of God, test on a staging site before you even think about running it on a live database.
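On the testing point, even a throwaway script with a handful of before-and-after cases catches most surprises before staging does. A minimal sketch, assuming a stand-in closure `$fix` (hypothetical, defined here only so the file runs on its own) and invented sample markup; in practice you would call your own replacement function and feed it real content pulled from old posts.

```php
<?php
// Stand-in for your real replacement function. It tolerates spaces around "=".
$fix = function ($content) {
    $pattern = '/(src|href)\s*=\s*([\'"])(http:\/\/your-old-domain\.com)(.*?)\2/i';
    $result = preg_replace($pattern, '$1=$2https://your-old-domain.com$4$2', $content);
    return $result === null ? $content : $result;
};

// Each case is [input, expected output]. All markup here is made up.
$cases = [
    'double quotes' => [
        '<img src="http://your-old-domain.com/a.jpg">',
        '<img src="https://your-old-domain.com/a.jpg">',
    ],
    'single quotes' => [
        "<a href='http://your-old-domain.com/page'>x</a>",
        "<a href='https://your-old-domain.com/page'>x</a>",
    ],
    'spaces around equals' => [
        '<img src = "http://your-old-domain.com/a.jpg">',
        '<img src="https://your-old-domain.com/a.jpg">',
    ],
    'external link left alone' => [
        '<a href="http://example.org/">x</a>',
        '<a href="http://example.org/">x</a>',
    ],
    'already https left alone' => [
        '<img src="https://your-old-domain.com/a.jpg">',
        '<img src="https://your-old-domain.com/a.jpg">',
    ],
];

// Collect every case whose output differs from what we expected.
$failures = [];
foreach ($cases as $name => [$input, $expected]) {
    $actual = $fix($input);
    if ($actual !== $expected) {
        $failures[] = "$name: got $actual";
    }
}

echo $failures ? implode("\n", $failures) . "\n" : "All cases passed\n";
```

Add a case every time you find a new oddity in the content, then rerun the script before touching the database.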

Look, this stuff gets complicated fast. If you’re tired of debugging someone else’s mess and just want your site to work, drop my team a line. We’ve probably seen it before.
