Scaling ML Inference: Liquid vs. Partitioned Databricks

We need to talk about scaling ML inference. I’ve seen too many “beefy” clusters sitting idle because the data layout is a mess. Just last week, I was looking at a 420-core cluster that spent nearly 10 hours processing a mere 18 partitions. If you’re a developer or a business owner paying those Databricks bills, that should make your stomach turn. It’s not a compute problem; it’s a data engineering failure.

The standard advice for scaling ML inference usually starts and ends with “add more workers.” But if your data is skewed—meaning one product or category has 100x more rows than another—Spark’s default partitioning will kill your performance. One executor will be sweating over 50 million rows while the other 419 cores are literally doing nothing. That is a bottleneck that no amount of RAM can fix.

The Partitioning Trap: Why AQE Isn’t Enough

Most devs rely on Spark’s Adaptive Query Execution (AQE) to handle the heavy lifting. Don’t get me wrong, AQE is great for standard SQL queries, but it wasn’t built to optimize for model inference runtimes. In a partitioned table without salt, you often end up with “fat” files. For instance, Product D in our case study accounted for nearly 80% of the 550M rows. Without a strategy to break that up, you’re stuck with long-running, skewed tasks that block the entire pipeline.
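For context, AQE's skew handling is real but scoped to shuffle joins. A minimal sketch of the knobs people usually reach for (the threshold value is illustrative, not a tuned recommendation):

```python
# Sketch: standard AQE skew settings. These rebalance skewed *joins*,
# but they will not fix a skewed map-side model-inference stage.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# Size at which a shuffle partition counts as skewed (illustrative value)
spark.conf.set(
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256m"
)
```

If your bottleneck is a UDF applying a model to one fat partition, none of these settings will save you, which is why the layout-level fixes below matter.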

If you’re interested in how these data structures impact the broader ecosystem, check out my thoughts on modern data stack consolidation. Understanding how data flows into your applications is critical for performance.

Option 1: Adding Salt to Your Datasets

Salting is a classic hack to force Spark to distribute data more evenly. By appending a random “salt” key to your skewed partition keys, you can break one massive partition into hundreds of smaller, manageable chunks. This lets you saturate your cluster properly.

```python
# bbioon_dynamic_salting_example
from pyspark.sql import functions as F

# Calculate each product's share of total rows to size its salt range
total_count = df.count()
product_percents = df.groupBy("ProductLine").count().withColumn(
    "percent", F.col("count") / F.lit(total_count)
)

# Scale salt buckets per product between a floor and a ceiling,
# so heavy hitters get more buckets than small products
min_buckets = 10
max_buckets = 1160
product_buckets = product_percents.withColumn(
    "buckets",
    F.greatest(
        F.lit(min_buckets),
        (F.col("percent") * F.lit(max_buckets)).cast("int"),
    ),
)

# Salt each row proportionally to its product's volume
df = df.join(product_buckets.select("ProductLine", "buckets"), "ProductLine")
df = df.withColumn("salt", (F.rand(seed=42) * F.col("buckets")).cast("int"))

# Repartition on (ProductLine, salt) to unlock parallelism, then drop helpers
df_final = df.repartition(1200, "ProductLine", "salt").drop("salt", "buckets")
```

When we applied this “Salty and Partitioned” approach, the runtime for Product D dropped from “never-ending” to about 3 hours. By enforcing a 1 million row cap per output file (maxRecordsPerFile), we ensured that no single task was overwhelmed.
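That per-file cap is enforced at write time. A sketch, assuming a Delta output and a hypothetical target path:

```python
# Cap each output file at ~1M rows so no single task inherits a giant file.
# The path below is illustrative.
(
    df_final.write
    .format("delta")
    .option("maxRecordsPerFile", 1_000_000)
    .mode("overwrite")
    .save("/mnt/lake/inference_input")
)
```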

Option 2: The Liquid Clustering Fix

Databricks recently introduced Liquid Clustering, and it’s a game-changer for scaling ML inference. Traditional partitioning is rigid; if you partition by date and product, you’re stuck with that folder structure. Liquid clustering, however, organizes data based on clustering keys without the rigid folder hierarchy. It’s essentially a more flexible, automated version of Z-Ordering.

As per the official Databricks documentation, liquid clustering adapts to changing query patterns. When you combine this with salting, you get the best of both worlds: extreme parallelism and intelligent data skipping.
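For reference, a liquid-clustered table is declared with CLUSTER BY instead of PARTITIONED BY; the table and column names here are illustrative, not from our case study:

```python
# Declare clustering keys instead of a rigid partition folder hierarchy.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_inference (
        ProductLine STRING,
        EventDate   DATE,
        features    ARRAY<DOUBLE>
    )
    USING DELTA
    CLUSTER BY (ProductLine, EventDate)
""")

# Clustering is applied incrementally; OPTIMIZE triggers reclustering
# of any data written since the last run.
spark.sql("OPTIMIZE sales_inference")
```

Because the keys live in table metadata rather than a folder structure, you can change them later with ALTER TABLE instead of rewriting the whole dataset.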

Salty + Liquid: The Winning Combo

In our tests, the “Salty and Liquid” scenario showed the most stable task distribution. The runtime window for tasks was much tighter, meaning fewer outliers. If you’re running hundreds of models in an ensemble, you can’t afford a “long tail” of tasks taking 2x longer than the rest. Liquid clustering preserves data locality, which is vital when you’re filtering for specific products or dates before running inference.

This reminds me of a post I wrote about why your environment dictates ML success. Your data layout *is* your environment. If the layout is bad, the ML pipeline will fail, no matter how good your models are.

Pragmatic Takeaways for Scaling ML Inference

  • Don’t trust defaults: Default partitioning in Spark is a performance killer for skewed datasets.
  • Salt your keys: If one product has 80% of your data, salt it. Force that data across more cores.
  • Use Liquid Clustering: It’s more resilient than traditional partitioning and handles data growth without requiring a total refactor.
  • Monitor the Spark UI: If you see 10 tasks taking 5 hours and 400 tasks taking 2 minutes, you have a skew problem.
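That last bullet can be turned into a cheap heuristic. A plain-Python sketch (the helper and the 5x ratio are my own illustration, not a Spark API), fed with per-task durations copied out of the Spark UI's task summary:

```python
# Hypothetical helper: flag skew from a list of task durations in seconds.
def has_skew(task_seconds, ratio=5.0):
    """Return True when the slowest task runs `ratio`x longer than the median."""
    ordered = sorted(task_seconds)
    median = ordered[len(ordered) // 2]
    return median > 0 and ordered[-1] / median >= ratio

# 400 fast tasks at ~2 minutes plus 10 stragglers at 5 hours: clearly skewed
durations = [120] * 400 + [5 * 3600] * 10
print(has_skew(durations))  # → True
```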

Look, if this scaling-ML-inference stuff is eating up your dev hours, let me handle it. I’ve been wrestling with WordPress and high-performance backend systems since the 4.x days.

Ship It or Fix It?

At the end of the day, scaling ML inference on Databricks is about maximizing your cluster utilization. If you’re paying for 420 cores, you better make sure 420 cores are working. Salting gives you the control you need, and Liquid Clustering provides the flexibility to grow. Stop letting skewed data burn your budget.

Ahmad Wael
I'm a WordPress and WooCommerce developer with 15+ years of experience building custom e-commerce solutions and plugins. I specialize in PHP development, following WordPress coding standards to deliver clean, maintainable code. Currently, I'm exploring AI and e-commerce by building multi-agent systems and SaaS products that integrate technologies like Google Gemini API with WordPress platforms, approaching every project with a commitment to performance, security, and exceptional user experience.
