Scaling to the Moon (Without Melting Your Wallet)

When a client called to say their site was feeling sluggish and asked if we could "add more resources," I gave them the kind of answer you don’t usually hear from cloud engineers:
"Why not just fix the actual problem?"
In a world where AWS makes it easy to spin up a fleet of servers at the click of a button, it’s tempting to throw compute at every hiccup. But that only bloats your infrastructure bill—and hides the real issue. Scaling isn’t just about handling more users; it’s about doing more with less.
Slow and Steady Wins the Performance Race
We didn’t rush in with autoscaling groups, load balancers, and Elasti-whatever. Instead, we spent the next three months chipping away at performance bottlenecks the right way.
Today, the system handles over 6,000 requests per minute without our CPUs even breaking a sweat. But three months ago? Two concurrent users could take the whole thing down.
Here’s what changed—and what we learned along the way.
1. Fix the Code, Don’t Patch the Symptoms
The site took 30 seconds to load. First stop: the network tab. Once we ruled out frontend issues, we hit the backend.
There, we found a nest of bad SQLAlchemy queries, abstracted to the point of being unreadable—and nearly unfixable. So we scrapped it and wrote a lean, custom dynamic SQL library just for search.
The result? A 20-second query dropped to 0.3 seconds. No infrastructure changes, just cleaner logic.
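To give a feel for the approach (this is a minimal sketch, not the actual library, and the table and column names are made up for illustration): a lean search layer can just assemble a parameterized WHERE clause from the filters it actually supports and hand it to the driver.

```python
# Minimal sketch of a dynamic search-query builder.
# Hypothetical schema: an "items" table with title, category_id, created_at.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://app:secret@db-primary/appdb")  # placeholder DSN

def build_search_query(filters: dict) -> tuple[str, dict]:
    """Compose a parameterized WHERE clause from only the filters we need."""
    clauses, params = [], {}
    if "title" in filters:
        clauses.append("title ILIKE :title")
        params["title"] = f"%{filters['title']}%"
    if "category_id" in filters:
        clauses.append("category_id = :category_id")
        params["category_id"] = filters["category_id"]
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT id, title, created_at FROM items {where} ORDER BY created_at DESC LIMIT 50"
    return sql, params

def search(filters: dict):
    sql, params = build_search_query(filters)
    with engine.connect() as conn:
        return conn.execute(text(sql), params).mappings().all()
```

The point isn't the builder itself; it's that every clause is explicit, parameterized, and easy to EXPLAIN when something gets slow.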
2. Use Your Database Like You Mean It
Next, we moved all read-heavy search queries to a PostgreSQL replica, taking pressure off the primary database.
We added the right indexes, fine-tuned a few query plans, and watched as load times dropped even further. Your primary DB shouldn't be your reporting engine—use secondaries for what they're good at.
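As a rough sketch of that read/write split (hostnames, helper names, and the example index are assumptions, not the client's actual setup), with SQLAlchemy it can be as simple as two engines:

```python
# Route read-heavy search traffic to a PostgreSQL replica,
# keeping the primary free for writes.
from sqlalchemy import create_engine, text

primary = create_engine("postgresql://app:secret@db-primary/appdb")
replica = create_engine("postgresql://app:secret@db-replica/appdb")

# Example of the kind of index that helps search (column names hypothetical):
#   CREATE INDEX CONCURRENTLY idx_items_category_created
#       ON items (category_id, created_at DESC);

def run_search(sql: str, params: dict):
    # All read-only search queries hit the replica.
    with replica.connect() as conn:
        return conn.execute(text(sql), params).mappings().all()

def run_write(sql: str, params: dict):
    # Writes stay on the primary; begin() commits on success.
    with primary.begin() as conn:
        conn.execute(text(sql), params)
```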
3. Monitor Like a Maniac
To trace the real bottlenecks, we installed Grafana, Prometheus, and Loki. Within days, we found the truth: occasional traffic spikes of 40,000 requests per minute were hammering backend services.
We wouldn’t have known without observability. Metrics gave us the confidence to know what was fixed—not just hope.
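On the application side, something like prometheus_client is enough to get request counts and latency onto a Grafana dashboard. The metric names, port, and endpoint label below are invented for this sketch:

```python
# Rough sketch of the kind of instrumentation that exposes traffic spikes.
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Requests handled", ["endpoint"])
LATENCY = Histogram("app_request_seconds", "Request latency", ["endpoint"])

def handle_search(filters: dict):
    REQUESTS.labels(endpoint="search").inc()
    with LATENCY.labels(endpoint="search").time():
        ...  # call into the search code from step 1

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes metrics from this port
    while True:
        time.sleep(1)
```

Pair that with a Prometheus scrape job and Loki for logs, and a 40,000-requests-per-minute burst stops being an anecdote and becomes a graph you can act on.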
4. Learn Your Stack (The Real One)
Sometimes you just need to RTFM.
I became an ingress-nginx whisperer, reading every doc and tuning every setting to squeeze performance from the stack. What looked like a black box became an asset, once I understood how it worked under pressure.
5. Media Matters
One huge slowdown? Massive media files.
I wrote a quick shell script using ffmpeg to:
- Download videos
- Compress them
- Re-upload the smaller versions
Suddenly, page load times improved dramatically. This wasn't about flashy CDN tricks; it was just about using the right size for the job. Many 50 MB files shrank to 5 MB with no noticeable drop in quality.
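The original was a quick shell script; here's a rough Python equivalent of the same workflow, with paths and encoder settings as assumptions rather than the exact values we used:

```python
# Compress local video copies with ffmpeg before re-uploading them.
import subprocess
from pathlib import Path

SRC = Path("downloads")      # videos pulled down from storage
OUT = Path("compressed")
OUT.mkdir(exist_ok=True)

for video in SRC.glob("*.mp4"):
    target = OUT / video.name
    # H.264 at a modest CRF keeps quality visually close at a fraction of the size.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video),
         "-vcodec", "libx264", "-crf", "28",
         "-acodec", "aac", "-b:a", "128k",
         str(target)],
        check=True,
    )
    # re-uploading `target` back to storage would go here
```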
6. Build to Scale (But Don’t Start There)
After all the backend optimizations, we introduced rate limiting and autoscalers. But only after the code, database, and infrastructure were humming.
Scaling early is expensive and pointless if the foundation is broken. Engineer just enough for the load you actually have. Scale when the data tells you to.
The Takeaway: Sharpen the Sword
Every traffic spike became a chance to learn. After every rush, we sharpened the sword, tightened the bolts, and prepared for the next wave. We didn’t chase perfection—we pursued measurable improvements.
And that’s the secret:
Scaling isn't about doing more. It's about doing better—consistently, incrementally, and with purpose.
TL;DR – How We Did It
- ✂️ Clean up code and optimize SQL
- 🧵 Read from replica DBs to offload pressure
- 📈 Track metrics so you know what works
- 📚 Read the manual (especially if it’s ingress-nginx)
- 🖼️ Resize your media assets
- 🚀 Only scale services after you optimize
If it feels like you can’t scale without AWS bleeding you dry, pause. Take a step back. Measure. Think.
Then engineer something that makes sense.