Your Database Is One Bad Query Away From Being Murdered By Linux

You deployed PostgreSQL on Linux. You configured it perfectly. You monitored it. You felt safe.

Then at 3 AM on a Tuesday, your database vanished. No graceful shutdown. No warning. Just… gone. The only trace was a terse "Out of memory: Killed process" line buried in the kernel log.

The process that killed it wasn’t a hacker, wasn’t a bug, and wasn’t even your code. It was the operating system itself — specifically, a Linux subsystem called the OOM Killer, whose entire job is to play God when memory runs low. And it doesn’t play fair.

The OOM Killer doesn’t care that your database is mission-critical. It only cares about the math. And the math is rigged against you.

Here’s what happens: Linux, by default, lets applications request far more memory than physically exists. It’s called “memory overcommit,” and it’s an optimistic lie. The kernel bets that most applications will ask for memory they never actually use. Usually, the bet pays off. But when it doesn’t, when too many applications actually try to USE the memory they requested, the kernel runs out of pages to hand out, invokes the OOM Killer, and starts shooting processes.

And PostgreSQL, with its large shared memory segments and multiple worker processes, looks like a very appetizing target.

One developer described it perfectly in the comments of Ubicloud’s recent analysis: they had a Go backend and PostgreSQL co-located on the same machine. The Go app allocated enormous amounts of virtual memory. PostgreSQL ran a big query. The OOM Killer woke up, looked around, and executed PostgreSQL. Production down. Customers furious. Engineer bewildered.

This isn’t a rare edge case. It’s the default behavior.

Linux’s default memory policy is like a bank that approves every loan application and then, when the vault runs dry, forecloses on whichever customer has the biggest mortgage. It’s not malicious — it’s just brutally, blindly mechanical.

The fix is switching Linux’s overcommit mode (the vm.overcommit_memory sysctl) from 0 (heuristic, the default) to 2 (strict). In strict mode, the kernel refuses any allocation that would push total committed memory past a fixed limit: swap plus a configurable percentage of RAM (vm.overcommit_ratio, 50 by default). No more optimistic lies. No more promises the system can’t keep.
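To make the strict-mode limit concrete, here is a minimal sketch of the commit-limit formula described in the kernel's overcommit documentation: (RAM minus huge pages) times vm.overcommit_ratio percent, plus swap. The function name and the example figures are illustrative assumptions, not taken from the Ubicloud analysis.

```python
GIB = 1024 * 1024  # KiB per GiB


def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio=50, hugetlb_kib=0):
    """Approximate the mode-2 commit limit in KiB.

    Mirrors the kernel's documented formula:
        (RAM - huge pages) * overcommit_ratio / 100 + swap
    The default ratio of 50 matches the kernel default.
    """
    return (ram_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_kib


# Hypothetical 16 GiB box with no swap and the default ratio:
# only half of RAM can be committed under strict mode.
default_limit = commit_limit_kib(16 * GIB, 0)

# Same box with 4 GiB of swap and the ratio raised to 75.
tuned_limit = commit_limit_kib(16 * GIB, 4 * GIB, overcommit_ratio=75)
```

With the defaults, a swapless 16 GiB machine can commit only 8 GiB, which is why flipping to mode 2 without touching the ratio tends to surprise people.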

But here’s the twist nobody warns you about: enabling strict overcommit will break applications that have been lying about their memory usage for years.

And that’s the real problem. Not the OOM Killer. Not PostgreSQL. The real problem is a developer culture that treats virtual memory allocation like an all-you-can-eat buffet. Many programs commit 2x the memory they actually use — one commenter reported seeing 32GB committed against 16GB resident. That’s not a memory strategy. That’s wishful thinking compiled into production.

Strict overcommit doesn’t just prevent random kills. It forces every application on your system to be honest about what it actually needs. And that honesty is uncomfortable — because it reveals how much we’ve been getting away with on borrowed memory.

Now, before you rush to change your production config at 4 PM on a Friday: don’t. The commenters who’ve been through this have a clear message: test first. Mode 2 can make fork() and large allocations fail outright if vm.overcommit_ratio isn’t tuned to your workload. You need to load test, run full QA, and verify every application can restart cleanly under strict allocation. This is surgery, not a toggle.
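One rough pre-flight check, sketched here under assumptions: compare the CommitLimit and Committed_AS fields that Linux reports in /proc/meminfo. The kernel reports CommitLimit even in mode 0, where it is not enforced. The helper names and the sample figures below are hypothetical. A negative result suggests the current workload would already be over the limit if strict mode were switched on with the current ratio.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def commit_headroom_kib(text):
    """CommitLimit minus Committed_AS; negative means already over the limit."""
    info = parse_meminfo(text)
    return info["CommitLimit"] - info["Committed_AS"]


# Hypothetical sample: 8 GiB limit, 12 GiB already committed.
SAMPLE_MEMINFO = """\
MemTotal:       16777216 kB
SwapTotal:             0 kB
CommitLimit:     8388608 kB
Committed_AS:   12582912 kB
"""

headroom = commit_headroom_kib(SAMPLE_MEMINFO)  # negative: over by 4 GiB
```

On a real host you would pass the contents of /proc/meminfo instead of the sample string. Committed_AS taken at peak load is the number that matters, so a reading from an idle system says little.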

But the deeper lesson isn’t about configuration. It’s about architecture. If your mission-critical database shares a machine with applications that allocate memory like there’s no tomorrow, you’ve already lost. The OOM Killer is just the executioner — the verdict was decided the moment you co-located your database with an application that doesn’t respect resource boundaries.

Isolate your databases. Enforce honest allocation. Stop trusting an operating system subsystem that picks its victims by memory footprint, not by how much they matter to you.

Your database deserves better than being murdered in its sleep by the thing that was supposed to protect it.

FAQ

Q: Won't strict overcommit break my existing applications?

A: Almost certainly some of them — and that's the point. Applications that commit 2x their actual memory usage have been coasting on the kernel's optimism. Strict mode exposes them. Test in QA first, tune your overcommit ratios, and prepare to fix or isolate the offenders.

Q: Should I just isolate PostgreSQL on its own machine instead?

A: That's the ideal architecture, yes. But even isolated databases benefit from strict overcommit because PostgreSQL's own worker processes and shared memory can trigger the OOM Killer under heavy load. Isolation reduces risk; strict overcommit goes further by turning most would-be kills into allocation failures, which PostgreSQL can report as an ordinary out-of-memory error on the offending query.

Q: Is the OOM Killer actually broken, or is this overblown?

A: The OOM Killer works exactly as designed — the problem is that its design optimizes for system survival, not application priority. It will happily kill your most important process if that's what the scoring algorithm decides. Calling it 'broken' misses the point: it's a tool with different priorities than yours, and you need to configure around that reality.
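To see who the scoring algorithm currently favors as a victim, Linux exposes each process's score in /proc/<pid>/oom_score (higher means more likely to be killed), and /proc/<pid>/oom_score_adj lets an administrator shift it between -1000 and 1000. The sketch below only reads scores and ranks them; the function names are hypothetical, and it returns an empty result on systems without /proc.

```python
import os


def read_int(path):
    """Return the integer stored in a /proc file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def collect_oom_scores(proc_root="/proc"):
    """Map PID -> oom_score for every process we are allowed to inspect."""
    scores = {}
    if not os.path.isdir(proc_root):
        return scores
    for entry in os.listdir(proc_root):
        if entry.isdigit():
            score = read_int(os.path.join(proc_root, entry, "oom_score"))
            if score is not None:
                scores[int(entry)] = score
    return scores


def likeliest_victims(scores, top=5):
    """Return the `top` (pid, score) pairs with the highest OOM scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Calling likeliest_victims(collect_oom_scores()) on the database host under load shows whether a PostgreSQL process sits at the top of the list. If it does, that is the reality the answer above says you need to configure around.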
