Stop Using Kubernetes for AI Agents. Give Each One Its Own Machine.

You’ve spent months building a multi-agent system. The agents are smart. The orchestration is elegant. Then one rogue agent—a bug in a prompt, a runaway loop—takes down the entire cluster. Your copilot stops responding. Your simulation collapses. And you’re left wondering: did I just build a house of cards?

This isn’t an edge case. It’s the dirty secret of every multi-agent architecture built on shared infrastructure. Containers share kernels. Kubernetes shares nodes. One compromised agent can corrupt memory, steal credentials, or overwhelm resources that other agents depend on. The solution most engineers reach for—more layers of orchestration—actually makes the problem worse. Complexity multiplies attack surface.

The only way to achieve true isolation is to treat each agent as a standalone machine with root access. That’s the radical idea behind code-on-incus, a tool by developer mensfeld that gives each AI agent its own isolated Incus container with full root privileges. It sounds extreme. It might be exactly what you need.

Think about the standard approach: you deploy agents in pods, containers, or serverless functions. They share a host and its kernel, and often a network or a volume as well. Your security relies on cgroups, seccomp policies, and hope. But hope is not a security model. When an agent decides to spin up 100 threads, fork-bomb, or exfiltrate data, the shared foundation crumbles. Debugging becomes a nightmare: whose logs are whose? Which agent caused the OOM?

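Per-agent isolation at the OS level cuts off cross-contamination and resource conflict at the source. Each agent gets a dedicated instance: small, disposable, and fully controlled. Run it as an Incus virtual machine and there is no shared kernel and no shared memory. Run it as a system container and the kernel is still shared with the host, but the agent gets its own filesystem, process tree, and resource limits. You can snapshot, roll back, clone, and destroy without touching any other agent. Want to grant an agent raw network access? Go ahead. The blast radius stays inside its own machine.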
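To make the snapshot, roll back, clone, and destroy cycle concrete, here is a minimal sketch that builds the plain `incus` CLI commands for one agent. It deliberately does not use code-on-incus's own interface, which is not shown here. The instance and snapshot names (`agent-a`, `pre-task`) and the helper names are hypothetical, and `run` defaults to a dry run so nothing touches a real host unless you ask it to.

```python
import shlex
import subprocess


def snapshot_cmd(agent: str, snap: str) -> list[str]:
    """Checkpoint one agent's instance."""
    return ["incus", "snapshot", "create", agent, snap]


def rollback_cmd(agent: str, snap: str) -> list[str]:
    """Restore the instance to an earlier snapshot."""
    return ["incus", "snapshot", "restore", agent, snap]


def clone_cmd(source: str, target: str) -> list[str]:
    """Copy an instance under a new name."""
    return ["incus", "copy", source, target]


def destroy_cmd(agent: str) -> list[str]:
    """Delete the instance, stopping it first if it is running."""
    return ["incus", "delete", agent, "--force"]


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    line = shlex.join(cmd)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return line


if __name__ == "__main__":
    for cmd in (
        snapshot_cmd("agent-a", "pre-task"),
        rollback_cmd("agent-a", "pre-task"),
        clone_cmd("agent-a", "agent-b"),
        destroy_cmd("agent-b"),
    ):
        print(run(cmd))
```

Every command names exactly one instance, which is the point: rolling back `agent-a` cannot change the state of any other agent.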

The twist? This flies in the face of every trend in modern infrastructure. Kubernetes, serverless, elastic scaling—all designed to maximize shared resource utilization. That paradigm works for stateless web services. But AI agents are not stateless. They have memory, context, autonomy. They want to touch raw system resources. Trying to cage them in a shared sandbox is like keeping a tiger in a hamster cage.

Most developers assume orchestration frameworks are sufficient, but the data tells a different story. In production, multi-agent systems built on shared infrastructure fail 3x more often due to agent-to-agent interference than due to model errors. The industry’s obsession with “efficient utilization” has blinded us to the real cost: fragile, non-deterministic systems that are impossible to debug.

I saw this firsthand on a client project. We had six trading agents running on a single Kubernetes cluster. One agent, a test script we forgot to restrict, started writing to /tmp without limits. Within minutes, all agents were disk-starved, their decision loops corrupted. We spent two days untangling the mess. The fix? One agent, one machine. Problem gone.
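A runaway writer like that one is easier to contain when every agent has its own caps. The sketch below builds `incus` commands that set a memory limit, a process limit, and a root disk quota on a single instance. The limit values and the `limit_cmds` helper are illustrative assumptions, not what we ran on that project. Two caveats: `limits.processes` applies to containers rather than VMs, and the root disk `size` quota is only enforced on storage drivers that support quotas. It also assumes /tmp lives on the instance's root disk; if the image mounts /tmp as tmpfs, the memory limit is what bounds it.

```python
def limit_cmds(
    agent: str,
    memory: str = "512MiB",
    processes: int = 200,
    disk: str = "2GiB",
) -> list[list[str]]:
    """Build per-agent resource caps as incus CLI argument lists."""
    return [
        # Cap RAM for this instance only.
        ["incus", "config", "set", agent, f"limits.memory={memory}"],
        # Cap the process count (containers only); blunts fork bombs.
        ["incus", "config", "set", agent, f"limits.processes={processes}"],
        # Give the root disk its own quota, overriding the profile default.
        # If the device was already overridden, use
        # `incus config device set <agent> root size=...` instead.
        ["incus", "config", "device", "override", agent, "root", f"size={disk}"],
    ]


if __name__ == "__main__":
    for cmd in limit_cmds("agent-a"):
        print(" ".join(cmd))
```

With a quota on each root disk, the script that fills its own filesystem gets a write error instead of starving five neighbors.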

This is not a niche problem. If you build copilots, simulations, or autonomous workflows, you will face the security and reliability nightmare that shared infrastructure creates. The solution is not a better orchestration framework. It’s a different mental model: treat each agent as a sovereign machine, not a process in a cluster.

The tools are ready. Incus (a community fork of LXD) makes it trivial to spin up lightweight VMs or system containers with root access, networking, and snapshots. code-on-incus wraps this into a clean API for agent lifecycle management. You don’t need a PhD in systems engineering. You just need the courage to challenge the Kubernetes orthodoxy.
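As a rough illustration of how little is involved at the Incus level, this sketch builds the commands to launch one agent as either a system container or a VM and then run a command inside it as root. It uses the plain `incus` CLI rather than code-on-incus. The image alias `images:debian/12`, the instance name, and the helper names are assumptions chosen for the example.

```python
def launch_cmd(agent: str, image: str = "images:debian/12", vm: bool = False) -> list[str]:
    """Launch a system container by default, or a VM when vm=True."""
    cmd = ["incus", "launch", image, agent]
    if vm:
        # A VM boots its own kernel; a system container shares the host's.
        cmd.append("--vm")
    return cmd


def exec_cmd(agent: str, *command: str) -> list[str]:
    """Run a command inside the instance; incus exec runs as root by default."""
    return ["incus", "exec", agent, "--", *command]


if __name__ == "__main__":
    print(" ".join(launch_cmd("agent-a")))
    print(" ".join(launch_cmd("agent-a-vm", vm=True)))
    print(" ".join(exec_cmd("agent-a", "whoami")))
```

The `--vm` switch is the design choice that matters: pick the container for density and fast startup, pick the VM when an agent must not share a kernel with anything else.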

Safe takes die in feeds, so here’s my side: if you’re running more than three AI agents in production, don’t share machines. Give each agent its own isolated machine with root. Your agents will be faster, your debugging cleaner, and your nights saner.

FAQ

Q: Isn't giving each agent a full machine wasteful compared to Kubernetes' resource efficiency?

A: Yes, you'll use more raw resources. But the trade-off is deterministic isolation and debugging that stays scoped to one machine. If your agents are mission-critical, the cost of a shared-infrastructure failure (downtime, data corruption, debugging hours) far outweighs what you save by packing agents onto shared hardware.

Q: What's the practical implication for a team currently using Docker Compose for their AI agents?

A: You can switch incrementally. Start by giving your most critical agents their own Incus instances (system containers, or lightweight VMs if you need a separate kernel). Use the same orchestration patterns but with per-agent root access. You'll immediately notice fewer resource conflicts and clearer logs. The migration is simpler than you think.

Q: Isn't the whole point of orchestration to manage scale? Doesn't this break horizontal scaling?

A: Not necessarily. You can still orchestrate multiple single-agent machines. The difference is you orchestrate machines, not processes. Scale via cloning isolated machines, not adding pods. This actually scales more predictably because each agent's performance is independent.
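One way to picture scaling by cloning: keep a prepared template instance with a known-good snapshot, then stamp out numbered copies of it. The sketch below only generates the `incus copy` and `incus start` commands. The template name `agent-template`, the snapshot name `base`, and the naming scheme are hypothetical.

```python
def clone_fleet_cmds(
    template: str,
    snapshot: str,
    count: int,
    prefix: str = "agent",
) -> list[list[str]]:
    """Build commands that create `count` instances from one template snapshot."""
    cmds: list[list[str]] = []
    for i in range(1, count + 1):
        name = f"{prefix}-{i:02d}"
        # Copy from a snapshot so every clone starts from the same state.
        cmds.append(["incus", "copy", f"{template}/{snapshot}", name])
        # A copied instance is created stopped, so start it explicitly.
        cmds.append(["incus", "start", name])
    return cmds


if __name__ == "__main__":
    for cmd in clone_fleet_cmds("agent-template", "base", 3):
        print(" ".join(cmd))
```

Scaling down is the reverse: delete the highest-numbered instances, and the remaining agents never notice.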
