You’re Wrong About Union-Find. It’s Actually an Array Problem.

If you’ve ever tried to prove that union-find works correctly, with all that path compression and union by rank, you know the pain. Trees with mutable pointers, recursive find operations, and a correctness argument that feels like walking through a maze. But what if I told you that the entire algorithm can be reduced to something far simpler? Something whose core state is one array of integers?

Here’s the twist: Union-find isn’t a tree algorithm. It’s an array algorithm in disguise.

I know, I know. Every textbook, every lecture, every LeetCode solution shows you nodes and parent pointers. But when you look at what the algorithm actually does—read from an array, write to an array—you realize the tree is just a convenient story we tell ourselves. The real computation is pure array theory.

This isn’t just a philosophical head-trip. It has real consequences. By framing union-find as operations on arrays, you can plug it directly into SMT solvers. You can prove correctness mechanically. You can even derive new algorithmic insights that the tree metaphor obscures.

Let me show you what I mean. In the standard view, you have a forest of trees. Union links roots. Find climbs parent pointers. Path compression flattens the tree. It’s dynamic, pointer-heavy, and hard to reason about formally. But think about it: every parent pointer is just a value in an array at index i. Every path compression is just an assignment to that array. Every union is just a write to the root’s entry.
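To make that concrete, here is a minimal sketch of union-find written against nothing but a flat Python list. The names `make_sets`, `find`, and `union` are my own choices for illustration, and union by rank is left out to keep the reads and writes easy to spot, so treat it as a sketch rather than a tuned implementation.

```python
def make_sets(n):
    # parent[i] == i means "i is a root"; initially every element is its own root.
    return list(range(n))


def find(parent, i):
    # Climb to the root using array reads only.
    root = i
    while parent[root] != root:
        root = parent[root]
    # Path compression: one read and one write per node on the path.
    while parent[i] != root:
        nxt = parent[i]
        parent[i] = root
        i = nxt
    return root


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb  # the whole "link" is a single write to a root's entry
    return rb


parent = make_sets(5)
union(parent, 0, 1)
union(parent, 1, 2)
same = find(parent, 0) == find(parent, 2)  # 0 and 2 now share a root
```

There is no node class and no pointer anywhere: the forest exists only as the contents of `parent`.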

Once you see union-find as array logic, the correctness argument becomes something a solver can check for you, not a maze you have to walk by hand.

Philip Zucker, the author of the original analysis, shows exactly this: the Theory of Arrays (ToA) provides a logical foundation for union-find. Instead of reasoning about pointer chasing, you reason about read-over-write constraints. It’s a beautiful reduction—from a messy dynamic structure to a clean, static array with updates.

The moment that clicked for me was when he said: “Forget the trees. Just think about what happens when you write to an array cell.” Suddenly, the whole thing became transparent. The path compression? That’s just a conditional write. The find operation? That’s a loop of array reads, and each read of an updated array is resolved by the read-over-write axiom: reading the index you just wrote returns the new value, and reading any other index returns what was there before.
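Here is a small sketch of that idea in the functional style the theory of arrays uses. `select` and `store` are the conventional names for array read and write in that theory. The helpers `find_root`, `compress_step`, and `read_over_write_holds` are hypothetical names I made up for this example, and the check is a brute-force test over one small array, not a proof.

```python
def select(a, i):
    # Array read.
    return a[i]


def store(a, i, v):
    # Functional array write: returns a new tuple, leaves `a` untouched.
    return a[:i] + (v,) + a[i + 1:]


def read_over_write_holds(a, v):
    # select(store(a, i, v), j) is v when i == j, otherwise select(a, j).
    n = len(a)
    return all(
        select(store(a, i, v), j) == (v if i == j else select(a, j))
        for i in range(n)
        for j in range(n)
    )


def find_root(a, i):
    # find is a loop of reads.
    while select(a, i) != i:
        i = select(a, i)
    return i


def compress_step(a, i, root):
    # Path compression is a conditional write.
    return store(a, i, root) if select(a, i) != root else a


a = (1, 2, 2, 3)                 # 0 -> 1 -> 2, and 3 is its own root
root = find_root(a, 0)
b = compress_step(a, 0, root)    # 0 now points straight at the root
```

Because `store` never mutates its argument, `a` and `b` are two separate array values, which is exactly how a solver sees the state before and after the write.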

This perspective flips the narrative. Most practitioners see union-find as a graph algorithm. But it’s actually a special case of array logic, and that means it can be checked with standard SMT techniques. No need for pointer aliasing models or a heap. You still need an invariant (the parent array must not contain a cycle, or find would never terminate), but it is an invariant about an array, stated with arrays and equalities.

The pointer chase is a lie; the truth is stored in flat memory.

Now, you might ask: “So what? I can implement union-find in my sleep. Who cares about formal verification?” Fair point. But think about the bigger picture. How many other “obvious” algorithms could be hiding a simpler logical core? What if the way we teach data structures is actively making them harder to understand?

This is more than a neat trick. It’s a challenge to how we think about algorithms. We’ve built entire mental models around pointers and trees, when the underlying logic is often just reads and writes. The array theory view strips away the metaphor and leaves you with the essence.

Next time you write a union-find implementation, try this: look at your code and write down every read and every write to your parent array. Now ask yourself: do the trees matter? Or is the array enough?
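If you want to try that exercise without a pen, one option is to let the array keep the list for you. The sketch below wraps a list in a hypothetical `TracedArray` class (my name, not from any library) that records every read and write, then runs an ordinary `find` with path compression against it.

```python
class TracedArray:
    """A list wrapper that logs every read and write as (op, index, value)."""

    def __init__(self, values):
        self._data = list(values)
        self.log = []

    def __getitem__(self, i):
        v = self._data[i]
        self.log.append(("read", i, v))
        return v

    def __setitem__(self, i, v):
        self.log.append(("write", i, v))
        self._data[i] = v

    def __len__(self):
        return len(self._data)


def find(parent, i):
    # Climb to the root, one logged read per step.
    root = i
    while True:
        p = parent[root]
        if p == root:
            break
        root = p
    # Compress: one logged read and one logged write per node on the path.
    while i != root:
        nxt = parent[i]
        parent[i] = root
        i = nxt
    return root


parent = TracedArray([1, 2, 2])  # 0 -> 1 -> 2, and 2 is the root
root = find(parent, 0)
for entry in parent.log:
    print(entry)
```

Everything `find` did is in `parent.log`. If you can explain the result from that log alone, the trees were never doing any work.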

Stop explaining union-find with trees. Start explaining it with arrays. Your future self—and your theorem prover—will thank you.

FAQ

Q: Union-find works fine in practice; why bother with this theoretical reframing?

A: Because formal verification of complex systems needs compositional proofs. This reframing lets you state union-find in the theory of arrays, which SMT solvers already support, so its correctness conditions can be checked automatically as part of a proof about larger software.

Q: How does this help me as a programmer?

A: It gives you a clear mental model that maps directly to logic. You can reason about your implementation using array invariants, which helps catch bugs and simplifies code reviews. It also gives you a starting point for stating those same invariants in a proof assistant or solver.
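As one example of such an invariant, here is a rough sketch of a runtime check that a parent array really encodes a forest: every entry is a valid index, and following entries from any index reaches a root. The function name `is_valid_forest` is hypothetical, and this is a brute-force check you might use in a test or an assertion, not a proof.

```python
def is_valid_forest(parent):
    n = len(parent)
    # Invariant 1: every entry is an index into the same array.
    if any(not (0 <= p < n) for p in parent):
        return False
    # Invariant 2: from every index, repeated reads reach a root (parent[j] == j).
    for i in range(n):
        j, steps = i, 0
        while parent[j] != j:
            j = parent[j]
            steps += 1
            if steps >= n:  # a path without a cycle has fewer than n edges
                return False
    return True
```

Calling it after every `union` in a test suite is a cheap way to catch a write that accidentally creates a cycle.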

Q: Isn't this just a different representation, not a new insight?

A: It's more than representation: it changes what we take as primitive. The trees are encoded in the array, so if you start from the array, the dynamic behavior of the forest falls out of a sequence of plain array updates. That is a useful lesson for algorithm design.
