# Union-Find Disjoint Set

April 27, 2023

The Union-Find algorithm, also known as the Disjoint Set algorithm, is a data structure that is used to keep track of a partition of a set of objects. Specifically, it tracks a collection of disjoint sets and allows for efficient merging and querying of these sets. The algorithm is widely used in computer science, particularly in graph theory and computer networks, to solve a variety of problems such as finding connected components, minimum spanning trees, and cycle detection.

The Union-Find algorithm was first introduced by Bernard Galler and Michael Fischer in 1964 as a solution to the problem of finding connected components in graphs. It was later refined by Robert E. Tarjan in 1975, who introduced the concept of path compression to improve its efficiency. Since then, the algorithm has become one of the most widely studied and used data structures in computer science.

## Key concepts and principles

The Union-Find algorithm is based on a few key concepts and principles:

• Disjoint sets: A set of objects is said to be disjoint if its members do not intersect or overlap with the members of any other set. In the context of the Union-Find algorithm, each set is represented as a tree, where each node represents an element of the set and each tree represents a disjoint set.
• Union: The operation of merging two disjoint sets into a single set. This is done by selecting the roots of the two trees and making one of them a child of the other.
• Find: The operation of determining which set a given element belongs to. This is done by traversing the tree from the given element to its root.
• Path compression: An optimization technique that is used to improve the efficiency of the Find operation. It involves changing the parent pointer of each node along the path from the given element to its root to point directly to the root. This flattens the tree and reduces the time complexity of subsequent Find operations.

## Pseudocode and implementation details

The Union-Find algorithm can be implemented using a variety of data structures, such as arrays, linked lists, or trees. Here is an example implementation using trees and path compression:

``````class UnionFind:
def __init__(self, n):
self.parent = [i for i in range(n)]
self.rank = [0] * n

def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]

def union(self, x, y):
px, py = self.find(x), self.find(y)
if px == py:
return

if self.rank[px] < self.rank[py]:
self.parent[px] = py
elif self.rank[px] > self.rank[py]:
self.parent[py] = px
else:
self.parent[py] = px
self.rank[px] += 1
``````

In this implementation, `parent` is an array that stores the parent of each node in the tree, and `rank` is an array that stores the height of each node. The `find` function uses path compression to find the root of the tree that contains the given element `x`, and the `union` function merges the trees that contain `x` and `y`.

## Examples and use cases

The Union-Find algorithm has a wide range of applications in computer science, particularly in graph theory and computer networks. Here are a few examples:

• Finding connected components: Given an undirected graph, the Union-Find algorithm can be used to find the connected components of the graph. Each node is initially assigned to its own set, and the algorithm iteratively merges the sets that are connected by an edge. At the end of the algorithm, each set represents a connected component of the graph.
• Minimum spanning trees: Given a weighted, connected graph, the Union-Find algorithm can be used to find the minimum spanning tree of the graph. The algorithm initially assigns each node to its own set, and iteratively adds the cheapest edge between two sets that are not yet connected. This is equivalent to the Kruskal’s algorithm and is more efficient with the use of Union-Find.
• Cycle detection: Given a directed graph, the Union-Find algorithm can be used to detect cycles in the graph. The algorithm assigns each node to its own set, and iteratively merges the sets that are connected by an edge. If a merge operation attempts to merge two nodes that are already in the same set, then there must be a cycle in the graph.