Union-Find Disjoint Set

April 27, 2023

The Union-Find algorithm, also known as the Disjoint Set algorithm, is a data structure that is used to keep track of a partition of a set of objects. Specifically, it tracks a collection of disjoint sets and allows for efficient merging and querying of these sets. The algorithm is widely used in computer science, particularly in graph theory and computer networks, to solve a variety of problems such as finding connected components, minimum spanning trees, and cycle detection.

The Union-Find algorithm was first introduced by Bernard Galler and Michael Fischer in 1964 as a solution to the problem of finding connected components in graphs. It was later refined by Robert E. Tarjan in 1975, who introduced the concept of path compression to improve its efficiency. Since then, the algorithm has become one of the most widely studied and used data structures in computer science.

Key concepts and principles

The Union-Find algorithm is based on a few key concepts and principles:

  • Disjoint sets: A set of objects is said to be disjoint if its members do not intersect or overlap with the members of any other set. In the context of the Union-Find algorithm, each set is represented as a tree, where each node represents an element of the set and each tree represents a disjoint set.
  • Union: The operation of merging two disjoint sets into a single set. This is done by selecting the roots of the two trees and making one of them a child of the other.
  • Find: The operation of determining which set a given element belongs to. This is done by traversing the tree from the given element to its root.
  • Path compression: An optimization technique that is used to improve the efficiency of the Find operation. It involves changing the parent pointer of each node along the path from the given element to its root to point directly to the root. This flattens the tree and reduces the time complexity of subsequent Find operations.

Pseudocode and implementation details

The Union-Find algorithm can be implemented using a variety of data structures, such as arrays, linked lists, or trees. Here is an example implementation using trees and path compression:

class UnionFind:
    def __init__(self, n):
        self.parent = [i for i in range(n)]
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return

        if self.rank[px] < self.rank[py]:
            self.parent[px] = py
        elif self.rank[px] > self.rank[py]:
            self.parent[py] = px
        else:
            self.parent[py] = px
            self.rank[px] += 1

In this implementation, parent is an array that stores the parent of each node in the tree, and rank is an array that stores the height of each node. The find function uses path compression to find the root of the tree that contains the given element x, and the union function merges the trees that contain x and y.

Examples and use cases

The Union-Find algorithm has a wide range of applications in computer science, particularly in graph theory and computer networks. Here are a few examples:

  • Finding connected components: Given an undirected graph, the Union-Find algorithm can be used to find the connected components of the graph. Each node is initially assigned to its own set, and the algorithm iteratively merges the sets that are connected by an edge. At the end of the algorithm, each set represents a connected component of the graph.
  • Minimum spanning trees: Given a weighted, connected graph, the Union-Find algorithm can be used to find the minimum spanning tree of the graph. The algorithm initially assigns each node to its own set, and iteratively adds the cheapest edge between two sets that are not yet connected. This is equivalent to the Kruskal’s algorithm and is more efficient with the use of Union-Find.
  • Cycle detection: Given a directed graph, the Union-Find algorithm can be used to detect cycles in the graph. The algorithm assigns each node to its own set, and iteratively merges the sets that are connected by an edge. If a merge operation attempts to merge two nodes that are already in the same set, then there must be a cycle in the graph.

Advantages and disadvantages

The Union-Find algorithm has a number of advantages and disadvantages:

  • Advantages: The algorithm has a time complexity of O(alpha(N)), where alpha is the inverse Ackermann function, which is extremely small and practically constant for all practical purposes. This makes the algorithm very efficient for a wide range of applications. Additionally, the algorithm is relatively simple to implement and can be used in a variety of contexts.
  • Disadvantages: While the Union-Find algorithm is very efficient for most applications, there are some cases where it may not be the best choice. For example, if the number of elements in the set is very small, a simpler data structure such as an array or a linked list may be more appropriate.

There are a number of related algorithms and variations of the Union-Find algorithm:

  • Weighted Union-Find: A variation of the Union-Find algorithm that takes into account the size of each set when merging them. This ensures that the resulting tree is more balanced and reduces the time complexity of subsequent operations.
  • Persistent Union-Find: A variation of the Union-Find algorithm that allows for efficient querying of historical versions of the data structure. This is useful in applications such as version control systems or database management systems.
  • Dynamic Connectivity: A problem that involves maintaining a set of elements that can be connected and disconnected over time. This problem can be solved using the Union-Find algorithm, along with a few additional optimizations to handle dynamic updates efficiently.