Tarjan’s Algorithm

April 28, 2023

Tarjan’s Algorithm is a graph algorithm used for finding strongly connected components (SCCs) in directed graphs. This algorithm is named after its inventor, Robert Tarjan, who first described it in 1972. The SCCs of a directed graph represent groups of nodes that are mutually reachable, meaning that there is a path from any node in the group to any other node in the group.

The primary purpose of Tarjan’s Algorithm is to identify and extract SCCs from a directed graph. SCCs are useful in a variety of applications, including circuit design, network analysis, and program optimization. For example, in a compiler, SCCs can be used to identify code blocks that can be optimized together.

Brief History and Development

Robert Tarjan is an American computer scientist who is known for his contributions to the field of computer science, particularly in the areas of algorithms, data structures, and graph theory. He is the recipient of numerous awards, including the Turing Award, and has made significant contributions to the development of many important algorithms, including Dijkstra’s algorithm and the union-find algorithm.

Tarjan’s Algorithm was first published in 1972 in a paper called “Depth-first search and linear graph algorithms”. The algorithm was one of the first algorithms to be developed for finding SCCs in directed graphs. Since then, the algorithm has been widely studied and used in a variety of applications.

Key Concepts and Principles

Tarjan’s Algorithm is based on the concept of a depth-first search (DFS) of a directed graph. The algorithm works by traversing the graph in a DFS manner, and keeping track of the nodes visited, the order in which they were visited, and the low-link values of each node. The low-link value of a node is the smallest node number reachable from that node in the DFS traversal.

The key idea behind the algorithm is that nodes that are in the same SCC will have the same low-link value. Therefore, SCCs can be identified by looking for groups of nodes that have the same low-link value. Tarjan’s Algorithm uses a stack to keep track of the nodes in each SCC as they are identified.

Pseudocode and Implementation Details

The following is the pseudocode for Tarjan’s Algorithm:

procedure tarjan(node v)
      id[v] := index
      lowlink[v] := index
      index := index + 1
      S.push(v)
      onStack[v] := true

      // Consider successors of v
      for each (v, w) in E do
          if (id[w] == UNVISITED) then
              // Successor w has not yet been visited; recurse on it
              tarjan(w)
              lowlink[v]  := min(lowlink[v], lowlink[w])
          else if (onStack[w])
              // Successor w is in stack S and hence in the current SCC
              lowlink[v]  := min(lowlink[v], id[w])
      // If v is a root node, pop the stack and generate an SCC
      if (lowlink[v] == id[v]) then
          print "SCC:"
          repeat
              w := S.pop()
              onStack[w] := false
              print w
          until (w == v)

In this pseudocode, id[v] represents the DFS index of node v, lowlink[v] represents the lowest DFS index of any node reachable from v, S is a stack used to keep track of nodes in SCCs, and onStack[v] is a flag that indicates whether node v is currently on the stack.

The algorithm begins by initializing the DFS index and low-link values of the starting node, pushing it onto the stack, and marking it as on the stack. It then considers each of the node’s successors, and if a successor has not yet been visited, it recursively calls the algorithm on that node. If the successor is already on the stack, it updates the low-link value of the current node to be the minimum of its current low-link value and the DFS index of the successor.

If the current node is a root node (i.e., its low-link value is equal to its DFS index), the algorithm pops nodes off the stack until it reaches the current node, printing out each node that it pops off. This group of nodes constitutes an SCC.

Examples and Use Cases

Tarjan’s Algorithm can be used to solve a variety of problems that involve finding SCCs in directed graphs. One example of such a problem is the two-satisfiability problem, which involves finding a satisfying assignment for a set of boolean variables that satisfy a given set of constraints.

Another example of a use case for Tarjan’s Algorithm is in the analysis of the connectivity of a network. SCCs can be used to identify groups of nodes that are closely connected, which can help to identify areas of the network that may be vulnerable to failure.

Advantages and Disadvantages

One advantage of Tarjan’s Algorithm is that it is very efficient in terms of time complexity. The algorithm has a time complexity of O(V+E), where V is the number of nodes in the graph and E is the number of edges. This makes it well-suited for use in large graphs with many nodes and edges.

However, one disadvantage of the algorithm is that it requires a complete traversal of the graph, which can be costly in terms of memory usage. If the graph is very large or has a very high degree of connectivity, the algorithm may require a significant amount of memory to store the visited nodes and their low-link values.

There are several other algorithms that are used for finding SCCs in directed graphs, including the Kosaraju-Sharir algorithm and the Gabow’s algorithm. These algorithms are similar to Tarjan’s Algorithm in that they are based on a DFS traversal of the graph, but they use different approaches for identifying SCCs.

The Kosaraju-Sharir algorithm involves two DFS traversals of the graph, while Gabow’s algorithm uses a different approach for identifying SCCs based on the use of a dynamic data structure called a “finger tree”. Each of these algorithms has its own advantages and disadvantages, and the choice of algorithm will depend on the specific requirements of the problem at hand.