Interview Kickstart has enabled over 21000 engineers to uplevel.
If you are a software engineer looking to land your dream job, sorting algorithms are an indispensable part of your technical interview prep. One of these sorting algorithms is topological sort, or top sort, which is defined only for directed acyclic graphs (DAGs).
Topological order is the result of a linear ordering of a DAG’s vertices such that for every directed edge (U, V) present in it, U comes before V in the topological ordering. This means there may be multiple valid topological orderings possible for the same DAG. To learn about this in more detail, check out our article on topological sort.
As you may know, topological sorting can be done using algorithms like Kahn’s algorithm, depth-first search, and some parallel algorithms. This article focuses on one of these algorithms: Kahn’s algorithm (Kahn algorithm or Kahn topological sort), which can help people working as software developers/coding engineers solve some complicated-looking questions with more ease.
In this article, we’ll discuss:
The basis of Kahn’s Algorithm is the following assertion:
A directed acyclic graph (DAG) G has at least one vertex with the indegree zero and one vertex with the out-degree zero.
Let's try and prove this assertion and see if it’s accurate:
For any directed acyclic graph G, we can say that it does not contain a cycle. Since there is no cycle, there’s no endless loop that exists as a path in the graph. In other words, all paths in G are of finite length.
Let us now consider a path P that represents the longest path in G. We know that:
Essentially, Kahn’s algorithm works by keeping track of the number of incoming edges into each node (indegree). It repeatedly:
This process continues until no element with zero indegree can be found. As mentioned earlier, this can happen when the topological sorting is complete and also when a cycle is encountered. For this reason, the algorithm checks if the result of topological sorting contains the same number of elements that were supposed to be sorted:
We’ll now learn in detail what this really means and how such an algorithm can be implemented.
As we’ve outlined in Kahn’s algorithm, we iteratively remove nodes with zero directed edges pointing towards them. However, these zero indegree nodes may have outgoing directed edges (originating at them and pointing towards other nodes).
These outgoing edges get removed when their source nodes are deleted. This reduces the indegrees of their destination nodes and creates zero or more new nodes with zero dependencies (for a DAG). (The case of creating zero new nodes with indegree zero is only possible if the DAG already contains one or more than one node with in-degree zero. As we’ve seen, it’s a property of a DAG that it will always have at least one node with in-degree zero.)
This process continues till no nodes are left to remove, and we get a topological ordering of the directed acyclic graph. We can condense this process into the following steps:
The best way to further understand how Kahn’s algorithm sorts topologically is through an illustrative example. Let’s jump right in!
We begin with a directed acyclic graph, as shown in the figure below. We also notice all the directed edges of the DAG, which gives us everything we need to know to represent the DAG.
Edges: {0, 1}, {0, 2}, {1, 3}, {1, 4}, {3, 4}, {3, 5}
Here, “In” = indegree
Order of removal: First 0, then 1 and 2, then 3, and finally, 4 and 5.
Topological order: 0,1,2,3,4,5
Note: This is not the only possible topological order.
For example, 0,1,3,4,5,2 is also a valid topological order.
We now calculate the indegree of each node. The indegree of a node pictorially refers to the number of edges pointing towards that node. The direction of the arrow is incoming, hence the term indegree.
In terms of edges, each edge {Ui, V} where Ui is the source and V is the destination node contributes to the indegree of its destination node V as the edge points towards it. We’re going to update the indegree value for any node V every time an edge associated with it as its destination node is deleted.
We can now begin applying Kahn’s algorithm for topological sorting:
We can now summarize the above steps in the form of an algorithm:
KahnTopologicalSort()
For(vertex=0; vertex<inputSize; vertex++)
Find indegree[vertex]
Create an auxiliary storage like stack or queue
Add nodes with indegree 0 in auxiliary storage
while(node U with indegree zero exists in auxiliary storage)
{
Remove U from auxiliary storage
Remove all its outgoing edges (U, Vi) from the graph.
For destination vertices Vi which had their incoming edges removed:
indegree[vertex Vi]=indegree[vertex Vi]-1
Add new nodes with indegree 0 to the auxiliary storage
)
if(elements sorted = all elements)
Return or Print nodes in topologically sorted order
Else
Return null or Print no topological ordering exists
end topologicalSort()
Let us see more closely how the calculation of indegree can be done.
We can use an indegree array to track indegree values. To find the indegree of all nodes initially, we first initialize all values in the indegree array to 0 and then go through all the edges and increment the indegree counter of the destination node of each edge by 1. We do this because the destination node has an edge that’s an incoming edge pointing to it, contributing to its indegree.
Pseudocode
for each node in allNodes
indegree[node] = 0;
for each edge(source,destination) in Edges
indegree[destination]++;
The upper loop runs for all nodes, and the lower loop runs for all edges. With V number of vertices or nodes and E edges, the total complexity of this method becomes O(V+E).
We can go through every node in the list, considering it the source node. Any node connected to it in an edge as a destination node gets its indegree incremented by 1.
Pseudocode
for each node in allNodes
{for each destinationNode in destinationNodesList[node]
indegree[destinationNode]++;}
The inner loop here will be executed E number of times where E is the total number of Edges. The outer loop will be executed V number of times, where V is the total number of vertices or nodes. The overall time complexity for this method will hence also be O(V+E).
Let us now look at how we can implement Kahn’s algorithm using C++.
Kahn’s Algorithm Topological Sort C++ Code
#include <iostream>
#include <vector>
using namespace std;
// Structure to store an edge of a graph
struct Edge {
int from, to;
};
// A class that represents a graph
class Graph
{
public:
// Each element of an adjacency list contains a node and every other node it points to.
vector<vector<int>> adjacencyList;
// To store indegree of a node
vector<int> indegree;
// Constructing a graph
Graph(vector<Edge> const &Edges, int allNodes)
{
// Resizing the vector to hold all nodes
adjacencyList.resize(allNodes);
// Initializing indegree of all nodes to zero
indegree.assign(allNodes, 0);
// Adding directed edges to the graph start node-->end node
for (auto &edge: Edges)
{
// Adding an edge from start node to end node
adjacencyList[edge.from].push_back(edge.to);
// Incrementing in-degree of end node by 1
indegree[edge.to]++;
}
}
};
bool topologicalSort(Graph const &inputGraph, vector<int> &sorted, int allNodes)
{
vector<int> indegree = inputGraph.indegree;
// Storing all the nodes with no incoming edges
vector<int> zeroIndegreeNodes;
for (int i = 0; i < allNodes; i++)
{
if (indegree[i]==0) {
zeroIndegreeNodes.push_back(i);
}
}
while (!zeroIndegreeNodes.empty())
{
// Deleting fromNode from zeroIndegreeNodes
int fromNode = zeroIndegreeNodes.back();
zeroIndegreeNodes.pop_back();
// Pushing fromNode into sorted
sorted.push_back(fromNode);
for (int toNode: inputGraph.adjacencyList[fromNode])
{
// Deleting from the graph the edge from fromNode to toNode
indegree[toNode] -= 1;
// If the updated in-degree of toNode is now zero, inserting toNode into zeroIndegreeNodes
if (indegree[toNode]==0) {
zeroIndegreeNodes.push_back(toNode);
}
}
}
// A cycle was encountered if the sorting is done for all zero in-degree nodes in the loop above, but not all input nodes have been sorted
if( (int) sorted.size()!=allNodes)
{
return false;
}
return true;
}
int main()
{
// Count of all the nodes in the graph
int allNodes = 6;
// All the edges of the graph
vector<Edge> edges =
{
{ 0, 1 }, { 0, 2 }, { 1, 3}, {1, 4}, { 3, 4}, { 3, 5 }
};
// A vector to store elements in the sorted order
vector<int> sorted;
// Building a graph, given the number of nodes and all it edges
Graph inputGraph(edges, allNodes);
// Topological sorting. Printing the topologically sorted order if one exists.
if (topologicalSort(inputGraph, sorted, allNodes))
{
for (int i;i< (int) sorted.size();i++)
{
cout << sorted[i] << " ";
}
}
else
{
cout << "Topological sorting is not possible as the graph isn't acyclic";
}
return 0;
}
The output is 0 2 1 3 5 4, which is another valid topological ordering of the directed acyclic graph we discussed in the example earlier. Nodes (2,1) and (5, 4) reach indegree zero at the same time; therefore, they can be written in any order relative to each other. So, 0 | 1,2 | 3 | 4,5 is just as valid as 0 | 2,1 | 3 | 5,4. Of course, there are even more valid topological orders possible in this case.
Let us now discuss the time and space complexities of topological sorting using Kahn’s algorithm.
The time complexity is O(V+E) where V = Vertices, E = Edges.
To find out the time complexity of Kahn’s algorithm, let's try to find each step’s time complexity.
Adding all the individual run times, the time complexity of topological sort is O(V+E).
This is the best-, worst-, and average-case time complexity of this algorithm, as we perform the same steps regardless of how the elements are organized.
Auxiliary space required is O(V).
Adding all three, we arrive at the space complexity of O(V).
Question 1: What are the applications of Kahn’s Algorithm?
Answer: Kahn’s algorithm for topological sorting is used mainly when tasks or items have to be ordered, where some tasks or items have to occur before others can. For example:
Question 2: Is it possible for a DAG to have no valid topological ordering, and why don’t we order non-DAG graphs topologically?
Answer: No. A DAG will always have at least one valid topological ordering possible. If a graph is cyclic, topological sorting is impossible, and hence, not defined for cyclic graphs. And if a graph is undirected, each edge being bidirectional creates a cycle between the two nodes it connects; hence topological ordering isn’t possible in that case either. The graph has to be a DAG for topological ordering to be possible.
If you’re looking for guidance and help with getting your prep started, sign up for our free webinar. As pioneers in the field of technical interview prep, we have trained thousands of software engineers to crack the toughest coding interviews and land jobs at their dream companies, such as Google, Facebook, Apple, Netflix, Amazon, and more!
----------
Article contributed by Tanya Shrivastava
Attend our webinar on
"How to nail your next tech interview" and learn