Graph is common data structure that represents the relationships between different entities using edges and vertices. The behavior of search algorithm can be changed by plugging in different data structures, such as using a stack yields depth-first search, and using a queue gives breadth-first search.

Brief

In the previous blog post, we have discussed the basic concept of graph traversal algorithms. Here is a very simple and good illustration of the graph scan, derived from IDEA.

In practice, a graph can be extermely large, and there is a need to traverse or search the graph along the edges. The breadth-first search and depth-first search are the most commonly used traversal and search algorithm. In this post, I would like to practice and discuss the implementations of directed and undirected graphs, breath-first and depth-first search algorithms. Also, practice some exercises associated with graphs search algorithm.

Graph Representation and Implementation

A graph is made up of vertices (also called nodes or points) which are connected by edges (also called links or lines). A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmertically. Here is the concise implementation of UndirectedGraph and DirectedGraph classes, which inherit from Graph:

graph representation
  • graph
  • undirectedgraph
  • directedgraph
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class Graph():
"""Docstring for Graph. """
def __init__(self) -> None:
"""TODO: to be defined. """
# Adjacency set representation.
# key: node name, value: adjacency node name set
self.edges = dict()

def add_node(self, node: Union[int, str]) -> None:
"""TODO: Docstring for add_node.
"""
if node not in self.edges:
self.edges[node] = set()

def add_edge(self, source: Union[int, str], destination: Union[int, str]) -> None:
"""TODO: Docstring for add_edge.
"""
self.add_node(source)
self.add_node(destination)
self.edges[source].add(destination)

def get_nigihbors(self, node: Union[int, str]) -> Set[Union[int, str]]:
"""TODO: Docstring for get_nigihbors.
"""
if node not in self.edges:
raise RuntimeError(f"Node {node} does not exist in the graph.")

for u in self.edges[node]:
yield u

def get_num_nodes(self):
"""TODO: Docstring for get_num_nodes.
"""
return len(self.edges)

Traversal and Search Implementation

Graph traversal refers to the process of visiting each vertex in a graph. Such traversals are classified by the order in which the vertices are visited. If each vertex in a graph is to be traversed by a tree-based algorithm, then the algorithm must be called at least once for each connected component of the graph. Here is the concise implementation of breadth first traversal, iterative depth first traversal and recursive depth first traversal:

traversal and search implementation
  • bfs
  • iterativedfs
  • recursivedfs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
def breadth_first_traversal(graph: Graph, start_nodes: List[int]) -> List[int]:
"""TODO: Docstring for breadth_first_traversal.
BFS supports multiple start nodes, the data structure used by queue
O(V+E) where V is the number of nodes and E is the number of edges
"""
search_order = []
num_nodes = graph.get_num_nodes()

# Store the nodes that have been visited
visited = set()

# FIFO
q = Queue(maxsize=num_nodes)

# Mark the source node as visited and enqueue it
for node in start_nodes:
q.put(node)
visited.add(node)

while not q.empty():
src_node = q.get()
search_order.append(src_node)
for neighbor in graph.get_nigihbors(src_node):
if neighbor not in visited:
q.put(neighbor)
visited.add(neighbor)

return search_order

Note: BFS explores all the nodes at the current level before it explores the next level. To achieve this, it uses first-in-first-out (FIFO) queue and traverse all the connected nodes; DFS explores as far as possible along each branch before bracktracking. It can be implemented recursively or iteratively. The recursive algorithm naturally explores the depth first but it not favored for large graphs because of stack overflow. The iterative algorithm uses a last-in-first-out (LIFO) stack to explore the depth first.

Experiments

We created a simple example of directed and undirected graphs, the network is same as previous blog post, to test around with BFS and DFS. Here is how we invoke a method:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
def create_dummy_directed_graph() -> DirectedGraph:
"""TODO: Docstring for create_dummy_directed_graph.
"""
graph = DirectedGraph()
graph.add_edge(0, 1)
graph.add_edge(0, 5)
graph.add_edge(1, 2)
graph.add_edge(2, 4)
graph.add_edge(2, 6)
graph.add_edge(3, 2)
graph.add_edge(5, 8)
graph.add_edge(6, 5)
graph.add_edge(7, 5)
return graph

def create_dummy_undirected_graph():
"""TODO: Docstring for create_dummy_undirected_graph.
"""
graph = UndirectedGraph()
graph.add_edge(0, 1)
graph.add_edge(1, 2)
graph.add_edge(2, 3)
graph.add_edge(1, 7)
graph.add_edge(3, 7)
graph.add_edge(7, 8)
graph.add_edge(3, 4)
graph.add_edge(3, 5)
graph.add_edge(4, 5)
graph.add_edge(5, 6)
graph.add_edge(6, 7)
graph.add_edge(5, 8)
return graph

if __name__ == "__main__":
directed_graph = create_dummy_directed_graph()
undirectedGraph = create_dummy_undirected_graph()

# Directed Graph
search_order = breadth_first_traversal(graph=directed_graph, start_nodes=[0, 1])
print(search_order)

search_order = iterative_depth_first_traversal(graph=directed_graph, start_node=0)
print(search_order)

search_order = recursive_depth_first_traversal(graph=directed_graph, start_node=0)
print(search_order)

# Undirected Graph
search_order = breadth_first_traversal(graph=undirectedGraph, start_nodes=[0, 1])
print(search_order)

search_order = iterative_depth_first_traversal(graph=undirectedGraph, start_node=0)
print(search_order)

search_order = recursive_depth_first_traversal(graph=undirectedGraph, start_node=0)
print(search_order)
The expected outputs from the terminal are
[0, 1, 5, 2, 8, 4, 6]
[0, 5, 8, 1, 2, 6, 4]
[0, 1, 2, 4, 6, 5, 8]
[0, 1, 2, 7, 3, 8, 6, 4, 5]
[0, 1, 7, 6, 5, 4, 3, 2, 8]
[0, 1, 2, 3, 4, 5, 8, 7, 6]

Exercises

  1. Exercise 1 - Shortest Distance from All Buildings
  2. Exercise 2 - Critical Connections in a Network by DFS

Exercise 1. - Shortest Distance from All Buildings

You want to build a house on an empty land which reaches all buildings in the shortest amount of distance. You can only move up, down left, and right. You are given a 2D grid of values 0, 1, 2, where:

  • Each 0 marks an empty land which you can pass by freely.
  • Each 1 marks a building which you cannot pass through.
  • Each 2 marks an obstacle which you cannot pass through.

Example: Input [[1, 0, 2, 0, 1], [0, 0, 0, 0, 0], [0, 0, 1, 0, 0]], Output: 7

Explanation: In this example, there are three buildings at (0, 0), (0, 4), (2, 2), and an obstacle at (0, 2)

The point (1, 2) is an ideal empty land to build a house, as the total travel distance of 3 + 3 + 1 = 7 is minimal. So return 7.

shortestDistance
  • python
  • cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
def shortestDistance(self, grid: List[List[int]]) -> int:
def bfs(i, j):
q = Queue()
q.put([i, j])
step = 0
xx, yy = 0, 0

tmp_grid = [[0]*column for _ in range(row)]

for x in range(row):
for y in range(column):
tmp_grid[x][y] = grid[x][y]

while not q.empty():
curDepth = q.qsize()

for i in range(curDepth):
xx, yy = q.get()

# meet the boundary
if xx == len(grid) or xx < 0 or yy == len(grid[0]) or yy < 0:
continue

# only empty land which you can pass by freely
if step != 0 and tmp_grid[xx][yy] != 0:
continue

visited[xx][yy] += 1
distance[xx][yy] += step
tmp_grid[xx][yy] = -1
q.put([xx+1, yy])
q.put([xx-1, yy])
q.put([xx, yy+1])
q.put([xx, yy-1])

step += 1

row, column = len(grid), len(grid[0])
distance = [[0]*column for _ in range(row)]
visited = [[0]*column for _ in range(row)]
num_building = 0
ans = math.inf

# Search & Traversal
for i in range(row):
for j in range(column):
if grid[i][j] == 1:
num_building += 1
bfs(i, j)

# Find the shortest distance
for i in range(row):
for j in range(column):
if visited[i][j] == num_building:
ans = min(ans, distance[i][j])

return ans if ans != math.inf else -1

The basic functions for caculating the shortest distance are as follows:

  • Traversing 2D grid by BFS algorithm
  • Store visited count and distance between two buildings

Exercise 2. - Critical Connections in a Network

There are a servers numbered from 0 to n-1 connected by undirected server-to-server connections forming a network where connections[i] = [a, b] represents a connection between servers a and b. Any server can reach any other server directly or indirectly through the network.

A critical connection is a connection that, if removed, will make some server unable to reach some other server.

Return all critical connections in the network in any order.

Example: Input: n=4, connections = [[0, 1], [1, 2], [2, 0], [1, 3]], Output:[[1, 3]]

criticalConnections
  • python
  • cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
def criticalConnections(self, n: int, connections: List[List[int]]) -> List[List[int]]:
def dfs_visit(curr, parent, visited):
low_link[node] = visited
visited += 1

# Traversal
for neigh in graph[curr]:
if neigh == parent:
continue

if low_link[neigh] == 0;
dfs_visit(neigh, node, visited)

# Track the smallest low link value
low_link[curr] = min(low_link[curr], low_link[neigh])

if low_link[neigh] == visited:
bridge.append([curr, neigh])

graph = defaultdict(list)
for sor, dst in connections:
graph[sor].append(dst)
graph[sor].append(dst)

low_link , bridge = defaultdict(int), []
visited = 1

dfs_visit(0, None, visited)

return bridge

The solution was initially inspired by Tarjan's algorithm, and then reworked. The basic functions for determining critical connections are as follows:

  • Constructing undirected graph

  • Do a recursive DFS traversal, labeling node with increaseing visited time and track the smallest low link value

  • Determine critical connection (bridge) depending on when the low link value is equal to visited times.

Note: Low link value of a node is defined as the smallest visited time from current node when doing a DFS, including itself. And the bridge in graphy theory is any edge in a graph whose removal increases the number of connected components.

Reference

Thanks for reading! Feel free to leave the comments below or email to me. Any pieces of advice or discussions are always welcome. :)