Published: May 4, 2021
Last Modified: May 4, 2021
MathJax Loaded: 0 ms
Page Complete: 0 ms
Graphs
 Notes based on Graphs Series by William Fiset
 Pseudocode and lots of the images from above as well
Graph Introduction
Graph Terminology
 Undirected Graph: Graph where edges have no orientation. The edge (u, v) is the same as the edge (v, u).
 Directed Graph (Digraph): Edges have orientation. Edge from (u, v) is the edge from node u to node v.
 Weighted Graphs: Edges have weights to represent values like cost, distance, quantity, etc.
 Connected Component: The subgraph where it’s possible to get to any vertex from any vertex (but doesn’t have to be an edge between every vertex to every other vertex).
 The following graph has 3 connected components
Tree
Connected undirected graph with no cycles. Equivalent definition: Connected graph with N nodes N1 edges.
Examples:
Rooted Tree
 Tree that has a designated root node where every edge points either away from the root or towards the node.
 OutTree (Arborescence) : All edges point away from root
 InTree (AntiArborescence) : All edges point towards root (More common)
Directed Acyclic Graph (DAG)
Directed graph with no cycles
 Represent dependencies (class prereqs, scheduling, etc.)
 All outtrees are DAGs but not all DAGs are outtrees
 The difference between trees and DAGs is that nodes in DAGs can have multiple parents, while tree nodes can only have on parent.
Bipartite Graph
 Definition 1: Vertices can be split into two independent groups U, V such that every edge connects between U and V.
 Definition 2: Two colored graph
 Definition 3: No odd length cycles
Complete Graph
Unique edge between every pair of nodes.
 Complete graph with n vertices is denoted as the graph $K_n$
 Complete graphs are often seen as the worst case graph due to there being the maximum amount of edges
Graph Representation
Adjacency Matrix
Pros  Cons 

Space efficient for dense graphs  Requires $O(V^2)$ space 
Edge weight lookup is $O(1)$  Iterating over all edges takes $O(V^2)$ 
Simplest graph representation 
Adjacency List
Pros  Cons 

Space efficient for sparse graphs  Less space efficient for denser graphs 
Iterating over all edges is efficient  Edge weight lookup is $O(E)$ 
Slight more complex graph representation 
Edge List
Unordered list of edges. Triplet (u, v, w) means “the cost from u to node v is w”.
Pros  Cons 

Space efficient for sparse graphs  Less space efficient for denser graphs 
Iterating over all edges is efficient  Edge weight lookup is $O(E)$ 
Very simple structure  Lacks structure, so seldom used 
Common Graph Theory Problems
Shortest Path
Shortest path from node A to node B
 BFS (only for unweighted), Dijkstra’s, BellmanFord, FloydWarshall, A*, etc.
Connectivity
Does there exist a path between node A to B
 Union find or DFS/BFS
Negative Cycles
Does a weighted digraph have negative cycles?
 Negative cycles are like a trap since the sum of the weighted edges keeps decreasing as you go through the cycle
 BellmanFord and FloydWarshall
Strongly Connected Components (SCC)
SCCs are selfcontained cycles within a directed graph where each vertex in a given cycle can reach every other vertex in the same cycle.
 Tarjan’s Algorithm, Kosaraju’s Algorithm with DFS
 a, b, e is a SCC and f, g is a SCC and c, d, h is a SCC
Traveling Salesman Problem
“Given a list of cities and the distance between pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?”
 NP Hard
 HeldKarp with Dynamic Programming, Branch and Bound, and approximation algorithms
Bridges
A Bridge/cut edge is any edge whose removal increases the number of connected components.
 Bridges can help identify weak points and bottlenecks
 The bolded edges are bridges since they would disconnect a part of the graph and create a new connected component
Articulation Points
Any node in a graph that increases number connected components when cut.
Minimum Spanning Tree (MST)
“Subset of edges of a connected, edgeweighted graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.”
 All MSTs have the same total cost, but MSTs on a graph are not always unique.
 Kruskal’s, Prims, etc.
Network Flow: Max Flow
With an infinite input source, how much “flow” can we push through the network?
 Edges can be roads with cars, pipes with water, or hallways with people. Flow would be the number of cars the roads can sustain in traffic, volume of water allowed in the pipes, and maximum number of people in hallways, respectively.
 FordFulkerson, EdmondsKarp, Dinic’s Algorithm
Depth First Search
 See https://jacobshin.com/notes/computerscience/datastructures/#depthfirstsearch
 Stuff Augmented DFS can do:
 Compute a graph’s Minimum spanning tree
 Check if graph is bipartite
 Find strongly connected components
 Topologically sort the nodes of a graph
 Find bridges and articulation points
 Find augmenting paths in a flow network?
 Generate mazes
Count Connected Components
Algorithm: Start a DFS at every node (unless it’s already visited) and mark all reachable nodes as being part of the same component. BFS also works in place of DFS.
Breadth First Search
Grids
 Grids are implicit graphs since you can determine a node’s neighbors (e.g. mazes, maps, etc.)
 Common approach is to convert grid to adjacency list/matrix
 Conversion isn’t strictly necessary in all cases; instead you can just use properties of the grid.
Shortest Path in Maze Grid
 For example, if you wanted to access adjacent vertices (neighboring cells in the grid):
# From https://youtu.be/KiCBXu4P2Y
# Define the direction vectors for
# north, south, east, and west.
dr = [1, +1, 0, 0]
dc = [0, 0, +1, 1]
for (i = 0; i < 4; i++):
rr = r + dr[i]
cc = c + dc[i]
# Skip invalid cells. Assume R and
# C for the number of rows and columns
if rr < 0 or cc < 0: continue
if rr >= R or cc >= C: continue
# (rr, cc) is a neighboring cell of (r, c)
You can still use BFS to see if a grid based maze is solvable, just using the above to find adjacent vertices and puting the (x, y) pair of the neighbors in a queue. An alternative to putting an (x, y) pair into a single queue is to put x in one queue and y in a separate queue (works well for higher dimensional coordinates as well).
Topological Sort
 DAGs can model school class prereqs, program dependencies, event scheduling, etc.
 Topological orderings are not unique; there can be more than one topological orderings in a graph
 Graphs with cycles can’t have topological orderings since cyclic dependencies mean there’s no place to start
 To find if a graph is a DAG, you can use Tarjan’s Strongly Connected Component algorithm to find cycles
 Trees by definition don’t have cycles, so they have a topological ordering
 Pick off the leaves
 Keep removing the leaves of the new tree to get topological ordering
Algorithm
 Pick an unvisited node
 Beginning with the selected node, do a DFS exploring only unvisited nodes
 On the recursive callback of the DFS (when you backtrack/pop off the stack frame), add the current node to the topological ordering in reverse order
Shortest/Longest Path in DAGs
Shortest Path in DAGs
The Single Source Shortest Path (SSSP) problem can be solved on a DAG in $O(V + E)$. Dijkstra’s might not work for negative weights, but the following algo does and is faster.

Generates the shortest path from a source node for ALL other nodes

“Relax each edge”, keep updating the path if it’s better than what’s already there for each node (similar to dijkstra’s)

Go through each node in topological ordering and for each adjacent node update the table of paths if a better path is found
d = [infinity, infinity, ..., infinity] # Keeps track of shortest paths (distance from start node) for each node
d[start_node] = 0
Generate topological ordering
for node, n, in the topological ordering:
for every node, v, adjacent to n:
if d[n] + weight(n, v) < d[v]:
d[v] = d[n] + weight(n, v)
Longest Path in DAG
 For a general graph finding the longest path is NPHard, but for a DAG it’s $O(V+E)$
 For DAG multiply edge values by 1 and find the shortest path and then multiply edge values by 1 again
Dijkstra’s Algorithm
 Dijkstra’s is a SSSP (Single Source Shortest Path)
 Specify a source node
 Get back the shortest path from the source to all other node
 Time Complexity can be $O(E\cdot \log(V))$
 Dijkstra constraint: All edges have to have nonnegative edges weights
 Once a node has been visited, its optimal distance cannot be improved (greedy manner)
Lazy Algorithm
 Create “dist” array where distance is positive infinity. Mark distance to start node as 0.
 Maintain priority queue of keyvalue pairs of (node index, distance) which tell you which node to visit next based on which node has the min distance.
 Insert (s, 0) into priority queue and loop until PQ is empty, pulling out the next most promising pair (node index, distance), which is the one with the smallest distance.
 Iterate over all edges outwards from the current node and relax each edge appending to new keyvalue pair to the priority queue for every relaxation
visited = [false, false, ..., false]
dist = [infinty, infinity, ... infinity]
dist[start_node] = 0
PQ = []
PQ.add(start_node, 0)
while (PQ is not empty):
v, oldDist = PQ.remove()
visited[v] = true
if dist[v] < oldDist: continue # This means this is a duplicate keyvalue pair. Means it's an old entry, so ignore it
for every node, n, adjacent to node v:
if visited[n] == true: continue
if dist[v] + edge(v, n) < dist[n]:
dist[n] = dist[v] + edge(n, v)
PQ.add((n, dist[n]))
return dist
 Note how the distance is stored in two places, the dist array and in the value of the keyvalue Priority Queue. This is what makes the implementation lazy, since we have to delete outdated distances in the priority queue whenever we pull out a pair from the PQ and it is greater than what’s in the dist array.
 In this context, a visited node is the same thing as processed node
 Note how there can be multiple entries in the PQ with the same node and different distances. This is because updating a keyvalue entry in a PQ takes linear time (since the PQ is sorted by value, not by key), so the workaround is just to add a new entry and ignore the old one.
Finding the Shortest Path

The above doesn’t actually give you the shortest path, just the length of the paths

To actually get the path, add a
prev = [null, null, null ... null]
array that keeps track of the node that comes before a node in the shortest path traversal 
Then set
prev[n] = v
ifdist[v] + edge(v, n < dist[n]
and add areturn (dist, prev)
at the end of the dijkstra function 
Then to find the shortest path, first make sure
dist[end_node] != infinity

Then reconstruct the shortest path by starting at the
end_node
and working backwards, using the prev array 
Return the path but reverse it since we worked backwards
Stopping Early
 If we’re only looking for the shortest path between two nodes, and we don’t need the shortest path from the source to ALL the other nodes, we can stop Dijkstra early once we visit the destination node, since once the destination node has been visited, its shortest distance will not change as more future nodes are visited.
 In the code, you can just add simple check to see if the node we processed is the end node
Eager Algorithm using Indexed Priority Queue
 Lazy version adds new keyvalue pairs $log(n)$ instead of updating values $O(n)$
 Lazy version is inefficient for dense graphs due to lots of stale keyvalue pairs
 In the Eager version, use a indexed priority queue to update the keyvalue pairs instead of just adding new ones
 Indexed priority queues allow updating a keyvalue pair (also called a decrease key operation) to be done much faster than linear time
Dary Heap
 A Dary heap can be used instead of an indexedpriority queue to make Dijkstra’s faster
 Dary Heap is where each node has D children
 Faster update (decreasekey) operations
 Slower removals
 Faster updates and slower removals works well with Dijkstra’s since Dijkstra’s does more updates than removals
 D = E/V for optimal performance
 Leads to $O(E\cdot \log_{E/V}(V))$
The State of the Art Heap for Dijkstra’s
 Fibonacci Heap
 $O(E + V\log(V))$
 Impractical in practice since Fibonacci heaps are difficult to implement and has a large amortized constant overhead (so only useful for really large graphs)
BellmanFord (BF) Algorithm
 Another SSSP (Single Source Shortest Path) algo
 Finds shortest path from one node to any other node
 $O(EV)$
 Worse than Dijkstra’s $O((E+V)\log(V))$
 Use BF when Dijkstra’s fails with negative cycles (Dijkstra would be stuck in an infinite loop as it finds better and better paths)
 Application of negative cycles: Arbitrage in financial markets
Algorithm
 E = Number of edges
 V = Number of vertices
 S = starting node
 D = array of size V that tracks the best distance from S to each node
 Set every entry in D to $+\infty$
 Set D[S] = 0
 Relax each edge V1 times
for (i = 0; i < V1; i++):
for edge in graph.edges:
// Relax edge (update D with shorter path
if (D[edge.from] + edge.cost < D[edge.to])
D[edge.to] = D[edge.from] + edge.cost
// Find nodes caught in negative cycle
for (i = 0; i < V1; i++):
for edge in graph.edges:
if (D[edge.from] + edge.cost < D[edge.to])
D[edge.to] = infinity
Floyd Warshall Algorithm
 AllPairs Shortest Path (APSP): Shortest path between all pairs of nodes
 $O(V^3)$ only good for graphs with a few hundred nodes
 Detects negative cycles and works with negative edges
 Bad for regular shortest path tho
Graph Setup
 Best way is using a 2d Adjacency Matrix, where m[i][j] represents the weight of the edge from node i to node j.
 If there is no edge from i to j, then set edge m[i][j] = $+\infty$
 IMPORTANT: If the programming language doesn’t support a constant infinity such that $\infty + \infty = \infty$, then don’t use 2^32  1 as infinity since it will cause an integer overflow; using a large constant like 10^7 might be better.
 The goal of FW is to eventually consider going through all possible intermediate nodes on paths of different lengths
Memo Table
 Use Dynamic Programming (DP) using a memo table
 In this case the memo table is a 3d matrix of size n x n x n
dp[k][i][j] = shortest path from i to j routing through nodes {0, 1, … k1, k}

Start with k=0, then k=1, then k=2, …

This gradually builds up the optimal solution routing through 0, then all optimal solutions routing through 0 and 1, then all optimal solutions routing through 0, 1, 2, etc. up until n1.

dp[n1] is the 2d Matrix solution we’re after
n = size of adjacency matrix
dp = memo table with the APSP solution
next = matrix used to reconstruct shortest path
function floydWarshall(m):
# Setup copies over values from m into dp
# and sets next[i][j] = j if m[i][j] != +infinity
setup(m)
for (k:= 0; k < n; k++):
for (i:=0; i < n; i++):
for (j:=0; j < n; j++):
if (dp[i][k] + dp[i][k] < dp[k][j]):
dp[i][j] = dp[i][k] + dp[k][j]
next[i][j] = next[i][k]
# Detect and propogate negative cycles
propogateNegativeCycles(dp, n)
# Return APSP matrix
return dp
 If a shortest path goes through a negative cycle, it is compromised and thus is invalid. To find these negative cycles, run FW a second time but if the distance can be improved set the optimal distance to be infinity
 Every edge (i, j) marked with infinity is either part of or reaches into a negative cycle.
function propogateNegativeCycles(dp, n):
for (k:= 0; k < n; k++):
for (i:=0; i < n; i++):
for (j:=0; j < n; j++):
if (dp[i][k] + dp[i][k] < dp[k][j]):
dp[i][j] = infinity
next[i][j] = 1
 To reconstruct a shortest path
function reconstructPath(start, end):
path = []
if dp[start][end] == +infinity: return path
at = start
for (; at != end; at = next[at][end]):
# If at is 1, then the shortest path is stuck in a negative cycle
if at == 1: return null
path.add(at)
if next[at][end] == 1: return null
path.add(end)
return path
Bridges and Articulation Points
 Bridge: Edge that increases number of connected components if removed
 Articulation Point: Node that increases the number of connected components when removed
Bridge Algorithm

LowLink value is the smallest id reachable from a node using forward and backward edges

Do a depth first search and label each node with an unique id that you keep incrementing as you go to more nodes

Find the low link value for each node

If id(e.from) < lowlink(e.to), then you have a bridge
 This algorithm works since the definition of a bridge really just describes when you don’t have another edge going back to the node where the bridge starts. If you have another edge looping back to that node, you would have a low link value at least equal to the starting node’s id.
 The above algo does one DFS and V more DFSs to find all the low link values, so you have $O(V(V+E))$.
 However, a single pass of DFS can update the lowlink values, giving $O(V+E)$
id = 0
g = adjacency list
n = size of the graph
ids = [0, 0, ..., 0] # length n
low = [0, 0, ..., 0] # length n
visited = [false, false, false, ..., false] # length n
function findBridges():
bridges = []
for (i = 0; i < n; i++):
if (!visited[i]):
dfs(i, 1, bridges)
return bridges
function dfs(at, parent, bridges):
visited[at] = true
id++
low[at] = ids[at] = id
# For each edge from node 'at' to node 'to'
for (to: g[at]):
if to == parent: continue
if (!visited[to]):
dfs(to, at, bridges)
low[at] = min(low[at], low[to])
if (ids[at] < low[to]):
bridges.add((at, to))
else:
# If we've already visited the 'to' node, there's still a chance
# that the low link value might have to be changed to the id of 'to'
low[at] = min(low[at], ids[to])
Articulation Point Algorithm
 On a connected component with three or more vertices, if an edge (u, v) is a bridge, then either either u or v is an articulation point.
 The above condition doesn’t capture all articulation points since it’s possible for articulation points to exist without bridges nearby)
 For the callback
low[at] = min(low[at], low[to])
after the DFS, ifid(e.from) == lowlink(e.to)
, then there is a cycle  The presence of a cycle also implies an articulation point
 The only exception when
id(e.from) == lowlink(e.to)
fails is when the starting node has 0 or 1 outgoing directed edges.  Here, the starting node (green) has only 1 outgoing edge, so the condition fails and there’s no articulation point
 However, when there’s more than one outgoing edge, the node can escape the cycle and become an articulation point
 For the callback
# Global variables accessible from any function
id = 0
g = adjacency list
n = size of the graph
outEdgeCount = 0
ids = [0, 0, ..., 0] # length n
low = [0, 0, ..., 0] # length n
visited = [false, false, false, ..., false] # length n
isArt = [false, false, ..., false] # length n
function findArtPoints():
for (i = 0; i < n; i++):
if (!visited[i]):
outEdgeCount = 0 # Reset
dfs(i, i, 1)
isArt[i] = (outEdgeCont > 1)
return isArt
function dfs(root, at, parent):
if (parent == root): outEdgeCount++;
visited[at] = true
id++
low[at] = ids[at] = id
# For each edge from node 'at' to node 'to'
for (to: g[at]):
if to == parent: continue
if (!visited[to]):
dfs(to, at, bridges)
low[at] = min(low[at], low[to])
if (ids[at] == low[to]):
isArt[at] = true
if (ids[at] < low[to]):
isArt[at] = true
else:
# If we've already visited the 'to' node, there's still a chance
# that the low link value might have to be changed to the id of 'to'
low[at] = min(low[at], ids[to])
Strongly Connected Components (SCC)
 SelfContained cycles within a directed graph where every vertex in the given cycle can eventually reach every other vertex in the same cycle
 All nodes in a the same SCC have the same low link values, but only if DFS traversal is done in the right order
Tarjan’s Algorithm
 Tarjan’s algorithm maintains a set (most times a stack) of valid nodes from which to update lowlink values from
 Nodes are added to teh stack of valid nodes as they’re explored for the first time
 Nodes are removed from the stack each a complete SCC is found
 If u and v are nodes we’re exploring
 To update u’s low link value to node v’s low link value, there has to be a graph of edges from u to v and node v must be on the stack
 Still $O(V+E)$
Algorithm
 Mark the id of each node as unvisited
 Start DFS anywhere. Assign id and lowlink value. Mark as visited and add to seen stack
 On DFS callback, if the node we were going to is on the stack, then set low link of the node we’re at to min of at node’s low link and to node’s lowlink
if (onStack[to]): low[at] = min(low[at], low[to]
 After visiting all neighbors in the current DFS call, if the current node started a connected component, then pop nodes off stack until current node is reached. (We know a node started a connected component if its id equals its low link value)
UNVISITED = 1
n = number of nodes in graph
g = adjacency list with directed edges
id = 0 # Used to give each node an id
sccCount = 0 # Used to count number of SCCs found
# Index i in these arrays represents node i
ids = [0, 0, ...0, 0]
low = [0, 0, ...0, 0] # Keeps track of lowlinks for all the n nodes
onStack = [false, false, ..., false]
stack = an empty stack data structure
function findSccs():
for (i = 0; i < n; i++):
ids[i] = UNVISITED
for (i = 0; i < n; i++):
if (ids[i] == UNVISITED):
dfs(i)
return low
function dfs(at):
stack.push(at)
onstack[at] = true
ids[at] = low[at] = id++
# Visit all neighbors & min lowlink on the callback
for (to: g[at]):
if (ids[to] == UNVISITED):
dfs(to)
if (onStack[to]):
low[at] = min(low[at], low[to])
# After visiting all the neighbors of "at"
# if we're at the start of a SCC, empty the seen stack
# until we're back to the start of the SCC
if (ids[at] == low[at]):
for (node = stack.pop(); ; node = stack.pop()):
onStack[node] = false
low[node] = ids[at]
if (node == at): break
sccCount++
Traveling Salesman Problem (TSP)
 Reworded Problem: Given a complete graph with weighted edges (as adjacency matrix), what is the Hamiltonian cycle (path that visits every node only once) of minimum cost?
 NPComplete: No known algorithm that runs in polynomial time
Brute Force Solutin
 Compute the cost of every possible tour. Try all possible permutations of node orderings which takes $O(n!)$
Dynamic Programming
 Reduces $O(n!)$ to $O(n^2 2^n)$
 Makes graphs up to 23 nodes feasible to solve the TSP for most home computers
Steps
 Pick a node $0 \le S < N$ to be the starting node, S, for the tour
 Compute the optimal solution when we have a path where n = 2, in other words a partial path with one edge. For this we need two things
 The set of visited nodes in the subpath
 The index of the last visited node in the path
 The above form the dynamic programming state. Since there are N possible nodes that could have visited last and 2^N possible subsets of visited nodes, the space needed to store the answer is $O(N2^N)$ since we need to store every possible combination for DP
 Note that we store the set of visited nodes in a single 32bit integer to save space instead of an array
 To solve paths of length $3 \le n \le N$, we’re going to take the solved subpaths from n1 which are cached, and add another edge extending to a node which hasn’t been visited from the last visited node
# m  2D adjacency matrix of the graph
# S  The start node (0 <= S < N)
function tsp(m, S):
N = matrix.size()
# Initialize memo table
# Fil table with null values or +infinity
memo = 2D table of size N by 2^N
setup(m, memo, S, N)
solve(m, memo, S, N)
minCost = findMinCost(m, memo, S, N)
tour = findOptimalTour(m, memo, S, N)
return (minCost, tour)
# Only generates tours with path length n=2 (two nodes)
function setup(m, memo, S, N):
for (i = 0; i < N; i++):
if i == S: continue
# Store the optimal value from node S
# to each node i (this is given as input
# in the adjacency matrix m).
memo[i][1 << S  1 << i] = m[S][i]
# Bits at "index" of S and "index" of i to 1 and the rest are zero
# If S was 5 and i was 4, then the 5th and 4th bit are set so 10000  1000 = 11000
# The memo[i] subarray keeps track of the visited nodes in a tour
# and the i keeps track of the node visited last that we're keep incrementing
function solve(m, memo, S, N):
# r keeps track of the number of nodes in a partial tour (gets incremented until N)
for (r = 3; r <= N; r++):
# The combinations function generates all bit sets
# of size N with r bits set to 1. For example,
# combinations(3, 4) = {011, 1011, 1101, 1110}
for subset in combinations(r, N):
# If S is not in the subset, then disgard that subset since it is invalid
# since the subset doesn't contain our starting node
if notIn(S, subset): continue
# next is the next node we're going to
for (next = 0; next < N; next++):
# Check if the subset we generated has the next node in it
# Basically run this loop body for every ndoe in the subset
if next == S  notIn(next, subset): continue
# The sbuset state without the next node
state = subset ^ (1 << next)
minDist = +infinity
# 'e' is short for end node
# Generate all possible end nodes and see which one is best
for (e = 0; e < N; e++):
# Invalid end node
if (e == S  e == next  notIn(e, subset)):
continue
newDistance = memo[e][state] + m[e][next]
if (newDistance < minDist):
minDist = newDistance
memo[next][subset] = minDist
function notIn(i, subset):
# Check if the ith bit is 0
return ((1 << i) & subset) == 0
function findMinCost(m, memo, S, N):
# END_STATE is a bitmask with all N bits set to 1
END_STATE = (1 << N)  1
minTourCost = +infinity
for (e = 0; e < N; e++):
if (e == S): continue
tourCost = memo[e][END_STATE] + m[e][S]
if tourCost < minTourCost: minTourCost = tourCost
return minTourCost
function findOptimalTour(m, memo, S, N):
lastIndex = S
state = (1 << N)  1 # end state
tour = array of size N+1
# Work backwards
# Start with end state and remove the nodes one by one to get path
for (i = N1; i >=1; i):
index = 1
# Find next node in path (in reverse order)
for (j = 0; j < N; j++):
if j == S  notIn(j, state): continue
if (index == 1) index == j
prevDist = memo[index][state] + m[index][lastIndex]
newDist = memo[j][state] + m[j][lastIndex]
if (newDist < prevDist) index = j
tour[i] = index
state = state ^ (1 << index) # remove the index node from the state since we're going backwards
lastIndex = index
tour[0] = tour[N] = S # Tour ends and starts at S
return tour
Eulerian Paths
 Eulerian Path: Path of edges that visits all the edges in a graph exactly once.
 Eulerian Circuit: Eulerian path where the start and end are the same node
 If you know that an eulerian circuit exists, you can start on any node
Eulerian Circuit  Eulerian Path  

Undirected Graph  Every vertex has an even degree.  Either every vertex has even degree or exactly two vertices have odd degree (start and end nodes) 
Directed Graph  Every vertex has equal indegree and outdegree.  At most one vertex has outdegreeindegree = 1 and at most one vertex has indegree  outdegree = 1 and all other vertices have equal in and out degrees. 
 Requirement for paths/circuits is that all vertices with nonzero degree need to belong to a single connected component
Finding Eulerian Paths (Hierholzer’s Algorithm)

Determine if an eulerian path exists using the table

For directed graphs, we have to start at the node with exactly one extra outgoing edge and the end has to be the node with exactly one extra incoming edge

If we just do a regular DFS, even when starting at the right node and we know an Eulerian path exists, it doesn’t give us an Eulerian path
 Modify the DFS so that when you get “stuck” (no more out degrees to go to that aren’t yet visited), backtrack and add the nodes to a stack.

$O(E)$ since computing in/out degrees + DFS are both linear in number of edges
# Global/class scope variables
n = number of vertices in graph
m = number of edges in graph
g = adjacency list of the directed graph
in = [0, 0, ..., 0] # length n
out = [0, 0, ..., 0] # length n
path = empty integer linkd list data structure
function fundEulerianPath():
countInOutDegrees()
if not graphHasEulerianPath(): return null
dfs(findStartNode())
if path.size() == m+1: return path
return null
function countInOutDegrees():
for edges in g:
for edge in edges:
out[edge.from]++
in[edge.to]++
function graphHasEulerianPath():
start_nodes, end_nodes = 0, 0
for (i = 0; i < n; i++):
if (out[i]  in[i]) > 1 or (in[i]  out[i])> 1):
return false
else if out[i]  in[i] == 1:
start_nodes++
else if in[i]  out[i] == 1:
end_nodes++
return (end_nodes == 0 and start_node==0) or (end_nodes == 1 and start_node == 1)
function findStartNode():
start = 0
for (i = 0; i < n; i++):
# Unique starting node
if out[i]  in[i] == 1: return i
# Start at any node with an outgoing edge (so we don't start on a singleton node)
if out[i] > 0: start = i
# The out array serves two purposes
# 1. track whether there are outgoing edges
# 2. Index into the adjacency list to select next outgoing edge
function dfs(at):
# While the current node stil has outgoing edges
while (out[at] != 0):
# Select the next unvisited outgoing edge
next_edge = g[at].get(out[at])
dfs(next_edge.to)
path.insertFirst(at)
Minimum Spanning Tree (MST)
 Minimum Spanning Tree: The subset of the edges in the graph which connects all vertices (without cycles) while minimizing total edge cost for an undirected graph with weighted edges
 MSTs aren’t necessarily unique for a graph, meaning there can be multiple valid solutions with the same edge cost
 A MST can only exist if all the nodes are part of a single connected component
 For an undirected graph we can store it as a directed graph with one incoming edge and outbound edge per edge in the undirected graph
Prim’s MST Algorithm

Greedy, works well for dense graphs

Doesn’t parallelize well

Lazy Version: $O(E \cdot \log(E))$

Eager version: $O(E\cdot \log(V))$
Lazy Prim’s MST Algo
Steps
 Maintain a min Priority Queue (PQ) that sorts edges by min edge cost. Used to determine next node to visit and the edge to get to that node.
 Start the algo at s. Mark s as visited and iterate over all edges of s and add them to PQ.
 While the PQ is not empty and a MST has not been formed, dequeue the next edge from the PQ.
 If dequeued edge is outdated (i.e. node it points to has already been visited), then skip and poll again
 Otherwise mark the current node as visited and add the edge to the MST
 Iterate over the current node’s edges and add all its edges to PQ. Do not add edges to PQ that have already been visited.
 We know that the algorithm is done when the number of edges in the MST is one less than the number of nodes (definition of a tree)
Eager Prim’s MST Algo
 Instead of “blindly inserting edges into a PQ which could later become stale”, the eager version tracks (node, edge) keyvalue pairs that can easily be updated and polled
 Each node in the graph is paired with only one of its incoming edges for a MST (except start node)
 Instead of adding edges to the PQ as we iterate over the edges, in the eager version, we relax (update) the destination node’s most promising incoming edge using an Indexed Priority Queue (IPQ)
 IPQ = PQ + Hashmap and supports polling and updating in logarithmic time
 Since there are V (node, edge) pairs for this algo in the IPQ, the poll/update operations will be $O(\log(V))$
 Thus the total time complexity would be $O(E\cdot \log(V))$ since we need to poll/update for every edge
Steps
 Create a min IPQ of size V that sorts vertexedge pairs (v, e) based on the min edge cost of e. By default all vertices v have a best value of (v, null) in the IPQ
 Start algorithm on the start node ‘s’ (any node). Mark s as visited and relax all edges of s. Relax means update the entry for node v in the IPQ from $(v, e_{old})$ to $(v, e_{new})$ if the new edge $e_{new}$ from u to v has a lower cost than $e_old$.
 While the IPQ is not empty and MST has not been formed, dequeue the next best (v, e) pair.
 Mark node v as visited and add edge e to the MST
 Relax all edges of v while making sure not to relax any edge pointing a node which has already been visited
n = # Number of nodes in the graph
ipq = # IPQ data structure, stores (node index, edge object) pairrs
# Edge objects consist of {start node, end node, edge cost} tuples
# IPQ sorts (node index, edge object) pairs based on min edge cost
g = # graph adjacency list
visited = [false, ..., false] # visited[i] tracks whether node i has been visited or not, size n
function eagerPrims(s = 0):
m = n  1 # number of total edges that should be in MST
# edgeCount is current number of edge in the partial MST
edgeCount, mstCost = 0, 0
mstEdges = [null, ..., null] # size m
relaxEdgesAtNode(s)
while (!ipq.isEmpty() and edgeCount != m):
destNodeIndex, edge = ipq.dequeue
mstEdges[edgeCount++] = edge
mstCost += edge.cost
relaxEdgesAtNode(destNodeIndex)
if edgeCount != m:
return (null, null)
return (mstCost, mstEdges)
function relaxEdgesAtNode(currentNodeIndex):
visited[currentNodeIndex] = true
edges = g[currentNodeIndex]
for (edge: edges):
destNodeIndex = edge.to
if visited[destNodeIndex]: continue
if !ipq.contains(destNodeIndex):
ipq.insert(destNodeIndex, edge)
else:
# Try improving cheapest edge entry in IPQ if already in there
ipq.decreaseKey(destNodeIndex, edge)
Kruskal’s Algorithm
 Select an edge of G of minimum weight
 Select a remaining edge of G of minimum weight
 Select any remaining edge of G of minimum weight that does not form a cycle with the previously selected edges
 Repeat 3. until n1 edges have been selected (MST formed)
Max Flow Ford Fulkerson
 Question: With an infinite input soure, how much “flow” can we push through the network without exceeding the capacity of any edge?
 Two special nodes: source and sink
 Solution to question finds the flow for each edge to maximize flow for the total graph/network. Initially, the flow is 0 for each edge.
 Remaining Capacity:
e.capacity  e.flow
 Augmenting Path: Path of edges in the residual graph with remaining capacity greater than zero from the source s to the sink t.
 Bottleneck: Smallest edge on the path (find this by subtracting current flow from capacity)
 Augmenting the Flow: Updating the flow values of the edges along augmenting path
 For forward edges, this means increasing the flow by the bottleneck value
 For back edges (called residual edges in this case), decrease the flow along each edge by the bottleneck value. A residual edge is in the opposite direction for every edge of the augmenting path
 Residual edges exist so that we can “undo” bad augmenting path decisions that don’t lead to maximum flow
 Every edge in the graph also starts out with a residual edge with flow/capacity of 0/0
 Residual Graph: The graph which also contains residual edges along with the regular edges
 The algorithm continues finding augmenting paths and augments the flow until no more augmenting paths from s to t exist.
 The sum of the bottlenecks fond in each augmenting path is equal to the max flow.
 Time complexity is dependent on how we find the augmenting paths
 If augmenting paths are found with DFS then the time complexity is $O(fE)$, where $f$ is maximum flow and $E$ is number of edges
Other Implementations of FordFulkerson
 EdmondsKarp: Use BFS to find augmenting paths, $O(E^2 V)$
 Capacity Scaling: Heuristic to pick larger paths first, $O(E^2 \log(U))$
 Dinic’s Algorithm: Uses combination of DFS + BFS to find augmenting paths, $O(V^2 E)$
Unweighted Bipartite Graph Matching  Network Flow
 Bipartite Graph: Vertices can be split into two independent groups U, V such that every edge connects between U and V.
 Other definitions: The graph is two colorable or there is no cycle with an odd length
 Maximum Cardinality Bipartite Matching (MCBM): When we’ve maximized the pairs that can be matched with each other. What we’re usually interested in
Planar Graphs
 Planar Graph: Graph that can be drawn on one plane without any edges crossing/intersecting.
Euler’s Formula for Connected Planar Graphs
$$ v  e + r = 2$$ where v is the number of vertices, e is the number of edges, and r is the number of regions.
Proof
 Case 1: Graph is tree so use the fact that v=n and e=n1
 Case 2: Removing edges until there is only one region turns the graph into a tree so that there are only $e(r1)$ edges
Other Properties
 $3r \le 2e$
 $e \le 3v  6$
Image Credits
 David Eppstein
 Maksim
 Dcoetzee
 Rest of the graph diagrams based on Mr. Fiset’s videos