Reaves.dev

v0.1.0

built using

Phoenix v1.7.20

Strongly Connected Components

Stephen M. Reaves

::

2025-02-13

Graph algorithms to find and use SCCs

Summary

We can use DFS to sort various types of graphs. Some graphs require multiple runs and reversing the graph, but the runtime is still O(n+m)

Contents

Undirected Graphs

How to get connected components in undirected graph G?

  • Run DFS and keep track of component #
DFS(G):
  input: G=(V,E) in adjacency list representation
  output: vertices labelled by connected components
  
  cc=0
  for all v in V, visited(v) = False
  for all v in V:
    if not visited(v):
      cc++
      Explore(v)

Explore(z):
  ccnum(z) = cc
  visited(z) = True
  for all (z,w) in E:
    if not visited(w):
      Explore(w)

Running time is O(n+m) where n=V,m=E O(n+m) \text{ where } n=\lvert V \rvert , m = \lvert E \rvert

Paths

How do I find a path between two connected vertices?

Keep track of predecessor vertex

DFS(G):
  input: G=(V,E) in adjacency list representation
  output: vertices labelled by connected components
  
  cc=0
  for all v in V:
    visited(v) = False
    prev(v) = NULL
  for all v in V:
    if not visited(v):
      cc++
      Explore(v)

Explore(z):
  ccnum(z) = cc
  visited(z) = True
  for all (z,w) in E:
    if not visited(w):
      prev(w) = z
      Explore(w)

Directed Graphs

Use DFS and pre/post-order numbers

DFS(G):
  input: G=(V,E) in adjacency list representation
  output: vertices labelled by connected components
  
  clock = 1
  for all v in V:
    visited(v) = False
    prev(v) = NULL
  for all v in V:
    if not visited(v):
      Explore(v)

Explore(z):
  pre(z) = clock++
  visited(z) = True
  for all (z,w) in E:
    if not visited(w):
      prev(w) = z
      Explore(w)
  post(z) = clock++
GBBAAB->ACCB->CEEB->EDDA->DGGD->GHHD->HFFC->FF->BE->AE->G
GBB1,16AA2,11B->AEE4,7B->ECC12,15B->CDD3,10A->DD->EGG5,6D->GHH8,9D->HE->AE->GH->GFF13,14C->F

For tree (black) edges going from z to w, the post-order number of z will be higher than w

post(z)>post(w) post(z) > post(w)

Of the blue edges:

  • Back edges
    • like EA E \rightarrow A
    • post(z)<post(w) post(z) < post(w)
  • Forward edges
    • like BE B \rightarrow E
    • post(z)>post(w) post(z) > post(w)
      • Like black edges, but order difference > 1
  • Cross edges
    • like HG H \rightarrow G
    • post(z)>post(w) post(z) > post(w)

Back edges are the only ones where post(z)<post(w) post(z) < post(w)

Cycles

G has a cylce iff its DFS tree has a back edge

DAG

No cycles = no back edages

Topologically sorting a DAG = order all vertices so that all edges go lower to higher

Run DFS on G, then order vertices by decreasing post-order number

  • Since the maxiumum post-order number is bounded by 2n, and we have all the numbers, and they’re all distinct, we can sort them in linear time with bucket/radix sort
Source vertex
No incoming edges
Sink vertex
No outgoing edges

A DAG always has at least one source and one sink. Highest post order number is always a source, lowest is a sink.

Alternative way to sort is to repeatedly find sinks and remove them. Then we just output the sinks in reverse order.

Connectivity

Strongly Connected
Vertices v and w are strongly connected if there is a path from v to w and from w to v.

SCCs are the maximal set

GAABBA->BCCB->CDDB->DEEB->EFFC->FGGF->GIIF->IG->CG->FE->BLLE->LJJI->JL->IKKJ->KHHJ->HK->LH->IH->J
Gcluster_acluster_bcluster_ccluster_dcluster_hAABBA->BEEB->ECCB->CDDB->DE->BLLE->LFFC->FGGF->GIIF->IG->CG->FHHH->IJJH->JI->JJ->HKKJ->KK->LL->I
Gcluster_acluster_bcluster_ccluster_dcluster_hAABBA->BEECCB->CDDB->DHHB->HLLFFC->HGGIIJJKK

The metagraph is alwasy a DAG

SCC Algo Idea

Find sink SCCs, output it, remove it, and repeat

We can’t use sources because when we run Explore(v) in a sink SCC, we only see vertices in that SCC. Running Explore(v) on anything else would “leak” other vertices.

In a DAG, the lowest post order number is guaranteed to be a sink. This is not true in general. But the highest post order number is gauranteed to be in a source SCC.

If we reverse all the edges in the graph, then find a source, when we flip the graph back that source is now a sink.

Gcluster_acluster_bcluster_ccluster_dcluster_hAABBB->AEEB->EE->BCCC->BGGC->GFFF->CF->GG->FDDD->BHHJJH->JIII->FI->HLLI->LJ->HJ->IKKK->JL->EL->K

Then run DFS

Gcluster_acluster_dcluster_lCC1,12GG2,5C->GBB6,11C->BFF3,4G->FAA7,8B->AEE9,10B->EDD12,13LL14,23KK15,22L->KJJ16,21K->JHH17,18J->HII19,20J->I

This gives us L,K,J,I,H,D,C,B,E,A,G,F in decreasing post order number.

We start at the left and run DFS on the original graph, removing every node we see. This will output the meta vertices in reverse topological order

SCC(G):
  input: Directed G=(V,E) in adj. list
  
  1. Construct rev_G
  2. Run DFS on rev_G
  3. Order V by decreasing post order
  4. Run DFS (undirected connected components) on the original graph