Reaves.dev

v0.1.0

built using

Phoenix v1.7.17

Communication

Stephen M. Reaves

::

2024-02-05

Notes about Lecture 4c for CS-6210

Summary

Barrier Synchronization

All threads wait at a point until all threads reach that point

Centralized Barrier

AKA counting barrier

Need to make sure all threads stop spinning before you reset counter

int count = N;
decrement(count); // atomic
if (count == 0) {
  count = N;
}
else {
  while (count > 0); // Wait for all threads to decrement
  while (count != N); // Wait for counter to reset
}

Sense Reversing Barrier

Like counting barrier, but one less spin per critical section

Share count plus sense

spin on sense reversal

Tree Barrier

A tree of sense reversal barriers

Once one barrier unlocks, you decrement parent barrier and wait for that to unlock

More complexity, but lowers contention on any one specific lock

MCS Tree Barrier (4-ary arrival)

4-ary tree to mark when all processes have reached a barrier

Two data structures associated with every parent

Position in the HC vector is statically determined

MCS Tree Barrier (binary wakeup)

binary tree

root node wakes up two children

Tournament Barrier

N Playerslog2(N) rounds N \text{ Players} \Rightarrow log_2(N) \text{ rounds}

Bracket of 1v1 where winner of each round is fixed

Winner progesses to next round

Everybody waits until tourney is over

Works even if system is NOT a shared memory machines

Dissemination Barrier

N need not be a power of 2

For each round(k), PiPi+2kmodN P_i \rightarrow P_{i+2^k} \mod N

Once a processor has sent and received a message, it knows the round is done as fas as its concerned.

log2(N) \lceil \log_2 (N) \rceil rounds

Communication complexity is O(Nlog(N))

For N=5, 3 rounds

Round 0

Gcluster_orderingp0p0p1p1p0->p1p2p2p1->p2p3p3p2->p3p4p4p3->p4p4:nw->p0:ne

Round 1

Gcluster_orderingp0p0p1p1p2p2p0:se->p2:swp3p3p1:se->p3:swp4p4p2:se->p4:swp3:nw->p0:nep4:nw->p1:ne

Round 2

Gcluster_orderingp0p0p1p1p4p4p0:ne->p4:nwp1->p0p2p2p2->p1p3p3p3->p2p4->p3

After 3 rounds, all are “awake”

No hierarchy

Performance Evaluation

Which spin algorithm are fastest on which architectures? What about barrier?