Synchronization

Stephen M. Reaves :: 2024-02-01

Notes about Lecture 4b for CS-6210

Summary

Primitives

A lock ensures that only one thread at a time accesses a shared piece of data.

Barrier synchronization halts every thread until all of them reach a particular point in execution.

R/W locks allow one writer xor multiple readers: several threads may read concurrently, but a writer needs exclusive access.

Atomic Operations

We need to be able to read-modify-write atomically (e.g., test_and_set, fetch_and_inc) to implement locks correctly.
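
As a minimal sketch of those semantics, here is what the two primitives used throughout these notes do, written as ordinary C; real hardware executes each body as one indivisible instruction (with 1 standing for locked):

// test_and_set: write "locked" and return the previous value, atomically
int test_and_set(int *L) {
  int old = *L;
  *L = 1;          // 1 == locked
  return old;
}

// fetch_and_inc: increment and return the previous value, atomically
int fetch_and_inc(int *i) {
  int old = *i;
  *i = old + 1;
  return old;
}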

Scalability Issues

Latency: how long it takes a thread to acquire the lock when it is free

Waiting Time: how long a thread waits between the lock being released and acquiring it

Contention: how much memory/interconnect traffic competing threads generate while the lock is handed off

Spinlock

Naive Spinlock

Spin on Test and Set

L can either be locked or unlocked

while (test_and_set(L) == locked);
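
A runnable version of the naive spinlock using C11 atomics, where atomic_flag_test_and_set plays the role of test_and_set (a set flag meaning locked):

#include <stdatomic.h>

atomic_flag L = ATOMIC_FLAG_INIT;     // clear == unlocked

void spin_lock(void) {
  // returns the previous value: true means it was already locked
  while (atomic_flag_test_and_set(&L))
    ;
}

void spin_unlock(void) {
  atomic_flag_clear(&L);
}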

Caching Spinlock

Spin on read

In cache-coherent systems, the hardware propagates memory writes to every cache, so waiting threads can spin on their locally cached copy of the lock instead of going to memory on every iteration.

while (L == locked);                       // spin on the cached copy
if (test_and_set(L) == locked) go_back();  // lost the race: start over
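
The same spin-on-read idea as a runnable C11 sketch, with atomic_exchange standing in for test_and_set:

#include <stdatomic.h>

atomic_int L = 0;                  // 0 == unlocked, 1 == locked

void spin_lock(atomic_int *l) {
  for (;;) {
    while (atomic_load(l) == 1)
      ;                            // spin on read (stays in local cache)
    if (atomic_exchange(l, 1) == 0)
      return;                      // lock looked free and we grabbed it
    // someone beat us to it ("go_back"): resume spinning
  }
}

void spin_unlock(atomic_int *l) {
  atomic_store(l, 0);
}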

In a write-invalidate cache-coherence system, a single lock release triggers O(n*n) bus operations: the release invalidates all n cached copies, every waiter re-reads the lock, and each of their test_and_set attempts invalidates the other caches again.

Spinlocks With Delay

while (L == locked || test_and_set(L) == locked) {
  sleep(rand() % 5); // random delay; the bound is arbitrary
}

With exponential delay, each failed attempt lengthens the wait, so traffic falls off as contention rises.
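
A sketch of the delay-with-backoff variant in C11, assuming a hypothetical busy-wait helper delay() in place of sleep() (atomic_exchange again stands in for test_and_set):

#include <stdatomic.h>
#include <stdlib.h>

static void delay(int n) {          // hypothetical helper: busy-wait ~n iterations
  for (volatile int i = 0; i < n; i++)
    ;
}

void backoff_lock(atomic_int *L) {
  int d = 1;                        // current backoff window
  while (atomic_load(L) == 1 || atomic_exchange(L, 1) == 1) {
    delay(rand() % d + 1);          // wait a random slice of the window
    if (d < 1024)
      d *= 2;                       // double the window each failure, capped
  }
}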

Ticket Lock

More concerned with fairness: requests are served in arrival order

Assign tickets like the take-a-number system at a restaurant counter

typedef struct lock {
  int next_ticket;
  int now_serving;
} lock;

void acquire_lock(lock *L) {
  int my_ticket = fetch_and_inc(&L->next_ticket);

  while (L->now_serving != my_ticket) {
    sleep(my_ticket - L->now_serving); // back off in proportion to queue position
  }
}

void release_lock(lock *L) {
  L->now_serving++;
}
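
A runnable C11 version of the same ticket lock, with atomic_fetch_add playing the role of fetch_and_inc (a sketch, not the lecture's code):

#include <stdatomic.h>

typedef struct {
  atomic_int next_ticket;
  atomic_int now_serving;
} ticket_lock;                       // zero-initialize before use

void ticket_acquire(ticket_lock *L) {
  int my_ticket = atomic_fetch_add(&L->next_ticket, 1);
  while (atomic_load(&L->now_serving) != my_ticket)
    ;                                // could also back off proportionally here
}

void ticket_release(ticket_lock *L) {
  atomic_fetch_add(&L->now_serving, 1);
}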

Spinlock Summary

Ideally we would signal exactly one waiting thread on release instead of having every thread fight for the lock at once.

Queuing Lock

Array Based Queuing Lock

Associated with each lock is an array of N flags (a circular queue / ring buffer), one slot per contending thread; each slot reads either has_lock or must_wait.

void lock(L *l, int *my_place) {
  *my_place = fetch_and_inc(&l->queue_last);   // remember our slot for unlock
  while (l->flags[*my_place % N] == must_wait)
    ;                                          // spin only on our own slot
}

void unlock(L *l, int my_place) {
  l->flags[my_place % N] = must_wait;          // reset our slot for reuse
  l->flags[(my_place + 1) % N] = has_lock;     // pass the lock to the next slot
}

Only one atomic operation per critical section

First come, first served

No contention while waiting: each thread spins on its own distinct slot (my_place)

Needs O(N) memory per lock, even when few threads actually contend
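
Putting the pieces together, a runnable C11 sketch of the array-based lock; the name array_lock, the fixed bound N, and the 0/1 flag values are illustrative choices, with atomic_fetch_add standing in for fetch_and_inc:

#include <stdatomic.h>

#define N 64                            /* max simultaneous contenders */

typedef struct {
  atomic_int flags[N];                  /* 1 == has_lock, 0 == must_wait */
  atomic_int queue_last;
} array_lock;

void array_lock_init(array_lock *l) {
  for (int i = 0; i < N; i++)
    atomic_store(&l->flags[i], 0);
  atomic_store(&l->flags[0], 1);        /* first arrival gets the lock */
  atomic_store(&l->queue_last, 0);
}

/* returns my_place; the caller hands it back to release */
int array_lock_acquire(array_lock *l) {
  int my_place = atomic_fetch_add(&l->queue_last, 1);
  while (atomic_load(&l->flags[my_place % N]) == 0)
    ;                                   /* spin on my own slot only */
  return my_place;
}

void array_lock_release(array_lock *l, int my_place) {
  atomic_store(&l->flags[my_place % N], 0);        /* reset my slot */
  atomic_store(&l->flags[(my_place + 1) % N], 1);  /* pass to successor */
}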

Linked List Based Queuing Lock

For each lock, every requester contributes a queue node:

struct node {
  bool got_it;
  struct node *next;
};

Head is a dummy node; it points to nil when no one holds the lock.

The dummy node always points to the last requester. The currently running node points to its successor, which may or may not be the last requester.

Waiting nodes spin on their own got_it flag. When a node is done, it sets next->got_it = true.

// How to make this atomic? Two joiners can race on last_requester;
// the fix is fetch_and_store (below)
void join(L *l, node *me) {
  l->last_requester->next = me;
  l->last_requester = me;
}

/*
 * fetch_and_store: atomically store `me` into l and return what
 * was stored there before.
 *
 * node *fetch_and_store(node **l, node *me) {
 *   node *ret = *l;
 *   *l = me;
 *   return ret;
 * }
 */

void lock(L *l, node *me) {
  join(l, me);
  while (me->got_it == false)
    ;                          // spin on my own flag
}

// The empty-successor check races with a concurrent join,
// so it must be resolved with comp_and_swap
void unlock(L *l, node *curr) {
  if (curr->next == NULL) {
    // atomic: reset the queue only if nobody has joined behind us
    if (comp_and_swap(&l->last_requester, curr, NULL)) {
      free(curr);
      return;
    }
    // CAS failed: a joiner swapped the tail but hasn't set next yet
    while (curr->next == NULL)
      ;
  }
  curr->next->got_it = true;   // hand the lock to our successor
  free(curr);
}

Fair lock: first come, first served

Each waiter spins on a unique location

Mostly one atomic operation per critical section (a second, the comp_and_swap, only on the empty-queue path)

Space complexity is proportional to the number of threads actively waiting on the lock

Linked-list maintenance can be a chore
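
For reference, a runnable C11 sketch in the standard MCS formulation, which drops the dummy head and lets a NULL tail mean the lock is free (a slight variation on the picture above; all names are illustrative, and nodes are caller-provided rather than freed by unlock):

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
  _Atomic bool got_it;
  struct node *_Atomic next;
} node;

typedef struct {
  node *_Atomic last_requester;    /* NULL when the lock is free */
} mcs_lock;

void mcs_acquire(mcs_lock *l, node *me) {
  atomic_store(&me->got_it, false);
  atomic_store(&me->next, NULL);
  /* join: atomically become the tail, remembering the old tail */
  node *prev = atomic_exchange(&l->last_requester, me);
  if (prev == NULL)
    return;                        /* queue was empty: lock is ours */
  atomic_store(&prev->next, me);   /* link the old tail to us */
  while (!atomic_load(&me->got_it))
    ;                              /* spin on our own flag */
}

void mcs_release(mcs_lock *l, node *me) {
  node *succ = atomic_load(&me->next);
  if (succ == NULL) {
    node *expected = me;
    /* if we are still the tail, reset the lock to empty */
    if (atomic_compare_exchange_strong(&l->last_requester, &expected, NULL))
      return;
    /* a joiner swapped the tail but hasn't linked yet: wait for it */
    while ((succ = atomic_load(&me->next)) == NULL)
      ;
  }
  atomic_store(&succ->got_it, true);   /* hand the lock to the successor */
}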