Synchronization
Summary
Primitives
A lock makes sure its holder is the only thing accessing a shared piece of data
- Exclusive lock
- One at a time
- Shared lock
- Read only access
Barrier Synchronization blocks all threads until every one of them reaches a particular point in execution.
R/W locks allow one writer xor multiple readers
Atomic Operations
We need to be able to Read-Modify-Write atomically in order to implement locks correctly
- Test-and-Set
- Fetch-and-Increment
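A minimal sketch of the semantics of these two primitives, assuming C11 <stdatomic.h> (real hardware exposes them as single instructions); later pseudocode in these notes uses the same names:

#include <stdatomic.h>

enum { unlocked = 0, locked = 1 };

// Atomically set *L to locked and return its previous value
static inline int test_and_set(atomic_int *L) {
    return atomic_exchange(L, locked);
}

// Atomically increment *v and return its previous value
static inline int fetch_and_inc(atomic_int *v) {
    return atomic_fetch_add(v, 1);
}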
Scalability Issues
Latency
- Accessing shared data now carries the extra cost of acquiring the lock, even when the lock is free
Waiting Time
- Accessing shared data may require waiting for the current holder to release the lock
Contention
- Once the lock is released, there may be a “fight” between multiple threads to acquire the lock
- How long does it take for one to be a winner?
Spinlock
Naive Spinlock
Spin on Test and Set
L can either be locked or unlocked
while (test_and_set(L) == locked);
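As a complete acquire/release pair, a minimal sketch assuming the test_and_set helper and locked/unlocked constants above (spin_lock/spin_unlock are illustrative names):

void spin_lock(atomic_int *L) {
    while (test_and_set(L) == locked);   // every attempt is an atomic RMW on the lock word
}

void spin_unlock(atomic_int *L) {
    atomic_store(L, unlocked);
}

Every waiter hammers the lock word with atomic read-modify-writes, which is exactly the traffic the caching spinlock below tries to reduce.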
Caching Spinlock
Spin on read
In cache-coherent systems, the hardware keeps cached copies consistent after memory writes, so threads can spin on their locally cached value of the lock.
- Still need to do a test_and_set once the cached value shows the lock is free
do {
    while (L == locked);                 // spin locally on the cached copy
} while (test_and_set(L) == locked);     // someone else may win the race; retry
In write-invalidate cache-coherence systems, releasing the lock can trigger on the order of n*n bus operations, where n is the number of waiting processors
- Every waiter’s test_and_set invalidates everybody else’s cached copy, forcing them all to re-read the lock
Spinlocks With Delay
while (L == locked || test_and_set(L) == locked) {
    sleep(rand() % 5); // or whatever number
}
Exponential backoff: each time the acquire attempt fails, double the delay before trying again
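A minimal sketch of spin-with-delay using exponential backoff, again assuming the C11 helpers above; the starting delay and cap are illustrative values, not from the notes:

#include <unistd.h>   // usleep

void backoff_lock(atomic_int *L) {
    useconds_t delay = 1;                 // illustrative starting delay (microseconds)
    while (atomic_load(L) == locked || test_and_set(L) == locked) {
        usleep(delay);
        if (delay < 1024)                 // cap the backoff somewhere reasonable
            delay *= 2;                   // exponential backoff
    }
}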
Ticket Lock
More concerned with fairness
Assign tickets like at restaurants
struct lock {
    int next_ticket;
    int now_serving;
};

void acquire_lock(struct lock *L) {
    int my_ticket = fetch_and_inc(&L->next_ticket);  // take the next ticket
    while (L->now_serving != my_ticket) {
        sleep(my_ticket - L->now_serving);           // proportional backoff
    }
}

void release_lock(struct lock *L) {
    L->now_serving++;                                // wake up the next ticket holder
}
Spinlock Summary
- Read + test_and_set
- not fair
- test_and_set with delay
- not fair
- Ticket lock
- Fair, but noisy: every release of now_serving is seen by all waiters
Ideally we would only signal one thread and not have all threads fight at once.
Queuing Lock
Array Based Queuing Lock
Associated with each lock is an array of flags used as a circular queue (ring buffer)
- One element in array for each cpu
- Two states
- has-lock
- must-wait
- Slots are not statically assigned to processors
- Assigned first come, first served
- The lock also has a queue_last variable pointing to the next open slot
void lock(L l) {
    int my_place = fetch_and_inc(&l->queue_last);   // grab the next slot in the ring
    while (l->flags[my_place % N] == must_wait)
        ;                                           // spin only on my own slot
}

void unlock(L l) {                                  // current = my_place from this thread's lock()
    l->flags[current % N] = must_wait;              // recycle my slot for a future requester
    l->flags[(current + 1) % N] = has_lock;         // signal exactly the next waiter
}
Only one atomic operation per critical section
First come, first serve
No contention, since each thread spins on its own distinct my_place slot
- only one waiter is signaled on unlock
Need O(N) memory for each lock in algorithm
- where N is number of possible threads
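Putting the pseudocode above together, a runnable C11 sketch of the array-based queuing lock; the struct layout, the fixed N, and returning my_place from acquire so the caller can hand it back to release are assumptions for illustration:

#include <stdatomic.h>

#define N 64                                        // max number of threads (assumed)

enum slot_state { must_wait, has_lock };

struct qlock {
    _Atomic enum slot_state flags[N];
    atomic_int queue_last;
};

void qlock_init(struct qlock *l) {
    for (int i = 0; i < N; i++)
        atomic_store(&l->flags[i], must_wait);
    atomic_store(&l->flags[0], has_lock);           // first arrival gets the lock immediately
    atomic_store(&l->queue_last, 0);
}

// Returns my_place so the caller can pass it back to qlock_release()
int qlock_acquire(struct qlock *l) {
    int my_place = atomic_fetch_add(&l->queue_last, 1);
    while (atomic_load(&l->flags[my_place % N]) == must_wait);   // spin on my own slot only
    return my_place;
}

void qlock_release(struct qlock *l, int my_place) {
    atomic_store(&l->flags[my_place % N], must_wait);            // recycle my slot
    atomic_store(&l->flags[(my_place + 1) % N], has_lock);       // hand off to exactly one waiter
}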
Linked List Based Queuing Lock
Each requester waiting on the lock contributes its own queue node
typedef struct node {
    bool got_it;
    struct node *next;
} node;
The head of the queue is a dummy node that points to nil when the lock is free
The dummy node always points to the last requester; the currently running node points to its next waiter, which may or may not be the last requester
Waiting nodes spin on their own got_it flag. When a node is done, it sets next->got_it = true;
// How to make this atomic?
void join(L l, node *me) {
    l->last_requester->next = me;   // link me behind the current tail
    l->last_requester = me;         // make me the new tail
}
/*
 * atomic_fetch_and_store(l, me): atomically store me in l and
 * return what was previously stored in l:
 *     node *ret = l;
 *     l = me;
 *     return ret;
 */
void lock(L l, node *me) {
    me->got_it = false;
    me->next = NULL;
    join(l, me);                    // must use atomic_fetch_and_store to be safe
    while (me->got_it == false);    // spin on my own flag
}
// Race condition: a new requester may have joined but not yet linked itself in
void unlock(L l, node *curr) {
    if (curr->next == NULL) {
        // atomic: only reset the lock if curr is still the last requester
        if (comp_and_swap(&l->last_requester, curr, NULL) == false) {
            // CAS failed: someone joined; wait for the link, then signal them
            while (curr->next == NULL);
            curr->next->got_it = true;
        }
    } else {
        curr->next->got_it = true;
    }
    free(curr);   // assuming the node was heap-allocated by the requester
}
Fair lock
Unique spin location
- lowers contention
Mostly 1 atomic operation per critical section
- edge-case where second atomic operation is needed when queue is empty
Space complexity is proportional to the number of threads actively waiting on the lock
- better than the array-based queuing lock, which always needs O(N)
Linked-list maintenance can be a chore (extra pointer manipulation on every acquire and release)
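For reference, a compact C11 sketch of the whole linked-list (MCS-style) queuing lock described above; atomic_exchange and atomic_compare_exchange_strong stand in for the fetch_and_store and comp_and_swap primitives, and all names are illustrative:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic bool got_it;
    struct qnode *_Atomic next;
} qnode;

typedef struct {
    qnode *_Atomic last_requester;       // NULL when the lock is free
} list_lock;

void list_lock_acquire(list_lock *l, qnode *me) {
    atomic_store(&me->got_it, false);
    atomic_store(&me->next, NULL);
    // fetch_and_store: atomically make me the last requester
    qnode *prev = atomic_exchange(&l->last_requester, me);
    if (prev == NULL)
        return;                          // queue was empty: we own the lock
    atomic_store(&prev->next, me);       // link myself behind the old tail
    while (!atomic_load(&me->got_it));   // spin on my own flag
}

void list_lock_release(list_lock *l, qnode *me) {
    if (atomic_load(&me->next) == NULL) {
        // comp_and_swap: if nobody has joined, reset the lock to empty
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->last_requester, &expected, NULL))
            return;
        // someone joined but has not linked in yet; wait for the link
        while (atomic_load(&me->next) == NULL);
    }
    qnode *successor = atomic_load(&me->next);
    atomic_store(&successor->got_it, true);   // signal exactly the next waiter
}

Each thread passes in its own qnode (typically allocated on its stack), which is what keeps the space usage proportional to the number of active waiters.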