
Distributed Transactions

Stephen M. Reaves :: 2023-02-20

Notes about Lesson 9 of CS-7210

Required Readings

Optional

Summary

What are Distributed Transactions

Group of operations that are applied together, over multiple nodes

ACID properties

Useful for concurrency and fault tolerance

Outcomes are commit or abort

Typically done with a coordinator (leader)

Spanner Brief

Data is geographically distributed

Data is sharded within a location

Data is replicated across multiple sites within a location

Spanner Stack

3 Colossus FS nodes, which are optimized for reads and appends

Bigtable on top of Colossus, which exports files into an application-specific data model

Data model is grouped into tablets, which are groups of related objects

Megastore, which is a Replicated State Machine (w/ Paxos) per tablet

Consistency Requirements

Read operations operate on a replica of the state, not the current state, to prevent blocking

External consistency := the order of events in the system must match the order in which the events appeared on a real-world clock

Requires strict serializability and a common global clock

True Time

TT != absolute real time

TT is an uncertainty interval around real time

You can define TT.now(), which returns an interval [earliest, latest] guaranteed to contain the current real time

Uncertainty is kept small by periodic probing of master clock servers in the datacenter

An externally ordered operation then uses these bounds as follows (see the sketch after the list):

  1. Acquire locks
  2. Pick time s = TT.now().latest
  3. Do operation
  4. Wait until TT.now().earliest > s
  5. Release locks

Commit wait may be much longer than the actual operation
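A minimal sketch of the interval API and the commit-wait step in Python, assuming a toy `TrueTime` class whose `now()` mirrors the `TT.now().earliest` / `TT.now().latest` calls above (the 7 ms uncertainty is an illustrative value, not a Spanner figure):

```python
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    earliest: float  # lower bound on real time (seconds since epoch)
    latest: float    # upper bound on real time (seconds since epoch)

class TrueTime:
    """Toy stand-in for TrueTime: now() returns an interval assumed to
    contain the actual real time."""
    def __init__(self, uncertainty_s: float = 0.007):
        self.eps = uncertainty_s  # assumed clock uncertainty

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(earliest=t - self.eps, latest=t + self.eps)

def commit_wait(tt: TrueTime, s: float) -> None:
    """Step 4 above: block until s is guaranteed to be in the past everywhere."""
    while tt.now().earliest <= s:
        time.sleep(0.001)
```

With this scheme the wait is roughly twice the clock uncertainty, which is why commit wait can dominate a short operation.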

Ordering Write Transactions with TrueTime Timestamps

Pessimistic Locking := acquiring all locks upfront to prevent later conflicts

On a single replica set (sketched below):

  1. Acquire Locks
  2. Pick TT Timestamp s
  3. Leader starts consensus
  4. Achieve consensus
  5. Commit wait done
  6. Release locks
  7. Notify slaves
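Continuing the sketch above (reusing `commit_wait`), the leader-side flow might look like the following; `locks`, `paxos_propose`, and `notify_followers` are hypothetical placeholders, not Spanner's actual interfaces:

```python
def leader_write(tt, locks, keys, mutation, paxos_propose, notify_followers):
    """Leader-side write on a single replica set (sketch only)."""
    locks.acquire(keys)              # 1. acquire locks
    s = tt.now().latest              # 2. pick TrueTime timestamp s
    paxos_propose(mutation, s)       # 3-4. leader drives consensus on (mutation, s)
    commit_wait(tt, s)               # 5. commit wait done
    locks.release(keys)              # 6. release locks
    notify_followers(mutation, s)    # 7. notify slaves to apply at timestamp s
    return s
```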

For transactions across multiple replica sets, we add 2PC (coordinator sketch after the list):

  1. Each replica acquires locks
  2. Each replica picks TT Timestamp s
  3. Transaction starts executing as before
  4. Each replica set starts logging updates they are to perform
  5. Once done logging, each node sends its s to the transaction coordinator, which computes the overall timestamp s for the transaction
  6. Commit wait
  7. Release locks on transaction coordinator
  8. Notify participants
  9. Participants release locks
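A coordinator-side sketch of the combined flow, under the same assumptions; each participant is assumed to expose hypothetical `prepare()` / `commit()` hooks, and abort handling on a failed prepare is omitted:

```python
def coordinated_write(tt, coordinator_locks, coordinator_keys, participants, txn):
    """2PC over TrueTime-timestamped replica sets (sketch only)."""
    # 1-4. each participant acquires locks, picks its own TrueTime timestamp,
    #      and logs the updates it is to perform
    prepare_timestamps = [p.prepare(txn) for p in participants]
    s = max(prepare_timestamps)                   # 5. overall transaction timestamp
    commit_wait(tt, s)                            # 6. commit wait
    coordinator_locks.release(coordinator_keys)   # 7. coordinator releases its locks
    for p in participants:
        p.commit(txn, s)                          # 8-9. participants apply at s, then release their locks
    return s
```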

Read Transactions

2 types of read transactions: Read Now and Read Later

Read Now

Still need to be externally ordered

Use leader info to determine a safe timestamp

Potentially delay until its safe time is reached

Prepared but not committed transactions delay reads
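A sketch of the safe-time check, with hypothetical `replica.last_applied_ts`, `replica.prepared_ts()`, and `replica.snapshot_read()` accessors standing in for the leader-provided state:

```python
import time

def safe_timestamp(last_applied_ts, prepared_ts_list):
    """t_safe: reads at or below this timestamp can be served.  Prepared but
    uncommitted transactions hold it back, since they may still commit at a
    timestamp the read would otherwise miss (sketch)."""
    t_safe = last_applied_ts
    if prepared_ts_list:
        t_safe = min(t_safe, min(prepared_ts_list) - 1e-6)  # stay strictly below
    return t_safe

def read_now(tt, replica, keys):
    t_read = tt.now().latest   # pick an externally ordered read timestamp
    # potentially delay until the replica's safe time reaches t_read
    while safe_timestamp(replica.last_applied_ts, replica.prepared_ts()) < t_read:
        time.sleep(0.001)
    return replica.snapshot_read(keys, t_read)
```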

Read Later

Simply read at a specific timestamp

The timestamp saves us from needing a distributed cut algorithm
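By contrast, a Read Later is just a snapshot read at the supplied timestamp (same hypothetical `snapshot_read` as above):

```python
def read_later(replica, keys, t_snapshot):
    """Read at a caller-supplied timestamp in the past: the timestamp alone
    fixes a consistent cut, so no waiting or coordination is needed (sketch)."""
    return replica.snapshot_read(keys, t_snapshot)
```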

No TrueTime

GPS + atomic clocks => few ms uncertainty

What if we don’t have GPS + atomic clocks?

NTP => ~100ms uncertainty

Make it visible when/if external consistency is needed, and only perform the expensive wait when it is

[Cockroach DB](https://www.cockroachlabs.com/docs/stable/architecture/transaction-layer.html) does this
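A sketch of the general "pay for uncertainty only when needed" idea: a read proceeds with no wait unless it encounters a value whose timestamp falls inside the reader's clock-uncertainty window, and only then does it wait and retry. This illustrates the principle, not CockroachDB's exact mechanism; `latest_version_ts`, `read`, and `max_clock_offset` are hypothetical:

```python
def read_with_lazy_uncertainty(store, key, read_ts, max_clock_offset, wait_fn):
    """Common case: no wait.  Only when a newer value lands inside the
    uncertainty window (read_ts, read_ts + max_clock_offset] is the ordering
    ambiguous, and only then do we pay the expensive wait (sketch)."""
    value_ts = store.latest_version_ts(key)
    if value_ts is not None and read_ts < value_ts <= read_ts + max_clock_offset:
        wait_fn(max_clock_offset)   # expensive path, taken only when ordering is ambiguous
        read_ts = value_ts          # retry at the ambiguous value's timestamp
    return store.read(key, read_ts)
```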

Snapshot isolation (or Multi-Version Concurrency Control) allows transactions to be interleaved, even when operating on the same data
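A toy multi-version store makes the interleaving concrete: writers only ever add new versions, and each reader sees the newest version at or below its own snapshot timestamp. This is a sketch of the storage side of MVCC, not any particular product's implementation:

```python
class MVCCStore:
    """Toy multi-version key-value store (sketch)."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ascending by commit_ts

    def write(self, key, value, commit_ts):
        # writes never overwrite in place; assumes commit timestamps per key
        # arrive in increasing order
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, read_ts):
        newest = None
        for commit_ts, value in self.versions.get(key, []):
            if commit_ts <= read_ts:
                newest = value   # keep the newest qualifying version
            else:
                break
        return newest
```

Two transactions can interleave on the same key because each reads from its own snapshot timestamp.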

To guarantee correctness:

Another Distributed DB Example: AWS Aurora

Follows Primary-Replica architecture

Primary handles reads and writes, multiple replicas handle reads only

Designed for availability

Replicated across 3 zones, 2 nodes per zone

6 replicas => I/O amplification (each write turns into 6 writes)

Log replication over S3 solves this
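A back-of-the-envelope illustration of the amplification argument; the page and log-record sizes are made-up values for illustration, not Aurora's actual figures:

```python
PAGE_SIZE_BYTES = 8 * 1024    # assumed size of a full data page
LOG_RECORD_BYTES = 128        # assumed size of the redo log record for the same change
REPLICAS = 6                  # 3 zones x 2 nodes per zone

page_replication_io = REPLICAS * PAGE_SIZE_BYTES   # mirroring whole pages to every replica
log_replication_io = REPLICAS * LOG_RECORD_BYTES   # shipping only the log record

print(f"page-based: {page_replication_io} B per write, "
      f"log-based: {log_replication_io} B per write "
      f"(~{page_replication_io // log_replication_io}x less)")
```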