State In Distributed Systems
Required Readings
Summary
- Terminology
- Challenges
- System model
- Finding a Consistent Cut
- Global State
- Properties of a Global State
- Benefits of a Global State
Terminology
Distributed System := a collection of nodes and the communication channels between them
Nodes communicate by sending messages and the sending/receiving of a
message constitutes and Event
.
Process state := most recent event at that node
Channel state := in flight messages
System/Global state := collection of state from all of the nodes/processes and channels
Every event changes the state of at least one entity, which corresponds to a change of state in the entire system.
State := point in time of execution
Run := a subsequence of events
A run may be actual or observed
Actual Run := exactly what happened
Observed Run := what could have potentially happened
Cut := vertical slice of state at a given time
Prerecording Event := an event that happens before a cut
Postrecording Event := an event that happens after a cut
Consistent Cut := a cut where for all prerecording events , if , then must also be prerecording.
Challenges
Instaneous recordings are not possible
- No global clock
- Random network delays
- Non-deterministic computation
System model
Processes are vertices
Channels are directed, FIFO, error-free edges
Finding a Consistent Cut
Chandy & Lamport Algorithm
Initiator node := the node that starts the algorithm snapshot
Initiator node:
- Saves local state
- Sends marker token on all outgoing edges
All other processes:
- On receiving the first marker on any incoming edge
- Save state and propogate markers on all outgoing edge
- Resume execution, but also save incoming messages until a marker
arrives through the channel
- Marks all messages between first marker and marker for channel as in-flight
Assumptions
- No failures
- Messages arrive intact and only once
- Channels are unidirectional and FIFO
- Snapshot algorithm does not interfere with normal execution of processes
- Each process records its local state and state of all incoming channels
- Including enough storage to hold state
Global State
Chandy & Lamport Algorithm Features
- Does NOT promise to give us exactly what is there
- Does give consistent state
Recorded global state does not necessarily correspond to any real global state, only possible global state.
Permutations of possible global state are ok, as long as they don’t break causal relationships.
Properties of a Global State
stateDiagram-v2
Initial --> S
Initial --> S1
Initial --> S*
S --> Final
S1 --> Final
S* --> Final
If the state recorded (S*
), and S*
is reachable from Initial
and Final
is reachable from S*
, then S*
is a possible global state
Benefits of a Global State
Allows us to determine stable properties.
Stable Property := a property that once it becomes true for a state
S
, it remains true for all reachable statesS'
.
Unstable Property := a property that makes no gaurantee about truthiness across states
Examples of Stable Properties
- Deadlock
- Token loss
- Termination
Examples of Unstable Properties
- Buffer overflow
- Load spike
- Race condition