Reaves.dev

v0.1.0

built using

Phoenix v1.7.12

Consensus In Distributed Systems

Stephen M. Reaves

::

2023-02-06

Notes about Lesson 5 of CS-7210

Required Readings

Summary

What is a Consensus

Consensus := the ability for multiple separate processes in a distributeds system to agree on something.

Consensus is needed to agree on the outcome of a transaction.

3 key properties:

System Model

  1. System is async
    • Messages may be reorded and delayed, but not corrupted
  2. At most one faulty processor
  3. Fail-stop failure model
    • All failures are indistinguishable from a message delay

Real systems are more complex

Definitions

Admissable Run := a run with 1 faulty processor and all messages eventually delivered (matches system model)

Deciding run := Admissable run where some non-faulty processor reaches a decision

Totally Correct Consensus Protocol := a system where all admissable runs are deciding runs

Uni-valent configuration := a deciding run where only one decision is possible

FLP Theorem

Dijkstra award for Proving the impossibility of consensus using async communication.

In a system with one fault, no consensus protocol can be totally correct.

Proof

Proof in a nutshell

Start with a system where nodes can either decide on 0 or 1.

Lemma 2: There is an initial configuration for which the final decision is not predetermined, but depends on the schedule of events (initial bivalent configuration)

There must be an event that changes from a bivalent to univalent configuration.

It is possible to delay/reorder messages in such a way so that the system never goes through the valence transition.

Is Consensus Really Impossible?

Faults are inevitable

Network delays are inevitable

Cannot expect a stronger system model

=> Impossible to have a correct distributed system?

There are examples of correct distributed systems:

These do NOT contradict the FLP result, rather they change some of the assumptions and system prooperties

Under which conditions will it provide us consensus?