Consistency In Distributed Data Stores
Required Readings
Summary
- Why is Consistency Important and Hard
- Key Value Store
- Memcached
- Look-Aside Cache Design
- Mechanisms in Memcached
- Causal Plus Consistency
Why is Consistency Important and Hard
Consistency is important because a distributed system should appear as one logical entity, but it is hard because the system maintains multiple copies of state, and those copies may be out of sync at any given time.
A consistency model makes guarantees about the ordering of updates in the system and about how those updates become visible to read operations.
Example consistency models:
- Strong Consistency
- guarantees that all reads observe the real (wall-clock) order of writes
- Sequential Consistency
- guarantees a single ordering of all the writes will be seen by all of the participants
- Not necessarily the real ordering
- Causal Consistency
- Only enforces the happens-before relationship
- No guarantees on concurrent events
- Eventual Consistency
- There are periods where read operations are not up to date, but all writes will become visible as long as partitions/failures are not permanent
The models above are ordered from most consistent to most available.
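To make the availability side of that trade-off concrete, here is a minimal Python sketch (illustrative only, not any real store's API) of the stale-read window under eventual consistency: two replicas with asynchronous propagation.

```python
# Two replicas with asynchronous propagation: reads from replica_b may be
# stale until the write propagates -- exactly the window eventual
# consistency permits, while still guaranteeing eventual convergence.
replica_a = {}
replica_b = {}
pending = []  # writes queued for async replication to replica_b

def write(key, value):
    replica_a[key] = value        # write lands on one replica first
    pending.append((key, value))  # replication happens later

def replicate():
    while pending:
        k, v = pending.pop(0)
        replica_b[k] = v          # all writes eventually become visible

write("x", 1)
print(replica_a.get("x"), replica_b.get("x"))  # 1 None  <- stale read window
replicate()
print(replica_a.get("x"), replica_b.get("x"))  # 1 1     <- converged
```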
Key Value Store
Like a hash map, but distributed: the interface is GET, SET, and DELETE on opaque keys and values
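A quick Python sketch of that interface (a plain dict stands in for the networked, sharded store):

```python
store = {}                            # a real KV store shards this across machines
store["user:42"] = b"profile-bytes"   # SET
value = store.get("user:42")          # GET -> b"profile-bytes", or None on a miss
store.pop("user:42", None)            # DELETE
```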
Memcached
In-memory KV store
Facebook described its large-scale memcached deployment in a 2013 paper (memcached itself is open source and predates Facebook's use of it)
Largely superseded by TAO, published a few months later
TAO is better suited to graph traversals, which dominate Facebook's workload
Memcached is used in different contexts, both within a cluster and across geographic regions
Look-Aside Cache Design
A workload is a good candidate for caching if:
- it is very read-intensive
- it requires large data capacity
- its hot data has temporal locality
The cache sits beside the DB, not in front of it:
- Explicit GET call to the cache
- Cache replies with a miss
- Explicit call to the DB (using SQL-native semantics)
- Explicit SET call to the cache
Any update to the DB must be followed by a DELETE on the cache (invalidate, never update in place)
Memcached evicts entries based on LRU
The cache is clean, but non-authoritative
The explicit protocol lets client applications decide the consistency-vs-speed trade-off themselves; see the sketch below
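A minimal Python sketch of the look-aside pattern; the DB calls (`db.query`, `db.write`) are hypothetical stand-ins, and the LRU eviction mirrors what memcached does internally.

```python
from collections import OrderedDict

class LookAsideCache:
    """Bounded cache with LRU eviction, sitting beside (not in front of) the DB."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # miss: the caller must go to the DB
        self.data.move_to_end(key)         # refresh LRU position
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

    def delete(self, key):
        self.data.pop(key, None)           # invalidation on DB update

def read(cache, db, key):
    value = cache.get(key)                 # 1. explicit GET on the cache
    if value is None:                      # 2. cache replies with a miss
        value = db.query(key)              # 3. explicit call to the DB
        cache.set(key, value)              # 4. explicit SET on the cache
    return value

def update(cache, db, key, value):
    db.write(key, value)                   # the write goes to the DB first
    cache.delete(key)                      # then invalidate; never update in place
```

Note that the update path deletes rather than sets: invalidation keeps the cache clean even when writers race, at the cost of one extra miss.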
Mechanisms in Memcached
Concurrent DBRead -> DBSet -> CacheSet operations can arrive out of order, which can leave incorrect (stale) values in the cache
Leases are used to fix this
Memcached leases are time-bounded tokens issued on a cache miss; they soft-lock writes to the missed object so only the token holder may SET it (and they address other issues as well)
Limiting the number of tokens issued can also (partially) solve the thundering-herd problem; a sketch follows
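A simplified sketch of lease tokens (illustrative, not memcached's actual wire protocol): the cache hands out a token on a miss, and only a SET presenting a still-valid token is accepted, so a delayed set of stale data is silently dropped.

```python
import time
import uuid

class LeasingCache:
    def __init__(self, lease_ttl=10.0):
        self.data = {}
        self.leases = {}            # key -> (token, expiry)
        self.lease_ttl = lease_ttl

    def get(self, key):
        if key in self.data:
            return self.data[key], None
        # Miss: issue a lease token the client must present on SET.
        # Rate-limiting issuance here also tames thundering herds: clients
        # without a token back off instead of all hitting the DB at once.
        token = uuid.uuid4().hex
        self.leases[key] = (token, time.time() + self.lease_ttl)
        return None, token

    def set(self, key, value, token):
        lease = self.leases.get(key)
        if lease is None or lease[0] != token or lease[1] < time.time():
            return False            # stale or missing lease: reject the set
        del self.leases[key]
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)
        self.leases.pop(key, None)  # invalidation voids outstanding leases
```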
A single MC instance has finite memory, so data is sharded across multiple instances
- shard boundaries can be adjusted
Routing is handled by clients to keep MC simple
- clients use the mcrouter library
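A toy version of client-side routing (mcrouter is far more capable; the shard addresses here are made up): every client hashes the key and deterministically picks a shard, so the servers stay routing-unaware.

```python
import hashlib

SHARDS = ["mc-host-0:11211", "mc-host-1:11211", "mc-host-2:11211"]  # hypothetical

def shard_for(key):
    # Deterministic hash -> shard mapping; all clients agree without coordination.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user:42"))  # every client computes the same shard
```

Plain modulo hashing remaps most keys whenever the shard count changes; consistent hashing is what makes adjusting shard boundaries cheap in practice.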
Multiple MC clusters can be mirrored to improve performance on hot objects and to add failure domains
Invalidations are driven by the DB in commit order
The same consistency mechanisms cannot be used for cross-datacenter clusters
For cross-datacenter clusters, MC expects replication at the storage layer
Cross-datacenter write path, in order:
- Set a remote marker (flag) for the key on the local MC
- Write to the master DB
- DELETE the key from the local MC
- Master DB replicates to the slave DB
- MC deletes the remote marker once replication has landed

Reads that miss and find the remote marker go to the master region for fresh data.
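A Python sketch of the remote-marker protocol (object names like `local_mc` and `master_db` are illustrative):

```python
def write_from_replica_region(key, value, local_mc, master_db):
    marker = "remote:" + key
    local_mc.set(marker, True)         # 1. set the remote marker in local MC
    master_db.write(key, value)        # 2. write to the master DB
    local_mc.delete(key)               # 3. invalidate the local cached value
    # 4. the master DB later replicates to this region's slave DB;
    # 5. the replication/invalidation stream then deletes the remote marker

def read_in_replica_region(key, local_mc, local_db, master_db):
    value = local_mc.get(key)
    if value is not None:
        return value
    if local_mc.get("remote:" + key):  # marker present: local replica may lag
        return master_db.read(key)     # pay cross-region latency for freshness
    return local_db.read(key)          # no marker: the local replica is safe
```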
Causal Plus Consistency
Causal dependencies are not visible at the system level
A client library therefore tracks them and embeds ordering metadata on PUT requests (PUT_AFTER)
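A sketch of a COPS-style client library (the `put_after` name comes from the COPS paper; the store object and its return values are assumptions): the library tracks the versions this client has observed and ships them as explicit dependencies with every write, so replicas apply a write only after its causal predecessors are visible.

```python
class CausalClient:
    def __init__(self, store):
        self.store = store
        self.deps = {}                 # key -> version this client depends on

    def get(self, key):
        value, version = self.store.get(key)  # assumed: returns (value, version)
        if version is not None:
            self.deps[key] = version   # reading creates a causal dependency
        return value

    def put(self, key, value):
        # PUT_AFTER: ship the dependency set along with the write.
        version = self.store.put_after(key, value, dict(self.deps))
        self.deps = {key: version}     # the new write subsumes its prior deps

# Concurrent writes (no happens-before relation) carry unrelated dependency
# sets, so the system makes no ordering guarantee between them.
```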