Support For Data Center Based Distributed Computing
Required Readings⌗
- Challenges and Solutions for Fast Remote Persistent Memory Access
- LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation
Optional⌗
Summary⌗
- Datacenter Trends
- What is RDMA
- RDMA Specialized RPC
- What If Memory Is Persistent
- Disaggregation
- Disaggregating CPU and Memory with LegoOS
- LegoOS Select Experimental Result
Datacenter Trends⌗
The slowdown of Moore’s law is pushing compute toward specialization and heterogeneity:
- GPUs
- TPUs
- New Memory/Storage classes
Disaggregation := independently scaled tiers of different resources
What is RDMA⌗
Remote Direct Memory Access
Bypasses CPU involvement in data access: the NIC moves data directly over the interconnect
Higher bandwidth and lower latency, but higher cost per port
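Two-sided RDMA
- Traditional send/recv semantics, like sockets; both sides' CPUs participate
One-sided RDMA
- The initiating CPU reads/writes remote memory directly, without involving the remote CPU
- That memory must be registered (pinned) so it cannot be swapped out to disk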
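The contrast between the two modes can be sketched as a toy Python model. Everything here is hypothetical: `RemoteNode`, `register`, `handle_recv`, and `read_remote` are illustrative names, not the ibverbs API, and a real NIC enforces registration in hardware rather than in a Python check. The only point is that the two-sided path runs code on the remote CPU, while the one-sided path touches registered memory without it.

```python
"""Toy model contrasting two-sided and one-sided RDMA-style access.

Hypothetical names throughout; this is NOT the ibverbs API.
"""


class RemoteNode:
    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)
        self.registered: list[range] = []  # "pinned" regions the NIC may access
        self.cpu_requests_handled = 0      # counts remote-CPU involvement

    def register(self, start: int, length: int) -> None:
        # Models memory registration: the region is pinned and exposed to the NIC.
        self.registered.append(range(start, start + length))

    def handle_recv(self, offset: int, length: int) -> bytes:
        # Two-sided: the remote CPU receives the request and builds the reply.
        self.cpu_requests_handled += 1
        return bytes(self.memory[offset:offset + length])


def read_remote(node: RemoteNode, offset: int, length: int) -> bytes:
    """One-sided read: reads node.memory directly, never calls the remote CPU."""
    end = offset + length
    if not any(offset in r and end - 1 in r for r in node.registered):
        raise PermissionError("region not registered (pinned) for remote access")
    return bytes(node.memory[offset:end])
```

Both calls return the same bytes, but only `handle_recv` increments `cpu_requests_handled`, and `read_remote` refuses any region that was never registered.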
RDMA Specialized RPC⌗
One-sided RDMA is faster, but applications must be redesigned to use it
- Plus, an RPC typically invokes a service on the remote side, so the remote CPU is involved anyway
Instead, we can build a new type of RPC that leverages RDMA features:
- Connectionless protocols
- Shared receive queues
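A minimal sketch of how those two features fit together, assuming a simplified in-process model: `SharedReceiveQueue` and `RpcServer` are hypothetical names, not a real RDMA library. Requests arrive as datagrams tagged with their source, so the server keeps no per-client connection state. All clients draw from one bounded pool of posted receive buffers.

```python
from collections import deque
from typing import Callable


class SharedReceiveQueue:
    """One pool of posted receive buffers shared by all clients (hypothetical)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.pending: deque = deque()  # (source address, payload) pairs

    def deliver(self, src: str, payload: bytes) -> bool:
        # Connectionless: each datagram carries its source address; the server
        # keeps no per-client connection state.
        if len(self.pending) >= self.depth:
            return False  # no free buffer, so the datagram is dropped
        self.pending.append((src, payload))
        return True


class RpcServer:
    def __init__(self, srq: SharedReceiveQueue,
                 handler: Callable[[bytes], bytes]) -> None:
        self.srq = srq
        self.handler = handler

    def poll(self) -> list:
        """Drain the shared queue and return (destination, reply) pairs."""
        replies = []
        while self.srq.pending:
            src, payload = self.srq.pending.popleft()
            replies.append((src, self.handler(payload)))
        return replies
```

The design choice being illustrated: with per-connection receive queues, the number of posted buffers grows with clients × depth. A shared queue stays at `depth` however many clients there are, at the cost of drops when it fills.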
What If Memory Is Persistent⌗
Intel Optane was a byte-addressable persistent memory (PMEM) technology
Persistent writes must be flushed from volatile buffers (e.g., caches) into persistent memory
- The flush must complete before the client is acknowledged
- This extra flush step erodes one-sided RDMA's advantage over send/recv RPC
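The ordering rule can be illustrated with a toy model, assuming writes first land in a volatile buffer and only a flush makes them durable. `PmemServer` and `persistent_put` are hypothetical. The model ignores the network entirely; it only shows why the acknowledgement has to wait for the flush.

```python
class PmemServer:
    """Toy persistent-memory server: writes land in a volatile buffer first."""

    def __init__(self) -> None:
        self.volatile: dict = {}    # stands in for caches / in-flight buffers
        self.persistent: dict = {}  # survives a crash

    def write(self, addr: int, data: bytes) -> None:
        self.volatile[addr] = data

    def flush(self) -> None:
        # Models the flush step (on x86, e.g. cache-line write-back plus a fence).
        self.persistent.update(self.volatile)
        self.volatile.clear()

    def crash(self) -> None:
        self.volatile.clear()  # anything not yet flushed is lost


def persistent_put(server: PmemServer, addr: int, data: bytes) -> str:
    server.write(addr, data)
    server.flush()  # must complete BEFORE the client is acknowledged
    return "ACK"
```

If a server acknowledged right after `write` and crashed before `flush`, the client would believe data was durable that no longer exists.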
Disaggregation⌗
Pools of network-attached, independently scaled resources
Not new, but easier with faster interconnects and smarter devices
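One way to see why independent scaling matters is a small provisioning calculation. This is an idealized sketch under stated assumptions: the job sizes and the 16-core / 64 GB unit are made-up illustration values. `monolithic` uses simple first-fit packing. `disaggregated` ignores fragmentation and lets one job's memory span several blades.

```python
import math

Job = tuple  # (cores, memory in GB)


def monolithic(jobs: list, cores: int, mem_gb: int) -> tuple:
    """First-fit jobs onto servers that bundle CPU and memory.

    Returns total (cores, GB) provisioned. Assumes every job fits in one server.
    """
    free = []  # [free_cores, free_mem] per server
    for c, m in jobs:
        for s in free:
            if s[0] >= c and s[1] >= m:
                s[0] -= c
                s[1] -= m
                break
        else:
            free.append([cores - c, mem_gb - m])
    return len(free) * cores, len(free) * mem_gb


def disaggregated(jobs: list, cores: int, mem_gb: int) -> tuple:
    """Size CPU and memory pools independently from total demand (idealized)."""
    cpu_blades = math.ceil(sum(c for c, _ in jobs) / cores)
    mem_blades = math.ceil(sum(m for _, m in jobs) / mem_gb)
    return cpu_blades * cores, mem_blades * mem_gb
```

For four memory-heavy jobs of 2 cores and 48 GB each, the monolithic layout needs four whole servers (64 cores, 256 GB), leaving most cores stranded. The pooled layout provisions 16 cores and 192 GB.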
Disaggregating CPU and Memory with LegoOS⌗
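Move the virtual memory system (address translation, page management) from the processor to the memory components
Cache misses now go over the network, which is much slower (higher latency) than local memory
- Mitigated by a large DRAM-based “Extended Cache” (ExCache) at the processor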
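The effect of the extended cache can be sketched with a toy LRU model plus the standard average-memory-access-time formula. `ProcessorComponent` is a hypothetical name, and LRU is an assumed replacement policy chosen for simplicity; this is not LegoOS's actual ExCache design. A dict stands in for the remote memory component, so each miss represents a network round trip.

```python
from collections import OrderedDict


class ProcessorComponent:
    """Hypothetical processor with an LRU 'ExCache' in front of remote memory."""

    def __init__(self, excache_pages: int, remote: dict) -> None:
        self.capacity = excache_pages
        self.excache: OrderedDict = OrderedDict()
        self.remote = remote  # stands in for the memory component
        self.hits = 0
        self.misses = 0

    def load(self, page: int) -> bytes:
        if page in self.excache:
            self.hits += 1
            self.excache.move_to_end(page)  # mark as most recently used
            return self.excache[page]
        self.misses += 1  # here a real system would go over the network
        data = self.remote[page]
        self.excache[page] = data
        if len(self.excache) > self.capacity:
            self.excache.popitem(last=False)  # evict least recently used
        return data


def amat(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns
```

Because the miss penalty is a network round trip, even a modest cut in miss rate dominates `amat`. This is the motivation for making the processor-side cache large.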
LegoOS Select Experimental Result⌗
The prototype ran on emulated hardware
Used monolithic servers, ignoring all but the resources of the component being emulated
- So “network-attached hard drives” were regular servers whose CPU, RAM, etc. went largely unused
Controllers implemented in Linux
Components connected via an RDMA network and communicated via RPC
Real-world disaggregated system designs:
- HPE “The Machine”
- Interconnects for Fabric Attached Memory (OpenFAM)
- Berkeley Firebox system
Baseline Comparisons:
- Linux with SSD Swap
- Linux with Ramdisk Swap
- Linux with Infiniswap
- LegoOS
Workloads:
- Unmodified TensorFlow running CIFAR-10
- Working set: 0.9 GB
- 4 threads
LegoOS performed much better than the swap-based baselines