Support For Data Center Based Distributed Computing
Required Readings
- Challenges and Solutions for Fast Remote Persistent Memory Access
- LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation
Optional
Summary
- Datacenter Trends
- What is RDMA
- RDMA Specialized RPC
- What If Memory Is Persistent
- Disaggregation
- Disaggregating CPU and Memory with LegoOS
- LegoOS Select Experimental Result
Datacenter Trends
The slowdown of Moore's law leads to specialized and heterogeneous compute
- GPUs
- TPUs
- New Memory/Storage classes
Disaggregation := independently scaled tiers of different resources
What is RDMA
Remote Direct Memory Access
Bypass CPU involvement in data access via interconnect
Higher bandwidth and lower latency, but higher cost per port
Two sided RDMA
- Traditional send/recv semantics like sockets
One sided RDMA
- The initiator reads/writes remote memory directly, without involving the remote CPU
- Memory needs to be pinned and not swapped out to disk
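The difference between the two modes can be sketched with a toy in-process model. This is not the RDMA verbs API: `RemoteNode`, `register`, `rpc_read`, and `one_sided_read` are hypothetical names, and the "network" is just a method call. The sketch only illustrates that the one-sided path touches registered (pinned) memory and never runs code on the remote CPU.

```python
class RemoteNode:
    """Toy stand-in for a remote machine: memory plus a CPU that can run handlers."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.registered = []      # (offset, length) regions pinned for NIC access
        self.cpu_invocations = 0  # how many times the remote CPU had to run

    def register(self, offset, length):
        # Assumption: registration pins the region so it cannot be swapped out.
        if offset < 0 or offset + length > len(self.memory):
            raise ValueError("region outside memory")
        self.registered.append((offset, length))

    def _is_registered(self, offset, length):
        return any(o <= offset and offset + length <= o + l
                   for o, l in self.registered)

    def rpc_read(self, offset, length):
        # Two-sided: the remote CPU receives the request and sends the reply.
        self.cpu_invocations += 1
        return bytes(self.memory[offset:offset + length])

    def one_sided_read(self, offset, length):
        # One-sided: the NIC serves the read; the remote CPU is not involved.
        if not self._is_registered(offset, length):
            raise PermissionError("memory region is not registered")
        return bytes(self.memory[offset:offset + length])


node = RemoteNode(64)
node.memory[0:5] = b"hello"
node.register(0, 16)

two_sided = node.rpc_read(0, 5)
one_sided = node.one_sided_read(0, 5)
```

Both calls return the same bytes, but only `rpc_read` increments `cpu_invocations`. A one-sided read outside the registered region is rejected.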
RDMA Specialized RPC
One sided RDMA is faster, but applications need to be redesigned around it
- Plus, RPC workloads typically involve a service that has to be invoked on the remote side anyway
Instead we can create a new type of RPC that leverages RDMA features
- Connectionless protocols
- Shared receive queues
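A rough sizing sketch of why shared receive queues matter. The function names and all numbers are hypothetical. The assumption is that, without sharing, the server pre-posts a full set of receive buffers per client connection, whereas a shared queue is one pool that all connections draw from.

```python
def per_connection_buffers(num_clients, queue_depth, buf_bytes):
    # Assumption: every connection gets its own fully posted receive queue.
    return num_clients * queue_depth * buf_bytes


def shared_queue_buffers(shared_depth, buf_bytes):
    # Assumption: one pool of receive buffers serves all connections.
    return shared_depth * buf_bytes


# Hypothetical server: 1000 clients, 4 KiB receive buffers.
dedicated = per_connection_buffers(num_clients=1000, queue_depth=64, buf_bytes=4096)
shared = shared_queue_buffers(shared_depth=4096, buf_bytes=4096)

print(f"dedicated queues: {dedicated / 2**20:.0f} MiB")
print(f"shared queue:     {shared / 2**20:.0f} MiB")
```

The shared pool is sized for the expected aggregate load rather than for the worst case of every client at once, which is where the savings come from.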
What If Memory Is Persistent
Intel Optane was Byte-Addressable Persistent Memory (PMEM)
Persistent data operations require flush to persistent memory
- Must complete before client is acknowledged
- Removes the latency advantage of one sided RDMA over send/recv RPC
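A back-of-the-envelope model of why the flush requirement hurts. All latencies are made-up placeholders, not measurements, and the function names are hypothetical. The assumption is that a one-sided write needs an extra network round trip to confirm the data reached persistent memory, while an RPC lets the server flush locally before it replies.

```python
def one_sided_persist_latency(rtt_us, flush_us):
    # Assumption: one round trip for the write, then a second round trip
    # that forces and confirms the flush before the data counts as durable.
    return rtt_us + (rtt_us + flush_us)


def rpc_persist_latency(rtt_us, flush_us, cpu_us):
    # Assumption: one round trip; the server CPU handles the request,
    # flushes to persistent memory, and then replies.
    return rtt_us + cpu_us + flush_us


RTT_US = 2.0    # hypothetical network round trip
FLUSH_US = 0.5  # hypothetical flush to persistent memory
CPU_US = 1.0    # hypothetical server-side request handling

one_sided = one_sided_persist_latency(RTT_US, FLUSH_US)
rpc = rpc_persist_latency(RTT_US, FLUSH_US, CPU_US)

print(f"one-sided write + flush: {one_sided:.1f} us")
print(f"RPC with local flush:    {rpc:.1f} us")
```

With these placeholder values the extra round trip costs more than the server CPU time it was meant to avoid. Different values could change the outcome, so the model only shows the shape of the trade-off.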
Disaggregation
Pools of network attached but independently scaled resources
Not new, but easier with faster interconnects and smarter devices
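Independent scaling can be illustrated with a small sizing calculation. The workload and hardware sizes are hypothetical, as are the function names. The sketch compares how many bundled servers a memory-heavy workload needs with how many CPU and memory units separate pools would need.

```python
import math


def monolithic_servers(cpu_needed, mem_needed_gb, cpu_per_server, mem_per_server_gb):
    # Servers bundle CPU and memory, so the scarcer resource dictates the count.
    return max(math.ceil(cpu_needed / cpu_per_server),
               math.ceil(mem_needed_gb / mem_per_server_gb))


def disaggregated_units(cpu_needed, mem_needed_gb, cpu_per_unit, mem_per_unit_gb):
    # Each pool is sized on its own.
    return (math.ceil(cpu_needed / cpu_per_unit),
            math.ceil(mem_needed_gb / mem_per_unit_gb))


# Hypothetical memory-heavy workload: 64 cores and 4096 GB of memory.
servers = monolithic_servers(64, 4096, cpu_per_server=32, mem_per_server_gb=256)
cpu_units, mem_units = disaggregated_units(64, 4096, cpu_per_unit=32, mem_per_unit_gb=256)
idle_cores = servers * 32 - 64

print(f"monolithic: {servers} servers, {idle_cores} idle cores")
print(f"disaggregated: {cpu_units} CPU units, {mem_units} memory units")
```

In the monolithic case the memory demand forces the purchase of cores that sit idle. With separate pools only the memory tier grows.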
Disaggregating CPU and Memory with LegoOS
Move the virtual memory system (page tables, TLB) to the memory component instead of the CPU
Last-level cache misses now have to go over the network, which has much higher latency than local memory
- Mitigate this by adding a large extended cache (ExCache) on the CPU side
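The effect of an extended cache can be sketched as a small page cache in front of a remote fetch. This is a toy model, not LegoOS code: `ExtendedCache` and `fetch_from_memory_node` are hypothetical names, and LRU replacement is an assumption chosen for simplicity.

```python
from collections import OrderedDict


class ExtendedCache:
    """LRU cache of pages held next to the CPU; misses go to remote memory."""

    def __init__(self, capacity_pages, remote_fetch):
        self.capacity = capacity_pages
        self.remote_fetch = remote_fetch  # callable standing in for a network fetch
        self.pages = OrderedDict()        # page number -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, page_no):
        if page_no in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        self.misses += 1
        data = self.remote_fetch(page_no)    # the slow path over the network
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        return data


def fetch_from_memory_node(page_no):
    # Hypothetical stand-in for fetching a page from a remote memory component.
    return f"page-{page_no}"


cache = ExtendedCache(capacity_pages=2, remote_fetch=fetch_from_memory_node)
for page in [1, 2, 1, 3, 1, 2]:
    cache.read(page)

print(f"hits={cache.hits} misses={cache.misses}")
```

Every hit is a network round trip avoided, so the larger the cache relative to the working set, the fewer accesses pay the remote latency.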
LegoOS Select Experimental Result
Prototype implemented in emulation
Each component emulated by a monolithic server that uses only a subset of its resources
- So a network attached hard drive was a regular server that left its CPU, RAM, etc. unused
Controllers implemented in Linux
Connected via RDMA network, communicated via RPC
Actual system designs:
- HPE The Machine
- Interconnects for Fabric Attached Memory (OpenFAM)
- Berkeley Firebox system
Baseline Comparisons:
- Linux with SSD Swap
- Linux with Ramdisk Swap
- Linux with InfiniSwap
- LegoOS
Workloads:
- Unmodified TensorFlow running CIFAR-10
- Working set: 0.9G
- 4 threads
LegoOS outperformed the Linux swap-based baselines