Shared Memory Multiprocessor OS
Summary
- OS for Parallel Machines
- Refresher on Page Fault Service
- Parallel OS and Page Fault Service
- Recipe for Scalable Structure in Parallel OS
- Tornado’s Secret Sauce
- Traditional Structure
- Objectization of Memory Management
- Objectized Structure of VM Manager
- Advantages of Clustered Object
- Implementation of Clustered Object
- Non-Hierarchical Locking
- Dynamic Memory Allocation
- IPC
- Tornado Summary
- Summary of Ideas in Corey System
OS for Parallel Machines
Challenges:
- NUMA effects
- Deep memory hierarchy
- False sharing
Principles
Cache Conscious Decisions:
- Limit shared system data structures
- Keep memory local
Refresher on Page Fault Service
Parallel OS and Page Fault Service
Easy Scenario:
- Multiprocess workload
- Each thread belongs to a separate process, so no address space is shared
- Page faults in nodes can be handled independently
Hard Scenario:
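- Multithreaded workload
- One process creates multiple threads with a shared address space
- Page faults on different nodes touch the same address space structures, so handling them can serialize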
Recipe for Scalable Structure in Parallel OS
For every subsystem:
- Determine the functional needs of that service
- To ensure concurrent execution of the service, minimize shared data structures
- Less sharing -> more scalable
- Where possible, replicate or partition system data structures
- Less locking
- More concurrency
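A minimal sketch of the partitioning idea, assuming Python threads stand in for CPUs. `PartitionedCounter` and `run_demo` are hypothetical names for illustration, not part of any real kernel. Each "CPU" updates only its own slot under its own lock, so updates never contend; only the rare global read touches every partition.

```python
import threading


class PartitionedCounter:
    """Counter split into one slot per CPU, each guarded by its own lock."""

    def __init__(self, num_cpus):
        self._slots = [0] * num_cpus
        self._locks = [threading.Lock() for _ in range(num_cpus)]

    def increment(self, cpu_id):
        # Only the local slot is locked, so different CPUs never contend.
        with self._locks[cpu_id]:
            self._slots[cpu_id] += 1

    def total(self):
        # The rare global read visits every partition.
        result = 0
        for cpu_id, lock in enumerate(self._locks):
            with lock:
                result += self._slots[cpu_id]
        return result


def run_demo(num_cpus=4, increments_per_cpu=1000):
    """Run one worker thread per simulated CPU and return the global total."""
    counter = PartitionedCounter(num_cpus)

    def worker(cpu_id):
        for _ in range(increments_per_cpu):
            counter.increment(cpu_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()
```

The trade-off is the one the recipe names: the common operation (increment) gets cheaper and more concurrent, while the uncommon one (a global total) gets more expensive.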
Tornado’s Secret Sauce
Clustered object: illusion of a single object, backed by one or more representations (reps)
Degree of clustering?
- Choice of implementer of service
- singleton rep, one per core, one per cpu, one per group of cpus, …
- Protected Procedure Calls (PPC) keep the reps consistent with each other
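A rough sketch of how the degree of clustering could be expressed, assuming a simple rule that maps a CPU to a rep by integer division. `Rep`, `ClusteredObject`, and `cpus_per_rep` are hypothetical names, and the consistency protocol between reps is left out entirely.

```python
class Rep:
    """One representation of a clustered object (hypothetical)."""

    def __init__(self, rep_id):
        self.rep_id = rep_id
        self.local_calls = 0


class ClusteredObject:
    """Single object reference; calls are routed to a rep near the caller."""

    def __init__(self, num_cpus, cpus_per_rep):
        self.num_cpus = num_cpus
        # Degree of clustering, chosen by the implementer of the service:
        # num_cpus -> singleton rep, 1 -> one rep per CPU, k -> one per group.
        self.cpus_per_rep = cpus_per_rep
        self._reps = {}

    def _rep_for(self, cpu_id):
        rep_id = cpu_id // self.cpus_per_rep
        if rep_id not in self._reps:
            # Reps are created lazily, on first use from that group of CPUs.
            self._reps[rep_id] = Rep(rep_id)
        return self._reps[rep_id]

    def invoke(self, cpu_id):
        rep = self._rep_for(cpu_id)
        rep.local_calls += 1
        return rep.rep_id

    def rep_count(self):
        return len(self._reps)
```

Callers always hold the same `ClusteredObject` reference; only `cpus_per_rep` changes between a singleton and a fully replicated implementation.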
Traditional Structure
VM Data Structures
- Process control block (PCB), translation lookaside buffer (TLB), page table (PT)
- Virtual pages on disk
Objectization of Memory Management
Address space -> Process object (PCB)
Backing store -> carved up into a File Cache Manager (FCM) for each region
Page frame manager -> DRAM object
Page I/O -> Cached Object Rep (COR)
Objectized Structure of VM Manager
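One way the objects listed above might cooperate on a page fault, as a sketch under assumptions: the `Region` class, the method names, and the exact call order (Process -> Region -> FCM -> DRAM and COR) are illustrative guesses, not taken from these notes.

```python
class DRAM:
    """Page frame manager: hands out free physical frames."""

    def __init__(self, num_frames):
        self._free = list(range(num_frames))

    def get_frame(self):
        return self._free.pop(0)


class COR:
    """Cached Object Rep: performs page I/O against the backing store."""

    def __init__(self, backing_store):
        self._store = backing_store  # dict: virtual page number -> contents

    def read_page(self, vpn):
        return self._store[vpn]


class FCM:
    """File Cache Manager: backs exactly one region."""

    def __init__(self, dram, cor):
        self._dram = dram
        self._cor = cor
        self.frames = {}  # frame number -> page contents

    def fault_in(self, vpn):
        frame = self._dram.get_frame()
        self.frames[frame] = self._cor.read_page(vpn)
        return frame


class Region:
    """Contiguous range of virtual pages with its own mapping state."""

    def __init__(self, first_vpn, last_vpn, fcm):
        self.first_vpn = first_vpn
        self.last_vpn = last_vpn
        self.fcm = fcm
        self.page_table = {}  # vpn -> frame, private to this region

    def contains(self, vpn):
        return self.first_vpn <= vpn <= self.last_vpn

    def handle_fault(self, vpn):
        frame = self.fcm.fault_in(vpn)
        self.page_table[vpn] = frame
        return frame


class Process:
    """Process object: routes a fault to the region that owns the page."""

    def __init__(self, regions):
        self.regions = regions

    def page_fault(self, vpn):
        for region in self.regions:
            if region.contains(vpn):
                return region.handle_fault(vpn)
        raise LookupError(f"no region maps virtual page {vpn}")
```

In this sketch, faults in different regions update different `page_table` dictionaries, which is the property that lets them proceed without a single address-space-wide lock.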
Advantages of Clustered Object
Same object references on all nodes
Allows incremental optimization
- usage pattern determines level of replication
Page fault handling scales with number of processors
Destroying memory in a region does not scale as well
- Doesn’t happen as much as other operations
Implementation of Clustered Object
Translation table
- on each cpu
Miss Handling table
- If a miss happens in the translation table, the objref is resolved through the local miss handling table
The miss handling table is a partitioned data structure, so the entry for an objref may not exist on the faulting processor
- That’s why we have a global miss handler
- Replicated on every node
- Knows the partitioning of the miss handling table, so it can find the mapping needed
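The lookup path above can be sketched as follows. This is an illustration under assumptions: `ClusteredObjectRuntime`, the `objref % num_cpus` partitioning rule, and modelling a miss handler as a plain callable are all hypothetical choices made for the example.

```python
class ClusteredObjectRuntime:
    """Per-CPU translation tables backed by a partitioned miss handling table."""

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        # One translation table per CPU: objref -> rep.
        self.translation = [dict() for _ in range(num_cpus)]
        # Miss handling table, partitioned: each CPU stores only its share.
        self.miss_partitions = [dict() for _ in range(num_cpus)]
        self.global_lookups = 0

    def owner_of(self, objref):
        # Assumed partitioning rule, known to the global miss handler.
        return objref % self.num_cpus

    def register(self, objref, miss_handler):
        self.miss_partitions[self.owner_of(objref)][objref] = miss_handler

    def resolve(self, cpu_id, objref):
        table = self.translation[cpu_id]
        if objref in table:
            return table[objref]  # fast path: translation table hit
        handler = self.miss_partitions[cpu_id].get(objref)
        if handler is None:
            # Global miss handler: use the partitioning rule to find the
            # partition that holds this objref's miss handler.
            self.global_lookups += 1
            handler = self.miss_partitions[self.owner_of(objref)][objref]
        rep = handler(cpu_id)  # the object's miss handler picks or creates a rep
        table[objref] = rep    # install it so later calls take the fast path
        return rep
```

After the first call from a given CPU, every later call for the same objref hits that CPU's translation table and touches no shared state.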
Non-Hierarchical Locking
Hierarchical locking:
- kills concurrency
Use refcount + existence guarantee instead of hierarchical locking
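A small sketch of the refcount idea, assuming a hypothetical `RefCountedObject` class. A user takes a reference before operating on the object; destruction is deferred until the count drops to zero, so the object is guaranteed to exist for the duration of the operation without holding its parent's lock. The short internal lock guards only the count and is not held across the operation itself.

```python
import threading


class RefCountedObject:
    """Object whose destruction is deferred while anyone holds a reference."""

    def __init__(self, name):
        self.name = name
        self.destroyed = False
        self._refs = 0
        self._destroy_requested = False
        self._meta = threading.Lock()  # guards the count only

    def acquire_ref(self):
        with self._meta:
            if self._destroy_requested:
                return False  # too late: no new users once teardown starts
            self._refs += 1
            return True

    def release_ref(self):
        with self._meta:
            self._refs -= 1
            self._maybe_destroy()

    def request_destroy(self):
        with self._meta:
            self._destroy_requested = True
            self._maybe_destroy()

    def _maybe_destroy(self):
        # Caller holds self._meta.
        if self._destroy_requested and self._refs == 0:
            self.destroyed = True
```

A page fault handler would call `acquire_ref()` on a region, do its work, then call `release_ref()`; a concurrent `request_destroy()` takes effect only after that release.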
Dynamic Memory Allocation
Break up heap space into regions
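As a sketch, assuming equal-sized regions with one region per node and a simple bump allocator (`PartitionedHeap` is a hypothetical name, and freeing is omitted), splitting the heap could look like this:

```python
import threading


class PartitionedHeap:
    """Heap address space split into equal regions, one per node."""

    def __init__(self, heap_size, num_nodes):
        self._region_size = heap_size // num_nodes
        self._next = [n * self._region_size for n in range(num_nodes)]
        self._limit = [(n + 1) * self._region_size for n in range(num_nodes)]
        self._locks = [threading.Lock() for _ in range(num_nodes)]

    def alloc(self, node_id, size):
        # Only the local region's lock is taken, so allocations on
        # different nodes do not contend with each other.
        with self._locks[node_id]:
            start = self._next[node_id]
            if start + size > self._limit[node_id]:
                raise MemoryError(f"region {node_id} exhausted")
            self._next[node_id] = start + size
            return start

    def node_of(self, addr):
        # Which node's region an address falls in.
        return addr // self._region_size
```

Because each node allocates from its own address range, objects allocated by different nodes do not end up adjacent in memory, which also helps avoid false sharing across nodes.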
IPC
Object calls need IPC
- Realized by PPC
- local PPC = no context switch
- remote PPC = full context switch
- similar to LRPC
Tornado Summary
- Object Oriented Design for scalability
- Multiple Implementations of OS objects
- Optimize common case
- Page Fault handling vs region destruction
- No hierarchical locking
- Limited sharing of OS data structures
Summary of Ideas in Corey System
- Address ranges in an App
- Similar to Tornado Regions
- Allows colocation on cpu cores to improve cache locality
- Shares
- Explicitly declaring what is shared and what is not lets the kernel reduce managerial overhead
- Dedicated cores for kernel activity