Scheduling
Summary
- First Principles
- Memory Hierarchy Refresher
- Cache Affinity Scheduling
- Scheduling Policies
- Implementation Issues
- Performance
- Cache Affinity and Multicore
- Cache Aware Scheduling
First Principles
Run a thread until it blocks for I/O, then run a different thread in the meantime to keep the processor busy
Memory Hierarchy Refresher
- CPU
- L1
- 1-2 cycles
- L2
- ~10 cycles
- Memory
- ~100 cycles (see the worked example after this list)
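To see why cache reuse matters, here is a back-of-the-envelope average memory access time (AMAT) calculation using the latencies above; the hit rates are hypothetical, for illustration only.

```c
#include <stdio.h>

int main(void) {
    /* Latencies from the hierarchy above (cycles). */
    double l1 = 2.0, l2 = 10.0, mem = 100.0;
    /* Hypothetical hit rates for a warm cache. */
    double l1_hit = 0.95, l2_hit = 0.90;

    /* AMAT = L1 time + L1 miss rate * (L2 time + L2 miss rate * memory time) */
    double amat = l1 + (1.0 - l1_hit) * (l2 + (1.0 - l2_hit) * mem);

    printf("AMAT ~ %.1f cycles\n", amat);  /* ~3.0 cycles here */
    return 0;
}
```

With a cold cache the hit rates drop and AMAT heads toward the ~100-cycle memory latency, which is exactly the cost affinity scheduling tries to avoid.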
Cache Affinity Scheduling
When rescheduling a thread, it makes sense to schedule it on the same processor to reuse cache.
- Other threads may have been scheduled on it in between, polluting the cache
Scheduling Policies
- First Come First Serve (FCFS)
- Ignores affinity for fairness
- Fixed processor
- Predetermined processor
- Last processor
- Prefer last processor that thread ran on
- Minimum Intervening
- Similar to last processor, but affinity decreases with the number of other threads that ran on the processor since this thread last ran there
- Limited Minimum Intervening variant
- Only keep affinity metadata for the top few processors instead of all of them
- Minimum Intervening Plus Queue
- Take into account that a queue of other threads could run on our processor before we actually do
- Prefer shorter queues in addition to the CPU affinity index (see the sketch after this list)
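A minimal sketch of the minimum-intervening policies, assuming the kernel tracks, per thread, how many other threads ran on each processor since this thread last ran there; all names and bookkeeping here are hypothetical.

```c
#include <limits.h>

#define NCPUS 8

struct thread {
    /* Threads that ran on each CPU since this thread last ran there;
     * lower means a warmer cache for this thread. */
    int intervening[NCPUS];
};

/* Minimum intervening: pick the CPU with the fewest intervening threads. */
int pick_cpu_min_intervening(const struct thread *t) {
    int best = 0;
    for (int cpu = 1; cpu < NCPUS; cpu++)
        if (t->intervening[cpu] < t->intervening[best])
            best = cpu;
    return best;
}

/* Minimum intervening plus queue: also count threads already queued on
 * the CPU, since they will pollute the cache before we actually run. */
int pick_cpu_plus_queue(const struct thread *t, const int queue_len[NCPUS]) {
    int best = 0, best_cost = INT_MAX;
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        int cost = t->intervening[cpu] + queue_len[cpu];
        if (cost < best_cost) { best_cost = cost; best = cpu; }
    }
    return best;
}
```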
Implementation Issues
Queue Based
- Global queue
- A single huge data structure that all CPUs access can become a bottleneck
- Affinity-based local queue
- Each processor maintains its own queue based on thread affinity
- Optional work-stealing if local queue is empty
- Priority is based on base priority, time spent waiting in the queue (aging), and cache affinity (see the sketch below)
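A sketch of one plausible priority computation for such a local queue: base priority plus aging, discounted by a cold cache. The equal weighting of the three terms is an assumption.

```c
/* Hypothetical per-thread scheduling state for a local run queue. */
struct sched_thread {
    int base_priority;   /* static priority class */
    int wait_ticks;      /* time spent waiting in this queue (aging) */
    int affinity_index;  /* intervening threads on this CPU since we
                            last ran here; lower = warmer cache */
};

/* Higher is better: aging prevents starvation, and a warm cache
 * (small affinity index) boosts priority. */
int effective_priority(const struct sched_thread *t) {
    return t->base_priority + t->wait_ticks - t->affinity_index;
}
```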
Performance
Figures of merit (computed in the sketch after this list):
- Throughput
- System-centric
- Response time
- User-centric
- Variance
- User-centric
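A small example computing all three metrics from a made-up set of per-job response times:

```c
#include <stdio.h>

int main(void) {
    double response[] = {3.0, 5.0, 4.0, 12.0};   /* seconds per job */
    int n = sizeof response / sizeof response[0];
    double span = 15.0;                          /* wall-clock seconds */

    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i < n; i++) {
        sum += response[i];
        sumsq += response[i] * response[i];
    }
    double mean = sum / n;
    double var  = sumsq / n - mean * mean;       /* population variance */

    printf("throughput    = %.2f jobs/s\n", n / span);  /* system-centric */
    printf("mean response = %.2f s\n", mean);           /* user-centric */
    printf("variance      = %.2f s^2\n", var);          /* user-centric */
    return 0;
}
```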
The bigger the memory footprint of a process, the longer it takes to load its working set into the cache
The heavier the load, the more a fixed-processor scheduler makes sense (threads will wait anyway, so they may as well wait where their cache state is warm)
If no thread has high affinity with a processor, it may sit idle
- Trade-off between execution time and cache locality (see the cost sketch below)
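A hypothetical cost model makes the trade-off concrete: waiting for the affinity processor costs queue time, while migrating to an idle processor costs a working-set reload. All constants and names here are illustrative.

```c
/* Roughly the memory latency from the hierarchy refresher. */
#define MISS_PENALTY_CYCLES 100

/* Cost of waiting for the thread's last (warm-cache) processor. */
long cost_wait_for_affinity(long queue_wait_cycles) {
    return queue_wait_cycles;
}

/* Cost of running immediately on an idle processor with a cold cache:
 * roughly one miss per cache line of the working set, so the bigger
 * the memory footprint, the more expensive migration becomes. */
long cost_migrate_to_idle(long working_set_lines) {
    return working_set_lines * MISS_PENALTY_CYCLES;
}
```

Under heavy load the queue term dominates everywhere, which is why fixed-processor scheduling starts to pay off.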
Cache Affinity and Multicore
Hardware multithreading lets the CPU context-switch between hardware threads essentially for free (e.g., switching on a cache miss to hide memory latency)
Multi-level caching makes misses less costly
Cache Aware Scheduling
Schedule a mix of cache-frugal and cache-hungry threads across the cores so that, at any given time, the combined cache footprint of all executing threads is less than the size of the L2 (or other last-level) cache; a sketch follows at the end of this section.
Characterize threads as cache-frugal or cache-hungry
- Can only be determined by profiling during execution
- Assumes threads are scheduled repeatedly, so past behavior predicts future behavior
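A minimal sketch of cache-aware admission for one scheduling quantum, assuming each thread's cache footprint has already been estimated by profiling; the greedy policy and all names are assumptions.

```c
#define L2_SIZE_KB 4096   /* hypothetical last-level cache size */

struct profiled_thread {
    int footprint_kb;   /* cache-hungriness measured by profiling */
};

/* Fill 'picked' with up to ncores thread indices whose footprints sum
 * to at most the last-level cache size; returns how many were picked. */
int pick_for_quantum(const struct profiled_thread *ts, int nthreads,
                     int ncores, int picked[]) {
    int used_kb = 0, npicked = 0;
    for (int i = 0; i < nthreads && npicked < ncores; i++) {
        if (used_kb + ts[i].footprint_kb <= L2_SIZE_KB) {
            picked[npicked++] = i;
            used_kb += ts[i].footprint_kb;
        }
    }
    return npicked;
}
```

Mixing frugal and hungry threads falls out naturally: one hungry thread consumes most of the cache budget, and the remaining cores are filled with frugal ones.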