Scheduling
Summary
- First Principles
- Memory Hierarchy Refresher
- Cache Affinity Scheduling
- Scheduling Policies
- Implementation Issues
- Performance
- Cache Affinity and Multicore
- Cache Aware Scheduling
First Principles
Run a thread until it blocks for I/O, then run a different thread in the meantime to keep the processor busy
Memory Hierarchy Refresher
- CPU
- L1
- 1-2 cycles
- L2
- ~10 cycles
- Memory
- ~100 cycles (see the worked example after this list)
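To see why cache reuse matters, here is a back-of-the-envelope average memory access time (AMAT) calculation using the latencies above; the hit rates are hypothetical, for illustration only.

```c
#include <stdio.h>

int main(void) {
    /* Latencies from the hierarchy above (cycles). */
    double l1 = 2.0, l2 = 10.0, mem = 100.0;
    /* Hypothetical hit rates for a warm cache. */
    double l1_hit = 0.95, l2_hit = 0.90;

    /* AMAT = L1 time + L1 miss rate * (L2 time + L2 miss rate * memory time) */
    double amat = l1 + (1.0 - l1_hit) * (l2 + (1.0 - l2_hit) * mem);

    printf("AMAT ~ %.1f cycles\n", amat);  /* ~3.0 cycles here */
    return 0;
}
```

With a cold cache the hit rates drop and AMAT heads toward the ~100-cycle memory latency, which is exactly the cost affinity scheduling tries to avoid.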
Cache Affinity Scheduling
When rescheduling a thread, it makes sense to schedule it on the same processor to reuse cache.
- Other threads may have been scheduled on it in between, polluting the cache
Scheduling Policies
- First Come First Serve (FCFS)
- Ignores affinity for fairness
- Fixed processor
- Predetermined processor
- Last processor
- Prefer last processor that thread ran on
- Minimum Intervening
- Similar to last processor, but affinity decreases with the number of other threads that ran on the processor since this thread last ran there
- Limited Minimum Intervening variant
- Only keep affinity metadata for the top few processors instead of all of them
- Minimum Intervening Plus Queue
- Take into account that a queue of other threads could run on our processor before we actually do
- Prefer shorter queues in addition to the CPU affinity index (see the sketch after this list)
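A minimal sketch of the minimum-intervening policies, assuming the kernel tracks, per thread, how many other threads ran on each processor since this thread last ran there; all names and bookkeeping here are hypothetical.

```c
#include <limits.h>

#define NCPUS 8

struct thread {
    /* Threads that ran on each CPU since this thread last ran there;
     * lower means a warmer cache for this thread. */
    int intervening[NCPUS];
};

/* Minimum intervening: pick the CPU with the fewest intervening threads. */
int pick_cpu_min_intervening(const struct thread *t) {
    int best = 0;
    for (int cpu = 1; cpu < NCPUS; cpu++)
        if (t->intervening[cpu] < t->intervening[best])
            best = cpu;
    return best;
}

/* Minimum intervening plus queue: also count threads already queued on
 * the CPU, since they will pollute the cache before we actually run. */
int pick_cpu_plus_queue(const struct thread *t, const int queue_len[NCPUS]) {
    int best = 0, best_cost = INT_MAX;
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        int cost = t->intervening[cpu] + queue_len[cpu];
        if (cost < best_cost) { best_cost = cost; best = cpu; }
    }
    return best;
}
```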
Implementation Issues
Queue Based
- Global queue
- A single huge data structure that all CPUs access can become a bottleneck
- Affinity-based local queue
- Each processor maintains its own queue based on thread affinity
- Optional work-stealing if local queue is empty
- Priority is based on base priority, time spent waiting in the queue (aging), and cache affinity (see the sketch below)
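A sketch of one plausible priority computation for such a local queue: base priority plus aging, discounted by a cold cache. The equal weighting of the three terms is an assumption.

```c
/* Hypothetical per-thread scheduling state for a local run queue. */
struct sched_thread {
    int base_priority;   /* static priority class */
    int wait_ticks;      /* time spent waiting in this queue (aging) */
    int affinity_index;  /* intervening threads on this CPU since we
                            last ran here; lower = warmer cache */
};

/* Higher is better: aging prevents starvation, and a warm cache
 * (small affinity index) boosts priority. */
int effective_priority(const struct sched_thread *t) {
    return t->base_priority + t->wait_ticks - t->affinity_index;
}
```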
Performance
Figures of merit (computed in the sketch after this list):
- Throughput
- System-centric
- Response time
- User-centric
- Variance
- User-centric
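A small example computing all three metrics from a made-up set of per-job response times:

```c
#include <stdio.h>

int main(void) {
    double response[] = {3.0, 5.0, 4.0, 12.0};   /* seconds per job */
    int n = sizeof response / sizeof response[0];
    double span = 15.0;                          /* wall-clock seconds */

    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i < n; i++) {
        sum += response[i];
        sumsq += response[i] * response[i];
    }
    double mean = sum / n;
    double var  = sumsq / n - mean * mean;       /* population variance */

    printf("throughput    = %.2f jobs/s\n", n / span);  /* system-centric */
    printf("mean response = %.2f s\n", mean);           /* user-centric */
    printf("variance      = %.2f s^2\n", var);          /* user-centric */
    return 0;
}
```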
The bigger the memory footprint of a process, the longer it takes to load its working set into the cache
The heavier the load, the more a fixed-processor scheduler makes sense (threads will wait anyway, so they may as well wait where their cache state is warm)
If no thread has high affinity with a processor, it may sit idle
- Trade-off between execution time and cache locality (see the cost sketch below)
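A hypothetical cost model makes the trade-off concrete: waiting for the affinity processor costs queue time, while migrating to an idle processor costs a working-set reload. All constants and names here are illustrative.

```c
/* Roughly the memory latency from the hierarchy refresher. */
#define MISS_PENALTY_CYCLES 100

/* Cost of waiting for the thread's last (warm-cache) processor. */
long cost_wait_for_affinity(long queue_wait_cycles) {
    return queue_wait_cycles;
}

/* Cost of running immediately on an idle processor with a cold cache:
 * roughly one miss per cache line of the working set, so the bigger
 * the memory footprint, the more expensive migration becomes. */
long cost_migrate_to_idle(long working_set_lines) {
    return working_set_lines * MISS_PENALTY_CYCLES;
}
```

Under heavy load the queue term dominates everywhere, which is why fixed-processor scheduling starts to pay off.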
Cache Affinity and Multicore
Hardware multithreading lets the CPU context-switch between hardware threads essentially for free (e.g., switching on a cache miss to hide memory latency)
Multi-level caching makes misses less costly
Cache Aware Scheduling
Schedule a mix of cache-frugal and cache-hungry threads across the cores so that, at any given time, the combined cache footprint of all executing threads is less than the size of the L2 (or other last-level) cache; a sketch follows at the end of this section.
Characterize threads as cache-frugal or cache-hungry
- Can only be determined by profiling during execution
- Assumes threads are scheduled repeatedly, so past behavior predicts future behavior
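A minimal sketch of cache-aware admission for one scheduling quantum, assuming each thread's cache footprint has already been estimated by profiling; the greedy policy and all names are assumptions.

```c
#define L2_SIZE_KB 4096   /* hypothetical last-level cache size */

struct profiled_thread {
    int footprint_kb;   /* cache-hungriness measured by profiling */
};

/* Fill 'picked' with up to ncores thread indices whose footprints sum
 * to at most the last-level cache size; returns how many were picked. */
int pick_for_quantum(const struct profiled_thread *ts, int nthreads,
                     int ncores, int picked[]) {
    int used_kb = 0, npicked = 0;
    for (int i = 0; i < nthreads && npicked < ncores; i++) {
        if (used_kb + ts[i].footprint_kb <= L2_SIZE_KB) {
            picked[npicked++] = i;
            used_kb += ts[i].footprint_kb;
        }
    }
    return npicked;
}
```

Mixing frugal and hungry threads falls out naturally: one hungry thread consumes most of the cache budget, and the remaining cores are filled with frugal ones.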