Basic Model Of Locality
Required Readings
Summary
- A Basic Model
- of computations the processor needs to do
- of L-sized slow-fast transfers
- Quizes
- Algorithmic Design Goals
- Intensity, Balance, and Time
A Basic Model
Fast memory is small, can only hold Z
words
Two Rules
- Processor may only compute on data in fast memory
- “Local data rule”
- Slow-Fast transers in blocks of size
L
[words]- “Block transfer rule”
- If you want to move x, you have to pay an additionaly L-x to get x
- Where x is located in the block of L is called
data alignment
- Where x is located in the block of L is called
Costs
- Work
- # of computations the processor needs to do
- Transfers
- # of L-sized slow-fast transfers
- “load and stores”
- “I/O” complexity
Quizes
How many transfers are needed in the worst case, assuming nothing about alignment, to sum an array of numbers?
Give a simple (trivial) lower bound on the asymptotic number of transfers when sorting an array
or
What is the minimum number of transfers required to multiply two nxn matrices?
Algorithmic Design Goals
- Work Optimality
- Work shouldn’t be worse by introducing a memory-hierarchy
- High Computational Intensity
- Maximize:
- operations per word
- data reuse
- Maximize:
Given two algorithms, one with and , which is better?
Algorithm 1 prioritizes low work, but algorithm 2 prioritizes high intensity, so its a draw, unless we know more about the problem.
Intensity, Balance, and Time
Assume it takes time per operation of compute, and (amortized) time to load word, The time to compute () will be , while the time spent loading from memory () will be .
This leads the total time to equal (assuming perfect overlap)
is an ideal computation time, assuming data movement is free
is the cost/penalty of moving data
is just intensity, I: ops/word
is the how many operations can be executed in the
time it takes to move a word, or machine balance
,
The minimum possible execution time in terms of the balance of the machine and the intensity of the algorithm is
Normalizing the performance against the best RAM model () gives you