Reaves.dev


Basic Model Of Locality

Stephen M. Reaves


2024-08-19

Notes about Lecture 1 for CS-6220

Required Readings

Summary

A Basic Model

(Figure: the von Neumann model — a processor $p$ with a small fast memory, connected to a large slow memory $m$.)

Fast memory is small; it can hold only $Z$ words.

Two Rules

1. Local rule: the processor can only compute on data that is in fast memory.

2. Block transfer rule: data moves between slow and fast memory in blocks of $L$ words.

Costs

$W(n)$ — work: the total number of operations performed.

$Q(n, Z, L)$ — transfers: the number of $L$-word block transfers between slow and fast memory.

Quizzes

How many transfers are needed in the worst case, assuming nothing about alignment, to sum an array of numbers?

$\lceil n/L \rceil + 1$
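To see where the $+1$ comes from: with no alignment guarantee, $n$ consecutive words starting at an arbitrary offset can straddle one extra block. A minimal sketch (the helper name `blocks_touched` is hypothetical):

```python
import math

def blocks_touched(start: int, n: int, L: int) -> int:
    """Number of size-L blocks spanned by the words [start, start + n)."""
    first = start // L
    last = (start + n - 1) // L
    return last - first + 1

n, L = 10, 4
# An aligned start touches ceil(n/L) blocks; a misaligned start can touch one more.
worst = max(blocks_touched(s, n, L) for s in range(L))
assert worst == math.ceil(n / L) + 1  # 4 blocks for n=10, L=4
```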

Give a simple (trivial) lower bound on the asymptotic number of transfers when sorting an array

$W(n) = \Omega(n\log{n})$

$Q(n,Z,L) = \Omega(\lceil n/L \rceil)$

or

$Q(n,Z,L) = \Omega\left(\frac{\frac{n}{L}\log{\frac{n}{L}}}{\log{\frac{Z}{L}}}\right)$

What is the minimum number of transfers required to multiply two $n \times n$ matrices?

$W(n) = O(n^3)$

$Q(n,Z,L) = \Omega\left(\frac{n^2}{L}\right)$

$Q(n,Z,L) = \Omega\left(\frac{n^3}{L\sqrt{Z}}\right)$
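The $\Omega\left(\frac{n^3}{L\sqrt{Z}}\right)$ bound is attained by cache blocking: operate on $b \times b$ tiles with $b \approx \sqrt{Z/3}$, so one tile of each operand fits in fast memory at once. A minimal sketch (not the lecture's code; plain Python lists, no alignment handling):

```python
import math

def blocked_matmul(A, B, Z):
    """n x n matrix multiply, blocked so three b x b tiles fit in Z words."""
    n = len(A)
    b = max(1, int(math.sqrt(Z / 3)))  # tile size: 3 * b^2 <= Z
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            for k0 in range(0, n, b):
                # One b x b tile each of A, B, and C is "resident" here.
                # (n/b)^3 tile-triples, each needing O(b^2 / L) transfers,
                # gives Q = O(n^3 / (b * L)) = O(n^3 / (L * sqrt(Z))).
                for i in range(i0, min(i0 + b, n)):
                    for k in range(k0, min(k0 + b, n)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + b, n)):
                            C[i][j] += a * B[k][j]
    return C
```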

Algorithmic Design Goals

Given two algorithms, one with $W(n) = \Theta(n)$, $Q(n,Z,L) = \Theta\left(\frac{n}{L}\right)$ and another with $W(n) = \Theta(n\log{n})$, $Q(n,Z,L) = \Theta\left(\frac{n}{L\log{Z}}\right)$, which is better?

Algorithm 1 prioritizes low work, while algorithm 2 prioritizes high intensity, so it's a draw unless we know more about the problem.
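One way to make "it depends" concrete: plug both algorithms into the time model $T = \tau W \times \max(1, B/I)$ developed in the next section, on a machine with cheap memory access and one with expensive access (all numbers hypothetical):

```python
import math

def model_time(W, Q, tau, alpha, L):
    """T = tau * W * max(1, B / I), with I = W / (L * Q) and B = alpha / tau."""
    I = W / (L * Q)
    B = alpha / tau
    return tau * W * max(1.0, B / I)

n, Z, L, tau = 2**20, 2**15, 16, 1.0
for alpha in (1.0, 64.0):  # cheap vs. expensive word transfers
    t1 = model_time(n, n / L, tau, alpha, L)                              # algorithm 1
    t2 = model_time(n * math.log2(n), n / (L * math.log2(Z)), tau, alpha, L)  # algorithm 2
    print(f"alpha={alpha}: alg1={t1:.3g}, alg2={t2:.3g}")
# With alpha=1 (balance B=1), algorithm 1's lower work wins; with
# alpha=64 (B=64, memory-bound), algorithm 2's higher intensity wins.
```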

Intensity, Balance, and Time

Assume it takes $\tau$ time per operation of compute and $\alpha$ (amortized) time to load a word. The time to compute ($T_{comp}$) will be $\tau W$, while the time spent loading from memory ($T_{mem}$) will be $\alpha L Q$.

This means the total time satisfies $T \ge \max(T_{comp}, T_{mem})$, with equality assuming perfect overlap of compute and data movement:

$T = \tau W \times \max\left(1, \frac{\alpha/\tau}{W/(LQ)}\right)$

$\tau W$ is the ideal computation time, assuming data movement is free

$\max\left(1, \frac{\alpha/\tau}{W/(LQ)}\right)$ is the cost/penalty of moving data

$W/(LQ)$ is just the intensity, $I$: ops per word

$\alpha/\tau$ is how many operations can be executed in the time it takes to move a word, also called the machine balance, $B$

The minimum possible execution time, in terms of the balance of the machine and the intensity of the algorithm, is $\tau W \times \max\left(1, \frac{B}{I}\right)$

Normalizing the performance against the best RAM-model algorithm (with work $W_*$) gives $R \equiv \frac{\tau W_*}{T} \le \frac{W_*}{W} \times \min\left(1, \frac{I}{B}\right)$
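Putting the pieces together, a tiny calculator for $T$ and $R$ (all numbers hypothetical; the example takes $W = W_*$, i.e. a work-optimal algorithm):

```python
def predicted_time(tau, alpha, W, Q, L):
    """Minimum time under the model: T = tau * W * max(1, B / I)."""
    I = W / (L * Q)   # intensity: operations per word moved
    B = alpha / tau   # machine balance: ops per word-transfer time
    return tau * W * max(1.0, B / I)

def normalized_perf(tau, alpha, W, W_star, Q, L):
    """R = tau * W_star / T <= (W_star / W) * min(1, I / B)."""
    return tau * W_star / predicted_time(tau, alpha, W, Q, L)

# Hypothetical machine/algorithm: tau=1, alpha=8, L=4,
# W = W_star = 1024 ops, Q = 256 transfers.
# I = 1024 / (4 * 256) = 1 op/word < B = 8, so the run is
# memory-bound: T = 8192 and R = 1024 / 8192 = 0.125.
```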