Reaves.dev

v0.1.0

built using

Phoenix v1.7.12

Io Avoiding Algorithms

Stephen M. Reaves

::

2024-08-22

Notes about Lecture 2 for CS-6220

Required Readings

Summary

Sense of Scale

Given:

Calculate the following number of transfer operations

scratchspace

n=rnn=250256=4.4e12 n = \frac{r \cdot n}{n} = \frac{2^{50}}{256} = 4.4e12

Z=rZn=236256=2.68e8 Z = \frac{r \cdot Z}{n} = \frac{2^{36}}{256} = 2.68e8

L=rLn=215256=128=27 L = \frac{r \cdot L}{n} = \frac{2^{15}}{256} = 128 = 2^7

nlog2n n \cdot log_2 n

185

nlog2nL n \cdot log_2 \frac{n}{L}

154

n n

4.40

nLlog2nL \frac{n}{L} \cdot log_2 \frac{n}{L}

1.203

nLlog2nZ \frac{n}{L} \cdot log_2 \frac{n}{Z}

0.481

nLlogZLnL \frac{n}{L} \cdot log_{\frac{Z}{L}} \frac{n}{L}

0.0572

Lower is better (since this is the required number of transactions)

External Memory Merge Sort

Assuming a slow/fast memory hierarchy

Phase 1

Phase 2

Overall

Quiz

Given:

# of transfers in external memory merge sort with 2-way merging:

Q(n,Z,L)=O(nLlog2nZ)=O(nL[log2nLlog2ZL])Q(n,Z,L) = O\left(\frac{n}{L}log_2\frac{n}{Z}\right) = O\left(\frac{n}{L}\left[log_2\frac{n}{L} - log_2\frac{Z}{L}\right]\right)

Lower bound:

Q(n,Z,L)=Ω(nLlogZLnZ)=Ω(nLlog2nLlog2ZL)Q(n,Z,L) = \Omega(\frac{n}{L}log_{\frac{Z}{L}}\frac{n}{Z}) = \Omega\left(\frac{n}{L}\frac{log_2\frac{n}{L}}{log_2\frac{Z}{L}}\right)

Why doesn't 2-way merge do better?

2-way merge “under-utilizes” Z, i.e. fast memory

External Memory Multiway Merge Sort

2-way merge only uses 3 L-sized blocks. There can be up to Z/L blocks of size L <= ( (K+1)LZ (K+1)L \le Z )

k input buffers, the (k+1)th is the output buffer. For each buffer, maintain which is the next element to be able to be read. Choose the smallest element from all the available elements, copy it into the output buffer. All the semantics for full/empty buffers carries over.

Choosing which buffer contains the next smallest element can be tricky. If k is small enough, a linear search will do. If k is large enough, you may need something like a min-heap.

Cost of 1 k-way merge (assuming a min-heap):

Lower Bound on External Memory Sorting

Mergesort with θ(ZL) \theta(\frac{Z}{L}) -way merges:

Q(n,Z,L)=θ(nLlogZLnL)Q(n,Z,L) = \theta \left( \frac{n}{L} log_{\frac{Z}{L}}\frac{n}{L} \right)

K(t-1) = # of orderings after t-1 trans

K(0) = n!

K(t)K(t1)(ZL)L!K(t)n![(ZL)L!]tK(t)\begin{align}\ge \frac{K(t-1)}{\left(\frac{Z}{L}\right)L!} \\phantom{K(t)} \ge \frac{n!}{\left[\left(\frac{Z}{L}\right)L!\right]^t}\end{align}

K(t)K(t1)(ZL)L! K(t) \ge \frac{K(t-1)}{\left(\frac{Z}{L}\right)L!}

K(t)n![(ZL)L!]t \phantom{K(t)} \ge \frac{n!}{\left[\left(\frac{Z}{L}\right)L!\right]^t}

The above is only true if we’ve never seen any of the items before. But we can only perform nL \frac{n}{L} reads of items we haven’t seen before

K(t)n!(ZL)t(L!)nL \phantom{K(t)} \ge \frac{n!}{{Z\choose{L}}^t\cdot\left(L!\right)^{\frac{n}{L}}}

When does right-hand side equal 1?

K(t)n!(ZL)t(L!)nL1 \phantom{K(t) \ge} \frac{n!}{{Z\choose{L}}^t\cdot\left(L!\right)^{\frac{n}{L}}} \le 1

The smallest value t such that this happens is the lower bound on the number of transfers

tnLlogZLnL t \gtrsim \frac{n}{L}log_{\frac{Z}{L}}\frac{n}{L}

Binary Search Trees

Given:

How many transfers does the binary search do?

O(log2nL) O\left(log_2\frac{n}{L}\right)

Size of index i = log(n) + 1 bits

Let x(L) = maximum number of bits “learned” per L-sized read

Qsearch(n,Z,L)=Ω(log(N)x(L)) Q_{search}(n,Z,L) = \Omega\left(\frac{log(N)}{x(L)}\right)

What is x?

log(L)

Qbinary search(n,Z,L)Qsearch(n,Z,L)=Ω(log(N)log(L)) \phantom{Q_{\text{binary search}}(n,Z,L)}Q_{search}(n,Z,L) = \Omega\left(\frac{log(N)}{log(L)}\right)

Qsearch(n,Z,L)Qbinary search(n,Z,L)=Ω(log(N)log(L)) \phantom{Q_{search}(n,Z,L)}Q_{\text{binary search}}(n,Z,L) = \Omega\left(\frac{log(N)}{log(L)}\right)

Qsearch(n,Z,L)Qbinary search(n,Z,L)=Ω(log(N)log(L)) \phantom{Q_{search}(n,Z,L)}\phantom{Q_{\text{binary search}}(n,Z,L)} = \Omega\left(log(N) - log(L)\right)

There is about a log(L) speed up need to get from binary search to theoretical max

B-trees can be IO-efficient if their branching factor is made equal to L (cache transfer size)