Io Avoiding Algorithms
Required Readings
- Short summary of the sorting lower bound
- IO Complexity of Sorting and Related Problems
- Lecture Notes
Summary
- Sense of Scale
- External Memory Merge Sort
- of transfers in external memory merge sort with 2-way merging:
- External Memory Multiway Merge Sort
- of orderings after t-1 trans
- Binary Search Trees
Sense of Scale
Given:
- Volume of data to sort:
- Record (item) size
- Fast memory size
- Memory transfer size
Calculate the following number of transfer operations
scratchspace
185
154
4.40
1.203
0.481
0.0572
Lower is better (since this is the required number of transactions)
- Big speed ups come from:
- Multiplying by n/L
- This corresponds to processing data in L-sized chunks as much as possible
-
- This corresponds to use as much of the capacity of fast memory as possible
- Multiplying by n/L
External Memory Merge Sort
Assuming a slow/fast memory hierarchy
Phase 1
- Partition input into chunks
- foreach chunk i <- 1 to do
- read chunk i
- sort it into a (sorted) run
- write run i
- Transfers =
- Comparisons =
Phase 2
- Read L-sized blocks of A,B -> A’,B’
- while any unmerged items in A&B do
- Merge A’,B’ -> C’ as possible
- if A’ or B’ empty, then read more
- if C’ full, then flush
- Flush any unmerged in A or B
- Transfers =
- Comparisons =
Overall
- Transfers =
- Comparisons =
Quiz
Given:
# of transfers in external memory merge sort with 2-way merging:
Lower bound:
Why doesn't 2-way merge do better?
2-way merge “under-utilizes” Z, i.e. fast memory
External Memory Multiway Merge Sort
2-way merge only uses 3 L-sized blocks. There can be up to Z/L blocks of size L <= ( )
k input buffers, the (k+1)th is the output buffer. For each buffer, maintain which is the next element to be able to be read. Choose the smallest element from all the available elements, copy it into the output buffer. All the semantics for full/empty buffers carries over.
Choosing which buffer contains the next smallest element can be tricky. If k is small enough, a linear search will do. If k is large enough, you may need something like a min-heap.
- Min heaps have the following cost
- Build: O(k)
- ExtractMin: O(log(k))
- insert: O(log(k))
Cost of 1 k-way merge (assuming a min-heap):
- Transfers:
- 2ks/L
- You only ever read input blocks once, and only ever write output blocks once.
- 2ks/L
- Comparisons
- O(k + (ks)log(k))
- Cost to build head, then every ks items are either inserted or extracted
- O(k + (ks)log(k))
Lower Bound on External Memory Sorting
Mergesort with -way merges:
K(t-1) = # of orderings after t-1 trans
K(0) = n!
The above is only true if we’ve never seen any of the items before. But we can only perform reads of items we haven’t seen before
When does right-hand side equal 1?
The smallest value t such that this happens is the lower bound on the number of transfers
Binary Search Trees
Given:
- a sorted array (A) of n elements in slow memory
- a target value v
- a binary-search algorithm trying to find the largest i such that A[i] <= v
How many transfers does the binary search do?
Size of index i = log(n) + 1 bits
Let x(L) = maximum number of bits “learned” per L-sized read
What is x?
log(L)
There is about a log(L) speed up need to get from binary search to theoretical max
B-trees can be IO-efficient if their branching factor is made equal to L (cache transfer size)