Work Span Model
Required Readings
Optional
Summary
- Multithreaded DAG Model
- Example Sample Reduction
- Work and Span
- of nodes
- of vertices on the longest path
- Brent’s Theorem
- Speedup
- Basic Concurrency Primitives
Multithreaded DAG Model
Each vertex is an operation, each edge is a dependency.
We assume there is a PRAM model with P processors with shared memory. We can assign up to P operations to complete at a time, assuming all the dependencies are met.
Our cost model assumes:
- All processors run at the same speed
- 1 op = 1 unit of time
- No edge cost
Example Sample Reduction
let A = array of length n
s <- 0
for i <- 1 to n do
s <- s + A[i]
Time to execute this DAG?
Minimum time to execute this DAG on a PRAM with p=n procs?
Work and Span
Work := total # of nodes
Span := # of vertices on the longest path
Given work and span, what can we say?
- Given only 1 proc, Time should equal Work
- Given only infinite proc, Time should equal Span
Average available parallelism :=
Brent’s Theorem
Break execution into phases:
- Each phase has 1 critical path vertex
- All non-critical path vertices are
independent
- vertices in the same phase cannot depend on each other
- Every vertex should appear in some phase
For any phase (k), W(k) is the total amount of work for that phase.
How long will it take to execute phase k? ()?
The time to execute a DAG is no more than the time to execute the critical path (D) plus the time to execute everything off of the critical path using the P processors
Brent's Thoerem
Speedup
Speedup := Best sequential time divided by the best parallel time
Ideal speedup is linear in P. In order for speedup to be linear, I need the denominator to be a constant. For that to happen the work per processor has to grow as some function of n.
- Weak scalability
- If you want good scaling, you might need to increase the problem size
Speedup
Work optimality
Weak scalability
Basic Concurrency Primitives
Spawn and Sync
reduce(A[0:n-1])
if n > 2 then
a <- reduce(A[0:n/2-1])
b <- reduce(A[n/2:n-1])
return a+b
// else n = 1
return A[0]
reduce(A[0:n-1])
if n > 2 then
a <- spawn reduce(A[0:n/2-1])
b <- reduce(A[n/2:n-1])
sync
return a+b
// else n = 1
return A[0]
Par-for
par-for i <- 1 to n do
foo(i)
That’s O(n)
ParForT(foo, a, b)
let n = b - a + 1
if n == 1 then foo(a)
else
let m = a + n/2 // floor/integer division
spawn ParForT(foo, a, m-1)
ParForT(foo, m, b)
sync
D(n) = O(log(n)) (span)