## Scheduling Multithreaded Computations by Work Stealing (1994)


Venue: | Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS) |

Citations: | 429 - 39 self |

### BibTeX

@INPROCEEDINGS{Blumofe94schedulingmultithreaded,

author = {Robert D. Blumofe and Charles E. Leiserson},

title = {Scheduling Multithreaded Computations by Work Stealing},

booktitle = {Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS)},

year = {1994},

pages = {356--368}

}


### Abstract

This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically, our analysis shows that the expected time $T_P$ to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_P = O(T_1/P + T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. Moreover, the space $S_P$ required by the execution satisfies $S_P \le S_1 P$. We also show that the expected total communication of the algorithm is at most $O(T_\infty S_{\max} P)$, where $S_{\max}$ is the size of the largest activation...
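To make the work-stealing idea in the abstract concrete, the following is a minimal simulation sketch, not the paper's scheduler. It assumes a simplified workload: a complete binary spawn tree of unit-time tasks with no join dependencies, so it does not model stalled threads or the fully strict dependency structure that the paper analyzes. The function name `simulate_work_stealing` and its parameters are hypothetical and chosen only for illustration. Each processor works at the bottom of its own deque, and an idle processor tries to steal from the top of the deque of a uniformly random victim.

```python
import random
from collections import deque


def simulate_work_stealing(depth, num_procs, seed=0):
    """Simulate randomized work stealing on a complete binary spawn tree.

    Assumptions (illustrative only): every task takes one time step, a task
    at level < depth spawns two children, and there are no joins.
    Returns (steps, tasks_executed).
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_procs)]
    deques[0].append(0)               # root task, represented by its level
    total = 2 ** (depth + 1) - 1      # T_1: number of unit-time tasks
    executed = 0
    steps = 0
    while executed < total:
        steps += 1
        idle = set()
        # Phase 1: every processor with local work executes one task.
        for p in range(num_procs):
            if deques[p]:
                level = deques[p].pop()          # owner works at the bottom
                executed += 1
                if level < depth:                # spawn two children
                    deques[p].append(level + 1)
                    deques[p].append(level + 1)
            else:
                idle.add(p)
        # Phase 2: each idle processor attempts one steal from a random victim.
        if num_procs > 1:
            for p in sorted(idle):
                victim = rng.randrange(num_procs - 1)
                if victim >= p:                  # skip self
                    victim += 1
                # A steal from a processor that was idle this step fails.
                if victim not in idle and deques[victim]:
                    deques[p].append(deques[victim].popleft())  # thief takes the top
    return steps, executed


if __name__ == "__main__":
    depth, procs = 12, 8
    steps, executed = simulate_work_stealing(depth, procs)
    t1, t_inf = executed, depth + 1
    print(f"steps={steps}, T_1/P + T_inf={t1 / procs + t_inf:.1f}")
```

In this toy setting $T_1 = 2^{d+1} - 1$ and $T_\infty = d + 1$ for depth $d$, so the printed value gives a rough reference point for the $T_1/P + T_\infty$ term in the bound. Stealing from the top of the victim's deque tends to move tasks near the root of the tree, which carry the most remaining work.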

### Citations

582 | Cilk: An efficient multithreaded runtime system - Blumofe, Joerg, et al. - 1996 |

433 | Bounds on multiprocessing timing anomalies
- Graham
- 1969
Citation Context ...eaded computation. define what it means for computations to be "fully strict." We conclude with a statement of the greedy-scheduling theorem, which is an adaptation of theorems by Brent [5] and Graham [11, 12] on dag scheduling. A multithreaded computation is composed of a set of threads, each of which is a sequential ordering of unit-time tasks. In Figure 1, for example, each shaded block is a thread with... |

360 | The implementation of the Cilk-5 multithreaded language - Frigo, Leiserson, et al. - 1998 |

335 |
Bounds for certain multiprocessing anomalies
- Graham
- 1966
Citation Context ...eaded computation. define what it means for computations to be "fully strict." We conclude with a statement of the greedy-scheduling theorem, which is an adaptation of theorems by Brent [5] and Graham [11, 12] on dag scheduling. A multithreaded computation is composed of a set of threads, each of which is a sequential ordering of unit-time tasks. In Figure 1, for example, each shaded block is a thread with... |

245 | The parallel evaluation of general arithmetic expressions
- Brent
- 1974
Citation Context ...e 1: A multithreaded computation. define what it means for computations to be "fully strict." We conclude with a statement of the greedy-scheduling theorem, which is an adaptation of theorems by Brent [5] and Graham [11, 12] on dag scheduling. A multithreaded computation is composed of a set of threads, each of which is a sequential ordering of unit-time tasks. In Figure 1, for example, each shaded bl... |

240 | I-structures: Data structure for parallel computing
- Arvind, Nikhil, et al.
- 1989
Citation Context ...llstructured) multithreaded computations. This class of computations encompasses both backtrack search computations [15, 26] and divide-and-conquer computations [25], as well as dataflow computations [1] in which threads may stall due to a data dependency. We analyze our algorithms in a stringent atomic access model similar to the atomic message-passing model of [17] in which concurrent accesses to t... |

233 | Lazy task creation: A technique for increasing the granularity of parallel programs
- Mohr, Kranz, et al.
- 1991
Citation Context ...s Burton and Sleep's research [7] on parallel execution of functional programs and Halstead's implementation of Multilisp [14]. Since then, many researchers have implemented variants on this strategy [4, 9, 10, 13, 16, 18, 24]. Rudolph, Slivkin-Allalouf, and Upfal [22] analyzed a randomized work-stealing strategy for load balancing independent jobs on a parallel computer, and Karp and Zhang [15] analyzed a randomized work-... |

200 |
How to emulate shared memory
- Ranade
- 1987
Citation Context ...sent and analyze a combinatorial "balls and bins" game that we use to derive a bound on the contention that arises in random work stealing. We then use this bound along with a delay-sequence argument [21] in Section 6 to analyze the execution time and communication cost of the work-stealing algorithm. We make some concluding remarks in Section 7. 2 A model for multithreaded computation This section re... |

173 | Thread scheduling for multiprogrammed multiprocessors - Arora, Blumofe, et al. |

149 | Speedup Versus Efficiency in Parallel Systems - Eager, Zahorjan, et al. - 1989 |

107 | An Analysis of Dag-Consistent Distributed Shared-Memory Algorithms - Blumofe, Frigo, et al. - 1996 |

105 |
DIB — A distributed implementation of backtracking
- Finkel, Manber
- 1987
Citation Context ...s Burton and Sleep's research [7] on parallel execution of functional programs and Halstead's implementation of Multilisp [14]. Since then, many researchers have implemented variants on this strategy [4, 9, 10, 13, 16, 18, 24]. Rudolph, Slivkin-Allalouf, and Upfal [22] analyzed a randomized work-stealing strategy for load balancing independent jobs on a parallel computer, and Karp and Zhang [15] analyzed a randomized work-... |

87 | Space-efficient Scheduling of Multithreaded Computations
- Blumofe, Leiserson
- 1993
Citation Context ...ully strict multithreaded computations which is provably efficient in terms of time, space, and communication. The bounds on space and time are better than previous bounds for work-sharing schedulers [3], and the work-stealing scheduler is much simpler and eminently practical. Part of this improvement is due to our focusing on fully strict computations, as compared to the (general) strict computation... |

86 | Detecting data races in Cilk programs that use locks - Cheng, Feng, et al. - 1998 |

85 | Provably efficient scheduling for languages with fine-grained parallelism - Blelloch, Gibbons, et al. - 1995 |

84 |
Executing functional programs on a virtual tree of processors
- Burton, Sleep
- 1981
Citation Context ...do, no threads are migrated by a work-stealing scheduler, but threads are always migrated by a work-sharing scheduler. The work-stealing idea dates back at least as far as Burton and Sleep's research [7] on parallel execution of functional programs and Halstead's implementation of Multilisp [14]. Since then, many researchers have implemented variants on this strategy [4, 9, 10, 13, 16, 18, 24]. Rudol... |

79 | A simple load balancing scheme for task allocation in parallel machines
- Rudolph, Slivkin-Allalouf, et al.
- 1991
Citation Context ...onal programs and Halstead's implementation of Multilisp [14]. Since then, many researchers have implemented variants on this strategy [4, 9, 10, 13, 16, 18, 24]. Rudolph, Slivkin-Allalouf, and Upfal [22] analyzed a randomized work-stealing strategy for load balancing independent jobs on a parallel computer, and Karp and Zhang [15] analyzed a randomized work-stealing strategy for parallel backtrack se... |

79 |
WorkCrews: An abstraction for controlling parallelism
- Vandevoorde, Roberts
- 1988
Citation Context ...s Burton and Sleep's research [7] on parallel execution of functional programs and Halstead's implementation of Multilisp [14]. Since then, many researchers have implemented variants on this strategy [4, 9, 10, 13, 16, 18, 24]. Rudolph, Slivkin-Allalouf, and Upfal [22] analyzed a randomized work-stealing strategy for load balancing independent jobs on a parallel computer, and Karp and Zhang [15] analyzed a randomized work-... |

70 |
Randomized Parallel Algorithms for Backtrack Search and Branch-and-Bound Computation
- Karp, Zhang
- 1993
Citation Context ...ategy [4, 9, 10, 13, 16, 18, 24]. Rudolph, Slivkin-Allalouf, and Upfal [22] analyzed a randomized work-stealing strategy for load balancing independent jobs on a parallel computer, and Karp and Zhang [15] analyzed a randomized work-stealing strategy for parallel backtrack search. Recently, Zhang and Ortynski [26] have obtained good bounds on the communication requirements of the randomized parallel ba... |

69 | Executing Multithreaded Programs Efficiently - Blumofe - 1995 |

68 |
Resource requirements of dataflow programs
- Culler, Arvind
- 1989
Citation Context ...wisdom that work stealing is superior to work sharing. Others have studied and continue to study the problem of efficiently managing the space requirements of parallel computations. Culler and Arvind [8] and Ruggiero and Sargeant [23] give heuristics for limiting the space required by dataflow programs. Burton [6] shows how to limit space in certain parallel computations without causing deadlock. The... |

68 | Adaptive and reliable parallel computing on networks of workstations - Blumofe, Lisiecki - 1997 |

66 | Cilk: Efficient Multithreaded Computing - Randall - 1998 |

63 |
Implementation of Multilisp: Lisp on a multiprocessor
- Halstead
- 1984
Citation Context ... a work-sharing scheduler. The work-stealing idea dates back at least as far as Burton and Sleep's research [7] on parallel execution of functional programs and Halstead's implementation of Multilisp [14]. Since then, many researchers have implemented variants on this strategy [4, 9, 10, 13, 16, 18, 24]. Rudolph, Slivkin-Allalouf, and Upfal [22] analyzed a randomized work-stealing strategy for load ba... |

63 | DAG-consistent distributed shared memory. Parallel Processing Symposium - Blumofe, Frigo, et al. - 1996 |

50 | Efficient detection of determinacy races in Cilk programs - Feng, Leiserson - 1997 |

46 | Scheduling Large-Scale Parallel Computations on
- Blumofe, Park
- 1994
Citation Context ...or programming multithreaded computations [2]. We currently have preliminary versions of the system that run on the Connection Machine CM-5, the Phish runtime system for networks of Unix workstations [4], and the Sparcstation 10 symmetric multiprocessor. Cilk is derived from the PCM "parallel continuation machine" system [13], which was itself inspired in part by the research reported here. The per-p... |

46 | Game tree search on a massively parallel system - Feldmann, Mysliwietz, et al. - 1993 |

42 | The Cilk System for Parallel Multithreaded Computing - Joerg - 1996 |

37 | Synchronized MIMD Computing - Kuszmaul - 1994 |

35 |
Control of parallelism in the Manchester dataflow machine
- Ruggiero, Sargeant
- 1987
Citation Context ...uperior to work sharing. Others have studied and continue to study the problem of efficiently managing the space requirements of parallel computations. Culler and Arvind [8] and Ruggiero and Sargeant [23] give heuristics for limiting the space required by dataflow programs. Burton [6] shows how to limit space in certain parallel computations without causing deadlock. The remainder of this paper is org... |

30 | Communication complexity for parallel divide-and-conquer
- Wu, Kung
- 1991
Citation Context ...rithm for scheduling "fully strict" (well-structured) multithreaded computations. This class of computations encompasses both backtrack search computations [15, 26] and divide-and-conquer computations [25], as well as dataflow computations [1] in which threads may stall due to a data dependency. We analyze our algorithms in a stringent atomic access model similar to the atomic message-passing model of ... |

29 | Massively parallel chess - Joerg, Kuszmaul - 1994 |

28 | Computation-centric memory models - Frigo, Luchangco - 1998 |

27 | An atomic model for message-passing
- Liu, Aiello, et al.
- 1993
Citation Context ..., as well as dataflow computations [1] in which threads may stall due to a data dependency. We analyze our algorithms in a stringent atomic access model similar to the atomic message-passing model of [17] in which concurrent accesses to the same data structure are serially queued by an adversary. Our main contribution is a randomized work-stealing scheduling algorithm for fully strict multithreaded com... |

27 | Space-efficient scheduling of parallelism with synchronization variables - Blelloch, Gibbons, et al. - 1997 |

25 | The weakest reasonable memory model - Frigo - 1997 |

24 | A Fully Distributed Chess Program - Feldmann, Mysliwietz, et al. - 1990 |

22 | The performance of work stealing in multiprogrammed environments (extended abstract) - Blumofe, Papadopoulos - 1998 |

21 | MIMD-style parallel programming with continuation-passing threads - Halbherr, Zhou, et al. - 1994 |

18 |
Enumerations of the Hamiltonian walks on a cubic sublattice
- Pande, Joerg, et al.
- 1994
Citation Context ...k implementation, compared with a native C implementation, is typically at most 15 percent on various applications that we have programmed. To date, our applications include a protein-folding program [19], which was the first program to find the number of Hamiltonian paths in a 4 × 4 × 3 grid, and a parallel chess-playing program ⋆Socrates, which won third prize at the 1994 ACM International... |

13 |
Storage management in virtual tree machines
- Burton
- 1988
Citation Context ...efficiently managing the space requirements of parallel computations. Culler and Arvind [8] and Ruggiero and Sargeant [23] give heuristics for limiting the space required by dataflow programs. Burton [6] shows how to limit space in certain parallel computations without causing deadlock. The remainder of this paper is organized as follows. In Section 2 we review the graph-theoretic model of multithrea... |

9 |
The efficiency of randomized parallel backtrack search
- Zhang, Ortynski
- 1994
Citation Context ...g strategy for load balancing independent jobs on a parallel computer, and Karp and Zhang [15] analyzed a randomized work-stealing strategy for parallel backtrack search. Recently, Zhang and Ortynski [26] have obtained good bounds on the communication requirements of the randomized parallel backtrack search algorithm presented in [15]. In this paper, we present and analyze a work-stealing algorithm for... |

9 | Hood: a user-level thread library for multiprogrammed multiprocessors - Papadopoulos - 1998 |

5 | Algorithms for data-race detection in multithreaded programs - Cheng - 1998 |

5 | Debugging multithreaded programs that incorporate user-level locks - Stark - 1998 |

4 | Guaranteeing good memory bounds for parallel programs - Burton - 1996 |

2 |
Cilk 1.0 reference manual
- Blumofe, Joerg, et al.
- 1994
Citation Context ... speculative computations. How practical are the methods analyzed in this paper? We have been actively engaged in building a C-based language called "Cilk" for programming multithreaded computations [2]. We currently have preliminary versions of the system that run on the Connection Machine CM-5, the Phish runtime system for networks of Unix workstations [4], and the Sparcstation 10 symmetric multip... |

1 | Macroscheduling in the Cilk network of workstations environment - Lisiecki - 1996 |

1 | Private communication - Plaxton - 1994 |