## Provably efficient scheduling for languages with fine-grained parallelism (1995)

Venue: In Proc. Symposium on Parallel Algorithms and Architectures

Citations: 85 (24 self)

### BibTeX

```bibtex
@inproceedings{Blelloch95provablyefficient,
  author    = {Guy E. Blelloch and Phillip B. Gibbons and Yossi Matias},
  title     = {Provably efficient scheduling for languages with fine-grained parallelism},
  booktitle = {Proc. Symposium on Parallel Algorithms and Architectures},
  year      = {1995},
  pages     = {1--12}
}
```

### Abstract

Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or more space than a sequential implementation of the same program. This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any ...
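The factor-of-p space blowup the abstract warns about can be made concrete with a small simulation (an illustrative sketch, not the paper's algorithm; the fork-tree task model and the `peak_live_tasks` helper are assumptions for illustration). A breadth-first schedule of a binary fork tree keeps an entire frontier of tasks live at once, while a depth-first (sequential-order) schedule keeps only one root-to-leaf path live:

```python
from collections import deque

def peak_live_tasks(depth, breadth_first):
    """Simulate scheduling a complete binary fork tree of the given depth.

    Each task either spawns two children (internal node) or finishes (leaf).
    We track the peak number of simultaneously live tasks, a proxy for the
    space the scheduler must hold.
    """
    frontier = deque([depth])  # each entry: remaining depth of a live task
    peak = 1
    while frontier:
        # breadth-first pops the oldest task (queue); depth-first the newest (stack)
        d = frontier.popleft() if breadth_first else frontier.pop()
        if d > 0:              # internal node: spawn two subtasks
            frontier.append(d - 1)
            frontier.append(d - 1)
        peak = max(peak, len(frontier))
    return peak

print(peak_live_tasks(10, breadth_first=True))   # 1024: frontier grows like 2^depth
print(peak_live_tasks(10, breadth_first=False))  # 11: stack stays at depth + 1
```

Here breadth-first peaks at 2^10 = 1024 live tasks while depth-first peaks at 11, which is the intuition behind basing parallel schedules on a sequential depth-first order.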

### Citations

9158 | Introduction to Algorithms - Cormen, Leiserson, et al. - 1998 |

973 | High Performance Fortran Forum. High Performance Fortran language specification, version 1.0 - 1993 |
Citation context: "...tions are ignored. Many high-level parallel programming languages encourage the use of dynamic fine-grained parallelism. Such languages include both data-parallel languages such as HPF [Hig93] and Nesl [BCH+94], as well as control-parallel languages such as ID [ANP89], Sisal [FCO90], or Proteus [MNP+90]. The goal of these languages is to have the user expose the full parallelism in an a..."

673 | An Introduction to Parallel Algorithms - JáJá - 1992 |

587 | Cilk: An Efficient Multithreaded Runtime System - Blumofe, Joerg, et al. - 1995 |

477 | Multilisp -- A Language for Concurrent Symbolic Computation - Halstead - 1985 |

441 | Bounds on multiprocessing timing anomalies - Graham - 1969 |

433 | Scheduling multithreaded computations by work stealing - Blumofe, Leiserson - 1994 |

343 | Bounds for certain multiprocessing anomalies - Graham - 1966 |

281 | Parallel prefix computation - Ladner, Fischer - 1980 |
Citation context: "...led, even before any space has been allocated for the edges. Our scheduling algorithm performs a constant number of EREW PRAM [JáJ92] steps and a constant number of parallel prefix-sums computations [LF80] for each round of scheduling. 5.1 A stack-based scheduling algorithm: We will use the following property of 1-dfts on series-parallel dags. Fact 5.1: Consider a 1-dft of a series-parallel dag G, and le..."
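The prefix-sums (scan) primitive invoked in that context is simple enough to sketch. The following illustrative Python (the `exclusive_prefix_sums` helper and the spawn-count use case are assumptions, not the paper's code) shows the allocation idiom a round-based scheduler typically builds on; on a PRAM the same computation runs work-efficiently in O(lg p) time:

```python
def exclusive_prefix_sums(xs):
    """Exclusive prefix sums: out[i] = xs[0] + ... + xs[i-1].

    Shown sequentially for clarity; a PRAM scan computes the same result
    in O(lg p) time with linear work.
    """
    out, total = [], 0
    for x in xs:
        out.append(total)   # offset of element i = sum of all earlier counts
        total += x
    return out, total

# Typical scheduling use: each live task declares how many children it spawns;
# prefix sums assign each child a distinct slot in the next round's task array.
spawn_counts = [2, 0, 1, 3]
offsets, n_children = exclusive_prefix_sums(spawn_counts)
print(offsets, n_children)  # [0, 2, 2, 3] 6
```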

270 | Vector Models for Data-Parallel Computing - Blelloch - 1990 |

247 | The parallel evaluation of general arithmetic expressions - Brent - 1974 |

241 | I-Structures: Data Structures for Parallel Computing - Arvind, Pingali - 1987 |

208 | General Purpose Parallel Architectures - Valiant - 1990 |

203 | Programming parallel algorithms - Blelloch - 1992 |

200 | How to emulate shared memory - Ranade - 1991 |

184 | Implementation of a portable nested dataparallel language - Blelloch, Hardwick, et al. - 1993 |

183 | Computer and JobShop Scheduling Theory - Coffman - 1976 |

164 | Scans as primitive parallel operations - Blelloch - 1989 |
Citation context: "...-parallel language which does w work, has depth d, uses sequential space s1, and allocates at most O(w) space. This computation can be implemented on a PRAM with prefix-sums (i.e., on the scan model [Ble89]) in O(w/p + d) time and s1 + O(p·d) space, accounting for all computation, scheduling and synchronization costs. Since a prefix-sum can be implemented work-efficiently in O(lg p) time on any of ..."
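Plugging hypothetical numbers into the stated bounds shows how small the parallel overheads are (the specific values of w, d, s1, and p below are assumptions for illustration, and the big-O constants are taken as 1):

```python
w, d, s1, p = 10**9, 10**3, 10**6, 100

time_bound  = w // p + d   # O(w/p + d): near-perfect speedup term plus depth term
space_bound = s1 + p * d   # s1 + O(p*d): sequential space plus parallel overhead

print(time_bound)    # 10_001_000 steps, vs. 10^9 sequentially
print(space_bound)   # 1_100_000 words, vs. a naive factor-p blowup of p*s1 = 10^8
```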

156 | Programming with Sets: An Introduction to SETL - Schwartz, Dewar, et al. - 1986 |

149 | Synthesis of Parallel Algorithms - Reif - 1993 |

142 | Towards an Architecture-Independent Analysis of Parallel Algorithms - Papadimitriou, Yannakakis - 1990 |

108 | An analysis of dag-consistent distributed shared-memory algorithms - Blumofe, Frigo, et al. - 1996 |

104 | A report on the SISAL language project - Feo, Cann, et al. - 1990 |

100 | An overview of the PTRAN analysis system for multiprocessing - Allen, Burke, et al. - 1987 |

95 | NESL: A nested data-parallel language (version 3.1) - Blelloch - 1995 |
Citation context: "...rather than our s1 + O(p·D). Provable time bounds for mapping nested data-parallel languages onto the PRAM were considered in [Ble90]. These results were used for implementing the Nesl language [Ble93], but the time bounds are restricted to a class of programs that are called contained. With this implementation, a contained Nesl program with W work and D depth runs in O(W/p + D lg p) time on a p-proce..."

89 | Randomized routing and sorting on fixed-connection networks - Leighton, Maggs, et al. - 1994 |

88 | Space-efficient scheduling of multithreaded computations - Blumofe, Leiserson - 1998 |

84 | Executing functional programs on a virtual tree of processors - Burton, Sleep - 1981 |

80 | The incremental garbage collection of processes - Baker, Hewitt - 1977 |

73 | A provable time and space efficient implementation of NESL - Blelloch, Greiner - 1996 |

73 | Recursive star-tree parallel data structure - Berkman, Vishkin - 1993 |

68 | Resource Requirements of Dataflow Programs - Culler, Arvind - 1988 |

57 | On time versus space - Hopcroft, Paul, et al. - 1977 |

56 | The Paralation Model: Architecture-Independent Parallel Programming - Sabot - 1988 |

45 | Towards a theory of nearly constant time parallel algorithms - Gil, Matias, et al. - 1991 |

45 | Converting high probability into nearly-constant time, with applications to parallel hashing - Matias, Vishkin - 1991 |

37 | Low-overhead scheduling of nested parallelism - Hummel, Schonberg - 1991 |

35 | Control of parallelism in the Manchester dataflow machine - Ruggiero, Sargeant - 1987 |

31 | Parallel dictionaries on 2-3 trees - Paul, Vishkin, et al. - 1983 |
Citation context: "...n at each step space linear in the number of representatives in the S data structure, or in the number of program variables, can be implemented with p processors and logarithmic time on the EREW PRAM [PVW83], and in O(lg p) time and linear work with high probability on a CRCW PRAM [GMV91]. In a recent work [BGM95], we have developed data structures that are weaker than the general dictionary data struct..."

29 | Fast parallel space allocation, estimation and integer sorting - Bast, Hagerup - 1995 |

29 | A foundation for an efficient multi-threaded scheme system - Jagannathan, Philbin - 1992 |

28 | Prototyping parallel and distributed programs - Mills, Nyland, et al. - 1991 |

27 | Space-efficient scheduling of parallelism with synchronization variables - Blelloch, Gibbons, et al. - 1997 |

23 | Dynamic processor self-scheduling for general parallel nested loops - Fang, Tang, et al. - 1990 |

23 | Using approximation algorithms to design parallel algorithms that may ignore processor allocation - Goodrich - 1991 |

22 | A communication-time tradeoff - Papadimitriou, Ullman - 1987 |

19 | Fast hashing on a PRAM - designing by expectation - Gil, Matias - 1991 |

19 | Speedups of deterministic machines by synchronous parallel machines - Dymond, Tompa - 1983 |


18 | Parallelism in sequential functional languages - Blelloch, Greiner - 1995 |