## A provable time and space efficient implementation of NESL (1996)

Venue: | International Conference on Functional Programming |

Citations: | 70 - 7 self |

### BibTeX

@INPROCEEDINGS{Blelloch96aprovable,

author = {Guy E. Blelloch and John Greiner},

title = {A provable time and space efficient implementation of {NESL}},

booktitle = {International Conference on Functional Programming},

year = {1996}

}

### Abstract

In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed λ-calculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard call-by-value operational semantics to return two cost measures: a DAG representing the sequential dependence in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p-processor butterfly network, hypercube, or CRCW PRAM using O(w/p + d log p) time and O(s + dp log p) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linear speedup and use space within a constant factor of the sequential space.
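
As a rough illustration of the two cost measures, the following Python sketch computes work and depth for a small dependence DAG and evaluates the expressions inside the stated bounds. The dictionary representation of the DAG, the function names, the base-2 logarithm, and the omission of constant factors are all assumptions made for this example; none of them come from the paper.

```python
import math
from functools import lru_cache


def work_and_depth(dag):
    """Return (work, depth) for a DAG given as {node: [predecessor nodes]}."""
    work = len(dag)  # work = number of nodes in the DAG

    @lru_cache(maxsize=None)
    def level(node):
        # A node sits one level below its deepest predecessor.
        return 1 + max((level(p) for p in dag[node]), default=0)

    depth = max((level(n) for n in dag), default=0)  # depth = number of levels
    return work, depth


def time_bound(w, d, p):
    """Expression inside O(w/p + d log p); constants dropped (assumption)."""
    return w / p + d * math.log2(p)


def space_bound(s, d, p):
    """Expression inside O(s + d p log p); constants dropped (assumption)."""
    return s + d * p * math.log2(p)


# Hypothetical DAG: a fork, four independent tasks, and a join.
example = {
    "fork": [],
    "a0": ["fork"], "a1": ["fork"], "a2": ["fork"], "a3": ["fork"],
    "join": ["a0", "a1", "a2", "a3"],
}
w, d = work_and_depth(example)
print(w, d)                    # 6 3
print(time_bound(w, d, 4))     # 7.5
print(space_bound(10, d, 4))   # 34.0
```

When w/p dominates d log p, the time expression is close to w/p, which is the linear-speedup regime the abstract refers to.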

### Citations

946 |
High Performance Fortran language specification, version 2.0
- High Performance Fortran Forum
- 1997
Citation Context ...his allows for the nesting of parallel calls. Such nested parallelism is crucial for expressing parallel divide-and-conquer or nested parallel loops (most data-parallel languages don't permit nesting [19, 28]). For the purpose of analyzing algorithms, the definition of Nesl includes rules for calculating the work and depth of a computation. These rules specify the work and depth for the primitives, and ho... |

636 |
An Introduction to Parallel Algorithms
- JáJá
- 1992
Citation Context ...ng the work (the total number of operations executed) and depth (the longest chain of sequential dependences) of a computation. These are standard cost measures in the analysis of parallel algorithms [23, 22]. In this paper we formalize these rules and give provable implementation bounds for both space and time. 1 The implementation is based on a randomized algorithm for the fetch-and-add operator and wil... |

260 |
Vector Models for Data-Parallel Computing
- Blelloch
- 1990
Citation Context ... that the implementation discussed in this paper does not correspond to the current implementation of Nesl [8]. The current implementation is based on a technique called flattening nested parallelism [5], which has very good performance characteristics, but can be space inefficient because it generates too much parallelism. For example the Nesl code --count(--a ! b: a in s) : b in s, which calculates... |

192 | Programming parallel algorithms
- Blelloch
- 1996
Citation Context ... [6]. Nesl is a strongly typed call-by-value functional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type, a parallel map operation, and a small set of primitive operati... |

176 | Implementation of a portable nested data-parallel language
- Blelloch, Chatterjee, et al.
- 1993
Citation Context ...ient implementation of the parallel programming language Nesl [6]. Nesl is a strongly typed call-by-value functional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type... |

103 |
Parallel algorithms for shared memory machines
- Karp, Ramachandran
- 1990
Citation Context ...ng the work (the total number of operations executed) and depth (the longest chain of sequential dependences) of a computation. These are standard cost measures in the analysis of parallel algorithms [23, 22]. In this paper we formalize these rules and give provable implementation bounds for both space and time. 1 The implementation is based on a randomized algorithm for the fetch-and-add operator and wil... |

95 | NESL: A Nested Data-Parallel Language (version 2.6)
- Blelloch
- 1993
Citation Context ...speedup and use space within a constant factor of the sequential space. 1 Introduction This paper presents a provably time and space efficient implementation of the parallel programming language Nesl [6]. Nesl is a strongly typed call-by-value functional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, ... |

90 | Abstract models of memory management
- Morrisett, Felleisen, et al.
- 1995
Citation Context ... language to those in machine models. There have been a handful of studies that use semantics to model the reachable space of sequential computations, in the context of both garbage collection (e.g., [25]) and copy avoidance (e.g., [20]). None of this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds... |

89 | Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors
- Gottlieb, Lubachevsky, et al.
- 1983
Citation Context ...n the fetch-and-add operation for all three machines, we process p log p states on each step instead of just p (i.e., we use a P-CEK(p log p) machine). Our simulation uses the fetch-and-add operation [16] (or multiprefix [26]). In this operation, each processor has an address and an integer value i. In parallel all processors can atomically fetch the value from the address while incrementing the value... |

81 | Space-efficient scheduling of multithreaded computations
- Blumofe, Leiserson
- 1998
Citation Context ... this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds on implementations of parallel languages [11, 10, 12, 2]. For a shared memory model, which is required to efficiently simulate the λ-calculus because of shared pointers, the best results are those by Blelloch, Gibbons, and Matias [2], which are the results ... |

80 | Provably efficient scheduling for languages with fine-grained parallelism
- Blelloch, Gibbons, et al.
- 1995
Citation Context ...ams with sufficient parallelism, the parallel execution requires very little extra memory beyond a standard call-by-value sequential execution. These space bounds use recent results on DAG scheduling [2] and are nontrivial. Although we use these extensions to prove bounds for Nesl, the techniques and results can be applied in a broader context. In particular we translate Nesl into a generic array la... |

77 |
Automatic complexity analysis
- Rosendahl
- 1989
Citation Context ...icient parallel implementations of the λ-calculus using both call-by-value [3] and speculative parallelism [18]. These results accounted for work and depth of a computation using a profiling semantics [29, 30] and then related work and depth to running time on various machine models. This paper applies these ideas to the language Nesl and extends the work in two ways. First, it includes sequences (arrays) ... |

66 | C*: An extended C language for data parallel programming - Rose, Steele - 1987 |

58 | A Cost Calculus for Parallel Functional Programming
- Skillicorn, Cai
- 1995
Citation Context ...There are w/q + d steps, and each step takes O(log p) time. 2 7 Related Work and Discussion Several researchers have used cost-augmented semantics for automatic time analysis or definitional purposes [29, 30, 31, 27, 33, 14]. Hudak and Anderson [21] used partially ordered multisets (pomsets) to model the dependences in various implementations of the λ-calculus. Because of the relationship between partially ordered sets an... |

49 | The semantics of future and its use in program optimization
- Flanagan, Felleisen
- 1995
Citation Context ...There are w/q + d steps, and each step takes O(log p) time. 2 7 Related Work and Discussion Several researchers have used cost-augmented semantics for automatic time analysis or definitional purposes [29, 30, 31, 27, 33, 14]. Hudak and Anderson [21] used partially ordered multisets (pomsets) to model the dependences in various implementations of the λ-calculus. Because of the relationship between partially ordered sets an... |

48 | Parallel Programming using Functional Languages (Report CSC 91/R3)
- Roe
- 1991
Citation Context ...There are w/q + d steps, and each step takes O(log p) time. 2 7 Related Work and Discussion Several researchers have used cost-augmented semantics for automatic time analysis or definitional purposes [29, 30, 31, 27, 33, 14]. Hudak and Anderson [21] used partially ordered multisets (pomsets) to model the dependences in various implementations of the λ-calculus. Because of the relationship between partially ordered sets an... |

31 |
Calculi for Time Analysis of Functional Programs
- Sands
- 1990
Citation Context ...icient parallel implementations of the λ-calculus using both call-by-value [3] and speculative parallelism [18]. These results accounted for work and depth of a computation using a profiling semantics [29, 30] and then related work and depth to running time on various machine models. This paper applies these ideas to the language Nesl and extends the work in two ways. First, it includes sequences (arrays) ... |

26 |
A semantic model of reference counting and its abstraction (detailed summary)
- Hudak
- 1986
Citation Context ...dels. There have been a handful of studies that use semantics to model the reachable space of sequential computations, in the context of both garbage collection (e.g., [25]) and copy avoidance (e.g., [20]). None of this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds on implementations of parallel ... |

25 | On parallel hashing and integer sorting
- Matias, Vishkin
- 1990
Citation Context ...r first. The stable fetch-and-add operation can be implemented in a butterfly or hypercube network by combining requests as they go through the network [26], and on a PRAM by various other techniques [24, 15]. For all machines, if each processor makes up to m fetch-and-add requests, all requests can be processed in O(m + log p) time with high probability (the bounds can be slightly improved on the CRCW PRA... |

24 |
A calculus for assignments in higher-order languages
- Felleisen, Friedman
- 1987
Citation Context ...l role in proving the space bounds. Picking an arbitrary set of q states would not be space efficient. The P-CEK(1) machine (q = 1) is a sequential machine and closely corresponds to the CESK machine [13]. Section 5 defines the P-CEK(q) machine. Results. Our results are derived, by mapping the CoreNesl costs first to the array language, then to the P-CEK(q) machine and finally to the target machines. ... |

18 |
C*: An extended C language for data parallel programming
- Rose, Steele
- 1987
Citation Context ...his allows for the nesting of parallel calls. Such nested parallelism is crucial for expressing parallel divide-and-conquer or nested parallel loops (most data-parallel languages don't permit nesting [19, 28]). For the purpose of analyzing algorithms, the definition of Nesl includes rules for calculating the work and depth of a computation. These rules specify the work and depth for the primitives, and ho... |

17 | A provably time-efficient parallel implementation of full speculation
- Greiner, Blelloch
- 1999
Citation Context ...bout the performance of the implementation. In previous work we have studied provably time efficient parallel implementations of the λ-calculus using both call-by-value [3] and speculative parallelism [18]. These results accounted for work and depth of a computation using a profiling semantics [29, 30] and then related work and depth to running time on various machine models. This paper applies these i... |

16 | Developing a Practical Projection-Based Parallel Delaunay Algorithm
- Blelloch, Miller, et al.
- 1996
Citation Context ...unctional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type, a parallel map operation, and a small set of primitive operations on sequences. To be useful for analyzing paral... |

16 | Parallelism in sequential functional languages
- Blelloch, Greiner
- 1995
Citation Context ...defined and to make guarantees about the performance of the implementation. In previous work we have studied provably time efficient parallel implementations of the λ-calculus using both call-by-value [3] and speculative parallelism [18]. These results accounted for work and depth of a computation using a profiling semantics [29, 30] and then related work and depth to running time on various machine m... |

16 | A comparison of parallel algorithms for connected components
- Greiner
- 1994
Citation Context ...unctional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type, a parallel map operation, and a small set of primitive operations on sequences. To be useful for analyzing paral... |

16 |
Pomset interpretations of parallel functional programs
- Hudak, Anderson
- 1987
Citation Context ... work of the computation. Although the operational semantics itself is sequential, the rules for combining DAGs explicitly define what constructs are parallel. This is similar to Hudak and Anderson's [21] use of pomsets (partially ordered multisets) to add intensional information on execution order to the denotational semantics of the λ-calculus. The second cost measure is an accounting of the reachabl... |

15 | Efficient compilation of high-level data parallel algorithms
- Suciu, Tannen
- 1994
Citation Context ... [6], but the time bounds are restricted to a class of programs that are called contained. Similar results were shown by Suciu and Tannen for a parallel language based on while loops and map recursion [32]. Practical issues. The design of the intermediate language and machine were optimized to simplify the proofs rather than for practical considerations. Here we briefly discuss some modifications to ma... |

14 |
Fluent Parallel Computation
- Ranade
- 1989
Citation Context ...peration for all three machines, we process p log p states on each step instead of just p (i.e., we use a P-CEK(p log p) machine). Our simulation uses the fetch-and-add operation [16] (or multiprefix [26]). In this operation, each processor has an address and an integer value i. In parallel all processors can atomically fetch the value from the address while incrementing the value by i. In our case it... |

13 |
Storage management in virtual tree machines
- Burton
- 1988
Citation Context ... this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds on implementations of parallel languages [11, 10, 12, 2]. For a shared memory model, which is required to efficiently simulate the λ-calculus because of shared pointers, the best results are those by Blelloch, Gibbons, and Matias [2], which are the results ... |

13 |
Space efficient execution of deterministic parallel programs
- Simpson, Burton
- 1999
Citation Context ... this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds on implementations of parallel languages [11, 10, 12, 2]. For a shared memory model, which is required to efficiently simulate the λ-calculus because of shared pointers, the best results are those by Blelloch, Gibbons, and Matias [2], which are the results ... |

6 | Class notes: Programming parallel algorithms
- Blelloch, Hardwick
- 1993
Citation Context ... [6]. Nesl is a strongly typed call-by-value functional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type, a parallel map operation, and a small set of primitive operati... |

5 |
Fast and efficient simulations among CRCW PRAMs
- Gil, Matias
- 1994
Citation Context ...r first. The stable fetch-and-add operation can be implemented in a butterfly or hypercube network by combining requests as they go through the network [26], and on a PRAM by various other techniques [24, 15]. For all machines, if each processor makes up to m fetch-and-add requests, all requests can be processed in O(m + log p) time with high probability (the bounds can be slightly improved on the CRCW PRA... |

5 |
Complexity issues in the design of functional languages with explicit parallelism
- Zimmermann
- 1992 |

4 | A comparison of two n-body algorithms - Blelloch, Narlikar - 1994 |

1 | A comparison of two n-body algorithms - Blelloch, Narlikar - 1994 |

1 | Class notes: Programming parallel algorithms - Blelloch, Hardwick - 1993 |

1 | Consider a traversal T - Springer-Verlag - 1989 |
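
The citation context for Gottlieb, Lubachevsky, et al. above describes fetch-and-add: each processor supplies an address and an integer i, and atomically receives the old value at that address while the stored value grows by i. A minimal sequential sketch of one such step follows. Serving the requests in processor order is an assumption that mimics the stable variant mentioned in another context; the function name and the dictionary used as memory are also hypothetical. The implementations cited in the paper instead combine requests in a network or on a PRAM.

```python
def fetch_and_add_step(memory, requests):
    """Simulate one parallel fetch-and-add step sequentially.

    memory:   dict mapping address -> integer value (missing addresses read as 0)
    requests: requests[k] = (address, increment) issued by processor k
    Returns the value each processor fetched, in processor order.
    """
    fetched = []
    for address, increment in requests:
        old = memory.get(address, 0)
        fetched.append(old)                 # processor k sees the value before its own add
        memory[address] = old + increment   # the add becomes visible to later processors
    return fetched


memory = {"x": 10}
results = fetch_and_add_step(memory, [("x", 1), ("x", 2), ("y", 5), ("x", 3)])
print(results)  # [10, 11, 0, 13]
print(memory)   # {'x': 16, 'y': 5}
```

For requests that target the same address, the fetched values form an exclusive prefix sum of the increments, offset by the initial contents.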