## A Provably Time and Space Efficient Implementation of NESL (1996)

Venue: | In International Conference on Functional Programming |

Citations: | 84 - 10 self |

### Citations

997 |
High Performance Fortran language specification, version 1.0
- High Performance Fortran Forum
- 1993
Citation Context ...his allows for the nesting of parallel calls. Such nested parallelism is crucial for expressing parallel divide-and-conquer or nested parallel loops (most data-parallel languages don't permit nesting [19, 28]). For the purpose of analyzing algorithms, the definition of Nesl includes rules for calculating the work and depth of a computation. These rules specify the work and depth for the primitives, and ho... |

716 |
An Introduction to Parallel Algorithms.
- JaJa
- 1992
Citation Context ...ng the work (the total number of operations executed) and depth (the longest chain of sequential dependences) of a computation. These are standard cost measures in the analysis of parallel algorithms [23, 22]. In this paper we formalize these rules and give provable implementation bounds for both space and time. 1 The implementation is based on a randomized algorithm for the fetch-and-add operator and wil... |
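The two cost measures named in this citation context can be illustrated on a small dependence graph. The following is a minimal sketch, assuming every operation costs one unit and that the computation is given as a dictionary mapping each operation to the operations it depends on; the function name `work_and_depth` and the input encoding are hypothetical and do not come from the paper.

```python
from functools import lru_cache


def work_and_depth(deps):
    """Return (work, depth) for a DAG of unit-cost operations.

    `deps` maps each operation name to the list of operations it
    depends on (an assumed encoding, chosen for illustration).
    """
    # Work: the total number of operations executed.
    work = len(deps)

    @lru_cache(maxsize=None)
    def depth_of(node):
        # Length of the longest dependence chain ending at `node`.
        return 1 + max((depth_of(d) for d in deps[node]), default=0)

    # Depth: the longest chain of sequential dependences overall.
    depth = max((depth_of(n) for n in deps), default=0)
    return work, depth
```

For example, with `deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}`, operations `a` and `b` are independent, so the call returns a work of 4 and a depth of 3.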

300 |
Vector Models for Data-Parallel Computing
- Blelloch
- 1990
Citation Context ... that the implementation discussed in this paper does not correspond to the current implementation of Nesl [8]. The current implementation is based on a technique called flattening nested parallelism [5], which has very good performance characteristics, but can be space inefficient because it generates too much parallelism. For example the Nesl code --count(--a ! b: a in s) : b in s, which calculates... |

235 | Parallel Algorithms
- Blelloch, Maggs
- 1996
Citation Context ... [6]. Nesl is a strongly typed call-by-value functional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type, a parallel map operation, and a small set of primitive operati... |

203 | Implementation of a portable nested data-parallel language.
- Blelloch, Hardwick, et al.
- 1993
Citation Context ...ient implementation of the parallel programming language Nesl [6]. Nesl is a strongly typed call-by-value functional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type... |

109 | NESL: A nested data-parallel language (version 3.1)
- Blelloch
- 1995
Citation Context ...speedup and use space within a constant factor of the sequential space. 1 Introduction This paper presents a provably time and space efficient implementation of the parallel programming language Nesl [6]. Nesl is a strongly typed call-by-value functional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, ... |

109 | Space-efficient scheduling of multithreaded computations.
- Blumofe, Leiserson
- 1998
Citation Context ... this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds on implementations of parallel languages [11, 10, 12, 2]. For a shared memory model, which is required to efficiently simulate the λ-calculus because of shared pointers, the best results are those by Blelloch, Gibbons, and Matias [2], which are the results ... |

101 |
Parallel algorithms for shared memory machines. In Handbook of Theoretical Computer Science
- Karp, Ramachandran
- 1988
Citation Context ...ng the work (the total number of operations executed) and depth (the longest chain of sequential dependences) of a computation. These are standard cost measures in the analysis of parallel algorithms [23, 22]. In this paper we formalize these rules and give provable implementation bounds for both space and time. 1 The implementation is based on a randomized algorithm for the fetch-and-add operator and wil... |

94 | Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors.
- Gottlieb, Lubachevsky, et al.
- 1983
Citation Context ...n the fetch-and-add operation for all three machines, we process p log p states on each step instead of just p (i.e., we use a P-CEK(p log p) machine). Our simulation uses the fetch-and-add operation [16] (or multiprefix [26]). In this operation, each processor has an address and an integer value i. In parallel all processors can atomically fetch the value from the address while incrementing the value... |
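The fetch-and-add semantics quoted in this context (each processor atomically reads the value at an address and increments it by its own integer) can be sketched as follows. This is only an illustration of the operation's behavior, assuming threads stand in for processors and a lock provides atomicity; it is not the randomized or network-combining implementation the paper relies on, and the names `FetchAndAddCell` and `run_demo` are hypothetical.

```python
import threading


class FetchAndAddCell:
    """A single shared memory cell supporting atomic fetch-and-add."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def fetch_and_add(self, i):
        # Atomically return the old value and add i to the cell.
        with self._lock:
            old = self._value
            self._value += i
            return old

    def read(self):
        with self._lock:
            return self._value


def run_demo(increments):
    """Issue one fetch-and-add per simulated processor on a shared cell."""
    cell = FetchAndAddCell(0)
    fetched = [None] * len(increments)

    def worker(k):
        fetched[k] = cell.fetch_and_add(increments[k])

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(len(increments))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.read(), fetched
```

Calling `run_demo([1, 2, 3, 4])` leaves the cell holding 10. The fetched values are the running totals seen by each thread in whatever order the threads happened to acquire the lock, so exactly one thread observes the initial value 0.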

94 | Automatic complexity analysis
- Rosendahl
- 1989
Citation Context ...icient parallel implementations of the λ-calculus using both call-by-value [3] and speculative parallelism [18]. These results accounted for work and depth of a computation using a profiling semantics [29, 30] and then related work and depth to running time on various machine models. This paper applies these ideas to the language Nesl and extends the work in two ways. First, it includes sequences (arrays) ... |

92 | Provably efficient scheduling for languages with fine-grained parallelism.
- Blelloch, Gibbons, et al.
- 1995
Citation Context ...ams with sufficient parallelism, the parallel execution requires very little extra memory beyond a standard call-by-value sequential execution. These space bounds use recent results on DAG scheduling [2] and are nontrivial. Although we use these extensions to prove bounds for Nesl, the techniques and results can be applied in a broader context. In particular we translate Nesl into a generic array la... |

88 | Abstract models of memory management
- Morrisett, Felleisen, et al.
- 1995
Citation Context ... language to those in machine models. There have been a handful of studies that use semantics to model the reachable space of sequential computations, in the context of both garbage collection (e.g., [25]) and copy avoidance (e.g., [20]). None of this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds... |

69 | C*: An extended C language for data parallel programming - Rose, Steele Jr. - 1987 |

61 | A cost calculus for parallel functional programming
- Skillicorn, Cai
Citation Context ...There are w/q + d steps, and each step takes O(log p) time. 2 7 Related Work and Discussion Several researchers have used cost-augmented semantics for automatic time analysis or definitional purposes [29, 30, 31, 27, 33, 14]. Hudak and Anderson [21] used partially ordered multisets (pomsets) to model the dependences in various implementations of the λ-calculus. Because of the relationship between partially ordered sets an... |

58 | The semantics of future and its use in program optimization
- Flanagan, Felleisen
- 1995
Citation Context ...There are w/q + d steps, and each step takes O(log p) time. 2 7 Related Work and Discussion Several researchers have used cost-augmented semantics for automatic time analysis or definitional purposes [29, 30, 31, 27, 33, 14]. Hudak and Anderson [21] used partially ordered multisets (pomsets) to model the dependences in various implementations of the λ-calculus. Because of the relationship between partially ordered sets an... |

54 | Parallel programming using functional languages
- Roe
- 1991
Citation Context ...There are w/q + d steps, and each step takes O(log p) time. 2 7 Related Work and Discussion Several researchers have used cost-augmented semantics for automatic time analysis or definitional purposes [29, 30, 31, 27, 33, 14]. Hudak and Anderson [21] used partially ordered multisets (pomsets) to model the dependences in various implementations of the λ-calculus. Because of the relationship between partially ordered sets an... |

38 |
A calculus for assignments in higher-order languages
- Felleisen, Friedman
- 1987
Citation Context ...l role in proving the space bounds. Picking an arbitrary set of q states would not be space efficient. The P-CEK(1) machine (q = 1) is a sequential machine and closely corresponds to the CESK machine [13]. Section 5 defines the P-CEK(q) machine. Results. Our results are derived by mapping the CoreNesl costs first to the array language, then to the P-CEK(q) machine and finally to the target machines. ... |

34 |
Calculi for time analysis of functional programs
- Sands
- 1990
Citation Context ...icient parallel implementations of the λ-calculus using both call-by-value [3] and speculative parallelism [18]. These results accounted for work and depth of a computation using a profiling semantics [29, 30] and then related work and depth to running time on various machine models. This paper applies these ideas to the language Nesl and extends the work in two ways. First, it includes sequences (arrays) ... |

32 |
A semantic model of reference counting and its abstraction (detailed summary)
- Hudak
- 1986
Citation Context ...dels. There have been a handful of studies that use semantics to model the reachable space of sequential computations, in the context of both garbage collection (e.g., [25]) and copy avoidance (e.g., [20]). None of this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds on implementations of parallel ... |

31 | On parallel hashing and integer sorting
- Matias, Vishkin
- 1991
Citation Context ...r first. The stable fetch-and-add operation can be implemented in a butterfly or hypercube network by combining requests as they go through the network [26], and on a PRAM by various other techniques [24, 15]. For all machines, if each processor makes up to m fetch-and-add requests, all requests can be processed in O(m + log p) time with high probability (the bounds can be slightly improved on the CRCW PRA... |

23 | A comparison of parallel algorithms for connected components
- Greiner
- 1994
Citation Context ...unctional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type, a parallel map operation, and a small set of primitive operations on sequences. To be useful for analyzing paral... |

22 | Parallelism in sequential functional languages
- Blelloch, Greiner
- 1995
Citation Context ...defined and to make guarantees about the performance of the implementation. In previous work we have studied provably time efficient parallel implementations of the λ-calculus using both call-by-value [3] and speculative parallelism [18]. These results accounted for work and depth of a computation using a profiling semantics [29, 30] and then related work and depth to running time on various machine m... |

19 |
Pomset Interpretations of Parallel Functional Programs
- Hudak, Anderson
- 1987
Citation Context ... work of the computation. Although the operational semantics itself is sequential, the rules for combining DAGs explicitly define what constructs are parallel. This is similar to Hudak and Anderson's [21] use of pomsets (partially ordered multisets) to add intensional information on execution order to the denotational semantics of the λ-calculus. The second cost measure is an accounting of the reachabl... |

18 | Developing a practical projection-based parallel Delaunay algorithm
- Blelloch, Miller, et al.
- 1996
Citation Context ...unctional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type, a parallel map operation, and a small set of primitive operations on sequences. To be useful for analyzing paral... |

18 | A provably time-efficient parallel implementation of full speculation
- Greiner, Blelloch
- 1999
Citation Context ...bout the performance of the implementation. In previous work we have studied provably time efficient parallel implementations of the λ-calculus using both call-by-value [3] and speculative parallelism [18]. These results accounted for work and depth of a computation using a profiling semantics [29, 30] and then related work and depth to running time on various machine models. This paper applies these i... |

17 |
Fluent parallel computation
- Ranade
- 1989
Citation Context ...peration for all three machines, we process p log p states on each step instead of just p (i.e., we use a P-CEK(p log p) machine). Our simulation uses the fetch-and-add operation [16] (or multiprefix [26]). In this operation, each processor has an address and an integer value i. In parallel all processors can atomically fetch the value from the address while incrementing the value by i. In our case it... |

17 |
C*: An extended C language for data parallel programming
- Rose, Steele Jr.
- 1987
Citation Context ...his allows for the nesting of parallel calls. Such nested parallelism is crucial for expressing parallel divide-and-conquer or nested parallel loops (most data-parallel languages don't permit nesting [19, 28]). For the purpose of analyzing algorithms, the definition of Nesl includes rules for calculating the work and depth of a computation. These rules specify the work and depth for the primitives, and ho... |

16 | Efficient compilation of high-level data parallel algorithms.
- Suciu, Tannen
- 1994
Citation Context ... [6], but the time bounds are restricted to a class of program that are called contained. Similar results were shown by Suciu and Tannen for a parallel language based on while loops and map recursion [32]. Practical issues. The design of the intermediate language and machine were optimized to simplify the proofs rather than for practical considerations. Here we briefly discuss some modifications to ma... |

12 |
Storage management in virtual tree machines
- Burton
- 1988
Citation Context ... this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds on implementations of parallel languages [11, 10, 12, 2]. For a shared memory model, which is required to efficiently simulate the λ-calculus because of shared pointers, the best results are those by Blelloch, Gibbons, and Matias [2], which are the results ... |

11 |
Space efficient execution of deterministic parallel programs. Unpublished manuscript
- Burton, Simpson
- 1994
Citation Context ... this work, however, has considered the extra reachable space required by a parallel evaluation. There have been a sequence of studies that place space bounds on implementations of parallel languages [11, 10, 12, 2]. For a shared memory model, which is required to efficiently simulate the λ-calculus because of shared pointers, the best results are those by Blelloch, Gibbons, and Matias [2], which are the results ... |

6 | Class notes: Programming parallel algorithms
- Blelloch, Hardwick
- 1993
Citation Context ... [6]. Nesl is a strongly typed call-by-value functional language loosely based on ML. It has been implemented on several parallel machines [8], and has been used both for teaching parallel algorithms [9, 7], and implementing various applications [17, 4, 1]. The parallelism in the language is based on including a primitive sequence data type, a parallel map operation, and a small set of primitive operati... |

6 |
Complexity issues in the design of functional languages with explicit parallelism
- Zimmermann
- 1992 |

5 |
Fast and efficient simulations among CRCW PRAMs.
- Gil, Matias
- 1994
Citation Context ...r first. The stable fetch-and-add operation can be implemented in a butterfly or hypercube network by combining requests as they go through the network [26], and on a PRAM by various other techniques [24, 15]. For all machines, if each processor makes up to m fetch-and-add requests, all requests can be processed in O(m + log p) time with high probability (the bounds can be slightly improved on the CRCW PRA... |

4 | A comparison of two n-body algorithms - Blelloch, Narlikar - 1994 |

1 | A comparison of two n-body algorithms - Blelloch, Narlikar - 1994 |

1 | Class notes: Programming parallel algorithms - Blelloch, Hardwick - 1993 |

1 | Consider a traversal T - Springer-Verlag - 1989 |