Results 1 - 5 of 5
Transforming High-Level Data-Parallel Programs into Vector Operations
 Proceedings of Principles and Practice of Parallel Programming '93, ACM
, 1993
Abstract
Cited by 53 (21 self)
Fully parallel execution of a high-level data-parallel language based on nested sequences, higher-order functions and generalized iterators can be realized in the vector model using a suitable representation of nested sequences and a small set of transformational rules to distribute iterators through the constructs of the language.
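The flat "vector model" representation the abstract refers to can be illustrated with a small sketch (this is an illustration of the general technique, not the paper's actual code): a nested sequence becomes a flat data vector plus a segment descriptor of subsequence lengths, and a map distributes over the flat data with the descriptor untouched.

```python
# Sketch of the segmented (flat) representation of nested sequences
# used in vector-model execution. Illustrative only; names are not
# taken from the paper.

def to_segmented(nested):
    """Flatten a nested sequence into (segment lengths, flat data)."""
    segments = [len(s) for s in nested]
    data = [x for s in nested for x in s]
    return segments, data

def seg_map(f, segments, data):
    """Apply f elementwise over the flat data. The segment descriptor
    is unchanged, which is what lets a map over a nested sequence be
    executed as a single flat, fully parallel vector operation."""
    return segments, [f(x) for x in data]

segments, data = to_segmented([[1, 2], [], [3, 4, 5]])
assert (segments, data) == ([2, 0, 3], [1, 2, 3, 4, 5])
segments, doubled = seg_map(lambda x: 2 * x, segments, data)
assert doubled == [2, 4, 6, 8, 10]
```

Empty subsequences survive as zero-length segments, which is why the descriptor is kept separate from the data rather than encoded as delimiters.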
Experiences with parallel N-body simulation
, 2000
Abstract
Cited by 21 (0 self)
This paper describes our experiences developing high-performance code for astrophysical N-body simulations. Recent N-body methods are based on an adaptive tree structure. The tree must be built and maintained across physically distributed memory; moreover, the communication requirements are irregular and adaptive. Together with the need to balance the computational workload among processors, these issues pose interesting challenges and trade-offs for high-performance implementation. Our implementation was guided by the need to keep solutions simple and general. We use a technique for implicitly representing a dynamic global tree across multiple processors which substantially reduces the programming complexity as well as the performance overheads of distributed memory architectures. The contributions include methods to vectorize the computation and minimize communication time which are theoretically and experimentally justified. The code has been tested by varying the number and distribution of bodies on different configurations of the Connection Machine CM-5. The overall performance on instances with 10 million bodies is typically over 48 percent of the peak machine rate, which compares favorably with other approaches.
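One common way to represent a dynamic global tree implicitly, in the spirit the abstract describes, is to name each cell by a key derived from its path from the root, so that any processor can compute a cell's identity and owner without cross-processor pointers. The sketch below is a hypothetical illustration of that general idea, not the paper's actual scheme.

```python
# Hypothetical sketch: implicit octree cells named by path-derived keys.
# The key encodes the path of octant choices (0-7) from the root; a
# leading 1 bit keeps keys of different depths distinct.

def cell_key(path):
    """path: sequence of octant indices (0-7) from the root."""
    key = 1
    for octant in path:
        key = (key << 3) | octant
    return key

def owner(key, nprocs):
    """Any processor computes a cell's owner from the key alone,
    so no per-cell pointers need to cross processor boundaries."""
    return key % nprocs

k = cell_key([3, 5])          # root -> octant 3 -> octant 5
assert k == 0b1_011_101       # depth is recoverable from the bit length
assert owner(k, 16) == k % 16
```

Because keys are self-describing, inserting or deleting cells as the body distribution evolves never requires updating remote pointers, only recomputing ownership from the key.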
The Proteus System for the Development of Parallel Applications
, 1994
Abstract
Cited by 4 (2 self)
Target Language In our methodology we have identified a small set of specifications that comprise the abstract target language (ATL) of the refinement system. These are specifications of types such as arrays, lists, tuples, integers, characters, etc., that commonly appear in programming languages. The refinement expresses a system as a definitional extension of the ATL specs. Thus, by associating a model (a concrete type in a specific programming language) with each ATL specification, the complete system specification is compiled. 5.2.3 Proteus to DPL Translation The translation of Proteus to DPL consists of a series of major steps: 1. Expansion of iterator expressions into image and filter expressions. 2. Conversion to data-parallel form. 3. An interpretation of sequences into the nested sequence vocabulary of DPL. 4. Addition of storage management code. 5. Conversion into C. ...
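The first translation step above, expanding an iterator expression into a filter followed by an image (map), can be sketched as follows. The names are illustrative; this is not Proteus or DPL syntax, just the shape of the rewrite.

```python
# Sketch of step 1: an iterator expression of the general form
#   [ f(x) : x in xs | p(x) ]
# expands into a filter expression followed by an image (map) expression.

def expand_iterator(f, p, xs):
    filtered = [x for x in xs if p(x)]   # filter expression
    return [f(x) for x in filtered]      # image expression

# Squares of the even elements:
assert expand_iterator(lambda x: x * x,
                       lambda x: x % 2 == 0,
                       [1, 2, 3, 4]) == [4, 16]
```

Separating the filter from the image matters for the later steps: each of the two resulting expressions is a primitive that the data-parallel conversion (step 2) and the nested-sequence interpretation (step 3) already know how to handle.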
Chapter 5: The Proteus System for the Development of Parallel Applications
Abstract
In recent years technological advances have made the construction of large-scale parallel computers economically attractive. These machines have the potential to provide fast solutions to computationally demanding problems that arise in computational science, real-time control, computer simulation, large database manipulation and other areas. However, applications that exploit this performance potential have been slow to appear; such applications have proved exceptionally ...
by
Abstract
Given an ensemble of n bodies in space whose interaction is governed by a potential function, the N-body problem is to calculate the force on each body in the ensemble that results from its interaction with all other bodies. An efficient algorithm for this problem is critical in the simulation of molecular dynamics, turbulent fluid flow, intergalactic matter and other problems. The fast multipole algorithm (FMA) developed by Greengard approximates the solution with bounded error in time O(n). For nonuniform distributions of bodies, an adaptive variation of the algorithm is required to maintain this time complexity. The parallel execution of the FMA poses complex implementation issues in the decomposition of the problem over processors to reduce communication. As a result the 3D Adaptive FMA has, to our knowledge, never been implemented on a scalable parallel computer. This paper describes several variations on the parallel adaptive 3D FMA algorithm that are expressed using the data-parallel subset of the high-level parallel prototyping language Proteus. These formulations have implicit parallelism that is executed sequentially using the current Proteus execution system to yield some insight into the performance of the variations. Efforts underway will make it possible to directly generate vector code from the formulations, rendering them executable on a broad class of parallel computers.
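The problem statement in the abstract is the O(n^2) direct summation that the FMA approximates in O(n). A minimal direct-summation sketch, assuming a gravitational-style 1/r potential with unit constant (illustrative only; the paper's Proteus formulations are data-parallel, not this sequential form):

```python
# Direct O(n^2) pairwise force summation: the exact computation the
# fast multipole algorithm approximates with bounded error in O(n).

def direct_forces(pos, mass, G=1.0):
    """pos: list of [x, y, z]; mass: list of scalars.
    Returns the total force on each body from all other bodies."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx)
            coef = G * mass[i] * mass[j] / r2 ** 1.5  # G m_i m_j / r^3
            for k in range(3):
                forces[i][k] += coef * dx[k]
    return forces

# Two unit masses one unit apart attract each other with magnitude G = 1.
f = direct_forces([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
assert abs(f[0][0] - 1.0) < 1e-12 and abs(f[1][0] + 1.0) < 1e-12
```

The inner double loop is what makes direct summation quadratic; the FMA replaces distant body-body interactions with cell-level multipole expansions to bring the total work down to O(n).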