Results 1 - 10
of
29
Implementation of a Portable Nested Data-Parallel Language
- Journal of Parallel and Distributed Computing
, 1994
"... This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested dataparallel function calls. These features allow the concise description of parallel alg ..."
Abstract
-
Cited by 154 (26 self)
- Add to MetaCart
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested dataparallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line-fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
Exploiting Task and Data Parallelism on a Multicomputer
- In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, 1993
"... For many applications, achieving good performance on a private memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are ..."
Abstract
-
Cited by 86 (21 self)
- Add to MetaCart
For many applications, achieving good performance on a private memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are appropriate for a parallel system. Most existing compilers focus on only one of data parallelism and task parallelism. Therefore, to achieve the desired results, the programmer must separately program the data and task parallelism. We have taken a unified approach to exploiting both kinds of parallelism in a single framework with an existing language. This approach eases the task of programming and exposes the tradeoffs between data and task parallelism to the compiler. We have implemented a parallelizing Fortran compiler for the iWarp system based on this approach. We discuss the design of our compiler, and present performance results to validate our approach. 1 Introduction Many applicati...
Compiling Fortran 90D/HPF for distributed memory MIMD computers
- Journal of Parallel and Distributed Computing
, 1994
"... This paper describes the design of the Fortran90D/HPF compiler, a source-to-source parallel compiler for distributed memory systems being developed at Syracuse University. Fortran 90D/HPF is a data parallel language with special directives to specify data alignment and distributions. A systematic me ..."
Abstract
-
Cited by 41 (3 self)
- Add to MetaCart
This paper describes the design of the Fortran90D/HPF compiler, a source-to-source parallel compiler for distributed memory systems being developed at Syracuse University. Fortran 90D/HPF is a data parallel language with special directives to specify data alignment and distributions. A systematic methodology to process distribution directives of Fortran 90D/HPF is presented. Furthermore, techniques for data and computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. Finally, initial performance results for the compiler are presented. We believe that the methodology to process data distribution, computation partitioning, communication system design and the overall compiler design can be used by the implementors of compilers for HPF.
Fortran 90D/HPF Compiler for Distributed Memory MIMD Computers: Design, Implementation, and Performance Results
- In Proceedings of Supercomputing '93
, 1993
"... Fortran 90D/HPF is a data parallel language with special directives to enable users to specify data alignment and distributions. This paper describes the design and implementation of a Fortran90D/HPF compiler. Techniques for data and computation partitioning, communication detection and generation, ..."
Abstract
-
Cited by 37 (10 self)
- Add to MetaCart
Fortran 90D/HPF is a data parallel language with special directives to enable users to specify data alignment and distributions. This paper describes the design and implementation of a Fortran90D/HPF compiler. Techniques for data and computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. Finally, initial performance results for the compiler are presented which show that the code produced by the compiler is portable, yet efficient. We believe that the methodology to process data distribution, computation partitioning, communication system design and the overall compiler design can be used by the implementors of HPF compilers. This work was supported in part by NSF under CCR-9110812 (Center for Research on Parallel Computation) and DARPA under contract # DABT63-91-C-0028. The content of the information does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferr...
Mapping Algorithms and Software Environment for Data Parallel PDE . . .
- JOURNAL OF DISTRIBUTED AND PARALLEL COMPUTING
, 1994
"... We consider computations associated with data parallel iterative solvers used for the numerical solution of Partial Differential Equations (PDEs). The mapping of such computations into load balanced tasks requiring minimum synchronization and communication is a difficult combinatorial optimization p ..."
Abstract
-
Cited by 31 (19 self)
- Add to MetaCart
We consider computations associated with data parallel iterative solvers used for the numerical solution of Partial Differential Equations (PDEs). The mapping of such computations into load balanced tasks requiring minimum synchronization and communication is a difficult combinatorial optimization problem. Its optimal solution is essential for the efficient parallel processing of PDE computations. Determining data mappings that optimize a number of criteria, likeworkload balance, synchronization and local communication, often involves the solution of an NP-Complete problem. Although data mapping algorithms have been known for a few years there is lack of qualitative and quantitative comparisons based on the actual performance of the parallel computation. In this paper we present two new data mapping algorithms and evaluate them together with a large number of existing ones using the actual performance of data parallel iterative PDE solvers on the nCUBE II. Comparisons on the performance of data parallel iterative PDE solvers on medium and large scale problems demonstrate that some computationally inexpensive data block partitioning algorithms are as effective as the computationally expensive deterministic optimization algorithms. Also, these comparisons demonstrate that the existing approach in solving the data partitioning problem is inefficient for large scale problems. Finally, a software environment for the solution of the partitioning problem of data parallel iterative solvers is presented.
Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs
, 1994
"... For a wide variety of applications, both task and data parallelism must be exploited to achieve the best possible performance on a multicomputer. Recent research has underlined the importance of exploiting task and data parallelism in a single compilerframework, and such a compiler can map a single ..."
Abstract
-
Cited by 30 (7 self)
- Add to MetaCart
For a wide variety of applications, both task and data parallelism must be exploited to achieve the best possible performance on a multicomputer. Recent research has underlined the importance of exploiting task and data parallelism in a single compilerframework, and such a compiler can map a single source program in many different ways onto a parallel machine. The tradeoffs between task and data parallelism are complex and depend on the characteristics of the program to be executed, most significantly the memory and communication requirements, and the performance parameters of the target parallel machine. In this paper, we present a framework to isolate and examine the specific characteristics of programs that determine the performance for different mappings. Our focus is on applications that process a stream of input, and whose computation structure is fairly static and predictable. We describe three such applications that were developed with our compiler: fast Fourier transforms, nar...
A Compilation Approach for Fortran 90D/HPF Compilers on Distributed Memory MIMD Computers
- In Proc. Sixth Annual Workshop on Languages and Compilers for Parallel Computing
, 1993
"... This paper describes a compilation approach for a Fortran 90D/HPF compiler, a source-tosource parallel compiler for distributed memory systems. Different from Fortran 77 parallelizing compilers, a Fortran90D/HPF compiler does not parallelize sequential constructs. Only parallelism expressed by Fortr ..."
Abstract
-
Cited by 18 (3 self)
- Add to MetaCart
This paper describes a compilation approach for a Fortran 90D/HPF compiler, a source-tosource parallel compiler for distributed memory systems. Different from Fortran 77 parallelizing compilers, a Fortran90D/HPF compiler does not parallelize sequential constructs. Only parallelism expressed by Fortran 90D/HPF parallel constructs is exploited. The methodoly of parallelizing Fortran programs such as computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. An example of Gaussian Elimination is used to illustrate the compilation techniques with performance results. This work was supported in part by NSF under CCR-9110812 (Center for Research on Parallel Computation) and DARPA under contract # DABT63-91-C-0028. The content of the information does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred. y Corresponding Author: Alok Choudhary, 121 Link Hall, ECE De...
Data Access Reorganizations in Compiling Out-of-core Data Parallel Programs on Distributed Memory Machines
, 1994
"... This paper describes techniques for translating out-of-core programs written in a data parallel language like HPF to message passing node programs with explicit parallel I/O. We describe the basic compilation model and various steps involved in the compilation. The compilation process is explained w ..."
Abstract
-
Cited by 15 (6 self)
- Add to MetaCart
This paper describes techniques for translating out-of-core programs written in a data parallel language like HPF to message passing node programs with explicit parallel I/O. We describe the basic compilation model and various steps involved in the compilation. The compilation process is explained with the help of an out-of-core matrix multiplication program. We first discuss how an out-of-core program can be translated by extending the method used for translating in-core programs. We demonstrate that straightforward extension of in-core compiler does not work for out-of-core programs. We then describe how the compiler can optimize the code by (1) estimating the I/O costs associated with different array access patterns, (2) reorganizing array accesses, (3) selecting the method with the least I/O cost, and (4) allocating memory according to access cost for competing out-of-core arrays. These optimizations can reduce the amount of I/O by as much as an order of magnitude. Performance resu...
Fast and Parallel Mapping Algorithms for Irregular Problems
- Journal of Supercomputing
, 1993
"... In this paper we develop simple index-based graph partitioning techniques. We show our methods to be very fast, easily parallelizable and that they produce good quality mappings. These properties make them useful for parallelization of a number of irregular and adaptive applications. Index Terms: M ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
In this paper we develop simple index-based graph partitioning techniques. We show our methods to be very fast, easily parallelizable and that they produce good quality mappings. These properties make them useful for parallelization of a number of irregular and adaptive applications. Index Terms: Mapping, Remapping, Parallel, Merging, Sorting 1 Introduction Parallelization of data-parallel programs on distributed-memory parallel computers requires careful attention to load balancing and reduction of communication to achieve a good performance. For most regular and synchronous problems [13], mapping can be performed at the time of compilation by giving directives to decompose the data and its corresponding computations [8]. For irregular applications, achieving a good mapping is considerably more difficult; the nature of the irregularities may not be known at the time of compilation and can be derived only at runtime [7]. These applications can be represented as computational graphs...
Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multicomputers
, 1993
"... For a wide variety of applications, both task and data parallelism must be exploited to achieve the best possible performance on a multicomputer. Recent research has underlined the importance of exploitingtask and data parallelism in a single compiler framework, and such a compiler can map a single ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
For a wide variety of applications, both task and data parallelism must be exploited to achieve the best possible performance on a multicomputer. Recent research has underlined the importance of exploitingtask and data parallelism in a single compiler framework, and such a compiler can map a single source program in many different ways onto a parallel machine. There are several complex tradeoffs between task and data parallelism, depending on the characteristics of the program to be executed and the performance parameters of the target parallel machine. This makes it very difficult for a programmer to select a good mapping for a task and data parallel program. In this paper we isolate and examine specific characteristics of executing programs that determine the performance for different mappings on a parallel machine, and present an automatic system to obtain good mappings. The process consists of two steps: First, an instrumented input program is executed a fixed number of times with ...

