A framework for exploiting data and functional parallelism on distributed memory multicomputers (1994)

by S Ramaswamy, S Sapatnekar, P Banerjee
Results 1 - 8 of 8

Automatic Generation of Efficient Array Redistribution Routines for Distributed Memory Multicomputers

by Shankar Ramaswamy, Prithviraj Banerjee, 1995
Cited by 53 (4 self)
Appropriate data distribution has been found to be critical for obtaining good performance on Distributed Memory Multicomputers like the CM-5, Intel Paragon and IBM SP-1. It has also been found that some programs need to change their distributions during execution for better performance (redistribution). This work focuses on automatically generating efficient routines for redistribution. We present a new mathematical representation for regular distributions called PITFALLS and then discuss algorithms for redistribution based on this representation. One of the significant contributions of this work is being able to handle arbitrary source and target processor sets while performing redistribution. Another important contribution is the ability to handle an arbitrary number of dimensions for the array involved in the redistribution in a scalable manner. Our implementation of these techniques is based on an MPI-like communication library. The results presented show the low overheads for our redistribution algorithm as compared to naive runtime methods.
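
The abstract contrasts its PITFALLS-based algorithm with naive runtime methods. For reference only, here is a minimal sketch of such a naive approach. It is not the PITFALLS representation. It assumes a one-dimensional array of n elements that moves from a BLOCK distribution on one processor set to a CYCLIC distribution on a different-sized set. The code enumerates every global index to decide which source processor sends it to which target processor. All function names are hypothetical.

```python
from collections import defaultdict


def block_owner(i, n, p):
    """Owner of global index i under a BLOCK distribution of n elements over p processors."""
    block = -(-n // p)  # ceil(n / p) elements per processor
    return i // block


def cyclic_owner(i, p):
    """Owner of global index i under a CYCLIC distribution over p processors."""
    return i % p


def redistribution_schedule(n, p_src, p_dst):
    """Naive runtime schedule: map (source proc, target proc) -> global indices to send.

    Source and target processor sets may differ in size (p_src != p_dst).
    Cost is O(n) per redistribution because every index is visited.
    """
    sched = defaultdict(list)
    for i in range(n):
        sched[(block_owner(i, n, p_src), cyclic_owner(i, p_dst))].append(i)
    return dict(sched)


if __name__ == "__main__":
    # 8 elements: BLOCK over 2 source processors -> CYCLIC over 3 target processors
    for (s, d), idx in sorted(redistribution_schedule(8, 2, 3).items()):
        print(f"P{s} -> Q{d}: {idx}")
```

Per-element enumeration like this scales with the array size on every call. Closed-form representations of regular distributions, which the abstract describes, are meant to avoid that cost.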

Double Standards: Bringing Task Parallelism to HPF Via the Message Passing Interface

by Ian Foster, David R. Kohr, Jr., Rakesh Krishnaiyer, Alok Choudhary - In Proceedings of Supercomputing '96, 1996
Cited by 27 (4 self)
High Performance Fortran (HPF) does not allow efficient expression of mixed task/data-parallel computations or the coupling of separately compiled data-parallel modules. In this paper, we show how a coordination library implementing the Message Passing Interface (MPI) can be used to represent these common parallel program structures. This library allows data-parallel tasks to exchange distributed data structures using calls to simple communication functions. We present microbenchmark results that characterize the performance of this library and that quantify the impact of optimizations that allow reuse of communication schedules in common situations. In addition, results from two-dimensional FFT, convolution, and multiblock programs demonstrate that the HPF/MPI library can provide performance superior to that of pure HPF. We conclude that this synergistic combination of two parallel programming standards represents a useful approach to task parallelism in a data-parallel framework, incre...
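
The abstract mentions optimizations that reuse communication schedules. The sketch below is a hypothetical illustration of that idea, not the HPF/MPI library's API. It assumes a one-dimensional BLOCK-distributed array that moves between two tasks running on different numbers of processors. The schedule is computed by intersecting the index ranges each processor owns. The result is cached with `functools.lru_cache`, so a repeated transfer with the same array size and processor counts reuses the stored schedule instead of recomputing it.

```python
from functools import lru_cache


def block_ranges(n, p):
    """Half-open [lo, hi) index ranges owned by each of p processors under BLOCK."""
    b = -(-n // p)  # ceil(n / p)
    return [(min(k * b, n), min((k + 1) * b, n)) for k in range(p)]


@lru_cache(maxsize=None)
def transfer_schedule(n, p_src, p_dst):
    """Messages (src, dst, lo, hi) that move a BLOCK array of n elements
    from a task on p_src processors to a task on p_dst processors.

    Returned as a tuple so the cached schedule cannot be mutated by callers.
    """
    msgs = []
    for s, (slo, shi) in enumerate(block_ranges(n, p_src)):
        for d, (dlo, dhi) in enumerate(block_ranges(n, p_dst)):
            lo, hi = max(slo, dlo), min(shi, dhi)
            if lo < hi:  # non-empty overlap means src must send [lo, hi) to dst
                msgs.append((s, d, lo, hi))
    return tuple(msgs)


if __name__ == "__main__":
    print(transfer_schedule(10, 2, 3))
    # ((0, 0, 0, 4), (0, 1, 4, 5), (1, 1, 5, 8), (1, 2, 8, 10))
    transfer_schedule(10, 2, 3)  # second call is served from the cache
    print(transfer_schedule.cache_info())
```

Keying the cache on the distribution descriptors rather than on the data is the key design choice. Iterative codes that send same-shaped arrays on every step pay the schedule computation only once.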

A Library-Based Approach to Task Parallelism in a Data-Parallel Language

by Ian Foster, David R. Kohr, Jr., Rakesh Krishnaiyer, Alok Choudhary, 1996
Cited by 12 (1 self)
The data-parallel language High Performance Fortran (HPF) does not allow efficient expression of mixed task/data-parallel computations or the coupling of separately compiled data-parallel modules. In this paper, we show how these common parallel program structures can be represented, with only minor extensions to the HPF model, by using a coordination library based on the Message Passing Interface (MPI). This library allows data-parallel tasks to exchange distributed data structures using calls to simple communication functions. We present microbenchmark results that characterize the performance of this library and that quantify the impact of optimizations that allow reuse of communication schedules in common situations. In addition, results from two-dimensional FFT, convolution, and multiblock programs demonstrate that the HPF/MPI library can provide performance superior to that of pure HPF. We conclude that this synergistic combination of two parallel programming standards represents...

The Compiler TwoL for the Design of Parallel Implementations

by Thomas Rauber, Gudula Rünger - In Proc. 4th Int. Conf. on Parallel Architectures and Compilation Techniques, IEEE, 1996
Cited by 5 (3 self)
A large number of numerical algorithms exhibit a two-level structure with both method parallelism and system parallelism. This structure can be exploited to produce alternative parallel implementations on distributed memory machines. The compiler system TwoL (Two Level) provides interactive and semiautomatic support for the design and realization of efficient parallel algorithms in this two-level parallel programming model. The design is structured into well-defined decision steps which are formalized in a TwoL specification language, and transformations on this language. We show how the design steps lead to a parallel algorithm, how the design is formalized in the TwoL system, how this compiler system is realized, and which algorithms are amenable to automated decision steps. Design or derivation steps are based on parameterized cost functions arising from runtime predictions for the specific parallel target machine. The design process is illustrated by the parallelization of several me...

Design And Optimization Of Coordination Mechanisms For Data-Parallel Tasks

by David R. Kohr, Jr., 1996
Cited by 2 (2 self)
Data-parallel programming languages can reduce the difficulty of developing efficient applications for contemporary parallel computers. However, many applications can benefit from a mixture of task and data parallelism. We present a library-based approach that permits programmers to coordinate data-parallel tasks using explicit message-passing operations. We discuss in detail the design of a prototype library that supports inter-task transfers of arrays in an efficient manner on distributed-memory multicomputers. Measurements with a synthetic benchmark show that in many cases the library can realize a significant fraction of a multicomputer's peak communication performance, and reveal the sources of overheads that reduce the library's performance in other cases. We also develop an analytic model of array transfer performance as a means of predicting inter-task communication costs.
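
The abstract refers to an analytic model of array transfer performance but does not state it. For orientation only, a generic latency-bandwidth (Hockney-style) estimate can be sketched as follows. This is not the thesis's model. The sketch assumes each message costs `alpha + beta * bytes` on its sender, senders run concurrently, and receiver-side contention can be ignored. `alpha`, `beta`, and the `(src, dst, n_elems)` message format are hypothetical parameters.

```python
def transfer_time(msgs, elem_bytes, alpha, beta):
    """Estimate inter-task array transfer time under a simple latency-bandwidth model.

    msgs: iterable of (src, dst, n_elems) messages.
    alpha: per-message startup cost; beta: cost per byte (same time unit).
    Each sender pays alpha + beta * bytes per message it sends. Senders are
    assumed to work concurrently, so the estimate is the busiest sender's total.
    Receiver contention is ignored.
    """
    per_sender = {}
    for src, _dst, n in msgs:
        per_sender[src] = per_sender.get(src, 0.0) + alpha + beta * n * elem_bytes
    return max(per_sender.values(), default=0.0)


if __name__ == "__main__":
    # Hypothetical numbers: 8-byte elements, 10 us startup, 0.01 us per byte.
    msgs = [(0, 0, 4), (1, 1, 3), (1, 2, 2)]
    print(transfer_time(msgs, elem_bytes=8, alpha=10.0, beta=0.01))
```

In this example, processor 1 sends two messages, so its startup costs dominate. A model of this shape shows why many small messages can keep a library well below peak bandwidth.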

Scheduling of Multiprocessor Tasks for Numerical Applications

by Thomas Rauber, Gudula Rünger, 1996
Cited by 1 (1 self)
Many applications in the area of scientific computing use algorithms with a two-level parallelism based on potential method parallelism and potential system parallelism. In this paper, we investigate the efficient implementation of such algorithms on distributed memory machines. We consider parallel specifications consisting of an upper level of multiprocessor tasks, each of which has an internal structure of uniprocessor tasks. To achieve an optimal parallel execution time, the parallel execution of such a program requires an optimal scheduling of the multiprocessor tasks and an appropriate treatment of uniprocessor tasks. In particular, we consider an important class of parallel programs that are generated within a specific parallel programming model designing group-SPMD programs for scientific computing. We show how the costs of data redistributions between M-tasks can be taken into consideration and how the special structure of the resulting program can be exploited by using a s...

Simultaneous Allocation And Scheduling Using Convex Programming Techniques

by Shankar Ramaswamy, Prithviraj Banerjee, 1995
Cited by 1 (1 self)
Simultaneous exploitation of task and data parallelism provides significant benefits for many applications. The basic approach for exploiting task and data parallelism is to use a task graph representation (Macro Dataflow Graph) for programs to decide on the degree of data parallelism to be used for each task (allocation) and an execution order for the tasks (scheduling). Previously, we presented a two step approach for allocation and scheduling by considering the two steps to be independent of each other. In this paper, we present a new simultaneous approach which uses constraints to model the scheduler during allocation. The new simultaneous approach provides significant benefits over our earlier approach for the benchmark task graphs that we have considered.
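
The abstract frames the problem as allocation (how many processors each task gets) and scheduling (the order in which tasks run) on a Macro Dataflow Graph. The paper solves these jointly with convex programming. The sketch below does not. It is a brute-force illustration of why allocation matters, assuming only two independent tasks and a hypothetical Amdahl-style cost model `t1 * (f + (1 - f) / p)`. The sketch compares two options: running the tasks back to back on all P processors, or running them concurrently on a split of p and P - p processors.

```python
def amdahl_time(t1, serial_frac, p):
    """Hypothetical cost model: runtime of a data-parallel task on p processors."""
    return t1 * (serial_frac + (1.0 - serial_frac) / p)


def best_split(tasks, P):
    """Brute-force allocation for two independent tasks on P processors.

    tasks: [(t1_a, f_a), (t1_b, f_b)], with single-processor time and serial fraction.
    Returns (makespan, p), where p is the number of processors given to task a
    when the tasks run concurrently, or None if running them one after another
    on all P processors is at least as fast.
    """
    (t1a, fa), (t1b, fb) = tasks
    sequential = amdahl_time(t1a, fa, P) + amdahl_time(t1b, fb, P)
    best = (sequential, None)
    for p in range(1, P):
        makespan = max(amdahl_time(t1a, fa, p), amdahl_time(t1b, fb, P - p))
        if makespan < best[0]:
            best = (makespan, p)
    return best


if __name__ == "__main__":
    # Two equal tasks with a 10% serial fraction on 8 processors.
    print(best_split([(100.0, 0.1), (100.0, 0.1)], 8))
    # Perfectly parallel tasks gain nothing from splitting the machine.
    print(best_split([(100.0, 0.0), (100.0, 0.0)], 8))
```

When tasks scale imperfectly, giving each task a subset of the processors and running the tasks side by side beats serializing them on the full machine. This is the benefit of mixing task and data parallelism that the abstract refers to. Realistic task graphs have precedence edges and many tasks, so enumerating every allocation stops being practical there.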

Integrating Library Modules into Special Purpose Parallel Algorithms

by Thomas Rauber, Gudula Rünger - Proc. 2nd Int. Workshop on Software Engineering for Parallel and Distributed Systems (PDSE'97), 1997
Cited by 1 (0 self)
Most programs from scientific computing can benefit from the use of numerical libraries which provide efficient implementations for standard solution methods that often occur in numerical simulations. This is especially true for parallel scientific computing. A methodology that allows the integration of library functions without any additional programming effort would ease this programming style. In this paper, we address the question of how to integrate library procedures into hierarchically organized parallel programs. The hierarchical structure of a specific algorithm results from a top-down decomposition into submethods which can be realized by library functions. The integration of library functions not only requires a correct specification of data dependencies between different modules but also has to take into account a possible distribution of data among the processors. We present algorithms for the adaptation of library modules such that their functional type and underlying data ...