Results 11 - 20 of 50
Computing Global Combine Operations in the Multi-Port Postal Model
, 1996
Cited by 13 (0 self)
Consider a message-passing system of n processors, in which each processor holds one piece of data initially. The goal is to compute an associative and commutative reduction function on the n distributed pieces of data and to make the result known to all the n processors. This operation is frequently used in many message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multi-port postal model for message-passing systems. This model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and $\lambda$, the communication latency. In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent out from k other processors $\lambda - 1$ rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of comm...
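As a rough illustration of the problem statement, and explicitly not the paper's optimal algorithm, the sketch below simulates a simple phase-by-phase global combine in which every processor uses all k ports in each phase: processors whose indices differ only in one base-(k + 1) digit exchange partial results. It assumes n is an exact power of k + 1 and treats each phase as completing before the next begins, so it ignores the communication latency and does not overlap sends with messages still in flight. The function name `butterfly_combine` is hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def butterfly_combine(values: List[T], k: int, op: Callable[[T, T], T]) -> List[T]:
    """Simulate a phase-by-phase (k+1)-ary butterfly global combine.

    Assumptions (for this sketch only): len(values) == (k + 1) ** d for some
    integer d >= 0, k >= 1, and `op` is associative and commutative.
    Returns the value each processor holds once all phases have completed.
    """
    if k < 1:
        raise ValueError("need at least one port per processor")
    n, base = len(values), k + 1
    size = 1
    while size < n:
        size *= base
    if size != n:
        raise ValueError("this sketch needs n to be a power of k + 1")

    partial = list(values)
    stride = 1  # place value of the base-(k+1) digit handled in this phase
    while stride < n:
        nxt = []
        for i in range(n):
            digit = (i // stride) % base
            first = i - digit * stride  # member of i's group whose digit is 0
            acc = partial[i]
            # Processor i receives the partials of its k partners, i.e. the
            # processors whose indices differ from i only in the current digit.
            for j in range(base):
                if j != digit:
                    acc = op(acc, partial[first + j * stride])
            nxt.append(acc)
        partial = nxt
        stride *= base
    return partial
```

With k = 1 this reduces to the familiar recursive-doubling (butterfly) exchange and needs log2 n phases; with k ports it needs log base (k + 1) of n phases, e.g. `butterfly_combine(list(range(9)), 2, lambda a, b: a + b)` leaves 36 on all nine processors after two phases.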
The Multicomputer Toolbox - First-Generation Scalable Libraries
, 1993
Cited by 10 (8 self)
"First-generation" scalable parallel libraries have been achieved, and are maturing, within the Multicomputer Toolbox. The Toolbox includes sparse, dense, iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms, plus an inter-architecture Makefile mechanism for building applications. We have devised C-based strategies for useful classes of distributed data structures, including distributed matrices and vectors. The underlying Zipcode message-passing system has enabled process-grid abstractions of multicomputers, communication contexts, and process groups, all characteristics needed for building scalable libraries and scalable application software. We describe the data-distribution-independent approach to building scalable libraries, which is needed so that applications do not unnecessarily have to redistribute data at high expense. We discuss the strategy used for implementing data-distribution mappings. We also describe hig...
The IBM External User Interface for Scalable Parallel Systems
- Parallel Computing
, 1994
Cited by 10 (4 self)
The IBM External User Interface (EUI) for scalable parallel systems is a parallel programming library designed for the IBM line of scalable parallel computers. The first computer in this line, the IBM 9076 SP1, was announced in February 1993. This paper examines several aspects of the design and development of the EUI.
1 Introduction
The IBM External User Interface (EUI) for scalable parallel systems is an application programming interface that was designed for the IBM line of scalable parallel computers. The first computer in this line, the IBM Scalable POWERparallel System 9076 SP1, was announced in February 1993. The design of the EUI is aimed at providing a scalable and efficient parallel programming environment over a wide range of parallel products from IBM. The EUI is a library of coordination and communication routines that can be invoked from within FORTRAN or C application programs. Over the past several years, a large number of programming environments and communication l...
A Toolkit for Parallel Image Processing
- Proceedings of the SPIE Conference on Parallel and Distributed Methods for Image processing
, 1998
Cited by 9 (0 self)
In this paper, we present the design and implementation of a parallel image processing software library (the Parallel Image Processing Toolkit). The Toolkit not only supplies a rich set of image processing routines, it is designed principally as an extensible framework containing generalized parallel computational kernels to support image processing. Users can easily add their own image processing routines without knowledge or explicit use of the underlying data distribution mechanisms or parallel computing model. Shared memory and multi-level memory hierarchies are exploited to achieve high performance on each node, thereby minimizing overall parallel execution time. Multiple load balancing schemes have been implemented within the parallel framework that transparently distribute the computational load evenly on a distributed memory computing environment. Inside the Toolkit, a message-passing model of parallelism is designed around the Message Passing Interface (MPI) standard. Experime...
The Multicomputer Toolbox: Current and Future Directions
- Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer
, 1993
Cited by 6 (1 self)
The Multicomputer Toolbox is a set of "first-generation" scalable parallel libraries. The Toolbox includes sparse, dense, direct and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. The Toolbox has an object-oriented design; C-based strategies for classes of distributed data structures (including distributed matrices and vectors) as well as uniform calling interfaces are defined. At a high level in the Toolbox, data-distribution-independence (DDI) support is provided. DDI is needed to build scalable libraries, so that applications do not have to redistribute data before calling libraries. Data-distribution-independent mapping functions implement this capability. Data-distribution-independent algorithms are sometimes more efficient than fixed-data-distribution counterparts, because redistribution of data can be avoided. Underlying the system is a "performance and portability layer," which includes interfaces to sequent...
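To illustrate what data-distribution-independent mapping functions make possible, the following sketch defines two 1-D distributions behind one mapping interface (global index to owner rank and local index). It is written in Python rather than the Toolbox's C, and the names `Distribution`, `Block`, `Cyclic`, `scatter`, and `gather` are hypothetical, not the Toolbox's actual API. The generic `scatter` and `gather` routines only call the mapping interface, so they work unchanged for either layout.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class Distribution(ABC):
    """Hypothetical 1-D mapping interface: n global elements over p processes."""

    def __init__(self, n: int, p: int) -> None:
        self.n = n
        self.p = p

    @abstractmethod
    def to_local(self, g: int) -> Tuple[int, int]:
        """Map global index g to (owner rank, local index)."""

    @abstractmethod
    def local_size(self, rank: int) -> int:
        """Number of elements stored on process `rank`."""


class Block(Distribution):
    # Contiguous chunks; the first n % p ranks hold one extra element.
    def to_local(self, g: int) -> Tuple[int, int]:
        q, r = divmod(self.n, self.p)
        big = r * (q + 1)  # elements held by the r larger chunks
        if g < big:
            return g // (q + 1), g % (q + 1)
        return r + (g - big) // q, (g - big) % q

    def local_size(self, rank: int) -> int:
        q, r = divmod(self.n, self.p)
        return q + (1 if rank < r else 0)


class Cyclic(Distribution):
    # Round-robin: element g lives on rank g % p.
    def to_local(self, g: int) -> Tuple[int, int]:
        return g % self.p, g // self.p

    def local_size(self, rank: int) -> int:
        return (self.n - rank + self.p - 1) // self.p


def scatter(dist: Distribution, global_vec: list) -> List[list]:
    """Split a global vector into per-rank pieces using only the mapping interface."""
    pieces: List[list] = [[None] * dist.local_size(rk) for rk in range(dist.p)]
    for g, x in enumerate(global_vec):
        owner, loc = dist.to_local(g)
        pieces[owner][loc] = x
    return pieces


def gather(dist: Distribution, pieces: List[list]) -> list:
    """Reassemble the global vector; again independent of the concrete layout."""
    result = []
    for g in range(dist.n):
        owner, loc = dist.to_local(g)
        result.append(pieces[owner][loc])
    return result
```

For 10 elements on 3 processes, `Block` yields pieces `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]` while `Cyclic` yields `[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]`. An explicit redistribution, the cost DDI tries to avoid, would amount to `scatter(dst, gather(src, pieces))`.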
Object-Oriented Analysis and Design of the Message Passing Interface
, 1998
Cited by 3 (1 self)
The major contribution of this paper is the application of modern analysis techniques to the important Message Passing Interface standard, work done in order to obtain information useful in designing both application programmer interfaces for object-oriented languages and message-passing systems. Recognition of "Design Patterns" within MPI is an important discernment of this work. A further contribution is a comparative discussion of the design and evolution of three actual object-oriented designs for the Message Passing Interface (MPI-1) application programmer interface (API), two of which have influenced the standardization of C++ explicit parallel programming with MPI-2, and which strongly indicate the value of a priori object-oriented design and analysis of such APIs. Knowledge of design patterns is assumed herein. Discussion provided here includes systems developed at Mississippi State University (MPI++), the University of Notre Dame (OOMPI), and the merger of these sys...
Reuse, Portability and Parallel Libraries
- In Proceedings of IFIP WG10.3---Programming Environments for Massively Parallel Distributed Systems
, 1994
Cited by 3 (0 self)
Parallel programs are typically written in an explicitly parallel fashion using either message-passing or shared-memory primitives. Message passing is attractive for performance and portability since shared-memory machines can efficiently execute message-passing programs; however, message-passing machines cannot in general effectively execute shared-memory programs. In order to write a parallel program using message passing, the programmer is often obliged to develop a significant amount of code which manages distributed data and events and parallel input/output, and such code may have little or nothing to do with the application. However, many parallel applications have common structural elements, and much of this additional code can be encapsulated within a parallel library and reused in several programs. We discuss the requirements the library writer and user make of the basic message-passing interface and describe how we have addressed these requirements in our Common High-Level Inte...
PCODE: An Efficient and Reliable Collective Communication Protocol for Unreliable Broadcast Domains
- IBM Research Report, RJ 9895
, 1994
Cited by 3 (2 self)
Existing programming environments for clusters are typically built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. For example, a broadcast that is implemented using a TCP/IP protocol (which is a point-to-point protocol) over a LAN is obviously inefficient, as it does not utilize the fact that the LAN is a broadcast medium. We have observed that the main difference between a distributed computing paradigm and a message-passing parallel computing paradigm is that, in a distributed environment, the activity of every processor is independent, while in a parallel environment the collection of the user-communication layers in the processors can be modeled as a single global program. We have formalized the requirements by defining the notion of a correct global program. This notion provides a precise specification of the interface between the transport layer and the user-communication layer. We have developed PCODE, a new communication protocol that is driven by a global program, and proved its correctness.
Asynchronous Problems on SIMD Parallel Computers
- IEEE Trans. Parallel and Distributed Systems
, 1995
Cited by 2 (0 self)
One of the essential problems in parallel computing is: can SIMD machines handle asynchronous problems? This is a difficult, unsolved problem because of the mismatch between asynchronous problems and SIMD architectures. We propose a solution to let SIMD machines handle general asynchronous problems. Our approach is to implement a runtime support system which can run MIMD-like software on SIMD hardware. The runtime support system, named P kernel, is thread-based. There are two major advantages of the thread-based model. First, for application problems with irregular and/or unpredictable features, automatic scheduling can move some threads from overloaded processors to underloaded processors. Second, and more importantly, the granularity of threads can be controlled to reduce system overhead. The P kernel is also able to handle bookkeeping and message management, as well as to make these low-level tasks transparent to users. Substantial performance has been obtained on the MasPar MP-1.
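The abstract does not say which policy the P kernel uses to move threads from overloaded to underloaded processors. As one simple, assumed policy, the sketch below greedily migrates one thread at a time from the most loaded processor to the least loaded one until any two loads differ by at most one. The function name `rebalance` is hypothetical, and the sketch ignores the cost of migrating a thread.

```python
from typing import List, TypeVar

T = TypeVar("T")


def rebalance(queues: List[List[T]]) -> List[List[T]]:
    """Greedy thread migration between processors (hypothetical policy).

    Repeatedly moves one thread from the most loaded processor to the least
    loaded one until any two loads differ by at most one. Migration cost is ignored.
    """
    queues = [list(q) for q in queues]  # do not mutate the caller's queues
    if not queues:
        return queues
    while True:
        hi = max(range(len(queues)), key=lambda i: len(queues[i]))
        lo = min(range(len(queues)), key=lambda i: len(queues[i]))
        if len(queues[hi]) - len(queues[lo]) <= 1:
            return queues
        queues[lo].append(queues[hi].pop())  # migrate one thread
```

Each move strictly reduces the sum of squared queue lengths, so the loop always terminates; for example, queues of lengths 5, 0, and 1 end up with two threads each.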
Approaches to Support Parallel Programming on Workstation Clusters: A Survey
- A Survey, Informatik Berichte, Fachgruppe Informatik, Universität-GH Siegen
, 1995
Cited by 2 (0 self)
The goal of this report is to survey the state of the art and existing approaches for parallel programming on workstation clusters, with special emphasis on object-oriented programming. First, workstation clusters as parallel computing platforms are characterized, and fundamental concepts for parallel programming are discussed. Then, an overview of existing tools, systems, languages, and environments is given. The report concludes by identifying features of software systems suitable for parallel object-oriented programming on top of workstation clusters.

