Message Multicasting In Heterogeneous Networks
, 1998
Cited by 46 (0 self)
In heterogeneous networks, sending messages may incur different delays on different links, and each node may have a different switching time between messages. The well studied Telephone model is obtained when all link delays and switching times are equal to one unit. We investigate the problem of finding the minimum time required to multicast a message from one source to a subset of the nodes of size k. The problem is NP-hard even in the basic Telephone model. We present a polynomial time algorithm that approximates the minimum multicast time within a factor of O(log k). Our algorithm improves on the best known approximation factor for the Telephone model by a factor of O(log n / log log k). No approximation algorithms were known for the general model considered in this paper.
Efficient Message Passing Interface (MPI) for Parallel Computing on Clusters of Workstations
, 1995
Cited by 23 (1 self)
Parallel computing on clusters of workstations and personal computers has very high potential, since it leverages existing hardware and software. Parallel programming environments offer the user a convenient way to express parallel computation and communication. In fact, recently, a Message Passing Interface (MPI) has been proposed as an industrial standard for writing "portable" message-passing parallel programs. The communication part of MPI consists of the usual point-to-point communication as well as collective communication. However, existing implementations of programming environments for clusters are built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. In this paper, we present an efficient design and implementation of the collective communication part in MPI that is optimized for clusters of workstations. Our system consists of two main compone...
An Overview of Message Passing Environments
- Parallel Computing
, 1994
Cited by 21 (0 self)
A majority of the MPP systems designed to date have been MIMD distributed memory systems. For almost all of these systems, message passing environments have provided the primary mechanism for programming multiprocessor applications. In this paper we provide an introduction to MPP systems in general. We then introduce current MPP message passing interfaces, by tracing their historical development over the last 10 years. In addition to their use within a single MPP architecture, we discuss the use of message passing systems to interconnect more loosely coupled processors in heterogeneous environments. Finally we review the development of "portability platforms" - message passing systems that have been devised solely to allow portability of message passing programs between different systems.
Research supported in part by NSF Grand Challenges Applications Group grant ASC-9217394 and by NASA HPCC Group Grant NAG5-2218. To appear in Parallel Computing, April 1994.
Computing Global Combine Operations in the Multi-Port Postal Model
, 1996
Cited by 13 (0 self)
Consider a message-passing system of n processors, in which each processor holds one piece of data initially. The goal is to compute an associative and commutative reduction function on the n distributed pieces of data and to make the result known to all the n processors. This operation is frequently used in many message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multi-port postal model for message-passing systems. This model is characterized by three parameters: n (the number of processors), k (the number of ports per processor), and λ (the communication latency). In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent out from k other processors λ − 1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of comm...
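As a rough illustration of the global combine operation described in this abstract, the following Python sketch simulates a simple digit-exchange scheme. It is not the optimal algorithm of the paper. It assumes the easiest setting: latency λ = 1 (a message is received in the round it is sent) and n an exact power of k + 1. The name `global_combine` and its parameters are hypothetical, chosen only for this example.

```python
from functools import reduce
from operator import add


def global_combine(values, k, op=add):
    """Simulate a global combine among n = (k + 1)**d processors.

    Assumptions of this sketch (not taken from the paper): latency is 1,
    n is a power of k + 1, and op is associative and commutative.
    Returns (final value held by each processor, number of rounds used).
    """
    n = len(values)
    base = k + 1

    # Find d with base**d == n, or reject the input.
    size, rounds = 1, 0
    while size < n:
        size *= base
        rounds += 1
    if size != n:
        raise ValueError("this sketch assumes n is a power of k + 1")

    current = list(values)
    stride = 1
    for _ in range(rounds):
        nxt = []
        for p in range(n):
            # Processor p exchanges partial results with the k processors
            # whose index differs from p only in the current base-(k+1) digit,
            # so it sends k messages and receives k messages in this round.
            digit = (p // stride) % base
            group_start = p - digit * stride
            members = [group_start + j * stride for j in range(base)]
            nxt.append(reduce(op, (current[q] for q in members)))
        current = nxt
        stride *= base
    return current, rounds
```

For example, `global_combine(list(range(9)), k=2)` leaves the sum 36 on all nine processors after two rounds. Each round handles one base-(k + 1) digit of the processor index, so the round count is the logarithm of n to base k + 1 under these assumptions.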
Broadcasting Multiple Messages in the Multiport Model
- Proceedings 10th International Parallel Processing Symposium
, 1999
Cited by 6 (1 self)
We consider the problem of broadcasting multiple messages from one processor to many processors in the k-port model for message passing systems. In such systems, processors communicate in rounds, where in every round, each processor can send k messages to k processors and receive k messages from k processors. In this paper, we first present a simple and practical algorithm based on variations of k complete k-ary trees. We then present an optimal algorithm up to an additive term of one for this problem for any number of processors, any number of messages, and any value for k.
1 Introduction
This paper explores the broadcast problem in the multiport model for message-passing systems. In particular, we consider the (one-to-all) broadcast problem on a message-passing system modeled by a complete graph of n nodes with the k-port model. We assume that there are n processors (nodes) in the system, denoted by 0, 1, ..., n − 1, where the source of the broadcast (the broadcaster) is processor ...
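To make the round structure of the k-port model concrete, here is a minimal Python sketch for the simplest case only: broadcasting a single message on a complete graph. It does not reproduce the multi-message algorithms of the paper. The function name `kport_broadcast_rounds` is hypothetical.

```python
def kport_broadcast_rounds(n, k):
    """Rounds needed to broadcast ONE message to n processors (k-port model).

    Assumes a complete graph, so any informed processor can reach any
    uninformed one. In each round every informed processor sends the
    message to k uninformed processors, so the informed set grows by a
    factor of at most k + 1 per round.
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    informed = 1  # only the broadcaster holds the message initially
    rounds = 0
    while informed < n:
        informed = min(n, informed * (k + 1))
        rounds += 1
    return rounds
```

With k = 1 this gives the familiar doubling behaviour, for instance three rounds for eight processors. Broadcasting many messages is harder because a processor's k ports must be shared between messages, which is the case the paper addresses.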
PCODE: An Efficient and Reliable Collective Communication Protocol for Unreliable Broadcast Domains
- IBM Research Report, RJ 9895
, 1994
Cited by 3 (2 self)
Existing programming environments for clusters are typically built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. For example, a broadcast that is implemented using a TCP/IP protocol (which is a point-to-point protocol) over a LAN is obviously inefficient as it is not utilizing the fact that the LAN is a broadcast medium. We have observed that the main difference between a distributed computing paradigm and a message passing parallel computing paradigm is that, in a distributed environment the activity of every processor is independent while in a parallel environment the collection of the user-communication layers in the processors can be modeled as a single global program. We have formalized the requirements by defining the notion of a correct global program. This notion provides a precise specification of the interface between the transport layer and the user-communication layer. We have developed PCODE, a new communication protocol that is driven by a global program, and proved its correctness.
GENERIC PROGRAMMING FOR HIGH-PERFORMANCE SCIENTIFIC COMPUTING
, 2002
by Lie-Quan Lee. Generic programming is an important paradigm for software development, with an emphasis on reusability and performance, qualities that would seemingly make this paradigm especially suited for application to scientific computing. We apply generic programming to the development of a message passing framework (the Generic Message Passing library) for parallel computing in hybrid execution architectures (i.e., those having both shared and distributed memory). Although GMP supports both shared-memory and distributed-memory execution, it explicitly separates its programming and execution models, presenting a uniform message-based programming interface to enable source-code portability of parallel programs. At the same time, the implementation of GMP fully exploits the architectural characteristics of its execution target for maximum run-time performance. GMP is specifically designed to seamlessly integrate with modern generic C++ libraries such as the C++ Standard Library. C++ objects with complex data
Communication Latency Hiding - Model and. . .
, 1994
The potential of large numbers of workstations for solving very large problems is tremendous. Nevertheless, it is often considered inappropriate to parallelize applications with a fair amount of communication on computer networks, because communication via networks with high latency and low bandwidth presents a technological bottleneck. In this paper, a model to analyze the gain of communication latency hiding by overlapping computation and communication is described. This model captures the limitations and illustrates the opportunities of communication latency hiding for improving speedup and efficiency of parallel computations that can be structured appropriately. Furthermore, an implementation of a message passing protocol is presented that incorporates latency hiding on top of the TCP/IP transport layer. This protocol ensures efficient, deadlock-free communication in UNIX network environments. Experiments show that the presented latency hiding technique increases the range of appli...
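The idea of hiding latency by overlapping computation and communication can be illustrated with a deliberately simple cost model. The Python sketch below is an assumption made for illustration and is not the model developed in the paper: each step costs `compute + comm` without overlap and `max(compute, comm)` with ideal overlap. The function names are hypothetical.

```python
def step_time(compute, comm, overlap):
    """Time for one step under a simplified (assumed) cost model."""
    if compute < 0 or comm < 0:
        raise ValueError("times must be non-negative")
    # With ideal overlap the shorter activity is fully hidden by the longer.
    return max(compute, comm) if overlap else compute + comm


def overlap_gain(compute, comm):
    """Ratio of non-overlapped to ideally overlapped step time."""
    hidden = step_time(compute, comm, overlap=True)
    if hidden == 0:
        raise ValueError("at least one of compute or comm must be positive")
    return step_time(compute, comm, overlap=False) / hidden
```

Under this simplified model the gain never exceeds 2 and reaches 2 only when computation and communication take equal time. When one of them dominates, overlapping helps much less, which is one way to see the limitations the abstract mentions.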
A Tool for Distributed Application Development Based on Asynchronous Message Passing
This paper describes a tool for the development of distributed applications. The target execution environment is a network of heterogeneous general-purpose workstations. This tool (called HERMES) mainly handles the message transport layer and does not deal with aspects more directly related to parallel computing, such as the topology of communicating processes or load balancing among processors. Coarse-grained concurrency is supported and the resulting parallel architecture is based on the multiple-instructions multiple-data paradigm. This system is mainly characterized by its ability to handle special messages (alarm and urgent) that are guaranteed immediate processing by the receiving process. Messages are always asynchronous, in that they are immediately delivered to their destination, without any temporary buffering by the communication system (i.e. asynchronously with respect to the program execution). The system is still under development but a preliminary version, restricted to pro...
Parallel CG-Methods - Automatically Optimized For PC- And Workstation-Clusters
We discuss parallel implementations of CG methods for solving large scale systems in engineering applications. Furthermore we describe automatic optimizations for arbitrary clusters of workstations. We consider each cluster as a finite set of processor-memory pairs linked together with an interconnection network. They are modelled as a LogP machine, which is extended to use functions instead of constants for all important parameters. The optimizations guarantee runtimes within a small constant factor of the optimum. This bound is independent of the target machine. In practice, they give nearly optimal programs. Numerical experiments on a Parsytec PowerXplorer confirm the results obtained.
Key words: parallel algorithms, compilers, iterative methods, sparse matrices, FD methods
AMS subject classifications: 68Q22, 68N20, 65F10, 65F50, 65N06
1. Introduction. For many numerical problems, only parallel solutions are possible if the problem size is reasonably large, e.g. large scale s...
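For readers unfamiliar with the method being parallelized, the following is a minimal serial sketch of the textbook conjugate gradient (CG) iteration in plain Python. It is not the parallel, automatically optimized implementation described in the paper. It assumes a small dense symmetric positive definite matrix given as a list of rows, and the helper names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A (serial sketch)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:   # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration is dominated by one matrix-vector product and two inner products. In a distributed setting these are the operations that require communication between processor-memory pairs, which is why the communication cost model matters for the runtime.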

