Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
Cited by 41 (11 self)
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP and on a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
Optimal Software Multicast in Wormhole-Routed Multistage Networks
- In Proceedings of the Supercomputing Conference
, 1997
Cited by 16 (3 self)
Multistage interconnection networks are a popular class of interconnection architecture for constructing scalable parallel computers (SPCs). The focus of this paper is on wormhole-routed multistage networks supporting turnaround routing. Existing machines characterized by such a system model include the IBM SP-1, TMC CM-5, and Meiko CS-2. Efficient collective communication among processor nodes is critical to the performance of SPCs. A system-level multicast service, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is fundamental in supporting collective communication primitives including the application-level broadcast, reduction, and barrier synchronization. This paper addresses how to efficiently implement multicast services in wormhole-routed multistage networks, in the absence of hardware multicast support, by exploiting the properties of the switching technology. An optimal multicast algorithm is proposed. The results of imple...
Modeling parallel bandwidth: Local vs. global restrictions
Cited by 15 (4 self)
Recently there has been an increasing interest in models of parallel computation that account for the bandwidth limitations in communication networks. Some models (e.g., BSP and LogP) account for bandwidth limitations using a per-processor parameter g > 1, such that each processor can send/receive at most h messages in g·h time. Other models (e.g., PRAM(m)) account for bandwidth limitations as an aggregate parameter m < p, such that the p processors can send at most m messages in total at each step. This paper provides the first detailed study of the algorithmic implications of modeling parallel bandwidth as a per-processor (local) limitation versus an aggregate (global) limitation. We consider a number of basic problems
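The two accounting styles in this abstract can be compared with a small sketch. It is only an illustration built from the two definitions quoted above, not the paper's analysis: the function names are hypothetical, only sends are counted (receives are ignored), and the global figure is a lower bound on the number of steps rather than an achievable schedule.

```python
import math


def local_time(sent_per_proc, g):
    """Per-processor (local) limit: a processor sending h messages needs g*h time.

    Simplifying assumption: h is the largest number of messages sent by any
    one processor; receives are not counted.
    """
    h = max(sent_per_proc, default=0)
    return g * h


def global_steps_lower_bound(sent_per_proc, m):
    """Aggregate (global) limit: at most m messages in total per step.

    Returns ceil(total / m), a lower bound on the number of steps needed.
    """
    total = sum(sent_per_proc)
    return math.ceil(total / m)


# Hypothetical workloads on p = 8 processors, with g = 4 and m = p / g = 2.
balanced = [1] * 8          # every processor sends one message
skewed = [8] + [0] * 7      # one processor sends all eight messages

balanced_local = local_time(balanced, g=4)
balanced_global = global_steps_lower_bound(balanced, m=2)
skewed_local = local_time(skewed, g=4)
skewed_global = global_steps_lower_bound(skewed, m=2)
```

With these assumed numbers the balanced workload costs the same under both accountings, while the skewed workload is charged far more under the per-processor limit than the aggregate bound requires.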
An Extended Dominating Node Approach to Broadcast and Global Combine in Multi-Port Wormhole-Routed Mesh Networks
- IEEE Transactions on Parallel and Distributed Systems
, 1996
Cited by 11 (0 self)
A new approach to the design of collective communication operations in wormhole-routed mesh networks is described. The approach extends the concept of dominating sets in graph theory by accounting for the relative distance-insensitivity of the wormhole switching strategy and by taking advantage of a multi-port communication architecture, which allows each node to simultaneously transmit messages on different outgoing channels. Collective communication operations are defined in terms of sets of extended dominating nodes (EDNs). The nodes in a set of EDNs can deliver (receive) messages to (from) a different, larger set of nodes in a single message-passing step under dimension-ordered wormhole routing and without channel contention among messages. The EDN model can be applied to different collective operations in 2D and 3D mesh networks. In this paper, we focus on EDN-based broadcast and global combine operations. Performance evaluation results are presented that confirm the advantage of t...
Optimal Point-to-Point Broadcast Algorithms via Lopsided Trees
- In Proc. 5th Israeli Symp. on Theory of Computing and Systems
, 1997
Cited by 3 (2 self)
We consider the broadcasting operation in point-to-point packet-switched parallel and distributed networks of processors. We develop a general technique for the design of optimal broadcast algorithms on a wide range of such systems. Our technique makes it easier to design such algorithms and, furthermore, provides tools that can be used to derive precise analyses of their running times. As direct applications of this method we give an exact analysis of a known algorithm for the postal model, and design and analyze an optimal broadcast algorithm for the multi-port multi-media model. We then show how our method can be applied to networks with different underlying topologies, by designing and giving an exact analysis of an optimal broadcast algorithm for the optical ring. 1 Introduction Communication subsystems of parallel and distributed systems and high-speed networks are commonly modeled as message-passing systems, in which any processor can submit to the network a point-to-point me...
What Good Are Shared-Memory Models?
- International Conference on Parallel Processing
, 1996
Cited by 3 (1 self)
Shared memory models have been criticized for years for failing to model essential realities of parallel machines. Given the current wave of popular message-passing and distributed memory models (e.g., BSP, LogP), it is natural to ask whether shared memory models have outlived any usefulness they may have had. In this invited position paper, we discuss the continuing importance of shared memory models in the design and analysis of parallel algorithms. We describe a new model, the Queuing Shared Memory (QSM) model, that accounts for limited communication bandwidth while still providing a shared memory abstraction, and provide evidence of its practicality. Finally, we discuss important areas for future models research. We argue that the compelling need for parallel computing in large scale data analysis (e.g., decision support, data mining) implies that the most important modeling issue going forward concerns how best to model disk I/O.
Optimal parallel prefix on the postal model
- J. Information Science and Engineering
, 2003
Cited by 2 (0 self)
This paper explores the prefix operation on a message-passing fully connected multicomputer with multiport postal communication. We present an exact communication lower bound for the prefix operation on the model. Two efficient parallel prefix algorithms are also presented; they are optimal in terms of the number of communication steps. For an input of size n, one of the algorithms using n processors is also time-optimal; the other algorithm using p < n processors can be cost-optimal and can achieve linear speedup.
Communication Complexity of Fault-Tolerant Information Diffusion
- in: Proceedings of Fifth IEEE Symposium on Parallel and Distributed Processing
, 1993
Cited by 2 (1 self)
This paper considers problems of fault-tolerant information diffusion in a network with cost function. We show that the problem of determining the minimum cost necessary to perform fault-tolerant gossiping among a given set of participants is NP-hard and give approximate (with respect to the cost) fault-tolerant gossiping algorithms. We also analyze the communication time and communication complexity of fault-tolerant gossiping algorithms. Finally, we give an optimal cost fault-tolerant broadcasting algorithm and apply our results to the atomic commitment problem. Key Words: Communication Networks, Gossiping, Atomic Commitment, Fault-Tolerance. Research partially supported by the Italian Ministry of University and of Scientific Research in the framework of the "Algoritmi, Modelli di Calcolo e Strutture Informative" project. 1 Introduction In this paper we study the problems of fault-tolerant broadcasting, gossiping, and atomic commitment in a weighted network. Gossiping in ...
Broadcasting on a Budget in the Multi-Service Communication Model
- in Proceedings of the Fifth International Conference on High Performance Computing
, 1998
Cited by 1 (0 self)
In this paper we introduce the multi-service model of network communication. This model attempts to capture recent communication technology trends, such as aspects of quality-of-service and their relation to the emerging technology of automatic pricing, e.g. for Internet services. The multi-service model differs from related models by taking communication and the activation time of a service provider into account, thus restricting parallelism to better fit reality. Hence, our model extends and refines previous successful models for network communication. We consider the application of this model to communication problems, where the service providers are point-to-point message passing agents each with its characteristic pricing policies and speed. We give some insights and an algorithm for optimal dissemination of information in this model when given a fixed, limited budget.
Fast Collective Communication by Packets in the Postal Model
, 1995
Cited by 1 (1 self)
Collective communication operations play an important role in message-passing systems and have been extensively investigated. We study two widely used collective communication operations: gossiping and all-to-all personalized communication. We assume the multi-port postal model of communication that seems particularly suited for developing fast and portable algorithms on current technology parallel computers. Unlike most of the previous work on the subject, we assume that processors communicate by sending messages of limited size. Indeed, when the maximum size of a message is fixed, the number of rounds required by a communication algorithm gives a realistic measure of the performance of the algorithm. We provide an optimal algorithm for the gossiping operation and an almost-optimal algorithm for the all-to-all personalized communication operation. Work partially supported by the Italian Ministry of the University and Scientific Research, Project: Algoritmi, Modelli di Calcolo e Stru...

