Results 1 - 8 of 8
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
IEEE Transactions on Parallel and Distributed Systems, 1995
Cited by 65 (7 self)
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model. Index Terms: Collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.
Optimal Multiple Message Broadcasting in Telephone-Like Communication Systems
1996
Cited by 14 (1 self)
We consider the problem of broadcasting multiple messages from one processor to many processors in telephone-like communication systems. In such systems, processors communicate in rounds, where in every round, each processor can communicate with exactly one other processor by exchanging messages with it. Finding an optimal solution for this problem was open for over a decade. In this paper, we present an optimal algorithm for this problem when the number of processors is even. For an odd number of processors, we provide an algorithm which is within an additive term of 3 of the optimum. A byproduct of our solution is an algorithm for the problem of broadcasting multiple messages for any number of processors in the simultaneous send/receive model. In this latter model, in every round, each processor can send a message to one processor and receive a message from another processor. Index Terms: broadcasting, communication networks, multiple messages, distributed parallel computers, simult...
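The round structure described above can be illustrated with the classic single-message case, in which the set of informed processors at most doubles each round. This is only a background sketch, not the multi-message algorithm of the paper; the function name `broadcast_rounds` and the choice of processor 0 as the source are assumptions made for illustration.

```python
def broadcast_rounds(n):
    """Simulate a single-message broadcast among n processors (hypothetical helper).

    In each round, every informed processor calls at most one uninformed
    processor, so no processor takes part in more than one call per round.
    Returns the list of rounds, each a list of (caller, callee) pairs.
    """
    informed = {0}  # assumption: processor 0 holds the message initially
    schedule = []
    while len(informed) < n:
        uninformed = [p for p in range(n) if p not in informed]
        # Pair informed callers with distinct uninformed callees.
        calls = list(zip(sorted(informed), uninformed))
        informed.update(callee for _, callee in calls)
        schedule.append(calls)
    return schedule


if __name__ == "__main__":
    for round_no, calls in enumerate(broadcast_rounds(8), start=1):
        print(round_no, calls)
```

Because the informed set can at most double per round, this simple scheme needs ceil(log2 n) rounds for one message; the difficulty addressed in the abstract is scheduling many messages at once.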
Minimizing Broadcast Costs under Edge Reductions in Tree Networks
In 7th International Symposium on Spatial and Temporal Databases (SSTD 2001), 1998
Cited by 6 (4 self)
We study the broadcasting of messages in tree networks under edge reductions. When an edge is reduced, its cost becomes zero. Edge reductions model the decrease or elimination of broadcasting costs between adjacent nodes in the network. Let T be an n-vertex tree and B be a target broadcast cost. We present an O(n) time algorithm for determining the minimum number of edges of T to reduce so that a broadcast cost of B can be achieved. We present an O(n log n) time algorithm to determine the minimum number of edges to reduce so that a broadcast initiated at an arbitrary vertex of T costs at most B. Characterizations of where edge reductions are placed underlie both algorithms and imply that reduced edges can be centrally located. Keywords: Analysis of algorithms; message broadcasting; blocking communication model; tree networks. Research supported in part by DARPA under contract DABT6392C0022ONR. The views and conclusions contained in this paper are those of the authors and should not...
Data Parallel Programming: A Survey and a Proposal for a New Model
1993
Cited by 5 (0 self)
We give a brief description of what we consider to be data parallel programming and processing, trying to pinpoint the typical problems and pitfalls that occur. We then proceed with a short annotated history of data parallel programming, and sketch a taxonomy in which data parallel languages can be classified. Finally we present our own model of data parallel programming, which is based on the view of parallel data collections as functions. We believe that this model has a number of distinct advantages, such as being abstract, independent of implicitly assumed machine models, and general.
Distance Distribution of Nodes in Star Graphs
2005
Cited by 3 (0 self)
The purpose of the paper is to provide an answer to a long-standing problem to compute the distance distribution among the nodes in a star graph, i.e., to compute the exact number of nodes at a distance k from the identity node in a star graph where k varies from 0 to the diameter of the graph. A star graph is a Cayley graph like the hypercubes; for a hypercube $Q_n$, there are exactly $\binom{n}{r}$ nodes at a distance $r$ from the identity node, where $r$ varies from 0 to $n$.
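The hypercube fact quoted above can be checked numerically. The following is a minimal sketch that covers only the hypercube case, not the star graph result of the paper; it assumes nodes are labelled by n-bit integers with the all-zero label as the identity node, and the function name `hypercube_distance_counts` is hypothetical.

```python
from collections import deque
from math import comb


def hypercube_distance_counts(n):
    """Count nodes of the hypercube Q_n at each distance from node 0 using BFS."""
    dist = {0: 0}  # assumption: node 0 (all bits zero) is the identity node
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for bit in range(n):
            w = v ^ (1 << bit)  # neighbours differ from v in exactly one bit
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    counts = [0] * (n + 1)
    for d in dist.values():
        counts[d] += 1
    return counts


if __name__ == "__main__":
    n = 5
    # Compare the BFS counts with the binomial coefficients C(n, r).
    print(hypercube_distance_counts(n) == [comb(n, r) for r in range(n + 1)])
```

The distance from node 0 equals the number of set bits in a label, which is why the counts match the binomial coefficients.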
On uncoordinated file distribution with non-altruistic downloaders
2007
Cited by 3 (1 self)
We consider a BitTorrent-like file sharing system, where the peers interested in downloading a large file join an overlay network. The seed node possessing the file stays in the system, whereas all other peers are non-altruistic in the sense that they leave the system as soon as they have downloaded the whole file. We consider a flash crowd scenario, where the peers join the overlay simultaneously. We show that the chunk selection algorithm is critical, propose an analytic approach to the process, and find that the encounters can be restricted to neighbours in a Chord overlay without losing much in performance.
Interconnection Networks for Parallel Processing: Graphs, Models and Algorithms
1995
Recent years (the 1980s and early 1990s) have seen a considerable growth in interest in parallel processing, both from a theoretical and from a practical point of view. Many parallel machines have been constructed and tested. It can be said with only a slight simplification that the main difference between a sequential and a parallel computer is communication. Large codes running on high-performance supercomputers are mainly memory limited. The distributed memory of massively parallel computers offers much more space, and thus much larger and more challenging problems can be processed. However, this implies a lot of data movement along the interconnection network between modules, and thus communication is the key to the efficiency of algorithms. Communication performance can basically be improved by three factors: the network topology (from both the graph theory and the VLSI technology points of view); the commutation mode (message switching
DISCRETE APPLIED
1991
This paper is a survey of existing methods of communication in usual networks. We particularly study the complete network, the ring, the torus, the grid, the hypercube, the cube-connected cycles, the undirected de Bruijn graph, the star graph, the shuffle-exchange graph, and the butterfly graph. Two different models of communication time are analysed, namely the constant model and the linear model. Other constraints like full-duplex or half-duplex links, processor-bound, DMA-bound or link-bound possibilities are separately studied. For each case we give references, upper bounds (algorithms) and lower bounds. We have also proposed improvements or new results when possible. Hopefully, optimal results are not always known and we present a list of open problems.