CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
IEEE Transactions on Parallel and Distributed Systems, 1995
Cited by 65 (7 self)
Abstract: A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model. Index Terms: Collective communication algorithms, collective communication semantics, message-passing parallel systems, portable library, process group, tunable algorithms.
Optimal Multiple Message Broadcasting in Telephone-Like Communication Systems
1996
Cited by 12 (1 self)
We consider the problem of broadcasting multiple messages from one processor to many processors in telephone-like communication systems. In such systems, processors communicate in rounds, where in every round, each processor can communicate with exactly one other processor by exchanging messages with it. Finding an optimal solution for this problem was open for over a decade. In this paper, we present an optimal algorithm for this problem when the number of processors is even. For an odd number of processors, we provide an algorithm which is within an additive term of 3 of the optimum. A by-product of our solution is an algorithm for the problem of broadcasting multiple messages for any number of processors in the simultaneous send/receive model. In this latter model, in every round, each processor can send a message to one processor and receive a message from another processor. Index Terms: broadcasting, communication networks, multiple messages, distributed parallel computers, simult...
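To make the round model above concrete, here is a minimal sketch of the much simpler single-message case. It is not the paper's multi-message algorithm. It assumes that a processor may stay idle in a round, and the function name `telephone_broadcast_rounds` is hypothetical. In each round every informed processor calls one distinct uninformed processor, so the informed set at most doubles.

```python
def telephone_broadcast_rounds(n: int) -> int:
    """Rounds needed to spread ONE message from a single source to all n
    processors when each processor takes part in at most one call per round.

    Illustrative assumption: idle processors are allowed in a round.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    informed = 1  # only the source knows the message initially
    rounds = 0
    while informed < n:
        # Each informed processor calls one distinct uninformed processor,
        # so the informed set grows by at most its current size.
        informed += min(informed, n - informed)
        rounds += 1
    return rounds
```

Because the informed set can at most double per round, the loop finishes in the smallest number of rounds `r` with `2**r >= n`, which is the known optimum for a single message in this model.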
Minimizing Broadcast Costs under Edge Reductions in Tree Networks
In 7th International Symposium on Spatial and Temporal Databases (SSTD 2001), 1998
Cited by 6 (4 self)
We study the broadcasting of messages in tree networks under edge reductions. When an edge is reduced, its cost becomes zero. Edge reductions model the decrease or elimination of broadcasting costs between adjacent nodes in the network. Let T be an n-vertex tree and B be a target broadcast cost. We present an O(n) time algorithm for determining the minimum number of edges of T to reduce so that a broadcast cost of B can be achieved. We present an O(n log n) time algorithm to determine the minimum number of edges to reduce so that a broadcast initiated at an arbitrary vertex of T costs at most B. Characterizations of where edge reductions are placed underlie both algorithms and imply that reduced edges can be centrally located. Keywords: Analysis of algorithms; message broadcasting; blocking communication model; tree networks. Research supported in part by DARPA under contract DABT63-92-C-0022ONR. The views and conclusions contained in this paper are those of the authors and should not...
Data Parallel Programming: A Survey and a Proposal for a New Model
1993
Cited by 5 (0 self)
We give a brief description of what we consider to be data parallel programming and processing, trying to pinpoint the typical problems and pitfalls that occur. We then proceed with a short annotated history of data parallel programming, and sketch a taxonomy in which data parallel languages can be classified. Finally we present our own model of data parallel programming, which is based on the view of parallel data collections as functions. We believe that this model has a number of distinct advantages, such as being abstract, independent of implicitly assumed machine models, and general.
Distance Distribution of Nodes in Star Graphs
2005
Cited by 2 (0 self)
The purpose of the paper is to provide an answer to a long standing problem to compute the distance distribution among the nodes in a star graph, i.e., to compute the exact number of nodes at a distance k from the identity node in a star graph where k varies from 0 to the diameter of the graph. A star graph is a Cayley graph like the hypercubes; for a hypercube $Q_n$, there are exactly $\binom{n}{r}$ nodes at a distance r from the identity node, where r varies from 0 to n.
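The hypercube count quoted in the abstract, and the quantity the paper computes for star graphs, can both be checked by brute force on small instances. The sketch below is not the paper's closed-form method. It assumes the standard definition of the star graph $S_n$, which the abstract does not spell out: nodes are the permutations of n symbols, and two nodes are adjacent when one is obtained from the other by swapping the first symbol with the symbol in some other position. Both function names are hypothetical.

```python
from collections import deque


def hypercube_distance_distribution(n: int) -> list[int]:
    """counts[r] = number of nodes of Q_n at distance r from the all-zero node.

    In a hypercube, the distance from the all-zero node is the Hamming weight.
    """
    counts = [0] * (n + 1)
    for node in range(1 << n):
        counts[bin(node).count("1")] += 1
    return counts


def star_graph_distance_distribution(n: int) -> list[int]:
    """counts[k] = number of nodes of S_n at distance k from the identity.

    Brute-force breadth-first search over all n! permutations, so this is
    only practical for small n.
    """
    identity = tuple(range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for i in range(1, n):
            # Neighbour: swap the first symbol with the symbol at position i.
            neighbour = list(perm)
            neighbour[0], neighbour[i] = neighbour[i], neighbour[0]
            neighbour = tuple(neighbour)
            if neighbour not in dist:
                dist[neighbour] = dist[perm] + 1
                queue.append(neighbour)
    counts = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        counts[d] += 1
    return counts
```

For example, `hypercube_distance_distribution(4)` yields the binomial coefficients of row 4, while `star_graph_distance_distribution(4)` counts the 24 permutations of four symbols by their distance from the identity.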
Interconnection Networks for Parallel Processing: Graphs, Models and Algorithms
1995
Introduction: Recent years (the 1980s and early 1990s) have seen a considerable growth in interest in parallel processing, both from a theoretical and from a practical point of view. Many parallel machines have been constructed and tested. It can be said with only a slight simplification that the main difference between a sequential and a parallel computer is communication. Large codes running on high-performance supercomputers are mainly memory limited. The distributed memory of massively parallel computers offers much more space, so much larger and more challenging problems can be processed. However, this implies a lot of data movement along the interconnection network between modules, and thus communication is the key to the efficiency of algorithms. Communication performance can basically be improved through three factors:
- the network topology (from both the graph theory and the VLSI technology points of view)
- the commutation mode (message switching

