Bandwidth-efficient Collective Communication for Clustered Wide Area Systems (1999)

by T Kielmann, H E Bal, S Gorlatch

Results 1 - 10 of 13

A Component Architecture for LAM/MPI

by Jeffrey M. Squyres, Andrew Lumsdaine - In Proceedings, 10th European PVM/MPI Users’ Group Meeting, number 2840 in Lecture Notes in Computer Science, 2003
Cited by 63 (11 self)
To better manage the ever-increasing complexity of LAM/MPI, we have created a lightweight component architecture for it that is specifically designed for high-performance message passing. This paper describes the basic design of the component architecture, as well as some of the particular component instances that constitute the latest release of LAM/MPI. Performance comparisons against the previous, monolithic, version of LAM/MPI show no performance impact due to the new architecture; in fact, the newest version is slightly faster. The modular and extensible nature of this implementation is intended to make it significantly easier to add new functionality and to conduct new research using LAM/MPI as a development platform.

Send-receive considered harmful: Myths and realities of message passing

by Sergei Gorlatch, Universität Münster - ACM Transactions on Programming Languages and Systems
Cited by 28 (1 self)
During the software crisis of the 1960s, Dijkstra’s famous thesis “goto considered harmful” paved the way for structured programming. This short communication suggests that many current difficulties of parallel programming based on message passing are caused by poorly structured communication, which is a consequence of using low-level send-receive primitives. We argue that, like goto in sequential programs, send-receive should be avoided as far as possible and replaced by collective operations in the setting of message passing. We dispute some widely held opinions about the apparent superiority of pairwise communication over collective communication and present substantial theoretical and empirical evidence to the contrary in the context of MPI (Message Passing Interface).

The Component Architecture of Open MPI: Enabling Third-Party Collective Algorithms

by Jeffrey M. Squyres, Andrew Lumsdaine - In Proceedings, 18th ACM International Conference on Supercomputing, Workshop on Component Models and Systems for Grid Applications, 2004
Cited by 22 (9 self)
As large-scale clusters become more distributed and heterogeneous, significant research interest has emerged in optimizing MPI collective operations because of the performance gains that can be realized. However, researchers wishing to develop new algorithms for MPI collective operations are typically faced with significant design, implementation, and logistical challenges. To address a number of needs in the MPI research community, Open MPI has been developed, a new MPI-2 implementation centered around a lightweight component architecture that provides a set of component frameworks for realizing collective algorithms, point-to-point communication, and other aspects of MPI implementations. In this paper, we focus on the collective algorithm component framework. The “coll” framework provides tools for researchers to easily design, implement, and experiment with new collective algorithms in the context of a production-quality MPI. Performance results with basic collective operations demonstrate that the component architecture of Open MPI does not introduce any performance penalty.

The Albatross Project: Parallel Application Support for Computational Grids

by Thilo Kielmann, Henri E. Bal, Jason Maassen, Rob van Nieuwpoort, Ronald Veldema, Rutger Hofman, Ceriel Jacobs, Kees Verstoep - In Proceedings of the 1st European GRID Forum Workshop, 2000
Cited by 4 (2 self)
The aim of the Albatross project is to study applications and programming environments for computational grids consisting of multiple clusters that are connected by wide-area networks. Parallel processing on such systems is useful but challenging, given the large differences in latency and bandwidth between LANs and WANs. We provide efficient algorithms and programming environments that exploit the hierarchical structure of wide-area clusters to minimize communication over the WANs. In addition, we use highly efficient local-area communication protocols. We illustrate this approach using the Manta high-performance Java system and the MagPIe MPI library, both of which are implemented on a collection of four Myrinet-based clusters connected by wide-area ATM networks. Our sample applications obtain high speedups on this wide-area system.
1 Introduction
As computational grids become more widely available, it becomes feasible to run parallel applications on multiple clusters at d...
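
The idea in the abstract above, exploiting the cluster hierarchy so that as little traffic as possible crosses the WAN, can be illustrated with a small broadcast model. The sketch below is not the MagPIe algorithm; it is a minimal illustration under stated assumptions: rank 0 is the root, the first rank of each cluster acts as a hypothetical "coordinator", and all function names (`binomial_edges`, `cluster_aware_edges`, `wan_messages`) are invented for this example. It compares a topology-unaware binomial-tree broadcast with a two-level broadcast in which the root sends one message to each remote cluster and the coordinators forward locally.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (sender rank, receiver rank)


def binomial_edges(n: int) -> List[Edge]:
    """Edges of a binomial-tree broadcast over ranks 0..n-1, rooted at rank 0.

    In round k, every rank below 2**k that already holds the data
    sends it to rank + 2**k (if that rank exists).
    """
    edges: List[Edge] = []
    step = 1
    while step < n:
        for sender in range(step):
            receiver = sender + step
            if receiver < n:
                edges.append((sender, receiver))
        step *= 2
    return edges


def cluster_aware_edges(cluster_of: List[int]) -> List[Edge]:
    """Two-level broadcast rooted at rank 0 (illustrative assumption).

    The root sends one message to the first rank (the "coordinator")
    of every remote cluster; each coordinator then runs a binomial
    broadcast among the ranks of its own cluster.
    """
    members: Dict[int, List[int]] = {}
    for rank, cluster in enumerate(cluster_of):
        members.setdefault(cluster, []).append(rank)

    root_cluster = cluster_of[0]
    edges: List[Edge] = []
    for cluster, ranks in members.items():
        if cluster != root_cluster:
            edges.append((0, ranks[0]))  # the only WAN message for this cluster
        for s, r in binomial_edges(len(ranks)):
            edges.append((ranks[s], ranks[r]))  # local (LAN) forwarding
    return edges


def wan_messages(edges: List[Edge], cluster_of: List[int]) -> int:
    """Count the messages whose sender and receiver sit in different clusters."""
    return sum(1 for s, r in edges if cluster_of[s] != cluster_of[r])


if __name__ == "__main__":
    # 16 ranks placed block-wise on 4 clusters of 4 ranks each.
    cluster_of = [rank // 4 for rank in range(16)]
    print("flat binomial :", wan_messages(binomial_edges(16), cluster_of))
    print("cluster-aware :", wan_messages(cluster_aware_edges(cluster_of), cluster_of))
```

For this placement both schemes send 15 messages in total, but the flat binomial tree sends 12 of them across cluster boundaries while the two-level scheme sends 3, one per remote cluster. The model only counts messages; it ignores latency, bandwidth, and message size.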

Collective operations for wide-area message passing systems using adaptive spanning trees

by Hideo Saito, Kenjiro Taura - In 6th IEEE/ACM International Workshop on Grid Computing, 2005
Cited by 2 (0 self)
We propose a method for wide-area message passing systems to perform collective operations using dynamically created spanning trees. In our proposal, broadcasts and reductions are performed efficiently using topology-aware spanning trees constructed at run-time; processors autonomously measure latency and bandwidth to create latency-aware trees for short messages and bandwidth-aware trees for long messages. Our spanning trees adapt to topology changes due to the joining or leaving of processors; when processors join or leave a computation, processors repair the spanning trees so that effective execution of collective operations can continue. With 128 to 201 processors distributed over 3 to 4 clusters, the latency of our broadcast was within a factor of 2 of a static topology-aware implementation, and our broadcast achieved 82 percent of the bandwidth of a static topology-aware implementation. Moreover, when some processors joined or left a computation, our broadcast temporarily performed poorly for about 8 seconds while the spanning trees adapted to the new topology, but completed successfully even during this time.

TACO - Exploiting Cluster Networks for High-Level Collective Operations

by Jörg Nolte, Mitsuhisa Sato, Yutaka Ishikawa
Cited by 1 (0 self)
TACO (Topologies and Collections) is a template library that introduces the flavour of distributed data parallel processing by means of reusable topology classes and C++ templates. This paper introduces TACO's basic abstractions and provides a performance analysis for basic collective operations on various cluster architectures with several different networks.
1 Introduction
Collective operations on distributed object groups are a powerful means to coordinate parallel computations and control distributed or replicated resources. TACO (Topologies and Collections) is an extension of the Multiple Threads Template Library (MTTL) [8], a very efficient communication and threading library for cluster architectures. TACO supports high-level parallel programming with flexible distributed object groups and collective operations, that are generically implemented by means of reusable topology classes and C++ function templates. Topology classes are used to describe the group relationship betwe...

Topology-Based Hypercube Structures for Global Communication in Heterogeneous Networks

by Silvia M. Figueira, Vijay Janapa Reddi - Journal of Parallel and Distributed Computing (in review)
Cited by 1 (0 self)
Hypercube structures are heavily used by parallel algorithms that require all-to-all communication. When communicating over a heterogeneous network, which may be the case in NOWs or GRID environments, the performance obtained by the hypercube structure will depend on the matching of the hypercube structure to the topology of the underlying network. In this paper, we present strategies to build topology-based hypercube structures. These strategies do not assume any kind of topology, and take into account the communication cost between pairs of nodes to provide a performance-efficient hypercube structure. These enhanced hypercube structures help improve the performance of parallel applications that require all-to-all communication in heterogeneous networks.

Improving MPI Multicast Performance over Grid Environment using Intelligent Message Scheduling

by Theewara Vorakosit, Putchong Uthayopas - Proceedings of International Conference on Scientific and Engineering Computation, 2004
Cited by 1 (0 self)
The multicast operation used by MPI under Grid environment can have a substantial impact on performance of parallel applications. Since the finding of an optimal multicast operation is an NP-hard problem, a near-optimal heuristic is crucial for building an efficient MPI runtime. This paper presents a new multicast algorithm called Longest Parallel Branch First (LPBF). This algorithm enhances the performance of multicast operation by exploiting the knowledge of the Grid two-level topology, cluster size, and inter/intracluster link bandwidths. This is done by reducing the intercluster multicast traffic and maximizing the opportunity for communication overlapping using an intelligent message scheduling. The experiments show that the LPBF algorithm offers a substantial improvement over previously proposed algorithms, such as ECEF (Earliest Completing Edge First) and Binomial Tree Algorithm. Thus, LPBF is simple and fast, and offers the potential to improve MPI runtime performance on the Grid.

Efficient High Performance Collective Communication for the Cell Blade

by Qasim Ali, Samuel P. Midkiff, Vijay S. Pai
Cited by 1 (1 self)
This paper presents high-performance collective communication algorithms and implementations that exploit the unique architectural features of the Cell heterogeneous multicore processor. This paper specifically describes novel algorithms for the barrier, broadcast, reduce, all-reduce, and all-gather collective operations, and shows the efficiency of these by comparing them to the previous fastest known implementations of these operations targeting the Cell. The new implementations are faster than the published state-of-the-art, achieving up to 19.21 times the performance (95% reduction in latency) of the previous published collective communication work for the Cell [19, 25]. The results presented show performance both within a chip and across the two Cell chips on a Cell blade [10].
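
The two figures quoted in the abstract above (19.21 times the performance, 95% reduction in latency) are consistent with each other if "performance" is read as the inverse of latency. That reading is an assumption made here for illustration, not something the abstract states. Under it, a speedup of s leaves 1/s of the original latency, so the reduction is 1 - 1/s. The helper name `latency_reduction` is hypothetical.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of latency removed by a given speedup, assuming
    performance is the inverse of latency (an assumption of this sketch)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # 19.21x is the best-case figure quoted in the abstract.
    print(f"{latency_reduction(19.21):.1%}")
```

For a speedup of 19.21 this gives about 94.8%, which rounds to the 95% quoted.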

Discovery and Application of Network Information

by Bruce Lowekamp, Peter Steenkiste, 2000
Cited by 1 (0 self)
USAF, under agreement number F30602-96-1-0287. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Advanced Research