An Empirical Study of Reliable Multicast Protocols over Ethernet-Connected Networks
- In International Conference on Parallel Processing (ICPP'01), 2001
Cited by 10 (5 self)
Recent advances in multicasting over the Internet present new opportunities for improving communication performance in clusters of workstations. The standard IP multicast, however, only supports unreliable multicast, which is difficult to use for building high level message passing routines. Thus, reliable multicast primitives must be implemented over the standard IP multicast to facilitate the use of multicast for high performance communication on clusters of workstations. Although many reliable multicast protocols have been proposed for the wide area Internet environment, the impact of architectural features of local area networks (LANs) on the reliable multicast protocols has not been thoroughly studied. Efficient reliable multicast protocols for LANs must exploit these features to achieve the best performance. In this paper, we study four types of reliable multicast protocols: the ACK-based protocols, the NAK-based protocols with polling, the ring-based protocols, and the tree-based protocols. We evaluate the performance of the protocols over Ethernet-connected networks, study the impact of architectural features of the Ethernet on the performance of the protocols, and investigate the methods to exploit these features to achieve the best performance.
Choudhary: Dynamically Adapting File Domain Partitioning Methods for Collective I/O Based on Underlying Parallel File System Locking Protocols
- In Proceedings of the ACM/IEEE Conference on Supercomputing (SC), 2008
Cited by 7 (1 self)
Collective I/O, such as that provided in MPI-IO, enables process collaboration among a group of processes for greater I/O parallelism. Its implementation involves file domain partitioning, and having the right partitioning is key to achieving high-performance I/O. As modern parallel file systems maintain data consistency by adopting a distributed file locking mechanism to avoid centralized lock management, different locking protocols can have a significant impact on the degree of parallelism of a given file domain partitioning method. In this paper, we propose dynamic file partitioning methods that adapt according to the underlying locking protocols in the parallel file systems and evaluate the performance of four partitioning methods under two locking protocols. By running multiple I/O benchmarks, our experiments demonstrate that no single partitioning guarantees the best performance. Using MPI-IO as an implementation platform, we provide guidelines to select the most appropriate partitioning methods for various I/O patterns and file systems.
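To make the idea of file domain partitioning concrete, here is a minimal Python sketch. The abstract does not name its four methods, so the two strategies shown (an even split, and a split whose boundaries are rounded down to a lock unit) are illustrative assumptions, as are all function names and the `lock_unit` parameter. The sketch only counts how many lock units end up shared between domains; it does not model any real file system's locking protocol.

```python
def even_partition(start, end, n):
    """Split the byte range [start, end) into n contiguous, near-equal domains."""
    size = end - start
    bounds = [start + (size * i) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


def aligned_partition(start, end, n, lock_unit):
    """Even split whose interior boundaries are rounded down to a multiple
    of lock_unit (a hypothetical lock granularity, e.g. a stripe size)."""
    bounds = [start]
    for lo, _ in even_partition(start, end, n)[1:]:
        aligned = (lo // lock_unit) * lock_unit
        # Never move a boundary before the previous one; a domain may become empty.
        bounds.append(max(aligned, bounds[-1]))
    bounds.append(end)
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


def shared_lock_units(domains, lock_unit):
    """Count lock units touched by more than one non-empty domain."""
    owners = {}
    for idx, (lo, hi) in enumerate(domains):
        if hi <= lo:
            continue
        for unit in range(lo // lock_unit, (hi - 1) // lock_unit + 1):
            owners.setdefault(unit, set()).add(idx)
    return sum(1 for procs in owners.values() if len(procs) > 1)


if __name__ == "__main__":
    even = even_partition(0, 100, 4)
    aligned = aligned_partition(0, 100, 4, 16)
    print(even, shared_lock_units(even, 16))
    print(aligned, shared_lock_units(aligned, 16))
```

In this toy case the aligned split trades balanced domain sizes for fewer shared lock units, which is one plausible reason that no single partitioning would be best for every access pattern and file system.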
CC-MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters
- In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2003
Cited by 6 (0 self)
Compiled communication has recently been proposed to improve communication performance for clusters of workstations. The idea of compiled communication is to apply more aggressive optimizations to communications whose information is known at compile time. Existing MPI libraries do not support compiled communication. In this paper, we present an MPI prototype, CC-MPI, that supports compiled communication on Ethernet switched clusters. The unique feature of CC-MPI is that it allows the user to manage network resources such as multicast groups directly and to optimize communications based on the availability of the communication information. CC-MPI optimizes one-to-all, one-to-many, all-to-all, and many-to-many collective communication routines using the compiled communication technique. We describe the techniques used in CC-MPI and report its performance. The results show that communication performance of Ethernet switched clusters can be significantly improved through compiled communication.
IMPI: Making MPI Interoperable
- Journal of Research of the National Institute of Standards and Technology
, 2000
Cited by 5 (1 self)
The Message Passing Interface (MPI) is the de facto standard for writing parallel scientific applications in the message passing programming paradigm. Implementations of MPI were not designed to interoperate, thereby limiting the environments in which parallel jobs could be run. We briefly describe a set of protocols, designed by a steering committee of current implementors of MPI, that enable two or more implementations of MPI to interoperate within a single application. Specifically, we introduce the set of protocols collectively called Interoperable MPI (IMPI). These protocols make use of novel techniques to handle difficult requirements such as maintaining interoperability among all IMPI implementations while also allowing for the independent evolution of the collective communication algorithms used in IMPI. Our contribution to this effort has been as a facilitator for meetings, editor of the IMPI Specification document, and as an early testbed for implementations of IMPI. This tes...
Performance evaluation of an open distributed platform for realistic traffic generation
- Performance Evaluation: An International Journal 60 (1–4) (2005) 359–392 (Special Issue on Performance Modeling and Evaluation of High-Performance Parallel and Distributed Systems)
Cited by 3 (0 self)
Network researchers have dedicated a notable part of their efforts to the area of modeling traffic and to the implementation of efficient traffic generators. We feel that there is a strong demand for traffic generators capable of reproducing realistic traffic patterns according to theoretical models while also delivering high performance. This work presents an open distributed platform for traffic generation that we called distributed internet traffic generator (D-ITG), capable of producing traffic (network, transport and application layer) at packet level and of accurately replicating appropriate stochastic processes for both inter-departure time (IDT) and packet size (PS) random variables. We implemented two different versions of our distributed generator. In the first one, a log server is in charge of recording the information transmitted by senders and receivers, and these communications are based either on TCP or UDP. In the other one, senders and receivers make use of the MPI library. In this work a complete performance comparison among the centralized version and the two distributed versions of D-ITG is presented. © 2004 Elsevier B.V. All rights reserved.
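The idea of driving a generator from stochastic processes for inter-departure time and packet size can be sketched in a few lines of Python. This is not D-ITG's interface or implementation: the function name, its parameters, and the choice of an exponential IDT with a uniform PS are all assumptions made for illustration. The sketch only builds a send schedule and does not transmit anything.

```python
import random


def make_schedule(n_packets, mean_idt_s, ps_min, ps_max, seed=0):
    """Return a list of (send_time_seconds, packet_size_bytes) pairs.

    Assumed models (hypothetical, not taken from the abstract):
      - IDT is exponential with mean mean_idt_s, i.e. Poisson departures
      - PS is uniform over the integers ps_min..ps_max inclusive
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    t = 0.0
    schedule = []
    for _ in range(n_packets):
        t += rng.expovariate(1.0 / mean_idt_s)  # draw the next inter-departure time
        size = rng.randint(ps_min, ps_max)      # draw the packet size
        schedule.append((t, size))
    return schedule


if __name__ == "__main__":
    for send_time, size in make_schedule(5, 0.01, 64, 1500):
        print(f"{send_time:.6f} s  {size} bytes")
```

A sender loop would sleep until each scheduled time and then emit a packet of the drawn size; swapping in other distributions changes only the two sampling lines.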
A Grid Programming Primer
, 2001
Cited by 3 (0 self)
A grid computing environment is inherently parallel, distributed, heterogeneous and dynamic, both in terms of the resources involved and their performance. Furthermore, grid applications will want to dynamically and flexibly compose resources and services across that dynamic environment. While it may be possible to build grid applications using established programming tools, they are not particularly well-suited to effectively manage flexible composition or deal with heterogeneous hierarchies of machines, data and networks with heterogeneous performance. Hence, this paper investigates what properties and capabilities grid programming tools should possess to support not only efficient grid codes, but also their effective development. The required properties and capabilities are systematically considered and then current programming paradigms and tools are surveyed, examining their suitability for grid programming. Clearly no one tool will address all requirements in all situations. However, paradigms and tools that can incorporate and provide the widest possible support for grid programming will come to dominate. Across all identified grid programming issues, suggestions are made for focus areas in which further work is most likely to yield useful results.
S.: A Challenge towards Next-Generation Research Infrastructure for Advanced Life Science
- New Generation Computing, 2004
Cited by 2 (2 self)
Recently, life scientists have expressed a strong need for computational power sufficient to complete their analyses within a realistic time as well as for a computational power capable of seamlessly retrieving biological data of interest from multiple and diverse bio-related databases for their research infrastructure. This need implies that life science strongly requires the benefits of advanced IT. In Japan, the Biogrid project has been promoted since 2002 toward the establishment of a next-generation research infrastructure for advanced life science. In this paper, the Biogrid strategy toward these ends is detailed along with the role and mission imposed on the Biogrid project. In addition, we present the current status of the development of the project as well as the future issues to be tackled.
A comprehensive study of reliable multicast protocols over ethernet-connected networks
, 2000
Experiences Parallelizing, Configuring, Monitoring, and Visualizing Applications for Clusters and Multi-Clusters
To make it simpler to experiment with the impact different configurations can have on the performance of a parallel cluster application, we developed the PATHS system. The PATHS system uses a “wrapper” to provide a level of indirection to the actual run-time location of data, making the data available from wherever threads or processes are located. A wrapper specifies where data is located, how to get there, and which protocols to use. Wrappers are also used to add or modify methods accessing data. Wrappers are specified dynamically. A “path” is comprised of one or more wrappers. Sections of a path can be shared among two or more paths. By reconfiguring the LAM-MPI Allreduce operation we achieved performance gains of 1.52, 1.79, and 1.98 on two-, four-, and eight-way clusters, respectively. We also measured the performance of the unmodified Allreduce operation when using two clusters interconnected by a WAN link with 30-50 ms round-trip latency. Configurations which resulted in multiple messages being sent across the WAN did not add any significant performance penalty to the unmodified Allreduce operation for packet sizes up to 4 KB. For larger packet sizes the performance of the Allreduce operation rapidly deteriorated. To log and visualize the performance data we developed EventSpace, a configurable data collecting, management and observation system used for monitoring low-level synchronization and communication behavior of parallel applications on clusters and multi-clusters. Event collectors detect events, create virtual events by

