Results 1 - 10
of
38
MagPIe: MPI’s Collective Communication Operations for Clustered Wide Area Systems
- Proc PPoPP'99
, 1999
"... Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differences in link speeds. An important class of algorithms are collective operations, such as broadcast and reduce. We have d ..."
Abstract
-
Cited by 138 (26 self)
- Add to MetaCart
Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differences in link speeds. An important class of algorithms are collective operations, such as broadcast and reduce. We have developed MAGPIE, a library of collective communication operations optimized for wide area systems. MAGPIE's algorithms send the minimal amount of data over the slow wide area links, and only incur a single wide area latency. Using our system, existing MPI applications can be run unmodified on geographically distributed systems. On moderate cluster sizes, using a wide area latency of 10 milliseconds and a bandwidth of 1 MByte/s, MAGPIE executes operations up to 10 times faster than MPICH, a widely used MPI implementation; application kernels improve by up to a factor of 4. Due to the structure of our algorithms, MAGPIE's advantage increases for higher wide area latencies.
The AppLeS Project: A Status Report
, 1997
"... Fast networks have made it possible to aggregate distributed CPU, memory, storage, and data to provide the potential for application performance superior to that attainable on any single system. However, achieving such performance on these metacomputing systems has proved to be difficult. Experience ..."
Abstract
-
Cited by 114 (9 self)
- Add to MetaCart
Fast networks have made it possible to aggregate distributed CPU, memory, storage, and data to provide the potential for application performance superior to that attainable on any single system. However, achieving such performance on these metacomputing systems has proved to be difficult. Experience with the I-WAY [DFP + ss] and other metacomputing platforms demonstrates that effective application scheduling is critical to the achievement of performance for metacomputing applications. Currently, application developers develop customized application schedules to achieve performance on a metacomputer. Such application-centric schedules promote the performance of the application by evaluating system performance in terms of application resource requirements. To formalize and generalize the, as yet, ad hoc notion of application-centric scheduling emerging from the practices of metacomputing application developers [EMRP, SAR, GWP93], we are developing metacomputing scheduling agents calle...
Scheduling From the Perspective of the Application
, 1996
"... Metacomputing is the aggregation of distributed and high-performance resources on coordinated networks. With careful scheduling, resource-intensive applications can be implemented efficiently on metacomputing systems at the sizes of interest to developers and users. In this paper, we focus on the pr ..."
Abstract
-
Cited by 86 (13 self)
- Add to MetaCart
Metacomputing is the aggregation of distributed and high-performance resources on coordinated networks. With careful scheduling, resource-intensive applications can be implemented efficiently on metacomputing systems at the sizes of interest to developers and users. In this paper, we focus on the problem of scheduling applications on metacomputing systems. We introduce the concept of application-centric scheduling in which everything about the system is evaluated in terms of its impact on the application. Application-centric scheduling is used by virtually all metacomputer programmers to achieve performance on metacomputing systems. We describe two successful metacomputing applications to illustrate this approach, and describe AppLeS scheduling agents which generalize the application-centric scheduling approach. Finally, we show preliminary results which compare AppLeS-derived schedules with conventional strip and blocked schedules for a two-dimensional Jacobi code. 1 Introduction Inc...
Efficient Collective Communication in Distributed Heterogeneous Systems
- Journal of Parallel and Distributed Computing
, 1999
"... The Information Power Grid (IPG) is emerging as an infrastructure that will enable distributed applications – such as video conferencing and distributed interactive simulation – to seamlessly integrate collections of heterogeneous workstations, multiprocessors, and mobile nodes, over heterogeneous w ..."
Abstract
-
Cited by 71 (2 self)
- Add to MetaCart
The Information Power Grid (IPG) is emerging as an infrastructure that will enable distributed applications – such as video conferencing and distributed interactive simulation – to seamlessly integrate collections of heterogeneous workstations, multiprocessors, and mobile nodes, over heterogeneous wide-area networks. This paper introduces a framework for developing efficient collective communication schedules in such systems. Our framework consists of analytical models of the heterogeneous system, scheduling algorithms for the collective communication pattern, and performance evaluation mechanisms. We show that previous models, which considered node heterogeneity but ignored network heterogeneity, can lead to solutions which are worse than the optimal by an unbounded factor. We then introduce an enhanced communication model, and develop three heuristic algorithms for the broadcast and multicast patterns. The completion time of the schedule is chosen as the performance metric. The heuristic algorithms are FEF (Fastest Edge First), ECEF (Earliest Completing Edge First), and ECEF with look-ahead. For small system sizes, we find the optimal solution using exhaustive search. Our simulationexperiments indicate that the performance of our heuristic algorithms is close to optimal. For performance evaluation of larger systems, we have also developed a simple lower bound on the completion time. Our heuristic algorithms achieve significant performance improvements over previous approaches. 1.
Efficient collective communication on heterogeneous networks of workstations
- In International Conference on Parallel Processing
, 1998
"... banikaze,moorthy,panda¢ ..."
Wide-Area Implementation of the Message Passing Interface
- PARALLEL COMPUTING
, 1998
"... The Message Passing Interface (MPI) can be used as a portable, high-performance programming model for wide-area computing systems. The wide-area environmentintroduces challenging problems for the MPI implementor, due to the heterogeneity of both the underlying physical infrastructure and the softwar ..."
Abstract
-
Cited by 43 (10 self)
- Add to MetaCart
The Message Passing Interface (MPI) can be used as a portable, high-performance programming model for wide-area computing systems. The wide-area environmentintroduces challenging problems for the MPI implementor, due to the heterogeneity of both the underlying physical infrastructure and the software environment at different sites. In this article, we describe an MPI implementation that incorporates solutions to these problems. This implementation has been constructed by extending the Argonne MPICH implementation of MPI to use communication services provided by the Nexus communication library and authentication, resource allocation, process creation/management, and information services provided by the I-Soft system (initially) and the Globus metacomputing toolkit (work in progress). Nexus provides multimethod communication mechanisms that allowmultiple communication methods to be used in a single computation with a uniform interface; I-Soft and Globus provided standard authent...
Topology Discovery for Large Ethernet Networks
- In Proceedings of SIGCOMM 2001. ACM
"... Accurate network topology information is important for both network management and application performance prediction. Most topology discovery research has focused on wide-area networks and examined topology only at the IP router level, ignoring the need for LAN topology information. Recent work has ..."
Abstract
-
Cited by 42 (5 self)
- Add to MetaCart
Accurate network topology information is important for both network management and application performance prediction. Most topology discovery research has focused on wide-area networks and examined topology only at the IP router level, ignoring the need for LAN topology information. Recent work has demonstrated that bridged Ethernet topology can be determined using standard SNMP MIBs; however, these algorithms require each bridge to learn about all other bridges in the network. Our approach to Ethernet topology discovery can determine the connection between a pair of the bridges that share forwarding entries for only three hosts. This minimal knowledge requirement significantly expands the size of the network that can be discovered. We have implemented the new algorithm, and it has accurately determined the topology of several different networks using a variety of hardware and network configurations. Our implementation requires access to only one endpoint to perform the queries needed for topology discovery.
ReMoS: A Resource Monitoring System for Network-Aware Applications
, 1997
"... Development of portable network-aware applications demands an interface to the network that allows an application to obtain information about its execution environment. This paper motivates and describes the design of Remos, an API that allows network-aware applications to obtain relevant informatio ..."
Abstract
-
Cited by 41 (8 self)
- Add to MetaCart
Development of portable network-aware applications demands an interface to the network that allows an application to obtain information about its execution environment. This paper motivates and describes the design of Remos, an API that allows network-aware applications to obtain relevant information. The major challenges in defining a uniform interface are network heterogeneity, diversity in traffic requirements, variability of the information, and resource sharing in the network. Remos addresses these issues with two abstraction levels, explicit management of resource sharing, and statistical measurements. The flows abstraction captures the communication between nodes, and the topologies abstraction provides a logical view of network connectivity. Remos measurements are made at network level, and therefore information to manage sharing of resources is available. Remos is designed to deliver best effort information to applications, and it explicitly adds statistical reliability and variability measures to the core information. The paper also presents preliminary results and experience with a prototype Remos implementation for a high speed IP-based network testbed.
Increasing Application Performance in Virtual Environments through Run-time Inference and Adaptation
- In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC
, 2005
"... Virtual machine distributed computing greatly simplifies the use of widespread computing resources by lowering the level of abstraction, benefiting both resource providers and users. Towards that end our Virtuoso middleware closely emulates the existing process of buying, configuring and using physi ..."
Abstract
-
Cited by 33 (13 self)
- Add to MetaCart
Virtual machine distributed computing greatly simplifies the use of widespread computing resources by lowering the level of abstraction, benefiting both resource providers and users. Towards that end our Virtuoso middleware closely emulates the existing process of buying, configuring and using physical machines. Virtuoso's VNET component is a simple and efficient layer two virtual network tool that makes these virtual machines (VMs) appear to be physically connected to the home network of the user while simultaneously supporting arbitrary topologies and routing among them. Virtuoso's VTTIF component continually infers the communication behavior of the application running in a collection of VMs. The combination of overlays like VNET and inference frameworks like VTTIF has great potential to increase the performance, with no user or developer involvement, of existing, unmodified applications by adapting their virtual environments to the underlying computing infrastructure to best suit the applications. We show here how to use the continually inferred application topology and traffic to dynamically control three mechanisms of adaptation, VM migration, overlay topology, and forwarding to significantly increase the performance of two classes of applications, bulk synchronous parallel applications and transactional web ecommerce applications.
Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations
- In HCW’99, the 8th Heterogeneous Computing Workshop
, 1999
"... Abstract: Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects and due to the multiplicity of vendors and platforms, the NOW environments are being gradually redefined as Heterogene ..."
Abstract
-
Cited by 29 (0 self)
- Add to MetaCart
Abstract: Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects and due to the multiplicity of vendors and platforms, the NOW environments are being gradually redefined as Heterogeneous Networks of Workstations (HNOW). Having an accurate model for the communication in HNOW systems is crucial for design and evaluation of efficient communication layers for such systems. In this paper we present a model for point-to-point communication in HNOW systems and show how it can be used for characterizing the performance of different collective communication operations. In particular, we show how the performance of broadcast, scatter, and gather operations can be modeled and analyzed. We also verify the accuracy of our proposed model by using an experimental HNOW testbed. Furthermore, it is shown how this model can be used for comparing the performance of different collective communication algorithms. We also show how the effect of heterogeneity on the performance of collective communication operations can be predicted. 1

