Results 1 - 10 of 11
Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus, 2001
Cited by 81 (15 self)
Abstract
Improvements in the performance of processors and networks make it both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids can make application development difficult. Here we describe an architecture and prototype implementation for a Grid-enabled computational framework based on Cactus, the MPICH-G2 Grid-enabled message-passing library, and a variety of specialized features to support efficient execution in Grid environments. We have used this framework to perform record-setting computations in numerical relativity, running across four supercomputers and achieving scaling of 88% (1140 CPUs) and 63% (1500 CPUs). The problem size we were able to compute was about five times larger than any previous run. Further, we introduce and demonstrate adaptive methods that automatically adjust computational parameters during run time to dramatically increase the efficiency of a distributed Grid simulation, without modifying the application and without any knowledge of the underlying network connecting the distributed computers.
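The abstract does not say which computational parameters are adapted. As a minimal sketch, assume one hypothetical integer knob (for example, how many iterations pass between boundary exchanges) and a controller that sees only measured compute and communication times, so it needs no knowledge of the network. The class name, thresholds, and doubling/halving rule are illustrative assumptions, not the Cactus implementation.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveController:
    """Hypothetical run-time tuner: adjusts one integer parameter from timings alone."""
    value: int = 1        # current setting, e.g. iterations between boundary exchanges (assumed knob)
    min_value: int = 1
    max_value: int = 16
    high: float = 0.30    # communication fraction above which we back off
    low: float = 0.10     # communication fraction below which we tighten again

    def update(self, compute_time: float, comm_time: float) -> int:
        """Return the setting to use for the next interval, given measured times."""
        total = compute_time + comm_time
        if total <= 0:
            return self.value  # no usable measurement; keep the current setting
        frac = comm_time / total
        if frac > self.high:
            # Communication dominates: exchange less often (bounded above).
            self.value = min(self.value * 2, self.max_value)
        elif frac < self.low:
            # Network is cheap: exchange more often again (bounded below).
            self.value = max(self.value // 2, self.min_value)
        return self.value


ctl = AdaptiveController()
print(ctl.update(compute_time=1.0, comm_time=1.0))   # 2  (fraction 0.5)
print(ctl.update(compute_time=1.0, comm_time=0.6))   # 4  (fraction 0.375)
print(ctl.update(compute_time=1.0, comm_time=0.05))  # 2  (fraction about 0.048)
```

Because the rule reacts only to observed time fractions, the same controller works unchanged whether the sites are joined by a fast or a slow link.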
Wide-Area Implementation of the Message Passing Interface - PARALLEL COMPUTING, 1998
Cited by 43 (10 self)
Abstract
The Message Passing Interface (MPI) can be used as a portable, high-performance programming model for wide-area computing systems. The wide-area environment introduces challenging problems for the MPI implementor, due to the heterogeneity of both the underlying physical infrastructure and the software environment at different sites. In this article, we describe an MPI implementation that incorporates solutions to these problems. This implementation has been constructed by extending the Argonne MPICH implementation of MPI to use communication services provided by the Nexus communication library and authentication, resource allocation, process creation/management, and information services provided by the I-Soft system (initially) and the Globus metacomputing toolkit (work in progress). Nexus provides multimethod communication mechanisms that allow multiple communication methods to be used in a single computation with a uniform interface; I-Soft and Globus provided standard authent...
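To make multimethod communication concrete, here is a sketch of per-pair method selection under stated assumptions: a hypothetical preference list in which a machine-local method (called `local_mpi` here) is preferred over `tcp` whenever both processes run on the same machine. The names and the rule are illustrative only; they are not Nexus identifiers or its actual selection logic.

```python
from dataclasses import dataclass

# Hypothetical preference order, fastest first; names are illustrative, not Nexus identifiers.
PREFERENCE = ("local_mpi", "tcp")


@dataclass(frozen=True)
class Endpoint:
    rank: int
    machine: str          # which parallel computer the process runs on
    methods: frozenset    # methods this process's runtime supports


def choose_method(a: Endpoint, b: Endpoint) -> str:
    """Pick one method per pair; the caller's send/recv interface stays the same either way."""
    for method in PREFERENCE:
        if method not in a.methods or method not in b.methods:
            continue
        # A machine-local method is usable only when both processes share the machine.
        if method == "local_mpi" and a.machine != b.machine:
            continue
        return method
    raise RuntimeError(f"no common communication method for ranks {a.rank} and {b.rank}")


both = frozenset({"local_mpi", "tcp"})
p0 = Endpoint(0, "siteA", both)
p1 = Endpoint(1, "siteA", both)
p2 = Endpoint(2, "siteB", both)
print(choose_method(p0, p1))  # local_mpi
print(choose_method(p0, p2))  # tcp
```

A single computation can therefore mix fast intra-machine transport with wide-area TCP while the application code sees one interface.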
Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit - INTERN. J. HIGH PERF. COMP. APPLICATIONS, 2005
Cited by 13 (8 self)
Abstract
This paper describes the capabilities, evolution, performance, and applications of the Global Arrays (GA) toolkit. GA was created to provide application programmers with an interface that allows them to distribute data while maintaining a global index space and programming syntax similar to those available when programming on a single processor. The goal of GA is to free the programmer from the low-level management of communication and allow them to deal with their problems at the level at which they were originally formulated. At the same time, compatibility of GA with MPI enables the programmer to take advantage of existing MPI software and libraries when available and appropriate. The variety of applications that have been implemented using Global Arrays attests to the
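A global index space means the programmer addresses element `i` of a distributed array directly, while the library works out which process owns it. The sketch below assumes a regular 1-D block distribution and uses hypothetical helper names; it illustrates the index arithmetic only and is not the GA API, which supports multidimensional arrays.

```python
def block_size(n: int, nprocs: int) -> int:
    """Elements per process under a 1-D block distribution (the last block may be short)."""
    return -(-n // nprocs)  # ceiling division


def locate(i: int, n: int, nprocs: int) -> tuple[int, int]:
    """Map global index i to (owning process, local offset on that process)."""
    if not 0 <= i < n:
        raise IndexError(i)
    b = block_size(n, nprocs)
    return i // b, i % b


def global_index(owner: int, offset: int, n: int, nprocs: int) -> int:
    """Inverse of locate: recover the global index from (owner, offset)."""
    return owner * block_size(n, nprocs) + offset


# 10 elements over 4 processes -> blocks of 3, 3, 3, 1.
print(locate(7, n=10, nprocs=4))                  # (2, 1)
print(locate(9, n=10, nprocs=4))                  # (3, 0)
print(global_index(2, 1, n=10, nprocs=4))         # 7
```

The user writes code in terms of `i`; only the library needs `locate` to decide whether an access is local or requires communication.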
A Software Architecture for Global Address Space Communications on Clusters: ..., 1998
Cited by 12 (5 self)
Abstract
Global address space parallel programming models can be an effective alternative to send/receive-style communication, simplifying programming or code generation and increasing performance for certain types of applications. Traditionally, global address space mechanisms have been implemented in hardware to provide the necessary communication performance and responsiveness.
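The contrast with send/receive style can be shown with a toy, single-process model in which all class and method names are hypothetical: with a one-sided `put`, data lands in the target's memory without the target executing anything, whereas with two-sided messaging nothing reaches the receiver's buffer until it calls `recv`. This models semantics only, not the hardware or cluster mechanisms the paper discusses.

```python
from collections import deque


class GlobalAddressSpace:
    """Toy model: every node exposes a memory segment addressable as (node, offset)."""

    def __init__(self, nodes: int, words: int):
        self.segments = [[0] * words for _ in range(nodes)]

    def put(self, node: int, offset: int, data: list) -> None:
        # One-sided write: the initiator updates remote memory; the target runs no code.
        seg = self.segments[node]
        if offset < 0 or offset + len(data) > len(seg):
            raise IndexError("put outside the target segment")
        seg[offset:offset + len(data)] = data

    def get(self, node: int, offset: int, count: int) -> list:
        # One-sided read of remote memory.
        return self.segments[node][offset:offset + count]


class Mailbox:
    """Toy two-sided channel: data reaches user memory only when the receiver calls recv."""

    def __init__(self):
        self.queue = deque()

    def send(self, data: list) -> None:
        self.queue.append(list(data))

    def recv(self, buffer: list, offset: int) -> None:
        data = self.queue.popleft()
        buffer[offset:offset + len(data)] = data


gas = GlobalAddressSpace(nodes=2, words=4)
gas.put(1, 1, [7, 8])            # node 1 never calls anything
print(gas.get(1, 0, 4))          # [0, 7, 8, 0]

box, buf = Mailbox(), [0] * 4
box.send([7, 8])
print(buf)                       # [0, 0, 0, 0]: not received yet
box.recv(buf, 1)
print(buf)                       # [0, 7, 8, 0]
```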
Cactus-G Toolkit: Supporting Efficient Execution in Heterogeneous Distributed Computing Environments - In Proceedings of the 4th Globus Retreat, 2000
Cited by 8 (2 self)
Abstract
Improvements in the performance of processors and networks mean that it can be both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids makes application development extremely difficult. Here we describe an architecture and prototype implementation for a Grid-enabled computational framework called Cactus-G. This framework integrates the Cactus simulation system with the MPICH-G2 Grid-enabled message-passing library and, in addition, with a variety of specialized features to support efficient execution in Grid environments.
Efficiently Scheduling Advance Reservations in Grids, 2005
Cited by 3 (2 self)
Abstract
Advance reservations (ARs) were introduced for application-level dynamic scheduling of resources in a Grid infrastructure. Reserving resources for a specific time in the future not only ensures that all resources will be simultaneously available at the execution time of the application but also ensures that the QoS constraints of the Grid applications will be met. Previous research shows that ARs can meet their objectives, but at a significant performance cost. In this paper, we argue that laxity in the reservation window of an AR can help improve the performance of scheduling with advance reservations. Scheduling ARs with given laxities is an NP-hard problem, and the paper presents a scalable algorithm for scheduling on-demand and advance reservation requests with laxities. The paper then investigates in detail, through extensive experimentation, how performance is affected by the proportion of advance reservations, the laxity, and the distribution of task sizes. The paper also investigates how much performance improvement can be gained through task preemption, and up to what level of overhead preemption is justified when scheduling on-demand and advance reservation requests. We demonstrate how, for some workloads, laxity can be exchanged for preemption to achieve high utilization. Finally, the paper studies resource-level policies to prevent starvation of on-demand requests.
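The benefit of laxity can be seen with a simple first-fit scheduler on a discretized timeline. This is a sketch under stated assumptions (one pool of nodes, integer time slots, hypothetical class and parameter names) and is not the scalable algorithm the paper presents: a request may start anywhere in `[earliest, earliest + laxity]`, and widening that window can turn a rejection into an acceptance.

```python
class SlotCalendar:
    """Hypothetical availability calendar: integer time slots, each with a free-node count."""

    def __init__(self, capacity: int, horizon: int):
        self.free = [capacity] * horizon

    def fits(self, start: int, duration: int, nodes: int) -> bool:
        end = start + duration
        if end > len(self.free):
            return False  # would run past the scheduling horizon
        return all(self.free[t] >= nodes for t in range(start, end))

    def reserve(self, earliest: int, duration: int, nodes: int, laxity: int):
        """First fit: try each start in [earliest, earliest + laxity]; return it, or None."""
        for start in range(earliest, earliest + laxity + 1):
            if self.fits(start, duration, nodes):
                for t in range(start, start + duration):
                    self.free[t] -= nodes
                return start
        return None  # rejected: no feasible start inside the reservation window


cal = SlotCalendar(capacity=4, horizon=10)
print(cal.reserve(earliest=0, duration=3, nodes=3, laxity=0))  # 0
print(cal.reserve(earliest=1, duration=2, nodes=2, laxity=0))  # None: only 1 node free in slots 1-2
print(cal.reserve(earliest=1, duration=2, nodes=2, laxity=2))  # 3: shifted within its window
```

With zero laxity the second request is rejected; the same request with a laxity of 2 is accepted two slots later.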
Implementing High-Level Parallelism on Computational GRIDs, 2006
Cited by 2 (1 self)
Abstract
This copy of the thesis has been supplied on the condition that anyone who consults it is understood to recognise that the copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without the prior written consent of the author or the university (as may be appropriate). I hereby declare that the work presented in this thesis was carried out by myself at Heriot-Watt University, Edinburgh, except where due acknowledgement is made, and has not been submitted for any other degree.
Egida: A Toolkit for Low-overhead Fault-tolerance, 1999
Cited by 1 (0 self)
Abstract
Log-based rollback recovery protocols, such as logging and checkpointing, are an attractive solution for building non-critical applications that can tolerate crash failures. Surprisingly, though, very few of these protocols are used in practice to build reliable applications. We believe there are two reasons for this. First, the few existing implementations of rollback-recovery protocols either support only a single protocol or require a developer to implement each protocol from scratch. Second, the performance of rollback-recovery protocols in practice is not well understood. To address these issues, we develop Egida, an extensible toolkit for rollback recovery. To design Egida, we develop a single unifying framework based on handling non-deterministic events in process execution. This framework allows us to characterize rollback-recovery protocols as event-driven programs that differ in their response to the same set of “relevant” events, and to identify a set of functionalities that are at the core of all these protocols. To express rollback-recovery protocols in terms of the relevant events, we have developed a simple specification language. From a specification, Egida can synthesize the corresponding protocol implementation by gluing together appropriate implementations of core mod...
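The idea of protocols as event-driven programs assembled from shared core functionality can be sketched as a table mapping each relevant event to an ordered list of actions. The event names, actions, and the two example protocols below are hypothetical simplifications, not Egida's specification language; the only difference between the two protocols is whether a received message is written to stable storage before delivery.

```python
from typing import Callable

Action = Callable[[dict, str], None]


# Hypothetical "core modules": small actions shared by many rollback-recovery protocols.
def log_to_stable_storage(state: dict, payload: str) -> None:
    state["stable_log"].append(payload)


def deliver(state: dict, payload: str) -> None:
    state["delivered"].append(payload)


def take_checkpoint(state: dict, _payload: str) -> None:
    # Snapshot what has been delivered so far; the payload is unused.
    state["checkpoints"].append(list(state["delivered"]))


# A protocol is just the ordered actions it runs for each relevant event.
PESSIMISTIC_LOGGING: dict[str, list[Action]] = {
    "receive": [log_to_stable_storage, deliver],  # log before delivery
    "checkpoint": [take_checkpoint],
}
CHECKPOINT_ONLY: dict[str, list[Action]] = {
    "receive": [deliver],
    "checkpoint": [take_checkpoint],
}


def run(protocol: dict[str, list[Action]], events: list[tuple[str, str]]) -> dict:
    """Drive a protocol over a trace of (event, payload) pairs and return the final state."""
    state = {"stable_log": [], "delivered": [], "checkpoints": []}
    for event, payload in events:
        for action in protocol.get(event, []):
            action(state, payload)
    return state


trace = [("receive", "m1"), ("checkpoint", ""), ("receive", "m2")]
print(run(PESSIMISTIC_LOGGING, trace)["stable_log"])  # ['m1', 'm2']
print(run(CHECKPOINT_ONLY, trace)["stable_log"])      # []
```

Adding a new protocol in this style means writing a new table, not new event-handling machinery.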
Parallel Computing over the Internet with Java
Abstract
JET is a parallel library implemented in Java for parallel computing over the Internet. The JET library is oriented to long-running Master/Worker applications with coarse-grain task distribution. The computation is performed by Java applets that are downloaded through a Web page. The paper describes some internals of JET and its mechanisms to support fault tolerance, interoperability with PVM/MPI, and the use of statistics. The paper includes some performance figures taken with simple benchmarks and more complex applications.
1. Introduction
In recent years we have seen an extraordinary increase in the number of machines connected to the Internet, and this growth is estimated to continue exponentially. According to a survey conducted by Network Wizards [NetWizards] in January 1998, 29.6 million hosts were connected to the Internet (compared with 16 million in January 1997). This mass of interconnected processors represents a very significant p...
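A Master/Worker scheme with coarse-grain tasks tolerates worker loss naturally, because the master can simply hand a lost task to another worker. The following sequential sketch assumes each worker is a callable that raises `ConnectionError` when it disappears (for example, a user closing the page running an applet); the function name and this failure model are hypothetical, it is not JET's API, and a real master would dispatch tasks concurrently over the network.

```python
from collections import deque
from typing import Callable


def run_master(tasks: list, workers: dict[str, Callable]) -> dict:
    """Toy master: hands out tasks, re-queues any task whose worker fails, drops that worker."""
    pending = deque(enumerate(tasks))  # (task id, payload) pairs
    alive = dict(workers)
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("all workers failed; tasks left unfinished")
        for name in list(alive):  # iterate over a copy so failed workers can be removed
            if not pending:
                break
            task_id, payload = pending.popleft()
            try:
                results[task_id] = alive[name](payload)
            except ConnectionError:
                # The worker went away: put its task back at the front and stop using it.
                pending.appendleft((task_id, payload))
                del alive[name]
    return results


def flaky(x):
    raise ConnectionError("applet closed")


print(run_master([1, 2, 3, 4], {"good": lambda x: x * x, "flaky": flaky}))
# {0: 1, 1: 4, 2: 9, 3: 16}
```

Keeping tasks coarse-grained makes this recovery cheap: losing a worker costs at most one task's worth of recomputation.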

