The Anatomy of the Grid - Enabling Scalable Virtual Organizations
- International Journal of Supercomputer Applications
, 2001
Cited by 1734 (68 self)
"Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (what we refer to as virtual organizations). In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges. It is this class of problem that is addressed by Grid technologies. Next, we present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe requirements that we believe any such mechanisms must satisfy and we discuss the importance of defining a compact set of intergrid protocols to enable interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. We maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.
MagPIe: MPI’s Collective Communication Operations for Clustered Wide Area Systems
- Proc PPoPP'99
, 1999
Cited by 138 (26 self)
Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differences in link speeds. An important class of algorithms are collective operations, such as broadcast and reduce. We have developed MAGPIE, a library of collective communication operations optimized for wide area systems. MAGPIE's algorithms send the minimal amount of data over the slow wide area links, and only incur a single wide area latency. Using our system, existing MPI applications can be run unmodified on geographically distributed systems. On moderate cluster sizes, using a wide area latency of 10 milliseconds and a bandwidth of 1 MByte/s, MAGPIE executes operations up to 10 times faster than MPICH, a widely used MPI implementation; application kernels improve by up to a factor of 4. Due to the structure of our algorithms, MAGPIE's advantage increases for higher wide area latencies.
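The idea of keeping wide-area traffic and latency to a minimum can be illustrated with a small sketch. This is not MagPIe's implementation; it is a minimal Python model, under the assumption that ranks are grouped into clusters and that a "coordinator" rank in each remote cluster (a hypothetical name) receives the data once and then rebroadcasts it locally. The function names and the 4x4 cluster layout are illustrative assumptions. The sketch compares a topology-unaware binomial broadcast tree with a cluster-aware one by counting wide-area messages and wide-area hops on the longest path.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (sender rank, receiver rank)


def binomial_tree_edges(n: int) -> List[Edge]:
    """Edges of a binomial broadcast tree over ranks 0..n-1, rooted at rank 0.

    In each round, every rank that already holds the data sends it to the
    rank `step` positions above it; `step` doubles every round.
    """
    edges: List[Edge] = []
    step = 1
    while step < n:
        for sender in range(step):
            receiver = sender + step
            if receiver < n:
                edges.append((sender, receiver))
        step *= 2
    return edges


def cluster_aware_edges(cluster_of: List[int]) -> List[Edge]:
    """Broadcast from rank 0: one wide-area message per remote cluster,
    followed by a local binomial tree inside every cluster."""
    root = 0
    members: Dict[int, List[int]] = {}
    for rank, cluster in enumerate(cluster_of):
        members.setdefault(cluster, []).append(rank)

    edges: List[Edge] = []
    for cluster, ranks in members.items():
        if cluster == cluster_of[root]:
            coordinator = root
        else:
            coordinator = ranks[0]
            edges.append((root, coordinator))  # the only wide-area message
        local = [coordinator] + [r for r in ranks if r != coordinator]
        for s, r in binomial_tree_edges(len(local)):
            edges.append((local[s], local[r]))
    return edges


def wide_area_messages(edges: List[Edge], cluster_of: List[int]) -> int:
    """Number of messages that cross a cluster boundary."""
    return sum(1 for s, r in edges if cluster_of[s] != cluster_of[r])


def max_wide_area_hops(edges: List[Edge], cluster_of: List[int]) -> int:
    """Largest number of wide-area links on any path from rank 0.

    Relies on edges being listed so that each sender appears as a receiver
    (or is rank 0) before it sends, which both builders above guarantee.
    """
    hops = {0: 0}
    for s, r in edges:
        hops[r] = hops[s] + (1 if cluster_of[s] != cluster_of[r] else 0)
    return max(hops.values())


if __name__ == "__main__":
    # Hypothetical layout: 16 ranks in 4 clusters of 4 consecutive ranks.
    cluster_of = [rank // 4 for rank in range(16)]
    flat = binomial_tree_edges(len(cluster_of))
    aware = cluster_aware_edges(cluster_of)
    print(wide_area_messages(flat, cluster_of), max_wide_area_hops(flat, cluster_of))
    print(wide_area_messages(aware, cluster_of), max_wide_area_hops(aware, cluster_of))
```

For this assumed layout, the flat tree sends 12 messages across cluster boundaries and puts 2 wide-area hops on its longest path, while the cluster-aware tree sends 3 messages and incurs 1 wide-area hop.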
MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes
- In Supercomputing
, 2002
Cited by 94 (10 self)
Global Computing platforms, large scale clusters and future TeraGRID systems gather thousands of nodes for computing parallel scientific applications. At this scale, node failures or disconnections are frequent events. This Volatility reduces the MTBF of the whole system in the range of hours or minutes.
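A rough calculation shows why system MTBF shrinks at this scale. The sketch below is not taken from the paper; it assumes independent nodes with exponentially distributed lifetimes, so that failure rates add and the MTBF of the whole system is the per-node MTBF divided by the node count. The one-year node MTBF and the node counts are hypothetical figures chosen for illustration.

```python
def system_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """MTBF of a system that is interrupted whenever any single node fails.

    Assumes independent nodes with exponentially distributed lifetimes,
    in which case the per-node failure rates simply add up.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return node_mtbf_hours / nodes


if __name__ == "__main__":
    node_mtbf = 365 * 24.0  # hypothetical: each node fails about once a year
    for nodes in (1_000, 10_000):
        hours = system_mtbf_hours(node_mtbf, nodes)
        print(f"{nodes} nodes: {hours:.3f} hours ({hours * 60:.1f} minutes)")
```

Under these assumptions, 1,000 nodes give a system MTBF of about 8.76 hours and 10,000 nodes give about 53 minutes, which matches the "hours or minutes" range the abstract describes.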
Resource Co-Allocation in Computational Grids
- IN PROCEEDINGS OF THE EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING (HPDC-8)
, 1999
Cited by 89 (1 self)
Applications designed to execute on "computational grids" frequently require the simultaneous co-allocation of multiple resources in order to meet performance requirements. For example, several computers and network elements may be required in order to achieve real-time reconstruction of experimental data, while a large numerical simulation may require simultaneous access to multiple supercomputers. Motivated by these concerns, we have developed a general resource management architecture for Grid environments, in which resource co-allocation is an integral component. In this paper, we examine the co-allocation problem in detail and present mechanisms that allow an application to guide resource selection during the co-allocation process; these mechanisms address issues relating to the allocation, monitoring, control, and configuration of distributed computations. We describe the implementation of co-allocators based on these mechanisms and present the results of microbenchmark studies and large-scale application experiments that provide insights into the costs and practical utility of our techniques.
Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus
, 2001
Cited by 81 (15 self)
Improvements in the performance of processors and networks make it both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids can make application development difficult. Here we describe an architecture and prototype implementation for a Grid-enabled computational framework based on Cactus, the MPICH-G2 Grid-enabled message-passing library, and a variety of specialized features to support efficient execution in Grid environments. We have used this framework to perform record-setting computations in numerical relativity, running across four supercomputers and achieving scaling of 88% (1140 CPUs) and 63% (1500 CPUs). The problem size we were able to compute was about five times larger than any other previous run. Further, we introduce and demonstrate adaptive methods that automatically adjust computational parameters during run time, to increase dramatically the efficiency of a distributed Grid simulation, without modification of the application and without any knowledge of the underlying network connecting the distributed computers.
Exploiting Hierarchy in Parallel Computer Networks to Optimize Collective Operation Performance
, 2000
Cited by 67 (10 self)
The efficient implementation of collective communication operations has received much attention. Initial efforts modeled network communication and produced "optimal" trees based on those models. However, the models used by these initial efforts assumed equal point-to-point latencies between any two processes. This assumption is violated in heterogeneous systems such as clusters of SMPs and wide-area "computational grids", and as a result, collective operations that utilize the trees generated by these models perform suboptimally. In response, more recent work has focused on creating topology-aware trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts have significant communication benefits, they all limit their view of the network to only two layers. We present a strategy based upon a multilayer view of the network. By creating multilevel topology trees we take advantage of communication cost differences at every lev...
The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in a Grid Environment
- International Journal of High Performance Computing Applications
, 2001
Cited by 50 (9 self)
The ability to harness heterogeneous, dynamically available Grid resources is attractive to typically resource-starved computational scientists and engineers, as in principle it can increase, by significant factors, the number of cycles that can be delivered to applications. However, new adaptive application structures and dynamic runtime system mechanisms are required if we are to operate effectively in Grid environments. In order to explore some of these issues in a practical setting, we are developing an experimental framework, called Cactus, that incorporates both adaptive application structures for dealing with changing resource characteristics and adaptive resource selection mechanisms that allow applications to change their resource allocations (e.g., via migration) when performance falls outside specified limits. We describe here the adaptive resource selection mechanisms and describe how they are used to achieve automatic application migration to better resources following performance degradation. Our results provide insights into the architectural structures required to support adaptive resource selection. In addition, we suggest that this Cactus Worm is an interesting challenge problem for Grid computing.
Ibis: A Flexible and Efficient Java-based Grid Programming Environment
- Concurrency & Computation: Practice & Experience
, 2005
Cited by 45 (15 self)
In computational grids, performance-hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing grid programming environments stems exactly from the dynamic availability of compute cycles: grid programming environments (a) need to be portable to run on as many sites as possible, (b) they need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, while (c) they need to provide efficient (local) communication that enables high-performance computing in the first place. Existing programming environments are either portable (Java), or they are flexible (Jini, Java RMI), or they are highly efficient (MPI). No system combines all three properties that are necessary for grid computing. In this paper, we present Ibis, a new programming environment that combines Java’s “run everywhere” portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object-based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero-copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to 9 times higher throughputs with trees of objects.
A Decoupled Scheduling Approach for Grid Application Development Environments
- Journal of Parallel and Distributed Computing
, 2003
Cited by 39 (2 self)
In this paper we propose an adaptive scheduling approach designed to improve the performance of parallel applications in Computational Grid environments. A primary contribution of our work is that our design is modular and provides a separation of the scheduler itself from the application-specific components needed for the scheduling process. As part of the scheduler, we have also developed a search procedure which effectively and efficiently identifies desirable schedules. As test cases for our approach, we selected two applications from the class of iterative, mesh-based applications. For each of the test applications, we developed data mappers and performance models. We used a prototype of our approach in conjunction with these application-specific components to perform validation experiments in production Grid environments. Our results show that our scheduler provides significantly better application performance than conventional scheduling strategies. We also show that our scheduler gracefully handles degraded levels of availability of application and Grid resource information. Finally, we demonstrate that the overheads introduced by our methodology
Service-Based Distributed Querying on the Grid
- IN PROC. OF THE 1ST INT. CONF. ON SERVICE ORIENTED COMPUTING
, 2003
Cited by 32 (21 self)
Service-based approaches (such as Web Services and the Open Grid Services Architecture) have gained considerable attention recently for supporting distributed application development in e-business and e-science. The emergence of a service-oriented view of hardware and software resources raises the question as to how database management systems and technologies can best be deployed or adapted for use in such an environment. This paper explores one aspect of service-based computing and data management, viz., how to integrate query processing technology with a service-based Grid. The paper describes in detail the design and implementation of a service-based distributed query processor for the Grid. The query processor is service-based in two orthogonal senses: firstly, it supports querying over data storage and analysis resources that are made available as services, and, secondly, its internal architecture factors out as services the functionalities related to the construction of distributed query plans on the one hand, and to their execution over the Grid on the other. The resulting system both provides a declarative approach to service orchestration in the Grid, and demonstrates how query processing can benefit from dynamic access to computational resources on the Grid.

