The physiology of the grid: An open grid services architecture for distributed systems integration
2002
Cited by 973 (28 self)
In both e-business and e-science, we often need to integrate services across distributed, heterogeneous, dynamic “virtual organizations” formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships. This integration can be technically challenging because of the need to achieve various qualities of service when running on top of different native platforms. We present an Open Grid Services Architecture that addresses these challenges. Building on concepts and technologies from the Grid and Web services communities, this architecture defines a uniform exposed service semantics (the Grid service); defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities. The Open Grid Services Architecture also defines, in terms of Web Services Description Language (WSDL) interfaces and associated conventions, mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification. Service bindings can support reliable invocation, authentication, authorization, and delegation, if required. Our presentation complements an earlier foundational article, “The Anatomy of the Grid,” by describing how Grid mechanisms can implement a service-oriented architecture, explaining how Grid functionality can be incorporated into a Web services framework, and illustrating how our architecture can be applied within commercial computing as a basis for distributed system integration, within and across organizational domains. This is a DRAFT document and continues to be revised. The latest version can be found at
Condor-G: A Computation Management Agent for Multi-Institutional Grids
Cluster Computing, 2001
Cited by 416 (39 self)
In recent years, there has been a dramatic increase in the amount of available computing and storage resources. Yet few have been able to exploit these resources in an aggregated form. We present the Condor-G system, which leverages software from Globus and Condor to allow users to harness multi-domain resources as if they all belong to one personal domain. We describe the structure of Condor-G and how it handles job management, resource selection, security, and fault tolerance.
A Distributed Resource Management Architecture that Supports Advance Reservations and Co-Allocation
Cited by 177 (23 self)
The realization of end-to-end quality of service (QoS) guarantees in emerging network-based applications requires mechanisms that support first dynamic discovery and then advance or immediate reservation of resources that will often be heterogeneous in type and implementation and independently controlled and administered. We propose the Globus Architecture for Reservation and Allocation (GARA) to address these four issues. GARA treats both reservations and computational elements such as processes, network flows, and memory blocks as first-class entities, allowing them to be created, monitored, and managed independently and uniformly. It simplifies management of heterogeneous resource types by defining uniform mechanisms for computers, networks, disk, memory, and other resources. Layering on these standard mechanisms, GARA enables the construction of application-level co-reservation and co-allocation libraries that applications can use to dynamically assemble collections of resources, guided by both application QoS requirements and the local administration policy of individual resources. We describe a prototype GARA implementation that supports three different resource types (parallel computers, individual CPUs under control of the Dynamic Soft Real-Time scheduler, and Integrated Services networks) and provide performance results that quantify the costs of our techniques.
Giggle: A Framework for Constructing Scalable Replica Location Services
2002
Cited by 122 (36 self)
In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location service (RLS) as a system that maintains and provides access to information about the physical locations of copies. An RLS typically functions as one component of a data grid architecture. This paper makes the following contributions. First, we characterize RLS requirements. Next, we describe a parameterized architectural framework, which we name Giggle (for GIGa-scale Global Location Engine), within which a wide range of RLSs can be defined. We define several concrete instantiations of this framework with different performance characteristics. Finally, we present initial performance results for an RLS prototype, demonstrating that RLS systems can be constructed that meet performance goals.
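The definition of a replica location service given in this abstract (a system that maintains and provides access to information about the physical locations of copies) can be illustrated with a minimal sketch. This is not the Giggle framework or its prototype: the class name `ReplicaCatalog`, its methods, and the example names are all hypothetical, and the catalog is a single in-memory table with none of the distribution or scaling the paper addresses.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory replica location table (illustrative assumption only)."""

    def __init__(self):
        # logical file name -> set of physical replica locations
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name exists at physical_url."""
        self._replicas[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        """Forget one copy; drop the logical name once no copies remain."""
        self._replicas[logical_name].discard(physical_url)
        if not self._replicas[logical_name]:
            del self._replicas[logical_name]

    def lookup(self, logical_name):
        """Return all known physical locations, sorted for stable output."""
        return sorted(self._replicas.get(logical_name, ()))
```

A client would call `register` whenever a replica is created and `lookup` before choosing which copy to read; the logical and physical names are whatever naming scheme the surrounding data grid uses.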
On death, taxes, and the convergence of peer-to-peer and grid computing
In 2nd International Workshop on Peer-to-Peer Systems (IPTPS’03), 2003
Cited by 110 (2 self)
It has been reported [26] that life holds but two certainties, death and taxes. And indeed, despite much effort devoted to circumventing both phenomena, it does appear that any society—and in the context of this paper, any large-scale distributed system—must address both death (failure) and the establishment and maintenance of infrastructure (which we assert is a major motivation for taxes, so as to
Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus
2001
Cited by 81 (15 self)
Improvements in the performance of processors and networks make it both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids can make application development difficult. Here we describe an architecture and prototype implementation for a Grid-enabled computational framework based on Cactus, the MPICH-G2 Grid-enabled message-passing library, and a variety of specialized features to support efficient execution in Grid environments. We have used this framework to perform record-setting computations in numerical relativity, running across four supercomputers and achieving scaling of 88% (1140 CPUs) and 63% (1500 CPUs). The problem size we were able to compute was about five times larger than that of any previous run. Further, we introduce and demonstrate adaptive methods that automatically adjust computational parameters during run time, to increase dramatically the efficiency of a distributed Grid simulation, without modification of the application and without any knowledge of the underlying network connecting the distributed computers.
Stork: Making Data Placement a First Class Citizen in the Grid
2004
Cited by 78 (18 self)
Today's scientific applications have huge data requirements which continue to increase drastically every year. These data are generally accessed by many users from all across the globe. This implies a major necessity to move huge amounts of data around wide area networks to complete the computation cycle, which brings with it the problem of efficient and reliable data placement. The current approach to solving this problem of data placement is either to do it manually or to employ simple scripts which do not have any automation or fault tolerance capabilities. Our goal is to make data placement activities first-class citizens in the Grid, just like computational jobs. They will be queued, scheduled, monitored, managed, and even checkpointed. More importantly, the scheduler will make sure that they complete successfully and without any human interaction. We also believe that data placement jobs should be treated differently from computational jobs, since they may have different semantics and different characteristics. For this purpose, we have developed Stork, a scheduler for data placement activities in the Grid.
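The idea of treating data placement as queued, monitored jobs that are retried until they complete can be sketched as follows. This is an illustrative assumption, not Stork's interface or scheduling policy: `PlacementJob`, `run_queue`, the `transfer` callback, and the retry limit are all hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PlacementJob:
    """One data placement request (hypothetical structure)."""
    src: str
    dest: str
    attempts: int = 0


def run_queue(jobs, transfer, max_attempts=3):
    """Run jobs in FIFO order; requeue a failed job until max_attempts.

    transfer(src, dest) is a caller-supplied function that raises OSError
    on failure. Returns (completed_jobs, permanently_failed_jobs).
    """
    pending = deque(jobs)
    done, failed = [], []
    while pending:
        job = pending.popleft()
        job.attempts += 1
        try:
            transfer(job.src, job.dest)
            done.append(job)
        except OSError:
            if job.attempts < max_attempts:
                pending.append(job)  # retry later, behind other jobs
            else:
                failed.append(job)   # give up and report the job
    return done, failed
```

Because a failed transfer goes to the back of the queue, a transient fault on one job does not block the jobs behind it, and no human has to restart anything unless the retry limit is exhausted.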
A National-Scale Authentication Infrastructure
2000
Cited by 78 (8 self)
ver access policies and local security. It provides its own versions of common applications, such as FTP and remote login, and a programming interface for creating secure applications. Dozens of supercomputers and storage systems already use GSI, a level of acceptance reached by few other security infrastructures.
MULTISITE AUTHENTICATION
Virtual organizations must have a reliable means for identifying requestors, but participant independence complicates authentication across multiple sites. PACIs provide a good example of both the issues and technical requirements for a multisite authentication infrastructure.
The PACI community
The NSF PACIs, two consortia of some 50 universities and government laboratories, are dedicated to developing next-generation scientific problem-solving tools. Virtual organizations in their own right, the PACIs independently provide resources to an even larger and less-formal national user community of many
High-performance remote access to climate simulation data: A challenge problem for data grid technologies
2001
Cited by 57 (9 self)
In numerous scientific disciplines, terabyte and soon petabyte-scale data collections are emerging as critical community resources. A new class of Data Grid infrastructure is required to support management, transport, distributed access to, and analysis of these datasets by potentially thousands of users. Researchers who face this challenge include the Climate Modeling community, which performs long-duration computations accompanied by frequent output of very large files that must be further analyzed. We describe the Earth System Grid prototype, which brings together advanced analysis, replica management, data transfer, request management, and other technologies to support high-performance, interactive analysis of replicated data. We present performance results that demonstrate our ability to manage the location and movement of large datasets from the user’s desktop. We report on experiments conducted over SciNET at SC’2000, where we achieved peak performance of 1.55Gb/s and sustained performance of 512.9Mb/s for data transfers between Texas and California.

