Results 1 - 10
of
81
The physiology of the grid: An open grid services architecture for distributed systems integration
, 2002
"... In both e-business and e-science, we often need to integrate services across distributed, heterogeneous, dynamic “virtual organizations ” formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships. This integration can be t ..."
Abstract
-
Cited by 973 (28 self)
- Add to MetaCart
In both e-business and e-science, we often need to integrate services across distributed, heterogeneous, dynamic “virtual organizations ” formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships. This integration can be technically challenging because of the need to achieve various qualities of service when running on top of different native platforms. We present an Open Grid Services Architecture that addresses these challenges. Building on concepts and technologies from the Grid and Web services communities, this architecture defines a uniform exposed service semantics (the Grid service); defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities. The Open Grid Services Architecture also defines, in terms of Web Services Description Language (WSDL) interfaces and associated conventions, mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification. Service bindings can support reliable invocation, authentication, authorization, and delegation, if required. Our presentation complements an earlier foundational article, “The Anatomy of the Grid, ” by describing how Grid mechanisms can implement a service-oriented architecture, explaining how Grid functionality can be incorporated into a Web services framework, and illustrating how our architecture can be applied within commercial computing as a basis for distributed system integration—within and across organizational domains. This is a DRAFT document and continues to be revised. The latest version can be found at
Matchmaking: Distributed Resource Management for High Throughput Computing
- In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing
, 1998
"... Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not adapt well to distributed systems, particularly those built to support high-throughput computing. Obstacles include heteroge ..."
Abstract
-
Cited by 301 (19 self)
- Add to MetaCart
Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not adapt well to distributed systems, particularly those built to support high-throughput computing. Obstacles include heterogeneity of resources, which make uniform allocation algorithms difficult to formulate, and distributed ownership, leading to widely varying allocation policies. Faced with these problems, we developed and implemented the classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environment with decentralized ownership of resources. Novel aspects of the framework include a semi-structured data model that combines schema, data, and query in a simple but powerful specification language, and a clean separation of the matching and claiming phases of resource allocation. The representation and protocols result in a robust, scalabl...
Exploiting Process Lifetime Distributions for Dynamic Load Balancing
- ACM Transactions on Computer Systems
, 1996
"... We measure the distribution of lifetimes for UNIX processes and propose a functional form that fits this distribution well. We use this functional form to derive a policy for preemptive migration, and then use a trace-driven simulator to compare our proposed policy with other preemptive migration po ..."
Abstract
-
Cited by 290 (30 self)
- Add to MetaCart
We measure the distribution of lifetimes for UNIX processes and propose a functional form that fits this distribution well. We use this functional form to derive a policy for preemptive migration, and then use a trace-driven simulator to compare our proposed policy with other preemptive migration policies, and with a non-preemptive load balancing strategy. We find that, contrary to previous reports, the performance benefits of preemptive migration are significantly greater than those of non-preemptive migration, even when the memorytransfer cost is high. Using a model of migration costs representative of current systems, we find that preemptive migration reduces the mean delay (queueing and migration) by 35 -- 50%, compared to non-preemptive migration. 1 Introduction Most systems that perform load balancing use remote execution (i.e. non-preemptive migration) based on a priori knowledge of process behavior, often in the form of a list of process names eligible for migration. Althoug...
Distributed Computing in Practice: The Condor Experience
- Concurrency and Computation: Practice and Experience
, 2005
"... Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history ..."
Abstract
-
Cited by 263 (6 self)
- Add to MetaCart
Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course traveled by research ideas as they grow into production systems.
Condor and the Grid
"... Since 1984, the Condor project has helped ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history ..."
Abstract
-
Cited by 143 (26 self)
- Add to MetaCart
Since 1984, the Condor project has helped ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must reflect the sociology of communities. Throughout, we reflect on the lessons of experience and chart the course travelled by research ideas as they grow into production systems.
Scheduling with Advanced Reservations
- IEEE Int. Par. and
, 2000
"... Some computational grid applications have very large resource requirements and need simultaneous access to resources from more than one parallel computer. Current scheduling systems do not provide mechanisms to gain such simultaneous access without the help of human administrators of the computer sy ..."
Abstract
-
Cited by 80 (4 self)
- Add to MetaCart
Some computational grid applications have very large resource requirements and need simultaneous access to resources from more than one parallel computer. Current scheduling systems do not provide mechanisms to gain such simultaneous access without the help of human administrators of the computer systems. In this work, we propose and evaluate several algorithms for supporting advanced reservation of resources in supercomputing scheduling systems. These advanced reservations allow users to request resources from scheduling systems at speci c times. We nd that the wait times of applications submitted to the queue increases when reservations are supported and the increase depends on how reservations are supported. Further, we nd that the best performance is achieved when we assume that applications can be terminated and restarted, back lling is performed, and relatively accurate run-time predictions are used. 1.
Adaptive Parallelism and Piranha
, 1995
"... . Under "adaptive parallelism," the set of processors executing a parallel program may grow or shrink as the program runs. Potential gains include the capacity to run a parallel program on the idle workstations in a conventional LAN---processors join the computation when they become idle, and withdr ..."
Abstract
-
Cited by 75 (0 self)
- Add to MetaCart
. Under "adaptive parallelism," the set of processors executing a parallel program may grow or shrink as the program runs. Potential gains include the capacity to run a parallel program on the idle workstations in a conventional LAN---processors join the computation when they become idle, and withdraw when their owners need them---and to manage the nodes of a dedicated multiprocessor efficiency. Experience to date with our Piranha system for adaptive parallelism suggests that these possibilities can be achieved in practice on real applications at comparatively modest costs. Keywords: Parallelism, networks, multiprocessors, adaptive parallelism, programming techniques, Linda, Piranha. 1 Introduction Most work on parallelism is "static": it assumes that programs are distributed over processor sets that remain fixed throughout the computation. If a program starts out on 64 processors, it runs on exactly 64 until completion, and specifically on the same 64. "Adaptive parallelism" (AP) abo...
The Utility of Exploiting Idle Workstations for Parallel Computation
, 1997
"... In this paper, we examine the utility of exploiting idle workstations for parallel computation. We attempt to answer the following questions. First, given a workstation pool, for what fraction of time can we expect to find a cluster of k workstations available? This provides an estimate of the oppor ..."
Abstract
-
Cited by 73 (5 self)
- Add to MetaCart
In this paper, we examine the utility of exploiting idle workstations for parallel computation. We attempt to answer the following questions. First, given a workstation pool, for what fraction of time can we expect to find a cluster of k workstations available? This provides an estimate of the opportunity for parallel computation. Second, how stable is a cluster of free machines and how does the stability vary with the size of the cluster? This indicates how frequently a parallel computation might have to stop for adapting to changes in processor availability. Third, what is the distribution of workstation idle-times? This information is useful for selecting workstations to place computation on. Fourth, how much benefit can a user expect? To state this in concrete terms, if I have a pool of size S, how big a parallel machine should I expect to get for free by harvesting idle machines. Finally, how much benefit can be achieved on a real machine and how hard does a parallel programmer ha...
Network-aware Mobile Programs
- In Proceedings of the 1997 USENIX Technical Conference
, 1997
"... In this paper, we investigate network-aware mobile programs, programs that can use mobility as a tool to adapt to variations in network characteristics. We present infrastructural support for mobility and network monitoring and show how adaptalk, a Java-based mobile Internet chat application can tak ..."
Abstract
-
Cited by 70 (6 self)
- Add to MetaCart
In this paper, we investigate network-aware mobile programs, programs that can use mobility as a tool to adapt to variations in network characteristics. We present infrastructural support for mobility and network monitoring and show how adaptalk, a Java-based mobile Internet chat application can take advantage of this support to dynamically place the chat server so as to minimize response time. Our conclusion was that on-line network monitoring and adaptive placement of shared data-structures can significantly improve performance of distributed applications on the Internet. 1
Nimrod: A Tool for Performing Parametised Simulations using Distributed Workstations
- 4th IEEE Symposium on High Performance Distributed Computing
, 1995
"... This paper discusses Nimrod, a tool for performing parametised simulations over networks of loosely coupled workstations. Using Nimrod the user interactively generates a parametised experiment. Nimrod then controls the distribution of jobs to machines and the collection of results. A simple graphica ..."
Abstract
-
Cited by 65 (26 self)
- Add to MetaCart
This paper discusses Nimrod, a tool for performing parametised simulations over networks of loosely coupled workstations. Using Nimrod the user interactively generates a parametised experiment. Nimrod then controls the distribution of jobs to machines and the collection of results. A simple graphical user interface which is built for each application allows the user to view the simulation in terms of their problem domain. The current version of Nimrod is implemented above OSF DCE and runs on DEC Alpha and IBM RS6000 workstations (including a 22 node SP2). Two different case studies are discussed as an illustration of the utility of the system. 1 INTRODUCTION A wide range of scientific and engineering experiments can be solved using numeric simulation. Examples include finite element analysis, computational fluid dynamics, electromagnetic and electronic simulation, pollution transport, granular flow and digital logic simulation. Accordingly, some very large codes have been written over ...

