Condor-G: A Computation Management Agent for Multi-Institutional Grids
Cluster Computing, 2001
Cited by 416 (39 self)
In recent years, there has been a dramatic increase in the amount of available computing and storage resources. Yet few have been able to exploit these resources in an aggregated form. We present the Condor-G system, which leverages software from Globus and Condor to allow users to harness multi-domain resources as if they all belong to one personal domain. We describe the structure of Condor-G and how it handles job management, resource selection, security, and fault tolerance.
Matchmaking: Distributed Resource Management for High Throughput Computing
In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, 1998
Cited by 301 (19 self)
Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not adapt well to distributed systems, particularly those built to support high-throughput computing. Obstacles include heterogeneity of resources, which makes uniform allocation algorithms difficult to formulate, and distributed ownership, leading to widely varying allocation policies. Faced with these problems, we developed and implemented the classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environments with decentralized ownership of resources. Novel aspects of the framework include a semi-structured data model that combines schema, data, and query in a simple but powerful specification language, and a clean separation of the matching and claiming phases of resource allocation. The representation and protocols result in a robust, scalabl...
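As a rough illustration of the matchmaking idea in this abstract, the Python sketch below models an advertisement as a plain dictionary that carries both descriptive attributes and a requirements predicate over the other party's advertisement. A match requires both sides' requirements to hold, and claiming is a separate later step. The names `make_ad`, `matches`, `matchmake`, `claim`, and every attribute name are hypothetical; this is not the classad language or any Condor API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Ad = Dict[str, Any]
Predicate = Callable[[Ad], Any]


def make_ad(attrs: Ad, requirements: Predicate,
            rank: Optional[Predicate] = None) -> Ad:
    """Build an advertisement: attributes plus constraints on the other party."""
    ad = dict(attrs)
    ad["requirements"] = requirements
    # Rank expresses preference among acceptable matches; default is indifferent.
    ad["rank"] = rank if rank is not None else (lambda other: 0)
    return ad


def matches(a: Ad, b: Ad) -> bool:
    """Two ads match only if each satisfies the other's requirements."""
    return bool(a["requirements"](b)) and bool(b["requirements"](a))


def matchmake(requests: List[Ad], offers: List[Ad]) -> List[Tuple[Ad, Ad]]:
    """Matching phase: pair each request with its highest-ranked free offer."""
    free = list(offers)
    pairs = []
    for req in requests:
        candidates = [o for o in free if matches(req, o)]
        if not candidates:
            continue
        best = max(candidates, key=req["rank"])
        free = [o for o in free if o is not best]
        pairs.append((req, best))
    return pairs


def claim(request: Ad, offer: Ad) -> bool:
    """Claiming phase (assumed protocol): the matched parties re-check the
    match themselves, since state may have changed since matchmaking."""
    if offer.get("claimed", False) or not matches(request, offer):
        return False
    offer["claimed"] = True
    return True
```

In this sketch the matchmaker only proposes pairs; the resource keeps the final say in `claim`, which is one simple way to keep the two phases separate.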
Distributed Computing in Practice: The Condor Experience
Concurrency and Computation: Practice and Experience, 2005
Cited by 263 (6 self)
Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course traveled by research ideas as they grow into production systems.
Job Scheduling in Multiprogrammed Parallel Systems
1997
Cited by 145 (15 self)
Scheduling in the context of parallel systems is often thought of in terms of assigning tasks in a program to processors, so as to minimize the makespan. This formulation assumes that the processors are dedicated to the program in question. But when the parallel system is shared by a number of users, this is not necessarily the case. In the context of multiprogrammed parallel machines, scheduling refers to the execution of threads from competing programs. This is an operating system issue, involved with resource allocation, not a program development issue. Scheduling schemes for multiprogrammed parallel systems can be classified as single-level or two-level. Single-level scheduling combines the allocation of processing power with the decision of which thread will use it. Two-level scheduling decouples the two issues: first, processors are allocated to the job, and then the job's threads are scheduled using this pool of processors. The processors of a parallel system can be shared i...
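The two-level scheme described in this abstract can be sketched in a few lines of Python. The first function allocates processors to jobs and the second schedules one job's threads onto the processors it was given. The equal-share partitioning and round-robin placement are assumptions chosen for brevity, not policies taken from the paper, and the function names are hypothetical.

```python
from typing import Dict, List


def allocate_processors(jobs: Dict[str, int], num_procs: int) -> Dict[str, List[int]]:
    """Level 1: split processors among jobs (job name -> thread count).

    Assumed policy: near-equal shares, capped at the job's thread count.
    Processors left over after capping simply stay idle in this sketch.
    """
    names = sorted(jobs)
    if not names:
        return {}
    base, extra = divmod(num_procs, len(names))
    allocation: Dict[str, List[int]] = {}
    next_proc = 0
    for i, name in enumerate(names):
        share = min(base + (1 if i < extra else 0), jobs[name])
        allocation[name] = list(range(next_proc, next_proc + share))
        next_proc += share
    return allocation


def schedule_threads(num_threads: int, procs: List[int]) -> Dict[int, int]:
    """Level 2: place one job's threads on its own processors, round robin."""
    if not procs:
        raise ValueError("job has no processors allocated")
    return {t: procs[t % len(procs)] for t in range(num_threads)}
```

A single-level scheduler would instead make one global decision about which thread runs on which processor, with no per-job pool in between.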
Condor and the Grid
Cited by 143 (26 self)
Since 1984, the Condor project has helped ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational grid. In this chapter, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must reflect the sociology of communities. Throughout, we reflect on the lessons of experience and chart the course travelled by research ideas as they grow into production systems.
An Enabling Framework for Master-Worker Applications on the Computational Grid
Cluster Computing, 2000
Cited by 75 (9 self)
We describe MW -- a software framework that allows users to quickly and easily parallelize scientific computations using the master-worker paradigm on the computational grid. MW provides both a "top level" interface to application software and a "bottom level" interface to existing grid computing toolkits. Both interfaces are briefly described. We conclude with a case study, where the necessary Grid services are provided by the Condor high-throughput computing system, and the MW-enabled application code is used to solve a combinatorial optimization problem of unprecedented complexity.
This work was supported in part by Grants No. CDA-9726385 and CDA-9623632 from the National Science Foundation.
† Department of Electrical and Computer Engineering, Northwestern University, and Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, goux@mcs.anl.gov
‡ Computer Sciences Department, University of Wisconsin - Madison, 1210 West Dayton Street, Madison, WI 53706, {sanjeevk,yoderme}@cs.wisc.edu
§ Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, linderot@mcs.anl.gov
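For readers unfamiliar with the master-worker paradigm named in this abstract, here is a minimal sketch on a single machine. It uses Python's standard `concurrent.futures` thread pool in place of grid workers. The function `run_master` and its parameters are hypothetical and do not reflect MW's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_master(tasks: Sequence[T], work_fn: Callable[[T], R],
               num_workers: int = 4) -> List[R]:
    """The master hands each task to an idle worker and gathers the results."""
    results: Dict[int, R] = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Remember which task each future belongs to, since workers
        # may finish in any order.
        pending = {pool.submit(work_fn, task): i for i, task in enumerate(tasks)}
        for future in as_completed(pending):
            results[pending[future]] = future.result()
    # Return results in the original task order.
    return [results[i] for i in range(len(tasks))]
```

On a real grid the workers may disappear mid-task, so a framework of this kind would also need to detect lost workers and reissue their tasks. That part is omitted here.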
Mechanisms for High Throughput Computing
SPEEDUP, 1997
Cited by 63 (4 self)
In this paper we present three mechanisms employed by Condor to provide HTC services to its customers - the Classified Advertisement (ClassAd) mechanism, the Remote System Calls mechanism, and the Checkpointing mechanism. We view these three mechanisms as holding the key to the success of Condor in harnessing the capacity of large collections of distributively owned resources. ClassAds enable Condor to pair Resource Requests and Resource Offers, while Remote System Calls enable Condor to allocate resources across administrative domains. The Checkpointing mechanism enables Condor to revoke resources that must be freed due to owners' constraints and to resume the application from where it left off on another resource. Condor was first installed as a production system in our Computer Sciences department over ten years ago. This Condor pool has since served as a major source of computing cycles to both faculty and students in our department. For many, it has revolutionized the role computing plays in their research. An increase in one, and sometimes even two, orders of magnitude in the computing throughput of a research project can have a profound impact on its size, complexity, and scope. In some cases, the almost unlimited amount of computing resources provided by the Condor pool enabled the exploration of more risky research directions. Over the years, the Condor Team has established collaborations with scientists from around the world and has provided them with access to surplus cycles (one of whom has consumed 50 CPU years). Today, our pool consists of more than 300 desktop UNIX workstations. On a typical day, the pool delivers more than 180 CPU days.
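The checkpoint-and-resume idea in this abstract can be illustrated with a small application-level sketch. Condor's own mechanism checkpoints a running process without the program's cooperation, which this example does not attempt. Here the program saves its state to a file after each step so a later run can continue where the previous one stopped. The function name, the `stop_after` parameter (a stand-in for the owner reclaiming the machine), and the JSON state layout are all assumptions.

```python
import json
import os
import tempfile
from typing import Optional


def run_with_checkpoints(total_steps: int, ckpt_path: str,
                         stop_after: Optional[int] = None) -> Optional[int]:
    """Sum 0..total_steps-1, saving progress after every step.

    Returns the sum when finished, or None if the run was "preempted"
    after stop_after steps in this invocation.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        # Resume from the last checkpoint instead of starting over.
        with open(ckpt_path) as f:
            state = json.load(f)
    done_now = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_now >= stop_after:
            return None  # resource revoked; the checkpoint file holds our progress
        state["acc"] += state["step"]
        state["step"] += 1
        done_now += 1
        # Write to a temporary file, then rename, so a crash mid-write
        # never leaves a corrupt checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state["acc"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "job.ckpt")
        print(run_with_checkpoints(10, path, stop_after=4))  # interrupted run
        print(run_with_checkpoints(10, path))                # resumed run
```

Because the checkpoint lives in an ordinary file, the second run could in principle take place on a different machine that can read the same file.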
Matchmaking Frameworks for Distributed Resource Management
2000
Cited by 34 (2 self)
Federated distributed systems present new challenges to resource management. Conventional resource managers are based on a relatively static resource model and a centralized allocator that assigns resources to customers. Distributed environments, particularly those built to support high-throughput computing (HTC), are often characterized by distributed management and distributed ownership. Distributed management introduces resource heterogeneity: Not only the set of available resources, but even the set of resource types is constantly changing. Distributed ownership introduces policy heterogeneity: Each resource may have its own idiosyncratic allocation policy. We propose a resource management framework based on a matchmaking paradigm to address these shortcomings. Matchmaking services enable discovery and exchange of goods and services in marketplaces. Agents that provide or require services advertise their presence by publishing constraints and preferences on the entities they would like to be matched with, as well as their own
SODA: a Service-On-Demand Architecture for Application Service Hosting Utility Platforms
2003
Cited by 34 (6 self)
as utility: computation jobs can be scheduled on-demand in Grid hosts based on available computation capacity. In this paper, we study another emerging usage of Grid utility: the hosting of application services. Different from a computation job, an application service such as e-Laboratory or on-line shopping has a longer lifetime and performs multiple jobs requested by its clients. A service Hosting Utility Platform (HUP) is formed by a set of servers in the Grid, and multiple application services will be hosted on the HUP. We present the design and implementation of SODA, a Service-On-Demand Architecture that enables on-demand creation of application services on a HUP. With SODA, an application service will be created in the form of a set of virtual service nodes; each node is a virtual machine which is physically a 'slice' of a real host in the HUP. SODA involves both OS- and middleware-level techniques, and has the following salient capabilities: (1) on-demand service priming: the image of an application service as well as the OS on which it runs will be created on-demand and bootstrapped automatically; (2) better service isolation: services sharing the same HUP host are isolated with respect to administration, faults, attacks, and resources; (3) integrated service request management: for each service, a service switch will be created to direct client requests to appropriate virtual service nodes. Moreover, the application service provider can replace the default request switching policy with a service-specific policy.
Optimizing Parallel Applications for Wide-Area Clusters
In IPPS-98 International Parallel Processing Symposium, 1998
Cited by 32 (13 self)
Recent developments in networking technology cause a growing interest in connecting local-area clusters of workstations over wide-area links, creating multilevel clusters, or metacomputers. Often, latency and bandwidth of local-area and wide-area networks differ by two orders of magnitude or more. One would expect only very coarse-grain applications to achieve good performance. To test this intuition, we analyze the behavior of several existing medium-grain applications on a wide-area multicluster. We find that high performance can be obtained if the programs are optimized to take the multilevel network structure into account. The optimizations reduce intercluster traffic and hide intercluster latency, and substantially improve performance on wide-area multiclusters. As a result, the range of metacomputing applications is larger than previously assumed.

