Market-based Proportional Resource Sharing for Clusters
, 1999
Cited by 52 (3 self)
Enabling technologies in high speed communication and global process scheduling have pushed clusters of computers into the mainstream as general-purpose high-performance computing systems. More generality, however, implies more sharing and this raises new questions in the area of cluster resource management. In particular, in systems where the aggregate demand for computing resources can exceed the aggregate supply, how to allocate resources amongst competing applications is an important problem. Traditional solutions to this problem have focused mainly on global optimization with respect to system-centric performance metrics, metrics which ignore higher level user intent. In this paper, we propose an alternative market-based approach based on the notion of a computational economy which optimizes for user value. Starting with fundamental requirements, we describe an abstract architecture for market-based cluster resource management based on the idea of proportional resource sharing of...
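As a rough illustration of proportional resource sharing in general, the sketch below divides a fixed capacity among users in proportion to what each one bids. The abstract is truncated before it states the exact mechanism, so this is a generic assumption rather than the paper's algorithm; the function name `proportional_shares` and its parameters are hypothetical.

```python
def proportional_shares(bids, capacity=1.0):
    """Split `capacity` among users in proportion to their bids.

    `bids` maps a user name to a non-negative bid (hypothetical units).
    Generic proportional sharing, not taken from the paper.
    """
    if any(bid < 0 for bid in bids.values()):
        raise ValueError("bids must be non-negative")
    total = sum(bids.values())
    if total == 0:
        # Nobody is bidding, so nobody receives a share.
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}


if __name__ == "__main__":
    # Two users competing for 8 CPUs: the user bidding 3 gets three times
    # the share of the user bidding 1.
    print(proportional_shares({"alice": 3, "bob": 1}, capacity=8.0))
```

Under this model a user's allocation shrinks automatically as others raise their bids, which is one way demand exceeding supply can be resolved without a global optimizer.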
Interfacing Condor and PVM to harness the cycles of workstation clusters
- Journal on Future Generations of Computer Systems
, 1995
Cited by 45 (7 self)
A continuing challenge to the scientific research and engineering communities is how to fully utilize computational hardware. In particular, the proliferation of clusters of high performance workstations has become an increasingly attractive source of compute power. Developments to take advantage of this environment have previously focused primarily on managing the resources, or on providing interfaces so that a number of machines can be used in parallel to solve large problems. Both approaches are desirable, and indeed should be complementary. Unfortunately, the resource management and parallel processing systems are usually developed by independent groups, and they usually do not interact well together. To bridge this gap, we have developed a framework for interfacing these two sorts of systems. Using this framework, we have interfaced PVM, a popular system for parallel programming, with Condor, a powerful resource management system. This combined system is operational, and we have ma...
Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
- ACM Transactions on Computer Systems
, 1998
Cited by 44 (2 self)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...
A Comparison of Queueing, Cluster and Distributed Computing Systems
- NASA Technical Memorandum 109025, NASA LaRC
, 1993
Extending Proportional-Share Scheduling to a Network of Workstations
- In Proceedings of Parallel and Distributed Processing Techniques and Applications (PDPTA’97), Las Vegas, NV
, 1997
Cited by 34 (4 self)
As networks of workstations (NOW) emerge as a viable platform for a wide range of workloads, a new scheduling approach is needed to allocate the collection of resources across competing users. In this paper, we show that extensions to a proportional-share scheduler for improving response time can still fairly allocate resources to a mix of sequential, interactive, and parallel jobs in this distributed environment. We find that a proportional-share scheduler, specifically a stride-scheduler, running on each node in the cluster is a good building-block. Simple extensions are implemented and analyzed which provide better response-times for interactive jobs by giving those jobs their share of resources over a longer time-interval. When scheduling jobs across the cluster, we show that fairness can be guaranteed if each local scheduler knows the number of tickets issued to each user and if the tickets are balanced across all workstations. Finally, we show that a proportional-share of resourc...
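The abstract names a stride scheduler as the per-node building block. The sketch below shows basic single-node stride scheduling as it is commonly described: each job's stride is inversely proportional to its tickets, and the job with the smallest pass value runs next. It does not include the paper's extensions for interactive jobs or cross-node ticket balancing; the names `STRIDE1` and `stride_schedule` and the ticket values are illustrative assumptions.

```python
import heapq

# Large constant divided by the ticket count to obtain an integer stride
# (value chosen for illustration).
STRIDE1 = 1 << 20


def stride_schedule(tickets, quanta):
    """Return the order in which jobs run for `quanta` time slices.

    `tickets` maps a job name to a positive integer ticket count.
    Basic single-node stride scheduling only.
    """
    heap = []
    for order, (name, count) in enumerate(tickets.items()):
        if count <= 0:
            raise ValueError("ticket counts must be positive")
        stride = STRIDE1 // count
        # Heap entries: (pass value, tie-break order, job name, stride).
        heapq.heappush(heap, (stride, order, name, stride))

    trace = []
    for _ in range(quanta):
        # Run the job with the smallest pass, then advance it by its stride.
        pass_value, order, name, stride = heapq.heappop(heap)
        trace.append(name)
        heapq.heappush(heap, (pass_value + stride, order, name, stride))
    return trace


if __name__ == "__main__":
    # With tickets in a 3:2:1 ratio, six quanta are split 3, 2 and 1.
    print(stride_schedule({"A": 3, "B": 2, "C": 1}, 6))
```

Because the schedule is deterministic, each job's share over any window stays close to its ticket ratio, which is the property the abstract relies on when it balances tickets across workstations.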
High performance virtual machines (HPVM): Clusters with supercomputing APIs and performance
- in: Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing
, 1997
Cited by 29 (4 self)
The HPVM project provides software which enables high-performance computing on clusters of PCs and workstations using standard supercomputing APIs such as MPI, SHMEM Put/Get, and Global Arrays. HPVMs (High-Performance Virtual Machines) are surprisingly competitive with MPP systems, such as the IBM SP2 and Cray T3D. The Illinois HPVM achieves impressive low-level communication performance across the cluster: one-way latencies of around 11 µsec and bandwidths > 50 MBytes/sec, even for small packets (< 256 bytes). Performance at higher levels, such as MPI, is expected to be approximately 17 µsec latency and also > 50 MBytes/sec bandwidth.
The evolution of the grid
- Grid Computing: Making the Global Infrastructure a Reality
, 2003
Cited by 28 (0 self)
In this paper we describe the evolution of grid systems, identifying three generations: first generation systems which were the forerunners of the Grid as we recognise it today; second generation systems with a focus on middleware to support large scale data and computation; and third generation systems where the emphasis shifts to distributed global collaboration, a service oriented approach and information layer issues. In particular, we discuss the relationship between the Grid and the World Wide Web, and suggest that evolving web technologies will provide the basis for the next generation of the Grid. The latter aspect – which we define as the Semantic Grid – is explored in a companion paper.
REXEC: A Decentralized, Secure Remote Execution Environment for Parallel and Sequential Programs
- 4th Workshop on Communication, Architecture, and Applications for Network-based Parallel Computing
, 2000
Cited by 27 (1 self)
Bringing clusters of computers into the mainstream as general-purpose computing systems requires that better facilities for transparent remote execution of parallel and sequential applications be developed. While much research has been done in this area, most of this work remains inaccessible for clusters built using contemporary hardware and operating systems. Implementations are either too old and/or not publicly available, require use of operating systems which are not supported by modern hardware, or simply do not meet the functional requirements demanded by practical use in real world settings. To address these issues, we designed REXEC, a decentralized, secure remote execution facility. It provides high availability, scalability, transparent remote execution, dynamic cluster configuration, decoupled node discovery and selection, a well-defined failure and cleanup model, parallel and distributed program support, and strong authentication and encryption. The system is implemented and is currently installed and in use on a 32-node cluster of 2-way SMPs running the Linux 2.2.5 operating system.
The Nimrod Computational Workbench: A Case Study in Desktop Metacomputing
, 1997
Cited by 22 (12 self)
- Proceedings of the 20th Australasian Computer Science Conference, Sydney, Australia, February 5-7, 1997
The coordinated use of geographically distributed computers, or metacomputing, can in principle provide more accessible and cost-effective supercomputing than do conventional high-performance systems. However, we lack evidence that metacomputing systems can be made easily usable or that large numbers of applications are able to exploit metacomputing resources. In this article, we present work that addresses both these concerns. The basis for this work is a system called Nimrod that provides a desktop problem-solving environment for parametric experiments. We describe how Nimrod has been extended to support the scheduling of computational resources located in a wide-area environment and report on an experiment in which Nimrod was used to schedule a large parametric study across the Australian Internet. The experiment provided both new scientific results and insights into Nimrod capabi...

