Condor Technical Summary
, 1991
Cited by 82 (7 self)
Condor is a software package for executing long-running "batch" type jobs on workstations which would otherwise be idle. Major features of Condor are automatic location and allocation of idle machines, and checkpointing and migration of processes. All of these features are achieved without any modifications to the UNIX kernel. Also, users of Condor do not need to change their source programs to run with Condor, although such programs must be specially linked. The features of Condor for both users and workstation owners, along with the limitations on the kinds of jobs which may be executed by Condor, are described. The mechanisms behind our implementations of checkpointing and process migration are discussed in detail. Finally, the software which detects idle machines and allocates those machines to Condor users is described, along with the techniques used to configure that software to meet the demands of a particular computing site or workstation owner.
Modeling Machine Availability in Enterprise and Wide-area Distributed Computing Environments
- In Euro-Par’05
, 2003
Cited by 51 (7 self)
In this paper, we consider the problem of modeling machine availability in enterprise-area and wide-area distributed computing settings. Using availability data gathered from three different environments, we detail the suitability of four potential statistical distributions for each data set: exponential, Pareto, Weibull, and hyperexponential. In each case, we use software we have developed to determine the necessary parameters automatically from each data collection.
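As a rough illustration of what determining parameters automatically from a data collection can look like, the sketch below computes textbook maximum-likelihood estimates for two of the four distributions named above (exponential and Pareto). It is not the authors' software; the function names and the sample durations are hypothetical, and the Weibull and hyperexponential fits, which need iterative methods, are left out.

```python
import math


def fit_exponential(durations):
    """Maximum-likelihood rate of an exponential distribution: n / sum(x)."""
    if not durations or any(x <= 0 for x in durations):
        raise ValueError("durations must be a non-empty list of positive values")
    return len(durations) / sum(durations)


def fit_pareto(durations):
    """Maximum-likelihood (scale, shape) of a Pareto distribution.

    The scale is taken as the smallest observation; the shape is
    n / sum(ln(x_i / scale)).
    """
    if not durations or any(x <= 0 for x in durations):
        raise ValueError("durations must be a non-empty list of positive values")
    scale = min(durations)
    log_sum = sum(math.log(x / scale) for x in durations)
    if log_sum == 0:
        raise ValueError("all durations are equal; the shape is undefined")
    return scale, len(durations) / log_sum


# Hypothetical availability durations (in hours), for illustration only.
sample = [1.0, 2.0, 3.0]
rate = fit_exponential(sample)
scale, shape = fit_pareto(sample)
```

Comparing how well each fitted distribution matches the observed durations (for example with a goodness-of-fit test) would be the next step; that part is not sketched here.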
Adaptive Parallelism with Piranha
Cited by 26 (0 self)
"Adaptive parallelism" refers to parallel computations on a dynamically changing set of processors: processors may join or withdraw from the computation as it proceeds. Networks of fast workstations are the most important setting for adaptive parallelism at present. Workstations at most sites are typically idle for significant fractions of the day, and those idle cycles may constitute in the aggregate a powerful computing resource. For this reason and others, we believe that adaptive parallelism is assured of playing an increasingly prominent role in parallel applications development over the next decade. The "Piranha" system now up and running on a heterogeneous network at Yale is a general-purpose adaptive parallelism environment. It has been used to run a variety of production applications, including applications in graphics, theoretical physics, electrical engineering and computational fluid dynamics. In this paper we describe the Piranha model and several archetypal Piranha algori...
Idletime scheduling with preemption intervals
- 20th ACM Symposium on Operating Systems Principles
, 2005
Cited by 19 (0 self)
This paper presents the idletime scheduler, a generic, kernel-level mechanism for using idle resource capacity in the background without slowing down concurrent foreground use. Many operating systems fail to support transparent background use, and concurrent foreground performance can decrease by 50% or more. The idletime scheduler minimizes this interference by partially relaxing the work conservation principle during preemption intervals, during which it serves no background requests even if the resource is idle. The length of preemption intervals is a controlling parameter of the scheduler: short intervals aggressively utilize idle capacity; long intervals reduce the impact of background use on foreground performance. Unlike existing approaches to establishing prioritized resource use, idletime scheduling requires only localized modifications to a limited number of system schedulers. In experiments, a FreeBSD implementation for idletime network scheduling maintains over 90% of foreground TCP throughput, while allowing concurrent, high-rate UDP background flows to consume up to 80% of remaining link capacity. A FreeBSD disk scheduler implementation maintains 80% of foreground read performance, while enabling concurrent background operations to reach 70% throughput.
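The preemption-interval idea can be sketched in a few lines of user-level code. This is a toy model, not the FreeBSD implementation: the class name, the two queues, and the choice to start the interval when a foreground request is dispatched are all assumptions made for illustration.

```python
from collections import deque


class IdletimeScheduler:
    """Toy single-resource scheduler with a preemption interval (assumed design)."""

    def __init__(self, preemption_interval):
        self.preemption_interval = preemption_interval
        self.foreground = deque()
        self.background = deque()
        # Time of the most recent foreground dispatch; none has happened yet.
        self.last_foreground_time = float("-inf")

    def submit(self, request, foreground):
        """Queue a request as foreground or background work."""
        (self.foreground if foreground else self.background).append(request)

    def dispatch(self, now):
        """Return the request to serve at time `now`, or None to stay idle."""
        if self.foreground:
            # Foreground work always goes first and (re)starts the interval.
            self.last_foreground_time = now
            return self.foreground.popleft()
        in_interval = now - self.last_foreground_time < self.preemption_interval
        if self.background and not in_interval:
            return self.background.popleft()
        # Deliberately idle: work conservation is relaxed inside the interval.
        return None


scheduler = IdletimeScheduler(preemption_interval=5.0)
scheduler.submit("bg1", foreground=False)
scheduler.submit("fg1", foreground=True)
first = scheduler.dispatch(0.0)   # foreground request is served
second = scheduler.dispatch(2.0)  # inside the interval: resource stays idle
third = scheduler.dispatch(5.0)   # interval has expired: background is served
```

A shorter `preemption_interval` lets background work start sooner after foreground activity; a longer one leaves more idle headroom for the next foreground request, which is the trade-off the abstract describes.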
Fault-tolerant Parallel Processing Combining Linda, Checkpointing, and Transactions
, 1996
Cited by 12 (2 self)
With the advent of high-performance workstations and fast LANs, networks of workstations have recently emerged as a promising computing platform for long-running coarse-grain parallel applications. Their advantages are wide availability and cost-effectiveness, as compared to massively parallel computers. Long-running computation in the workstation environment, however, requires both fault tolerance and the effective utilization of idle workstations. In this dissertation, we present a variant of Linda, called Persistent Linda (PLinda), that treats these two issues uniformly: specifically, PLinda treats non-idleness as failure. PLinda provides a combination of checkpointing and transaction support on both data and program state (an encoding of continuations). The traditional transaction model is optimized and extended to support robust parallel computation. Treatable failures include processor and main memory hard and slowdown failures, and network omission and corruption failures. The programmer can customize fault tolerance when constructing an application, trading failure-free performance against recovery time. When creating a PLinda program,
Cluster computing: the commodity supercomputer
- Software-Practice and Experience
, 1999
Cited by 9 (1 self)
The availability of high-speed networks and increasingly powerful commodity microprocessors is making the usage of clusters, or networks, of computers an appealing vehicle for cost-effective parallel computing. Clusters, built using Commodity-Off-The-Shelf (COTS) hardware components as well as free, or commonly used, software, are playing a major role in redefining the concept of supercomputing. In this paper we discuss the reasons why COTS-based clusters are becoming popular environments for running supercomputing applications. We describe the current enabling technologies and present four state-of-the-art cluster-based projects. Finally, we summarise our findings and draw a number of conclusions relating to the usefulness and likely future of cluster computing. Copyright © 1999 John Wiley & Sons, Ltd. KEY WORDS: commodity components; clusters; message-passing; supercomputing; parallel computing
Optimal Scheduling for Disconnected Cooperation
, 2001
Cited by 8 (3 self)
We consider a distributed environment consisting of $n$ processors that need to perform $t$ tasks. We assume that communication is initially unavailable and that processors begin work in isolation. At some unknown point of time an unknown collection of processors may establish communication. Before processors begin communication they execute tasks in the order given by their schedules. Our goal is to schedule work of isolated processors so that when communication is established for the first time, the number of redundantly executed tasks is controlled. We quantify worst-case redundancy as a function of processor advancements through their schedules. In this work we refine and simplify an extant deterministic construction for schedules with $n \le t$, and we develop a new analysis of its waste. The new analysis shows that for any pair of schedules, the number of redundant tasks can be controlled for the entire range of $t$ tasks. Our new result is asymptotically optimal: the tails of these schedules are within a $1 + O(n^{-1/4})$ factor of the lower bound. We also present two new deterministic constructions, one for $t \ge n$ and the other for $t \ge n^{3/2}$, which substantially improve pairwise waste for all prefixes of length $t/\sqrt{n}$, and offer near-optimal waste for the tails of the schedules. Finally, we present bounds for waste of any collection of $k \ge 2$ processors for both deterministic and randomized constructions.
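To make the notion of redundancy concrete, the sketch below counts the tasks that two isolated processors have both executed after advancing a given number of steps through their schedules. It only illustrates the quantity being bounded, under the assumption that a schedule is an ordered list of task identifiers; it is not one of the paper's constructions, and the function name and example schedules are hypothetical.

```python
def pairwise_waste(schedule_p, schedule_q, a, b):
    """Count tasks executed by both processors.

    `schedule_p` and `schedule_q` are ordered lists of task identifiers;
    `a` and `b` are how many tasks each processor has completed so far.
    """
    done_p = set(schedule_p[:a])
    done_q = set(schedule_q[:b])
    return len(done_p & done_q)


# Hypothetical example with t = 6 tasks: one processor works forwards,
# the other backwards, so overlap starts only once they meet in the middle.
forward = [0, 1, 2, 3, 4, 5]
backward = [5, 4, 3, 2, 1, 0]
no_overlap = pairwise_waste(forward, backward, 3, 3)
one_overlap = pairwise_waste(forward, backward, 4, 3)
```

With only two processors the forward and backward orders keep waste at zero until the schedules meet; the difficulty the abstract addresses is keeping this quantity small for every pair among many processors at once.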
Resource Management in the Condor System
, 1996
Cited by 8 (0 self)
Condor is a distributed batch system for sharing the workload of jobs in a pool of Unix workstations connected by a communication network.
Condor Flocking: Load Sharing between Pools of Workstations
, 1993
Cited by 8 (2 self)
This report is the result of a six-month graduating term for the Operating Systems and Distributed Systems group of the Technical University of Delft. This assignment was performed within the Computer System Group of NIKHEF (National Institute for Nuclear Physics and High-Energy Physics) under the supervision of R. van Dantzig (Spin Muon Collaboration), and I.S. Herschberg, D.H.J. Epema and J.F.C.M. de Jongh (Technical University of Delft).

