Job Scheduling in Multiprogrammed Parallel Systems
1997
Cited by 145 (15 self)
Scheduling in the context of parallel systems is often thought of in terms of assigning tasks in a program to processors, so as to minimize the makespan. This formulation assumes that the processors are dedicated to the program in question. But when the parallel system is shared by a number of users, this is not necessarily the case. In the context of multiprogrammed parallel machines, scheduling refers to the execution of threads from competing programs. This is an operating system issue, concerned with resource allocation, not a program development issue. Scheduling schemes for multiprogrammed parallel systems can be classified as single-level or two-level. Single-level scheduling combines the allocation of processing power with the decision of which thread will use it. Two-level scheduling decouples the two issues: first, processors are allocated to the job, and then the job's threads are scheduled using this pool of processors. The processors of a parallel system can be shared i...
Predicting Application Run Times Using Historical Information
1997
Cited by 83 (13 self)
We present a technique for deriving predictions for the run times of parallel applications from the run times of "similar" applications that have executed in the past. The novel aspect of our work is the use of search techniques to determine those application characteristics that yield the best definition of similarity for the purpose of making predictions. We use four workloads recorded from parallel computers at Argonne National Laboratory, the Cornell Theory Center, and the San Diego Supercomputer Center to evaluate the effectiveness of our approach. We show that on these workloads our techniques achieve predictions that are between 14 and 60 percent better than those achieved by other researchers; our approach achieves mean prediction errors that are between 40 and 59 percent of mean application run times.
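The idea in this abstract, predicting a job's run time from past jobs that share selected characteristics, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the attribute names (`user`, `nodes`), the use of a plain mean as the predictor, and the exhaustive search over attribute subsets, which stands in for the search techniques the abstract mentions without describing.

```python
from itertools import combinations
from statistics import mean

def predict(history, job, template):
    """Mean run time of past jobs that match `job` on every attribute in `template`."""
    similar = [h["runtime"] for h in history
               if all(h[a] == job[a] for a in template)]
    return mean(similar) if similar else None

def template_error(jobs, template):
    """Mean absolute error when each job is predicted only from the jobs before it."""
    errors = []
    for i, job in enumerate(jobs):
        guess = predict(jobs[:i], job, template)
        if guess is not None:
            errors.append(abs(guess - job["runtime"]))
    return mean(errors) if errors else float("inf")

def best_template(jobs, attributes):
    """Try every non-empty subset of attributes (hypothetical stand-in for a real search)."""
    candidates = [c for r in range(1, len(attributes) + 1)
                  for c in combinations(attributes, r)]
    return min(candidates, key=lambda t: template_error(jobs, t))

# Hypothetical workload trace: run time depends on the user, not on the node count.
jobs = [
    {"user": "a", "nodes": 4, "runtime": 10},
    {"user": "b", "nodes": 4, "runtime": 100},
    {"user": "a", "nodes": 8, "runtime": 10},
    {"user": "b", "nodes": 8, "runtime": 100},
    {"user": "a", "nodes": 4, "runtime": 10},
]
print(best_template(jobs, ["user", "nodes"]))
```

Note that this scoring ignores jobs for which a template finds no similar history, so very specific templates can look better than they are; a fuller treatment would have to account for that.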
Utilization and Predictability in Scheduling the IBM SP2 with Backfilling
In Proceedings of the 1st Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing (IPPS/SPDP-98), pages 542–547, Los Alamitos, 1998
Cited by 55 (7 self)
Scheduling jobs on the IBM SP2 system is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order that the jobs arrive (FCFS scheduling) is fair and predictable, but suffers from severe fragmentation, leading to low utilization. This motivated Argonne National Lab, where the first large SP1 was installed, to develop the EASY scheduler. This scheduler, which has since been adopted by many other SP2 sites, uses aggressive backfilling: small jobs are moved ahead to fill in holes in the schedule, provided they do not delay the first job in the queue. We show that a more conservative approach, in which small jobs move ahead only if they do not delay any job in the queue, produces essentially the same benefits in terms of utilization. Our conservative scheme has the added advantage that queueing times can be predicted in advance, whereas in EASY the queueing time is unbounded.
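The aggressive (EASY-style) rule described in this abstract, that a job may move ahead only if it does not delay the first job in the queue, can be sketched roughly as follows. This is an illustrative sketch, not the EASY scheduler's code: the `Job` class, the `(end_time, procs)` representation of running jobs, and the function names are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    procs: int        # processors requested
    estimate: float   # user-supplied runtime estimate

def reservation(free, running, head, now):
    """Return (shadow_time, extra): the earliest time the head job can start,
    and the processors left over at that time once it has started.
    `running` is a list of (expected_end_time, procs) pairs."""
    avail = free
    if avail >= head.procs:
        return now, avail - head.procs
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            return end, avail - head.procs
    raise ValueError("head job needs more processors than the machine has")

def can_backfill(job, free, running, head, now):
    """A job may jump ahead if it fits now and cannot delay the head job:
    it either ends before the head's start time or uses only spare processors."""
    if job.procs > free:
        return False
    shadow, extra = reservation(free, running, head, now)
    return now + job.estimate <= shadow or job.procs <= extra

# Hypothetical state: 2 processors free, 4 more released at t=10, 2 more at t=20.
running = [(10, 4), (20, 2)]
head = Job("head", procs=6, estimate=50)
print(can_backfill(Job("short", 2, 5), 2, running, head, now=0))   # fits before t=10
print(can_backfill(Job("long", 2, 15), 2, running, head, now=0))   # would delay head
```

A conservative variant, as the abstract describes it, would apply the same kind of check against the reservation of every queued job rather than only the first.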
Using run-time predictions to estimate queue wait times and improve scheduler performance
Scheduling Strategies for Parallel Processing, 1999
Cited by 54 (0 self)
On many computers, a request to run a job is not serviced immediately but instead is placed in a queue and serviced only when resources are released by preceding jobs. In this paper, we build on run-time prediction techniques that we developed in previous research to explore two problems. The first problem is to predict how long applications will wait in a queue until they receive resources. We show that run-time estimates can be used for this and that using our run-time estimates results in more accurate wait-time predictions than when the run-time prediction techniques of other researchers are used. The second problem we investigate is improving scheduling performance. We use run-time predictions to improve the performance of the least-work-first and backfill scheduling algorithms. We find that using our run-time predictor results in lower mean wait times for the workloads with higher offered loads when compared to alternative run-time predictors.
Gang Scheduling with Memory Considerations
In Proc. of the 14th Intl. Parallel and Distributed Processing Symp., 2000
Cited by 49 (2 self)
A major problem with time slicing on parallel machines is memory pressure, as the resulting paging activity damages the synchronism among a job’s processes. An alternative is to impose admission controls, and only admit jobs that fit into the available memory. Despite suffering from delayed execution, this leads to better overall performance by preventing the harmful effects of paging and thrashing.
Supporting priorities and improving utilization of the IBM SP2 scheduler using slack-based backfilling
In Proceedings of the 13th International Parallel Processing Symposium, 1999
The Impact of More Accurate Requested Runtimes on Production Job Scheduling Performance
In Job Scheduling Strategies for Parallel Processing, 2002
Cited by 40 (3 self)
The question of whether more accurate requested runtimes can significantly improve production parallel system performance has previously been studied for the FCFS-backfill scheduler, using a limited set of system performance measures. This paper examines the question for higher performance backfill policies, heavier system loads as are observed in current leading-edge production systems such as the large Origin 2000 system at NCSA, and a broader range of system performance measures. The new results show that more accurate requested runtimes can improve system performance much more significantly than suggested in previous results. For example, average slowdown decreases by a factor of two to six, depending on system load and the fraction of jobs that have the more accurate requests. The new results also show that (a) nearly all of the performance improvement is realized even if the more accurate runtime requests are a factor of two higher than the actual runtimes, (b) most of the performance improvement is achieved when test runs are used to obtain more accurate runtime requests, and (c) in systems where only a fraction (e.g., 60%) of the jobs provide approximately accurate runtime requests, the users that provide the approximately accurate requests achieve even greater improvements in performance, such as an order of magnitude improvement in average slowdown for jobs that have runtime up to fifty hours.
The Elusive Goal of Workload Characterization
Perf. Eval. Rev., 1999
Cited by 34 (6 self)
The study and design of computer systems requires good models of the workload to which these systems are subjected. Until recently, the data necessary to build these models (observations from production installations) were not available, especially for parallel computers. Instead, most models were based on assumptions and mathematical attributes that facilitate analysis. Recently a number of supercomputer sites have made accounting data available that make it possible to build realistic workload models. It is not clear, however, how to generalize from specific observations to an abstract model of the workload. This paper presents observations of workloads from several parallel supercomputers and discusses modeling issues that have caused problems for researchers in this area.
1 Introduction
We like to think of building computer systems as a systematic process of engineering: we define requirements, draw designs, analyze their properties, evaluate options, and finally construct a w...
Backfilling using system-generated predictions rather than user runtime estimates
In IEEE TPDS, 2007
Cited by 30 (4 self)
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs are allowed to run ahead of their time provided they do not delay previously queued jobs (or at least the first queued job). To make such determinations possible, users are required to provide estimates of how long jobs will run, and jobs that violate these estimates are killed. Empirical studies have repeatedly shown that user estimates are inaccurate, and that system-generated predictions based on history may be significantly better. However, predictions have not been incorporated into production schedulers, partially due to a misconception (that we resolve) claiming inaccuracy actually improves performance, but mainly because underprediction is technically unacceptable: users will not tolerate jobs being killed just because system predictions were too short. We solve this problem by divorcing kill-time from the runtime prediction, and correcting predictions adaptively as needed if they are proved wrong. The end result is a surprisingly simple scheduler, which requires minimal deviations from current practices (e.g. using FCFS as the basis), and behaves exactly like EASY as far as users are concerned; nev-
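The separation of kill-time from runtime prediction described in this abstract might look roughly like the sketch below. The abstract does not say how a prediction is corrected once it is proved wrong, so the doubling rule here is purely an assumption, as are the function names and the choice to cap the prediction at the user estimate.

```python
def corrected_prediction(prediction, elapsed, user_estimate, factor=2.0):
    """Prediction the scheduler plans with. If the job has already outlived it,
    enlarge it (assumed rule: multiply by `factor`) up to the user estimate."""
    if prediction <= 0 or factor <= 1:
        raise ValueError("prediction must be positive and factor greater than 1")
    p = min(prediction, user_estimate)
    while p <= elapsed and p < user_estimate:
        p = min(p * factor, user_estimate)
    return p

def should_kill(elapsed, user_estimate):
    """Kill-time depends only on the user's own estimate, never on the prediction."""
    return elapsed >= user_estimate

# Hypothetical job: predicted 10 minutes, user estimate 100 minutes.
print(corrected_prediction(10, elapsed=25, user_estimate=100))  # prediction raised
print(should_kill(elapsed=25, user_estimate=100))               # job keeps running
```

With this split, an underprediction only changes the scheduler's internal planning; the job is terminated under the same condition as it would be with user estimates alone.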

