Results 1 - 10 of 98
Scheduling Multithreaded Computations by Work Stealing
Cited by 316 (32 self)
This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically,
Job Scheduling in Multiprogrammed Parallel Systems
1997
Cited by 145 (15 self)
Scheduling in the context of parallel systems is often thought of in terms of assigning tasks in a program to processors, so as to minimize the makespan. This formulation assumes that the processors are dedicated to the program in question. But when the parallel system is shared by a number of users, this is not necessarily the case. In the context of multiprogrammed parallel machines, scheduling refers to the execution of threads from competing programs. This is an operating system issue, involved with resource allocation, not a program development issue. Scheduling schemes for multiprogrammed parallel systems can be classified as one or two leveled. Single-level scheduling combines the allocation of processing power with the decision of which thread will use it. Two level scheduling decouples the two issues: first, processors are allocated to the job, and then the job's threads are scheduled using this pool of processors. The processors of a parallel system can be shared i...
Analyzing Scalability of Parallel Algorithms and Architectures
Journal of Parallel and Distributed Computing, 1994
Cited by 84 (17 self)
The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objective of this paper is to critically assess the state of the art in the theory of scalability analysis, and motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms t...
On Adaptive Resource Allocation for Complex Real-Time Applications
In Proceedings of the 18th IEEE Real-Time Systems Symposium (RTSS, 1997
Cited by 83 (25 self)
Resource allocation for high-performance real-time applications is challenging due to the applications' data-dependent nature, dynamic changes in their external environment, and limited resource availability in their target embedded system platforms. These challenges may be met by use of Adaptive Resource Allocation (ARA) mechanisms that can promptly adjust resource allocation to changes in an application's resource needs, whenever there is a risk of failing to satisfy its timing constraints. By taking advantage of an application's adaptation capabilities, ARA eliminates the need for `over-sizing' real-time systems to meet worst-case application needs. This paper proposes a model for describing an application's adaptation capabilities and the runtime variation of its resource needs. The paper also proposes a satisfiability-driven set of performance metrics for capturing the impact of ARA mechanisms on the performance of adaptable real-time applications. The relevance of the proposed se...
Use of Application Characteristics and Limited Preemption for Run-To-Completion Parallel Processor Scheduling Policies
In Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1994
Cited by 83 (2 self)
The performance potential of run-to-completion (RTC) parallel processor scheduling policies is investigated by examining whether (1) application execution rate characteristics such as average parallelism (avg) and processor working set (pws) and/or (2) limited preemption can be used to improve the performance of these policies. We address the first question by comparing policies (previous as well as new) that differ only in whether or not they use execution rate characteristics and by examining a wider range of the workload parameter space than previous studies. We address the second question by comparing a simple two-level queueing policy with RTC scheduling in the second level queue against RTC policies that don't allow any preemption and against dynamic equiallocation (EQ). Using simulation to estimate mean response times we find that for promising RTC policies such as adaptive static partitioning (ASP) and shortest demand first (SDF), a maximum allocation constraint that is for all...
A Parallel Workload Model and Its Implications for Processor Allocation
1996
Cited by 68 (3 self)
We develop a workload model based on the observed behavior of parallel computers at the San Diego Supercomputer Center and the Cornell Theory Center. This model gives us insight into the performance of strategies for scheduling malleable jobs on space-sharing parallel computers. We find that Adaptive Static Partitioning (ASP), which has been reported to work well for other workloads, is inferior to some FIFO strategies that adapt better to system load. The best of the strategies we consider is one that explicitly restricts cluster sizes when load is high (a variation of Sevcik's A+ strategy [13]).
Workload Characterization: A Survey
Cited by 67 (5 self)
The performance of a system is determined by its characteristics as well as by the composition of the load being processed. Hence, its quantitative description is a fundamental part of all performance evaluation studies. Several methodologies for the construction of workload models, which are functions of the objective of the study, of the architecture of the system to be analyzed, and of the techniques adopted, are presented. A survey of a few applications of these methodologies to various types of systems (i.e., batch, interactive, database, network-based, parallel, supercomputer) is given.
Efficient Distributed Shared Memory Based On Multi-Protocol Release Consistency
1994
Cited by 61 (5 self)
A distributed shared memory (DSM) system allows shared memory parallel programs to be executed on distributed memory multiprocessors. The challenge in building a DSM system is to achieve good performance over a wide range of shared memory programs without requiring extensive modifications to the source code. The performance challenge translates into reducing the amount of communication performed by the DSM system to that performed by an equivalent message passing program. This thesis describes four novel techniques for reducing the communication overhead of DSM, including: (i) the use of software release consistency, (ii) support for multiple consistency protocols, (iii) a multiple writer protocol, and (iv) an update timeout mechanism. Release consistency allows modifications of shared data to be handled via a delayed update queue, which masks network latencies. Providing multiple cons...
A Historical Application Profiler for Use by Parallel Schedulers
In Job Scheduling Strategies for Parallel Processing, 1997
Cited by 61 (0 self)
Scheduling algorithms that use application and system knowledge have been shown to be more effective at scheduling parallel jobs on a multiprocessor than algorithms that do not. This paper focuses on obtaining such information for use by a scheduler in a network of workstations environment. The log files from three parallel systems are examined to determine both how to categorize parallel jobs for storage in a job database and what job information would be useful to a scheduler. A Historical Profiler is proposed that stores information about programs and users, and manipulates this information to provide schedulers with execution time predictions. Several preemptive and non-preemptive versions of the FCFS, EASY and Least Work First scheduling algorithms are compared to evaluate the utility of the profiler. It is found that both preemption and the use of application execution time predictions obtained from the Historical Profiler lead to improved performance.

