Job Scheduling in Multiprogrammed Parallel Systems
, 1997
Cited by 145 (15 self)
Scheduling in the context of parallel systems is often thought of in terms of assigning tasks in a program to processors, so as to minimize the makespan. This formulation assumes that the processors are dedicated to the program in question. But when the parallel system is shared by a number of users, this is not necessarily the case. In the context of multiprogrammed parallel machines, scheduling refers to the execution of threads from competing programs. This is an operating system issue, concerned with resource allocation, not a program development issue. Scheduling schemes for multiprogrammed parallel systems can be classified as single-level or two-level. Single-level scheduling combines the allocation of processing power with the decision of which thread will use it. Two-level scheduling decouples the two issues: first, processors are allocated to the job, and then the job's threads are scheduled using this pool of processors. The processors of a parallel system can be shared i...
Effective Distributed Scheduling of Parallel Workloads
, 1996
Cited by 106 (5 self)
We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature that states explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler p...
Scheduling with Implicit Information in Distributed Systems
- In Proceedings of the 1998 ACM Sigmetrics International Conference on Measurement and Modeling of Computer Systems
, 1998
Cited by 62 (4 self)
Implicit coscheduling is a distributed algorithm for time-sharing communicating processes in a cluster of workstations. By observing and reacting to implicit information, local schedulers in the system make independent decisions that dynamically coordinate the scheduling of communicating processes. The principal mechanism involved is two-phase spin-blocking: a process waiting for a message response spins for some amount of time, and then relinquishes the processor if the response does not arrive. In this paper, we describe our experience implementing implicit coscheduling on a cluster of 16 UltraSPARC I workstations; this has led to contributions in three main areas. First, we more rigorously analyze the two-phase spin-block algorithm and show that spin time should be increased when a process is receiving messages. Second, we present performance measurements for a wide range of synthetic benchmarks and for seven Split-C parallel applications. Finally, we show how implicit coscheduling ...
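The two-phase spin-block mechanism described in this abstract can be illustrated with a minimal user-level sketch. It assumes the awaited message response is modeled as a `threading.Event`; the function name `spin_then_block` and its parameters are hypothetical and are not taken from the paper, which implements the mechanism inside a cluster's local schedulers rather than in Python threads.

```python
import threading
import time
from typing import Optional


def spin_then_block(event: threading.Event,
                    spin_seconds: float,
                    block_timeout: Optional[float] = None) -> str:
    """Wait for `event` in two phases and report which phase saw it.

    Returns "spin" if the event arrived while busy-waiting, "block" if it
    arrived after the waiter gave up the processor, and "timeout" if it
    did not arrive within `block_timeout` seconds of blocking.
    """
    deadline = time.monotonic() + spin_seconds

    # Phase 1: busy-wait (spin) for a bounded amount of time.
    while time.monotonic() < deadline:
        if event.is_set():
            return "spin"

    # Phase 2: relinquish the processor and sleep until signalled.
    if event.wait(block_timeout):
        return "block"
    return "timeout"
```

The spin time is the tuning knob: a response that arrives during the spin phase is handled without a context switch, while a late response costs only the bounded spin before the processor is released.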
Predicting Queue Times on Space-Sharing Parallel Computers
, 1997
Cited by 59 (1 self)
We present statistical techniques for predicting the queue times experienced by jobs submitted to a space-sharing parallel machine with first-come-first-served (FCFS) scheduling. We apply these techniques to trace data from the Intel Paragon at the San Diego Supercomputer Center and the IBM SP2 at the Cornell Theory Center. We show that it is possible to predict queue times with accuracy that is acceptable for several intended applications. The coefficient of correlation between our predicted queue times and the actual queue times from simulated schedules is between 0.65 and 0.72.
1 Introduction
On space-sharing parallel computers, it is useful to be able to predict how long a submitted job will be queued before processors are allocated to it. Some of the applications of these predictions are: Load metrics: They provide a measure of load that is more concrete than abstractions such as load average, allowing users to make decisions about what jobs to run, where to run them or what si...
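The accuracy figure quoted in this abstract is a coefficient of correlation between predicted and actual queue times. As a rough sketch, assuming the standard Pearson coefficient is meant (the excerpt does not say which estimator the authors used), it could be computed as follows. The function name `pearson_r` is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two samples of equal length >= 2")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n

    # Sums of co-deviations and squared deviations from the means.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)

    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_p * var_a)
```

A value of 1 means the predictions track the actual queue times perfectly up to a linear rescaling, so a coefficient in the reported range indicates a useful but imperfect predictor.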
Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
- ACM TRANSACTIONS ON COMPUTER SYSTEMS
, 1998
Cited by 44 (2 self)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...
Scheduling multiprocessor tasks -- An overview
- EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
, 1996
Cited by 33 (3 self)
Multiprocessor tasks require more than one processor at the same moment in time. This relatively new concept in scheduling theory emerged with the advent of parallel computing systems. In this work we present the state of the art for multiprocessor task scheduling. We show the rationale behind the concept of multiprocessor tasks. The standard three-field notation is extended to accommodate multiprocessor tasks. The main part of the work is a presentation of results in multiprocessor task scheduling, for both parallel and dedicated processors.
Optimal Sharing of Partitionable Workloads in Heterogeneous Networks of Workstations (Extended Abstract)
- Intl. Wkshp. on Cluster Computing -- Technologies, Environments, and Applications (CC-TEA'2000). In Intl. Conf. on Parallel and Distr. Processing Techniques and Applications (PDPTA'2000)
, 2000
Cited by 29 (3 self)
Arnold L. Rosenberg, Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA, rsnbrg@cs.umass.edu
Two problems related to worksharing in a heterogeneous network of workstations (NOW) are formalized and solved optimally. In both problems, one has access to a NOW comprising n workstations of differing computational powers, to assist with a large partitionable computational workload (e.g., from a data-parallel computation). In the NOW-Rental Problem, one must complete W units of work by "renting" the NOW for as short a time as is necessary to complete that work. In the NOW-Exploitation Problem, one has access to the NOW for a fixed duration of L time units and wishes to accomplish as much work as possible during that time. Using a single mathematical formulation that encompasses both of these problems, a protocol is developed which takes a suite of 2n + 3 parameters that characterize the computational and communicational efficiency of the NOW and de...
Array Decompositions for Nonuniform Computational Environments
, 1996
Cited by 25 (0 self)
Two-dimensional arrays are useful in a large variety of scientific and engineering applications. Parallelization of these applications requires the decomposition of array elements among different machines. Several data-decomposition techniques have been studied in the literature for machines with uniform computational power. In this paper we develop new methods for decomposing arrays into a cluster of machines with nonuniform computational power. Simulation results show that our methods provide superior decomposition over naive schemes.
1 Introduction
Data-parallel applications require the partitioning of data among processors in a way that the computation load on each node is proportional to its computational power, while minimizing communication. Two-dimensional arrays are widely used in scientific and engineering problems such as weather prediction and image processing. In this paper we discuss the decomposition of two-dimensional arrays for a nonuniform computational environment ...
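The requirement that each node's load be proportional to its computational power can be illustrated with a simple baseline. The sketch below is not the decomposition method developed in the paper. It splits the rows of an array in one dimension only, ignores communication cost, and rounds with the largest-remainder rule. The function name `partition_rows` and the representation of power as positive numeric weights are assumptions made for illustration.

```python
from typing import List, Sequence


def partition_rows(n_rows: int, powers: Sequence[float]) -> List[int]:
    """Assign whole rows to machines in proportion to their relative power."""
    if n_rows < 0 or not powers or any(p <= 0 for p in powers):
        raise ValueError("need n_rows >= 0 and positive machine powers")

    total = sum(powers)
    ideal = [n_rows * p / total for p in powers]  # fractional fair shares
    counts = [int(share) for share in ideal]      # round each share down

    # Hand the rows lost to rounding to the machines whose fair share
    # was truncated the most (largest fractional remainder first).
    leftover = n_rows - sum(counts)
    by_remainder = sorted(range(len(powers)),
                          key=lambda i: ideal[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts
```

For example, `partition_rows(10, [1, 2])` gives the machine with twice the power seven of the ten rows and the other machine three.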
Modeling the Effects of Contention on the Performance of Heterogeneous Applications
, 1996
Cited by 23 (5 self)
Fast networks have made it possible to coordinate distributed heterogeneous CPU, memory, and storage resources to provide a powerful platform for executing high-performance applications. However, the performance of these applications on such systems is highly dependent on the allocation and efficient coordination of application tasks. A key component for a performance-efficient allocation strategy is a predictive model which provides a realistic estimate of application performance under varying resource loads. In this paper, we present a model for predicting the effects of contention on application behavior in heterogeneous systems. In particular, our model calculates the slowdown imposed on communication and computation for non-dedicated two-machine heterogeneous platforms. We describe the model for the Sun/CM2 and Sun/Paragon coupled heterogeneous systems. We present experiments on production systems with emulated contention which show the predicted communication and computation costs...
Parallel Raytracing: A Case Study on Partitioning and Scheduling on Workstation Clusters
- in Proc. Thirtieth International Conference on System Sciences, Hawaii
, 1997
Cited by 11 (4 self)
In this paper, a case study is presented which is aimed at investigating the performance of several parallel versions of the POV-Ray raytracing package implemented on a workstation cluster using the MPI message passing library. Based on a manager/worker scheme, variants of workload partitioning and message scheduling strategies, in conjunction with different task granularities, are evaluated with respect to their runtime behaviour. The results indicate that dynamic, adaptive strategies are required to cope with both the unbalanced workload characteristics of the parallel raytracing application and the different computational capabilities of the machines in a workstation cluster environment.
1 Introduction
Raytracing [9, 13, 24] is a widely used method for generating realistic-looking images on a computer, and it is employed by many 3D modelling and animation systems for the final image rendering. The input to a raytracing algorithm is the scene -- the description of the geometry...
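The manager/worker scheme with dynamic work assignment mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: threads within one process stand in for the MPI processes used in the paper, and `run_manager_worker` and `work_fn` are hypothetical names, with `work_fn` standing in for rendering one block of the image.

```python
import queue
import threading
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_manager_worker(tasks: Iterable[T],
                       work_fn: Callable[[T], R],
                       n_workers: int) -> List[R]:
    """Run `work_fn` over `tasks`, letting idle workers pull the next task."""
    task_list = list(tasks)
    results: List[R] = [None] * len(task_list)  # type: ignore[list-item]

    # The manager's pool of outstanding tasks, tagged with their position.
    pending: "queue.Queue" = queue.Queue()
    for index, task in enumerate(task_list):
        pending.put((index, task))

    def worker() -> None:
        # Each worker keeps asking for work until the pool is empty, so
        # faster workers (or cheaper tasks) naturally lead to more pulls.
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return
            results[index] = work_fn(task)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because tasks are handed out on demand rather than fixed in advance, an expensive task or a slow worker delays only that one task. Smaller tasks balance load more evenly at the cost of more manager traffic, which is the granularity trade-off the abstract refers to.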

