Results 11 - 20 of 183
Benchmarking and Comparison of the Task Graph Scheduling Algorithms
1999
Cited by 80 (2 self)
The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NP-completeness of the problem has stimulated researchers to propose a myriad of heuristic algorithms. While most of these algorithms are reported to be efficient, it is not clear how they compare against each other. A meaningful performance evaluation and comparison of these algorithms is a complex task and it must take into account a number of issues. First, most scheduling algorithms are based upon diverse assumptions, making the performance comparison rather purposeless. Second, there does not exist a standard set of benchmarks to examine these algorithms. Third, most algorithms are evaluated using small problem sizes, and, therefore, their scalability is unknown. In this paper, we first provide a taxonomy for classifying various algorithms into distinct categories a...
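Most of the heuristics surveyed here share a common list-scheduling skeleton: rank nodes by a priority (for example, bottom level), then greedily place each node on the processor where it finishes earliest. A minimal sketch on a toy DAG, with hypothetical weights and communication costs ignored:

```python
# Minimal list scheduling for a weighted DAG on homogeneous processors.
# Toy example only; real heuristics (HLFET, MCP, DLS, ...) differ mainly
# in the priority function and in how communication is modeled.

def b_level(dag, weights, node, memo=None):
    """Bottom level: longest computation-cost path from node to an exit."""
    if memo is None:
        memo = {}
    if node not in memo:
        memo[node] = weights[node] + max(
            (b_level(dag, weights, s, memo) for s in dag.get(node, [])),
            default=0)
    return memo[node]

def list_schedule(dag, weights, num_procs):
    """Schedule nodes in descending b-level order onto the processor that
    lets each node finish earliest; returns the makespan."""
    preds = {n: [] for n in weights}
    for u, succs in dag.items():
        for v in succs:
            preds[v].append(u)
    order = sorted(weights, key=lambda n: -b_level(dag, weights, n))
    proc_free = [0.0] * num_procs
    finish = {}
    for n in order:
        ready = max((finish[p] for p in preds[n]), default=0.0)
        p = min(range(num_procs), key=lambda i: max(proc_free[i], ready))
        start = max(proc_free[p], ready)
        finish[n] = start + weights[n]
        proc_free[p] = finish[n]
    return max(finish.values())
```

Descending b-level order is always a valid topological order (a node's b-level strictly exceeds each successor's when weights are positive), which is why the loop never schedules a node before its predecessors.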
On Exploiting Task Duplication in Parallel Program Scheduling
1998
Cited by 58 (7 self)
One of the main obstacles in obtaining high performance from message-passing multicomputer systems is the inevitable communication overhead which is incurred when tasks executing on different processors exchange data. Given a task graph, duplication-based scheduling can mitigate this overhead by allocating some of the tasks redundantly on more than one processor. In this paper, we focus on the problem of using duplication in static scheduling of task graphs on parallel and distributed systems. We discuss five previously proposed algorithms, and examine their merits and demerits. We describe some of the essential principles for exploiting duplication in a more useful manner, and based on these principles propose an algorithm which outperforms the previous algorithms. The proposed algorithm generates optimal solutions for a number of task graphs. The algorithm assumes an unbounded number of processors. For scheduling on a bounded number of processors, we propose a second algorithm which...
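The core idea of duplication shows up already on a toy fork graph: a root task r feeds tasks a and b on different processors, and copying r onto both processors removes the inter-processor transfer. A numeric sketch (all weights hypothetical, not taken from the paper):

```python
# Toy illustration of duplication-based scheduling. Root task r feeds
# tasks a and b; sending r's result between processors costs `comm`.
# Duplicating r on both processors removes that communication entirely.

def makespan_no_dup(w_r, w_a, w_b, comm):
    # r and a on P0, b on P1: b must wait for r's result to arrive.
    finish_a = w_r + w_a
    finish_b = w_r + comm + w_b
    return max(finish_a, finish_b)

def makespan_with_dup(w_r, w_a, w_b, comm):
    # r is duplicated: each processor computes r locally, no transfer.
    return max(w_r + w_a, w_r + w_b)
```

With weights (2, 3, 3) and a communication cost of 4, the no-duplication schedule finishes at 9 while the duplicated one finishes at 5: duplication trades redundant computation (2 extra units) for the saved transfer.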
Multidimensional Synchronous Dataflow
 IEEE Transactions on Signal Processing
2002
Cited by 50 (4 self)
Signal flow graphs with dataflow semantics have been used in signal processing system simulation, algorithm development, and real-time system design. Dataflow semantics implicitly expose function parallelism by imposing only a partial ordering constraint on the execution of functions. One particular form of dataflow called synchronous dataflow (SDF) has been quite popular in programming environments for digital signal processing (DSP) since it has strong formal properties and is ideally suited for expressing multi-rate DSP algorithms. However, SDF and other dataflow models use first-in first-out (FIFO) queues on the communication channels and are thus ideally suited only for one-dimensional (1D) signal processing algorithms. While multidimensional systems can also be expressed by collapsing arrays into 1D streams, such modeling is often awkward and can obscure potential data parallelism that might be present. SDF can be generalized...
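For SDF specifically, the "strong formal properties" include a static repetitions vector obtained by solving the balance equations r[u]*p = r[v]*c for every channel. A small sketch, assuming a connected toy graph with made-up token rates:

```python
from fractions import Fraction
from math import lcm

# Solving the SDF balance equations for a toy graph (hypothetical rates).
# Each edge (u, v, p, c): u produces p tokens per firing, v consumes c.
# A consistent graph satisfies r[u]*p == r[v]*c for every edge.
# Assumes the graph is connected.

def repetitions(nodes, edges):
    r = {nodes[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for u, v, p, c in edges:
            if u in r and v not in r:
                r[v] = r[u] * p / c
                changed = True
            elif v in r and u not in r:
                r[u] = r[v] * c / p
                changed = True
    for u, v, p, c in edges:  # consistency (sample-rate) check
        assert r[u] * p == r[v] * c, "inconsistent sample rates"
    # Scale the fractional solution to the smallest integer vector.
    scale = lcm(*(f.denominator for f in r.values()))
    return {n: int(f * scale) for n, f in r.items()}
```

For a chain A -2/3-> B -1/2-> C, the smallest integer solution fires A three times, B twice, and C once per schedule period, which is exactly what makes static multi-rate scheduling possible.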
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers)
2001
Cited by 49 (24 self)
In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous platforms with respect to the performance of the processors. When targeting unidimensional grids, the load-balancing problem can be solved rather easily. When targeting two-dimensional grids, which are the key to scalability and efficiency for numerical kernels, the problem turns out to be surprisingly difficult. We formally state the 2D load-balancing problem and prove its NP-completeness. Next, we introduce a data allocation heuristic, which turns out to be very satisfactory: Its practical usefulness is demonstrated by MPI experiments conducted with a heterogeneous network of workstations.
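For the unidimensional case the paper calls "rather easy", a simple greedy rule suffices: give the next block to the processor that would finish it first, which balances load in proportion to processor speed. A sketch with hypothetical relative speeds:

```python
# 1D heterogeneous allocation sketch: distribute n equal-size blocks over
# processors with relative speeds s_i so as to minimize the makespan
# max_i (blocks_i / s_i). The greedy "next block goes to whoever would
# finish it first" rule solves this one-dimensional case.

def allocate_1d(n_blocks, speeds):
    counts = [0] * len(speeds)
    for _ in range(n_blocks):
        # Completion time processor i would reach with one more block.
        i = min(range(len(speeds)),
                key=lambda i: (counts[i] + 1) / speeds[i])
        counts[i] += 1
    return counts
```

With 10 blocks and speeds (1, 2, 2), the greedy rule yields the proportional allocation (2, 4, 4), so every processor finishes at time 2 instead of the uniform distribution's slowest-processor bound.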
Dynamic thread assignment on heterogeneous multiprocessor architectures
 in CF
2006
Cited by 40 (1 self)
In a multiprogrammed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distinct threads differ, but each thread may also present diversity in its performance and resource usage over time. A heterogeneous chip multiprocessor (CMP) architecture consists of processor cores and caches of varying size and complexity. Prior work has shown that heterogeneous CMPs can meet the needs of a multiprogrammed computing environment better than a homogeneous CMP system. In fact, the use of a combination of cores with different caches and instruction issue widths better accommodates threads with different computational requirements. A central issue in the design and use of heterogeneous systems is to determine an assignment of tasks to processors which better exploits the hardware resources in order to improve performance. In this paper we argue that the benefits of heterogeneous CMPs are bolstered by the use of a dynamic assignment policy, i.e., a runtime mechanism which observes the behavior of the running threads and exploits thread migration between cores. We validate our analysis by means of simulation. Specifically, our model assumes a combination of Alpha EV5 and Alpha EV6 processors and of integer and floating point programs from the SPEC2000 benchmark suite. We show that a dynamic assignment can outperform a static one by 20% to 40% on average and by as much as 80% in extreme cases, depending on the degree of multithreading simulated.
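A dynamic assignment policy of the kind argued for here can be caricatured in a few lines: periodically estimate each thread's benefit from a fast core and migrate accordingly. The IPC numbers and the greedy rule below are illustrative assumptions, not the paper's EV5/EV6 model:

```python
# Hypothetical sketch of one dynamic-assignment interval on a
# heterogeneous CMP: give the fast cores to the threads that currently
# gain the most from them, measured as the IPC difference between core
# types. All names and numbers are illustrative.

def assign(threads, n_fast):
    """threads: {name: (ipc_on_fast_core, ipc_on_slow_core)}.
    Returns a placement giving fast cores to the highest-gain threads."""
    by_gain = sorted(threads,
                     key=lambda t: threads[t][0] - threads[t][1],
                     reverse=True)
    return {t: ('fast' if i < n_fast else 'slow')
            for i, t in enumerate(by_gain)}
```

Re-running this at each sampling interval is what makes the policy dynamic: as a thread's phase behavior changes its measured gain, it migrates between core types.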
Efficient Operating System Scheduling for Performance-Asymmetric Multi-Core Architectures
 in SC ’07
2007
Cited by 40 (1 self)
Recent research advocates asymmetric multi-core architectures, where cores in the same processor can have different performance. These architectures support single-threaded performance and multi-threaded throughput at lower costs (e.g., die size and power). However, they also pose unique challenges to operating systems, which traditionally assume homogeneous hardware. This paper presents AMPS, an operating system scheduler that efficiently supports both SMP- and NUMA-style performance-asymmetric architectures. AMPS contains three components: asymmetry-aware load balancing, faster-core-first scheduling, and NUMA-aware migration. We have implemented AMPS in Linux kernel 2.6.16 and used CPU clock modulation to emulate performance asymmetry on an SMP and NUMA system. For various workloads, we show that AMPS achieves a median speedup of 1.16 with a maximum of 1.44 over stock Linux on the SMP, and a median of 1.07 with a maximum of 2.61 on the NUMA system. Our results also show that AMPS improves fairness and repeatability of application performance measurements.
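The asymmetry-aware load-balancing component can be illustrated by its central metric: balance scaled load (run-queue length divided by core performance) rather than raw queue length, so faster cores host proportionally more threads. A hypothetical sketch, not AMPS's actual kernel implementation:

```python
# Illustrative scaled-load balancing for a performance-asymmetric system.
# Raw queue lengths would treat a fast and a slow core as equally loaded;
# dividing by a per-core performance factor exposes the real imbalance.

def scaled_loads(queue_lengths, core_perf):
    return [q / p for q, p in zip(queue_lengths, core_perf)]

def most_and_least_loaded(queue_lengths, core_perf):
    """Candidate migration: move a thread from the core with the highest
    scaled load to the core with the lowest."""
    s = scaled_loads(queue_lengths, core_perf)
    return s.index(max(s)), s.index(min(s))
```

With equal queues of 4 threads but core performances of 1 and 2, the slow core's scaled load (4.0) exceeds the fast core's (2.0), so the balancer would migrate work toward the fast core even though the raw queues look balanced.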
Matching and Scheduling Algorithms for Minimizing Execution Time and Failure Probability of Applications in Heterogeneous Computing
 IEEE Trans. Parallel and Distributed Systems
2002
Cited by 38 (1 self)
In a heterogeneous distributed computing system, machine and network failures are inevitable and can have an adverse effect on applications executing on the system. To reduce the effect of failures on an application executing on a failure-prone system, matching and scheduling algorithms which minimize not only the execution time but also the probability of failure of the application must be devised. However, because of the conflicting requirements, it is not possible to minimize both of the objectives at the same time. Thus, the goal of this paper is to develop matching and scheduling algorithms which account for both the execution time and the reliability of the application. This goal is achieved by modifying an existing matching and scheduling algorithm. The reliability of resources is taken into account using an incremental cost function proposed in this paper and the new algorithm is referred to as the reliable dynamic level scheduling algorithm. The incremental cost function can be defined based on one of the three cost functions developed here. These cost functions are unique in the sense that they are not restricted to tree-based networks and a specific matching and scheduling algorithm. The simulation results confirm that the proposed incremental cost function can be incorporated into matching and scheduling algorithms to produce schedules where the effect of failures of machines and network resources on the execution of the application is reduced and the execution time of the application is minimized as well. Index Terms: Matching and scheduling, precedence-constrained tasks, heterogeneous computing, reliability, articulation points and bridges, DLS algorithm.
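One plausible shape for such a reliability-aware cost function (illustrative only, not one of the paper's three) combines a task's estimated finish time on each machine with the probability that the machine fails while executing it, under an exponential failure model:

```python
import math

# Illustrative reliability-aware matching: pick the machine that
# minimizes finish time plus a penalty proportional to the probability
# the machine fails during the task's execution, assuming exponentially
# distributed failures with per-machine rate fail_rates[i].

def pick_machine(exec_times, fail_rates, ready, alpha=1.0):
    """exec_times[i], fail_rates[i], ready[i]: per-machine estimates for
    one task. alpha weights reliability against execution time."""
    def cost(i):
        finish = ready[i] + exec_times[i]
        p_fail = 1.0 - math.exp(-fail_rates[i] * exec_times[i])
        return finish + alpha * p_fail * finish  # penalize risky machines
    return min(range(len(exec_times)), key=cost)
```

With a fast but failure-prone machine and a slower reliable one, a large alpha steers the task to the reliable machine while alpha = 0 recovers pure execution-time minimization, mirroring the trade-off between the two conflicting objectives described above.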
Efficient Scheduling of Arbitrary Task Graphs to Multiprocessors using A Parallel Genetic Algorithm
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
1997
Cited by 35 (5 self)
Given a parallel program represented by a task graph, the objective of a scheduling algorithm is to minimize the overall execution time of the program by properly assigning the nodes of the graph to the processors. This multiprocessor scheduling problem is NP-complete even with simplifying assumptions, and becomes more complex under relaxed assumptions such as arbitrary precedence constraints, and arbitrary task execution and communication times. The present literature on this topic is a large repertoire of heuristics that produce good solutions in a reasonable amount of time. These heuristics, however, have restricted applicability in a practical environment because they have a number of fundamental problems including high time complexity, lack of scalability, and no performance guarantee with respect to optimal solutions. Recently, genetic algorithms (GAs) have been widely reckoned as a useful vehicle for obtaining high quality or even optimal solutions for a broad range of combinato...
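A GA for this problem encodes a candidate schedule as a chromosome and evolves a population by selection, crossover, and mutation. The sketch below handles only the mapping of independent tasks (real task-graph GAs also encode execution order and honor precedence constraints); all parameters are illustrative:

```python
import random

# Deliberately tiny GA for the mapping step only: a chromosome assigns
# each task to a processor, and fitness is the resulting makespan.
# Independent tasks are assumed; population size, generations, and the
# mutation rate are arbitrary illustrative choices.

def makespan(assign, weights, n_procs):
    loads = [0] * n_procs
    for task, proc in enumerate(assign):
        loads[proc] += weights[task]
    return max(loads)

def ga_map(weights, n_procs, pop=30, gens=60, seed=0):
    rng = random.Random(seed)
    n = len(weights)
    popl = [[rng.randrange(n_procs) for _ in range(n)]
            for _ in range(pop)]
    for _ in range(gens):
        popl.sort(key=lambda c: makespan(c, weights, n_procs))
        survivors = popl[:pop // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:           # point mutation
                child[rng.randrange(n)] = rng.randrange(n_procs)
            children.append(child)
        popl = survivors + children
    return min(popl, key=lambda c: makespan(c, weights, n_procs))
```

For five tasks of weight (4, 3, 3, 2, 2) on two processors, the total work is 14, so the best achievable makespan is 7; the GA reliably finds a near-perfect split on this tiny instance.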
Dynamic, Competitive Scheduling of Multiple DAGs in a Distributed Heterogeneous Environment
1998
Cited by 35 (0 self)
With the advent of large scale heterogeneous environments, there is a need for matching and scheduling algorithms which can allow multiple DAG-structured applications to share the computational resources of the network. This paper presents a matching and scheduling framework where multiple applications compete for the computational resources on the network. In this environment, each application makes its own scheduling decisions. Thus, no centralized scheduling resource is required. Applications do not need direct knowledge of the other applications. The only knowledge of other applications arrives indirectly through load estimates (like queue lengths). This paper also presents algorithms for each portion of this scheduling framework. One of these algorithms is a modification of a static scheduling algorithm, the DLS algorithm, first presented by Sih and Lee [1]. Other algorithms attempt to predict the future task arrivals by modeling the task arrivals as Poisson random processes. A series of simulations are presented to examine the performance of these algorithms in this environment. These simulations also compare the performance of this environment to a more conventional, single user environment.
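The Poisson-based prediction step can be sketched simply: estimate an arrival rate from observed history, then extrapolate the expected number of competing arrivals over a candidate task's execution window. This is a hypothetical simplification of the paper's models:

```python
# Illustrative Poisson-arrival prediction. For a Poisson process with
# rate lam, the expected number of arrivals in an interval of length t
# is lam * t; here lam is estimated from the observed arrival history.

def predict_arrivals(arrival_times, now, exec_time):
    """Estimate the expected number of competing task arrivals during
    [now, now + exec_time] from the arrivals observed in [0, now]."""
    if not arrival_times or now <= 0:
        return 0.0
    rate = len(arrival_times) / now      # maximum-likelihood rate
    return rate * exec_time
```

A scheduler without direct knowledge of other applications could fold this estimate into its load predictions, exactly the kind of indirect information (like queue lengths) the framework relies on.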
Parallelizing Existing Applications in a Distributed Heterogeneous Environment
 4TH HETEROGENEOUS COMPUTING WORKSHOP (HCW '95)
1995
Cited by 34 (0 self)
Applications based upon the finite element method are well known for their demand for computational resources. An effective method for satisfying this demand is heterogeneous parallel computing. This paper presents the results obtained by applying heterogeneous computing to a large, existing finite element application code: CSTEM. A difficult problem associated with heterogeneous computing is the mapping and scheduling problem: the process of assigning the tasks of a parallel program to the individual processors. A simple assignment heuristic, Levelized Min Time (LMT), is presented, along with simulated results from applying the LMT algorithm to heterogeneous CSTEM on a variety of different heterogeneous machine clusters.
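A simplified reading of a levelized min-time style heuristic: partition the DAG into precedence levels, then within each level assign each task to the machine where it completes earliest. This sketch takes liberties (levels synchronize via a barrier) and all timings are made up:

```python
# Simplified levelized min-time style heuristic for heterogeneous
# machines. Not the paper's exact LMT algorithm: here each level acts as
# a barrier, and communication costs are ignored.

def levels(dag, all_tasks):
    """Assign each task its precedence level (0 for entry tasks)."""
    preds = {t: set() for t in all_tasks}
    for u, succs in dag.items():
        for v in succs:
            preds[v].add(u)
    lvl, assigned, depth = {}, set(), 0
    while len(assigned) < len(all_tasks):
        ready = [t for t in all_tasks
                 if t not in assigned and preds[t] <= assigned]
        for t in ready:
            lvl[t] = depth
        assigned |= set(ready)
        depth += 1
    return lvl

def lmt(dag, exec_time, n_machines):
    """exec_time[task][m]: time of task on machine m. Returns makespan."""
    tasks = list(exec_time)
    lvl = levels(dag, tasks)
    free = [0.0] * n_machines
    finish = 0.0
    for depth in range(max(lvl.values()) + 1):
        # Within a level, place larger tasks first, each on the machine
        # where it would complete earliest.
        level_tasks = sorted((t for t in tasks if lvl[t] == depth),
                             key=lambda t: -min(exec_time[t]))
        for t in level_tasks:
            m = min(range(n_machines),
                    key=lambda m: free[m] + exec_time[t][m])
            free[m] += exec_time[t][m]
            finish = max(finish, free[m])
        free = [finish] * n_machines  # barrier between levels
    return finish
```

The per-machine execution-time table is what makes this a heterogeneous heuristic: the same task can be cheap on one machine and expensive on another, and the min-time rule exploits exactly that asymmetry.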