Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 133 (2 self)
Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempt to achieve the minimum completion time by distributing the workload as evenly as possible, while minimizing the number of synchronization operations required. In this paper we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to non-local data. We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. We propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of this new algorithm to ot...
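The idea of co-locating iterations with their data can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the algorithm from the paper: each processor is assumed to own a contiguous block of iterations, to take a `1/k` fraction of its own remaining block at a time, and to steal a `1/P` fraction from the most loaded processor once its own block is empty. The function names and the fractions are illustrative choices.

```python
from math import ceil

def initial_partition(n_iters, n_procs):
    """Split iterations 0..n_iters-1 into contiguous half-open ranges [lo, hi)."""
    base, extra = divmod(n_iters, n_procs)
    queues, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        queues.append([start, start + size])
        start += size
    return queues

def next_chunk(queues, proc, k):
    """Return (lo, hi, owner) for the next chunk `proc` runs, or None if no work is left."""
    n_procs = len(queues)
    lo, hi = queues[proc]
    if hi > lo:
        # Local work remains: take 1/k of it from the front (assumed fraction).
        take = ceil((hi - lo) / k)
        queues[proc][0] = lo + take
        return (lo, lo + take, proc)
    # Local queue is empty: steal from the tail of the most loaded queue.
    victim = max(range(n_procs), key=lambda q: queues[q][1] - queues[q][0])
    vlo, vhi = queues[victim]
    if vhi == vlo:
        return None
    take = ceil((vhi - vlo) / n_procs)  # assumed steal fraction of 1/P
    queues[victim][1] = vhi - take
    return (vhi - take, vhi, victim)

def simulate(n_iters, n_procs, k):
    """Round-robin simulation; returns the iterations in the order they were claimed."""
    queues = initial_partition(n_iters, n_procs)
    executed = []
    active = True
    while active:
        active = False
        for p in range(n_procs):
            chunk = next_chunk(queues, p, k)
            if chunk is not None:
                lo, hi, _owner = chunk
                executed.extend(range(lo, hi))
                active = True
    return executed
```

In this sketch a processor touches another processor's block only after exhausting its own, which is what keeps most iterations next to the data they were first assigned with.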
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers, 1991
Adaptive Computing on the Grid Using AppLeS
2003
Cited by 90 (7 self)
Ensembles of distributed, heterogeneous resources, also known as Computational Grids, are emerging as critical platforms for high-performance and resource-intensive applications. Such platforms provide the potential for applications to aggregate enormous bandwidth, computational power, memory, secondary storage, and other resources during a single execution. However, achieving this performance potential in dynamic, heterogeneous environments is challenging. Recent experience with distributed applications indicates that adaptivity is fundamental to achieving application performance in dynamic Grid environments. The AppLeS (Application Level Scheduling) project provides a methodology, application software, and software environments for adaptively scheduling and deploying applications in dynamic, heterogeneous, multi-user Grid environments. In this paper, we discuss the AppLeS project and outline our results.
Customized Dynamic Load Balancing for a Network of Workstations
1997
Cited by 67 (0 self)
In this paper we show that different load balancing schemes are best for different applications under varying program and system parameters. Therefore, application-driven customized dynamic load balancing becomes essential for good performance. We present a hybrid compile-time and run-time modeling and decision process which selects (customizes) the best scheme, along with automatic generation of parallel code with calls to a run-time library for load balancing. © 1997 Academic Press.
Adaptive Scheduling of Master/Worker Applications on Distributed Computational Resources
2001
Compile-time Scheduling Algorithms for Heterogeneous Network of Workstations
The Computer Journal, 1997
Cited by 36 (1 self)
In this paper, we study the problem of scheduling parallel loops at compile-time for a heterogeneous network of workstations. We consider heterogeneity in various aspects of parallel programming: program, processor, memory and network. A heterogeneous program has parallel loops with different amounts of work in each iteration; heterogeneous processors have different speeds; heterogeneous memory refers to the different amounts of user-available memory on the machines; and a heterogeneous network has different costs of communication between processors. We propose a simple yet comprehensive model for use in compiling for a network of processors, and develop compiler algorithms for generating optimal and
Reform Prolog: The Language and its Implementation
In Proc. of the 10th Int'l Conference on Logic Programming, 1993
Cited by 26 (3 self)
Reform Prolog is a (dependent) AND-parallel system based on recursion-parallelism and Reform compilation. The system supports selective, user-declared parallelization of binding-deterministic Prolog programs (nondeterminism local to each parallel process is allowed). The implementation extends a conventional Prolog machine with support for data sharing and process management. Extensive global dataflow analysis is employed to facilitate parallelization. Promising performance figures, showing high parallel efficiency and low overhead for parallelization, have been obtained on a 24-processor shared-memory multiprocessor. The high performance is due to efficient process management and scheduling, made possible by the execution model.
1 INTRODUCTION
Most systems for AND-parallel logic programming define the procedural meaning of conjunction to be inherently parallel. These designs are based on an ambition to maximize the amount of parallelism in computations. We present and evaluate an approa...
Parallel Classification for Data Mining on Shared-Memory Multiprocessors
1998
Cited by 25 (2 self)
We present parallel algorithms for building decision-tree classifiers on shared-memory multiprocessor (SMP) systems. The proposed algorithms span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This basic scheme is extended with task pipelining and dynamic load balancing to yield faster implementations. The task parallel approach uses dynamic subtree partitioning among processors. Our performance evaluation shows that the construction of a decision-tree classifier can be effectively parallelized on an SMP machine with good speedup.
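Attribute scheduling among processors can be pictured as workers dynamically claiming the next unprocessed attribute from a shared counter. The following is a minimal sketch of that general pattern, not the paper's implementation: `evaluate_attributes` and the `evaluate` callback (a stand-in for whatever per-attribute work a tree builder does, such as scoring candidate splits) are hypothetical names.

```python
import threading

def evaluate_attributes(attributes, evaluate, n_workers):
    """Evaluate each attribute once; workers claim indices from a shared counter."""
    results = [None] * len(attributes)
    next_index = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_index
        while True:
            # Claim the next attribute index under the lock (dynamic scheduling).
            with lock:
                i = next_index
                next_index += 1
            if i >= len(attributes):
                return
            # Each index is claimed by exactly one worker, so this write is safe.
            results[i] = evaluate(attributes[i])

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because a worker asks for new work only when it finishes its current attribute, attributes that are expensive to evaluate do not hold up the others. In CPython the global interpreter lock limits real speedup for pure-Python work, so this shows the scheduling pattern rather than a performance result.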
Loop scheduling for heterogeneity
1994
Cited by 24 (4 self)
In this paper, we study the problem of scheduling parallel loops at compile-time for a heterogeneous network of machines. We consider heterogeneity in three aspects of parallel programming: program, processor and network. A heterogeneous program has parallel loops with different amounts of work in each iteration; heterogeneous processors have different speeds; and a heterogeneous network has different costs of communication between processors. We propose a simple yet comprehensive model for use in compiling for a network of processors, and develop compiler algorithms for generating optimal and sub-optimal schedules of loops for load balancing, communication optimizations and network contention. Experiments show that a significant improvement in performance is achieved using our techniques.
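The simplest case of processor heterogeneity can be sketched as a static partition in which each processor receives a share of iterations proportional to its relative speed. This is an illustrative baseline only, assuming uniform work per iteration and ignoring communication cost and network contention, all of which the paper's model does consider; `proportional_partition` and the `speeds` list are hypothetical names.

```python
def proportional_partition(n_iters, speeds):
    """Assign iteration counts proportional to relative processor speeds.

    Assumes every iteration costs the same and communication is free.
    """
    total = sum(speeds)
    ideal = [n_iters * s / total for s in speeds]
    counts = [int(x) for x in ideal]  # round each share down first
    leftover = n_iters - sum(counts)
    # Hand the remaining iterations to the largest fractional remainders.
    order = sorted(range(len(speeds)),
                   key=lambda p: ideal[p] - counts[p],
                   reverse=True)
    for p in order[:leftover]:
        counts[p] += 1
    return counts
```

For example, with relative speeds `[1, 1, 2]` and 100 iterations, the fastest processor is assigned half of the loop.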
Adaptively Scheduling Parallel Loops in Distributed Shared-Memory Systems
IEEE Trans. on Parallel and Distributed Systems, 1997
Cited by 21 (3 self)
Using runtime information of load distributions and processor affinity, we propose an adaptive scheduling algorithm and its variations from different control mechanisms. The proposed algorithm applies different degrees of aggressiveness to adjust loop scheduling granularities, aiming at improving the execution performance of parallel loops by making scheduling decisions that match the real workload distributions at runtime. We experimentally compared the performance of our algorithm and its variations with several existing scheduling algorithms on two parallel machines: the KSR-1 and the Convex Exemplar. The kernel application programs we used for performance evaluation were carefully selected for different classes of parallel loops. Our results show that using runtime information to adaptively adjust scheduling granularity is an effective way to handle loops with a wide range of load distributions when no prior knowledge of the execution can be used. The overhead caused by coll...
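Adjusting scheduling granularity from runtime information can be pictured with a toy rule. The abstract does not give the actual control mechanisms, so everything here is an assumption for illustration: a processor takes `1/k` of its remaining iterations per chunk, and the hypothetical `adapt_divisor` function raises or lowers `k` by comparing the processor's completed iterations with the average across processors.

```python
def adapt_divisor(k, done, avg_done, k_min=2, k_max=64):
    """Hypothetical rule: return the new chunk divisor k for one processor.

    A larger k means smaller chunks (finer granularity).
    """
    if done < avg_done:
        # Lagging behind the average: take smaller chunks so more work
        # stays available for other processors to pick up.
        return min(k * 2, k_max)
    if done > avg_done:
        # Ahead of the average: take larger chunks to cut scheduling overhead.
        return max(k // 2, k_min)
    return k
```

The doubling and halving steps and the bounds `k_min` and `k_max` are arbitrary; a more or less aggressive policy would change only those steps.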

