Results 1 - 10
of
48
Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
- IEEE Transactions on Parallel and Distributed Systems
, 1994
"... Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempt to achieve the minimum completion time by distributing the workload as evenly ..."
Abstract
-
Cited by 133 (2 self)
- Add to MetaCart
Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempt to achieve the minimum completion time by distributing the workload as evenly as possible, while minimizing the number of synchronization operations required. In this paper we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to non-local data. We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. We propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of this new algorithm to ot...
Adaptive Computing on the Grid Using AppLeS
, 2003
"... Ensembles of distributed, heterogeneous resources, also known as Computational Grids are emerging as critical platforms for high-performance and resource-intensive applications. Such platforms provide the potential for applications to aggregate enormous bandwidth, computational power, memory, second ..."
Abstract
-
Cited by 90 (7 self)
- Add to MetaCart
Ensembles of distributed, heterogeneous resources, also known as Computational Grids are emerging as critical platforms for high-performance and resource-intensive applications. Such platforms provide the potential for applications to aggregate enormous bandwidth, computational power, memory, secondary storage, and other resources during a single execution. However, achieving this performance potential in dynamic, heterogeneous environments is challenging. Recent experience with distributed applications indicates that adaptivity is fundamental to achieving application performance in dynamic Grid environments. The AppLeS (Application Level Scheduling) project provides a methodology, application software, and software environments for adaptively scheduling and deploying applications in dynamic, heterogeneous, multi-user Grid environments. In this paper, we discuss the AppLeS project and outline our results.
The Organic Grid: Self-Organizing Computation on a Peer-to-Peer Network
- IEEE Transactions on Systems, Man, and Cybernetics
, 2004
"... Desktop grids have recently been used to perform some of the largest computations in the world and have the potential to grow by several more orders of magnitude. However, current approaches to utilizing desktop resources require either centralized servers or extensive knowledge of the underlying sy ..."
Abstract
-
Cited by 38 (4 self)
- Add to MetaCart
Desktop grids have recently been used to perform some of the largest computations in the world and have the potential to grow by several more orders of magnitude. However, current approaches to utilizing desktop resources require either centralized servers or extensive knowledge of the underlying system, limiting their scalability.
Adaptive Scheduling of Master/Worker Applications on Distributed Computational Resources
, 2001
"... xvi 1 ..."
Space-Efficient Scheduling of Nested Parallelism
- ACM Transactions on Programming Languages and Systems
, 1999
"... This article presents an on-line scheduling algorithm that is provably space e#cient and time e#cient for nested-parallel languages. For a computation with depth D and serial space requirement S1 , the algorithm generates a schedule that requires at most S1 +O(K D p)space (including scheduler spa ..."
Abstract
-
Cited by 28 (5 self)
- Add to MetaCart
This article presents an on-line scheduling algorithm that is provably space e#cient and time e#cient for nested-parallel languages. For a computation with depth D and serial space requirement S1 , the algorithm generates a schedule that requires at most S1 +O(K D p)space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-o# between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time e#cient in theory, we demonstrate that it is e#ective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance
Dependence Uniformization: A Loop Parallelization Technique
, 1991
"... In general, any nested loop can be parallelized as long as all dependence constraints among iterations are preserved by applying appropriate synchronizations. However, the performance is significantly affected by the synchronization overhead. In case of irregular and complex dependence constraints, ..."
Abstract
-
Cited by 26 (0 self)
- Add to MetaCart
In general, any nested loop can be parallelized as long as all dependence constraints among iterations are preserved by applying appropriate synchronizations. However, the performance is significantly affected by the synchronization overhead. In case of irregular and complex dependence constraints, it is very difficult to efficiently and systematically arrange synchronization primitives. In this paper, we propose a new method, data dependence uniformization, to overcome the difficulties in parallelizing a doubly nested loop with irregular dependence constraints. Our approach is based on the concept of vector decomposition. A simple set of basic dependences is developed of which all dependence constraints can be composed. The set of basic dependences then will be added to every iteration to replace all original dependences so that the dependence constraints become uniform. Finally, an efficient synchronization method is presented to obey the uniform dependence constraints in every iter...
Adaptively Scheduling Parallel Loops in Distributed Shared-Memory Systems
- IEEE Trans. on Parallel and Distributed Systems
, 1997
"... Using runtime information of load distributions and processor affinity, we propose an adaptive scheduling algorithm and its variations from different control mechanisms. The proposed algorithm applies different degrees of aggressiveness to adjust loop scheduling granularities, aiming at improving ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
Using runtime information of load distributions and processor affinity, we propose an adaptive scheduling algorithm and its variations from different control mechanisms. The proposed algorithm applies different degrees of aggressiveness to adjust loop scheduling granularities, aiming at improving the execution performance of parallel loops by making scheduling decisions that match the real workload distributions at runtime. We experimentally compared the performance of our algorithm and its variations with several existing scheduling algorithms on two parallel machines: the KSR-1 and the Convex Exemplar. The kernel application programs we used for performance evaluation were carefully selected for different classes of parallel loops. Our results show that using runtime information to adaptively adjust scheduling granularity is an effective way to handle loops with a wide range of load distributions when no prior knowledge of the execution can be used. The overhead caused by coll...
ParC - An Extension of C for Shared Memory Parallel Processing
"... this paper we describe the features and semantics of ParC. The rest of this section explains the motivation for designing a new language, the eect of the motivating forces on the design, and the structure of the software environment that surrounds it. The next section describes the parallel construc ..."
Abstract
-
Cited by 15 (11 self)
- Add to MetaCart
this paper we describe the features and semantics of ParC. The rest of this section explains the motivation for designing a new language, the eect of the motivating forces on the design, and the structure of the software environment that surrounds it. The next section describes the parallel constructs and scoping rules. The exact semantics of parallel constructs when there are more activities than processors have been widely neglected in the literature. We discuss this issue and provide guidelines for acceptable implementations. We then describe the innovative instructions for forced termination, which are based on analogies with C instructions that break out of a construct, followed by a discussion of synchronization mechanisms. A discussion of the programming methodology of ParC is then given and is followed by a discussion of our experiences with ParC . A comparison of ParC with other parallel programming languages is delayed until the end of the paper, after we have described all of its features
Space-Efficient Implementation of Nested Parallelism
- In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, 1996
"... Many of today's high level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assig ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
Many of today's high level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. This paper presents a scheduling algorithm that is provably space-efficient and time-efficient for nested parallel languages. In addition to proving the space and time bounds of the parallel schedule generated by the algorithm, we demonstrate that it is efficient in practice. We have implemented a runtime system that uses our algorithm to schedule parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to...

