Results 1–10 of 27
On Runtime Parallel Scheduling for Processor Load Balancing
 IEEE Trans. Parallel and Distributed Systems
, 1997
"... Parallel scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling is able to accurately balance the load by using global load information at compiletime or runtime. It provides highquality load balancing. This paper pre ..."
Abstract

Cited by 22 (0 self)
Parallel scheduling is a new approach to load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling can accurately balance the load by using global load information at compile time or runtime, providing high-quality load balancing. This paper presents an overview of the parallel scheduling technique. Scheduling algorithms for tree, hypercube, and mesh networks are presented. These algorithms can fully balance the load and maximize locality. 1. Introduction Static scheduling balances the workload before runtime and can be applied to problems with a predictable structure, which are called static problems. Dynamic scheduling performs scheduling activities concurrently at runtime and applies to problems with an unpredictable structure, which are called dynamic problems. Static scheduling uses knowledge of problem characteristics to reach a well-balanced load [1, 2, 3, 4]. However, it is not able to balance the load for dynami...
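The "fully balance the load" claim above rests on every processor seeing the same global load information; given that, each processor can independently compute an identical set of transfers. A minimal sketch of the idea, with an illustrative balance function that is not taken from the paper:

```python
# Toy global load-balancing step: given the global vector of per-processor
# task counts, compute transfers that equalize load to within one task.
# Every processor running this on the same vector derives the same plan.

def balance(loads):
    """Return a list of (src, dst, n) transfers that levels the load."""
    n = len(loads)
    total = sum(loads)
    # Each processor's target share, distributing the remainder evenly.
    target = [total // n + (1 if i < total % n else 0) for i in range(n)]
    senders = [(i, loads[i] - target[i]) for i in range(n)
               if loads[i] > target[i]]
    receivers = [(i, target[i] - loads[i]) for i in range(n)
                 if loads[i] < target[i]]
    transfers, si, ri = [], 0, 0
    while si < len(senders) and ri < len(receivers):
        s, have = senders[si]
        r, need = receivers[ri]
        moved = min(have, need)
        transfers.append((s, r, moved))
        senders[si] = (s, have - moved)
        receivers[ri] = (r, need - moved)
        if have == moved:
            si += 1
        if need == moved:
            ri += 1
    return transfers

loads = [9, 1, 4, 2]
for s, r, k in balance(loads):
    loads[s] -= k
    loads[r] += k
print(loads)  # [4, 4, 4, 4]
```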
Multipol: A Distributed Data Structure Library
 in Fifth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming
, 1995
"... Applications with dynamic data structures, unpredictable computational costs, and irregular data access patterns require substantial effort to parallelize. Much of their programming complexity comes from the implementation of distributed data structures. We describe a library of such data structures ..."
Abstract

Cited by 15 (2 self)
Applications with dynamic data structures, unpredictable computational costs, and irregular data access patterns require substantial effort to parallelize. Much of their programming complexity comes from the implementation of distributed data structures. We describe a library of such data structures, Multipol, which includes parallel versions of classic data structures such as trees, sets, lists, graphs, and queues. The library is built on a portable runtime layer that provides basic communication, synchronization, and caching. The data structures address the classic tradeoff between locality and load balance through a combination of replication, partitioning, and dynamic caching. To tolerate remote communication latencies, some of the operations are split into separate initiation and completion phases, allowing computation and communication to overlap at the library interface level. This leads to a form of relaxed consistency semantics for the data types. In this paper we give an o...
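The split-phase style mentioned in this abstract can be sketched as follows: an operation is split into an initiation that returns a handle immediately and a completion that waits on it, so local computation can overlap the (simulated) remote latency. The remote_get function and the store below are illustrative stand-ins, not Multipol's actual interface:

```python
# Split-phase operation sketch: initiate, overlap local work, complete.
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def remote_get(store, key, latency=0.05):
    """Initiation phase: start a 'remote' fetch and return a handle."""
    def fetch():
        time.sleep(latency)      # stand-in for communication latency
        return store[key]
    return _pool.submit(fetch)

store = {"x": 42}
handle = remote_get(store, "x")  # initiate the fetch
local = sum(range(1_000))        # overlap useful local computation
value = handle.result()          # completion phase: block only if needed
print(value)  # 42
```

Because the handle is resolved only at the completion call, reads issued between initiation and completion may not observe the latest remote state, which is the relaxed-consistency flavor the abstract alludes to.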
Distributed data structures and algorithms for Gröbner basis computation
 Lisp and Symbolic Computation
, 1994
"... We present the design and implementation of a parallel algorithm for computing Gröbner bases on distributed memory multiprocessors. The parallel algorithm is irregular both in space and time: the data structures are dynamic pointerbased structures and the computations on the structures have unpre ..."
Abstract

Cited by 11 (4 self)
We present the design and implementation of a parallel algorithm for computing Gröbner bases on distributed memory multiprocessors. The parallel algorithm is irregular both in space and time: the data structures are dynamic pointer-based structures, and the computations on the structures have unpredictable duration. The algorithm is presented as a series of refinements on a transition rule program, in which computation proceeds by nondeterministic invocations of guarded commands. Two key data structures, a set and a priority queue, are distributed across processors in the parallel algorithm. The data structures are designed for high throughput and latency tolerance, as appropriate for distributed memory machines. The programming style represents a compromise between shared-memory and message-passing models. The distributed nature of the data structures shows through their interface in that the semantics are weaker than with shared atomic objects, but they still provide a shared abstraction that can be used for reasoning about program correctness. In the data structure design there is a classic tradeoff between locality and load balance. We argue that this is best solved by designing scheduling structures in tandem with the state data structures, since the decision to replicate or partition state affects the overhead of dynamically moving tasks.
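The transition-rule style described in this abstract can be illustrated on a toy completion problem: the state is a basis set plus a pair queue, and a guarded rule fires while pairs remain. Using integers with remainder as the "reduction" turns the loop into GCD computation; this is an analogy for the guarded-command structure only, not the paper's Gröbner algorithm:

```python
# Toy transition-rule program: state = (basis set S, pair queue Q).
# Rule: while Q is nonempty, pop a pair, reduce one element by the
# other, and if the reduction is new, add it and enqueue its pairs.
import heapq

def complete(gens):
    S = set(gens)
    Q = [(a, b) for a in S for b in S if a < b]
    heapq.heapify(Q)
    while Q:                               # guard: pending pairs exist
        a, b = heapq.heappop(Q)            # rule: pick a pair
        r = a % b if a > b else b % a      # "reduce" one by the other
        if r != 0 and r not in S:          # guard: reduction is new
            for s in S:
                heapq.heappush(Q, tuple(sorted((r, s))))
            S.add(r)
    return S

basis = complete({12, 18, 27})
print(min(basis))  # 3, the gcd, ends up in the completed set
```

In the paper's distributed setting the set and the priority queue are each partitioned across processors, and the nondeterminism in pair selection is what allows rules to fire concurrently.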
Tree Shaped Computations as a Model for Parallel Applications
 In ALV'98 Workshop on application-based load balancing. SFB 342, TU München
, 1998
"... It is shown how a large class of applications can be parallelized by modeling them as tree shaped computations. In particular this class contains many highly irregular and completely unpredictable computations as they occur in heuristic search. ..."
Abstract

Cited by 11 (3 self)
It is shown how a large class of applications can be parallelized by modeling them as tree-shaped computations. In particular, this class contains many highly irregular and completely unpredictable computations as they occur in heuristic search.
Distributed mining of molecular fragments
 Proc. of IEEE DMGrid, Workshop on Data Mining and Grid of IEEE ICDM
, 2004
"... In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, highdimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this contex ..."
Abstract

Cited by 9 (3 self)
In real-world applications, sequential data mining and data exploration algorithms are often unsuitable for datasets of enormous size, high dimensionality, and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context it is necessary to develop high-performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large-scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute's HIV-screening dataset. We present experimental results on a small-scale computing environment.
Scalable Dynamic Load Balancing Using UPC
"... An asynchronous workstealing implementation of dynamic load balance is implemented using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark [1]. The UTS benchmark presents a synthetic treestructured search space that is highly imbalanced. Parallel implementatio ..."
Abstract

Cited by 9 (0 self)
An asynchronous work-stealing implementation of dynamic load balancing is developed using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark [1]. The UTS benchmark presents a synthetic tree-structured search space that is highly imbalanced. Parallel implementation of the search requires continuous dynamic load balancing to keep all processors engaged in the search. Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC [1] and MPI [2]. We observe parallel efficiency of 80% using 1024 processors performing over 85,000 total load balancing operations per second continuously. The UPC programming model provides substantial simplifications in the expression of the asynchronous work-stealing protocol compared with MPI. However, obtaining performance portability with UPC in both shared memory and distributed memory settings requires the careful use of one-sided reads and writes to minimize the impact of high-latency communication. Additional protocol improvements are made to improve dissemination of available work and to decrease the cost of termination detection.
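The stealing protocol this entry evaluates can be sketched in shared memory: each worker pops from its own deque (LIFO) and steals from the opposite end of a victim's deque (FIFO) when idle. The synthetic tree, names, and the simple counter-based termination scheme below are illustrative; the paper's UPC implementation uses one-sided reads/writes and a more refined termination-detection protocol:

```python
# Toy work-stealing tree search with per-worker deques.
import random
import threading
from collections import deque

def tree_search(num_workers, branching, depth, seed=1):
    """Count nodes of a deterministic random tree; the count is the
    same regardless of num_workers or the stealing schedule."""
    deques = [deque() for _ in range(num_workers)]
    locks = [threading.Lock() for _ in range(num_workers)]
    visited = [0] * num_workers
    outstanding = [1]                  # tasks created but not finished
    count_lock = threading.Lock()
    deques[0].append((depth, seed))    # root task on worker 0

    def worker(me):
        rng = random.Random(me)
        while True:
            task = None
            with locks[me]:
                if deques[me]:
                    task = deques[me].pop()              # own work: LIFO
            if task is None:
                victim = rng.randrange(num_workers)
                with locks[victim]:
                    if deques[victim]:
                        task = deques[victim].popleft()  # steal: FIFO
            if task is None:
                with count_lock:
                    if outstanding[0] == 0:
                        return          # quiescence: all work is done
                continue
            d, s = task
            visited[me] += 1
            kids = []
            if d > 0:                   # expand the node deterministically
                node_rng = random.Random(s)
                kids = [(d - 1, node_rng.randrange(1 << 30))
                        for _ in range(node_rng.randrange(branching + 1))]
            with count_lock:            # count kids before exposing them
                outstanding[0] += len(kids) - 1
            with locks[me]:
                deques[me].extend(kids)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(visited)

print(tree_search(4, 3, 8))  # same count for any number of workers
```

Updating the outstanding-task counter before pushing children ensures the counter never undercounts pending work, so no worker can observe a spurious zero and exit early.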
Runtime Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures
 Journal of Parallel and Distributed Computing
, 1997
"... Automatic scheduling for directed acyclic graphs (DAG) and its applications for coarsegrained irregular problems such as large nbody simulation have been studied in the literature. However solving irregular problems with mixed granularities such as sparse matrix factorization is challenging since ..."
Abstract

Cited by 8 (3 self)
Automatic scheduling for directed acyclic graphs (DAGs) and its applications to coarse-grained irregular problems such as large n-body simulation have been studied in the literature. However, solving irregular problems with mixed granularities, such as sparse matrix factorization, is challenging since it requires efficient runtime support to execute a DAG schedule. In this paper, we investigate runtime optimization techniques for executing general asynchronous DAG schedules on distributed memory machines and discuss an approach for exploiting parallelism from commuting operations in the DAG model. Our solution tightly integrates the runtime scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying. We present a consistency model incorporating the above optimizations and taking advantage of task dependence properties to ensure the correctness of execution. We demonstrate the applications of this scheme in sparse matrix factorizations ...
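The core runtime obligation in this abstract, executing a DAG schedule so that no task runs before its dependences, can be sketched with dependence counters: a task becomes ready when its predecessor count reaches zero. This single-process stand-in (names illustrative) omits the distributed-memory communication the paper focuses on:

```python
# Dependence-counting DAG executor sketch.
from collections import deque

def run_dag(deps, action):
    """deps: task -> set of predecessor tasks. Runs every task exactly
    once, never before its predecessors; returns the execution order."""
    remaining = {t: len(p) for t, p in deps.items()}
    succs = {t: [] for t in deps}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    ready = deque(t for t, c in remaining.items() if c == 0)
    order = []
    while ready:
        t = ready.popleft()
        action(t)                      # execute the task body
        order.append(t)
        for s in succs[t]:             # release successors
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return order

deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(run_dag(deps, lambda t: None))  # ['A', 'B', 'C', 'D']
```

In a distributed setting the decrement messages themselves are the communication being optimized, which is why the paper couples this counting scheme tightly with a low-overhead messaging layer.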
Parallelizing the Phylogeny Problem
 In Supercomputing '95
, 1994
"... The problem of determining the evolutionary history of species in the form of phylogenetic trees is known as the phylogeny problem. We present a parallelization of the character compatibility method for solving the phylogeny problem. Abstractly, the algorithm searches through all subsets of characte ..."
Abstract

Cited by 8 (3 self)
The problem of determining the evolutionary history of species in the form of phylogenetic trees is known as the phylogeny problem. We present a parallelization of the character compatibility method for solving the phylogeny problem. Abstractly, the algorithm searches through all subsets of characters, which may be traits like opposable thumbs or DNA sequence values, looking for a maximal consistent subset. The notion of consistency in this case is the existence of a particular kind of phylogenetic tree called a perfect phylogeny tree. The two challenges to achieving an efficient implementation are load balancing and efficient sharing of information to enable pruning. In both cases, there is a tradeoff between communication overhead and the quality of the solution. For load balancing we use a distributed task queue, which has imperfect load information but avoids centralization bottlenecks. To prune the search space, we use the following property: If a perfect phylogeny tree does not ...
Symmetrical Hopping: A Scalable Scheduling Algorithm for Irregular Problems
 Practice and Experience
, 1995
"... A runtime support is necessary for parallel computations with irregular and dynamic structures. One important component in the support system is the runtime scheduler which balances the working load in the system. We present a new algorithm, Symmetrical Hopping, for dynamic scheduling of ultralight ..."
Abstract

Cited by 7 (0 self)
Runtime support is necessary for parallel computations with irregular and dynamic structures. One important component in the support system is the runtime scheduler, which balances the workload in the system. We present a new algorithm, Symmetrical Hopping, for dynamic scheduling of ultra-lightweight processes. It is a dynamic, distributed, adaptive, and scalable scheduling algorithm. This algorithm is described and compared to four other algorithms that have been proposed in this context, namely randomized allocation, sender-initiated scheduling, receiver-initiated scheduling, and the gradient model. The performance of these algorithms on the Intel Touchstone Delta is presented. The experimental results show that the Symmetrical Hopping algorithm achieves much better performance due to its adaptiveness. 1. Introduction Large distributed-memory parallel machines are becoming increasingly available. To use such large machines efficiently to solve an application problem, th...
Efficient Runtime Support for Irregular Task Computations with Mixed Granularities
 In Proceedings of IEEE International Parallel Processing Symposium
, 1996
"... Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient runtime system for executing general asynchronous DAG schedules on distributed memory machines. Our solution tightly integrates the runtime scheme with a fast ..."
Abstract

Cited by 7 (4 self)
Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient runtime system for executing general asynchronous DAG schedules on distributed memory machines. Our solution tightly integrates the runtime scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying, and takes advantage of task dependence properties to ensure the correctness of execution. We demonstrate the applications of this scheme in sparse LU and Cholesky factorizations, for which actual speedups have been hard to obtain in the literature because parallelism in these problems is irregular and limited. Our experiments on the Meiko CS-2 show the promising results of our approach in exploiting irregular task parallelism with mixed granularities. 1 Introduction Solving irregular problems, where patterns of computations and communications are unstructured and/or changing dynamically, is very important...