Results 11-20 of 124
Automated Parallelization of Discrete State-space Generation
Journal of Parallel and Distributed Computing, 1997
Cited by 21 (2 self)
We consider the problem of generating a large state-space in a distributed fashion. Unlike previously proposed solutions that partition the set of reachable states according to a hashing function provided by the user, we explore heuristic methods that completely automate the process. The first step is an initial random walk through the state space to initialize a search tree, duplicated in each processor. Then, the reachability graph is built in a distributed way, using the search tree to assign each newly found state to classes assigned to the available processors. Furthermore, we explore two remapping criteria that attempt to balance memory usage or future workload, respectively. We show how the cost of computing the global snapshot required for remapping will scale up for system sizes in the foreseeable future. An extensive set of results is presented to support our conclusions that remapping is extremely beneficial.
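The partitioning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a sorted list of sampled states stands in for the search tree, states are assumed to be tuples compared lexicographically, and the names `build_splitters`, `class_of`, and `remap_by_memory` are hypothetical. The remapping shown is a simple greedy heuristic that balances stored-state counts only.

```python
import bisect

def build_splitters(sample_states, num_classes):
    """Pick evenly spaced sampled states as class boundaries.

    Assumption: a sorted list replaces the search tree described in the
    abstract; every processor would hold an identical copy of it.
    """
    ordered = sorted(set(sample_states))
    step = max(1, len(ordered) // num_classes)
    return ordered[step::step][: num_classes - 1]

def class_of(state, splitters):
    """Map a state to its class index with a binary search."""
    return bisect.bisect_right(splitters, state)

def remap_by_memory(class_sizes, num_procs):
    """Greedy remap: largest classes first, each to the least-loaded processor.

    class_sizes maps class index -> number of states stored in that class.
    Returns (owner, loads): class -> processor, and states per processor.
    """
    loads = [0] * num_procs
    owner = {}
    for cls in sorted(class_sizes, key=class_sizes.get, reverse=True):
        proc = loads.index(min(loads))
        owner[cls] = proc
        loads[proc] += class_sizes[cls]
    return owner, loads

# Hypothetical sample, as if collected by an initial random walk.
sample = [(i % 7, i % 5) for i in range(35)]
splitters = build_splitters(sample, num_classes=4)
owner, loads = remap_by_memory({0: 10, 1: 4, 2: 3, 3: 3}, num_procs=2)
```

Using more classes than processors is what makes remapping possible in this sketch: whole classes can change owner without rebuilding the classifier.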
Performance Analysis of a Distributed Question Answering System
IEEE Transactions on Parallel and Distributed Systems, 2002
Cited by 20 (3 self)
The problem of question/answering (Q/A) is to find answers to open-domain questions by searching large collections of documents. Unlike information retrieval systems, very common today in the form of Internet search engines, Q/A systems do not retrieve documents, but instead provide short, relevant answers located in small fragments of text. This enhanced functionality comes with a price: Q/A systems are significantly slower and require more hardware resources than information retrieval systems. This paper proposes a distributed Q/A architecture that enhances the system throughput through the exploitation of inter-question parallelism and dynamic load balancing, and reduces the individual question response time through the exploitation of intra-question parallelism. Inter- and intra-question parallelism are both exploited using several scheduling points: one before the Q/A task is started, and two embedded in the Q/A task. An analytical performance model is introduced. The model analyzes both the inter-question parallelism overhead generated by the migration of questions, and the intra-question parallelism overhead generated by the partitioning of the Q/A task. The analytical model indicates that both question migration and partitioning are required for a high-performance system: intra-question ...
Nearest Neighbor Algorithms for Load Balancing in Parallel Computers
1995
Cited by 19 (2 self)
With nearest neighbor load balancing algorithms, a processor makes balancing decisions based on localized workload information and manages workload migrations within its neighborhood. This paper compares a couple of fairly well-known nearest neighbor algorithms, the dimension-exchange (DE, for short) and the diffusion (DF, for short) methods, and their several variants: the average dimension-exchange (ADE), the optimally-tuned dimension-exchange (ODE), the local average diffusion (ADF) and the optimally-tuned diffusion (ODF). The measures of interest are their efficiency in driving any initial workload distribution to a uniform distribution and their ability in controlling the growth of the variance among the processors' workloads. The comparison is made with respect to both one-port and all-port communication architectures and in consideration of various implementation strategies including synchronous/asynchronous invocation policies and static/dynamic random workload behaviors. It t...
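To make the two base methods concrete, here is a minimal sketch of one dimension-exchange sweep and one synchronous diffusion step on a hypercube. It assumes real-valued (infinitely divisible) loads, node `i` neighboring node `i ^ (1 << k)` in dimension `k`, and illustrative parameter values; the function names are hypothetical and none of the paper's tuned variants are reproduced.

```python
def dimension_exchange_sweep(load, dims, lam=0.5):
    """One DE sweep: balance with one neighbor per dimension, in turn.

    load has 2**dims entries; lam is the exchange parameter.
    """
    w = list(load)
    for k in range(dims):
        nxt = list(w)
        for i in range(len(w)):
            j = i ^ (1 << k)  # neighbor across dimension k
            nxt[i] = (1 - lam) * w[i] + lam * w[j]
        w = nxt
    return w

def diffusion_step(load, dims, alpha):
    """One diffusion step: exchange with all neighbors simultaneously.

    alpha is the diffusion parameter; each node moves alpha times the
    load difference along every incident link.
    """
    out = []
    for i, w_i in enumerate(load):
        flow = sum(load[i ^ (1 << k)] - w_i for k in range(dims))
        out.append(w_i + alpha * flow)
    return out

start = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # 3-dimensional hypercube
after_de = dimension_exchange_sweep(start, dims=3)
after_df = diffusion_step(start, dims=3, alpha=0.25)
```

With `lam=0.5`, a single sweep over all three dimensions leaves every node with the average load of 1.0, whereas one diffusion step only spreads load to the direct neighbors of node 0 and needs further iterations to converge. Both steps conserve the total load.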
Data-Parallel Load Balancing Strategies
Parallel Computing, 1996
Cited by 19 (0 self)
Programming irregular and dynamic data-parallel algorithms requires taking data distribution into account. Implementing a load balancing algorithm is quite a difficult task for the programmer. However, a load balancing strategy may be developed independently of the application. The integration of such a strategy in the data-parallel algorithm may be relevant to a library or a data-parallel compiler runtime. We propose load distribution data-parallel algorithms for a class of irregular data-parallel algorithms called stack algorithms. Our algorithms allow the use of regular and/or irregular communication patterns to exchange work between processors. The results of a theoretical analysis of these algorithms are presented. They allow a comparison of the different load balancing algorithms and the identification of criteria for the choice of a load balancing algorithm.
Runtime incremental parallel scheduling (RIPS) on distributed memory computers
In Proceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation, 1995
Cited by 18 (9 self)
Runtime Incremental Parallel Scheduling (RIPS) is an alternative strategy to the commonly used dynamic scheduling. In this scheduling strategy, the system scheduling activity alternates with the underlying computation work. RIPS utilizes the advanced parallel scheduling technique to produce low-overhead, high-quality load balancing, as well as adapting to irregular applications. This paper presents methods for scheduling a single job on a dedicated parallel machine.
Fast Priority Queues for Parallel Branch-and-Bound
In Workshop on Algorithms for Irregularly Structured Problems, number 980 in LNCS, 1995
Cited by 15 (2 self)
Currently used parallel best first branch-and-bound algorithms either suffer from contention at a centralized priority queue or can only approximate the best first strategy. Bottleneck free algorithms for parallel priority queues are known but they cannot be implemented very efficiently on contemporary machines. We present quite simple randomized algorithms for parallel priority queues on distributed memory machines. For branch-and-bound they are asymptotically as efficient as previously known PRAM algorithms with high probability. The simplest versions require not much more communication than the approximated branch-and-bound algorithm of Karp and Zhang. Keywords: Analysis of randomized algorithms, distributed memory, load balancing, median selection, parallel best first branch-and-bound, parallel priority queue.
Dynamic Load Balancing for Structured Adaptive Mesh Refinement Applications
Proc. of the 30th International Conference on Parallel Processing, 2001
Cited by 14 (4 self)
Adaptive Mesh Refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper, we present an efficient DLB scheme for Structured AMR (SAMR) applications. Our DLB scheme combines a grid-splitting technique with direct grid movements (e.g., direct movement from an overloaded processor to an underloaded processor), for which the objective is to efficiently redistribute workload among all the processors so as to reduce the parallel execution time. The potential benefits of our DLB scheme are examined by incorporating our techniques into a parallel, cosmological application that uses SAMR techniques. Experiments show that by using our scheme, the parallel execution time can be reduced by up to 47% and the quality of load balancing can be improved by a factor of four.
An Analytical Comparison of Nearest Neighbor Algorithms for Load Balancing in Parallel Computers
1995
Cited by 13 (2 self)
With nearest neighbor load balancing algorithms, a processor makes balancing decisions based on its local information and manages workload migrations within its neighborhood. This paper compares a couple of fairly well-known nearest neighbor algorithms, the dimension exchange and the diffusion methods and their variants, in terms of their performance in both one-port and all-port communication architectures. It turns out that the dimension exchange method outperforms the diffusion method in the one-port communication model, and that the strength of the diffusion method is in asynchronous implementations in the all-port communication model. The underlying communication networks considered assume the most popular topologies, the mesh and the torus and their special cases: the hypercube and the k-ary n-cube.
A linear time delay model for studying load balancing instabilities in parallel computations
The International Journal of System Science, 2003
Cited by 12 (12 self)
A linear time-delay system is used to model load balancing in a cluster of computer nodes used for parallel computations. The linear model is analyzed for stability in terms of the delays in the transfer of information between nodes and the gains in the load balancing algorithm. This model is compared with an experimental implementation of the algorithm on a parallel computer network.
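The interplay of gain and delay mentioned in this abstract can be seen in a toy discrete-time simulation. The sketch below is an assumption-laden illustration, not the model from the paper: two nodes, a single hypothetical `gain`, and a transfer decision based on queue lengths that are `delay` steps old. Under these assumptions the load gap d obeys d(t+1) = d(t) - gain * d(t - delay).

```python
def simulate_gap(gain, delay, steps, x1=100.0, x2=0.0):
    """Return |x1 - x2| after each step of a two-node linear balancer.

    Each step moves 0.5 * gain * (delayed gap) from node 1 to node 2.
    The model is linear, so loads may go negative when it is unstable.
    """
    hist = [(x1, x2)] * (delay + 1)  # constant history before time 0
    gaps = []
    for _ in range(steps):
        old1, old2 = hist[0]    # stale information, `delay` steps old
        cur1, cur2 = hist[-1]   # true current loads
        transfer = 0.5 * gain * (old1 - old2)
        hist.append((cur1 - transfer, cur2 + transfer))
        hist.pop(0)
        gaps.append(abs(hist[-1][0] - hist[-1][1]))
    return gaps

no_delay = simulate_gap(gain=1.5, delay=0, steps=40)   # gap shrinks
low_gain = simulate_gap(gain=0.5, delay=1, steps=40)   # gap shrinks
high_gain = simulate_gap(gain=1.5, delay=1, steps=40)  # gap oscillates and grows
```

In this toy setting the same gain of 1.5 that balances the nodes with fresh information becomes unstable once the information is one step old, while a smaller gain tolerates the delay.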