Results 1 - 10 of 51
Efficient External Memory Algorithms by Simulating Coarse-Grained Parallel Algorithms
, 2003
Cited by 39 (10 self)
External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing.
Solving Large FPT Problems On Coarse Grained Parallel Machines
Cited by 15 (1 self)
Fixed-parameter tractability (FPT) techniques have recently been successful in solving NP-complete problem instances of practical importance which were too large to be solved with previous methods. In this paper we show how to enhance this approach through the addition of parallelism, thereby allowing even larger problem instances to be solved in practice. More precisely, we demonstrate the potential of parallelism when applied to the bounded tree search phase of FPT algorithms. We apply our methodology to the k-Vertex Cover problem, which has important applications, e.g., in multiple sequence alignments for computational biochemistry. We have implemented our parallel FPT method and application-specific "plug-in" code for the k-Vertex Cover problem using C and the MPI communication library, and tested it on a network of 10 Sun SPARC workstations. This is the first experimental examination of parallel FPT techniques. In our experiments, we obtain excellent speedup results. Not only do we achieve a speedup of p in most cases; many cases even exhibit a superlinear speedup. The latter result implies that our parallel methods, when simulated on a single processor, also yield a significant improvement over existing sequential methods.
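The bounded tree search that the parallel method targets can be sketched as follows. This is a minimal Python sketch, not the authors' C/MPI code: `vc_search` is the textbook branching rule for k-Vertex Cover (every cover contains an endpoint of any uncovered edge, so branch on both endpoints), and the hypothetical `frontier` helper expands the top of the tree to a fixed depth so that each resulting subtree could be handed to a different processor. The workers are simulated one after another, and all function names and the `depth` parameter are illustrative assumptions.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[int, int]
Task = Tuple[List[Edge], int, FrozenSet[int]]  # (remaining edges, budget, partial cover)


def vc_search(edges: List[Edge], k: int) -> Optional[Set[int]]:
    """Bounded search tree: return a vertex cover of size <= k, or None."""
    if not edges:
        return set()          # nothing left to cover
    if k == 0:
        return None           # budget exhausted but edges remain
    u, v = edges[0]
    # Every cover contains u or v, so branch on both (tree depth <= k).
    for w in (u, v):
        rest = [e for e in edges if w not in e]
        sub = vc_search(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None


def frontier(edges: List[Edge], k: int, depth: int) -> List[Task]:
    """Hypothetical helper: expand the top `depth` levels of the search tree.
    Each returned task is an independent subtree a separate processor could explore."""
    tasks: List[Task] = [(edges, k, frozenset())]
    for _ in range(depth):
        expanded: List[Task] = []
        for es, budget, chosen in tasks:
            if not es or budget == 0:
                expanded.append((es, budget, chosen))  # already a leaf of the search tree
                continue
            u, v = es[0]
            for w in (u, v):
                expanded.append(([e for e in es if w not in e], budget - 1, chosen | {w}))
        tasks = expanded
    return tasks


def parallel_style_vc(edges: List[Edge], k: int, depth: int = 2) -> Optional[Set[int]]:
    # Workers are simulated sequentially; with MPI each task could go to its own rank.
    for es, budget, chosen in frontier(edges, k, depth):
        sub = vc_search(es, budget)
        if sub is not None:
            return set(chosen) | sub
    return None


triangle = [(0, 1), (1, 2), (0, 2)]
print(parallel_style_vc(triangle, 2))  # some cover of size 2
print(parallel_style_vc(triangle, 1))  # None: a triangle needs two vertices
```

Exploring the subtrees in a different order than plain depth-first search can reach a cover sooner, which is one plausible intuition for why a single-processor simulation of the parallel search can beat the sequential method.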
Parallelizing the data cube
- Distributed and Parallel Databases
, 2002
Cited by 15 (7 self)
This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce inter-processor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel, as is done in previous parallel approaches. In fact, after the initial load distribution phase, each processor can compute its assigned subcube without any communication with the other processors. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single-attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm-specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared-disk parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. The experimental results presented show that our partitioning strategies generate a close to optimal load balance between processors.
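To illustrate the load-balancing goal, the sketch below assigns subcubes with estimated costs to p processors using the greedy longest-processing-time (LPT) heuristic. LPT is a stand-in chosen for illustration, not the paper's top-down or bottom-up partitioning strategy; the subcube names and cost values are hypothetical and play the role of algorithm-specific weights such as estimated group-by sizes.

```python
import heapq
from typing import Dict, List, Tuple


def assign_subcubes(costs: Dict[str, float], p: int) -> List[List[str]]:
    """Greedy LPT stand-in: hand the most expensive remaining subcube
    to the currently least-loaded processor."""
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(p)]  # (load, processor id)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(p)]
    # Sort by decreasing cost; ties broken by name so the result is deterministic.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(name)
        heapq.heappush(heap, (load + cost, proc))
    return assignment


# Hypothetical subcubes with estimated costs.
costs = {"A": 8, "B": 4, "C": 4, "D": 2, "E": 2}
plan = assign_subcubes(costs, 2)
print(plan)                                            # [['A', 'D'], ['B', 'C', 'E']]
print([sum(costs[s] for s in part) for part in plan])  # [10, 10]
```

Once each processor holds its list, it can run an unmodified sequential cube algorithm on those subcubes with no further communication, which is the property the abstract emphasizes.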
Graph Coloring on a Coarse Grained Multiprocessor (Extended Abstract)
, 2000
Cited by 14 (9 self)
We present the first efficient algorithm for a coarse grained multiprocessor that colors a graph G with a guarantee of at most $\Delta_G + 1$ colors, where $\Delta_G$ is the maximum degree of G.
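The $\Delta_G + 1$ guarantee, where $\Delta_G$ is the maximum vertex degree of G, is the bound achieved by sequential greedy coloring: a vertex has at most $\Delta_G$ neighbors, so one of the colors $0, \dots, \Delta_G$ is always free. The following minimal Python sketch shows that sequential baseline only, not the paper's coarse grained algorithm, and assumes an undirected graph given as a symmetric adjacency dict.

```python
from typing import Dict, List


def greedy_color(adj: Dict[int, List[int]]) -> Dict[int, int]:
    """Sequential greedy coloring; uses at most max_degree + 1 colors."""
    color: Dict[int, int] = {}
    for v in sorted(adj):
        # Colors already taken by neighbors; there are at most deg(v) of them.
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:  # smallest free color, always <= deg(v)
            c += 1
        color[v] = c
    return color


# 5-cycle: maximum degree 2, odd cycle, so 3 colors are needed.
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(greedy_color(cycle5))  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
```

The difficulty in the coarse grained setting is that adjacent vertices stored on different processors may pick the same color concurrently, which this sequential loop never has to handle.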
Communication-Optimal Parallel Minimum Spanning Tree Algorithms
, 1998
Cited by 13 (1 self)
Lower and upper bounds for finding a minimum spanning tree (MST) in a weighted undirected graph on the BSP model are presented. We provide the first non-trivial lower bounds on the communication volume required to solve the MST problem. Let p denote the number of processors, n the number of nodes of the input graph, and m the number of edges of the input graph. We show that in the worst case, a total of $\Omega(\alpha \cdot \min(m, pn))$ bits need to be communicated in order to solve the MST problem, where $\alpha$ is the number of bits required to represent a single edge weight. This implies that if each message communicates at most $\alpha$ bits, any BSP algorithm for finding an MST requires communication time $\Omega(g \cdot \min(m/p, n))$, where g is the gap parameter of the BSP model. In addition, we present two algorithms with communication requirements that match our lower bound in different situations. Both algorithms perform linear work for appropriate values of n, m and p, and use a numbe...
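As a worked illustration of how the two bounds scale, take hypothetical values $m = 10^6$ edges, $n = 10^4$ nodes, $p = 16$ processors and $\alpha = 32$, where $\alpha$ denotes the number of bits needed for one edge weight. Ignoring the constants hidden by $\Omega$:

```latex
\begin{aligned}
\min(m,\ pn) &= \min(10^6,\ 16 \cdot 10^4) = 1.6 \times 10^5,\\
\alpha \cdot \min(m,\ pn) &= 32 \cdot 1.6 \times 10^5 = 5.12 \times 10^6 \text{ bits in total},\\
g \cdot \min(m/p,\ n) &= g \cdot \min(62\,500,\ 10^4) = 10^4\, g.
\end{aligned}
```

Here $pn < m$, so the $pn$ term determines the total volume, and $n < m/p$, so the per-processor communication time is governed by $n$ rather than by $m/p$.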
Solving large FPT problems on coarse-grained parallel machines
- Journal of Computer and System Sciences
, 2003
Cited by 12 (1 self)
Fixed-parameter tractability (FPT) techniques have recently been successful in solving NP-complete problem instances of practical importance which were too large to be solved with previous methods. In this paper, we show how to enhance this approach through the addition of parallelism, thereby allowing even larger problem instances to be solved in practice. More precisely, we demonstrate the potential of parallelism when applied to the bounded-tree search phase of FPT algorithms. We apply our methodology to the k-Vertex Cover problem, which has important applications in, for example, the analysis of multiple sequence alignments for computational biochemistry. We have implemented our parallel FPT method using C and the MPI communication library, and tested it on a 32-node Beowulf cluster. This is the first experimental examination of parallel FPT techniques. As part of our experiments, we solved larger instances of k-Vertex Cover than in any previously reported implementations. For example, our code can solve problem instances with $k \geq 400$ in less than 1.5 h.
The Deterministic Complexity of Parallel Multisearch (Extended Abstract)
, 1996
Cited by 9 (2 self)
Appeared in: Proceedings of the 5th SWAT '96, Springer LNCS 1097, 1996. Armin Bäumker (1), Wolfgang Dittrich (1), Andrea Pietracaprina (2). (1) Department of Mathematics and Computer Science and Heinz Nixdorf Institute, University of Paderborn, Paderborn, Germany; (2) Dipartimento di Matematica Pura e Applicata, Via Belzoni 7, Università di Padova, Padova, Italy.
Abstract. Given m ordered segments that form a partition of some universe (e.g., a 2D strip), the multisearch problem consists of determining, for a set of n query points in the universe, the segments they belong to. We present the first parallel deterministic scheme that efficiently solves the problem in the case m n. The scheme is designed on the BSP* model, a variant of Valiant's BSP that rewards blockwise communication, and uses a suitable redundant representation of the data. Both computation and communication complexities are studied as functions of the redundancy. In particular, it is shown that optimal speed-up can be achieve...
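For intuition about the problem itself, the sketch below solves multisearch sequentially in a 2D strip by one binary search per query, in $O(n \log m)$ total time, over strip-spanning, non-crossing separating segments sorted from bottom to top. It says nothing about the BSP* scheme or its redundant data representation; the strip $0 \le x \le 1$, the `Seg` encoding by heights at the two strip edges, and the function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# A separating segment spans the strip 0 <= x <= 1; encode it by its heights at x = 0 and x = 1.
Seg = Tuple[float, float]
Point = Tuple[float, float]


def height(seg: Seg, x: float) -> float:
    y_left, y_right = seg
    return y_left + (y_right - y_left) * x


def locate(separators: Sequence[Seg], pt: Point) -> int:
    """Return the region index 0..len(separators) containing pt.
    Region i lies between separators i-1 and i (bottom to top); a point
    exactly on a separator is assigned to the region above it."""
    x, y = pt
    lo, hi = 0, len(separators)
    while lo < hi:  # non-crossing separators: "lies below pt" holds for a prefix
        mid = (lo + hi) // 2
        if height(separators[mid], x) <= y:
            lo = mid + 1
        else:
            hi = mid
    return lo


def multisearch(separators: Sequence[Seg], queries: Sequence[Point]) -> List[int]:
    # One independent binary search per query.
    return [locate(separators, q) for q in queries]


seps = [(0.0, 1.0), (2.0, 2.0), (3.0, 5.0)]  # sorted bottom to top, non-crossing
print(multisearch(seps, [(0.5, 0.2), (0.5, 1.0), (0.0, 2.5), (1.0, 6.0)]))  # [0, 1, 2, 3]
```

In a distributed setting the separators are spread over processors, so each query's search path touches data on several processors; that access pattern is what the parallel scheme has to organize.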
PRO: a model for Parallel Resource-Optimal computation
- In 16th Annual International Symposium on High Performance Computing Systems and Applications. IEEE, The Institute of Electrical and Electronics Engineers
, 2002
Cited by 9 (4 self)
We present a new parallel computation model that enables the design of resource-optimal scalable parallel algorithms and simplifies their analysis. The model rests on the novel idea of incorporating relative optimality as an integral part and measuring the quality of a parallel algorithm in terms of granularity.
Coarse grained parallel next element search
- In Proceedings of the 11th International Parallel Processing Symposium
, 1997
Cited by 9 (3 self)
We present a parallel algorithm for solving the next element search problem on a set of line segments, using a BSP-like model referred to as the Coarse Grained Multicomputer (CGM). The algorithm requires O(1) communication rounds (h-relations with h = O(n/p)), O((n/p) log n) local computation, and O((n/p) log n) storage per processor. Our result implies solutions to the point location, trapezoidal decomposition and polygon triangulation problems. A simplified version for axis-parallel segments requires only O(n/p) storage per processor, and we discuss an implementation of this version. As in a previous paper by Devillers and Fabri [11], our algorithm is based on a distributed implementation of segment trees, which are of size O(n log n). This paper...
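The segment-tree idea can be sketched sequentially for the axis-parallel case. For horizontal segments, each segment is stored at the O(log n) canonical nodes covering its x-range, so the tree has size O(n log n), and a query walks one root-to-leaf path, binary-searching each node's sorted list for the lowest segment strictly above the query point. This is a minimal single-processor Python sketch under those assumptions (closed horizontal segments, with "next element" taken to mean the first segment hit by an upward vertical ray); it is not the distributed CGM implementation, and the class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Optional, Tuple

HSeg = Tuple[float, float, float]  # (x_left, x_right, y): closed horizontal segment
Entry = Tuple[float, int]          # (y, segment index)


class NextAbove:
    """Segment tree over the x-axis: which segment does an upward ray from
    (qx, qy) hit first? Storage O(n log n), query O(log^2 n)."""

    def __init__(self, segs: List[HSeg]) -> None:
        self.xs = sorted({x for xl, xr, _ in segs for x in (xl, xr)})
        # Leaves are atomic pieces: leaf 2i is the point xs[i], leaf 2i+1 the open gap after it.
        self.size = max(2 * len(self.xs) - 1, 0)
        self.nodes: Dict[int, List[Entry]] = {}
        for idx, (xl, xr, y) in enumerate(segs):
            lo = 2 * bisect_left(self.xs, xl)
            hi = 2 * bisect_left(self.xs, xr) + 1  # half-open leaf range [lo, hi)
            self._insert(1, 0, self.size, lo, hi, (y, idx))
        for entries in self.nodes.values():
            entries.sort()

    def _insert(self, node: int, nlo: int, nhi: int, lo: int, hi: int, item: Entry) -> None:
        if hi <= nlo or nhi <= lo:
            return  # no overlap with this node's leaf range
        if lo <= nlo and nhi <= hi:
            self.nodes.setdefault(node, []).append(item)  # canonical node
            return
        mid = (nlo + nhi) // 2
        self._insert(2 * node, nlo, mid, lo, hi, item)
        self._insert(2 * node + 1, mid, nhi, lo, hi, item)

    def _leaf(self, qx: float) -> Optional[int]:
        if not self.xs or qx < self.xs[0] or qx > self.xs[-1]:
            return None
        i = bisect_left(self.xs, qx)
        return 2 * i if self.xs[i] == qx else 2 * i - 1

    def query(self, qx: float, qy: float) -> Optional[int]:
        """Index of the lowest segment covering qx strictly above qy, or None."""
        leaf = self._leaf(qx)
        if leaf is None:
            return None
        best: Optional[Entry] = None
        node, nlo, nhi = 1, 0, self.size
        while True:
            entries = self.nodes.get(node, [])
            j = bisect_right(entries, (qy, float("inf")))  # first entry with y > qy
            if j < len(entries) and (best is None or entries[j] < best):
                best = entries[j]
            if nhi - nlo == 1:
                break
            mid = (nlo + nhi) // 2
            if leaf < mid:
                node, nhi = 2 * node, mid
            else:
                node, nlo = 2 * node + 1, mid
        return None if best is None else best[1]


segs = [(0, 10, 5), (2, 4, 3), (6, 8, 7), (3, 9, 1)]
tree = NextAbove(segs)
print(tree.query(3.5, 0), tree.query(7, 5), tree.query(7, 8))  # 3 2 None
```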

