Results 1–10 of 51
Efficient External Memory Algorithms by Simulating Coarse-Grained Parallel Algorithms
, 2003
"... External memory (EM) algorithms are designed for largescale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to ..."
Abstract

Cited by 39 (10 self)
External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing.
Solving Large FPT Problems On Coarse Grained Parallel Machines
"... Fixedparameter tractability(FPT) techniques have recently been successful in solving NPcomplete problem instances of practical importance which were too large to be solved with previous methods. In this paper we show how to enhance this approach through the addition of parallelism, thereby allowin ..."
Abstract

Cited by 17 (1 self)
Fixed-parameter tractability (FPT) techniques have recently been successful in solving NP-complete problem instances of practical importance which were too large to be solved with previous methods. In this paper we show how to enhance this approach through the addition of parallelism, thereby allowing even larger problem instances to be solved in practice. More precisely, we demonstrate the potential of parallelism when applied to the bounded tree search phase of FPT algorithms. We apply our methodology to the k-Vertex Cover problem, which has important applications, e.g., in multiple sequence alignments for computational biochemistry. We have implemented our parallel FPT method and application-specific "plug-in" code for the k-Vertex Cover problem using C and the MPI communication library, and tested it on a network of 10 Sun SPARC workstations. This is the first experimental examination of parallel FPT techniques. In our experiments, we obtain excellent speedup results. Not only do we achieve a speedup of p in most cases; many cases even exhibit a superlinear speedup. The latter result implies that our parallel methods, when simulated on a single processor, also yield a significant improvement over existing sequential methods.
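The bounded tree search mentioned in this abstract can be illustrated with a minimal sequential sketch. This is the standard 2^k branching strategy for k-Vertex Cover, not the authors' MPI implementation; the function name and edge-list representation are hypothetical:

```python
# Sketch of the bounded-search-tree phase of an FPT k-Vertex Cover algorithm.
# Branch on an uncovered edge (u, v): either u or v must be in any vertex
# cover, so recurse on both choices with budget k - 1. The search tree has
# depth at most k and therefore at most 2^k leaves.

def vertex_cover(edges, k):
    """Return a vertex cover of size <= k as a set, or None if none exists."""
    if not edges:
        return set()          # all edges covered
    if k == 0:
        return None           # edges remain but no budget left
    u, v = next(iter(edges))  # branch on any remaining edge
    for chosen in (u, v):
        rest = [(a, b) for (a, b) in edges if chosen not in (a, b)]
        sub = vertex_cover(rest, k - 1)
        if sub is not None:
            return sub | {chosen}
    return None

cover = vertex_cover([(1, 2), (2, 3), (3, 4)], 2)  # a path on 4 vertices
```

A coarse-grained parallel version would hand subtrees of the first few branching levels to different processors; superlinear speedup can occur because one processor may find a solution early and terminate the others.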
Parallelizing the data cube
 Distributed and Parallel Databases
, 2002
"... Abstract. This paper presents a general methodology for the e cient parallelization of existing data cube construction algorithms. We describe two di erent partitioning strategies, one for topdown and one for bottomup cube algorithms. Both partitioning strategies assign subcubes to individual proce ..."
Abstract

Cited by 16 (7 self)
Abstract. This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce inter-processor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel as is done in previous parallel approaches. In fact, after the initial load distribution phase, each processor can compute its assigned subcube without any communication with the other processors. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single-attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm-specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared-disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. Experimental results presented show that our partitioning strategies generate a close to optimal load balance between processors.
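The load-distribution idea described above can be sketched with a simple greedy assignment. This is an illustration only; the paper's actual strategies are tree- and sort-based, and `assign_subcubes` with its cost estimates is a hypothetical stand-in:

```python
import heapq

# Sketch: assign subcubes (group-bys) to p processors so that estimated costs
# are balanced, then let each processor run an existing sequential cube
# algorithm on its share with no further communication. Greedy heuristic:
# largest subcube first, onto the currently least-loaded processor.

def assign_subcubes(estimated_sizes, p):
    """estimated_sizes: {groupby_name: cost}. Returns a list of p sets of names."""
    loads = [(0, i) for i in range(p)]        # (current load, processor id)
    heapq.heapify(loads)
    assignment = [set() for _ in range(p)]
    for name, cost in sorted(estimated_sizes.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(loads)        # least-loaded processor
        assignment[i].add(name)
        heapq.heappush(loads, (load + cost, i))
    return assignment

parts = assign_subcubes({"AB": 8, "AC": 7, "BC": 5, "A": 3, "B": 2, "C": 1}, 2)
```

With these hypothetical cost estimates both processors end up with a load of 13, matching the "close to optimal load balance" the abstract reports for the real strategies.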
Graph Coloring on a Coarse Grained Multiprocessor (Extended Abstract)
, 2000
"... We present the first efficient algorithm for a coarse grained multiprocessor that colors a graph G with a guarantee of at most D G +1 colors. 1 ..."
Abstract

Cited by 14 (9 self)
We present the first efficient algorithm for a coarse grained multiprocessor that colors a graph G with a guarantee of at most Δ_G + 1 colors.
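The Δ_G + 1 guarantee is the bound achieved by sequential greedy coloring, which the parallel algorithm matches. A minimal sequential sketch (this is the classical baseline, not the paper's coarse-grained algorithm):

```python
def greedy_coloring(adj):
    """adj: {vertex: set of neighbours}. Returns {vertex: colour},
    using at most max_degree + 1 colours."""
    colour = {}
    for v in adj:                    # any vertex order suffices for the bound
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:             # at most deg(v) colours can be blocked,
            c += 1                   # so some colour in 0..deg(v) is free
        colour[v] = c
    return colour

colours = greedy_coloring({0: {1, 2}, 1: {0, 2}, 2: {0, 1}})  # triangle: Δ = 2
```

The difficulty the paper addresses is reproducing this bound when the graph is distributed across processors, where the inherently sequential vertex order is unavailable.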
Communication-Optimal Parallel Minimum Spanning Tree Algorithms
, 1998
"... Lower and upper bounds for finding a minimum spanning tree (MST) in a weighted undirected graph on the BSP model are presented. We provide the first nontrivial lower bounds on the communication volume required to solve the MST problem. Let p denote the number of processors, n the number of nodes of ..."
Abstract

Cited by 13 (1 self)
Lower and upper bounds for finding a minimum spanning tree (MST) in a weighted undirected graph on the BSP model are presented. We provide the first nontrivial lower bounds on the communication volume required to solve the MST problem. Let p denote the number of processors, n the number of nodes of the input graph, and m the number of edges of the input graph. We show that in the worst case, a total of Ω(λ · min(m, pn)) bits need to be communicated in order to solve the MST problem, where λ is the number of bits required to represent a single edge weight. This implies that if each message communicates at most λ bits, any BSP algorithm for finding an MST requires communication time Ω(g · min(m/p, n)), where g is the gap parameter of the BSP model. In addition, we present two algorithms with communication requirements that match our lower bound in different situations. Both algorithms perform linear work for appropriate values of n, m and p, and use a numbe...
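Reading the extraction-damaged formulas as a total volume of Ω(λ · min(m, pn)) bits, where λ denotes the number of bits per edge weight (a reconstruction, not verified against the paper), the time bound follows in one step: with p processors each sending messages of at most λ bits, and the BSP gap g charged per message, at best p messages are delivered in parallel per unit of gap-time.

```latex
T_{\mathrm{comm}}
  \;\ge\; g \cdot \frac{\Omega\bigl(\lambda \cdot \min(m,\, pn)\bigr)}{p\,\lambda}
  \;=\; \Omega\!\left(g \cdot \min\!\left(\tfrac{m}{p},\, n\right)\right).
```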
PRO: a model for Parallel Resource-Optimal computation
 In 16th Annual International Symposium on High Performance Computing Systems and Applications, IEEE
, 2002
"... We present a new parallel computation model that enables the design of resourceoptimal scalable parallel algorithms and simplifies their analysis. The model rests on the novel idea of incorporating relative optimality as an integral part and measuring the quality of a parallel algorithm in terms of ..."
Abstract

Cited by 11 (4 self)
We present a new parallel computation model that enables the design of resource-optimal scalable parallel algorithms and simplifies their analysis. The model rests on the novel idea of incorporating relative optimality as an integral part and measuring the quality of a parallel algorithm in terms of granularity.
Solving large FPT problems on coarse-grained parallel machines
 Journal of Computer and System Sciences
, 2003
"... Fixedparamd er tractability (FPT) techniques have recently been successful in solving NPcomplete problem instances of practicalimct tance which were too large to be solved with previousmevio s. In this paper, we show how to enhance this approach through the addition of parallelism thereby allowing ..."
Abstract

Cited by 11 (1 self)
Fixed-parameter tractability (FPT) techniques have recently been successful in solving NP-complete problem instances of practical importance which were too large to be solved with previous methods. In this paper, we show how to enhance this approach through the addition of parallelism, thereby allowing even larger problem instances to be solved in practice. More precisely, we demonstrate the potential of parallelism when applied to the bounded tree search phase of FPT algorithms. We apply our methodology to the k-Vertex Cover problem, which has important applications in, for example, the analysis of multiple sequence alignments for computational biochemistry. We have implemented our parallel FPT method using C and the MPI communication library, and tested it on a 32-node Beowulf cluster. This is the first experimental examination of parallel FPT techniques. As part of our experiments, we solved larger instances of k-Vertex Cover than in any previously reported implementations. For example, our code can solve problem instances with k = 400 in less than 1.5 h.
The Deterministic Complexity of Parallel Multisearch (Extended Abstract)
, 1996
"... ) Appeared in: Proceedings of 5th SWAT '96, Springer LNCS 1097, 1996 Armin Baumker 1 Wolfgang Dittrich 1 Andrea Pietracaprina 2 1 Department of Mathematics and Computer Science and Heinz Nixdorf Institute, University of Paderborn, Paderborn, Germany 2 Dipartimento di Matematica Pura e Appli ..."
Abstract

Cited by 9 (2 self)
Appeared in: Proceedings of 5th SWAT '96, Springer LNCS 1097, 1996. Armin Baumker (1), Wolfgang Dittrich (1), Andrea Pietracaprina (2). (1) Department of Mathematics and Computer Science and Heinz Nixdorf Institute, University of Paderborn, Paderborn, Germany. (2) Dipartimento di Matematica Pura e Applicata, Via Belzoni 7, Università di Padova, Padova, Italy. Abstract. Given m ordered segments that form a partition of some universe (e.g., a 2D strip), the multisearch problem consists of determining, for a set of n query points in the universe, the segments they belong to. We present the first parallel deterministic scheme that efficiently solves the problem in the case m ≥ n. The scheme is designed on the BSP* model, a variant of Valiant's BSP that rewards blockwise communication, and uses a suitable redundant representation of the data. Both computation and communication complexities are studied as functions of the redundancy. In particular, it is shown that optimal speedup can be achieve...
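The multisearch problem itself has a simple sequential baseline for a 1-D universe, sketched below. This illustrates only the problem statement, not the BSP* scheme; representing segments by their sorted right endpoints is an assumption made for the sketch:

```python
import bisect

# Sketch of multisearch on a 1-D universe: m ordered segments are given by
# their sorted right endpoints; for each of n query points, report the index
# of the segment containing it. A single query is one binary search, so the
# sequential baseline costs O(n log m) overall.

def multisearch(right_endpoints, queries):
    """right_endpoints: sorted right boundaries of the m segments.
    Returns, for each query, the index of the segment containing it."""
    return [bisect.bisect_left(right_endpoints, q) for q in queries]

# Segments (-inf, 10], (10, 20], (20, 30]:
idx = multisearch([10, 20, 30], [5, 10, 25])
```

The hard parallel case m ≥ n arises because the segment structure is too large to replicate on every processor, which is why the paper's scheme uses a redundant distributed representation instead.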
Coarse grained parallel next element search
 In Proceedings of the 11th International Parallel Processing Symposium
, 1997
"... We present a parallel algorithm for solving the next element search problem on a set of line segments, using a BSP like model referred to as the Coarse Grained Multicomputer (CGM). The algorithm requires O(1) communication rounds (hrelations with h=O(n/p)), O((n/ p) log n) local computation, and O( ..."
Abstract

Cited by 9 (3 self)
We present a parallel algorithm for solving the next element search problem on a set of line segments, using a BSP-like model referred to as the Coarse Grained Multicomputer (CGM). The algorithm requires O(1) communication rounds (h-relations with h = O(n/p)), O((n/p) log n) local computation, and O((n/p) log n) storage per processor. Our result implies solutions to the point location, trapezoidal decomposition and polygon triangulation problems. A simplified version for axis-parallel segments requires only O(n/p) storage per processor, and we discuss an implementation of this version. As in a previous paper by Devillers and Fabri [11], our algorithm is based on a distributed implementation of segment trees, which are of size O(n log n). This paper ...
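The next element search problem in the axis-parallel case can be illustrated by brute force. This defines the problem only, not the distributed segment-tree algorithm; the "lowest segment directly above the query point" formulation is an assumed concretization:

```python
# Sketch of next element search for axis-parallel (horizontal) segments:
# for each query point, report the lowest segment directly above it.
# Brute force is O(n * m); the CGM algorithm replaces this with a
# distributed segment tree queried in O(1) communication rounds.

def next_element(segments, queries):
    """segments: list of (x1, x2, y) horizontal segments.
    Returns, for each query point (qx, qy), the index of the lowest
    segment with x1 <= qx <= x2 and y > qy, or None if there is none."""
    answers = []
    for qx, qy in queries:
        best, best_y = None, float("inf")
        for i, (x1, x2, y) in enumerate(segments):
            if x1 <= qx <= x2 and qy < y < best_y:
                best, best_y = i, y
        answers.append(best)
    return answers

segs = [(0, 10, 5), (2, 6, 3), (4, 8, 1)]
ans = next_element(segs, [(5, 0), (5, 4), (9, 9)])
```

Answering such vertical ray-shooting queries for all points at once is what yields the point location and trapezoidal decomposition applications mentioned in the abstract.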