Results 1–10 of 80
Efficient External Memory Algorithms by Simulating Coarse-Grained Parallel Algorithms
, 2003
Abstract

Cited by 44 (12 self)
External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing.
Parallelizing the data cube
 Distributed and Parallel Databases
, 2002
Abstract

Cited by 19 (8 self)
Abstract. This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce inter-processor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel as is done in previous parallel approaches. In fact, after the initial load distribution phase, each processor can compute its assigned subcube without any communication with the other processors. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single-attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm-specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared-disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. Experimental results presented show that our partitioning strategies generate a close to optimal load balance between processors.
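As a rough illustration of what these partitioning strategies divide up: the full data cube is one group-by view per subset of dimensions (2^d views for d dimensions), and a parallel scheme assigns groups of these views to processors. A minimal sequential sketch of computing all views (the toy fact table and dimension names are invented for illustration, not taken from the paper):

```python
from itertools import combinations
from collections import defaultdict

# Toy fact table: rows of (dimension values..., measure).
dims = ("product", "store", "quarter")
rows = [
    ("pen", "north", "Q1", 10),
    ("pen", "south", "Q1",  5),
    ("ink", "north", "Q2",  7),
]

def group_by(attrs):
    """Aggregate the measure over one group-by view of the cube lattice."""
    agg = defaultdict(int)
    idx = [dims.index(a) for a in attrs]
    for row in rows:
        key = tuple(row[i] for i in idx)
        agg[key] += row[-1]
    return dict(agg)

# The full cube: one group-by per subset of dimensions (2^d views).
# A parallel scheme partitions these views among processors in advance.
cube = {attrs: group_by(attrs)
        for d in range(len(dims) + 1)
        for attrs in combinations(dims, d)}
```

The empty subset yields the grand total; here `cube[()]` is `{(): 22}` and `cube` holds 2^3 = 8 views.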
Solving Large FPT Problems On Coarse Grained Parallel Machines
Abstract

Cited by 18 (1 self)
Fixed-parameter tractability (FPT) techniques have recently been successful in solving NP-complete problem instances of practical importance which were too large to be solved with previous methods. In this paper we show how to enhance this approach through the addition of parallelism, thereby allowing even larger problem instances to be solved in practice. More precisely, we demonstrate the potential of parallelism when applied to the bounded tree search phase of FPT algorithms. We apply our methodology to the k-Vertex Cover problem which has important applications, e.g., in multiple sequence alignments for computational biochemistry. We have implemented our parallel FPT method and application-specific "plug-in" code for the k-Vertex Cover problem using C and the MPI communication library, and tested it on a network of 10 Sun SPARC workstations. This is the first experimental examination of parallel FPT techniques. In our experiments, we obtain excellent speedup results. Not only do we achieve a speedup of p in most cases, many cases even exhibit a superlinear speedup. The latter result implies that our parallel methods, when simulated on a single processor, also yield a significant improvement over existing sequential methods.
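The bounded tree search phase mentioned above can be illustrated by the classical sequential branching algorithm for k-Vertex Cover; this sketch is the standard textbook version, not the paper's MPI implementation, which distributes subtrees of this search tree among processors:

```python
def vertex_cover(edges, k):
    """Bounded search tree for k-Vertex Cover: pick any remaining edge
    (u, v); at least one endpoint must be in every cover, so branch on
    taking u and on taking v. Depth at most k => O(2^k) tree nodes."""
    if not edges:
        return set()   # all edges covered
    if k == 0:
        return None    # budget exhausted but edges remain
    u, v = edges[0]
    for w in (u, v):
        rest = [e for e in edges if w not in e]  # edges covered by w vanish
        sub = vertex_cover(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None        # no cover of size <= k exists
```

For example, a triangle has no vertex cover of size 1, but `vertex_cover([(1, 2), (2, 3), (1, 3)], 2)` finds one of size 2. The two branches at each node are independent, which is what makes this phase amenable to parallel distribution.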
Graph Coloring on a Coarse Grained Multiprocessor (Extended Abstract)
, 2000
Abstract

Cited by 15 (10 self)
We present the first efficient algorithm for a coarse grained multiprocessor that colors a graph G with a guarantee of at most Δ(G) + 1 colors, where Δ(G) is the maximum degree of G.
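The Δ(G) + 1 guarantee matches the bound of the classical sequential greedy algorithm; a minimal sketch of that sequential baseline (the abstract does not give the coarse-grained algorithm itself):

```python
def greedy_coloring(adj):
    """Sequential greedy coloring: each vertex takes the smallest color
    absent among its already-colored neighbours. No vertex ever needs a
    color beyond deg(v), so at most max-degree + 1 colors are used."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color
```

On a 4-cycle (`{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}`) this produces a proper 2-coloring, within the Δ + 1 = 3 bound. The coarse-grained difficulty lies in achieving this guarantee when vertices are colored concurrently on different processors.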
Communication-Optimal Parallel Minimum Spanning Tree Algorithms
, 1998
Abstract

Cited by 13 (1 self)
Lower and upper bounds for finding a minimum spanning tree (MST) in a weighted undirected graph on the BSP model are presented. We provide the first nontrivial lower bounds on the communication volume required to solve the MST problem. Let p denote the number of processors, n the number of nodes of the input graph, and m the number of edges of the input graph. We show that in the worst case, a total of Ω(λ · min(m, pn)) bits need to be communicated in order to solve the MST problem, where λ is the number of bits required to represent a single edge weight. This implies that if each message communicates at most λ bits, any BSP algorithm for finding an MST requires communication time Ω(g · min(m/p, n)), where g is the gap parameter of the BSP model. In addition, we present two algorithms with communication requirements that match our lower bound in different situations. Both algorithms perform linear work for appropriate values of n, m and p, and use a numbe...
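For reference, the linear sequential work that such parallel algorithms aim to match is that of a classical MST algorithm; a minimal Kruskal sketch with union-find (illustrative of the problem only, not the paper's BSP algorithms):

```python
def kruskal(n, edges):
    """Classical sequential MST (Kruskal): scan edges by increasing
    weight, keeping an edge iff it joins two different components.
    edges: list of (weight, u, v); vertices are 0..n-1."""
    parent = list(range(n))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((u, v))
            total += w
    return mst, total
```

On a 4-cycle with weights 1, 2, 3, 4, the tree keeps the three lightest edges (total weight 6). In the BSP setting, the cost to beat is not this computation but the Ω(λ · min(m, pn)) bits of communication.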
PRO: a model for Parallel Resource-Optimal computation
 In 16th Annual International Symposium on High Performance Computing Systems and Applications. IEEE
, 2002
Abstract

Cited by 13 (6 self)
We present a new parallel computation model that enables the design of resourceoptimal scalable parallel algorithms and simplifies their analysis. The model rests on the novel idea of incorporating relative optimality as an integral part and measuring the quality of a parallel algorithm in terms of granularity.
Bulk Synchronous Parallel Algorithms for the External Memory Model
, 2002
Abstract

Cited by 11 (3 self)
Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a simple, deterministic simulation technique which transforms certain Bulk Synchronous Parallel (BSP) algorithms into efficient parallel EM algorithms. It optimizes blockwise data access and parallel disk I/O and, at the same time, utilizes multiple processors connected via a communication network or shared memory. We obtain new improved parallel EM algorithms for a large number of problems including sorting, permutation, matrix transpose, several geometric and GIS problems including three-dimensional convex hulls (two-dimensional Voronoi diagrams), and various graph problems. We show that certain parallel algorithms known for the BSP model can be used to obtain EM algorithms that meet well known I/O complexity lower bounds for various problems, including sorting.
Solving large FPT problems on coarse-grained parallel machines
 Journal of Computer and System Sciences
, 2003
Abstract

Cited by 11 (1 self)
Fixed-parameter tractability (FPT) techniques have recently been successful in solving NP-complete problem instances of practical importance which were too large to be solved with previous methods. In this paper, we show how to enhance this approach through the addition of parallelism, thereby allowing even larger problem instances to be solved in practice. More precisely, we demonstrate the potential of parallelism when applied to the bounded tree search phase of FPT algorithms. We apply our methodology to the k-Vertex Cover problem which has important applications in, for example, the analysis of multiple sequence alignments for computational biochemistry. We have implemented our parallel FPT method using C and the MPI communication library, and tested it on a 32-node Beowulf cluster. This is the first experimental examination of parallel FPT techniques. As part of our experiments we solved larger instances of k-Vertex Cover than in any previously reported implementations. For example, our code can solve problem instances with k ≈ 400 in less than 1.5 hours.
The Deterministic Complexity of Parallel Multisearch (Extended Abstract)
 Proceedings of 5th SWAT '96, Springer LNCS 1097
, 1996
Abstract

Cited by 10 (3 self)
Given m ordered segments that form a partition of some universe (e.g., a 2D strip), the multisearch problem consists of determining, for a set of n query points in the universe, the segments they belong to. We present the first parallel deterministic scheme that efficiently solves the problem in the case m ≥ n. The scheme is designed on the BSP* model, a variant of Valiant's BSP that rewards blockwise communication, and uses a suitable redundant representation of the data. Both computation and communication complexities are studied as functions of the redundancy. In particular, it is shown that optimal speedup can be achieve...
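Sequentially, multisearch reduces to n binary searches over the m segment boundaries; a minimal sketch (this flat sorted-array representation is illustrative only, not the BSP* scheme's redundant distributed data structure):

```python
from bisect import bisect_right

def multisearch(boundaries, queries):
    """Locate each query point in its segment: segment i covers the
    half-open interval [boundaries[i], boundaries[i+1]). Each lookup
    is a binary search, O(log m) per query sequentially."""
    return [bisect_right(boundaries, q) - 1 for q in queries]
```

For instance, with boundaries `[0, 10, 20, 30]`, the queries `[5, 10, 29]` fall in segments 0, 1, and 2. The parallel difficulty the paper addresses is contention: many queries may target the same segment, which the redundant representation is designed to absorb.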