Results 1–10 of 10
Optimal Aggregation Algorithms for Middleware
In PODS, 2001
"... Abstract: Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its g ..."
Abstract

Cited by 540 (4 self)
 Add to MetaCart
Abstract: Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). There is some monotone aggregation function, or combining rule, such as min or average, that combines the individual grades to obtain an overall grade. To determine the top k objects (that have the best overall grades), the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm ("Fagin's Algorithm", or FA) that is much more efficient. For some monotone aggregation functions, FA is optimal with high probability in the worst case. We analyze an elegant and remarkably simple algorithm ("the threshold algorithm", or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability worst-case sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer. TA allows early stopping, which yields, in a precise sense, an approximate version of the top k answers.
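The threshold algorithm described in this abstract is simple enough to sketch directly. The following is a minimal Python sketch (the data layout and names are our own, not from the paper), assuming each sorted list is a Python list of (object, grade) pairs in decreasing grade order and that random access is simulated by a dictionary lookup:

```python
import heapq

def threshold_algorithm(sorted_lists, agg, k):
    """Sketch of the Threshold Algorithm (TA) for top-k aggregation.

    sorted_lists: one list per attribute of (object, grade) pairs,
                  sorted by grade, highest first.
    agg: a monotone aggregation function, e.g. min.
    Returns the k (overall_grade, object) pairs with the best grades.
    """
    m = len(sorted_lists)
    # Random access: one object -> grade index per attribute.
    index = [dict(lst) for lst in sorted_lists]
    seen = set()
    top = []  # min-heap of (overall_grade, object), size <= k
    for depth in range(len(sorted_lists[0])):
        last_grades = []
        for i in range(m):
            obj, g = sorted_lists[i][depth]   # sorted access
            last_grades.append(g)
            if obj not in seen:
                seen.add(obj)
                overall = agg([index[j][obj] for j in range(m)])
                if len(top) < k:
                    heapq.heappush(top, (overall, obj))
                elif overall > top[0][0]:
                    heapq.heapreplace(top, (overall, obj))
        # Threshold: best overall grade any unseen object could still have.
        threshold = agg(last_grades)
        if len(top) == k and top[0][0] >= threshold:
            break   # early stopping: no unseen object can beat the top k
    return sorted(top, reverse=True)
```

The constant-size buffer the abstract mentions corresponds to the size-k heap plus the current row of grades; the monotonicity of `agg` is what makes the threshold a valid upper bound for unseen objects.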
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms
Proc. 5th ACM-SIAM Symp. on Discrete Algorithms, 1997
"... Abstract. This paper introduces the queueread queuewrite (qrqw) parallel random access machine (pram) model, which permits concurrent reading and writing to sharedmemory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to thi ..."
Abstract

Cited by 23 (10 self)
 Add to MetaCart
Abstract. This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step. The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of log n between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant's BSP model, and hence on hypercube-type non-combining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
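The cost rule the QRQW model charges per step is easy to make concrete. A minimal sketch (our own illustration, not code from the paper): a step in which several processors touch the same memory location is charged time equal to the longest queue, so CRCW-style contention is penalized while contention-free EREW-style steps cost 1.

```python
from collections import Counter

def qrqw_step_cost(accesses):
    """Time charged for one QRQW PRAM step: the maximum number of
    processors reading or writing any single memory location.
    `accesses` lists the location touched by each processor."""
    counts = Counter(accesses)
    return max(counts.values()) if counts else 0

# A naive broadcast where n processors all read one cell costs 1 step
# on a CRCW PRAM but n time units on a QRQW PRAM; with all processors
# touching distinct cells, the QRQW cost is 1, just as on an EREW PRAM.
```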
Selection on the Reconfigurable Mesh
Proc. Frontiers of Massively Parallel Computation, 1992
"... Our main result is a \Theta(log n) time algorithm to select the kth smallest element in a set of n elements on a reconfigurable mesh with n processors. This improves on the previous fastest algorithm's running time by a factor of log n. We also show that some variants of this problem can be solved e ..."
Abstract

Cited by 14 (0 self)
 Add to MetaCart
Our main result is a Θ(log n) time algorithm to select the kth smallest element in a set of n elements on a reconfigurable mesh with n processors. This improves on the previous fastest algorithm's running time by a factor of log n. We also show that some variants of this problem can be solved even faster. First we show that a good approximation to the median of n elements can be found in Θ(log log n) time. This can be used to solve two-dimensional linear programming over n equations in Θ(log n log log n) time, an improvement of log n / log log n time over the previous fastest algorithm. Next, we show that, for any constant ε > 0, selecting the kth smallest element in a set of n^(1−ε) elements evenly spaced throughout the mesh can be done in constant time. We also show that one can select the kth smallest element from n b-bit words in Θ((b / log b) · max{log n − log b, 1}) time, which implies that if the elements come from a polynomial range, one can...
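For reference, the selection problem itself asks for the kth smallest of n elements. A standard sequential randomized quickselect (a baseline sketch of the problem, not the paper's mesh algorithm) solves it in expected linear time; the paper's contribution is doing it in Θ(log n) time on a reconfigurable mesh with n processors.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (1-indexed) of items.
    Expected O(n) sequential time via random pivoting."""
    a = list(items)
    while True:
        pivot = random.choice(a)
        below = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        if k <= len(below):
            a = below                       # answer is below the pivot
        elif k <= len(below) + len(equal):
            return pivot                    # pivot is the k-th smallest
        else:
            k -= len(below) + len(equal)    # answer is above the pivot
            a = [x for x in a if x > pivot]
```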
New lower bounds for parallel computation
In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, 1986
"... Abstract. Lower bounds are proven on the paralleltime complexity of several basic functions on the most powerful concurrentread concurrentwrite PRAM with unlimited shared memory and unlimited power of individual processors (denoted by PRIORITY(m)): (1) It is proved that with a number of processor ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
Abstract. Lower bounds are proven on the parallel-time complexity of several basic functions on the most powerful concurrent-read concurrent-write PRAM with unlimited shared memory and unlimited power of individual processors (denoted by PRIORITY(∞)): (1) It is proved that with a number of processors polynomial in n, Ω(log n) time is needed for addition, multiplication or bitwise OR of n numbers, when each number has n^ε bits. Hence even the bit complexity (i.e., the time complexity as a function of the total number of bits in the input) is logarithmic in this case. This improves a beautiful result of Meyer auf der Heide and Wigderson [22]. They proved a log n lower bound using Ramsey-type techniques. Using Ramsey theory, it is possible to get an upper bound on the number of bits in the inputs used. However, for the case of polynomially many processors, this upper bound is more than a polynomial in n. (2) An Ω(log n) lower bound is given for PRIORITY(∞) with n^O(1) processors on a function with inputs from {0, 1}, namely for the function f(x_1, ..., x_n) = Σ_{i=1}^n x_i·a^i, where a is fixed and x_i ∈ {0, 1}. (3) Finally, by a new efficient simulation of PRIORITY(∞) by unbounded fan-in circuits, it is proven that PRIORITY(∞) with less than an exponential number of processors cannot compute PARITY in constant time, and with n^O(1) processors Ω(√log n) time is needed. The simulation technique is of...
Lower Bounds for Randomized Exclusive Write PRAMs
in Proc. 7th ACM Symp. on Parallel Algorithms and Architectures (ACM), 1995
"... In this paper we study the question: How useful is randomization in speeding up Exclusive Write PRAM computations? Our results give further evidence that randomization is of limited use in these types of computations. First we examine a compaction problem on both the CREW and EREW PRAM models, and w ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
In this paper we study the question: How useful is randomization in speeding up Exclusive Write PRAM computations? Our results give further evidence that randomization is of limited use in these types of computations. First we examine a compaction problem on both the CREW and EREW PRAM models, and we present randomized lower bounds which match the best deterministic lower bounds known. (For the CREW PRAM model, the lower bound is asymptotically optimal.) These are the first nontrivial randomized lower bounds known for the compaction problem on these models. We show that our lower bounds also apply to the problem of approximate compaction. Next we examine the problem of computing boolean functions on the CREW PRAM model, and we present a randomized lower bound which improves on the previous best randomized lower bound for many boolean functions, including the OR function. (The previous lower bounds for these functions were asymptotically optimal, but we improve the constant multiplicat...
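As context for the boolean-function bounds above: without concurrent writes, the textbook way to compute OR is a binary-tree reduction, which takes ⌈log₂ n⌉ rounds, and the lower bounds discussed here show that randomization cannot beat this by more than constant factors. A small sketch (our own illustration) simulating the rounds:

```python
def parallel_or_rounds(bits):
    """Simulate a binary-tree OR reduction as an exclusive-write PRAM
    would run it: in each round, disjoint pairs of cells are combined,
    so no memory cell is written by two processors at once.
    Returns (or_of_bits, number_of_rounds)."""
    cells = list(bits)
    rounds = 0
    while len(cells) > 1:
        cells = [cells[i] | (cells[i + 1] if i + 1 < len(cells) else 0)
                 for i in range(0, len(cells), 2)]
        rounds += 1
    return cells[0], rounds
```

For n = 8 inputs this takes 3 rounds, i.e. ⌈log₂ n⌉; the point of the lower bounds is that, on exclusive-write models, this round count is essentially unavoidable even for randomized algorithms.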
The Random Adversary: A Lower-Bound Technique For Randomized Parallel Algorithms
in Proc. of the 3rd SODA (ACM), 1997
"... . The randomadversary technique is a general method for proving lower bounds on randomized parallel algorithms. The bounds apply to the number of communication steps, and they apply regardless of the processors' instruction sets, the lengths of messages, etc. This paper introduces the ra ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
The random-adversary technique is a general method for proving lower bounds on randomized parallel algorithms. The bounds apply to the number of communication steps, and they apply regardless of the processors' instruction sets, the lengths of messages, etc. This paper introduces the random-adversary technique and shows how it can be used to obtain lower bounds on randomized parallel algorithms for load balancing, compaction, padded sorting, and finding Hamiltonian cycles in random graphs. Using the random-adversary technique, we obtain the first lower bounds for randomized parallel algorithms which are provably faster than their deterministic counterparts (specifically, for load balancing and related problems). Key words: parallel algorithms, parallel computation, PRAM model, randomized parallel algorithms, expected time, lower bounds, load balancing. AMS subject classifications: 68Q10, 68Q22, 68Q25.
Average And Randomized Complexity Of Distributed Problems
1996
"... . A.C. Yao proved that in the decisiontree model the average complexity of the best deterministic algorithm is a lower bound on the complexity of randomized algorithms that solve the same problem. Here it is shown that a similar result does not always hold in the common model of distributed computa ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
A. C. Yao proved that in the decision-tree model the average complexity of the best deterministic algorithm is a lower bound on the complexity of randomized algorithms that solve the same problem. Here it is shown that a similar result does not always hold in the common model of distributed computation, the model in which all the processors run the same program (that may depend on the processors' input). We therefore construct a new technique that, together with Yao's method, enables us to show that in many cases a similar relationship does hold in the distributed model. This relationship enables us to carry over known lower bounds on the complexity of deterministic computations to the realm of randomized computations, thus obtaining new results. The new technique can also be used for obtaining results concerning algorithms with bounded error.
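Yao's relation for the decision-tree model can be written as a minimax inequality, in standard notation (ours, not the paper's): for every input distribution μ, the best deterministic algorithm's average cost under μ lower-bounds the worst-case expected cost of every randomized algorithm R:

```latex
\max_{\mu}\ \min_{A\ \text{deterministic}}\ \mathbb{E}_{x \sim \mu}\!\left[\mathrm{cost}(A, x)\right]
\;\le\;
\min_{R\ \text{randomized}}\ \max_{x}\ \mathbb{E}\!\left[\mathrm{cost}(R, x)\right]
```

The paper's point is that this inequality can fail in the distributed model where all processors run the same program, which is why a supplementary technique is needed to restore the relationship in many cases.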
ERCW PRAMs and Optical Communication
in Proceedings of the European Conference on Parallel Processing, EURO-PAR '96, 1996
"... This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fanin, bounded fanout (or `BFO') circuits. Our results for these two models are of importance because o ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fan-in, bounded fan-out (or 'BFO') circuits. Our results for these two models are of importance because of the close relationship of the ERCW model to the OCPC model, a model of parallel computing based on dynamically reconfigurable optical networks, and of BFO circuits to the OCPC model with limited dynamic reconfiguration ability. Topics: Parallel Algorithms, Theory of Parallel and Distributed Computing. This research was supported by Texas Advanced Research Projects Grant 003658480 (philmac@cs.utexas.edu), and in part by Texas Advanced Research Projects Grants 003658480 and 003658386, and NSF Grant CCR-9023059 (vlr@cs.utexas.edu).
An Improved Lower Bound for the QRQW PRAM
In Proc. 7th IEEE Symp. on Para. and Distr. Proc., 1996
"... The queueread, queuewrite (QRQW) parallel random access machine (PRAM) model is a shared memory model which allows concurrent reading and writing with a time cost proportional to the contention. This is designed to model currently available parallel machines more accurately than either the CRCW PR ..."
Abstract

Cited by 3 (2 self)
 Add to MetaCart
The queue-read, queue-write (QRQW) parallel random access machine (PRAM) model is a shared memory model which allows concurrent reading and writing with a time cost proportional to the contention. This is designed to model currently available parallel machines more accurately than either the CRCW PRAM or EREW PRAM models. Here we present a lower bound for the problem of Linear Approximate Compaction (LAC) on the QRQW PRAM. The input to LAC consists of at most m marked items in an array of size n, and the output consists of the marked items in an array of size O(m). There is an O(√log n) expected time randomized algorithm for LAC on the QRQW PRAM. We prove a lower bound of Ω(√log n) expected time for any randomized algorithm for LAC, an improvement over the previous best bound of Ω(log log n). Our bound applies regardless of the number of processors and memory cells of the QRQW PRAM.
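To make the LAC problem statement concrete, here is what it asks for, as a sequential sketch (our own illustration; the constant `c` stands in for the O(m) bound on the output size):

```python
def linear_approximate_compaction(arr, marker, c=2):
    """Sequential sketch of Linear Approximate Compaction (LAC):
    place the marked items of arr (at most m of them) into an output
    array of size c*m; unused slots stay None. The hard part, and the
    subject of the lower bound above, is doing this in parallel on a
    QRQW PRAM with low contention."""
    marked = [x for x in arr if marker(x)]
    out = [None] * (c * len(marked))
    for i, x in enumerate(marked):
        out[i] = x
    return out
```

Sequentially this is trivial; the interest of the Θ(√log n)-type bounds is that even randomization cannot make the parallel, contention-charged version constant time.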
Ultrafast Parallel Algorithms and Reconfigurable Meshes
Proc. of DARPA Software Technology Conference, 1992
"... Introduction This research is concerned with the development of very fast parallel algorithms, ones faster than those available through normal programming techniques or standard parallel computers. Algorithms have been developed for problems in geometry, graph theory, arithmetic, sorting, and image ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Introduction. This research is concerned with the development of very fast parallel algorithms, ones faster than those available through normal programming techniques or standard parallel computers. Algorithms have been developed for problems in geometry, graph theory, arithmetic, sorting, and image processing. The computing models that these algorithms have been developed for are concurrent read concurrent write parallel random access machines (CRCW PRAMs), and reconfigurable meshes (r-meshes, defined below). For CRCW PRAMs, our work has shown that by combining randomization with the use of some extra memory, one can solve some problems far faster than they can be solved if only randomization is used. We have developed ultrafast algorithms for several problems, where by ultrafast algorithm we mean a parallel algorithm with an input of size n which uses at most a linear number of processors and finishes in polylog...