Results 1 - 10 of 13
Optimal Aggregation Algorithms for Middleware
 In PODS
, 2001
Cited by 547 (4 self)
Abstract: Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). There is some monotone aggregation function, or combining rule, such as min or average, that combines the individual grades to obtain an overall grade. To determine the top k objects (that have the best overall grades), the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm ("Fagin's Algorithm", or FA) that is much more efficient. For some monotone aggregation functions, FA is optimal with high probability in the worst case. We analyze an elegant and remarkably simple algorithm ("the threshold algorithm", or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability worst-case sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer. TA allows early stopping, which yields, in a precise sense, an approximate version of the top k answers.
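The threshold algorithm the abstract describes can be sketched briefly. The following Python sketch is illustrative only: the data layout (lists of pairs, a dict standing in for random access) and all names are assumptions of ours, not the paper's notation. It does sorted access to the m lists in lockstep, looks up each newly seen object's remaining grades, and stops once k objects score at least the threshold f applied to the last grades seen.

```python
import heapq

def threshold_algorithm(sorted_lists, grades, f, k):
    """Sketch of the threshold algorithm (TA).

    sorted_lists: m lists of (object_id, grade), each sorted by grade,
                  highest first (the sorted access interface).
    grades: dict object_id -> tuple of its m grades (stands in for
            random access).
    f: monotone aggregation function, e.g. min.
    k: number of top answers wanted.
    Assumes the lists are long enough that the stopping rule fires.
    """
    m = len(sorted_lists)
    seen = set()
    top = []  # min-heap of (overall grade, object_id), size <= k
    depth = 0
    while True:
        last = []  # grade last seen under sorted access in each list
        for i in range(m):
            obj, g = sorted_lists[i][depth]   # sorted access
            last.append(g)
            if obj not in seen:
                seen.add(obj)
                overall = f(grades[obj])       # random access
                heapq.heappush(top, (overall, obj))
                if len(top) > k:
                    heapq.heappop(top)         # keep only the best k
        threshold = f(last)
        # halt as soon as k objects score at least the threshold:
        # no unseen object can beat f of the last grades seen
        if len(top) == k and top[0][0] >= threshold:
            return sorted(top, reverse=True)
        depth += 1
```

With f = min, the buffer is just the k-element heap plus the m last-seen grades, which is the constant-size (in the database size) storage the abstract contrasts with FA.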
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms
 Proc. 5th ACM-SIAM Symp. on Discrete Algorithms
, 1997
Cited by 24 (10 self)
Abstract. This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step. The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of √(log n) between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant's BSP model, and hence on hypercube-type non-combining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
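The QRQW cost rule is easy to state concretely: a step costs time proportional to the worst queue at any single memory location. A minimal sketch of that accounting, assuming one access per processor per step and charging the maximum contention itself (the function name and representation are our assumptions, not the paper's):

```python
from collections import Counter

def qrqw_step_time(accesses):
    """Time charged to one QRQW PRAM step.

    accesses: one memory location per processor active in this step.
    The step is charged the maximum number of processors queued at
    any single location (1 for an idle step).
    """
    if not accesses:
        return 1
    return max(Counter(accesses).values())
```

An EREW-style step, where all p processors touch distinct cells, costs 1; a CRCW-style broadcast, where all p read the same cell, costs p, which is how the model sits strictly between the two.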
THE COMPLEXITY OF PARALLEL SORTING
, 1987
Cited by 15 (3 self)
The model we consider is the (concurrent-write, PRIORITY) PRAM. It has n synchronous processors, which communicate via an infinite shared memory. When several processors simultaneously write to the same cell, the one with the largest index succeeds. We allow the processors arbitrary computational power. Our main result is that sorting n integers requires Ω(√(log n)) steps in this strong model. This bound is proved in two stages. First, using a novel Ramsey-theoretic argument, we "reduce" sorting on a PRAM to sorting on a parallel merge tree. This tree is a generalization of Valiant's parallel comparison tree from [V], in which at every step n pairs of (previously ordered) sets are merged (rather than n pairs of elements compared). The second stage is proving the lower bound for such trees. The Ramsey-theoretic technique, together with known methods for bounding the "degree" of the computation, can be used to unify and generalize previous lower bounds for PRAMs. For example, we can show that the computation of any symmetric polynomial (e.g. the sum or product) on n integers requires exactly log₂ n steps.
Selection on the reconfigurable mesh
 Proceedings of 4th Symposium on the Frontiers of Massively Parallel Computation
, 1992
New lower bounds for parallel computation
 In Proceedings of the 18th Annual ACM Symposium on Theory of Computing
, 1986
Cited by 7 (0 self)
Abstract. Lower bounds are proven on the parallel-time complexity of several basic functions on the most powerful concurrent-read concurrent-write PRAM with unlimited shared memory and unlimited power of individual processors (denoted by PRIORITY(∞)): (1) It is proved that with a number of processors polynomial in n, Ω(log n) time is needed for addition, multiplication, or bitwise OR of n numbers, when each number has n^ε bits. Hence even the bit complexity (i.e., the time complexity as a function of the total number of bits in the input) is logarithmic in this case. This improves a beautiful result of Meyer auf der Heide and Wigderson [22]. They proved a log n lower bound using Ramsey-type techniques. Using Ramsey theory, it is possible to get an upper bound on the number of bits in the inputs used. However, for the case of polynomially many processors, this upper bound is more than a polynomial in n. (2) An Ω(log n) lower bound is given for PRIORITY(∞) with n^O(1) processors on a function with inputs from {0, 1}, namely for the function f(x_1, …, x_n) = Σ_{i=1}^n x_i a^i, where a is fixed and x_i ∈ {0, 1}. (3) Finally, by a new efficient simulation of PRIORITY(∞) by unbounded fan-in circuits, it is proven that PRIORITY(∞) with fewer than an exponential number of processors cannot compute PARITY in constant time, and that with n^O(1) processors Ω(√(log n)) time is needed. The simulation technique is of ...
Average And Randomized Complexity Of Distributed Problems
, 1996
Cited by 5 (0 self)
Abstract. A. C. Yao proved that in the decision-tree model the average complexity of the best deterministic algorithm is a lower bound on the complexity of randomized algorithms that solve the same problem. Here it is shown that a similar result does not always hold in the common model of distributed computation, the model in which all the processors run the same program (which may depend on the processors' input). We therefore construct a new technique that, together with Yao's method, enables us to show that in many cases a similar relationship does hold in the distributed model. This relationship enables us to carry over known lower bounds on the complexity of deterministic computations to the realm of randomized computations, thus obtaining new results. The new technique can also be used for obtaining results concerning algorithms with bounded error.

1. Introduction. In 1977 Yao presented results relating the average deterministic complexity and the randomized complexity of the same proble...
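Yao's decision-tree relationship the abstract builds on can be stated as a minimax inequality. The notation below (cost, A, R, μ) is ours, not the paper's:

```latex
\max_{\mu}\ \min_{A}\ \mathbb{E}_{x \sim \mu}\big[\mathrm{cost}(A, x)\big]
\;\le\;
\min_{R}\ \max_{x}\ \mathbb{E}\big[\mathrm{cost}(R, x)\big]
```

where A ranges over deterministic algorithms, R over randomized algorithms, μ over input distributions, and the right-hand expectation is over R's coin flips. The left side is the best average-case deterministic cost under a hardest input distribution; the abstract's point is that the analogous inequality can fail in the same-program distributed model, motivating the new technique.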
The random adversary: A lower bound technique for randomized parallel algorithms
 in Proc. of the 3rd ACM-SIAM Symposium on Discrete Algorithms
, 1993
ERCW PRAMs and Optical Communication
 in Proceedings of the European Conference on Parallel Processing, EURO-PAR '96
, 1996
Cited by 4 (2 self)
This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fan-in, bounded fan-out (or 'BFO') circuits. Our results for these two models are of importance because of the close relationship of the ERCW model to the OCPC model, a model of parallel computing based on dynamically reconfigurable optical networks, and of BFO circuits to the OCPC model with limited dynamic reconfiguration ability. Topics: Parallel Algorithms, Theory of Parallel and Distributed Computing.

This research was supported by Texas Advanced Research Projects Grant 003658480. (philmac@cs.utexas.edu) This research was supported in part by Texas Advanced Research Projects Grants 003658480 and 003658386, and NSF Grant CCR-9023059. (vlr@cs.utexas.edu)

1 Introduction In this paper we develop algorithms and lower bounds for fundamental problems on the Exclusive Read Concurrent Wri...
An Improved Lower Bound for the QRQW PRAM
 In Proc. 7th IEEE Symp. on Parallel and Distributed Processing
, 1996
Cited by 3 (2 self)
The queue-read, queue-write (QRQW) parallel random access machine (PRAM) model is a shared-memory model which allows concurrent reading and writing with a time cost proportional to the contention. This is designed to model currently available parallel machines more accurately than either the CRCW PRAM or EREW PRAM models. Here we present a lower bound for the problem of Linear Approximate Compaction (LAC) on the QRQW PRAM. The input to LAC consists of at most m marked items in an array of size n, and the output consists of the marked items in an array of size O(m). There is an O(√(log n)) expected time randomized algorithm for LAC on the QRQW PRAM. We prove a lower bound of Ω(√(log n)) expected time for any randomized algorithm for LAC, an improvement over the previous best bound of Ω(√(log log n)). Our bound applies regardless of the number of processors and memory cells of the QRQW PRAM.

1 Introduction The PRAM model of computation has been the most widely us...