Complexity Measures and Decision Tree Complexity: A Survey
Theoretical Computer Science, 2000
Cited by 95 (13 self)
We discuss several complexity measures for Boolean functions: certificate complexity, sensitivity, block sensitivity, and the degree of a representing or approximating polynomial. We survey the relations and biggest gaps known between these measures, and show how they give bounds for the decision tree complexity of Boolean functions on deterministic, randomized, and quantum computers.
1 Introduction
Computational Complexity is the subfield of Theoretical Computer Science that aims to understand "how much" computation is necessary and sufficient to perform certain computational tasks. For example, given a computational problem, it tries to establish tight upper and lower bounds on the length of the computation (or on other resources, like space). Unfortunately, for many practically relevant computational problems no tight bounds are known. An illustrative example is the well-known P versus NP problem: for all NP-complete problems the current upper and lower bounds lie exponentially ...
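As a concrete illustration of one of the measures named in this abstract, the following is a minimal brute-force sketch of sensitivity, assuming the standard definition: s(f) is the maximum, over all inputs x, of the number of single-bit flips of x that change f(x). The function names (`sensitivity`, `or_n`, `maj3`) are hypothetical and are not taken from the survey.

```python
from itertools import product


def sensitivity(f, n):
    """Brute-force sensitivity of f: {0,1}^n -> {0,1} (exponential in n)."""
    best = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        # Count the positions i where flipping bit i changes the value of f.
        flips = 0
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(y) != fx:
                flips += 1
        best = max(best, flips)
    return best


def or_n(x):
    """OR of all input bits."""
    return int(any(x))


def maj3(x):
    """Majority of three bits."""
    return int(sum(x) >= 2)


if __name__ == "__main__":
    print(sensitivity(or_n, 4))   # OR is fully sensitive at the all-zero input
    print(sensitivity(maj3, 3))
```

The enumeration takes 2^n evaluations of f per bit position, so it is only meant for small n.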
Communication-Efficient Parallel Sorting
1996
Cited by 60 (2 self)
We study the problem of sorting $n$ numbers on a $p$-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most $h$ items in any round. We provide parallel sorting methods that use internal computation time that is $O(\frac{n \log n}{p})$ and a number of communication rounds that is $O(\frac{\log n}{\log(h+1)})$ for $h = \Theta(n/p)$. The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when $p \le n^{1-1/c}$ for a constant $c \ge 1$. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for $p$, for we show that just computing the "or" of $n$ bits distributed evenly to the first $O(n/h)$ of an arbitrary number of processors in a BSP computer requires $\Omega(\log n / \log(h$...
Contention in Shared Memory Algorithms
1993
Cited by 57 (1 self)
Most complexity measures for concurrent algorithms for asynchronous shared-memory architectures focus on process steps and memory consumption. In practice, however, performance of multiprocessor algorithms is heavily influenced by contention, the extent to which processes access the same location at the same time. Nevertheless, even though contention is one of the principal considerations affecting the performance of real algorithms on real multiprocessors, there are no formal tools for analyzing the contention of asynchronous shared-memory algorithms. This paper introduces the first formal complexity model for contention in multiprocessors. We focus on the standard multiprocessor architecture in which n asynchronous processes communicate by applying read, write, and read-modify-write operations to a shared memory. We use our model to derive two kinds of results: (1) lower bounds on contention for well-known basic problems such as agreement and mutual exclusion, and (2) trade-offs betwe...
Parallel Algorithms for Higher-Dimensional Convex Hulls
Cited by 42 (14 self)
We give fast randomized and deterministic parallel methods for constructing convex hulls in $\mathbb{R}^d$, for any fixed $d$. Our methods are for the weakest shared-memory model, the EREW PRAM, and have optimal work bounds (with high probability for the randomized methods). In particular, we show that the convex hull of $n$ points in $\mathbb{R}^d$ can be constructed in $O(\log n)$ time using $O(n \log n + n^{\lfloor d/2 \rfloor})$ work, with high probability. We also show that it can be constructed deterministically in $O(\log^2 n)$ time using $O(n \log n)$ work for $d = 3$ and in $O(\log n)$ time using $O(n^{\lfloor d/2 \rfloor} \log^{c(\lceil d/2 \rceil - \lfloor d/2 \rfloor)} n)$ work for $d \ge 4$, where $c > 0$ is a constant, which is optimal for even $d \ge 4$. We also show how to make our 3-dimensional methods output-sensitive with only a small increase in running time. These methods can be applied to other problems as well. A variation of the convex hull algorithm for even dimensions deterministically constructs a $(1/r)$-cutting of $n$ hyperplanes in $\mathbb{R}^d$ in $O(\log n)$ time using optimal $O(n r^{d-1})$ work; when $r = n$, we obtain their arrangement and a point location data structure for it. With appropriate modifications, our deterministic 3-dimensional convex hull algorithm can be used to compute, in the same resource bounds, the intersection of $n$ balls of equal radius in $\mathbb{R}^3$. This leads to a sequential algorithm for computing the diameter of a point set in $\mathbb{R}^3$ with running time $O(n \log^3 n)$, which is arguably simpler than an algorithm with the same running time by Brönnimann et al.
Hundreds of Impossibility Results for Distributed Computing
Distributed Computing, 2003
Cited by 32 (4 self)
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault-tolerance, different communication media, and randomization. The resource bounds refer to time, space and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
On the Number of Rounds Necessary to Disseminate Information (Extended Abstract)
In First ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1989
Cited by 31 (2 self)
S. Even, B. Monien
We study how efficiently information can be spread in a communication network and ask how many rounds it takes until all processors know all pieces of information. This problem has a well-known solution in the "telephone communication mode", where in each round each processor can send or receive only via one of its links and the communication is two-way. For the "telegraph communication mode", where in each round each processor is likewise active only via one of its links and the communication is one-way, i.e. each processor can either send or receive, up to now only an upper bound was known. We prove a lower bound which differs from the upper bound by at most an additive constant of 1. Our lower bound technique uses elements from matrix theory, especially matrix norms. This result shows for the first time that in the two-way mode information can be distributed faster than in the one-way mode. We also apply our upper and lower bound techniques for chara...
Parallel RAMs with Owned Global Memory and Deterministic Context-Free Language Recognition
1997
Cited by 26 (0 self)
We identify and study a natural and frequently occurring subclass of Concurrent-Read, Exclusive-Write Parallel Random Access Machines (CREW-PRAMs). Called Concurrent-Read, Owner-Write, or CROW-PRAMs, these are machines in which each global memory location is assigned a unique "owner" processor, which is the only processor allowed to write into it. Considering the difficulties that would be involved in physically realizing a full CREW-PRAM model, it is interesting to observe that in fact, most known CREW-PRAM algorithms satisfy the CROW restriction or can be easily modified to do so. This paper makes three main contributions. First, we formally define the CROW-PRAM model and demonstrate its stability
Parallel Sorting With Limited Bandwidth
In Proc. 7th ACM Symp. on Parallel Algorithms and Architectures, 1995
Cited by 26 (5 self)
We study the problem of sorting on a parallel computer with limited communication bandwidth. By using the recently proposed PRAM(m) model, where $p$ processors communicate through a small, globally shared memory consisting of $m$ bits, we focus on the trade-off between the amount of local computation and the amount of interprocessor communication required for parallel sorting algorithms. We prove a lower bound of $\Omega(\frac{n \log m}{m})$ on the time to sort $n$ numbers in an exclusive-read variant of the PRAM(m) model. We show that Leighton's Columnsort can be used to give an asymptotically matching upper bound in the case where $m$ grows as a fractional power of $n$. The bounds are of a surprising form, in that they have little dependence on the parameter $p$. This implies that attempting to distribute the workload across more processors while holding the problem size and the size of the shared memory fixed will not improve the optimal running time of sorting in this model. We also show that bot...
Combining Tentative and Definite Executions for Very Fast Dependable Parallel Computing (Extended Abstract)
1991
Cited by 21 (3 self)
We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the notions of definite and tentative algorithms for executing a single parallel step of an ideal parallel machine on the unreliable machine. A definite algorithm is one that guarantees a correct execution of a
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms
Proc. 5th ACM-SIAM Symp. on Discrete Algorithms, 1997
Cited by 21 (10 self)
This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step. The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of $\sqrt{\log n}$ between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant’s BSP model, and hence on hypercube-type noncombining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
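The cost rule described in this abstract can be sketched as follows. This is a simplified illustration, not the paper's full definition: it considers only the shared-memory addresses touched in one step, ignores local computation, and charges a step its maximum contention. The names `max_contention` and `step_cost` are hypothetical.

```python
from collections import Counter


def max_contention(addresses):
    """Largest number of processors touching any one address in a step."""
    return max(Counter(addresses).values(), default=0)


def step_cost(addresses, model):
    """Time charged for one step under a simplified cost rule (assumption).

    addresses: one shared-memory address per participating processor.
    model: "crcw", "qrqw", or "erew".
    """
    k = max_contention(addresses)
    if model == "crcw":
        return 1              # concurrent access is free of extra charge
    if model == "qrqw":
        return max(1, k)      # cost grows with the longest queue
    if model == "erew":
        if k > 1:
            raise ValueError("EREW forbids concurrent access to a location")
        return 1
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    step = [0, 0, 0, 1]       # three processors contend for address 0
    print(step_cost(step, "crcw"), step_cost(step, "qrqw"))
```

In this sketch the same access pattern is charged unit time under CRCW, charged in proportion to the contention under QRQW, and rejected outright under EREW, which mirrors the comparison the abstract draws between the three models.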

