Results 1–10 of 73
Complexity Measures and Decision Tree Complexity: A Survey
 Theoretical Computer Science
, 2000
"... We discuss several complexity measures for Boolean functions: certificate complexity, sensitivity, block sensitivity, and the degree of a representing or approximating polynomial. We survey the relations and biggest gaps known between these measures, and show how they give bounds for the decision tr ..."
Abstract

Cited by 122 (15 self)
We discuss several complexity measures for Boolean functions: certificate complexity, sensitivity, block sensitivity, and the degree of a representing or approximating polynomial. We survey the relations and biggest gaps known between these measures, and show how they give bounds for the decision tree complexity of Boolean functions on deterministic, randomized, and quantum computers. 1 Introduction Computational complexity is the subfield of theoretical computer science that aims to understand "how much" computation is necessary and sufficient to perform certain computational tasks. For example, given a computational problem it tries to establish tight upper and lower bounds on the length of the computation (or on other resources, like space). Unfortunately, for many practically relevant computational problems no tight bounds are known. An illustrative example is the well-known P versus NP problem: for all NP-complete problems the current upper and lower bounds lie exponentially ...
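The measures named in this abstract are concrete enough to compute by brute force for small n. A minimal sketch (function names are illustrative, not from the survey; a Boolean function is taken as a Python callable on n-bit integers) of sensitivity and block sensitivity:

```python
def sensitivity(f, n):
    """Max over inputs x of the number of single bits whose flip changes f(x)."""
    return max(
        sum(1 for i in range(n) if f(x) != f(x ^ (1 << i)))
        for x in range(2 ** n)
    )

def block_sensitivity(f, n):
    """Max over inputs x of the largest collection of disjoint blocks (bit
    masks) each of whose flips changes f(x); exhaustive backtracking,
    so only suitable for tiny n."""
    def most_disjoint(sensitive, i, used):
        if i == len(sensitive):
            return 0
        best = most_disjoint(sensitive, i + 1, used)   # skip block i
        if sensitive[i] & used == 0:                   # take block i if disjoint
            best = max(best, 1 + most_disjoint(sensitive, i + 1, used | sensitive[i]))
        return best

    best = 0
    for x in range(2 ** n):
        sensitive = [m for m in range(1, 2 ** n) if f(x) != f(x ^ m)]
        best = max(best, most_disjoint(sensitive, 0, 0))
    return best
```

For OR on 3 bits both measures equal 3, consistent with the general fact that s(f) ≤ bs(f) for every f.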
Communication-Efficient Parallel Sorting
, 1996
"... We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sort ..."
Abstract

Cited by 64 (2 self)
We study the problem of sorting n numbers on a p-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O((n log n)/p) and a number of communication rounds that is O(log n / log(h+1)) for h = Θ(n/p). The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p ≤ n^(1−1/c) for a constant c ≥ 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n/h) of an arbitrary number of processors in a BSP computer requires Ω(log n / log(h...
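The round bound has a simple intuition: when each processor can receive at most h items per round, partial results can only be merged in a combining tree of fan-in h+1. A toy sketch (the function name is illustrative, not from the paper) counting rounds for such combining:

```python
import math

def combining_rounds(p, h):
    """Rounds needed to reduce p partial results to one when a processor
    can receive at most h values per round, i.e. a combining tree of
    fan-in h + 1."""
    rounds = 0
    active = p
    while active > 1:
        # Each group of h+1 active values merges into one per round.
        active = math.ceil(active / (h + 1))
        rounds += 1
    return rounds
```

The count tracks ⌈log p / log(h+1)⌉, matching the O(log n / log(h+1)) shape of the bound in the abstract.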
Contention in Shared Memory Algorithms
, 1993
"... Most complexitymeasures for concurrent algorithms for asynchronous sharedmemory architectures focus on process steps and memory consumption. In practice, however, performance of multiprocessor algorithms is heavily influenced by contention, the extent to which processes access the same location at t ..."
Abstract

Cited by 63 (1 self)
Most complexity measures for concurrent algorithms for asynchronous shared-memory architectures focus on process steps and memory consumption. In practice, however, performance of multiprocessor algorithms is heavily influenced by contention, the extent to which processes access the same location at the same time. Nevertheless, even though contention is one of the principal considerations affecting the performance of real algorithms on real multiprocessors, there are no formal tools for analyzing the contention of asynchronous shared-memory algorithms. This paper introduces the first formal complexity model for contention in multiprocessors. We focus on the standard multiprocessor architecture in which n asynchronous processes communicate by applying read, write, and read-modify-write operations to a shared memory. We use our model to derive two kinds of results: (1) lower bounds on contention for well-known basic problems such as agreement and mutual exclusion, and (2) tradeoffs betwe...
Hundreds of Impossibility Results for Distributed Computing
 Distributed Computing
, 2003
"... We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, faulttolerance, different communication media, and randomization. The resource bounds refe ..."
Abstract

Cited by 44 (4 self)
We survey results from distributed computing that show tasks to be impossible, either outright or within given resource bounds, in various models. The parameters of the models considered include synchrony, fault-tolerance, different communication media, and randomization. The resource bounds refer to time, space, and message complexity. These results are useful in understanding the inherent difficulty of individual problems and in studying the power of different models of distributed computing.
On the Number of Rounds Necessary to Disseminate Information (Extended Abstract)
 In First ACM Symposium on Parallel Algorithms and Architectures (SPAA)
, 1989
"... ) S. Even 13 B. Monien 2 Abstract We study how efficiently information can be spread in a communication network and ask how many rounds it takes until all processors know all pieces of information. This problem has a wellknown solution in the "telephone communication mode", where in each round each ..."
Abstract

Cited by 34 (2 self)
S. Even, B. Monien. We study how efficiently information can be spread in a communication network and ask how many rounds it takes until all processors know all pieces of information. This problem has a well-known solution in the "telephone communication mode", where in each round each processor can send or receive only via one of its links and the communication is two-way. For the "telegraph communication mode", where in each round each processor is likewise active only via one of its links but the communication is one-way, i.e. each processor can either send or receive, up to now only an upper bound was known. We prove a lower bound which differs from the upper bound by at most an additive constant of 1. Our lower bound technique uses elements from matrix theory, especially matrix norms. This result shows for the first time that in the two-way mode information can be distributed faster than in the one-way mode. We also apply our upper and lower bound techniques for chara...
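In the two-way telephone mode, the familiar upper bound for n = 2^d processors is a dimension-exchange schedule: in round i, processor j calls processor j XOR 2^i and the two exchange everything they know. A small simulation (an illustrative sketch of that standard schedule, not the paper's construction):

```python
def telephone_gossip_rounds(n):
    """Simulate dimension-exchange gossip for n = 2^d processors; returns the
    number of two-way rounds until every processor knows all n pieces."""
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    knows = [{j} for j in range(n)]      # processor j starts knowing piece j
    d = n.bit_length() - 1
    for i in range(d):
        # Round i: j and j ^ 2^i merge their knowledge (one two-way call each).
        knows = [knows[j] | knows[j ^ (1 << i)] for j in range(n)]
    assert all(len(s) == n for s in knows)  # everyone knows everything
    return d
```

After log2 n rounds each processor's knowledge has doubled log2 n times, so the schedule finishes in log2 n two-way rounds.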
Parallel RAMs with Owned Global Memory and Deterministic Context-Free Language Recognition
, 1997
"... We identify and study a natural and frequently occurring subclass of ConcurrentRead, ExclusiveWrite Parallel Random Access Machines (CREWPRAMs). Called ConcurrentRead, OwnerWrite, or CROWPRAMs, these are machines in which each global memory location is assigned a unique "owner" processor, whi ..."
Abstract

Cited by 26 (0 self)
We identify and study a natural and frequently occurring subclass of Concurrent-Read, Exclusive-Write Parallel Random Access Machines (CREW PRAMs). Called Concurrent-Read, Owner-Write, or CROW PRAMs, these are machines in which each global memory location is assigned a unique "owner" processor, which is the only processor allowed to write into it. Considering the difficulties that would be involved in physically realizing a full CREW PRAM model, it is interesting to observe that in fact most known CREW PRAM algorithms satisfy the CROW restriction or can easily be modified to do so. This paper makes three main contributions. First, we formally define the CROW PRAM model and demonstrate its stability
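The CROW restriction is easy to state operationally: fix an owner for each global memory cell and require every write to come from that cell's owner. A tiny checker sketch (hypothetical names, not from the paper):

```python
def is_crow_trace(writes, owner):
    """Check the Concurrent-Read, Owner-Write restriction on a write trace:
    each write must come from its location's unique owner.
    `writes` is a list of (pid, loc) pairs; `owner` maps loc -> pid."""
    return all(owner.get(loc) == pid for pid, loc in writes)

owner = {"x": 0, "y": 1}
is_crow_trace([(0, "x"), (1, "y")], owner)  # owners write their own cells: True
is_crow_trace([(1, "x")], owner)            # processor 1 writes 0's cell: False
```

Reads are unrestricted in this model; only writes are checked, which is what makes CROW strictly weaker than CREW yet satisfied by most CREW algorithms in practice.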
Parallel Sorting With Limited Bandwidth
 in Proc. 7th ACM Symp. on Parallel Algorithms and Architectures
, 1995
"... We study the problem of sorting on a parallel computer with limited communication bandwidth. By using the recently proposed PRAM(m) model, where p processors communicate through a small, globally shared memory consisting of m bits, we focus on the tradeoff between the amount of local computation an ..."
Abstract

Cited by 26 (5 self)
We study the problem of sorting on a parallel computer with limited communication bandwidth. By using the recently proposed PRAM(m) model, where p processors communicate through a small, globally shared memory consisting of m bits, we focus on the tradeoff between the amount of local computation and the amount of interprocessor communication required for parallel sorting algorithms. We prove a lower bound of Ω((n log m)/m) on the time to sort n numbers in an exclusive-read variant of the PRAM(m) model. We show that Leighton's Columnsort can be used to give an asymptotically matching upper bound in the case where m grows as a fractional power of n. The bounds are of a surprising form, in that they have little dependence on the parameter p. This implies that attempting to distribute the workload across more processors while holding the problem size and the size of the shared memory fixed will not improve the optimal running time of sorting in this model. We also show that bot...
Combining Tentative and Definite Executions for Very Fast Dependable Parallel Computing (Extended Abstract)
, 1991
"... We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the no ..."
Abstract

Cited by 24 (3 self)
We present a general and efficient strategy for computing robustly on unreliable parallel machines. The model of a parallel machine that we use is a CRCW PRAM with dynamic resource fluctuations: processors can fail during the computation, and may possibly be restored later. We first introduce the notions of definite and tentative algorithms for executing a single parallel step of an ideal parallel machine on the unreliable machine. A definite algorithm is one that guarantees a correct execution of a
The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms
 Proc. 5th ACM-SIAM Symp. on Discrete Algorithms
, 1997
"... Abstract. This paper introduces the queueread queuewrite (qrqw) parallel random access machine (pram) model, which permits concurrent reading and writing to sharedmemory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to thi ..."
Abstract

Cited by 23 (10 self)
This paper introduces the queue-read, queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step. The QRQW PRAM is strictly more powerful than the EREW PRAM. This paper shows a separation of √(log n) between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW PRAM with only logarithmic slowdown on Valiant's BSP model, and hence on hypercube-type non-combining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW PRAM, and considerably improves upon the best-known efficient emulation for the CRCW PRAM on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
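The QRQW cost rule stated above is simple to make concrete: the time for one step is the longest queue at any memory cell. A sketch of the charge for a single step (illustrative names, not from the paper):

```python
from collections import Counter

def qrqw_step_time(accesses):
    """Time charged to one QRQW step: the maximum number of processors
    reading or writing any single shared-memory location (the queue length).
    `accesses` is a list of (pid, location) pairs, one per processor."""
    if not accesses:
        return 0
    return max(Counter(loc for _, loc in accesses).values())
```

Under this rule an EREW-legal step (all locations distinct) costs 1, while p processors hitting one cell cost p; a CRCW model would charge 1 in both cases, which is exactly the modeling gap the abstract describes.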
Massively parallel algorithms for private-cache chip multiprocessors. Under submission
, 2008
"... In this paper, we study massively parallel algorithms for privatecache chip multiprocessors (CMPs), focusing on methods for foundational problems that can scale to hundreds or even thousands of cores. By focusing on privatecache CMPs, we show that we can design efficient algorithms that need no add ..."
Abstract

Cited by 22 (5 self)
In this paper, we study massively parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that can scale to hundreds or even thousands of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way that cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. All algorithms in the paper are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache chip multiprocessors. [Regular paper submission to SPAA 2008, which may be considered for a normal track or the special track on multicore systems.]
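Prefix sums, the first building block the abstract names, has a standard three-phase parallel shape: each block computes its local total, an exclusive scan over the p block totals yields per-block offsets, and each block then does a local inclusive scan shifted by its offset. A sequential sketch of that decomposition (names are illustrative; this is the generic scheme, not the paper's PEM algorithm):

```python
from itertools import accumulate

def blocked_prefix_sums(a, p):
    """Three-phase prefix sums over p roughly equal blocks:
    (1) each block's total, (2) exclusive scan of block totals -> offsets,
    (3) inclusive scan inside each block, shifted by its offset.
    Phases (1) and (3) are the per-processor parallel work."""
    size = -(-len(a) // p)  # ceil(len(a) / p) items per block
    blocks = [a[i * size:(i + 1) * size] for i in range(p)]
    offsets = [0] + list(accumulate(sum(b) for b in blocks))[:-1]
    out = []
    for off, b in zip(offsets, blocks):
        out.extend(x + off for x in accumulate(b))
    return out
```

Only phase (2) touches all p block totals, which is why schemes of this shape parallelize well when p is much smaller than n.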