Results 1–10 of 15
Efficient Low-Contention Parallel Algorithms
 the 1994 ACM Symp. on Parallel Algorithms and Architectures
, 1994
Abstract

Cited by 30 (12 self)
The queue-read, queue-write (QRQW) parallel random access machine (PRAM) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models, and can be efficiently emulated with only logarithmic slowdown on hypercube-type non-combining networks. This paper describes fast, low-contention, work-optimal, randomized QRQW PRAM algorithms for the fundamental problems of load balancing, multiple compaction, generating a random permutation, parallel hashing, and distributive sorting. These logarithmic or sublogarithmic time algorithms considerably improve upon the best known EREW PRAM algorithms for these problems, while avoiding the high-contention steps typical of CRCW PRAM algorithms. An illustrative expe...
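The contention cost model described above can be sketched in a few lines: in a toy simulator, the time charged to one QRQW step is the length of the longest queue at any memory cell (CRCW would charge 1; EREW forbids concurrent access entirely). The function and data below are illustrative, not from the paper.

```python
from collections import Counter

def qrqw_step_cost(accesses):
    """Time charged to one QRQW PRAM step: the longest queue, i.e. the
    maximum number of processors reading or writing any single cell.
    'accesses' maps processor id -> memory cell (illustrative model only)."""
    if not accesses:
        return 0
    queues = Counter(accesses.values())
    return max(queues.values())

# 8 processors, low contention: at most 2 processors hit the same cell.
low = {p: p // 2 for p in range(8)}
# 8 processors, high contention: all hit cell 0 (cost 8 on QRQW,
# cost 1 on CRCW, forbidden on EREW).
high = {p: 0 for p in range(8)}
```
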
Delayed path coupling and generating random permutations via distributed stochastic processes
, 1999
Abstract

Cited by 19 (3 self)
We analyze various stochastic processes for generating permutations almost uniformly at random in distributed and parallel systems. All our protocols are simple and elegant, and are based on performing disjoint transpositions executed in parallel. The challenging problem of our concern is to prove that the output configurations in our processes reach an almost uniform probability distribution very rapidly, i.e., in (low) polylogarithmic time. For the analysis of these protocols we develop a novel technique, called delayed path coupling, for proving rapid mixing of Markov chains. Our approach is an extension of the path coupling method of Bubley and Dyer. We apply delayed path coupling to three stochastic processes for generating random permutations. For one ...
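One round of the disjoint-transposition scheme the abstract describes might look as follows; the pairing rule here (a uniformly random perfect matching of positions, each matched pair swapped with probability 1/2) is an illustrative assumption, not the exact protocol analyzed in the paper.

```python
import random

def transposition_round(perm, rng=random):
    """One round of a disjoint-transposition process: pick a random
    perfect matching of positions, then swap each matched pair
    independently with probability 1/2.  All swaps are disjoint, so
    they could be executed in parallel in one step."""
    n = len(perm)
    positions = list(range(n))
    rng.shuffle(positions)            # random pairing of positions
    for i in range(0, n - 1, 2):
        a, b = positions[i], positions[i + 1]
        if rng.random() < 0.5:
            perm[a], perm[b] = perm[b], perm[a]
    return perm

perm = list(range(16))
for _ in range(10):  # the paper shows polylogarithmically many rounds suffice
    transposition_round(perm)
```
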
The Parallel Complexity of Growth Models
 Journal of Statistical Physics
, 1994
Abstract

Cited by 9 (6 self)
This paper investigates the parallel complexity of several non-equilibrium growth models. Invasion percolation, Eden growth, ballistic deposition and solid-on-solid growth are all seemingly highly sequential processes that yield self-similar or self-affine random clusters. Nonetheless, we present fast parallel randomized algorithms for generating these clusters. The running times of the algorithms scale as O(log^2 N), where N is the system size, and the number of processors required scales as a polynomial in N. The algorithms are based on fast parallel procedures for finding minimum weight paths; they illuminate the close connection between growth models and self-avoiding paths in random environments. In addition to their potential practical value, our algorithms serve to classify these growth models as less complex than other growth models, such as diffusion-limited aggregation, for which fast parallel algorithms probably do not exist. Keywords: ballistic deposition, computationa...
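For intuition about the growth models themselves, here is the standard sequential form of invasion percolation, the greedy loop that the paper's parallel minimum-weight-path procedures replace; the grid size, step count, and helper names are arbitrary choices for this sketch.

```python
import heapq, random

def invasion_percolation(n, steps, seed=0):
    """Sequential invasion percolation on an n x n grid: starting from
    the center, repeatedly invade the boundary site of minimum random
    weight.  The invaded cluster traces minimum-weight paths through the
    random environment, which is the connection the paper exploits."""
    rng = random.Random(seed)
    weight = {}
    def w(site):                      # lazily assigned i.i.d. site weights
        if site not in weight:
            weight[site] = rng.random()
        return weight[site]
    start = (n // 2, n // 2)
    cluster = {start}
    frontier = []                     # min-heap of (weight, boundary site)
    def push_neighbors(x, y):
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n and (nx, ny) not in cluster:
                heapq.heappush(frontier, (w((nx, ny)), (nx, ny)))
    push_neighbors(*start)
    while frontier and len(cluster) < steps:
        _, site = heapq.heappop(frontier)
        if site in cluster:           # a site may be pushed more than once
            continue
        cluster.add(site)
        push_neighbors(*site)
    return cluster

cluster = invasion_percolation(50, 200)
```
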
Fast Generation of Random Permutations Via Networks Simulation
 ALGORITHMICA
, 1998
Abstract

Cited by 8 (4 self)
We consider the problem of generating random permutations with uniform distribution. That is, we require that for an arbitrary permutation π of n elements, with probability 1/n! the machine halts with the ith output cell containing π(i), for 1 ≤ i ≤ n. We study this problem on two models of parallel computations: the CREW PRAM and the EREW PRAM. The main result of the paper is an algorithm for generating random permutations that runs in O(log log n) time and uses O(n^{1+o(1)}) processors on the CREW PRAM. This is the first o(log n)-time CREW PRAM algorithm for this problem. On the EREW PRAM we present a simple algorithm that generates a random permutation in time O(log n) using n processors and O(n) space. This algorithm outperforms each of the previously known algorithms for the exclusive write PRAMs. The common and novel feature of both our algorithms is first to design a suitable random switching network generating a permutation and then to simulate this network on the PRAM model in a fast way.
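For reference, the sequential baseline these parallel algorithms compete with is the Fisher–Yates shuffle, which meets the stated uniformity requirement (each of the n! permutations with probability exactly 1/n!) in O(n) time but is inherently sequential:

```python
import random
from collections import Counter

def fisher_yates(n, rng=random):
    """Sequential baseline: outputs each of the n! permutations with
    probability exactly 1/n!.  The parallel algorithms above achieve the
    same output distribution in O(log n) / O(log log n) time instead of
    the O(n) of this loop."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i + 1)      # uniform in {0, ..., i}
        perm[i], perm[j] = perm[j], perm[i]
    return perm

# empirical sanity check of uniformity for n = 3 (3! = 6 outcomes)
counts = Counter(tuple(fisher_yates(3)) for _ in range(6000))
```
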
Randomization helps to perform independent tasks reliably
 Random Structures and Algorithms
Abstract

Cited by 6 (1 self)
This paper is about algorithms that schedule tasks to be performed in a distributed failure-prone environment, when processors communicate by message-passing, and when tasks are independent and of unit length. The processors work under synchrony and may fail by crashing. Failure patterns are imposed by adversaries. Among the problems studied is how the power of the adversary affects the optimality of randomized algorithmic solutions. Linearly-bounded adversaries may fail up to a constant fraction of the processors. Weakly-adaptive adversaries have to select, prior to the start of an execution, a subset of processors to be failure-prone, and may then fail only the selected processors, at arbitrary steps, in the course of the execution. Strongly-adaptive adversaries have a total number of failures as the only restriction on failure patterns. The measures of complexity are work, measured as the available processor steps, and communication, measured as the number of point-to-point messages. A randomized algorithm is developed that attains both O(n log* n) expected work and O(n log* n) expected communication against weakly-adaptive linearly-bounded adversaries, in the case when the numbers of tasks and processors are both equal to n. This is in contrast with the performance of algorithms against strongly-adaptive linearly-bounded adversaries, which must be Ω(n log n / log log n) in terms of work. Key words: distributed algorithm, randomized algorithm, message passing, crash failures, adaptive adversary, independent tasks, load balancing, lower bound.
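A toy version of the task-scheduling setting, assuming crash-free execution and instantly shared knowledge of completed tasks (the paper's algorithms pay communication to spread that knowledge, and must cope with crashes), might read:

```python
import random

def do_all(n_tasks, live_procs, rng=random):
    """Toy synchronous Do-All: each live processor works through its own
    random permutation of the tasks, each step performing its next task
    not yet known to be done.  'done' is shared instantly here, which is
    the idealization; work is counted as available processor steps, the
    measure used in the abstract above."""
    done = [False] * n_tasks
    orders = [rng.sample(range(n_tasks), n_tasks) for _ in range(live_procs)]
    work = 0
    while not all(done):
        for order in orders:
            work += 1                      # one available processor step
            while order and done[order[-1]]:
                order.pop()                # skip tasks already completed
            if order:
                done[order.pop()] = True   # perform one task
    return work

w = do_all(32, 4)
```
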
The complexity of distributions
, 2010
Abstract

Cited by 6 (2 self)
Complexity theory typically studies the complexity of computing a function h: {0,1}^m → {0,1}^n of a given input x. We advocate the study of the complexity of generating the distribution h(x) for uniform x, given random bits. Our main results are: 1. Any function f: {0,1}^ℓ → {0,1}^n such that (i) each output bit f_i depends on o(log n) input bits, and (ii) ℓ ≤ log_2 (n choose αn) + n^{0.99}, has output distribution f(U) at statistical distance ≥ 1 − 1/n^{0.49} from the uniform distribution over n-bit strings of Hamming weight αn. We also prove lower bounds for generating (X, b(X)) for Boolean b, and in the case in which each bit f_i is a small-depth decision tree. These lower bounds seem to be the first of their kind; the proofs use anti-concentration results for the sum of random variables. 2. Lower bounds for generating distributions imply succinct data structure lower bounds. As a corollary of (1), we obtain the first lower bound for the membership problem of representing a set S ⊆ [n] of size αn, in the case where 1/α is a power of 2: if queries “i ∈ S?” are answered by non-adaptively probing o(log n) bits, then the representation uses ≥ log_2 (n choose αn) + Ω(log n) bits. 3. Upper bounds complementing the bounds in (1) for various settings of parameters. 4. Uniform randomized AC^0 circuits of poly(n) size and depth d = O(1) with error ε can be simulated by uniform randomized AC^0 circuits of poly(n) size and depth d + 1 with error ε + o(1) using ≤ (log n)^{O(log log n)} random bits. Previous derandomizations [Ajtai and Wigderson ’85; Nisan ’91] increase the depth by a constant factor, or else have poor seed length.
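The distance measure in result (1) is ordinary statistical (total variation) distance, which for explicit finite distributions can be computed directly:

```python
def statistical_distance(p, q):
    """Total variation (statistical) distance between two distributions,
    given as dicts mapping outcomes to probabilities -- the distance
    measure used in result (1) above."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support) / 2

# A fair bit versus a constant bit are at distance 1/2:
d = statistical_distance({'0': 0.5, '1': 0.5}, {'0': 1.0})
```
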
Randomized Range-Maxima in Nearly-Constant Parallel Time
, 1992
Abstract

Cited by 3 (1 self)
Given an array of n input numbers, the range-maxima problem is that of preprocessing the data so that queries of the type “what is the maximum value in subarray [i..j]?” can be answered quickly using one processor. We present a randomized preprocessing algorithm that runs in O(log n) time with high probability, using an optimal number of processors on a CRCW PRAM; each query can be processed in constant time by one processor. We also present a randomized algorithm for a parallel comparison model. Using an optimal number of processors, the preprocessing algorithm runs in O(α(n)) time with high probability; each query can be processed in O(α(n)) time by one processor. (As is standard, α(n) is the inverse of Ackermann's function.) A constant time query can be achieved by some slowdown in the performance of the preprocessing stage. Key words: parallel algorithms; randomized algorithms; PRAM; comparison model; range maximum; prefix maximum. Subject classifications: 68Q20.
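A sequential way to obtain the constant-time queries described above is the standard sparse-table construction (O(n log n) preprocessing work; the paper's contribution is performing comparable preprocessing fast in parallel with optimal work):

```python
def build_sparse_table(a):
    """Range-maxima preprocessing (sparse table): table[k][i] holds
    max(a[i .. i + 2^k - 1]).  O(n log n) work; each query then takes
    O(1) time via two overlapping power-of-two blocks."""
    n = len(a)
    table = [list(a)]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        table.append([max(prev[i], prev[i + half])
                      for i in range(n - (1 << k) + 1)])
        k += 1
    return table

def range_max(table, i, j):
    """Maximum of a[i..j] (inclusive) in O(1): cover [i..j] with two
    blocks of length 2^k that may overlap."""
    k = (j - i + 1).bit_length() - 1
    return max(table[k][i], table[k][j - (1 << k) + 1])

a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
t = build_sparse_table(a)
```
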
Efficient Sampling of Random Permutations
 in "J. on Discrete Algorithms", accepted
, 2005
Abstract

Cited by 2 (0 self)
We show how to uniformly distribute data at random (not to be confused with permutation routing) in two settings that are able to deal with massive data: coarse grained parallelism and external memory. In contrast to previously known work for parallel setups, our method is able to fulfill the three criteria of uniformity, work-optimality and balance among the processors simultaneously. To guarantee the uniformity we investigate the matrix of communication requests between the processors. We show that its distribution is a generalization of the multivariate hypergeometric distribution and we give algorithms to sample it efficiently in the two settings. Key words: random permutations, random shuffling, coarse grained parallelism, external memory algorithms, uniformly generated communication matrix
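The communication matrix in question can be sampled naively by materializing every destination slot and shuffling, as in the sketch below; the paper's point is to sample the same distribution directly via generalized hypergeometric draws, without touching all N items. Names and parameters here are illustrative.

```python
import random

def sample_comm_matrix(p, block, rng=random):
    """Sample the p x p communication matrix induced by a uniform random
    permutation of N = p*block items, where processor i initially holds
    'block' items and must also receive exactly 'block' items.  Entry
    [i][j] counts items moving from processor i to processor j.  Naive
    O(N) method: shuffle all destination slots explicitly."""
    slots = [j for j in range(p) for _ in range(block)]  # block slots per dest
    rng.shuffle(slots)
    matrix = [[0] * p for _ in range(p)]
    for i in range(p):
        for dest in slots[i * block:(i + 1) * block]:
            matrix[i][dest] += 1
    return matrix

m = sample_comm_matrix(4, 8)
```

Every row and every column of the sampled matrix sums to `block`, reflecting the balance criterion the abstract mentions.
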
Broadcast-Efficient Secure Multiparty Computation
Abstract

Cited by 1 (0 self)
Secure multiparty computation (MPC) is perhaps the most popular paradigm in the area of cryptographic protocols. It allows several mutually untrustworthy parties to jointly compute a function of their private inputs, without revealing to each other information about those inputs. In the case of unconditional (information-theoretic) security, protocols are known which tolerate a dishonest minority of players, who may coordinate their attack and deviate arbitrarily from the protocol specification. It is typically assumed in these results that parties are connected pairwise by authenticated, private channels, and that in addition they have access to a “broadcast” channel. Broadcast allows one party to send a consistent message to all other parties, guaranteeing consistency even if the broadcaster is corrupted. Because broadcast cannot be simulated on the point-to-point network when more than a third of the parties are corrupt, it is impossible to construct general MPC protocols in this setting without using a broadcast channel (or some equivalent addition to the model). A great deal of research has focused on increasing the efficiency of MPC, primarily in terms of round complexity and communication complexity. In this work we propose a refinement of the round complexity which we term broadcast complexity. We view the broadcast channel as an expensive resource ...