Results 1–3 of 3
Sequential Random Permutation, List Contraction and Tree Contraction are Highly Parallel
Abstract

Cited by 1 (1 self)
We show that simple sequential randomized iterative algorithms for random permutation, list contraction, and tree contraction are highly parallel. In particular, if iterations of the algorithms are run as soon as all of their dependencies have been resolved, the resulting computations have logarithmic depth (parallel time) with high probability. Our proofs make an interesting connection between the dependence structure of two of the problems and random binary trees. Building upon this analysis, we describe linear-work, polylogarithmic-depth algorithms for the three problems. Although asymptotically no better than the many prior parallel algorithms for the given problems, their advantages include very simple and fast implementations, and returning the same result as the sequential algorithm. Experiments on a 40-core machine show reasonably good performance relative to the sequential algorithms.
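The dependency idea in this abstract can be illustrated with a small sketch. The code below is not from the paper: it runs the standard sequential Knuth shuffle (iteration i swaps A[i] with A[H[i]] for H[i] uniform in [0, i]) and tracks a simplified notion of dependency depth, where iteration i depends on the most recent earlier iteration that touched either of the two array slots it swaps. The function name and the depth bookkeeping are my own; the abstract's claim is that this depth is logarithmic with high probability.

```python
import random

def knuth_shuffle_with_depth(n, seed=0):
    """Sequential Knuth shuffle (ascending variant) over [0, n).

    Also computes the depth of a simplified dependency DAG:
    iteration i depends on the latest earlier iteration that
    accessed slot i or slot H[i].  Returns (permutation, depth).
    """
    rng = random.Random(seed)
    A = list(range(n))
    last = [0] * n          # dependency depth of the last iteration touching each slot
    depth = 0
    for i in range(1, n):
        h = rng.randint(0, i)          # H[i] uniform in [0, i]
        A[i], A[h] = A[h], A[i]        # the sequential swap
        d = 1 + max(last[i], last[h])  # this iteration sits one level deeper
        last[i] = last[h] = d
        depth = max(depth, d)
    return A, depth
```

Running it on a large input shows a valid permutation whose dependency depth is far smaller than n, consistent with the logarithmic-depth claim (under the simplified dependency rule assumed here).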
Is Your Permutation Algorithm Unbiased for n ≠ 2^m?
Abstract
Many papers on parallel random permutation algorithms assume the input size n to be a power of two and imply that these algorithms can be easily generalized to arbitrary n, e.g., by padding the input array to a power of two. We show that this simplifying assumption is not necessarily correct, since it may result in a bias (i.e., not all possible permutations are generated with equal likelihood). Many of these algorithms are, however, consistent, i.e., iterating them ultimately converges to an unbiased permutation. We prove this convergence and show that it is exponentially fast. Furthermore, we present an analysis of iterating a butterfly permutation network, which works in-place and is well-suited for implementation on many-core systems such as GPUs. We also show a method that improves the convergence speed even further and yields a practical implementation of the permutation network on current GPUs.
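The bias the abstract warns about is easy to see on a toy instance. The sketch below is my own illustration, not the authors' implementation: it enumerates every switch setting of a single pass of a 4-input butterfly network (assumed switch pairs (0,2),(1,3) in stage one and (0,1),(2,3) in stage two, each switch swapping its pair with probability 1/2). One pass has only 2^4 = 16 equally likely outcomes, so it cannot cover the 4! = 24 permutations uniformly, which is exactly why iterating the network is needed to converge to an unbiased permutation.

```python
from itertools import product
from collections import Counter

def butterfly_pass(perm, bits):
    """One pass of a 4-input butterfly network.

    Stage 1 switches pairs (0, 2) and (1, 3); stage 2 switches
    pairs (0, 1) and (2, 3).  Each bit decides whether its
    switch swaps its pair.  (Pairings assumed for illustration.)
    """
    a = list(perm)
    pairs = [(0, 2), (1, 3), (0, 1), (2, 3)]
    for bit, (i, j) in zip(bits, pairs):
        if bit:
            a[i], a[j] = a[j], a[i]
    return tuple(a)

# Enumerate all 2^4 switch settings for one pass on the identity.
counts = Counter(butterfly_pass(range(4), bits)
                 for bits in product((0, 1), repeat=4))
# Only 16 distinct permutations are reachable, each with probability
# 1/16; the other 8 of the 24 permutations never occur in one pass.
```

Composing independent passes enlarges the reachable set and, per the abstract's consistency result, the output distribution converges exponentially fast to uniform.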
Shared-memory parallelism can be simple, ...
2015
Abstract
Parallelism is the key to achieving high performance in computing. However, writing efficient and scalable parallel programs is notoriously difficult and often requires significant expertise. To address this challenge, it is crucial to provide programmers with high-level tools that enable them to develop solutions efficiently, and at the same time to emphasize the theoretical and practical aspects of algorithm design so that the solutions developed run efficiently under all possible settings. This thesis addresses this challenge using a three-pronged approach consisting of the design of shared-memory programming techniques, frameworks, and algorithms for important problems in computing. The thesis provides evidence that with appropriate programming techniques, frameworks, and algorithms, shared-memory programs can be simple, fast, and scalable, both in theory and in practice. The results developed in this thesis serve to ease the transition into the multicore era. The first part of this thesis introduces tools and techniques for deterministic ...