A Comparison of Sorting Algorithms for the Connection Machine CM-2
Cited by 185 (7 self)
We have implemented three parallel sorting algorithms on the Connection Machine Supercomputer model CM-2: Batcher's bitonic sort, a parallel radix sort, and a sample sort similar to Reif and Valiant's flashsort. We have also evaluated the implementation of many other sorting algorithms proposed in the literature. Our computational experiments show that the sample sort algorithm, which is a theoretically efficient "randomized" algorithm, is the fastest of the three algorithms on large data sets. On a 64K-processor CM-2, our sample sort implementation can sort $32 \times 10^6$ 64-bit keys in 5.1 seconds, which is over 10 times faster than the CM-2 library sort. Our implementation of radix sort, although not as fast on large data sets, is deterministic, much simpler to code, stable, faster with small keys, and faster on small data sets (few elements per processor). Our implementation of bitonic sort, which is pipelined to use all the hypercube wires simultaneously, is the least efficient of the three on large data sets, but is the most efficient on small data sets, and is considerably more space efficient. This paper analyzes the three algorithms in detail and discusses many practical issues that led us to the particular implementations.
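For readers unfamiliar with sample sort, the following is a minimal sequential sketch of the general technique (sample, pick splitters, route keys to buckets, sort each bucket). It is not the CM-2 implementation described in the abstract; the function name `sample_sort` and the parameters `num_buckets`, `oversample`, and `seed` are assumptions made for illustration, with each bucket standing in for one processor.

```python
import random
from bisect import bisect_right


def sample_sort(keys, num_buckets=4, oversample=8, seed=0):
    """Sequential sketch of sample sort (hypothetical helper, not the CM-2 code)."""
    keys = list(keys)
    # Tiny inputs: not enough keys to draw a meaningful sample.
    if len(keys) <= num_buckets * oversample:
        return sorted(keys)

    rng = random.Random(seed)
    # 1. Draw a random sample and sort it.
    sample = sorted(rng.sample(keys, num_buckets * oversample))
    # 2. Every `oversample`-th sample element becomes a splitter.
    splitters = [sample[i * oversample] for i in range(1, num_buckets)]
    # 3. Route each key to the bucket whose splitter range contains it.
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[bisect_right(splitters, k)].append(k)
    # 4. Sort buckets independently (one per processor in a parallel
    #    setting) and concatenate them in bucket order.
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out
```

Oversampling (drawing several sample keys per bucket) is the usual way to keep bucket sizes balanced with high probability; the values chosen here are arbitrary.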
Efficient algorithms for geometric optimization
ACM Comput. Surv., 1998
Cited by 117 (12 self)
We review the recent progress in the design of efficient algorithms for various problems in geometric optimization. We present several techniques used to attack these problems, such as parametric searching, geometric alternatives to parametric searching, prune-and-search techniques for linear programming and related problems, and LP-type problems and their efficient solution. We then describe a variety of applications of these and other techniques to numerous problems in geometric optimization, including facility location, proximity problems, statistical estimators and metrology, placement and intersection of polygons and polyhedra, and ray shooting and other query-type problems.
Lossless condensers, unbalanced expanders, and extractors
In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, 2001
Cited by 96 (20 self)
Trevisan showed that many pseudorandom generator constructions give rise to constructions of explicit extractors. We show how to use such constructions to obtain explicit lossless condensers. A lossless condenser is a probabilistic map using only O(log n) additional random bits that maps n-bit strings to poly(log K)-bit strings, such that any source with support size K is mapped almost injectively to the smaller domain. Our construction remains the best lossless condenser to date. By composing our condenser with previous extractors, we obtain new, improved extractors. For small enough min-entropies our extractors can output all of the randomness with only O(log n) bits. We also obtain a new disperser that works for every entropy loss, uses an O(log n)-bit seed, and has only O(log n) entropy loss. This is the best disperser construction to date, and yields other applications. Finally, our lossless condenser can be viewed as an unbalanced
Randomized routing and sorting on fixed-connection networks
Journal of Algorithms, 1994
Cited by 89 (13 self)
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized online algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
Expanders that Beat the Eigenvalue Bound: Explicit Construction and Applications
Combinatorica, 1993
Cited by 87 (24 self)
For every $n$ and $0 < \delta < 1$, we construct graphs on $n$ nodes such that every two sets of size $n^{\delta}$ share an edge, having essentially optimal maximum degree $n^{1-\delta+o(1)}$. Using known and new reductions from these graphs, we explicitly construct: 1. A $k$-round sorting algorithm using $n^{1+1/k+o(1)}$ comparisons. 2. A $k$-round selection algorithm using $n^{1+1/(2^k-1)+o(1)}$ comparisons. 3. A depth-2 superconcentrator of size $n^{1+o(1)}$. 4. A depth-$k$ wide-sense nonblocking generalized connector of size $n^{1+1/k+o(1)}$. All of these results improve on previous constructions by factors of $n^{\Omega(1)}$, and are optimal to within factors of $n^{o(1)}$. These results are based on an improvement to the extractor construction of Nisan & Zuckerman: our algorithm extracts an asymptotically optimal number of random bits from a defective random source using a small additional number of truly random bits.
Communication-Efficient Parallel Sorting
1996
Cited by 73 (3 self)
We study the problem of sorting $n$ numbers on a $p$-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most $h$ items in any round. We provide parallel sorting methods that use internal computation time that is $O((n \log n)/p)$ and a number of communication rounds that is $O(\log n / \log(h+1))$ for $h = \Theta(n/p)$. The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when $p \le n^{1-1/c}$ for a constant $c \ge 1$. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for $p$, for we show that just computing the "or" of $n$ bits distributed evenly to the first $O(n/h)$ of an arbitrary number of processors in a BSP computer requires $\Omega(\log n / \log(h$...
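The claim that the number of communication rounds is constant in the practical regime follows from a short calculation. The sketch below assumes the bounds as stated in the abstract (rounds $O(\log n / \log(h+1))$, $h = \Theta(n/p)$, $p \le n^{1-1/c}$); the constant $a > 0$ hidden in the $\Theta$ is an illustrative assumption.

```latex
\begin{align*}
p \le n^{1-1/c}
  &\;\Longrightarrow\; \frac{n}{p} \ge n^{1/c}, \\
h \ge a\,\frac{n}{p}
  &\;\Longrightarrow\; \log(h+1) \ge \frac{1}{c}\log n + \log a
     = \Omega\!\left(\frac{\log n}{c}\right), \\
  &\;\Longrightarrow\; \frac{\log n}{\log(h+1)} = O(c).
\end{align*}
```

For a fixed constant $c$, the round count is therefore $O(1)$.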
Deterministic Sorting in Nearly Logarithmic Time on the Hypercube and Related Computers
Journal of Computer and System Sciences, 1996
Cited by 68 (10 self)
This paper presents a deterministic sorting algorithm, called Sharesort, that sorts $n$ records on an $n$-processor hypercube, shuffle-exchange, or cube-connected cycles in $O(\log n\,(\log\log n)^2)$ time in the worst case. The algorithm requires only a constant amount of storage at each processor. The fastest previous deterministic algorithm for this problem was Batcher's bitonic sort, which runs in $O(\log^2 n)$ time.
Supported by an NSERC postdoctoral fellowship, and DARPA contracts N0001487K825 and N00014 89J1988.
1 Introduction
Given $n$ records distributed uniformly over the $n$ processors of some fixed interconnection network, the sorting problem is to route the record with the $i$th largest associated key to processor $i$, $0 \le i < n$. One of the earliest parallel sorting algorithms is Batcher's bitonic sort [3], which runs in $O(\log^2 n)$ time on the hypercube [10], shuffle-exchange [17], and cube-connected cycles [14]. More recently, Leighton [9] exhibited a bounded-degree,...
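As a point of reference for the $O(\log^2 n)$ bound quoted for Batcher's bitonic sort, here is a minimal sequential sketch of the bitonic sorting network. It is not Sharesort and not any implementation from the paper; the function name `bitonic_sort` is an assumption. Each pass of the inner `for` loop corresponds to one parallel compare-exchange step, and there are $\log n(\log n + 1)/2$ such steps.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length list with Batcher's bitonic network (sketch)."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"

    k = 2
    while k <= n:                # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:            # compare-exchange distance for this step
            for i in range(n):   # one parallel step: all pairs are independent
                partner = i ^ j
                if partner > i:
                    # Blocks alternate between ascending and descending order.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

On a hypercube, index `i ^ j` is a neighbor of `i` along one dimension, which is why each step maps onto a single round of neighbor communication.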
Online algorithms for path selection in a nonblocking network
SIAM Journal on Computing, 1996
Cited by 63 (14 self)
This paper presents the first optimal-time algorithms for path selection in an optimal-size nonblocking network. In particular, we describe an N-input, N-output, nonblocking network with O(N log N) bounded-degree nodes, and an algorithm that can satisfy any request for a connection or disconnection between an input and an output in O(log N) bit steps, even if many requests are made at once. Viewed in a telephone switching context, the algorithm can put through any set of calls among N parties in O(log N) bit steps, even if many calls are placed simultaneously. Parties can hang up and call again whenever they like; every call is still put through O(log N) bit steps after being placed. Viewed in a distributed memory machine context, our algorithm allows any processor to access any idle block of memory within O(log N) bit steps, no matter what other connections have been made previously or are being made simultaneously.
Privacy-Preserving Access of Outsourced Data via Oblivious RAM Simulation
ArXiv e-prints, April 2011. Eprint 1007.1259v2
Cited by 63 (9 self)
Suppose a client, Alice, has outsourced her data to an external storage provider, Bob, because he has capacity for her massive data set, of size $n$, whereas her private storage is much smaller, say, of size $O(n^{1/r})$, for some constant $r > 1$. Alice trusts Bob to maintain her data, but she would like to keep its contents private. She can encrypt her data, of course, but she also wishes to keep her access patterns hidden from Bob. We describe schemes for the oblivious RAM simulation problem with a small logarithmic or polylogarithmic amortized increase in access times, with a very high probability of success, while keeping the external storage to be of size $O(n)$. To achieve this, our algorithmic contributions include a parallel MapReduce cuckoo-hashing algorithm and an external-memory data-oblivious sorting algorithm.
Two applications of inductive counting for complementation problems
SIAM Journal on Computing, 1989
Cited by 60 (3 self)
... nondeterministic space-bounded complexity classes are closed under complementation, two further applications of the inductive counting technique are developed. First, an errorless probabilistic algorithm for the undirected graph s-t connectivity problem that runs in O(log n) space and polynomial expected time is given. Then it is shown that the class LOGCFL is closed under complementation. The latter is a special case of a general result that shows closure under complementation of classes defined by semi-unbounded fan-in circuits (or, equivalently, nondeterministic auxiliary pushdown automata or tree-size bounded alternating Turing machines). As one consequence, it is shown that small numbers of "role switches" in two-person pebbling can be eliminated.