## Ultra-fast expected time parallel algorithms (1991)

Venue: Proc. of the 2nd SODA

Citations: 20 (3 self)

### BibTeX

@INPROCEEDINGS{Mackenzie91ultra-fastexpected,
  author = {Philip D. Mackenzie and Quentin F. Stout},
  title = {Ultra-fast expected time parallel algorithms},
  booktitle = {Proc. of the 2nd SODA},
  year = {1991},
  pages = {414--423}
}

### Abstract

It has been shown previously that sorting n items into n locations with a polynomial number of processors requires Ω(log n/log log n) time. We sidestep this lower bound with the idea of Padded Sorting, or sorting n items into n + o(n) locations. Since many problems do not rely on the exact rank of sorted items, a Padded Sort is often just as useful as an unpadded sort. Our algorithm for Padded Sort runs on the Tolerant CRCW PRAM and takes Θ(log log n/log log log n) expected time using n log log log n/log log n processors, assuming the items are taken from a uniform distribution. Using similar techniques we solve some computational geometry problems, including Voronoi Diagram, with the same processor and time bounds, assuming points are taken from a uniform distribution in the unit square. Further, we present an Arbitrary CRCW PRAM algorithm to solve the Closest Pair problem in constant expected time with n processors regardless of the distribution of points. All of these algorithms achieve linear speedup in expected time over their optimal serial counterparts.

Footnote 1: Research done while at the University of Michigan and supported by an AT&T Fellowship.
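To make the output format of a Padded Sort concrete, the following is a minimal serial sketch in Python. It is not the parallel algorithm of the paper: it only illustrates placing n values, assumed uniform on [0, 1], in sorted order into an array of n + o(n) slots with an empty marker in the unfilled ones. The function name `padded_sort`, the choice of about √n bins, and the padding rule are all assumptions made for this illustration, and `None` stands in for NULL.

```python
import math


def padded_sort(values):
    """Serial illustration of a padded sort for values assumed uniform on [0, 1].

    Returns a list of length >= len(values) whose non-None entries are the
    input values in sorted order; None marks an unfilled (padding) slot.
    """
    n = len(values)
    if n == 0:
        return []
    # Assumption for this sketch: about sqrt(n) equal-width bins.
    num_bins = max(1, math.isqrt(n))
    expected = math.ceil(n / num_bins)  # expected items per bin
    bins = [[] for _ in range(num_bins)]
    for v in values:
        # Clamp so that out-of-range inputs still land in a valid bin.
        idx = min(max(int(v * num_bins), 0), num_bins - 1)
        bins[idx].append(v)
    # Assumed padding rule: sublinear slack per bin; grow the capacity if an
    # unusually full bin would otherwise overflow.
    padding = math.ceil(expected ** 0.75)
    capacity = max(expected + padding, max(len(b) for b in bins))
    out = []
    for b in bins:
        b.sort()
        out.extend(b)
        out.extend([None] * (capacity - len(b)))
    return out
```

Because the bins cover increasing value ranges, concatenating the individually sorted bins leaves the non-empty entries in global sorted order. For uniform inputs the total padding in this sketch is roughly n^(7/8) slots, which is o(n); for skewed inputs the capacity fallback keeps the output correct but may use more space.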

### Citations

1764 |
Computational Geometry: An Introduction
- Preparata, Shamos
- 1985
Citation Context ... S contains O(n^0.3) points, all the Outer Voronoi Cells can be found simultaneously, and this completes the construction of the Voronoi Diagram. 4.4 Delaunay Triangulation. From Preparata and Shamos [28], we know that the Delaunay Triangulation is simply the straight line dual of the Voronoi Diagram. Thus we can find the Voronoi Diagram as above, and easily construct the dual in Θ(log log n/log...

284 |
Parallel merge sort
- Cole
- 1988
Citation Context ... product of an algorithm is O(n), the PT-optimality is obvious, and we will sometimes omit this indication. Leighton's [25] modification to the AKS sorting network [3], and Cole's parallel merge sort [13] both use n processors and achieve Θ(log n) worst case time for sorting, which is PT-optimal. We note that any PT-optimal algorithm for sorting must use at least log n time [4]. Reischuk [33] giv...

256 |
Fast probabilistic algorithms for hamiltonian circuits and matchings
- Angluin, Valiant
- 1979
Citation Context ...which is the sum of n independent random variables. For a binomial random variable Z ~ B(n, p), where Z is the sum of n independent Bernoulli trials with probability of success p, Angluin and Valiant [5] show that for 0 < β < 1, one can obtain the bounds P(Z ≥ (1 + β)np) ≤ e^(−β²np/3) and P(Z ≤ (1 − β)np) ≤ e^(−β²np/2). From this we obtain the bound P(Z ≥ 2np) ≤ 2^(−4np/9). Also, fo...

209 |
An O(n log n) sorting network
- Ajtai, Komlós, et al.
- 1983
Citation Context ...lower bound. When the processor time product of an algorithm is O(n), the PT-optimality is obvious, and we will sometimes omit this indication. Leighton's [25] modification to the AKS sorting network [3], and Cole's parallel merge sort [13] both use n processors and achieve Θ(log n) worst case time for sorting, which is PT-optimal. We note that any PT-optimal algorithm for sorting must use at le...

167 |
Tight Bounds on the Complexity of parallel sorting
- Leighton
Citation Context ...rocessors used, the time is equal to a known lower bound. When the processor time product of an algorithm is O(n), the PT-optimality is obvious, and we will sometimes omit this indication. Leighton's [25] modification to the AKS sorting network [3], and Cole's parallel merge sort [13] both use n processors and achieve Θ(log n) worst case time for sorting, which is PT-optimal. We note that any PT-...

118 |
The Art of Computer Programming, Volume 1
- Knuth
- 1998
Citation Context ...ger than all the numbers stored at positions before k. The number of maximal positions is then equal to the number of maximal points. The analysis of the number of maximal positions is given in Knuth [23]. The average is less than log n and the standard deviation is less than √(log n). Then by Chebyshev's Inequality, the probability that there are more than log² n extreme points is O(1/log³ n). □ We...

105 |
Parallelism in comparison problems
- Valiant
- 1975
Citation Context ... of processors and runs in O((log log n)^O(1)) time. Some examples of problems with known ultra-fast parallel algorithms include merging two lists of size n [24] and finding the maximum of n numbers [36]. We also define an ultra-fast expected time parallel algorithm as one which uses at most a linear number of processors and runs in O((log log n)^O(1)) expected time. In this paper, we will develop u...

92 |
Probabilistic algorithms
- Rabin
- 1976
Citation Context ...unay Triangulation and Largest Empty Circle algorithms follow immediately. Katajainen, Nevalainen, and Teuhola [22] exhibit a linear expected time algorithm for the Relative Neighborhood Graph. Rabin [29] has given a randomized linear expected time algorithm for Closest Pair which is distribution independent. There has been a great amount of work on parallel algorithms for distribution independent ver...

88 |
Optimal expected-time algorithms for closest point problems
- Bentley, Weide, et al.
- 1980
Citation Context ...unds on the expected times of their solutions, and serial algorithms have been developed for all of them which attain this lower bound. The solution for sorting is well known. Bentley, Weide, and Yao [7] exhibit linear expected time algorithms for All Nearest Neighbors and Voronoi Diagram, and linear expected time Delaunay Triangulation and Largest Empty Circle algorithms follow immediately. Katajain...

78 |
Expected length of the longest probe sequence in hash code searching
- Gonnet
- 1981
Citation Context ...d parallelization of the above technique does not lead to an efficient algorithm. The first problem is placing items in bins. The expected maximum number of items in a bin is Θ(log n/log log n) [19], and thus it would take that many naive attempts before all items were placed in bins. The other problem is that assuming each processor took one bin to sort, the expected maximum time would be Ω...

61 |
Parallel computational geometry
- Aggarwal, Chazelle, et al.
- 1988
Citation Context ...lems in Θ(log log n/log log log n) expected time with linear speedup (n log log log n/log log n processors). Padded Sort: Given n values taken from a uniform distribution over the unit interval [0, 1], arrange them in sorted order in an array of size n + o(n), with the value NULL in all unfilled locations. All Nearest Neighbors: Given a set S of n points taken from a uniform distribution over the uni...

61 |
Optimal and sublogarithmic time randomized parallel sorting algorithms
- Rajasekaran, Reif
- 1989
Citation Context ...ote that any PT-optimal algorithm for sorting must use at least log n time [4]. Reischuk [33] gives a PT-optimal randomized n processor, Θ(log n) time algorithm for sorting. Rajasekaran and Reif [30] give a randomized algorithm for general sorting which achieves Θ(log n/log log n) time with n log^ε n processors for any ε > 0, which is optimal, a randomized algorithm for integer sorting ...

59 |
Approximate and Exact Parallel Scheduling with Applications to List, Tree and
- Cole, Vishkin
- 1986
Citation Context ...is addition, and the input array consists of n numbers, each of O(log n) bits, the prefix operation can be performed in Θ(log n/log log n) time with n log log n/log n processors on a CRCW PRAM [16]. Compression, in which m marked records out of a total of n records must be compressed to the front of the output array, can easily be reduced to prefix addition, and thus can be performed in the sam...

57 |
Probabilistic Parallel Algorithms for Sorting and Selection
- Reischuk
- 1985
Citation Context ...ort [13] both use n processors and achieve Θ(log n) worst case time for sorting, which is PT-optimal. We note that any PT-optimal algorithm for sorting must use at least log n time [4]. Reischuk [33] gives a PT-optimal randomized n processor, Θ(log n) time algorithm for sorting. Rajasekaran and Reif [30] give a randomized algorithm for general sorting which achieves Θ(log n/log log n) ...

48 |
A theorem on probabilistic constant depth computations
- Ajtai, Ben-Or
- 1984
Citation Context ...d Sort. Beame and Hastad [6] have shown that finding the parity of n bits in any PRAM model requires Ω(log n/log log n) time using any polynomial number of processors. From Ajtai and Ben-Or [2] and Chandra, Stockmeyer, and Vishkin [11], we can see that this lower bound applies even when randomization is allowed and/or the bits are chosen at random. This implies that sorting n items into n l...

45 |
Towards a Theory of Nearly Constant Time Parallel Algorithms
- Gil, Matias, et al.
- 1991
Citation Context ... processors, and a random permutation can be constructed in Θ(log n/log log n) expected time with n log log n/log n processors. This result is not optimal, as shown by Gil, Matias, and Vishkin [17], who give an algorithm to construct a random permutation in Θ(log* n) expected time using n/log* n processors, where log^(1) n ≡ log n, log^(i) n ≡ log(log^(i−1) n) for i > 1, and log* n ≡ mi...

42 |
Optimal bounds for decision problems
- Beame, Hastad
- 1989
Citation Context .../4. Proof: For k ≥ 16, C(n, k)(2/n)^k (1 − 2/n)^(n−k) ≤ C(n, k)(2/n)^k ≤ (en/k)^k (2/n)^k = (2e/k)^k ≤ 2^(−k(log k − log 2e)) ≤ e^(−(k log k)/4). □ 3 Padded Sort. Beame and Hastad [6] have shown that finding the parity of n bits in any PRAM model requires Ω(log n/log log n) time using any polynomial number of processors. From Ajtai and Ben-Or [2] and Chandra, Stockmeyer...

37 |
Highly Parallelizable Problems
- Berkman, Breslauer, et al.
- 1989
Citation Context ...Circle can be found in Θ(log n) worst case time with n processors given the Voronoi Diagram, so the time and processor bounds above also apply to finding the Largest Empty Circle. Berkman et al. [8] give n/log log n processor, Θ(log log n) worst case time algorithms for some other geometric problems, but these problems are highly constrained. On the other hand, we solve much more general p...

37 |
Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values
- Berkman, Schieber, et al.
- 1993
Citation Context ...ssed in Θ(log n) time [18]. We will call this a marked compression. When ⊕ is maximum or minimum, the prefix operation can be performed in Θ(log log n) time using n/log log n processors [9, 34]. This obviously implies that the maximum or minimum of n elements can be found in Θ(log log n) time with n/log log n processors [36]. However, if we can use n^(1+b) processors, for any b > 0, the...

25 |
Waste makes haste: Tight bounds for loose parallel sorting
- Hagerup, Raman
- 1992
Citation Context ...rithm to sort n random integers in the range [1, n]. Following the work in this paper, MacKenzie [27] has proven a lower bound of Ω(log* n) expected time for Padded Sort and Hagerup and Raman [21] have shown that this is optimal by giving an O(log* n) expected time algorithm for Padded Sort. We note that their algorithm also improves on the algorithm given here in that it does not rely on any a...

24 |
Random sampling techniques and parallel algorithms design
- Rajasekaran, Sen
- 1993
Citation Context ...ound on the probability that 4 log n items fall into any block and multiply this by the number of blocks. □ The ideas in the following lemma were previously used in Stout [35] and Rajasekaran and Sen [31]. Lemma 2.3. For any b > 0, n processors can each be allocated a position in an array of n^(1+b) positions in constant time (which depends on b) with probability of failure less than 1/n in the CRCW PRAM ...

20 |
Recursive *-tree parallel data-structure
- Berkman, Vishkin
- 1989
Citation Context ...rom [0, 1] and performing a Padded Sort, we will obviously be left with the processors in random order. We can easily obtain a random cycle of the processors from this using an algorithm for chaining [10], and we can obtain a random permutation of the processors by compressing the padded list using a prefix sum operation. (In a random cycle of processors, each processor contains a link to another proc...

19 |
Optimal parallel algorithms for polygon and point-set problems
- Cole, Goodrich
- 1992
Citation Context ...(log n/log log n) time with n(log log n)²/log n processors, and a PT-optimal randomized algorithm for integer sorting which achieves Θ(log n) time with n/log n processors. Cole and Goodrich [14] and Willard and Wee [37] both present PT-optimal n processor, Θ(log n) worst case time algorithms for solving the All Nearest Neighbor and Closest Pair problems. Aggarwal et al. [1] present an n...

18 |
The Average Complexity of Deterministic and Randomized Parallel Comparison Sorting Algorithms
- Alon, Azar
- 1987
Citation Context ...rallel merge sort [13] both use n processors and achieve Θ(log n) worst case time for sorting, which is PT-optimal. We note that any PT-optimal algorithm for sorting must use at least log n time [4]. Reischuk [33] gives a PT-optimal randomized n processor, Θ(log n) time algorithm for sorting. Rajasekaran and Reif [30] give a randomized algorithm for general sorting which achieves Θ(log...

13 |
Load balancing requires Ω(log* n) expected time
- MacKenzie
- 1992
Citation Context ...g has been worked on by Chlebus [12]. He obtains a Θ(log n) expected time, n/log n processor algorithm to sort n random integers in the range [1, n]. Following the work in this paper, MacKenzie [27] has proven a lower bound of Ω(log* n) expected time for Padded Sort and Hagerup and Raman [21] have shown that this is optimal by giving an O(log* n) expected time algorithm for Padded Sort. W...

13 |
Polling: a new randomized sampling technique for computational geometry
- Reif, Sen
- 1989
Citation Context ...algorithms for finding the Voronoi Diagram in Θ(log² n) worst case time with n/log n processors, and in Θ(log n log log n) worst case time with n log n/log log n processors. Reif and Sen [32] have recently given PT-optimal n processor, Θ(log n) expected time randomized algorithms for constructing the Voronoi Diagram and finding All Nearest Neighbors. We note that the Largest Empty Ci...

12 |
Merging free trees in parallel for efficient Voronoi diagram construction
- Cole, Goodrich, et al.
- 1990
Citation Context ...e All Nearest Neighbor and Closest Pair problems. Aggarwal et al. [1] present an n processor, Θ(log² n) worst case time algorithm for finding the Voronoi Diagram. Cole, Goodrich, and O'Dunlaing [15] give algorithms for finding the Voronoi Diagram in Θ(log² n) worst case time with n/log n processors, and in Θ(log n log log n) worst case time with n log n/log log n processors. Reif an...

12 |
Design and analysis of some parallel algorithms
- Schieber
- 1987
Citation Context ...ssed in Θ(log n) time [18]. We will call this a marked compression. When ⊕ is maximum or minimum, the prefix operation can be performed in Θ(log log n) time using n/log log n processors [9, 34]. This obviously implies that the maximum or minimum of n elements can be found in Θ(log log n) time with n/log log n processors [36]. However, if we can use n^(1+b) processors, for any b > 0, the...

11 |
Every ROBUST CRCW PRAM can Efficiently Simulate a PRIORITY PRAM
- Hagerup, Radzik
- 1990
Citation Context ...An algorithm for the Tolerant CRCW PRAM implies algorithms with the same time and processor bounds on stronger models, such as the Collision and Arbitrary models (see, for example, Hagerup and Radzik [20]). For the Closest Pair algorithm we will be using the Arbitrary CRCW PRAM model. In this model, if two or more processors write to a cell simultaneously, the one which succeeds in writing is chosen a...

8 |
Parallel iterated bucket sort
- Chlebus
- 1989
Citation Context ...from a uniform distribution over the unit square can also be constructed in Θ(log n) expected time with n/log n processors. Distribution dependent parallel sorting has been worked on by Chlebus [12]. He obtains a Θ(log n) expected time, n/log n processor algorithm to sort n random integers in the range [1, n]. Following the work in this paper, MacKenzie [27] has proven a lower bound of Ω...

7 |
Complexity theory for unbounded fan-in parallelism
- Chandra, Stockmeyer, et al.
- 1982
Citation Context ...at finding the parity of n bits in any PRAM model requires Ω(log n/log log n) time using any polynomial number of processors. From Ajtai and Ben-Or [2] and Chandra, Stockmeyer, and Vishkin [11], we can see that this lower bound applies even when randomization is allowed and/or the bits are chosen at random. This implies that sorting n items into n locations requires Ω(log n/log l...

7 |
Counting and Packing in Parallel
- Gil, Rudolph
Citation Context ...e reduced to prefix addition, and thus can be performed in the same time bounds. Using only the processors assigned to the marked records, those marked records can be compressed in Θ(log n) time [18]. We will call this a marked compression. When ⊕ is maximum or minimum, the prefix operation can be performed in Θ(log log n) time using n/log log n processors [9, 34]. This obviously implies...

5 |
An Optimal Expected-time Parallel Algorithm for Voronoi Diagrams
- Levcopoulos, Katajainen, et al.
Citation Context ...allest enclosing circle, and closest pair can all be found in constant expected time with n processors. These results can also be extended to more general regions. Levcopoulos, Katajainen, and Lingas [26] show that the Voronoi Diagram of n points taken from a uniform distribution over the unit square can be constructed in Θ(log n) expected time with n/log n processors. Katajainen, Nevalainen, an...

3 |
Searching, merging, and sorting
- Kruskal
- 1983
Citation Context ...lgorithm as one which uses a linear number of processors and runs in O((log log n)^O(1)) time. Some examples of problems with known ultra-fast parallel algorithms include merging two lists of size n [24] and finding the maximum of n numbers [36]. We also define an ultra-fast expected time parallel algorithm as one which uses at most a linear number of processors and runs in O((log log n)^O(1)) expec...

3 |
Constant-time geometry on PRAMs
- Stout
- 1988
Citation Context ...nowledge of input distribution to obtain linear speedup and o(log log n) expected time solutions. Two groups have previously done work on parallel distribution dependent expected-time geometry. Stout [35] shows that given n points taken from a uniform distribution over the unit square, the maximal points, extreme points, diameter, smallest enclosing rectangle, smallest enclosing circle, and closest pa...

2 |
A linear expected-time algorithm for computing planar relative neighbourhood graphs
- Katajainen, Nevalainen, et al.
- 1987
Citation Context ...e algorithms for All Nearest Neighbors and Voronoi Diagram, and linear expected time Delaunay Triangulation and Largest Empty Circle algorithms follow immediately. Katajainen, Nevalainen, and Teuhola [22] exhibit a linear expected time algorithm for the Relative Neighborhood Graph. Rabin [29] has given a randomized linear expected time algorithm for Closest Pair which is distribution independent. Ther...

2 |
Quasi-valid range querying and its implications for nearest neighbor problems
- Willard, Wee
- 1988
Citation Context ...ith n(log log n)²/log n processors, and a PT-optimal randomized algorithm for integer sorting which achieves Θ(log n) time with n/log n processors. Cole and Goodrich [14] and Willard and Wee [37] both present PT-optimal n processor, Θ(log n) worst case time algorithms for solving the All Nearest Neighbor and Closest Pair problems. Aggarwal et al. [1] present an n processor, Θ(log² ...

1 |
Load balancing requires Ω(log* n) expected time
- MacKenzie
- 1992
Citation Context ... sorting has been worked on by Chlebus [12]. He obtains a Θ(log n) expected time, n/log n processor algorithm to sort n random integers in the range [1, n]. Following the work in this paper, MacKenzie [27] has proven a lower bound of Ω(log* n) expected time for Padded Sort and Hagerup and Raman [21] have shown that this is optimal by giving an O(log* n) expected time algorithm for Padded Sort. We n...
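Several of the bounds quoted in the citation contexts are stated in terms of log* n, the iterated logarithm. As a small illustration of how slowly it grows, here is a sketch in Python that uses the standard definition: the number of times the logarithm must be applied before the value drops to at most 1. The function name `log_star` and the use of base 2 are assumptions made for this example; the truncated definition in the citation context does not show the base.

```python
import math


def log_star(n):
    """Iterated logarithm: how many times log2 must be applied to n
    before the result is at most 1 (base 2 is an assumption here)."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

For example, `log_star(16)` is 3 and `log_star(65536)` is 4, so for any input size that fits in memory an O(log* n) bound is a small constant in practice.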