## A Novel Parallel Sorting Algorithm for Contemporary Architectures (2007)

Citations: 3 (0 self)

### BibTeX

@MISC{Cheng07anovel,

author = {David R. Cheng and Viral B. Shah and John R. Gilbert and Alan Edelman},

title = {A Novel Parallel Sorting Algorithm for Contemporary Architectures},

year = {2007}

}

### Abstract

Traditionally, the field of scientific computing has been dominated by numerical methods. However, modern scientific codes often combine numerical methods with combinatorial methods. Sorting, a widely studied problem in computer science, is an important primitive for combinatorial scientific computing. As high

### Citations

8512 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1991
Citation Context: ...f the weighted median of medians as a pivot, because it is used to split the input for the next iteration. It is a well-known result that the weighted median of medians can be computed in linear time [5, 13]. One possible way is to partition the values with the (unweighted) median, accumulate the weights on each side of the median, and recurse on the side that has too much weight. Therefore, the amount o...

1127 | A bridging model for parallel computation
- Valiant
- 1990
Citation Context: ...a common property of other parallel sorting algorithms, particularly sample sort (i.e. [1, 16, 12], as noted in [11]). 2.9 Analysis in the BSP Model A bulk-synchronous parallel computer, described in [17], models a system with three parameters: p, the number of processors; L, the minimum amount of time between subsequent rounds of communication; and g, a measure of bandwidth in time per message size...

368 | Time bounds for selection
- Blum, Pratt, et al.
- 1972
Citation Context: ...the simpler problem of selecting just one target, an element of global rank r. The algorithm for this task is motivated by the sequential methods for the same problem, most notably the one given in [2]. Although it may be clearer to define the selection algorithm recursively, the practical implementation and extension into simultaneous selection proceed more naturally from an iterative description....

173 | A comparison of sorting algorithms for the Connection Machine CM-2
- Blelloch, Leiserson, et al.
- 1991
Citation Context: ...nd communication required. It moves lesser data than widely used sample sorting algorithms, and is computationally a lot more efficient on distributed and shared memory architectures. Blelloch et al. [1] compare several parallel sorting algorithms on the CM-2, and report that a sampling based sort and radix sort are good algorithms to use in practice. We first tried a sampling based sort, but quickly...

152 | Open MPI: Goals, concept, and design of a next generation MPI implementation
- Gabriel, Fagg, et al.
- 2004
Citation Context: ...vailable on most platforms nowadays, we expect reasonable performance on distributed as well as shared memory architectures. We use the MPI libraries provided by the SGI MPT on the Altix, and OpenMPI [9] on clusters. Our choice of the C++ STL sequential sorting routines and MPI allows our code to be robust, scalable and portable without sacrificing performance. We tested our implementation on an SGI ...

100 | Parallel Sorting by Regular Sampling
- Shi, Schaeffer
- 1992
Citation Context: ...al for $p^2 \le \frac{n}{p} \Rightarrow p^3 \le n$. Returning to the formulation given earlier, we have $p = \lfloor n^{1/3} \rfloor$. This requirement is a common property of other parallel sorting algorithms, particularly sample sort (i.e. [1, 16, 12], as noted in [11]). 2.9 Analysis in the BSP Model A bulk-synchronous parallel computer, described in [17], models a system with three parameters: p, the number of processors; L, the minimum amount of...

64 | Communication-efficient parallel sorting
- Goodrich
- 1999
Citation Context: .... Returning to the formulation given earlier, we have $p = \lfloor n^{1/3} \rfloor$. This requirement is a common property of other parallel sorting algorithms, particularly sample sort (i.e. [1, 16, 12], as noted in [11]). 2.9 Analysis in the BSP Model A bulk-synchronous parallel computer, described in [17], models a system with three parameters: p, the number of processors; L, the minimum amount of time between subs...

58 | A message passing standard for MPP and Workstations
- Dongarra, Otto, et al.
- 1996
Citation Context: ...ting code which would form a building block for higher level combinatorial algorithms. We built our code using standards based library software such as the C++ STL (Standard Template Library) and MPI [6], which allows us to achieve our goals of scalability, robustness and portability without sacrificing performance. Our code is highly modular, which lets the user replace any stage of the algorithm wi...

48 | Deterministic sorting and randomized median finding on the BSP model
- Gerbessiotis, Siniolakis
- 1996
Citation Context: ...s: p, the number of processors; L, the minimum amount of time between subsequent rounds of communication; and g, a measure of bandwidth in time per message size. Following the naming conventions of [10], define π to be the ratio of computation cost of the BSP algorithm to the computation cost of a sequential algorithm. Similarly, define µ to be the ratio of communication cost of the BSP algorithm to...

34 | Funnel heap - a cache oblivious priority queue
- Brodal, Fagerberg
Citation Context: ...lace from this tree. Cache oblivious algorithms may yield better performance across a variety of architectures. We refer the reader to the literature on cache-oblivious data structures and algorithms [3, 8]. Notice that a merge will move a particular element exactly once (from one buffer to its sorted position in the other buffer). Furthermore, there is at most one comparison for each element move. Fina...

23 | Engineering a cache-oblivious sorting algorithm
- Brodal, Fagerberg, et al.
Citation Context: ... based. We use std::sort and std::stable_sort from the C++ Standard Template Library (STL) for sequential sorting. The C++ STL has one of the fastest general purpose sorting routines available [4]. We use MPI (Message Passing Interface) for communication. It is the most portable and widely used method for communication in parallel computing. Since vendor optimized MPI implementations are avail...

22 | A new deterministic parallel sorting algorithm with an experimental evaluation
- Helman, JáJá, et al.
- 1998
Citation Context: ...al for $p^2 \le \frac{n}{p} \Rightarrow p^3 \le n$. Returning to the formulation given earlier, we have $p = \lfloor n^{1/3} \rfloor$. This requirement is a common property of other parallel sorting algorithms, particularly sample sort (i.e. [1, 16, 12], as noted in [11]). 2.9 Analysis in the BSP Model A bulk-synchronous parallel computer, described in [17], models a system with three parameters: p, the number of processors; L, the minimum amount of...

11 | A linear selection algorithm for sets of elements with weights
- Reiser
- 1978
Citation Context: ...f the weighted median of medians as a pivot, because it is used to split the input for the next iteration. It is a well-known result that the weighted median of medians can be computed in linear time [5, 13]. One possible way is to partition the values with the (unweighted) median, accumulate the weights on each side of the median, and recurse on the side that has too much weight. Therefore, the amount o...

9 | An In-Place Sorting with O(n log n) Comparisons and O(n) Moves
- Franceschini, Geffert
- 2003
Citation Context: ...compute the ratio π to be Equation 2 over $T_s^*(n, p) = \frac{cn \lg n}{p}$. Thus, $\pi = 1 + \frac{p^3}{cn} + \frac{p^2 \lg n}{cn} + \frac{1}{c \lg n} = 1 + o(1)$ as $n \to \infty$. Furthermore, there exist movement-optimal sorting algorithms (i.e. [7]), so we compute µ against $\frac{gn}{p}$. It is straightforward to verify that the BSP cost of exact splitting is $O(\lg n \max\{L, g p^2 \lg n\})$, giving us $\mu = 1 + \frac{pL \lg n}{gn} + \frac{p^3 \lg^2 n}{n} = 1 + o(1)$ as $n \to \infty$. There...

8 | Sparse matrices in Matlab*P: Design and implementation
- Shah, Gilbert
- 2004
Citation Context: ... also form a basic building block to implement higher level combinatorial algorithms and computations with irregular communication patterns and workloads - such as parallel sparse matrix computations [15]. We describe the design and implementation of an algorithm for parallel sorting on contemporary architectures. Distributed memory architectures are widely in use today. The cost of communication is a...

5 | A note on parallel selection on coarse grained multicomputers
- Saukas, Song
- 1999
Citation Context: ...he sampling process itself requires “well chosen” parameters to yield “good” samples. We noticed that we can do away with both these steps if we can determine exact splitters quickly. Saukas and Song [14] describe a quick parallel selection algorithm. Our algorithm extends this work to efficiently find p − 1 exact splitters in O(p log n) rounds of communication. Our goal was to design a scalable, robu...
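
The contexts citing [5, 13] describe one way to compute a weighted median: partition the values around the unweighted median, accumulate the weights on each side, and recurse on the side that has too much weight. A minimal C++ sketch of that idea follows. It is an illustration only, not the authors' code: the names `Item` and `weighted_median` are hypothetical, weights are assumed to be non-negative integers (for example, element counts), and `std::nth_element` stands in for a deterministic linear-time selection routine, so the running time is linear on average rather than in the worst case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record type for this sketch: (value, weight), with weight >= 0.
using Item = std::pair<double, long long>;

// Returns a value x such that the items smaller than x and the items larger
// than x each carry at most half of the total weight.
double weighted_median(std::vector<Item> items) {
    assert(!items.empty());

    long long total = 0;
    for (const Item& it : items) total += it.second;

    std::size_t lo = 0, hi = items.size();  // remaining candidates are in [lo, hi)
    long long below = 0;  // weight already discarded on the low side
    long long above = 0;  // weight already discarded on the high side

    while (hi - lo > 1) {
        // Partition the candidates around their unweighted median.
        const std::size_t mid = lo + (hi - lo) / 2;
        std::nth_element(items.begin() + lo, items.begin() + mid, items.begin() + hi,
                         [](const Item& a, const Item& b) { return a.first < b.first; });

        // Accumulate the weight on each side of the pivot.
        long long left = below;
        for (std::size_t i = lo; i < mid; ++i) left += items[i].second;
        long long right = above;
        for (std::size_t i = mid + 1; i < hi; ++i) right += items[i].second;

        if (2 * left > total) {          // low side is too heavy: continue there
            above = right + items[mid].second;
            hi = mid;
        } else if (2 * right > total) {  // high side is too heavy: continue there
            below = left + items[mid].second;
            lo = mid + 1;
        } else {
            return items[mid].first;     // neither side exceeds half the weight
        }
    }
    return items[lo].first;
}
```

For the pairs (1, 1), (2, 1), (3, 1), (4, 10), the function returns 4: the items below 4 carry weight 3, which is less than half of the total weight 13. Because the candidate range roughly halves in each round, the work per round shrinks geometrically, which is the source of the linear bound mentioned in the excerpt.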