Highly Scalable Parallel Sorting
In Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2010
Cited by 8 (3 self)
Sorting is a commonly used process with a wide breadth of applications in the high performance computing field. Early research in parallel processing has provided us with comprehensive analysis and theory for parallel sorting algorithms. However, modern supercomputers have advanced rapidly in size and changed significantly in architecture, forcing new adaptations to these algorithms. To fully utilize the potential of highly parallel machines, tens of thousands of processors are used. Efficiently scaling parallel sorting on machines of this magnitude is inhibited by the communication-intensive problem of migrating large amounts of data between processors. The challenge is to design a highly scalable sorting algorithm that uses minimal communication, maximizes overlap between computation and communication, and uses memory efficiently. This paper presents a scalable extension of the Histogram Sorting method, making fundamental modifications to the original algorithm in order to minimize message contention and exploit overlap. We implement Histogram Sort, Sample Sort, and Radix Sort in CHARM++ and compare their performance. The choice of algorithm as well as the importance of the optimizations is validated by performance tests on two predominant modern supercomputer architectures: XT4 at ORNL (Jaguar) and Blue Gene/P at ANL (Intrepid).
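To make the Histogram Sorting idea concrete, here is a minimal single-process sketch in Python. It is not the paper's CHARM++ implementation and includes none of its contention or overlap optimizations. It assumes integer keys, simulates each processor as a plain list, and uses hypothetical names (`find_splitters`, `histogram_sort`, `tolerance`). Splitters are refined by repeatedly probing candidate keys and summing per-processor counts (the "histogram") until each bucket is close to n/p keys.

```python
from bisect import bisect_right
from heapq import merge


def find_splitters(local, tolerance=0.05):
    """Refine p-1 splitter keys from global histograms of probe keys.

    `local` holds one sorted list of integer keys per simulated processor.
    """
    p = len(local)
    n = sum(len(c) for c in local)
    nonempty = [c for c in local if c]
    if not nonempty:
        return []
    # Ideal global rank of each splitter, and the accepted deviation from it.
    targets = [n * (i + 1) // p for i in range(p - 1)]
    slack = tolerance * n / p
    lo = [min(c[0] for c in nonempty)] * (p - 1)
    hi = [max(c[-1] for c in nonempty)] * (p - 1)
    while any(l < h for l, h in zip(lo, hi)):
        probes = [(l + h) // 2 for l, h in zip(lo, hi)]
        # Each processor counts its keys <= probe; the sum stands in for a reduction.
        counts = [sum(bisect_right(c, q) for c in local) for q in probes]
        for i, (q, cnt) in enumerate(zip(probes, counts)):
            if lo[i] == hi[i]:
                continue  # this splitter is already settled
            if abs(cnt - targets[i]) <= slack:
                lo[i] = hi[i] = q  # close enough to the ideal rank
            elif cnt < targets[i]:
                lo[i] = q + 1
            else:
                hi[i] = q
    return sorted(lo)  # sorted defensively so the cut points below are monotone


def histogram_sort(chunks, tolerance=0.05):
    """Return one sorted bucket per simulated processor, in global key order."""
    local = [sorted(c) for c in chunks]  # local sort on every processor
    if not any(local):
        return [[] for _ in local]
    splitters = find_splitters(local, tolerance)
    # Cut every local list at the splitters; piece j is destined for processor j.
    cuts = [[0] + [bisect_right(c, s) for s in splitters] + [len(c)] for c in local]
    buckets = []
    for j in range(len(local)):
        pieces = [c[k[j]:k[j + 1]] for c, k in zip(local, cuts)]
        buckets.append(list(merge(*pieces)))  # stands in for the all-to-all exchange
    return buckets
```

Concatenating the returned buckets in order yields the fully sorted sequence. On a real machine the probe counts would be combined with a reduction and the pieces moved with an all-to-all exchange, which is the communication the abstract identifies as the scaling bottleneck.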
SCALABLE COLLECTIVE MESSAGE-PASSING ALGORITHMS
, 2011
Cited by 1 (0 self)
Governments, universities, and companies expend vast resources building the top supercomputers. The processors and interconnect networks become faster, while the number of nodes grows exponentially. Problems of scale emerge, not least of which is collective performance. This thesis identifies and proposes solutions for two major scalability problems. Our first contribution is a novel algorithm for process partitioning and remapping for exascale systems that has far better time and space scaling than known algorithms. Our evaluations predict an improvement of up to 60x for large exascale systems and arbitrary reduction in the large temporary buffer space required for generating new communicators. Our second contribution consists of several novel collective algorithms for Clos and torus networks. Known allgather, reduce-scatter, and composite algorithms for Clos networks suffer the worst congestion when the largest messages are exchanged, damaging performance. Known algorithms for torus networks use only one network port, regardless of how many are available.
Deterministic Parallel Sorting Algorithm for 2D Mesh of Connected Computers
, 2008
Sorting is one of the most important operations in database systems, and its efficiency can drastically influence overall system performance. To accelerate database systems, parallelism is applied to the execution of data administration operations. We propose a new deterministic Parallel Sorting Algorithm (DPSA) that improves on the performance of Quicksort when sorting an array of size n. It uses p Processor Elements (PEs) that work in parallel to sort an r*c matrix, where r = 3 is the number of rows and c = n/3 is the number of columns. The simulation results show that DPSA outperforms Quicksort run sequentially.
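The abstract gives only the data layout (r = 3 rows, c = n/3 columns) and not the sorting steps, so the following Python sketch illustrates the layout alone and is not DPSA itself. It assumes, purely for illustration, that each row is sorted independently (a thread pool stands in for the processing elements) and that the sorted rows are then combined with a k-way merge. The name `three_row_sort` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def three_row_sort(values, rows=3):
    """Lay `values` out as a rows x ceil(n/rows) matrix, sort rows, then merge.

    Assumption: the row-sort-then-merge scheme is illustrative only; the
    abstract does not describe how DPSA actually orders the matrix.
    """
    cols = -(-len(values) // rows)  # ceiling division: columns per row
    matrix = [values[i * cols:(i + 1) * cols] for i in range(rows)]
    # One worker per row stands in for the parallel processing elements.
    with ThreadPoolExecutor(max_workers=rows) as pool:
        sorted_rows = list(pool.map(sorted, matrix))
    return list(merge(*sorted_rows))
```

For example, `three_row_sort([5, 2, 9, 1, 7, 3, 8])` splits the input into the rows `[5, 2, 9]`, `[1, 7, 3]`, and `[8]` before sorting and merging them.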