The influence of caches on the performance of sorting
- In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms
, 1997
Cited by 104 (3 self)
We investigate the effect that caches have on the performance of sorting algorithms, both experimentally and analytically. To address the performance problems that high cache miss penalties introduce, we restructure mergesort, quicksort, and heapsort to improve their cache locality. For all three algorithms, the improvement in cache performance leads to a reduction in total execution time. We also investigate the performance of radix sort. Despite the extremely low instruction count incurred by this linear-time sorting algorithm, its relatively poor cache performance results in worse overall performance than the efficient comparison-based sorting algorithms. For each algorithm, we provide an analysis that closely predicts the number of cache misses incurred by the algorithm.
Two-tier relaxed heaps
- Proceedings of the 17th International Symposium on Algorithms and Computation, Lecture Notes in Computer Science 4288, Springer-Verlag
, 2006
Cited by 9 (8 self)
We introduce an adaptation of run-relaxed heaps which provides efficient heap operations with respect to the number of element comparisons performed. Our data structure guarantees the worst-case cost of O(1) for find-min, insert, and decrease; and the worst-case cost of O(lg n) with at most lg n + 3 lg lg n + O(1) element comparisons for delete, improving the bound of 3 lg n + O(1) on the number of element comparisons known for run-relaxed heaps. Here, n denotes the number of elements stored prior to the operation in question, and lg n equals max{1, log_2 n}.
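To get a feel for the two comparison bounds quoted in this abstract, their leading terms can be evaluated numerically. The sketch below is only illustrative: the function names are hypothetical, and the O(1) additive terms are dropped, so the numbers show growth rates rather than exact comparison counts.

```python
import math

def lg(x: float) -> float:
    """lg as defined in the abstract: max{1, log_2 x}."""
    return max(1.0, math.log2(x))

def two_tier_delete_bound(n: int) -> float:
    """Leading terms lg n + 3 lg lg n of the new bound (O(1) term dropped)."""
    return lg(n) + 3 * lg(lg(n))

def run_relaxed_delete_bound(n: int) -> float:
    """Leading term 3 lg n of the earlier bound (O(1) term dropped)."""
    return 3 * lg(n)

# Compare the leading terms for a few sizes.
for n in (2**10, 2**20, 2**30):
    print(n, round(two_tier_delete_bound(n), 2), run_relaxed_delete_bound(n))
```

For small n the dropped constants dominate, so the comparison is only meaningful as n grows.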
A framework for speeding up priority-queue operations
, 2004
Cited by 8 (8 self)
We introduce a framework for reducing the number of element comparisons performed in priority-queue operations. In particular, we give a priority queue which guarantees the worst-case cost of O(1) per minimum finding and insertion, and the worst-case cost of O(log n) with at most log n + O(1) element comparisons per minimum deletion and deletion, improving the bound of 2 log n + O(1) on the number of element comparisons known for binomial queues. Here, n denotes the number of elements stored in the data structure prior to the operation in question, and log n equals max{1, log_2 n}. We also give a priority queue that provides, in addition to the above-mentioned methods, the priority-decrease (or decrease-key) method. This priority queue achieves the worst-case cost of O(1) per minimum finding, insertion, and priority decrease; and the worst-case cost of O(log n) with at most log n + O(log log n) element comparisons per minimum deletion and deletion. CR Classification. E.1 [Data Structures]: Lists, stacks, and queues; E.2 [Data
Load Balanced Priority Queues on Distributed Memory Machines (Extended Abstract)
- In Lecture Notes in Computer Science
, 1994
Cited by 8 (6 self)
Ajay K. Gupta (Western Michigan University, Kalamazoo, MI 49008, USA) and Andreas G. Photiou (Lake States Insurance Company, Traverse City, MI 49685, USA)
We consider efficient algorithms for priority queues on distributed memory multiprocessors, such as nCUBE, iPSC, MPP and loosely coupled systems consisting of networked workstations. For a p-processor distributed memory multicomputer P and n data items in the priority queue, n ? p, we investigate two priority queues: horizontally sliced and vertically sliced. Both of these achieve load balance, i.e., at most Θ(n/p) data items are stored at every processor of P. The horizontally sliced priority queue allows deletions and insertions of Θ(p) items in time O((p/b_w) τ_c + τ_p p log n) on hypercubic networks, where τ_c is the communication time between a pair of processors, τ_p is the unit processing time, and b_w is the width of the communication channel between a pair of processors. Vertically sliced priority queue allows deletio...
The Ultimate Heapsort
- In Proceedings of the Computing: the 4th Australasian Theory Symposium, Australian Computer Science Communications
, 1998
Cited by 4 (0 self)
A variant of Heapsort, named Ultimate Heapsort, is presented that sorts n elements in place in Θ(n log_2(n + 1)) worst-case time by performing at most n log_2 n + Θ(n) key comparisons and n log_2 n + Θ(n) element moves. The secret behind Ultimate Heapsort is that it occasionally transforms the heap it operates with to a two-layer heap which keeps small elements at the leaves. Basically, Ultimate Heapsort is like Bottom-Up Heapsort but, due to the two-layer heap property, an element taken from a leaf has to be moved towards the root only O(1) levels, on average. Let a[1..n] be an array of n elements, each consisting of a key and some information associated with this key. This array is a (maximum) heap if, for all i ∈ {2, ..., n}, the key of element a[⌊i/2⌋] is larger than or equal to that of element a[i]. That is, a heap is a pointer-free representation of a left-complete binary tree, where the elements stored are partially ordered according to their keys. Ele...
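The heap condition stated in this abstract, that the key at position ⌊i/2⌋ is at least the key at position i for every i from 2 to n, can be checked directly. The following is a minimal sketch that assumes bare comparable keys rather than key-plus-information records; the name `is_max_heap` is hypothetical and does not come from the paper.

```python
def is_max_heap(keys):
    """Return True if `keys`, read as the 1-based array a[1..n], is a maximum heap."""
    n = len(keys)
    a = [None] + list(keys)  # pad index 0 so that a[1..n] matches the abstract's notation
    # The parent of position i is position i // 2 (the floor of i/2).
    return all(a[i // 2] >= a[i] for i in range(2, n + 1))

print(is_max_heap([9, 7, 8, 3, 5, 6, 1]))
print(is_max_heap([1, 2, 3]))
```

The first call describes a valid maximum heap; the second violates the condition at i = 2, where the parent key 1 is smaller than the child key 2.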
Fast and Scalable Parallel Algorithms for Knapsack-Like Problems
- Journal of Parallel and Distributed Computing
, 1996
Cited by 3 (0 self)
We present two new algorithms for searching in sorted X + Y + R + S, one based on heaps and the other on sampling. Each of the algorithms runs in time O(n^2 log n) (n being the size of the sorted arrays X, Y, R and S). Hence in each case, by constructing arrays of size n = O(2^(s/4)), we obtain a new algorithm for solving certain NP-Complete problems such as Knapsack on s data items in time equal (up to a constant factor) to the best algorithm currently known. Each of the algorithms is capable of being efficiently implemented in parallel and so solving large instances of these NP-Complete problems fast on coarse-grained distributed-memory parallel computers. The parallel version of the heap-based algorithm is communication-efficient and exhibits optimal speedup for a number of processors less than n using O(n) space in each one; the sampling-based algorithm exhibits optimal speedup for any number of processors up to n using O(n) space in total provided that the architecture is capable of...
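The abstract does not spell out how its heap-based algorithm works. As background only, the sketch below shows a standard way to enumerate the pairwise sums of two sorted arrays in nondecreasing order with a heap of at most n entries, which is the kind of building block such searches typically rely on. The function name `sorted_pair_sums` is hypothetical, and this is not claimed to be the paper's algorithm.

```python
import heapq

def sorted_pair_sums(xs, ys):
    """Yield every x + y in nondecreasing order; xs and ys must be sorted ascending."""
    if not xs or not ys:
        return
    # One heap entry per element of xs, initially paired with the smallest y.
    heap = [(x + ys[0], i, 0) for i, x in enumerate(xs)]
    heapq.heapify(heap)
    while heap:
        total, i, j = heapq.heappop(heap)
        yield total
        # Advance row i to its next-larger partner in ys, if one remains.
        if j + 1 < len(ys):
            heapq.heappush(heap, (xs[i] + ys[j + 1], i, j + 1))

print(list(sorted_pair_sums([1, 4, 7], [0, 2, 10])))
```

The heap never holds more than len(xs) entries, so all n^2 sums are produced in O(n^2 log n) time using O(n) working space.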
Optimal Median Smoothing
, 1994
Cited by 2 (0 self)
Median smoothing of a series of data values is considered. Naive programming of such an algorithm would result in a large amount of computation, especially when the series of data values is long. By maintaining a heap structure that we update when moving along the data, we obtain an optimal median smoothing algorithm.
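The abstract does not say which heap structure the authors maintain. One common way to realize the idea is a pair of heaps (a max-heap for the smaller half of the window, a min-heap for the larger half) with lazy deletion of values that leave the window. The sketch below assumes an odd window length and is an illustration of that general technique, not the paper's algorithm; `median_smooth` and its helpers are hypothetical names.

```python
import heapq
from collections import defaultdict

def median_smooth(values, window):
    """Return the median of every length-`window` sliding window over `values`."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    low, high = [], []            # low: negated max-heap (smaller half plus median); high: min-heap
    pending = defaultdict(int)    # value -> copies that left the window but are still in a heap
    size = {"low": 0, "high": 0}  # counts of live (not yet removed) elements

    def prune(heap, sign):
        # Discard removed values once they surface at the top of `heap`.
        while heap and pending[sign * heap[0]] > 0:
            pending[sign * heap[0]] -= 1
            heapq.heappop(heap)

    def rebalance():
        # Keep live sizes at low == high or low == high + 1.
        if size["low"] > size["high"] + 1:
            heapq.heappush(high, -heapq.heappop(low))
            size["low"] -= 1
            size["high"] += 1
            prune(low, -1)
        elif size["low"] < size["high"]:
            heapq.heappush(low, -heapq.heappop(high))
            size["low"] += 1
            size["high"] -= 1
            prune(high, 1)

    def insert(x):
        if not low or x <= -low[0]:
            heapq.heappush(low, -x)
            size["low"] += 1
        else:
            heapq.heappush(high, x)
            size["high"] += 1
        rebalance()

    def remove(x):
        pending[x] += 1  # mark for lazy deletion
        if x <= -low[0]:
            size["low"] -= 1
            if x == -low[0]:
                prune(low, -1)
        else:
            size["high"] -= 1
            if high and x == high[0]:
                prune(high, 1)
        rebalance()

    out = []
    for i, x in enumerate(values):
        insert(x)
        if i >= window:
            remove(values[i - window])
        if i >= window - 1:
            out.append(-low[0])  # top of the max-heap is the window median
    return out

print(median_smooth([1, 5, 2, 8, 3, 9, 4], 3))  # → [2, 5, 3, 8, 4]
```

Compared with re-sorting each window, every step here does only a few heap operations, which is the kind of saving the abstract alludes to.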
Load Sharing with Parallel Priority Queues
- Center for
, 1991
Cited by 1 (0 self)
For maximum efficiency in a multiprocessor system, the load should be shared evenly over all processors; that is, there should be no idle processors when tasks are available. The delay in a load sharing algorithm is the larger of the maximum time that any processor can be idle before a task is assigned to it, and the maximum time that it must wait to be relieved of an excess task. A simple parallel priority queue architecture for load sharing in a p-processor multiprocessor system is proposed. This architecture uses O(p log(n/p)) special-purpose processors (where n is the maximal size of the priority queue), an interconnection pattern of bounded degree, and achieves delay O(log p), which is optimal for any bounded-degree system.
1 Introduction
One advantage that multiprocessor computers have over uniprocessors is the ability to speed up computation by having the processors compute in parallel. The archetypal model studied is the PRAM, in which it is assumed that concurrent access to a ...

