Results 1-10 of 22
The influence of caches on the performance of sorting
 In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms
, 1997
Abstract

Cited by 122 (4 self)
We investigate the effect that caches have on the performance of sorting algorithms, both experimentally and analytically. To address the performance problems that high cache miss penalties introduce, we restructure mergesort, quicksort, and heapsort in order to improve their cache locality. For all three algorithms, the improvement in cache performance leads to a reduction in total execution time. We also investigate the performance of radix sort. Despite the extremely low instruction count incurred by this linear-time sorting algorithm, its relatively poor cache performance results in worse overall performance than the efficient comparison-based sorting algorithms. For each algorithm we provide an analysis that closely predicts the number of cache misses incurred by the algorithm.
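The restructuring-for-cache-locality idea above can be illustrated with a minimal tiled mergesort sketch: sort tiles small enough to fit in cache, then merge the sorted runs. The function name and the `tile` parameter are hypothetical; the paper derives the actual tile size from the cache parameters.

```python
import heapq

def tiled_mergesort(a, tile=1024):
    """Sort each cache-sized tile independently, then k-way merge the runs.

    `tile` is a hypothetical stand-in for the number of elements that
    fit in the cache; each run is sorted while it is cache-resident.
    """
    runs = [sorted(a[i:i + tile]) for i in range(0, len(a), tile)]
    return list(heapq.merge(*runs))
```

With a tile size matched to the cache, each element is touched once per pass instead of incurring a miss on nearly every comparison.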
On universal types
 PROC. ISIT 2004
, 2004
Abstract

Cited by 25 (6 self)
We define the universal type class of a sequence x^n, in analogy to the notion used in the classical method of types. Two sequences of the same length are said to be of the same universal (LZ) type if and only if they yield the same set of phrases in the incremental parsing of Ziv and Lempel (1978). We show that the empirical probability distributions of any finite order of two sequences of the same universal type converge, in the variational sense, as the sequence length increases. Consequently, the normalized logarithms of the probabilities assigned by any kth-order probability assignment to two sequences of the same universal type, as well as the kth-order empirical entropies of the sequences, converge for all k. We study the size of a universal type class, and show that its asymptotic behavior parallels that of the conventional counterpart, with the LZ78 code length playing the role of the empirical entropy. We also estimate the number of universal types for sequences of length n, and show that it is of the form exp((1+o(1)) γn/log n) for a well-characterized constant γ. We describe algorithms for enumerating the sequences in a universal type class, and for drawing a sequence from the class with uniform probability. As an application, we consider the problem of universal simulation of individual sequences. A sequence drawn with uniform probability from the universal type class of x^n is an optimal simulation of x^n in a well-defined mathematical sense.
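The incremental parsing that defines the universal type can be sketched in a few lines: each new phrase is the shortest prefix of the remaining input not seen as a phrase before. `lz78_phrases` is a hypothetical helper name, and the sketch ignores a possibly incomplete final phrase.

```python
def lz78_phrases(s):
    """Return the set of phrases from LZ78 incremental parsing of s.

    Each phrase is the shortest prefix of the remaining input that has
    not already occurred as a phrase.  Two sequences of the same length
    are of the same universal type iff these sets coincide.
    """
    phrases = set()
    current = ""
    for ch in s:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""  # start the next phrase
    return phrases
```

For example, "aababc" parses into the phrases a, ab, abc.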
Two-tier relaxed heaps
 Proceedings of the 17th International Symposium on Algorithms and Computation, Lecture Notes in Computer Science 4288, Springer-Verlag
, 2006
Abstract

Cited by 11 (8 self)
Abstract. We introduce an adaptation of run-relaxed heaps which provides efficient heap operations with respect to the number of element comparisons performed. Our data structure guarantees the worst-case cost of O(1) for find-min, insert, and decrease; and the worst-case cost of O(lg n) with at most lg n + 3 lg lg n + O(1) element comparisons for delete, improving the bound of 3 lg n + O(1) on the number of element comparisons known for run-relaxed heaps. Here, n denotes the number of elements stored prior to the operation in question, and lg n equals max{1, log₂ n}.
A framework for speeding up priority-queue operations
, 2004
Abstract

Cited by 8 (7 self)
Abstract. We introduce a framework for reducing the number of element comparisons performed in priority-queue operations. In particular, we give a priority queue which guarantees the worst-case cost of O(1) per minimum finding and insertion, and the worst-case cost of O(log n) with at most log n + O(1) element comparisons per minimum deletion and deletion, improving the bound of 2 log n + O(1) on the number of element comparisons known for binomial queues. Here, n denotes the number of elements stored in the data structure prior to the operation in question, and log n equals max{1, log₂ n}. We also give a priority queue that provides, in addition to the above-mentioned methods, the priority-decrease (or decrease-key) method. This priority queue achieves the worst-case cost of O(1) per minimum finding, insertion, and priority decrease; and the worst-case cost of O(log n) with at most log n + O(log log n) element comparisons per minimum deletion and deletion. CR Classification. E.1 [Data Structures]: Lists, stacks, and queues; E.2 [Data
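The roughly 2 log n comparison baseline that this framework improves can be seen in a textbook binary heap: a sift-down after minimum deletion compares both children at every level. The `CountingHeap` class below is a hypothetical instrumentation sketch, not the paper's data structure.

```python
class CountingHeap:
    """Binary min-heap that counts element comparisons.

    Plain sift-down after delete-min performs about 2 log2 n
    comparisons (two children per level), the baseline that
    comparison-efficient priority queues reduce toward log n.
    """
    def __init__(self):
        self.a = []
        self.comparisons = 0

    def _less(self, i, j):
        self.comparisons += 1
        return self.a[i] < self.a[j]

    def insert(self, x):
        self.a.append(x)
        i = len(self.a) - 1
        while i > 0 and self._less(i, (i - 1) // 2):
            p = (i - 1) // 2
            self.a[i], self.a[p] = self.a[p], self.a[i]
            i = p

    def delete_min(self):
        a = self.a
        m = a[0]
        last = a.pop()
        if not a:
            return m
        a[0] = last
        i, n = 0, len(a)
        while True:
            l, r = 2 * i + 1, 2 * i + 2
            small = i
            if l < n and self._less(l, small):
                small = l
            if r < n and self._less(r, small):
                small = r
            if small == i:
                return m
            a[i], a[small] = a[small], a[i]
            i = small
```

Counting `self.comparisons` over a workload makes the constant factors of different heap variants directly comparable.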
Load Balanced Priority Queues on Distributed Memory Machines (Extended Abstract)
 In Lecture Notes in Computer Science
, 1994
Abstract

Cited by 8 (6 self)
Ajay K. Gupta (Western Michigan University, Kalamazoo, MI 49008, USA) and Andreas G. Photiou (Lake States Insurance Company, Traverse City, MI 49685, USA). Abstract. We consider efficient algorithms for priority queues on distributed memory multiprocessors, such as nCUBE, iPSC, MPP, and loosely-coupled systems consisting of networked workstations. For a p-processor distributed memory multicomputer P and n data items in the priority queue, n ≥ p, we investigate two priority queues: horizontally sliced and vertically sliced. Both of these achieve load balance, i.e. at most Θ(n/p) data items are stored at every processor of P. The horizontally sliced priority queue allows deletions and insertions of Θ(p) items in time O((p/b_w) τ_c + τ_p p log n) on hypercubic networks, where τ_c is the communication time between a pair of processors, τ_p is the unit processing time, and b_w is the width of the communication channel between a pair of processors. The vertically sliced priority queue allows deletio...
The Ultimate Heapsort
 In Proceedings of Computing: The 4th Australasian Theory Symposium, Australian Computer Science Communications
, 1998
Abstract

Cited by 7 (0 self)
A variant of Heapsort, named Ultimate Heapsort, is presented that sorts n elements in-place in Θ(n log₂(n+1)) worst-case time by performing at most n log₂ n + Θ(n) key comparisons and n log₂ n + Θ(n) element moves. The secret behind Ultimate Heapsort is that it occasionally transforms the heap it operates with into a two-layer heap, which keeps small elements at the leaves. Basically, Ultimate Heapsort is like Bottom-Up Heapsort but, due to the two-layer heap property, an element taken from a leaf has to be moved towards the root only O(1) levels on average. Let a[1..n] be an array of n elements, each consisting of a key and some information associated with this key. This array is a (maximum) heap if, for all i ∈ {2, ..., n}, the key of element a[⌊i/2⌋] is larger than or equal to that of element a[i]. That is, a heap is a pointer-free representation of a left-complete binary tree, where the elements stored are partially ordered according to their keys. Ele...
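The heap definition stated in the abstract translates directly into a one-line check; `is_max_heap` is a hypothetical helper name for illustration.

```python
def is_max_heap(a):
    """Check the abstract's heap property: for every i in {2, ..., n},
    a[floor(i/2)] >= a[i] under 1-based indexing.

    Python lists are 0-based, so 1-based position i is a[i - 1] and
    its parent, position i // 2, is a[i // 2 - 1].
    """
    n = len(a)
    return all(a[i // 2 - 1] >= a[i - 1] for i in range(2, n + 1))
```

Arrays of length 0 or 1 are trivially heaps, since the range of i is empty.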
Load Sharing with Parallel Priority Queues
 Center for
, 1991
Abstract

Cited by 4 (0 self)
For maximum efficiency in a multiprocessor system the load should be shared evenly over all processors; that is, there should be no idle processors when tasks are available. The delay in a load sharing algorithm is the larger of the maximum time that any processor can be idle before a task is assigned to it, and the maximum time that it must wait to be relieved of an excess task. A simple parallel priority queue architecture for load sharing in a p-processor multiprocessor system is proposed. This architecture uses O(p log(n/p)) special-purpose processors (where n is the maximal size of the priority queue) and an interconnection pattern of bounded degree, and achieves delay O(log p), which is optimal for any bounded degree system.

1 Introduction

One advantage that multiprocessor computers have over uniprocessors is the ability to speed up computation by having the processors compute in parallel. The archetypal model studied is the PRAM, in which it is assumed that concurrent access to a ...
Optimal Median Smoothing
, 1994
Abstract

Cited by 3 (0 self)
Median smoothing of a series of data values is considered. Naive programming of such an algorithm would result in a large amount of computation, especially when the series of data values is long. By maintaining a heap structure that we update while moving along the data, we obtain an optimal median smoothing algorithm.
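The sliding-window computation can be sketched with a sorted window updated by binary search; this is a simpler O(w)-per-step stand-in for the paper's heap structure, and `median_smooth` is a hypothetical name.

```python
from bisect import insort, bisect_left

def median_smooth(data, w):
    """Sliding-window median of `data` with odd window width `w`.

    Maintains a sorted window: each step removes the outgoing value and
    inserts the incoming one by binary search.  The paper's heap-based
    structure achieves a better update bound than this O(w) sketch.
    """
    window = sorted(data[:w])
    out = [window[w // 2]]
    for i in range(w, len(data)):
        window.pop(bisect_left(window, data[i - w]))  # drop outgoing value
        insort(window, data[i])                       # insert incoming value
        out.append(window[w // 2])
    return out
```

Each output position is the median of the w most recent values, so the result has len(data) - w + 1 entries.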
Fast and Scalable Parallel Algorithms for Knapsack-Like Problems
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1996
Abstract

Cited by 3 (0 self)
We present two new algorithms for searching in sorted X+Y+R+S, one based on heaps and the other on sampling. Each of the algorithms runs in time O(n² log n) (n being the size of the sorted arrays X, Y, R, and S). Hence in each case, by constructing arrays of size n = O(2^(s/4)), we obtain a new algorithm for solving certain NP-complete problems, such as Knapsack on s data items, in time equal (up to a constant factor) to the best algorithm currently known. Each of the algorithms is capable of being efficiently implemented in parallel and so solving large instances of these NP-complete problems fast on coarse-grained distributed memory parallel computers. The parallel version of the heap-based algorithm is communication-efficient and exhibits optimal speedup for a number of processors less than n using O(n) space in each one; the sampling-based algorithm exhibits optimal speedup for any number of processors up to n using O(n) space in total, provided that the architecture is capable of ...
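The heap-based search idea can be sketched in the simpler two-array case: enumerate the sums of sorted X+Y in nondecreasing order with a heap of frontier candidates. This is an illustrative two-array sketch under the assumption that both inputs are sorted, not the paper's four-array algorithm.

```python
import heapq

def sorted_x_plus_y(x, y):
    """Enumerate all sums x[i] + y[j] in nondecreasing order.

    Assumes x and y are sorted ascending.  The heap holds one
    candidate per column j; popping (x[i] + y[j]) pushes the next
    row's sum (x[i+1] + y[j]) for that column.
    """
    if not x or not y:
        return []
    heap = [(x[0] + y[j], 0, j) for j in range(len(y))]
    heapq.heapify(heap)
    out = []
    while heap:
        s, i, j = heapq.heappop(heap)
        out.append(s)
        if i + 1 < len(x):
            heapq.heappush(heap, (x[i + 1] + y[j], i + 1, j))
    return out
```

The heap never holds more than len(y) entries, so each of the n² sums costs O(log n) to produce.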