Results 1 - 7 of 7
Engineering a cache-oblivious sorting algorithm
In Proc. 6th Workshop on Algorithm Engineering and Experiments, 2004
"... The cacheoblivious model of computation is a twolevel memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multilevel memory mod ..."
Abstract

Cited by 25 (1 self)
The cache-oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache-oblivious model is automatically efficient in a multi-level memory model. Since the introduction of the cache-oblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model have been proposed and analyzed. However, less attention has been given to whether the nice theoretical properties of cache-oblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find; we find that it competes very well for input residing in RAM, and outperforms Quicksort for input on disk.
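The defining property described above, efficiency without any knowledge of the memory parameters, can be illustrated with a minimal sketch. The code below is NOT the paper's Lazy Funnelsort; it is plain recursive mergesort, the textbook cache-oblivious example: it never references the cache size M or block size B, yet its divide-and-conquer access pattern is block-friendly at every level of the hierarchy.

```python
# A minimal cache-oblivious sketch: plain recursive mergesort. This is NOT
# the paper's Lazy Funnelsort; it only illustrates the defining property,
# namely that the code never mentions the cache parameters M or B, yet
# its divide-and-conquer access pattern uses sequential scans at every
# level of recursion.
def co_mergesort(a):
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = co_mergesort(a[:mid])
    right = co_mergesort(a[mid:])
    # Merge two sorted halves with a single sequential scan of each.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Funnelsort replaces the binary merge with a recursively laid-out multi-way merger, which improves the transfer bound of binary mergesort to the optimal O((n/B) log_{M/B}(n/B)).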
Optimal sparse matrix dense vector multiplication in the I/O-model, 2010
"... We study the problem of sparsematrix densevector multiplication (SpMV) in external memory. The task of SpMV is to compute y: = Ax, where A is a sparse N × N matrix and x is a vector. We express sparsity by a parameter k, and for each choice of k consider the class of matrices where the number of n ..."
Abstract

Cited by 18 (5 self)
We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute y := Ax, where A is a sparse N × N matrix and x is a vector. We express sparsity by a parameter k, and for each choice of k consider the class of matrices where the number of nonzero entries is kN, i.e., where the average number of nonzero entries per column is k. We investigate the external worst-case complexity, i.e., the best possible upper bound on the number of I/Os, as a function of k, N and the parameters M (memory size) and B (track size) of the I/O-model. We determine this complexity up to a constant factor for all meaningful choices of these parameters, as long as k ≤ N^(1−ε), where ε depends on the problem variant. Our model of computation for the lower bound is a combination of the I/O-models of Aggarwal and Vitter, and of Hong and Kung. We study variants of the problem, differing in the memory layout of A. If A is stored in column-major layout, we prove that SpMV has I/O complexity Θ(min{(kN/B) · max{1, log_{M/B}(N/max{k, M})}, kN}).
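For concreteness, the SpMV task y := Ax with a column-major nonzero layout can be sketched as follows. The coordinate-triple representation and the function name are illustrative assumptions of this sketch, not the paper's data structure.

```python
# Hypothetical sketch of the SpMV task y := Ax from the abstract. A is an
# N x N matrix with kN nonzeros, stored as (row, col, value) triples in
# column-major order (all nonzeros of column 0 first, then column 1, ...).
# The triple representation is an illustrative assumption, not the
# paper's layout.
def spmv_column_major(nonzeros, x, N):
    y = [0.0] * N
    for row, col, value in nonzeros:
        # Scanning in column-major order reads x sequentially (col is
        # non-decreasing), while the writes to y[row] may jump around;
        # bounding the cost of these scattered accesses in blocks of
        # size B is what the paper's I/O analysis is about.
        y[row] += value * x[col]
    return y
```

For example, with A = [[1, 2], [0, 3]] stored as [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)] and x = [1, 1], the result is y = [3.0, 3.0].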
On the adaptiveness of quicksort
In: Workshop on Algorithm Engineering & Experiments, SIAM, 2005
"... Quicksort was first introduced in 1961 by Hoare. Many variants have been developed, the best of which are among the fastest generic sorting algorithms available, as testified by the choice of Quicksort as the default sorting algorithm in most programming libraries. Some sorting algorithms are adapti ..."
Abstract

Cited by 8 (1 self)
Quicksort was first introduced in 1961 by Hoare. Many variants have been developed, the best of which are among the fastest generic sorting algorithms available, as testified by the choice of Quicksort as the default sorting algorithm in most programming libraries. Some sorting algorithms are adaptive, i.e., their complexity analysis is better for inputs that are nearly sorted, according to some specified measure of presortedness. Quicksort is not among these, as it uses Ω(n log n) comparisons even when the input is already sorted. However, in this paper we demonstrate empirically that the actual running time of Quicksort is adaptive with respect to the presortedness measure Inv. Differences close to a factor of two are observed between instances with low and high Inv value. We then show that for the randomized version of Quicksort, the number of element swaps performed is provably adaptive with respect to the measure Inv. More precisely, we prove that randomized Quicksort performs expected O(n(1 + log(1 + Inv/n))) element swaps, where Inv denotes the number of inversions in the input sequence. This result provides a theoretical explanation for the observed behavior, and gives new insights into the behavior of the Quicksort algorithm. We also give some empirical results on the adaptive behavior of Heapsort and Mergesort.
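The swap claim lends itself to a small experiment. The sketch below counts element swaps in a randomized Quicksort alongside a brute-force inversion count; the choice of Hoare partitioning is an assumption of this sketch, since the abstract does not fix a particular implementation.

```python
import random

def inversions(a):
    # Inv: number of pairs (i, j) with i < j and a[i] > a[j].
    # Brute force, O(n^2); fine for a small experiment.
    n = len(a)
    return sum(a[i] > a[j] for i in range(n) for j in range(i + 1, n))

def quicksort_swaps(a):
    # Randomized Quicksort with Hoare partitioning; returns (swap count,
    # sorted copy). The partition scheme is an illustrative assumption:
    # the O(n(1 + log(1 + Inv/n))) expected-swap bound concerns
    # randomized Quicksort in general, not this exact variant.
    a = list(a)
    swaps = 0

    def sort(lo, hi):  # inclusive bounds
        nonlocal swaps
        if hi <= lo:
            return
        # Pivot value from a random position in [lo, hi - 1]; excluding
        # hi keeps the partition index strictly below hi, so the
        # recursion always makes progress.
        pivot = a[random.randrange(lo, hi)]
        i, j = lo - 1, hi + 1
        while True:
            i += 1
            while a[i] < pivot:
                i += 1
            j -= 1
            while a[j] > pivot:
                j -= 1
            if i >= j:
                break
            a[i], a[j] = a[j], a[i]  # the element swaps being counted
            swaps += 1
        sort(lo, j)
        sort(j + 1, hi)

    sort(0, len(a) - 1)
    return swaps, a
```

On an already sorted input with distinct keys this partition performs no swaps at all, matching the Inv = 0 end of the bound, while shuffled inputs (high Inv) trigger many more.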
Low Depth Cache-Oblivious Algorithms, 2009
"... In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel algorithms that have low depth (span, critical path length) and for which the natural s ..."
Abstract
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Our sorting algorithm yields the first cache-oblivious algorithms with polylogarithmic depth and low sequential cache complexities for list ranking, Euler tour tree labeling, tree contraction, least common ancestors, graph connectivity, and minimum spanning forest. Using known mappings, our results lead to low cache complexities on multicore processors (and shared-memory multiprocessors) with a single level of private caches or a single shared cache. We generalize these mappings to a multi-level parallel tree-of-caches model that reflects current and future trends in multicore cache hierarchies; these new mappings imply that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms.
unknown title
"... We study the performance of the most practical inversionsensitive internal sorting algorithms. Experimental results illustrate that adaptive AVL sort consumes the fewest number of comparisons unless the number of inversions is less than 1%; in such case Splaysort consumes the fewest number of compa ..."
Abstract
We study the performance of the most practical inversion-sensitive internal sorting algorithms. Experimental results illustrate that adaptive AVL sort performs the fewest comparisons unless the number of inversions is less than 1%, in which case Splaysort performs the fewest comparisons. On the other hand, the running time of Quicksort is superior unless the number of inversions is less than 1.5%, in which case Splaysort has the shortest running time. Another interesting result is that although the number of cache misses for the cache-optimal Greedysort algorithm was the lowest among the adaptive sorting algorithms under investigation, it was outperformed by Quicksort.
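Thresholds such as the 1% and 1.5% above presuppose a way to measure presortedness. A common normalization, assumed here since the abstract does not spell out its own, is the number of inversions divided by the maximum possible n(n-1)/2, computable in O(n log n) with a merge-sort pass:

```python
def count_inversions(a):
    # Count inversions in O(n log n) via merge sort.
    # Returns (inversion count, sorted copy).
    if len(a) <= 1:
        return 0, list(a)
    mid = len(a) // 2
    left_inv, left = count_inversions(a[:mid])
    right_inv, right = count_inversions(a[mid:])
    inv, merged, i, j = left_inv + right_inv, [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps ahead of the remaining len(left) - i left
            # elements, each of which forms one inversion with it.
            inv += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return inv, merged

def inversion_ratio(a):
    # Inversions as a fraction of the maximum n(n-1)/2. The choice of
    # this normalization is an assumption; the abstract only quotes
    # percentages without defining them.
    n = len(a)
    if n < 2:
        return 0.0
    inv, _ = count_inversions(a)
    return inv / (n * (n - 1) / 2)
```

A fully reversed input gives a ratio of 1.0 and a sorted input 0.0, so the abstract's crossover points sit very close to the sorted end of the scale.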
unknown title, 2006
"... We analyze the problem of sparsematrix densevector multiplication (SpMV) in the I/O model. In the SpMV, the objective is to compute y = Ax, where A is a sparse matrix and x and y are vectors. We give tight upper and lower bounds on the number of block transfers as a function of the sparsity k, the ..."
Abstract
We analyze the problem of sparse-matrix dense-vector multiplication (SpMV) in the I/O model. In SpMV, the objective is to compute y = Ax, where A is a sparse matrix and x and y are vectors. We give tight upper and lower bounds on the number of block transfers as a function of the sparsity k, the number of nonzeros in a column of A. The parameter k is a knob that bridges the problems of permuting (k = 1) and dense matrix multiplication (k = N). When the nonzero elements of A are stored in column-major order, SpMV takes O(min{(kN/B) · max{1, log_{M/B}(N/max{k, M})}, kN}) block transfers, with a matching lower bound of Ω(min{(kN/B) · max{1, log_{M/B}(N/max{k, M})}, kN}).
Intel Labs Pittsburgh, 2009
"... In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel algorithms that have low depth (span, critical path length) and for which the natural s ..."
Abstract
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Our sorting algorithm yields the first cache-oblivious algorithms with polylogarithmic depth and low sequential cache complexities for list ranking, Euler tour tree labeling, tree contraction, least common ancestors, graph connectivity, and minimum spanning forest. Using known mappings, our results lead to low cache complexities on multicore processors (and shared-memory multiprocessors) with a single level of private caches or a single shared cache.