Results 1–8 of 8
Engineering a cache-oblivious sorting algorithm
In Proc. 6th Workshop on Algorithm Engineering and Experiments, 2004
Abstract

Cited by 25 (1 self)
The cache-oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache-oblivious model is automatically efficient in a multi-level memory model. Since the introduction of the cache-oblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model have been proposed and analyzed. However, less attention has been given to whether the nice theoretical properties of cache-oblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find, and find that it competes very well for input residing in RAM, and outperforms Quicksort for input on disk.
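To make the entry's subject concrete, here is a toy Python sketch of the k-way merging primitive underlying Funnelsort. The function names are our own, and the flat heap-based merger stands in for the recursive tree of buffered k-mergers that makes the real algorithm cache-oblivious; this sketch shows only the merge structure, not the buffer layout.

```python
import heapq

def k_way_merge(runs):
    """Merge k sorted runs into one sorted list.

    Illustrative only: Lazy Funnelsort arranges such mergers into a
    recursively defined "funnel" with carefully sized buffers; this
    flat heap-based merger shows the k-way merging primitive alone.
    """
    return list(heapq.merge(*runs))

def toy_funnel_style_sort(xs, k=4):
    """Hypothetical sketch: split into k runs, sort each recursively,
    then merge. Not the paper's algorithm, just its overall shape."""
    if len(xs) <= 1:
        return list(xs)
    step = max(1, (len(xs) + k - 1) // k)
    runs = [toy_funnel_style_sort(xs[i:i + step])
            for i in range(0, len(xs), step)]
    return k_way_merge(runs)
```
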
Low Depth Cache-Oblivious Algorithms
, 2009
Abstract

Cited by 10 (1 self)
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Our sorting algorithm yields the first cache-oblivious algorithms with polylogarithmic depth and low sequential cache complexities for list ranking, Euler tour tree labeling, tree contraction, least common ancestors, graph connectivity, and minimum spanning forest. Using known mappings, our results lead to low cache complexities on multicore processors (and shared-memory multiprocessors) with a single level of private caches or a single shared cache. We generalize these mappings to a multi-level parallel tree-of-caches model that reflects current and future trends in multicore cache hierarchies; these new mappings imply that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms.
Cache-oblivious algorithms and data structures
In SWAT, 2004
Abstract

Cited by 10 (2 self)
Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the terminology of cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal off-line cache replacement strategy. The result is algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
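A classic illustration of the recipe this survey describes (a plain RAM algorithm, written without any cache parameters, that is nonetheless I/O-efficient for every cache and block size) is divide-and-conquer matrix transposition. A minimal sketch, with an arbitrary base-case cutoff of our own choosing:

```python
def transpose(A, B, i0=0, j0=0, n=None, m=None):
    """Cache-oblivious transpose: set B[j][i] = A[i][j].

    The recursion always splits the longer dimension, so once a
    subproblem fits in any cache level, all of its work stays there;
    this is the property the ideal-cache model formalizes. Sketch
    using Python lists, not a tuned implementation.
    """
    if n is None:
        n, m = len(A), len(A[0])
    if n * m <= 16:                        # small base case: copy directly
        for i in range(i0, i0 + n):
            for j in range(j0, j0 + m):
                B[j][i] = A[i][j]
    elif n >= m:                           # split the row dimension
        h = n // 2
        transpose(A, B, i0, j0, h, m)
        transpose(A, B, i0 + h, j0, n - h, m)
    else:                                  # split the column dimension
        h = m // 2
        transpose(A, B, i0, j0, n, h)
        transpose(A, B, i0, j0 + h, n, m - h)
```
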
Masking Patterns in Sequences: A New Class of Motif Discovery with Don’t Cares
, 2009
Abstract
In this paper, we introduce a new notion of motifs, called masks, that succinctly represent the repeated patterns for an input sequence T of n symbols drawn from an alphabet Σ. We show how to build the set of all maximal masks of length L and quorum q in O(2^L n) time and space in the worst case. We analytically show that our algorithms perform better than constant-time enumeration and checking of all the potential (|Σ| + 1)^L candidate patterns in T after a polynomial-time preprocessing of T. Our algorithms are also cache-friendly, attaining O(2^L sort(n)) block transfers, where sort(n) is the cache-oblivious complexity of sorting n items. Key words: motif inference, motifs with don't cares, motif partial order, motifs with masks.
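The O(2^L n) bound can be seen in a naive sketch: for each of the n - L + 1 length-L windows of T, enumerate all 2^L ways of replacing positions with the don't-care symbol and count matches. The version below is our own illustration and, unlike the paper's algorithm, does not restrict output to maximal masks:

```python
from collections import Counter
from itertools import product

def masks_with_quorum(T, L, q):
    """Count don't-care patterns over T (illustrative sketch).

    For every length-L window, generate the 2^L maskings of its
    positions ('*' = don't care) and tally them; keep patterns whose
    occurrence count reaches the quorum q. This is O(2^L n) work,
    matching the shape of the paper's bound, but it returns all
    quorum patterns, not only the maximal masks.
    """
    counts = Counter()
    for s in range(len(T) - L + 1):
        window = T[s:s + L]
        for keep in product([False, True], repeat=L):   # 2^L maskings
            pattern = ''.join(c if k else '*' for c, k in zip(window, keep))
            counts[pattern] += 1
    return {p for p, c in counts.items() if c >= q}
```
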
Abstract
This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate by empirical methods a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort and compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which, our experiments show, can be faster than the best Quicksort implementation we are able to find for input sizes well within the limits of RAM. It is also at least as fast as the recent cache-aware implementations included in the test. On disk, the difference is even more pronounced regarding Quicksort and the cache-aware algorithms, whereas the algorithm is slower than a careful implementation of multi-way Mergesort.
Optimal In-Place Sorting of Vectors and Records
Abstract
Abstract. We study the problem of determining the complexity of optimal comparison-based in-place sorting when the key length, k, is not a constant. We present the first algorithm for lexicographically sorting n keys in O(nk + n log n) time using O(1) auxiliary data locations, which is simultaneously optimal in time and space.
A Novel In-Place Sorting Algorithm with O(n log z) Comparisons and O(n log z) Moves
, 2006
Abstract
Abstract—In-place sorting algorithms play an important role in many fields such as very large database systems, data warehouses, data mining, etc. Such algorithms maximize the size of data that can be processed in main memory without input/output operations. In this paper, a novel in-place sorting algorithm is presented. The algorithm comprises two phases: rearranging the unsorted input array in place, resulting in segments that are ordered relative to each other but whose elements are yet to be sorted; and sorting each segment in place. The first phase requires linear time, while in the second phase the elements of each segment of size z are sorted in place in O(z log z) time using O(1) auxiliary storage. In the worst case, for an array of size n, the algorithm performs O(n log z) element comparisons and O(n log z) element moves. Further, no auxiliary arithmetic operations with indices are required. Besides these theoretical achievements, the algorithm is of practical interest because of its simplicity. Experimental results also show that it outperforms other in-place sorting algorithms. Finally, the analysis of time and space complexity, and the required number of moves, are presented, along with the auxiliary storage requirements of the proposed algorithm. Keywords—Auxiliary storage sorting, in-place sorting, sorting.
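The two-phase outline can be sketched as follows. The single median-of-three pivot in phase 1 and the insertion-sort second phase are our own illustrative stand-ins; they are not the paper's linear-time rearrangement or its O(z log z) segment sort, but they show the structure: partition in place into relatively ordered segments, then finish each segment in place.

```python
def two_phase_sort(a):
    """Sketch of a two-phase in-place sort (illustrative only)."""
    n = len(a)
    if n < 2:
        return a
    # Phase 1: rearrange in place into two segments that are ordered
    # relative to each other (everything < pivot before everything else).
    pivot = sorted((a[0], a[n // 2], a[-1]))[1]   # median-of-three
    i = 0
    for j in range(n):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    # Phase 2: sort each segment in place with O(1) extra storage
    # (insertion sort here; the paper's segment sort is O(z log z)).
    for lo, hi in ((0, i), (i, n)):
        for j in range(lo + 1, hi):
            key, k = a[j], j - 1
            while k >= lo and a[k] > key:
                a[k + 1] = a[k]
                k -= 1
            a[k + 1] = key
    return a
```
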
Program-Centric Cost Models for Locality and Parallelism
, 2013
Abstract
Good locality is critical for the scalability of parallel computations. Many cost models that quantify locality and parallelism of a computation with respect to specific machine models have been proposed. A significant drawback of these machine-centric cost models is their lack of portability. Since the design and analysis of good algorithms in most machine-centric cost models is a non-trivial task, lack of portability can lead to a significant waste of design effort. Therefore, a machine-independent, portable cost model for locality and parallelism that is relevant to a broad class of machines can be a valuable guide for the design of portable and scalable algorithms, as well as for understanding the complexity of problems. This thesis addresses the problem of portable analysis by presenting program-centric metrics for measuring the locality and parallelism of nested-parallel programs written for shared-memory machines: metrics based solely on the program structure, without reference to machine parameters such as processors, caches and connections. The metrics we present for this purpose are the parallel cache complexity.