Results 11-20 of 47
Engineering a cache-oblivious sorting algorithm
 In Proc. 6th Workshop on Algorithm Engineering and Experiments
, 2004
Cited by 25 (1 self)
The cache-oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache-oblivious model is automatically efficient in a multilevel memory model. Since the introduction of the cache-oblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model have been proposed and analyzed. However, less attention has been given to whether the nice theoretical properties of cache-oblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find, and find that it competes very well for input residing in RAM, and outperforms Quicksort for input on disk.
The cost of cache-oblivious searching
 In Proc. 44th Ann. Symp. on Foundations of Computer Science (FOCS)
, 2003
Cited by 18 (8 self)
This paper gives tight bounds on the cost of cache-oblivious searching. The paper shows that no cache-oblivious search structure can guarantee a search performance of fewer than $\lg e \cdot \log_B N$ memory transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the block sizes are limited to be powers of 2. The paper gives modified versions of the van Emde Boas layout, where the expected number of memory transfers between any two levels of the memory hierarchy is arbitrarily close to $[\lg e + O(\lg\lg B/\lg B)]\log_B N + O(1)$. This factor approaches $\lg e \approx 1.443$ as $B$ increases. The expectation is taken over the random placement in memory of the first element of the structure. Because searching in the disk-access machine (DAM) model can be performed in $\log_B N + O(1)$ block transfers, this result establishes a separation between the (2-level) DAM model and the cache-oblivious model. The DAM model naturally extends to $k$ levels. The paper also shows that as $k$ grows, the search costs of the optimal $k$-level DAM search structure and the optimal cache-oblivious search structure rapidly converge. This result demonstrates that for a multilevel memory hierarchy, a simple cache-oblivious structure almost replicates the performance of an optimal parameterized $k$-level DAM structure.
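For orientation, the plain (unmodified) van Emde Boas layout that this abstract refers to can be sketched as follows. This is a minimal illustration and not the paper's modified layout: the function names `veb_order` and `blocks_on_path`, the heap-style node numbering (root 1, children 2i and 2i+1), and the assumption that the structure starts exactly on a block boundary are all choices made here for the example.

```python
def veb_order(height, root=1):
    """Return the heap indices of a complete binary tree of the given
    height (number of levels), listed in van Emde Boas order."""
    if height == 1:
        return [root]
    top_h = height // 2          # levels in the top recursive subtree
    bottom_h = height - top_h    # levels in each bottom recursive subtree
    order = veb_order(top_h, root)
    # Roots of the bottom subtrees sit top_h levels below `root`.
    first_bottom_root = root << top_h
    for j in range(1 << top_h):
        order.extend(veb_order(bottom_h, first_bottom_root + j))
    return order


def blocks_on_path(order, leaf, B):
    """Count distinct size-B blocks touched on the leaf-to-root path,
    assuming (hypothetically) that position 0 is block-aligned."""
    position = {node: i for i, node in enumerate(order)}
    blocks = set()
    node = leaf
    while node >= 1:
        blocks.add(position[node] // B)
        node //= 2               # parent in heap numbering
    return len(blocks)


if __name__ == "__main__":
    layout = veb_order(4)
    breadth_first = list(range(1, 16))
    print(layout)
    print(blocks_on_path(layout, 8, 3), blocks_on_path(breadth_first, 8, 3))
```

With height 4 and B = 3, the path from leaf 8 to the root touches 2 blocks in this layout and 3 blocks in breadth-first order, which is the kind of saving the layout is designed to give without knowing B.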
Cache-Oblivious Streaming B-trees
, 2007
Cited by 17 (5 self)
A streaming B-tree is a dictionary that efficiently implements insertions and range queries. We present two cache-oblivious streaming B-trees, the shuttle tree and the cache-oblivious lookahead array (COLA). For block-transfer size $B$ and on $N$ elements, the shuttle tree implements searches in optimal $O(\log_{B+1} N)$ transfers, range queries of $L$ successive elements in optimal $O(\log_{B+1} N + L/B)$ transfers, and insertions in $O\big((\log_{B+1} N)/B^{\Theta(1/(\log\log B)^2)} + (\log^2 N)/B\big)$ transfers, which is an asymptotic speedup over traditional B-trees if $B \ge (\log N)^{1 + c/\log\log\log^2 N}$ for any constant $c > 1$. A COLA implements searches in $O(\log N)$ transfers, range queries in $O(\log N + L/B)$ transfers, and insertions in amortized $O((\log N)/B)$ transfers, matching the bounds for a (cache-aware) buffered repository tree. A partially deamortized COLA matches these bounds but reduces the worst-case insertion cost to $O(\log N)$ if memory size $M = \Omega(\log N)$. We also present a cache-aware version of the COLA, the lookahead array, which achieves the same bounds as Brodal and Fagerberg's (cache-aware) $B^\varepsilon$-tree. We compare our COLA implementation to a traditional B-tree. Our COLA implementation runs 790 times faster for random insertions, 3.1 times slower for insertions of sorted data, and 3.5 times slower for searches.
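The core idea behind a lookahead array, a sequence of sorted arrays of doubling size that are merged like carries in a binary counter, can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the class name `BasicCOLA` is hypothetical, the lookahead pointers and the deamortization are omitted, and as a result a search here binary-searches every level and so does not achieve the $O(\log N)$ search bound quoted above.

```python
from bisect import bisect_left
from heapq import merge


class BasicCOLA:
    """Simplified lookahead-array sketch WITHOUT lookahead pointers.

    levels[k] is either empty or a sorted list of exactly 2**k keys.
    """

    def __init__(self):
        self.levels = []

    def insert(self, key):
        carry = [key]            # sorted run to place at level k
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                self.levels[k] = carry   # empty slot found: store the run
                return
            # Level is full: merge it with the carry and push one level down.
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def __contains__(self, key):
        # Without lookahead pointers, each level is searched independently.
        for level in self.levels:
            i = bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False


if __name__ == "__main__":
    cola = BasicCOLA()
    for x in [5, 3, 8, 1, 9]:
        cola.insert(x)
    print(cola.levels)
    print(8 in cola, 7 in cola)
```

Each key takes part in at most one merge per level, and merges are sequential scans, which is the intuition behind the amortized $O((\log N)/B)$ insertion cost; the sketch only illustrates that merging pattern.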
Cache oblivious algorithms
 Algorithms for Memory Hierarchies, LNCS 2625
, 2003
Cited by 14 (0 self)
The cache oblivious model is a simple and elegant model to design algorithms that perform well in hierarchical memory models ubiquitous on current systems. This model was first formulated in [22] and has since been a topic of intense research. Analyzing and designing algorithms and data structures in this model involves not only an asymptotic analysis of the number of steps executed in terms of the input size, but also the movement of data optimally among the different levels of the memory hierarchy. This chapter is aimed as an introduction to the “ideal-cache” model of [22] and techniques used to design cache oblivious algorithms. The chapter also presents some experimental insights and results. Part of this work was done while the author was visiting MPI-Saarbrücken.
Cache-oblivious algorithms and data structures
 IN SWAT
, 2004
Cited by 10 (1 self)
Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the terminology of cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal offline cache replacement strategy. The resulting algorithms automatically apply to multilevel memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
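To make the cost measure concrete, the following sketch counts block transfers for a sequence of memory accesses in a two-level model with memory size M and block size B. It is only an illustration under stated assumptions: the function name `count_transfers` is hypothetical, and it uses LRU replacement as a stand-in for the optimal offline replacement strategy assumed by the ideal-cache model.

```python
from collections import OrderedDict


def count_transfers(addresses, M, B):
    """Count block transfers for an address trace, using a fully
    associative cache of M words, blocks of B words, and LRU eviction
    (an assumption; the ideal-cache model uses optimal replacement)."""
    capacity = M // B                 # number of blocks that fit in memory
    cache = OrderedDict()             # block id -> None, oldest first
    transfers = 0
    for address in addresses:
        block = address // B
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            transfers += 1            # miss: fetch the block
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return transfers


if __name__ == "__main__":
    # A sequential scan of 100 words, and a stride-B walk over 100 words.
    print(count_transfers(range(100), M=32, B=8))
    print(count_transfers(range(0, 800, 8), M=32, B=8))
```

The sequential scan of 100 words costs 13 transfers (the number of blocks covered), while touching 100 words at stride B costs 100 transfers. The algorithm being measured never reads M or B; they appear only in the analysis, which is the sense in which it is cache-oblivious.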
Priority queues resilient to memory faults
 In: Proc. 10th International Workshop on Algorithms and Data Structures
, 2007
Cited by 8 (4 self)
In the faulty-memory RAM model, the content of memory cells can get corrupted at any time during the execution of an algorithm, and a constant number of uncorruptible registers are available. A resilient data structure in this model works correctly on the set of uncorrupted values. In this paper we introduce a resilient priority queue. The deletemin operation of a resilient priority queue returns either the minimum uncorrupted element or some corrupted element. Our resilient priority queue uses O(n) space to store n elements. Both insert and deletemin operations are performed in O(log n + δ) time amortized, where δ is the maximum number of corruptions tolerated. Our priority queue matches the performance of classical optimal priority queues in the RAM model when the number of corruptions tolerated is O(log n). We prove matching worst-case lower bounds for resilient priority queues storing only structural information in the uncorruptible registers between operations.
An Optimal Cache-Oblivious Priority Queue and its Application to Graph Algorithms
 SIAM JOURNAL ON COMPUTING
, 2007
Cited by 5 (0 self)
We develop an optimal cache-oblivious priority queue data structure, supporting insertion, deletion, and delete-min operations in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized memory transfers, where $M$ and $B$ are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cache-oblivious data structure, $M$ and $B$ are not used in the description of the structure. Our structure is as efficient as several previously developed external memory (cache-aware) priority queue data structures, which all rely crucially on knowledge about $M$ and $B$. Priority queues are a critical component in many of the best known external memory graph algorithms, and using our cache-oblivious priority queue we develop several cache-oblivious graph algorithms.
External-Memory Exact and Approximate All-Pairs Shortest-Paths in Undirected Graphs
, 2004
Cited by 4 (1 self)
We present several new external-memory algorithms for finding all-pairs shortest paths in a $V$-node, $E$-edge undirected graph. For all-pairs shortest paths and diameter in unweighted undirected graphs we present cache-oblivious algorithms with $O(V \cdot \frac{E}{B} \log_{M/B} \frac{E}{B})$ I/Os, where $B$ is the block size and $M$ is the size of internal memory. For weighted undirected graphs we present a cache-aware APSP algorithm that performs $O(V \cdot (\sqrt{VE/B} + \frac{E}{B} \log \frac{E}{B}))$ I/Os. We also present efficient cache-aware algorithms that find paths between all pairs of vertices in an unweighted graph with lengths within a small additive constant of the shortest path length. All of our results improve earlier results known for these problems. For approximate APSP we provide the first nontrivial results. Our diameter result uses $O(V + E)$ extra space, and all of our other algorithms use $O(V^2)$ space.
Simple and semi-dynamic structures for cache-oblivious planar orthogonal range searching
 In Proc. 22nd ACM Symposium on Computational Geometry
, 2006
Cited by 4 (3 self)
In this paper, we develop improved cache-oblivious data structures for two- and three-sided planar orthogonal range searching. Our main result is an optimal static structure for two-sided range searching that uses linear space and supports queries in $O(\log_B N + T/B)$ memory transfers, where $B$ is the block size of any level in a multilevel memory hierarchy and $T$ is the number of reported points. Our structure is the first linear-space cache-oblivious structure for a planar range searching problem with the optimal $O(\log_B N + T/B)$ query bound. The structure is very simple, and we believe it to be of practical interest. We also show that our two-sided range search structure can be constructed cache-obliviously in $O(N \log_B N)$ memory transfers. Using the logarithmic method and fractional cascading, this leads to a semi-dynamic linear-space structure that supports two-sided range queries in $O(\log_2 N + T/B)$ memory transfers and insertions in $O(\log_2 N \cdot \log_B N)$ memory transfers amortized. This structure is the first (semi-)dynamic structure for any planar range searching problem with a query bound that is logarithmic in the number of elements in the structure and linear in the output size. Finally, using a simple standard construction, we also obtain a static $O(N \log_2 N)$-space structure for three-sided range searching that supports queries in the optimal bound of $O(\log_B N + T/B)$ memory transfers. These bounds match the bounds of the best previously known structure for this