Results 1–10 of 19
Engineering a cache-oblivious sorting algorithm
 In Proc. 6th Workshop on Algorithm Engineering and Experiments, 2004
Abstract

Cited by 25 (1 self)
The cache-oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache-oblivious model is automatically efficient in a multi-level memory model. Since the introduction of the cache-oblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model have been proposed and analyzed. However, less attention has been given to whether the nice theoretical properties of cache-oblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find; it competes very well for input residing in RAM, and outperforms Quicksort for input on disk.
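Lazy Funnelsort's k-merger machinery is too involved for a short sketch, but the parameter-free divide-and-conquer it builds on is easy to illustrate. The following plain recursive mergesort is not the paper's algorithm; it only shows the cache-oblivious style, in that no cache parameters (block size B, memory size M) appear anywhere in the code.

```python
# Minimal sketch of parameter-free divide-and-conquer sorting.
# This is ordinary mergesort, NOT Lazy Funnelsort (whose k-funnels
# merge k^3 elements per merger); it illustrates only that the code
# is oblivious to B and M.

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

print(merge_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```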
The cost of cache-oblivious searching
 In Proc. 44th Ann. Symp. on Foundations of Computer Science (FOCS), 2003
Abstract

Cited by 18 (8 self)
This paper gives tight bounds on the cost of cache-oblivious searching. The paper shows that no cache-oblivious search structure can guarantee a search performance of fewer than lg e · log_B N memory transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the block sizes are limited to be powers of 2. The paper gives modified versions of the van Emde Boas layout, where the expected number of memory transfers between any two levels of the memory hierarchy is arbitrarily close to [lg e + O(lg lg B / lg B)] · log_B N + O(1). This factor approaches lg e ≈ 1.443 as B increases. The expectation is taken over the random placement in memory of the first element of the structure. Because searching in the disk-access machine (DAM) model can be performed in log_B N + O(1) block transfers, this result establishes a separation between the (2-level) DAM model and the cache-oblivious model. The DAM model naturally extends to k levels. The paper also shows that as k grows, the search costs of the optimal k-level DAM search structure and the optimal cache-oblivious search structure rapidly converge. This result demonstrates that for a multi-level memory hierarchy, a simple cache-oblivious structure almost replicates the performance of an optimal parameterized k-level DAM structure.
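The van Emde Boas layout behind these bounds can be sketched for the simplest case: laying out a complete binary tree (heap numbering, root = 1) by recursively splitting its height in half and storing the top recursive subtree before the bottom subtrees. Splitting with floor at the top is one common convention; the paper's randomized variants differ in detail.

```python
def veb_layout(height):
    """Return the node indices (heap numbering, root = 1) of a complete
    binary tree with `height` levels, in van Emde Boas memory order."""
    order = []

    def rec(root, h):
        if h == 1:
            order.append(root)
            return
        top = h // 2                    # height of the top recursive tree
        bot = h - top                   # height of each bottom tree
        rec(root, top)                  # lay out the top tree first...
        # ...then each bottom subtree, left to right. In heap numbering
        # the roots of the bottom trees are root*2^top + 0 .. 2^top - 1.
        for off in range(2 ** top):
            rec(root * (2 ** top) + off, bot)

    rec(1, height)
    return order
```

For height 4 this yields [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]: the three-node top tree, then the four bottom trees, each stored contiguously, which is what keeps any root-to-leaf path within few blocks for every block size B.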
Cache-Oblivious Streaming B-trees
, 2007
Abstract

Cited by 17 (5 self)
A streaming B-tree is a dictionary that efficiently implements insertions and range queries. We present two cache-oblivious streaming B-trees, the shuttle tree and the cache-oblivious lookahead array (COLA). For block-transfer size B and on N elements, the shuttle tree implements searches in optimal O(log_{B+1} N) transfers, range queries of L successive elements in optimal O(log_{B+1} N + L/B) transfers, and insertions in O((log_{B+1} N)/B^{Θ(1/(log log B)^2)} + (log^2 N)/B) transfers, which is an asymptotic speedup over traditional B-trees if B ≥ (log N)^{1+c/log log log^2 N} for any constant c > 1. A COLA implements searches in O(log N) transfers, range queries in O(log N + L/B) transfers, and insertions in amortized O((log N)/B) transfers, matching the bounds for a (cache-aware) buffered repository tree. A partially deamortized COLA matches these bounds but reduces the worst-case insertion cost to O(log N) if the memory size M = Ω(log N). We also present a cache-aware version of the COLA, the lookahead array, which achieves the same bounds as Brodal and Fagerberg's (cache-aware) B^ε-tree. We compare our COLA implementation to a traditional B-tree. Our COLA implementation runs 790 times faster for random insertions, 3.1 times slower for insertions of sorted data, and 3.5 times slower for searches.
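A stripped-down COLA is compact to sketch: level i is either empty or a full sorted array of exactly 2^i keys, and an insert merges a carry downward until it finds an empty level. This sketch omits the lookahead pointers and the deamortization, so a search here costs a binary search per level (O(log^2 N) comparisons) rather than the paper's O(log N) transfers, and it uses Python's sort where a real COLA would do a linear merge.

```python
import bisect

class BasicCOLA:
    """Simplified cache-oblivious lookahead array: level i is either
    empty or a sorted array of exactly 2^i keys. Lookahead pointers
    and deamortization from the paper are omitted for brevity."""

    def __init__(self):
        self.levels = []  # levels[i] is [] or a sorted list of 2^i keys

    def insert(self, key):
        carry = [key]
        for i, level in enumerate(self.levels):
            if not level:
                self.levels[i] = carry
                return
            # merge the carry with the full level and push it down
            carry = sorted(carry + level)   # a real COLA uses a linear merge
            self.levels[i] = []
        self.levels.append(carry)

    def contains(self, key):
        for level in self.levels:
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False
```

The merge cascade is what makes insertions cheap in the transfer metric: each key is rewritten once per level, and each level is written sequentially.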
Vitter: On Searching Compressed String Collections Cache-Obliviously
 PODS
Abstract

Cited by 8 (1 self)
Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and study how close their space occupancy is to the information-theoretic minimum. The moral is that they are not just heuristics. Our second contribution is a novel dictionary encoding scheme that builds upon such linearizations and achieves nearly optimal space, offers competitive I/O search time, and is also conscious of the query distribution. Finally, we combine those data structures with cache-oblivious tries [2, 5] and obtain a succinct variant whose space is close to the information-theoretic minimum.
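Front coding, the starting point of the linearizations discussed above, is simple to sketch: each string in a lexicographically sorted list stores only the length of the prefix it shares with its predecessor, plus the remaining suffix. This toy encoder/decoder shows the idea; the paper's contribution is analyzing and improving such linearizations, not this basic scheme.

```python
def front_encode(sorted_strings):
    """Front coding of a sorted string list: each entry records how many
    leading characters it shares with its predecessor, plus the suffix."""
    out, prev = [], ""
    for s in sorted_strings:
        lcp = 0
        while lcp < min(len(prev), len(s)) and prev[lcp] == s[lcp]:
            lcp += 1
        out.append((lcp, s[lcp:]))
        prev = s
    return out

def front_decode(encoded):
    """Rebuild the string list by reusing the recorded prefix lengths."""
    strings, prev = [], ""
    for lcp, suffix in encoded:
        prev = prev[:lcp] + suffix
        strings.append(prev)
    return strings
```

For ["car", "cart", "cat", "dog"] the encoding is [(0, "car"), (3, "t"), (2, "t"), (0, "dog")]: shared prefixes are never stored twice, which is where the space saving over a plain sorted array comes from.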
Cache-Oblivious Databases: Limitations and Opportunities
, 2008
Abstract

Cited by 5 (2 self)
Cache-oblivious techniques, proposed in the theory community, have optimal asymptotic bounds on the amount of data transferred between any two adjacent levels of an arbitrary memory hierarchy. Moreover, this optimal performance is achieved without any platform-specific hardware tuning. These properties are highly attractive to autonomous databases, especially because hardware architectures are becoming increasingly complex and diverse. In this paper, we present our design, implementation, and evaluation of the first cache-oblivious in-memory query processor, EaseDB. Moreover, we discuss the inherent limitations of the cache-oblivious approach as well as the opportunities offered by upcoming hardware architectures. Specifically, a cache-oblivious technique usually requires sophisticated algorithm design to achieve performance comparable to its cache-conscious counterpart. Nevertheless, this development-time effort is compensated by the automatic achievement of good performance and the reduced cost of ownership. Furthermore, this automaticity enables cache-oblivious techniques to outperform their cache-conscious counterparts on multi-threaded processors.
Fast and compact hash tables for integer keys
 In Proc. 32nd Australasian Conf. Comput. Sci. (ACSC'09), 2009
Abstract

Cited by 4 (1 self)
A hash table is a fundamental data structure in computer science that can offer rapid storage and retrieval of data. A leading implementation for string keys is the cache-conscious array hash table. Although fast with strings, there is currently no information in the research literature on its performance with integer keys. More importantly, we do not know how efficient an integer-based array hash table is compared to other hash tables that are designed for integers, such as bucketized cuckoo hashing. In this paper, we explain how to efficiently implement an array hash table for integers. We then demonstrate, through careful experimental evaluations, which hash table, whether it be a bucketized cuckoo hash table, an array hash table, or an alternative hash table scheme such as linear probing, offers the best performance, with respect to time and space, for maintaining a large dictionary of integers in-memory on a current cache-oriented processor.
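The array hash idea for integers can be sketched as follows: each slot is one contiguous buffer of keys rather than a chain of nodes, so a probe scans a single cache-friendly run of memory. The multiplicative hash and the fixed slot count here are illustrative choices of this sketch, not the tuned implementation the paper evaluates.

```python
from array import array

class ArrayHash:
    """Sketch of an array hash table for integer keys: each slot is a
    contiguous array('q') of 64-bit keys instead of a linked chain, so
    a probe is a linear scan of one buffer. The hash function and the
    absence of resizing are simplifications of this sketch."""

    def __init__(self, slots=1024):
        self.slots = [array('q') for _ in range(slots)]

    def _slot(self, key):
        # Knuth-style multiplicative hashing; an illustrative choice.
        return (key * 2654435761) % len(self.slots)

    def insert(self, key):
        s = self.slots[self._slot(key)]
        if key not in s:        # linear scan of a contiguous buffer
            s.append(key)

    def contains(self, key):
        return key in self.slots[self._slot(key)]
```

The point of the representation is that a failed probe touches one short contiguous region instead of a pointer chain scattered across the heap.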
An Adaptive Packed-Memory Array
Abstract

Cited by 3 (1 self)
The packed-memory array (PMA) is a data structure that maintains a dynamic set of N elements in sorted order in a Θ(N)-sized array. The idea is to intersperse Θ(N) empty spaces or gaps among the elements so that only a small number of elements need to be shifted around on an insert or delete. Because the elements are stored physically in sorted order in memory or on disk, the PMA can be used to support extremely efficient range queries. Specifically, the cost to scan L consecutive elements is O(1 + L/B) memory transfers. This paper gives the first adaptive packed-memory array (APMA), which automatically adjusts to the input pattern. Like the traditional PMA, any pattern of updates costs only O(log^2 N) amortized element moves and O(1 + (log^2 N)/B) amortized memory transfers per update. However, the APMA performs even better on many common input distributions, achieving only O(log N) amortized element moves and O(1 + (log N)/B) amortized memory transfers. The paper analyzes sequential inserts, where the insertions are to the front of the APMA; hammer inserts, where the insertions "hammer" on one part of the APMA; random inserts, where the insertions are after random elements in the APMA; and bulk inserts, where for constant α ∈ [0,1], N^α elements are inserted after random elements in the APMA. The paper then gives simulation results that are consistent with the asymptotic bounds. For sequential insertions of roughly 1.4 million elements, the APMA has four times fewer element moves per insertion than the traditional PMA and running times that are more than seven times faster.
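The gapped-array mechanics can be conveyed with a toy PMA: keys stay sorted in an array interspersed with gaps (None), an insert shifts keys only up to the nearest gap on the right, and when the array gets too dense the keys are respread evenly. The real PMA rebalances density-controlled windows rather than the whole array; this whole-array respread loses the amortized bounds but keeps the idea visible.

```python
class SimplePMA:
    """Toy packed-memory array. Keys are kept sorted in self.a with gaps
    (None). Inserts shift a run of keys right by one, into the nearest
    gap; density above one half triggers an even respread at double
    capacity. A real PMA rebalances windows, not the whole array."""

    def __init__(self, capacity=8):
        self.a = [None] * capacity
        self.n = 0

    def _respread(self, cap):
        keys = [x for x in self.a if x is not None]
        self.a = [None] * cap
        step = cap / max(len(keys), 1)
        for i, k in enumerate(keys):        # even spacing leaves gaps everywhere
            self.a[int(i * step)] = k

    def insert(self, key):
        if 2 * (self.n + 1) >= len(self.a):  # keep density under one half
            self._respread(2 * len(self.a))
        pos = 0                              # first index whose key is >= key
        for i, x in enumerate(self.a):
            if x is not None and x < key:
                pos = i + 1
        gap = pos                            # nearest gap at or right of pos
        while gap < len(self.a) and self.a[gap] is not None:
            gap += 1
        if gap == len(self.a):               # right end packed solid: respread
            self._respread(len(self.a))
            return self.insert(key)
        self.a[pos + 1:gap + 1] = self.a[pos:gap]  # shift run right by one
        self.a[pos] = key
        self.n += 1

    def keys(self):
        return [x for x in self.a if x is not None]
```

Because the keys sit in physical sorted order, a range query is just a scan of the underlying array, which is exactly the O(1 + L/B) behavior the abstract describes.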
Lightweight data indexing and compression in external memory
 In Proc. 8th Latin American Symposium on Theoretical Informatics (LATIN), 2010
Abstract

Cited by 3 (0 self)
In this paper we describe algorithms for computing the BWT and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight, in the sense that, for an input of size n, they use only n bits of disk working space, while all previous approaches use Θ(n log n) bits of disk working space. Moreover, our algorithms access disk data only via sequential scans, thus they take full advantage of modern disk features that make sequential disk accesses much faster than random accesses. We also present a scan-based algorithm for inverting the BWT that uses Θ(n) bits of working space, and a lightweight internal-memory algorithm for computing the BWT which is the fastest in the literature when the available working space is o(n) bits. Finally, we prove lower bounds on the complexity of computing and inverting the BWT via sequential scans, in terms of the classic product: internal-memory space × number of passes over the disk data.
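For reference, the BWT and its naive inversion are compact to state in Python. The point of the paper is computing them at scale with n bits of working space and sequential scans; this in-memory sketch uses the textbook quadratic-space methods and makes no attempt at that.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via explicit rotation sorting: append a
    sentinel, sort all rotations, read off the last column. Quadratic
    space, for illustration only."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def ibwt(last, sentinel="$"):
    """Invert the BWT by repeatedly prepending the last column and
    re-sorting (the naive method, not the paper's scan-based one)."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]
```

For example, bwt("banana") gives "annb$aa", which groups equal characters into runs; that clustering is what makes the transform compressible.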
Redesigning the String Hash Table, Burst Trie, and BST to Exploit Cache
, 2011
Abstract

Cited by 2 (1 self)
A key decision when developing in-memory computing applications is the choice of a mechanism to store and retrieve strings. The most efficient current data structures for this task are the hash table with move-to-front chains and the burst trie, both of which use linked lists as a substructure, and variants of the binary search tree. These data structures are computationally efficient, but typical implementations use large numbers of nodes and pointers to manage strings, which is not efficient in its use of cache. In this article, we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and, for linked lists, the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. For hashing, in the best case the total space overhead is reduced to less than 1 bit per string. For the burst trie, over 300MB of strings can be stored in a total of under 200MB of memory with significantly improved search time. These results, on a variety of data sets, show that cache-friendly variants of fundamental data structures can yield remarkable gains in performance.
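The article's more drastic step, replacing each list of nodes by a contiguous array of characters, can be sketched as a hash table whose slots are single bytearrays of length-prefixed strings. One-byte length prefixes (so strings under 256 bytes) and the djb2-style hash are simplifying assumptions of this sketch, not details from the article.

```python
class CompactStringHash:
    """Sketch of the array-based slot representation: each hash slot is
    one contiguous bytearray holding length-prefixed strings, so a
    lookup scans a single buffer instead of chasing list-node pointers.
    One-byte length prefixes limit strings to 255 bytes (a
    simplification of this sketch)."""

    def __init__(self, slots=1 << 12):
        self.slots = [bytearray() for _ in range(slots)]

    def _slot(self, s):
        h = 5381
        for b in s:                      # djb2-style byte hash
            h = (h * 33 + b) & 0xFFFFFFFF
        return h % len(self.slots)

    def _scan(self, buf, s):
        i = 0
        while i < len(buf):
            n = buf[i]                   # length prefix
            if buf[i + 1:i + 1 + n] == s:
                return True
            i += 1 + n                   # skip to the next string
        return False

    def insert(self, string):
        s = string.encode()
        buf = self.slots[self._slot(s)]
        if not self._scan(buf, s):
            buf.append(len(s))           # 1-byte length prefix
            buf.extend(s)

    def contains(self, string):
        s = string.encode()
        return self._scan(self.slots[self._slot(s)], s)
```

Storing the whole slot in one buffer is what eliminates the per-node pointer overhead and turns every probe into a sequential scan of adjacent cache lines.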
Cache-Oblivious Index for Approximate String Matching
, 2007
Abstract

Cited by 1 (0 self)
This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((n log^k n)/B) disk pages and finds all k-error matches with O((|P| + occ)/B + log^k n log log_B n) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω(|P| + occ + poly(log n)) I/Os. The second index reduces the space to O((n log n)/B) disk pages, and the I/O complexity is O((|P| + occ)/B + log^{k(k+1)} n log log n).