Results 1  10
of
20
External Memory Data Structures
, 2001
"... In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynami ..."
Abstract

Cited by 83 (37 self)
 Add to MetaCart
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Cacheoblivious priority queue and graph algorithm applications
 In Proc. 34th Annual ACM Symposium on Theory of Computing
, 2002
"... In this paper we develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hi ..."
Abstract

Cited by 69 (11 self)
 Add to MetaCart
(Show Context)
In this paper we develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in O ( 1 B logM/B N) amortized memory B transfers, where M and B are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cacheoblivious data structure, M and B are not used in the description of the structure. The bounds match the bounds of several previously developed externalmemory (cacheaware) priority queue data structures, which all rely crucially on knowledge about M and B. Priority queues are a critical component in many of the best known externalmemory graph algorithms, and using our cacheoblivious priority queue we develop several cacheoblivious graph algorithms.
WorstCase Efficient ExternalMemory Priority Queues
 In Proc. Scandinavian Workshop on Algorithms Theory, LNCS 1432
, 1998
"... . A priority queue Q is a data structure that maintains a collection of elements, each element having an associated priority drawn from a totally ordered universe, under the operations Insert, which inserts an element into Q, and DeleteMin, which deletes an element with the minimum priority from ..."
Abstract

Cited by 37 (3 self)
 Add to MetaCart
(Show Context)
. A priority queue Q is a data structure that maintains a collection of elements, each element having an associated priority drawn from a totally ordered universe, under the operations Insert, which inserts an element into Q, and DeleteMin, which deletes an element with the minimum priority from Q. In this paper a priorityqueue implementation is given which is efficient with respect to the number of block transfers or I/Os performed between the internal and external memories of a computer. Let B and M denote the respective capacity of a block and the internal memory measured in elements. The developed data structure handles any intermixed sequence of Insert and DeleteMin operations such that in every disjoint interval of B consecutive priorityqueue operations at most c log M=B N M I/Os are performed, for some positive constant c. These I/Os are divided evenly among the operations: if B c log M=B N M , one I/O is necessary for every B=(c log M=B N M )th operation ...
Performance Engineering Case Study: Heap Construction
 WAE, LNCS
, 1999
"... this paper we study, both analytically and experimentally, the performance of programs that construct a binary heap [Williams 1964] in a hierarchical memory system. Especially, we consider the largest memory level that is too small to fit the whole heap. We call that particular level simply the cach ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
this paper we study, both analytically and experimentally, the performance of programs that construct a binary heap [Williams 1964] in a hierarchical memory system. Especially, we consider the largest memory level that is too small to fit the whole heap. We call that particular level simply the cache. It should, however, be emphasized that our analysis is valid for the memory levels below this cache as well, provided that all our assumptions are fulfilled. We let B denote the size of the blocks transferred between the cache and the memory level above it, and M the capacity of the cache, both measured in elements being manipulated.
Dynamic external hashing: The limit of buffering
 In Proc. ACM Symposium on Parallelism in Algorithms and Architectures
, 2009
"... Hash tables are one of the most fundamental data structures in computer science, in both theory and practice. They are especially useful in external memory, where their query performance approaches the ideal cost of just one disk access. Knuth [16] gave an elegant analysis showing that with some sim ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
(Show Context)
Hash tables are one of the most fundamental data structures in computer science, in both theory and practice. They are especially useful in external memory, where their query performance approaches the ideal cost of just one disk access. Knuth [16] gave an elegant analysis showing that with some simple collision resolution strategies such as linear probing or chaining, the expected average number of disk I/Os of a lookup is merely 1 + 1/2 Ω(b) , where each I/O can read and/or write a disk block containing b items. Inserting a new item into the hash table also costs 1 + 1/2 Ω(b) I/Os, which is again almost the best one can do if the hash table is entirely stored on disk. However, this requirement is unrealistic since any algorithm operating on an external hash table must have some internal memory (at least Ω(1) blocks) to work with. The availability of a small internal memory buffer can dramatically reduce the amortized insertion cost to o(1) I/Os for many external memory data structures. In this paper we study the inherent queryinsertion tradeoff of external hash tables in the presence of a memory buffer. In particular, we show that for any constant c> 1, if the expected average successful query cost is targeted at 1 + O(1/b c) I/Os, then it is not possible to support insertions in less than 1 − O(1/b c−1 6) I/Os amortized, which means that the memory buffer is essentially useless. While if the query cost is relaxed to 1 + O(1/b c) I/Os for any constant c < 1, there is a simple dynamic hash table with o(1) insertion cost. Categories and Subject Descriptors F.2.3 [Analysis of algorithms and problem complexity]: Tradeoffs between complexity measures; E.2 [Data storage]: hashtable representations
An Optimal CacheOblivious Priority Queue and its Application to Graph Algorithms
 SIAM JOURNAL ON COMPUTING
, 2007
"... We develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized memory transfers, where $M$ and $B$ are the memory and block transfer sizes of any two consecutive levels of a multilevel ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
(Show Context)
We develop an optimal cacheoblivious priority queue data structure, supporting insertion, deletion, and deletemin operations in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized memory transfers, where $M$ and $B$ are the memory and block transfer sizes of any two consecutive levels of a multilevel memory hierarchy. In a cacheoblivious data structure, $M$ and $B$ are not used in the description of the structure. Our structure is as efficient as several previously developed external memory (cacheaware) priority queue data structures, which all rely crucially on knowledge about $M$ and $B$. Priority queues are a critical component in many of the best known external memory graph algorithms, and using our cacheoblivious priority queue we develop several cacheoblivious graph algorithms.
The limits of buffering: A tight lower bound for dynamic membership in the external memory model
 In Proc. ACM Symposium on Theory of Computing
, 2010
"... We study the dynamic membership (or dynamic dictionary) problem, which is one of the most fundamental problems in data structures. We study the problem in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any ot ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
We study the dynamic membership (or dynamic dictionary) problem, which is one of the most fundamental problems in data structures. We study the problem in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any other constant < 1), then the query cost must be Ω(logb log n (n/m)), where n is the number of elements in the dictionary. In contrast, when the update time is allowed to be 1 + o(1), then a bit vector or hash table give query time O(1). Thus, this is a threshold phenomenon for data structures. This lower bound answers a folklore conjecture of the external memory community. Since almost any data structure task can solve membership, our lower bound implies a dichotomy between two alternatives: (i) make the amortized update time at least 1 (so the data structure does not buffer, and we lose one of the main potential advantages of the cache), or (ii) make the query time at least roughly logarithmic in n. Our result holds even when the updates and queries are chosen uniformly at random and there are no deletions; it holds for randomized data structures, holds when the universe size is O(n), and does not make any restrictive assumptions such as indivisibility. All of the lower bounds we prove hold regardless of the space consumption of the data structure, while the upper bounds only need linear space. The lower bound has some striking implications for external memory data structures. It shows that the query complexities of many problems such as 1Drange counting, predecessor, rankselect, and many others, are all the same
On the Cell Probe Complexity of Dynamic Membership
"... We study the dynamic membership problem, one of the most fundamental data structure problems, in the cell probe model with an arbitrary cell size. We consider a cell probe model equipped with a cache that consists of at least a constant number of cells; reading or writing the cache is free of charge ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
We study the dynamic membership problem, one of the most fundamental data structure problems, in the cell probe model with an arbitrary cell size. We consider a cell probe model equipped with a cache that consists of at least a constant number of cells; reading or writing the cache is free of charge. For nearly all common data structures, it is known that with sufficiently large cells together with the cache, we can significantly lower the amortized update cost to o(1). In this paper, we show that this is not the case for the dynamic membership problem. Specifically, for any deterministic membership data structure under a random input sequence, if the expected average query cost is no more than 1+δ for some small constant δ, we prove that the expected amortized update cost must be at least Ω(1), namely, it does not benefit from large block writes (and a cache). The space the structure uses is irrelevant to this lower bound. We also extend this lower bound to randomized membership structures, by using a variant of Yao’s minimax principle. Finally, we show that the structure cannot do better even if it is allowed to answer a query mistakenly with a small constant probability. 1
External Memory Geometric Data Structures, Handbook of Massive Data Sets
, 2002
"... Many modern applications store and process datasets much larger than the main memory of even stateoftheart highend machines. Thus massive and dynamically changing datasets often need to be stored in space efficient data structures on external storage devices such as disks. In such cases the Inpu ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
Many modern applications store and process datasets much larger than the main memory of even stateoftheart highend machines. Thus massive and dynamically changing datasets often need to be stored in space efficient data structures on external storage devices such as disks. In such cases the Input/Output (or