Results 1  10
of
11
Succinct indexable dictionaries with applications to encoding kary trees and multisets
 In Proceedings of the 13th Annual ACMSIAM Symposium on Discrete Algorithms (SODA
"... We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0,...,m − 1} for some integer m, while supporting the operations of rank(x), which returns the number of elements in S that are less than x if x ∈ S, and −1 otherwise; and select(i) which returns the ith smallest ele ..."
Abstract

Cited by 191 (7 self)
 Add to MetaCart
We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0,...,m − 1} for some integer m, while supporting the operations of rank(x), which returns the number of elements in S that are less than x if x ∈ S, and −1 otherwise; and select(i) which returns the ith smallest element in S. We give a data structure that supports both operations in O(1) time on the RAM model and requires B(n,m)+ o(n)+O(lg lg m) bits to store a set of size n, where B(n,m) = ⌈ lg ( m) ⌉ n is the minimum number of bits required to store any nelement subset from a universe of size m. Previous dictionaries taking this space only supported (yes/no) membership queries in O(1) time. In the cell probe model we can remove the O(lg lg m) additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: • an informationtheoretically optimal representation of a kary cardinal tree that supports standard operations in constant time, • a representation of a multiset of size n from {0,...,m − 1} in B(n,m+n) + o(n) bits that supports (appropriate generalizations of) rank and select operations in constant time, and • a representation of a sequence of n nonnegative integers summing up to m in B(n,m + n) + o(n) bits that supports prefix sum queries in constant time. 1
Reducing the Space Requirement of Suffix Trees
 Software – Practice and Experience
, 1999
"... We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average ..."
Abstract

Cited by 118 (10 self)
 Add to MetaCart
We show that suffix trees store various kinds of redundant information. We exploit these redundancies to obtain more space efficient representations. The most space efficient of our representations requires 20 bytes per input character in the worst case, and 10.1 bytes per input character on average for a collection of 42 files of different type. This is an advantage of more than 8 bytes per input character over previous work. Our representations can be constructed without extra space, and as fast as previous representations. The asymptotic running times of suffix tree applications are retained. Copyright © 1999 John Wiley & Sons, Ltd. KEY WORDS: data structures; suffix trees; implementation techniques; space reduction
Space Efficient Hash Tables With Worst Case Constant Access Time
 In STACS
, 2003
"... We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes ..."
Abstract

Cited by 47 (4 self)
 Add to MetaCart
We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ffl) n space, and supports satellite information. Experiments indicate that d = 4 choices suffice for ffl 0:03. We also describe variants of the data structure that allow the use of hash functions that can be evaluted in constant time.
Bonsai: A Compact Representation of Trees
, 1993
"... This paper shows how trees can be stored in a very compact form, called `Bonsai', using hash tables. A method is described that is suitable for large trees that grow monotonically within a predefined maximum size limit. Using it, pointers in any tree can be represented within 6 +log 2 n bits per nod ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
This paper shows how trees can be stored in a very compact form, called `Bonsai', using hash tables. A method is described that is suitable for large trees that grow monotonically within a predefined maximum size limit. Using it, pointers in any tree can be represented within 6 +log 2 n bits per node where n is the maximum number of children a node can have. We first describe a general way of storing trees in hash tables, and then introduce the idea of compact hashing which underlies the Bonsai structure. These two techniques are combined to give a compact representation of trees, and a practical methodology is set out to permit the design of these structures. The new representation is compared with two conventional tree implementations in terms of the storage required per node. Examples of programs that must store large trees within a strict maximum size include those that operate on trie structures derived from natural language text. We describe how the Bonsai technique has been applied to the trees that arise in text compression and adaptive prediction, and include a discussion of the design parameters that work well in practice
Compact Dictionaries for VariableLength Keys and Data, with Applications
, 2007
"... We consider the problem of maintaining a dynamic dictionary T of keys and associated data for which both the keys and data are bit strings that can vary in length from zero up to the length w of a machine word. We present a data structure for this variablebitlength dictionary problem that supports ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
We consider the problem of maintaining a dynamic dictionary T of keys and associated data for which both the keys and data are bit strings that can vary in length from zero up to the length w of a machine word. We present a data structure for this variablebitlength dictionary problem that supports constant time lookup and expected amortized constant time insertion and deletion. It uses O(m + 3n − n log 2 n) bits, where n is the number of elements in T, and m is the total number of bits across all strings in T (keys and data). Our dictionary uses an array A[1... n] in which locations store variablebitlength strings. We present a data structure for this variablebitlength array problem that supports worstcase constanttime lookups and updates and uses O(m + n) bits, where m is the total number of bits across all strings stored in A. The motivation for these structures is to support applications for which it is helpful to efficiently store short varying length bit strings. We present several applications, including representations for semidynamic graphs, order queries on integers sets, cardinal trees with varying cardinality, and simplicial meshes of d dimensions. These results either generalize or simplify previous results.
Compact Data Structures with Fast Queries
, 2005
"... Many applications dealing with large data structures can benefit from keeping them in compressed form. Compression has many benefits: it can allow a representation to fit in main memory rather than swapping out to disk, and it improves cache performance since it allows more data to fit into the c ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
Many applications dealing with large data structures can benefit from keeping them in compressed form. Compression has many benefits: it can allow a representation to fit in main memory rather than swapping out to disk, and it improves cache performance since it allows more data to fit into the cache. However, a data structure is only useful if it allows the application to perform fast queries (and updates) to the data.
Cache, Hash and SpaceEfficient Bloom Filters
"... Abstract. A Bloom filter is a very compact data structure that supports approximate membership queries on a set, allowing false positives. We propose several new variants of Bloom filters and replacements with similar functionality. All of them have a better cacheefficiency and need less hash bits ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Abstract. A Bloom filter is a very compact data structure that supports approximate membership queries on a set, allowing false positives. We propose several new variants of Bloom filters and replacements with similar functionality. All of them have a better cacheefficiency and need less hash bits than regular Bloom filters. Some use SIMD functionality, while the others provide an even better space efficiency. As a consequence, we get a more flexible tradeoff between false positive rate, spaceefficiency, cacheefficiency, hashefficiency, and computational effort. We analyze the efficiency of Bloom filters and the proposed replacements in detail, in terms of the false positive rate, the number of expected cachemisses, and the number of required hash bits. We also describe and experimentally evaluate the performance of highlytuned implementations. For many settings, our alternatives perform better than the methods proposed so far. 1
Parallel Recursive State Compression for Free
 Proc. 18th Int. Spin Workshop on Model Checking Software, Springer Verlag, LNCS
, 2011
"... Abstract. State space exploration is a basic solution to many verification problems, but is limited by time and memory usage. Due to physical limits in modern CPUs, sequential exploration algorithms do not benefit automatically from the next generation of processors anymore, hence the need for multi ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
Abstract. State space exploration is a basic solution to many verification problems, but is limited by time and memory usage. Due to physical limits in modern CPUs, sequential exploration algorithms do not benefit automatically from the next generation of processors anymore, hence the need for multicore solutions. This paper focuses on reducing memory usage in enumerative model checking, while maintaining the multicore scalability obtained in earlier work. We present a treebased multicore compression method, which works by leveraging sharing among subvectors of state vectors. An algorithmic analysis of both worstcase and optimal compression ratios shows the potential to compress even large states to a small constant on average (8 bytes). Our experiments demonstrate that this holds up in practice: the median compression ratio of 279 measured experiments is within 17 % of the optimum for tree compression, and five times better than the median compression ratio of Spin’s Collapse compression. Our algorithms are implemented in the LTSmin tool, and our experiments show that for model checking, multicore tree compression pays its own way: it comes virtually without overhead compared to the fastest hash tablebased methods. 1
Fast, AllPurpose State Storage
"... Abstract. Existing techniques for approximate storage of visited states in a model checker are too specialpurpose and too DRAMintensive. Bitstate hashing, based on Bloom filters, is good for exploring most of very large state spaces, and hash compaction is good for highassurance verification of m ..."
Abstract
 Add to MetaCart
Abstract. Existing techniques for approximate storage of visited states in a model checker are too specialpurpose and too DRAMintensive. Bitstate hashing, based on Bloom filters, is good for exploring most of very large state spaces, and hash compaction is good for highassurance verification of more tractable problems. We describe a scheme that is good at both, because it adapts at run time to the number of states visited. It does this within a fixed memory space and with remarkable speed and accuracy. In many cases, it is faster than existing techniques, because it only ever requires one random access to main memory per operation; existing techniques require several to have good accuracy. Adapting to accomodate more states happens in place using streaming access to memory; traditional rehashing would require extra space, random memory accesses, and hash computation. The structure can also incorporate search stack matching for partialorder reductions, saving the need for extra resources dedicated to an additional structure. Our scheme is wellsuited for a future in which random accesses to memory are more of a limiting factor than the size of memory. 1
Don’t Thrash: How
"... Many large storage systems use approximatemembershipquery (AMQ) data structures to deal with the massive amounts of data that they process. An AMQ data structure is a dictionary that trades off space for a false positive rate on membership queries. It is designed to fit into small, fast storage, an ..."
Abstract
 Add to MetaCart
Many large storage systems use approximatemembershipquery (AMQ) data structures to deal with the massive amounts of data that they process. An AMQ data structure is a dictionary that trades off space for a false positive rate on membership queries. It is designed to fit into small, fast storage, and it is used to avoid I/Os on slow storage. The Bloom filter is a wellknown example of an AMQ data structure. Bloom filters, however, do not scale outside of main memory. This paper describes the Cascade Filter TM, an AMQ data structure that scales beyond main memory, supporting over half a million insertions/deletions per second and over 500 lookups per second on a commodity flashbased SSD. 1