Results 1  10
of
16
SILT: A MemoryEfficient, HighPerformance KeyValue Store
 In Proc. 23rd ACM SOSP, Cascias
, 2011
"... SILT (Small Index Large Table) is a memoryefficient, highperformance keyvalue store system based on flash storage that scales to serve billions of keyvalue items on a single node. It requires only 0.7 bytes of DRAM per entry and retrieves key/value pairs using on average 1.01 flash reads each. S ..."
Abstract

Cited by 20 (7 self)
 Add to MetaCart
SILT (Small Index Large Table) is a memoryefficient, highperformance keyvalue store system based on flash storage that scales to serve billions of keyvalue items on a single node. It requires only 0.7 bytes of DRAM per entry and retrieves key/value pairs using on average 1.01 flash reads each. SILT combines new algorithmic and systems techniques to balance the use of memory, storage, and computation. Our contributions include: (1) the design of three basic keyvalue stores each with a different emphasis on memoryefficiency and writefriendliness; (2) synthesis of the basic keyvalue stores to build a SILT keyvalue store system; and (3) an analytical model for tuning system parameters carefully to meet the needs of different workloads. SILT requires one to two orders of magnitude less memory to provide comparable throughput to current highperformance keyvalue systems on a commodity desktop system with flash storage.
Alphabetindependent compressed text indexing
 In ESA
, 2011
"... Abstract. Selfindexes can represent a text in asymptotically optimal space under the kth order entropy model, give access to text substrings, and support indexed pattern searches. Their time complexities are not optimal, however: they always depend on the alphabet size. In this paper we achieve, f ..."
Abstract

Cited by 14 (10 self)
 Add to MetaCart
Abstract. Selfindexes can represent a text in asymptotically optimal space under the kth order entropy model, give access to text substrings, and support indexed pattern searches. Their time complexities are not optimal, however: they always depend on the alphabet size. In this paper we achieve, for the first time, full alphabetindependence in the time complexities of selfindexes, while retaining space optimality. We obtain also some relevant byproducts on compressed suffix trees. 1
Theory and Practise of Monotone Minimal Perfect Hashing
"... Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, orderpreserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given orde ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, orderpreserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable �(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, list of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along our way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practise, and provide a balance between access speed, ease of construction, and space usage. 1
Improved compressed indexes for fulltext document retrieval
 In Proc. 18th SPIRE
, 2011
"... Abstract. We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at lg D lg lg D least CSA  + O(n) or 2CSA  + o(n) bits of space, where CSA is a fulltext in ..."
Abstract

Cited by 10 (7 self)
 Add to MetaCart
Abstract. We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at lg D lg lg D least CSA  + O(n) or 2CSA  + o(n) bits of space, where CSA is a fulltext index. Using monotone minimum perfect hash functions, we give new algorithms for document listing with frequencies and topk document retrieval using just CSA  + O(n lg lg lg D) bits. We also improve current solutions that use 2CSA  + o(n) bits, and consider other problems such as colored range listing, topk most important documents, and computing arbitrary frequencies. 1 Introduction and Related Work Fulltext document retrieval is the problem of, given a collection of D documents (i.e., general sequences over alphabet [1, σ]), concatenated into a text T [1, n],
New lower and upper bounds for representing sequences
 CoRR
"... Abstract. Sequence representations supporting queries access, select and rank are at the core of many data structures. There is a considerable gap between different upper bounds, and the few lower bounds, known for such representations, and how they interact with the space used. In this article we p ..."
Abstract

Cited by 9 (8 self)
 Add to MetaCart
Abstract. Sequence representations supporting queries access, select and rank are at the core of many data structures. There is a considerable gap between different upper bounds, and the few lower bounds, known for such representations, and how they interact with the space used. In this article we prove a strong lower bound for rank, which holds for rather permissive assumptions on the space used, and give matching upper bounds that require only a compressed representation of the sequence. Within this compressed space, operations access and select can be solved within almostconstant time. 1
Encodings for Range Selection and Topk Queries
"... Abstract. We study the problem of encoding the positions the topk elements of an array A[1..n] for a given parameter 1 ≤ k ≤ n. Specifically, for any i and j, we wish create a data structure that reports the positions of the largest k elements in A[i..j] in decreasing order, without accessing A at ..."
Abstract

Cited by 3 (3 self)
 Add to MetaCart
Abstract. We study the problem of encoding the positions the topk elements of an array A[1..n] for a given parameter 1 ≤ k ≤ n. Specifically, for any i and j, we wish create a data structure that reports the positions of the largest k elements in A[i..j] in decreasing order, without accessing A at query time. This is a natural extension of the wellknown encoding rangemaxima query problem, where only the position of the maximum in A[i..j] is sought, and finds applications in document retrieval and ranking. We give (sometimes tight) upper and lower bounds for this problem and some variants thereof. 1
Storing a Compressed Function with Constant Time Access
"... Abstract. We consider the problem of representing, in a spaceefficient way, a function f: S → Σ such that any function value can be computed in constant time on a RAM. Specifically, our aim is to achieve space usage close to the 0th order entropy of the sequence of function values. Our technique wo ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Abstract. We consider the problem of representing, in a spaceefficient way, a function f: S → Σ such that any function value can be computed in constant time on a RAM. Specifically, our aim is to achieve space usage close to the 0th order entropy of the sequence of function values. Our technique works for any set S of machine words, without storing S, which is crucial for applications. Our contribution consists of two new techniques, of independent interest, that we use in combination with an existing result of Dietzfelbinger and Pagh (ICALP 2008). First of all, we introduce a way to support more space efficient approximate membership queries (Bloom filter functionality) with arbitrary false positive rate. Second, we present a variation of Huffman coding using approximate membership, providing an alternative that improves the classical bounds of Gallager (IEEE Trans. Information Theory, 1978) in some cases. The end result is an entropycompressed function supporting constant time random access to values associated with a given set S. This improves both space and time compared to a recent result by Talbot and Talbot (ANALCO 2008). 1
Practical BatchUpdatable External Hashing with Sorting
"... This paper presents a practical external hashing scheme that supports fast lookup (7 microseconds) for large datasets (millions to billions of items) with a small memory footprint (2.5 bits/item) and fast index construction (151 K items/s for 1KiB keyvalue pairs). Our scheme combines three key tec ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
This paper presents a practical external hashing scheme that supports fast lookup (7 microseconds) for large datasets (millions to billions of items) with a small memory footprint (2.5 bits/item) and fast index construction (151 K items/s for 1KiB keyvalue pairs). Our scheme combines three key techniques: (1) a new index data structure (EntropyCoded Tries); (2) the use of sorting as the main data manipulation method; and (3) support for incremental index construction for dynamic datasets. We evaluate our scheme by building an external dictionary on flashbased drives and demonstrate our scheme’s high performance, compactness, and practicality. 1
© 20YY ACM 00000000/20YY/00000002 $5.00Theory and Practice of Monotone Minimal Perfect Hashing
"... supported by the MIUR PRIN projects “Mathematical aspects and forthcoming applications of automata and formal languages ” and “Grafi del web e ranking”, and by a Yahoo! Faculty Grant. Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provi ..."
Abstract
 Add to MetaCart
supported by the MIUR PRIN projects “Mathematical aspects and forthcoming applications of automata and formal languages ” and “Grafi del web e ranking”, and by a Yahoo! Faculty Grant. Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and
OrderPreserving Encryption Revisited: Improved Security Analysis and Alternative Solutions
"... We further the study of orderpreserving symmetric encryption (OPE), a primitive for allowing efficient range queries on encrypted data, recently initiated (from a cryptographic perspective) by Boldyreva et al. (Eurocrypt ’09). First, we address the open problem of characterizing what encryption via ..."
Abstract
 Add to MetaCart
We further the study of orderpreserving symmetric encryption (OPE), a primitive for allowing efficient range queries on encrypted data, recently initiated (from a cryptographic perspective) by Boldyreva et al. (Eurocrypt ’09). First, we address the open problem of characterizing what encryption via a random orderpreserving function (ROPF) leaks about underlying data (ROPF being the “ideal object ” in the security definition, POPF, satisfied by their scheme.) In particular, we show that, for a database of randomly distributed plaintexts and appropriate choice of parameters, ROPF encryption leaks neither the precise value of any plaintext nor the precise distance between any two of them. The analysis here introduces useful new techniques. On the other hand, we show that ROPF encryption leaks approximate value of any plaintext as well as approximate distance between any two plaintexts, each to an accuracy of about square root of the domain size. We then study schemes that are not orderpreserving, but which nevertheless allow efficient range queries and achieve security notions stronger than POPF. In a setting where the entire database is known in advance of keygeneration (considered in several prior works), we show that recent constructions of “monotone minimal perfect hash functions ” allow to efficiently achieve (an adaptation of) the notion