SILT: A Memory-Efficient, High-Performance Key-Value Store
In Proc. 23rd ACM SOSP, Cascais, 2011
Cited by 53 (14 self)
SILT (Small Index Large Table) is a memory-efficient, high-performance key-value store system based on flash storage that scales to serve billions of key-value items on a single node. It requires only 0.7 bytes of DRAM per entry and retrieves key/value pairs using on average 1.01 flash reads each. SILT combines new algorithmic and systems techniques to balance the use of memory, storage, and computation. Our contributions include: (1) the design of three basic key-value stores each with a different emphasis on memory-efficiency and write-friendliness; (2) synthesis of the basic key-value stores to build a SILT key-value store system; and (3) an analytical model for tuning system parameters carefully to meet the needs of different workloads. SILT requires one to two orders of magnitude less memory to provide comparable throughput to current high-performance key-value systems on a commodity desktop system with flash storage.
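To give a feel for the two figures quoted in this abstract, the following back-of-the-envelope sketch scales them to a given number of entries and lookups. The function names are hypothetical and the calculation simply multiplies the quoted averages; it is not SILT's analytical model.

```python
# Figures quoted in the abstract above; everything else here is illustrative.
DRAM_BYTES_PER_ENTRY = 0.7
FLASH_READS_PER_LOOKUP = 1.01


def index_dram_bytes(num_entries):
    """Estimated DRAM needed for the index, in bytes (hypothetical helper)."""
    return DRAM_BYTES_PER_ENTRY * num_entries


def expected_flash_reads(num_lookups):
    """Expected total flash reads for a batch of lookups (hypothetical helper)."""
    return FLASH_READS_PER_LOOKUP * num_lookups
```

Under these assumptions, one billion entries would need roughly 0.7 GB of DRAM for the index, and 1,000 lookups would cost about 1,010 flash reads.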
Order-Preserving Encryption Revisited: Improved Security Analysis and Alternative Solutions
2011
Cited by 46 (1 self)
We further the study of order-preserving symmetric encryption (OPE), a primitive for allowing efficient range queries on encrypted data, recently initiated (from a cryptographic perspective) by Boldyreva et al. (Eurocrypt ’09). First, we address the open problem of characterizing what encryption via a random order-preserving function (ROPF) leaks about underlying data (ROPF being the “ideal object” in the security definition, POPF, satisfied by their scheme). In particular, we show that, for a database of randomly distributed plaintexts and appropriate choice of parameters, ROPF encryption leaks neither the precise value of any plaintext nor the precise distance between any two of them. The analysis here introduces useful new techniques. On the other hand, we show that ROPF encryption leaks the approximate value of any plaintext as well as the approximate distance between any two plaintexts, each to an accuracy of about the square root of the domain size. We then study schemes that are not order-preserving, but which nevertheless allow efficient range queries and achieve security notions stronger than POPF. In a setting where the entire database is known in advance of key generation (considered in several prior works), we show that recent constructions of “monotone minimal perfect hash functions” allow one to efficiently achieve (an adaptation of) the notion
Alphabet-independent compressed text indexing
In ESA, 2011
Cited by 25 (17 self)
Self-indexes can represent a text in asymptotically optimal space under the k-th order entropy model, give access to text substrings, and support indexed pattern searches. Their time complexities are not optimal, however: they always depend on the alphabet size. In this paper we achieve, for the first time, full alphabet-independence in the time complexities of self-indexes, while retaining space optimality. We obtain also some relevant byproducts on compressed suffix trees.
New lower and upper bounds for representing sequences
CoRR
Cited by 22 (14 self)
Sequence representations supporting queries access, select and rank are at the core of many data structures. There is a considerable gap between different upper bounds, and the few lower bounds, known for such representations, and how they interact with the space used. In this article we prove a strong lower bound for rank, which holds for rather permissive assumptions on the space used, and give matching upper bounds that require only a compressed representation of the sequence. Within this compressed space, operations access and select can be solved within almost-constant time.
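For readers unfamiliar with the three queries named in this abstract, here is a minimal sketch of their usual semantics on an uncompressed sequence. It is a naive linear-time baseline written for illustration, with 0-based positions and a 1-based occurrence count chosen as assumptions; it is not the compressed representation the paper describes.

```python
def access(seq, i):
    """Return the symbol stored at position i (0-based)."""
    return seq[i]


def rank(seq, symbol, i):
    """Count the occurrences of symbol in the prefix seq[0:i]."""
    return sum(1 for s in seq[:i] if s == symbol)


def select(seq, symbol, j):
    """Return the position of the j-th occurrence (1-based) of symbol.

    Returns -1 when symbol occurs fewer than j times.
    """
    count = 0
    for pos, s in enumerate(seq):
        if s == symbol:
            count += 1
            if count == j:
                return pos
    return -1
```

On the string "abracadabra", for example, rank(seq, "a", 4) counts the two occurrences of "a" in "abra", and select(seq, "a", 3) returns position 5.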
Theory and Practise of Monotone Minimal Perfect Hashing
Cited by 18 (9 self)
Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, list of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along our way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practise, and provide a balance between access speed, ease of construction, and space usage.
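The contract of a monotone minimal perfect hash function can be illustrated with a deliberately naive sketch: given n distinct keys in sorted order, it maps each key to its rank in 0..n-1. The function name is hypothetical, and this version stores every key in a dictionary, so it shows only the input/output behaviour and none of the space savings the paper is about.

```python
def build_monotone_mph(sorted_keys):
    """Return a function that maps each key to its rank in sorted_keys.

    Naive baseline for illustration: it keeps all keys in memory, unlike
    the compact structures studied in the paper. Looking up a key outside
    the original set raises KeyError here.
    """
    keys = list(sorted_keys)
    if any(a >= b for a, b in zip(keys, keys[1:])):
        raise ValueError("keys must be distinct and in increasing order")
    table = {key: position for position, key in enumerate(keys)}
    return lambda key: table[key]
```

A quick usage check: for the sorted keys "apple", "banana", "cherry", the returned function yields 0, 1 and 2 respectively, so the hash values preserve the lexicographic order and cover the range without gaps.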
Improved compressed indexes for full-text document retrieval
In Proc. 18th SPIRE, 2011
Cited by 18 (10 self)
We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA| + O(n lg D / lg lg D) or 2|CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimum perfect hash functions, we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits. We also improve current solutions that use 2|CSA| + o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies.
Spaces, trees and colors: The algorithmic landscape of document retrieval on sequences
CoRR
Cited by 14 (9 self)
Document retrieval is one of the best established information retrieval activities since the sixties, pervading all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is mostly oriented to “natural language” text collections, where inverted indexes are the preferred solution. As successful as this paradigm has been, it fails to properly handle various East Asian languages and other scenarios where the “natural language” assumptions do not hold. In this survey we cover the recent research in extending the document retrieval techniques to a broader class of sequence collections, which has applications in bioinformatics, data and Web mining, chemoinformatics, software engineering, multimedia information retrieval, and many other fields. We focus on the algorithmic aspects of the techniques, uncovering a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.
Fast compressed tries through path decompositions
CoRR, 2014
Cited by 10 (3 self)
Tries are popular data structures for storing a set of strings, where common prefixes are represented by common root-to-node paths. More than 50 years of usage have produced many variants and implementations to overcome some of their limitations. We explore new succinct representations of path-decomposed tries and experimentally evaluate the corresponding reduction in space usage and memory latency, comparing with the state of the art. We study the following applications: compressed string dictionary and monotone minimal perfect hash for strings. In compressed string dictionary, we obtain data structures that outperform other state-of-the-art compressed dictionaries in space efficiency while obtaining predictable query times that are competitive with data structures preferred by the practitioners. On real-world datasets, our compressed tries obtain the smallest space (except for one case) and have the fastest lookup times, whereas access times are within 20% slower than the best-known solutions. In monotone minimal perfect hash for strings, our compressed tries perform several times faster than other trie-based monotone perfect hash functions while occupying nearly the same space. On real-world datasets, our tries are approximately 2 to 5 times faster than previous solutions, with a space occupancy less than 10% larger.
Fast Prefix Search in Little Space, with Applications
Cited by 8 (4 self)
It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution. Traditionally, prefix search is solved by data structures that are also dictionaries: they actually contain the strings in S. For very large collections stored in slow-access memory, we propose much more compact data structures that support weak prefix searches: they return the ranks of matching strings provided that some string in S starts with the given prefix. In fact, we show that our most space-efficient data structure is asymptotically space-optimal. Previously, data structures such as String B-trees (and more complicated cache-oblivious string data structures) have implicitly supported weak prefix queries, but they all have query time that grows logarithmically with the size of the string collection. In contrast, our data structures are simple, naturally cache-efficient, and have query time that depends only on the length of the prefix, all the way down to constant query time for strings that fit in one machine word. We give several applications of weak prefix searches, including exact prefix counting and approximate counting of tuples matching conjunctive prefix conditions.
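The output of a prefix search expressed as a range of ranks can be sketched with the traditional dictionary-style approach that this abstract contrasts itself with. The helper names below are hypothetical; the code keeps the sorted strings themselves and binary-searches them, so its query time grows logarithmically with the collection size and its answer is exact rather than "weak".

```python
def _bound(sorted_strings, prefix, strict):
    """Binary search on each string's first len(prefix) characters.

    Returns the first index whose truncated string is >= prefix,
    or > prefix when strict is True.
    """
    k = len(prefix)
    lo, hi = 0, len(sorted_strings)
    while lo < hi:
        mid = (lo + hi) // 2
        head = sorted_strings[mid][:k]
        if head < prefix or (strict and head == prefix):
            lo = mid + 1
        else:
            hi = mid
    return lo


def prefix_rank_range(sorted_strings, prefix):
    """Return (lo, hi) so that sorted_strings[lo:hi] are the strings starting with prefix."""
    return _bound(sorted_strings, prefix, False), _bound(sorted_strings, prefix, True)
```

The difference hi - lo gives an exact prefix count, one of the applications the abstract mentions. An empty range (lo equal to hi) means no string starts with the prefix, a case in which a weak prefix search structure makes no guarantee about its answer.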