Theory and Practise of Monotone Minimal Perfect Hashing
Abstract - Cited by 9 (5 self)
Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, lists of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along our way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practice, and provide a balance between access speed, ease of construction, and space usage.
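The contract of a monotone minimal perfect hash function, as described in the abstract above, can be illustrated with a naive reference version: each key of a sorted list is mapped to its rank. This is only a sketch of the behaviour, not any of the data structures from [1]; it stores the keys themselves, which a real construction avoids, and the names `build_monotone_mph` and `urls` are hypothetical.

```python
from bisect import bisect_left


def build_monotone_mph(sorted_keys):
    """Return a function mapping each key to its rank in sorted_keys.

    Reference behaviour only: the keys are kept in memory, so this does
    not achieve the space savings of a real monotone minimal perfect hash.
    """
    keys = list(sorted_keys)
    # The input must be strictly increasing (sorted, no duplicates).
    if any(a >= b for a, b in zip(keys, keys[1:])):
        raise ValueError("keys must be sorted and distinct")

    def rank(key):
        # Binary search; the result is only meaningful for keys in the set.
        return bisect_left(keys, key)

    return rank


# Hypothetical sample data: a lexicographically sorted list of URLs.
urls = ["a.example/", "a.example/x", "b.example/", "c.example/y"]
h = build_monotone_mph(urls)
ranks = [h(u) for u in urls]
```

Here `ranks` is the sequence of positions 0 to n-1 in order, which is what distinguishes the monotone case from an arbitrary minimal perfect hash.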
Privacy-preserving Queries over Relational Databases
Abstract - Cited by 7 (3 self)
We explore how Private Information Retrieval (PIR) can help users keep their sensitive information from being leaked in an SQL query. We show how to retrieve data from a relational database with PIR by hiding sensitive constants contained in the predicates of a query. Experimental results and microbenchmarking tests show our approach incurs reasonable storage overhead for the added privacy benefit and performs between 3 and 343 times faster than previous work.
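For readers unfamiliar with PIR, the basic idea can be sketched with the textbook two-server XOR scheme. This is an assumption chosen for brevity and is not claimed to be the scheme used in the paper; it also ignores SQL entirely and treats the database as a list of integers. The function names and the toy database `db` are hypothetical.

```python
import secrets


def make_queries(n, index):
    """Split a request for record `index` into two random-looking bit masks."""
    mask_a = [secrets.randbelow(2) for _ in range(n)]
    mask_b = list(mask_a)
    mask_b[index] ^= 1  # the two masks differ only at the wanted position
    return mask_a, mask_b


def answer(database, mask):
    """Server side: XOR together the records selected by the mask."""
    result = 0
    for record, bit in zip(database, mask):
        if bit:
            result ^= record
    return result


def retrieve(database, index):
    """Client side: query two non-colluding servers and combine the answers."""
    q_a, q_b = make_queries(len(database), index)
    # Every record selected by both masks cancels out under XOR,
    # leaving only the record at `index`.
    return answer(database, q_a) ^ answer(database, q_b)


db = [17, 42, 99, 7, 63]  # toy records, held identically by both servers
fetched = [retrieve(db, i) for i in range(len(db))]
```

Each server sees a uniformly random mask on its own, so neither learns which record was requested as long as the two do not collude.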
Parallel State Space Search on the GPU
Abstract - Cited by 1 (0 self)
This paper exploits the parallel computing power of graphics cards for the enhanced enumeration of state spaces. We illustrate that modern graphics processing units (GPUs) have the potential to speed up breadth-first search of state spaces significantly. For a bitvector representation of the search frontier, GPU algorithms with one and two bits per state are presented. Efficient perfect hash functions and their inverses are studied for enhanced compression. We establish maximal speed-ups of a factor of 30 and more with respect to single-core computation.
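The combination of a perfect hash function, its inverse, and a bitvector frontier can be sketched sequentially as follows. This is a CPU-only illustration under assumptions of our own, not the GPU algorithms of the paper: states are permutations of 0..n-1, moves are adjacent swaps, the perfect hash is the lexicographic rank, and Python integers stand in for the two bitvectors (`visited` and `frontier`). All names are hypothetical.

```python
from math import factorial


def rank(perm):
    """Perfect hash: lexicographic index of a permutation of 0..n-1."""
    n = len(perm)
    items = list(range(n))
    r = 0
    for i, p in enumerate(perm):
        pos = items.index(p)
        r += pos * factorial(n - 1 - i)
        items.pop(pos)
    return r


def unrank(r, n):
    """Inverse of rank: rebuild the permutation from its index."""
    items = list(range(n))
    perm = []
    for i in range(n):
        pos, r = divmod(r, factorial(n - 1 - i))
        perm.append(items.pop(pos))
    return tuple(perm)


def bfs_layers(n):
    """Breadth-first search over all permutations, two bits per state."""
    total = factorial(n)
    start = rank(tuple(range(n)))
    visited = 1 << start   # bit i set <=> state i has been reached
    frontier = 1 << start  # bit i set <=> state i is in the current layer
    sizes = []
    while frontier:
        sizes.append(bin(frontier).count("1"))
        nxt = 0
        for idx in range(total):  # scan the whole bitvector
            if not (frontier >> idx) & 1:
                continue
            state = list(unrank(idx, n))
            for j in range(n - 1):
                state[j], state[j + 1] = state[j + 1], state[j]
                nxt |= 1 << rank(state)
                state[j], state[j + 1] = state[j + 1], state[j]  # undo swap
        frontier = nxt & ~visited
        visited |= frontier
    return sizes


layer_sizes = bfs_layers(4)
```

For n = 4 the layer sizes are 1, 3, 5, 6, 5, 3, 1, which sum to 4! = 24. No state is ever stored explicitly: each one is recovered from its bit position by `unrank`, which is why an efficiently invertible perfect hash matters in this setting.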
Near-Optimal Space Perfect Hashing Algorithms
Abstract
A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. Unlike other hashing schemes, MPHFs completely avoid the space and time wasted in dealing with collisions. The study of perfect hash functions started in the early 80s, when it was proved that the information-theoretic lower bound to describe a minimal perfect hash function was approximately 1.44 bits per key. Although the proof indicates that it would be possible to build an algorithm capable of generating optimal functions, no one was able to obtain a practical algorithm that could be used in real applications. Thus, there was a gap between theory and practice. The main result of the thesis filled this gap, lowering the space complexity to represent MPHFs that are useful in practice from O(n log n) to O(n) bits. This allows the use of perfect hashing in applications for which it was not previously considered a good option. This explicit construction of PHFs is something that the data structures and algorithms community has been looking for since the 1980s.
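The two definitions above can be stated as small checks. The sketch below is illustrative only: the lookup tables are hypothetical hand-written mappings rather than functions produced by any construction algorithm, and the last line assumes the usual statement of the lower bound as log2(e) bits per key, which is where the figure of approximately 1.44 comes from.

```python
from math import e, log2


def is_perfect(h, keys):
    """True if h is injective on keys (no two keys collide)."""
    values = [h(k) for k in keys]
    return len(set(values)) == len(values)


def is_minimal_perfect(h, keys):
    """True if h maps the n keys one-to-one onto 0..n-1."""
    return sorted(h(k) for k in keys) == list(range(len(keys)))


keys = ["mon", "tue", "wed", "thu"]
dense = {"mon": 0, "tue": 1, "wed": 2, "thu": 3}   # hypothetical MPHF
sparse = {"mon": 0, "tue": 5, "wed": 2, "thu": 9}  # perfect, not minimal

dense_ok = is_minimal_perfect(dense.__getitem__, keys)
sparse_ok = is_minimal_perfect(sparse.__getitem__, keys)

# Assumed form of the lower bound: log2(e) bits per key.
bits_per_key = log2(e)
```

With these inputs `dense_ok` is true and `sparse_ok` is false, although both tables pass `is_perfect`.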
Theory and Practice of Monotone Minimal Perfect Hashing
Abstract
supported by the MIUR PRIN projects “Mathematical aspects and forthcoming applications of automata and formal languages” and “Grafi del web e ranking”, and by a Yahoo! Faculty Grant. Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and
Theory and Practice of Monotone Minimal Perfect Hashing
Djamal Belazzougui
Abstract
Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, lists of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along our way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practice, and provide a balance between access speed, ease of construction, and space usage.
Coherent Parallel Hashing, 2011
Abstract
(a) The flower image is a 3820 × 3820 image (14.5 million pixels) and contains 3.7 million non-white pixels. The coordinates of these pixels are shown as colors in (b). We store the image in a hash table under a 0.99 load factor: the hash table contains only 3.73 million entries. These are used as keys for hashing. (c) The table obtained with a typical randomizing hash function: keys are randomly spread and all coherence is lost. (d) Our spatially coherent hash table, built in parallel on the GPU. The table is built in 15 ms on a GeForce GTX 480, and the image is reconstructed from the hash in 3.5 ms. The visible structures are due to preserved coherence. This translates to faster access, as neighboring threads perform similar operations and access nearby memory. (e) Neighboring keys are kept together during probing, thereby improving the coherence of memory accesses of neighboring threads.
Recent spatial hashing schemes hash millions of keys in parallel, compacting sparse spatial data in small hash tables while still allowing for fast access from the GPU. Unfortunately, available schemes suffer from two drawbacks: multiple runs of the construction process are often required before success, and the random nature of the hash functions decreases access performance. We introduce a new parallel hashing scheme which reaches high
Practical Batch-Updatable External Hashing with Sorting
Abstract
This paper presents a practical external hashing scheme that supports fast lookup (7 microseconds) for large datasets (millions to billions of items) with a small memory footprint (2.5 bits/item) and fast index construction (151 K items/s for 1-KiB key-value pairs). Our scheme combines three key techniques: (1) a new index data structure (Entropy-Coded Tries); (2) the use of sorting as the main data manipulation method; and (3) support for incremental index construction for dynamic datasets. We evaluate our scheme by building an external dictionary on flash-based drives and demonstrate our scheme’s high performance, compactness, and practicality.
Document Vector Representations for Feature Extraction in Multi-Stage Document Ranking
Abstract
We consider a multi-stage retrieval architecture consisting of a fast, “cheap” candidate generation stage, a feature extraction stage, and a more “expensive” reranking stage using machine-learned models. In this context, feature extraction can be accomplished using a document vector index, a mapping from document ids to document representations. We consider alternative organizations of such a data structure for efficient feature extraction: design choices include how document terms are organized, how complex term proximity features are computed, and how these structures are compressed. In particular, we propose a novel document-adaptive hashing scheme for compactly encoding term ids. The impact of alternative designs on both feature extraction speed and memory footprint is experimentally evaluated. Overall, results show that our architecture is comparable in speed to using a traditional positional inverted index but requires less memory overall, and offers additional advantages in terms of flexibility.
Keywords: learning to rank, compression, document stores

