Results 1 -
8 of
8
An Optimal Algorithm for Generating Minimal Perfect Hash Functions
- Information Processing Letters
, 1992
"... A new algorithm for generating order preserving minimal perfect hash functions is presented. The algorithm is probabilistic, involving generation of random graphs. It uses expected linear time and requires a linear number words to represent the hash function, and thus is optimal up to constant facto ..."
Abstract
-
Cited by 37 (0 self)
- Add to MetaCart
A new algorithm for generating order preserving minimal perfect hash functions is presented. The algorithm is probabilistic, involving generation of random graphs. It uses expected linear time and requires a linear number words to represent the hash function, and thus is optimal up to constant factors. It runs very fast in practice. Keywords: Data structures, probabilistic algorithms, analysis of algorithms, hashing, random graphs
Simple and space-efficient minimal perfect hash functions
- In Proc. of the 10th Intl. Workshop on Data Structures and Algorithms
, 2007
"... Abstract. A perfect hash function (PHF) h: U → [0, m − 1] for a key set S is a function that maps the keys of S to unique values. The minimum amount of space to represent a PHF for a given set S is known to be approximately 1.44n 2 /m bits, where n = |S|. In this paper we present new algorithms for ..."
Abstract
-
Cited by 11 (5 self)
- Add to MetaCart
Abstract. A perfect hash function (PHF) h: U → [0, m − 1] for a key set S is a function that maps the keys of S to unique values. The minimum amount of space to represent a PHF for a given set S is known to be approximately 1.44n 2 /m bits, where n = |S|. In this paper we present new algorithms for construction and evaluation of PHFs of a given set (for m = n and m = 1.23n), with the following properties: 1. Evaluation of a PHF requires constant time. 2. The algorithms are simple to describe and implement, and run in linear time. 3. The amount of space needed to represent the PHFs is around a factor 2 from the information theoretical minimum. No previously known algorithm has these properties. To our knowledge, any algorithm in the literature with the third property either: – Requires exponential time for construction and evaluation, or – Uses near-optimal space only asymptotically, for extremely large n.
External perfect hashing for very large key sets
- In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM’07
, 2007
"... A perfect hash function (PHF) h: S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functio ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
A perfect hash function (PHF) h: S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near-space optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs a MPHF for a set of 1.024 billion URLs of average length 64 bytes collected from the Web in approximately 50 minutes using a commodity PC. The parallel implementation proposed here presents the following performance using 14 commodity PCs: (i) it constructs a MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs a MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines and its evaluation is done in a distributed way, faster than the centralized function.
Perfect hashing for network applications
- in IEEE Symposium on Information Theory
, 2006
"... Abstract — Hash tables are a fundamental data structure in many network applications, including route lookups, packet classification and monitoring. Often a part of the data path, they need to operate at wire-speed. However, several associative memory accesses are needed to resolve collisions, makin ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Abstract — Hash tables are a fundamental data structure in many network applications, including route lookups, packet classification and monitoring. Often a part of the data path, they need to operate at wire-speed. However, several associative memory accesses are needed to resolve collisions, making them slower than required. This motivates us to consider minimal perfect hashing schemes, which reduce the number of memory accesses to just 1 and are also space-efficient. Existing perfect hashing algorithms are not tailored for network applications because they take too long to construct and are hard to implement in hardware. This paper introduces a hardware-friendly scheme for minimal perfect hashing, with space requirement approaching 3.7 times the information theoretic lower bound. Our construction is several orders faster than existing perfect hashing schemes. Instead of using the traditional mapping-partitioning-searching methodology, our scheme employs a Bloom filter, which is known for its simplicity and speed. We extend our scheme to the dynamic setting, thus handling insertions and deletions. I.
Hash and displace: Efficient evaluation of minimal perfect hash functions
- In Workshop on Algorithms and Data Structures
, 1999
"... A new way of constructing (minimal) perfect hash functions is described. The technique considerably reduces the overhead associated with resolving buckets in two-level hashing schemes. Evaluating a hash function requires just one multiplication and a few additions apart from primitive bit operations ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
A new way of constructing (minimal) perfect hash functions is described. The technique considerably reduces the overhead associated with resolving buckets in two-level hashing schemes. Evaluating a hash function requires just one multiplication and a few additions apart from primitive bit operations. The number of accesses to memory is two, one of which is to a fixed location. This improves the probe performance of previous minimal perfect hashing schemes, and is shown to be optimal. The hash function description (“program”) for a set of size n occupies O(n) words, and can be constructed in expected O(n) time. 1
Perfect hashing for data management applications
, 2007
"... Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory and practice among all previous methods on minimal perfect has ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory and practice among all previous methods on minimal perfect hashing. On one side, there are good theoretical results without experimentally proven practicality for large key sets. On the other side, there are the theoretically analyzed time and space usage algorithms that assume that truly random hash functions are available for free, which is an unrealistic assumption. In this paper we attempt to bridge this gap between theory and practice, using a number of techniques from the literature to obtain a novel scheme that is theoretically well-understood and at the same time achieves an order-of-magnitude increase in performance compared to previous “practical ” methods. This improvement comes from a combination of a novel, theoretically optimal perfect hashing scheme that greatly simplifies previous methods, and the fact that our algorithm is designed to make good use of the memory hierarchy. We demonstrate the scalability of our algorithm by considering a set of over one billion URLs from the World Wide Web of average length 64, for which we construct a minimal perfect hash function on a commodity PC in a little more than 1 hour. Our scheme produces minimal perfect hash functions using slightly more than 3 bits per key. For perfect hash functions in the range {0,..., 2n −1} the space usage drops to just over 2 bits per key (i.e., one bit more than optimal for representing the key). This is significantly below of what has been achieved previously for very large values of n. 1.
Provable Bounds for Portable and Flexible Privacy-Preserving Access Rights
, 2005
"... In this work we address the problem of portable and flexible privacy-preserving access rights for large online data repositories. Privacy-preserving access control means that the service provider can neither learn what access rights a customer has nor link a request to access an item to a particula ..."
Abstract
- Add to MetaCart
In this work we address the problem of portable and flexible privacy-preserving access rights for large online data repositories. Privacy-preserving access control means that the service provider can neither learn what access rights a customer has nor link a request to access an item to a particular customer, thus maintaining privacy of both customer activity and customer access rights. Flexible access rights allow any customer to choose any subset of items from the repository and correspondingly be charged only for the items selected. And portability of access rights means that the rights themselves can be stored on small devices of limited storage space and computational capabilities, and therefore the rights must be enforced using the limited resources available. Our main results are solutions to the problem that utilize minimal perfect hash functions and order-preserving minimal perfect hash functions. None of them use expensive cryptography, all require very little space, and they are therefore suitable for computationally weak and spacelimited devices such as smartcards, sensors, etc. Performance of the schemes is measured as the probability of false positives (i.e., the probability that access to an unpurchased item will be permitted) for a given storage space bound. Using our techniques, for a data repository of size n and subscription order of m # n items, we achieve a probability of false positives of m using only O(cm) bits of storage space, where c is an adjustable parameter (a constant or otherwise) that can be set to provide the desired performance. This is the first time that such provable bounds are established for this problem, and we believe the techniques we use are of more general interest through the unusual use we make of perfect hashing.
Near-Optimal Space Perfect Hashing Algorithms
"... Abstract. A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the ha ..."
Abstract
- Add to MetaCart
Abstract. A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. Differently from other hashing schemes, MPHFs completely avoid the problem of wasted space and wasted time to deal with collisions. The study of perfect hash functions started in the early 80s, when it was proved that the theoretic information lower bound to describe a minimal perfect hash function was approximately 1.44 bits per key. Although the proof indicates that it would be possible to build an algorithm capable of generating optimal functions, no one was able to obtain a practical algorithm that could be used in real applications. Thus, there was a gap between theory and practice. The main result of the thesis filled this gap, lowering the space complexity to represent MPHFs that are useful in practice from O(n log n) to O(n) bits. This allows the use of perfect hashing in applications to which it was not considered a good option. This explicit construction of PHFs is something that the data structures and algorithms community has been looking for since the 1980s. 1.

