Results 1 - 10 of 11
An Optimal Algorithm for Generating Minimal Perfect Hash Functions
Information Processing Letters, 1992
Cited by 42 (0 self)
A new algorithm for generating order-preserving minimal perfect hash functions is presented. The algorithm is probabilistic, involving generation of random graphs. It uses expected linear time and requires a linear number of words to represent the hash function, and thus is optimal up to constant factors. It runs very fast in practice. Keywords: Data structures, probabilistic algorithms, analysis of algorithms, hashing, random graphs
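The random-graph idea in this abstract can be illustrated with a minimal sketch. This is not taken from the paper: it assumes the commonly described scheme in which each key becomes an edge between two hashed vertices, and vertex labels `g` are chosen on an acyclic graph so that `(g[u] + g[v]) mod n` equals the key's position. The helper names (`seeded_hash`, `assign_labels`, `build_ophf`, `lookup`), the BLAKE2b-based hashing, and the vertex ratio `c = 2.1` are all assumptions made for illustration.

```python
import hashlib
import random


def seeded_hash(key, seed, m):
    """Hypothetical helper: map a string key to [0, m) with a seeded BLAKE2b digest."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


def assign_labels(adj, n):
    """Label vertices so every edge i satisfies (g[u] + g[v]) % n == i.

    Returns None if the graph contains a cycle (duplicate edges count as cycles).
    """
    g = [None] * len(adj)
    for root in range(len(adj)):
        if g[root] is not None:
            continue
        g[root] = 0
        stack = [(root, -1)]
        while stack:
            u, via = stack.pop()
            for v, i in adj[u]:
                if i == via:
                    continue  # do not walk back along the edge we arrived by
                if g[v] is not None:
                    return None  # reached a labelled vertex again: cycle
                g[v] = (i - g[u]) % n
                stack.append((v, i))
    return g


def build_ophf(keys, c=2.1, max_tries=100, rng=None):
    """Retry random seeds until the key graph is acyclic, then label it."""
    rng = rng or random.Random(0)
    n = len(keys)
    m = max(int(c * n) + 1, 2)  # number of graph vertices (assumed ratio c)
    for _ in range(max_tries):
        s1, s2 = rng.getrandbits(32), rng.getrandbits(32)
        adj = [[] for _ in range(m)]
        self_loop = False
        for i, key in enumerate(keys):
            u, v = seeded_hash(key, s1, m), seeded_hash(key, s2, m)
            if u == v:
                self_loop = True
                break
            adj[u].append((v, i))
            adj[v].append((u, i))
        if self_loop:
            continue
        g = assign_labels(adj, n)
        if g is not None:
            return s1, s2, g
    raise RuntimeError("no acyclic graph found; try a larger c")


def lookup(key, s1, s2, g, n):
    """Evaluate the hash: two table reads and one addition modulo n."""
    m = len(g)
    return (g[seeded_hash(key, s1, m)] + g[seeded_hash(key, s2, m)]) % n
```

In this sketch the table `g` holds one integer per vertex, so the representation is linear in the number of keys, and each key hashes to its index in the input list, which is what makes the function order preserving.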
Simple and space-efficient minimal perfect hash functions
In Proc. of the 10th Intl. Workshop on Data Structures and Algorithms, 2007
Cited by 14 (7 self)
A perfect hash function (PHF) h: U → [0, m − 1] for a key set S is a function that maps the keys of S to unique values. The minimum amount of space to represent a PHF for a given set S is known to be approximately 1.44n²/m bits, where n = |S|. In this paper we present new algorithms for construction and evaluation of PHFs of a given set (for m = n and m = 1.23n), with the following properties: 1. Evaluation of a PHF requires constant time. 2. The algorithms are simple to describe and implement, and run in linear time. 3. The amount of space needed to represent the PHFs is around a factor 2 from the information-theoretic minimum. No previously known algorithm has these properties. To our knowledge, any algorithm in the literature with the third property either requires exponential time for construction and evaluation, or uses near-optimal space only asymptotically, for extremely large n.
External perfect hashing for very large key sets
In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM’07), 2007
Cited by 13 (2 self)
A perfect hash function (PHF) h: S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near-space-optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs an MPHF for a set of 1.024 billion URLs of average length 64 bytes collected from the Web in approximately 50 minutes using a commodity PC. The parallel implementation proposed here achieves the following performance using 14 commodity PCs: (i) it constructs an MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs an MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines, and its evaluation is done in a distributed way, faster than the centralized function.
Perfect hashing for network applications
In IEEE Symposium on Information Theory, 2006
Cited by 10 (1 self)
Hash tables are a fundamental data structure in many network applications, including route lookups, packet classification and monitoring. Often a part of the data path, they need to operate at wire speed. However, several associative memory accesses are needed to resolve collisions, making them slower than required. This motivates us to consider minimal perfect hashing schemes, which reduce the number of memory accesses to just 1 and are also space-efficient. Existing perfect hashing algorithms are not tailored for network applications because they take too long to construct and are hard to implement in hardware. This paper introduces a hardware-friendly scheme for minimal perfect hashing, with space requirement approaching 3.7 times the information-theoretic lower bound. Our construction is several orders of magnitude faster than existing perfect hashing schemes. Instead of using the traditional mapping-partitioning-searching methodology, our scheme employs a Bloom filter, which is known for its simplicity and speed. We extend our scheme to the dynamic setting, thus handling insertions and deletions.
Hash and displace: Efficient evaluation of minimal perfect hash functions
In Workshop on Algorithms and Data Structures, 1999
Cited by 9 (1 self)
A new way of constructing (minimal) perfect hash functions is described. The technique considerably reduces the overhead associated with resolving buckets in two-level hashing schemes. Evaluating a hash function requires just one multiplication and a few additions apart from primitive bit operations. The number of accesses to memory is two, one of which is to a fixed location. This improves the probe performance of previous minimal perfect hashing schemes, and is shown to be optimal. The hash function description (“program”) for a set of size n occupies O(n) words, and can be constructed in expected O(n) time.
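A minimal sketch of the hash-and-displace idea, under stated assumptions rather than as the paper's construction: keys are grouped into buckets by one hash `g`, and each bucket gets a single displacement `d` so that `(f(x) + d[g(x)]) mod n` is collision-free. The helper names, the BLAKE2b-based hashing, the choice of one bucket per key, and the brute-force displacement search are all assumptions for illustration; the abstract's multiplication-based hash functions and its linear-time guarantee are not reproduced here.

```python
import hashlib
import random


def seeded_hash(key, seed, m):
    """Hypothetical helper: map a string key to [0, m) with a seeded BLAKE2b digest."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


def place_buckets(buckets, n):
    """Pick one displacement per bucket, largest buckets first.

    Each bucket is a list of f-values. Returns None if some bucket cannot be placed.
    """
    taken = [False] * n
    d = [0] * len(buckets)
    order = sorted(range(len(buckets)), key=lambda j: -len(buckets[j]))
    for j in order:
        if not buckets[j]:
            continue
        for disp in range(n):
            slots = [(f + disp) % n for f in buckets[j]]
            distinct = len(set(slots)) == len(slots)
            if distinct and not any(taken[s] for s in slots):
                break
        else:
            return None  # no displacement fits this bucket
        d[j] = disp
        for s in slots:
            taken[s] = True
    return d


def build_hash_displace(keys, max_tries=100, rng=None):
    """Retry random seeds until every bucket can be displaced into free slots."""
    rng = rng or random.Random(0)
    n = len(keys)
    b = max(n, 1)  # number of buckets: a simplifying assumption
    for _ in range(max_tries):
        sf, sg = rng.getrandbits(32), rng.getrandbits(32)
        buckets = [[] for _ in range(b)]
        for key in keys:
            buckets[seeded_hash(key, sg, b)].append(seeded_hash(key, sf, n))
        d = place_buckets(buckets, n)
        if d is not None:
            return sf, sg, d
    raise RuntimeError("construction failed; try more buckets or more tries")


def hd_lookup(key, sf, sg, d, n):
    """Evaluate the hash with a single read of the displacement table."""
    return (seeded_hash(key, sf, n) + d[seeded_hash(key, sg, len(d))]) % n
```

Evaluation in this sketch touches the displacement table once per key, which mirrors the small number of memory accesses the abstract emphasizes.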
Practical perfect hashing in nearly optimal space
Information Systems
Cited by 2 (1 self)
A hash function is a mapping from a key universe U to a range of integers, i.e., h: U → {0, 1, ..., m−1}, where m is the range’s size. A perfect hash function for some set S ⊆ U is a hash function that is one-to-one on S, where m ≥ |S|. A minimal perfect hash function for some set S ⊆ U is a perfect hash function with a range of minimum size, i.e., m = |S|. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time and nearly optimal space consumption for the data structure. For n keys and m = n the space consumption ranges from 2.62n to 3.3n bits, and for m = 1.23n it ranges from 1.95n to 2.7n bits. This is within a small constant factor from the theoretical lower bounds of 1.44n bits for m = n and 0.89n bits for m = 1.23n. We combine several theoretical results into a practical solution that has turned perfect hashing into a very compact data structure to solve the membership problem when the key set S is static and known in advance. By taking into account the memory hierarchy we can construct (minimal) perfect hash functions for over a billion keys in 46 minutes using a commodity PC. An open source implementation of the algorithms is available
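The two lower bounds quoted above (1.44n bits for m = n and 0.89n bits for m = 1.23n) can be checked numerically. The sketch below is not from the abstract: it assumes the standard counting argument, in which a random function from n keys into m slots is injective with probability m!/((m−n)! mⁿ), and applies Stirling's approximation for large n. The function name `phf_lower_bound_bits_per_key` is hypothetical.

```python
import math


def phf_lower_bound_bits_per_key(load):
    """Approximate lower bound, in bits per key, for a PHF with load = n/m.

    Assumed large-n formula: (1 + (m/n - 1) * ln(1 - n/m)) * log2(e).
    """
    if not 0.0 < load <= 1.0:
        raise ValueError("load must be in (0, 1]")
    if load == 1.0:
        # Minimal case m = n: the second term vanishes in the limit.
        return math.log2(math.e)
    ratio = 1.0 / load  # m / n
    return (1.0 + (ratio - 1.0) * math.log(1.0 - load)) * math.log2(math.e)


if __name__ == "__main__":
    for ratio in (1.0, 1.23):
        bits = phf_lower_bound_bits_per_key(1.0 / ratio)
        print(f"m = {ratio}n: about {bits:.2f} bits per key")
```

Under this assumed formula the reported space of 2.62n to 3.3n bits for m = n is roughly 1.8 to 2.3 times the bound.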
Near-Optimal Space Perfect Hashing Algorithms
Cited by 2 (0 self)
A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. Unlike other hashing schemes, MPHFs completely avoid the wasted space and wasted time of dealing with collisions. The study of perfect hash functions started in the early 80s, when it was proved that the information-theoretic lower bound to describe a minimal perfect hash function was approximately 1.44 bits per key. Although the proof indicates that it would be possible to build an algorithm capable of generating optimal functions, no one was able to obtain a practical algorithm that could be used in real applications. Thus, there was a gap between theory and practice. The main result of the thesis filled this gap, lowering the space complexity to represent MPHFs that are useful in practice from O(n log n) to O(n) bits. This allows the use of perfect hashing in applications for which it was not previously considered a good option. This explicit construction of PHFs is something that the data structures and algorithms community has been looking for since the 1980s.
Perfect hashing for data management applications
2007
Cited by 1 (0 self)
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory and practice among all previous methods on minimal perfect hashing. On one side, there are good theoretical results without experimentally proven practicality for large key sets. On the other side, there are algorithms whose time and space usage are analyzed theoretically under the assumption that truly random hash functions are available for free, which is unrealistic. In this paper we attempt to bridge this gap between theory and practice, using a number of techniques from the literature to obtain a novel scheme that is theoretically well-understood and at the same time achieves an order-of-magnitude increase in performance compared to previous “practical” methods. This improvement comes from a combination of a novel, theoretically optimal perfect hashing scheme that greatly simplifies previous methods, and the fact that our algorithm is designed to make good use of the memory hierarchy. We demonstrate the scalability of our algorithm by considering a set of over one billion URLs from the World Wide Web of average length 64, for which we construct a minimal perfect hash function on a commodity PC in a little more than 1 hour. Our scheme produces minimal perfect hash functions using slightly more than 3 bits per key. For perfect hash functions in the range {0, ..., 2n − 1} the space usage drops to just over 2 bits per key (i.e., one bit more than optimal for representing the key). This is significantly below what has been achieved previously for very large values of n.
A New Algorithm for Constructing Minimal Perfect Hash Functions
Cited by 1 (0 self)
1 Introduction. Let S be a set of n distinct keys belonging to a finite universe U of keys. The keys in S are stored so that membership queries asking if key
Provable Bounds for Portable and Flexible Privacy-Preserving Access Rights
2005
In this work we address the problem of portable and flexible privacy-preserving access rights for large online data repositories. Privacy-preserving access control means that the service provider can neither learn what access rights a customer has nor link a request to access an item to a particular customer, thus maintaining privacy of both customer activity and customer access rights. Flexible access rights allow any customer to choose any subset of items from the repository and correspondingly be charged only for the items selected. And portability of access rights means that the rights themselves can be stored on small devices of limited storage space and computational capabilities, and therefore the rights must be enforced using the limited resources available. Our main results are solutions to the problem that utilize minimal perfect hash functions and order-preserving minimal perfect hash functions. None of them use expensive cryptography, all require very little space, and they are therefore suitable for computationally weak and space-limited devices such as smartcards, sensors, etc. Performance of the schemes is measured as the probability of false positives (i.e., the probability that access to an unpurchased item will be permitted) for a given storage space bound. Using our techniques, for a data repository of size n and subscription order of m ≪ n items, we achieve a probability of false positives of m using only O(cm) bits of storage space, where c is an adjustable parameter (a constant or otherwise) that can be set to provide the desired performance. This is the first time that such provable bounds are established for this problem, and we believe the techniques we use are of more general interest through the unusual use we make of perfect hashing.