Results 1-8 of 8
Simple and space-efficient minimal perfect hash functions
 In Proc. of the 10th Intl. Workshop on Data Structures and Algorithms, 2007
Abstract

Cited by 14 (7 self)
Abstract. A perfect hash function (PHF) h: U → [0, m − 1] for a key set S is a function that maps the keys of S to unique values. The minimum amount of space to represent a PHF for a given set S is known to be approximately 1.44n²/m bits, where n = |S|. In this paper we present new algorithms for construction and evaluation of PHFs of a given set (for m = n and m = 1.23n), with the following properties:
1. Evaluation of a PHF requires constant time.
2. The algorithms are simple to describe and implement, and run in linear time.
3. The amount of space needed to represent the PHFs is around a factor 2 from the information-theoretical minimum.
No previously known algorithm has these properties. To our knowledge, any algorithm in the literature with the third property either:
– requires exponential time for construction and evaluation, or
– uses near-optimal space only asymptotically, for extremely large n.
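The abstract does not describe the construction itself. One well-known technique for building a PHF with m ≈ 1.23n in expected linear time (assumed here; the paper's exact algorithm may differ in its details) maps each key to an edge of a random 3-uniform hypergraph, peels the hypergraph, and then assigns every vertex a value g ∈ {0, 1, 2} so that the sum of the g values on a key's edge, modulo 3, selects one of that edge's three vertices. The helper names `_h`, `build_phf` and `phf_eval` are hypothetical, and `hashlib.blake2b` merely stands in for whatever hash functions a real implementation would use.

```python
import hashlib
import math


def _h(key: str, seed: int, i: int, size: int) -> int:
    """Hypothetical seeded hash: the i-th hash of `key` into [0, size)."""
    digest = hashlib.blake2b(f"{seed}:{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % size


def build_phf(keys, c=1.23, max_tries=1000):
    """Return (seed, part, g) for a PHF onto [0, 3 * part), where 3 * part ~ c * n."""
    n = len(keys)  # keys must be distinct
    part = max(1, math.ceil(c * n / 3))  # each third of the range has `part` vertices
    m = 3 * part
    for seed in range(max_tries):
        # Each key becomes an edge with one vertex in each disjoint third of [0, m).
        edges = [tuple(i * part + _h(k, seed, i, part) for i in range(3)) for k in keys]
        incident = [[] for _ in range(m)]
        for e_idx, e in enumerate(edges):
            for v in e:
                incident[v].append(e_idx)
        degree = [len(lst) for lst in incident]
        removed = [False] * n
        order = []  # (edge index, vertex of degree 1 when the edge was peeled)
        stack = [v for v in range(m) if degree[v] == 1]
        while stack:
            v = stack.pop()
            if degree[v] != 1:
                continue  # stale entry: v's last edge was already peeled
            e_idx = next(f for f in incident[v] if not removed[f])
            removed[e_idx] = True
            order.append((e_idx, v))
            for u in edges[e_idx]:
                degree[u] -= 1
                if degree[u] == 1:
                    stack.append(u)
        if len(order) != n:
            continue  # a 2-core remains, so peeling failed; retry with a new seed
        # Assign g in reverse peeling order. The peeled vertex v of each edge lies in
        # no edge processed before it, so setting g[v] never disturbs earlier edges.
        g = [0] * m
        for e_idx, v in reversed(order):
            e = edges[e_idx]
            others = sum(g[u] for u in e if u != v)
            g[v] = (e.index(v) - others) % 3  # makes sum(g over e) % 3 == index of v
        return seed, part, g
    raise RuntimeError("peeling kept failing; increase c or max_tries")


def phf_eval(key, seed, part, g):
    """Constant-time evaluation: three hashes and three lookups in g."""
    e = [i * part + _h(key, seed, i, part) for i in range(3)]
    return e[(g[e[0]] + g[e[1]] + g[e[2]]) % 3]
```

Because g only holds the values 0 to 2, it can be packed into 2 bits per vertex. Turning this into a minimal PHF (m = n) additionally requires mapping the n vertices actually selected onto [0, n − 1], for example with a rank structure over the used positions; that step is omitted from the sketch.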
External perfect hashing for very large key sets
 In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM’07), 2007
Abstract

Cited by 13 (2 self)
A perfect hash function (PHF) h: S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near space-optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs an MPHF for a set of 1.024 billion URLs of average length 64 bytes collected from the Web in approximately 50 minutes using a commodity PC. The parallel implementation proposed here achieves the following performance using 14 commodity PCs: (i) it constructs an MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs an MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines and evaluates it in a distributed way, faster than the centralized function.
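The abstract does not spell out how the work is divided among machines. A common pattern for this kind of scaling (assumed here, not taken from the paper) is to hash keys into buckets, build an MPHF for each bucket independently, and combine them with prefix-sum offsets so that every bucket owns a disjoint slice of [0, n − 1]. In the sketch below the per-bucket MPHF is a brute-force seed search, a hypothetical stand-in that is only practical for very small buckets; `_h`, `build_partitioned_mphf` and `mphf_eval` are illustrative names.

```python
import hashlib
from itertools import count


def _h(key: str, seed: int, size: int) -> int:
    """Hypothetical seeded hash of `key` into [0, size)."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % size


def build_partitioned_mphf(keys, num_buckets):
    """Return (seeds, offsets) describing an MPHF for distinct `keys` onto [0, n)."""
    # Step 1: split the keys into buckets with a fixed hash (seed 0).
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[_h(k, 0, num_buckets)].append(k)

    # Step 2: each bucket is independent, so this loop could run on many machines.
    # Stand-in method: find a seed s >= 1 that maps the bucket bijectively onto
    # [0, len(bucket)); the expected number of tries grows quickly with bucket size.
    seeds = []
    for b in buckets:
        size = len(b)
        for s in count(1):
            if size == 0 or len({_h(k, s, size) for k in b}) == size:
                seeds.append(s)
                break

    # Step 3: prefix sums give every bucket its own slice of the global range.
    offsets = [0]
    for b in buckets:
        offsets.append(offsets[-1] + len(b))
    return seeds, offsets


def mphf_eval(key, seeds, offsets):
    """Global value = bucket offset + local value; meaningful only for the original keys."""
    b = _h(key, 0, len(seeds))
    size = offsets[b + 1] - offsets[b]
    return offsets[b] + _h(key, seeds[b], size)
```

Only one seed and one offset per bucket need to be stored, and evaluation reads a single entry of each, so it stays constant time; the arrays can also be split by bucket range across machines.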
Practical perfect hashing in nearly optimal space
 Information Systems
Abstract

Cited by 2 (1 self)
A hash function is a mapping from a key universe U to a range of integers, i.e., h: U → {0, 1, ..., m−1}, where m is the range’s size. A perfect hash function for some set S ⊆ U is a hash function that is one-to-one on S, where m ≥ |S|. A minimal perfect hash function for some set S ⊆ U is a perfect hash function with a range of minimum size, i.e., m = |S|. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time and nearly optimal space consumption for the data structure. For n keys and m = n the space consumption ranges from 2.62n to 3.3n bits, and for m = 1.23n it ranges from 1.95n to 2.7n bits. This is within a small constant factor of the theoretical lower bounds of 1.44n bits for m = n and 0.89n bits for m = 1.23n. We combine several theoretical results into a practical solution that has turned perfect hashing into a very compact data structure to solve the membership problem when the key set S is static and known in advance. By taking into account the memory hierarchy we can construct (minimal) perfect hash functions for over a billion keys in 46 minutes using a commodity PC. An open source implementation of the algorithms is available.
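The lower bounds of 1.44n bits (m = n) and 0.89n bits (m = 1.23n) can be reproduced with the standard counting argument: assuming the key universe is much larger than m, describing a PHF takes approximately log₂(mⁿ / (m(m−1)⋯(m−n+1))) bits, that is, minus log₂ of the probability that a random function from n keys into [0, m−1] is injective. The short sketch below (function name hypothetical) evaluates this expression per key using `math.lgamma` to avoid huge factorials.

```python
import math


def phf_lower_bound_bits_per_key(n: int, m: int) -> float:
    """Approximate PHF space lower bound in bits per key for n keys and range m.

    Computes log2(m**n / (m * (m - 1) * ... * (m - n + 1))) / n, i.e. minus log2
    of the probability that a random function into [0, m) is injective on n keys.
    """
    # lgamma(m + 1) - lgamma(m - n + 1) == ln(m! / (m - n)!)
    ln_injective = math.lgamma(m + 1) - math.lgamma(m - n + 1) - n * math.log(m)
    return -ln_injective / (n * math.log(2))


n = 1_000_000
print(round(phf_lower_bound_bits_per_key(n, n), 2))                # 1.44 (m = n)
print(round(phf_lower_bound_bits_per_key(n, round(1.23 * n)), 2))  # 0.89 (m = 1.23n)
```

For m = n the expression tends to log₂ e ≈ 1.4427 bits per key, which is where the 1.44 figure comes from.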
Near-Optimal Space Perfect Hashing Algorithms
Abstract

Cited by 2 (0 self)
Abstract. A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. Unlike other hashing schemes, MPHFs completely avoid wasting space and time on collisions. The study of perfect hash functions started in the early 1980s, when it was proved that the information-theoretic lower bound for describing a minimal perfect hash function is approximately 1.44 bits per key. Although the proof indicates that it should be possible to build an algorithm that generates optimal functions, no one was able to obtain a practical algorithm that could be used in real applications. Thus, there was a gap between theory and practice. The main result of the thesis fills this gap, lowering the space needed to represent MPHFs that are useful in practice from O(n log n) to O(n) bits. This allows perfect hashing to be used in applications for which it was not previously considered a good option. This explicit construction of PHFs is something that the data structures and algorithms community has been looking for since the 1980s.
Efficient Hash Function for Duplicate Elimination in Dictionaries
Abstract

Cited by 1 (1 self)
Abstract. Fast elimination of duplicate data is needed in many areas, especially for textual data. A solution to this problem was recently found for geometrical data, using a hash function to speed up the process. The hash function is extremely efficient when incremental elimination is required, especially for processing large data sets. In this paper a new construction of the hash function is presented that produces short clusters with only a few collisions. The proposed hash function is not a perfect hash function, but it has similar properties. It takes advantage of the relatively large amount of memory available on modern computers and works well with large data sets. Experiments showed that different approaches should be used for different types of languages, because the structures of Slavonic and Anglo-Saxon languages differ. Therefore, tests were made with a Czech dictionary of 2.5 million words and an English dictionary of 130 thousand words. The algorithm was also tested on a few other languages. Experimental results are presented in this paper as well. Key words. Hash function, hash table, duplicate elimination, data structure, dictionary
Blooming Trees for Minimal Perfect Hashing
Abstract
Abstract—Hash tables are used in many networking applications, such as lookup and packet classification. However, collision resolution makes them slow and unsuitable for fast operations. Therefore, perfect hash functions have been introduced to make the hashing mechanism more efficient. In particular, a minimal perfect hash function is a function that maps a set of n keys into a set of n integer numbers without collisions. In the literature, there are many schemes to construct a minimal perfect hash function, based either on mathematical properties of polynomials or on graph theory. This paper proposes a new scheme that shows remarkable results in terms of space consumption and processing speed. It is based on an alternative to Bloom Filters and requires about 4 bits per key and 12.8 seconds to construct an MPHF with 3.8 × 10⁹ elements.
Data Structure for Dynamic Patterns
Abstract
Abstract—String matching and dynamic dictionary matching are significant principles in computer science. These principles require an efficient data structure to hold the pattern or patterns to be searched for in a large given text. Moreover, in dynamic dictionary matching, the structure must be able to insert or delete individual patterns over time. This research article introduces a new dynamic data structure, named inverted lists, for both principles. The inverted lists data structure, which is derived from the inverted index, is implemented using the perfect hashing idea. This structure focuses on the positions of characters and provides a hash table to store the string patterns. The new data structure is more time-efficient than traditional structures. It is also faster to construct and consumes less memory than others.
Modified Suffix Search Algorithm for Multiple String Matching
Abstract
String matching is now a prominent field in computer science, with many real-world applications. A new algorithm for suffix search that uses chained hashing is proposed, and it works well in both the matched and the mismatched cases. A separate hash function is introduced in this paper. Hash functions can be defined in many ways; here, radix hashing is used, which avoids the need for the shift table used in these algorithms. Every pattern matching algorithm consists of two main phases: the preprocessing phase and the matching phase. Each phase has its own time and space complexity. The proposed method has very low time complexity in the average case.