Results 11–20 of 30
Application of Minimal Perfect Hashing in Main Memory Indexing
, 1994
Abstract

Cited by 2 (0 self)
With the rapid decrease in the cost of random access memory (RAM), it will soon become economically feasible to place full-text indexes of a library in main memory. One essential component of the indexing system is a hashing algorithm, which maps a keyword to the memory address of the index information corresponding to that keyword. This thesis studies the application of the minimal perfect hashing algorithm to main memory indexing. The algorithm is integrated into the index search engine of the Library 2000 system, a digital online library system, and its performance is compared with that of the open-addressing hashing scheme. We find that although the minimal perfect hashing algorithm needs fewer keyword comparisons per keyword search on average, its hashing is slower than that of the open-addressing scheme.
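As a rough illustration of the open-addressing baseline this thesis compares against, the sketch below builds a linear-probing hash table and counts key comparisons per successful lookup. All keys, the table size, and the function names are illustrative, not taken from the thesis.

```python
# A hedged sketch of an open-addressing (linear-probing) table, counting
# key comparisons per successful lookup. Parameters are illustrative.
def build_table(keys, size):
    table = [None] * size
    for key in keys:
        i = hash(key) % size
        while table[i] is not None:   # probe until a free slot is found
            i = (i + 1) % size
        table[i] = key
    return table

def comparisons(table, key):
    # Number of key comparisons made before `key` is found.
    i, count = hash(key) % len(table), 0
    while True:
        count += 1
        if table[i] == key:
            return count
        i = (i + 1) % len(table)

keys = ["hash", "index", "keyword", "library"]
table = build_table(keys, 11)
print([comparisons(table, k) for k in keys])
```

With clustering at high load factors the comparison count per lookup grows, which is the cost an MPHF (always exactly one probe) avoids.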
Practical perfect hashing in nearly optimal space
 Information Systems
Abstract

Cited by 2 (1 self)
A hash function is a mapping from a key universe U to a range of integers, i.e., h: U → {0, 1, ..., m−1}, where m is the range's size. A perfect hash function for some set S ⊆ U is a hash function that is one-to-one on S, where m ≥ |S|. A minimal perfect hash function for some set S ⊆ U is a perfect hash function with a range of minimum size, i.e., m = |S|. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time, and nearly optimal space consumption for the data structure. For n keys and m = n the space consumption ranges from 2.62n to 3.3n bits, and for m = 1.23n it ranges from 1.95n to 2.7n bits. This is within a small constant factor of the theoretical lower bounds of 1.44n bits for m = n and 0.89n bits for m = 1.23n. We combine several theoretical results into a practical solution that has turned perfect hashing into a very compact data structure for solving the membership problem when the key set S is static and known in advance. By taking the memory hierarchy into account we can construct (minimal) perfect hash functions for over a billion keys in 46 minutes using a commodity PC. An open-source implementation of the algorithms is available.
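The definitions above can be made concrete with a minimal sketch (not the paper's construction): brute-force search for a hash seed that is one-to-one on a small static set S with range size m = |S|, i.e. a minimal perfect hash function for S. All names and parameters are illustrative.

```python
# Brute-force MPHF sketch: find a seed making a seeded hash a bijection
# from a tiny static key set S onto {0, ..., |S|-1}. Workable for toy
# sets only; real constructions need only a few bits per key.
import hashlib

def seeded_hash(key: str, seed: int, m: int) -> int:
    # Deterministic seeded hash into {0, ..., m-1}.
    digest = hashlib.blake2b(key.encode(),
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:8], "little") % m

def find_mphf_seed(keys):
    # Try seeds until every key lands in a distinct slot; with m = n the
    # function is both perfect (one-to-one) and minimal (m = |S|) on S.
    m = len(keys)
    for seed in range(1_000_000):
        if len({seeded_hash(k, seed, m) for k in keys}) == m:
            return seed
    raise RuntimeError("no perfect seed found")

keys = ["tree", "hash", "trie", "index", "graph"]
seed = find_mphf_seed(keys)
slots = sorted(seeded_hash(k, seed, len(keys)) for k in keys)
print(slots)   # [0, 1, 2, 3, 4] -- a bijection onto {0, ..., 4}
```

Brute force takes expected n^n/n! trials, which is why practical schemes like the paper's split the keys into small buckets first.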
Signatures for Library Functions in Executable Files
, 1993
Abstract

Cited by 1 (0 self)
A method for efficiently generating signatures for detecting library functions in executable files is described. The signatures are used to automatically detect such functions in dcc, the reverse compiler at the Queensland University of Technology. Difficulties arise from the variability of the signatures, the multiplicity of library vendors and memory models, and indistinguishable functions. An efficient hashing technique involving optimal perfect hash functions is used. Performance is good: the signature files are created in a few seconds, and the name of a library function can be found in about the time of two standard hashes. One signature file is required for each vendor, version, and memory-model combination, and they are generated from the appropriate library file (e.g. slibce.lib). Some issues are yet to be addressed, such as variation due to floating-point math options (e.g. emulator, fast alternate, or coprocessor calls).
1 Application
Signatures are required w...
Trie Methods for Structured Data on Secondary Storage
, 2000
Abstract

Cited by 1 (0 self)
We apply trie structures to indexing, storing and querying structured data on secondary storage. We are interested in storage compactness, I/O efficiency, order-preserving properties, general orthogonal range queries, and exact-match queries for very large files and databases. We also apply trie structures to relational joins (set operations). We compare trie structures to various data structures on secondary storage: multipaging and grid files in the direct-access-method category, R-trees/R*-trees and X-trees in the logarithmic-access-cost category, as well as some representative join algorithms. Our results show that range queries by the trie method are superior to these competitors in search cost when queries return more than a few records, and are competitive with direct access methods for exact-match queries. Furthermore, as the trie structure compresses data, it is the winner in storage compared to all the other methods mentioned above. We also present a new tidy function for order-preserving key-to-address transformation. Our tidy function is easy to construct and cheaper in access time and storage cost than its closest competitor.
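The two properties the abstract leans on, prefix compression and order preservation, can be sketched with a tiny in-memory trie (illustrative only; the paper's structures live on secondary storage): shared prefixes are stored once, and an in-order traversal yields the keys sorted.

```python
# Minimal trie sketch: prefix sharing compresses storage, and an in-order
# traversal of the children yields keys in lexicographic order.
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_key = False

def insert(root, key):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.is_key = True

def keys_in_order(node, prefix=""):
    # Visiting children in sorted character order makes the traversal
    # order-preserving, which is what enables range queries.
    if node.is_key:
        yield prefix
    for ch in sorted(node.children):
        yield from keys_in_order(node.children[ch], prefix + ch)

root = TrieNode()
for k in ["car", "cat", "dog", "do"]:
    insert(root, k)
print(list(keys_in_order(root)))   # ['car', 'cat', 'do', 'dog']
```

A range query [lo, hi] is then just the slice of this ordered traversal between the two bounds.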
Indexing Internal Memory with Minimal Perfect Hash Functions
Abstract

Cited by 1 (1 self)
A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values, which are in turn used to index a hash table. Since no collisions occur, each key can be retrieved from the table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. MPHFs are widely used for memory-efficient storage and fast retrieval of items from static sets. Unlike other hashing schemes, MPHFs completely avoid the problems of wasted space and of wasted time dealing with collisions. In the past, the amount of space needed to store an MPHF description was O(log n) bits per key, and therefore similar to the space overhead of other hashing schemes. Recent results on MPHFs by [Botelho et al. 2007] changed this scenario: in their work the space overhead of an MPHF is approximately 2.6 bits per key. The objective of this paper is to show that MPHFs are a good option for indexing internal memory when static key sets are involved and both successful and unsuccessful searches are allowed. We show that MPHFs provide the best trade-off between space usage and lookup time when compared with linear hashing, quadratic hashing, double hashing, dense hashing, cuckoo hashing, and sparse hashing. For example, MPHFs outperform linear hashing, quadratic hashing, and double hashing when these methods have a hash table occupancy of 75% or higher (if the MPHF fits in the CPU cache, the same happens for hash table occupancies of 55% or higher). Furthermore, MPHFs also have better performance in all measured aspects when compared to sparse hashing, which has been designed specifically for efficient memory usage.
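The single-probe retrieval described above can be sketched as follows. A precomputed dict stands in for a real MPHF (an assumption for illustration): it maps each key of the static set S to a unique slot in a table of size exactly |S|; the stored-key check handles unsuccessful searches, since a real MPHF maps keys outside S to arbitrary slots.

```python
# Single-probe lookup with an MPHF stand-in. Keys and values are
# illustrative; the value stored is simply the key's length.
keys = ["apple", "banana", "cherry"]
mphf = {k: i for i, k in enumerate(keys)}   # stand-in for a compact MPHF

# One slot per key: no collisions, no empty slots.
table = [None] * len(keys)
for k in keys:
    table[mphf[k]] = (k, len(k))            # store (key, value) pairs

def lookup(key):
    # Exactly one probe. The stored key must be compared because a real
    # MPHF sends keys outside S to arbitrary valid slots.
    slot = mphf.get(key, 0)                 # a real MPHF always yields a slot
    stored_key, value = table[slot]
    return value if stored_key == key else None

print(lookup("banana"))   # 6 (illustrative value: the key's length)
print(lookup("grape"))    # None -- unsuccessful search detected
```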
Perfect hashing for data management applications
, 2007
Abstract

Cited by 1 (0 self)
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory and practice among all previous methods on minimal perfect hashing. On one side, there are good theoretical results without experimentally proven practicality for large key sets. On the other side, there are algorithms with theoretically analyzed time and space usage that assume truly random hash functions are available for free, which is an unrealistic assumption. In this paper we attempt to bridge this gap between theory and practice, using a number of techniques from the literature to obtain a novel scheme that is theoretically well-understood and at the same time achieves an order-of-magnitude increase in performance compared to previous “practical” methods. This improvement comes from a combination of a novel, theoretically optimal perfect hashing scheme that greatly simplifies previous methods, and the fact that our algorithm is designed to make good use of the memory hierarchy. We demonstrate the scalability of our algorithm by considering a set of over one billion URLs from the World Wide Web of average length 64, for which we construct a minimal perfect hash function on a commodity PC in a little more than 1 hour. Our scheme produces minimal perfect hash functions using slightly more than 3 bits per key. For perfect hash functions in the range {0, ..., 2n−1} the space usage drops to just over 2 bits per key (i.e., one bit more than optimal for representing the key). This is significantly below what has been achieved previously for very large values of n.
Finding Succinct Ordered Minimal Perfect Hash Functions
, 1994
Abstract

Cited by 1 (0 self)
An ordered minimal perfect hash table is one in which no collisions occur among a predefined set of keys, no space is unused, and the data are placed in the table in order. A new method for creating ordered minimal perfect hash functions is presented. It creates hash functions with representation space requirements closer to the theoretical lower bound than previous methods. The method presented requires approximately 17% less space to represent generated hash functions and is easy to implement. However, a high time complexity makes it practical for small sets only (size < 1000).
Keywords: Data Structures, Hashing, Perfect Hashing
1 Introduction
A hash table is a data structure in which a number of keyed items are stored. To access an item with a given key, a hash function is used. The hash function maps from the set of keys to the set of locations of the table. If more than one key maps to a given location, a collision occurs, and some collision resolution policy must be followed. O...
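The "ordered" property defined above can be sketched briefly: an ordered MPHF sends each key to its rank in the sorted key order, so the table holds the data in order. The rank dict below stands in for the compact function the paper constructs (illustrative keys).

```python
# Ordered MPHF sketch: each key's slot is its rank in sorted order, so
# the table is collision-free, fully occupied, and sorted.
keys = ["delta", "alpha", "gamma", "beta"]
ordered_mphf = {k: rank for rank, k in enumerate(sorted(keys))}

table = [None] * len(keys)            # minimal: exactly |S| slots
for k in keys:
    table[ordered_mphf[k]] = k        # perfect: no collisions

print(table)   # ['alpha', 'beta', 'delta', 'gamma'] -- stored in order
```

Representing this rank function compactly is exactly what pushes the space bound above that of unordered MPHFs, which is the trade-off the paper studies.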
A New Algorithm for Constructing Minimal Perfect Hash Functions
Abstract

Cited by 1 (0 self)
1 Introduction
Let S be a set of n distinct keys belonging to a finite universe U of keys. The keys in S are stored so that membership queries asking if key
Area-efficient near-associative memories on FPGAs
 In International Symposium on Field-Programmable Gate Arrays
, 2013
Abstract

Cited by 1 (1 self)
Associative memories can map sparsely used keys to values with low latency but can incur heavy area overheads. The lack of customized hardware for associative memories in today’s mainstream FPGAs exacerbates the overhead cost of building these memories using the fixed address-match BRAMs. In this paper, we develop a new, FPGA-friendly memory architecture based on a multiple-hash scheme that is able to achieve near-associative performance (less than 5% of evictions due to conflicts) without the area overheads of a fully associative memory on FPGAs. Using the proposed architecture as a 64KB L1 data cache, we show that it achieves near-associative miss rates for a set of benchmark programs from the SPEC2006 suite while consuming 67× fewer FPGA memory resources than fully associative memories generated by the Xilinx Coregen tool. Benefits increase with match width, allowing area reduction of up to 100×. At the same time, the new architecture has lower latency than the fully associative memory: 3.7 ns for a 1024-entry flat version or 6.1 ns for an area-efficient version, compared to 8.8 ns for a fully associative memory with a 64-bit key.
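The multiple-hash idea can be sketched in software as follows (an illustration only, not the paper's hardware design): each key has D candidate slots, one per hash function; an insert that finds all D occupied forces an eviction, and "near-associative" means such conflicts are rare. D, SIZE, and the key names are assumed parameters.

```python
# Multiple-hash placement sketch: D candidate slots per key; an insert
# with all candidates occupied evicts an existing entry. Conflict rate
# stays low well below full occupancy.
import hashlib

D, SIZE = 4, 64   # illustrative, not the paper's parameters

def candidate_slots(key: str):
    # D candidate slots from independently personalized hashes.
    return [int.from_bytes(
                hashlib.blake2b(key.encode(), person=bytes([d])).digest()[:8],
                "little") % SIZE
            for d in range(D)]

table = {}        # slot -> key
evictions = 0
for i in range(40):
    key = f"key{i}"
    slots = candidate_slots(key)
    free = [s for s in slots if s not in table]
    if free:
        table[free[0]] = key
    else:
        evictions += 1              # conflict: evict the slot's occupant
        table[slots[0]] = key
print(f"{evictions} evictions for 40 inserts into {SIZE} slots")
```

Unlike a fully associative memory, each lookup inspects only D fixed locations, which is what maps cheaply onto BRAM ports.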
Fast N-Gram Language Model Look-Ahead for Decoders with Static Pronunciation Prefix Trees
Abstract

Cited by 1 (1 self)
Decoders that make use of token passing restrict their search space by various types of token pruning. With the Language Model Look-Ahead (LMLA) technique it is possible to increase the number of tokens that can be pruned without loss of decoding precision. Unfortunately, for token-passing decoders that use single static pronunciation prefix trees, full n-gram LMLA considerably increases the number of language model probability calculations needed. In this paper a method for applying full n-gram LMLA in a decoder with a single static pronunciation tree is introduced. The experiments show that this method improves the speed of the decoder without increasing search errors.