External perfect hashing for very large key sets (2007)
| Venue: | Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM’07) |
| Citations: | 9 - 1 self |
BibTeX
@INPROCEEDINGS{Botelho07externalperfect,
author = {Fabiano C. Botelho and Daniel Galinkin and Meira, Jr., Wagner and Nivio Ziviani},
title = {External perfect hashing for very large key sets},
booktitle = {Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM'07)},
year = {2007},
pages = {653--662},
publisher = {ACM Press}
}
Abstract
A perfect hash function (PHF) h: S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near space-optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs an MPHF for a set of 1.024 billion URLs of average length 64 bytes, collected from the Web, in approximately 50 minutes on a commodity PC. Using 14 commodity PCs, the parallel implementation proposed here achieves the following: (i) it constructs an MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs an MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes, with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines and evaluates it in a distributed way, which is faster than evaluating the centralized function.
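The PHF/MPHF definitions and the idea of spreading an MPHF's description over several machines can be made concrete with a small Python sketch. This is **not** the paper's algorithm. The paper's method is near space-optimal, while this sketch finds each function by brute-force seed search, whose expected number of tries grows roughly like e^n / √(2πn) and is therefore only usable for tiny sets. SHA-256 as the hash family, the routing seed, and the names `seeded_hash`, `is_perfect`, `find_mphf_seed` and `DistributedMPHF` are all hypothetical choices made for illustration. The distributed part routes each key to one of p parts ("machines"). Each part stores only its own seed and a prefix-sum offset, so that h(k) = offset[i] + h_i(k) is an MPHF on the whole set.

```python
import hashlib
from itertools import accumulate

ROUTER_SEED = -1  # hypothetical seed reserved for routing keys to parts


def seeded_hash(key: str, seed: int, m: int) -> int:
    """Map key into [0, m-1] using SHA-256 salted with seed (illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def is_perfect(keys, h, m) -> bool:
    """True if h maps every key into [0, m-1] with no collisions (i.e. is a PHF)."""
    seen = set()
    for k in keys:
        v = h(k)
        if not 0 <= v < m or v in seen:
            return False  # out of range or collision: not injective on keys
        seen.add(v)
    return True


def find_mphf_seed(keys, max_tries=1_000_000) -> int:
    """Brute-force a seed whose hash is a minimal perfect hash (m = n) on keys.

    Exponential expected cost: only suitable for very small key sets.
    """
    n = len(keys)
    for seed in range(max_tries):
        if is_perfect(keys, lambda k: seeded_hash(k, seed, n), n):
            return seed
    raise RuntimeError("no MPHF seed found within max_tries")


class DistributedMPHF:
    """Hypothetical sketch: an MPHF whose description is split across p parts.

    Part i holds only (seeds[i], offsets[i], sizes[i]); evaluating a key needs
    just the part that owns it. Only defined for keys of the original set.
    """

    def __init__(self, keys, p: int):
        self.p = p
        parts = [[] for _ in range(p)]
        for k in keys:
            parts[seeded_hash(k, ROUTER_SEED, p)].append(k)  # route key to a part
        self.sizes = [len(part) for part in parts]
        # offsets[i] = number of keys in parts 0..i-1 (exclusive prefix sum)
        self.offsets = list(accumulate(self.sizes, initial=0))[:-1]
        # each part independently builds a local MPHF onto [0, sizes[i]-1]
        self.seeds = [find_mphf_seed(part) for part in parts]

    def __call__(self, key: str) -> int:
        i = seeded_hash(key, ROUTER_SEED, self.p)  # which part owns the key
        return self.offsets[i] + seeded_hash(key, self.seeds[i], self.sizes[i])
```

For example, `DistributedMPHF(urls, 6)` built over 24 URL strings maps them bijectively onto [0, 23]. In a real deployment each machine would keep only its own seed, size and offset, and a lookup would be sent to the machine chosen by the routing hash.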