Results 1 -
7 of
7
Why simple hash functions work: Exploiting the entropy in a data stream
- In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2008
"... Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealiz ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or O(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing. In practice, however, it is commonly observed that weak hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source, ” whereby
Performance in Practice of String Hashing Functions
- Proc. Int. Conf. on Database Systems for Advanced Applications
, 1997
"... String hashing is a fundamental operation, used in countless applications where fast access to distinct strings is required. In this paper we describe a class of string hashing functions and explore its performance. In particular, using experiments with both small sets of keys and a large key set fr ..."
Abstract
-
Cited by 17 (7 self)
- Add to MetaCart
String hashing is a fundamental operation, used in countless applications where fast access to distinct strings is required. In this paper we describe a class of string hashing functions and explore its performance. In particular, using experiments with both small sets of keys and a large key set from a text database, we show that it is possible to achieve performance close to that theoretically predicted for hashing functions. We also consider criteria for choosing a hashing function and use them to compare our class of functions to other methods for string hashing. These results show that our class of hashing functions is reliable and efficient, and is therefore an appropriate choice for general-purpose hashing.
A Performance Study of Hashing Functions for Hardware Applications
- In Proc. of Int. Conf. on Computing and Information
, 1994
"... Hashing is used extensively in hardware applications such as page tables for address translation. There is not much literature in this regard although hashing has been extensively studied for file organization. More specifically, there is no study of the practical performance of hashing functions us ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
Hashing is used extensively in hardware applications such as page tables for address translation. There is not much literature in this regard although hashing has been extensively studied for file organization. More specifically, there is no study of the practical performance of hashing functions used. In the literature we find bit extraction and exclusive ORing hashing functions used, but there is no mention of the performance of these functions. Moreover, the performance of the hashing functions in relation to the theoretical performance of hashing schemes is not addressed. In this paper we study the practical performance of a particular class of hashing functions. Our results show that by choosing functions randomly from this class of hashing functions, which can be readily implemented in hardware, we can achieve analytically predicted performance of hashing schemes with real life data. 1622 1 Introduction Hashing is a widely used technique of organizing tables which also finds ...
Hash-Based Techniques for High-Speed Packet Processing
"... Abstract Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little bac ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Abstract Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary. 1
Index Translation Schemes for Adaptive Computations on Distributed Memory Multicomputers
- Proceedings of the Ninth International Parallel Processing Symposium, IEEE Computer
, 1995
"... Current research in parallel programming is focused on closing the gap between globally indexed algorithms and the separate address spaces of processors on distributed memory multicomputers. A set of index translation schemes have been implemented as a part of CHAOS runtime support library, so that ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Current research in parallel programming is focused on closing the gap between globally indexed algorithms and the separate address spaces of processors on distributed memory multicomputers. A set of index translation schemes have been implemented as a part of CHAOS runtime support library, so that the library functions can be used for implementing a global index space across a collection of separate local index spaces. These schemes include two software-cached translation schemes aimed at adaptive irregular problems as well as a distributed translation table technique for statically irregular problems. To evaluate and demonstrate the efficiency of the software-cached translation schemes, experiments have been performed with an adaptively irregular loop kernel and a full-fledged 3D DSMC code from NASA Langley on the Intel Paragon and Cray T3D. This paper also discusses and analyzes the operational conditions under which each scheme can produce optimal performance. 1 Introduction Distr...
Hash-Based Data Structures for Extreme Conditions
, 2008
"... This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both an ..."
Abstract
- Add to MetaCart
This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both analyzable and practical. First, we show that a wide class of Bloom filter variants can be effectively implemented using very easily computable combinations of only two fully random hash functions. From a theoretical perspective, these results show that Bloom filters and related data structures can often be substantially derandomized with essentially no loss in performance. From a practical perspective, this derandomization allows for a significant speedup in certain query intensive applications. The rest of this work focuses on designing space-efficient, open-addressed, multiple choice hash tables for implementation in high-performance router hardware. Using multiple hash functions conserves space, but requires every hash table operation to consider multiple hash buckets, forcing a tradeoff between the slow speed of examining these buckets serially
Effect of cache lines in array-based hashing algorithms
"... Abstract — Hashing algorithms and their efficiency is modeled with their expected probe lengths. This value measures the number of algorithmic steps required to find the position of an item inside the table. This metric, however, has an implicit assumption that all of these steps have a uniform cost ..."
Abstract
- Add to MetaCart
Abstract — Hashing algorithms and their efficiency is modeled with their expected probe lengths. This value measures the number of algorithmic steps required to find the position of an item inside the table. This metric, however, has an implicit assumption that all of these steps have a uniform cost. In this paper we show that this is not true on modern computers, and that caches and especially cache lines have a great impact on the performance and effectiveness of hashing algorithms that use array-based structures. Spatial locality of memory accesses plays a major role in the effectiveness of an algorithm. We show a novel model of evaluating hashing schemes; this model is based on the number of cache misses the algorithms suffer. This new approach is shown to model the real performance of hash tables more precisely than previous methods. For developing this model the sketch of proof of the expected probe length of linear probing is included.

