Results 1 - 10
of
10
Trajectory Sampling for Direct Traffic Observation
, 2001
"... Traffic measurement is a critical component for the control and engineering of communication networks. We argue that traffic measurement should make it possible to obtain the spatial flow of traffic through the domain, i.e., the paths followed by packets between any ingress and egress point of the d ..."
Abstract
-
Cited by 176 (21 self)
- Add to MetaCart
Traffic measurement is a critical component for the control and engineering of communication networks. We argue that traffic measurement should make it possible to obtain the spatial flow of traffic through the domain, i.e., the paths followed by packets between any ingress and egress point of the domain. Most resource allocation and capacity planning tasks can benefit from such information. Also, traffic measurements should be obtained without a routing model and without knowledge of network state. This allows the traffic measurement process to be resilient to network failures and state uncertainty. We propose a method that allows the direct inference of traffic flows through a domain by observing the trajectories of a subset of all packets traversing the network. The key advantages of the method are that (i) it does not rely on routing state, (ii) its implementation cost is small, and (iii) the measurement reporting traffic is modest and can be controlled precisely. The key idea of the method is to sample packets based on a hash function computed over the packet content. Using the same hash function will yield the same sample set of packets in the entire domain, and enables us to reconstruct packet trajectories. I.
Approximate Distance Oracles
, 2001
"... Let G = (V; E) be an undirected weighted graph with jV j = n and jEj = m. Let k 1 be an integer. We show that G = (V; E) can be preprocessed in O(kmn ) expected time, constructing a data structure of size O(kn ), such that any subsequent distance query can be answered, approximately, in O(k ..."
Abstract
-
Cited by 154 (6 self)
- Add to MetaCart
Let G = (V; E) be an undirected weighted graph with jV j = n and jEj = m. Let k 1 be an integer. We show that G = (V; E) can be preprocessed in O(kmn ) expected time, constructing a data structure of size O(kn ), such that any subsequent distance query can be answered, approximately, in O(k) time. The approximate distance returned is of stretch at most 2k \Gamma 1, i.e., the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and 2k \Gamma 1. We show that a 1963 girth conjecture of Erdos, implies ) space is needed in the worst case for any real stretch strictly smaller than 2k + 1. The space requirement of our algorithm is, therefore, essentially optimal.
Cuckoo hashing
- Journal of Algorithms
, 2001
"... We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994). The space usage is similar to that ..."
Abstract
-
Cited by 86 (5 self)
- Add to MetaCart
We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994). The space usage is similar to that of binary search trees, i.e., three words per key on average. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee. Key Words: data structures, dictionaries, information retrieval, searching, hashing, experiments * Partially supported by the Future and Emerging Technologies programme of the EU
Hash-Based Techniques for High-Speed Packet Processing
"... Abstract Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little bac ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Abstract Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary. 1
Shore-MT: A Scalable Storage Manager for the Multicore Era
- EXTENDING DATABASE TECHNOLOGY (EDBT)
, 2009
"... Database storage managers have long been able to efficiently handle multiple concurrent requests. Until recently, however, a computer contained only a few single-core CPUs, and therefore only a few transactions could simultaneously access the storage manager's internal structures. This allowed stora ..."
Abstract
-
Cited by 7 (5 self)
- Add to MetaCart
Database storage managers have long been able to efficiently handle multiple concurrent requests. Until recently, however, a computer contained only a few single-core CPUs, and therefore only a few transactions could simultaneously access the storage manager's internal structures. This allowed storage managers to use non-scalable approaches without any penalty. With the arrival of multicore chips, however, this situation is rapidly changing. More and more threads can run in parallel, stressing the internal scalability of the storage manager. Systems optimized for high performance at a limited number of cores are not assured similarly high performance at a higher core count, because unanticipated scalability obstacles arise. We benchmark four popular open-source storage managers (Shore, BerkeleyDB, MySQL, and PostgreSQL) on a modern multicore machine, and find that they all suffer in terms of scalability. We briefly examine the bottlenecks in the various storage engines. We then present Shore-MT, a multithreaded and highly scalable version of Shore which we developed by identifying and successively removing internal bottlenecks. When compared to other DBMS, Shore-MT exhibits superior scalability and 2-4 times higher absolute throughput than its peers. We also show that designers should favor scalability to single-thread performance, and highlight important principles for writing scalable storage engines, illustrated with real examples from the development of Shore-MT.
Fast evaluation of union-intersection expressions
, 2007
"... Abstract. We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Abstract. We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model of computation, and sets of machine words, but also state our results in the I/O model. On a RAM with word size w, a special case of our result is that the intersection of m (preprocessed) sets, containing n elements in total, can be computed in expected time O(n(log w) 2 /w + km), where k is the number of elements in the intersection. If the first of the two terms dominates, this is a factor w 1−o(1) faster than the standard solution of merging sorted lists. We show a log k cell probe lower bound of time Ω(n/(wm log m) + (1 −)k), meaning w that our upper bound is nearly optimal for small m. Our algorithm uses a novel combination of approximate set representations and word-level parallelism. 1
Hashing, Randomness and Dictionaries
, 2002
"... This thesis is centered around one of the most basic information retrieval problems, namely that of storing and accessing the elements of a set. Each element in the set has some associated information that is returned along with it. The problem is referred to as the dictionary problem, due to the si ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
This thesis is centered around one of the most basic information retrieval problems, namely that of storing and accessing the elements of a set. Each element in the set has some associated information that is returned along with it. The problem is referred to as the dictionary problem, due to the similarity to a bookshelf dictionary, which contains a set of words and has an explanation associated with each word. In the static version of the problem the set is fixed, whereas in the dynamic version, insertions and deletions of elements are possible. The approach
On the k-independence required by linear probing and minwise independence
- In Proc. 37th International Colloquium on Automata, Languages and Programming (ICALP
, 2010
"... )-independent hash functions are required, matching an upper bound of [Indyk, SODA’99]. We also show that the multiply-shift scheme of Dietzfelbinger, most commonly used in practice, fails badly in both applications. Abstract. We show that linear probing requires 5-independent hash functions for exp ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
)-independent hash functions are required, matching an upper bound of [Indyk, SODA’99]. We also show that the multiply-shift scheme of Dietzfelbinger, most commonly used in practice, fails badly in both applications. Abstract. We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al. STOC’07]. For (1 + ε)-approximate minwise independence, we show that Ω(lg 1 ε 1
Hash-Based Data Structures for Extreme Conditions
, 2008
"... This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both an ..."
Abstract
- Add to MetaCart
This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both analyzable and practical. First, we show that a wide class of Bloom filter variants can be effectively implemented using very easily computable combinations of only two fully random hash functions. From a theoretical perspective, these results show that Bloom filters and related data structures can often be substantially derandomized with essentially no loss in performance. From a practical perspective, this derandomization allows for a significant speedup in certain query intensive applications. The rest of this work focuses on designing space-efficient, open-addressed, multiple choice hash tables for implementation in high-performance router hardware. Using multiple hash functions conserves space, but requires every hash table operation to consider multiple hash buckets, forcing a tradeoff between the slow speed of examining these buckets serially
The Power of Simple Tabulation Hashing Mihai Pǎtras¸cu AT&T Labs
, 2011
"... Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates ba ..."
Abstract
- Add to MetaCart
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Carter and Wegman (STOC’77). Keys are viewed as consisting of c characters. We initialize c tables T1,..., Tc mapping characters to random hash codes. A key x = (x1,..., xq) is hashed to T1[x1] ⊕ · · · ⊕ Tc[xc], where ⊕ denotes xor. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, e.g., Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing. An important target of the analysis of algorithms is to determine whether there exist practical schemes, which enjoy mathematical guarantees on performance. Hashing and hash tables are one of the most common inner loops in real-world computation, and are even built-in “unit cost ” operations in high level programming languages that offer associative arrays. Often,

