Results 11–20 of 56
Min-wise independent permutations (extended abstract)
 In STOC ’98: Proceedings of the thirtieth annual ACM symposium on Theory of computing
, 1998
"... We define and study the notion of minwise independent families of permutations. We say that F⊆Sn is minwise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr ( min{π(X)} = π(x) ) = 1 X . In other words we require that all the elements of any fixed set ..."
Abstract

Cited by 59 (1 self)
We define and study the notion of min-wise independent families of permutations. We say that F ⊆ S_n is min-wise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr(min{π(X)} = π(x)) = 1/|X|. In other words, we require that all the elements of any fixed set X have an equal chance to become the minimum element of the image of X under π. Our research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and filter near-duplicate documents. However, in the course of ...
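The min-wise property is what makes MinHash similarity estimation work: if every element of a set has an equal chance of becoming the minimum under a random permutation, then for two sets A and B, Pr(min π(A) = min π(B)) = |A ∩ B| / |A ∪ B|, their Jaccard similarity. A minimal sketch, using salted affine hash functions as a practical stand-in for truly min-wise independent permutations (all names and parameters here are illustrative, not from the paper):

```python
import random

def minhash_signature(items, num_hashes=100, seed=0):
    """Build a MinHash signature using salted affine hash functions
    h(x) = (a*hash(x) + b) mod p as approximate min-wise permutations."""
    rng = random.Random(seed)
    p = (1 << 61) - 1  # a large Mersenne prime
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]
    return [min((a * hash(x) + b) % p for x in items) for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of coordinates whose minima agree estimates |A∩B|/|A∪B|."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Averaging agreement across many hash functions gives an unbiased estimate of the Jaccard similarity to the extent that the hash family behaves min-wise independently.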
A Novel Cache Architecture to Support Layer-Four Packet Classification at Memory Access Speeds
, 2000
"...  Existing and emerging layer4 switching technologies require packet classication to be performed on more than one header elds, known as layer4 lookup. Currently, the fastest general layer4 lookup scheme delivers a throughput of 1 Million Lookups Per Second (MLPS), far o from 25/75 MLPS needed to ..."
Abstract

Cited by 35 (3 self)
Existing and emerging layer-4 switching technologies require packet classification to be performed on more than one header field, known as layer-4 lookup. Currently, the fastest general layer-4 lookup scheme delivers a throughput of 1 Million Lookups Per Second (MLPS), far off from the 25/75 MLPS needed to support a 50/150 Gbps layer-4 router. We propose the use of route caching to speed up layer-4 lookup, and design and implement a cache architecture for this purpose. We investigated the locality behavior of Internet traffic (at layer 4) and proposed a near-LRU algorithm that best harnesses this behavior. In implementation, to best approximate fully-associative near-LRU using relatively inexpensive set-associative hardware, we invented a dynamic set-associative scheme that exploits the nice properties of N-universal hash functions. The cache architecture achieves a high and stable hit ratio above 90 percent and a fast throughput up to 75 MLPS at a reasonable cost ($700/1700 for 50/150 Gbps rou...
Cuckoo hashing: Further analysis
, 2003
"... We consider cuckoo hashing as proposed by Pagh and Rodler in 2001. We show that the expected construction time of the hash table is O(n) as long as the two open addressing tables are each of size at least (1 #)n,where#>0andn is the number of data points. Slightly improved bounds are obtained f ..."
Abstract

Cited by 28 (1 self)
We consider cuckoo hashing as proposed by Pagh and Rodler in 2001. We show that the expected construction time of the hash table is O(n) as long as the two open addressing tables are each of size at least (1 + ε)n, where ε > 0 and n is the number of data points. Slightly improved bounds are obtained for various probabilities and constraints. The analysis rests on simple properties of branching processes.
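The insertion procedure whose cost this result bounds can be sketched as follows: each key has one candidate cell in each of two tables, and an insertion that finds its cell occupied evicts the resident, which is then re-inserted into its cell in the other table, possibly kicking further keys. This is an illustrative sketch only; the hash functions and the bound on kicks are arbitrary choices, not the paper's:

```python
def cuckoo_insert(table1, table2, h1, h2, key, max_kicks=500):
    """Insert key into one of two open-addressing tables, evicting
    ("kicking") residents back and forth between the tables as needed."""
    for _ in range(max_kicks):
        i = h1(key)
        if table1[i] is None:
            table1[i] = key
            return True
        table1[i], key = key, table1[i]  # evict the resident of table1[i]
        j = h2(key)
        if table2[j] is None:
            table2[j] = key
            return True
        table2[j], key = key, table2[j]  # evict the resident of table2[j]
    return False  # kick chain too long; the real scheme would rehash
```

A successful run leaves every key at either table1[h1(key)] or table2[h2(key)], so a lookup probes at most two cells.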
Balanced Allocations (Extended Abstract)
 SIAM Journal on Computing
, 1994
"... Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability ln n= ln ln n(1 + o(1)) balls in it. Suppose instead, that for each ball we choose two boxes at random and place ..."
Abstract

Cited by 22 (0 self)
Suppose that we sequentially place n balls into n boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability ln n / ln ln n (1 + o(1)) balls in it. Suppose instead that for each ball we choose two boxes at random and place the ball into the one which is less full at the time of placement. We show that with high probability, the fullest box contains only ln ln n / ln 2 + O(1) balls, exponentially less than before. Furthermore, we show that a similar gap exists in the infinite process, where at each step one ball, chosen uniformly at random, is deleted, and one ball is added in the manner above. We discuss consequences of this and related theorems for dynamic resource allocation, hashing, and on-line load balancing.
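The two-choice effect is easy to observe empirically. A small simulation (parameter choices arbitrary) contrasting one random choice per ball with the least-full of d random choices:

```python
import random

def max_load(n, d, rng):
    """Place n balls into n boxes; each ball goes into the least-full
    of d boxes chosen uniformly at random (ties broken arbitrarily)."""
    boxes = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=lambda i: boxes[i])
        boxes[target] += 1
    return max(boxes)
```

For n around 10^5, a single choice typically yields a maximum load of seven or more, while two choices typically yield around four, matching the ln ln n / ln 2 + O(1) bound above.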
The power of one move: Hashing schemes for hardware
 IEEE INFOCOM
, 2008
"... In a standard multiple choice hashing scheme, each item is stored in one of d ≥ 2 hash table buckets. The availability of choice in where items are stored improves space utilization. These schemes are often very amenable to a hardware implementation, such as in a router. Recently, researchers have ..."
Abstract

Cited by 20 (4 self)
In a standard multiple choice hashing scheme, each item is stored in one of d ≥ 2 hash table buckets. The availability of choice in where items are stored improves space utilization. These schemes are often very amenable to a hardware implementation, such as in a router. Recently, researchers have discovered powerful variants where items already in the hash table may be moved during the insertion of a new item. Unfortunately, these schemes occasionally require a large number of items to be moved during an insertion operation, making them inappropriate for a hardware implementation. We show that it is possible to significantly increase the space utilization of a multiple choice hashing scheme by allowing at most one item to be moved during an insertion. Furthermore, our schemes can be effectively analyzed, optimized, and compared using numerical methods based on fluid limit arguments, without resorting to much slower simulations.
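To make the "at most one move" constraint concrete, here is a hypothetical sketch of a d-choice bucketed table that permits at most one relocation per insertion. The bucket capacity, value of d, and salted hashing are illustrative choices, not the scheme analyzed in the paper:

```python
import random

class OneMoveHashTable:
    """d-choice bucketed hash table that allows at most one item to be
    relocated per insertion. An illustrative sketch only."""

    def __init__(self, num_buckets, d=2, capacity=4, seed=0):
        self.m = num_buckets
        self.d = d
        self.cap = capacity
        # Per-choice salts; hash((salt, key)) stands in for d hash functions.
        self.salts = [random.Random(seed + i).random() for i in range(d)]
        self.buckets = [[] for _ in range(num_buckets)]

    def _choices(self, key):
        return [hash((s, key)) % self.m for s in self.salts]

    def insert(self, key):
        choices = self._choices(key)
        # Try the emptiest candidate bucket first.
        target = min(choices, key=lambda b: len(self.buckets[b]))
        if len(self.buckets[target]) < self.cap:
            self.buckets[target].append(key)
            return True
        # All candidates full: move at most ONE resident to one of its
        # alternate buckets to free a slot; no cascading moves.
        for b in choices:
            for resident in list(self.buckets[b]):
                for alt in self._choices(resident):
                    if alt != b and len(self.buckets[alt]) < self.cap:
                        self.buckets[b].remove(resident)
                        self.buckets[alt].append(resident)
                        self.buckets[b].append(key)
                        return True
        return False  # insertion fails; a real scheme might resize or stash
```

Because a resident only ever moves to one of its own alternate buckets, every stored key remains findable by probing its d candidate buckets, which keeps the lookup path hardware-friendly.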
Fossilized Index: The Linchpin of Trustworthy Non-Alterable Electronic Records
 In Proceedings of the ACM SIGMOD International Conference on Management of Data
, 2005
"... As critical records are increasingly stored in electronic form, which tends to make for easy destruction and clandestine modification, it is imperative that they be properly managed to preserve their trustworthiness, i.e., their ability to provide irrefutable proof and accurate details of events t ..."
Abstract

Cited by 18 (0 self)
As critical records are increasingly stored in electronic form, which tends to make for easy destruction and clandestine modification, it is imperative that they be properly managed to preserve their trustworthiness, i.e., their ability to provide irrefutable proof and accurate details of events that have occurred. The need for proper record keeping is further underscored by the recent corporate misconduct and ensuing attempts to destroy incriminating records. Currently, industry practice and regulatory requirements (e.g., SEC Rule 17a-4) rely on storing records in WORM storage to immutably preserve the records. In this paper, we contend that simply storing records in WORM storage is increasingly inadequate to ensure that they are trustworthy. Specifically, with the large volume of records that are typical today, meeting ever more stringent query response times requires the use of direct access mechanisms such as indexes. Relying on indexes for accessing records could, however, provide a means for effectively altering or deleting records, even those stored in WORM storage. In this paper, we establish the key requirements for a fossilized index that protects the records from such logical modification. We also analyze current indexing methods to determine how they fall short of these requirements. Based on our insights, we propose the Generalized Hash Tree (GHT). Using both theoretical analysis and simulations with real system data, we demonstrate that the GHT can satisfy the requirements of a fossilized index with performance and cost comparable to regular indexing techniques such as the B-tree. We further note that as records are indexed on multiple fields to facilitate search and retrieval, the records can be reconstructed from the corresponding index entries even after the records expire and are disposed of. Therefore, we also present a novel method to eliminate this disclosure risk by allowing an index entry to be effectively disposed of when its record expires.
Hash-Based Techniques for High-Speed Packet Processing
"... Abstract Hashing is an extremely useful technique for a variety of highspeed packetprocessing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little bac ..."
Abstract

Cited by 15 (2 self)
Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary.
Perfect Hashing for Network Applications
, 2006
"... Hash tables are a fundamental data structure in many network applications, including route lookups, packet classification and monitoring. Often a part of the data path, they need to operate at wirespeed. However, several associative memory accesses are needed to resolve collisions, making them slow ..."
Abstract

Cited by 15 (1 self)
Hash tables are a fundamental data structure in many network applications, including route lookups, packet classification and monitoring. Often a part of the data path, they need to operate at wire speed. However, several associative memory accesses are needed to resolve collisions, making them slower than required. This motivates us to consider minimal perfect hashing schemes, which reduce the number of memory accesses to just 1 and are also space-efficient. Existing perfect hashing algorithms are not tailored for network applications because they take too long to construct and are hard to implement in hardware. This paper introduces a hardware-friendly scheme for minimal perfect hashing, with space requirement approaching 3.7 times the information-theoretic lower bound. Our construction is several orders of magnitude faster than existing perfect hashing schemes. Instead of using the traditional mapping-partitioning-searching methodology, our scheme employs a Bloom filter, which is known for its simplicity and speed. We extend our scheme to the dynamic setting, thus handling insertions and deletions.
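The Bloom filter primitive that the scheme builds on can be sketched in a few lines. This is the textbook filter, not the paper's perfect-hashing construction; the sizes and hash count are arbitrary:

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter: k hashed probes into a bit array.
    False positives are possible; false negatives are not."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _probes(self, item):
        # Derive k independent-looking probe positions from SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._probes(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._probes(item))
```

Both adding and querying touch exactly k bit positions, which is what makes the structure attractive as a building block for hardware lookup pipelines.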
Maximum matchings in random bipartite graphs and the space utilization of cuckoo hash tables
, 2009
"... We study the the following question in Random Graphs. We are given two disjoint sets L, R with L  = n = αm and R  = m. We construct a random graph G by allowing each x ∈ L to choose d random neighbours in R. The question discussed is as to the size µ(G) of the largest matching in G. When consi ..."
Abstract

Cited by 13 (0 self)
We study the following question in random graphs. We are given two disjoint sets L, R with |L| = n = αm and |R| = m. We construct a random graph G by allowing each x ∈ L to choose d random neighbours in R. The question discussed is the size µ(G) of the largest matching in G. When considered in the context of cuckoo hashing, one key question is: when is µ(G) = n w.h.p.? We answer this question exactly when d is at least three. We also establish a precise threshold for when Phase 1 of the Karp-Sipser greedy matching algorithm suffices to compute a maximum matching w.h.p.
Simple Summaries for Hashing with Choices
 IEEE/ACM TRANSACTIONS ON NETWORKING
, 2008
"... In a multiplechoice hashing scheme, each item is stored in one of P possible hash table buckets. The availability of these multiple choices allows for a substantial reduction in the maximum load of the buckets. However, a lookup may now require examining each of the locations. For applications whe ..."
Abstract

Cited by 12 (2 self)
In a multiple-choice hashing scheme, each item is stored in one of d possible hash table buckets. The availability of these multiple choices allows for a substantial reduction in the maximum load of the buckets. However, a lookup may now require examining each of the locations. For applications where this cost is undesirable, Song et al. propose keeping a summary that allows one to determine which of the locations is appropriate for each item, where the summary may allow false positives for items not in the hash table. We propose alternative, simple constructions of such summaries that use less space for both the summary and the underlying hash table. Moreover, our constructions are easily analyzable and tunable.