## Cuckoo Ring: Balancing Workload for Locality Sensitive Hash (2006)

Venue: | Proc. IEEE Int’l Conf. Peer-to-Peer Computing (P2P) |

Citations: | 1 - 0 self |

### BibTeX

@INPROCEEDINGS{Han06cuckooring:,

author = {Dingyi Han and Ting Shen and Shicong Meng and Yong Yu},

title = {Cuckoo Ring: Balancing Workload for Locality Sensitive Hash},

booktitle = {Proc. IEEE Int’l Conf. Peer-to-Peer Computing (P2P)},

year = {2006},

pages = {49--56}

}

### Abstract

Locality Sensitive Hash (LSH) is widely used in peer-to-peer (P2P) systems. Although it supports range and similarity queries, replacing consistent hashing with LSH breaks the load-balancing mechanism of traditional Distributed Hash Table (DHT) based systems. To solve the imbalance problem, current systems either weaken the locality-preserving ability from similarity-preserving to order-preserving, or adopt a load-aware peer join mechanism. The first method does not support similarity queries because it loses the similarity information, and the second is greatly affected by the dynamic nature of P2P networks. In this paper, we propose a novel system, cuckoo ring, which preserves similarity information while keeping the load balanced. Rather than guiding newly joining peers to the hot areas, it moves items from the hot areas to cold areas, so that short-lifetime peers are distributed uniformly across the network. Compared to traditional DHT systems, cuckoo ring maintains only a little more information, about the globally light-loaded peers and the moved indexed items.
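
The name borrows from cuckoo hashing (Pagh and Rodler, listed under Citations), in which a newly inserted key may evict the current occupant of its slot, and the evicted key then moves to its alternate slot. The following Python sketch shows only that classic single-machine technique, as background for the item-moving idea in the abstract; it is not the paper's distributed protocol. The class name, the salts, the `max_kicks` bound, and the grow-and-rehash policy are all assumptions made for illustration.

```python
class CuckooHashTable:
    """Illustrative two-table cuckoo hash set (hypothetical helper, not from the paper).

    Every key has exactly one candidate slot in each of the two tables.
    None is used as the empty-slot marker, so None cannot be stored as a key.
    """

    def __init__(self, capacity=11, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks  # assumed bound on evictions before growing
        self.tables = [[None] * capacity, [None] * capacity]
        self.salts = (0x9E3779B1, 0x85EBCA6B)  # arbitrary constants, one per table

    def _slot(self, which, key):
        # Salting the built-in hash gives two different slot functions.
        return hash((self.salts[which], key)) % self.capacity

    def contains(self, key):
        # Lookup inspects at most two positions: h1(key) and h2(key).
        return any(self.tables[i][self._slot(i, key)] == key for i in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        which = 0
        for _ in range(self.max_kicks):
            slot = self._slot(which, key)
            # Place the key and pick up whatever was there (the "kick out").
            key, self.tables[which][slot] = self.tables[which][slot], key
            if key is None:
                return  # the slot was empty, nothing left to place
            which = 1 - which  # the evicted key tries its slot in the other table
        self._grow(key)  # too many evictions: rebuild with more room

    def _grow(self, pending):
        # Collect every stored key plus the one still without a home.
        keys = [k for table in self.tables for k in table if k is not None]
        keys.append(pending)
        self.capacity = 2 * self.capacity + 1
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k in keys:
            self.insert(k)
```

Because each key can only ever sit in one of two slots, `contains` does a constant amount of work regardless of how many keys are stored; the cost of keeping that guarantee is paid at insertion time, through evictions and occasional rebuilds.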

### Citations

2638 | Modern information retrieval
- Baeza-Yates, Ribeiro-Neto
- 1999
Citation Context ... the nearest neighbor problem, especially dealing with the “curse of dimensionality” [12]. The authors have proved that for any $\epsilon > 0$, there is an algorithm for $\epsilon$-NNS in $\mathbb{R}^d$ under any $l_p$ norm for $p \in [1, 2]$ which uses $(nd)^{O(1)}$ preprocessing and requires $\tilde{O}(d)$ query time. Since then, LSH is widely used in P2P systems [2, 3, 8, 9, 17, 18, 23, 28, 31] to support range or similarity queries. Definition 1 A ... |

1945 | An algorithm for suffix stripping
- Porter
- 1980
Citation Context ...03rd Congress, Federal Register, Financial Times, Foreign Broadcast Information Service and Los Angeles Times. We have filtered the stopwords [14] in the texts and have adopted Porter’s stemming rule [25] to transform the documents from text to term frequency (tf) vectors [1]. Some terms only appear in one document, and some of the documents contain empty content or only stopwords; we have ignored the... |

760 | Approximate nearest neighbors: towards removing the curse of dimensionality
- Indyk, Motwani
- 1998 |

458 | Similarity search in high dimensions via hashing
- Gionis, Indyk, et al.
- 1999
Citation Context ... indexed items. Key Words: Cuckoo Ring, Load Balance, Consistent Hash, Universal Hash, Locality Sensitive Hash. 1 Introduction. Locality Sensitive Hash (LSH) was first introduced by Piotr et al. in 1998 [16, 19], after the emergence of nonexpansive hashing [22], to solve the approximate version of the nearest neighbor problem, especially dealing with the “curse of dimensionality” [12]. The authors have prove... |

376 | On the resemblance and containment of documents
- Broder
- 1997
Citation Context ...ion [5, 10] is a family of locality sensitive hash functions for Jaccard’s coefficient [21]. For two sets $S_a$ and $S_b$, their Jaccard similarity is $\mathrm{sim}_J(S_a, S_b) = \frac{|S_a \cap S_b|}{|S_a \cup S_b|}$. Min-wise independent permutations uses $k$ min-hashing [4]. For a set $S$, the min-hashing is $h_\pi(S) = \min\{\pi(S)\}$. By using $k$ functions $h_{\pi_1}, h_{\pi_2}, \ldots, h_{\pi_k}$ chosen uniformly at random, we can get the hash key for $S$, $\mathrm{key}(S) = (h_{\pi_1}(S), h_{\pi_2}(S),$ ...... |

259 | Similarity estimation techniques from rounding algorithms
- Charikar
- 2002
Citation Context ...lookup time property. When finding an item $x$, just go to the peers that index $h_1(x)$ or $h_2(x)$. 3.2 Locality Sensitive Hash. There are two types of LSH functions. One is min-wise independent permutation [5, 10] and the other is absolute angle [18]. 3.2.1 Min-Wise Independent Permutation. Min-wise independent permutation [5, 10] is a family of locality sensitive hash functions for Jaccard’s coefficient [21]. ... |

243 | Measures of distributional similarity
- Lee
- 1999
Citation Context ...5, 10] and the other is absolute angle [18]. 3.2.1 Min-Wise Independent Permutation. Min-wise independent permutation [5, 10] is a family of locality sensitive hash functions for Jaccard’s coefficient [21]. For two sets $S_a$ and $S_b$, their Jaccard similarity is $\mathrm{sim}_J(S_a, S_b) = \frac{|S_a \cap S_b|}{|S_a \cup S_b|}$. Min-wise independent permutations uses $k$ min-hashing [4]. For a set $S$, the min-hashing is $h_\pi(S) = \min\{\pi(S)\}$. By usin... |

161 | Simple efficient load balancing algorithms for peer-to-peer systems
- Karger, Ruhl
- 2004
Citation Context ...n of the peers are indexing most of the items. Without any modifications on CAN [27] or Chord [30], the peers in those systems are surely not balanced. Moreover, traditional load balancing techniques [7, 15, 20, 26] do not work as they are designed to make the distribution of hashed values uniform. Some researchers have also noticed this serious problem and have proposed several methods. Their methods can ... |

132 | Cuckoo hashing
- Pagh, Rodler
- 2001
Citation Context ...n preserve similarity information while load balanced. The basic idea comes from cuckoo hash which “kicks out” some existing items from their “own home” and has very interesting worst-case properties [13, 24]. Cuckoo ring still preserves locality by LSH functions. It moves the indexed items from hot key areas to cold key areas instead of guiding the newly joined peers to the hot areas, so that the short l... |

110 | Simple Load Balancing for Distributed Hash Tables
- Byers, Considine, et al.
- 2003
Citation Context ...n of the peers are indexing most of the items. Without any modifications on CAN [27] or Chord [30], the peers in those systems are surely not balanced. Moreover, traditional load balancing techniques [7, 15, 20, 26] do not work as they are designed to make the distribution of hashed values uniform. Some researchers have also noticed this serious problem and have proposed several methods. Their methods can ... |

110 | Load balancing in structured p2p systems
- Rao, Lakshminarayanan, et al.
- 2003
Citation Context ...n of the peers are indexing most of the items. Without any modifications on CAN [27] or Chord [30], the peers in those systems are surely not balanced. Moreover, traditional load balancing techniques [7, 15, 20, 26] do not work as they are designed to make the distribution of hashed values uniform. Some researchers have also noticed this serious problem and have proposed several methods. Their methods can ... |

109 | Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems
- Ganesan, Bawa, et al. |

98 | MAAN: A Multi-Attribute Addressable Network for Grid Information Services
- Cai, Frank, et al.
- 2003
Citation Context ...t for any $\epsilon > 0$, there is an algorithm for $\epsilon$-NNS in $\mathbb{R}^d$ under any $l_p$ norm for $p \in [1, 2]$ which uses $(nd)^{O(1)}$ preprocessing and requires $\tilde{O}(d)$ query time. Since then, LSH is widely used in P2P systems [2, 3, 8, 9, 17, 18, 23, 28, 31] to support range or similarity queries. Definition 1. A family of hash functions $H = \{h : S_1 \to S_2\}$ is called $(r_1, r_2, p_1, p_2)$-sensitive for similarity measure $M$ if for any $q, p, p' \in S_1$, • if p ∈ B(q, r... |

91 | Approximate range selection queries in peer-to-peer systems
- Gupta, Agrawal, et al.
Citation Context ...t for any $\epsilon > 0$, there is an algorithm for $\epsilon$-NNS in $\mathbb{R}^d$ under any $l_p$ norm for $p \in [1, 2]$ which uses $(nd)^{O(1)}$ preprocessing and requires $\tilde{O}(d)$ query time. Since then, LSH is widely used in P2P systems [2, 3, 8, 9, 17, 18, 23, 28, 31] to support range or similarity queries. Definition 1. A family of hash functions $H = \{h : S_1 \to S_2\}$ is called $(r_1, r_2, p_1, p_2)$-sensitive for similarity measure $M$ if for any $q, p, p' \in S_1$, • if p ∈ B(q, r... |

70 | An algorithm for approximate closest-point queries
- Clarkson
- 1994
Citation Context ...n high cosine similarity document pairs is quite low, a lot of low cosine similarity document pairs also have low absolute angle distances. We believe it is caused by the curse of high dimensionality [11] and the sparsity problem, as our tf-idf matrix is quite sparse. Since absolute angle does not preserve the similarities well in this data set, we will focus our experiments on using min-wise independe... |

69 | Membership in constant time and almost minimum space
- Brodnik, Munro
- 1999
Citation Context ... constant factor of the information theoretical minimum [6] of $B = \log \binom{|U|}{n}$ bits [24]. Meanwhile, it is simple and easy to implement. Luc et al. have also clarified the probability theoretical properties of cuckoo hashing in [13] by a graph-theoretic int... |

64 | Efficient top-k query calculation in distributed networks
- Cao, Wang
- 2004
Citation Context ...t for any $\epsilon > 0$, there is an algorithm for $\epsilon$-NNS in $\mathbb{R}^d$ under any $l_p$ norm for $p \in [1, 2]$ which uses $(nd)^{O(1)}$ preprocessing and requires $\tilde{O}(d)$ query time. Since then, LSH is widely used in P2P systems [2, 3, 8, 9, 17, 18, 23, 28, 31] to support range or similarity queries. Definition 1. A family of hash functions $H = \{h : S_1 \to S_2\}$ is called $(r_1, r_2, p_1, p_2)$-sensitive for similarity measure $M$ if for any $q, p, p' \in S_1$, • if p ∈ B(q, r... |

43 | Min-wise independent permutations (extended abstract)
- Broder, Charikar, et al.
- 1998
Citation Context ...lookup time property. When finding an item $x$, just go to the peers that index $h_1(x)$ or $h_2(x)$. 3.2 Locality Sensitive Hash. There are two types of LSH functions. One is min-wise independent permutation [5, 10] and the other is absolute angle [18]. 3.2.1 Min-Wise Independent Permutation. Min-wise independent permutation [5, 10] is a family of locality sensitive hash functions for Jaccard’s coefficient [21]. ... |

41 | LSH forest: self-tuning indexes for similarity search
- Bawa, Condie, et al.
- 2005
Citation Context ... the nearest neighbor problem, especially dealing with the “curse of dimensionality” [12]. The authors have proved that for any $\epsilon > 0$, there is an algorithm for $\epsilon$-NNS in $\mathbb{R}^d$ under any $l_p$ norm for $p \in [1, 2]$ which uses $(nd)^{O(1)}$ preprocessing and requires $\tilde{O}(d)$ query time. Since then, LSH is widely used in P2P systems [2, 3, 8, 9, 17, 18, 23, 28, 31] to support range or similarity queries. Definition 1 A ... |

30 | Non-expansive hashing
- Linial, Sasson
- 1996
Citation Context ...onsistent Hash, Universal Hash, Locality Sensitive Hash. 1 Introduction. Locality Sensitive Hash (LSH) was first introduced by Piotr et al. in 1998 [16, 19], after the emergence of nonexpansive hashing [22], to solve the approximate version of the nearest neighbor problem, especially dealing with the “curse of dimensionality” [12]. The authors have proved that for any $\epsilon > 0$, there is an algorithm for ɛ − ... |

19 | Cuckoo hashing: Further analysis
- Devroye, Morin
Citation Context ...n preserve similarity information while load balanced. The basic idea comes from cuckoo hash which “kicks out” some existing items from their “own home” and has very interesting worst-case properties [13, 24]. Cuckoo ring still preserves locality by LSH functions. It moves the indexed items from hot key areas to cold key areas instead of guiding the newly joined peers to the hot areas, so that the short l... |

11 | Similarity searching in peer-to-peer databases
- Bhattacharya, SR, et al.
- 2005 |

9 | WonGoo: A pure peer-to-peer full text information retrieval system based on semantic overlay networks
- Lv, Cheng
- 2004 |

3 | Similarity discovery in structured P2P overlays
- Hsiao, King |

1 | Lexical analysis and stop lists, chapter Information Retrieval
- Fox
- 1992
Citation Context ...tely 555K documents from the Congressional Record of the 103rd Congress, Federal Register, Financial Times, Foreign Broadcast Information Service and Los Angeles Times. We have filtered the stopwords [14] in the texts and have adopted Porter’s stemming rule [25] to transform the documents from text to term frequency (tf) vectors [1]. Some terms only appear in one document and some of the documents co... |
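
The citation contexts for min-wise independent permutations quote the min-hash of a set S under a permutation π, the minimum of π over the elements of S, and the hash key formed from k such functions. A minimal Python sketch of that construction follows, assuming items are non-negative integers and substituting random affine maps modulo a prime for true random permutations (a common approximation, not something the quoted text specifies). All function names and the `PRIME` constant are hypothetical.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1; item ids are assumed to be smaller than this


def jaccard(a, b):
    """Exact Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def make_hashes(k, seed=0):
    """Draw k affine maps x -> (m*x + c) mod PRIME, standing in for permutations."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_key(items, hashes):
    """Return the k-tuple of min-hashes of a non-empty set of integer items."""
    return tuple(min((m * x + c) % PRIME for x in items) for m, c in hashes)


def estimate_similarity(key_a, key_b):
    """Fraction of positions where two keys agree; estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(key_a, key_b)) / len(key_a)
```

Two sets agree on a single min-hash with probability equal to their Jaccard similarity (exactly so for true random permutations, approximately so for the affine stand-ins), so the fraction of agreeing positions in two keys estimates that similarity, and the estimate tightens as `k` grows.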