Text-Based Content Search and Retrieval in ad hoc P2P Communities (2002)

by Francisco Matias Cuenca-Acuna, Thu D. Nguyen
Venue: In Proceedings of the International Workshop on Peer-to-Peer Computing (co-located with Networking
Citations: 48 - 10 self

BibTeX

@INPROCEEDINGS{Cuenca-acuna02text-basedcontent,
    author = {Francisco Matias Cuenca-Acuna and Thu D. Nguyen},
    title = {Text-Based Content Search and Retrieval in ad hoc P2P Communities},
    booktitle = {In Proceedings of the International Workshop on Peer-to-Peer Computing (co-located with Networking},
    year = {2002}
}

Abstract

We consider the problem of content search and retrieval in peer-to-peer (P2P) communities. P2P computing is a potentially powerful model for information sharing between ad hoc groups of users because of its low cost of entry and natural model for resource scaling with community size. As P2P communities grow in size, however, locating information distributed across the large number of peers becomes problematic. We present a distributed text-based content search and retrieval algorithm to address this problem. Our algorithm is based on a state-of-the-art text-based document ranking algorithm: the vector-space model, instantiated with the TFxIDF ranking rule. A naive application of TFxIDF would require each peer in a community to collect an inverted index of the entire community. This is costly both in terms of bandwidth and storage. Instead, we show how TFxIDF can be approximated given compact summaries of peers’ local inverted indexes. We make three contributions: (a) we show how the TFxIDF rule can be adapted to use the index summaries, (b) we provide a heuristic for adaptively determining the set of peers that should be contacted for a query, and (c) we show that our algorithm tracks TFxIDF’s performance very closely, regardless of how documents are distributed throughout the community. Furthermore, our algorithm preserves the main flavor of TFxIDF by retrieving close to the same set of documents for any given query.
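
The abstract gives no formulas, so the following Python sketch is only one plausible reading of contributions (a) and (b), not the authors' algorithm. It assumes that each peer's summary can be modeled as a plain set of terms (the actual summary format is not stated here), that a term held by fewer peers should weigh more, and that querying stops after a fixed number of consecutive peers fail to change the current top-k. All names (`summarize`, `term_weights`, `rank_peers`, `score_doc`, `search`, `patience`, `PEERS`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def summarize(peers):
    """Build one term set per peer (stand-in for a compact index summary)."""
    return {p: set(tokenize(" ".join(docs.values()))) for p, docs in peers.items()}


def term_weights(terms, summaries):
    """Weight each query term by how few peers hold it (an IDF-like assumption)."""
    n = len(summaries)
    weights = {}
    for t in terms:
        holders = sum(1 for s in summaries.values() if t in s)
        if holders:
            weights[t] = math.log(1 + n / holders)
    return weights


def rank_peers(weights, summaries):
    """Order peers by the total weight of query terms found in their summary."""
    scores = {p: sum(w for t, w in weights.items() if t in s)
              for p, s in summaries.items()}
    return sorted((p for p in scores if scores[p] > 0), key=lambda p: -scores[p])


def score_doc(text, weights):
    """TFxIDF-style score: damped term frequency times weight, length-normalized."""
    tf = Counter(tokenize(text))
    length = sum(tf.values())
    if length == 0:
        return 0.0
    total = sum(w * (1 + math.log(tf[t])) for t, w in weights.items() if tf[t] > 0)
    return total / math.sqrt(length)


def search(query, peers, k=3, patience=2):
    """Contact peers in rank order; stop after `patience` peers add nothing new."""
    summaries = summarize(peers)
    weights = term_weights(set(tokenize(query)), summaries)
    top, misses = [], 0
    for peer in rank_peers(weights, summaries):
        hits = [(score_doc(text, weights), doc_id)
                for doc_id, text in peers[peer].items()]
        merged = sorted(top + [h for h in hits if h[0] > 0], reverse=True)[:k]
        misses = 0 if merged != top else misses + 1
        top = merged
        if misses >= patience:
            break
    return [doc_id for _, doc_id in top]


# Toy community: peer id -> {document id -> text}. Purely illustrative data.
PEERS = {
    "alice": {"a1": "peer to peer search", "a2": "cooking pasta recipes"},
    "bob": {"b1": "gossip protocols for peer communities"},
    "carol": {"c1": "gardening tips"},
}

print(search("peer search", PEERS))  # ['a1', 'b1']
```

In this sketch no peer needs the community's full inverted index: ranking peers uses only the per-peer term sets, and document scoring happens against the documents of the peers actually contacted. A real deployment would replace the sets with whatever compact summary the system uses and would tune `k` and `patience` empirically.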
