Making Gnutella-like P2P Systems Scalable
, 2003
Cited by 299 (1 self)
Napster pioneered the idea of peer-to-peer file sharing, and supported it with a centralized file search facility. Subsequent P2P systems like Gnutella adopted decentralized search algorithms. However, Gnutella's notoriously poor scaling led some to propose distributed hash table solutions to the wide-area file search problem. Contrary to that trend, we advocate retaining Gnutella's simplicity while proposing new mechanisms that greatly improve its scalability. Building upon prior research [1, 12, 22], we propose several modifications to Gnutella's design that dynamically adapt the overlay topology and the search algorithms in order to accommodate the natural heterogeneity present in most peer-to-peer systems. We test our design through simulations and the results show three to five orders of magnitude improvement in total system capacity. We also report on a prototype implementation and its deployment on a testbed.
Categories and Subject Descriptors: C.2 [Computer Communication Networks]: Distributed Systems
General Terms: Algorithms, Design, Performance, Experimentation
Keywords: Peer-to-peer, distributed hash tables, Gnutella
Network Applications of Bloom Filters: A Survey
- Internet Mathematics
, 2002
Cited by 257 (12 self)
Abstract. A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been used in database applications since the 1970s, but only in recent years have they become popular in the networking literature. The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
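The membership-query behaviour described in this abstract can be illustrated with a minimal sketch. The parameters `m` and `k`, the class name, and the use of salted SHA-256 as the hash family are assumptions made for illustration, not details taken from the survey. The false-positive estimate is the standard approximation (1 - e^(-kn/m))^k.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter with m bit positions and k hash functions."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m = m
        self.k = k
        # One byte per bit position, chosen for clarity rather than compactness.
        self.bits = bytearray(m)

    def _positions(self, item: str):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Standard approximation of the false-positive probability after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k


bf = BloomFilter()
for name in ["alpha", "beta", "gamma"]:
    bf.add(name)
print("alpha" in bf)
print(false_positive_rate(bf.m, bf.k, 3))
```

Inserted items are always reported as present. An item that was never inserted is usually reported as absent, but may occasionally be reported as present, which is the false-positive trade-off the abstract refers to.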
PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities
, 2003
Cited by 139 (11 self)
Abstract. We present PlanetP, a peer-to-peer (P2P) content search and retrieval infrastructure targeting communities wishing to share large sets of text documents. P2P computing is an attractive model for information sharing between ad hoc groups of users because of its low cost of entry and explicit model for resource scaling. As communities grow, however, a key challenge becomes finding relevant information. To address this challenge, our design centers around indexing, content search, and retrieval rather than scalable name-based object location, which has been the focus of recent P2P systems. PlanetP takes the novel approach of replicating the global directory and a compact summary index at every peer using gossiping. PlanetP then leverages this information to approximate a state-of-the-art document ranking algorithm to help users locate relevant information within the large communal data set. Using a prototype implementation together with simulation, we show: (i) it is possible to design a gossiping algorithm that reliably maintains a copy of communal state at each peer yet requires only a modest amount of bandwidth, (ii) our content search and retrieval algorithm tracks the performance of the original ranking algorithm very closely, giving P2P communities a search and retrieval algorithm as good as that possible assuming a centralized server, and (iii) PlanetP’s gossiping and search and retrieval algorithms both scale well to communities of at least several thousand peers.
On the Feasibility of Peer-to-Peer Web Indexing and Search
- In IPTPS’03
, 2003
Cited by 121 (11 self)
This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We present a simple feasibility analysis based on the resource constraints and search workload. Our study suggests that the peer-to-peer network does not have enough capacity to make naive use of either search technique attractive for Web search. The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. The paper suggests a number of compromises that might achieve the last order of magnitude.
A keyword-set search system for peer-to-peer networks
, 2002
Cited by 57 (0 self)
The Keyword-Set Search System (KSS) is a Peer-to-Peer (P2P) keyword search system that uses a distributed inverted index. The main challenge in a distributed index and search system is finding the right scheme to partition the index across the nodes in the network. The most obvious scheme would be to partition the index by keyword.
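The "obvious" scheme mentioned here, partitioning an inverted index by keyword, can be sketched as follows. This is a single-process illustration of that baseline only, not of KSS itself; `NUM_NODES`, `node_for`, `publish`, and `search` are hypothetical names, and hashing with SHA-1 modulo the node count stands in for whatever placement a real DHT would use.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical network size


def node_for(keyword: str) -> int:
    """Map a keyword to the node responsible for its posting list."""
    digest = hashlib.sha1(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES


# One inverted index per simulated node: keyword -> set of document ids.
nodes = [defaultdict(set) for _ in range(NUM_NODES)]


def publish(doc_id: str, text: str) -> None:
    """Insert each distinct word of the document at the node that owns it."""
    for word in set(text.lower().split()):
        nodes[node_for(word)][word].add(doc_id)


def search(query: str) -> set:
    """Fetch one posting list per query word and intersect them."""
    postings = [nodes[node_for(w)].get(w, set()) for w in query.lower().split()]
    return set.intersection(*postings) if postings else set()


publish("d1", "peer to peer search")
publish("d2", "web search engine")
print(search("peer search"))
```

A multi-word query touches one node per keyword and must move whole posting lists between nodes before intersecting them, which is the cost that motivates alternatives to plain keyword partitioning.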
Hybrid Global-Local Indexing for Efficient Peer-To-Peer Information Retrieval
, 2004
Cited by 52 (1 self)
Content-based full-text search still remains a particularly challenging problem in peer-to-peer (P2P) systems. Traditionally, there have been two index partitioning structures: partitioning based on the document space or partitioning based on keywords. The former requires search of every node in the system to answer a query whereas the latter transmits a large amount of data when processing multi-term queries. In this paper, we propose eSearch, a P2P keyword search system based on a novel hybrid indexing structure. In eSearch, each node is responsible for certain terms. Given a document, eSearch uses a modern information retrieval algorithm to select a small number of top (important) terms in the document and publishes the complete term list for the document to nodes responsible for those top terms. This selective replication of term lists allows a multi-term query to proceed local to the nodes responsible for query terms. We also propose automatic query expansion to alleviate the degradation of quality of search results due to the selective replication, overlay source multicast to reduce the cost of disseminating term lists, and techniques to balance term list distribution across nodes.
Design and Implementation Tradeoffs for Wide-Area Resource Discovery
- In Proceedings of 14th IEEE Symposium on High Performance, Research Triangle Park
, 2005
Cited by 51 (11 self)
We describe the design and implementation of SWORD, a scalable resource discovery service for wide-area distributed systems. In contrast to previous systems, SWORD allows users to describe desired resources as a topology of interconnected groups with required intra-group, inter-group, and per-node characteristics, along with the utility that the application derives from specified ranges of metric values. This design gives users the flexibility to find geographically distributed resources for applications that are sensitive to both node and network characteristics, and allows the system to rank acceptable configurations based on their quality for that application. Rather than evaluating a single implementation of SWORD, we explore a variety of architectural designs that deliver the required functionality in a scalable and highly-available manner. We discuss the tradeoffs of using a centralized architecture as compared to a fully decentralized design to perform wide-area resource discovery. To summarize our results, we found that a centralized architecture based on 4-node server cluster sites at network peering facilities outperforms a decentralized DHT-based resource discovery infrastructure with respect to query latency for all but the smallest number of sites. However, although a centralized architecture shows significant promise in stable environments, we find that our decentralized implementation has acceptable performance and also benefits from the DHT’s self-healing properties in more volatile environments. We evaluate the advantages and disadvantages of centralized and distributed resource discovery architectures on 1000 hosts in emulation and on approximately 200 PlanetLab nodes spread across the Internet.
Distributed Pagerank for P2P Systems
, 2003
Cited by 33 (5 self)
This paper defines and describes a fully distributed implementation of Google's highly effective Pagerank algorithm, for "peer to peer" (P2P) systems. The implementation is based on chaotic (asynchronous) iterative solution of linear systems. The P2P implementation also enables incremental computation of pageranks as new documents are entered into or deleted from the network. Incremental update enables continuously accurate pageranks whereas the currently centralized web crawl and computation over Internet documents requires several days. This suggests possible applicability of the distributed algorithm to pagerank computations as a replacement for the centralized web crawler based implementation for Internet documents. A complete solution of the distributed pagerank computation for an in-place network converges rapidly (1% accuracy in 10 iterations) for large systems although the time for an iteration may be long. The incremental computation resulting from addition of a single document converges extremely rapidly, typically requiring update path lengths of under 15 nodes even for large networks and very accurate solutions.
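To give a feel for iterative PageRank with in-place updates, here is a single-process sketch. It is not the paper's distributed protocol: it only mimics the asynchronous flavour by letting each node use whatever neighbour values are currently available within a sweep. The function name, the damping factor of 0.85, the tolerance, and the three-node graph are assumptions for illustration, and the sketch assumes every node has at least one outgoing link.

```python
def pagerank_in_place(links, damping=0.85, tol=1e-6, max_sweeps=100):
    """Iterate PageRank, overwriting each rank as soon as it is recomputed.

    links maps each node to the list of nodes it points to. Assumption: every
    node appears as a key and has at least one outgoing link.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    # Reverse the edges so each node can look up who links to it.
    incoming = {v: [] for v in nodes}
    for src, outs in links.items():
        for dst in outs:
            incoming[dst].append(src)

    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for v in nodes:
            new = (1 - damping) / n + damping * sum(
                rank[u] / len(links[u]) for u in incoming[v]
            )
            change = max(change, abs(new - rank[v]))
            rank[v] = new  # later nodes in this sweep already see the new value
        if change < tol:
            return rank, sweep
    return rank, max_sweeps


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks, sweeps = pagerank_in_place(graph)
print(ranks, sweeps)
```

In this small graph, node `c` receives links from both `a` and `b` and ends up with a higher rank than `b`, which is linked only from `a`.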
Minerva: Collaborative p2p search
- In VLDB
, 2005
Cited by 30 (16 self)
This paper proposes the live demonstration of a prototype of MINERVA, a novel P2P Web search engine. The search engine is layered on top of a DHT-based overlay network that connects an a-priori unlimited number of peers, each of which maintains a personal local database and a local search facility. Each peer posts a small amount of metadata to a physically distributed directory that is used to efficiently select promising peers from across the peer population that can best locally execute a query. The proposed demonstration serves as a proof of concept for P2P Web search by deploying the project on standard notebook PCs and also invites everybody to join the network by instantly installing a small piece of software from a USB memory stick.
An Adaptive Protocol for Efficient Support of Range Queries in DHT-based Systems
- In ICNP ’04: Proceedings of the 12th IEEE International Conference on Network Protocols (ICNP ’04)
, 2004
Cited by 25 (1 self)
In recent years, Distributed Hash Tables (DHTs) have been proposed as a fundamental building block for large scale distributed applications. Important functionalities such as searching have been added to the DHT’s basic lookup capability. However, supporting range queries efficiently remains a difficult problem. In this paper, we describe an adaptive mechanism that relies on a logical tree data structure, the Range Search Tree (RST), to support range queries efficiently. Nodes in the RST automatically group registrations based on their values. Queries are decomposed into a small number of sub-queries for efficient resolution. The system dynamically optimizes itself to minimize the registration and query cost based on observed load. The system is fully distributed and avoids bottleneck problems encountered in traditional tree-based systems. Extensive simulation results validate the effectiveness of the system.
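The idea of decomposing a range query into a small number of sub-queries over a logical tree can be sketched with a standard binary-tree (segment-tree style) decomposition. This is a generic technique used here for illustration; it is not the adaptive RST protocol from the paper, and the function name and the value domain [0, 16) are assumptions.

```python
def decompose(lo, hi, node_lo=0, node_hi=16):
    """Cover the integer range [lo, hi) with maximal nodes of a binary tree.

    The tree is defined implicitly over [node_lo, node_hi); each node covers a
    half-open interval and splits it at the midpoint. The domain [0, 16) is an
    illustrative assumption.
    """
    if lo >= node_hi or hi <= node_lo:
        return []  # no overlap with this node
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]  # node fully inside the query: one sub-query
    mid = (node_lo + node_hi) // 2
    return decompose(lo, hi, node_lo, mid) + decompose(lo, hi, mid, node_hi)


print(decompose(3, 11))
```

Each returned interval corresponds to one tree node, so a query over the range [3, 11) is answered by contacting four nodes instead of eight individual values.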

