Distributed Cache Table: Efficient Query-Driven Processing of Multi-Term Queries in P2P Networks
- In P2PIR
, 2006
Cited by 14 (5 self)
The state-of-the-art techniques for processing multi-term queries in P2P environments are query flooding and inverted list intersection. However, it has been shown that, for scalability reasons, both methods fail to support full-text search in large-scale document collections distributed among the nodes of a P2P network. Although a number of optimizations based on these techniques have been suggested recently, little evidence is given of their scalability. In this paper we suggest a novel query-driven indexing strategy which generates and maintains only those index entries that are actually used for query processing. In our approach, called Distributed Cache Table (DCT), we propose abandoning the distinction between data indexing and query caching and storing result sets (caches) for the most profitable queries. DCT employs a distributed index to efficiently locate caches that can answer a given multi-term query and broadcasts the query to all peers only if no such caches are found. Evaluations on real data and query loads show that DCT converges to a high cache-hit ratio and indeed offers a large-scale distributed solution for storing and efficiently querying vast amounts of documents in the P2P setting. DCT achieves a two-orders-of-magnitude improvement in traffic consumption compared to a standard distributed single-term indexing approach.
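The lookup-then-broadcast behaviour described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a local dictionary stands in for the distributed index, the class name `QueryCacheIndex` and the `broadcast` callback are hypothetical, caches are matched only on the exact term set, and every broadcast result is cached rather than only the "most profitable" queries.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


class QueryCacheIndex:
    """Toy stand-in for a query-driven index: result sets keyed by term set."""

    def __init__(self, broadcast: Callable[[FrozenSet[str]], List[str]]) -> None:
        # `broadcast` is a hypothetical callback that asks all peers for results.
        self._broadcast = broadcast
        self._caches: Dict[FrozenSet[str], List[str]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, terms: Iterable[str]) -> List[str]:
        # Normalise the query so that term order and case do not matter.
        key = frozenset(t.lower() for t in terms)
        if key in self._caches:
            # A cache exists for this term set: answer without flooding.
            self.hits += 1
            return self._caches[key]
        # No cache found: fall back to broadcasting, then store the result set.
        self.misses += 1
        results = self._broadcast(key)
        self._caches[key] = results
        return results
```

Repeating a query with the same terms in a different order is served from the cache, so the hit ratio rises as the query load repeats itself.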
Efficient and decentralized pagerank approximation in a peer-to-peer web search network
- In VLDB
, 2006
Cited by 14 (4 self)
PageRank-style (PR) link analyses are a cornerstone of Web search engines and Web mining, but they are computationally expensive. Recently, various techniques have been proposed for speeding up these analyses by distributing the link graph among multiple sites. However, none of these advanced methods is suitable for a fully decentralized PR computation in a peer-to-peer (P2P) network with autonomous peers, where each peer can independently crawl Web fragments according to the user’s thematic interests. In such a setting the graph fragments that different peers have locally available or know about may arbitrarily overlap among peers, creating additional complexity for the PR computation. This paper presents the JXP algorithm for dynamically and collaboratively computing PR scores of Web pages that are arbitrarily distributed in a P2P network. The algorithm runs at every peer, and it works by combining locally computed PR scores with random meetings among the peers in the network. It is scalable as the number of peers on the network grows, and experiments as well as theoretical arguments show that JXP scores converge to the true PR scores that one would obtain by a centralized computation.
p2pDating: Real Life Inspired Semantic Overlay Networks for Web Search
- Inf. Process. Manage
, 2005
Cited by 13 (3 self)
We consider a network of autonomous peers forming a logically global but physically distributed search engine, where every peer has its own local collection generated by independently crawling the web. A challenging task in such systems is to efficiently route user queries to peers that can deliver high-quality results and to rank the returned results, thus satisfying the users’ information needs. However, the problem inherent in this scenario is selecting a few promising peers out of an a priori unlimited number of peers. In recent research a rather strict notion of semantic overlay networks has been established. In most approaches, peers are squeezed into a semantic profile by clustering them based on their contents. In the spirit of the natural notion of autonomous peers participating in a P2P system, our strategy creates semantic overlay networks based on the notion of “peer-to-peer dating”: peers are free to decide which connections they create and which they want to avoid, based on various usefulness estimators. The proposed techniques can be easily integrated into existing systems, as they require only small additional bandwidth consumption because most messages can be piggybacked onto established communication. We show how we can greatly benefit from these additional semantic relations during query routing in search engines, such as MINERVA, and in the JXP algorithm, which computes the PageRank authority measure in a completely decentralized manner.
The MINERVA project: Database selection in the context of P2P search
- In: BTW 2005
, 2005
Cited by 12 (9 self)
This paper presents the MINERVA project, which prototypes a distributed search engine based on P2P techniques. MINERVA is layered on top of a Chord-style overlay network and uses a powerful crawling, indexing, and search engine on every autonomous peer. We formalize our system model and identify the problem of efficiently selecting promising peers for a query as a pivotal issue. We revisit existing approaches to the database selection problem and adapt them to our system environment. Measurements are performed to compare different selection strategies using real-world data. The experiments show significant performance differences between the strategies and demonstrate the importance of a judicious peer selection strategy. The experiments also present first evidence that a small number of carefully selected peers already provide the vast majority of all relevant results.
Emerging semantic communities in peer web search
- In P2PIR ’06: Proceedings of the international workshop on Information retrieval in peer-to-peer networks
, 2006
Cited by 11 (2 self)
Peer network systems are becoming an increasingly important development in Web search technology. Many studies show that peer search systems perform better when a query is sent to a group of peers semantically similar to the query. This suggests that semantic communities should form so that a query can quickly propagate to many appropriate peers. For the network to be functional, its dynamic communication topology must match the semantic clustering of peers. We introduce two criteria to evaluate a peer search network based on the concept of semantic locality: first, the “small-world” topology of the network; second, topical semantic similarity, which we use to monitor the quality of a peer’s neighbors over time by checking whether a peer chooses semantically appropriate neighbors to route its queries. We present several simulation experiments conducted with different peer search algorithms on our peer Web search system, 6S. The results suggest that 6S, despite its use of an unstructured overlay network, can effectively foster the spontaneous formation of semantic communities through local peer interactions alone.
Beyond term indexing: A P2P framework for web information retrieval
- Informatica
, 2006
Cited by 9 (5 self)
Web search over peer-to-peer (P2P) networks shows promise as an alternative to state-of-the-art search engines, since P2P overlays offer a means for decentralized search across widely distributed document collections. However, the design of effective techniques for P2P indexing and retrieval raises a number of technical challenges due to potentially unscalable resource (e.g., bandwidth, storage) consumption. The paper presents a framework for full-text information retrieval in structured P2P networks and introduces a novel retrieval model based on highly discriminative keys (terms and term sets appearing in a restricted number of documents) that ensure efficient and scalable retrieval. Our goal is to design scalable techniques for building a global key index in structured P2P overlays for large document collections. We present experimental results that show acceptable indexing and retrieval costs, while the retrieval quality is comparable to standard centralized solutions with the BM25 relevance computation scheme. Summary (translated from Slovenian): A P2P framework for Internet search engines has been developed.
Global document frequency estimation in peer-to-peer web search
- In WebDB
, 2006
Cited by 9 (4 self)
Information retrieval (IR) in peer-to-peer (P2P) networks, where the corpus is spread across many loosely coupled peers, has recently gained importance. In contrast to IR systems on a centralized server or server farm, P2P IR faces the additional challenge of either being oblivious to global corpus statistics or having to compute the global measures from local statistics at the individual peers in an efficient, distributed manner. One specific measure of interest is the global document frequency for different terms, which would be very beneficial as term-specific weights in the scoring and ranking of merged search results that have been obtained from different peers. This paper presents an efficient solution for the problem of estimating global document frequencies in a large-scale P2P network with very high dynamics, where peers can join and leave the network on short notice. In particular, the developed method takes into account the fact that the local document collections of autonomous peers may arbitrarily overlap, so that global counting needs to be duplicate-insensitive. The method is based on hash sketches as a technique for compact data synopses. Experimental studies demonstrate the estimator’s accuracy, scalability, and ability to cope with high dynamics. Moreover, the benefit for ranking P2P search results is shown by experiments with real-world Web data and queries.
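To make the idea of duplicate-insensitive counting concrete, here is a minimal sketch of a Flajolet-Martin-style hash sketch. It is an illustration under stated assumptions, not the paper's method: the abstract names only "hash sketches", so the single 32-bit bitmap, the SHA-1 hash, and the function names `sketch`, `merge`, and `estimate` are choices made for this example.

```python
import hashlib
from typing import Iterable

PHI = 0.77351  # correction constant from the Flajolet-Martin analysis
BITS = 32      # bitmap width assumed for this example


def _rho(doc_id: str) -> int:
    """Return the position of the least significant 1-bit of the hashed ID."""
    h = int.from_bytes(hashlib.sha1(doc_id.encode("utf-8")).digest()[:4], "big")
    if h == 0:
        return BITS - 1
    return (h & -h).bit_length() - 1


def sketch(doc_ids: Iterable[str]) -> int:
    """Build a bitmap; adding the same document twice sets the same bit."""
    bitmap = 0
    for doc_id in doc_ids:
        bitmap |= 1 << _rho(doc_id)
    return bitmap


def merge(a: int, b: int) -> int:
    """Combine two peers' sketches; documents held by both are not double counted."""
    return a | b


def estimate(bitmap: int) -> float:
    """Estimate the distinct count from the position of the lowest unset bit."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / PHI
```

Because merging is a bitwise OR, the sketch of two overlapping collections equals the sketch of their union. A single bitmap gives a high-variance estimate, so practical use averages over many bitmaps.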
Load reduction in the kad peer-to-peer system
- In Fifth International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P)
, 2007
Cited by 9 (4 self)
Distributed hash tables (DHTs) have been actively studied in the literature, and many different proposals have been made on how to organize peers in a DHT. However, very few DHTs have been implemented in real systems and deployed on a large scale. One exception is kad, a DHT based on Kademlia, which is part of eDonkey, a peer-to-peer file sharing system with several million simultaneous users. In this paper, we investigate the publishing and searching mechanisms in kad. We designed and implemented Mistral, a content spy that can capture up to ten million references to published content in several hours. At first evaluation, we notice that publishing new content in a kad system is much more expensive than searching and retrieving existing content. Indeed, measurements show that of all the Internet traffic generated by kad-based peer-to-peer networks, 90% is for publishing and 10% for retrieving.
P2P Content Search: Give the Web Back to the People
Cited by 8 (4 self)
The proliferation of peer-to-peer (P2P) systems has come with various compelling applications, including file sharing based on distributed hash tables (DHTs) or other kinds of overlay networks. Searching the content of files (especially Web search) requires multi-keyword querying with scoring and ranking. Existing approaches have no way of taking into account the correlation between the keywords in the query. This paper presents our solution, which incorporates the queries and behavior of the users in the P2P network so that interesting correlations can be inferred.

