ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
- In WebDB, 2003
Cited by 100 (3 self)
this paper appears in [15], and updated information is available at http://cis.poly.edu/westlab/odissea/
NeuroGrid: Semantically Routing Queries in Peer-to-Peer Networks
- In Proc. Intl. Workshop on Peer-to-Peer Computing, 2002
Cited by 58 (1 self)
NeuroGrid is an adaptive decentralized search system. NeuroGrid nodes support distributed search through semantic routing (forwarding of queries based on content), and a learning mechanism that dynamically adjusts metadata describing the contents of nodes and the files that make up those contents. NeuroGrid is an open-source project, and prototype software has been made available at http://www.neurogrid.net/. NeuroGrid presents users with an alternative to hierarchical, folder-based file organization, and in the process offers an alternative approach to distributed search.
Distributed Pagerank for P2P Systems
- 2003
Cited by 50 (7 self)
This paper defines and describes a fully distributed implementation of Google's highly effective Pagerank algorithm for "peer to peer" (P2P) systems. The implementation is based on chaotic (asynchronous) iterative solution of linear systems. The P2P implementation also enables incremental computation of pageranks as new documents are entered into or deleted from the network. Incremental update enables continuously accurate pageranks, whereas the currently centralized web crawl and computation over Internet documents requires several days. This suggests possible applicability of the distributed algorithm to pagerank computations as a replacement for the centralized, web-crawler-based implementation for Internet documents. A complete solution of the distributed pagerank computation for an in-place network converges rapidly (1% accuracy in 10 iterations) for large systems, although the time for an iteration may be long. The incremental computation resulting from addition of a single document converges extremely rapidly, typically requiring update path lengths of under 15 nodes even for large networks and very accurate solutions.
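As a rough illustration of the chaotic (asynchronous) iteration idea in the abstract above, the sketch below updates one randomly chosen node at a time, using whatever rank values its in-neighbours currently hold, instead of sweeping all nodes in lockstep. The toy graph, the damping factor of 0.85, the update count, and the function name `async_pagerank` are assumptions made for illustration; none of them come from the paper.

```python
import random

def async_pagerank(out_links, damping=0.85, updates=20000, seed=0):
    """Asynchronous PageRank sketch (illustrative, not the paper's code).

    out_links maps each node to the list of nodes it links to.
    Assumption: every node has at least one out-link.
    """
    rng = random.Random(seed)
    nodes = list(out_links)
    n = len(nodes)
    # Build the reverse adjacency: which nodes link to each node.
    in_links = {v: [] for v in nodes}
    for u, targets in out_links.items():
        for v in targets:
            in_links[v].append(u)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(updates):
        # One node recomputes its rank from the latest values it can see;
        # there is no global synchronisation step between updates.
        v = rng.choice(nodes)
        incoming = sum(rank[u] / len(out_links[u]) for u in in_links[v])
        rank[v] = (1.0 - damping) / n + damping * incoming
    return rank

# Hypothetical three-document network.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = async_pagerank(graph)
```

Because each update only reads the ranks of a node's in-neighbours, the same loop body could in principle run independently on each peer, which is the property the abstract relies on.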
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
- 2005
Cited by 28 (1 self)
We study the problem of evaluating ranked (top-k) queries on textual collections ranging from multiple gigabytes to terabytes in size. We focus on the case of a global index organization in a highly distributed environment, and consider a class of ranking functions that includes common variants of the Cosine and Okapi measures. The main bottleneck in such a scenario is the amount of communication required during query evaluation. We propose several efficient query evaluation schemes and evaluate their performance. Our results on real search engine query traces and over 120 million web pages show that after careful optimization such queries can be evaluated at a reasonable cost, while challenges remain for even larger collections and more general classes of ranking functions.
So-Grid: A self-organizing grid featuring bio-inspired algorithms
- ACM Transactions on Autonomous and Adaptive Systems
Cited by 5 (2 self)
This paper presents So-Grid, a set of bio-inspired algorithms tailored to the decentralized construction of a Grid information system which features adaptive and self-organizing characteristics. Such algorithms exploit the properties of swarm systems, in which a number of entities/agents perform simple operations at the local level, but together engender an advanced form of “swarm intelligence” at the global level. In particular, So-Grid provides two main functionalities: logical reorganization of resources, inspired by the behavior of some species of ants and termites which move and collect items within their environment, and resource discovery, inspired by the mechanisms through which ants searching for food sources are able to follow the pheromone traces left by other ants. These functionalities are correlated, since an intelligent dissemination can facilitate discovery. In the Grid environment, a number of ant-like agents autonomously travel the Grid through P2P interconnections and use biased probability functions to: (i) replicate resource descriptors in order to favor resource discovery; (ii) collect resource descriptors with similar characteristics in nearby Grid hosts; (iii) foster the dissemination of descriptors corresponding to fresh (recently updated) resources and to resources having high Quality of Service (QoS) characteristics. Simulation analysis shows that the So-Grid replication algorithm is capable of reducing the entropy of the system and efficiently disseminating content. Moreover, as descriptors are progressively reorganized and replicated, the So-Grid discovery algorithm allows users to reach Grid hosts that store information about a larger number of useful resources in a shorter amount of time. The proposed approach features interesting characteristics, i.e., self-organization, scalability and adaptivity, which make it useful for a dynamic and partially unreliable distributed system.
P2P MetaData Search Layers
Cited by 3 (0 self)
Distributed hash tables (DHTs) provide a scalable method of associating file hashes with a particular location in a distributed network environment. Modifying DHTs directly to support meta-data is difficult, and meta-data search systems such as flooding tend to scale poorly. However, a number of more scalable distributed meta-data search systems have recently been developed that could be deployed in tandem with DHTs. Several are discussed here, along with some novel simulation results that concern the scalability and resource limitations of a meta-data search layer that employs semantic routing. Semantic routing is a method of pruning a flooding search such that queries are preferentially forwarded to nodes that can answer those queries. Previous simulations [9] showed that under certain circumstances semantic routing leads to a reduction in search path length. This paper presents further simulation results indicating that the scalability of this effect is a function of the query distribution of individual user search activity.
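The pruning idea described above (forward a query preferentially to nodes likely to answer it, rather than to every neighbour) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the routing scheme of any system in this listing: the `Node` class, the per-keyword success counter `learned`, the `fanout` parameter, and the `search` function are all hypothetical names.

```python
from collections import defaultdict

class Node:
    """Hypothetical peer holding some keywords and a learned routing table."""

    def __init__(self, name, keywords):
        self.name = name
        self.keywords = set(keywords)   # content this node can answer for
        self.neighbours = []
        # learned[keyword][neighbour_name] = count of past successful answers
        self.learned = defaultdict(lambda: defaultdict(int))

    def choose_next_hops(self, keyword, fanout):
        """Rank neighbours by past success for this keyword, keep the top few."""
        ranked = sorted(
            self.neighbours,
            key=lambda nb: self.learned[keyword][nb.name],
            reverse=True,
        )
        return ranked[:fanout]

def search(start, keyword, ttl=3, fanout=1):
    """Return (names of nodes holding the keyword, number of messages sent)."""
    hits, messages = set(), 0
    visited = {start.name}
    frontier = [start]
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for nb in node.choose_next_hops(keyword, fanout):
                if nb.name in visited:
                    continue
                visited.add(nb.name)
                messages += 1
                if keyword in nb.keywords:
                    hits.add(nb.name)
                    node.learned[keyword][nb.name] += 1  # reinforce this route
                next_frontier.append(nb)
        frontier = next_frontier
    return hits, messages

# Toy network: only node d holds the keyword "jazz".
a, b, c, d = Node("a", []), Node("b", []), Node("c", []), Node("d", ["jazz"])
a.neighbours = [b, c, d]
flood_hits, flood_msgs = search(a, "jazz", ttl=1, fanout=3)    # flood, and learn
routed_hits, routed_msgs = search(a, "jazz", ttl=1, fanout=1)  # pruned search
```

In this toy run the first query contacts all three neighbours, while the second reaches the same answer with a single message because node `a` has learned that `d` answers "jazz" queries.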
Similarity Discovery in Structured P2P Overlays
- In Proceedings of the 32nd International Conference on Parallel Processing (ICPP’03), 2003
Cited by 3 (0 self)
Peer-to-peer (P2P) overlays are appealing, since they can aggregate resources of end systems without relying on sophisticated infrastructures. Services can thus be rapidly deployed over such overlays. Primitive P2P overlays only support searches with single keywords. For queries with multiple keywords, presently only unstructured P2P systems offer support, and they do so by extensively employing message flooding. In this study, we propose a similarity information retrieval system called Meteorograph for structured P2P overlays that does not rely on message flooding. Meteorograph is fault-resilient, scalable, responsive and self-administrative, which makes it particularly suitable for an environment with an explosion of information and a large number of dynamic entities. An information item stored in Meteorograph is represented as a vector. A small angle between two vectors means that the corresponding items are characterized by some identical keywords. Meteorograph further stores similar items at nearby locations in the P2P overlay. To retrieve similar items, only nodes in nearby locations are located and consulted. Meteorograph is evaluated with simulation. The results show that Meteorograph can effectively distribute loads to the nodes. Discovering a single item and a set (of size k) of similar items takes O(log N) and (k/c) · O(log N) messages and hops respectively, where N is the number of nodes in the overlay and c is the storage capacity of a node.
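To make the "small angle between two vectors" notion above concrete, here is a minimal sketch that represents items as keyword-weight vectors and computes the angle between them. The dictionary representation, the unit weights, the example documents, and the function name `angle_between` are assumptions for illustration only; the abstract does not say how Meteorograph builds or weights its vectors.

```python
import math

def angle_between(u, v):
    """Angle in radians between two sparse keyword-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return math.pi / 2  # assumption: treat an empty vector as unrelated
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

# Hypothetical items described by their keywords.
doc_a = {"peer": 1.0, "overlay": 1.0, "search": 1.0}
doc_b = {"peer": 1.0, "overlay": 1.0, "routing": 1.0}
doc_c = {"protein": 1.0, "folding": 1.0}

angle_ab = angle_between(doc_a, doc_b)  # two shared keywords: smaller angle
angle_ac = angle_between(doc_a, doc_c)  # no shared keywords: right angle
```

Items with overlapping keywords end up with a smaller angle, so a scheme that places items by such a measure would tend to put `doc_a` and `doc_b` closer together than `doc_a` and `doc_c`.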
Towards Peer-to-Peer Content Indexing
- 2003
Cited by 1 (1 self)
Distributed Hash Tables are the core technology in a significant share of system designs for Peer-to-Peer information sharing. Typically, a location mechanism is provided and object identifiers act as keys in the index of object locations. When introducing a search mechanism where single words are used as keys, the key image cardinality will be driven by the word popularity, and most of the present designs will be unable to load balance the index among the nodes. We present two contributions: a design that allows participating nodes to load balance the indexing of popular keys and avoid content hot-spots on single nodes; and a distributed mechanism for probabilistic filtering of popular keys (with low search relevance) that paves the way for scalable full content indexing.
Censorship-Resistant Communication over Public Networks
- 2006
Cited by 1 (0 self)
The rapid growth of peer-to-peer networks and social networking websites has demonstrated the internet’s potential as a medium for grassroots collaboration. This report describes ongoing research into the use of friend-to-friend overlay networks for censorship-resistant communication. Decentralised mechanisms for
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
We study the problem of evaluating ranked (top-k) queries on textual collections ranging from multiple gigabytes to terabytes in size. We focus on the case of a global index organization in a highly distributed environment, and consider a class of ranking functions that includes common variants of the Cosine and Okapi measures. The main bottleneck in such a scenario is the amount of communication required during query evaluation. We propose several efficient query evaluation schemes and evaluate their performance. Our results on real search engine query traces and over 120 million web pages show that after careful optimization such queries can be evaluated at a reasonable cost, while challenges remain for even larger collections and more general classes of ranking functions.