FASD: A fault-tolerant, adaptive, scalable, distributed search engine (0)

by A Z Kronfol

Results 1 - 10 of 17

ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval

by Torsten Suel, Chandan Mathur, Jo-wen Wu, Jiangong Zhang, Alex Delis, Mehdi Kharrazi, Xiaohui Long, Kulesh Shanmugasundaram - In WebDB , 2003
Abstract - Cited by 100 (3 self)
this paper appears in [15], and updated information is available at http://cis.poly.edu/westlab/odissea/

Citation Context

..., and thus they are unlikely to scale beyond a few hundred nodes. There have been attempts to overcome this issue by routing queries only to nodes likely to have good results or those in the vicinity [4, 6, 13]. However, we do not believe that this approach will work well if result quality is a major concern. To see this, consider the current web, where an approach based on local indexes at each site would ...

NeuroGrid: Semantically Routing Queries in Peer-to-Peer Networks

by Sam Joseph - In Proc. Intl. Workshop on Peer-to-Peer Computing , 2002
Abstract - Cited by 58 (1 self)
NeuroGrid is an adaptive decentralized search system. NeuroGrid nodes support distributed search through semantic routing (forwarding of queries based on content), and a learning mechanism that dynamically adjusts metadata describing the contents of nodes and the files that make up those contents. NeuroGrid is an open-source project, and prototype software has been made available at http://www.neurogrid.net/. NeuroGrid presents users with an alternative to hierarchical, folder-based file organization, and in the process offers an alternative approach to distributed search.

Citation Context

...g a desire to move beyond the search capabilities of the first generation of P2P software. One possible P2P meta-data approach is to try and use Chord to store keyword-document relations [19]. Kronfol [10] suggests that under this scheme popular query terms would drive excessive traffic to certain nodes. As an alternative Kronfol describes and simulates FASD, which adds keyword searching to the Freenet...

Distributed Pagerank for P2P Systems

by Karthikeyan Sankaralingam, Simha Sethumadhavan, James C. Browne , 2003
Abstract - Cited by 50 (7 self)
This paper defines and describes a fully distributed implementation of Google's highly effective Pagerank algorithm, for "peer to peer" (P2P) systems. The implementation is based on chaotic (asynchronous) iterative solution of linear systems. The P2P implementation also enables incremental computation of pageranks as new documents are entered into or deleted from the network. Incremental update enables continuously accurate pageranks, whereas the currently centralized web crawl and computation over Internet documents requires several days. This suggests possible applicability of the distributed algorithm to pagerank computations as a replacement for the centralized web crawler based implementation for Internet documents. A complete solution of the distributed pagerank computation for an in-place network converges rapidly (1% accuracy in 10 iterations) for large systems, although the time for an iteration may be long. The incremental computation resulting from addition of a single document converges extremely rapidly, typically requiring update path lengths of under 15 nodes even for large networks and very accurate solutions.
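
The idea of a chaotic (asynchronous) iteration mentioned in this abstract can be illustrated with a small single-process sketch. This is not the paper's distributed implementation: the function name `async_pagerank`, the damping factor of 0.85, the random update order, and the assumption that every document has at least one outgoing link are all choices made here for illustration.

```python
import random

def async_pagerank(links, damping=0.85, sweeps=100, seed=0):
    """Approximate PageRank by updating one document at a time.

    `links` maps each document to the list of documents it links to.
    Assumption for this sketch: every document appears as a key and has
    at least one outgoing link (no dangling documents).
    """
    rng = random.Random(seed)
    nodes = list(links)
    n = len(nodes)

    # Build the reverse index: who links to each document.
    incoming = {v: [] for v in nodes}
    for u, outs in links.items():
        for v in outs:
            incoming[v].append(u)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(sweeps):
        order = nodes[:]
        rng.shuffle(order)  # no fixed schedule, mimicking unsynchronized peers
        for v in order:
            # Each update immediately uses the latest values of its neighbors,
            # rather than waiting for a global synchronized round.
            rank[v] = (1 - damping) / n + damping * sum(
                rank[u] / len(links[u]) for u in incoming[v]
            )
    return rank

ranks = async_pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

In a real P2P setting each peer would hold only its own documents and exchange rank updates by message; the in-place update inside the loop is the part that corresponds to the asynchronous scheme.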

Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment

by Jiangong Zhang, Torsten Suel , 2005
Abstract - Cited by 28 (1 self)
We study the problem of evaluating ranked (top-k) queries on textual collections ranging from multiple gigabytes to terabytes in size. We focus on the case of a global index organization in a highly distributed environment, and consider a class of ranking functions that includes common variants of the Cosine and Okapi measures. The main bottleneck in such a scenario is the amount of communication required during query evaluation. We propose several efficient query evaluation schemes and evaluate their performance. Our results on real search engine query traces and over 120 million web pages show that after careful optimization such queries can be evaluated at a reasonable cost, while challenges remain for even larger collections and more general classes of ranking functions.

So-grid: A self-organizing grid featuring bio-inspired algorithms

by Agostino Forestiero, Carlo Mastroianni, Giandomenico Spezzano - ACM Transactions on Autonomous and Adaptive Systems
Abstract - Cited by 5 (2 self)
This paper presents So-Grid, a set of bio-inspired algorithms tailored to the decentralized construction of a Grid information system which features adaptive and self-organization characteristics. Such algorithms exploit the properties of swarm systems, in which a number of entities/agents perform simple operations at the local level, but together engender an advanced form of “swarm intelligence” at the global level. In particular, So-Grid provides two main functionalities: logical reorganization of resources, inspired by the behavior of some species of ants and termites which move and collect items within their environment, and resource discovery, inspired by the mechanisms through which ants searching for food sources are able to follow the pheromone traces left by other ants. These functionalities are correlated, since an intelligent dissemination can facilitate discovery. In the Grid environment, a number of ant-like agents autonomously travel the Grid through P2P interconnections and use biased probability functions to: (i) replicate resource descriptors in order to favor resource discovery; (ii) collect resource descriptors with similar characteristics in nearby Grid hosts; (iii) foster the dissemination of descriptors corresponding to fresh (recently updated) resources and to resources having high Quality of Service (QoS) characteristics. Simulation analysis shows that the So-Grid replication algorithm is capable of reducing the entropy of the system and efficiently disseminating content. Moreover, as descriptors are progressively reorganized and replicated, the So-Grid discovery algorithm allows users to reach Grid hosts that store information about a larger number of useful resources in a shorter amount of time. The proposed approach features interesting characteristics, i.e., self-organization, scalability and adaptivity, which make it useful for a dynamic and partially unreliable distributed system.

P2P MetaData Search Layers

by Sam Joseph
Abstract - Cited by 3 (0 self)
Distributed Hashtables (DHTs) provide a scalable method of associating file-hashes with a particular location in a distributed network environment. Modifying DHTs directly to support meta-data is difficult, and meta-data search systems such as flooding tend to scale poorly. However, a number of more scalable distributed meta-data search systems have recently been developed that could be deployed in tandem with DHTs, and several are discussed here along with some novel simulation results that concern the scalability and resource limitations of a meta-data search layer that employs semantic routing. Semantic routing is a method of pruning a flooding search such that queries are preferentially forwarded to nodes that can answer those queries. Previous simulations [9] showed that under certain circumstances semantic routing leads to a reduction in search path length. This paper presents further simulation results indicating that the scalability of this effect is a function of the query distribution of individual user search activity.

Citation Context

...ent the WHAT stage. Merging stages may be necessary in some cases, however to the extent that they are separable they can be implemented by entirely different systems. For example, one might use FASD [12] to identify a file from keyword meta-data, use Chord [20] to work out the location of the file itself, and then BitTorrent [5] to actually download it. Distributed Hashtables such as Chord and CAN [1...
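
The abstract above describes semantic routing as pruning a flooding search so that queries go preferentially to nodes that can answer them. A toy sketch of that idea follows. It is not the algorithm of NeuroGrid or of this paper: the `Node` class, the per-keyword `hints` table, and the `ttl` and `fanout` parameters are hypothetical names introduced here, and the learning rule (remember the next hop on a successful path) is one simple assumption among many possible ones.

```python
from collections import defaultdict

class Node:
    def __init__(self, name, keywords=()):
        self.name = name
        self.keywords = set(keywords)   # keywords this node can answer
        self.neighbors = []
        self.hints = defaultdict(set)   # keyword -> names of neighbors that led to an answer

def search(start, keyword, ttl=4, fanout=1):
    """Breadth-limited search; returns (answering node or None, messages sent)."""
    frontier = [(start, [start])]
    visited = {start.name}
    messages = 0
    for _ in range(ttl + 1):
        next_frontier = []
        for node, path in frontier:
            if keyword in node.keywords:
                # Learning step: each node on the path remembers its next hop.
                for prev, nxt in zip(path, path[1:]):
                    prev.hints[keyword].add(nxt.name)
                return node, messages
            candidates = [n for n in node.neighbors if n.name not in visited]
            # Hinted neighbors sort first; the sort is stable for the rest.
            candidates.sort(key=lambda n: n.name not in node.hints[keyword])
            for n in candidates[:fanout]:   # pruning: forward to at most `fanout` peers
                visited.add(n.name)
                messages += 1
                next_frontier.append((n, path + [n]))
        frontier = next_frontier
    return None, messages

hub, a, b, c = Node("hub"), Node("a"), Node("b"), Node("c", ["jazz"])
hub.neighbors = [a, b, c]
for leaf in (a, b, c):
    leaf.neighbors = [hub]

print(search(hub, "jazz", ttl=1, fanout=3)[1])  # first query floods all neighbors
print(search(hub, "jazz", ttl=1, fanout=1)[1])  # later query follows the learned hint
```

With `fanout` equal to the number of neighbors the search degenerates to flooding; a smaller `fanout` only pays off once the hint tables hold useful entries, which is why the scalability of the effect depends on how queries are distributed.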

Similarity Discovery in Structured P2P Overlays

by Hung-chang Hsiao, Chung-ta King - In Proceedings of the 32nd International Conference on Parallel Processing (ICPP’03), 2003
Abstract - Cited by 3 (0 self)
Peer-to-peer (P2P) overlays are appealing, since they can aggregate resources of end systems without relying on sophisticated infrastructures. Services can thus be rapidly deployed over such overlays. Primitive P2P overlays only support searches with single keywords. For queries with multiple keywords, presently only unstructured P2P systems can support them, by extensively employing message flooding. In this study, we propose a similarity information retrieval system called Meteorograph for structured P2P overlays without relying on message flooding. Meteorograph is fault-resilient, scalable, responsive and self-administrative, which is particularly suitable for an environment with an explosion of information and a large number of dynamic entities. An information item stored in Meteorograph is represented as a vector. A small angle between two vectors means that the corresponding items are characterized by some identical keywords. Meteorograph further stores similar items at nearby locations in the P2P overlay. To retrieve similar items, only nodes in nearby locations are located and consulted. Meteorograph is evaluated with simulation. The results show that Meteorograph can effectively distribute loads to the nodes. Discovering a single item and a set (in size k) of similar items takes $O(\log N)$ and $\frac{k}{c} \cdot O(\log N)$ messages and hops respectively, where N is the number of nodes in the overlay and c is the storage capacity of a node.

Citation Context

...e and results. Based on the vector space model, [7] can provide similarity search. It, however, is based on an unstructured overlay and relies on a flooding mechanism. Another interesting study, FASD [13], atop an unstructured P2P overlay (Freenet [3]), implements similarity discovery based on the concept of the vector space model. Basically, Freenet is a depth-first search network (in contrast to Gnutell...
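
Both the Meteorograph abstract and this citation context rely on the vector space model, where a small angle between two item vectors indicates shared keywords. A minimal sketch of that measure is given below. The sparse dictionary representation, the function name `angle_between`, and the example keyword weights are assumptions made for illustration, not details taken from either system.

```python
import math

def angle_between(a, b):
    """Angle in radians between two sparse keyword-weight vectors.

    Each vector is a dict mapping keyword -> non-negative weight.
    By convention in this sketch, an empty vector is treated as orthogonal.
    """
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return math.pi / 2
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

doc = {"peer": 2.0, "search": 1.0}
similar = {"peer": 1.0, "search": 1.0, "overlay": 0.5}
unrelated = {"pheromone": 1.0, "ants": 1.0}

# Items sharing keywords yield a smaller angle than items sharing none.
print(angle_between(doc, similar) < angle_between(doc, unrelated))
```

A system that places similar items at nearby overlay locations needs some mapping from such vectors to overlay keys; that mapping is specific to each design and is not sketched here.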

Towards Peer-to-Peer Content Indexing

by Carlos Baquero, Nuno Lopes , 2003
Abstract - Cited by 1 (1 self)
Distributed Hash Tables are the core technology on a significant share of system designs for Peer-to-Peer information sharing. Typically, a location mechanism is provided and object identifiers act as keys in the index of object locations. When introducing a search mechanism, where single words are used as keys, the key image cardinality will be driven by the word popularity and most of the present designs will be unable to load balance the index among the nodes. We present two contributions: A design that allows participating nodes to load balance the indexing of popular keys and avoid content hot-spots on single nodes; A distributed mechanism for probabilistic filtering of popular keys (with low search relevance) that paves the way for scalable full content indexing.

Censorship-Resistant Communication over Public Networks

by Michael Rogers , 2006
Abstract - Cited by 1 (0 self)
The rapid growth of peer-to-peer networks and social networking websites has demonstrated the internet’s potential as a medium for grassroots collaboration. This report describes ongoing research into the use of friend-to-friend overlay networks for censorship-resistant communication. Decentralised mechanisms for

Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment

by unknown authors
Abstract
We study the problem of evaluating ranked (top-k) queries on textual collections ranging from multiple gigabytes to terabytes in size. We focus on the case of a global index organization in a highly distributed environment, and consider a class of ranking functions that includes common variants of the Cosine and Okapi measures. The main bottleneck in such a scenario is the amount of communication required during query evaluation. We propose several efficient query evaluation schemes and evaluate their performance. Our results on real search engine query traces and over 120 million web pages show that after careful optimization such queries can be evaluated at a reasonable cost, while challenges remain for even larger collections and more general classes of ranking functions.

Citation Context

... scale beyond a few million documents. In the local index case, there are some optimizations that may allow us to answer many queries by only contacting a carefully chosen subset of “promising” nodes [13, 25, 11, 20, 6]. In this paper we assume a global index organization. As we describe later, there are several techniques that allow query execution in this case without transmitting complete inverted lists, and thes...
