Results 1 -
5 of
5
PCIR: Combining DHTs and Peer Clusters for Efficient Full-text P2P Indexing
- COMPUTER NETWORKS
, 2010
"... Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge amounts of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures, or changing the granularity of the inverted index to peer level. We show that indexing costs can be significantly reduced further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing. 1.
Data Sharing in P2P Systems
, 2009
"... Abstract In this chapter, we survey P2P data sharing systems. All along, we focus on the evolution from simple file-sharing systems, with limited functionalities, to Peer Data Management Systems (PDMS) that support advanced applications with more sophisticated data management techniques. Advanced P2 ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract In this chapter, we survey P2P data sharing systems. All along, we focus on the evolution from simple file-sharing systems, with limited functionalities, to Peer Data Management Systems (PDMS) that support advanced applications with more sophisticated data management techniques. Advanced P2P applications are dealing with semantically rich data (e.g. XML documents, relational tables), using a high-level SQL-like query language. We start our survey with an overview over the existing P2P network architectures, and the associated routing protocols. Then, we discuss data indexing techniques based on their distribution degree and the semantics they can capture from the underlying data. We also discuss schema management techniques which allow integrating heterogeneous data. We conclude by discussing the techniques proposed for processing complex queries (e.g. range and join queries). Complex query facilities are necessary for advanced applications which require a high level of search expressiveness. This last part shows the lack of querying techniques that allow for an approximate query answering.
Semi-structured Semantic Overlay for Information Retrieval in Self-organizing Networks
"... As scalability and flexibility have become the critical concerns in information management systems, self-organizing networks attract attentions from both research and industrial communities. This work proposes a semi-structured semantic overlay for information retrieval in large-scale selforganizing ..."
Abstract
- Add to MetaCart
(Show Context)
As scalability and flexibility have become the critical concerns in information management systems, self-organizing networks attract attentions from both research and industrial communities. This work proposes a semi-structured semantic overlay for information retrieval in large-scale selforganizing networks. With the autonomy to their own resources, the nodes are organized into a semantic overlay hosting topically discriminative communities. For information retrieval within a community, unstructured routing approach is employed for the sake of flexibility; While for joining new nodes and routing queries to a distant community, a structured mechanism is designed to save the traffic and time cost. Different from the semantic overlay in the literature, our proposal has three contributions: 1. we design topicbased indexing to form and maintain the semantic overlay, to guarantee both scalability and efficiency; 2. We introduce unstructured routing approach within the community, to allow flexible node joining and leaving; 3. We take advantage of the interaction among nodes to capture the overlay changes and make corresponding adaption in topic-based indexing. Categories and Subject Descriptors H.3.3 [Information storage and retrieval]: Information search and retrieval—Selection process, Clustering
Dynamic P2P Indexing and Search based on Compact Clustering
"... Abstract—We propose a strategy to perform query processing on P2P similarity search systems based on peers and superpeers. We show that by approximating global but resumed information about the indexed data in each peer, the average amount of computation and communication performed to solve range qu ..."
Abstract
- Add to MetaCart
Abstract—We propose a strategy to perform query processing on P2P similarity search systems based on peers and superpeers. We show that by approximating global but resumed information about the indexed data in each peer, the average amount of computation and communication performed to solve range queries can be significantly reduced as compared to alternative state of the art strategies based on local indexing at peer level. We illustrate our technique by using an indexing method based on compact clustering. Keywords- Peer to peer networks; Metric-Space Databases. I.