Distributed Evaluation of Continuous Equi-join Queries over Large Structured Overlay Networks
- In ICDE 2006, 2005
Understanding the practical limits of the gnutella p2p system: An analysis of query terms and object name distributions
- in Proceedings of the ACM/SPIE Multimedia Computing and Networking (MMCN ’08), 2008
Cited by 4 (0 self)
A number of prior efforts analyzed the behavior of popular peer-to-peer (P2P) systems and proposed ways of maintaining the overlays as well as methods for searching for content using these overlays. However, little was known about how successful users could be in locating the shared objects in these systems. There might be a mismatch between the way content creators named objects and the way such objects were queried by the consumers. Our aim was to examine the terms used in the queries and shared object names in the Gnutella file-sharing system. We analyzed the object names of over 20 million objects collected from 40,000 peers as well as terms from over 230,000 queries. We observed that almost half (44.4%) of the queries had no matching objects in the system regardless of the overlay or search mechanism used to locate the objects. We also evaluated the query success rates against random peer groups of various sizes (200, 1K, 2K, 3K, 4K, 5K, 10K and 20K peers sampled from the full 40,000 peers). We showed that the success rates increased rapidly from 200 to 5,000 peers, but exhibited only modest improvements when the number of peers increased beyond 5,000. Finally, we observed a Zipf-like distribution for both the query terms and the object names. However, the relative popularity of a term in the object names did not correlate with the term's popularity in the query workload. This observation affected the ability of hybrid P2P systems to guide searches by creating a synopsis of the peer object names. A synopsis created by using the distribution of terms in the object names need not represent relevant terms for the query. Our results can be used to guide the design of future P2P systems that are optimized for the observed object names and user query behavior.
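The sampling experiment described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' measurement code: the matching rule (a query succeeds when all of its terms occur in a single object name), the toy data, and the names `query_matches`, `success_rate`, and `success_by_sample_size` are assumptions made for illustration.

```python
import random

def query_matches(query_terms, object_terms):
    """Assumed rule: a query matches an object if every query term is in its name."""
    return query_terms <= object_terms

def success_rate(queries, peers):
    """Fraction of queries with at least one matching object on the given peers."""
    if not queries:
        return 0.0
    objects = [obj for peer in peers for obj in peer]
    hits = sum(1 for q in queries if any(query_matches(q, o) for o in objects))
    return hits / len(queries)

def success_by_sample_size(queries, peers, sizes, seed=0):
    """Success rate against one random peer group per requested size."""
    rng = random.Random(seed)
    return {n: success_rate(queries, rng.sample(peers, n))
            for n in sizes if n <= len(peers)}

# Toy data (hypothetical): each peer is a list of object names, each a set of terms.
peers = [
    [frozenset({"linux", "iso"})],
    [frozenset({"jazz", "live", "mp3"})],
    [frozenset({"holiday", "photos"})],
]
queries = [
    frozenset({"linux"}),
    frozenset({"jazz", "mp3"}),
    frozenset({"opera"}),
    frozenset({"photos", "beach"}),
]

rates = success_by_sample_size(queries, peers, sizes=[1, 3, 5])
```

Sizes larger than the peer population are skipped, so only the groups of 1 and 3 peers are evaluated here. A real study would repeat the sampling several times per size and average the results.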
Semantic Social Overlay Networks
Cited by 4 (1 self)
Peer selection for query routing is a core task in peer-to-peer networks. Unstructured peer-to-peer systems (like Gnutella) ignore this problem, leading to an abundance of network traffic. Structured peer-to-peer systems (like Chord) enforce a particular, global way of distributing data among the peers in order to solve this problem, but then encounter problems of network volatility and conflicts with the autonomy of the peer data management. In this paper, we propose a new mechanism, INGA, which is based on the observation that query routing in social networks is made possible by locally available knowledge about the expertise of neighbors and a semantics-based peer selection function. We validate INGA by simulation experiments with different data sets. We compare INGA with competing peer selection mechanisms on resulting parameters like recall, message gain or number of messages produced.
Just-In-Time Query Retrieval Over Partially Indexed Data on Structured P2P Overlays
Cited by 4 (2 self)
Structured peer-to-peer (P2P) overlays have been successfully employed in many applications to locate content. However, they have been less effective in handling massive amounts of data because of the high overhead of maintaining indexes. In this paper, we propose PISCES, a Peer-based system that Indexes Selected Content for Efficient Search. Unlike traditional approaches that index all data, PISCES identifies a subset of tuples to index based on some criteria (such as query frequency, update frequency, index cost, etc.). In addition, a coarse-grained range index is built to facilitate the processing of queries that cannot be fully answered by the tuple-level index. More importantly, PISCES can adaptively self-tune to optimize the subset of tuples to be indexed. That is, the (partial) index in PISCES is built in a Just-In-Time (JIT) manner. Beneficial tuples for current users are pulled for indexing while indexed tuples with infrequent access and high maintenance cost are discarded. We also introduce a light-weight monitoring scheme for structured networks to collect the necessary statistics. We have conducted an extensive experimental study on PlanetLab to illustrate the feasibility, practicality and efficiency of PISCES. The results show that PISCES incurs lower maintenance cost and offers better search and query efficiency compared to existing methods.
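The idea of indexing only a beneficial subset of tuples can be sketched as follows. This is an illustration under stated assumptions, not the PISCES algorithm: the scoring formula in `net_benefit`, the `lookup_saving` parameter, the fixed `budget`, and the `TupleStats` record are all hypothetical; the abstract only names the criteria (query frequency, update frequency, index cost).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TupleStats:
    """Hypothetical per-tuple statistics gathered by a monitoring scheme."""
    key: str
    query_freq: float   # queries per time unit that touch this tuple
    update_freq: float  # updates per time unit to this tuple
    index_cost: float   # cost of refreshing one index entry

def net_benefit(stats, lookup_saving=1.0):
    """Assumed score: search cost saved by indexing minus index maintenance cost."""
    return stats.query_freq * lookup_saving - stats.update_freq * stats.index_cost

def select_for_index(all_stats, budget):
    """Keep at most `budget` tuples, and only those with positive net benefit."""
    ranked = sorted(all_stats, key=net_benefit, reverse=True)
    return {s.key for s in ranked[:budget] if net_benefit(s) > 0}

# Toy statistics (hypothetical values).
stats = [
    TupleStats("a", query_freq=10, update_freq=1, index_cost=2),  # benefit 8
    TupleStats("b", query_freq=1, update_freq=5, index_cost=1),   # benefit -4
    TupleStats("c", query_freq=3, update_freq=0, index_cost=1),   # benefit 3
    TupleStats("d", query_freq=6, update_freq=1, index_cost=1),   # benefit 5
]

indexed = select_for_index(stats, budget=2)
```

Re-running the selection as statistics change gives a rough analogue of the just-in-time behavior: tuples whose benefit drops fall out of the index and newly popular tuples are pulled in.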
Difficulty-Aware Hybrid Search in Peer-to-Peer Networks
- IEEE Trans. Parallel and Distributed Systems, 2009
Cited by 4 (1 self)
By combining an unstructured protocol with a DHT-based global index, hybrid peer-to-peer (P2P) search improves search efficiency in terms of query recall and response time. The major challenge in hybrid search is how to estimate the number of peers that can answer a given query. Existing approaches assume that such a number can be directly obtained by computing item popularity. In this work, we show that such an assumption is not always valid, and previous designs cannot distinguish whether items related to a query are distributed across many peers or held by only a few. To address this issue, we propose QRank, a difficulty-aware hybrid search, which ranks queries by weighting keywords based on term frequency. Using rank values, QRank selects proper search strategies for queries. We conduct comprehensive trace-driven simulations to evaluate this design. Results show that QRank significantly improves search quality and reduces system traffic cost compared with existing approaches. Index Terms: Peer-to-peer, hybrid search, flooding, DHT, difficulty awareness.
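A difficulty-aware choice between flooding and a DHT lookup can be sketched in a few lines of Python. The sketch is not QRank's actual ranking function, which the abstract does not specify: scoring a query by its rarest keyword, the `threshold` value, and the names `term_frequencies`, `query_rank`, and `choose_strategy` are assumptions for illustration.

```python
from collections import Counter

def term_frequencies(object_names):
    """Count, for each term, how many object names contain it."""
    counts = Counter()
    for name in object_names:
        counts.update(set(name.lower().split()))
    return counts

def query_rank(query, counts, total):
    """Assumed rank: share of objects containing the query's rarest keyword."""
    terms = query.lower().split()
    if not terms or total == 0:
        return 0.0
    return min(counts.get(t, 0) for t in terms) / total

def choose_strategy(query, counts, total, threshold=0.5):
    """Easy (high-rank) queries are flooded; hard ones go to the DHT index."""
    return "flooding" if query_rank(query, counts, total) >= threshold else "dht"

# Toy object names (hypothetical).
names = ["free jazz live", "jazz trio", "rare bootleg jazz", "jazz standards"]
counts = term_frequencies(names)
total = len(names)

easy = choose_strategy("jazz", counts, total)        # keyword in every name
hard = choose_strategy("rare jazz", counts, total)   # limited by the rare keyword
```

Taking the minimum over keywords reflects the point made in the abstract: a query can contain a popular term and still be hard to answer if another of its terms is rare.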
A study of dynamic coordination mechanisms
- 2007
Cited by 4 (3 self)
received the Schupf fellowship, which generously supports PhD students with leadership potential. In addition to the sponsors of these grants, I am grateful to all of the administrators who helped ensure that my funding was transferred in a timely fashion. I would also like to acknowledge all of my collaborators. First, the Maverick and DAI labs at Bar-Ilan were always my academic home. All of the lab members are talented researchers who stimulated many fruitful discussions. I would especially like to mention Noa Segel-Argamon, Michal Chalamish, Ariel Felner, Meirav Hadad, Meir Kalech, Efrat Manister, Dudi Sarne, Osher Yadgar and Aner Yarden for their help throughout my time at Bar-Ilan. I would like to single out Noa, who was extremely helpful in formulating several concepts in Chapters 2 and 3. The NSF and DARPA projects I worked on introduced me to many interesting people outside of Bar-Ilan. I am indebted to Barbara Grosz of Harvard, who spent many hours giving encouragement and stimulating many interesting conversations. Willem-Jan van Hoeve of Cornell developed the scheduler used in Chapter 4 of this
Optimal Search Performance in Unstructured Peer-to-Peer Networks with Clustered Demands
- 2005
Cited by 3 (1 self)
This paper derives the optimal search time and the optimal search cost that can be achieved in unstructured peer-to-peer networks when the demand pattern exhibits clustering (i.e., file popularities vary from region to region in the network). Previous work in this area had assumed a uniform distribution of file replicas throughout the network, with an implicit or explicit assumption of uniform file popularity distribution, whereas in reality there is clear evidence of clustering in file popularity patterns. In this paper, we provide mechanisms for modeling clustering in file popularity distributions and the consequent nonuniform distribution of file replicas. We provide results for the search time in such networks for both random walk and flooding search mechanisms. The potential performance benefit that the clustering in demand patterns affords is captured by our results. Interestingly, the performance gains are shown to be independent of whether the search network topology reflects the clustering in file popularity. We also provide the relation between the query-processing load and the number of replicas of each file for the clustered demands case, showing that flooding searches may have lower query-processing load than random walk searches in the clustered demands case.
The Declarative Imperative: Experiences and Conjectures in Distributed Logic
Cited by 3 (2 self)
The rise of multicore processors and cloud computing is putting enormous pressure on the software community to find solutions to the difficulty of parallel and distributed programming. At the same time, there is more, and more varied, interest in data-centric programming languages than at any time in computing history, in part because these languages parallelize naturally. This juxtaposition raises the possibility that the theory of declarative database query languages can provide a foundation for the next generation of parallel and distributed programming languages. In this paper I reflect on my group’s experience over seven years using Datalog extensions to build networking protocols and distributed systems. Based on that experience, I present a number of theoretical conjectures that may both interest the database community and clarify important practical issues in distributed computing. Most importantly, I make a case for database researchers to take a leadership role in addressing the impending programming crisis. This is an extended version of an invited lecture at the ACM PODS 2010 conference [32].
An Architecture for Hybrid P2P Free-Text Search
Cited by 2 (1 self)
Recent advances in peer-to-peer (P2P) search algorithms have presented viable structured and unstructured approaches for full-text search. We posit that these existing approaches are each best suited for different types of queries. We present PHIRST, the first system to facilitate effective full-text search within P2P networks. PHIRST works by effectively leveraging the relative strengths of these approaches. Similar to structured approaches, agents first publish terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significant reduction in the system’s storage requirements. During query lookup, agents use unstructured searches to compensate for the lack of fully published terms. Additionally, they explicitly weigh the costs of structured and unstructured approaches, allowing for a significant reduction in query costs. We evaluated the effectiveness of our approach using both real-world and artificial queries. We found that in most situations our approach yields near-perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.
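The publish-then-fall-back behavior described in this abstract can be sketched as a single-process toy. It is not the PHIRST implementation: the `HybridSearch` class, the `max_postings` cutoff for declaring a term frequent, and the local scan standing in for an unstructured search are all assumptions, and the cost weighing between the two approaches is left out.

```python
from collections import defaultdict

class HybridSearch:
    """Toy model: rare terms get a structured index, frequent terms do not."""

    def __init__(self, max_postings=2):
        self.max_postings = max_postings    # assumed cutoff for "frequent"
        self.docs = {}                      # doc_id -> set of terms
        self.postings = defaultdict(set)    # term -> doc_ids (structured index)
        self.frequent = set()               # terms no longer published

    def publish(self, doc_id, text):
        terms = set(text.lower().split())
        self.docs[doc_id] = terms
        for term in terms:
            if term in self.frequent:
                continue
            self.postings[term].add(doc_id)
            if len(self.postings[term]) > self.max_postings:
                # Term is too common: stop storing its postings.
                self.frequent.add(term)
                del self.postings[term]

    def _unstructured(self, terms):
        """Stand-in for flooding or random walks: scan every document."""
        return {d for d, ts in self.docs.items() if terms <= ts}

    def search(self, query):
        terms = set(query.lower().split())
        if not terms:
            return "none", set()
        rare = [t for t in terms if t not in self.frequent]
        if not rare:
            return "unstructured", self._unstructured(terms)
        candidates = set.intersection(*(self.postings.get(t, set()) for t in rare))
        # Check unpublished (frequent) terms against the candidate documents.
        return "structured", {d for d in candidates if terms <= self.docs[d]}

index = HybridSearch(max_postings=2)
index.publish("d1", "p2p search overlay")
index.publish("d2", "p2p hybrid search")
index.publish("d3", "p2p gnutella trace")
```

After the third document is published, `p2p` exceeds the cutoff and is dropped from the index, so a query for `p2p` alone falls back to the unstructured path, while `hybrid p2p` is still resolved through the postings for `hybrid`.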
A Case for Unstructured Distributed Hash Tables
Cited by 1 (1 self)
Structured peer-to-peer overlays support compelling applications such as large-scale file systems and distributed backup using the distributed hash table (DHT) interface. While unstructured file-sharing systems continue to flourish, wide adoption of structured applications has been elusive. We explore an alternative path to deployment of these applications by asking the question: can structured applications be run on top of unstructured overlays? We build an unstructured distributed hash table (UDHT) on top of state-of-the-art search and topology management mechanisms, and evaluate whether it can sufficiently emulate properties of DHTs to support structured applications.

