Sampling Bias in BitTorrent Measurements
Cited by 8 (6 self)
Real-world measurements play an important role in understanding the characteristics and in improving the operation of BitTorrent, which is currently a popular Internet application. Much like measuring the Internet, the complexity and scale of the BitTorrent network make a single, complete measurement impractical. While a large number of measurements have already employed diverse sampling techniques to study parts of the BitTorrent network, no investigation of their sampling bias exists to date, that is, of their ability to objectively represent the characteristics of BitTorrent. In this work we present the first study of the sampling bias in BitTorrent measurements. We first introduce a novel taxonomy of sources of sampling bias in BitTorrent measurements. We then investigate the sampling among fifteen long-term BitTorrent measurements completed between 2004 and 2009, and find that different data sources and measurement techniques can lead to significantly different measurement results. Last, we formulate three recommendations to improve the design of future BitTorrent measurements, and estimate the cost of using these recommendations in practice.
A Game Theoretic Approach to the Formation of Clustered Overlay Networks (Extended Version)
Cited by 7 (0 self)
In many large-scale content sharing applications, participants or peers are grouped together forming clusters based on their content or interests. In this paper, we deal with the maintenance of such clusters in the presence of updates. We model the evolution of the system as a game, where peers determine their cluster membership based on a utility function of the query recall. Peers are guided either by selfish or altruistic motives: selfish peers aim at improving the recall of their own queries, whereas altruistic peers aim at improving the recall of the queries of other peers. We study the evolution of such clusters both theoretically and experimentally under a variety of conditions. We show that, in general, local decisions made independently by each peer enable the system to adapt to changes and maintain the overall recall of the query workload.
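The selfish cluster choice described in this abstract can be illustrated with a minimal sketch. The abstract does not give the utility function, so everything here is an assumption made for illustration: a query is a single keyword, the recall of a query in a cluster is the fraction of all matching peers that are members of that cluster, and a selfish peer joins the cluster with the highest mean recall over its own queries. The names `recall` and `selfish_best_response` are hypothetical.

```python
def recall(query, members, content):
    """Fraction of all peers holding `query` that belong to `members` (assumed recall model)."""
    total = sum(1 for items in content.values() if query in items)
    if total == 0:
        return 0.0
    found = sum(1 for peer in members if query in content[peer])
    return found / total


def selfish_best_response(peer, queries, clusters, content):
    """Return the id of the cluster that maximizes the mean recall of the peer's own queries."""
    if not queries:
        raise ValueError("peer needs at least one query")

    def utility(cluster_id):
        # Evaluate the cluster as if the peer had joined it.
        members = clusters[cluster_id] | {peer}
        return sum(recall(q, members, content) for q in queries) / len(queries)

    # Iterate over sorted ids so that ties resolve to the smallest cluster id.
    return max(sorted(clusters), key=utility)


# Hypothetical toy network: peer "d" holds jazz content but sits in the rock cluster.
content = {"a": {"jazz"}, "b": {"jazz"}, "c": {"rock"}, "d": {"rock", "jazz"}}
clusters = {0: {"a", "b"}, 1: {"c", "d"}}
best = selfish_best_response("d", ["jazz"], clusters, content)
```

An altruistic variant would, under the same assumptions, score each cluster by the recall of the other peers' queries instead of the peer's own.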
Peer-to-peer similarity search over widely distributed document collections
- in: Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval, LSDSIR ’08, ACM, 2008
Cited by 4 (1 self)
This paper addresses the challenging problem of similarity search over widely distributed ultra-high dimensional data. One such application is the retrieval of the top-k most similar documents in a widely distributed document collection, as in the case of digital libraries. Peer-to-peer (P2P) systems emerge as a promising solution to deal with content management in cases of highly distributed data collections. We propose a self-organizing P2P approach in which an unstructured P2P network evolves into a super-peer architecture, with super-peers responsible for peers with similar content. Our approach is based on distributed clustering of peer contents, thus managing to create high quality clusters that span the entire network. More importantly, we show how to efficiently process similarity queries capitalizing on the newly constructed, clustered super-peer network. During query processing, the query is propagated only to a few carefully selected super-peers that are able to return results of high quality. We evaluate the performance of our approach and demonstrate its advantages through simulation experiments on two document collections.
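As a rough illustration of propagating a query to only a few selected super-peers, the sketch below ranks super-peers by the cosine similarity between a query vector and a per-super-peer cluster centroid. This is an assumed selection rule, not the one used in the paper, which the abstract does not specify; the sparse term-weight dictionaries and the names `cosine` and `select_super_peers` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_super_peers(query, centroids, k):
    """Return the ids of the k super-peers whose centroids are most similar to the query."""
    # Sort by descending similarity, breaking ties by id for a deterministic order.
    ranked = sorted(centroids, key=lambda sp: (-cosine(query, centroids[sp]), sp))
    return ranked[:k]


# Hypothetical centroids summarizing the content held under each super-peer.
centroids = {
    "sp1": {"p2p": 1.0, "overlay": 1.0},
    "sp2": {"music": 1.0},
    "sp3": {"p2p": 1.0},
}
targets = select_super_peers({"p2p": 1.0}, centroids, k=2)
```

Only the super-peers in `targets` would receive the query; the remaining ones are skipped, which is where the reduction in network traffic would come from under these assumptions.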
Ad-hoc limited scale-free models for unstructured peer-to-peer networks
- in: Peer-to-Peer Networking and Applications, 2011
Cited by 4 (1 self)
Several protocol efficiency metrics (e.g., scalability, search success rate, routing reachability and stability) depend on the capability of preserving structure even under the churn caused by ad-hoc nodes joining or leaving the network. Preserving the structure becomes more prohibitive due to the distributed and potentially uncooperative nature of such networks, as in peer-to-peer (P2P) networks. Thus, most practical solutions involve unstructured approaches while attempting to maintain the structure at various levels of the protocol stack. The primary focus of this paper is to investigate construction and maintenance of scale-free topologies in a distributed manner without requiring global topology information at the time when nodes join or leave. We consider the uncooperative behavior of peers by limiting the number of neighbors to a predefined hard cutoff value (i.e., no peer is a major hub), and the ad-hoc behavior of peers by rewiring the neighbors of nodes leaving the network. We also investigate the effect of these hard cutoffs and rewiring of ad-hoc nodes on the P2P search efficiency.
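The effect of a hard cutoff on a growing topology can be sketched with a simple generator. This is a centralized preferential-attachment sketch that uses global degree knowledge, so it is not the paper's distributed construction; it only shows how a cutoff keeps any node from becoming a major hub. The function name, the parameters `n`, `m`, `cutoff`, and the `degree + 1` weighting (used so that isolated nodes stay selectable) are all assumptions.

```python
import random


def grow_with_hard_cutoff(n, m, cutoff, seed=0):
    """Grow an undirected graph in which no node ever exceeds `cutoff` neighbors.

    Each joining node links to up to `m` existing nodes, picked with probability
    proportional to (degree + 1) among nodes still below the cutoff.
    """
    if cutoff < m:
        raise ValueError("cutoff must be at least m")
    rng = random.Random(seed)
    # Seed network: a clique of m + 1 nodes, so every initial node has degree m.
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    for new in range(m + 1, n):
        adj[new] = set()
        for _ in range(m):
            # Saturated nodes (degree == cutoff) refuse further neighbors.
            candidates = [v for v in adj
                          if v != new and v not in adj[new] and len(adj[v]) < cutoff]
            if not candidates:
                break
            weights = [len(adj[v]) + 1 for v in candidates]
            target = rng.choices(candidates, weights=weights, k=1)[0]
            adj[new].add(target)
            adj[target].add(new)
    return adj


graph = grow_with_hard_cutoff(n=200, m=2, cutoff=10)
max_degree = max(len(neighbors) for neighbors in graph.values())
```

Rewiring the neighbors of a departing node, which the abstract also mentions, is left out of this sketch.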
Efficient Search based on Content Similarity over Self-Organizing P2P Networks
Cited by 3 (2 self)
The advent of the World Wide Web has made an enormous amount of information available to everyone, and the widespread use of digital equipment enables end-users (peers) to produce their own digital content. This vast amount of information requires scalable data management systems. Peer-to-peer (P2P) systems have so far been well established in several application areas, with file-sharing being the most prominent. The next challenge that needs to be addressed is (more complex) data sharing, management and query processing, thus facilitating the delivery of a wide spectrum of novel data-centric applications to the end-user, while providing high Quality-of-Service. In this paper, we propose a self-organizing P2P system that is capable of identifying peers with similar content and intentionally assigning them to the same super-peer. During content retrieval, fewer super-peers need to be contacted, and therefore efficient similarity search is supported in terms of reduced network traffic and fewer contacted peers. Our approach increases the responsiveness and reliability of a P2P system, and we demonstrate its advantages using large-scale simulations.
Overnesia: a Resilient Overlay Network for Virtual Super-Peers
Cited by 2 (0 self)
Unstructured P2P networks have been widely used to implement resource location systems that support complex query semantics. Unfortunately, these systems usually rely on search algorithms based on some variant of flooding, which generate a significant amount of duplicate messages. An effective way to minimize the cost of query flooding in unstructured P2P networks is the use of super-peers. On the other hand, super-peers may become overloaded or may fail, and have a negative impact on the performance and connectivity of the overlay. These risks can be circumvented by replicating super-peers. Replication serves the dual purpose of supporting load distribution and fault tolerance. This paper proposes a novel algorithm to construct an overlay network connecting replicated super-peers. We have called the resulting overlay Overnesia. The paper also proposes query routing techniques that leverage the unique properties of Overnesia to effectively distribute the query processing load among replicas.
A Recall-Based Cluster Formation Game in Peer-to-Peer Systems
Cited by 2 (1 self)
In many large-scale content sharing applications, participants or peers are grouped together forming clusters based on their content or interests. In this paper, we deal with the maintenance of such clusters in the presence of updates. We model the evolution of the system as a strategic game, where peers determine their cluster membership based on a utility function of the query recall. Peers are guided either by selfish or altruistic motives: selfish peers aim at improving the recall of their own queries, whereas altruistic peers aim at improving the recall of the queries of other peers. We study the evolution of such clusters both theoretically and experimentally under a variety of conditions. We show that, in general, local decisions made independently by each peer enable the system to adapt to changes and maintain the overall recall of the query workload.
This work was partially supported by LaSIGE through the FCT Pluriannual Funding Programme
, 2008
Lisboa, declares that they assign their copyright over their Project Report in
Ad-hoc Limited Scale-Free Models for Unstructured Peer-to-Peer Networks
- in: Eighth International Conference on Peer-to-Peer Computing (P2P'08)
Several protocol efficiency metrics (e.g., scalability, search success rate, routing reachability and stability) depend on the capability of preserving structure even under the churn caused by ad-hoc nodes joining or leaving the network. Preserving the structure becomes more prohibitive due to the distributed and potentially uncooperative nature of such networks, as in peer-to-peer (P2P) networks. Thus, most practical solutions involve unstructured approaches while attempting to maintain the structure at various levels of the protocol stack. The primary focus of this paper is to investigate construction and maintenance of scale-free topologies in a distributed manner without requiring global topology information at the time when nodes join or leave. We consider the uncooperative behavior of peers by limiting the number of neighbors to a predefined hard cutoff value (i.e., no peer is a major hub), and the ad-hoc behavior of peers by rewiring the neighbors of nodes leaving the network. We also investigate the effect of these hard cutoffs and rewiring of ad-hoc nodes on the P2P search efficiency.