A Few Chirps About Twitter
Cited by 63 (3 self)
Web 2.0 has brought about several new applications that have enabled arbitrary subsets of users to communicate with each other on a social basis. Such communication increasingly happens not just on Facebook and MySpace but on several smaller network applications such as Twitter and Dodgeball. We present a detailed characterization of Twitter, an application that allows users to send short messages. We gathered three datasets (covering nearly 100,000 users) including constrained crawls of the Twitter network using two different methodologies, and a sampled collection from the publicly available timeline. We identify distinct classes of Twitter users and their behaviors, geographic growth patterns and current size of the network, and compare crawl results obtained under rate limiting constraints. Categories and Subject Descriptors C.4 [Performance of Systems]: [Measurement techniques, Modeling techniques]
Characterizing files in the modern Gnutella network: A measurement study
- In Proceedings of SPIE/ACM Multimedia Computing and Networking, 2006
Cited by 34 (4 self)
The Internet has witnessed an explosive increase in the popularity of Peer-to-Peer (P2P) file-sharing applications during the past few years. As these applications become more popular, it becomes increasingly important to characterize their behavior in order to improve their performance and quantify their impact on the network. In this paper, we present a measurement study on characteristics of available files in the modern Gnutella system. We developed a new methodology to capture accurate “snapshots” of available files in a large-scale P2P system. This methodology was implemented in a parallel crawler that captures the entire overlay topology of the system where each peer in the overlay is annotated with its available files. We have captured tens of snapshots of the Gnutella system and conducted three types of analysis on available files: (i) static analysis, (ii) topological analysis and (iii) dynamic analysis. Our results reveal several interesting properties of available files in Gnutella that can be leveraged to improve the design and evaluation of P2P file-sharing applications.
The many facets of Internet topology and traffic
- Networks and Heterogeneous Media
Cited by 10 (8 self)
The Internet’s layered architecture and organizational structure give rise to a number of different topologies, with the lower layers defining more physical and the higher layers more virtual/logical types of connectivity structures. These structures are very different, and successful Internet topology modeling requires annotating the nodes and edges of the corresponding graphs with information that reflects their network-intrinsic meaning. These structures also give rise to different representations of the traffic that traverses the heterogeneous Internet, and a traffic matrix is a compact and succinct description of the traffic exchanges between the nodes in a given connectivity structure. In this paper, we summarize recent advances in Internet research related to (i) inferring and modeling the router-level topologies of individual service providers (i.e., the physical connectivity structure of an ISP, where nodes are routers/switches and links represent physical connections), (ii) estimating the intra-AS traffic matrix when the AS’s router-level topology and routing configuration are known, (iii) inferring and modeling the Internet’s AS-level topology, and (iv) estimating the inter-AS traffic matrix. We will also discuss recent work on Internet connectivity structures that arise at the higher layers in the TCP/IP protocol stack and are more virtual and dynamic; e.g., overlay networks like the WWW graph, where nodes are web pages and edges represent existing hyperlinks, or P2P networks like Gnutella, where nodes represent peers and two peers are connected if they have an active network connection.
A walk in Facebook: Uniform sampling of users in online social networks
- arXiv preprint arXiv:0906.0060, 2009
Cited by 7 (0 self)
The popularity of online social networks (OSNs) has given rise to a number of measurement studies that provide a first step towards their understanding. So far, such studies have been based either on complete data sets provided directly by the OSN itself or on Breadth-First-Search (BFS) crawling of the social graph, which does not guarantee good statistical properties of the collected sample. In this paper, we crawl the publicly available social graph and present the first unbiased sampling of Facebook (FB) users using a Metropolis-Hastings random walk with multiple chains. We study the convergence properties of the walk and demonstrate the uniformity of the collected sample with respect to multiple metrics of interest. We provide a comparison of our crawling technique to baseline algorithms, namely BFS and simple random walk, as well as to the “ground truth” obtained through truly uniform sampling of userIDs. Our contributions lie both in the measurement methodology and in the collected sample. With regard to the methodology, our measurement technique (i) applies and combines known results from random walk sampling specifically in the OSN context and (ii) addresses system implementation aspects that have made the measurement of Facebook challenging so far. With respect to the collected sample: (i) it is the first representative sample of FB users and we plan to make it publicly available; (ii) we perform a characterization of several key properties of the data set, and find that some of them are substantially different from what was previously believed based on non-representative OSN samples.
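To make the sampling idea in this abstract concrete, here is a minimal single-chain sketch of a Metropolis-Hastings random walk that targets the uniform distribution over nodes. It is an illustration only, not the authors' crawler: the function name `mhrw_sample`, the adjacency-dict representation, and the toy star graph are assumptions made here, and the paper's multiple chains and convergence diagnostics are omitted. The graph is assumed to be undirected and connected, with no isolated nodes.

```python
import random


def mhrw_sample(graph, start, steps, rng=None):
    """Single-chain Metropolis-Hastings random walk over an undirected graph.

    `graph` maps each node to a list of its neighbours (hypothetical
    representation chosen for this sketch). Returns the node visited
    at every step, including repeats when a proposal is rejected.
    """
    rng = rng or random.Random()
    current = start
    samples = []
    for _ in range(steps):
        # Propose a neighbour chosen uniformly at random.
        candidate = rng.choice(graph[current])
        # Accept with probability min(1, deg(current) / deg(candidate));
        # this cancels the degree bias of a simple random walk.
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
        samples.append(current)
    return samples


if __name__ == "__main__":
    # Toy star graph: node 0 is a hub connected to four leaves.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    visits = mhrw_sample(star, start=0, steps=20000, rng=random.Random(1))
    print("fraction of samples at the hub:", visits.count(0) / len(visits))
```

On the star graph a simple random walk spends about half its time at the hub, whereas this walk should visit the hub in roughly one fifth of the steps, matching its share of the five nodes.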
Sampling Bias in BitTorrent Measurements
Cited by 5 (5 self)
Real-world measurements play an important role in understanding the characteristics and in improving the operation of BitTorrent, which is currently a popular Internet application. Much like measuring the Internet, the complexity and scale of the BitTorrent network make a single, complete measurement impractical. While a large number of measurements have already employed diverse sampling techniques to study parts of the BitTorrent network, until now there has been no investigation of their sampling bias, that is, of their ability to objectively represent the characteristics of BitTorrent. In this work we present the first study of the sampling bias in BitTorrent measurements. We first introduce a novel taxonomy of sources of sampling bias in BitTorrent measurements. We then investigate the sampling among fifteen long-term BitTorrent measurements completed between 2004 and 2009, and find that different data sources and measurement techniques can lead to significantly different measurement results. Last, we formulate three recommendations to improve the design of future BitTorrent measurements, and estimate the cost of using these recommendations in practice.
Just-In-Time Query Retrieval Over Partially Indexed Data on Structured P2P Overlays
Cited by 4 (2 self)
Structured peer-to-peer (P2P) overlays have been successfully employed in many applications to locate content. However, they have been less effective in handling massive amounts of data because of the high overhead of maintaining indexes. In this paper, we propose PISCES, a Peer-based system that Indexes Selected Content for Efficient Search. Unlike traditional approaches that index all data, PISCES identifies a subset of tuples to index based on some criteria (such as query frequency, update frequency, index cost, etc.). In addition, a coarse-grained range index is built to facilitate the processing of queries that cannot be fully answered by the tuple-level index. More importantly, PISCES can adaptively self-tune to optimize the subset of tuples to be indexed. That is, the (partial) index in PISCES is built in a Just-In-Time (JIT) manner. Beneficial tuples for current users are pulled for indexing while indexed tuples with infrequent access and high maintenance cost are discarded. We also introduce a light-weight monitoring scheme for structured networks to collect the necessary statistics. We have conducted an extensive experimental study on PlanetLab to illustrate the feasibility, practicality and efficiency of PISCES. The results show that PISCES incurs lower maintenance cost and offers better search and query efficiency compared to existing methods.
Residual-Based Estimation of Peer and Link Lifetimes in P2P Networks
Cited by 4 (4 self)
Existing methods of measuring lifetimes in P2P systems usually rely on the so-called Create-Based Method (CBM), which divides a given observation window into two halves and samples users “created” in the first half every ∆ time units until they die or the observation period ends. Despite its frequent use, this approach has no rigorous accuracy or overhead analysis in the literature. To shed more light on its performance, we first derive a model for CBM and show that small window size or large ∆ may lead to highly inaccurate lifetime distributions. We then show that create-based sampling exhibits an inherent tradeoff between overhead and accuracy, which does not allow any fundamental improvement to the method. Instead, we propose a completely different approach for sampling user dynamics that keeps track of only residual lifetimes of peers and uses a simple renewal-process model to recover the actual lifetimes from the observed residuals. Our analysis indicates that for reasonably large systems, the proposed method can reduce bandwidth consumption by several orders of magnitude compared to prior approaches while simultaneously achieving higher accuracy. We finish the paper by implementing a two-tier Gnutella network crawler equipped with the proposed sampling method and obtain the distribution of ultrapeer lifetimes in a network of 6.4 million users and 60 million links. Our experimental results show that ultrapeer lifetimes are Pareto with shape α ≈ 1.1; however, link lifetimes exhibit much lighter tails with α ≈ 1.8. Index Terms: Gnutella networks, lifetime estimation, peer-to-peer, residual sampling.
Residual-based measurement of peer and link lifetimes in Gnutella networks
- In IEEE INFOCOM, 2007
Cited by 3 (2 self)
Existing methods of measuring lifetimes in P2P systems usually rely on the so-called Create-Based Method (CBM) [16], which divides a given observation window into two halves and samples users “created” in the first half every ∆ time units until they die or the observation period ends. Despite its frequent use [2], [17], [19], this approach has no rigorous accuracy or overhead analysis in the literature. To shed more light on its performance, we first derive a model for CBM and show that small window size or large ∆ may lead to highly inaccurate lifetime distributions. We then show that create-based sampling exhibits an inherent tradeoff between overhead and accuracy, which does not allow any fundamental improvement to the method. Instead, we propose a completely different approach for sampling user dynamics that keeps track of only residual lifetimes of peers and uses a simple renewal-process model to recover the actual lifetimes from the observed residuals. Our analysis indicates that for reasonably large systems, the proposed method can reduce bandwidth consumption by several orders of magnitude compared to prior approaches while simultaneously achieving higher accuracy. We finish the paper by implementing a two-tier Gnutella network crawler equipped with the proposed sampling method and obtain the distribution of ultrapeer lifetimes in a network of 6.4 million users and 60 million links. Our experimental results show that ultrapeer lifetimes are Pareto with shape α ≈ 1.1; however, link lifetimes exhibit much lighter tails with α ≈ 1.9.
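The abstract does not spell out the renewal-process model it uses. As a rough sketch of how lifetimes can be recovered from residuals, the following uses the standard equilibrium result from renewal theory. It assumes a stationary system observed at a random instant, and the symbols L (lifetime), R (residual lifetime), F, H and h are notation introduced here rather than taken from the paper.

```latex
\begin{align*}
  F(x) &= P(L \le x), \qquad E[L] = \int_0^\infty \bigl(1 - F(u)\bigr)\,du
        && \text{(lifetime distribution and mean)} \\
  H(x) &= P(R \le x) = \frac{1}{E[L]} \int_0^x \bigl(1 - F(u)\bigr)\,du
        && \text{(distribution of the observed residuals)} \\
  h(x) &= H'(x) = \frac{1 - F(x)}{E[L]}, \qquad h(0) = \frac{1}{E[L]}
        && \text{(residual density)} \\
  F(x) &= 1 - \frac{h(x)}{h(0)}
        && \text{(lifetimes recovered from residuals)}
\end{align*}
```

Under these assumptions, estimating the residual density h from a single snapshot followed by periodic probing is enough to reconstruct F, without tracking users from the moment they join.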
Research on Online Social Networks: Time to Face the Real Challenges
Cited by 2 (0 self)
Online Social Networks (OSNs) provide a unique opportunity for researchers to study how a combination of technological, economic, and social forces have been conspiring to provide a service that has attracted the largest user population in the history of the Internet. With more than half a billion users and counting, OSNs have the potential to impact almost every aspect of networking, including measurements and performance modeling/analysis, network architecture and system design, and privacy and user behavior, to name just a few. However, much of the existing OSN research literature seems to have lost sight of this unique opportunity and has avoided dealing with the new challenges posed by OSNs. We argue in this position paper that it is high time for OSN researchers to exploit and face these opportunities and challenges to provide a basic understanding of the OSN ecosystem as a whole. Such an understanding has to reflect the key role users play in this system and must focus on the system’s dynamics, purpose and functionality when trying to illuminate the main technological, economic, and social forces at work in the current OSN revolution.
Time-Based Sampling of Social Network Activity Graphs
Cited by 1 (1 self)
While most research in online social networks (OSNs) in the past has focused on static friendship networks, social network activity graphs are quite important as well. However, characterizing social network activity graphs is computationally intensive; reducing the size of these graphs using sampling algorithms is critical. There are two important requirements: the sampling algorithm must be able to preserve core graph characteristics and be amenable to a streaming implementation, since activity graphs naturally evolve in a streaming fashion. Existing approaches satisfy either one or the other requirement, but not both. In this paper, we propose a novel sampling algorithm called Streaming Time Node Sampling (STNS) that exploits temporal clustering often found in real social networks. Using real communication data collected from Facebook and Twitter, we show that STNS significantly outperforms state-of-the-art sampling mechanisms such as node sampling and Forest Fire sampling, across both averages and distributions of several graph properties.

