Results 1-10 of 42
A Brief History of Generative Models for Power Law and Lognormal Distributions
- Internet Mathematics
Cited by 192 (7 self)
Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying ...
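The truncated abstract does not say which generative models the survey covers. As an illustration only, the sketch below shows one mechanism that is standard in this debate: multiplicative growth. Each file's size is multiplied by an independent random factor at every step, so its log-size is a sum of i.i.d. terms and is approximately normal, making the size approximately lognormal. The function name, the factor range `[0.5, 2.0]`, and the step count are hypothetical choices, not values from the paper.

```python
import math
import random
import statistics

def multiplicative_sizes(n_files, steps, seed=0):
    """Hypothetical multiplicative generative process for file sizes.

    Every file starts at size 1.0 and is multiplied by an independent
    uniform factor in [0.5, 2.0] (an assumed range) at each step. By the
    central limit theorem applied to the log-size, the result is roughly
    lognormal.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_files):
        size = 1.0
        for _ in range(steps):
            size *= rng.uniform(0.5, 2.0)
        sizes.append(size)
    return sizes

sizes = multiplicative_sizes(n_files=5000, steps=50)
logs = [math.log(s) for s in sizes]
# Log-sizes are roughly symmetric (mean close to median), while the raw
# sizes are strongly right-skewed (mean far above median).
print(statistics.mean(logs), statistics.median(logs))
print(statistics.mean(sizes), statistics.median(sizes))
```

A pure power law would instead show a straight line on a log-log plot of the complementary CDF over the whole range, which is one way the two hypotheses are usually contrasted.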
I Tube, You Tube, Everybody Tubes: Analyzing the World’s Largest User Generated Content Video System
- In Proceedings of the 5th ACM/USENIX Internet Measurement Conference (IMC’07), 2007
Cited by 109 (5 self)
User Generated Content (UGC) is re-shaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better understand the impact of UGC systems, we have analyzed YouTube, the world’s largest UGC VoD system. Based on a large amount of data collected, we provide an in-depth study of YouTube and other similar UGC systems. In particular, we study the popularity life-cycle of videos, the intrinsic statistical properties of requests and their relationship with video age, and the level of content aliasing or of illegal content in the system. We also provide insights on the potential for more efficient UGC VoD systems (e.g. utilizing P2P techniques or making better use of caching). Finally, we discuss the opportunities to leverage the latent demand for niche videos that are not reached today due to information filtering effects or other system scarcity distortions. Overall, we believe that the results presented in this paper are crucial in understanding UGC systems and can provide valuable information to ISPs, site administrators, and content owners with major commercial and technical implications.
Internet topology: connectivity of IP graphs
- 2001
Cited by 69 (6 self)
In this paper we introduce a framework for analyzing local properties of Internet connectivity. We compare BGP and probed topology data, finding that currently probed topology data yields much denser coverage of AS-level connectivity. We describe data acquisition and construction of several IP-level graphs derived from a collection of 220M skitter traceroutes. We find that a graph consisting of IP nodes and links contains 90.5% of its 629K nodes in the acyclic subgraph. In particular, 55% of the IP nodes are in trees. Full bidirectional connectivity is observed for a giant component containing 8.3% of IP nodes.
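"Full bidirectional connectivity" presumably refers to a strongly connected component of the directed IP graph: every node in it can reach every other node. A minimal sketch of how the size of the largest such component can be measured, using Kosaraju's two-pass algorithm written iteratively so that large traceroute-derived graphs do not hit Python's recursion limit. The edge-list input and the toy graph in the usage line are hypothetical, not skitter data.

```python
from collections import defaultdict

def largest_scc_fraction(edges):
    """Fraction of nodes in the largest strongly connected component.

    `edges` is an iterable of directed (u, v) pairs (an assumed input format).
    """
    graph = defaultdict(list)
    reverse = defaultdict(list)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: iterative DFS on the forward graph, recording finishing order.
    visited = set()
    order = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                stack.pop()
                order.append(node)

    # Pass 2: DFS on the reversed graph in decreasing finishing time;
    # each tree found this way is exactly one strongly connected component.
    assigned = set()
    largest = 0
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        size = 0
        stack = [start]
        while stack:
            node = stack.pop()
            size += 1
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        largest = max(largest, size)
    return largest / len(nodes)

# Toy graph: nodes 1, 2, 3 form a cycle; 4 and 5 hang off it one-way.
print(largest_scc_fraction([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]))
```

Nodes 4 and 5 are reachable from the cycle but cannot reach back, so they are excluded, which mirrors how a graph can be mostly tree-like yet still contain a small bidirectionally connected core.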
A Hierarchical Characterization of a Live Streaming Media Workload
- IEEE/ACM Transactions on Networking, 2002
Cited by 64 (8 self)
We present what we believe to be the first thorough characterization of live streaming media content delivered over the Internet. Our characterization of over five million requests spanning a 28-day period is done at three increasingly granular levels, corresponding to clients, sessions, and transfers. Our findings support two important conclusions. First, we show that the nature of interactions between users and objects is fundamentally different for live versus stored objects. Access to stored objects is user driven, whereas access to live objects is object driven. This reversal of active/passive roles of users and objects leads to interesting dualities. For instance, our analysis underscores a Zipf-like profile for user interest in a given object, which is to be contrasted to the classic Zipf-like popularity of objects for a given user. Also, our analysis reveals that transfer lengths are highly variable and that this variability is due to the stickiness of clients to a particular live object, as opposed to structural (size) properties of objects. Second, based on observations we make, we conjecture that the particular characteristics of live media access workloads are likely to be highly dependent on the nature of the live content being accessed. In our study, this dependence is clear from the strong temporal correlations we observed in the traces, which we attribute to the synchronizing impact of live content on access characteristics. Based on our analyses, we present a model for live media workload generation that incorporates many of our findings, and which we implement in GISMO [19].
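A Zipf-like profile means that the r-th most popular item receives a share of accesses roughly proportional to 1/r^α. The sketch below is a generic illustration, not the paper's GISMO model: it draws synthetic requests from an assumed Zipf distribution and recovers α from the slope of log frequency against log rank. The object count, α = 0.8, request count, and the cutoff of 50 top ranks are all hypothetical parameters.

```python
import math
import random
from collections import Counter

def zipf_requests(n_objects, alpha, n_requests, seed=0):
    """Draw request targets so that object at rank r has weight 1 / r**alpha."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]
    return rng.choices(range(n_objects), weights=weights, k=n_requests)

def rank_frequency_slope(requests, top=50):
    """Least-squares estimate of alpha from the top `top` ranks on log-log axes."""
    counts = sorted(Counter(requests).values(), reverse=True)[:top]
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # Frequency falls with rank, so the fitted slope is negative; alpha is its negation.
    return -cov / var

requests = zipf_requests(n_objects=1000, alpha=0.8, n_requests=200_000)
print(rank_frequency_slope(requests))
```

Fitting only the top ranks avoids the noisy tail where objects receive very few requests, at the cost of ignoring any curvature that appears further down the ranking.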
Dynamic models for file sizes and double pareto distributions
- Internet Mathematics, 2002
Cited by 37 (0 self)
In this paper, we introduce and analyze a new, dynamic generative user model to explain the behavior of file size distributions. Our Recursive Forest File model combines multiplicative models that generate lognormal distributions with recent work on random graph models for the web. Unlike similar previous work, our Recursive Forest File model allows new files to be created and old files to be deleted over time, and our analysis covers problematic issues such as correlation among file sizes. Moreover, our model allows natural variations where files that are copied or modified are more likely to be copied or modified subsequently. Previous empirical work suggests that file sizes tend to have a lognormal body but a Pareto tail. The Recursive Forest File model explains this behavior, yielding a double Pareto distribution, which has a Pareto tail but a body close to lognormal. We believe the Recursive Forest File model may be useful for describing other power law phenomena in computer systems as well as other fields.
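The Recursive Forest File model itself is not reproduced here. As a simplified stand-in, the sketch below shows a well-known way a multiplicative process yields double Pareto behavior: if each file receives a random (geometric) number of multiplicative updates instead of a fixed number, the log-size becomes a mixture of normals with exponential tails on both sides, which means power-law tails in the size itself. The functions, σ = 0.5, and the step counts are hypothetical; excess kurtosis of the log-sizes is used as a rough diagnostic (about 0 for a normal, about 3 for a Laplace).

```python
import math
import random
import statistics

def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for normal data, near 3 for Laplace data."""
    m = statistics.fmean(values)
    var = statistics.fmean([(v - m) ** 2 for v in values])
    fourth = statistics.fmean([(v - m) ** 4 for v in values])
    return fourth / var ** 2 - 3.0

def fixed_step_sizes(n_files, steps, sigma=0.5, seed=0):
    """Every file gets exactly `steps` lognormal factors: the size is lognormal."""
    rng = random.Random(seed)
    return [math.exp(sum(rng.gauss(0.0, sigma) for _ in range(steps)))
            for _ in range(n_files)]

def random_stopped_sizes(n_files, mean_steps, sigma=0.5, seed=0):
    """The number of factors is geometric with mean `mean_steps` (assumed stopping rule)."""
    rng = random.Random(seed)
    p_stop = 1.0 / (mean_steps + 1)
    sizes = []
    for _ in range(n_files):
        log_size = 0.0
        # Keep applying multiplicative updates until the file "stops" changing.
        while rng.random() >= p_stop:
            log_size += rng.gauss(0.0, sigma)
        sizes.append(math.exp(log_size))
    return sizes

fixed_logs = [math.log(s) for s in fixed_step_sizes(20_000, 20)]
random_logs = [math.log(s) for s in random_stopped_sizes(20_000, 20)]
print(excess_kurtosis(fixed_logs), excess_kurtosis(random_logs))
```

Both processes have the same average number of updates; only the randomness of the stopping time turns the lognormal into a distribution with heavier, Pareto-like tails.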
A five-year study of file-system metadata
- In Proceedings of the 5th USENIX Conference on File and Storage Technologies, USENIX Association, 2007
Cited by 37 (4 self)
For five years, we collected annual snapshots of file-system metadata from over 60,000 Windows PC file systems in a large corporation. In this article, we use these snapshots to study temporal changes in file size, file age, file-type frequency, directory size, namespace structure, file-system population, storage capacity and consumption, and degree of file modification. We present a generative model that explains the namespace structure and the distribution of directory sizes. We find significant temporal trends relating to the popularity of certain file types, the origin of file content, the way the namespace is used, and the degree of variation among file systems, as well as more pedestrian changes in size and capacities. We give examples of consequent lessons for designers of file systems and related software.
Evidence for long-tailed distributions in the Internet
- In Proceedings of ACM SIGCOMM Internet Measurement Workshop, 2001
Cited by 29 (0 self)
We review evidence that Internet traffic is characterized by long-tailed distributions of interarrival times, transfer times, burst sizes and burst lengths. We propose a new statistical technique for identifying long-tailed distributions, and apply it to a variety of datasets collected on the Internet. We find that there is little evidence that interarrival times and transfer times are long-tailed, but that there is some evidence for long-tailed burst sizes. We speculate on the causes of long-tailed bursts.
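The abstract does not describe the paper's new technique, so the sketch below substitutes a standard, widely used tool for the same question: the Hill estimator of the tail index α. It is not the method proposed in the paper. For a Pareto tail, the log-excesses of the k largest observations over the (k+1)-th largest are roughly exponential with mean 1/α, so α is estimated as k divided by their sum. The sample size and k = 2000 in the usage line are arbitrary illustrative choices.

```python
import math
import random

def hill_estimator(samples, k):
    """Hill estimate of the tail index from the k largest positive observations."""
    if not 0 < k < len(samples):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("Hill estimator needs positive tail values")
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    return k / log_excess

rng = random.Random(1)
# random.paretovariate(alpha) draws from a Pareto distribution with minimum 1.
pareto_sample = [rng.paretovariate(1.5) for _ in range(50_000)]
print(hill_estimator(pareto_sample, k=2000))
```

In practice the estimate is plotted against k; a stable plateau suggests a genuine power-law tail, while a steady drift (as for exponential or lognormal data) argues against one.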
Characterization of national Web domains
- ACM Transactions on Internet Technology, 2005
Cited by 22 (8 self)
During the last few years, several studies on the characterization of the public Web space of various national domains have been published. The pages of a country are an interesting set for studying the characteristics of the Web, because at the same time these are diverse (as they are written by several authors) and yet rather similar (as they share a common geographical, historical and cultural context). This paper discusses the methodologies used for presenting the results of Web characterization studies, including the granularity at which different aspects are presented, and a separation of concerns between contents, links, and technologies. Based on this, we present a side-by-side comparison of the results of 12 Web characterization studies comprising over 120 million pages from 24 countries. The comparison unveils similarities and differences between the collections, and sheds light on how certain results of a single Web characterization study on a sample may be valid in the context of the full Web.
Variable Heavy Tailed Durations in Internet Traffic
Cited by 18 (6 self)
This paper studies tails of the duration distribution of internet data flows, and their "heaviness". Data analysis ...
Generating Realistic Impressions for File-System Benchmarking
- In Proceedings of the 7th Conference on File and Storage Technologies (FAST ’09), 2009

