Results 1-10 of 115
A Brief History of Generative Models for Power Law and Lognormal Distributions
INTERNET MATHEMATICS
Cited by 252 (7 self)
Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying ...
I Tube, You Tube, Everybody Tubes: Analyzing the World’s Largest User Generated Content Video System
In Proceedings of the 5th ACM/USENIX Internet Measurement Conference (IMC’07), 2007
Cited by 198 (6 self)
User Generated Content (UGC) is reshaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better understand the impact of UGC systems, we have analyzed YouTube, the world’s largest UGC VoD system. Based on a large amount of data collected, we provide an in-depth study of YouTube and other similar UGC systems. In particular, we study the popularity lifecycle of videos, the intrinsic statistical properties of requests and their relationship with video age, and the level of content aliasing or of illegal content in the system. We also provide insights on the potential for more efficient UGC VoD systems (e.g. utilizing P2P techniques or making better use of caching). Finally, we discuss the opportunities to leverage the latent demand for niche videos that are not reached today due to information filtering effects or other system scarcity distortions. Overall, we believe that the results presented in this paper are crucial in understanding UGC systems and can provide valuable information to ISPs, site administrators, and content owners with major commercial and technical implications.
Power laws, Pareto distributions and Zipf’s law
Contemporary Physics, 2005
Cited by 170 (0 self)
When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf’s law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people’s personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them.
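As a rough illustration of the definition in this abstract, the sketch below draws values whose density is proportional to x^(-alpha) for x >= xmin and then recovers the exponent by maximum likelihood. The function names, the exponent 2.5, the cutoff xmin = 1 and the sample size are illustrative assumptions and are not taken from the listing.

```python
import math
import random


def sample_power_law(alpha, xmin, n, rng):
    """Draw n values with density proportional to x**(-alpha) for x >= xmin."""
    # Inverse-transform sampling: the CDF is 1 - (x / xmin)**(1 - alpha),
    # so x = xmin * (1 - u)**(-1 / (alpha - 1)) for uniform u in [0, 1).
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_alpha(xs, xmin):
    """Maximum-likelihood estimate of the exponent of a continuous power law."""
    return 1.0 + len(xs) / sum(math.log(x / xmin) for x in xs)


# Hypothetical parameters chosen only for the demonstration.
rng = random.Random(0)
samples = sample_power_law(alpha=2.5, xmin=1.0, n=50_000, rng=rng)
alpha_hat = estimate_alpha(samples, xmin=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

With a sample this large the estimate should land close to the exponent used for sampling. On real data the cutoff xmin usually has to be estimated as well, which this sketch does not attempt.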
A General Model of Web Graphs
2003
Cited by 82 (7 self)
We describe a very general model of a random graph process whose proportional degree sequence obeys a power law. Such laws have recently been observed in graphs associated with the world wide web.
Stochastic Models and Descriptive Statistics for Phylogenetic Trees, from Yule to Today
STATIST. SCI, 2001
Cited by 62 (4 self)
Yule (1924) observed that distributions of number of species per genus were typically long-tailed, and proposed a stochastic model to fit this data. Modern taxonomists often prefer to represent relationships between species via phylogenetic trees; the counterpart to Yule's observation is that actual reconstructed trees look surprisingly unbalanced. The imbalance can readily be seen via a scatter diagram of the sizes of clades involved in the splits of published large phylogenetic trees. Attempting stochastic modeling leads to two puzzles. First, two somewhat opposite possible biological descriptions of what dominates the macroevolutionary process (adaptive radiation; "neutral" evolution) lead to exactly the same mathematical model (Markov or Yule or coalescent). Second, neither this nor any other simple stochastic model predicts the observed pattern of imbalance. This essay represents a probabilist's musings on these puzzles, complementing the more detailed survey of biol...
Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle
JOURNAL OF COMPUTATIONAL BIOLOGY, 2002
Cited by 57 (5 self)
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n³) time, where n is the number of taxa, while the current implementations of the second are in O(p n³) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n²) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n² + p n), i.e., O(n²) in practice because p is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard “unweighted” OLS, where
The economics of social networks
PROCEEDINGS OF THE 9TH WORLD CONGRESS OF THE ECONOMETRIC SOCIETY, 2005
Cited by 53 (2 self)
The science of social networks is a central field of sociological study, a major application of random graph theory, and an emerging area of study by economists, statistical physicists and computer scientists. While these literatures are (slowly) becoming aware of each other, and on occasion drawing from one another, they are still largely distinct in their methods, interests, and goals. Here, my aim is to provide some perspective on the research from these literatures, with a focus on the formal modeling of social networks and the two major types of models: those based on random graphs and those based on game theoretic reasoning. I highlight some of the strengths, weaknesses, and potential synergies between these two network modeling approaches.
Counting graph homomorphisms
In: Topics in Discrete Math, 2006
Cited by 32 (8 self)
Counting homomorphisms between graphs (often with weights) comes up in a wide variety of areas, including extremal graph theory, properties of graph products, partition functions in statistical physics and property testing of large graphs. In this paper we survey recent developments in the study of homomorphism numbers, including the characterization of the homomorphism numbers in terms of the semidefiniteness of “connection matrices”, and some applications of this fact in extremal graph theory. We define a distance of two graphs in terms of similarity of their global structure, which also reflects the closeness of (appropriately scaled) homomorphism numbers into the two graphs. We use homomorphism numbers to define convergence of a sequence of graphs, and show that a graph sequence is convergent if and only if it is Cauchy in this distance. Every convergent graph sequence has a limit in the form of a symmetric measurable function in two variables. We use these notions of distance and graph limits to give a general theory for parameter testing. The convergence can also be characterized in terms of mappings of the graphs into fixed small graphs, which is strongly connected to important parameters like ground state energy in statistical physics, and to weighted maximum cut problems in computer science.
A Geometric Preferential Attachment Model of Networks
In Algorithms and Models for the WebGraph: Third International Workshop, WAW 2004, 2004
Cited by 32 (2 self)
We study a random graph G_n that combines certain aspects of geometric random graphs and preferential attachment graphs. This model yields a graph with power-law degree distribution where the expansion property depends on a tunable parameter of the model. The vertices of G_n are n sequentially generated points x_1, x_2, ..., x_n chosen uniformly at random from the unit sphere in R^3. After generating x_t, we randomly connect it to m points from those points in x_1, x_2, ..., x_{t−1}.
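The construction described in this abstract can be sketched roughly as follows. The abstract is cut off before it states how the m neighbours are chosen, so the selection rule here is an assumption: candidates are the earlier points within Euclidean distance r of the new point, picked with probability proportional to degree plus one. The function names and the parameters r, n and m in the demonstration are hypothetical.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform point on the unit sphere in R^3 (a normalised Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def geometric_pa_graph(n, m, r, rng):
    """Grow a multigraph on n sphere points; each new point adds up to m edges."""
    points, edges, degree = [], [], []
    for t in range(n):
        x = random_unit_vector(rng)
        # Assumed rule: only earlier points within distance r are candidates.
        near = [i for i in range(t) if math.dist(x, points[i]) <= r]
        points.append(x)
        degree.append(0)
        if not near:
            continue  # no candidate in range, so the new point stays isolated
        # Assumed rule: weight by degree + 1, using degrees at the start of the step.
        weights = [degree[i] + 1 for i in near]
        # choices() samples with replacement, so repeated edges are possible.
        for target in rng.choices(near, weights=weights, k=m):
            edges.append((t, target))
            degree[t] += 1
            degree[target] += 1
    return points, edges, degree


# Hypothetical parameters chosen only for the demonstration.
points, edges, degree = geometric_pa_graph(n=300, m=2, r=0.8, rng=random.Random(1))
print("edges:", len(edges), "max degree:", max(degree))
```

The radius r plays the role of a tunable locality parameter in this sketch: a small r keeps edges local, while r = 2 makes every earlier point a candidate and reduces the rule to ordinary degree-weighted attachment.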
Nonequilibrium phase transition in the coevolution of networks and opinions
Growth dynamics of the WorldWide Web,” Nature: VOL 401: 9 SEPTEMBER, 2006
Cited by 28 (2 self)
Models of the convergence of opinion in social systems have been the subject of a considerable amount of recent attention in the physics literature. These models divide into two classes, those in which individuals form their beliefs based on the opinions of their neighbors in a social network of personal acquaintances, and those in which, conversely, network connections form between individuals of similar beliefs. While both of these processes can give rise to realistic levels of agreement between acquaintances, practical experience suggests that opinion formation in the real world is not a result of one process or the other, but a combination of the two. Here we present a simple model of this combination, with a single parameter controlling the balance of the two processes. We find that the model undergoes a continuous phase transition as this parameter is varied, from a regime in which opinions are arbitrarily diverse to one in which most individuals hold the same opinion. We characterize the static and dynamical properties of this transition.
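One way to picture a model of this kind is the sketch below. The abstract does not spell out the update rule, so the rule used here is an assumption: at each step a random vertex either rewires one of its edges to a vertex that shares its opinion (with probability phi) or adopts the opinion of a random neighbour (otherwise). All names and parameter values are hypothetical.

```python
import random


def coevolve(n, n_edges, n_opinions, phi, steps, rng):
    """Run the assumed rewire-or-adopt dynamics and return the final state."""
    opinion = [rng.randrange(n_opinions) for _ in range(n)]
    # Multigraph stored as adjacency lists; each edge appears in both lists.
    neighbors = [[] for _ in range(n)]
    added = 0
    while added < n_edges:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:  # skip self-loops in the initial random graph
            neighbors[a].append(b)
            neighbors[b].append(a)
            added += 1
    for _ in range(steps):
        i = rng.randrange(n)
        if not neighbors[i]:
            continue  # isolated vertices do nothing
        k = rng.randrange(len(neighbors[i]))
        j = neighbors[i][k]
        if rng.random() < phi:
            # Rewire edge (i, j) to a random other vertex holding i's opinion.
            same = [v for v in range(n) if v != i and opinion[v] == opinion[i]]
            if same:
                new = rng.choice(same)
                neighbors[i][k] = new
                neighbors[j].remove(i)
                neighbors[new].append(i)
        else:
            # Adopt the opinion of the chosen neighbour.
            opinion[i] = opinion[j]
    return opinion, neighbors


# Hypothetical parameters chosen only for the demonstration.
opinion, neighbors = coevolve(
    n=200, n_edges=400, n_opinions=10, phi=0.5, steps=20_000, rng=random.Random(0)
)
print("distinct opinions remaining:", len(set(opinion)))
```

In this sketch phi is the single balance parameter: phi = 1 leaves opinions fixed and only reshapes the network, while phi = 0 leaves the network fixed and only spreads opinions. The number of edges is conserved by construction.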