## A Stochastic Model for the Evolution of the Web (2002)

Venue: Computer Networks

Citations: 16 (5 self)

### BibTeX

@ARTICLE{Levene02astochastic,

author = {Mark Levene and Trevor Fenner and George Loizou and Richard Wheeldon},

title = {A Stochastic Model for the Evolution of the Web},

journal = {Computer Networks},

year = {2002},

volume = {39},

pages = {2002}

}

### Abstract

Recently several authors have proposed stochastic models of the growth of the Web graph that give rise to power-law distributions. These models are based on the notion of preferential attachment leading to the "rich get richer" phenomenon. However, these models fail to explain several distributions arising from empirical results, due to the fact that the predicted exponent is not consistent with the data. To address this problem, we extend the evolutionary model of the Web graph by including a non-preferential component, and we view the stochastic process in terms of an urn transfer model. By making this extension, we can now explain a wider variety of empirically discovered power-law distributions provided the exponent is greater than two. These include: the distribution of incoming links, the distribution of outgoing links, the distribution of pages in a Web site and the distribution of visitors to a Web site. A by-product of our results is a formal proof of the convergence of the standard stochastic model (first proposed by Simon).
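The mixture of preferential and non-preferential selection described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' exact urn transfer model: it assumes that at each step a new item with count 1 is created with probability `p`, and that otherwise an existing item gains one unit, chosen in proportion to its current count with probability `q` and uniformly at random otherwise. The function name `simulate_urns` and the parameter `q` are hypothetical, and "urn i" is taken to mean the set of items whose count is i.

```python
import random
from collections import Counter


def simulate_urns(steps, p, q, seed=0):
    """Simon-style growth with a mixed selection rule (illustrative only).

    p: probability of creating a new item with count 1 (assumed name).
    q: probability that an existing item is picked preferentially, i.e. in
       proportion to its count; otherwise it is picked uniformly (assumed).
    Returns a Counter mapping count i -> number of items with that count.
    """
    rng = random.Random(seed)
    counts = [1]   # counts[j] is the current count of item j
    tokens = [0]   # one entry per unit of count, so a uniform draw from
                   # tokens selects an item in proportion to its count
    for _ in range(steps):
        if rng.random() < p:
            # Growth step: a new item enters with count 1.
            counts.append(1)
            tokens.append(len(counts) - 1)
        else:
            if rng.random() < q:
                j = rng.choice(tokens)            # preferential choice
            else:
                j = rng.randrange(len(counts))    # non-preferential choice
            counts[j] += 1
            tokens.append(j)
    return Counter(counts)
```

For example, `simulate_urns(200_000, p=0.125, q=0.9)` returns the number of items in each urn. Plotting those numbers against i on log-log axes gives a rough view of the tail, and under these assumptions lowering `q` should make the tail steeper.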

### Citations

1258 | On power-law relationships of the Internet topology
- Faloutsos, Faloutsos, et al.
- 1999
Citation Context ...ntributions is proportional to $n^{-2}$. (We refer the reader to [Sch91] for more examples of power-law distributions.) Recently several researchers have detected power-law distributions in the Internet [FFF99] and World-Wide-Web [BKM+00, DKM+01] topologies. In order to understand how these power-law distributions emerge and how the Web has evolved and is evolving, several researchers have recently been...

249 | On a class of skew distribution functions
- Simon
- 1955
Citation Context ... with the empirical value of the exponent of the distribution of incoming links provided A/m is sufficiently small. Bornholdt and Ebel [BE00] pointed out that the stochastic process proposed by Simon [Sim55] in 1955 can also offer an explanation of the power-law distribution. (We note that during the period of 1959-1961 there was a fierce debate between Mandelbrot and Simon in Information and Control on ...

224 | Scale-free characteristics of random networks: the topology of the world-wide web. Physica A: Statistical Mechanics and its Applications
- Barabási, Albert, et al.
- 2000
Citation Context ...hus Web pages having more incoming links are more highly recommended and therefore potentially of higher quality. This observation is the basis of Google’s PageRank algorithm [Hen01]. Albert et al. [ABJ00] studied a stochastic model of growth and preferential attachment, where new links to existing Web pages are added in proportion to the number of incoming links these Web pages already have. Their the...

176 | The web as a graph
- Kumar, Raghavan, et al.
- 2000

108 | A general theory of bibliometric and other cumulative advantage processes
- Price, J
- 1976
Citation Context ...versely proportional to their rank, and Lotka’s law [Nic89], which is an inverse square law stating that the number of authors making n contributions is proportional to $n^{-2}$. (We refer the reader to [Sch91] for more examples of power-law distributions.) Recently several researchers have detected power-law distributions in the Internet [FFF99] and World-Wide-Web [BKM+00, DKM+01] topologies. In order ...

104 | Extracting large-scale knowledge bases from the web
- Kumar, Raghavan, et al.
- 1999
Citation Context ... number of incoming links (referred to as inlinks) to a node. This value was derived from a 203 million node crawl of the Web graph. The average number of inlinks per Web page was measured at about 8 [KRRT99], which gives us a value of 0.125 for p. We can compute α by $\alpha = \frac{\rho(1 - p) - 1}{p}$. Thus a more accurate model of the stochastic process generating the distribution of incoming links would assume α ≈ −0...

84 | Structure of growing networks with preferential linking
- Dorogovtsev, Mendes, et al.
- 2000
Citation Context ...ages already have. Their theoretical model predicts an exponent τ = 3, which is not in agreement with the value of approximately 2.1 obtained from the study reported in [BKM+00]. Dorogovtsev et al. [DMS00a] generalise Albert et al.’s model and predict an exponent greater than two. More precisely, they obtain the value 2 + A/m for the exponent, where A is the initial attractiveness of a newly created Web...

81 | Urn Models and Their Application: An Approach to Modern Discrete Probability Theory
- Johnson, Kotz
- 1977
Citation Context ...he number of incoming links. Our main contribution in this paper is to extend Simon’s model [Sim55] with a non-preferential component and view the stochastic process in terms of an urn transfer model [JK77]. (We note that, at the end of Section 3 of his seminal paper, Simon suggested adopting a mixture of preferential and non-preferential components but did not develop the idea.) By making this extensio...

72 | Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise
- Schroeder
- 1991
Citation Context ...versely proportional to their rank, and Lotka’s law [Nic89], which is an inverse square law stating that the number of authors making n contributions is proportional to $n^{-2}$. (We refer the reader to [Sch91] for more examples of power-law distributions.) Recently several researchers have detected power-law distributions in the Internet [FFF99] and World-Wide-Web [BKM+00, DKM+01] topologies. In order ...

71 | Self-similarity in the web
- Dill, Kumar, et al.

66 | Hyperlink analysis for the Web
- Henzinger
- 2001
Citation Context ...mmendation of page Q; thus Web pages having more incoming links are more highly recommended and therefore potentially of higher quality. This observation is the basis of Google’s PageRank algorithm [Hen01]. Albert et al. [ABJ00] studied a stochastic model of growth and preferential attachment, where new links to existing Web pages are added in proportion to the number of incoming links these Web pages ...

42 | Evolutionary dynamics of the World Wide Web
- Huberman, Adamic
- 1999
Citation Context ..., for example, in order to maintain the local structure of a Web site. Another interpretation of i is the number of pages within Web sites (referred to as webpages). In this case, Huberman and Adamic [HA99] reported a power-law distribution with exponent 1.85, derived from a 250,000 Web site crawl. Our model cannot explain this observation as the exponent is less than two. A more recent result from a pr...

35 | The nature of markets on the World Wide Web
- Adamic, Huberman
- 2000
Citation Context ... by adding certain generated pages. As a final interpretation, let i be the number of users visiting a Web site during the course of a day (referred to as visitors). In this case, Adamic and Huberman [AH00] reported a power-law distribution with exponent 2.07, derived from access logs of 60,000 AOL users accessing 120,000 Web sites. Now, from www.netsizer.com we obtain the statistic that in the USA ther...

18 | Sizing the internet
- Murray, Moore
- 2000
Citation Context ...exponent of 2.2, derived from a 1.6 million Web site crawl; the difference is possibly due to a different crawling strategy. To calculate p we can estimate the size of the Web to be 2.1 billion pages [MM00] distributed over approximately 113.5 million Web sites (this number, which was reported on www.netsizer.com during the first quarter of 2001, refers to the number of Internet hosts, so it is an ove...

15 | A note on a class of skew distribution functions: Analysis and critique of a paper by
- Mandelbrot
- 1959
Citation Context ...lanation of the power-law distribution. (We note that during the period of 1959-1961 there was a fierce debate between Mandelbrot and Simon in Information and Control on the validity of Simon’s model [Man59].) In reply to Bornholdt and Ebel, Dorogovtsev et al. [DMS00b] note that the model they describe in [DMS00a] essentially coincides with Simon’s model. The models discussed above are based on the proce...

12 | “The Nature of Markets in the World Wide Web,” Quarterly
- Huberman
- 2000
Citation Context ...ppendix. As far as we are aware, our convergence proof given in the Appendix is the first formal proof validating Simon’s model - it does not rely on the mean-field theory approach, as for example in [BAJ99]. 2 An Urn Transfer Model We now present an urn transfer model [JK77] for a stochastic process that we will use in Section 3 to analyse the evolution of the Web graph. Our model is an extension of Sim...

9 | WWW and internet models from 1955 till our day and the “popularity is attractive” principle
- Dorogovtsev, Mendes, et al.
Citation Context ...the period of 1959-1961 there was a fierce debate between Mandelbrot and Simon in Information and Control on the validity of Simon’s model [Man59].) In reply to Bornholdt and Ebel, Dorogovtsev et al. [DMS00b] note that the model they describe in [DMS00a] essentially coincides with Simon’s model. The models discussed above are based on the process of preferential attachment and do not take into account the...

8 | World-wide web scaling exponent from Simon’s 1955 model
- Bornholdt, Ebel
Citation Context ...step of the stochastic process. This exponent value is consistent with the empirical value of the exponent of the distribution of incoming links provided A/m is sufficiently small. Bornholdt and Ebel [BE00] pointed out that the stochastic process proposed by Simon [Sim55] in 1955 can also offer an explanation of the power-law distribution. (We note that during the period of 1959-1961 there was a fierce ...

7 | Mean-field theory for scale-free random networks
- Barabási, Albert, et al.
- 1999
Citation Context ...ppendix. As far as we are aware, our convergence proof given in the Appendix is the first formal proof validating Simon’s model - it does not rely on the mean-field theory approach, as for example in [BAJ99]. 2 An Urn Transfer Model We now present an urn transfer model [JK77] for a stochastic process that we will use in Section 3 to analyse the evolution of the Web graph. Our model is an extension of Sim...

4 | Bibliometric modeling processes and the empirical validity of Lotka’s law
- Nicholls
- 1989
Citation Context ...tional to $i^{-\tau}$. Power-law distributions are abundant, for example Zipf’s law [Rap82], which states that relative frequency of words in a text is inversely proportional to their rank, and Lotka’s law [Nic89], which is an inverse square law stating that the number of authors making n contributions is proportional to $n^{-2}$. (We refer the reader to [Sch91] for more examples of power-law distributions.) Rece...

4 | Zipf's Law Re-visited
- Rapoport
- 1982
Citation Context ...butions are scale-free in the sense that if i is rescaled by multiplying it by a constant, then f(i) would still be proportional to $i^{-\tau}$. Power-law distributions are abundant, for example Zipf’s law [Rap82], which states that relative frequency of words in a text is inversely proportional to their rank, and Lotka’s law [Nic89], which is an inverse square law stating that the number of authors making n c...

2 | Some Monte Carlo estimates of the Yule distribution
- Simon, Van Wormer
- 1963
Citation Context ...ned with the reported empirical values. (Our simulation is in the spirit of Simon and Van Wormer’s Monte Carlo simulation, whose intention was to test how good the estimates of the original model are [SV63].) We repeated the simulation five times using the pk-model, and five times using the p-model. Each simulation was carried out for 200,000 iterations, and for the purpose of regression we considered o...

1 | Lotka’s law, Price’s urn and electronic publishing
- Koenig, Harrell
- 1995

1 | Graph structure
- Broder, Kumar, et al.