The Web as a graph: measurements, models, and methods
, 1999
Cited by 309 (11 self)
The pages and hyperlinks of the World Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons (mathematical, sociological, and commercial) for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new subfield of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
Evolution of networks
 Adv. Phys
, 2002
Cited by 282 (2 self)
We review the recent fast progress in statistical physics of evolving networks. Interest has focused mainly on the structural properties of random complex networks in communications, biology, social sciences and economics. A number of giant artificial networks of such a kind came into existence recently. This opens a wide field for the study of their topology, evolution, and complex processes occurring in them. Such networks possess a rich set of scaling properties. A number of them are scale-free and show striking resilience against random breakdowns. In spite of the large sizes of these networks, the distances between most of their vertices are short, a feature known as the "small-world" effect. We discuss how growing networks self-organize into scale-free structures and the role of the mechanism of preferential linking. We consider the topological and structural properties of evolving networks, and percolation in these networks. We present a number of models demonstrating the main features of evolving networks and discuss current approaches for their simulation and analytical study. Applications of the general results to particular networks in Nature are discussed. We demonstrate the generic connections of the network growth processes with the general problems
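The preferential linking mechanism mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest variant, in which each new vertex attaches a single edge to an existing vertex chosen with probability proportional to its degree; the function names and the choice of 2000 vertices are illustrative and do not come from the reviewed paper.

```python
import random
from collections import Counter

def grow_preferential(n_nodes, seed=0):
    """Grow a graph one vertex at a time; each new vertex links to one
    existing vertex chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each vertex appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over vertices.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges

def degrees(edges):
    """Count how many edges touch each vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

edges = grow_preferential(2000)
deg = degrees(edges)
# Number of vertices having each degree; a heavy tail is expected here.
degree_histogram = Counter(deg.values())
```

Plotting `degree_histogram` on log-log axes is one way to inspect whether the tail looks roughly like a straight line, which is the informal signature of a scale-free degree distribution.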
A Brief History of Generative Models for Power Law and Lognormal Distributions
 INTERNET MATHEMATICS
Cited by 257 (7 self)
Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying ...
Stochastic Models for the Web Graph
, 2000
Cited by 225 (11 self)
The web may be viewed as a directed graph each of whose vertices is a static HTML web page, and each of whose edges corresponds to a hyperlink from one web page to another. In this paper we propose and analyze random graph models inspired by a series of empirical observations on the web. Our graph models differ from the traditional G_{n,p} models in two ways: 1. Independently chosen edges do not result in the statistics (degree distributions, clique multitudes) observed on the web. Thus, edges in our model are statistically dependent on each other. 2. Our model introduces new vertices in the graph as time evolves. This captures the fact that the web is changing with time. Our results are twofold: we show that graphs generated using our model exhibit the statistics observed on the web graph, and additionally, that natural graph models proposed earlier do not exhibit them. This remains true even when these earlier models are generalized to account for the arrival of vertices over time. In particular, the sparse random graphs in our models exhibit properties that do not arise in far denser random graphs generated by Erdős-Rényi models.
Power laws, Pareto distributions and Zipf’s law
 Contemporary Physics
, 2005
Cited by 186 (0 self)
When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them.
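As an illustration of the definition in this abstract, the sketch below draws samples from a continuous power law p(x) ∝ x^(-alpha) for x ≥ x_min by inverse-transform sampling, then recovers the exponent with the standard maximum-likelihood estimator alpha ≈ 1 + n / Σ ln(x_i / x_min). The exponent 2.5, the sample size, and the function names are assumptions chosen for the example, not values taken from the paper.

```python
import math
import random

def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n samples with density proportional to x**(-alpha) on [x_min, inf).

    Uses the inverse CDF: x = x_min * (1 - u) ** (-1 / (alpha - 1)).
    """
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

def estimate_alpha(xs, x_min=1.0):
    """Maximum-likelihood estimate of the exponent of a continuous power law."""
    return 1.0 + len(xs) / sum(math.log(x / x_min) for x in xs)

xs = sample_power_law(50_000, alpha=2.5)
alpha_hat = estimate_alpha(xs)
```

With this many samples the estimate should land close to the true exponent; fitting a straight line to a log-log histogram is a common alternative but is generally considered less reliable than the likelihood-based estimate.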
Coauthorship networks and patterns of scientific collaboration
 In Proceedings of the National Academy of Sciences
, 2004
Cited by 127 (0 self)
Using data from three bibliographic databases in biology, physics, and mathematics respectively, networks are constructed in which the nodes are scientists and two scientists are connected if they have coauthored a paper together. We use these networks to answer a broad variety of questions about collaboration patterns, such as the numbers of papers authors write, how many people they write them with, what the typical distance between scientists is through the network, and how patterns of collaboration vary between subjects and over time. We also summarize a number of recent results by other authors on coauthorship patterns.
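The network construction described in this abstract (scientists as nodes, an edge whenever two scientists share a paper) and the notion of distance through the network can be sketched as follows. The author lists are made-up placeholder data, and the helper names are hypothetical; the paper itself works with large bibliographic databases.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["A", "B"], ["B", "C", "D"], ["D", "E"], ["F", "G"]]

def build_coauthor_graph(papers):
    """Map each author to the set of authors they have shared a paper with."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def distance(graph, source, target):
    """Shortest-path length between two authors via breadth-first search.

    Returns None when no chain of coauthorships connects them.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, d = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return d + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, d + 1))
    return None

graph = build_coauthor_graph(papers)
```

Here `distance(graph, "A", "E")` follows the chain A, B, D, E, while authors in separate components such as "A" and "F" have no finite distance.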
Extracting Large-Scale Knowledge Bases From the Web
 Proceedings of the 25th VLDB Conference
, 1999
Cited by 106 (2 self)
The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities.
Network dynamics and field evolution: The growth of interorganizational collaboration in the life sciences
 American Journal of Sociology
, 2005
Cited by 74 (7 self)
A recursive analysis of network and institutional evolution is offered to account for the decentralized structure of the commercial field of the life sciences. Four alternative logics of attachment (accumulative advantage, homophily, follow-the-trend, and multiconnectivity) are tested to explain the structure and dynamics of interorganizational collaboration in biotechnology. Using multiple novel methods, the authors demonstrate how different rules for affiliation shape network evolution. Commercialization strategies pursued by early corporate entrants are supplanted by universities, research institutes, venture capital, and small firms. As organizations increase their collaborative activities and diversify their ties to others, cohesive subnetworks form, characterized by multiple, independent pathways. These structural components, in turn, condition the choices and opportunities available to members of a field, thereby reinforcing an attachment logic based on differential connections to diverse partners.
ON THE COVERINGS OF GRAPHS
, 1980
Cited by 70 (6 self)
Let $p(n)$ denote the smallest integer with the property that any graph with $n$ vertices can be covered by $p(n)$ complete bipartite subgraphs. We prove a conjecture of J.C. Bermond by showing $p(n) = n + o(n^{11/14+\epsilon})$ for any positive $\epsilon$.
SP^2Bench: A SPARQL performance benchmark
 In ICDE
, 2009
Cited by 54 (6 self)
Recently, the SPARQL query language for RDF has reached the W3C recommendation status. In response to this emerging standard, the database community is currently exploring efficient storage techniques for RDF data and evaluation strategies for SPARQL queries. A meaningful analysis and comparison of these approaches necessitates a comprehensive and universal benchmark platform. To this end, we have developed SP^2Bench, a publicly available, language-specific SPARQL performance benchmark. SP^2Bench is settled in the DBLP scenario and comprises both a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. As a proof of concept, we apply SP^2Bench to existing engines and discuss their strengths and weaknesses that follow immediately from the benchmark results.