Abstract:
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons (mathematical, sociological, and commercial) for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that emerged as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs and raise questions about the analysis of graph algorithms on the Web.
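The abstract's central objects, the Web as a directed graph of pages and links and the claim that traditional random graph models fail to explain measured properties, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's method: `random_digraph` is a G(n, m)-style model in which every link endpoint is chosen uniformly at random, and `copying_digraph` is a hypothetical process in which a new page copies most of its links from a randomly chosen existing "prototype" page. The function names, the parameters `d` (links per page) and `alpha` (probability of a fresh random link), and the seed values are all illustrative choices, and the abstract does not say whether the paper's own models take this form.

```python
import random
from collections import Counter


def random_digraph(n, m, rng):
    """G(n, m)-style sketch: m directed edges with uniformly random endpoints.
    Self-loops are skipped; parallel edges are allowed for simplicity."""
    edges = []
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.append((u, v))
    return edges


def copying_digraph(n, d, alpha, rng):
    """Hypothetical link-copying process (illustrative assumption).
    Each new page v gets d out-links. It picks a random earlier 'prototype'
    page; for each link slot i it links to a uniformly random earlier page
    with probability alpha, otherwise it copies the prototype's i-th link."""
    out = {0: [0] * d}  # seed page 0; its placeholder links are not edges
    edges = []
    for v in range(1, n):
        proto = rng.randrange(v)
        links = [rng.randrange(v) if rng.random() < alpha else out[proto][i]
                 for i in range(d)]
        out[v] = links
        edges.extend((v, u) for u in links)
    return edges


def in_degree_histogram(edges, n):
    """Map each in-degree k to the number of pages with exactly k in-links."""
    indeg = Counter(dst for _, dst in edges)
    return Counter(indeg[v] for v in range(n))


if __name__ == "__main__":
    rng = random.Random(1999)
    n, d = 20000, 4
    graphs = {
        "uniform": random_digraph(n, (n - 1) * d, rng),
        "copying": copying_digraph(n, d, 0.2, rng),
    }
    for name, edges in graphs.items():
        hist = in_degree_histogram(edges, n)
        print(f"{name}: max in-degree {max(hist)}, "
              f"pages with no in-links {hist[0]}")
```

Both graphs have the same number of pages and links, so any difference in their in-degree histograms comes from how links are chosen. In the uniform model every page's in-degree stays close to the average d, while the copying rule tends to pile links onto a few early pages and leave many with none. This is the kind of gap between a measured degree profile and a traditional model that the abstract alludes to, though the abstract itself does not state which measurements are involved.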