## The Web as a graph: measurements, models, and methods (1999)

Citations: 305 (11 self)

### BibTeX

@MISC{Kleinberg99theweb,

author = {Jon M. Kleinberg and Ravi Kumar and Prabhakar Raghavan and Sridhar Rajagopalan and Andrew S. Tomkins},

title = {The Web as a graph: measurements, models, and methods},

year = {1999}

}

### Abstract

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons (mathematical, sociological, and commercial) for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.

### Citations

3237 | The anatomy of a large-scale hypertextual Web search engine
- Brin, Page
- 1998
Citation Context ...dges) to other pages, making for a total of several billion hyperlinks in all. There are several reasons for studying the Web graph. The structure of this graph has already led to improved Web search [6, 8, 21, 29], more accurate topic-classification algorithms [11] and has inspired algorithms for enumerating emergent cyber-communities [23]. The hyperlinks themselves represent a fecund source of sociological info... |

2702 | Authoritative sources in a hyperlinked environment
- Kleinberg
- 1998
Citation Context ...dges) to other pages, making for a total of several billion hyperlinks in all. There are several reasons for studying the Web graph. The structure of this graph has already led to improved Web search [6, 8, 21, 29], more accurate topic-classification algorithms [11] and has inspired algorithms for enumerating emergent cyber-communities [23]. The hyperlinks themselves represent a fecund source of sociological info... |

2663 | Fast Algorithms for Mining Association Rules
- Agrawal, Srikant
- 1994
Citation Context ... two examples of projects in this space. Some other examples are W3QS [22], WebQuery [8], Weblog [24], and ParaSite/Squeal [29]. Traditional data mining research (see for instance Agrawal and Srikant [2]) focuses largely on algorithms for finding association rules and related statistical correlation measures in a given dataset. However, efficient methods such as a priori [2] or even more general methodol... |

1792 | Random graphs
- Bollobas
- 2001
Citation Context ... Zipfian (inverse polynomial) distributions [12, 17, 26, 31]. This and other measurements of the frequency of occurrence of certain structures suggest that traditional random graph models such as $G_{n,p}$ [7] are likely to do a poor job of modeling the Web graph. In Section 4 we lay down a framework for a class of random graph models, and give evidence that at least some of our observations about the Web (... |

980 | Human behavior and the principle of least effort - Zipf - 1949 |

666 | The Lorel query language for semistructured data
- Abiteboul, Quass, et al.
- 1997
Citation Context ...studied phenomena in citation; some of these insights have been applied to the Web as well [25]. A view of the Web as a semi-structured database has been advanced by many authors. In particular, LORE [1] and WebSQL [27] use graph-theoretic and relational views of the Web respectively. These views support structured query interfaces to the Web (Lorel [1] and WebSQL [27]) that are evocative of and simi... |

403 | Improved algorithms for topic distillation in a hyperlinked environment
- Henzinger, Bharat
Citation Context ...wn independently. We conclude in Section 5 with a number of directions for further work. 1.2 Related work Analysis of the structure of the Web graph has been used to enhance the quality of Web search [5, 6, 8, 9, 21, 29]. The topics of pages pointed to by a Web page can be used to improve the accuracy of determining the (unknown) topic of this page in the setting of supervised classification [11]. Statistical analysis ... |

277 | Automatic resource compilation by analyzing hyperlink structure and associated text
- Chakrabarti, Dom, et al.
- 1998
Citation Context ...wn independently. We conclude in Section 5 with a number of directions for further work. 1.2 Related work Analysis of the structure of the Web graph has been used to enhance the quality of Web search [5, 6, 8, 9, 21, 29]. The topics of pages pointed to by a Web page can be used to improve the accuracy of determining the (unknown) topic of this page in the setting of supervised classification [11]. Statistical analysis ... |

244 | Querying the World Wide Web
- Mendelzon, Mihaila, et al.
- 1996
Citation Context ...na in citation; some of these insights have been applied to the Web as well [25]. A view of the Web as a semi-structured database has been advanced by many authors. In particular, LORE [1] and WebSQL [27] use graph-theoretic and relational views of the Web respectively. These views support structured query interfaces to the Web (Lorel [1] and WebSQL [27]) that are evocative of and similar to SQL. An a... |

176 | Bibliographic coupling between scientific papers, American Documentation 14(1) (1963) 10–25 - Kessler |

169 | A technique for measuring the relative size and overlap of public Web search engines
- Bharat, Broder
- 1998
Citation Context ...ted graph induced by the hyperlinks between Web pages; we refer to this as the Web graph. For our purposes, nodes represent static html pages and hyperlinks represent directed edges. Recent estimates [4] suggest that there are several hundred million nodes in the Web graph; this quantity is growing by a few percent a month. The average node has roughly seven hyperlinks (directed edges) to other pages,... |

164 | Frequency distribution of scientific productivity - Lotka - 1926 |

127 | Citation analysis as a tool in journal evaluation - Garfield - 1972 |

110 | Finding regular simple paths in graph databases
- Mendelzon, Wood
- 1995
Citation Context ... numbers of "items" (pages) in the Web graph. This number is already two to three orders of magnitude more than the number of items in a typical market basket analysis. The work of Mendelzon and Wood [28] is an instance of structural methods in mining. They argue that the traditional SQL query interface to databases is inadequate in its power to specify several structural queries that are interesting ... |

98 | ParaSite: Mining Structural Information on the Web
- Spertus
- 1997
Citation Context ...dges) to other pages, making for a total of several billion hyperlinks in all. There are several reasons for studying the Web graph. The structure of this graph has already led to improved Web search [6, 8, 21, 29], more accurate topic-classification algorithms [11] and has inspired algorithms for enumerating emergent cyber-communities [23]. The hyperlinks themselves represent a fecund source of sociological info... |

96 | WebQuery: searching and visualizing the Web through connectivity. Proc. Sixth International World Wide Web Conference
- Carriere, Kazman
- 1997

95 | Bibliometrics of the World Wide Web: An exploratory analysis of the intellectual structure of cyberspace
- Larson
- 1996
Citation Context ...in spirit to our proposal, though different in details and application. The field of bibliometrics [14, 16] has studied phenomena in citation; some of these insights have been applied to the Web as well [25]. A view of the Web as a semi-structured database has been advanced by many authors. In particular, LORE [1] and WebSQL [27] use graph-theoretic and relational views of the Web respectively. These vie... |

66 | Applications of a Web query language
- Arocena, Mendelzon, et al.
- 1997
Citation Context ...rly define $y = (y_1, y_2, \ldots, y_n)$. Then the update rule for $x$ can be written as $x \leftarrow A^T y$ and the update rule for $y$ can be written as $y \leftarrow Ax$. Unwinding these one step further, we have $x \leftarrow A^T y \leftarrow A^T A x = (A^T A)x$ (3) and $y \leftarrow Ax \leftarrow AA^T y = (AA^T)y$. (4) Thus the vector $x$ after multiple iterations is precisely the result of applying the power iteration technique to $A^T A$: we multiply our initial iterate by larger and lar... |
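The update rules quoted in this context can be illustrated with a short power-iteration sketch. It is a minimal illustration and not the authors' implementation. It assumes that `adj[i][j]` is 1 when page `i` links to page `j`, that `x` plays the role of authority-style weights and `y` of hub-style weights, and that both vectors are rescaled to unit length after each step. The function name `hits`, the fixed iteration count, and the toy graph are all hypothetical.

```python
import math


def hits(adj, iterations=50):
    """Iterate x <- A^T y and y <- A x, rescaling to unit length each step.

    adj[i][j] is 1 if page i links to page j, else 0 (assumed encoding).
    Returns (x, y) as plain Python lists.
    """
    n = len(adj)
    x = [1.0] * n
    y = [1.0] * n

    def normalize(v):
        # Rescale to unit Euclidean length; leave an all-zero vector alone.
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v] if norm > 0 else v

    for _ in range(iterations):
        # x <- A^T y: each page sums the y-weights of the pages linking to it.
        x = normalize([sum(adj[i][j] * y[i] for i in range(n)) for j in range(n)])
        # y <- A x: each page sums the x-weights of the pages it links to.
        y = normalize([sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)])
    return x, y


# Toy graph (hypothetical): pages 0 and 1 link to pages 2 and 3,
# and page 4 links only to page 3.
links = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
authority, hub = hits(links)
```

In this toy graph page 3 ends up with the largest `x` weight because it has the most in-links from well-weighted pages, while pages 0 and 1 share the largest `y` weight. Rescaling does not change the direction the iteration converges to; it only keeps the numbers bounded.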

62 | Database techniques for the World Wide Web : a survey - Florescu, Levy, et al. - 1998 |

59 | Parameterized computational feasibility - Downey, Fellows - 1995 |

54 | Introduction to informetrics
- Egghe, Rousseau
- 1990
Citation Context ...Lotka in 1926 [26]. Gilbert [17] presents a probabilistic model explaining Lotka's law, which is similar in spirit to our proposal, though different in details and application. The field of bibliometrics [14, 16] has studied phenomena in citation; some of these insights have been applied to the Web as well [25]. A view of the Web as a semi-structured database has been advanced by many authors. In particular, ... |

35 | Human Behavior and the Principle of Least Effort
- Zipf
- 1949
Citation Context ...ements we have made on the entire Web graph, and on particular local subgraphs of interest. We show, for instance, that the in- and out-degrees of nodes follow Zipfian (inverse polynomial) distributions [12, 17, 26, 31]. This and other measurements of the frequency of occurrence of certain structures suggest that traditional random graph models such as $G_{n,p}$ [7] are likely to do a poor job of modeling the Web graph. ... |

25 | A simulation of the structure of academic science
- Gilbert
- 1997
Citation Context ...ements we have made on the entire Web graph, and on particular local subgraphs of interest. We show, for instance, that the in- and out-degrees of nodes follow Zipfian (inverse polynomial) distributions [12, 17, 26, 31]. This and other measurements of the frequency of occurrence of certain structures suggest that traditional random graph models such as $G_{n,p}$ [7] are likely to do a poor job of modeling the Web graph. ... |

19 | Trawling emerging cyber-communities automatically
- Kumar, Raghavan, et al.
- 1999
Citation Context ... structure of this graph has already led to improved Web search [6, 8, 21, 29], more accurate topic-classification algorithms [11] and has inspired algorithms for enumerating emergent cyber-communities [23]. The hyperlinks themselves represent a fecund source of sociological information. Beyond the intrinsic interest of the structure of the Web graph, measurements of the graph and of the behavior of user... |

18 | Query Flocks: A generalization of association rule mining
- Tsur, Ullman, et al.
- 1998
Citation Context ...orithms for finding association rules and related statistical correlation measures in a given dataset. However, efficient methods such as a priori [2] or even more general methodologies such as query flocks [30], do not scale to the numbers of "items" (pages) in the Web graph. This number is already two to three orders of magnitude more than the number of items in a typical market basket analysis. The work o... |

14 | Enhanced hypertext classification using hyperlinks - Chakrabarti, Dom, et al. - 1998 |

10 | Information gathering on the World Wide Web: the W3QL query language and the W3QS system. Trans. on Database Systems - Konopnicki, Shmueli - 1998 |

6 | The frequency distribution of scientific productivity
- Lotka
- 1926
Citation Context ...ements we have made on the entire Web graph, and on particular local subgraphs of interest. We show, for instance, that the in- and out-degrees of nodes follow Zipfian (inverse polynomial) distributions [12, 17, 26, 31]. This and other measurements of the frequency of occurrence of certain structures suggest that traditional random graph models such as $G_{n,p}$ [7] are likely to do a poor job of modeling the Web graph. ... |

5 | Citation analysis as a tool in journal evaluation
- Garfield
Citation Context ...Lotka in 1926 [26]. Gilbert [17] presents a probabilistic model explaining Lotka's law, which is similar in spirit to our proposal, though different in details and application. The field of bibliometrics [14, 16] has studied phenomena in citation; some of these insights have been applied to the Web as well [25]. A view of the Web as a semi-structured database has been advanced by many authors. In particular, ... |

4 | The Analysis of Economic Time Series. Principia Press
- Davis
- 1941

3 | Enhanced hypertext classification using hyperlinks
- Chakrabarti, Dom, et al.
- 1998
Citation Context ...hyperlinks in all. There are several reasons for studying the Web graph. The structure of this graph has already led to improved Web search [6, 8, 21, 29], more accurate topic-classification algorithms [11] and has inspired algorithms for enumerating emergent cyber-communities [23]. The hyperlinks themselves represent a fecund source of sociological information. Beyond the intrinsic interest of the struc... |

2 | Computing on data streams. AMS-DIMACS series, special issue on computing on very large datasets - Henzinger, Raghavan, et al. - 1998 |

2 | A declarative approach to querying and restructuring - Lakshmanan, Sadri, et al. |

1 | Parametrized Computational Feasibility. In Feasible Mathematics
- Downey, Fellows
- 1994
Citation Context ...ing to three pages would require examining approximately $10^{40}$ possibilities on a graph with $10^8$ nodes. A theoretical question (open as far as we know): does the work on fixed-parameter intractability [13] imply that we cannot, in the worst case, improve on naive enumeration for bipartite cores? Such a result would argue that algorithms that are provably efficient on the Web graph must exploit some featu... |