Abstract:
We describe a model of document citation that learns to identify hubs and authorities in a set of linked documents, such as pages retrieved from the World Wide Web or papers retrieved from a research paper archive. Unlike the popular HITS algorithm, which relies on dubious statistical assumptions, our model provides probabilistic estimates that have clear semantics. We also find that, in general, the identified authoritative documents correspond better to human intuition.

1. Introduction

Bibliometrics has been described as a "series of techniques that seek to quantify the process of written communication" (Ikpaahindi, 1985). It typically attempts to give quantified answers to questions involving the relationships among documents, or authors and documents: "Who are the authoritative authors in this field?" "What are the seminal papers?" "How many distinct communities are studying this subject?" and others (see White & McCain, 1989, for details). Traditionally, the statistics...
Citations
4735 | Maximum Likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al., 1977
1669 | Authoritative sources in a hyperlinked environment – Kleinberg, 1999
1241 | Matrix Computations – Golub, Van Loan, 1993
971 | Estimating the dimension of a model – Schwarz, 1978
714 | A new look at the statistical model identification – Akaike, 1974
349 | Improved algorithms for topic distillation in hyperlinked environments – Bharat, Henzinger, 1998
305 | Probabilistic latent semantic indexing – Hofmann, 1999
302 | Decision theoretic generalizations of the PAC model for neural net and other learning applications – Haussler, 1992
216 | An information maximization approach to blind separation and blind deconvolution – Bell, Sejnowski, 1995
82 | Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace – Larson
51 | Building domain-specific search engines with machine learning techniques – McCallum, Nigam, et al., 1999
35 | A new look at the statistical model identification – Akaike, 1974
11 | Building domain-specific search engines with machine learning techniques – McCallum, Nigam, et al., 1999
9 | Matrix computations – Van Loan, 1989
6 | Factor analysis. Lawrence Erlbaum Associates – Gorsuch, R., 1983
4 | An overview of bibliometrics: its measurements, laws, and their applications. Librarian – Ikpaahindi, 1985
1 | Cora, a computer science research archive (Technical Report). Just Research, http://www.cora.justresearch.com – McCallum, Nigam, et al., 2000
1 | Bibliometrics and the web (Technical Report FIS-12-19-1996-1). Faculty of Information Studies – Turnbull, 1996