Learning to Probabilistically Identify Authoritative Documents (2000) [88 citations — 2 self]

by David Cohn ,  Huan Chang
In Proceedings of the 17th International Conference on Machine Learning

Abstract:

We describe a model of document citation that learns to identify hubs and authorities in a set of linked documents, such as pages retrieved from the world wide web, or papers retrieved from a research paper archive. Unlike the popular HITS algorithm, which relies on dubious statistical assumptions, our model provides probabilistic estimates that have clear semantics. We also find that in general, the identified authoritative documents correspond better to human intuition.

1. Introduction

Bibliometrics has been described as a "series of techniques that seek to quantify the process of written communication" (Ikpaahindi, 1985). It typically attempts to give quantified answers to questions involving the relationships among documents, or authors and documents: "Who are the authoritative authors in this field?" "What are the seminal papers?" "How many distinct communities are studying this subject?" and others (see White & McCain, 1989, for details). Traditionally, the statistics...

Citations

4735 Maximum Likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
1669 Authoritative sources in a hyperlinked environment – Kleinberg - 1999
1241 Matrix Computations – Golub, Loan - 1993
971 Estimating the dimension of a model – Schwarz - 1978
714 A new look at the statistical model identification – Akaike - 1974
349 Improved algorithms for topic distillation in hyperlinked environments – Bharat, Henzinger - 1998
305 Probabilistic latent semantic indexing – Hofmann - 1999
302 Decision theoretic generalizations of the PAC model for neural net and other learning applications – Haussler - 1992
216 An information maximization approach to blind separation and blind deconvolution – Bell, Sejnowski - 1995
82 Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace – Larson
51 Building domain-specific search engines with machine learning techniques – McCallum, Nigam, et al. - 1999
35 A new look at the statistical model identification – Akaike - 1974
11 Building domain-specific search engines with machine learning techniques – McCallum, Nigam, et al. - 1999
9 Matrix computations – Loan - 1989
6 Factor analysis. Lawrence Erlbaum Associates – Gorsuch, R - 1983
4 An overview of bibliometrics: its measurements, laws, and their applications. Librarian – Ikpaahindi - 1985
1 Cora, a computer science research archive (Technical Report). Just Research, http://www.cora.justresearch.com – McCallum, Nigam, et al. - 2000
1 Bibliometrics and the web (Technical Report FIS-12-19-1996-1). Faculty of Information Studies – Turnbull - 1996