The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
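The hub/authority relationship described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited mutually reinforcing update, in which a page's authority score is the sum of the hub scores of pages linking to it and its hub score is the sum of the authority scores of pages it links to, with both vectors renormalized after each round. The function name `hubs_and_authorities`, the `links` dictionary, the page names, and the fixed iteration count are all hypothetical choices for this illustration; the abstract itself does not specify them.

```python
from math import sqrt


def hubs_and_authorities(links, iterations=50):
    """Return (hub, auth) score dicts for a link graph.

    links: dict mapping each page to the pages it links to
           (a hypothetical input format chosen for this sketch).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages pointing here.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]

        # Hub update: sum the authority scores of the pages pointed to.
        hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}

        # Normalize both vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm

    return hub, auth


# Toy graph: three candidate hubs pointing at two candidate authorities.
links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hubs_and_authorities(links)
best_authority = max(auth, key=auth.get)
```

Under these assumptions the repeated update is a power iteration, which is one way to read the abstract's remark about eigenvectors of matrices associated with the link graph: with adjacency matrix $A$, the authority scores tend toward a principal eigenvector of $A^{T}A$ and the hub scores toward one of $AA^{T}$.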
|
1839
|
The Anatomy of a Large-Scale Hypertextual Web Search Engine
– Brin, Page
- 1998
|
|
1636
|
Indexing by latent semantic analysis
– Deerwester, Dumais, et al.
- 1990
|
|
970
|
Principal Component Analysis
– Jolliffe
- 1986
|
|
565
|
Automatic Text Processing
– Salton
- 1989
|
|
430
|
Scatter/gather: a cluster-based approach to browsing large document collections
– Cutting, Karger, et al.
- 1992
|
|
422
|
Spectral Graph Theory
– Chung
- 1997
|
|
349
|
Improved algorithms for topic distillation in hyperlinked environments
– Bharat, Henzinger
- 1998
|
|
253
|
Inferring Web communities from link topology
– Gibson, Kleinberg, et al.
- 1998
|
|
244
|
Automatic resource compilation by analyzing hyperlink structure and associated text
– Chakrabarti, Dom, et al.
- 1998
|
|
239
|
Analysis of a Complex of Statistical Variables into Principal Components
– Hotelling
- 1933
|
|
205
|
Silk from a sow’s ear: Extracting usable structures from the Web
– Pirolli, Pitkow, et al.
- 1996
|
|
171
|
Co-citation in the scientific literature: A new measure of the relationship between two documents
– Small
- 1973
|
|
158
|
Latent semantic indexing: A probabilistic analysis
– Papadimitriou, Tamaki, et al.
- 1998
|
|
137
|
Structural analysis of hypertexts: Identifying hierarchies and useful metrics
– Botafogo, Rivlin, et al.
- 1992
|
|
127
|
Bibliographic coupling between scientific papers
– Kessler
- 1963
|
|
114
|
Lower bounds for the partitioning of graphs
– Donath, Hoffman
- 1973
|
|
105
|
Clustering categorical data: An approach based on dynamic systems
– Gibson, Kleinberg, et al.
- 1998
|
|
93
|
GENVL and WWWW: Tools for Taming the Web
– McBryan
- 1994
|
|
87
|
Parasite: Mining structural information on the web
– Spertus
- 1997
|
|
86
|
WebQuery: Searching and visualizing the Web through connectivity
– Carrière, Kazman
|
|
86
|
Hypursuit: A hierarchical network search engine that exploits content-link hypertext clustering
– Weiss, Velez, et al.
- 1996
|
|
82
|
How to personalize the Web
– Barrett, Maglio, et al.
- 1997
|
|
82
|
Fast Monte-Carlo algorithms for finding low rank approximations
– Frieze, Kannan, et al.
- 1998
|
|
82
|
Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace
– Larson
|
|
79
|
Searching for information in a hypertext medical handbook
– Frisse
|
|
73
|
The quest for correct information on the Web: Hyper search engines
– Marchiori
- 1997
|
|
67
|
Citation analysis as a tool in journal evaluation
– Garfield
- 1972
|
|
66
|
A new status index derived from sociometric analysis
– Katz
- 1953
|
|
65
|
Applications of a Web query language
– Arocena, Mendelzon, et al.
|
|
46
|
Citation influence for journal aggregates of scientific publications: Theory, with applications to the literature of physics
– Pinski, Narin
- 1976
|
|
41
|
Introduction to Informetrics
– Egghe, Rousseau
- 1990
|
|
39
|
The structure of scientific literatures I: Identifying and graphing specialties
– Small, Griffith
- 1974
|
|
16
|
lawfulness on the electronic frontier
– Pitkow, Pirolli, et al.
- 1997
|
|
14
|
An input-output approach to clique identification
– Hubbell
- 1965
|
|
13
|
Cocited author mapping as a valid representation of intellectual structure
– McCain
- 1986
|
|
8
|
The analysis of square matrices of scientometric transactions
– Price
- 1981
|
|
8
|
An improved method for analyzing square scientometric transaction matrices
– Noma
- 1982
|
|
5
|
Connectivity Server: Fast Access to
– Bharat, Broder, et al.
- 1998
|
|
5
|
The synthesis of specialty narratives from co-citation clusters
– Small
- 1986
|
|
4
|
Mathematical relations between impact factors and average number of citations
– Egghe
- 1988
|
|
4
|
Algebraic connectivity of graphs
– Fiedler
- 1973
|
|
4
|
Joint-space analysis of ‘pick-any’ data: Analysis of choices from an unconstrained set of alternatives
– Levine
- 1979
|
|
4
|
Subject and citation indexing. Part I: The clustering structure of composite representations in the cystic fibrosis document collection
– Shaw
- 1991
|
|
4
|
Subject and citation indexing. Part II: The optimal, cluster-based retrieval performance of composite representations
– Shaw
- 1991
|
|
3
|
Measuring the Relative Standing of Disciplinary Journals
– Doreian
- 1988
|
|
3
|
A measure of standing for citation networks within a wider environment
– Doreian
- 1994
|
|
3
|
Co-citation analysis and the invisible college
– Noma
- 1984
|
|
2
|
Flow-interception problems, Facility Location
– Berman, Hodgson, et al.
- 1995
|
|
1
|
“Web search using automated classification,” poster at
– Chekuri, Goldwasser, et al.
- 1997
|
|
1
|
On the citation influence methodology of Pinski and
– Geller
|