The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
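The hub/authority relationship described above can be illustrated with a minimal sketch of a mutually reinforcing update. A page's authority weight is the sum of the hub weights of pages linking to it, and a page's hub weight is the sum of the authority weights of pages it links to. Both vectors are renormalized after each round. This sketch assumes a small in-memory directed graph given as (source, target) edge pairs. The function name `hits`, the fixed iteration count, and the example page names are illustrative assumptions. It also leaves out any step for selecting which pages belong in the graph. Repeating the two updates amounts to power iteration. Writing A for the adjacency matrix, the authority vector tends toward the principal eigenvector of AᵀA and the hub vector toward that of AAᵀ, which is one way to read the eigenvector connection mentioned in the abstract.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Iterate hub and authority weights on a directed graph (illustrative sketch).

    edges: iterable of (source, target) pairs, meaning `source` links to `target`.
    Returns (hub, authority) dicts, each scaled to unit Euclidean length.
    """
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(weights):
        # Scale to unit length so the weights stay bounded across iterations.
        norm = sqrt(sum(w * w for w in weights.values()))
        return {n: (w / norm if norm else 0.0) for n, w in weights.items()}

    for _ in range(iterations):
        # Authority update: a page is a good authority if good hubs point to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        auth = normalize(new_auth)

        # Hub update: a page is a good hub if it points to good authorities.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical example: three hub-like pages linking to two candidate authorities.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(edges)
```

In this toy graph, `a1` is endorsed by all three hubs, so it ends up with the larger authority weight. `h1` and `h2` point to both authorities, so they receive equal hub weights, and both are larger than `h3`.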