Abstract:
This paper addresses the problem of topic distillation on the World Wide Web: given a typical user query, find high-quality documents related to the query topic. Connectivity analysis has been shown to be useful in identifying high-quality pages within a topic-specific graph of hyperlinked documents. The essence of our approach is to augment an earlier algorithm based on connectivity analysis with content analysis. We identify three problems with the existing approach and devise algorithms to tackle them. We report the results of a user evaluation showing that precision at 10 documents improves by at least 45% over pure connectivity analysis.
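The evaluation metric named in the abstract, precision at 10 documents, can be illustrated with a minimal sketch. The function names, document identifiers, and relevance judgments below are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
def precision_at_k(ranked_docs, relevant, k=10):
    """Fraction of the top-k ranked documents that are judged relevant.

    Divides by k even if fewer than k documents are returned, which is
    the usual convention for precision at a fixed cutoff.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_docs[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline` (0.45 would mean 45%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical relevance judgments and rankings (not from the paper).
relevant = {"d1", "d3", "d5", "d7", "d11", "d12"}
baseline_ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
improved_ranking = ["d1", "d3", "d5", "d7", "d11", "d12", "d2", "d4", "d6", "d8"]

p_base = precision_at_k(baseline_ranking, relevant)
p_new = precision_at_k(improved_ranking, relevant)
print(p_base, p_new, relative_improvement(p_new, p_base))
```

An improvement "by at least 45%" is read here as a relative gain in precision at 10, which is an assumption about how the figure in the abstract is computed.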