Abstract:
Search engines have increasingly used link analysis and page popularity to improve the ranking of query results. Since the number of links contributes to the value of a document in such calculations, we wish to recognize and eliminate nepotistic links: links between pages that are present for reasons other than merit. This paper explores some of the issues surrounding the question of which links to keep, and we report high accuracy in initial experiments that show the potential of using a machine learning tool to recognize such links automatically.

Introduction
The research community has recently shown growing interest in analyzing the link information of the Web (Bharat & Henzinger 1998; Brin & Page 1998; Chakrabarti et al. 1998b; 1998a; Chakrabarti, Dom, & Indyk 1998; Gibson, Kleinberg, & Raghavan 1998a; 1998b; Kleinberg 1998; Page et al. 1998; Chakrabarti et al. 1999a; Chakrabarti, van den Berg, & Dom 1999; Chakrabarti et al. 1999b; Henzinger et al...
Citations
3356 | C4.5: Programs for Machine Learning | Quinlan | 1993
1839 | The Anatomy of a Large-Scale Hypertextual Web Search Engine | Brin, Page | 1998
1669 | Authoritative sources in a hyperlinked environment | Kleinberg | 1999
1064 | The PageRank Citation Ranking: Bringing Order to the Web | Page, Brin, et al. | 1999
349 | Improved algorithms for topic distillation in hyperlinked environments | Bharat, Henzinger | 1998
339 | Focused crawling: a new approach to topic-specific (web) resource discovery | Chakrabarti, Berg, et al. | 1999
254 | Enhanced hypertext categorization using hyperlinks | Chakrabarti, Dom, et al. | 1998
253 | Inferring Web communities from link topology | Gibson, Kleinberg, et al. | 1998
244 | Automatic resource compilation by analyzing hyperlink structure and associated text | Chakrabarti, Dom, et al. | 1998
208 | The web as a graph: Measurements, models and methods | Kleinberg, Kumar, et al. | 1999
124 | Finding related pages in the World Wide Web | Dean, Henzinger | 1999
122 | Mining the Web's Link Structure | Chakrabarti, Dom, et al. | 1999
53 | Measuring Index Quality using Random Walks on the Web | Henzinger, Heydon, et al. | 1999
49 | Finding near-replicas of documents on the Web | Shivakumar, Garcia-Molina | 1998
48 | A comparison of techniques to find mirrored hosts on the WWW | Bharat, Broder, et al. | 2000
29 | Hypersearching the web. Scientific American | Chakrabarti, Dom, et al. | 1999
17 | Mirror, mirror on the Web: A study of host pairs with replicated content | Bharat, Broder | 1999
14 | DiscoWeb: Applying link analysis to Web search | Davison, Gerasoulis, et al. | 1999
7 | Google home | Inc | 2002
5 | Structural Analysis of the World Wide Web | Gibson, Kleinberg, et al. | 1998
4 | Webster's revised unabridged dictionary. Online at http://www.dictionary.com | C, Co | 1913
4 | Search engine watch. http://www.searchenginewatch.com | Sullivan | 2000
2 | Direct Hit home | Jeeves, Inc | 2000
2 | Lycos: Your personal internet guide. http://www.lycos.com | Lycos | 2000