Enhanced Hypertext Categorization Using Hyperlinks (1998) [256 citations — 8 self]

by Soumen Chakrabarti, Byron Dom, and Piotr Indyk

Abstract:

A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost on a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented ...
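
The idea of relaxation labeling over a link neighborhood can be illustrated with a minimal sketch. This is not the paper's statistical model: it is a generic iterative update in which each unlabeled document's class belief is its text-only probability multiplied by a compatibility-weighted vote from its linked neighbors, while documents with known topics stay fixed. The function name `relaxation_labeling`, the `compat` matrix, and all numbers below are assumptions made up for illustration.

```python
from typing import Dict, List


def relaxation_labeling(
    text_probs: Dict[str, List[float]],   # hypothetical output of a term-based classifier
    links: Dict[str, List[str]],          # document -> linked neighbor documents
    known: Dict[str, int],                # documents whose topic is already known
    compat: List[List[float]],            # compat[c][k]: assumed affinity of class c for a class-k neighbor
    iterations: int = 10,
) -> Dict[str, List[float]]:
    n_classes = len(compat)

    # Start from text-only beliefs; clamp documents with known topics.
    beliefs: Dict[str, List[float]] = {}
    for doc, probs in text_probs.items():
        if doc in known:
            beliefs[doc] = [1.0 if c == known[doc] else 0.0 for c in range(n_classes)]
        else:
            beliefs[doc] = list(probs)

    for _ in range(iterations):
        updated: Dict[str, List[float]] = {}
        for doc, probs in text_probs.items():
            if doc in known:
                updated[doc] = beliefs[doc]  # known labels never change
                continue
            scores = []
            for c in range(n_classes):
                score = probs[c]
                # Each neighbor contributes its expected compatibility with class c.
                for nb in links.get(doc, []):
                    score *= sum(compat[c][k] * beliefs[nb][k] for k in range(n_classes))
                scores.append(score)
            total = sum(scores)
            # Renormalize; fall back to the text-only estimate if everything vanished.
            updated[doc] = [s / total for s in scores] if total > 0 else list(probs)
        beliefs = updated
    return beliefs


# Toy example: "c" looks like class 1 from its text alone,
# but both of its neighbors are known to be class 0.
text_probs = {"a": [0.5, 0.5], "b": [0.5, 0.5], "c": [0.4, 0.6]}
links = {"a": ["c"], "b": ["c"], "c": ["a", "b"]}
known = {"a": 0, "b": 0}
compat = [[0.9, 0.1], [0.1, 0.9]]

beliefs = relaxation_labeling(text_probs, links, known, compat)
predicted = max(range(2), key=lambda c: beliefs["c"][c])
print(predicted, beliefs["c"])
```

In this toy setup the link evidence outweighs the text-only estimate for "c", so it is assigned class 0. With no entries in `known`, the same loop still runs and simply propagates the uncertain text-based beliefs between neighbors.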
