Enhanced Hypertext Categorization Using Hyperlinks (1998)
Citations: 326 (8 self)
BibTeX
@MISC{Chakrabarti98enhancedhypertext,
author = {Soumen Chakrabarti and Byron Dom and Piotr Indyk},
title = {Enhanced Hypertext Categorization Using Hyperlinks},
year = {1998}
}
Abstract
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost upon a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented ...







