Results 1 -
2 of
2
Enhanced Hypertext Categorization Using Hyperlinks
, 1998
"... A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is ..."
Abstract
-
Cited by 326 (8 self)
- Add to MetaCart
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain highquality semantic clues that are lost upon a purely termbased classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented ...
Xu-Dong Zhang,
"... This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines. 1

