Using taxonomy, discriminants, and signatures for navigating in text databases (1997)
| Venue: | In Proceedings of the 23rd VLDB Conference |
| Citations: | 67 - 4 self |
BibTeX
@INPROCEEDINGS{Chakrabarti97usingtaxonomy,
author = {Soumen Chakrabarti and Byron Dom and Rakesh Agrawal and Prabhakar Raghavan},
title = {Using taxonomy, discriminants, and signatures for navigating in text databases},
booktitle = {Proceedings of the 23rd VLDB Conference},
year = {1997}
}
Abstract
We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora, such as internet directories, digital libraries, and patent databases, enjoy. In our system, the user navigates through the query response not as a flat unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. We show how to update such databases with new documents with high speed and accuracy. We use techniques from statistical pattern recognition to efficiently separate the feature words or discriminants from the noise words at each node of the taxonomy. Using these, we build a multi-level classifier. At each node, this classifier can ignore the large number of noise words in a document. Thus the classifier has a small model size and is very fast. However, owing to the use of context-sensitive features, the classifier is very accurate. We report on experiences with the Reuters newswire benchmark, the US Patent database, and web document samples from Yahoo!.
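
The idea of per-node discriminants feeding a multi-level classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the Fisher-style variance ratio, the smoothed naive Bayes model, the toy taxonomy, and every name below (`fisher_scores`, `train_node`, `classify`, the parameter `k`) are assumptions chosen for illustration, and the abstract does not specify the exact statistic or model used.

```python
import math
from collections import Counter
from itertools import combinations

def fisher_scores(classes):
    """classes: {label: [token lists]} -> {term: between/within variance ratio}."""
    rates = {c: [{t: n / len(d) for t, n in Counter(d).items()} for d in docs]
             for c, docs in classes.items()}
    vocab = {t for rs in rates.values() for r in rs for t in r}
    scores = {}
    for t in vocab:
        mu = {c: sum(r.get(t, 0.0) for r in rs) / len(rs) for c, rs in rates.items()}
        between = sum((mu[a] - mu[b]) ** 2 for a, b in combinations(mu, 2))
        within = sum(sum((r.get(t, 0.0) - mu[c]) ** 2 for r in rs) / len(rs)
                     for c, rs in rates.items())
        scores[t] = between / (within + 1e-9)  # epsilon avoids division by zero
    return scores

def train_node(classes, k):
    """Keep the k best-scoring terms, then fit a smoothed naive Bayes on them only."""
    scores = fisher_scores(classes)
    feats = set(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    total_docs = sum(len(docs) for docs in classes.values())
    model = {}
    for c, docs in classes.items():
        counts = Counter(t for d in docs for t in d if t in feats)
        denom = sum(counts.values()) + len(feats)  # Laplace smoothing
        model[c] = (math.log(len(docs) / total_docs),
                    {t: math.log((counts[t] + 1) / denom) for t in feats})
    return feats, model

def classify_node(node_model, doc):
    """Score each child using only this node's discriminants; other words are ignored."""
    feats, model = node_model
    def score(c):
        prior, logp = model[c]
        return prior + sum(logp[t] for t in doc if t in feats)
    return max(sorted(model), key=score)

def docs_under(taxonomy, node, leaf_docs):
    """Collect all training documents in the subtree rooted at node."""
    if node not in taxonomy:
        return list(leaf_docs[node])
    return [d for child in taxonomy[node] for d in docs_under(taxonomy, child, leaf_docs)]

def train(taxonomy, leaf_docs, k=6):
    """One small model per internal node, each with its own feature set."""
    return {node: train_node({c: docs_under(taxonomy, c, leaf_docs) for c in kids}, k)
            for node, kids in taxonomy.items()}

def classify(taxonomy, models, doc, root="root"):
    """Descend greedily from the root, returning the path of chosen nodes."""
    node, path = root, [root]
    while node in taxonomy:
        node = classify_node(models[node], doc)
        path.append(node)
    return path

# Hypothetical toy taxonomy and training documents (not from the paper).
taxonomy = {"root": ["tech", "sports"], "tech": ["db", "ml"], "sports": ["soccer", "tennis"]}
leaf_docs = {
    "db": ["the query index sql".split(), "the sql query join".split()],
    "ml": ["the model training loss".split(), "the loss model gradient".split()],
    "soccer": ["the goal match striker".split(), "the match goal keeper".split()],
    "tennis": ["the serve match racket".split(), "the racket serve set".split()],
}
models = train(taxonomy, leaf_docs, k=6)
path = classify(taxonomy, models, "the query join sql".split())
```

In this sketch the word "the" occurs at the same rate in every document, so its score is zero at every node and it never enters a feature set. The word "match" is a strong discriminant at the root but only a weak one inside the sports subtree, which is one way to read the abstract's "context-sensitive features".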