Topic modeling for mediated access to very large document collections (2004)
| Venue: | Journal of the American Society for Information Science and Technology |
| Citations: | 7 - 0 self |
BibTeX
@ARTICLE{Muresan04topicmodeling,
author = {Gheorghe Muresan and David J. Harper},
title = {Topic modeling for mediated access to very large document collections},
journal = {Journal of the American Society for Information Science and Technology},
year = {2004},
volume = {55},
pages = {892--910}
}
Abstract
Clear and precise queries are a necessity when searching very large document collections, especially when query-based retrieval is the only means of exploration. We propose system-mediated information access as a solution for users' well-documented inability to formulate good queries. Our approach is based on two main assumptions: first, on the ability of document clustering to reveal the topical, semantic structure of a problem domain represented by a specialized "source collection," and, second, on the capacity of statistical language models to convey content. Taking the role of the human mediator or intermediary searcher, a mediation system interacts with the user and supports her exploration of a relatively small source collection, chosen to be representative of the problem domain. Based on the
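The abstract's second assumption is that statistical language models can convey the content of a document set. The Python sketch below illustrates that general idea only. It is not the authors' mediation system, whose details this truncated abstract does not give. It builds an additively smoothed unigram language model for each cluster of documents and ranks the clusters by how likely each model is to generate a query. The cluster names, the toy documents, the whitespace tokenization, and the smoothing parameter `alpha` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def unigram_model(docs, vocab, alpha=1.0):
    """Additively smoothed unigram model over a shared vocabulary.

    Hypothetical illustration: whitespace tokenization and add-alpha
    smoothing are simplifying assumptions, not the paper's method.
    """
    counts = Counter(tok for doc in docs for tok in doc.split())
    total = sum(counts.values())
    denom = total + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}


def query_log_likelihood(query, model):
    """Sum of log P(term | model) over query terms.

    Terms outside the shared vocabulary are skipped. Because every model
    uses the same vocabulary, the skip affects all clusters equally.
    """
    return sum(math.log(model[t]) for t in query.split() if t in model)


# Hypothetical toy "source collection", already grouped into clusters.
clusters = {
    "genetics": ["gene expression in mice", "gene mutation and disease"],
    "finance": ["stock market prices", "market risk and interest rates"],
}
vocab = {tok for docs in clusters.values() for doc in docs for tok in doc.split()}
models = {name: unigram_model(docs, vocab) for name, docs in clusters.items()}

query = "gene disease"
ranked = sorted(
    models,
    key=lambda name: query_log_likelihood(query, models[name]),
    reverse=True,
)
print(ranked)
```

Here the "genetics" cluster's model assigns higher probability to both query terms, so it ranks first. Add-alpha smoothing keeps unseen words from receiving zero probability, which would otherwise send the log-likelihood to negative infinity.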







