Applying the Multiple Cause Mixture Model to Text Categorization (1996) [19 citations — 0 self]

by Mehran Sahami, Marti Hearst, Eric Saund

Abstract:

This paper introduces the use of the Multiple Cause Mixture Model for automatic text category assignment. Although much research has been done on text categorization, this algorithm is novel in that it is unsupervised, that is, it does not require pre-labeled training examples, and it can assign multiple category labels to documents. In this paper we present very preliminary results of the application of this model to a standard test collection, evaluating it in supervised mode in order to facilitate comparison with other methods, and showing initial results of its use in unsupervised mode.

Introduction:

The popularity of searching the contents of the Internet has recently increased recognition of the need for automatic assignment of category labels to documents in large text collections. Web interfaces such as Stanford's Yahoo web search system (Yahoo! 1995) make use of manually-assigned category labels to help users understand the structure of their text collections. However, manual informati...

Citations

4701 Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference – Pearl - 1988
3011 Pattern Classification and Scene Analysis – Duda, Hart - 1973
2044 Learning internal representations by error propagation – Rumelhart, Hinton, et al. - 1986
1636 Indexing by latent semantic analysis – Deerwester, Dumais, et al. - 1990
400 Towards memory-based reasoning – Stanfill, Waltz - 1986
282 A sequential algorithm for training text classifiers – Lewis, Gale - 1994
259 Toward optimal feature selection – Koller, Sahami - 1996
186 Automated learning of decision rules for text categorization – Apte, Damerau - 1994
169 Recent trends in hierarchic document clustering: a critical review – Willett - 1988
100 Information extraction as a basis for high-precision text classification – Riloff, Lehnert - 1994
78 SCISOR: Extracting information from on-line news – Jacobs, Rau - 1990
70 Classifying news stories using memory based reasoning – Masand, Linoff, et al. - 1992
59 A multiple cause mixture model for unsupervised learning – Saund - 1994
37 Automating the assignment of submitted manuscripts to reviewers – Dumais, Nielsen - 1992
35 Applying Bayesian networks to information retrieval – Fung, Favero - 1995
35 RUBRIC: a system for rule-based information retrieval – McCune, Tong, et al. - 1985
24 Text retrieval and inference – Croft, Turtle - 1992
24 Using categories to provide context for full-text retrieval results – Hearst - 1994
23 Efficient inference in Bayes nets as a combinatorial optimization problem – Li, D'Ambrosio - 1994
14 An architecture for probabilistic concept-based information retrieval – Fung, Crawford, et al. - 1990
8 Intelligent high-volume text processing using shallow, domain-specific techniques – Hayes - 1992
2 On-line guide for the internet. http://www.yahoo.com – Yahoo - 1995