Abstract:
This paper introduces the use of the Multiple Cause Mixture Model for automatic text category assignment. Although much research has been done on text categorization, this algorithm is novel in that it is unsupervised, that is, it does not require pre-labeled training examples, and in that it can assign multiple category labels to documents. In this paper we present preliminary results of applying this model to a standard test collection, evaluating it in supervised mode to facilitate comparison with other methods and showing initial results of its use in unsupervised mode.

Introduction:
The popularity of searching the contents of the Internet has recently increased recognition of the need for automatic assignment of category labels to documents in large text collections. Web interfaces such as Stanford's Yahoo web search system (Yahoo! 1995) make use of manually assigned category labels to help users understand the structure of their text collections. However, manual informati...
Citations
4701 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference | Pearl | 1988
3011 | Pattern Classification and Scene Analysis | Duda, Hart | 1973
2044 | Learning internal representations by error propagation | Rumelhart, Hinton, et al. | 1986
1636 | Indexing by latent semantic analysis | Deerwester, Dumais, et al. | 1990
400 | Towards memory-based reasoning | Stanfill, Waltz | 1986
282 | A sequential algorithm for training text classifiers | Lewis, Gale | 1994
259 | Toward optimal feature selection | Koller, Sahami | 1996
186 | Automated learning of decision rules for text categorization | Apte, Damerau | 1994
169 | Recent trends in hierarchic document clustering: a critical review | Willett | 1988
100 | Information extraction as a basis for high-precision text classification | Riloff, Lehnert | 1994
78 | SCISOR: Extracting information from on-line news | Jacobs, Rau | 1990
70 | Classifying news stories using memory based reasoning | Masand, Linoff, et al. | 1992
59 | A multiple cause mixture model for unsupervised learning | Saund | 1994
37 | Automating the assignment of submitted manuscripts to reviewers | Dumais, Nielsen | 1992
35 | Applying Bayesian networks to information retrieval | Fung, Favero | 1995
35 | RUBRIC: a system for rule-based information retrieval | McCune, Tong, et al. | 1985
24 | Text retrieval and inference | Croft, Turtle | 1992
24 | Using categories to provide context for full-text retrieval results | Hearst | 1994
23 | Efficient inference in Bayes nets as a combinatorial optimization problem | Li, D'Ambrosio | 1994
14 | An architecture for probabilistic concept-based information retrieval | Fung, Crawford, et al. | 1990
8 | Intelligent high-volume text processing using shallow, domain-specific techniques | Hayes | 1992
2 | On-line guide for the internet. http://www.yahoo.com | Yahoo | 1995