We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representation of documents. When combined with the classification power of the SVM, this method yields high performance in text categorization. This novel combination of SVM with word-cluster representation is compared with SVM-based categorization using the simpler bag-of-words (BOW) representation. The comparison is performed over three known datasets. On one of these datasets (the 20 Newsgroups) the method based on word clusters significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency. On the two other sets (Reuters-21578 and WebKB) the word-based representation slightly outperforms the word-cluster representation. We investigate the potential reasons for this behavior and relate it to structural differences between the datasets.
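The difference between the two document representations compared above can be illustrated with a minimal sketch. It assumes a hard word-to-cluster assignment is already available. The `WORD_TO_CLUSTER` mapping, the function names, and the toy document are hypothetical and hand-made for illustration. They are not the output of the Information Bottleneck method, and the SVM training step is not shown.

```python
from collections import Counter

# Hypothetical hard assignment of words to clusters (hand-made for this
# illustration; in the paper the clusters come from the Information
# Bottleneck method, which is not implemented here).
WORD_TO_CLUSTER = {
    "goal": 0, "match": 0, "team": 0,
    "cpu": 1, "disk": 1, "kernel": 1,
}


def bow_vector(tokens, vocabulary):
    """Bag-of-words: one count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cluster_vector(tokens, word_to_cluster, n_clusters):
    """Word-cluster representation: one count per cluster.

    Each token contributes to the count of the cluster its word belongs
    to; tokens without a cluster assignment are ignored.
    """
    vec = [0] * n_clusters
    for tok in tokens:
        cluster = word_to_cluster.get(tok)
        if cluster is not None:
            vec[cluster] += 1
    return vec


doc = ["team", "goal", "goal", "kernel", "unknown"]
vocabulary = sorted(WORD_TO_CLUSTER)

bow = bow_vector(doc, vocabulary)                              # 6 dimensions
clusters = cluster_vector(doc, WORD_TO_CLUSTER, n_clusters=2)  # 2 dimensions
```

Either vector could then be passed to an SVM as the document's feature vector. The cluster vector is much shorter, which is the sense in which the word-cluster representation is more compact than BOW.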