## Combining Labeled and Unlabeled Data for MultiClass Text Categorization (2002)

### Download Links

- [www.accenture.com]
- [accenture-outsourcing.ie]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the International Conference on Machine Learning

Citations: 42 (0 self)

### BibTeX

@INPROCEEDINGS{Ghani02combininglabeled,
  author    = {Rayid Ghani},
  title     = {Combining Labeled and Unlabeled Data for MultiClass Text Categorization},
  booktitle = {Proceedings of the International Conference on Machine Learning},
  year      = {2002},
  pages     = {187--194}
}

### Abstract

Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled data, such as EM and Co-Training, are mostly applicable to classification tasks with a small number of classes and do not scale up well for large multiclass problems. In this paper, we develop a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems.

### Citations

9054 | Maximum likelihood from incomplete data via the EM algorithm (with discussion)
- Dempster, Laird, et al.
- 1977

Citation Context: ...classes with a single multinomial component is badly violated, basic EM performance suffers. EM is an iterative statistical technique for maximum likelihood estimation in problems with incomplete data (Dempster et al., 1977). Given a model of data generation, and data with some missing values, EM will locally maximize the likelihood of the parameters and give estimates for the missing values. The naive Bayes generative ...
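The excerpt above describes EM as locally maximizing the likelihood of the parameters while filling in missing values. As a minimal one-dimensional sketch (my own illustration, not code from the paper), the hidden component assignments of a two-mean Gaussian mixture play the role of the missing values:

```python
import math

def em_two_means(data, mu, iters=50, var=1.0):
    """EM for a two-component 1-D Gaussian mixture with known, equal
    variance and equal weights. The component assignments are the
    'missing values'; EM locally maximizes the likelihood of the means."""
    mu1, mu2 = mu
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point
        r = []
        for x in data:
            p1 = math.exp(-(x - mu1) ** 2 / (2 * var))
            p2 = math.exp(-(x - mu2) ** 2 / (2 * var))
            r.append(p1 / (p1 + p2))
        # M-step: update each mean as a responsibility-weighted average
        mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - sum(r))
    return mu1, mu2
```

Starting from a rough initial guess, the means converge to a local maximum of the likelihood, which is the behavior the excerpt attributes to EM.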

1319 | Combining labeled and unlabeled data with co-training
- Blum, Mitchell
- 1998

Citation Context: ...mains where the features naturally divide into two disjoint sets. Data sets whose features naturally partition into two sets, and algorithms that use this division, fall into the co-training setting (Blum & Mitchell, 1998). They present an algorithm for classifying web pages that builds two classifiers: one over the words that appear on the page, and another over the words appearing in hyperlinks pointing to that page. ...

835 | A Comparison of Event Models for Naïve Bayes Text Classification. AAAI/ICML-98 Workshop on Learning for Text Categorization
- McCallum, Nigam
- 1998

Citation Context: ...Expectation-Maximization (EM) algorithm to compare with our proposed approach. 5.1 Naive Bayes Naive Bayes is a simple but effective text classification algorithm for learning from labeled data alone (McCallum & Nigam, 1998; Lewis, 1998). We use the multinomial model as defined in (McCallum & Nigam, 1998), where each word in a document is assumed to be generated independently of the others given the class, and use Laplace ...
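The multinomial model with Laplace smoothing referenced above can be sketched as follows (a minimal illustration with my own variable names; the paper's implementation details may differ):

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, vocab):
    """Multinomial naive Bayes with Laplace (add-one) smoothing.
    docs: list of token lists; labels: parallel list of class names."""
    classes = set(labels)
    priors, cond = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        # Laplace smoothing: add 1 to every word count so no word
        # in the vocabulary gets zero probability
        cond[c] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
    return priors, cond

def classify(doc, priors, cond):
    # Each word is assumed generated independently given the class,
    # so the log-likelihood is a sum of per-word log probabilities
    scores = {c: math.log(priors[c])
                 + sum(math.log(cond[c][w]) for w in doc if w in cond[c])
              for c in priors}
    return max(scores, key=scores.get)
```

The independence assumption is what makes the score a simple sum; it is exactly the per-word factorization the excerpt describes.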

598 | Solving multiclass learning problems via error-correcting output codes
- Dietterich, Bakiri
- 1995

Citation Context: ...ts in a system that can provide a smooth tradeoff between recall and precision and can be used for high-precision classification. 2. Error Correcting Output Coding Error Correcting Output Coding (ECOC) (Dietterich & Bakiri, 1995) has been shown to perform extraordinarily well for text classification (Berger, 1999; Ghani, 2000; ?). ECOC converts an m-class supervised learning problem into n binary supervised learning problems. Any learning algorithm that can handle two-class learning problems can then be applied ...

415 | Exploiting generative models in discriminative classifiers
- Jaakkola, Haussler
- 1998

Citation Context: ...estimate maximum a posteriori parameters of a generative model for text classification (Nigam et al., 2000), using a generative model built from unlabeled data to perform discriminative classification (Jaakkola & Haussler, 1999), and using transductive inference for support vector machines to optimize performance on a specific test set (Joachims, 1999). These studies have shown that using unlabeled data can significantly improv...

385 | Naive (bayes) at forty: The independence assumption in information retrieval
- Lewis
- 1998

Citation Context: ...n (EM) algorithm to compare with our proposed approach. 5.1 Naive Bayes Naive Bayes is a simple but effective text classification algorithm for learning from labeled data alone (McCallum & Nigam, 1998; Lewis, 1998). We use the multinomial model as defined in (McCallum & Nigam, 1998), where each word in a document is assumed to be generated independently of the others given the class, and use Laplace smoothing to ...

95 | On a class of error correcting binary group codes
- Bose, Ray-Chaudhuri
- 1960

Citation Context: ...stance, which counts the number of bits by which the two codewords differ. This process of mapping the output string to the nearest codeword is identical to the decoding step for error-correcting codes (Bose & Ray-Chaudhuri, 1960; Hocquenghem, 1959). Training Phase 1. Given a problem with m classes, create an m x n binary matrix M (where n can be less than m). 2. Each class is assigned one row of M (each column divides the e...
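The training and decoding steps quoted above (an m x n binary code matrix with one row per class, then nearest-codeword assignment by Hamming distance) can be sketched as follows (illustrative only, with my own helper names; the paper evaluates specific code constructions rather than purely random ones):

```python
import random

def make_code_matrix(m, n, seed=0):
    """Random m x n binary code matrix: one codeword (row) per class."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]

def hamming(a, b):
    """Number of bit positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(output_bits, code_matrix):
    """Assign the class whose codeword is nearest in Hamming distance,
    mirroring the decoding step for error-correcting codes."""
    return min(range(len(code_matrix)),
               key=lambda c: hamming(output_bits, code_matrix[c]))
```

Each of the n columns defines one binary learning problem; at test time the n binary predictions form the output string that `ecoc_decode` maps back to a class.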

58 | Error-correcting output coding for text classification
- Berger
- 1999

Citation Context: ...igh-precision classification. 2. Error Correcting Output Coding Error Correcting Output Coding (ECOC) (Dietterich & Bakiri, 1995) has been shown to perform extraordinarily well for text classification (Berger, 1999; Ghani, 2000; ?). ECOC converts an m-class supervised learning problem into n binary supervised learning problems. Any learning algorithm that can handle two-class learning problems can then be applied...

32 | Hypertext categorization using hyperlink patterns and meta data - Ghani, Slattery, et al. - 2001

9 | Transductive inference for text classification using support vector machines
- Joachims
- 1999

Citation Context: ...built from unlabeled data to perform discriminative classification (Jaakkola & Haussler, 1999), and using transductive inference for support vector machines to optimize performance on a specific test set (Joachims, 1999). These studies have shown that using unlabeled data can significantly improve classification performance, especially when labeled training data are sparse. A related set of research uses labeled and u...

4 | Selective sampling + semi-supervised learning = robust multi-view learning - Muslea, Minton, et al. - 2001

3 | Using error-correcting codes for text classification
- Ghani
- 2000

Citation Context: ...ation, and (2) the two feature sets of each instance are conditionally independent given the class. Most studies on text classification with Co-training type algorithms (Blum & Mitchell, 1998; Nigam & Ghani, 2000) have focused on small, often binary, problems, and it is not clear whether their conclusions would generalize to real-world classification tasks with a large number of categories. Experimental evaluat...

3 | Codes correcteurs d'erreurs
- Hocquenghem
- 1959

Citation Context: ...ber of bits by which the two codewords differ. This process of mapping the output string to the nearest codeword is identical to the decoding step for error-correcting codes (Bose & Ray-Chaudhuri, 1960; Hocquenghem, 1959). Training Phase 1. Given a problem with m classes, create an m x n binary matrix M (where n can be less than m). 2. Each class is assigned one row of M (each column divides the entire class space ...

1 | Using error-correcting codes for efficient text classification with a large number of categories. Master's thesis (Technical Report)
- Ghani
- 2001

Citation Context: ...aive Bayes classifier from the labeled data only. Then, A probabilistically labels all the unlabeled data. The B feature-set classifier then trains using the labeled data (more details can be found in Ghani, 2001) and the unlabeled data with A's labels. B then relabels the data for use by A, and this process iterate...

[Figure: precision vs. recall curves comparing Naïve Bayes, EM, and ECOC + Co-Training]
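The iteration described in this excerpt, where classifier A labels the unlabeled pool for B and B relabels it for A, can be sketched with a toy one-feature classifier standing in for naive Bayes (all names and the centroid classifier are my own simplifications, not the paper's):

```python
def train_centroid(xs, ys):
    """Toy one-feature classifier: store the per-class mean
    (a stand-in for the naive Bayes learners used in the paper)."""
    means = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
    return means

def predict_centroid(means, xs):
    """Label each point with the class whose mean is closest."""
    return [min(means, key=lambda c: abs(x - means[c])) for x in xs]

def co_train(view_a, view_b, labels, unl_a, unl_b, rounds=5):
    """Co-training over two feature views: each view's classifier
    pseudo-labels the unlabeled pool for the other, repeatedly."""
    model_a = train_centroid(view_a, labels)
    for _ in range(rounds):
        pseudo = predict_centroid(model_a, unl_a)   # A labels pool for B
        model_b = train_centroid(view_b + unl_b, labels + pseudo)
        pseudo = predict_centroid(model_b, unl_b)   # B relabels pool for A
        model_a = train_centroid(view_a + unl_a, labels + pseudo)
    return model_a, model_b
```

In the paper's setting this loop runs once per binary problem produced by the ECOC decomposition, with the two views supplying the conditionally independent feature sets that co-training assumes.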

1 | Analyzing the applicability and effectiveness of co-training
- Nigam, Ghani
- 2000

Citation Context: ...classification, and (2) the two feature sets of each instance are conditionally independent given the class. Most studies on text classification with Co-training type algorithms (Blum & Mitchell, 1998; Nigam & Ghani, 2000) have focused on small, often binary, problems, and it is not clear whether their conclusions would generalize to real-world classification tasks with a large number of categories. Experimental evaluat...