## Partially Supervised Classification of Text Documents (2002)

### Download Links

- [www.cs.uic.edu]
- [www.comp.nus.edu.sg]
- [cbio.ensmp.fr]
- DBLP

### Other Repositories/Bibliography

Citations: 97 (19 self)

### BibTeX

@INPROCEEDINGS{Liu02partiallysupervised,

author = {Bing Liu and Wee Sun Lee and Philip S. Yu and Xiaoli Li},

title = {Partially Supervised Classification of Text Documents},

booktitle = {Proceedings of the Nineteenth International Conference on Machine Learning (ICML 2002)},

year = {2002},

pages = {387--394}

}

### Abstract

We investigate the following problem: Given a set of documents of a particular topic or class P, and a large set M of mixed documents that contains documents from class P and other types of documents, identify the documents from class P in M. The key feature of this problem is that there is no labeled non-P document, which makes traditional machine learning techniques inapplicable, as they all need labeled documents of both classes. We call this problem partially supervised classification. In this paper, we show that this problem can be posed as a constrained optimization problem and that under appropriate conditions, solutions to the constrained optimization problem will give good solutions to the partially supervised classification problem. We present a novel technique to solve the problem and demonstrate the effectiveness of the technique through extensive experimentation.

### Citations

8542 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...supervised classification problem in the text domain. Our algorithm is built on the naive Bayesian classifier (McCallum & Nigam, 1998) in conjunction with the EM (Expectation Maximization) algorithm (Dempster et al., 1977). Our algorithm has two main novelties: After building an initial classifier (using naive Bayes and the EM algorithm), we select those documents that are most likely to be negative documents from the...
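The procedure outlined in this context — build an initial classifier treating the unlabeled set as negative, then identify the documents most likely to be truly negative and retrain — can be sketched as a simplified two-step procedure. This is an illustrative reconstruction with toy data and a bare-bones multinomial naive Bayes, not the paper's exact algorithm; all names and data are mine:

```python
import math
from collections import Counter

def train_nb(pos_docs, neg_docs):
    """Multinomial naive Bayes with add-one smoothing over a shared vocabulary."""
    vocab = {w for d in pos_docs + neg_docs for w in d}
    def word_logprobs(docs):
        counts = Counter(w for d in docs for w in d)
        total = sum(counts.values())
        return {w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab}
    prior_pos = len(pos_docs) / (len(pos_docs) + len(neg_docs))
    return (math.log(prior_pos), word_logprobs(pos_docs),
            math.log(1 - prior_pos), word_logprobs(neg_docs))

def posterior_pos(model, doc):
    """P(positive | doc); out-of-vocabulary words contribute 0 to both scores."""
    lp_pos, wp_pos, lp_neg, wp_neg = model
    s_pos = lp_pos + sum(wp_pos.get(w, 0.0) for w in doc)
    s_neg = lp_neg + sum(wp_neg.get(w, 0.0) for w in doc)
    m = max(s_pos, s_neg)
    return math.exp(s_pos - m) / (math.exp(s_pos - m) + math.exp(s_neg - m))

# Step 1: treat every unlabeled document as negative and build an initial classifier.
positive = [["nba", "game", "score"], ["league", "game", "win"]]
unlabeled = [["nba", "score", "win"],
             ["stock", "market", "price"],
             ["market", "price", "fall"]]
model = train_nb(positive, unlabeled)

# Step 2: the unlabeled documents the initial classifier considers most likely
# to be negative become a "reliable negative" set; retrain against them only.
by_confidence = sorted(unlabeled, key=lambda d: posterior_pos(model, d))
reliable_neg = by_confidence[:len(by_confidence) // 2]
model = train_nb(positive, reliable_neg)
```

The point of the second step is that the initial "all unlabeled = negative" assumption is wrong for positive documents hiding in the unlabeled set, so retraining on only the most confidently negative documents yields a cleaner decision boundary.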

1800 | Text categorization with support vector machines
- Joachims
- 1998
Citation Context ...A number of techniques have been proposed, e.g., Rocchio algorithm (Rocchio, 1971), the naive Bayesian method (Lewis & Ringuette, 1994), K-nearest neighbour (Yang, 1999), and support vector machines (Joachims, 1997). These existing techniques, however, all require labeled data for all classes for building the classifier. They are not designed for solving the partially supervised classification. Note that we use...

959 | On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications XVI(2):264–280
- Vapnik, Chervonenkis
- 1971
Citation Context ...nimizing while holding (where is recall) if the set of positive examples and the set of unlabeled examples are large enough. We will measure the complexity of function classes using the VC-dimension (Vapnik & Chervonenkis, 1971) of the function class. The VC-dimension is a standard measure of complexity in computational learning theory (see e.g. (Anthony & Bartlett, 1999)). For a class of function and a finite set , let be ...
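The definition in this context lost its symbols in extraction. In standard notation (a sketch of the usual textbook definition, with symbol names mine rather than the paper's):

```latex
% Restriction of a class \mathcal{F} of \{0,1\}-valued functions
% to a finite set S = \{x_1, \dots, x_m\}:
\[
  \mathcal{F}|_{S} = \bigl\{ (f(x_1), \dots, f(x_m)) : f \in \mathcal{F} \bigr\}.
\]
% The VC-dimension is the size of the largest set S shattered by \mathcal{F},
% i.e. on which every possible labelling is realized:
\[
  \mathrm{VCdim}(\mathcal{F}) = \max \bigl\{ |S| : |\mathcal{F}|_{S}| = 2^{|S|} \bigr\}.
\]
```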

893 | Relevance feedback in information retrieval
- Rocchio
- 1971
Citation Context ...lated Work Text classification has been studied extensively in the past in information retrieval, machine learning and data mining. A number of techniques have been proposed, e.g., Rocchio algorithm (Rocchio, 1971), the naive Bayesian method (Lewis & Ringuette, 1994), K-nearest neighbour (Yang, 1999), and support vector machines (Joachims, 1997). These existing techniques, however, all require labeled data for...

797 | A comparison of event models for naive bayes text classification
- McCallum, Nigam
- 1998
Citation Context ...ractice. In this paper, we propose a novel heuristic technique for solving the partially supervised classification problem in the text domain. Our algorithm is built on the naive Bayesian classifier (McCallum & Nigam, 1998) in conjunction with the EM (Expectation Maximization) algorithm (Dempster et al., 1977). Our algorithm has two main novelties: After building an initial classifier (using naive Bayes and the EM algo...

467 | NewsWeeder: learning to filter netnews
- Lang
- 1995
Citation Context ...ment results, we will also report the accuracy results. 5.2 Experiment datasets Our experiments used two large document corpora, from which we created 30 datasets. The first one is the 20 Newsgroups (Lang, 1995). It contains 20 different UseNet discussion groups, which are also categorized into 4 main categories, computer, recreation, science, and talk. We remove all the UseNet headers (thereby discarding t...

379 | Decision theoretic generalizations of the PAC model for neural net and other learning applications
- Haussler
- 1992
Citation Context ...he target function. Let 1 All proofs are omitted due to lack of space and can be found in the full version of the paper 2 This is a measurability condition which need not concern us in practice. See (Haussler, 1992). Let be drawn from the distribution of positive examples. Let be unlabeled examples drawn independently from . Let be the subset of that achieves total recall on and . Then, with probabili...

318 | Neural Network Learning: Theoretical Foundations
- Anthony, Bartlett
- 1999
Citation Context ...lexity of function classes using the VC-dimension (Vapnik & Chervonenkis, 1971) of the function class. The VC-dimension is a standard measure of complexity in computational learning theory (see e.g. (Anthony & Bartlett, 1999)). For a class of function and a finite set , let be the restriction of to (that is, the set of all possible -valued functions on the domain that can be obtained from the class ). The VC-dimension of...

298 | Efficient noise-tolerant learning from statistical queries
- Kearns
- 1998
Citation Context ...ive and unlabeled examples was done in (Denis, 1998). The study concentrates on the computational complexity of learning and shows that function classes learnable under the statistical queries model (Kearns, 1998) are also learnable from positive and unlabeled examples. (Letouzey et al., 2000) presents an algorithm for learning using a modified C4.5 (decision tree) algorithm based on the statistical query model. R...

280 | A Comparison of Two Learning algorithms for Text Categorization
- Lewis, Ringuette
- 1994
Citation Context ...died extensively in the past in information retrieval, machine learning and data mining. A number of techniques have been proposed, e.g., Rocchio algorithm (Rocchio, 1971), the naive Bayesian method (Lewis & Ringuette, 1994), K-nearest neighbour (Yang, 1999), and support vector machines (Joachims, 1997). These existing techniques, however, all require labeled data for all classes for building the classifier. They are no...

166 | Learning to Classify Text from Labeled and Unlabeled Documents
- Nigam, McCallum, et al.
- 1998
Citation Context ...es. The main bottleneck of building such a classifier is that a large, often prohibitive, number of labeled training documents is needed to build accurate classifiers. Recently, it has been shown in (Nigam et al., 1998) that unlabeled data is helpful in classifier building. Their approach basically uses a small labeled set of documents of every class, and a large set of unlabeled documents to build classifiers. The...

107 | The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon
- Shahshahani, Landgrebe
- 1994
Citation Context ... novel technique to solve the problem in the text domain. This technique produces remarkably good results. Another line of related work is learning using a small labeled set (Nigam et al., 1998) and (Shahshahani & Landgrebe, 1994). In both works, a small set of labeled data of every class and a large unlabeled set are used for classifier building. It was shown that the unlabeled data helps classification. These works are clea...

97 | Learning from positive data
- Muggleton
- 1996
Citation Context ... 2000) presents an algorithm for learning using a modified C4.5 (decision tree) algorithm based on the statistical query model. Recently, learning from positive examples was also studied theoretically in (Muggleton, 2001) within a Bayesian framework where the distribution of functions and examples are assumed known. The result obtained in (Muggleton, 2001) is similar to our theoretical result in the noiseless case. H...

43 | PAC Learning from Positive Statistical Queries
- Denis
- 1998
Citation Context ...classification; a similar approach could be applied to more complex classifiers. A theoretical study of Probably Approximately Correct (PAC) learning from positive and unlabeled examples was done in (Denis, 1998). The study concentrates on the computational complexity of learning and shows that function classes learnable under the statistical queries model (Kearns, 1998) are also learnable from positive and u...

13 | Measurement-Theoretical Investigation of the MZ-Metric
- Bollmann, Cherniavsky
- 1980
Citation Context ...urpose. Two popular measures are the F score and the breakeven point. The F score is defined as F = 2pr/(p + r), where p is the precision and r is the recall. The F score measures the performance of a system on a particular class (see (Bollmann & Cherniavsky, 1981) (Shaw, 1986) for its theoretical bases and practical advantages). The breakeven point is the value at which recall and precision are equal (Lewis & Ringuette, 1994). However, the breakeven point mea...

3 | Learning from positive and unlabeled examples. ALT 2000
- Letouzey, Denis, et al.
- 2000
Citation Context ...rates on the computational complexity of learning and shows that function classes learnable under the statistical queries model (Kearns, 1998) are also learnable from positive and unlabeled examples. (Letouzey et al., 2000) presents an algorithm for learning using a modified C4.5 (decision tree) algorithm based on the statistical query model. Recently, learning from positive examples was also studied theoretically in (Muggl...

3 | On the foundation of evaluation. American Society for Information Science
- Shaw
- 1986
Citation Context ...e the F score and the breakeven point. The F score is defined as F = 2pr/(p + r), where p is the precision and r is the recall. The F score measures the performance of a system on a particular class (see (Bollmann & Cherniavsky, 1981) (Shaw, 1986) for its theoretical bases and practical advantages). The breakeven point is the value at which recall and precision are equal (Lewis & Ringuette, 1994). However, the breakeven point measure is not s...
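The two measures contrasted in this context can be illustrated with a few lines (a minimal sketch; the function and the toy numbers are mine, not from the paper):

```python
def f_score(p, r):
    """F score: harmonic mean of precision p and recall r, F = 2pr / (p + r)."""
    return 2 * p * r / (p + r) if p + r else 0.0

print(f_score(0.8, 0.6))  # harmonic mean penalizes the precision/recall imbalance
print(f_score(0.7, 0.7))  # at the breakeven point, F equals the common value
```

Unlike the breakeven point, the F score is defined for any precision/recall pair, which is why it is usable even when recall and precision curves never cross.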

2 | Developments in automatic relevance feedback in information retrieval
- Salton
- 1991
Citation Context ...classification. These works are clearly different from ours as we do not have any labeled document of the negative class. The proposed method is also different from traditional information retrieval (Salton, 1991). In information retrieval, given a query document and a large document collection, the system retrieves and ranks the documents in the collection according to their similarities to the query documen...
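The retrieve-and-rank behavior contrasted above can be made concrete with a minimal bag-of-words cosine ranking (toy data; names and documents are illustrative, not from the paper):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = Counter(["nba", "game", "score"])
collection = [
    Counter(["nba", "score", "win"]),
    Counter(["stock", "market", "price"]),
]
# Rank the collection by similarity to the query, most similar first.
ranked = sorted(collection, key=lambda d: cosine(query, d), reverse=True)
```

The contrast with partially supervised classification is that ranking produces an ordering relative to one query, whereas classification must commit to a yes/no decision for every document in M.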