## Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression (2003)

### Download Links

- [www.cs.uic.edu]
- [www.aaai.org]
- [www.comp.nus.edu.sg]
- [www.hpl.hp.com]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the Twentieth International Conference on Machine Learning (ICML)

Citations: 41 (7 self)

### BibTeX

@INPROCEEDINGS{Lee03learningwith,
  author = {Wee Sun Lee and Bing Liu},
  title = {Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression},
  booktitle = {Proceedings of the Twentieth International Conference on Machine Learning (ICML)},
  year = {2003}
}

### Abstract

The problem of learning with positive and unlabeled examples arises frequently in retrieval applications.
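Per the title, the paper's approach weights the logistic regression loss differently for labeled positive examples and for unlabeled examples treated as negative. A minimal pure-Python sketch of that general idea (the weights `w_pos`/`w_unl`, learning rate, and toy data below are illustrative assumptions, not the paper's actual weighting scheme or settings):

```python
import math

def train_weighted_lr(data, w_pos=2.0, w_unl=1.0, lr=0.5, epochs=300):
    """Batch gradient descent on a class-weighted logistic loss.

    data: list of (feature_vector, label) pairs, where label 1 marks a
    labeled positive example and 0 an unlabeled example treated as
    negative. Errors on the two groups are weighted by w_pos / w_unl.
    """
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            c = (w_pos if y == 1 else w_unl) * (p - y)  # weighted gradient
            for j in range(dim):
                gw[j] += c * x[j]
            gb += c
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b

def predict(w, b, x):
    """Probability that x is positive under the fitted model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

On a toy separable 1-D problem, points near the labeled positives score above 0.5 and points far from them score below it.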

### Citations

4118 | LIBSVM: a library for support vector machines. September 14, 2002. [Online]. Available: http://www.csie.ntu.edu.tw/˜cjlin/papers/libsvm.pdf - Chang, Lin |

1610 | Making large-scale SVM learning practical
- Joachims
- 1999

Citation Context: ...performance for the linear kernel is shown. (Results for Gaussian kernels are poorer.) To compare performance of the weighted logistic regression on noiseless data, we ran the SVM algorithm (svmLight (Joachims, 1998) with the default parameters) and naive bayes with an additive smoothing parameter of 0.1 on the noiseless dataset using the same feature set. The average F score for the SVM is 0.650 while the avera...

1369 | Combining labeled and unlabeled data with co-training
- Blum, Mitchell
- 1998

Citation Context: ...never make an error on a negative example but will randomly label positive examples as negative with probability β. One problem with this formulation is that the value of β is unknown. Blum and Mitchell (Blum & Mitchell, 1998) observed that, given β, the function, the noisy observed label, the actual label and the input, the probability of the noisy observed label is linearly related to that of the actual label. The resulting expression is the expected sum of observed false positive and false...

747 | Transductive inference for text classification using support vector machines
- Joachims
- 1999

Citation Context: ...from a small number of labeled positive and negative examples with a large number of unlabeled examples. Works on this topic include (Nigam et al., 1998), which uses naive bayes and the EM algorithm, (Joachims, 1999), which uses transductive SVM, and (Blum & Mitchell, 1998), which exploits conditional independence of multiple views of the data to do co-training. 3. Learning Linear Functions. Linear functions of the...

577 | Estimating the support of a high-dimensional distribution
- Schölkopf, Platt, et al.
- 2001

Citation Context: ...we have been experimenting with datasets with very large input dimensions. It is also possible to discard the unlabeled data and learn only from the positive data. This was done in the one-class SVM (Schölkopf et al., 1999), which tries to learn the support of the positive distribution. This method appears to be highly sensitive to the input representation and dimensionality (Manevitz & Yousef, 2001) and did not perfor...

502 | Machine learning
- Mitchell
- 1997

Citation Context: ...of losses at epoch t. The j-th component of the negative gradient of the sum of squared weights is simply −w_j. We add a momentum term μΔw(t−1) to the gradient of the sum of losses (see e.g. (Mitchell, 1997)) to accelerate convergence of the gradient descent. Our update at each epoch t then becomes Δw(t) = μΔw(t−1) − η∇L, where η is the learning rate. Bias is implemented by extending the feature vecto...
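The citation context above describes standard gradient descent with a momentum term and a squared-weight penalty. A hedged sketch of that update rule (the hyperparameter values are illustrative defaults, not the paper's):

```python
def momentum_step(w, grad, prev_delta, lr=0.1, mu=0.9, decay=0.0):
    """One epoch of momentum gradient descent with optional weight decay:
    delta(t) = mu * delta(t-1) - lr * (grad + decay * w)."""
    delta = [mu * d - lr * (g + decay * wi)
             for d, g, wi in zip(prev_delta, grad, w)]
    return [wi + di for wi, di in zip(w, delta)], delta
```

Minimizing f(w) = (w − 3)², whose gradient is 2(w − 3), drives w toward 3; the momentum term accelerates this compared with plain gradient steps.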

497 | Newsweeder: Learning to filter netnews
- Lang
- 1995

Citation Context: ...the same behaviour as the F score in the sense that it is large when both p and r are large and is small if either p or r is small. 5. Experiments. We performed experiments using the 20 Newsgroups dataset (Lang, 1995). The dataset consists of documents from 20 newsgroups with roughly 1000 documents in each group. The preprocessing is as follows: removal of the headers of each document; removal of stop words; ...

313 | Efficient noise-tolerant learning from statistical queries
- Kearns
- 1998

Citation Context: ...ples was done in (Denis, 1998). Using the model where a positive example is left unlabeled with constant probability, it was shown that function classes learnable under the statistical queries model (Kearns, 1998) are also learnable from positive and unlabeled examples. Learning from positive examples was also studied theoretically in (Muggleton, 2001) within a Bayesian framework where the distribution of funct...

175 | Learning to classify text from labeled and unlabeled documents. AAAI-98
- Nigam, McCallum, et al.
- 1998

Citation Context: ...abeled examples, there has been considerable interest in learning from a small number of labeled positive and negative examples with a large number of unlabeled examples. Works on this topic include (Nigam et al., 1998), which uses naive bayes and the EM algorithm, (Joachims, 1999), which uses transductive SVM, and (Blum & Mitchell, 1998), which exploits conditional independence of multiple views of the data to do co-t...

103 | Learning from positive data
- Muggleton
- 2001

Citation Context: ...function classes learnable under the statistical queries model (Kearns, 1998) are also learnable from positive and unlabeled examples. Learning from positive examples was also studied theoretically in (Muggleton, 2001) within a Bayesian framework where the distributions of functions and examples are assumed known. Sample complexity for the case where the positive and unlabeled examples can be sampled is given in (L...

101 | Partially supervised classification of text documents
- Liu, Lee, et al.
- 2002

Citation Context: ...1) within a Bayesian framework where the distributions of functions and examples are assumed known. Sample complexity for the case where the positive and unlabeled examples can be sampled is given in (Liu et al., 2002), where it was shown that maximizing the number of examples classified as negative while constraining the function to correctly classify positive examples will give good performance with large enough...

88 | Robust Trainability of Single Neurons - Hoffgen, Simon, et al. - 1995 |

65 | A polynomial-time algorithm for learning noisy linear threshold functions
- Blum, Frieze, et al.
- 1998

Citation Context: ...sitive example is left unlabeled is constant, it is also possible to modify the perceptron algorithm so that it is able to learn from positive and unlabeled examples using ideas from (Bylander, 1994; Blum et al., 1996). However, so far our attempts at using such algorithms have not been very successful, most probably due to the lack of a good regularization criterion, as we have been experimenting with datasets with...

52 | Boosting algorithms as gradient descent in function space
- Mason, Baxter, et al.
- 1999

Citation Context: ...ve us accurate approximation when the sample size is large enough. In the case where the function class is not powerful enough, it is useful to view the logit loss as an upper bound to the loss (Mason et al., 1999). In this case, we are trying to minimize an upper bound to the sum of false positive and false negative frequencies, which still makes good sense even when the function class is not powerful enough...

46 | PAC learning from positive statistical queries
- Denis
- 1998

Citation Context: ...use the following simple model for learning with positive and unlabeled examples: positive examples are randomly labeled positive with probability 1 − β and are left unlabeled with probability β (see (Denis, 1998)). Under this assumption, if we labeled all the unlabeled examples as negative, we will never make an error on a negative example but will randomly label positive examples as negative with probabilit...
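The labeling model in this context is easy to simulate: each truly positive example receives a positive label with some probability and otherwise joins the unlabeled pool, while negatives are always unlabeled. A small illustrative sketch (the function and variable names are mine, not from the paper):

```python
import random

def pu_label(examples, beta, rng=None):
    """Split examples under the Denis-style PU model: a positive example
    stays labeled positive with probability 1 - beta and is left
    unlabeled with probability beta; negatives are always unlabeled."""
    rng = rng or random.Random(0)
    labeled_pos, unlabeled = [], []
    for x, is_positive in examples:
        if is_positive and rng.random() >= beta:
            labeled_pos.append(x)
        else:
            unlabeled.append(x)
    return labeled_pos, unlabeled
```

Treating every unlabeled example as negative then never mislabels a true negative, but marks each hidden positive as negative with probability beta, exactly as the context states.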

39 | Athena: mining-based interactive management of text databases
- Agrawal, Bayardo, et al.
- 2000

Citation Context: ...then uses EM with naive bayes in order to label the unlabeled examples. The implementation used is the same as that described in (Liu et al., 2002) except that an additive smoothing parameter of 0.1 (Agrawal et al., 2000) is used instead of 1 (Laplacian smoothing) in the naive bayes model, as it performs better. The other method used is the one-class support vector machines (Schölkopf et al., 1999), which does not use...
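The additive smoothing mentioned in this context pads raw naive bayes counts with a constant alpha; alpha = 1 is Laplacian smoothing, while the context reports alpha = 0.1 performing better. A minimal sketch (the function name is mine):

```python
def smoothed_prob(count, total, vocab_size, alpha=0.1):
    """Additive smoothing for a naive bayes word probability:
    P(word | class) = (count + alpha) / (total + alpha * vocab_size).
    alpha = 1 recovers Laplacian smoothing."""
    return (count + alpha) / (total + alpha * vocab_size)
```

Unseen words get a small nonzero probability, and the smoothed probabilities still sum to one over the vocabulary.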

33 | Learning Linear Threshold Functions in the Presence of Classification Noise
- Bylander
- 1994

Citation Context: ...lity that the positive example is left unlabeled is constant, it is also possible to modify the perceptron algorithm so that it is able to learn from positive and unlabeled examples using ideas from (Bylander, 1994; Blum et al., 1996). However, so far our attempts at using such algorithms have not been very successful, most probably due to the lack of a good regularization criterion, as we have been experimenting...

31 | Text classification from positive and unlabeled examples - Denis, Gilleron, et al. - 2002 |

6 | One class SVMs for document classification
- Manevitz, Yousef
- 2001

Citation Context: ...n the one-class SVM (Schölkopf et al., 1999), which tries to learn the support of the positive distribution. This method appears to be highly sensitive to the input representation and dimensionality (Manevitz & Yousef, 2001) and did not perform well on the feature set that we used in this paper. Besides learning from positive and unlabeled examples, there has been considerable interest in learning from a small number of...
