Results 1 - 7 of 7
Learning Classifiers from Only Positive and Unlabeled Data, 2008
Abstract

Cited by 39 (4 self)
The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and the other set consists of negative examples. However, it is often the case that the available training data are an incomplete set of positive examples, and a set of unlabeled examples, some of which are positive and some of which are negative. The problem solved in this paper is how to learn a standard binary classifier given a nontraditional training set of this nature. Under the assumption that the labeled examples are selected randomly from the positive examples, we show that a classifier trained on positive and unlabeled examples predicts probabilities that differ by only a constant factor from the true conditional probabilities of being positive. We show how to use this result in two different ways to learn a classifier from a nontraditional training set. We then apply these two new methods to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database. Our experiments in this domain show that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples.
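The constant-factor result in this abstract suggests a simple two-step recipe: train a probabilistic classifier to separate labeled examples from everything else, then rescale its outputs. A minimal sketch, assuming scikit-learn and synthetic 1-D Gaussian data (the data, the 30% labeling fraction, and the logistic model are illustrative assumptions, not the paper's experimental setup):

```python
# Sketch of the constant-factor adjustment for PU learning
# described in the abstract (labeled positives selected at random).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: positives centered at +2, negatives at -2 (1-D).
n = 2000
y_true = rng.integers(0, 2, n)                 # hidden true labels
x = np.where(y_true == 1, 2.0, -2.0) + rng.normal(0, 1, n)
X = x.reshape(-1, 1)

# Label a random 30% of the positives ("selected completely at random").
s = np.zeros(n, dtype=int)
pos = np.flatnonzero(y_true == 1)
s[rng.choice(pos, size=int(0.3 * len(pos)), replace=False)] = 1

# Step 1: train a probabilistic classifier g(x) ~ p(s=1 | x)
# separating labeled examples from the rest.
g = LogisticRegression().fit(X, s)

# Step 2: estimate the constant c = p(s=1 | y=1) as the mean of g
# over the labeled positives.
c = g.predict_proba(X[s == 1])[:, 1].mean()

# Step 3: p(y=1 | x) is approximately g(x) / c, clipped to [0, 1].
p_y = np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)

# On this toy problem the adjusted probabilities recover the hidden labels.
acc = ((p_y > 0.5) == (y_true == 1)).mean()
```

Thresholding the adjusted probability at 0.5 is equivalent to thresholding the raw classifier at c/2, which is the practical payoff of the constant-factor relationship.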
Learning Classifiers without Negative Examples: A Reduction Approach
Abstract

Cited by 2 (1 self)
The problem of PU Learning, i.e., learning classifiers with positive and unlabelled examples (but not negative examples), is very important in information retrieval and data mining. We address this problem through a novel approach: reducing it to the problem of learning classifiers for some meaningful multivariate performance measures. In particular, we show how a powerful machine learning algorithm, Support Vector Machine, can be adapted to solve this problem. The effectiveness and efficiency of the proposed approach have been confirmed by our experiments on three real-world datasets.
Query-By-Multiple-Examples using Support Vector Machines, 2009
Abstract
Journal of Digital Information Management. We identify and explore an Information Retrieval paradigm called Query-By-Multiple-Examples (QBME), where the information need is described not by a set of terms but by a set of documents. Intuitive ideas for QBME include using the centroid of these documents or the well-known Rocchio algorithm to construct the query vector. We consider this problem from the perspective of text classification, and find that a better query vector can be obtained through learning with Support Vector Machines (SVMs). For online queries, we show how SVMs can be learned from one-class examples in linear time. For offline queries, we show how SVMs can be learned from positive and unlabeled examples together in linear or polynomial time, optimising some meaningful multivariate performance measures. The effectiveness and efficiency of the proposed approaches have been confirmed by our experiments on four real-world datasets.
Transductive anomaly detection, 2008
Abstract
One formulation of the anomaly detection problem is to build a detector based on a training sample consisting only of nominal data. The standard approach to this problem has been to declare anomalies where the nominal density is low, which reduces the problem to density level set estimation. This approach is inductive in the sense that the detector is constructed before any test data are observed. In this paper, we consider the transductive setting where the unlabeled and possibly contaminated test sample is also available at learning time. We argue that anomaly detection in this transductive setting is naturally solved by a general reduction to a binary classification problem. In particular, an anomaly detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, the transductive approach yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on anomalies. Therefore, in anomaly detection, unlabeled data can have a substantial impact on the theoretical properties of the decision rule.
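The reduction described in this abstract can be illustrated in a few lines: treat the nominal sample as one class and the contaminated test sample as the other, then calibrate the detection threshold on nominal scores to hit a target false positive rate. A toy sketch, assuming scikit-learn and synthetic Gaussian data; the quantile-based calibration below is a simplification, not the paper's exact Neyman-Pearson procedure:

```python
# Transductive anomaly detection as binary classification:
# nominal training data (class 0) vs. contaminated test sample (class 1).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Nominal data ~ N(0, 1); anomalies ~ N(4, 1). Test sample is a 90/10 mix.
X_nominal = rng.normal(0, 1, (1000, 1))
X_test = np.vstack([rng.normal(0, 1, (450, 1)),   # nominal test points
                    rng.normal(4, 1, (50, 1))])   # anomalies

# Train the binary classifier: nominal vs. test-sample membership.
X = np.vstack([X_nominal, X_test])
z = np.concatenate([np.zeros(len(X_nominal)), np.ones(len(X_test))])
clf = LogisticRegression().fit(X, z)

# Calibrate the threshold on nominal scores so that roughly a fraction
# alpha of nominal points would be (wrongly) flagged as anomalies.
alpha = 0.05
scores_nominal = clf.predict_proba(X_nominal)[:, 1]
threshold = np.quantile(scores_nominal, 1 - alpha)

scores_test = clf.predict_proba(X_test)[:, 1]
flagged = scores_test > threshold
detection_rate = flagged[450:].mean()   # fraction of true anomalies flagged
fpr = flagged[:450].mean()              # fraction of nominal test points flagged
```

Because the classifier is trained with the test sample in hand, the detector adapts to whatever anomalies actually appear there, which is the transductive advantage the abstract emphasizes.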
Multi-view Positive and Unlabeled Learning (Asian Conference on Machine Learning)
Abstract
Learning with Positive and Unlabeled instances (PU learning) arises widely in information retrieval applications. To address the unavailability issue of negative instances, most existing PU learning approaches require to either identify a reliable set of negative instances from the unlabeled data or estimate probability densities as an intermediate step. However, inaccurate negative-instance identification or poor density estimation may severely degrade overall performance of the final predictive model. To this end, we propose a novel PU learning method based on density ratio estimation without constructing any sets of negative instances or estimating any intermediate densities. To further boost PU learning performance, we extend our proposed learning method in a multi-view manner by utilizing multiple heterogeneous sources. Extensive experimental studies demonstrate the effectiveness of our proposed methods, especially when positive labeled data are limited.
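One classical way to estimate a density ratio between the labeled-positive and unlabeled samples, without modeling either density explicitly, is through probabilistic classification plus Bayes' rule. A toy sketch along those lines, assuming scikit-learn and synthetic 1-D data; this is a generic density-ratio estimator for illustration, not necessarily the estimator used in the paper:

```python
# Classifier-based density-ratio sketch for PU data:
# estimate r(x) = p_P(x) / p_U(x) by separating labeled positives
# from unlabeled samples, then converting the output via Bayes' rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

X_pos = rng.normal(2, 1, (300, 1))                # labeled positives
X_unl = np.vstack([rng.normal(2, 1, (200, 1)),    # hidden positives
                   rng.normal(-2, 1, (300, 1))])  # hidden negatives

X = np.vstack([X_pos, X_unl])
t = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
clf = LogisticRegression().fit(X, t)

# By Bayes' rule: r(x) = [p(t=1|x) / p(t=0|x)] * (n_unl / n_pos).
proba = clf.predict_proba(X_unl)
ratio = proba[:, 1] / proba[:, 0] * (len(X_unl) / len(X_pos))

# Unlabeled points with a high ratio resemble the positive class,
# so no explicit negative set or density estimate is ever built.
pred_pos = ratio > 1.0
</pre>```

The appeal of this route, echoed in the abstract, is that the ratio is estimated directly: no reliable-negative mining step and no intermediate density fit whose errors could propagate into the final model.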
Acknowledgements
Abstract
Dedicated to those that are gone and to those that are to come... Izan ziren eta izango direnentzat ("For those who were and those who will be").