Enhancing Image and Video Retrieval: Learning via Equivalence Constraints
- In Proc. of CVPR, 2003
Cited by 28 (6 self)
This paper is about learning using partial information in the form of equivalence constraints. Equivalence constraints provide relational information about the labels of data points, rather than the labels themselves. Our work is motivated by the observation that in many real-life applications partial information about the data can be obtained at very little cost. For example, in video indexing we may want to use the fact that a sequence of faces obtained from successive frames in roughly the same location is likely to contain the same unknown individual.
Unlabeled Data Can Degrade Classification Performance of Generative Classifiers
- In Fifteenth International Florida Artificial Intelligence Society Conference, 2002
Cited by 27 (7 self)
This paper analyzes the effect of unlabeled training data in generative classifiers. We are interested in classification performance when unlabeled data are added to an existing pool of labeled data. We show that unlabeled data can degrade the performance of a classifier when there are discrepancies between modeling assumptions used to build the classifier and the actual model that generates the data.
The Use of Unlabeled Data to Improve Supervised Learning for Text Summarization
- 25th International ACM SIGIR, 2002
Cited by 26 (11 self)
With the huge amount of information available electronically, there is an increasing demand for automatic text summarization systems. The use of machine learning techniques for this task allows one to adapt summaries to the user's needs and to the corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches attempt to generate summaries by extracting sentence segments and adopt the supervised learning paradigm, which requires labeling documents at the text-span level. This is a costly process, which puts strong limitations on the applicability of these methods. We investigate here the use of semi-supervised algorithms for summarization. These techniques use a small amount of labeled data together with a larger amount of unlabeled data. We propose new semi-supervised algorithms for training classification models for text summarization. We analyze their performance on two data sets: the Reuters newswire corpus and the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC. We perform comparisons with a baseline (non-learning) system and a reference trainable summarizer system.
Semi-supervised regression with co-training style algorithms
- 2007
Cited by 19 (4 self)
The traditional setting of supervised learning requires a large amount of labeled training examples in order to achieve good generalization. However, in many practical applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning has attracted much attention. Previous research on semi-supervised learning mainly focuses on semi-supervised classification. Although regression is almost as important as classification, semi-supervised regression is largely understudied. In particular, although co-training is a main paradigm in semi-supervised learning, little work has been devoted to co-training style semi-supervised regression algorithms. In this paper, a co-training style semi-supervised regression algorithm, i.e. COREG, is proposed. This algorithm uses two regressors, each of which labels unlabeled data for the other, where the confidence in labeling an unlabeled example is estimated through the amount of reduction in mean squared error over the labeled neighborhood of that example. Analysis and experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.
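The co-training loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the choice of k-nearest-neighbor regressors, the Minkowski orders 2 and 5, and all function names (`knn_predict`, `confidence`, `coreg`) are assumptions made for the example. Only the two ideas stated in the abstract are taken from the source: each regressor labels data for the other, and confidence is the reduction in squared error over the labeled neighbors of the candidate.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two points given as tuples."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def nearest(train, x, k, p):
    """The k labeled (point, target) pairs closest to x."""
    return sorted(train, key=lambda item: minkowski(item[0], x, p))[:k]


def knn_predict(train, x, k, p):
    """Average target of the k nearest labeled points."""
    nbrs = nearest(train, x, k, p)
    return sum(y for _, y in nbrs) / len(nbrs)


def confidence(train, x, k, p):
    """Reduction in squared error over x's labeled neighbors when the
    pseudo-labeled pair (x, y_hat) is added to the training set."""
    y_hat = knn_predict(train, x, k, p)
    augmented = train + [(x, y_hat)]
    gain = 0.0
    for xi, yi in nearest(train, x, k, p):
        before = (yi - knn_predict(train, xi, k, p)) ** 2
        after = (yi - knn_predict(augmented, xi, k, p)) ** 2
        gain += before - after
    return gain, y_hat


def coreg(labeled, unlabeled, k=3, rounds=10):
    """Co-training style regression; returns a prediction function."""
    sets = [list(labeled), list(labeled)]  # one labeled set per regressor
    orders = [2, 5]                        # assumed: different metrics for diversity
    pool = list(unlabeled)
    for _ in range(rounds):
        changed = False
        for j in (0, 1):
            best_gain, best = 0.0, None
            for x in pool:
                gain, y_hat = confidence(sets[j], x, k, orders[j])
                if gain > best_gain:
                    best_gain, best = gain, (x, y_hat)
            if best is not None:
                sets[1 - j].append(best)   # regressor j teaches the other one
                pool.remove(best[0])
                changed = True
        if not changed:
            break                          # no candidate reduces the error

    def predict(x):
        # Final estimate: average of the two refined regressors.
        return 0.5 * (knn_predict(sets[0], x, k, orders[0])
                      + knn_predict(sets[1], x, k, orders[1]))

    return predict
```

For example, `coreg([((0.0,), 0.0), ((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)], [(0.5,), (1.5,), (2.5,)])` returns a function that can be called on a new point such as `(1.5,)`.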
Active learning for anomaly and rare-category detection
- In Advances in Neural Information Processing Systems 18, 2004
Cited by 17 (0 self)
We introduce a novel active-learning scenario in which a user wants to work with a learning algorithm to identify useful anomalies. These are distinguished from the traditional statistical definition of anomalies as outliers or merely ill-modeled points. Our distinction is that the usefulness of anomalies is categorized subjectively by the user. We make two additional assumptions. First, there exist extremely few useful anomalies to be hunted down within a massive dataset. Second, both useful and useless anomalies may sometimes exist within tiny classes of similar anomalies. The challenge is thus to identify “rare category” records in an unlabeled noisy set with help (in the form of class labels) from a human expert who has a small budget of datapoints that they are prepared to categorize. We propose a technique to meet this challenge, which assumes a mixture model fit to the data, but otherwise makes no assumptions on the particular form of the mixture components. This property promises wide applicability in real-life scenarios and for various statistical models. We give an overview of several alternative methods, highlighting their strengths and weaknesses, and conclude with a detailed empirical analysis. We show that our method can quickly zoom in on an anomaly set containing a few tens of points in a dataset of hundreds of thousands.
Improved Rooftop Detection in Aerial Images with Machine Learning
- Machine Learning, 2002
Cited by 15 (2 self)
In this paper, we examine the use of machine learning to improve a rooftop detection process, one step in a vision system that recognizes buildings in overhead imagery. We review the problem of analyzing aerial images and describe an existing system that detects buildings in such images. We briefly detail four algorithms that we selected to improve rooftop detection. The data sets were highly skewed and the cost of mistakes differed between the classes, so we used ROC analysis to evaluate the methods under varying error costs. We report three experiments designed to illuminate facets of applying machine learning to the image analysis task. One investigated learning with all available images to determine the best performing method. Another focused on within-image learning, in which we derived training and testing data from the same image. A final experiment addressed between-image learning, in which training and testing sets came from different images. Results suggest that useful generalization occurred when training and testing on data derived from images differing in location and in aspect. They demonstrate that under most conditions, naive Bayes exceeded the accuracy of other methods and a handcrafted classifier, the solution currently used in the building detection system.
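As a generic illustration of the ROC analysis mentioned in this abstract, the sketch below sweeps a decision threshold over classifier scores and records the resulting false-positive and true-positive rates. It is not taken from the paper; the function names `roc_points` and `auc` and the toy data in the usage note are assumptions. It also assumes labels are 0 or 1 and that both classes are present.

```python
def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) points obtained
    by lowering a threshold over the scores; tied scores move together."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every example that shares the current score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.4]` and labels `[1, 1, 0, 1, 0, 0]`, eight of the nine positive-negative pairs are ranked correctly, so `auc(roc_points(scores, labels))` is 8/9. Because each point on the curve corresponds to one threshold, the same curve can be read under different error costs without retraining.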
Semi-Supervised Learning of Mixture Models and Bayesian Networks
- Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 15 (0 self)
This paper analyzes the performance of semisupervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. This behavior contradicts several empirical results reported in the literature. We present a mathematical analysis of this "degradation" phenomenon and show that it is due to the fact that bias may be adversely affected by unlabeled data.
Adjusting the Outputs of a Classifier to New a Priori Probabilities May Significantly Improve Classification Accuracy: Evidence from a Multi-Class Problem in Remote Sensing
- Neural Computation, 2001
Cited by 14 (2 self)
In the present study, we introduce a simple iterative procedure that makes it possible to correct the outputs of a classifier with respect to the new a priori probabilities of a new data set to be scored, even when these new a priori probabilities are unknown in advance. We also show that a significant increase in classification accuracy can be observed when using this procedure properly. More specifically, by applying the correcting procedure to the outputs of a simple logistic regression model, we observe a 5.8% increase in classification rate on a difficult real-world multi-class problem -- the automatic labeling of geographical maps based on remote sensing information.
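One common way to realize an iterative correction of this kind is an EM-style loop that alternates between reweighting the classifier's posteriors by the ratio of current to training priors and re-estimating the priors as the mean of the reweighted posteriors. The sketch below follows that scheme; it is an assumption about the general approach and may differ in detail from the procedure in the paper. The function name `adjust_to_new_priors` and its parameters are hypothetical, and all training priors are assumed to be strictly positive.

```python
def adjust_to_new_priors(posteriors, train_priors, n_iter=100, tol=1e-9):
    """posteriors: one row per example, each row being the classifier's class
    posteriors under the training priors.
    Returns (estimated new priors, adjusted posteriors)."""
    n_classes = len(train_priors)
    new_priors = list(train_priors)        # start from the training priors
    adjusted = [list(row) for row in posteriors]
    for _ in range(n_iter):
        # E-step: reweight each posterior by new prior / training prior,
        # then renormalize so every row sums to one.
        adjusted = []
        for row in posteriors:
            weighted = [row[c] * new_priors[c] / train_priors[c]
                        for c in range(n_classes)]
            total = sum(weighted)
            adjusted.append([w / total for w in weighted])
        # M-step: the new prior of a class is its mean adjusted posterior.
        updated = [sum(row[c] for row in adjusted) / len(adjusted)
                   for c in range(n_classes)]
        change = max(abs(u - p) for u, p in zip(updated, new_priors))
        new_priors = updated
        if change < tol:
            break
    return new_priors, adjusted
```

Calling it with posteriors such as `[[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]` and training priors `[0.5, 0.5]` shifts the estimated prior of the first class above 0.5, since most examples lean toward that class.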
Enhancing relevance feedback in image retrieval using unlabeled data
- ACM Transactions on Information Systems, 2006
Cited by 14 (6 self)
Relevance feedback is an effective scheme for bridging the gap between high-level semantics and low-level features in content-based image retrieval (CBIR). In contrast to previous methods, which rely on labeled images provided by the user, this paper attempts to enhance the performance of relevance feedback by exploiting unlabeled images existing in the database. Concretely, this paper integrates the merits of semi-supervised learning and active learning into the relevance feedback process. In detail, in each round of relevance feedback, two simple learners are trained from the labeled data, i.e. images from user query and user feedback. Each learner then labels some unlabeled images in the database for the other learner. After re-training with the additional labeled data, the learners classify the images in the database again and then their classifications are merged. Images judged to be positive with high confidence are returned as the retrieval result, while those judged with low confidence are put into the pool which is used in the next round of relevance feedback. Experiments show that using semi-supervised learning and active learning simultaneously in CBIR is beneficial, and the proposed method achieves better performance than some existing methods.

