Results 1 - 10 of 20
Active learning literature survey, 2010
Cited by 49 (1 self)
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns. An active learner may pose queries, in the form of unlabeled instances, to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion
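The query step described in this abstract can be illustrated with a small sketch. The abstract does not name a particular query strategy, so the example assumes uncertainty sampling for a binary classifier: the learner asks the oracle to label the pool instance whose predicted probability is closest to 0.5. The names `most_uncertain` and `toy_model`, and the toy logistic model itself, are hypothetical.

```python
import math

def most_uncertain(pool, predict_proba):
    """Return the index of the unlabeled instance whose predicted
    positive-class probability is closest to 0.5 (least confident)."""
    return min(range(len(pool)),
               key=lambda i: abs(predict_proba(pool[i]) - 0.5))

def toy_model(x):
    """Hypothetical 1-D logistic model with its decision boundary at x = 2."""
    return 1.0 / (1.0 + math.exp(-(x - 2.0)))

# Unlabeled pool; the instance nearest the decision boundary is queried.
pool = [0.0, 1.5, 2.1, 4.0]
query_index = most_uncertain(pool, toy_model)
print(pool[query_index])
```

In a full loop, the queried instance would be labeled by the oracle, added to the training set, and the model retrained before the next query.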
Query rewriting using active learning for sponsored search
In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007
Cited by 6 (0 self)
Sponsored search is a major revenue source for search companies. Web searchers can issue any query, while advertisement keywords are limited. Query rewriting techniques match user queries with relevant advertisement keywords, thus increasing the number of web advertisements available. Match relevance is critical for clicks. In this study, we aim to improve query rewriting relevance. For this purpose, we use an active learning algorithm called Transductive Experimental Design to select the most informative samples to train the query rewriting relevance model. Experiments show that this approach improves model accuracy and rewriting relevance.
Laplacian Optimal Design for Image Retrieval
Cited by 4 (2 self)
Relevance feedback is a powerful technique to enhance Content-Based Image Retrieval (CBIR) performance. It solicits the user’s relevance judgments on the images returned by the CBIR system. The user’s labeling is then used to learn a classifier to distinguish between relevant and irrelevant images. However, the top returned images may not be the most informative ones. The challenge is thus to determine which unlabeled images would be the most informative (i.e., improve the classifier the most) if they were labeled and used as training samples. In this paper, we propose a novel active learning algorithm, called Laplacian Optimal Design (LOD), for relevance feedback image retrieval. Our algorithm is based on a regression model which minimizes the least squares error on the measured (or labeled) images and simultaneously preserves the local geometrical structure of the image space. Specifically, we assume that if two images are sufficiently close to each other, then their measurements (or labels) are close as well. By constructing a nearest neighbor graph, the geometrical structure of the image space can be described by the graph Laplacian. We discuss how results from the field of optimal experimental design may be used to guide our selection of a subset of images that gives us the most information. Experimental results on the Corel database suggest that the proposed approach achieves higher precision in relevance feedback image retrieval.
Categories and Subject Descriptors: H.3.3 [Information storage and retrieval]: Information search and retrieval - Relevance feedback; G.3 [Mathematics
Advertising keyword generation using active learning
In WWW’09, 2009
Cited by 2 (0 self)
This paper proposes an efficient relevance feedback based interactive model for keyword generation in sponsored search advertising. We formulate the ranking of relevant terms as a supervised learning problem and suggest new terms for the seed by leveraging user relevance feedback information. Active learning is employed to select the most informative samples from a set of candidate terms for user labeling. Experiments show our approach improves the relevance of generated terms significantly with little user effort required.
Convex Experimental Design Using Manifold Structure for Image Retrieval
Cited by 1 (0 self)
Content-Based Image Retrieval (CBIR) has become one of the most active research areas in computer science. Relevance feedback is often used in CBIR systems to bridge the semantic gap. Typically, users are asked to make relevance judgements on some query results, and the feedback information is then used to re-rank the images in the database. An effective relevance feedback algorithm must provide the users with the most informative images with respect to the ranking function. In this paper, we propose a novel active learning algorithm, called Convex Laplacian Regularized I-optimal Design (CLapRID), for relevance feedback image retrieval. Our algorithm is based on a regression model which minimizes the least squares error on the labeled images and simultaneously preserves the intrinsic geometrical structure of the image space. It selects the most informative images, namely those that minimize the average predictive variance. The optimization problem of CLapRID can be cast as a semidefinite programming (SDP) problem and solved via interior-point methods. Experimental results on the COREL database demonstrate the effectiveness of the proposed algorithm for relevance feedback image retrieval.
Categories and Subject Descriptors: H.3.3 [Information storage and retrieval]: Information search and retrieval - Relevance feedback; G.3 [Mathematics of Computing]: Probability and Statistics - Experimental design
Active Subspace Learning
Cited by 1 (0 self)
Many previous studies have shown that naturally occurring data cannot fill up the high-dimensional space uniformly; rather, it must concentrate around lower-dimensional structure. Typical supervised subspace learning algorithms for discovering this low-dimensional structure include Linear Discriminant Analysis (LDA). For LDA, the training data points are usually given in advance. However, in some real-world applications such as relevance feedback image retrieval, there is an opportunity to interact with the user and actively select the training points for labeling. In this paper, we propose a novel active subspace learning algorithm which selects the most informative data points and uses them for learning an optimal subspace. Using techniques from experimental design, we discuss how to perform data selection in supervised or semi-supervised subspace learning by minimizing the expected error. Experiments on image retrieval show improvement over state-of-the-art methods.
Laplacian Regularized D-Optimal Design for Active Learning and Its Application to Image Retrieval
Cited by 1 (0 self)
In a growing number of problems in computer vision and pattern recognition, the data set is very large. Labels are usually expensive, and the challenge is thus to determine which unlabeled samples would be the most informative (i.e., improve the classifier the most) if they were labeled and used as training samples. In particular, we consider the problem of active learning of a regression model in the context of experimental design. Classical optimal experimental design approaches are based on the least squares error over the measured samples only. They fail to take into account the unmeasured samples. In this paper, we propose a novel active learning algorithm which operates over graphs. Our algorithm is based on a graph Laplacian regularized regression model which simultaneously minimizes the least squares error on the measured samples and preserves the local geometrical structure of the data space. By constructing a nearest neighbor graph, the geometrical structure of the data space can be described by the graph Laplacian. We discuss how results from the field of optimal experimental design may be used to guide our selection of a subset of data points that gives us the most information. Experiments demonstrate its superior performance in comparison with conventional algorithms.
Index Terms: Active learning, experimental design, image retrieval, regularization.
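The nearest neighbor graph and graph Laplacian mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give edge weights or a normalization, so the sketch uses a symmetrized k-nearest-neighbor graph with binary weights and the unnormalized Laplacian L = D - W. The function names `knn_graph_laplacian` and `smoothness` are hypothetical, and the design-selection step itself is not shown.

```python
import math

def knn_graph_laplacian(points, k):
    """Build the unnormalized Laplacian L = D - W of a symmetrized
    k-nearest-neighbor graph with binary edge weights (an assumption)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k points closest to point i, excluding i itself.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in neighbors:
            W[i][j] = W[j][i] = 1.0  # connect i and j in both directions
    # Diagonal holds the node degree; off-diagonal holds -W[i][j].
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def smoothness(L, f):
    """Regularizer f^T L f: small when neighboring points get similar values."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))

# Two well-separated clusters of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
L = knn_graph_laplacian(points, k=2)
print(smoothness(L, [1, 1, 1, 0, 0, 0]))  # labels agree with the clusters
print(smoothness(L, [1, 0, 1, 0, 1, 0]))  # labels cut across the clusters
```

The penalty f^T L f equals half the sum of W[i][j] * (f[i] - f[j])^2 over all ordered pairs, which is why a labeling that is constant within each cluster incurs no penalty.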
Active Learning for Personalizing Treatment
Cited by 1 (0 self)
The personalization of treatment via genetic biomarkers and other risk categories has drawn increasing interest among clinical researchers and scientists. A major challenge here is to construct individualized treatment rules (ITRs), which recommend the best treatment for each category of individuals. In general, ITRs can be constructed using data from clinical trials; however, these are generally very costly to run. To reduce the cost of learning an ITR, we explore active learning techniques designed to carefully decide whom to recruit, and which treatment to assign, throughout the online conduct of the clinical trial. As an initial investigation, we focus on simple ITRs that use a small number of subpopulation categories to personalize treatment. To minimize the maximal uncertainty regarding the treatment effects for each subpopulation, we propose the use of a minimax bandit model and provide an active learning policy for solving it. We evaluate our active learning policy using simulated data and data modeled after a clinical trial involving treatments for depressed individuals. We contrast this policy with other plausible active learning policies. The techniques presented in the paper may be generalized to tackle problems of efficient exploration in other domains.
Efficient Manifold Ranking for Image Retrieval, 2011
Cited by 1 (0 self)
Manifold Ranking (MR), a graph-based ranking algorithm, has been widely applied in information retrieval and shown to have excellent performance and feasibility on a variety of data types. In particular, it has been successfully applied to content-based image retrieval because of its outstanding ability to discover the underlying geometrical structure of the given image database. However, manifold ranking is computationally very expensive, both in the graph construction and ranking computation stages, which significantly limits its applicability to very large data sets. In this paper, we extend the original manifold ranking algorithm and propose a new framework named Efficient Manifold Ranking (EMR). We aim to address the shortcomings of MR from two perspectives: scalable graph construction and efficient computation. Specifically, we build an anchor graph on the data set instead of the traditional k-nearest neighbor graph, and design a new form of adjacency matrix that speeds up the ranking computation. The experimental results on a real-world image database demonstrate the effectiveness and efficiency of our proposed method. With performance comparable to the original manifold ranking, our method significantly reduces the computational time, making it a promising method for large-scale real-world retrieval problems.
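For context, the original manifold ranking procedure that this abstract extends can be sketched in a few lines. This is the standard iterative formulation, f ← αSf + (1 − α)y with S = D^(-1/2) W D^(-1/2), and not the anchor-graph EMR method proposed in the paper. The function name `manifold_ranking`, the toy chain graph, and the value α = 0.5 are assumptions chosen for illustration.

```python
import math

def manifold_ranking(W, y, alpha=0.5, iterations=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y, where S is the
    symmetrically normalized adjacency matrix D^(-1/2) W D^(-1/2)."""
    n = len(W)
    degree = [sum(row) for row in W]
    # Normalize only existing edges, so isolated nodes never divide by zero.
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iterations):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# Toy chain graph 0 - 1 - 2 - 3; node 0 is the query.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1, 0, 0, 0]
scores = manifold_ranking(W, y)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

Each iteration touches every edge of the graph, and building a k-nearest-neighbor graph requires pairwise distance computations, which is the cost the abstract identifies as limiting on very large data sets.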
Feature Selection for Gene Expression using Model-based Entropy
Gene expression data usually contain a large number of genes but a small number of samples. Feature selection for gene expression data aims at finding a set of genes that best discriminates biological samples of different types. Traditional machine-learning-based gene selection using empirical mutual information suffers from data sparseness because of the small number of samples. To overcome the sparseness issue, we propose a model-based approach that estimates the entropy of class variables on the model instead of on the data themselves. Here, we use multivariate normal distributions to fit the data, because multivariate normal distributions have maximum entropy among all real-valued distributions with specified mean and covariance, and are widely used to approximate various distributions. Given that the data follow a multivariate normal distribution, the conditional distribution of class variables given the selected features is also a normal distribution, so its entropy can be computed from the log-determinant of its covariance matrix. Because of the large number of genes, computing all possible log-determinants is not efficient. We propose several algorithms that greatly reduce the computational cost. Experiments on seven gene datasets and a comparison with five other approaches show the accuracy of the multivariate Gaussian generative model for feature selection, and the efficiency of our algorithms.
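The link between entropy and the log-determinant used in this abstract is the standard formula for a k-dimensional normal distribution with covariance Σ: H = ½ (k log(2πe) + log det Σ). A minimal sketch follows, assuming a symmetric positive-definite covariance matrix and computing the log-determinant through a Cholesky factorization. The function names are hypothetical, and the paper's own algorithms for reducing the number of log-determinant computations are not reproduced here.

```python
import math

def cholesky_logdet(cov):
    """Log-determinant of a symmetric positive-definite matrix, computed as
    twice the sum of the logs of the Cholesky factor's diagonal entries."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a multivariate normal distribution:
    0.5 * (k * log(2 * pi * e) + log det(cov))."""
    k = len(cov)
    return 0.5 * (k * math.log(2 * math.pi * math.e) + cholesky_logdet(cov))

# Correlated features have lower joint entropy than independent ones
# with the same variances.
independent = [[2.0, 0.0], [0.0, 2.0]]
correlated = [[2.0, 1.0], [1.0, 2.0]]
print(gaussian_entropy(independent) > gaussian_entropy(correlated))
```

Working with the Cholesky factor rather than the determinant itself avoids overflow or underflow when the covariance matrix is large.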

