Results 1–10 of 305
Active learning of inverse models with intrinsically motivated goal exploration in robots
Robotics and Autonomous Systems, 2013
Active learning by labeling features
In Proc. of EMNLP, 2009
Cited by 43 (11 self)
Abstract
Methods that learn from prior information about input features, such as generalized expectation (GE), have been used to train accurate models with very little effort. In this paper, we propose an active learning approach in which the machine solicits “labels” on features rather than instances. In both simulated and real user experiments on two sequence labeling tasks we show that our active learning method outperforms passive learning with features as well as traditional active learning with instances. Preliminary experiments suggest that novel interfaces which intelligently solicit labels on multiple features facilitate more efficient annotation.
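The selection idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual GE-based criterion: it simply scores candidate features by the entropy of the model's current belief about which class each feature indicates, and queries the human about the most uncertain ones. All names here are hypothetical.

```python
import math

def feature_query_scores(feature_label_probs):
    """Score candidate input features by the entropy of the model's current
    belief over which class each feature indicates; higher entropy means the
    model is less sure, so a human 'feature label' is more informative."""
    scores = {}
    for feat, probs in feature_label_probs.items():
        scores[feat] = -sum(p * math.log(p) for p in probs if p > 0)
    return scores

def select_feature_queries(feature_label_probs, k=2):
    """Return the k features whose class association is most uncertain."""
    scores = feature_query_scores(feature_label_probs)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Example: the model is unsure whether "word:bank" signals class 0 or 1,
# so that feature is solicited first.
beliefs = {"word:bank": [0.5, 0.5], "word:goal": [0.9, 0.1]}
```

The actual method ranks features by expected model change under GE constraints; entropy is used here only because it is the simplest stand-in for "the model is unsure about this feature."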
Active Learning for Reward Estimation in Inverse Reinforcement Learning
2009
Cited by 42 (14 self)
Abstract
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” states. The purpose of our algorithm is to estimate the reward function with similar accuracy as other methods from the literature while reducing the amount of policy samples required from the expert. We also discuss the use of our algorithm in higher dimensional problems, using both Monte Carlo and gradient methods. We present illustrative results of our algorithm in several simulated examples of different complexities.
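The querying principle described above can be sketched as follows. This is a simplified illustration, assuming we can sample reward hypotheses from the current posterior and compute each hypothesis's greedy action per state; the state where those hypotheses disagree most about the optimal action is the most informative place to query the demonstrator.

```python
from collections import Counter

def disagreement(actions_at_state):
    """Fraction of sampled reward hypotheses that disagree with the majority
    action at this state (0.0 = all hypotheses agree)."""
    counts = Counter(actions_at_state)
    return 1.0 - counts.most_common(1)[0][1] / len(actions_at_state)

def choose_query_state(policy_samples):
    """policy_samples: {state: [action chosen under each sampled reward]}.
    Query the demonstrator at the state where sampled rewards induce the
    most disagreement about the optimal action."""
    return max(policy_samples, key=lambda s: disagreement(policy_samples[s]))
```

The paper's Monte Carlo variant works in roughly this spirit; the gradient-based variant it also discusses is not captured by this sketch.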
Active Learning for Networked Data
Cited by 41 (3 self)
Abstract
We introduce a novel active learning algorithm for classification of network data. In this setting, training instances are connected by a set of links to form a network, the labels of linked nodes are correlated, and the goal is to exploit these dependencies and accurately label the nodes. This problem arises in many domains, including social and biological network analysis and document classification, and there has been much recent interest in methods that collectively classify the nodes in the network. While in many cases labeled examples are expensive, often network information is available. We show how an active learning algorithm can take advantage of network structure. Our algorithm effectively exploits the links between instances and the interaction between the local and collective aspects of a classifier to improve the accuracy of learning from fewer labeled examples. We experiment with two real-world benchmark collective classification domains, and show that we are able to achieve extremely accurate results even when only a small fraction of the data is labeled.
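One simple way to "take advantage of network structure" when picking a node to label is to mix the local classifier's uncertainty with how connected the node is, since a label on a hub propagates to many neighbors under collective classification. This is a hypothetical sketch of that idea, not the paper's actual criterion:

```python
def select_node(uncertainty, adjacency, alpha=0.5):
    """Pick the node to label by mixing local classifier uncertainty
    (uncertainty[n] in [0, 1]) with degree, normalized by the largest
    degree in the network. alpha trades off the two signals."""
    max_deg = max(len(nbrs) for nbrs in adjacency.values()) or 1
    def score(n):
        return alpha * uncertainty[n] + (1 - alpha) * len(adjacency[n]) / max_deg
    return max(uncertainty, key=score)
```

With alpha = 1 this degenerates to plain uncertainty sampling; the network term is what distinguishes the networked setting.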
Active co-analysis of a set of shapes
ACM Trans. on Graphics (SIGGRAPH Asia), 2012
Cited by 36 (11 self)
Abstract
Figure 1: Overview of our active co-analysis: (a) We start with an initial unsupervised co-segmentation of the input set. (b) During active learning, the system automatically suggests constraints which would refine results and the user interactively adds constraints as appropriate. In this example, the user adds a cannot-link constraint (in red) and a must-link constraint (in blue) between segments. (c) The constraints are propagated to the set and the co-segmentation is refined. The process from (b) to (c) is repeated until the desired result is obtained.

Unsupervised co-analysis of a set of shapes is a difficult problem since the geometry of the shapes alone cannot always fully describe the semantics of the shape parts. In this paper, we propose a semi-supervised learning method where the user actively assists in the co-analysis by iteratively providing inputs that progressively constrain the system. We introduce a novel constrained clustering method based on a spring system which embeds elements to better respect their inter-distances in feature space together with the user-given set of constraints. We also present an active learning method that suggests to the user where his input is likely to be the most effective in refining the results. We show that each single pair of constraints affects many relations across the set. Thus, the method requires only a sparse set of constraints to quickly converge toward a consistent and error-free semantic labeling of the set.
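The claim that "each single pair of constraints affects many relations across the set" is easiest to see with must-link constraints, which are transitive: a chain of must-links forces a whole group of segments into the same cluster. A minimal union-find sketch of that propagation (not the paper's spring-system embedding, which also handles cannot-links and feature distances):

```python
def propagate_must_links(n, must_links):
    """Union-find over must-link constraints on n elements: every pair tied
    by a chain of must-links ends up in the same group, so one user-given
    pair can relabel many elements at once."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in must_links:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]
```

Here two constraints, (0, 1) and (1, 2), already tie three segments together; cannot-links, in contrast, only forbid a merge and need the clustering objective to be enforced.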
Co-Training for Domain Adaptation
Cited by 33 (4 self)
Abstract
Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for Domain Adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
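The core growth step, "slowly adding the target instances in which the current algorithm is the most confident", can be sketched independently of CODA's joint optimization. This is a generic self-training step under assumed interfaces (a `model_probs` callable returning class probabilities), not the paper's formulation:

```python
def add_confident(model_probs, pool, threshold=0.9, batch=2):
    """One confidence-based growth step: move the pool instances the current
    model is most confident about into the labeled set, using the model's
    own predictions as labels. Returns (chosen (x, label) pairs, remaining pool)."""
    scored = []
    for x in pool:
        probs = model_probs(x)
        label = max(range(len(probs)), key=probs.__getitem__)
        scored.append((probs[label], x, label))
    scored.sort(reverse=True)  # most confident first
    chosen = [(x, y) for conf, x, y in scored[:batch] if conf >= threshold]
    taken = {x for x, _ in chosen}
    remaining = [x for x in pool if x not in taken]
    return chosen, remaining
```

CODA additionally selects target *features* and learns the view split inside a single objective each round; this sketch shows only the instance side of the loop.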
Pairwise Ranking Aggregation in a Crowdsourced Setting
Cited by 27 (1 self)
Abstract
Inferring rankings over elements of a set of objects, such as documents or images, is a key learning problem for such important applications as Web search and recommender systems. Crowdsourcing services provide an inexpensive and efficient means to acquire preferences over objects via labeling by sets of annotators. We propose a new model to predict a gold-standard ranking that hinges on combining pairwise comparisons via crowdsourcing. In contrast to traditional ranking aggregation methods, the approach learns about and folds into consideration the quality of contributions of each annotator. In addition, we minimize the cost of assessment by introducing a generalization of the traditional active learning scenario to jointly select the annotator and pair to assess while taking into account the annotator quality, the uncertainty over ordering of the pair, and the current model uncertainty. We formalize this as an active learning strategy that incorporates an exploration-exploitation trade-off and implement it using an efficient online Bayesian updating scheme. Using simulated and real-world data, we demonstrate that the active learning strategy achieves significant reductions in labeling cost while maintaining accuracy.
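The "uncertainty over ordering of the pair" term can be illustrated with a Bradley-Terry model, where P(i beats j) = 1 / (1 + exp(s_j - s_i)) for latent scores s. A sketch of only that term (the paper additionally weighs annotator quality and overall model uncertainty when choosing both the pair and the annotator):

```python
import math

def p_win(scores, i, j):
    """Bradley-Terry probability that item i beats item j."""
    return 1.0 / (1.0 + math.exp(scores[j] - scores[i]))

def pick_pair(scores, pairs):
    """Select the candidate pair whose predicted ordering is most uncertain,
    i.e. whose win probability is closest to 0.5."""
    return min(pairs, key=lambda ij: abs(p_win(scores, *ij) - 0.5))
```

With scores a=2.0, b=1.9, c=0.0, the (a, b) comparison is nearly a coin flip while (a, c) is all but decided, so (a, b) is the pair worth spending a crowd label on.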
Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
Cited by 26 (2 self)
Abstract
We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashing-based solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sublinear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ trade-offs, and show that they make it practical to perform active selection with millions of unlabeled points.
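A rough sketch of the two-bit key idea, under the assumption that database points are hashed with two random directions u, v as [sign(u·x), sign(v·x)], while the hyperplane query with normal w flips the sign on one direction so that points nearly perpendicular to w (i.e., near the hyperplane) tend to collide with the query. This is an illustrative reconstruction, not a verified transcription of the paper's hash family:

```python
def two_bit_hash(u, v, x):
    """Two-bit key [sign(u.x), sign(v.x)] from two random directions u, v."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    return (dot(u, x) >= 0, dot(v, x) >= 0)

def hyperplane_query_key(u, v, w):
    """Key for a hyperplane query with normal w: the second bit is computed
    on -w, biasing collisions toward points nearly perpendicular to w."""
    neg_w = [-wi for wi in w]
    return (two_bit_hash(u, v, w)[0], two_bit_hash(u, v, neg_w)[1])
```

As with any single LSH table, collisions are noisy (parallel points can also collide); in practice many independent (u, v) tables are combined and colliding candidates are re-ranked by their true distance to the hyperplane.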
A unifying framework for computational reinforcement learning theory
2009
Cited by 23 (7 self)
Abstract
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms, such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploring the problem, which may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts, in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good trade-off between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies …
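The exploration-exploitation trade-off described above is most compactly illustrated by the classic epsilon-greedy rule for bandits, which is only the simplest such scheme (the framework surveyed in the paper analyzes far more refined, sample-efficient exploration):

```python
import random

def epsilon_greedy(values, eps, rng=random.random):
    """With probability eps, try a uniformly random arm (explore); otherwise
    take the arm with the highest estimated value (exploit)."""
    if rng() < eps:
        return random.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)
```

Setting eps = 0 recovers pure exploitation, which can lock the agent onto a suboptimal arm forever; any eps > 0 guarantees every arm keeps being sampled, at a short-term cost in utility.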
Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing
Cited by 18 (4 self)
Abstract
We consider the budget allocation problem in binary/multiclass crowd labeling where each label from the crowd has a certain cost. Since different instances have different ambiguities and different workers have different reliabilities, a fundamental challenge here is how to allocate a pre-fixed amount of budget among instance-worker pairs so that the overall accuracy can be maximized. We start with a simple setting where all workers are assumed to be identical and formulate the problem as a Bayesian Markov Decision Process (MDP). Using the dynamic programming (DP) algorithm, one can obtain the optimal allocation policy for any given budget. However, DP is computationally intractable. To address the computational challenge, we propose a new approximate policy, called optimistic knowledge gradient. The consistency of the proposed policy is established. Then we extend the MDP framework to incorporate estimating the reliabilities of workers into the allocation process when workers are no longer identical. We conduct simulated and real experiments to demonstrate the superiority of our policy in different crowd labeling tasks.
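A sketch of the "optimistic" selection step for the binary, identical-workers case, under a simplifying assumption: each instance carries a Beta(a, b) belief over its positive-label probability, the value of an instance is taken to be the plug-in accuracy max(a, b)/(a + b), and the optimistic policy scores an instance by the better of its two possible one-label posterior updates rather than the Bayesian average. The value function here is a stand-in, not the paper's exact one:

```python
def optimistic_kg_choice(beliefs):
    """beliefs: list of (a, b) Beta parameters, one per instance. Spend the
    next crowd label on the instance whose best-case one-step improvement
    in plug-in accuracy max(a, b)/(a + b) is largest (optimistic KG sketch)."""
    def acc(a, b):
        return max(a, b) / (a + b)
    def score(ab):
        a, b = ab
        return max(acc(a + 1, b), acc(a, b + 1)) - acc(a, b)
    return max(range(len(beliefs)), key=lambda i: score(beliefs[i]))
```

The optimism matters: a plain expected (rather than best-case) one-step gain can be zero for many states, making the myopic policy stall, whereas the optimistic score keeps directing budget toward ambiguous instances with few labels.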