Results 1  10
of
550
Active Learning with Statistical Models
, 1995
"... For manytypes of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992# Cohn, 1994]. We then showhow the same principles may be used to select data for two alternative, statist ..."
Abstract

Cited by 679 (12 self)
 Add to MetaCart
(Show Context)
For manytypes of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992# Cohn, 1994]. We then showhow the same principles may be used to select data for two alternative, statisticallybased learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
InformationBased Objective Functions for Active Data Selection
 Neural Computation
"... Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed which measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain infor ..."
Abstract

Cited by 430 (5 self)
 Add to MetaCart
(Show Context)
Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed which measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain information about lead to three different criteria for data selection. All these criteria depend on the assumption that the hypothesis space is correct, which may prove to be their main weakness. 1 Introduction Theories for data modelling often assume that the data is provided by a source that we do not control. However, there are two scenarios in which we are able to actively select training data. In the first, data measurements are relatively expensive or slow, and we want to know where to look next so as to learn as much as possible. According to Jaynes (1986), Bayesian reasoning was first applied to this problem two centuries ago by Laplace, who in consequence made more important discoveries...
Selective sampling using the Query by Committee algorithm
 Machine Learning
, 1997
"... We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the twomember committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the numbe ..."
Abstract

Cited by 429 (7 self)
 Add to MetaCart
We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the twomember committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.
Query by Committee
, 1992
"... We propose an algorithm called query by committee, in which a committee of students is trained on the same data set. The next query is chosen according to the principle of maximal disagreement. The algorithm is studied for two toy models: the highlow game and perceptron learning of another perceptr ..."
Abstract

Cited by 428 (3 self)
 Add to MetaCart
We propose an algorithm called query by committee, in which a committee of students is trained on the same data set. The next query is chosen according to the principle of maximal disagreement. The algorithm is studied for two toy models: the highlow game and perceptron learning of another perceptron. As the number of queries goes to infinity, the committee algorithm yields asymptotically finite information gain. This leads to generalization error that decreases exponentially with the number of examples. This in marked contrast to learning from randomly chosen inputs, for which the information gain approaches zero and the generalization error decreases with a relatively slow inverse power law. We suggest that asymptotically finite information gain may be an important characteristic of good query algorithms.
Bayesian Experimental Design: A Review
 Statistical Science
, 1995
"... This paper reviews the literature on Bayesian experimental design, both for linear and nonlinear models. A unified view of the topic is presented by putting experimental design in a decision theoretic framework. This framework justifies many optimality criteria, and opens new possibilities. Various ..."
Abstract

Cited by 310 (1 self)
 Add to MetaCart
This paper reviews the literature on Bayesian experimental design, both for linear and nonlinear models. A unified view of the topic is presented by putting experimental design in a decision theoretic framework. This framework justifies many optimality criteria, and opens new possibilities. Various design criteria become part of a single, coherent approach.
Intrinsic motivation systems for autonomous mental development
 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
, 2007
"... Exploratory activities seem to be intrinsically rewarding for children and crucial for their cognitive development. Can a machine be endowed with such an intrinsic motivation system? This is the question we study in this paper, presenting a number of computational systems that try to capture this dr ..."
Abstract

Cited by 253 (56 self)
 Add to MetaCart
Exploratory activities seem to be intrinsically rewarding for children and crucial for their cognitive development. Can a machine be endowed with such an intrinsic motivation system? This is the question we study in this paper, presenting a number of computational systems that try to capture this drive towards novel or curious situations. After discussing related research coming from developmental psychology, neuroscience, developmental robotics, and active learning, this paper presents the mechanism of Intelligent Adaptive Curiosity, an intrinsic motivation system which pushes a robot towards situations in which it maximizes its learning progress. This drive makes the robot focus on situations which are neither too predictable nor too unpredictable, thus permitting autonomous mental development. The complexity of the robot’s activities autonomously increases and complex developmental sequences selforganize without
Neural network exploration using optimal experiment design
 Neural Networks
, 1994
"... We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network lear ..."
Abstract

Cited by 163 (2 self)
 Add to MetaCart
(Show Context)
We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely.We conclude that, while not a panacea, OEDbased query/action has muchto offer, especially in domains where its high computational costs can be tolerated.
Fast Kernel Classifiers With Online And Active Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
"... Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This ..."
Abstract

Cited by 153 (18 self)
 Add to MetaCart
Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding stateoftheart SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
Some New Three Level Designs for the Study of Quantitative Variables
, 1960
"... This article describes some methods which enable us to construct small designs for quantitative factors, while maintaining as much orthogonality of the design as possible. To calculate the D ..."
Abstract

Cited by 143 (1 self)
 Add to MetaCart
This article describes some methods which enable us to construct small designs for quantitative factors, while maintaining as much orthogonality of the design as possible. To calculate the D
Covariate shift adaptation by importance weighted cross validation
, 2000
"... A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapo ..."
Abstract

Cited by 121 (53 self)
 Add to MetaCart
(Show Context)
A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training input points and test input points follow different distributions while the conditional distribution of output values given input points is unchanged is called the covariate shift. Under the covariate shift, standard model selection techniques such as cross validation do not work as desired since its unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV), for which we prove its unbiasedness even under the covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of our proposed method is illustrated by simulations, and furthermore demonstrated in the braincomputer interface, where strong nonstationarity effects can be seen between training and test sessions. c2000 Masashi Sugiyama, Matthias Krauledat, and KlausRobert Müller.