Results 1  10
of
425
Bayesian Interpolation
 Neural Computation
, 1991
"... Although Bayesian analysis has been in use since Laplace, the Bayesian method of modelcomparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and modelcomparison is demonstrated by studying the inference problem of interpolating noisy data. T ..."
Abstract

Cited by 721 (17 self)
 Add to MetaCart
(Show Context)
Although Bayesian analysis has been in use since Laplace, the Bayesian method of modelcomparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and modelcomparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other problems. Regularising constants are set by examining their posterior probability distribution. Alternative regularisers (priors) and alternative basis sets are objectively compared by evaluating the evidence for them. `Occam's razor' is automatically embodied by this framework. The way in which Bayes infers the values of regularising constants and noise levels has an elegant interpretation in terms of the effective number of parameters determined by the data set. This framework is due to Gull and Skilling. 1 Data modelling and Occam's razor In science, a central task is to develop and compare models to a...
Active Learning with Statistical Models
, 1995
"... For manytypes of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992# Cohn, 1994]. We then showhow the same principles may be used to select data for two alternative, statist ..."
Abstract

Cited by 677 (12 self)
 Add to MetaCart
(Show Context)
For manytypes of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992# Cohn, 1994]. We then showhow the same principles may be used to select data for two alternative, statisticallybased learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
Improving generalization with active learning
 Machine Learning
, 1994
"... Abstract. Active learning differs from "learning from examples " in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples ..."
Abstract

Cited by 539 (1 self)
 Add to MetaCart
Abstract. Active learning differs from &quot;learning from examples &quot; in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples. In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers &quot;useful. &quot; We test our implementation, called an SGnetwork, on three domains and observe significant improvement in generalization.
Bayesian Compressive Sensing
, 2007
"... The data of interest are assumed to be represented as Ndimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basisfunction coefficients associated with B. Compressive sensing ..."
Abstract

Cited by 327 (24 self)
 Add to MetaCart
The data of interest are assumed to be represented as Ndimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basisfunction coefficients associated with B. Compressive sensing is a framework whereby one does not measure one of the aforementioned Ndimensional signals directly, but rather a set of related measurements, with the new measurements a linear combination of the original underlying Ndimensional signal. The number of required compressivesensing measurements is typically much smaller than N, offering the potential to simplify the sensing system. Let f denote the unknown underlying Ndimensional signal, and g a vector of compressivesensing measurements, then one may approximate f accurately by utilizing knowledge of the (underdetermined) linear relationship between f and g, in addition to knowledge of the fact that f is compressible in B. In this paper we employ a Bayesian formalism for estimating the underlying signal f based on compressivesensing measurements g. The proposed framework has the following properties: (i) in addition to estimating the underlying signal f, “error bars ” are also estimated, these giving a measure of confidence in the inverted signal; (ii) using knowledge of the error bars, a principled means is provided for determining when a sufficient
Active learning literature survey
, 2010
"... The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which is learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., ..."
Abstract

Cited by 311 (1 self)
 Add to MetaCart
(Show Context)
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which is learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is wellmotivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, timeconsuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion
Heterogeneous uncertainty sampling for supervised learning
 In Proceedings of the 11th International Conference on Machine Learning (ICML
, 1994
"... Uncertainty sampling methods iteratively request class labels for training instances whose classes are uncertain despite the previous labeled instances. These methods can greatly reduce the number of instances that an expert need label. One problem with this approach is that the classifier best suit ..."
Abstract

Cited by 308 (3 self)
 Add to MetaCart
(Show Context)
Uncertainty sampling methods iteratively request class labels for training instances whose classes are uncertain despite the previous labeled instances. These methods can greatly reduce the number of instances that an expert need label. One problem with this approach is that the classifier best suited for an application may be too expensive to train or use during the selection of instances. We test the use of one classifier (a highly efficient probabilistic one) to select examples for training another (the C4.5 rule induction program). Despite being chosen by this heterogeneous approach, the uncertainty samples yielded classifiers with lower error rates than random samples ten times larger. 1
Intrinsic motivation systems for autonomous mental development
 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
, 2007
"... Exploratory activities seem to be intrinsically rewarding for children and crucial for their cognitive development. Can a machine be endowed with such an intrinsic motivation system? This is the question we study in this paper, presenting a number of computational systems that try to capture this dr ..."
Abstract

Cited by 253 (56 self)
 Add to MetaCart
(Show Context)
Exploratory activities seem to be intrinsically rewarding for children and crucial for their cognitive development. Can a machine be endowed with such an intrinsic motivation system? This is the question we study in this paper, presenting a number of computational systems that try to capture this drive towards novel or curious situations. After discussing related research coming from developmental psychology, neuroscience, developmental robotics, and active learning, this paper presents the mechanism of Intelligent Adaptive Curiosity, an intrinsic motivation system which pushes a robot towards situations in which it maximizes its learning progress. This drive makes the robot focus on situations which are neither too predictable nor too unpredictable, thus permitting autonomous mental development. The complexity of the robot’s activities autonomously increases and complex developmental sequences selforganize without
The Evidence Framework Applied to Classification Networks
, 1992
"... Three Bayesian ideas are presented for supervised adaptive classifiers. First, it is argued that the output of a classifier should be obtained by marginalizing over the posterior distribution of the parameters; a simple approximation to this integral is proposed and demonstrated. This involves a &qu ..."
Abstract

Cited by 189 (13 self)
 Add to MetaCart
(Show Context)
Three Bayesian ideas are presented for supervised adaptive classifiers. First, it is argued that the output of a classifier should be obtained by marginalizing over the posterior distribution of the parameters; a simple approximation to this integral is proposed and demonstrated. This involves a "moderation" of the most probable classifier's outputs, and yields improved performance. Second, it is demonstrated that the Bayesian framework for model comparison described for regression models in MacKay (1992a,b) can also be applied to classification problems. This framework successfully chooses the magnitude of weight decay terms, and ranks solutions found using different numbers of hidden units. Third, an informationbased data selection criterion is derived and demonstrated within this framework.
A Comprehensive Survey of Fitness Approximation in Evolutionary Computation
, 2003
"... Evolutionary algorithms (EAs) have received increasing interests both in the academy and industry. One main difficulty in applying EAs to realworld applications is that EAs usually need a large number of fitness evaluations before a satisfying result can be obtained. However, fitness evaluations ar ..."
Abstract

Cited by 166 (10 self)
 Add to MetaCart
Evolutionary algorithms (EAs) have received increasing interests both in the academy and industry. One main difficulty in applying EAs to realworld applications is that EAs usually need a large number of fitness evaluations before a satisfying result can be obtained. However, fitness evaluations are not always straightforward in many realworld applications. Either an explicit fitness function does not exist, or the evaluation of the fitness is computationally very expensive. In both cases, it is necessary to estimate the fitness function by constructing an approximate model. In this paper, a comprehensive survey of the research on fitness approximation in evolutionary computation is presented. Main issues like approximation levels, approximate model management schemes, model construction techniques are reviewed. To conclude, open questions and interesting issues in the field are discussed.
Neural network exploration using optimal experiment design
 Neural Networks
, 1994
"... We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network lear ..."
Abstract

Cited by 163 (2 self)
 Add to MetaCart
(Show Context)
We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely.We conclude that, while not a panacea, OEDbased query/action has muchto offer, especially in domains where its high computational costs can be tolerated.