Results 1  10
of
32
InformationBased Objective Functions for Active Data Selection
 Neural Computation
"... Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed which measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain infor ..."
Abstract

Cited by 327 (5 self)
 Add to MetaCart
Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed which measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain information about lead to three different criteria for data selection. All these criteria depend on the assumption that the hypothesis space is correct, which may prove to be their main weakness. 1 Introduction Theories for data modelling often assume that the data is provided by a source that we do not control. However, there are two scenarios in which we are able to actively select training data. In the first, data measurements are relatively expensive or slow, and we want to know where to look next so as to learn as much as possible. According to Jaynes (1986), Bayesian reasoning was first applied to this problem two centuries ago by Laplace, who in consequence made more important discoveries...
Heterogeneous uncertainty sampling for supervised learning
 In Proceedings of the 11th International Conference on Machine Learning (ICML
, 1994
"... Uncertainty sampling methods iteratively request class labels for training instances whose classes are uncertain despite the previous labeled instances. These methods can greatly reduce the number of instances that an expert need label. One problem with this approach is that the classifier best suit ..."
Abstract

Cited by 235 (3 self)
 Add to MetaCart
Uncertainty sampling methods iteratively request class labels for training instances whose classes are uncertain despite the previous labeled instances. These methods can greatly reduce the number of instances that an expert need label. One problem with this approach is that the classifier best suited for an application may be too expensive to train or use during the selection of instances. We test the use of one classifier (a highly efficient probabilistic one) to select examples for training another (the C4.5 rule induction program). Despite being chosen by this heterogeneous approach, the uncertainty samples yielded classifiers with lower error rates than random samples ten times larger. 1
The Evidence Framework applied to Classification Networks
 Neural Computation
, 1992
"... Three Bayesian ideas are presented for supervised adaptive classifiers. First, it is argued that the output of a classifier should be obtained by marginalising over the posterior distribution of the parameters; a simple approximation to this integral is proposed and demonstrated. This involves a `mo ..."
Abstract

Cited by 154 (10 self)
 Add to MetaCart
Three Bayesian ideas are presented for supervised adaptive classifiers. First, it is argued that the output of a classifier should be obtained by marginalising over the posterior distribution of the parameters; a simple approximation to this integral is proposed and demonstrated. This involves a `moderation' of the most probable classifier 's outputs, and yields improved performance. Second, it is demonstrated that the Bayesian framework for model comparison described for regression models in (MacKay, 1992a, 1992b) can also be applied to classification problems. This framework successfully chooses the magnitude of weight decay terms, and ranks solutions found using different numbers of hidden units. Third, an informationbased data selection criterion is derived and demonstrated within this framework. 1 Introduction A quantitative Bayesian framework has been described for learning of mappings in feedforward networks (MacKay, 1992a, 1992b). It was demonstrated that this `evidence' fram...
Reinforcement Driven Information Acquisition In NonDeterministic Environments
 ICANN'95
, 1995
"... For an agent living in a nondeterministic Markov environment (NME), what is, in theory, the fastest way of acquiring information about its statistical properties? The answer is: to design "optimal" sequences of "experiments" by performing action sequences that maximize expected information gain. Th ..."
Abstract

Cited by 47 (20 self)
 Add to MetaCart
For an agent living in a nondeterministic Markov environment (NME), what is, in theory, the fastest way of acquiring information about its statistical properties? The answer is: to design "optimal" sequences of "experiments" by performing action sequences that maximize expected information gain. This notion is implemented by combining concepts from information theory and reinforcement learning. Experiments show that the resulting method, reinforcement driven information acquisition, can explore certain NMEs much faster than conventional random exploration.
Active Learning with Multiple Views
, 2002
"... Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multiview domains, in which there are several disjoint subsets of features (views), each of which i ..."
Abstract

Cited by 41 (1 self)
 Add to MetaCart
Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multiview domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we introduce CoTesting, which is the first approach to multiview active learning. Second, we extend the multiview learning framework by also exploiting weak views, which are adequate only for learning a concept that is more general/specific than the target concept. Finally, we empirically show that CoTesting outperforms existing active learners on a variety of real world domains such as wrapper induction, Web page classification, advertisement removal, and discourse tree parsing. 1.
Formal Theory of Creativity, Fun, and Intrinsic Motivation (19902010)
"... The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional fiel ..."
Abstract

Cited by 33 (14 self)
 Add to MetaCart
The simple but general formal theory of fun & intrinsic motivation & creativity (1990) is based on the concept of maximizing intrinsic reward for the active creation or discovery of novel, surprising patterns allowing for improved prediction or data compression. It generalizes the traditional field of active learning, and is related to old but less formal ideas in aesthetics theory and developmental psychology. It has been argued that the theory explains many essential aspects of intelligence including autonomous development, science, art, music, humor. This overview first describes theoretically optimal (but not necessarily practical) ways of implementing the basic computational principles on exploratory, intrinsically motivated agents or robots, encouraging them to provoke event sequences exhibiting previously unknown but learnable algorithmic regularities. Emphasis is put on the importance of limited computational resources for online prediction and compression. Discrete and continuous time formulations are given. Previous practical but nonoptimal implementations (1991, 1995, 19972002) are reviewed, as well as several recent variants by others (2005). A simplified typology addresses current confusion concerning the precise nature of intrinsic motivation.
Active Learning with Local Models
 Neural Processing Letters
, 1998
"... In this contribution, we deal with active learning, which gives the learner the power to select training samples. We propose a novel query algorithm for local learning models, a class of learners that has not been considered in the context of active learning until now. Our query algorithm is based o ..."
Abstract

Cited by 22 (0 self)
 Add to MetaCart
In this contribution, we deal with active learning, which gives the learner the power to select training samples. We propose a novel query algorithm for local learning models, a class of learners that has not been considered in the context of active learning until now. Our query algorithm is based on the idea of selecting a query on the borderline of the actual classification. This is done by drawing on the geometrical properties of local models that typically induce a Voronoi tessellation on the input space, so that the Voronoi vertices of this tessellation offer themselves as prospective query points. The performance of the new query algorithm is tested on the twospirals problem with promising results. Keywords: active learning, local models, query based learning, vector quantization 1 Introduction In supervised learning, we are interested in training a student on a set of input output pairs generated by an unknown target function in such a way that the student does not only re...
Exploring the Predictable
, 2002
"... Details of complex event sequences are often not predictable, but their reduced abstract representations are. I study an embedded active learner that can limit its predictions to almost arbitrary computable aspects of spatiotemporal events. It constructs probabilistic algorithms that (1) control in ..."
Abstract

Cited by 21 (10 self)
 Add to MetaCart
Details of complex event sequences are often not predictable, but their reduced abstract representations are. I study an embedded active learner that can limit its predictions to almost arbitrary computable aspects of spatiotemporal events. It constructs probabilistic algorithms that (1) control interaction with the world, (2) map event sequences to abstract internal representations (IRs), (3) predict IRs from IRs computed earlier. Its goal is to create novel algorithms generating IRs useful for correct IR predictions, without wasting time on those learned before. This requires an adaptive novelty measure which is implemented by a coevolutionary scheme involving two competing modules collectively designing (initially random) algorithms representing experiments. Using special instructions, the modules can bet on the outcome of IR predictions computed by algorithms they have agreed upon. If their opinions dier then the system checks who's right, punishes the loser (the surprised one), and rewards the winner. An evolutionary or reinforcement learning algorithm forces each module to maximize reward. This motivates both modules to lure each other into agreeing upon experiments involving predictions that surprise it. Since each module essentially can veto experiments it does not consider profitable, the system is motivated to focus on those computable aspects of the environment where both modules still have confident but different opinions. Once both share the same opinion on a particular issue (via the loser's learning process, e.g., the winner is simply copied onto the loser), the winner loses a source of reward  an incentive to shift the focus of interest onto novel experiments. My simulations include an example where surprisegeneration of this kind helps to speed up ...
What's Interesting?
, 1997
"... Interestingness depends on the observer's current knowledge and computational abilities. Things are boring if either too much or too little is known about them  if they appear either trivial or random. Interesting are unexpected regularities that seem easy to figure out. I attempt to implement th ..."
Abstract

Cited by 20 (9 self)
 Add to MetaCart
Interestingness depends on the observer's current knowledge and computational abilities. Things are boring if either too much or too little is known about them  if they appear either trivial or random. Interesting are unexpected regularities that seem easy to figure out. I attempt to implement these ideas in a "curious", "creative" explorer with two coevolving "brains". It executes a lifelong sequence of instructions whose modifiable probabilities are conditioned on both brains  both must agree on each instruction. There are special instructions for comparing computational results. The brains can predict outcomes of such comparisons. If their opinions differ, then the winner will get rewarded, the loser punished. Hence each brain wants to lure the other into agreeing upon instruction subsequences involving comparisons that surprise it. The surprised brain adapts. In turn, the other loses a source of reward  an incentive to shift the focus of interest. Both brains deal with the...
Gaussian Processes for Active Data Mining of Spatial Aggregates
 In Proceedings of the SIAM International Conference on Data Mining
, 2005
"... We present an active data mining mechanism for qualitative analysis of spatial datasets, integrating identification and analysis of structures in spatial data with targeted collection of additional samples. The mechanism is designed around the spatial aggregation language (SAL) for qualitative ..."
Abstract

Cited by 18 (1 self)
 Add to MetaCart
We present an active data mining mechanism for qualitative analysis of spatial datasets, integrating identification and analysis of structures in spatial data with targeted collection of additional samples. The mechanism is designed around the spatial aggregation language (SAL) for qualitative spatial reasoning, and seeks to uncover highlevel spatial structures from only a sparse set of samples. This approach is important for applications in domains such as aircraft design, wireless system simulation, fluid dynamics, and sensor networks. The mechanism employs Gaussian processes, a formal mathematical model for reasoning about spatial data, in order to build surrogate models from sparse data, reason about the uncertainty of estimation at unsampled points, and formulate objective criteria for closingtheloop between data collection and data analysis. It optimizes sample selection using entropybased functionals defined over spatial aggregates instead of the traditional approach of sampling to minimize estimated variance. We apply this mechanism on a global optimization benchmark comprising a testbank of 2D functions, as well as on data from wireless system simulations. The results reveal that the proposed sampling strategy makes more judicious use of data points by selecting locations that clarify highlevel structures in data, rather than choosing points that merely improve quality of function approximation.