Results 1  10
of
146
Active learning literature survey
, 2010
"... The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which is learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., ..."
Abstract

Cited by 136 (1 self)
 Add to MetaCart
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which is learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is wellmotivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, timeconsuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion
Combining Active Learning and SemiSupervised Learning Using Gaussian Fields and Harmonic Functions
 ICML 2003 workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining
, 2003
"... Active and semisupervised learning are important techniques when labeled data are scarce. We combine the two under a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The semisupervi ..."
Abstract

Cited by 77 (5 self)
 Add to MetaCart
Active and semisupervised learning are important techniques when labeled data are scarce. We combine the two under a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The semisupervised learning problem is then formulated in terms of a Gaussian random field on this graph, the mean of which is characterized in terms of harmonic functions. Active learning is performed on top of the semisupervised learning scheme by greedily selecting queries from the unlabeled data to minimize the estimated expected classification error (risk); in the case of Gaussian fields the risk is efficiently computed using matrix methods. We present experimental results on synthetic data, handwritten digit recognition, and text classification tasks. The active learning scheme requires a much smaller number of queries to achieve high accuracy compared with random query selection. 1.
Bayesian inference and optimal design in the sparse linear model
 Workshop on Artificial Intelligence and Statistics
"... The linear model with sparsityfavouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of ..."
Abstract

Cited by 59 (12 self)
 Add to MetaCart
The linear model with sparsityfavouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from microarray expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).
Rational explanation of the selection task
 Psychological Review
, 1996
"... M. Oaksford and N. Chater (O&C; 1994) presented the first quantitative model of P. C. Wason's ( 1966, 1968) selection task in.which performance is rational. J. St B T Evans and D. E. Over (1996) reply that O&C's account is normatively incorrect and cannot model K. N. Kirby's (1994b) or P. Pollard an ..."
Abstract

Cited by 46 (4 self)
 Add to MetaCart
M. Oaksford and N. Chater (O&C; 1994) presented the first quantitative model of P. C. Wason's ( 1966, 1968) selection task in.which performance is rational. J. St B T Evans and D. E. Over (1996) reply that O&C's account is normatively incorrect and cannot model K. N. Kirby's (1994b) or P. Pollard and J. St B T Evans's (1983) data. It is argued that an equivalent measure satisfies their normative concerns and that a modification of O&C's model accounts for their empirical concerns. D. Laming (1996) argues that O&C made unjustifiable psychological assumptions and that a "correct" Bayesian analysis agrees with logic. It is argued that O&C's model makes normative and psychological sense and that Laming's analysis is not Bayesian. A. Almor and S. A. Sloman (1996) argue that O&C cannot explain their data. It is argued that Almor and Sloman's data do not bear on O&C's model because they alter the nature of the task. It is concluded that O&C's model remains the most compelling and comprehensive account of the selection task. Research on Wason's (1966, 1968) selection task questions human rationality because performance is not "logically correct?' Recently, Oaksford and Chater (O&C; 1994) provided a rational analysis (Anderson, 1990, 1991) of the selection task that appeared to vindicate human rationality. O&C argued that the selection task is an inductive, rather than a deductive, reasoning task: Participants must assess the truth or falsity of a general rule from specific instances. In particular, participants face a problem of optimal data selection (Lindley, 1956): They must decide which of four cards (p, notp, q, or notq) is likely to provide the most useful data to test a conditional rule,/fp then q. The "logical " solution is to select the p and the notq cards. O&C argued that this solution presupposes falsificationism (Popper, 1959), which argues that only data that can disconfirm, not confirm, hypotheses are of interest. In contrast, O&C's rational analysis uses a Bayesian approach to inductive
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
"... Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regre ..."
Abstract

Cited by 46 (9 self)
 Add to MetaCart
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GPUCB, an intuitive upperconfidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GPUCB compares favorably with other heuristical GP optimization approaches. 1.
Bayesian Treed Gaussian Process Models with an Application to Computer Modeling
 Journal of the American Statistical Association
, 2007
"... This paper explores nonparametric and semiparametric nonstationary modeling methodologies that couple stationary Gaussian processes and (limiting) linear models with treed partitioning. Partitioning is a simple but effective method for dealing with nonstationarity. Mixing between full Gaussian proce ..."
Abstract

Cited by 44 (15 self)
 Add to MetaCart
This paper explores nonparametric and semiparametric nonstationary modeling methodologies that couple stationary Gaussian processes and (limiting) linear models with treed partitioning. Partitioning is a simple but effective method for dealing with nonstationarity. Mixing between full Gaussian processes and simple linear models can yield a more parsimonious spatial model while significantly reducing computational effort. The methodological developments and statistical computing details which make this approach efficient are described in detail. Illustrations of our model are given for both synthetic and real datasets. Key words: recursive partitioning, nonstationary spatial model, nonparametric regression, Bayesian model averaging 1
Active Learning of Causal Bayes Net Structure
, 2001
"... We propose a decision theoretic approach for deciding which interventions to perform so as to learn the causal structure of a model as quickly as possible. Without such interventions, it is impossible to distinguish between Markov equivalent models, even given infinite data. We perform online MCMC t ..."
Abstract

Cited by 39 (2 self)
 Add to MetaCart
We propose a decision theoretic approach for deciding which interventions to perform so as to learn the causal structure of a model as quickly as possible. Without such interventions, it is impossible to distinguish between Markov equivalent models, even given infinite data. We perform online MCMC to estimate the posterior over graph structures, and use importance sampling to find the best action to perform at each step. We assume the data is discretevalued and fully observed.
Learning From Measurements in Exponential Families
"... Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints—both provide information about the desired model. In general, what is the most costeffective way to learn? To address this question, we introduce measurements, a general class ..."
Abstract

Cited by 34 (0 self)
 Add to MetaCart
Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints—both provide information about the desired model. In general, what is the most costeffective way to learn? To address this question, we introduce measurements, a general class of mechanisms for providing information about a target model. We present a Bayesian decisiontheoretic framework, which allows us to both integrate diverse measurements and choose new measurements to make. We use a variational inference algorithm, which exploits exponential family duality. The merits of our approach are demonstrated on two sequence labeling tasks. 1.
Learning and Classifying under Hard Budgets
 In Proceedings of the European Conference on Machine Learning (ECML05
, 2005
"... Abstract. Since resources for data acquisition are seldom infinite, both learners and classifiers must act intelligently under hard budgets. In this paper, we consider problems in which feature values are unknown to both the learner and classifier, but can be acquired at a cost. Our goal is a learne ..."
Abstract

Cited by 32 (2 self)
 Add to MetaCart
Abstract. Since resources for data acquisition are seldom infinite, both learners and classifiers must act intelligently under hard budgets. In this paper, we consider problems in which feature values are unknown to both the learner and classifier, but can be acquired at a cost. Our goal is a learner that spends its fixed learning budget bL acquiring training data, to produce the most accurate “active classifier ” that spends at most bC per instance. To produce this fixedbudget classifier, the fixedbudget learner must sequentially decide which feature values to collect to learn the relevant information about the distribution. We explore several approaches the learner can take, including the standard “round robin” policy (purchasing every feature of every instance until the bL budget is exhausted). We demonstrate empirically that round robin is problematic (especially for small bL), and provide alternate learning strategies that achieve superior performance on a variety of datasets. 1
Robust submodular observation selection
, 2008
"... In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations wh ..."
Abstract

Cited by 30 (3 self)
 Add to MetaCart
In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations which are robust against a number of possible objective functions. Examples include minimizing the maximum posterior variance in Gaussian Process regression, robust experimental design, and sensor placement for outbreak detection. In this paper, we present the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for cases where the possible objective functions exhibit submodularity, an intuitive diminishing returns property. Moreover, we prove that better approximation algorithms do not exist unless NPcomplete problems admit efficient algorithms. We show how our algorithm can be extended to handle complex cost functions (incorporating nonunit observation cost or communication and path costs). We also show how the algorithm can be used to nearoptimally trade off expectedcase (e.g., the Mean Square Prediction Error in Gaussian Process regression) and worstcase (e.g., maximum predictive variance) performance. We show that many important machine learning problems fit our robust submodular observation selection formalism, and provide extensive empirical evaluation on several realworld problems. For Gaussian Process regression, our algorithm compares favorably with stateoftheart heuristics described in the geostatistics literature, while being simpler, faster and providing theoretical guarantees. For robust experimental design, our algorithm performs favorably compared to SDPbased algorithms.