Results 1-10 of 85
Combining Active Learning and Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
- ICML 2003 workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining
, 2003
Cited by 59 (4 self)
Active and semi-supervised learning are important techniques when labeled data are scarce. We combine the two under a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The semi-supervised learning problem is then formulated in terms of a Gaussian random field on this graph, the mean of which is characterized in terms of harmonic functions. Active learning is performed on top of the semi-supervised learning scheme by greedily selecting queries from the unlabeled data to minimize the estimated expected classification error (risk); for Gaussian fields, the risk can be computed efficiently using matrix methods. We present experimental results on synthetic data, handwritten digit recognition, and text classification tasks. The active learning scheme requires far fewer queries than random query selection to achieve high accuracy.
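The harmonic solution and the risk-driven query loop described above can be illustrated in a few lines. This is a minimal Python sketch under stated assumptions, not the authors' implementation: labels are binary, the estimated risk is taken as the sum of min(f_i, 1 - f_i) over unlabeled nodes, and the harmonic values are found by Gauss-Seidel iteration with naive re-solving for each hypothetical query, rather than by the efficient matrix updates the abstract mentions. The names `harmonic`, `risk` and `select_query` are illustrative.

```python
# Sketch: harmonic-function labels on a graph plus greedy expected-risk query
# selection. Assumptions (not from the paper): binary labels {0, 1}, a small
# dense weight matrix with zero diagonal, every unlabeled node connected to a
# labeled node, and Gauss-Seidel iteration instead of a matrix inverse.


def harmonic(W, labels, iters=500):
    """Clamp labeled nodes; repeatedly set each unlabeled node to the
    weighted average of its neighbours until the values are harmonic."""
    n = len(W)
    f = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i in labels:
                continue
            f[i] = sum(W[i][j] * f[j] for j in range(n)) / sum(W[i])
    return f


def risk(f, labels):
    """Estimated classification error on unlabeled nodes: sum of min(f, 1 - f)."""
    return sum(min(fi, 1.0 - fi) for i, fi in enumerate(f) if i not in labels)


def select_query(W, labels):
    """Return (node, expected risk) for the unlabeled node whose label,
    averaged over y ~ Bernoulli(f_k), gives the lowest risk after re-solving."""
    f = harmonic(W, labels)
    best, best_risk = None, float("inf")
    for k in range(len(W)):
        if k in labels:
            continue
        expected = 0.0
        for y, p in ((1, f[k]), (0, 1.0 - f[k])):
            new_labels = {**labels, k: y}
            expected += p * risk(harmonic(W, new_labels), new_labels)
        if expected < best_risk:
            best, best_risk = k, expected
    return best, best_risk


# Path graph 0-1-2-3-4 with node 0 labeled 1 and node 4 labeled 0.
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
labels = {0: 1, 4: 0}
print(harmonic(W, labels))  # approximately [1.0, 0.75, 0.5, 0.25, 0.0]
print(select_query(W, labels))
```

On this symmetric path every interior node has expected risk 0.5 (down from a current risk of 1.0), so the choice among them is a tie; on real similarity graphs the expected risks differ and the greedy step is informative.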
Active learning literature survey
, 2010
Cited by 49 (1 self)
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem setting variants, and a discussion ...
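One query strategy framework commonly covered in such surveys is uncertainty sampling: query the pool instance the current model is least sure about. The sketch below assumes the model's class-probability estimates for each unlabeled instance are already available (how they are produced is out of scope) and shows two common uncertainty measures, predictive entropy and margin; `entropy_query` and `margin_query` are hypothetical names.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_query(pool_probs):
    """Index of the pool instance with the highest predictive entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))


def margin_query(pool_probs):
    """Index of the pool instance with the smallest gap between its two
    most probable classes."""
    def margin(probs):
        top = sorted(probs, reverse=True)
        return top[0] - top[1]
    return min(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))


# Hypothetical predicted distributions over three classes for three instances.
pool = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.5, 0.5, 0.0]]
print(entropy_query(pool))  # 1
print(margin_query(pool))   # 2
```

The two measures can disagree: entropy favours instance 1, whose probability mass is spread over all three classes, while margin favours instance 2, which is split evenly between its top two classes.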
Bayesian inference and optimal design in the sparse linear model
- Workshop on Artificial Intelligence and Statistics
Cited by 29 (8 self)
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).
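The design step can be sketched once a Gaussian approximation to the posterior is available, for example one produced by expectation propagation. The sketch below assumes such a covariance `S` is given and that each candidate measurement is a noisy linear projection y = xᵀw + ε with ε ~ N(0, σ²). It greedily picks the candidate with the largest information gain 0.5 log(1 + xᵀSx / σ²) and then applies a rank-one covariance update. It deliberately skips re-running EP with the Laplace prior after each measurement, which the paper's approach relies on, so treat it as an illustration of the design criterion only; all function names are hypothetical.

```python
import math


def matvec(S, x):
    """Matrix-vector product S x for nested-list matrices."""
    return [sum(S[i][j] * x[j] for j in range(len(x))) for i in range(len(S))]


def info_gain(S, x, noise_var):
    """Entropy reduction from observing x^T w + noise under a Gaussian
    posterior with covariance S."""
    q = sum(xi * si for xi, si in zip(x, matvec(S, x)))
    return 0.5 * math.log(1.0 + q / noise_var)


def rank_one_update(S, x, noise_var):
    """Covariance after conditioning on one linear-Gaussian measurement:
    S - (S x)(S x)^T / (noise_var + x^T S x), assuming S is symmetric."""
    Sx = matvec(S, x)
    denom = noise_var + sum(xi * si for xi, si in zip(x, Sx))
    n = len(S)
    return [[S[i][j] - Sx[i] * Sx[j] / denom for j in range(n)] for i in range(n)]


def greedy_design(S, candidates, k, noise_var=1.0):
    """Greedily choose k candidate measurements by information gain."""
    chosen, remaining = [], list(range(len(candidates)))
    for _ in range(k):
        best = max(remaining, key=lambda c: info_gain(S, candidates[c], noise_var))
        chosen.append(best)
        remaining.remove(best)
        S = rank_one_update(S, candidates[best], noise_var)
    return chosen, S


# Hypothetical 2-D posterior covariance and three candidate measurements.
S0 = [[4.0, 0.0], [0.0, 1.0]]
cands = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(greedy_design(S0, cands, k=2)[0])  # [2, 0]
```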
Active Learning of Causal Bayes Net Structure
, 2001
Cited by 25 (2 self)
We propose a decision theoretic approach for deciding which interventions to perform so as to learn the causal structure of a model as quickly as possible. Without such interventions, it is impossible to distinguish between Markov equivalent models, even given infinite data. We perform online MCMC to estimate the posterior over graph structures, and use importance sampling to find the best action to perform at each step. We assume the data is discrete-valued and fully observed.
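To make the role of interventions concrete, here is a toy decision-theoretic choice between experiments on two binary variables whose candidate graphs X->Y and Y->X are Markov equivalent. Assumptions not taken from the paper: only two hypotheses, made-up conditional probabilities chosen so both graphs imply the same observational joint, mutual information between graph and outcome as the utility, and exact enumeration in place of the paper's online MCMC and importance sampling. Passive observation carries zero expected information here, while an intervention does not.

```python
import math

# Hypothetical parameters: both graphs imply the same observational joint.
P_X1 = 0.5                       # X->Y: P(X = 1)
P_Y1_GIVEN_X = {1: 0.9, 0: 0.1}  # X->Y: P(Y = 1 | X = x)
P_Y1 = 0.5                       # Y->X: P(Y = 1)
P_X1_GIVEN_Y = {1: 0.9, 0: 0.1}  # Y->X: P(X = 1 | Y = y)


def bern(p, v):
    """Probability of binary value v under Bernoulli(p)."""
    return p if v == 1 else 1.0 - p


def outcome_dist(graph, action):
    """Outcome distribution for one experiment under a given graph.
    'observe' records (x, y); 'do_x1' sets X = 1 and records Y;
    'do_y1' sets Y = 1 and records X."""
    if action == "observe":
        if graph == "X->Y":
            return {(x, y): bern(P_X1, x) * bern(P_Y1_GIVEN_X[x], y)
                    for x in (0, 1) for y in (0, 1)}
        return {(x, y): bern(P_Y1, y) * bern(P_X1_GIVEN_Y[y], x)
                for x in (0, 1) for y in (0, 1)}
    if action == "do_x1":
        # Setting X affects Y only if X is Y's parent.
        p = P_Y1_GIVEN_X[1] if graph == "X->Y" else P_Y1
        return {y: bern(p, y) for y in (0, 1)}
    if action == "do_y1":
        # Setting Y affects X only if Y is X's parent.
        p = P_X1 if graph == "X->Y" else P_X1_GIVEN_Y[1]
        return {x: bern(p, x) for x in (0, 1)}
    raise ValueError(action)


def expected_info_gain(prior, action):
    """Mutual information I(graph; outcome) under the prior over graphs."""
    dists = {g: outcome_dist(g, action) for g in prior}
    outcomes = list(next(iter(dists.values())))
    gain = 0.0
    for o in outcomes:
        p_o = sum(prior[g] * dists[g][o] for g in prior)
        for g in prior:
            joint = prior[g] * dists[g][o]
            if joint > 0:
                gain += joint * math.log(dists[g][o] / p_o)
    return gain


prior = {"X->Y": 0.7, "Y->X": 0.3}
actions = ["observe", "do_x1", "do_y1"]
best = max(actions, key=lambda a: expected_info_gain(prior, a))
print(best)  # do_x1
```

With this prior, intervening on X is worth about 0.091 nats and intervening on Y about 0.081 nats, while observing is worth nothing, matching the abstract's point that Markov-equivalent models cannot be told apart without interventions.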
A Concept Exploration Method for Product Family Design
- in Mechanical Engineering. Atlanta, GA: Georgia Institute of Technology
, 1998
Learning and Classifying under Hard Budgets
- In Proceedings of the European Conference on Machine Learning (ECML-05)
, 2005
Cited by 20 (0 self)
Since resources for data acquisition are seldom infinite, both learners and classifiers must act intelligently under hard budgets. In this paper, we consider problems in which feature values are unknown to both the learner and classifier, but can be acquired at a cost. Our goal is a learner that spends its fixed learning budget b_L acquiring training data, to produce the most accurate “active classifier” that spends at most b_C per instance. To produce this fixed-budget classifier, the fixed-budget learner must sequentially decide which feature values to collect to learn the relevant information about the distribution. We explore several approaches the learner can take, including the standard “round robin” policy (purchasing every feature of every instance until the b_L budget is exhausted). We demonstrate empirically that round robin is problematic (especially for small b_L), and provide alternate learning strategies that achieve superior performance on a variety of datasets.
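The round-robin baseline is simple enough to state as code. A minimal sketch, assuming per-feature acquisition costs (the `costs` list is hypothetical) and that purchasing stops as soon as the next feature in order would exceed b_L; the paper's alternative strategies are not shown.

```python
def round_robin(costs, n_instances, budget):
    """Buy feature values in order: every feature of instance 0, then every
    feature of instance 1, and so on, stopping when the next purchase would
    exceed the learning budget. Returns the (instance, feature) pairs bought
    and the amount spent."""
    purchases, spent = [], 0.0
    for i in range(n_instances):
        for f, c in enumerate(costs):
            if spent + c > budget:
                return purchases, spent
            purchases.append((i, f))
            spent += c
    return purchases, spent


# Three features costing 1, 2 and 3; four training instances; budget b_L = 10.
print(round_robin([1, 2, 3], n_instances=4, budget=10))
```

With these numbers the policy exhausts most of the budget on the first two instances, which illustrates why round robin can be wasteful for small b_L: it spends on every feature regardless of how informative each one is.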
Learning From Measurements in Exponential Families
Cited by 20 (0 self)
Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints; both provide information about the desired model. In general, what is the most cost-effective way to learn? To address this question, we introduce measurements, a general class of mechanisms for providing information about a target model. We present a Bayesian decision-theoretic framework, which allows us to both integrate diverse measurements and choose new measurements to make. We use a variational inference algorithm, which exploits exponential family duality. The merits of our approach are demonstrated on two sequence labeling tasks.
Robust submodular observation selection
, 2008
Cited by 17 (3 self)
In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations which are robust against a number of possible objective functions. Examples include minimizing the maximum posterior variance in Gaussian Process regression, robust experimental design, and sensor placement for outbreak detection. In this paper, we present the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for cases where the possible objective functions exhibit submodularity, an intuitive diminishing returns property. Moreover, we prove that better approximation algorithms do not exist unless NP-complete problems admit efficient algorithms. We show how our algorithm can be extended to handle complex cost functions (incorporating non-unit observation cost or communication and path costs). We also show how the algorithm can be used to near-optimally trade off expected-case (e.g., the Mean Square Prediction Error in Gaussian Process regression) and worst-case (e.g., maximum predictive variance) performance. We show that many important machine learning problems fit our robust submodular observation selection formalism, and provide extensive empirical evaluation on several real-world problems. For Gaussian Process regression, our algorithm compares favorably with state-of-the-art heuristics described in the geostatistics literature, while being simpler, faster and providing theoretical guarantees. For robust experimental design, our algorithm performs favorably compared to SDP-based algorithms.
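The core of Submodular Saturation, for monotone submodular objectives, can be sketched as: binary-search a target level c, cap every objective at c, and greedily cover the average of the capped objectives. This Python sketch reflects our reading of the method rather than the authors' code; the coverage data, the tolerance, and `alpha` (a budget relaxation factor, with 1.0 meaning none) are assumptions, and non-unit or path costs are not handled.

```python
# Sketch of the saturation idea on a toy sensor-coverage problem.
# Assumptions: each objective is monotone submodular with F(empty) = 0;
# the coverage data below is hypothetical.


def saturate(objectives, ground, k, alpha=1.0, tol=1e-3):
    """Approximately maximise min_i F_i(A) subject to |A| <= alpha * k."""
    m = len(objectives)

    def truncated(A, c):
        # Average of the objectives, each capped at level c. It is still
        # submodular, and it equals c exactly when every F_i(A) >= c.
        return sum(min(F(A), c) for F in objectives) / m

    c_min, c_max = 0.0, min(F(set(ground)) for F in objectives)
    best = set()
    while c_max - c_min > tol:
        c = (c_min + c_max) / 2.0
        # Greedy partial cover: add the element with the largest truncated
        # value until the truncated objective saturates at c or stalls.
        A = set()
        while truncated(A, c) < c - 1e-12:
            current = truncated(A, c)
            gains = {s: truncated(A | {s}, c) for s in ground if s not in A}
            if not gains:
                break
            s = max(gains, key=gains.get)
            if gains[s] <= current:
                break
            A.add(s)
        if truncated(A, c) >= c - 1e-12 and len(A) <= alpha * k:
            c_min, best = c, A   # level c reached within the (relaxed) budget
        else:
            c_max = c            # not reached: lower the target level
    return best


COVER = {0: {"a", "b"}, 1: {"c", "d"}, 2: {"a", "c"}, 3: {"b"}}


def coverage(points):
    """F(A) = number of `points` covered by the sensors in A."""
    return lambda A: len(points & set().union(*(COVER[s] for s in A)))


# Two scenarios, each caring about different points.
objectives = [coverage({"a", "b"}), coverage({"c", "d"})]
print(saturate(objectives, ground=[0, 1, 2, 3], k=1))  # {2}
```

With k = 1 the robust choice is sensor 2, the only one that helps both scenarios. With k = 2 and alpha = 1.0 the sketch still returns {2} even though {0, 1} covers everything, because the greedy cover needs three elements; allowing alpha = 1.5 yields {0, 1, 2}. This is why the method's guarantees are stated with a relaxed budget.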
Optimal Design via Curve Fitting of Monte Carlo Experiments
, 1996
Cited by 15 (4 self)
This paper explores numerical methods for stochastic optimization, with special attention to Bayesian design problems. A common and challenging situation occurs when the objective function (in Bayesian applications, the expected utility) is very expensive to evaluate, perhaps because it requires integration over a space of very large dimensionality. Our goal is to explore a class of optimization algorithms designed to gain efficiency in such situations, by exploiting smoothness of the expected utility surface and borrowing information from neighboring design points. The central idea is that of implementing stochastic optimization by curve fitting of Monte Carlo samples. This is done by simulating draws from the joint parameter/sample space and evaluating the observed utilities. A smooth surface fitted through these simulated points serves as an estimate of the expected utility surface. The optimal design can then be found deterministically. In this paper we introduce a general algorithm for curve-fitting-based optimization, we discuss implementation options, and we present a consistency property for one particular implementation of the algorithm. To illustrate the advantages and limitations of curve-fitting-based optimization, and compare it with some of the alternatives, we consider in detail three important practical applications. The first is an information-theoretic stopping rule for a clinical trial. The objective function is based on the expected amount of information acquired about a sub-vector of parameters of interest. The second is concerned with the timing of examination for the early detection of breast cancer in mass screening programs. It involves a two-dimensional optimization and an objective function embodying a cost-benefit analysis. The third applicat...
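The curve-fitting idea can be shown on a one-dimensional toy problem: simulate designs and parameters jointly, record the utilities, fit a smooth curve, and maximise the fit. A minimal sketch, assuming a design d in [0, 1], a quadratic least-squares fit as the smoother (the paper discusses several implementation options; this choice is ours), and a utility that depends on the parameter θ directly rather than on simulated data. Because the expected utility of -(d - θ)² with θ ~ N(0.3, 0.1²) is -(d - 0.3)² - 0.01, the fitted optimum should land near 0.3.

```python
import random


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def curve_fit_design(utility, sample_theta, n=2000, seed=0):
    """Pick d in [0, 1] maximising a quadratic fitted to simulated utilities."""
    rng = random.Random(seed)
    # 1. Simulate from the joint design/parameter space and record utilities.
    designs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    utils = [utility(d, sample_theta(rng)) for d in designs]
    # 2. Least-squares fit u ~ b0 + b1*d + b2*d^2 via the normal equations.
    feats = [(1.0, d, d * d) for d in designs]
    XtX = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    Xty = [sum(f[i] * u for f, u in zip(feats, utils)) for i in range(3)]
    b0, b1, b2 = solve3(XtX, Xty)

    # 3. Maximise the fitted curve deterministically on [0, 1].
    def fitted(d):
        return b0 + b1 * d + b2 * d * d

    candidates = [0.0, 1.0]
    if b2 < 0:
        candidates.append(min(max(-b1 / (2.0 * b2), 0.0), 1.0))
    return max(candidates, key=fitted)


d_star = curve_fit_design(lambda d, theta: -(d - theta) ** 2,
                          lambda rng: rng.gauss(0.3, 0.1))
print(d_star)  # close to 0.3
```

No single simulated draw evaluates the expected utility at any design; the fit pools all draws, which is the "borrowing information from neighboring design points" the abstract describes.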
Bayesian Analysis For Simulation Input And Output
, 1997
Cited by 14 (7 self)
The paper summarizes some important results at the intersection of the fields of Bayesian statistics and stochastic simulation. Two statistical analysis issues for stochastic simulation are discussed in further detail from a Bayesian perspective. First, a review of recent work in input distribution selection is presented. Then, a new Bayesian formulation for the problem of output analysis for a single system is presented. A key feature is analyzing simulation output as a random variable whose parameters are an unknown function of the simulation's inputs. The distribution of those parameters is inferred from simulation output via Bayesian response-surface methods. A brief summary of Bayesian inference and decision making is included for reference.
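The response-surface view of output analysis can be illustrated with the simplest conjugate case. This sketch assumes, beyond what the abstract states, that the output parameter of interest varies linearly with a single scalar input, that observations carry Gaussian noise of known variance, and that the two coefficients have independent Gaussian priors; the paper's formulation is more general, and `posterior_linear` and `predict` are hypothetical names.

```python
def posterior_linear(xs, ys, noise_var, prior_mean=(0.0, 0.0), prior_var=100.0):
    """Conjugate Gaussian posterior for (b0, b1) in y = b0 + b1 * x + noise,
    assuming known noise variance and independent N(prior_mean[i], prior_var)
    priors on the coefficients. Returns (posterior mean, posterior covariance)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Posterior precision = prior precision + X^T X / noise_var.
    p00 = 1.0 / prior_var + n / noise_var
    p01 = sx / noise_var
    p11 = 1.0 / prior_var + sxx / noise_var
    # Precision-weighted mean = prior precision * prior mean + X^T y / noise_var.
    r0 = prior_mean[0] / prior_var + sy / noise_var
    r1 = prior_mean[1] / prior_var + sxy / noise_var
    det = p00 * p11 - p01 * p01
    cov = [[p11 / det, -p01 / det], [-p01 / det, p00 / det]]
    mean = [cov[0][0] * r0 + cov[0][1] * r1, cov[1][0] * r0 + cov[1][1] * r1]
    return mean, cov


def predict(mean, cov, x):
    """Posterior mean and variance of the response surface b0 + b1 * x."""
    m = mean[0] + mean[1] * x
    v = cov[0][0] + 2.0 * x * cov[0][1] + x * x * cov[1][1]
    return m, v


# Hypothetical simulation runs: input setting x, observed output y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
mean, cov = posterior_linear(xs, ys, noise_var=0.04)
print(predict(mean, cov, 5.0))  # posterior mean and variance at an untried input
```

The posterior variance returned by `predict` grows as x moves away from the simulated inputs, which is the kind of uncertainty statement a Bayesian treatment adds to a point estimate of the response surface.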

