## Information-Based Objective Functions for Active Data Selection

Venue: Neural Computation

Citations: 342 (6 self)

### BibTeX

@ARTICLE{MacKay_information-basedobjective,

author = {David J.C. MacKay},

title = {Information-Based Objective Functions for Active Data Selection},

journal = {Neural Computation},

year = {},

volume = {4},

pages = {590--604}

}

### Abstract

Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed which measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain information about lead to three different criteria for data selection. All these criteria depend on the assumption that the hypothesis space is correct, which may prove to be their main weakness.

1 Introduction

Theories for data modelling often assume that the data is provided by a source that we do not control. However, there are two scenarios in which we are able to actively select training data. In the first, data measurements are relatively expensive or slow, and we want to know where to look next so as to learn as much as possible. According to Jaynes (1986), Bayesian reasoning was first applied to this problem two centuries ago by Laplace, who in consequence made more important discoveries...

### Citations

1358 | Statistical decision theory and Bayesian analysis (2nd ed.) - Berger - 1985
Citation Context: ...here is no need to undo biases introduced by the data collecting strategy, because it is not possible for such biases to be introduced, as long as we perform inference using all the data gathered (Berger, 1985, Loredo, 1989). When the models are concerned with estimating the distribution of output variables t given input variables x, we are allowed to look at the x value of a datum, and decide whether or n...

426 | A practical Bayesian framework for backpropagation networks - MacKay - 1992
Citation Context: ...sions. This paper uses similar information-based objective functions and discusses the problem of optimal data selection within the Bayesian framework for interpolation described in previous papers (MacKay, 1991a, 1991b). Most of the results in this paper have direct analogs in Fedorov (1972), though the quantities involved have different interpretations: for example, Fedorov's dispersion of an estimator bec...

390 | Theory of Optimal Experiments - Fedorov - 1972
Citation Context: ...f objectively estimating the utility of candidate data points. The problem of 'active learning' or 'sequential design' has been extensively studied in economic theory and statistics (El-Gamal, 1991, Fedorov, 1972). Experimental design within a Bayesian framework using the Shannon information as an objective function has been studied by Lindley (1956) and by Luttrell (1985). A distinctive feature of this appro...

166 | The evidence framework applied to classification networks - MacKay - 1992
Citation Context: ...$w = w - w_{\mathrm{MP}}$ and the Hessian $A = \nabla\nabla M$ is evaluated at the minimum $w_{\mathrm{MP}}$. We will use this quadratic approximation from here on. If $M$ has other minima, those can be treated as distinct models as in (MacKay, 1991b). First we will need to know what the entropy of a gaussian distribution is. It is easy to confirm that if $P(w) \propto e^{-M(w)}$, then for a flat measure $m(w) = m$, $S = \frac{k}{2}(1 + \log 2\pi) + \frac{1}{2}\log(m \ldots$
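As a rough illustration of the Gaussian entropy quoted in the context above, the following Python sketch evaluates the entropy of a Gaussian posterior whose Hessian (precision matrix) is $A$. The quoted formula is truncated in this excerpt, so the sketch assumes the standard result for a Gaussian relative to a constant measure $m$, namely $S = \frac{k}{2}(1 + \log 2\pi) + \log m - \frac{1}{2}\log\det A$. The function names `cholesky_logdet` and `gaussian_entropy` are hypothetical and do not come from the paper.

```python
import math


def cholesky_logdet(A):
    """Return log(det(A)) for a symmetric positive-definite matrix (nested lists)."""
    k = len(A)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    # det(A) is the squared product of the Cholesky diagonal.
    return 2.0 * sum(math.log(L[i][i]) for i in range(k))


def gaussian_entropy(A, m=1.0):
    """Entropy of a Gaussian with precision matrix A, relative to a flat measure m.

    Assumed form: S = k/2 * (1 + log 2*pi) + log m - 1/2 * log det A.
    """
    k = len(A)
    return 0.5 * k * (1.0 + math.log(2.0 * math.pi)) + math.log(m) - 0.5 * cholesky_logdet(A)


# A sharper posterior (larger Hessian) has lower entropy than a broad one.
broad = gaussian_entropy([[1.0, 0.0], [0.0, 1.0]])
sharp = gaussian_entropy([[4.0, 0.0], [0.0, 1.0]])
print(broad, sharp)
```

Under these assumptions the entropy depends on the data only through $\log\det A$, which is why a change in entropy between two posteriors reduces to a ratio of determinants.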

144 | On a measure of the information provided by an experiment - Lindley - 1956
Citation Context: ...idate information measures are equivalent for our purposes. This proof also implicitly demonstrates that $E(\Delta S)$ is independent of the measure $m(w)$. Other properties of $E(\Delta S)$ are proved in (Lindley, 1956). The rest of this paper will use $\Delta S$ as the information measure, with $m(w)$ set to a constant. 3 Maximising total information gain Let us now solve the first task: how to choose $x_{N+1}$ so that the...
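The task named in the context above, choosing the next input so as to maximise the entropy decrease $\Delta S$, can be sketched for a toy case. The sketch below rests on assumptions that are not stated in this excerpt: a straight-line model $y = w_0 + w_1 x$ with sensitivity vector $g(x) = (1, x)$, Gaussian noise of precision `beta`, and a Gaussian posterior with Hessian $A$, so that one new datum at $x$ updates the Hessian to $A + \beta\, g g^{\top}$ and $\Delta S = \frac{1}{2}\log\left[\det(A + \beta\, g g^{\top}) / \det A\right]$. All function names are hypothetical.

```python
import math


def det2(M):
    """Determinant of a 2x2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def total_information_gain(A, x, beta):
    """Entropy decrease from one measurement at x (assumed linear-Gaussian model)."""
    g = (1.0, x)  # sensitivity of the output y = w0 + w1 * x to the parameters
    A_new = [[A[i][j] + beta * g[i] * g[j] for j in range(2)] for i in range(2)]
    return 0.5 * math.log(det2(A_new) / det2(A))


def select_next_input(A, candidates, beta):
    """Return the candidate input with the largest information gain."""
    return max(candidates, key=lambda x: total_information_gain(A, x, beta))


# Hessian after a unit-precision prior plus unit-precision data at x = 0 and x = 1.
A = [[3.0, 1.0], [1.0, 2.0]]
candidates = [-2.0, 0.5, 3.0]
best = select_next_input(A, candidates, beta=1.0)
print(best, total_information_gain(A, best, beta=1.0))
```

In this toy setting the selected candidate is the one furthest from the existing data, where the model's predictions are least constrained. Whether that behaviour is desirable depends on the assumed model being correct, which is the caveat the abstract raises.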

70 | Neural net algorithms that learn in polynomial time from examples and queries - Baum - 1991

56 | From Laplace to Supernova SN 1987A: Bayesian inference in astrophysics - Loredo - 1990
Citation Context: ...d to undo biases introduced by the data collecting strategy, because it is not possible for such biases to be introduced, as long as we perform inference using all the data gathered (Berger, 1985, Loredo, 1989). When the models are concerned with estimating the distribution of output variables t given input variables x, we are allowed to look at the x value of a datum, and decide whether or not to include ...

40 | Query-based learning applied to partially trained multilayer perceptrons - Hwang, Choi, et al. - 1991

40 | Bayesian Methods: General Background - Jaynes - 1985

13 | The use of transinformation in the design of data sampling schemes for inverse problems. Inverse Problems - Luttrell - 1985
Citation Context: ...+1 will be made. The more complex task of selecting multiple new data points will not be addressed here, but the methods used can be generalised to solve this task, as is discussed in (Fedorov, 1972, Luttrell, 1985). The similar problem of choosing the $x_{N+1}$ at which a vector of outputs $t_{N+1}$ is measured will not be addressed either. The first and third definitions of information gain have both been studied in ...

13 | Active selection of training examples for network learning in noiseless environments - Plutowski, White - 1991

3 | The role of priors in active Bayesian learning in the sequential statistical decision framework - El-Gamal, A - 1991