Results 1 – 10 of 208
Information-Based Objective Functions for Active Data Selection
 Neural Computation
"... Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed which measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain infor ..."
Abstract

Cited by 429 (5 self)
Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed which measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain information about lead to three different criteria for data selection. All these criteria depend on the assumption that the hypothesis space is correct, which may prove to be their main weakness. Theories for data modelling often assume that the data is provided by a source that we do not control. However, there are two scenarios in which we are able to actively select training data. In the first, data measurements are relatively expensive or slow, and we want to know where to look next so as to learn as much as possible. According to Jaynes (1986), Bayesian reasoning was first applied to this problem two centuries ago by Laplace, who in consequence made more important discoveries...
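As a sketch of the kind of objective function this abstract refers to, the expected informativeness of a candidate measurement can be written as the expected reduction in posterior entropy over the model parameters. The notation below is generic, not necessarily the paper's own:

```latex
% Expected information gain of querying input x (generic notation):
% current posterior entropy minus the expected posterior entropy
% after observing the outcome y at x.
\[
  \mathrm{EIG}(x) \;=\; H\!\bigl[p(\mathbf{w} \mid \mathcal{D})\bigr]
  \;-\; \mathbb{E}_{y \sim p(y \mid x, \mathcal{D})}
        \Bigl[ H\!\bigl[p(\mathbf{w} \mid \mathcal{D} \cup \{(x, y)\})\bigr] \Bigr],
\]
% where H is Shannon entropy and D is the data gathered so far.
% Active selection queries the x that maximizes EIG(x).
```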
Selective sampling using the Query by Committee algorithm
 Machine Learning
, 1997
"... We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the twomember committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the numbe ..."
Abstract

Cited by 429 (7 self)
We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with a positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.
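A minimal sketch of the filtering loop described here, for perceptron learning: two hypotheses are drawn from the version space (crude rejection sampling stands in for a proper Gibbs sampler), and the oracle is queried only when they disagree. All names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_committee(X, y, dim, size=2, n_tries=10_000):
    """Draw `size` weight vectors consistent with the labels seen so far.
    Crude rejection sampling from the version space; illustrative only."""
    committee = []
    for _ in range(n_tries):
        w = rng.standard_normal(dim)
        if not X or all(np.sign(x @ w) == label for x, label in zip(X, y)):
            committee.append(w)
            if len(committee) == size:
                break
    return committee

def query_by_committee(stream, dim, budget=20):
    """Filter a random stream of (x, label) pairs: ask the oracle for a
    label only when the two committee members disagree."""
    X, y = [], []
    for x, oracle_label in stream:            # oracle_label in {-1, +1}
        committee = sample_committee(X, y, dim)
        if len(committee) < 2:
            continue                          # sampling failed; skip input
        preds = {int(np.sign(x @ w)) for w in committee}
        if len(preds) > 1:                    # disagreement => informative
            X.append(x)
            y.append(oracle_label)            # query the oracle
            if len(y) >= budget:
                break
    return X, y
```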
Bayesian Experimental Design: A Review
 Statistical Science
, 1995
"... This paper reviews the literature on Bayesian experimental design, both for linear and nonlinear models. A unified view of the topic is presented by putting experimental design in a decision theoretic framework. This framework justifies many optimality criteria, and opens new possibilities. Various ..."
Abstract

Cited by 310 (1 self)
This paper reviews the literature on Bayesian experimental design, both for linear and nonlinear models. A unified view of the topic is presented by putting experimental design in a decision theoretic framework. This framework justifies many optimality criteria, and opens new possibilities. Various design criteria become part of a single, coherent approach.
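In the decision-theoretic framing the review describes, a design d is scored by its expected utility. A sketch in generic notation:

```latex
% Expected utility of a design d, averaging over the prior on the
% parameters theta and the predictive distribution of the outcome y:
\[
  U(d) \;=\; \iint u(d, y, \theta)\,
       p(y \mid \theta, d)\, p(\theta)\;
       \mathrm{d}y\,\mathrm{d}\theta,
  \qquad
  d^{*} \;=\; \arg\max_{d}\, U(d).
\]
% Different utilities u recover different optimality criteria; e.g.,
% taking u to be Shannon information gain yields Bayesian D-optimality
% in the linear-Gaussian case.
```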
A rational analysis of the selection task as optimal data selection
 Psychological Review
, 1994
"... Human reasoning in hypothesistesting tasks like Wason's (1966, 1968) selection task has been depicted as prone to systematic biases. However, performance on this task has been assessed against a now outmoded falsificationist philosophy of science. Therefore, the experimental data is reassessed ..."
Abstract

Cited by 238 (16 self)
Human reasoning in hypothesis-testing tasks like Wason's (1966, 1968) selection task has been depicted as prone to systematic biases. However, performance on this task has been assessed against a now outmoded falsificationist philosophy of science. Therefore, the experimental data is reassessed in the light of a Bayesian model of optimal data selection in inductive hypothesis testing. The model provides a rational analysis (Anderson, 1990) of the selection task that fits well with people's performance on both abstract and thematic versions of the task. The model suggests that reasoning in these tasks may be rational rather than subject to systematic bias. Over the past 30 years, results in the psychology of reasoning have raised doubts about human rationality. The assumption of human rationality has a long history. Aristotle took the capacity for rational thought to be the defining characteristic of human beings, the capacity that separated us from the animals. Descartes regarded the ability to use language and to reason as the hallmarks of the mental that separated it from the merely physical. Many contemporary philosophers of mind also appeal to a basic principle of rationality in accounting for everyday, folk psychological explanation whereby we explain each other's behavior in terms of our beliefs and desires (Cherniak, 1986; Cohen, 1981; Davidson, 1984; Dennett, 1987; but see Stich, 1990). These philosophers, both ancient and modern, share a common view of rationality: To be rational is to reason according to rules (Brown, 1989). Logic and mathematics provide the normative rules that tell us how we should reason. Rationality therefore seems to demand that the human cognitive system embodies the rules of logic and mathematics. However, results in the psychology of reasoning appear to show that people do not reason according to these rules. In both deductive (Evans, 1982, 1989; ...
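A small sketch of the optimal-data-selection computation behind this analysis: two hypotheses about the rule "if p then q" (dependence vs. independence) are compared, and each card is scored by the expected reduction in uncertainty from turning it. The parameterization below (exceptionless dependence, marginals P(p) = a and P(q) = b) is a simplification for illustration, not the paper's exact model.

```python
import numpy as np

def entropy(ps):
    ps = np.asarray(ps, dtype=float)
    ps = ps[ps > 0]
    return -np.sum(ps * np.log2(ps))

def joint(model, a, b):
    """Joint distribution over the four card types.
    'dep' is an exceptionless 'if p then q'; 'ind' is independence."""
    if model == "dep":
        return {"pq": a, "pnq": 0.0, "npq": b - a, "npnq": 1 - b}
    return {"pq": a * b, "pnq": a * (1 - b),
            "npq": (1 - a) * b, "npnq": (1 - a) * (1 - b)}

# which joint-distribution cells are consistent with each visible face
HIDDEN = {"p": ("pq", "pnq"), "not_p": ("npq", "npnq"),
          "q": ("pq", "npq"), "not_q": ("pnq", "npnq")}

def expected_info_gain(card, a, b, prior=(0.5, 0.5)):
    """Expected entropy reduction over {dep, ind} from turning a card,
    averaged over what might be on its hidden side."""
    models, h_prior, gain = ("dep", "ind"), entropy(prior), 0.0
    for cell in HIDDEN[card]:
        # joint weight of (model, hidden outcome) given the visible face
        likes = []
        for m, pm in zip(models, prior):
            j = joint(m, a, b)
            p_visible = sum(j[c] for c in HIDDEN[card])
            likes.append(pm * j[cell] / p_visible)
        p_out = sum(likes)
        if p_out > 0:
            gain += p_out * (h_prior - entropy([l / p_out for l in likes]))
    return gain

# Under the rarity assumption (p and q uncommon), the p and q cards carry
# more expected information than the logically prescribed not-q card.
for card in ("p", "not_p", "q", "not_q"):
    print(card, round(expected_info_gain(card, a=0.1, b=0.2), 3))
```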
Model Selection and the Principle of Minimum Description Length
 Journal of the American Statistical Association
, 1998
"... This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This ..."
Abstract

Cited by 195 (8 self)
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical and theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate th...
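As a compact statement of the principle being reviewed, the two-part MDL criterion scores a model by the total length of describing the model and then the data given the model (generic notation, not the paper's):

```latex
% Two-part code: total description length of model M plus data D
% given M.
\[
  \mathrm{DL}(M, D) \;=\; L(M) \;+\; L(D \mid M).
\]
% For a model with k free parameters fit to n observations, a standard
% asymptotic form of the description length is
\[
  \mathrm{DL}(M, D) \;\approx\;
    -\log p\bigl(D \mid \hat{\theta}_M\bigr) \;+\; \frac{k}{2}\,\log n,
\]
% recovering a BIC-like penalty on model complexity.
```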
Computer Experiments
, 1996
"... Introduction Deterministic computer simulations of physical phenomena are becoming widely used in science and engineering. Computers are used to describe the flow of air over an airplane wing, combustion of gasses in a flame, behavior of a metal structure under stress, safety of a nuclear reactor, a ..."
Abstract

Cited by 119 (6 self)
Deterministic computer simulations of physical phenomena are becoming widely used in science and engineering. Computers are used to describe the flow of air over an airplane wing, combustion of gases in a flame, behavior of a metal structure under stress, safety of a nuclear reactor, and so on. Some of the most widely used computer models, and the ones that lead us to work in this area, arise in the design of the semiconductors used in the computers themselves. A process simulator starts with a data structure representing an unprocessed piece of silicon and simulates steps such as oxidation, etching and ion injection that produce a semiconductor device such as a transistor. A device simulator takes a description of such a device and simulates the flow of current through it under varying conditions to determine properties of the device such as its switching speed and the critical voltage at which it switches. A circuit simulator takes a list of devices and the ...
Finding the number of clusters in a data set: An information theoretic approach
 Journal of the American Statistical Association
, 2003
"... One of the most difficult problems in cluster analysis is the identification of the number of groups in a data set. Most previously suggested approaches to this problem are either somewhat ad hoc or require parametric assumptions and complicated calculations. In this paper we develop a simple yet po ..."
Abstract

Cited by 90 (1 self)
One of the most difficult problems in cluster analysis is the identification of the number of groups in a data set. Most previously suggested approaches to this problem are either somewhat ad hoc or require parametric assumptions and complicated calculations. In this paper we develop a simple yet powerful nonparametric method for choosing the number of clusters based on distortion, a quantity that measures the average distance, per dimension, between each observation and its closest cluster center. Our technique is computationally efficient and straightforward to implement. We demonstrate empirically its effectiveness, not only for choosing the number of clusters but also for identifying underlying structure, on a wide range of simulated and real-world data sets. In addition, we give a rigorous theoretical justification for the method based on information-theoretic ideas. Specifically, results from the subfield of electrical engineering known as rate distortion theory allow us to describe the behavior of the distortion in both the presence and absence of clustering. Finally, we note that these ideas potentially can be extended to a wide range of other statistical model selection problems.
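A simplified sketch of how such a distortion-based choice can be computed: run k-means for each candidate K, record the average per-dimension distortion, transform it with the rate-distortion exponent, and pick the K with the largest jump. Plain squared Euclidean distortion and k-means centers are used here for brevity; the names and defaults are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def distortion_jump(X, k_max=10, random_state=0):
    """Pick the number of clusters via jumps in transformed distortion,
    a simplified sketch of the distortion idea in the abstract."""
    n, p = X.shape
    power = p / 2.0                         # rate-distortion exponent
    transformed = [0.0]                     # convention: d_0^{-p/2} = 0
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        d_k = km.inertia_ / (n * p)         # average distortion per dimension
        transformed.append(d_k ** (-power))
    jumps = np.diff(transformed)            # J_k = d_k^{-p/2} - d_{k-1}^{-p/2}
    return int(np.argmax(jumps)) + 1        # K with the largest jump
```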
Rational explanation of the selection task
 Psychological Review
, 1996
"... M. Oaksford and N. Chater (O&C; 1994) presented the first quantitative model of P. C. Wason's ( 1966, 1968) selection task in.which performance is rational. J. St B T Evans and D. E. Over (1996) reply that O&C's account is normatively incorrect and cannot model K. N. Kirby's ( ..."
Abstract

Cited by 60 (7 self)
M. Oaksford and N. Chater (O&C; 1994) presented the first quantitative model of P. C. Wason's (1966, 1968) selection task in which performance is rational. J. St. B. T. Evans and D. E. Over (1996) reply that O&C's account is normatively incorrect and cannot model K. N. Kirby's (1994b) or P. Pollard and J. St. B. T. Evans's (1983) data. It is argued that an equivalent measure satisfies their normative concerns and that a modification of O&C's model accounts for their empirical concerns. D. Laming (1996) argues that O&C made unjustifiable psychological assumptions and that a "correct" Bayesian analysis agrees with logic. It is argued that O&C's model makes normative and psychological sense and that Laming's analysis is not Bayesian. A. Almor and S. A. Sloman (1996) argue that O&C cannot explain their data. It is argued that Almor and Sloman's data do not bear on O&C's model because they alter the nature of the task. It is concluded that O&C's model remains the most compelling and comprehensive account of the selection task. Research on Wason's (1966, 1968) selection task questions human rationality because performance is not "logically correct." Recently, Oaksford and Chater (O&C; 1994) provided a rational analysis (Anderson, 1990, 1991) of the selection task that appeared to vindicate human rationality. O&C argued that the selection task is an inductive, rather than a deductive, reasoning task: Participants must assess the truth or falsity of a general rule from specific instances. In particular, participants face a problem of optimal data selection (Lindley, 1956): They must decide which of four cards (p, not-p, q, or not-q) is likely to provide the most useful data to test a conditional rule, if p then q. The "logical" solution is to select the p and the not-q cards. O&C argued that this solution presupposes falsificationism (Popper, 1959), which holds that only data that can disconfirm, not confirm, hypotheses are of interest. In contrast, O&C's rational analysis uses a Bayesian approach to inductive ...
Active Learning from Crowds
"... Obtaining labels can be expensive or timeconsuming, but unlabeled data is often abundant and easier to obtain. Most learning tasks can be made more efficient, in terms of labeling cost, by intelligently choosing specific unlabeled instances to be labeled by an oracle. The general problem of optimall ..."
Abstract

Cited by 46 (2 self)
Obtaining labels can be expensive or time-consuming, but unlabeled data is often abundant and easier to obtain. Most learning tasks can be made more efficient, in terms of labeling cost, by intelligently choosing specific unlabeled instances to be labeled by an oracle. The general problem of optimally choosing these instances is known as active learning. As it is usually set in the context of supervised learning, active learning relies on a single oracle playing the role of a teacher. We focus on the multiple-annotator scenario where an oracle, who knows the ground truth, no longer exists; instead, multiple labelers, with varying expertise, are available for querying. This paradigm poses new challenges to the active learning scenario. We can now ask which data sample should be labeled next and which annotator should be queried to benefit our learning model the most. In this paper, we employ a probabilistic model for learning from multiple annotators that can also learn the annotator expertise even when their expertise may not be consistently accurate across the task domain. We then focus on providing a criterion and formulation that allows us to select both a sample and the annotator(s) to query the labels from.
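A deliberately simplified stand-in for the selection step described above (not the paper's probabilistic model): score each (instance, annotator) pair by model uncertainty on the instance times the estimated accuracy of that annotator on that instance, then query the best pair.

```python
import numpy as np

def select_instance_and_annotator(proba, annotator_acc):
    """Jointly pick which unlabeled sample to query and which annotator
    to ask; a simplified illustration, not the paper's model.

    proba         : (n_unlabeled,) current model's P(y=1|x) per candidate
    annotator_acc : (n_unlabeled, n_annotators) estimated accuracy of each
                    annotator on each candidate (expertise may vary across
                    the task domain, as in the abstract)
    """
    uncertainty = 1.0 - np.abs(proba - 0.5) * 2     # peaks at P = 0.5
    # value of a query ~ how uncertain the model is on the instance times
    # how reliable the chosen annotator would be on that instance
    score = uncertainty[:, None] * annotator_acc
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return i, j   # query annotator j for the label of instance i
```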
Sequential optimal design of neurophysiology experiments
, 2008
"... Adaptively optimizing experiments has the potential to significantly reduce the number of trials needed to build parametric statistical models of neural systems. However, application of adaptive methods to neurophysiology has been limited by severe computational challenges. Since most neurons are hi ..."
Abstract

Cited by 39 (7 self)
Adaptively optimizing experiments has the potential to significantly reduce the number of trials needed to build parametric statistical models of neural systems. However, application of adaptive methods to neurophysiology has been limited by severe computational challenges. Since most neurons are high-dimensional systems, optimizing neurophysiology experiments requires computing high-dimensional integrations and optimizations in real time. Here we present a fast algorithm for choosing the most informative stimulus by maximizing the mutual information between the data and the unknown parameters of a generalized linear model (GLM) which we want to fit to the neuron's activity. We rely on important log-concavity and asymptotic normality properties of the posterior to facilitate the required computations. Our algorithm requires only low-rank matrix manipulations and a 2-dimensional search to choose the optimal stimulus. The average running time of these operations scales quadratically with the dimensionality of the GLM, making real-time adaptive experimental design feasible even for high-dimensional stimulus and parameter spaces. For example, we ...
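A sketch of why this computation stays cheap, under a Gaussian approximation to the GLM posterior. The notation is generic and the specific form is a reconstruction, not necessarily the paper's exact objective; eta stands in for the GLM's per-stimulus Fisher information:

```latex
% Informativeness of a candidate stimulus x under a Gaussian
% approximation N(mu_t, C_t) to the GLM posterior:
\[
  I(x) \;\approx\; \tfrac{1}{2}\,
    \log\!\bigl(1 + \eta(x)\, x^{\top} C_t\, x\bigr),
\]
% where eta(x) > 0 collects the GLM's expected Fisher information
% at x. After the response is observed, the posterior covariance
% admits a rank-one update,
\[
  C_{t+1} \;=\; C_t \;-\;
    \frac{\eta\, C_t\, x\, x^{\top} C_t}{1 + \eta\, x^{\top} C_t\, x},
\]
% so each trial costs only low-rank matrix work, which is what makes
% real-time adaptive design feasible.
```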