Abstract:
For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.