## Neural network exploration using optimal experiment design (1994)


Venue: Neural Networks

Citations: 125 (2 self)

### BibTeX

@INPROCEEDINGS{Cohn94neuralnetwork,
  author = {David A. Cohn},
  title = {Neural network exploration using optimal experiment design},
  booktitle = {Neural Networks},
  year = {1994},
  pages = {679--686},
  publisher = {Morgan Kaufmann}
}

### Abstract

We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely. We conclude that, while not a panacea, OED-based query/action selection has much to offer, especially in domains where its high computational costs can be tolerated.

### Citations

626 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1987 |

Citation context: ... are compared against algorithms learning from randomly chosen examples. In general, the number of randomly chosen examples needed to achieve an expected error of no more than $\epsilon$ scales as $O\left(\frac{1}{\epsilon}\log\frac{1}{\epsilon}\right)$ [Blumer et al., 1989; Baum and Haussler, 1989; Cohn and Tesauro, 1992; Haussler, 1992]. In some situations, active selection of training examples can reduce the sample complexity to $O\left(\log\frac{1}{\epsilon}\right)$, although worst case bounds ...

612 | Neural networks and the bias/variance dilemma - Geman, Bienenstock, et al. - 1992 |

529 | Active learning with statistical models - Cohn, Ghahramani, et al. - 1996 |

Citation context: ...y more tractable than feedforward neural networks. We are currently pursuing the application of optimal experiment design techniques to these models and have observed encouraging preliminary results [Cohn et al., 1994]. 6.2 Active elimination of bias Regardless of which learning architecture is used, the results in Section 4.2 make it clear that minimizing variance alone is not enough. For large, data-poor problem...

469 | Mixture Models: Inference & Applications to Clustering - McLachlan, Basford - 1988 |

Citation context: ...se problems may lie in selection of a more amenable architecture and learning algorithm. Two such architectures, in which output variances have a direct role in estimation, are mixtures of Gaussians [McLachlan and Basford, 1988; Nowlan, 1991; Ghahramani and Jordan, 1994] and locally weighted regression [Cleveland et al., 1988; Schaal and Atkeson, 1994]. Both have excellent statistical modeling properties, and are computatio...

359 | Theory of optimal experiments - Fedorov - 1972 |

326 | Information-based objective functions for active data selection - MacKay, J |

317 | What size net gives valid generalization? - Baum, Haussler - 1989 |

Citation context: ... algorithms learning from randomly chosen examples. In general, the number of randomly chosen examples needed to achieve an expected error of no more than $\epsilon$ scales as $O\left(\frac{1}{\epsilon}\log\frac{1}{\epsilon}\right)$ [Blumer et al., 1989; Baum and Haussler, 1989; Cohn and Tesauro, 1992; Haussler, 1992]. In some situations, active selection of training examples can reduce the sample complexity to $O\left(\log\frac{1}{\epsilon}\right)$, although worst case bounds for unconstrained queryi...

299 | Forward models: Supervised learning with a distal teacher - Jordan, Rumelhart - 1992 |

224 | The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces - Moore, Atkeson - 1995 |

Citation context: ...te spaces. Continuous state and action spaces must be accommodated either through arbitrary discretization or through some form of on-line partitioning strategy, such as Moore's Parti-Game algorithm [Moore, 1994]. The OED-based approach discussed in this paper is, by nature, applicable to domains with both continuous state and action spaces. 3 Data selection according to OED In this section, we review the th...

187 | Supervised learning from incomplete data via an EM approach - Ghahramani, Jordan - 1994 |

Citation context: ...amenable architecture and learning algorithm. Two such architectures, in which output variances have a direct role in estimation, are mixtures of Gaussians [McLachlan and Basford, 1988; Nowlan, 1991; Ghahramani and Jordan, 1994] and locally weighted regression [Cleveland et al., 1988; Schaal and Atkeson, 1994]. Both have excellent statistical modeling properties, and are computationally more tractable than feedforward neura...

177 | Introduction to Robotics - Craig - 1989 |

142 | Optimum Experimental Designs - Atkinson, Donev - 1992 |

92 | Robot Juggling: Implementation of Memory-Based Learning - Schaal, Atkeson - 1994 |

Citation context: ...ber of steps. In general, though, we will have to resort to some online process to decide "what to try next." Some successful heuristic exploration strategies include trying to visit unvisited states [Schaal and Atkeson, 1994], trying to visit places where we perform poorly [Linden and Weber, 1993], taking actions that improved our performance in similar situations [Schmidhuber and Storck, 1993], or maintaining a heuristi...

81 | Soft Competitive Adaptation: Neural Network Learning Algorithms Based on Fitting Statistical Mixtures - Nowlan - 1991 |

Citation context: ...ion of a more amenable architecture and learning algorithm. Two such architectures, in which output variances have a direct role in estimation, are mixtures of Gaussians [McLachlan and Basford, 1988; Nowlan, 1991; Ghahramani and Jordan, 1994] and locally weighted regression [Cleveland et al., 1988; Schaal and Atkeson, 1994]. Both have excellent statistical modeling properties, and are computationally more tra...

72 | Fast exact multiplication by the Hessian - Pearlmutter - 1994 |

69 | Training connectionist networks with queries and selective sampling - Atlas, Cohn, et al. - 1989 |

Citation context: ...nty and error. 1.1 Active learning Exploiting the active component of learning typically leads to improved generalization, usually at the cost of additional computation (see Figure 1) [Angluin, 1982; Cohn et al., 1990; Hwang et al., 1991]. There are two common situations where this tradeoff is desirable: In many situations the cost of taking an action outweighs the cost of the computation required to incorporate n...

69 | Elements of Statistical Computing - Thisted - 1988 |

60 | Active exploration in dynamic environments - Thrun, Moller |

Citation context: ...places where we perform poorly [Linden and Weber, 1993], taking actions that improved our performance in similar situations [Schmidhuber and Storck, 1993], or maintaining a heuristic "confidence map" [Thrun and Moller, 1992]. Some researchers, in cases where the exploration is considered a secondary problem, provide the learner with a uniformly distributed training set, in effect assuming the problem allows unconstraine...

55 | A note on the number of queries needed to identify regular languages - Angluin - 1981 |

Citation context: ...ze its uncertainty and error. 1.1 Active learning Exploiting the active component of learning typically leads to improved generalization, usually at the cost of additional computation (see Figure 1) [Angluin, 1982; Cohn et al., 1990; Hwang et al., 1991]. There are two common situations where this tradeoff is desirable: In many situations the cost of taking an action outweighs the cost of the computation requir...

55 | Neural model of adaptive hand-eye coordination for single postures - Kuperstein - 1988 |

48 | Selecting concise training sets from clean data - Plutowski, White - 1993 |

47 | Reinforcement driven information acquisition in non-deterministic environments - Storck, Hochreiter, et al. - 1995 |

Citation context: ...ying to visit unvisited states [Schaal and Atkeson, 1994], trying to visit places where we perform poorly [Linden and Weber, 1993], taking actions that improved our performance in similar situations [Schmidhuber and Storck, 1993], or maintaining a heuristic "confidence map" [Thrun and Moller, 1992]. Some researchers, in cases where the exploration is considered a secondary problem, provide the learner with a uniformly distrib...

40 | Regression by Local Fitting - Cleveland, Devlin, et al. - 1988 |

Citation context: ...ctures, in which output variances have a direct role in estimation, are mixtures of Gaussians [McLachlan and Basford, 1988; Nowlan, 1991; Ghahramani and Jordan, 1994] and locally weighted regression [Cleveland et al., 1988; Schaal and Atkeson, 1994]. Both have excellent statistical modeling properties, and are computationally more tractable than feedforward neural networks. We are currently pursuing the application of ...

31 | Constructing hidden units using examples and queries - Baum, Lang - 1991 |

Citation context: ...scussion of the results and implications for future work. Footnote 1: In some cases active selection of training data can sharply reduce worst case computational complexity from NP-complete to polynomial time [Baum and Lang, 1991], and in special cases to linear time. [Figure 1 labels: novel input; training set; learning algorithm; network weights; final network; passive learning] ...

27 | Connectionist robot motion planning: A neurally-inspired approach to visually-guided reaching - Mel - 1990 |

26 | On the Sample Complexity of Pac-Learning Using Random and Chosen Examples. Submitted to Third Workshop on Computational Learning Theory - Eisenberg, Rivest - 1990 |

21 | Generalizing the PAC Model for Neural Net and Other Learning Applications - Haussler - 1989 |

Citation context: ...es. In general, the number of randomly chosen examples needed to achieve an expected error of no more than $\epsilon$ scales as $O\left(\frac{1}{\epsilon}\log\frac{1}{\epsilon}\right)$ [Blumer et al., 1989; Baum and Haussler, 1989; Cohn and Tesauro, 1992; Haussler, 1992]. In some situations, active selection of training examples can reduce the sample complexity to $O\left(\log\frac{1}{\epsilon}\right)$, although worst case bounds for unconstrained querying are no better than those for choosing...

15 | Query construction, entropy and generalization in neural network models - Sollich - 1994 |

15 | On finding "exciting" trajectories for identification experiments involving systems with non-linear dynamics - Armstrong - 1987 |

Citation context: ... the most information out of a limited number of steps. Manually designing such trajectories is a slow process, and intuitively "good" trajectories often fail to sufficiently explore the state space [Armstrong, 1989]. In this paper I discuss another alternative for exploration: automatic, incremental generation of training trajectories using results from "optimal experiment design." The study of optimal experime...

10 | Evolutionary Operations - Box, Draper - 1969 |

Citation context: ...ing problems, we are not interested in the entire mapping $X \to Y$, but in finding the $x$ that maximizes $y$. In this case, we may rely on the broad literature of optimization and response surface techniques [Box and Draper, 1969]. In other learning problems there may be additional constraints that must be considered, such as the need to avoid "failure" states. If the learner is required to perform as it learns (e.g. in a control...

9 | Implementing inner drive by competence reflection - Linden, Weber - 1993 |

Citation context: ...ess to decide "what to try next." Some successful heuristic exploration strategies include trying to visit unvisited states [Schaal and Atkeson, 1994], trying to visit places where we perform poorly [Linden and Weber, 1993], taking actions that improved our performance in similar situations [Schmidhuber and Storck, 1993], or maintaining a heuristic "confidence map" [Thrun and Moller, 1992]. Some researchers, in cases w...

8 | The Xerion Neural Network Simulator - Camp, Plate, et al. - 1991 |

6 | Bootstrap methods in neural network time series prediction - Connor - 1993 |

3 | On finding 'exciting' trajectories for identification experiments involving systems with nonlinear dynamics - Armstrong - 1987 |

3 | Regression by local fitting - Cleveland, Devlin, et al. - 1988 |

Citation context: ...ctures, in which output variances have a direct role in estimation, are mixtures of Gaussians [McLachlan and Basford, 1988; Nowlan, 1991; Ghahramani and Jordan, 1994] and locally weighted regression [Cleveland et al., 1988; Schaal and Atkeson, 1994]. Both have excellent statistical modeling properties, and are computationally more tractable than feedforward neural networks. We are currently pursuing the application of ...

2 | Implementing inner drive by competence reflection - Linden, Weber - 1993 |

Citation context: ...ess to decide "what to try next." Some successful heuristic exploration strategies include trying to visit unvisited states [Schaal and Atkeson, 1994], trying to visit places where we perform poorly [Linden and Weber, 1993], taking actions that improved our performance in similar situations [Schmidhuber and Storck, 1993], or maintaining a heuristic "confidence map" [Thrun and Moller, 1992]. Some researchers, in cases wh...

1 | Doctoral dissertation, in preparation - Choueiki - 1994 |