Results 1-10 of 119
Ensemble Methods in Machine Learning
 Multiple Classifier Systems, LNCS 1857
 , 2000
Abstract

Cited by 426 (3 self)
Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly.
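The (weighted) voting scheme this abstract describes can be sketched in a few lines. The threshold "classifiers" and their weights below are toy stand-ins for illustration, not anything from the paper:

```python
from collections import defaultdict

def weighted_vote(classifiers, weights, x):
    """Classify x by a weighted vote over the ensemble members.
    Each classifier is any callable returning a class label."""
    tally = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        tally[clf(x)] += w
    return max(tally, key=tally.get)

# Three toy "classifiers": threshold rules on a scalar input.
clfs = [lambda x: int(x > 0.4), lambda x: int(x > 0.5), lambda x: int(x > 0.6)]
print(weighted_vote(clfs, [1.0, 1.0, 1.0], 0.55))  # two of three vote 1 -> 1
```

With unequal weights the same input can flip class, which is exactly why boosting-style weightings matter: a heavily weighted member can outvote the rest.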
Simard, P.Y., Steinkraus, D., Platt, J.C.: Best practices for convolutional neural networks applied to visual document analysis
 In: Int’l Conference on Document Analysis and Recognition
, 2003
Abstract

Cited by 108 (7 self)
Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple "do-it-yourself" implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even fine-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.
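The training-set expansion the abstract emphasizes can be illustrated with a minimal sketch. The paper uses elastic distortions; as a crude stand-in, the hypothetical `augment` below expands a set of images with small random translations:

```python
import numpy as np

def shift(img, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels, zero-filling the border."""
    h, w = img.shape
    out = np.zeros_like(img)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def augment(images, labels, rng, copies=3, max_shift=2):
    """Return the originals plus `copies` randomly shifted versions of each,
    with labels duplicated accordingly (a stand-in for elastic distortion)."""
    out_x, out_y = list(images), list(labels)
    for img, lab in zip(images, labels):
        for _ in range(copies):
            dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
            out_x.append(shift(img, int(dy), int(dx)))
            out_y.append(lab)
    return np.stack(out_x), np.array(out_y)

digit = np.arange(16.0).reshape(4, 4)  # toy 4x4 "image"
xs, ys = augment([digit], [7], np.random.default_rng(0))
print(xs.shape)  # (4, 4, 4): the original plus three distorted copies
```

Each distorted copy keeps its original label, so the effective training set grows by a factor of `copies + 1` at no labeling cost.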
A nonparametric approach to pricing and hedging derivative securities via learning networks
 Journal of Finance
, 1994
Cited by 104 (4 self)
Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
, 1993
Efficient Agnostic Learning of Neural Networks with Bounded Fan-in
, 1996
Abstract

Cited by 68 (18 self)
We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations and the learner is required to approximate the neural network which minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts, and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights.
On Learning the Derivatives of an Unknown Mapping with Multilayer Feedforward Networks
, 1989
Abstract

Cited by 64 (7 self)
Recently, multiple-input, single-output, single-hidden-layer feedforward neural networks have been shown to be capable of approximating a nonlinear map and its partial derivatives. Specifically, neural nets have been shown to be dense in various Sobolev spaces (Hornik, Stinchcombe and White, 1989). Building upon this result, we show that a net can be trained so that the map and its derivatives are learned. Specifically, we use a result of Gallant (1987b) to show that least squares and similar estimates are strongly consistent in Sobolev norm provided the number of hidden units and the size of the training set increase together. We illustrate these results by an application to the inverse problem of chaotic dynamics: recovery of a nonlinear map from a time series of iterates. These results extend automatically to nets that embed the single-hidden-layer feedforward network as a special case.
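The derivative-approximation property at the heart of this abstract is easy to make concrete: for a single-hidden-layer tanh network, the input derivative has a closed form by the chain rule, and it can be checked against a finite difference. The weights below are random placeholders, not a trained net:

```python
import numpy as np

rng = np.random.default_rng(1)
w, b, v = rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)

def f(x):
    """Single-hidden-layer tanh network, scalar input and output:
    f(x) = sum_j v_j * tanh(w_j * x + b_j)."""
    return np.dot(v, np.tanh(w * x + b))

def df(x):
    """Exact input derivative by the chain rule:
    f'(x) = sum_j v_j * w_j * (1 - tanh(w_j * x + b_j)^2)."""
    return np.dot(v * w, 1.0 - np.tanh(w * x + b) ** 2)

# The analytic derivative agrees with a central finite difference.
x, h = 0.3, 1e-5
fd = (f(x + h) - f(x - h)) / (2 * h)
print(abs(df(x) - fd) < 1e-6)  # True
```

The paper's point is stronger than this sanity check: training can make both `f` and `df` converge to an unknown map and its derivatives (consistency in Sobolev norm), not merely agree with each other.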
The Design and Evolution of Modular Neural Network Architectures
 Neural Networks
, 1994
Abstract

Cited by 49 (0 self)
To investigate the relations between structure and function in both artificial and natural neural networks, we present a series of simulations and analyses with modular neural networks. We suggest a number of design principles in the form of explicit ways in which neural modules can cooperate in recognition tasks. These results may supplement recent accounts of the relation between structure and function in the brain. The networks used consist of several modules, standard subnetworks that serve as higher-order units with a distinct structure and function. The simulations rely on a particular network module called CALM (Murre, Phaf, and Wolters, 1989, 1992). This module, developed mainly for unsupervised categorization and learning, is able to adjust its local learning dynamics. The way in which modules are interconnected is an important determinant of the learning and categorization behaviour of the network as a whole. Based on arguments derived from neuroscience, psychology, compu...
Approximation theory of the MLP model in neural networks
 ACTA NUMERICA
, 1999
Abstract

Cited by 39 (3 self)
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model. In the first
Approximating a Function and its Derivatives Using MSE-Optimal Linear Combinations of Trained Feedforward Neural Networks
 In Proceedings of the Joint Conference on Neural Networks
, 1993
Abstract

Cited by 23 (2 self)
In this paper, we show that using MSE-optimal linear combinations of a set of trained feedforward networks may significantly improve the accuracy of approximating a function and its first and second order derivatives. Our results are compared to the accuracies achieved by the single best network and by the simple averaging of the outputs of the trained networks.

1 Introduction

Feedforward neural networks (FNN) are widely used for function approximation. They are considered universal approximators capable of approximating an unknown mapping and its derivatives arbitrarily well (Hornik et al. 1990). Approximating the derivatives, that is, the derivatives of the output with respect to the inputs, is of significant importance in many applications. For example, in process optimization, the first and second order derivatives obtained from a neural network, which was trained on the process response, may be used in approximating the gradient vector and the Hessian matrix of the process response...
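The core idea of an MSE-optimal linear combination can be sketched directly: regress the target on the member outputs by ordinary least squares. The "trained network" outputs below are simulated as noisy, biased copies of the target, a hypothetical stand-in for real ensemble members:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * x)

# Stand-ins for the outputs of three trained networks: noisy,
# slightly biased copies of the target (simulated, not real nets).
preds = np.stack([target + rng.normal(0.0, 0.1, x.size) + c
                  for c in (-0.05, 0.02, 0.08)], axis=1)

# MSE-optimal combination weights: least squares of the target on
# the member outputs; the column of ones adds a bias term.
A = np.column_stack([preds, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
combo = A @ coef

mse = lambda y: np.mean((y - target) ** 2)
print(mse(combo) < min(mse(preds[:, i]) for i in range(3)))  # True
```

Because the single best member and the simple average are both linear combinations the least-squares fit could have chosen, the optimal combination can never do worse than either on the fitting data, which is the comparison the abstract draws.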