Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems
IEEE Transactions on Neural Networks, 1997
Cited by 47 (2 self)
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented.
Keywords: constructive algorithm, structure learning, state space search, dynamic node creation, projection pursuit regression, cascade-correlation, resource-allocating network, group method of data handling.
I. Introduction. A. Problems with Fixed Size Networks. In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
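The "start small, then add hidden units until the fit is satisfactory" idea from the abstract above can be sketched in a few lines. This is only an illustrative scheme, not any specific algorithm from the survey: the names `grow_network` and `predict`, the random choice of each new unit's input weight and threshold, and the rule of fitting only the new unit's output weight to the current residual are all assumptions made for the example.

```python
import math
import random

def grow_network(xs, ys, max_units=30, tol=1e-3, seed=0):
    """Add tanh hidden units one at a time until the training MSE drops to tol.

    Hypothetical sketch: each new unit gets random input parameters (an
    assumption), and only its output weight is fitted, by least squares,
    to the residual left by the units already in the network.
    """
    rng = random.Random(seed)
    n = len(xs)
    bias = sum(ys) / n                       # smallest network: a constant output
    residual = [y - bias for y in ys]
    units = []                               # each unit is (w, b, beta)
    history = [sum(r * r for r in residual) / n]
    while len(units) < max_units and history[-1] > tol:
        w, b = rng.uniform(-4, 4), rng.uniform(-4, 4)
        h = [math.tanh(w * x + b) for x in xs]
        hh = sum(v * v for v in h)
        if hh == 0.0:                        # degenerate candidate, draw another
            continue
        # Least-squares output weight for the new unit against the residual.
        beta = sum(r * v for r, v in zip(residual, h)) / hh
        residual = [r - beta * v for r, v in zip(residual, h)]
        units.append((w, b, beta))
        history.append(sum(r * r for r in residual) / n)
    return bias, units, history

def predict(bias, units, x):
    """Evaluate the grown network at a scalar input x."""
    return bias + sum(beta * math.tanh(w * x + b) for w, b, beta in units)
```

Because each output weight is the least-squares fit to the current residual, the training error recorded in `history` cannot increase when a unit is added; a real constructive method would also train the new unit's input weights and use a validation criterion to decide when to stop.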
Large Sample Sieve Estimation of Semi-Nonparametric Models
Handbook of Econometrics, 2007
Cited by 46 (11 self)
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method of sieves provides one way to tackle such complexities by optimizing an empirical criterion function over a sequence of approximating parameter spaces, called sieves, which are significantly less complex than the original parameter space. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated econometric models. For example, it can simultaneously estimate the parametric and nonparametric components in semi-nonparametric models with or without constraints. It can easily incorporate prior information, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and non-negativity. This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite dimensional parameters. Examples are used to illustrate the general results.
Uniqueness of the Weights for Minimal Feedforward Nets with a Given Input-Output Map
1992
Cited by 44 (2 self)
We show that, for feedforward nets with a single hidden layer, a single output node, and a "transfer function" Tanh s, the net is uniquely determined by its input-output map, up to an obvious finite group of symmetries (permutations of the hidden nodes, and changing the sign of all the weights associated to a particular hidden node), provided that the net is irreducible, i.e. that there does not exist an inner node that makes a zero contribution to the output, and there is no pair of hidden nodes that could be collapsed to a single node without altering the input-output map.
Rutgers Center for Systems and Control, May 1991. Revised October 1991. Research supported in part by the Air Force Office of Scientific Research (AFOSR-91-0343). The author thanks Eduardo Sontag for suggesting the problem and for his helpful comments and ideas, and an anonymous referee for suggesting how to improve the exposition at several points. Requests for reprints should be sent to Héctor J. Sussmann, Departme...
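The two symmetries named in the abstract above (permuting hidden nodes, and flipping the sign of every weight attached to one hidden node) can be checked numerically. The sketch below assumes a scalar input and hypothetical helper names (`net`, `flip_sign`, `permute`); it only illustrates that these transformations leave the input-output map unchanged, and says nothing about the uniqueness result itself.

```python
import math

def net(x, hidden, out_bias=0.0):
    """Single-hidden-layer tanh net; hidden is a list of (w, b, c) triples:
    input weight w, threshold b, output weight c."""
    return out_bias + sum(c * math.tanh(w * x + b) for w, b, c in hidden)

def flip_sign(hidden, j):
    """Negate every weight associated with hidden node j.
    Since tanh is odd, c * tanh(w*x + b) == (-c) * tanh(-w*x - b)."""
    new = list(hidden)
    w, b, c = new[j]
    new[j] = (-w, -b, -c)
    return new

def permute(hidden, order):
    """Reorder the hidden nodes; the output sum does not depend on order."""
    return [hidden[i] for i in order]

# Arbitrary illustrative weights (not taken from the paper).
hidden = [(1.5, -0.3, 0.8), (-0.7, 0.2, -1.1), (2.0, 1.0, 0.4)]
transformed = permute(flip_sign(hidden, 1), [2, 0, 1])
```

Evaluating `net(x, hidden)` and `net(x, transformed)` at any `x` gives the same value up to floating-point rounding, even though the two weight lists differ.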
On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions
Neural Computation, 1996
Cited by 42 (6 self)
Feedforward networks are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of generalization of network performance from a finite training set to unseen data is clearly of crucial importance. In this article we first show that the generalization error can be decomposed into two terms: the approximation error, due to the insufficient representational capacity of a finite sized network, and the estimation error, due to insufficient information about the target function because of the finite number of samples. We then consider the problem of approximating functions belonging to certain Sobolev spaces with Gaussian Radial Basis Functions. Using the above-mentioned decomposition we bound the generalization error in terms of the number of basis functions and number of examples. While the bound that we derive is specific to Radial Basis Functions, a number of observations deriving from it apply to any approximation t...
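The two-term decomposition described in the abstract above can be written as a triangle inequality. The notation is assumed here for illustration and may differ from the paper's: $f_0$ is the target function, $f_n$ is the best approximation attainable by a network with $n$ basis functions, and $\hat f_{n,l}$ is the network actually fitted from $l$ examples.

```latex
\| f_0 - \hat f_{n,l} \|
  \;\le\;
  \underbrace{\| f_0 - f_n \|}_{\text{approximation error}}
  \;+\;
  \underbrace{\| f_n - \hat f_{n,l} \|}_{\text{estimation error}}
```

Under this reading, the first term depends only on the network size $n$, while the second depends on both $n$ and the number of examples $l$.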
Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference
Machine Learning, 2001
Cited by 40 (0 self)
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for th...
Using Wavelet Network in Nonparametric Estimation
IEEE Transactions on Neural Networks, 1994
Cited by 40 (1 self)
In this paper an approach is proposed for using wavelets in nonparametric regression estimation. The proposed nonparametric estimator, named wavelet network, has a neural-network-like structure, but consists of wavelets. It makes use of techniques of regressor selection completed with backpropagation procedures. It is capable of handling nonlinear regressions of moderately large input dimension with sparse training data. Numerical examples are reported to illustrate the performance of the proposed approach.
Neural Networks for Optimal Approximation of Smooth and Analytic Functions
Neural Computation, 1996
Cited by 39 (5 self)
We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation function evaluated by each principal element satisfies certain technical conditions. Under these conditions, it is also possible to construct networks that provide a geometric order of approximation for analytic target functions. The permissible activation functions include the squashing function $(1 + e^{-x})^{-1}$ as well as a variety of radial basis functions. Our proofs are constructive. The weights and thresholds of our networks are chosen independently of the target function; we give explicit formulas for the coefficients as simple, continuous, linear functionals of the target function.
1. Introduction. In recent years, there has been a great deal of research in the theory of approximation of real-valued functions using artificial neural networks with one or more hidden layers, with each pr...
Covering Number Bounds of Certain Regularized Linear Function Classes
Journal of Machine Learning Research, 2002
Cited by 37 (3 self)
Recently, sample complexity bounds have been derived for problems involving linear functions such as neural networks and support vector machines. In many of these theoretical studies, the concept of covering numbers played an important role. It is thus useful to study covering numbers for linear function classes. In this paper, we investigate two closely related methods to derive upper bounds on these covering numbers. The first method, already employed in some earlier studies, relies on the so-called Maurey's lemma; the second method uses techniques from the mistake bound framework in online learning. We compare results from these two methods, as well as their consequences in some learning formulations.
On Selecting Models for Nonlinear Time Series
Physica D, 1995
Cited by 36 (11 self)
Constructing models from time series with nontrivial dynamics involves the problem of how to choose the best model from within a class of models, or to choose between competing classes. This paper discusses a method of building nonlinear models of possibly chaotic systems from data, while maintaining good robustness against noise. The models that are built are close to the simplest possible according to a description length criterion. The method will deliver a linear model if that has shorter description length than a nonlinear model. We show how our models can be used for prediction, smoothing and interpolation in the usual way. We also show how to apply the results to identification of chaos by detecting the presence of homoclinic orbits directly from time series.
1 The Model Selection Problem. As our understanding of chaotic and other nonlinear phenomena has grown, it has become apparent that linear models are inadequate to model most dynamical processes. Nevertheless, linear models...
On the Rate of Convergence of Regularized Boosting Classifiers
Journal of Machine Learning Research, 2003
Cited by 36 (8 self)
A regularized boosting method is introduced, for which regularization is obtained through a penalization function. It is shown through oracle inequalities that this method is model adaptive. The rate of convergence of the probability of misclassification is investigated. It is shown that for quite a large class of distributions, the probability of error converges to the Bayes risk at a rate faster than $n^{-(V+2)/(4(V+1))}$ where V is the VC dimension of the "base" class whose elements are combined by boosting methods to obtain an aggregated classifier. The dimension-independent nature of the rates may partially explain the good behavior of these methods in practical problems. Under Tsybakov's noise condition the rate of convergence is even faster. We investigate the conditions necessary to obtain such rates for different base classes. The special case of boosting using decision stumps is studied in detail. We characterize the class of classifiers realizable by aggregating decision stumps.
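To make the exponent in the rate above concrete, the short sketch below tabulates $(V+2)/(4(V+1))$ for a few values of the VC dimension V. The function name `rate_exponent` and the chosen values of V are illustrative assumptions; the arithmetic is just the formula quoted in the abstract.

```python
from fractions import Fraction

def rate_exponent(V):
    """Exponent e in the rate n**(-e), with e = (V + 2) / (4 * (V + 1))."""
    return Fraction(V + 2, 4 * (V + 1))

# Tabulate the exponent for a few illustrative VC dimensions.
for V in (1, 2, 5, 10, 100):
    e = rate_exponent(V)
    print(f"V={V:>3}: exponent = {e} ~ {float(e):.4f}")
```

Since $(V+2)/(4(V+1)) = 1/4 + 1/(4(V+1))$, the exponent decreases from 3/8 at V = 1 toward 1/4 as V grows, so it never falls below 1/4 whatever the VC dimension of the base class.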

