Results 11–20 of 300
Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems
 IEEE Transactions on Neural Networks
, 1997
Abstract
Cited by 66 (2 self)
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented.
Keywords: constructive algorithm, structure learning, state space search, dynamic node creation, projection pursuit regression, cascade-correlation, resource-allocating network, group method of data handling.
I. Introduction
A. Problems with Fixed-Size Networks
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
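The grow-until-satisfactory loop that this abstract describes can be sketched in a few lines of numpy. This is a deliberately simplified illustration, not any particular surveyed algorithm: input weights of new hidden units are drawn at random rather than trained, and only the output weights are refit (by least squares) after each addition; the names `grow_network`, `tol`, and `max_units` are ours.

```python
import numpy as np

def grow_network(X, y, max_units=50, tol=1e-3, seed=None):
    """Constructive sketch: add tanh hidden units one at a time.

    Simplifications vs. the surveyed algorithms: input weights are drawn
    at random rather than trained, and only the output weights are refit
    (by least squares) after each addition. Growth stops once the
    training MSE drops below `tol` or `max_units` is reached."""
    rng = np.random.default_rng(seed)
    H = np.empty((X.shape[0], 0))          # hidden-unit activations so far
    units = []                             # (input weights, bias) per unit
    for _ in range(max_units):
        w, b = rng.normal(size=X.shape[1]), rng.normal()
        H = np.column_stack([H, np.tanh(X @ w + b)])
        units.append((w, b))
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights
        mse = np.mean((H @ beta - y) ** 2)
        if mse <= tol:                     # "satisfactory solution found"
            break
    return units, beta, mse

# toy 1-D regression problem
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0])
units, beta, mse = grow_network(X, y, seed=1)
```

The stopping test makes the "state space search" concrete: each state is a network, and the only transition implemented here is "add one unit".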
Forecasting Exchange Rates Using Feedforward And Recurrent Neural Networks
, 1994
Abstract
Cited by 63 (2 self)
this paper (based on a different data set) was presented at the 1992 North American Winter Meeting of the Econometric Society in New Orleans, Louisiana.
Mixture Density Estimation
 In Advances in Neural Information Processing Systems 12
, 1999
Abstract
Cited by 56 (2 self)
Gaussian mixtures (or so-called radial basis function networks) for density estimation provide a natural counterpart to sigmoidal neural networks for function fitting and approximation. In both cases, it is possible to give simple expressions for the iterative improvement of performance as components of the network are introduced one at a time. In particular, for mixture density estimation we show that a k-component mixture estimated by maximum likelihood (or by an iterative likelihood improvement that we introduce) achieves log-likelihood within order 1/k of the log-likelihood achievable by any convex combination. Consequences for approximation and estimation using Kullback-Leibler risk are also given. A Minimum Description Length principle selects the optimal number of components k that minimizes the risk bound.
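The "iterative likelihood improvement" idea (adding one component at a time) can be illustrated with a toy greedy scheme: at step j, mix the current density with the best candidate Gaussian using weight 2/(j + 1). The fixed candidate grid and that particular step-size schedule are our simplifications for illustration, not the paper's exact procedure.

```python
import numpy as np

def gauss(x, mu, sigma=0.5):
    """Gaussian density values at the sample points x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def greedy_mixture_ll(x, k, candidate_mus):
    """Toy greedy density estimation: at step j, replace the current
    density f by (1 - a) f + a * gauss(., mu) with a = 2 / (j + 1),
    choosing mu from a fixed grid to maximize the log-likelihood.
    Returns the mean log-likelihood of the final k-component mixture."""
    f = np.full_like(x, 1e-12)             # density values at the samples
    for j in range(1, k + 1):
        a = 2.0 / (j + 1)                  # step 1 has a = 1 (pure component)
        lls = [np.sum(np.log((1 - a) * f + a * gauss(x, mu)))
               for mu in candidate_mus]
        best = candidate_mus[int(np.argmax(lls))]
        f = (1 - a) * f + a * gauss(x, best)
    return np.mean(np.log(f))

# bimodal sample: a single component fits poorly, a few components fit well
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 0.5, 150)])
grid = np.linspace(-4, 4, 17)
ll_1 = greedy_mixture_ll(x, 1, grid)
ll_8 = greedy_mixture_ll(x, 8, grid)
```

On the bimodal sample, the log-likelihood improves markedly as components are added, in the spirit of the O(1/k) guarantee quoted above.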
Using Wavelet Network in Nonparametric Estimation
 IEEE Transactions on Neural Networks
, 1994
Abstract
Cited by 54 (2 self)
In this paper, an approach is proposed for using wavelets in nonparametric regression estimation. The proposed nonparametric estimator, named the wavelet network, has a neural-network-like structure but consists of wavelets. It makes use of techniques of regressor selection combined with backpropagation procedures. It is capable of handling nonlinear regressions of moderately large input dimension with sparse training data. Numerical examples are reported to illustrate the performance of the proposed approach.
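A minimal sketch of the wavelet-network idea: a fixed grid of dilated and translated Mexican-hat wavelets with output weights fit by least squares. The paper additionally selects regressors and refines all parameters by backpropagation; both refinements are omitted here, and the grid choices below are illustrative.

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat mother wavelet (negative second derivative of a Gaussian)."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

def wavelet_net_fit(x, y, scales=(1.0, 2.0, 4.0), n_shifts=9):
    """Fit output weights of a fixed wavelet dictionary by least squares.

    Simplification: the dilation/translation grid is fixed in advance,
    whereas the paper selects regressors and then refines all parameters
    by backpropagation."""
    shifts = np.linspace(x.min(), x.max(), n_shifts)
    def design(xq):
        return np.column_stack([mexican_hat(s * (xq - t))
                                for s in scales for t in shifts])
    w, *_ = np.linalg.lstsq(design(x), y, rcond=None)
    return lambda xq: design(xq) @ w       # predictor closing over w

# toy example: recover a smooth function from scattered samples
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x)
predict = wavelet_net_fit(x, y)
train_mse = np.mean((predict(x) - y) ** 2)
```

Even this crude 27-wavelet dictionary fits a smooth 1-D target closely; the paper's regressor selection is what makes the approach viable in higher dimensions with sparse data.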
Adaptive forward-backward greedy algorithm for learning sparse representations
 IEEE Trans. Inform. Theory
, 2011
Abstract
Cited by 52 (8 self)
Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with nonzero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory.
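The forward-with-adaptive-backward combination can be sketched for least-squares feature selection as follows. The thresholds `eps` and `nu` and the exact stopping rules are illustrative placeholders, not the paper's constants.

```python
import numpy as np

def foba(X, y, eps=1e-3, nu=0.5, max_iter=50):
    """Sketch of an adaptive forward-backward greedy feature selector.

    Forward: add the feature giving the largest drop in training MSE;
    stop when that drop falls below eps. Backward (adaptive): after each
    forward step, delete any feature whose removal raises the MSE by
    less than nu times the last forward gain."""
    n, d = X.shape

    def fit_err(S):
        if not S:
            return float(np.mean(y ** 2)), None
        beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
        return float(np.mean((y - X[:, S] @ beta) ** 2)), beta

    S = []
    err, _ = fit_err(S)
    for _ in range(max_iter):
        gains = [(-np.inf if j in S else err - fit_err(S + [j])[0])
                 for j in range(d)]
        j = int(np.argmax(gains))
        if gains[j] < eps:                     # forward step not worth it
            break
        S.append(j)
        err -= gains[j]
        delta = gains[j]
        while len(S) > 1:                      # adaptive backward steps
            inc, k = min((fit_err([s for s in S if s != k_])[0] - err, k_)
                         for k_ in S)
            if inc >= nu * delta:              # every feature still earns its keep
                break
            S.remove(k)
            err += inc
    S = sorted(S)
    return S, fit_err(S)[1]

# noiseless sparse model: y depends only on features 0 and 3
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3]
support, beta = foba(X, y)
```

The backward pass is what distinguishes this from plain forward greedy: a feature added early to partially explain a later-selected one can be deleted once it stops paying for itself.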
Uniqueness of the Weights for Minimal Feedforward Nets with a Given Input-Output Map
, 1992
Abstract
Cited by 51 (2 self)
We show that, for feedforward nets with a single hidden layer, a single output node, and the transfer function tanh(s), the net is uniquely determined by its input-output map, up to an obvious finite group of symmetries (permutations of the hidden nodes, and changing the sign of all the weights associated with a particular hidden node), provided that the net is irreducible, i.e. that there does not exist an inner node that makes a zero contribution to the output, and there is no pair of hidden nodes that could be collapsed to a single node without altering the input-output map. Rutgers Center for Systems and Control, May 1991. Revised October 1991. Research supported in part by the Air Force Office of Scientific Research (AFOSR-91-0343). The author thanks Eduardo Sontag for suggesting the problem and for his helpful comments and ideas, and an anonymous referee for suggesting how to improve the exposition at several points. Requests for reprints should be sent to Héctor J. Sussmann, Departme...
Vector Greedy Algorithms
Abstract
Cited by 51 (8 self)
Our objective is to study nonlinear approximation with regard to redundant systems. Redundancy on the one hand offers much promise for greater efficiency in terms of approximation rate, but on the other hand gives rise to highly nontrivial theoretical and practical problems. Greedy-type approximations have proved to be convenient and efficient ways of constructing m-term approximants. We introduce and study vector greedy algorithms that are designed with the aim of constructing m-th greedy approximants simultaneously for a given finite number of elements. We prove convergence theorems and obtain some estimates for the rate of convergence of vector greedy algorithms when elements come from certain classes.
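One plausible vector greedy scheme (a simultaneous matching-pursuit variant; the paper studies several algorithms, and this particular selection rule is our illustration): at each of m steps, pick the single dictionary element that maximizes the summed squared inner products with all current residuals, then project every residual off the chosen element.

```python
import numpy as np

def vector_greedy(D, F, m):
    """Simultaneous greedy m-term approximation of several elements.

    D: (n, N) redundant dictionary with unit-norm columns.
    F: (n, p) matrix whose p columns are the elements to approximate.
    One atom is chosen per step for ALL residuals jointly."""
    R = F.copy()
    chosen = []
    for _ in range(m):
        scores = np.sum((D.T @ R) ** 2, axis=1)   # one score per atom
        j = int(np.argmax(scores))
        chosen.append(j)
        g = D[:, [j]]
        R = R - g @ (g.T @ R)        # project every residual off atom j
    return chosen, R

# redundant random dictionary; three target elements approximated at once
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)
F = rng.normal(size=(20, 3))
chosen, R = vector_greedy(D, F, 10)
```

Because each step is an orthogonal projection, the total residual norm can only decrease, which is the starting point of the convergence analysis the abstract mentions.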
On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions
 Neural Computation
, 1996
Abstract
Cited by 47 (6 self)
Feedforward networks are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of generalization of network performance from a finite training set to unseen data is clearly of crucial importance. In this article we first show that the generalization error can be decomposed into two terms: the approximation error, due to the insufficient representational capacity of a finite-sized network, and the estimation error, due to insufficient information about the target function because of the finite number of samples. We then consider the problem of approximating functions belonging to certain Sobolev spaces with Gaussian Radial Basis Functions. Using the above-mentioned decomposition we bound the generalization error in terms of the number of basis functions and number of examples. While the bound that we derive is specific for Radial Basis Functions, a number of observations deriving from it apply to any approximation t...
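The two-term decomposition this abstract refers to can be written schematically, via the triangle inequality, in notation that is ours rather than the paper's: let \(f_0\) be the target function, \(f_n\) the best approximant using \(n\) basis functions, and \(\hat f_{n,\ell}\) the network estimated from \(\ell\) examples.

```latex
\underbrace{\,\|f_0 - \hat f_{n,\ell}\|\,}_{\text{generalization error}}
\;\le\;
\underbrace{\,\|f_0 - f_n\|\,}_{\substack{\text{approximation error}\\ \text{(finite-sized network)}}}
\;+\;
\underbrace{\,\|f_n - \hat f_{n,\ell}\|\,}_{\substack{\text{estimation error}\\ \text{(finite sample)}}}
```

The first term shrinks as \(n\) grows, the second as \(\ell\) grows relative to \(n\), so balancing the two yields the generalization bound described above.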
Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference
 Machine Learning
, 2001
Abstract
Cited by 47 (0 self)
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, nonstationarity, and nonlinearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with nonstationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for th...
On the Rate of Convergence of Regularized Boosting Classifiers
 Journal of Machine Learning Research
, 2003
Abstract
Cited by 46 (10 self)
A regularized boosting method is introduced, for which regularization is obtained through a penalization function. It is shown through oracle inequalities that this method is model adaptive. The rate of convergence of the probability of misclassification is investigated. It is shown that for quite a large class of distributions, the probability of error converges to the Bayes risk at a rate faster than n^{-(V+2)/(4(V+1))}, where V is the VC dimension of the "base" class whose elements are combined by boosting methods to obtain an aggregated classifier. The dimension-independent nature of the rates may partially explain the good behavior of these methods in practical problems. Under Tsybakov's noise condition the rate of convergence is even faster. We investigate the conditions necessary to obtain such rates for different base classes. The special case of boosting using decision stumps is studied in detail. We characterize the class of classifiers realizable by aggregating decision stumps.