Results 1–10 of 10
Efficient Agnostic Learning of Neural Networks with Bounded Fan-in
, 1996
Abstract

Cited by 68 (18 self)
We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations, and the learner is required to approximate the neural network which minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts, and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation ... This work was supported by the Australian Research Council and the Australian Telecommunications and Electronics Research Board. The material in this paper was pres...
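The hypothesis class described above (threshold hidden units with bounded fan-in, plus a bound on the sum of absolute output weights, judged by expected quadratic error) can be illustrated with a small numerical sketch. Random threshold units and a crude rescaling onto the weight bound stand in for the paper's actual learning algorithm; all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_threshold_units(n_units, dim, fan_in, rng):
    """Sample threshold units h(x) = 1[w . x_idx >= b], each reading at most
    `fan_in` input coordinates (the bounded fan-in restriction)."""
    units = []
    for _ in range(n_units):
        idx = rng.choice(dim, size=fan_in, replace=False)
        units.append((idx, rng.normal(size=fan_in), rng.normal()))
    return units

def activations(X, units):
    """0/1 activation matrix of the hidden layer."""
    H = np.empty((X.shape[0], len(units)))
    for j, (idx, w, b) in enumerate(units):
        H[:, j] = (X[:, idx] @ w >= b).astype(float)
    return H

def fit_bounded_output_weights(H, y, l1_bound):
    """Least squares on the activations, then rescale so the sum of absolute
    output weights respects the bound (a crude projection, for illustration)."""
    a, *_ = np.linalg.lstsq(H, y, rcond=None)
    norm1 = np.abs(a).sum()
    if norm1 > l1_bound:
        a *= l1_bound / norm1
    return a

# Agnostic setting: noisy target; the goal is only to compete with the best
# network in the class, not to recover a noise-free function.
X = rng.normal(size=(500, 5))
y = np.sign(X[:, 0] + X[:, 1]) + 0.3 * rng.normal(size=500)

units = random_threshold_units(200, dim=5, fan_in=2, rng=rng)
H = activations(X, units)
a = fit_bounded_output_weights(H, y, l1_bound=10.0)
print(f"empirical quadratic error: {np.mean((H @ a - y) ** 2):.3f}")
```

The target here is deliberately noisy: in the agnostic model, the learner is only asked to approach the best network in the class, not an exact underlying function.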
Memory-Universal Prediction of Stationary Random Processes
 IEEE Trans. Inform. Theory
, 1998
Abstract

Cited by 26 (1 self)
We consider the problem of one-step-ahead prediction of a real-valued, stationary, strongly mixing random process {X_i}_{i=-∞}^{∞}. The best mean-square predictor of X_0 is its conditional mean given the entire infinite past {X_i}_{i=-∞}^{-1}. Given a sequence of observations X_1, X_2, ···, X_N, we propose estimators for the conditional mean based on sequences of parametric models of increasing memory and of increasing dimension, for example, neural networks and Legendre polynomials. The proposed estimators select both the model memory and the model dimension, in a data-driven fashion, by minimizing certain complexity-regularized least squares criteria. When the underlying predictor function has a finite memory, we establish that the proposed estimators are memory-universal: the proposed estimators, which do not know the true memory, deliver the same statistical performance (rates of integrated mean-squared error) as that delivered by estimators that know the true memory. Furthermore, when the underlying predictor function does not have a finite memory, we establish that the estimator based on Legendre polynomials is consistent.
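The selection rule described in this abstract (fit models of growing memory and dimension, then pick the pair minimizing a complexity-penalized least squares criterion) can be illustrated on a toy AR(2) series. This is a minimal sketch under assumed choices: plain polynomial features instead of Legendre polynomials, and a hypothetical penalty shape `lam * p * log(N) / N` rather than the paper's exact complexity term:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stationary series whose one-step predictor has finite memory 2 (AR(2)).
n = 600
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + 0.2 * rng.normal()

def lagged_design(x, memory):
    """Row for time t holds (x[t-1], ..., x[t-memory]); target is x[t]."""
    rows = [x[t - memory:t][::-1] for t in range(memory, len(x))]
    return np.array(rows), x[memory:]

def criterion(x, memory, degree, lam=0.1):
    """Complexity-regularized least squares: empirical MSE plus a penalty
    growing with the parameter count (a hypothetical penalty shape)."""
    lagged, target = lagged_design(x, memory)
    feats = [np.ones(len(target))]
    for i in range(memory):
        for j in range(1, degree + 1):
            feats.append(lagged[:, i] ** j)
    Phi = np.column_stack(feats)
    coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    mse = np.mean((Phi @ coef - target) ** 2)
    return mse + lam * Phi.shape[1] * np.log(len(target)) / len(target)

# Data-driven selection of both the model memory and the model dimension.
best = min((criterion(x, d, k), d, k) for d in range(1, 6) for k in range(1, 4))
print("selected memory:", best[1], "selected degree:", best[2])
```

The point of the penalty is exactly the memory-universality claim: the rule does not know the true memory, yet larger memories or dimensions are only chosen when their fit improvement outweighs the extra parameters.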
Minimum Complexity Regression Estimation with Weakly Dependent Observations
 IEEE Trans. Inform. Theory
, 1996
Abstract

Cited by 20 (1 self)
Parameter Spaces and Abstract Complexities. For each integer n ≥ 1, let m_n denote a model dimension, for example, see (2), and let S_n denote a compact subset of ℝ^{m_n}. The set S_n will serve as a collection of parameters associated with the model dimension m_n, for example, see (5). For every v ∈ S_n, let f(n, v) denote a real-valued function on B_X parameterized by (n, v), for example, see (3). The following condition is required to invoke the exponential inequalities in Theorems 4.2 and 4.3.
Semiparametric ARX Neural Network Models with an Application to Forecasting Inflation
, 2001
Abstract

Cited by 10 (1 self)
In this paper we examine semiparametric nonlinear autoregressive models with exogenous variables (NLARX) via three classes of artificial neural networks: the first uses smooth sigmoid activation functions; the second uses radial basis activation functions; and the third uses ridgelet activation functions. We provide root mean squared error convergence rates for these ANN estimators of the conditional mean and median functions with stationary beta-mixing data. As an empirical application, we compare the forecasting performance of linear and semiparametric NLARX models of U.S. inflation. We find that all of our semiparametric models outperform a benchmark linear model based on various forecast performance measures. In addition, a semiparametric ridgelet NLARX model which includes various lags of historical inflation and the GDP gap is best in terms of both forecast mean squared error and forecast mean absolute deviation error.
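A minimal sketch of the first model class above (sigmoid hidden units in an NLARX mean function). To keep it short, the hidden weights are drawn at random and only the output layer is fit, by ridge regression; this is a shortcut for illustration, not the authors' estimator, and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_sigmoid_nlarx(y, x, lags, n_hidden=30, ridge=1e-2):
    """One-step NLARX fit y_t ~ f(y_{t-1}, ..., y_{t-lags}, x_t) with a
    single sigmoid hidden layer; random hidden weights, ridge output fit."""
    T = len(y)
    Z = np.column_stack(
        [y[lags - j - 1:T - j - 1] for j in range(lags)] + [x[lags:]]
    )
    target = y[lags:]
    W = rng.normal(size=(Z.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = sigmoid(Z @ W + b)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ target)

    def forecast(recent_y, current_x):
        """recent_y: last `lags` values in chronological order."""
        z = np.concatenate([recent_y[::-1], [current_x]])
        return float(sigmoid(z @ W + b) @ beta)

    rmse = np.sqrt(np.mean((H @ beta - target) ** 2))
    return forecast, rmse

# Simulated linear ARX data; the network only needs to approximate it.
T = 400
x_ex = rng.normal(size=T)
y_ser = np.zeros(T)
for t in range(1, T):
    y_ser[t] = 0.6 * y_ser[t - 1] + 0.3 * x_ex[t] + 0.1 * rng.normal()

forecast, rmse = fit_sigmoid_nlarx(y_ser, x_ex, lags=2)
print(f"in-sample RMSE: {rmse:.3f}")
```

The radial basis and ridgelet variants differ only in the hidden-unit activation; the lag structure and fitting logic are unchanged.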
A Survey on Universal Approximation and Its Limits in Soft Computing Techniques
, 2003
Abstract

Cited by 10 (4 self)
This paper deals with the approximation behaviour of soft computing techniques. First, we give a survey of the universal approximation theorems achieved so far in various soft computing areas, mainly in fuzzy control and neural networks. We point out that these techniques have common approximation behaviour in the sense that an arbitrary function from a certain set of functions (usually the set of continuous functions, C) can be approximated with arbitrary accuracy ε on a compact domain. The drawback of these results is that one needs an unbounded number of "building blocks" (i.e. fuzzy sets or hidden neurons) to achieve the prescribed accuracy ε. If the number of building blocks is restricted, it has been proved for some fuzzy systems that the universal approximation property is lost; moreover, the set of controllers with a bounded number of rules is nowhere dense in the set of continuous functions. Therefore it is reasonable to make a trade-off between accuracy and the number of building blocks, by determining the functional relationship between them. We survey this topic by showing the results achieved so far, and their inherent limitations. We point out that approximation rates, or constructive proofs, can only be given if some characteristic of smoothness is known about the approximated function.
Selected Training Exemplars for Neural Network Learning
, 1994
Abstract

Cited by 9 (0 self)
The dissertation of Mark Plutowski is approved, and it is acceptable in quality and form for publication on microfilm: Co-Chair, Co-Chair
Adaptive Estimation in Pattern Recognition by Combining Different Procedures
 Statistica Sinica
Abstract

Cited by 6 (3 self)
We study a problem of adaptive estimation of a conditional probability function in a pattern recognition setting. In many applications, for more flexibility, one may want to consider various estimation procedures targeted at different scenarios and/or under different assumptions. For example, when the feature dimension is high, to overcome the familiar curse of dimensionality, one may seek a good parsimonious model among a number of candidates such as CART, neural nets, additive models, and others. For such a situation, one wishes to have an automated final procedure that always performs as well as the best candidate. In this work, we propose a method to combine a countable collection of procedures for estimating the conditional probability. We show that the combined procedure has the property that its statistical risk is bounded above by that of each of the procedures being considered plus a small penalty. Thus in an asymptotic sense, the strengths of the different estimation procedures i...
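The combination guarantee described (risk bounded by each candidate's risk plus a small penalty) is commonly realized by exponentially weighting candidates according to their held-out loss. A minimal sketch, assuming squared-error loss and a hypothetical tuning constant `eta`; this illustrates the idea rather than the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

def combine_estimates(preds_holdout, y_holdout, preds_new, eta=4.0):
    """Weight candidate estimators of P(Y=1|X) by exponentiated held-out
    squared-error loss, then average their predictions at new points.
    `eta` is a hypothetical tuning constant, not taken from the paper."""
    losses = np.array([np.mean((p - y_holdout) ** 2) for p in preds_holdout])
    w = np.exp(-eta * len(y_holdout) * (losses - losses.min()))
    w /= w.sum()
    return w @ np.array(preds_new), w

# A well-calibrated candidate versus an uninformative one.
p_true = rng.uniform(0.1, 0.9, size=200)
y = (rng.uniform(size=200) < p_true).astype(float)
good, bad = p_true, np.full(200, 0.5)

combined, w = combine_estimates(
    [good, bad], y, [np.array([0.8]), np.array([0.5])]
)
print("weights:", np.round(w, 3), "combined prediction:", combined)
```

Because the weights concentrate on whichever candidate did best on the held-out data, the combined estimator tracks the best procedure without knowing in advance which one that is.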
Agnostic Learning and Single Hidden Layer Neural Networks
, 1996
Abstract

Cited by 5 (0 self)
 Add to MetaCart
This thesis is concerned with some theoretical aspects of supervised learning of real-valued functions. We study a formal model of learning called agnostic learning. The agnostic learning model assumes a joint probability distribution on the observations (inputs and outputs) and requires the learning algorithm to produce a hypothesis with performance close to that of the best function within a specified class of functions. It is a very general model of learning which includes function learning, learning with additive noise, and learning the best approximation in a class of functions as special cases. Within the agnostic learning model, we concentrate on learning functions which can be well approximated by single hidden layer neural networks. Artificial neural networks are often used as black box models for modelling phenomena for which very little prior knowledge is available. Agnostic learning is a natural model for such learning problems. The class of single hidden layer neural netwo...
Implementation of Backpropagation Neural Networks On Large Parallel Computers
Abstract

Cited by 1 (0 self)
Introduction. The chapter considers the problem of mapping the backpropagation training of real neural applications onto large parallel systems. Many parallel neural training programs have been implemented, but most are tested on few, if any, real applications. Programs benchmarked only on large neural networks will usually show better performance than is obtainable for real neural networks, which are usually small. On parallel systems with few processing elements, many mapping schemes run efficiently. However, as the number of processors increases, communication overhead and uneven load become more prominent. Based on the inherent degrees of parallelism in the training algorithm, it is possible to suggest an efficient mapping which minimizes these problems. In the chapter, we propose a parallel mapping of neural training which adapts the configuration to the neural application. A popular way of implementing backpropagation networks o
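One of the degrees of parallelism available to backpropagation training is training-set parallelism: shard the batch, compute per-shard gradients (conceptually on separate processors), and recombine. A minimal sketch in which a linear layer stands in for a full backpropagation network; all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

def grads(W, X, y):
    """Gradient of mean squared error for a linear layer, standing in for
    one backpropagation pass on a worker's shard of the batch."""
    return X.T @ (X @ W - y) / len(y)

def data_parallel_step(W, X, y, n_workers=4, lr=0.1):
    """Training-set parallelism: shard the batch, compute per-shard
    gradients, recombine by a sample-count-weighted average, and take
    one descent step. The weighted average equals the full-batch gradient."""
    shards = np.array_split(np.arange(len(y)), n_workers)
    g = sum(grads(W, X[s], y[s]) * len(s) for s in shards) / len(y)
    return W - lr * g

W = rng.normal(size=3)
X = rng.normal(size=(64, 3))
y = rng.normal(size=64)
print("updated weights:", data_parallel_step(W, X, y))
```

The recombination is exact for gradient averaging, but on a real machine each shard's gradient must be communicated and summed, which is precisely the communication overhead that grows with the processor count.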
Nonasymptotic bounds on the L_2 error of neural network regression estimates
, 2002
Abstract
Estimation of multivariate regression functions from bounded i.i.d. data is considered. The L_2 error with integration with respect to the design measure is used as an error criterion. The distribution of the design is assumed to be concentrated on a finite set.
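The error criterion above, the L_2 distance integrated against a design measure concentrated on a finite set, reduces to a weighted finite sum. A minimal sketch with hypothetical names:

```python
import numpy as np

def l2_error(estimate, target, design_points, design_weights):
    """L_2 error integrated against a design measure mu concentrated on a
    finite set: sqrt( sum_j (m_hat(x_j) - m(x_j))^2 * mu({x_j}) )."""
    diffs = np.array([estimate(x) - target(x) for x in design_points])
    return float(np.sqrt(np.sum(np.asarray(design_weights) * diffs ** 2)))

# Design concentrated on three points with probabilities 0.5, 0.3, 0.2.
err = l2_error(
    estimate=lambda x: x ** 2 + 0.1,   # hypothetical regression estimate
    target=lambda x: x ** 2,           # hypothetical true regression function
    design_points=[0.0, 1.0, 2.0],
    design_weights=[0.5, 0.3, 0.2],
)
print(err)  # a constant offset of 0.1 gives L_2 error 0.1
```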