Results 1–10 of 33
Algebraic analysis for nonidentifiable learning machines
 Neural Computation
Abstract

Cited by 46 (14 self)
This paper clarifies the relation between the learning curve and the algebraic geometrical structure of a nonidentifiable learning machine, such as a multilayer neural network, whose true parameter set is an analytic set with singular points. Using a concept from algebraic analysis, we rigorously prove that the Bayesian stochastic complexity, or free energy, is asymptotically equal to λ1 log n − (m1 − 1) log log n + constant, where n is the number of training samples and λ1 and m1 are the rational number and the natural number determined as birational invariants of the singularities in the parameter space. We also give an algorithm to calculate λ1 and m1 based on resolution of singularities in algebraic geometry. In regular statistical models, 2λ1 is equal to the number of parameters and m1 = 1, whereas in nonregular models such as multilayer networks, 2λ1 is not larger than the number of parameters and m1 ≥ 1. Since the increase of the stochastic complexity equals the learning curve, or generalization error, nonidentifiable learning machines are better models than regular ones when Bayesian ensemble learning is applied.
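Written out, the expansion above contrasts with the regular case as follows (a sketch; F_n denotes the stochastic complexity and d the number of parameters):

```latex
% Singular (nonidentifiable) model:
F_n = \lambda_1 \log n - (m_1 - 1)\log\log n + O(1),
\qquad 2\lambda_1 \le d,\; m_1 \ge 1.
% Regular model: 2\lambda_1 = d and m_1 = 1, giving the BIC-type form
F_n = \frac{d}{2}\log n + O(1).
% Since the generalization error is the increase F_{n+1} - F_n,
% it behaves like \lambda_1 / n, which is at most d/(2n).
```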
Approximation theory of the MLP model in neural networks
 ACTA NUMERICA
, 1999
Abstract

Cited by 39 (3 self)
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model.
Generalization Bounds for Function Approximation from Scattered Noisy Data
, 1998
Abstract

Cited by 31 (1 self)
this paper we investigate the problem of providing error bounds for approximation of an unknown function from scattered, noisy data. This problem has particular relevance in the field of machine learning, where the unknown function represents the task that has to be learned and the scattered data represent the examples of this task. An obvious quantity of interest for us is the generalization error, a measure of how much the result of the approximation scheme differs from the unknown function, typically studied as a function of the number of data points. Since the data are randomly generated and noisy, the analysis of the generalization error necessarily involves statistical considerations.
Nonparametric time series prediction through adaptive model selection
 Machine Learning
, 2000
Abstract

Cited by 28 (0 self)
We consider the problem of one-step-ahead prediction for time series generated by an underlying stationary stochastic process obeying the condition of absolute regularity, which describes the mixing nature of the process. We make use of recent results from the theory of empirical processes, and adapt the uniform convergence framework of Vapnik and Chervonenkis to the problem of time series prediction, obtaining finite-sample bounds. Furthermore, by allowing both the model complexity and the memory size to be adaptively determined by the data, we derive nonparametric rates of convergence through an extension of the method of structural risk minimization suggested by Vapnik. All our results are derived for general L_p error measures, and apply to both exponentially and algebraically mixing processes.
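To make the adaptive model-selection idea concrete, here is a minimal sketch (the function name and the specific penalty are illustrative; a standard i.i.d. VC-type penalty is used, whereas the paper derives time-series analogues with an effective sample size for mixing processes):

```python
import math

def srm_select(models, n, delta=0.05):
    """Structural-risk-minimization-style selection: pick the model
    minimizing empirical risk plus a VC-type complexity penalty.
    `models` is a list of (empirical_risk, vc_dimension) pairs."""
    def penalty(h):
        # Standard uniform-convergence penalty for n i.i.d. samples;
        # for mixing time series, n is replaced by an effective sample size.
        return math.sqrt((h * (math.log(2 * n / h) + 1)
                          - math.log(delta / 4)) / n)
    return min(range(len(models)),
               key=lambda i: models[i][0] + penalty(models[i][1]))

# Richer models fit better but pay a larger penalty; more data
# lets the criterion justify more complexity.
models = [(0.30, 5), (0.20, 50), (0.19, 500)]
print(srm_select(models, n=1_000))    # small sample: simplest model wins
print(srm_select(models, n=100_000))  # large sample: a richer model wins
```

The point of the sketch is the trade-off itself: as n grows, the penalty shrinks and the data-driven criterion selects a more complex model.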
Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks
 Neural Computation
, 1998
Abstract

Cited by 12 (1 self)
We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC dimension grows as W log W, where W is the number of parameters in the network. This result stands in opposition to the case where the number of layers is unbounded, in which case the VC dimension grows as W^2.

1 MOTIVATION

The VC dimension is an important measure of the complexity of a class of binary-valued functions, since it characterizes the amount of data required for learning in the PAC setting (see [BEHW89, Vap82]). In this paper, we establish upper and lower bounds on the VC dimension of a specific class of multilayered feedforward neural networks. Let F be the class of binary-valued functions computed by a feedforward neural network with W weights and k computational (non-input) units, each with a piecewise polynomial activation function. Goldberg and Jerrum [GJ95] have shown that...
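The gap between the two regimes can be illustrated numerically (an order-of-growth sketch only; constants and exact bounds are omitted, and the function names are hypothetical stand-ins for the Θ(W log W) and Θ(W^2) rates quoted above):

```python
import math

# Hypothetical stand-ins for the two growth regimes of the VC dimension
# of piecewise-polynomial networks with W parameters: Theta(W log W)
# when depth is fixed, versus Theta(W^2) when depth is unbounded.
def vc_fixed_depth(W: int) -> float:
    return W * math.log(W)

def vc_unbounded_depth(W: int) -> float:
    return W ** 2

# The gap widens as W grows: the ratio is W / log W.
for W in (10, 1_000, 100_000):
    print(W, round(vc_unbounded_depth(W) / vc_fixed_depth(W)))
```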
Towards Robust Model Selection using Estimation and Approximation Error Bounds
 Proc. 9th Annual Conference on Computational Learning Theory, p. 57, ACM
, 1996
Abstract

Cited by 10 (8 self)
this paper we extend previous work [17] and introduce a novel model selection criterion, based on combining two recent chains of thought. In particular we make use of the powerful framework of uniform convergence of empirical processes pioneered by Vapnik and Chervonenkis [23], combined with recent results concerning the approximation ability of nonlinear manifolds of functions, focusing in particular on feedforward neural networks. The main contributions of this work are twofold: (i) conceptual: elucidating a coherent and robust framework for model selection; (ii) technical: the main contribution here is a lower bound on the approximation error (Theorem 10), which holds in a well-specified sense for most functions of interest. As far as we are aware, this result is new in the field of function approximation.
Hierarchical Mixtures-of-Experts for Exponential Family Regression Models: Approximation and Maximum Likelihood Estimation
 Ann. Statistics
, 1999
Abstract

Cited by 10 (2 self)
this paper we consider the denseness and consistency of these models in the generalized linear model context. Before proceeding we present some notation regarding mixtures and hierarchical mixtures of generalized linear models and one-parameter exponential family regression models. Generalized linear models are widely used in statistical practice [McCullagh and Nelder (1989)]. One-parameter exponential family regression models [see Bickel and Doksum (1977), page 67] with generalized linear mean functions (GLM1) are special examples of the generalized linear models, where the probability distribution can be parameterized by the mean function. In the regression context, a GLM1 model proposes that the conditional expectation μ(x) of a real response variable y (the output) is related to a vector of predictors (or inputs)
On the Approximation of Functional Classes Equipped with a Uniform Measure Using Ridge Functions
, 1999
Abstract

Cited by 7 (5 self)
this paper are threefold: (i) the construction of a uniform measure over a functional class B which is similar to a Besov class; (ii) proving a lower bound on the degree of approximation by ridge functions which holds for all functions in some subset of B of probability measure 1 − δ with respect to the uniform measure; (iii) introducing a probabilistic width d_{n,δ} for nonlinear approximation and estimating it for a uniform measure μ.
Error bounds for functional approximation and estimation using mixtures of experts
, 1997
Abstract

Cited by 6 (3 self)
We examine some mathematical aspects of learning unknown mappings with the Mixture of Experts Model (MEM). Specifically, we observe that the MEM is at least as powerful as a class of neural networks, in a sense that will be made precise. Upper bounds on the approximation error are established for a wide class of target functions. The general theorem states that inf ‖f − f_n‖_p ≤ c n^{−r/d} holds uniformly for f ∈ W_r(L) (a Sobolev class over [−1, 1]^d), where f_n belongs to an n-dimensional manifold of normalized ridge functions. The same bound holds for the MEM as a special case of the above. The stochastic error, in the context of learning from i.i.d. examples, is also examined. An asymptotic analysis establishes the limiting behavior of this error, in terms of certain pseudo-information matrices. These results substantiate the intuition behind the MEM, and motivate applications.
Learning Efficiency of Redundant Neural Networks in Bayesian Estimation
, 2001
Abstract

Cited by 5 (2 self)
This paper proves that the Bayesian stochastic complexity of a layered neural network is asymptotically smaller than that of a regular statistical model if it contains the true distribution. We consider the case when a three-layer perceptron with M input units, H hidden units, and N output units is trained to estimate the true distribution represented by the model with H0 hidden units, and prove that the stochastic complexity is asymptotically smaller than (1/2){H0(M + N) + R} log n, where n is the number of training samples and R is a function of H, H0, M, and N that is far smaller than the number of redundant parameters. Since the generalization error of Bayesian estimation is equal to the increase of stochastic complexity, it is smaller than (1/2n){H0(M + N) + R} if it has an asymptotic expansion. Based on the results, the difference between layered neural networks and regular statistical models is discussed from the statistical point of view. Key Words: Generalization Error, Kullback Information, Free Energy, Bayesian Learning, Nonidentifiable Model.
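A quick numeric check of the bound stated above, with purely illustrative network sizes (H, H0, M, N, and R below are hypothetical values, and only the H(M + N) input/output weights are counted for simplicity; the paper characterizes R precisely):

```python
import math

def regular_complexity(d, n):
    """BIC-type stochastic complexity of a regular d-parameter model:
    (d/2) log n."""
    return 0.5 * d * math.log(n)

def redundant_network_bound(H0, M, N, R, n):
    """Upper bound from the abstract: (1/2){H0(M + N) + R} log n."""
    return 0.5 * (H0 * (M + N) + R) * math.log(n)

# Illustrative sizes: H = 20 hidden units realize a true model with
# H0 = 5 hidden units.
M, N, H, H0, n = 10, 1, 20, 5, 10_000
d = H * (M + N)  # simplified parameter count of the full network
R = 3            # placeholder; the paper shows R is far below the redundant count
print(redundant_network_bound(H0, M, N, R, n) < regular_complexity(d, n))
```

The inequality holds because the effective coefficient H0(M + N) + R is far smaller than the raw parameter count d of the redundant network, which is exactly the advantage of the singular model under Bayesian estimation.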