## Neural Networks for Optimal Approximation of Smooth and Analytic Functions (1996)

Venue: Neural Computation

Citations: 43 (5 self)

### BibTeX

```bibtex
@ARTICLE{Mhaskar96neuralnetworks,
  author  = {H. N. Mhaskar},
  title   = {Neural Networks for Optimal Approximation of Smooth and Analytic Functions},
  journal = {Neural Computation},
  year    = {1996},
  volume  = {8},
  pages   = {164--177}
}
```

### Abstract

We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation function evaluated by each principal element satisfies certain technical conditions. Under these conditions, it is also possible to construct networks that provide a geometric order of approximation for analytic target functions. The permissible activation functions include the squashing function (1 + e^{-x})^{-1} as well as a variety of radial basis functions. Our proofs are constructive. The weights and thresholds of our networks are chosen independently of the target function; we give explicit formulas for the coefficients as simple, continuous, linear functionals of the target function. 1. Introduction. In recent years, there has been a great deal of research in the theory of approximation of real-valued functions using artificial neural networks with one or more hidden layers, with each pr...
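The abstract's setup can be illustrated, very loosely, in a few lines of NumPy. This is a hedged sketch, not Mhaskar's explicit construction: the inner weights and thresholds are fixed independently of the target (as in the paper), but the outer coefficients here come from an ordinary least-squares fit rather than the paper's explicit linear functionals, and the target sin(πx) and all sizes are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of a single-hidden-layer network with the squashing
# activation sigma(x) = 1/(1 + exp(-x)). The inner weights and
# thresholds are fixed independently of the target; unlike the paper,
# the outer coefficients come from a least-squares fit rather than
# Mhaskar's explicit linear functionals.

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_hidden = 40
w = rng.uniform(-10.0, 10.0, n_hidden)  # fixed inner weights
b = rng.uniform(-10.0, 10.0, n_hidden)  # fixed thresholds

def design_matrix(x):
    # Column j is the j-th hidden unit evaluated at the sample points.
    return sigma(np.outer(x, w) + b)

x_train = np.linspace(-1.0, 1.0, 200)
target = np.sin(np.pi * x_train)        # smooth illustrative target

# Outer coefficients: a continuous linear map applied to target samples.
coeffs, *_ = np.linalg.lstsq(design_matrix(x_train), target, rcond=None)

x_test = np.linspace(-1.0, 1.0, 101)
err = np.max(np.abs(design_matrix(x_test) @ coeffs - np.sin(np.pi * x_test)))
print(f"max error with {n_hidden} hidden units: {err:.2e}")
```

Note that only the outer layer depends on the target, and it does so linearly, which is the structural point the abstract emphasizes.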

### Citations

1225 |
Multilayer feedforward networks are universal approximators. Neural Networks
- Hornik, Stinchcombe, et al.
- 1989
Citation Context ...theory of approximation of real valued functions using artificial neural networks with one or more hidden layers, with each principal element (neuron) evaluating a sigmoidal or radial basis function ([1, 2, 3, 5, 7, 8, 9, 15, 17, 18]). A typical density result shows that a network can approximate an arbitrary function in a given function class to any degree of accuracy. Such theorems are proved for instance in [5, 8] in the case ...

1106 | Singular Integrals and Differential Properties of Functions - Stein - 1970 |

841 |
Approximation by superpositions of a sigmoidal function
- Cybenko
- 1989

639 |
Networks for approximation and learning
- Poggio, Girosi
- 1990

628 | Constructive Approximation
- Devore, Lorentz
- 1993
Citation Context ...le, one is interested in approximating all functions of s real variables having a continuous gradient. By a suitable normalization, one may assume that the gradient is bounded by 1. It is known (e.g. [6]) that any reasonable approximation scheme to provide an approximation order ε for all functions in this class must depend upon at least Ω(ε^{-s}) parameters. In [10], we showed how to construct network...

434 |
Multivariable functional interpolation and adaptive networks
- Broomhead, Lowe

371 |
Universal approximation bounds for superpositions of a sigmoidal function
- Barron
- 1993

311 | Regularization theory and neural-network architectures, Neural Computation
- Girosi, Jones, et al.
- 1995

166 |
The Theory of Radial Basis Function Approximation
- Powell
- 1992
Citation Context ...network can approximate an arbitrary function in a given function class to any degree of accuracy. Such theorems are proved for instance in [5, 8] in the case of sigmoidal activation functions and in [16, 19] for radial basis functions. Very general theorems of this nature can be found in [9, 12]. A related important problem is the complexity problem; i.e., to determine the number of neurons required to g...

135 |
Fast learning in networks of locally tuned processing units
- Moody, Darken
- 1989

118 | Multilayer feedforward networks with a nonpolynomial activation function can approximate any function
- Leshno, Lin, et al.
- 1993

114 |
Universal approximation using radial basis function network
- Park, Sandberg
- 1991

49 |
Statistical Learning Networks: A Unifying View
- Barron, Barron
- 1988

26 |
Theory of Approximation of Functions of a Real Variable
- Timan
- 1963
Citation Context ...ussin operator is defined by (3.3) v_n(g, t) := (1/(n+1)^s) Σ_{n≤m≤2n} s_m(g, t), n ∈ Z, n ≥ 0, t ∈ [−π, π]^s. The de la Vallée Poussin operator has the following important property. Proposition 3.1. (cf. [22, 14]) If r ≥ 1 and s, m ≥ 1 are integers, 1 ≤ p ≤ ∞ and g ∈ W^{p∗}_{r,s}, then v_m(g) is a trigonometric polynomial of coordinatewise order at most 2m and (3.4) ||g − v_m(g)||_{p,[−π,π]^s} ≤ c m^{−r} ||g||_{W^{p∗}_{r,s}}. Furthe...
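The quoted operator can be sketched in one dimension. This is a hedged illustration (1D only, whereas the proposition is multivariate): averaging the partial Fourier sums s_n, ..., s_{2n} is equivalent to multiplying the k-th Fourier coefficient by 1 for |k| ≤ n, by (2n + 1 − |k|)/(n + 1) for n < |k| ≤ 2n, and by 0 beyond. The test function is an assumption chosen to have a few derivatives.

```python
import numpy as np

# Hedged 1D sketch of the de la Vallee Poussin mean
# v_n(g) = (1/(n+1)) * sum_{m=n}^{2n} s_m(g), where s_m is the m-th
# partial Fourier sum. Averaging the partial sums multiplies the k-th
# Fourier coefficient by 1 for |k| <= n, by (2n+1-|k|)/(n+1) for
# n < |k| <= 2n, and by 0 otherwise.

def vallee_poussin(samples, n):
    N = len(samples)
    c = np.fft.fft(samples)
    k = np.abs(np.fft.fftfreq(N, d=1.0 / N))  # integer frequencies
    mult = np.clip((2 * n + 1 - k) / (n + 1), 0.0, 1.0)
    return np.real(np.fft.ifft(c * mult))

t = np.linspace(-np.pi, np.pi, 256, endpoint=False)
g = np.abs(np.sin(t)) ** 3   # finitely many derivatives, not analytic
approx = vallee_poussin(g, n=16)
print("max error:", np.max(np.abs(g - approx)))
```

The output is a trigonometric polynomial of order at most 2n, and it reproduces any trigonometric polynomial of order at most n exactly, which is what makes the operator useful for the near-best approximation bound (3.4).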

16 |
Degree of approximation by neural and translation networks with a single hidden layer
- Mhaskar, Micchelli
- 1995
Citation Context ...we showed how to construct networks with two hidden layers, each neuron evaluating a bounded sigmoidal function, to accomplish such an approximation order with O(ε^{-s}) neurons. Along with Micchelli [14] we have studied this problem in much greater detail. The best result known so far for networks with a single hidden layer is that O(ε^{-s-1} log(1/ε)) neurons are enough if the activation function is t...

15 |
Approximation by superposition of a sigmoidal function and radial basis functions
- Mhaskar, Micchelli
- 1992
Citation Context ...ccuracy. Such theorems are proved for instance in [5, 8] in the case of sigmoidal activation functions and in [16, 19] for radial basis functions. Very general theorems of this nature can be found in [9, 12]. A related important problem is the complexity problem; i.e., to determine the number of neurons required to guarantee that all functions, assumed to belong to a certain function class, can be approx...

15 |
On some extremal functions and their applications in the theory of analytic functions of several complex variables
- Siciak
- 1962
Citation Context ...analytic in the poly-ellipse (2.11) E_ρ := {z = (z_1, ..., z_s) ∈ C^s : |z_j + √(z_j² − 1)| ≤ ρ, j = 1, ..., s} for some ρ > 1 and 1 < ρ_1 < ρ, then for every integer m ≥ 1 there exists a polynomial ([20]) L_m(f) (different from the polynomials described above) with coordinatewise degree not exceeding m such that (2.12) ||f − L_m(f)||_p ≤ c_{ρ,ρ_1} ρ_1^{−m} max_{z∈E_ρ} |f(z)|. Approximating these polynomials b...
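The geometric rate ρ_1^{−m} in (2.12) can be seen in one dimension with Chebyshev interpolation. This is a hedged stand-in for Siciak's operator L_m, under the assumption that the familiar Runge function 1/(1 + 25x²), analytic in a Bernstein ellipse around [−1, 1], is an acceptable test case.

```python
import numpy as np

# Hedged 1D illustration of the geometric rate rho_1^{-m} for analytic
# targets, using Chebyshev interpolation as a stand-in for Siciak's
# operator L_m. f(x) = 1/(1 + 25 x^2) is analytic in a Bernstein
# ellipse around [-1, 1] (poles at +/- i/5), so the error should
# shrink geometrically in the degree m.

def cheb_interp_error(f, m, n_test=1001):
    # Degree-m interpolant at m+1 Chebyshev points of the first kind.
    nodes = np.cos((2 * np.arange(m + 1) + 1) * np.pi / (2 * (m + 1)))
    coeffs = np.polynomial.chebyshev.chebfit(nodes, f(nodes), m)
    x = np.linspace(-1.0, 1.0, n_test)
    return np.max(np.abs(np.polynomial.chebyshev.chebval(x, coeffs) - f(x)))

runge = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)
errors = [cheb_interp_error(runge, m) for m in (10, 20, 40)]
print(errors)
```

Each doubling of the degree multiplies the accuracy by roughly the same geometric factor, mirroring the ρ_1^{−m} bound for functions analytic in an ellipse.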

11 |
Dimension independent bounds on the degree of approximation by neural networks
- Mhaskar, Micchelli
- 1994
Citation Context ...ting aspect of this result is that the order of magnitude of the number of neurons is independent of the number of variables on which the function depends. Other bounds of this nature are obtained in [13] when the activation function is not necessarily sigmoidal. A very common assumption about the function class is defined in terms of the number of derivatives that a function possesses. For example, o...

9 |
Approximation properties of a multilayered feedforward artificial neural network
- Mhaskar
- 1993
Citation Context ...t is bounded by 1. It is known (e.g. [6]) that any reasonable approximation scheme to provide an approximation order ε for all functions in this class must depend upon at least Ω(ε^{-s}) parameters. In [10], we showed how to construct networks with two hidden layers, each neuron evaluating a bounded sigmoidal function, to accomplish such an approximation order with O(ε^{-s}) neurons. Along with Micchelli...

4 |
Approximation of real functions using neural networks
- Mhaskar
- 1993
Citation Context ...ate different activation functions. A detailed discussion of the notion of localized approximation is not relevant within the context of this paper; we refer the reader to [4]. We made a conjecture in [11] that with a sigmoidal activation function, the number of neurons necessary to provide the approximation order ε to all functions in this class, with or without localization, cannot be O(ε^{-s}). In th...

3 |
From regularization to radial, tensor and additive splines
- Poggio, Girosi, et al.
- 1993

2 |
Some limitations on neural networks with one hidden layer
- Chui, Li, et al.
- 1995
Citation Context ...ult known so far for networks with a single hidden layer is that O(ε^{-s-1} log(1/ε)) neurons are enough if the activation function is the squashing function 1/(1 + e^{-x}). In our work with Chui and Li [4] we have shown that if s > 1 and the approximation is required to be "localized", then at least Ω(ε^{-s} log(1/ε)) neurons are necessary, even if different neurons may evaluate different activation functi...