Results 1  10
of
88
Networks of Spiking Neurons: The Third Generation of Neural Network Models
 Neural Networks
, 1997
"... The computational power of formal models for networks of spiking neurons is compared with that of other neural network models based on McCulloch Pitts neurons (i.e. threshold gates) respectively sigmoidal gates. In particular it is shown that networks of spiking neurons are computationally more powe ..."
Abstract

Cited by 138 (12 self)
 Add to MetaCart
The computational power of formal models for networks of spiking neurons is compared with that of other neural network models based on McCulloch Pitts neurons (i.e. threshold gates) respectively sigmoidal gates. In particular it is shown that networks of spiking neurons are computationally more powerful than these other neural network models. A concrete biologically relevant function is exhibited which can be computed by a single spiking neuron (for biologically reasonable values of its parameters), but which requires hundreds of hidden units on a sigmoidal neural net. This article does not assume prior knowledge about spiking neurons, and it contains an extensive list of references to the currently available literature on computations in networks of spiking neurons and relevant results from neurobiology. 1 Definitions and Motivations If one classifies neural network models according to their computational units, one can distinguish three different generations. The first generation i...
Statistical Behavior and Consistency of Classification Methods based on Convex Risk Minimization
, 2001
"... We study how close the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measurement of closeness is characterized by the loss function used in the estimation. ..."
Abstract

Cited by 112 (6 self)
 Add to MetaCart
We study how close the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measurement of closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can be generally regarded as a (non maximumlikelihood) conditional inclass probability estimate, and we use this analysis to compare various convex loss functions that have appeared in the literature. Furthermore, the theoretical insight allows us to design good loss functions with desirable properties. Another aspect of our analysis is to demonstrate the consistency of certain classification methods using convex risk minimization.
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
"... ..."
Fast Sigmoidal Networks via Spiking Neurons
 Neural Computation
, 1997
"... We show that networks of relatively realistic mathematical models for biological neurons can in principle simulate arbitrary feedforward sigmoidal neural nets in a way which has previously not been considered. This new approach is based on temporal coding by single spikes (respectively by the timing ..."
Abstract

Cited by 52 (8 self)
 Add to MetaCart
We show that networks of relatively realistic mathematical models for biological neurons can in principle simulate arbitrary feedforward sigmoidal neural nets in a way which has previously not been considered. This new approach is based on temporal coding by single spikes (respectively by the timing of synchronous firing in pools of neurons), rather than on the traditional interpretation of analog variables in terms of firing rates. The resulting new simulation is substantially faster and hence more consistent with experimental results about the maximal speed of information processing in cortical neural systems. As a consequence we can show that networks of noisy spiking neurons are "universal approximators" in the sense that they can approximate with regard to temporal coding any given continuous function of several variables. This result holds for a fairly large class of schemes for coding analog variables by firing times of spiking neurons. Our new proposal for the possible organiza...
Neural Networks for Optimal Approximation of Smooth and Analytic Functions
 Neural Computation
, 1996
"... . We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation function evaluated by each principal element satisfies certain technical conditions. Under these con ..."
Abstract

Cited by 43 (5 self)
 Add to MetaCart
. We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation function evaluated by each principal element satisfies certain technical conditions. Under these conditions, it is also possible to construct networks that provide a geometric order of approximation for analytic target functions. The permissible activation functions include the squashing function (1 + e x ) 1 as well as a variety of radial basis functions. Our proofs are constructive. The weights and thresholds of our networks are chosen independently of the target function; we give explicit formulas for the coe#cients as simple, continuous, linear functionals of the target function. 1. Introduction. In recent years, there has been a great deal of research in the theory of approximation of real valued functions using artificial neural networks with one or more hidden layers, with each pr...
Boosting with early stopping: convergence and consistency
 Annals of Statistics
, 2003
"... Abstract Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulted estimator takes an additive function form an ..."
Abstract

Cited by 42 (6 self)
 Add to MetaCart
Abstract Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulted estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on CV or a test set. This paper studies numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infinimum of the loss function over the linear span. Using the numerical convergence result, we find early stopping strategies under which boosting is shown to be consistent based on iid samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights to practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step sizes, as known in practice through the works of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with ffl! 0 stepsize becomes an L1margin maximizer when left to run to convergence. 1 Introduction In this paper we consider boosting algorithms for classification and regression. These algorithms present one of the major progresses in machine learning. In their original version, the computational aspect is explicitly specified as part of the estimator/algorithm. That is, the empirical minimization of an appropriate loss function is carried out in a greedy fashion, which means that at each step, a basis function that leads to the largest reduction of empirical risk is added into the estimator. This specification distinguishes boosting from other statistical procedures which are defined by an empirical minimization of a loss function without the numerical optimization details.
Approximation theory of the MLP model in neural networks
 ACTA NUMERICA
, 1999
"... In this survey we discuss various approximationtheoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are appr ..."
Abstract

Cited by 39 (3 self)
 Add to MetaCart
In this survey we discuss various approximationtheoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximationtheoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model. In the first
Extreme learning machine: A new learning scheme of feedforward neural networks
 IN PROC. INT. JOINT CONF. NEURAL NETW
, 2006
"... It is clear that the learning speed of feedforward neural networks is in general far slower than required and it has been a major bottleneck in their applications for past decades. Two key reasons behind may be: 1) the slow gradientbased learning algorithms are extensively used to train neural netwo ..."
Abstract

Cited by 37 (13 self)
 Add to MetaCart
It is clear that the learning speed of feedforward neural networks is in general far slower than required and it has been a major bottleneck in their applications for past decades. Two key reasons behind may be: 1) the slow gradientbased learning algorithms are extensively used to train neural networks, and 2) all the parameters of the networks are tuned iteratively by using such learning algorithms. Unlike these traditional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for singlehidden layer feedforward neural networks (SLFNs) which randomly chooses the input weights and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide the best generalization performance at extremely fast learning speed. The experimental results based on realworld benchmarking function approximation and classification problems including large complex applications show that the new algorithm can produce best generalization performance in some cases and can learn much faster than traditional popular learning algorithms for feedforward neural networks.
Extreme learning machine: Theory and applications
, 2006
"... It is clear that the learning speed of feedforward neural networks is in general far slower than required and it has been a major bottleneck in their applications for past decades. Two key reasons behind may be: (1) the slow gradientbased learning algorithms are extensively used to train neural net ..."
Abstract

Cited by 35 (5 self)
 Add to MetaCart
It is clear that the learning speed of feedforward neural networks is in general far slower than required and it has been a major bottleneck in their applications for past decades. Two key reasons behind may be: (1) the slow gradientbased learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by using such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for singlehidden layer feedforward neural networks (SLFNs) which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results based on a few artificial and real benchmark function approximation and classification problems including very large complex applications show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
On the Computational Power of WinnerTakeAll
, 2000
"... This article initiates a rigorous theoretical analysis of the computational power of circuits that employ modules for computing winnertakeall. Computational models that involve competitive stages have so far been neglected in computational complexity theory, although they are widely used in com ..."
Abstract

Cited by 33 (7 self)
 Add to MetaCart
This article initiates a rigorous theoretical analysis of the computational power of circuits that employ modules for computing winnertakeall. Computational models that involve competitive stages have so far been neglected in computational complexity theory, although they are widely used in computational brain models, artificial neural networks, and analog VLSI. Our theoretical analysis shows that winnertakeall is a surprisingly powerful computational module in comparison with threshold gates (= McCullochPitts neurons) and sigmoidal gates. We prove an optimal quadratic lower bound for computing winnertakeall in any feedforward circuit consisting of threshold gates. In addition we show that arbitrary continuous functions can be approximated by circuits employing a single soft winnertakeall gate as their only nonlinear operation. Our