Results 1 - 10
of
45
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
- Proceedings of the IEEE
, 1998
"... this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, ph ..."
Abstract
-
Cited by 193 (4 self)
- Add to MetaCart
this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the ill-defined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems
- IEEE Transactions on Neural Networks
, 1997
"... In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole ..."
Abstract
-
Cited by 47 (2 self)
- Add to MetaCart
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented. Keywords--- Constructive algorithm, structure learning, state space search, dynamic node creation, projection pursuit regression, cascade-correlation, resource-allocating network, group method of data handling. I. Introduction A. Problems with Fixed Size Networks I N recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
Global Optimization for Neural Network Training
- IEEE Computer
, 1996
"... In this paper, we study various supervised learning methods for training feed-forward neural networks. In general, such learning can be considered as a nonlinear global optimization problem in which the goal is to minimize a nonlinear error function that spans the space of weights using heuristic st ..."
Abstract
-
Cited by 38 (12 self)
- Add to MetaCart
In this paper, we study various supervised learning methods for training feed-forward neural networks. In general, such learning can be considered as a nonlinear global optimization problem in which the goal is to minimize a nonlinear error function that spans the space of weights using heuristic strategies that look for global optima (in contrast to local optima). We survey various global optimization methods suitable for neural-network learning, and propose the NOVEL method, a novel global optimization method for nonlinear optimization and neural network learning. By combining global and local searches, we show how NOVEL can be used to find a good local minimum in the error space. Our key idea is to use a user-defined trace that pulls a search out of a local minimum without having to restart it from a new starting point. Using five benchmark problems, we compare NOVEL against some of the best global optimization algorithms and demonstrate its superior improvement in performance. 1 In...
A review of dimension reduction techniques
, 1997
"... The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A cl ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.
Learning and Approximation Capabilities of Adaptive Spline Activation Function Neural Networks
- NEURAL NETWORKS
, 1998
"... In this paper, we study the theoretical properties of a new kind of artificial neural network, which is able to adapt its activation functions by varying the control points of a Catmull --Rom cubic spline. Most of all, we are interested in generalization capability, and we can show that our architec ..."
Abstract
-
Cited by 25 (17 self)
- Add to MetaCart
In this paper, we study the theoretical properties of a new kind of artificial neural network, which is able to adapt its activation functions by varying the control points of a Catmull --Rom cubic spline. Most of all, we are interested in generalization capability, and we can show that our architecture presents several advantages. First of all, it can be seen as a sub-optimal realization of the additive spline based model obtained by the reguralization theory. Besides, simulations confirm that the special learning mechanism allows to use in a very effective way the network's free parameters, keeping their total number at lower values than in networks with sigmoidal activation functions. Other notable properties are a shorter training time and a reduced hardware complexity, due to the surplus in the number of neurons. # 1998 Elsevier Science Ltd. All rights reserved. Keywords: Spline neural networks; Multilayer perceptron; Generalized sigmoidal functions; Adaptive activation functions...
Constructive Feedforward Neural Networks for Regression Problems: A Survey
, 1995
"... In this paper, we review the procedures for constructing feedforward neural networks in regression problems. While standard back-propagation performs gradient descent only in the weight space of a network with fixed topology, constructive procedures start with a small network and then grow additiona ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
In this paper, we review the procedures for constructing feedforward neural networks in regression problems. While standard back-propagation performs gradient descent only in the weight space of a network with fixed topology, constructive procedures start with a small network and then grow additional hidden units and weights until a satisfactory solution is found. The constructive procedures are categorized according to the resultant network architecture and the learning algorithm for the network weights. The Hong Kong University of Science & Technology Technical Report Series Department of Computer Science 1 Introduction In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among them, the class of multi-layer feedforward networks is perhaps the most popular. Standard back-propagation performs gradient descent only in the weight space of a network with fixed topology; this approach is analogous to ...
Multilayer Feedforward Networks with Adaptive Spline Activation Function
- IEEE Trans. on Neural Network
"... In this paper, a new adaptive spline activation function neural network (ASNN) is presented. Due to the ASNN's high representation capabilities, networks with a small number of interconnections can be trained to solve both pattern recognition and data processing real-time problems. The main idea is ..."
Abstract
-
Cited by 20 (18 self)
- Add to MetaCart
In this paper, a new adaptive spline activation function neural network (ASNN) is presented. Due to the ASNN's high representation capabilities, networks with a small number of interconnections can be trained to solve both pattern recognition and data processing real-time problems. The main idea is to use a Catmull--Rom cubic spline as the neuron's activation function, which ensures a simple structure suitable for both software and hardware implementation. Experimental results demonstrate improvements in terms of generalization capability and of learning speed in both pattern recognition and data processing tasks. Index Terms--- Adaptive activation functions, function shape autotuning, generalization, generalized sigmoidal functions, multilayer perceptron, neural networks, spline neural networks. I. INTRODUCTION I N both hardware and software neural network (NN) implementations, the complexity, both structural, in terms of interconnections, and computational, in terms of the number...
Mixture of Experts Regression Modeling by Deterministic Annealing
- IEEE Transactions on Signal Processing
, 1997
"... We propose a new learning algorithm for regression modeling. The method is especially suitable for optimizing neural network structures that are amenable to a statistical description as mixture models. These include mixture of experts, hierarchical mixture of experts (HME), and normalized radial bas ..."
Abstract
-
Cited by 18 (3 self)
- Add to MetaCart
We propose a new learning algorithm for regression modeling. The method is especially suitable for optimizing neural network structures that are amenable to a statistical description as mixture models. These include mixture of experts, hierarchical mixture of experts (HME), and normalized radial basis functions (NRBF). Unlike recent maximum likelihood (ML) approaches, we directly minimize the (squared) regression error. We use the probabilistic framework as means to define an optimization method that avoids many shallow local minima on the complex cost surface. Our method is based on deterministic annealing (DA), where the entropy of the system is gradually reduced, with the expected regression cost (energy) minimized at each entropy level. The corresponding Lagrangian is the system's "free-energy," and this annealing process is controlled by variation of the Lagrange multiplier, which acts as a "temperature" parameter. The new method consistently and substantially outperformed the com...
Spatial modelling using a new class of nonstationary covariance functions
- Environmetrics
, 2006
"... We introduce a new class of nonstationary covariance functions for spatial modelling. Nonstationary covariance functions allow the model to adapt to spatial surfaces whose variability changes with location. The class includes a nonstationary version of the Matérn stationary covariance, in which the ..."
Abstract
-
Cited by 18 (0 self)
- Add to MetaCart
We introduce a new class of nonstationary covariance functions for spatial modelling. Nonstationary covariance functions allow the model to adapt to spatial surfaces whose variability changes with location. The class includes a nonstationary version of the Matérn stationary covariance, in which the differentiability of the spatial surface is controlled by a parameter, freeing one from fixing the differentiability in advance. The class allows one to knit together local covariance parameters into a valid global nonstationary covariance, regardless of how the local covariance structure is estimated. We employ this new nonstationary covariance in a fully Bayesian model in which the unknown spatial process has a Gaussian process (GP) distribution with a nonstationary covariance function from the class. We model the nonstationary structure in a computationally efficient way that creates nearly stationary local behavior and for which stationarity is a special case. We also suggest non-Bayesian approaches to nonstationary kriging. To assess the method, we compare the Bayesian nonstationary GP model with a Bayesian stationary GP model, various standard spatial smoothing approaches, and nonstationary models that can adapt to function heterogeneity. In simulations, the nonstationary GP model adapts to function heterogeneity, unlike the stationary models, and also outperforms the other nonstationary models. On a real dataset, GP models outperform the competitors, but while the nonstationary GP gives qualitatively more sensible results, it fails to outperform the stationary GP on held-out data, illustrating the difficulty in fitting complex spatial functions with relatively few observations. The nonstationary covariance model could also be used for non-Gaussian data and embedded in additive models as well as in more complicated, hierarchical spatial or spatio-temporal models. More complicated models may require simpler parameterizations for computational efficiency.
Bayesian Wavelet Networks for Nonparametric Regression
, 1997
"... Radial wavelet networks have recently been proposed as a method for nonparametric regression. In this paper we analyse their performance within a Bayesian framework. We derive probability distributions over both the dimension of the networks and the network coefficients by placing a prior on the deg ..."
Abstract
-
Cited by 17 (6 self)
- Add to MetaCart
Radial wavelet networks have recently been proposed as a method for nonparametric regression. In this paper we analyse their performance within a Bayesian framework. We derive probability distributions over both the dimension of the networks and the network coefficients by placing a prior on the degrees of freedom of the model. This process bypasses the need to test or select a finite number of networks during the modelling process. Predictions are formed by mixing over many models of varying dimension and parameterization. We show that the complexity of the models adapts to the complexity of the data and produces good results on a number of benchmark test series. Keywords: Wavelets, radial basis functions, model choice, Bayesian neural networks, reversible jump Markov chain Monte Carlo, nonparametric regression, splines. 1 Introduction Wavelet networks have previously been studied in relation to nonparametric regression by Zhang (1997), Kugarajah and Zhang (1995), Zhang and Benveni...

