Results 1–10 of 45
A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization
, 1993
"... A parallel stochastic algorithm is investigated for errordescent learning and optimization in deterministic networks of arbitrary topology. No explicit information about internal network structure is needed. The method is based on the modelfree distributed learning mechanism of Dembo and Kaila ..."
Abstract

Cited by 35 (7 self)
A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology. No explicit information about internal network structure is needed. The method is based on the model-free distributed learning mechanism of Dembo and Kailath. A modified parameter update rule is proposed by which each individual parameter vector perturbation contributes a decrease in error, allowing substantially faster learning. Furthermore, the modified algorithm supports learning time-varying features in dynamical networks. We analyze the convergence and scaling properties of the algorithm, and present simulation results for dynamic trajectory learning in recurrent networks.

1 Background and Motivation We address general optimization tasks that require finding a set of constant parameter values p_i that minimize a given error functional E(p). For supervised learning, the error functional consists of some quantitativ...
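The perturbation scheme the abstract describes can be sketched in a few lines: perturb all parameters in parallel, measure the resulting change in error, and step against it. The following is a minimal illustrative sketch, with the error functional treated as a black box; all names and constants are ours, not the paper's.

```python
import numpy as np

def stochastic_error_descent(E, p, lr=0.01, sigma=1e-3, steps=200):
    """Model-free error descent: perturb all parameters at once, measure
    the change in error, and move against it. E(p) is a black box, so no
    knowledge of the network's internal structure is required.
    (Illustrative sketch; not the paper's exact update rule.)"""
    rng = np.random.default_rng(0)
    for _ in range(steps):
        pi = sigma * rng.choice([-1.0, 1.0], size=p.shape)  # parallel random perturbation
        dE = E(p + pi) - E(p)                               # measured error change
        p = p - lr * (dE / sigma**2) * pi                   # descend along the estimate
    return p

# Toy quadratic error functional with minimum at p = [1, -2]
E = lambda p: float(np.sum((p - np.array([1.0, -2.0]))**2))
p_star = stochastic_error_descent(E, np.zeros(2))
```

On average the update follows the gradient of E, even though only scalar error measurements are used.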
Neural Network Adaptations to Hardware Implementations
, 1997
"... In order to take advantage of the massive parallelism offered by artificial neural networks, hardware implementations are essential. However, most standard neural network models are not very suitable for implementation in hardware and adaptations are needed. In this section an overview is given of t ..."
Abstract

Cited by 15 (1 self)
In order to take advantage of the massive parallelism offered by artificial neural networks, hardware implementations are essential. However, most standard neural network models are not very suitable for implementation in hardware and adaptations are needed. In this section an overview is given of the various issues that are encountered when mapping an ideal neural network model onto a compact and reliable neural network hardware implementation, such as quantization, handling non-uniformities and non-ideal responses, and restraining computational complexity. Furthermore, a broad range of hardware-friendly learning rules is presented, which allow for simpler and more reliable hardware implementations. The relevance of these neural network adaptations to hardware is illustrated by their application in existing hardware implementations.
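One of the issues mentioned, weight quantization, can be illustrated with a symmetric uniform fixed-point quantizer. This is a generic sketch under our own assumptions (8-bit width, clipping range), not a specific scheme from the text.

```python
import numpy as np

def quantize_weights(w, n_bits=8, w_max=1.0):
    """Symmetric uniform quantization of a weight array: clip to
    [-w_max, w_max] and snap to a fixed-point grid with
    2**(n_bits - 1) - 1 levels on each side of zero."""
    q_max = 2 ** (n_bits - 1) - 1        # e.g. 127 integer levels for 8 bits
    scale = w_max / q_max                # value of one quantization step
    q = np.clip(np.round(w / scale), -q_max, q_max)
    return q * scale

w = np.array([0.123456, -0.87654, 1.5])  # 1.5 lies outside the range
wq = quantize_weights(w)                 # values now on the 8-bit grid
```

Out-of-range values are clipped to w_max, and in-range values move by at most half a quantization step.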
Frequency sensitive competitive learning for balanced clustering on high-dimensional hyperspheres
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2004
"... Competitive learning mechanisms for clustering in general suffer from poor performance for very high dimensional (> 1000) data because of “curse of dimensionality” effects. In applications such as document clustering, it is customary to normalize the high dimensional input vectors to unit length, a ..."
Abstract

Cited by 11 (7 self)
Competitive learning mechanisms for clustering in general suffer from poor performance for very high dimensional (> 1000) data because of “curse of dimensionality” effects. In applications such as document clustering, it is customary to normalize the high dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical k-means (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular k-means and its soft EM-based version, spkmeans tends to generate extremely imbalanced clusters in high dimensional spaces when the desired number of clusters is large (tens or more). In this paper, we first show that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact it can be considered as a batch mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency sensitive competitive learning variants that are applicable to static data and produce high-quality, well-balanced clusters for high-dimensional data. Like k-means, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. We also propose a frequency sensitive algorithm to cluster streaming data. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
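The plain spkmeans baseline the abstract starts from can be sketched as follows. This is our own minimal version with an explicitly supplied initialization; the paper's frequency sensitive variants additionally penalize large clusters to keep them balanced.

```python
import numpy as np

def spkmeans(X, k, init=None, iters=20):
    """Batch spherical k-means: inputs and cluster centers are kept on
    the unit hypersphere, and points are assigned by cosine similarity.
    (Minimal sketch, not the paper's frequency sensitive variants.)"""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-normalize inputs
    rng = np.random.default_rng(0)
    C = (np.array(init, dtype=float) if init is not None
         else X[rng.choice(len(X), size=k, replace=False)])
    for _ in range(iters):
        labels = np.argmax(X @ C.T, axis=1)            # cosine-similarity assignment
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                C[j] = m / np.linalg.norm(m)           # normalized mean = new center
    return labels, C

# Two tight bundles of directions, one near each coordinate axis
rng = np.random.default_rng(1)
A = rng.normal([5.0, 0.0, 0.0], 0.1, size=(20, 3))
B = rng.normal([0.0, 5.0, 0.0], 0.1, size=(20, 3))
labels, C = spkmeans(np.vstack([A, B]), k=2,
                     init=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

Normalizing both the inputs and the centers is what distinguishes spkmeans from regular k-means: distances reduce to cosine similarities, which behave better for sparse high-dimensional text vectors.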
Using Single Layer Networks for Discrete, Sequential Data: an Example from Natural Language Processing
 Neural Computing and Applications
, 1997
"... Natural Language Processing (NLP) is concerned with processing ordinary, unrestricted text. This work takes a new approach to a traditional NLP task, using neural computing methods. A parser which has been successfully implemented is described. It is a hybrid system, in which neural processors opera ..."
Abstract

Cited by 10 (10 self)
Natural Language Processing (NLP) is concerned with processing ordinary, unrestricted text. This work takes a new approach to a traditional NLP task, using neural computing methods. A parser which has been successfully implemented is described. It is a hybrid system, in which neural processors operate within a rule-based framework. The neural processing components belong to the class of Generalized Single Layer Networks (GSLN). In general, supervised, feedforward networks need more than one layer to process data. However, in some cases data can be preprocessed with a nonlinear transformation, and then presented in a linearly separable form for subsequent processing by a single layer net. Such networks offer advantages of functional transparency and operational speed. For our parser, the initial stage of processing maps linguistic data onto a higher order representation, which can then be analysed by a single layer network. This transformation is supported by information theoretic analysis. Three different algorithms for the neural component were investigated. Single layer nets can be trained by finding weight adjustments based on (a) factors proportional to the input, as in the Perceptron, (b) factors proportional to the existing weights, and (c) an error minimization method. In our experiments generalization ability varies little; method (b) is used for a prototype parser. This is available via telnet.
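The role of the nonlinear preprocessing step can be made concrete with a toy version of scheme (a), the perceptron rule: XOR is not linearly separable in its raw encoding, but a higher-order feature (the product x1*x2) makes it solvable by a single-layer unit. This is an illustrative sketch, not the parser's actual linguistic representation.

```python
import numpy as np

def perceptron_train(X, y, lr=0.5, epochs=200):
    """Single-layer threshold unit trained with the perceptron rule:
    weight adjustments proportional to the input, applied only on
    misclassified examples (scheme (a) above)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            out = 1.0 if xi @ w > 0 else 0.0
            w += lr * (yi - out) * xi      # adjust only when the output is wrong
    return w

# XOR, mapped to a higher-order representation [x1, x2, x1*x2, bias]
X_raw = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])
X = np.c_[X_raw, X_raw[:, 0] * X_raw[:, 1], np.ones(4)]
w = perceptron_train(X, y)
preds = (X @ w > 0).astype(float)
```

Once the data is linearly separable in the expanded representation, the perceptron convergence theorem guarantees the single-layer net finds a correct weight vector.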
Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
, 2004
"... Gradientfollowing learning methods can encounter problems of implementation in many applications, and stochastic variants are sometimes used to overcome these difficulties. We analyze three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and we ..."
Abstract

Cited by 6 (3 self)
Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are sometimes used to overcome these difficulties. We analyze three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. Learning speed is defined as the rate of exponential decay in the learning curves. When the scalar parameter that controls the size of weight updates is chosen to maximize learning speed, node perturbation is slower than direct gradient descent by a factor equal to the number of output units; weight perturbation is slower still by an additional factor equal to the number of input units. Parallel perturbation allows faster learning than sequential perturbation, by a factor that does not depend on network size. We also characterize how uncertainty in quantities used in the stochastic updates affects the learning curves. This study suggests that in practice, weight perturbation may be slow for large networks, and node perturbation can have performance comparable to that of direct gradient descent when there are few output units. However, these statements depend on the specifics of the learning problem, such as the input distribution and the target function, and are not universally applicable.
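For a concrete sense of node perturbation on the linear perceptron studied here, a minimal sketch follows: the unit outputs (rather than the weights) are perturbed, and the measured error change stands in for the error derivative at each output node. Function names, learning rate, and perturbation size are our own choices.

```python
import numpy as np

def node_perturbation_step(W, x, t, lr=0.02, sigma=1e-3, rng=None):
    """One node-perturbation update for a linear perceptron y = W x
    with squared error E = ||y - t||^2 / 2. Only scalar error
    measurements are used; no explicit gradient is computed."""
    rng = rng if rng is not None else np.random.default_rng(0)
    y = W @ x
    xi = sigma * rng.standard_normal(y.shape)             # perturb node outputs
    dE = 0.5 * np.sum((y + xi - t)**2) - 0.5 * np.sum((y - t)**2)
    return W - lr * (dE / sigma**2) * np.outer(xi, x)     # credit flows to weights via x

# Online learning of a random 2x3 linear map from input/target pairs
rng = np.random.default_rng(1)
W_true = rng.standard_normal((2, 3))
W = np.zeros((2, 3))
for _ in range(3000):
    x = rng.standard_normal(3)
    W = node_perturbation_step(W, x, W_true @ x, rng=rng)
```

In expectation the update equals the true gradient step, but the perturbation noise slows convergence, by a factor that grows with the number of output units, as the paper's analysis quantifies.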
A fast partial parse of natural language sentences using a connectionist method
 In 7th Conference of the European Chapter of the Association for Computational Linguistics
, 1995
"... method ..."
The Potential of Prototype Styles of Generalization
 In Proceedings of the Sixth Australian Joint Conference on Artificial Intelligence (AI'93)
, 1993
"... . There are many ways for a learning system to generalize from training set data. This paper presents several generalization styles using prototypes in an attempt to provide accurate generalization on training set data for a wide variety of applications. These generalization styles are efficient in ..."
Abstract

Cited by 6 (1 self)
There are many ways for a learning system to generalize from training set data. This paper presents several generalization styles using prototypes in an attempt to provide accurate generalization on training set data for a wide variety of applications. These generalization styles are efficient in terms of time and space, and lend themselves well to massively parallel architectures. Empirical results of generalizing on several real-world applications are given, and these results indicate that the prototype styles of generalization presented have the potential to provide accurate generalization for many applications.

1. Introduction There are many ways for a learning system to generalize from training set data. This paper proposes several generalization styles using prototypes in an attempt to provide accurate generalization on training set data for a wide variety of applications. These generalization styles are efficient in terms of time and space, and lend themselves well to massively ...
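The simplest style in this spirit, one prototype per class taken as the class mean with nearest-prototype classification, can be sketched as follows. This is an illustrative baseline, not the paper's exact set of generalization styles.

```python
import numpy as np

def fit_prototypes(X, y):
    """Represent each class by a single prototype: the mean of its
    training examples. Time and space are linear in the data, and the
    per-class means are trivially parallelizable."""
    classes = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, protos

def predict(X, classes, protos):
    """Classify each point by its nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.2]])
y = np.array([0, 0, 1, 1])
classes, protos = fit_prototypes(X, y)
preds = predict(np.array([[0.1, 0.0], [5.1, 4.9]]), classes, protos)
```

Richer prototype styles (several prototypes per class, adaptive placement) refine this scheme while keeping its time and space efficiency.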
Neural network design for a natural language parser
 In International Conference on Artificial Neural Networks (ICANN)
, 1995
"... The pattern matching capabilities of neural networks can be mobilised for an automated, natural language, partial parser. First, language complexity is addressed by decomposing the problem into more tractable subtasks. Second, a representation is devised that enables effective, single layer networks ..."
Abstract

Cited by 5 (4 self)
The pattern matching capabilities of neural networks can be mobilised for an automated, natural language, partial parser. First, language complexity is addressed by decomposing the problem into more tractable subtasks. Second, a representation is devised that enables effective, single layer networks to be used to map a predefined grammatical framework onto actual sentences. This paper examines data representation, network architecture and learning algorithms appropriate for linguistic data with their characteristic distributions. Users can access a working prototype via telnet on which they can try their own text.
Hardware-Friendly Learning Algorithms for Neural Networks: an Overview
, 1996
"... The hardware implementation of artificial neural networks and their learning algorithms is a fascinating area of research with farreaching applications. However, the mapping from an ideal mathematical model to compact and reliable hardware is far from evident. This paper presents an overview of var ..."
Abstract

Cited by 5 (0 self)
The hardware implementation of artificial neural networks and their learning algorithms is a fascinating area of research with far-reaching applications. However, the mapping from an ideal mathematical model to compact and reliable hardware is far from evident. This paper presents an overview of various methods that simplify the hardware implementation of neural network models. Adaptations that are proper to specific learning rules or network architectures are discussed. These range from the use of perturbation in multilayer feedforward networks and local learning algorithms to quantization effects in self-organizing feature maps. Moreover, in more general terms, the problems of inaccuracy, limited precision, and robustness are treated.