Results 1 - 10 of 28
The Evidence Framework applied to Classification Networks
Neural Computation, 1992
Cited by 152 (10 self)

Abstract
Three Bayesian ideas are presented for supervised adaptive classifiers. First, it is argued that the output of a classifier should be obtained by marginalising over the posterior distribution of the parameters; a simple approximation to this integral is proposed and demonstrated. This involves a `moderation' of the most probable classifier's outputs, and yields improved performance. Second, it is demonstrated that the Bayesian framework for model comparison described for regression models in (MacKay, 1992a, 1992b) can also be applied to classification problems. This framework successfully chooses the magnitude of weight decay terms, and ranks solutions found using different numbers of hidden units. Third, an information-based data selection criterion is derived and demonstrated within this framework.

1 Introduction

A quantitative Bayesian framework has been described for learning of mappings in feedforward networks (MacKay, 1992a, 1992b). It was demonstrated that this `evidence' fram...
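The `moderation' of the most probable classifier's outputs mentioned above can be sketched in a few lines. This is a minimal illustration of a common form of the approximation, assuming a single logistic output unit whose activation under the most probable weights is `a_mp`, with posterior variance `s2` on that activation (the variable names are ours, not the paper's):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def moderated_output(a_mp, s2):
    """Approximate the marginalised output P(t=1 | x, D) by
    sigma(kappa * a_mp) with kappa = 1 / sqrt(1 + pi * s2 / 8):
    large posterior uncertainty s2 pulls the output toward 0.5."""
    kappa = 1.0 / math.sqrt(1.0 + math.pi * s2 / 8.0)
    return sigmoid(kappa * a_mp)
```

With `s2 = 0` the moderated output coincides with the most probable network's output; as `s2` grows, the prediction is pulled toward the uninformative value 0.5, which is the "improved performance" effect the abstract describes.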
Connectionist Probability Estimation in HMM Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1992
Cited by 61 (16 self)

Abstract
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system.

Part I INTRODUCTION

Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
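The statistical understanding of networks as probability estimators that the report builds on can be sketched as a one-line application of Bayes' rule: a network trained to output state posteriors P(state | x) is converted into (scaled) likelihoods for HMM decoding by dividing by the state priors. A minimal illustration, with array shapes and names of our choosing:

```python
import numpy as np

def scaled_likelihoods(posteriors, priors):
    """Convert network posteriors P(state | x) into scaled likelihoods
    P(x | state) / P(x) = P(state | x) / P(state), the quantity used
    in place of emission probabilities in hybrid connectionist/HMM
    decoding (Bayes' rule; the common factor P(x) cancels in Viterbi)."""
    return np.asarray(posteriors, dtype=float) / np.asarray(priors, dtype=float)
```

Because the factor P(x) is the same for every state at a given frame, it cancels when comparing paths, so these scaled likelihoods can be used directly in the HMM search.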
Natural language grammatical inference with recurrent neural networks
IEEE Transactions on Knowledge and Data Engineering, 1998
Cited by 45 (1 self)

Abstract
This paper examines the inductive inference of a complex grammar with neural networks; specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky, in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. We discuss how a recurrent neural network could possess linguistic capability, and the properties of various common recurrent neural network architectures. The problem exhibits training behavior which is often not present with smaller grammars, and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient descent backpropagation-through-time training algorithm, significant learning was possible. It was found that certain architectures are better able to learn an appropriate grammar. The operation of the networks and their training is analyzed. Finally, the extraction of rules in the form of deterministic finite state automata is investigated.
Neural networks for classification: a survey
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2000
Cited by 45 (0 self)

Abstract
Classification is one of the most active research and application areas of neural networks. The literature is vast and growing. This paper summarizes some of the most important developments in neural network classification research. Specifically, the issues of posterior probability estimation, the link between neural and conventional classifiers, the learning and generalization tradeoff in classification, feature variable selection, as well as the effect of misclassification costs are examined. Our purpose is to provide a synthesis of the published research in this area and stimulate further research interests and efforts in the identified topics.

Index Terms: Bayesian classifier, classification, ensemble methods, feature variable selection, learning and generalization, misclassification costs, neural networks.
Conditional distribution learning with neural networks and its application to channel equalization
IEEE Transactions on Signal Processing, 1997
Cited by 26 (11 self)

Abstract
We present a conditional distribution learning formulation for real-time signal processing with neural networks based on a recent extension of maximum likelihood theory, partial likelihood (PL) estimation, which allows for i) dependent observations and ii) sequential processing. For a general neural network conditional distribution model, we establish a fundamental information-theoretic connection: the equivalence of maximum PL estimation and accumulated relative entropy (ARE) minimization, and obtain large sample properties of PL for the general case of dependent observations. As an example, the binary case with the sigmoidal perceptron as the probability model is presented. It is shown that the single and multilayer perceptron (MLP) models satisfy conditions for the equivalence of the two cost functions: ARE and negative log partial likelihood. The practical issue of their gradient descent minimization is then studied within the well-formed cost functions framework. It is shown that these are well-formed cost functions for networks without hidden units; hence, their gradient descent minimization is guaranteed to converge to a solution if one exists on such networks. The formulation is applied to adaptive channel equalization, and simulation results are presented to show the ability of the least relative entropy equalizer to realize complex decision boundaries and to recover during training from convergence at the wrong extreme in cases where the mean square error-based MLP equalizer cannot.
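For the no-hidden-unit case highlighted above, gradient descent on the negative log likelihood of a sigmoidal perceptron reduces to the familiar cross-entropy update. A minimal sketch, assuming independent binary targets rather than the paper's dependent-observation setting; the data, learning rate, and epoch count are illustrative:

```python
import numpy as np

def train_sigmoid_perceptron(X, y, lr=0.1, epochs=2000):
    """Gradient descent on the negative log likelihood (binary
    cross-entropy) of a single sigmoidal unit. With no hidden units
    this is a well-formed cost, so descent reaches a separating
    solution whenever one exists."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid outputs
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the NLL
    return w

# hypothetical linearly separable data: bias column plus one feature
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = train_sigmoid_perceptron(X, y)
```

On this separable toy set the learned weights place the decision boundary between the two classes; the paper's point is that this convergence guarantee does not automatically extend to networks with hidden units.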
Learning in Boltzmann Trees
Neural Computation, 1995
Cited by 25 (3 self)

Abstract
We introduce a large family of Boltzmann machines that can be trained using standard gradient descent. The networks can have one or more layers of hidden units, with tree-like connectivity. We show how to implement the supervised learning algorithm for these Boltzmann machines exactly, without resort to simulated or mean-field annealing. The stochastic averages that yield the gradients in weight space are computed by the technique of decimation. We present results on the problems of N-bit parity and the detection of hidden symmetries.

1 Introduction

Boltzmann machines (Ackley, Hinton, & Sejnowski, 1985) have several compelling virtues. Unlike simple perceptrons, they can solve problems that are not linearly separable. The learning rule, simple and locally based, lends itself to massive parallelism. The theory of Boltzmann learning, moreover, has a solid foundation in statistical mechanics. Unfortunately, Boltzmann machines as originally conceived also have some serious drawbacks...
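The decimation idea can be illustrated with the simplest case: a hidden +/-1 spin connected in series between two neighbours. Summing it out exactly leaves an effective direct weight, via a standard Ising-model identity; the function name is ours, and this is only the series rule, not the paper's full recursive scheme for trees:

```python
import math

def decimate_series(j1, j2):
    """Exactly sum out an unbiased +/-1 spin coupled to two
    neighbours by weights j1 and j2. The effective direct weight
    j' between the neighbours satisfies
        tanh(j') = tanh(j1) * tanh(j2),
    which preserves the joint distribution of the remaining spins."""
    return math.atanh(math.tanh(j1) * math.tanh(j2))
```

Applying such rules leaf-to-root lets the stochastic averages needed for the gradients be computed exactly on tree-connected networks, which is why no simulated or mean-field annealing is required.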
Principles for an Integrated Connectionist/Symbolic Theory of Higher Cognition
1992
Cited by 22 (5 self)

Abstract
The main claim of this paper is that connectionism offers cognitive science a number of excellent opportunities for turning methodological, theoretical, and metatheoretical schisms into powerful integrations, opportunities for forging constructive synergy out of the destructive interference which plagues the field. The paper begins with an analysis of the rifts in the field and what it would take to overcome them. We argue that while connectionism has often contributed to the deepening of these schisms, it is nonetheless possible to turn this trend around: possible for connectionism to play a central role in a unification of cognitive science. Essential to this process is the development of strong theoretical principles founded (in part) on connectionist computation; a main goal of this paper is to demonstrate that such principles are indeed within the reach of a connectionist-grounded theory of cognition. The enterprise rests on a willingness to entertain, analyze, and extend characterizations of cognitive problems, and hypothesized solutions, which are deliberately overly simple and general, in order to discover the insights they can offer through the mathematical analyses which this simplicity and generality make possible. In this ...
Equivalence of backpropagation and contrastive Hebbian learning in a layered network
Cited by 9 (0 self)

Abstract
Backpropagation and contrastive Hebbian learning are two methods of training networks with hidden neurons. Backpropagation computes an error signal for the output neurons and spreads it over the hidden neurons. Contrastive Hebbian learning involves clamping the output neurons at desired values, and letting the effect spread through feedback connections over the entire network. To investigate the relationship between these two forms of learning, we consider a special case in which they are identical: a multilayer perceptron with linear output units, to which weak feedback connections have been added. In this case, the change in network state caused by clamping the output neurons turns out to be the same as the error signal spread by backpropagation, except for a scalar prefactor. This suggests that the functionality of backpropagation can be realized alternatively by a Hebbian-type learning algorithm, which is suitable for implementation in biological networks.
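The scalar-prefactor relationship can be illustrated for a single linear output layer with weak symmetric feedback. This is a toy sketch under our own simplifying assumptions (one relaxation step, feedback weights equal to eps times the transposed forward weights), not the paper's full derivation:

```python
import numpy as np

def backprop_hidden_error(W, y, d):
    """Error signal that backpropagation sends to the hidden layer
    when the output layer is linear: y = W h, target d."""
    return W.T @ (d - y)

def chl_hidden_shift(W, h, d, eps):
    """Shift of the hidden state when the output is clamped at d and
    the result is fed back through weak symmetric connections eps*W.T
    (one relaxation step of the linear network):
        h_clamped - h_free = eps * W.T @ (d - W h)."""
    y = W @ h
    return eps * W.T @ (d - y)
```

In this linearized setting the clamped-minus-free state difference is exactly `eps` times the backpropagated error signal, the scalar prefactor the abstract refers to; Hebbian updates on the clamped and free states therefore reproduce backpropagation's weight changes up to that factor.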
On the Applicability of Neural Network and Machine Learning Methodologies to Natural Language Processing
1995
Cited by 8 (3 self)

Abstract
We examine the inductive inference of a complex grammar; specifically, we consider the task of training a model to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. We investigate the following models: feedforward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent networks, Elman, Narendra & Parthasarathy, and Williams & Zipser recurrent networks, Euclidean and edit-distance nearest-neighbors, simulated annealing, and decision trees. The feedforward neural networks and non-neural network machine learning models are included primarily for comparison. We address the question: How can a neural network, with its distributed nature and gradient descent based iterative calculations, possess linguistic capability which is traditionally handled with symbolic computation and recursive processes? Initial...
Combination Of Confidence Measures In Isolated Word Recognition
1998
Cited by 8 (1 self)

Abstract
In the context of command-and-control applications, we exploit confidence measures in order to classify single-word utterances into two categories: utterances within the vocabulary which are recognized correctly, and other utterances, namely out-of-vocabulary (OOV) or misrecognized utterances.