Results 1-7 of 7
Discriminative Training of Hidden Markov Models
, 1998
Cited by 20 (0 self)
Contents:
  Abbreviations
  Notation
  1 Introduction
  2 Hidden Markov Models
    2.1 Definition
    2.2 HMM Modelling Assumptions
    2.3 HMM Topology
    2.4 Finding the Best Transcription
    2.5 Setting the Parameters
    2.6 Summary
  3 Objective Functions
    3.1 Properties of Maximum Likelihood Estimators
    3.2 Maximum Likelihood
    3.3 Maximum Mutual Information
    3.4 Frame Discrimination ...
Classification-Based Objective Functions
 Machine Learning. In
, 2007
Cited by 5 (2 self)
Backpropagation, similar to most learning algorithms that can form complex decision surfaces, is prone to overfitting. This work presents classification-based objective functions, an intuitive approach to training artificial neural networks on classification problems. Classification-based learning attempts to guide the network directly to correct pattern classification rather than using an implicit search of common error minimization heuristics, such as sum-squared-error (SSE) and cross-entropy (CE). CB1 is presented here as a novel objective function for learning classification problems. It seeks to directly minimize classification error by backpropagating error only on misclassified patterns from culprit output nodes. CB1 discourages weight saturation and overfitting and achieves higher accuracy on classification problems than optimizing SSE or CE. Experiments on a large OCR data set have shown CB1 to significantly increase generalization accuracy over SSE or CE optimization, from 97.86% and 98.10%, respectively, to 99.11%. Comparable results are achieved over several data sets from the UC Irvine Machine Learning Database Repository, with an average increase in accuracy from 90.7% and 91.3% using optimized SSE and CE networks, respectively, to 92.1% for CB1. Analysis indicates that CB1 performs a fundamentally different search of the feature space than optimizing SSE or CE and produces significantly different solutions.
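The abstract above does not give the CB1 formula, so the following is only a rough sketch of the idea it describes: error is produced only when a pattern is misclassified, and only for the "culprit" output nodes. The function name `cb1_style_errors` and the exact error values (differences between the target output and the competing outputs) are assumptions made for illustration, not the authors' definition.

```python
def cb1_style_errors(outputs, target):
    """Per-node error signals for one pattern (illustrative assumption).

    outputs: list of output-node activations
    target:  index of the correct class
    A correctly classified pattern yields all zeros, so nothing would be
    backpropagated for it.
    """
    t_out = outputs[target]
    highest_rival = max(o for k, o in enumerate(outputs) if k != target)
    errors = [0.0] * len(outputs)
    if t_out > highest_rival:
        return errors  # correct classification: no error signal
    # Target node is pushed up toward the strongest competing output.
    errors[target] = highest_rival - t_out
    # Only non-target nodes that tie or beat the target count as culprits.
    for k, o in enumerate(outputs):
        if k != target and o >= t_out:
            errors[k] = t_out - o
    return errors


correct = cb1_style_errors([0.25, 0.75, 0.5], target=1)
wrong = cb1_style_errors([0.75, 0.5, 0.25], target=1)
```

In this sketch the third node in `wrong` receives no error because its output is already below the target's, which is one way to read "only from culprit output nodes".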
Sensitivity Analysis for Selective Learning by Feedforward Neural Networks
, 2001
Cited by 3 (1 self)
Research on improving the performance of feedforward neural networks has concentrated mostly on the optimal setting of initial weights and learning parameters, sophisticated optimization techniques, architecture optimization, and adaptive activation functions. An alternative approach is presented in this paper where the neural network dynamically selects training patterns from a candidate training set during training, using the network's current attained knowledge about the target concept. Sensitivity analysis of the neural network output with respect to small input perturbations is used to quantify the informativeness of candidate patterns. Only the most informative patterns, which are those patterns closest to decision boundaries, are selected for training. Experimental results show a significant reduction in the training set size, without negatively influencing generalization performance and convergence characteristics. This approach to selective learning is then compared to an alternative where informativeness is measured as the magnitude in prediction error.
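A minimal sketch of sensitivity-based pattern selection, under stated assumptions: sensitivity is approximated here by finite differences of a scalar model output with respect to each input, and the `keep` highest-scoring candidates are retained. The names `sensitivity`, `select_informative`, and `toy_model`, the perturbation size, and the top-k rule are all hypothetical; the paper's actual selection criterion is not given in the abstract.

```python
import math


def sensitivity(model, x, eps=1e-4):
    """Norm of the finite-difference derivative of model output w.r.t. x."""
    base = model(x)
    total = 0.0
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps  # small perturbation of one input
        total += ((model(bumped) - base) / eps) ** 2
    return math.sqrt(total)


def select_informative(model, candidates, keep):
    """Keep the candidates whose output reacts most to input perturbations."""
    ranked = sorted(candidates, key=lambda x: sensitivity(model, x), reverse=True)
    return ranked[:keep]


def toy_model(x):
    """Stand-in for a trained network: a single sigmoid unit."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[1])))


candidates = [[0.0, 0.0], [3.0, 0.0], [0.5, 1.0], [-4.0, 1.0]]
selected = select_informative(toy_model, candidates, keep=2)
```

For the toy sigmoid unit, the two retained patterns are the ones lying on its decision boundary (where `2*x[0] - x[1] = 0`), which matches the abstract's remark that the most informative patterns are those closest to decision boundaries.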
Improving Speech Recognition Learning through Lazy Training
Cited by 1 (1 self)
Backpropagation, like most high-order learning algorithms, is prone to overfitting. We present a novel approach, called lazy training, for reducing the overfit in multiple-output networks. Lazy training has been shown to reduce the error of optimized neural networks by more than half on a large OCR data set and on several problems from the UCI machine learning database. Here, lazy training is shown to be effective in a multilayered adaptive learning system, reducing the error of an optimized backpropagation network in a speech recognition system by 55.0% on the TIDIGITS corpus.
Efficient Perceptron Learning Using Constrained Steepest Descent
Cited by 1 (0 self)
An algorithm is proposed for training the single-layered perceptron. The algorithm follows successive steepest descent directions with respect to the perceptron cost function, taking care not to increase the number of misclassified patterns. The problem of finding these directions is stated as a quadratic programming task, to which a fast and effective solution is proposed. The resulting algorithm has no free parameters and therefore no heuristics are involved in its application. It is proved that the algorithm always converges in a finite number of steps. For linearly separable problems, it always finds a hyperplane that completely separates patterns belonging to different categories. Termination of the algorithm without separating all given patterns means that the presented set of patterns is indeed linearly inseparable. Thus the algorithm provides a natural criterion for linear separability. Compared to other state-of-the-art algorithms, the proposed method exhibits substantially improved speed, as demonstrated in a number of demanding benchmark classification tasks.
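To make the ingredients named in the abstract concrete, the sketch below computes the standard perceptron cost (the sum of negated margins over misclassified patterns), its steepest descent direction, and a step that is accepted only if the number of misclassified patterns does not increase. The guard is a simple backtracking stand-in with hypothetical parameters (`step`, `shrink`, `max_tries`); it is not the quadratic programming formulation the paper proposes, and all function names are assumptions.

```python
def margin(w, x, y):
    """Signed margin y * (w . x); positive means correctly classified."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def misclassified(w, data):
    return [(x, y) for x, y in data if margin(w, x, y) <= 0]


def perceptron_cost(w, data):
    """Perceptron criterion: sum of -margin over misclassified patterns."""
    return sum(-margin(w, x, y) for x, y in misclassified(w, data))


def descent_direction(w, data):
    """Negative gradient of the perceptron cost: sum of y * x over errors."""
    d = [0.0] * len(w)
    for x, y in misclassified(w, data):
        for i, xi in enumerate(x):
            d[i] += y * xi
    return d


def guarded_step(w, data, step=1.0, shrink=0.5, max_tries=20):
    """Move along the descent direction, shrinking the step until the
    misclassification count does not increase (backtracking stand-in)."""
    before = len(misclassified(w, data))
    d = descent_direction(w, data)
    for _ in range(max_tries):
        trial = [wi + step * di for wi, di in zip(w, d)]
        if len(misclassified(trial, data)) <= before:
            return trial
        step *= shrink
    return w  # no acceptable step found


# Logical AND with a constant bias input; labels are +1 / -1.
data = [([0.0, 0.0, 1.0], -1), ([0.0, 1.0, 1.0], -1),
        ([1.0, 0.0, 1.0], -1), ([1.0, 1.0, 1.0], 1)]
w = [0.0, 0.0, 0.0]
for _ in range(50):
    w = guarded_step(w, data)
```

On this small linearly separable example the guarded steps reach a weight vector with no misclassified patterns; the sketch carries none of the convergence guarantees claimed for the paper's algorithm.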
DOI 10.1007/s10994-006-6266-6 Classification-based objective functions
, 2006
Backpropagation, similar to most learning algorithms that can form complex decision surfaces, is prone to overfitting. This work presents classification-based objective functions, an approach to training artificial neural networks on classification problems. Classification-based learning attempts to guide the network directly to correct pattern classification rather than using common error minimization heuristics, such as sum-squared error (SSE) and cross-entropy (CE), that do not explicitly minimize classification error. CB1 is presented here as a novel objective function for learning classification problems. It seeks to directly minimize classification error by backpropagating error only on misclassified patterns from culprit output nodes. CB1 discourages weight saturation and overfitting and achieves higher accuracy on classification problems than optimizing SSE or CE. Experiments on a large OCR data set have shown CB1 to significantly increase generalization accuracy over SSE or CE optimization, from 97.86% and 98.10%, respectively, to 99.11%. Comparable results are achieved over several data sets from the UC Irvine Machine Learning Database Repository, with an average increase in accuracy from 90.7% and 91.3% using optimized SSE and CE networks, respectively, to 92.1% for CB1. Analysis indicates that CB1 performs a fundamentally different search of the feature space than optimizing SSE or CE and produces significantly different solutions.
Efficient Perceptron Learning Using Constrained Steepest Descent
An algorithm is proposed for training the single-layered perceptron. The algorithm follows successive steepest descent directions with respect to the perceptron cost function, taking care not to increase the number of misclassified patterns. The problem of finding these directions is stated as a quadratic programming task, to which a fast and effective solution is proposed. The resulting algorithm has no free parameters and therefore no heuristics are involved in its application. It is proved that the algorithm always converges in a finite number of steps. For linearly separable problems, it always finds a hyperplane that completely separates patterns belonging to different categories. Termination of the algorithm without separating all given patterns means that the presented set of patterns is indeed linearly inseparable. Thus the algorithm provides a natural criterion for linear separability. Compared to other state-of-the-art algorithms, the proposed method exhibits substantially im...