Results 1 – 5 of 5
Discriminative Training of Hidden Markov Models
, 1998
"... vi Abbreviations vii Notation viii 1 Introduction 1 2 Hidden Markov Models 4 2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 HMM Modelling Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 HMM Topology . . . . . . . . . ..."
Abstract

Cited by 20 (0 self)
... vi
Abbreviations vii
Notation viii
1 Introduction 1
2 Hidden Markov Models 4
  2.1 Definition 4
  2.2 HMM Modelling Assumptions 6
  2.3 HMM Topology 7
  2.4 Finding the Best Transcription 7
  2.5 Setting the Parameters 10
  2.6 Summary 18
3 Objective Functions 19
  3.1 Properties of Maximum Likelihood Estimators 19
  3.2 Maximum Likelihood 24
  3.3 Maximum Mutual Information 25
  3.4 Frame Discrimination ...
Using a financial training criterion rather than a prediction criterion
 International Journal of Neural Systems
, 1997
"... noisy time series The application of this work is to decision taking with nancial timeseries, using learning algorithms. The traditional approach is to train a model using a prediction criterion, such as minimizing the squared error between predictions and actual values of a dependent variable, or ..."
Abstract

Cited by 6 (2 self)
noisy time series. The application of this work is to decision taking with financial time series, using learning algorithms. The traditional approach is to train a model using a prediction criterion, such as minimizing the squared error between predictions and actual values of a dependent variable, or maximizing the likelihood of a conditional model of the dependent variable. We find here, with noisy time series, that better results can be obtained when the model is trained directly to maximize the financial criterion of interest, here gains and losses (including those due to transactions) incurred during trading. Experiments were performed on portfolio selection with 35 Canadian stocks.
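The idea of optimizing the financial criterion directly, rather than a prediction error, can be sketched as follows. This is a minimal illustration under assumptions made here, not the paper's implementation: the linear position model, the tanh position squashing, the transaction-cost value, and the synthetic data are all invented for the sketch.

```python
import numpy as np

# Illustrative synthetic data: features X and next-step returns r.
rng = np.random.default_rng(0)
T, k = 200, 3
X = rng.normal(size=(T, k))
r = X @ np.array([0.5, -0.2, 0.1]) + 0.1 * rng.normal(size=T)

def profit(w, X, r, cost=0.001):
    """Trading gains minus transaction costs for positions tanh(X @ w).

    This is the objective being maximized -- the financial criterion --
    instead of a squared prediction error on r.
    """
    pos = np.tanh(X @ w)                       # position in [-1, 1] each step
    trades = np.abs(np.diff(pos, prepend=0.0)) # turnover incurs costs
    return np.sum(pos * r) - cost * np.sum(trades)

# Gradient ascent on the financial criterion itself, using
# central finite differences for simplicity.
w = np.zeros(k)
eps, lr = 1e-5, 0.05
for _ in range(300):
    g = np.array([
        (profit(w + eps * e, X, r) - profit(w - eps * e, X, r)) / (2 * eps)
        for e in np.eye(k)
    ])
    w += lr * g

# The trained strategy should out-earn the do-nothing strategy (w = 0).
final_profit = profit(w, X, r)
```

Note that the transaction-cost term makes the objective depend on consecutive positions, which a per-step prediction loss cannot capture; this is one concrete reason the financial criterion and the prediction criterion can disagree.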
Why Error Measures are SubOptimal for Training Neural Network Pattern Classifiers
, 1992
"... Pattern classifiers that are trained in a supervisedfashion (e.g., multilayer perceptrons, radial basis functions, etc.) are typically trained with an error measure objective function such as meansquared error (MSE) or crossentropy (CE). These classifiers can in theory yield (optimal) Bayesian di ..."
Abstract

Cited by 4 (0 self)
Pattern classifiers that are trained in a supervised fashion (e.g., multilayer perceptrons, radial basis functions, etc.) are typically trained with an error-measure objective function such as mean-squared error (MSE) or cross-entropy (CE). These classifiers can in theory yield (optimal) Bayesian discrimination, but in practice they often fail to do so. We explain why this happens. In so doing, we identify a number of characteristics that the optimal objective function for training classifiers must have. We show that classification figures of merit (CFM_mono) possess these optimal characteristics, whereas error measures such as MSE and CE do not. We illustrate our arguments with a simple example in which a CFM_mono-trained low-order polynomial neural network approximates Bayesian discrimination on a random scalar with the fewest training samples and the minimum functional complexity necessary for the task. A comparable MSE-trained net yields significantly worse discriminati...
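The distinction between CFM-style and error-measure objectives can be sketched concretely. The sigmoid-of-margin form below is an assumption for illustration (the paper's exact CFM_mono functional form may differ); the point it demonstrates is that a margin-based figure of merit depends only on how far the correct-class output exceeds the best incorrect one, while MSE also penalizes the absolute output values.

```python
import numpy as np

def cfm_mono(outputs, labels, alpha=1.0):
    """Per-sample figure of merit: a sigmoid of the classification margin
    delta = (correct-class output) - (largest incorrect-class output).
    Only the margin matters, not the absolute output values."""
    n = outputs.shape[0]
    correct = outputs[np.arange(n), labels]
    masked = outputs.copy()
    masked[np.arange(n), labels] = -np.inf   # exclude the correct class
    delta = correct - masked.max(axis=1)     # the classification margin
    return 1.0 / (1.0 + np.exp(-alpha * delta))

# Two output vectors with the same margin (0.4) for class 0,
# but different absolute values:
o1 = np.array([[0.9, 0.5]])
o2 = np.array([[0.6, 0.2]])
y = np.array([0])
target = np.array([[1.0, 0.0]])   # one-hot target for MSE

same_cfm = np.isclose(cfm_mono(o1, y), cfm_mono(o2, y))               # same merit
diff_mse = np.mean((o1 - target) ** 2) != np.mean((o2 - target) ** 2)  # different error
```

Both vectors classify the sample identically (and with the same margin), yet MSE ranks them differently; this is the kind of mismatch between error measures and discrimination that the abstract argues against.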
Differential Theory of Learning for Efficient Neural Network Pattern Recognition
 in Applications of Artificial Neural Networks IV
, 1965
"... We describe a new theory of differential learning by which a broad family of pattern classifiers (including many wellknown neural network paradigms) can learn stochastic concepts efficiently. We describe the relationship between a classifier's ability to generalize well to unseen test examples and ..."
Abstract
We describe a new theory of differential learning by which a broad family of pattern classifiers (including many well-known neural network paradigms) can learn stochastic concepts efficiently. We describe the relationship between a classifier's ability to generalize well to unseen test examples and the efficiency of the strategy by which it learns. We list a series of proofs that differential learning is efficient in its information and computational resource requirements, whereas traditional probabilistic learning strategies are not. The proofs are illustrated by a simple example that lends itself to closed-form analysis. We conclude with an optical character recognition task for which three different types of differentially generated classifiers generalize significantly better than their probabilistically generated counterparts. 1 DIFFERENTIAL LEARNING A differentiable supervised classifier is one that learns an input-to-output mapping by adjusting a set of internal parameters θ via...
Differentially Generated Neural Network Classifiers Are Efficient
"... Differential learning for statistical pattern classification is described in [5]; it is based on the classification figureofmerit (CFM) objective function described in [9, 5]. We prove that differential learning is asymptotically efficient, guaranteeing the best generalization allowed by the choic ..."
Abstract
Differential learning for statistical pattern classification is described in [5]; it is based on the classification figure-of-merit (CFM) objective function described in [9, 5]. We prove that differential learning is asymptotically efficient, guaranteeing the best generalization allowed by the choice of hypothesis class (see below) as the training sample size grows large, while requiring the least classifier complexity necessary for Bayesian (i.e., minimum probability-of-error) discrimination. Moreover, differential learning almost always guarantees the best generalization allowed by the choice of hypothesis class for small training sample sizes.