Results 1 - 3 of 3
Structured Support Vector Machines for Speech Recognition
, 2014
Abstract

Cited by 1 (1 self)
Discriminative training criteria and discriminative models are two effective improvements for HMM-based speech recognition. This thesis proposed a structured support vector machine (SSVM) framework suitable for medium to large vocabulary continuous speech recognition. An important aspect of structured SVMs is the form of features. Several previously proposed features in the field are summarized in this framework. Since some of these features can be extracted based on generative models, this provides an elegant way of combining generative and discriminative models. To apply the structured SVMs to continuous speech recognition, a number of issues need to be addressed. First, features require a segmentation to be specified. To incorporate the optimal segmentation into the training process, the training algorithm is modified making use of the concave-convex optimisation procedure. A Viterbi-style algorithm is described for inferring the optimal segmentation based on discriminative parameters. Second, structured SVMs can be viewed as large margin log-linear models using a zero mean Gaussian prior of the discriminative parameter. However, this form of
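The large-margin training the abstract refers to rests on the standard margin-rescaled structured hinge loss. The sketch below illustrates that generic loss only; the toy feature vectors and task losses are illustrative assumptions, not the thesis's actual acoustic features or word-error losses.

```python
import numpy as np

def structured_hinge(w, phi_correct, phis_competing, losses):
    """Margin-rescaled structured hinge loss.

    w              -- parameter vector
    phi_correct    -- joint feature vector of the reference labelling
    phis_competing -- feature vectors of competing labellings
    losses         -- task loss (e.g. word error) of each competitor
    """
    violations = [
        loss + w @ phi - w @ phi_correct          # loss + score gap
        for phi, loss in zip(phis_competing, losses)
    ]
    return max(0.0, max(violations))

# Toy example: two competing labellings against one reference.
w = np.array([1.0, -0.5])
phi_ref = np.array([2.0, 1.0])            # reference score: 1.5
competitors = [np.array([1.0, 0.0]),      # score 1.0, loss 1 -> violation 0.5
               np.array([0.0, 2.0])]      # score -1.0, loss 2 -> violation -0.5
print(structured_hinge(w, phi_ref, competitors, [1.0, 2.0]))  # 0.5
```

Training minimises this loss plus a quadratic regulariser on `w`, which is what makes the equivalence to a zero-mean Gaussian prior on the parameters mentioned in the abstract go through.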
Annotating large lattices with the exact word error
Abstract
The acoustic model in modern speech recognisers is trained discriminatively, for example with the minimum Bayes risk. This criterion is hard to compute exactly, so that it is normally approximated by a criterion that uses fixed alignments of lattice arcs. This approximation becomes particularly problematic with new types of acoustic models that require flexible alignments. It would be best to annotate lattices with the risk measure of interest, the exact word error. However, the algorithm for this uses finite-state automaton determinisation, which has exponential complexity and runs out of memory for large lattices. This paper introduces a novel method for determinising and minimising finite-state automata incrementally. Since it uses less memory, it can be applied to larger lattices. Index Terms: speech recognition, discriminative training, minimum Bayes risk
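The exponential blow-up the abstract mentions comes from classic subset construction, where each determinised state is a set of original states. A minimal unweighted sketch (the tiny example automaton is an illustrative assumption, not a speech lattice):

```python
# Subset construction for determinising a finite-state automaton: the
# worst-case-exponential step that the paper's incremental method targets.

def determinise(nfa, start, accept):
    """nfa: dict mapping (state, symbol) -> set of next states."""
    alphabet = {sym for (_, sym) in nfa}
    start_set = frozenset([start])
    states, todo, dfa = {start_set}, [start_set], {}
    while todo:
        current = todo.pop()
        for sym in alphabet:
            nxt = frozenset(s for q in current for s in nfa.get((q, sym), ()))
            if not nxt:
                continue
            dfa[(current, sym)] = nxt
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    accepting = {s for s in states if s & accept}
    return dfa, states, accepting

# NFA accepting strings over {a, b} whose second-to-last symbol is 'a';
# determinising the n-th-from-last version needs 2**n states (here 4).
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0},
       (1, 'a'): {2}, (1, 'b'): {2}}
dfa, states, accepting = determinise(nfa, 0, {2})
print(len(states))  # 4
```

For weighted lattices the construction additionally pushes weights through each state subset, which is what makes determinising large lattices so memory-hungry in practice.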
noise-robust speech recognition
, 2010
Abstract
Model compensation techniques for noise-robust speech recognition approximate the corrupted speech distribution. This work introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to compensate a speech recognition system, it enables a more fine-grained assessment of compensation techniques, based on the KL divergences to the ideal compensation for individual components. The KL divergence appears to predict the word error rate well. This technique also makes it possible to evaluate the impact of approximations that compensation schemes make. For example, this work examines the influence of the assumption that the corrupted speech distribution is Gaussian and diagonalising that Gaussian's covariance. It also assesses the impact of a common approximation to the mismatch function for VTS compensation, namely setting the
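The core operation named in the abstract, sequential importance resampling, draws samples from a tractable proposal, reweights them by the target density, and resamples in proportion to the weights. A minimal one-dimensional sketch; the toy Gaussian target and proposal are illustrative assumptions, not the paper's speech and noise models:

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, weights):
    """Resample particles in proportion to their importance weights."""
    w = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    # After resampling, all particles carry equal weight.
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Importance sampling of the mean of N(1, 1) using proposal N(0, 2).
n = 20000
x = rng.normal(0.0, np.sqrt(2.0), size=n)      # draw from the proposal
log_target = -0.5 * (x - 1.0) ** 2             # unnormalised N(1, 1)
log_proposal = -0.25 * x ** 2                  # unnormalised N(0, 2)
w = np.exp(log_target - log_proposal)          # importance weights
x_res, w_res = sir_step(x, w)
print(round(x_res.mean(), 1))                  # close to 1.0
```

Normalising constants cancel when the weights are renormalised, which is why both densities can be used unnormalised; the same property is what lets such estimators target an intractable corrupted-speech likelihood.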