Support vector machines for speech recognition
- Proceedings of the International Conference on Spoken Language Processing
, 1998
Cited by 47 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Large Vocabulary Decoding And Confidence Estimation Using Word Posterior Probabilities
- IN PROC. ICASSP 2000
, 2000
Cited by 34 (1 self)
This paper investigates the estimation of word posterior probabilities based on word lattices and presents applications of these posteriors in a large vocabulary speech recognition system. A novel approach to integrating these word posterior probability distributions into a conventional Viterbi decoder is presented. The problem of the robust estimation of confidence scores from word posteriors is examined and a method based on decision trees is suggested. The effectiveness of these techniques is demonstrated on the broadcast news and the conversational telephone speech corpora where improvements both in terms of word error rate and normalised cross entropy were achieved compared to the baseline HTK evaluation systems.
The CU-HTK March 2000 Hub5E Transcription System
, 2000
Cited by 18 (1 self)
This paper describes the Cambridge University HTK (CU-HTK) system developed for the NIST March 2000 evaluation of English conversational telephone speech transcription (Hub5E). A range of new features have been added to the HTK system used in the 1998 Hub5 evaluation, and the changes taken together have resulted in an 11% relative decrease in word error rate on the 1998 evaluation test set. Major changes include the use of maximum mutual information estimation in training as well as conventional maximum likelihood estimation; the use of a full variance transform for adaptation; the inclusion of unigram pronunciation probabilities; and word-level posterior probability estimation using confusion networks for use in minimum word error rate decoding, confidence score estimation and system combination. On the March 2000 Hub5 evaluation set the CU-HTK system gave an overall word error rate of 25.4%, which was the best performance by a statistically significant margin. This paper describes th...
The 2005 AMI system for the transcription of speech
- in Proc. MLMI’05
, 2005
Cited by 18 (3 self)
The automatic processing of speech collected in conference style meetings has attracted considerable interest with several large scale projects devoted to this area. This paper describes the development of a baseline automatic speech transcription system for meetings in the context of the AMI (Augmented Multiparty Interaction) project. We present several techniques important to the processing of this data and show the performance in terms of word error rates (WERs). An important aspect of transcription of this data is the necessary flexibility in terms of audio pre-processing. Real world systems have to deal with flexible input, for example by using microphone arrays or randomly placed microphones in a room. Automatic segmentation and microphone array processing techniques are described and the effect on WERs is discussed. The system and its components presented in this paper yield competitive performance and form a baseline for future research in this domain.
Transcription of Conference Room Meetings: an Investigation
- IN PROCEEDINGS INTERSPEECH
, 2005
Cited by 15 (8 self)
The automatic processing of speech collected in conference style meetings has attracted considerable interest with several large scale projects devoted to this area. In this paper we explore the use of various meeting corpora for the purpose of automatic speech recognition. In particular we investigate the similarity of these resources and how to use them efficiently in the construction of a meeting transcription system. The analysis shows distinctive features for each resource. However, the benefit in pooling data, and hence the similarity, seems sufficient to speak of a generic "conference meeting domain". In this context this paper also presents work on development for the AMI meeting transcription system, a joint effort by seven sites working on the AMI (augmented multi-party interaction) project.
Factor analysed hidden Markov models for Speech Recognition
- COMPUTER SPEECH AND LANGUAGE
, 2004
Cited by 12 (6 self)
Recently various techniques to improve the correlation model of feature vector elements in speech recognition systems have been proposed. Such techniques include semi-tied covariance HMMs and systems based on factor analysis. All these schemes have been shown to improve the speech recognition performance without dramatically increasing the number of model parameters compared to standard diagonal covariance Gaussian mixture HMMs. This paper introduces a general form of acoustic model, the factor analysed HMM. A variety of configurations of this model and parameter sharing schemes, some of which correspond to standard systems, were examined. An EM algorithm for the parameter optimisation is presented along with a number of methods to increase the efficiency of training. The performance of FAHMMs on medium to large vocabulary continuous speech recognition tasks was investigated. The experiments show that without elaborate complexity control an equivalent or better performance compared to a standard diagonal covariance Gaussian mixture HMM system can be achieved with considerably fewer parameters.
Automatic Transcription of Conversational Telephone Speech - Development of the CU-HTK 2002 System
- IEEE Transactions on Acoustics, Speech and Signal Processing
, 2003
Cited by 11 (2 self)
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modelling and model training, language and pronunciation modelling is presented. These include the use of conversation side based cepstral normalisation, vocal tract length normalisation, heteroscedastic linear discriminant analysis for feature projection, Minimum Phone Error Training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation and class based language models.
Linear Gaussian models for speech recognition
- CAMBRIDGE UNIVERSITY
, 2004
Cited by 10 (0 self)
Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated them is not a good assumption for speech recognition. State space models may be used to address some shortcomings of this assumption. State space models are based on a continuous state vector evolving through time according to a state evo-
Assessment of Dialogue Systems By Means of a New Simulation Technique
, 2002
Cited by 8 (1 self)
In recent years, a question of great interest has been the development of tools and techniques to facilitate the evaluation of dialogue systems. The latter can be evaluated from various points of view, such as recognition and understanding rates, dialogue naturalness and robustness against recognition errors. Evaluation usually requires compiling a large corpus of words and sentences uttered by users, relevant to the application domain the system is designed for. This paper proposes a new technique that makes it possible to reuse such a corpus for the evaluation and to check the performance of the system when different dialogue strategies are used. The technique is based on the automatic generation of conversations between the dialogue system, together with an additional dialogue system that represents the user's interaction with the dialogue system. The technique has been applied to evaluate a dialogue system developed in our lab using two different recognition front-ends and two different dialogue strategies to handle user confirmations. The experiments show that the prompt-dependent recognition front-end achieves better results, but that this front-end is appropriate only if users limit their utterances to those related to the current system prompt. The prompt-independent front-end achieves inferior results, but enables front-end users to utter any permitted utterance at any time, independently of the system prompt. In consequence, this front-end may allow a more natural and comfortable interaction. The experiments also show that the re-prompting confirmation strategy enhances system performance for both recognition front-ends.
Statistical Modelling in Continuous Speech Recognition (CSR)
- IN CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 2001
Cited by 7 (1 self)
Automatic continuous speech recognition (CSR) is sufficiently

