Results 1 - 2 of 2
Active Learning for acoustic speech recognition modeling, 2004
Cited by 3 (0 self)
In this work, we investigate a machine learning approach to training acoustic models for speech recognition cost-effectively. Specifically, we use an active learning method that lets the learner control which new data are introduced into training, so that we can invest selectively in the resources needed to provide the truth labels required to train the models. We propose a two-pronged approach to improving speech recognition performance through the selective use of training data. First, we make effective use of the available transcribed data by using only those examples that are likely to improve system performance. Second, we focus future transcription effort on the data with the greatest potential to improve performance. Our approach can select a set of data from which to build a recognition system that outperforms a system built on a larger, but randomly selected, set of data. We begin our investigation of the proposed data-selective methods with a simple alphadigit recognition problem. We demonstrate both a model-selective and a sequence-selective approach appropriate for situations in which whole words are modeled independently of
LANGUAGE MODEL TRANSFORMATION APPLIED TO LIGHTLY SUPERVISED TRAINING OF ACOUSTIC MODEL FOR CONGRESS MEETINGS
Cited by 1 (1 self)
For effective training of acoustic and language models for spontaneous speech such as meetings, it is important to exploit texts that are available on a large scale but may not be faithful transcripts of the utterances. We have proposed a language model transformation scheme to cope with the differences between verbatim transcripts of spontaneous utterances and human-made transcripts such as those in proceedings. In this paper, we investigate its application to lightly supervised training of the acoustic model. By transforming the corresponding text in the proceedings, we can generate a highly constrained model to predict the actual utterances. An experimental evaluation with the transcription system for the Japanese Congress meetings demonstrated that the proposed scheme can generate accurate labels for acoustic model training and thus achieves ASR (Automatic Speech Recognition) performance comparable to that obtained with manual transcripts. Index Terms: speech recognition, language model, acoustic model, lightly supervised training

