Results 1 - 10 of 18
Discriminative language modeling with conditional random fields and the perceptron algorithm
In Proc. ACL, 2004
Cited by 41 (4 self)
This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite-state automata, and are applied by intersecting the automata with word lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.
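To make the perceptron idea concrete, here is a minimal Python sketch of perceptron reranking with n-gram features. It is an illustration under stated assumptions, not the paper's implementation: it reranks n-best lists instead of intersecting weighted finite-state automata with lattices, keeps a fixed weight on the baseline recognizer score, and uses the plain (non-averaged) perceptron update. The helper names (`ngram_features`, `word_errors`, `score`, `train_perceptron`) and the toy data are hypothetical. Each update moves the weights toward the oracle (lowest-error) hypothesis and away from the current top-scoring one, so only n-grams that occur in misranked hypotheses ever receive non-zero weight, which is one way to see why the perceptron ends up with a relatively small feature set.

```python
from collections import Counter


def ngram_features(words, n=2):
    """Count all 1..n-gram features of a word sequence, with boundary markers."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[tuple(padded[i:i + order])] += 1
    return feats


def word_errors(hyp, ref):
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != r)))
        prev = cur
    return prev[-1]


def score(weights, base_weight, cand):
    """Linear model: scaled baseline score plus learned n-gram feature weights."""
    words, baseline_score = cand
    feats = ngram_features(words)
    return base_weight * baseline_score + sum(weights.get(f, 0.0) * c for f, c in feats.items())


def train_perceptron(data, epochs=2, base_weight=1.0):
    """data: list of (nbest, reference); nbest is a list of (words, baseline_score)."""
    weights = {}
    for _ in range(epochs):
        for nbest, ref in data:
            predicted = max(nbest, key=lambda c: score(weights, base_weight, c))
            oracle = min(nbest, key=lambda c: word_errors(c[0], ref))
            if predicted[0] != oracle[0]:
                # Reward oracle n-grams, penalize n-grams of the wrong winner.
                for f, c in ngram_features(oracle[0]).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in ngram_features(predicted[0]).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights


# Hypothetical toy data: the baseline score wrongly prefers "a cat sat".
toy = [([(["the", "cat", "sat"], -1.0), (["a", "cat", "sat"], -0.5)], ["the", "cat", "sat"])]
w = train_perceptron(toy)
best = max(toy[0][0], key=lambda c: score(w, 1.0, c))
print(best[0])  # ['the', 'cat', 'sat']
```

After a single update the shared n-grams cancel out, and only the distinguishing ones (such as `("the",)` and `("<s>", "the")`) carry weight, which is enough to flip the ranking.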
A Segmental CRF Approach to Large Vocabulary Continuous Speech Recognition
Cited by 14 (7 self)
Abstract—This paper proposes a segmental conditional random field framework for large vocabulary continuous speech recognition. Fundamental to this approach is the use of acoustic detectors as the basic input, and the automatic construction of a versatile set of segment-level features. The detector streams operate at multiple time scales (frame, phone, multi-phone, syllable or word) and are combined at the word level in the CRF training and decoding processes. A key aspect of our approach is that features are defined at the word level, and are naturally geared to explain long-span phenomena such as formant trajectories, duration, and syllable stress patterns. Generalization to unseen words is possible through the use of decomposable consistency features [1], [2], and our framework allows for the joint or separate discriminative training of the acoustic and language models. An initial evaluation of this framework with voice search data from the Bing Mobile (BM) application results in a 2% absolute improvement over an HMM baseline. Index Terms—speech recognition, conditional random field, direct modeling, detector features
Improving Language Models by Learning from Speech Recognition Errors in a Reading Tutor That Listens
In Proceedings of the Second International Conference on Applied Artificial Intelligence, Fort Panhala, 2003
Cited by 9 (2 self)
Lowering the perplexity of a language model does not always translate into higher speech recognition accuracy. Our goal is to improve language models by learning from speech recognition errors. In this paper we present an algorithm that first learns to predict which n-grams are likely to increase recognition errors, and then uses that prediction to improve language models so that the errors are reduced. We show that our algorithm reduces a measure of tracking error by more than 24% on unseen test data from a Reading Tutor that listens to children read aloud.
Large-scale discriminative n-gram language models for statistical machine translation
In Proceedings of AMTA, 2008
Cited by 7 (3 self)
We extend discriminative n-gram language modeling techniques originally proposed for automatic speech recognition to a statistical machine translation task. In this context, we propose a novel data selection method that leads to good models using a fraction of the training data. We carry out systematic experiments on several benchmark tests for Chinese-to-English translation using a hierarchical phrase-based machine translation system, and show that a discriminative language model significantly improves upon a state-of-the-art baseline. The experiments also highlight the benefits of our data selection method.
Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition
In Proc. ICASSP’07, 2007
Cited by 4 (1 self)
Finite-state decoding graphs integrate the decision trees, pronunciation model and language model for speech recognition into a unified representation of the search space. We explore discriminative training of the transition weights in the decoding graph in the context of large vocabulary speech recognition. In preliminary experiments on the RT-03 English Broadcast News evaluation set, the word error rate was reduced by about 5.7% relative, from 23.0% to 21.7%. We discuss how this method is particularly applicable to low-latency and low-resource applications such as real-time closed captioning of broadcast news and interactive speech-to-speech translation. Index Terms — Discriminative training, Finite-state decoding graph, Language model, Pronunciation model, Low-resource speech recognition.
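The relative figure follows from the absolute one: (23.0 - 21.7) / 23.0 ≈ 5.7%, i.e. 1.3 WER points. The short Python sketch below spells out that arithmetic together with the usual word error rate definition (substitutions + deletions + insertions, divided by the number of reference words, computed via Levenshtein distance). The function names are illustrative, and scoring tools used in evaluations apply their own text normalization that this sketch ignores.

```python
def edit_distance(hyp, ref):
    """Minimum number of substitutions, deletions and insertions turning hyp into ref."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # extra hypothesis word (insertion error)
                           cur[j - 1] + 1,          # missing reference word (deletion error)
                           prev[j - 1] + (h != r)))  # match or substitution
        prev = cur
    return prev[-1]


def wer(hyp, ref):
    """Word error rate in percent."""
    return 100.0 * edit_distance(hyp, ref) / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


print(round(23.0 - 21.7, 1))                     # 1.3 (absolute, in WER points)
print(round(relative_reduction(23.0, 21.7), 1))  # 5.7 (relative, in percent)
print(round(wer("the cat sat".split(), "the cat sat down".split()), 1))  # 25.0
```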
Exploiting User Feedback for Language Model Adaptation in Meeting Recognition
Cited by 3 (3 self)
We investigate language model (LM) adaptation in a meeting recognition application, where the LM is adapted based on recognition output from relevant prior meetings and partial manual corrections. Unlike previous work, which has considered either completely unsupervised or supervised adaptation, we investigate a scenario where a human (e.g., a meeting participant) can correct some of the recognition mistakes. We find that recognition accuracy using the adapted LM can be enhanced substantially by partial correction. In particular, if all content words (about half of all recognition errors) are corrected, recognition improves to the same accuracy as if completely error-free (manually created) transcriptions had been used for adaptation. We also compare and combine a variety of adaptation methods, including linear interpolation, unigram marginal adaptation, and a discriminative method based on “positive” and “negative” N-grams. Index Terms — speech processing, language modeling, meeting recognition, unsupervised adaptation, user feedback.
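Linear interpolation, the first adaptation method listed, mixes an in-domain model with a background model: P(w) = λ·P_adapt(w) + (1 - λ)·P_background(w). The Python sketch below shows this for unigram models only; the token lists standing in for background text and for (corrected) output of prior meetings are hypothetical, and λ = 0.5 is an arbitrary choice. Real systems interpolate full n-gram models and typically tune λ on held-out data.

```python
from collections import Counter


def unigram_mle(tokens):
    """Maximum-likelihood unigram distribution estimated from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_adapt, p_background, lam):
    """P(w) = lam * P_adapt(w) + (1 - lam) * P_background(w), over the union vocabulary."""
    vocab = set(p_adapt) | set(p_background)
    return {w: lam * p_adapt.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
            for w in vocab}


background = unigram_mle("the the a budget".split())  # hypothetical generic text
adapt = unigram_mle("budget review budget".split())   # hypothetical prior-meeting output
mixed = interpolate(adapt, background, lam=0.5)
print(round(mixed["budget"], 4))  # 0.4583
```

Because both inputs sum to one, the mixture does too, and words seen only in the adaptation data (here "review") get non-zero probability.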
An overview of discriminative training for speech recognition
Cited by 1 (0 self)
This paper gives an overview of discriminative training as it pertains to the speech recognition problem. The basic theory of discriminative training will be discussed and an explanation of maximum mutual information (MMI) given. Common problems inherent to discriminative training will be explored, as well as practicalities associated with implementing discriminative training for large vocabulary recognition. Alternatives to the MMI objective function such as minimum word error (MWE) and minimum phone error (MPE) will be discussed. The application of discriminative techniques to adaptation will be described. Finally, possible avenues for future research will be suggested.
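For reference, here are the MMI and MPE objectives in their commonly cited forms. The notation is ours, not necessarily the paper's: R training utterances with observations O_r and reference transcriptions w_r, acoustic model parameters λ, language model probability P(w), and an acoustic scale κ that large vocabulary systems often apply (conventions for where κ enters vary). MMI raises the posterior of the reference relative to all competing hypotheses; MPE maximizes the expected raw phone accuracy A(w, w_r) of hypotheses under the same scaled posterior (MWE is identical with word accuracy in place of phone accuracy).

```latex
\begin{align}
\mathcal{F}_{\mathrm{MMI}}(\lambda) &= \sum_{r=1}^{R} \log
  \frac{p_\lambda(O_r \mid w_r)^{\kappa}\, P(w_r)}
       {\sum_{w} p_\lambda(O_r \mid w)^{\kappa}\, P(w)} \\
\mathcal{F}_{\mathrm{MPE}}(\lambda) &= \sum_{r=1}^{R} \sum_{w}
  \frac{p_\lambda(O_r \mid w)^{\kappa}\, P(w)}
       {\sum_{w'} p_\lambda(O_r \mid w')^{\kappa}\, P(w')}\, A(w, w_r)
\end{align}
```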
Leveraging Multiple Query Logs to Improve Language Models for Spoken Query Recognition
Cited by 1 (1 self)
A voice search system requires a speech interface that can correctly recognize spoken queries uttered by users. The recognition performance strongly relies on a robust language model. In this work, we present the use of multiple data sources, with the focus on query logs, in improving ASR language models for a voice search application. Our contributions are threefold: (1) the use of text queries from web search and mobile search in language modeling; (2) the use of web click data to predict query forms from business listing forms; and (3) the use of voice query logs in creating a positive feedback loop. Experiments show that by leveraging these resources, we can achieve recognition performance comparable to, or even better than, that of a previously deployed system that used a large amount of transcribed spoken queries for language modeling. Index Terms — language modeling, voice search, query log, click data
Discriminative Pruning of Language Models for Chinese Word Segmentation
Cited by 1 (0 self)
This paper presents a discriminative pruning method for the n-gram language models used in Chinese word segmentation. To reduce the size of the language model used in a Chinese word segmentation system, the importance of each bigram is computed with a discriminative pruning criterion related to the performance loss caused by pruning that bigram. We then propose a step-by-step growing algorithm to build a language model of the desired size. Experimental results show that the discriminative pruning method yields a much smaller model than the state-of-the-art pruning method: at the same Chinese word segmentation F-measure, the number of bigrams in the model can be reduced by up to 90%. The correlation between language model perplexity and word segmentation performance is also discussed.
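The abstract does not spell out the growing algorithm, so the following Python sketch shows only one plausible reading: start with no bigrams (the model backs off to unigrams) and repeatedly add the remaining bigram with the highest importance until the target size is reached. The `importance(bigram, selected)` callback is hypothetical; in the paper it would be the discriminative criterion tied to segmentation performance loss, which may or may not be re-evaluated as the model grows. The scores and example bigrams are made up for illustration.

```python
def grow_bigram_set(candidates, importance, target_size):
    """Greedily add the most important remaining bigram until target_size are kept.

    importance(bigram, selected) is a hypothetical callback that scores a bigram
    given the bigrams already kept, so scores may change as the model grows.
    """
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < target_size:
        # Tie-break on the bigram itself so the result is deterministic.
        best = max(remaining, key=lambda b: (importance(b, selected), b))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, hypothetical importance scores (e.g. estimated F-measure loss if the
# bigram were pruned); here they do not depend on the current selection.
scores = {("中国", "人民"): 3.0, ("人民", "银行"): 2.0, ("的", "是"): 0.5}
kept = grow_bigram_set(scores, lambda b, selected: scores[b], target_size=2)
print(kept)  # [('中国', '人民'), ('人民', '银行')]
```

With static scores this reduces to keeping the top-k bigrams; passing the current selection to the callback is what would let a selection-dependent criterion change the ranking step by step.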
Discriminative Training Methods for Language Models Using Conditional Entropy Criteria
Cited by 1 (0 self)
This paper addresses the problem of discriminative training of language models that does not require any transcribed acoustic data. We propose to minimize the conditional entropy of word sequences given phone sequences, and present two settings in which this criterion can be applied. In an inductive learning setting, the phonetic/acoustic confusability information is given by a general phone error model. A transductive approach, in contrast, obtains that information by running a speech recognizer on test-set acoustics, with the goal of optimizing the test-set performance. Experiments show significant recognition accuracy improvements in both rescoring and first-pass decoding experiments using the transductive approach, and mixed results using the inductive approach. Index Terms — Discriminative training, language model, unsupervised training, conditional entropy.
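The criterion is the conditional entropy H(W | P) = -Σ_p P(p) Σ_w P(w | p) log P(w | p), where the posterior combines the language model with phonetic confusability: P(w | p) ∝ P_LM(w)·P(p | w). The Python sketch below only evaluates this quantity (it does not optimize the language model) over a hypothetical, tiny candidate set. The confusion likelihoods stand in for either a general phone error model (the inductive setting) or recognizer output on test acoustics (the transductive setting), and all numbers are illustrative.

```python
import math


def posterior(lm_prob, confusion_lik):
    """P(w | phones) proportional to P_LM(w) * P(phones | w), over the candidate set."""
    joint = {w: lm_prob[w] * lik for w, lik in confusion_lik.items()}
    z = sum(joint.values())
    return {w: v / z for w, v in joint.items()}


def conditional_entropy(phone_data, lm_prob):
    """H(W | P) in nats.

    phone_data: list of (weight, {word_seq: P(phones | word_seq)}) pairs, where the
    weights are the (empirical) probabilities of the phone sequences and sum to 1.
    """
    h = 0.0
    for weight, confusion_lik in phone_data:
        post = posterior(lm_prob, confusion_lik)
        h -= weight * sum(p * math.log(p) for p in post.values() if p > 0)
    return h


# One phone sequence that is acoustically equally compatible with two word sequences.
data = [(1.0, {"recognize speech": 0.2, "wreck a nice beach": 0.2})]
flat_lm = {"recognize speech": 0.5, "wreck a nice beach": 0.5}
sharp_lm = {"recognize speech": 0.9, "wreck a nice beach": 0.1}
print(round(conditional_entropy(data, flat_lm), 3))   # 0.693 (= ln 2)
print(round(conditional_entropy(data, sharp_lm), 3))  # 0.325
```

Lower entropy means the language model resolves the acoustic confusion more decisively; minimizing H(W | P) with respect to the language model parameters is the training criterion the abstract describes.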

