Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class-Dependent Mixtures
Proc. HLT-NAACL 2003
Cited by 36 (8 self)
Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show not only how training data can be supplemented with text from the web, filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
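A minimal sketch of class-dependent interpolation, assuming (the abstract does not say) that the mixture weights are conditioned on a class of the preceding word. Conditioning on the history rather than on the predicted word keeps the mixture a proper distribution, because each class's weight vector sums to 1. The component models, `word_class` mapping, class names, and weight values are hypothetical toy stand-ins for real N-gram models trained on conversational and web text.

```python
from typing import Callable, Dict, List, Sequence

# A component LM maps (history, word) to P(word | history). Hypothetical interface.
LanguageModel = Callable[[Sequence[str], str], float]


def class_dependent_prob(
    history: Sequence[str],
    word: str,
    lms: List[LanguageModel],
    weights: Dict[str, List[float]],
    word_class: Callable[[str], str],
) -> float:
    """P(w | h) = sum_s lambda_s(c) * P_s(w | h), where c is the class of the last history word."""
    c = word_class(history[-1]) if history else "<s>"
    lambdas = weights[c]  # one weight per component; assumed to sum to 1 for each class
    return sum(lam * lm(history, word) for lam, lm in zip(lambdas, lms))


# Toy unigram "LMs" standing in for a conversational-speech model and a web-text model.
def conv_lm(history: Sequence[str], word: str) -> float:
    return {"yeah": 0.5, "the": 0.5}.get(word, 0.0)


def web_lm(history: Sequence[str], word: str) -> float:
    return {"yeah": 0.1, "the": 0.9}.get(word, 0.0)


def toy_class(word: str) -> str:
    # Hypothetical two-class scheme: disfluency fillers vs. everything else.
    return "FILLER" if word in {"uh", "um"} else "OTHER"


# After a filler, trust the conversational model more; otherwise lean on web text.
WEIGHTS = {"FILLER": [0.8, 0.2], "OTHER": [0.3, 0.7], "<s>": [0.5, 0.5]}

p = class_dependent_prob(["uh"], "yeah", [conv_lm, web_lm], WEIGHTS, toy_class)
print(round(p, 2))  # 0.42
```

In practice the per-class weights would be estimated on held-out in-domain data rather than set by hand.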
Distributed language modeling for N-best list re-ranking
In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 2006
Cited by 9 (2 self)
In this paper we describe a novel distributed language model for N-best list re-ranking. The model is based on the client/server paradigm, where each server hosts a portion of the data and provides information to the client. This model allows an arbitrarily large corpus to be used very efficiently. It also provides a natural platform for relevance weighting and selection. We applied this model to a 2.97-billion-word corpus and re-ranked the N-best list from Hiero, a state-of-the-art phrase-based system. Using BLEU as the metric, the re-ranked translation achieves a relative improvement of 4.8%, significantly better than the model-best translation.
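The abstract does not describe the client/server protocol or the smoothing used, so the following is only a sketch under stated assumptions: each server returns raw n-gram counts for its own shard, the client sums counts across shards, probabilities use add-alpha smoothing (swapped in here purely to avoid log(0)), and hypotheses are re-ranked by a base decoder score plus a weighted LM log-probability. `ShardServer`, `Client`, `rerank`, `alpha`, and `lm_weight` are hypothetical names, and the servers are simulated as in-process objects rather than networked processes.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


class ShardServer:
    """Hypothetical in-process stand-in for a server hosting one shard of the corpus."""

    def __init__(self, sentences: List[List[str]], order: int) -> None:
        self.counts: Counter = Counter()
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            for n in range(1, order + 1):
                for i in range(len(toks) - n + 1):
                    self.counts[tuple(toks[i : i + n])] += 1

    def count(self, ngram: NGram) -> int:
        return self.counts[ngram]


class Client:
    """Sums counts from every shard, so no single machine needs the whole corpus."""

    def __init__(self, servers: List[ShardServer], order: int, vocab_size: int, alpha: float = 0.1) -> None:
        self.servers, self.order, self.vocab_size, self.alpha = servers, order, vocab_size, alpha

    def total(self, ngram: NGram) -> int:
        return sum(server.count(ngram) for server in self.servers)

    def logprob(self, sentence: Sequence[str]) -> float:
        toks = ["<s>"] + list(sentence) + ["</s>"]
        lp = 0.0
        for i in range(1, len(toks)):
            hist = tuple(toks[max(0, i - self.order + 1) : i])
            # Add-alpha smoothing (an assumption; keeps unseen n-grams from scoring log(0)).
            num = self.total(hist + (toks[i],)) + self.alpha
            den = self.total(hist) + self.alpha * self.vocab_size
            lp += math.log(num / den)
        return lp


def rerank(nbest: List[Tuple[List[str], float]], client: Client, lm_weight: float = 1.0) -> List[List[str]]:
    """Sort hypotheses by base decoder score plus weighted LM log-probability, best first."""
    scored = [(base + lm_weight * client.logprob(hyp), hyp) for hyp, base in nbest]
    return [hyp for _, hyp in sorted(scored, key=lambda pair: pair[0], reverse=True)]


servers = [ShardServer([["the", "cat", "sat"]], order=2), ShardServer([["the", "cat", "ran"]], order=2)]
client = Client(servers, order=2, vocab_size=5)  # the, cat, sat, ran, </s>
print(rerank([(["cat", "the", "sat"], 0.0), (["the", "cat", "sat"], 0.0)], client)[0])
```

Because only counts cross the shard boundary, adding data means adding servers rather than enlarging any single in-memory model.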
Automatic induction of language model data for a spoken dialogue system
In Proceedings of SIGDIAL, 2005
Cited by 5 (3 self)
When building a new spoken dialogue application, large amounts of domain-specific data are required. This paper addresses the issue of generating in-domain training data when little or no real user data are available. The two-stage approach begins with a data induction phase, in which linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. The second stage samples the synthetic corpus with the goal of obtaining data that are representative of the statistics of application-specific real user interactions. The proposed sampling methods employ an example-based generation framework, a simulated user model, and information extracted from development data. Evaluation is conducted on recognition performance in a restaurant information domain. We show that word error rate can be reduced when limited amounts of real user training data are augmented with synthetic data derived by our methods.
Text normalization with varied data sources for conversational speech language modeling
In Proc. ICASSP, 2002
Cited by 3 (3 self)
Collecting sufficient language model training data for good speech recognition performance in a new domain is often difficult. However, there may be other sources of data that are matched in terms of topic or style, if not both. This paper looks at the use of text normalization tools to make these data more suitable for language model training, in conjunction with mixture models to combine data from different sources. We specifically address the task of recognizing meeting speech, showing a small reduction in word error rate over a baseline language model trained on conversational speech data.
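The abstract does not say how the mixture weights were set; a common choice for combining language models trained on different sources is to estimate linear-interpolation weights with expectation-maximization (EM) on held-out in-domain text. The sketch below assumes each component model has already scored every held-out token and that the components are smoothed, so no token gets probability zero under every model. The function name and toy numbers are hypothetical.

```python
from typing import List, Sequence


def em_mixture_weights(probs: Sequence[Sequence[float]], iters: int = 50) -> List[float]:
    """Estimate linear-interpolation weights by EM on held-out data.

    probs[t][s] is P_s(w_t | h_t), the probability component model s assigns to
    held-out token t. Components are assumed smoothed, so at least one entry per
    row is positive.
    """
    k = len(probs[0])
    lambdas = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        expected = [0.0] * k
        for row in probs:
            mix = sum(lam * p for lam, p in zip(lambdas, row))
            for s in range(k):
                # E-step: posterior responsibility of component s for this token.
                expected[s] += lambdas[s] * row[s] / mix
        # M-step: new weight is the average responsibility over all tokens.
        lambdas = [e / len(probs) for e in expected]
    return lambdas


# Toy held-out data: component 0 (e.g. conversational speech) usually fits better than component 1.
heldout = [[0.20, 0.05], [0.10, 0.08], [0.02, 0.30], [0.25, 0.01]]
print([round(w, 3) for w in em_mixture_weights(heldout)])
```

Because each M-step averages posterior responsibilities, the weights stay non-negative and sum to 1 after every iteration, and the held-out likelihood never decreases.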
Conversational Telephone Speech Recognition
In International Conference on Acoustics, Speech, and Signal Processing
Cited by 2 (0 self)
This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-the-art performance on conversational speech. Major changes on the acoustic side include the use of speaker normalization (VTLN), the need to cope with channel variability, and the need for efficient speaker adaptation and better pronunciation modeling. On the linguistic side, the primary challenge is to cope with the limited amount of language model training data. To address this issue, we make use of a data selection technique and a smoothing technique based on a neural network language model. At the decoding level, lattice rescoring and minimum word error decoding are applied. On the development data, the improvements yield an overall word error rate of 24.9%, whereas the original BN transcription system had a word error rate of about 50% on the same data.
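Several of these abstracts report results as word error rate (for example 24.9% versus about 50% above). As a reminder of what that number measures, here is a minimal sketch of the standard definition: the minimum number of word substitutions, deletions, and insertions needed to turn the reference into the hypothesis (a word-level Levenshtein alignment), divided by the number of reference words. Evaluation campaigns typically use dedicated scoring tools that also apply text normalization rules, which this sketch omits.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that insertions can push WER above 100%, which is why it is a rate rather than a proportion of words.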
Mori: Vocabulary and Language Model Adaptation using Information Retrieval
In Proceedings of ECIR 2003
Cited by 1 (0 self)
The goal of vocabulary optimization is to construct a vocabulary containing exactly those words that are most likely to appear in the test data. We present a new approach to reducing the out-of-vocabulary (OOV) rate by adapting the vocabulary model during the ASR process. This method can also be used for statistical language model (SLM) adaptation. An information retrieval system is used after the first pass of the ASR system to obtain a set of relevant documents. These documents are then used to generate the new vocabulary and/or corpus. In this paper, we propose a new retrieval method well adapted to this purpose. Experiments on French show a 28% reduction in OOV rate. Experiments on English for SLM adaptation show a 7.9% reduction in perplexity and a minor WER improvement.
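A sketch of the vocabulary-adaptation step and of how an OOV rate is measured, under loud assumptions: the retrieval step (the paper's actual contribution) is treated as already done, the retrieved documents arrive as plain strings, and new words are chosen simply by frequency in those documents. `adapt_vocabulary`, `max_new`, and the French toy data are hypothetical and do not reproduce the paper's retrieval method.

```python
from collections import Counter
from typing import Iterable, List, Set


def oov_rate(tokens: List[str], vocab: Set[str]) -> float:
    """Fraction of running test tokens not covered by the vocabulary."""
    if not tokens:
        raise ValueError("need at least one token")
    return sum(1 for tok in tokens if tok not in vocab) / len(tokens)


def adapt_vocabulary(base_vocab: Set[str], retrieved_docs: Iterable[str], max_new: int) -> Set[str]:
    """Add the most frequent unseen words from retrieved documents (hypothetical selection rule)."""
    counts = Counter(word for doc in retrieved_docs for word in doc.split())
    new_words = [w for w, _ in counts.most_common() if w not in base_vocab][:max_new]
    return set(base_vocab) | set(new_words)


base = {"le", "chat"}
test_tokens = "le chat mange la souris".split()  # stand-in for a reference transcript of the test data
docs = ["la souris mange", "la souris"]  # stand-in for documents returned by the IR system
adapted = adapt_vocabulary(base, docs, max_new=2)
print(oov_rate(test_tokens, base), oov_rate(test_tokens, adapted))  # 0.6 0.2
```

The `max_new` cap reflects the usual trade-off: each added word lowers OOV rate only if it actually occurs in the test data, while enlarging the recognizer's search space.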
Class-dependent Interpolation for Estimating Language Models from Multiple Text Sources
2003
Cited by 1 (0 self)
Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show not only how training data can be supplemented with text from the web, filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
Bootstrapping Spoken Dialog Systems with Data Reuse
Building natural language spoken dialog systems requires large amounts of human-transcribed and labeled speech utterances to reach useful operational performance. Furthermore, the design of such complex systems involves several manual steps. The User Experience (UE) expert analyzes and defines by hand the system's core functionalities: the system's semantic scope (call-types) and the dialog manager strategy that will drive the human-machine interaction. This approach is extensive and error-prone, since it involves several non-trivial design decisions that can only be evaluated after the actual system deployment. Moreover,
Structured Language Models for . . .
2009
The language model plays an important role in statistical machine translation systems. It is the key knowledge source for determining the right word order of the translation. A standard n-gram language model predicts the next word based on the n − 1 immediately preceding words. Increasing the order n and the size of the training data improves the performance of the LM, as shown by the suffix array language model and distributed language model systems. However, such improvements level off very quickly once n reaches 6. To improve the n-gram language model, we also developed dynamic n-gram language model adaptation and a discriminative language model to tackle issues with standard n-gram language models, and observed improvements in translation quality. The fact is that human beings do not reuse long n-grams to create new sentences. Rather, we reuse the structure (grammar) and replace constituents to construct new sentences. A structured language model tries to model the structural information in natural language, especially long-distance dependencies, in a probabilistic framework. However, exploring and using structural information is computationally expensive, as the number of possible structures for a sentence is very large even under the constraints of a grammar. It is difficult to apply parsers to data that differ from the treebank training data, and parsers are usually hard to scale up. In this
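The abstract cites a suffix array language model as one way to use arbitrarily long n-grams. The core idea can be sketched as follows: sort all suffix start positions of the corpus once, after which the count of any n-gram (of any length) is the size of a contiguous block found by two binary searches, and a maximum-likelihood estimate P(w | context) = c(context, w) / c(context) follows directly. This is an illustrative sketch, not the cited system's implementation: the construction is naive (fine only for toy corpora), there is no smoothing, and the denominator also counts a context that ends the corpus, which a real system would handle.

```python
from typing import List, Sequence, Tuple


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Start positions of all suffixes, sorted lexicographically (naive O(n^2 log n) build)."""
    return sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))


def count_ngram(tokens: Sequence[str], sa: List[int], ngram: Sequence[str]) -> int:
    """Occurrences of `ngram`: the suffixes starting with it form one contiguous block of `sa`."""
    pat: Tuple[str, ...] = tuple(ngram)
    n = len(pat)

    def prefix(k: int) -> Tuple[str, ...]:
        return tuple(tokens[sa[k] : sa[k] + n])

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose n-token prefix is >= pat
        mid = (lo + hi) // 2
        if prefix(mid) < pat:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose n-token prefix is > pat
        mid = (lo + hi) // 2
        if prefix(mid) <= pat:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def mle_prob(tokens: Sequence[str], sa: List[int], context: Sequence[str], word: str) -> float:
    """Maximum-likelihood P(word | context) = c(context + word) / c(context)."""
    denom = count_ngram(tokens, sa, context)
    return count_ngram(tokens, sa, tuple(context) + (word,)) / denom if denom else 0.0


corpus = "a b a b a c".split()
sa = build_suffix_array(corpus)
print(count_ngram(corpus, sa, ["a", "b"]), mle_prob(corpus, sa, ["a"], "b"))
```

Because no n-gram table is stored, the context length is limited only by the corpus, which is what lets such models test orders well beyond the usual 3 to 5.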

