Results 1 -
5 of
5
Modeling Out-Of-Vocabulary Words For Robust Speech Recognition
, 2000
"... This thesis concerns the problem of unknown or out-of-vocabulary (00V) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognize ..."
Abstract
-
Cited by 43 (5 self)
- Add to MetaCart
This thesis concerns the problem of unknown or out-of-vocabulary (00V) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words; dramatically degrading overall recognition performance.
Rapid language model development for new task domains
- Proc. First International Conference on Language Resources and Evaluation (LREC
, 1998
"... Data sparseness has been regularly indicted as the primary problem in statistical language modelling. We go one step further to consider the situation when no text data is available for the target domain. We present two techniques for building efficient language models quickly for new domains. The f ..."
Abstract
-
Cited by 16 (6 self)
- Add to MetaCart
Data sparseness has been regularly indicted as the primary problem in statistical language modelling. We go one step further to consider the situation when no text data is available for the target domain. We present two techniques for building efficient language models quickly for new domains. The first technique is based on using a context-free grammar to generate a corpus of word collocations. The second is an adaptation technique based on using out-of-domain corpora to estimate target domain language models. We report results of successfully using these two techniques individually and in combination to build efficient models for a spontaneous speech recognition task in a medium-sized vocabulary domain. 1.
Estimation of Language Models for New Spoken Language Applications
, 1996
"... Spoken language interfaces can provide natural communication for many database retrieval tasks. The CMU ATIS system provides an example of accessing airline information using spoken natural language queries. However, a lot of training data is needed to develop a spoken language application. For exam ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Spoken language interfaces can provide natural communication for many database retrieval tasks. The CMU ATIS system provides an example of accessing airline information using spoken natural language queries. However, a lot of training data is needed to develop a spoken language application. For example, we need training data to generate a language model that can be used by the recognizer to reduce the search space. In this paper, we will address some issues arising from small amount of training data available for a new spoken language application.
Hierarchical Statistical Language Models: Experiments On In-Domain Adaptation
- PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING (ICSLP'2000)
, 2000
"... We introduce a hierarchical statistical language model, represented as a collection of local models plus a general sentence model. We provide an example that mixes a trigram general model and a PFSA local model for the class of decimal numbers, described in terms of sub-word units (graphemes). This ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
We introduce a hierarchical statistical language model, represented as a collection of local models plus a general sentence model. We provide an example that mixes a trigram general model and a PFSA local model for the class of decimal numbers, described in terms of sub-word units (graphemes). This model practically extends the vocabulary of the overall model to an infinite size, but still has better performance compared to a word-based model. Using in-domain language model adaptation experiments, we show that local models can encode enough linguistic information, if well trained, that they may be ported to new language models without re-estimation.
Detection of Language (Model) Errors
"... The bigram language models are popular, in much language processing applications, in both Indo-European and Asian languages. ..."
Abstract
- Add to MetaCart
The bigram language models are popular, in much language processing applications, in both Indo-European and Asian languages.

