A Category Based Approach for Recognition of Out-of-Vocabulary Words
- In Int. Conf. on Spoken Language Processing
, 1996
Cited by 12 (4 self)
In almost all applications of automatic speech recognition, especially in spontaneous speech tasks, the recognizer vocabulary cannot cover all occurring words. There is always a significant amount of out-of-vocabulary words even when the vocabulary size is very large. In this paper we present a new approach for the integration of out-of-vocabulary words into statistical language models. We use category information for all words in the training corpus to define a function that gives an approximation of the out-of-vocabulary word emission probability for each word category. This information is integrated into the language models. Although we use a simple acoustic model for out-of-vocabulary words, we achieve a 6% reduction of word error rate on spontaneous speech data with about 5% out-of-vocabulary rate.
POS Tagging versus Classes in Language Modeling
, 1998
Cited by 12 (1 self)
Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. The use of POS tags allows more sophisticated generalizations than are afforded by using a class-based approach. Furthermore, if we want to incorporate speech repair and intonational phrase modeling into the language model, using POS tags rather than classes gives better performance in this task.
Improving Statistical Natural Language Translation with Categories and Rules
- in Proc. of the 35th Annual Conf. of the Association for Computational Linguistics and the 17th Int. Conf. on Computational Linguistics
, 1998
Cited by 11 (3 self)
This paper describes an all level approach on statistical natural language translation (SNLT). Without any predefined knowledge the system learns a statistical translation lexicon (STL), word classes (WCs) and translation rules (TRs) from a parallel corpus thereby producing a generalized form of a word alignment (WA). The translation process itself is realized as a beam search. In our method example-based techniques enter an overall statistical approach leading to about 50 percent correctly translated sentences applied to the very difficult English-German VERBMOBIL spontaneous speech corpus.
Ergodic Hidden Markov Models And Polygrams For Language Modeling
- In Proc. Int. Conf. on Acoustics, Speech and Signal Processing
, 1994
Cited by 11 (8 self)
In this paper we present two new techniques for language modeling in speech recognition. The first technique is based on ergodic discrete density Hidden Markov Models (HMM) which can be applied to bigrams based on word categories. This statistical approach of the so-called Markov bigrams enables an efficient unsupervised learning procedure for the bigram probabilities with the well-known Baum-Welch algorithm. Furthermore, maximizing the model-conditional probability is equivalent to minimizing the perplexity of the training corpus. The second technique is based on polygrams which are an extension of the bigram (n = 2) or trigram (n = 3) grammars to any possible value of n. According to the smoothing techniques for bigram or trigram models, the probabilities of the n-grams in the polygram model are interpolated using the relative frequencies of all n'-grams with n' ≤ n. Both techniques were evaluated on the ATIS corpus by computing the test set perplexity. Furthermore we integr...
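The interpolation and perplexity evaluation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed interpolation weights, the uniform floor term, and the renormalization for unseen contexts are all assumptions made for illustration (the abstract does not say how the weights are obtained).

```python
import math
from collections import Counter

def count_ngrams(tokens, max_n):
    """Count all n'-grams with n' <= max_n, plus how often each context occurs."""
    ngrams, contexts = Counter(), Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1  # context = n-gram without its last word
    return ngrams, contexts

def interpolated_prob(word, history, ngrams, contexts, vocab_size, weights):
    """Mix relative frequencies of orders 1..n with a uniform floor.

    weights[0] belongs to the uniform distribution (assumed smoothing floor),
    weights[k] to the order-k relative frequency. Orders whose context was
    never seen are skipped and the remaining weights are renormalized.
    """
    used_weight = weights[0]
    prob = weights[0] / vocab_size
    for k in range(1, len(weights)):
        if len(history) < k - 1:
            break
        context = tuple(history[len(history) - (k - 1):])
        if contexts[context] == 0:
            break
        prob += weights[k] * ngrams[context + (word,)] / contexts[context]
        used_weight += weights[k]
    return prob / used_weight

def perplexity(test_tokens, ngrams, contexts, vocab_size, weights):
    """Test-set perplexity: exp of the negative mean log probability."""
    hist_len = max(len(weights) - 2, 0)  # an order-n model needs n-1 words of history
    log_sum = 0.0
    for i, word in enumerate(test_tokens):
        history = test_tokens[max(0, i - hist_len):i]
        log_sum += math.log(
            interpolated_prob(word, history, ngrams, contexts, vocab_size, weights))
    return math.exp(-log_sum / len(test_tokens))
```

Raising the length of `weights` extends the model to higher orders, which is the sense in which a polygram generalizes bigrams and trigrams; `vocab_size` must cover every word that can appear in the test data so that the uniform term stays a valid distribution.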
Correction of Disfluencies in Spontaneous Speech using a Noisy-Channel Approach
- in Proceedings of the 8th Eurospeech Conference
, 2003
Cited by 11 (3 self)
In this paper we present a system which automatically corrects disfluencies such as repairs and restarts typically occurring in spontaneously spoken speech. The system is based on a noisy-channel model and its development requires no linguistic knowledge, but only annotated texts. Therefore, it has large potential for rapid deployment and adaptation to new target languages. The experiments were conducted on spontaneously spoken dialogs from the English VERBMOBIL corpus, where a recall of 77.2% and a precision of 90.2% were obtained. To demonstrate the feasibility of rapid adaptation, additional experiments on the spontaneous Mandarin Chinese CallHome corpus were performed, achieving 49.4% recall and 76.8% precision.
Automatic Transcription of Conversational Telephone Speech - Development of the CU-HTK 2002 System
- IEEE Transactions on Acoustics, Speech and Signal Processing
, 2003
Cited by 11 (2 self)
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modelling and model training, language and pronunciation modelling is presented. These include the use of conversation side based cepstral normalisation, vocal tract length normalisation, heteroscedastic linear discriminant analysis for feature projection, Minimum Phone Error Training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation and class based language models.
Category-Based Statistical Language Models
, 1997
Cited by 11 (2 self)
this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.
POS Tags and Decision Trees for Language Modeling
- In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
, 1999
Cited by 11 (2 self)
Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. To use POS tags effectively, we use clustering and decision tree algorithms, which allow generalizations between POS tags and words to be effectively used in estimating the probability distributions. We show that our POS model gives a reduction in word error rate and perplexity for the Trains corpus in comparison to word and class-based approaches. By using the Wall Street Journal corpus, we show that this approach scales up when more training data is available.
Unsupervised language model adaptation for meeting recognition
- in Proc. ICASSP
, 2007
Cited by 11 (4 self)
We present an application of unsupervised language model (LM) adaptation to meeting recognition, in a scenario where sequences of multiparty meetings on related topics are to be recognized, but no prior in-domain data for LM training is available. The recognizer LMs are adapted according to the recognition output on temporally preceding meetings, either in speaker-dependent or speaker-independent mode. Model adaptation is carried out by interpolating the n-gram probabilities of a large generic LM with those of a small LM estimated from the adaptation data, and minimizing perplexity on the automatic transcripts of a separate meeting set, also previously recognized. The adapted LMs yield about 5-9% relative reduction in word error compared to the baseline. This improvement is about half of what can be achieved with supervised adaptation, i.e., using human-generated speech transcripts. Index Terms: speech processing, language modeling, meeting recognition, unsupervised adaptation
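As a rough illustration of the adaptation step described above, the sketch below interpolates two language models and picks the weight that minimizes perplexity on held-out events. Everything concrete here is an assumption: the LMs are plain dictionaries from hypothetical (history, word) pairs to probabilities, the weight is found by a simple grid search rather than whatever optimizer the authors used, and the generic LM is assumed to give every held-out event a nonzero probability.

```python
import math

def interpolate(p_generic, p_adapt, lam):
    """Linear interpolation: lam weights the small adaptation LM."""
    return lam * p_adapt + (1.0 - lam) * p_generic

def perplexity(events, generic_lm, adapt_lm, lam):
    """Perplexity of the interpolated model on a list of (history, word) events."""
    log_sum = 0.0
    for event in events:
        p = interpolate(generic_lm.get(event, 0.0), adapt_lm.get(event, 0.0), lam)
        log_sum += math.log(p)  # assumes p > 0 (generic LM covers every event)
    return math.exp(-log_sum / len(events))

def best_weight(events, generic_lm, adapt_lm, grid=None):
    """Grid search for the interpolation weight with the lowest perplexity."""
    if grid is None:
        grid = [i / 20 for i in range(20)]  # 0.00, 0.05, ..., 0.95
    return min(grid, key=lambda lam: perplexity(events, generic_lm, adapt_lm, lam))
```

In the setting of the abstract, `events` would come from automatic transcripts of a separate, previously recognized meeting set, so no human transcription is needed to tune the weight.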
Hierarchical probabilistic neural network language model
- AISTATS’05
, 2005
Cited by 11 (1 self)
In recent years, variants of a neural network architecture for statistical language modeling have been proposed and successfully applied, e.g. in the language modeling component of speech recognizers. The main advantage of these architectures is that they learn an embedding for words (or other symbols) in a continuous space that helps to smooth the language model and provide good generalization even when the number of training examples is insufficient. However, these models are extremely slow in comparison to the more commonly used n-gram models, both for training and recognition. As an alternative to an importance sampling method proposed to speed up training, we introduce a hierarchical decomposition of the conditional probabilities that yields a speed-up of about 200 both during training and recognition. The hierarchical decomposition is a binary hierarchical clustering constrained by the prior knowledge extracted from the WordNet semantic hierarchy.
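The idea of a hierarchical decomposition can be sketched as follows: each word is a leaf of a binary tree, and its probability is the product of the binary decisions on the path from the root, so only about log2(V) decisions are evaluated instead of a normalization over all V words. This is a minimal sketch under stated assumptions, not the paper's model: the per-node scores are passed in as plain numbers (in the paper they would come from the neural network given the context), and the tree and function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def word_prob(code, node_scores):
    """Probability of a word as a product of binary decisions along its path.

    code: list of (node_id, bit) pairs from the root to the word's leaf.
    node_scores: maps node_id to a real-valued score for the current context
                 (assumed to be given; hypothetical stand-in for a network output).
    """
    prob = 1.0
    for node_id, bit in code:
        p_one = sigmoid(node_scores[node_id])  # P(bit = 1 | node, context)
        prob *= p_one if bit == 1 else 1.0 - p_one
    return prob

# Hypothetical 4-word tree: node 0 is the root, nodes 1 and 2 are its children.
CODES = {
    "cat": [(0, 0), (1, 0)],
    "dog": [(0, 0), (1, 1)],
    "car": [(0, 1), (2, 0)],
    "bus": [(0, 1), (2, 1)],
}
```

Because every internal node splits its probability mass between its two children, the word probabilities sum to one for any choice of scores, without an explicit normalization over the vocabulary.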

