Results 1-4 of 4
K.: Corrective Models for Speech Recognition of Inflected Languages
In: Proceedings of EMNLP 2006, 2006
Cited by 7 (2 self)
Abstract
This paper presents a corrective model for speech recognition of inflected languages. The model, based on a discriminative framework, incorporates word n-gram features as well as factored morphological features, providing an error reduction over the model based solely on word n-gram features. Experiments on a large-vocabulary task, namely the Czech portion of the MALACH corpus, demonstrate a performance gain of about 1.1–1.5% absolute in word error rate, with morphological features contributing about a third of the improvement. A simple feature selection mechanism based on χ² statistics is shown to be effective in reducing the number of features by about 70% without any loss in performance, making it feasible to explore even larger feature spaces.
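The χ² feature selection step can be illustrated with a minimal sketch. It assumes binary features and a binary class label per training sample (for example, whether an n-best hypothesis is the lowest-error one); the paper's exact contingency setup is not given in this abstract, and the function names `chi2_score` and `select_features`, the `keep_fraction` parameter, and the feature names in the usage note are hypothetical.

```python
from collections import Counter


def chi2_score(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic of a 2x2 feature/class contingency table.

    n11: feature present, class positive   n10: feature present, class negative
    n01: feature absent,  class positive   n00: feature absent,  class negative
    """
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0


def select_features(samples, keep_fraction=0.3):
    """Rank features by chi-square and keep the top keep_fraction of them.

    samples: list of (set_of_feature_names, label) pairs with label in {0, 1}.
    Ties are broken alphabetically so the result is deterministic.
    """
    total_pos = sum(1 for _, y in samples if y == 1)
    total_neg = len(samples) - total_pos
    pos_counts, neg_counts = Counter(), Counter()
    for feats, y in samples:
        # Count each feature at most once per sample (presence/absence only).
        (pos_counts if y == 1 else neg_counts).update(feats)
    scores = {}
    for f in set(pos_counts) | set(neg_counts):
        n11, n10 = pos_counts[f], neg_counts[f]
        scores[f] = chi2_score(n11, n10, total_pos - n11, total_neg - n10)
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    k = max(1, round(len(ranked) * keep_fraction))
    return ranked[:k]
```

With four toy samples where the hypothetical morphological features `m:NOUN-pl` and `m:VERB-3sg` perfectly separate the classes while the word features `w:the` and `w:a` do not, `select_features(samples, 0.5)` keeps only the two morphological features. Keeping roughly 30% of features mirrors the "about 70%" reduction reported above, but the right threshold would be tuned on held-out data.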
Joint Morphological-Lexical Language Modeling for Processing Morphologically Rich Languages with Application to Dialectal Arabic
2007
Cited by 2 (0 self)
Abstract
Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in the out-of-vocabulary (OOV) rate and in poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items, together with additional available information sources about both, in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments on dialectal Arabic show improvements over word-based and morpheme-based trigram language models. We also show that as the integration between the different information sources becomes tighter, both speech recognition and machine translation performance improve.
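Why morphological segmentation lowers the OOV rate can be shown with a toy sketch. It is not JMLLM, which also models words and segments jointly; it only compares OOV rates for a whole-word vocabulary and a segment vocabulary. The prefix list (Buckwalter-style transliteration of the clitics "w", "b" and the article "Al"), the greedy one-prefix segmenter, and all function names are hypothetical simplifications.

```python
# Hypothetical clitic prefixes in Buckwalter-style transliteration (illustration only).
PREFIXES = ("wAl", "Al", "w", "b")


def oov_rate(tokens, vocab):
    """Fraction of tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocab) / len(tokens)


def segment(word):
    """Toy segmenter: split off the longest matching prefix, marked with '+'."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        # Require a stem of at least two characters after the prefix.
        if word.startswith(p) and len(word) > len(p) + 1:
            return [p + "+", word[len(p):]]
    return [word]


def segment_all(words):
    """Flatten a word sequence into a segment sequence."""
    return [s for w in words for s in segment(w)]
```

For example, with training words `["Albyt", "ktAb", "wqlm"]` and test words `["bktAb", "Alqlm"]`, every test word is OOV at the word level (rate 1.0), but after segmentation only the unseen prefix `b+` is OOV (rate 0.25), because the stems `ktAb` and `qlm` and the prefix `Al+` were already seen in training.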
Investigating the Use of Morphological Decomposition and Diacritization for Improving Arabic LVCSR
Cited by 1 (1 self)
Abstract
One of the challenges of large-vocabulary Arabic speech recognition is the morphologically rich nature of the Arabic language, which leads to both high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Another challenge is the absence of short vowels (diacritics) from written Arabic transcripts, which causes a large difference between the spoken and written language and thus a weaker connection between the acoustic and language models. In this work, we address these two challenges by introducing both morphological decomposition and diacritization into Arabic language modeling. We obtain a relative reduction of about 3.7% in word error rate (WER) with respect to a comparable non-diacritized full-word system on our test set.
Index Terms: speech recognition, morphological decomposition, diacritization, Arabic
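The 3.7% figure is a relative reduction, (baseline WER − new WER) / baseline WER, which differs from an absolute reduction in percentage points. A minimal sketch of corpus-level WER, computed as word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and of the relative-reduction formula follows; the function names are hypothetical, and real evaluations usually rely on a standard scoring tool rather than hand-written code.

```python
def word_errors(ref: str, hyp: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from an empty reference prefix
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion of a reference word
                         cur[j - 1] + 1,      # insertion of a hypothesis word
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]


def wer(refs, hyps) -> float:
    """Corpus WER: total errors / total reference words (assumes at least one reference word)."""
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction; 0.037 corresponds to 3.7% relative."""
    return (baseline_wer - new_wer) / baseline_wer
```

For instance, the hypothesis "a x c" against the reference "a b c d" has one substitution and one deletion, giving a WER of 2/4 = 0.5.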
Acoustic and Language Modeling for Czech ASR in MALACH
Abstract
Automatic transcription of Czech testimonials in MALACH poses a unique problem and a new opportunity, as documented extensively elsewhere [2]. Briefly, automatic transcription is made difficult because the speech comes largely from older speakers, often speaking with a heavy accent on an emotionally charged topic. The corpus, however, provides a challenge which reflects

