Recent innovations in speech-to-text transcription at SRI-ICSI-UW
IEEE Transactions on Audio, Speech & Language Processing, 2006
Cited by 10 (2 self)
We summarize recent progress in automatic speech-to-text ...
POS Tagging of Dialectal Arabic: A Minimally Supervised Approach
Cited by 8 (0 self)
Natural language processing technology for the dialects of Arabic is still in its infancy, due to the problem of obtaining large amounts of text data for spoken Arabic. In this paper ...
Morpheme-based language modeling for Arabic LVCSR
In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006
Cited by 7 (0 self)
In this paper, we concentrate on Arabic speech recognition. Taking advantage of the rich morphological structure of the language, we use morpheme-based language modeling to improve the word error rate. We propose a simple constraining method to rid the decoding output of illegal morpheme sequences. We report results for word and morpheme language models using medium (<64k words) and large (about 800k words) vocabularies; the morpheme LM obtains an absolute improvement of 2.4% for the former and only 0.2% for the latter. The 2.4% gain surpasses previous gains for morpheme-based LMs for Arabic, and the large-vocabulary runs represent the first comparative results for vocabularies of this size for any language. Finally, we analyze the performance of the morpheme LM on word OOVs.
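The abstract does not spell out which morpheme sequences count as illegal. The sketch below illustrates one plausible reading, assuming a hypothetical marking convention in which prefixes end in `+` (e.g. `wa+`) and suffixes begin with `+` (e.g. `+ha`). Under that assumption, a sequence is illegal if it starts with a suffix, ends with a prefix, or puts a suffix directly after a prefix; legal sequences can then be glued back into words. This is an illustration, not the paper's constraint.

```python
def is_prefix(m: str) -> bool:
    # Hypothetical convention: prefixes carry a trailing "+" (e.g. "wa+").
    return m.endswith("+") and not m.startswith("+")


def is_suffix(m: str) -> bool:
    # Hypothetical convention: suffixes carry a leading "+" (e.g. "+ha").
    return m.startswith("+") and not m.endswith("+")


def is_legal(morphs: list[str]) -> bool:
    """Reject sequences with an orphaned suffix or a dangling prefix."""
    if not morphs:
        return True
    if is_suffix(morphs[0]) or is_prefix(morphs[-1]):
        return False
    # A prefix immediately followed by a suffix has no stem between them.
    return not any(is_prefix(a) and is_suffix(b) for a, b in zip(morphs, morphs[1:]))


def to_words(morphs: list[str]) -> list[str]:
    """Glue a legal morpheme sequence back into whole words."""
    words, current = [], ""
    for i, m in enumerate(morphs):
        current += m.strip("+")
        nxt = morphs[i + 1] if i + 1 < len(morphs) else None
        if is_prefix(m) or (nxt is not None and is_suffix(nxt)):
            continue  # the current word continues into the next morpheme
        words.append(current)
        current = ""
    return words


decoded = ["wa+", "ktb", "+ha", "qlm"]  # toy romanized tokens, not real decoder output
if is_legal(decoded):
    print(to_words(decoded))  # ['waktbha', 'qlm']
```

In a real decoder, a check like `is_legal` would more likely be applied to partial hypotheses or lattice paths than to a finished string; the post-hoc version is shown only for clarity.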
Morph-Based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages
Cited by 5 (1 self)
We explore the use of morph-based language models in large-vocabulary continuous speech recognition systems across four so-called “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity. Standard word models suffer from high out-of-vocabulary (OOV) rates, whereas the morph models can recognize previously unseen word forms by concatenating morphs. It is shown that the morph models perform fairly well on OOVs without compromising recognition accuracy on in-vocabulary words. The Arabic experiment is the only exception: there, the standard word model outperforms the morph model. Differences in the data sets and the amount of data are discussed as a plausible explanation.
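To make the coverage argument concrete, the sketch below checks whether a word can be built by concatenating morphs from a fixed inventory, then compares word-level and morph-level OOV rates. The inventory and test words are a hand-picked toy Finnish example ("talossa" = in the house, "taloissa" = in the houses), and the fewest-pieces dynamic-programming segmentation is an assumption for illustration; it is not Morfessor, which learns both the inventory and the segmentation from data.

```python
from typing import Optional


def segment(word: str, inventory: set[str]) -> Optional[list[str]]:
    """Split `word` into the fewest morphs from `inventory`, or return None.

    best[i] holds a shortest segmentation of word[:i] (None if none exists).
    """
    best: list = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in inventory:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(word)]


def oov_rate(tokens: list[str], covered) -> float:
    """Fraction of test tokens that the model cannot produce."""
    return sum(not covered(t) for t in tokens) / len(tokens)


# Toy data (hypothetical): one known word form, a small morph inventory.
word_vocab = {"talossa"}
morph_inventory = {"talo", "i", "ssa", "lla"}
test_tokens = ["talossa", "taloissa", "kissa"]

word_oov = oov_rate(test_tokens, lambda t: t in word_vocab)
morph_oov = oov_rate(test_tokens, lambda t: segment(t, morph_inventory) is not None)
print(segment("taloissa", morph_inventory))  # ['talo', 'i', 'ssa']
print(word_oov, morph_oov)
```

Here the unseen form "taloissa" is recovered from known morphs, so the morph model's OOV rate is half the word model's on this toy set; "kissa" stays OOV for both because no inventory pieces cover it.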
Performance Prediction for Exponential Language Models
Cited by 5 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
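The regression step described above can be sketched as an ordinary least-squares fit of test cross-entropy on training cross-entropy plus a model statistic. The data below are synthetic and noiseless, and the "model statistic" column is a hypothetical placeholder; neither the feature set nor the coefficients are the formula reported in the paper.

```python
import numpy as np


def fit_predictor(features: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Least-squares fit of target ≈ features @ w + b; returns [w_1, ..., w_k, b]."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Synthetic stand-in for a collection of trained models (NOT data from the paper):
# column 0 = training-set cross-entropy, column 1 = a hypothetical model statistic.
rng = np.random.default_rng(0)
train_xent = rng.uniform(5.0, 9.0, size=20)
model_stat = rng.uniform(0.1, 1.0, size=20)
test_xent = 1.0 * train_xent + 0.9 * model_stat + 0.05  # noiseless toy relationship

features = np.column_stack([train_xent, model_stat])
coef = fit_predictor(features, test_xent)
predicted = np.hstack([features, np.ones((20, 1))]) @ coef
r = np.corrcoef(predicted, test_xent)[0, 1]  # correlation of predicted vs. actual
print(coef, r)
```

With real models the fit would not be exact; the correlation `r` between predicted and actual test cross-entropy is the quantity the abstract reports.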
Use of hidden Markov models and factored language models for automatic chord recognition
In Proc. ISMIR, 2009
Cited by 3 (0 self)
This paper focuses on automatic extraction of acoustic chord sequences from a musical piece. Standard and factored language models are analyzed in terms of their applicability to the chord recognition task. Pitch class profile vectors that represent harmonic information are extracted from the given audio signal. The chord sequence is obtained by running a Viterbi decoder on trained hidden Markov models, followed by lattice rescoring that applies the language model weight. We performed several experiments using the proposed technique. Results obtained on 175 manually labeled songs show an increase in accuracy of about 2%.
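The decoding step can be illustrated with a log-domain Viterbi search in which a weight scales the chord-sequence (language model) scores relative to the acoustic scores. The two-chord model, the transition matrix, and the per-frame likelihoods below are toy values, and applying the weight during first-pass Viterbi (rather than during lattice rescoring, as the paper describes) is a simplification.

```python
import numpy as np


def viterbi(log_init, log_trans, log_emit, lm_weight=1.0):
    """Return the most likely state (chord) index sequence.

    log_init:  (S,)   initial chord log-probabilities
    log_trans: (S, S) chord-transition log-probabilities, log_trans[i, j] = log P(j | i)
    log_emit:  (T, S) per-frame acoustic log-likelihoods
    lm_weight: scale on the chord-sequence (initial + transition) scores
    """
    T, S = log_emit.shape
    score = np.empty((T, S))
    back = np.zeros((T, S), dtype=int)
    score[0] = lm_weight * log_init + log_emit[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + lm_weight * log_trans  # cand[i, j]: arrive in j from i
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):  # follow back-pointers from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]


chords = ["C", "G"]
log_init = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])  # toy "sticky" chord model
log_emit = np.log([[0.8, 0.2], [0.8, 0.2], [0.4, 0.6], [0.8, 0.2]])  # toy frames

print([chords[i] for i in viterbi(log_init, log_trans, log_emit, lm_weight=0.0)])
print([chords[i] for i in viterbi(log_init, log_trans, log_emit, lm_weight=1.0)])
```

With `lm_weight=0.0` the path follows the per-frame acoustic maximum (`C C G C`); with `lm_weight=1.0` the sticky transition model smooths away the isolated `G` frame (`C C C C`).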
Joint Morphological-Lexical Language Modeling for Processing Morphologically Rich Languages with Application to Dialectal Arabic
2007
Cited by 2 (0 self)
Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items, and with additional available information sources about both, in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments on dialectal Arabic show improvements over word- and morpheme-based trigram language models. We also show that as the integration between the different information sources becomes tighter, both speech recognition and machine translation performance improve.
Development of a conversational telephone speech recognizer for Levantine Arabic
In Proc. Interspeech, 2005
Cited by 2 (0 self)
Many languages, including Arabic, are characterized by a wide variety of dialects that often differ strongly from each other. When developing speech technology for dialect-rich languages, the portability and reusability of data, algorithms, and system components become extremely important. In this paper, we describe the development of a large-vocabulary speech recognition system for Levantine Arabic, which was a new dialectal recognition task for our existing system. We discuss the dialect-specific modeling choices (grapheme- vs. phoneme-based acoustic models, automatic vowelization techniques, and morphological language models) and investigate to what extent techniques previously tested on other languages are portable to the present task. We present state-of-the-art ...
Factored neural language models
In HLT-NAACL, 2006
Cited by 1 (0 self)
We present a new type of neural probabilistic language model that learns a mapping from both words and explicit word features into a continuous space that is then used for word prediction. Additionally, we investigate several ways of deriving continuous word representations for unknown words from those of known words. The resulting model significantly reduces perplexity on sparse-data tasks when compared to standard backoff models, standard neural language models, and factored language models.
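The input side of such a model can be sketched as follows: each word maps to the concatenation of a word-identity embedding and an embedding of an explicit feature (here a hypothetical root factor), and an unknown word borrows the mean identity embedding of known words that share its root. The vocabulary, the root factor, the random embeddings, and the averaging rule are all assumptions for illustration; averaging is only one of many possible ways to derive vectors for unknown words, and the prediction network itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # hypothetical embedding size per factor

# Toy romanized Arabic forms sharing the root k-t-b; embeddings are random stand-ins.
word_emb = {w: rng.normal(size=DIM) for w in ["kataba", "katabat", "darasa"]}
root_of = {"kataba": "ktb", "katabat": "ktb", "darasa": "drs", "yaktubu": "ktb"}
root_emb = {r: rng.normal(size=DIM) for r in ["ktb", "drs"]}


def word_vector(word: str) -> np.ndarray:
    """Concatenate the word-identity and root-feature embeddings.

    For an unknown word, the identity part is the mean embedding of known
    words with the same root (zeros if there are none).
    """
    root = root_of[word]
    if word in word_emb:
        ident = word_emb[word]
    else:
        siblings = [word_emb[w] for w in word_emb if root_of[w] == root]
        ident = np.mean(siblings, axis=0) if siblings else np.zeros(DIM)
    return np.concatenate([ident, root_emb[root]])


print(word_vector("yaktubu").shape)  # (8,)
```

Because the feature embedding is shared across all words with the same root, an unseen form still receives an informative vector, which is the intuition behind the sparse-data gains the abstract describes.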
Generative and Discriminative Methods using Morphological Information for Sentence Segmentation of Turkish
Cited by 1 (0 self)
This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmentation of Turkish utterances into sentences. In the literature, this task is generally solved using statistical models that exploit lexical information, among other cues. However, Turkish has a productive morphology that yields an exponentially large vocabulary, which harms language models such as the established hidden event language model (HELM). We extend this model as a factored hidden event language model (fHELM) in order to take advantage of morphologically informed features in addition to the word sequence. Our results indicate that fHELMs yield a 26% reduction in error rate for Turkish broadcast news. Combining lexical, morphological, and prosodic information using these new models and discriminative classifiers (boosting and conditional random fields) results in significant performance improvements over any of the classifiers alone.
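A hidden event language model treats the sentence boundary as an extra token that may or may not occur between two words. The sketch below makes a greedy, gap-by-gap decision with a toy bigram model: it inserts a boundary when P(boundary | previous) · P(next | boundary) exceeds P(next | previous). Passing a `factor` function that maps words to morphological tags gives a rough analogue of the factored variant. The tags, probabilities, and floor value are hypothetical, and a full HELM would decode over all gaps jointly (and typically combine prosodic evidence) rather than decide each gap independently.

```python
from math import log

BOUNDARY = "<b>"


def segment_events(words, bigram, factor=lambda w: w, floor=1e-6):
    """Insert BOUNDARY wherever the hidden-event bigram LM prefers it.

    bigram maps (previous, next) token pairs to P(next | previous); unseen
    pairs get `floor`. factor maps each word to the token the LM sees
    (identity = plain word HELM; a morphological tag = factored analogue).
    """
    if not words:
        return []

    def lp(a, b):
        return log(bigram.get((a, b), floor))

    tokens = [factor(w) for w in words]
    out = [words[0]]
    for i in range(1, len(words)):
        prev, nxt = tokens[i - 1], tokens[i]
        with_event = lp(prev, BOUNDARY) + lp(BOUNDARY, nxt)
        without_event = lp(prev, nxt)
        if with_event > without_event:
            out.append(BOUNDARY)
        out.append(words[i])
    return out


# Toy Turkish: "eve gittim" = I went home; "okula gidiyorum" = I am going to school.
tags = {"eve": "N+Dat", "gittim": "V+Past", "okula": "N+Dat", "gidiyorum": "V+Prog"}
tag_bigram = {
    ("N+Dat", "V+Past"): 0.5, ("N+Dat", "V+Prog"): 0.5,
    ("V+Past", BOUNDARY): 0.7, ("V+Past", "N+Dat"): 0.1,
    (BOUNDARY, "N+Dat"): 0.4,
}
print(segment_events(["eve", "gittim", "okula", "gidiyorum"], tag_bigram, factor=tags.get))
```

Scoring tags instead of surface forms lets the toy model generalize from a verb-final tense tag to a likely boundary, even for verb forms it has never seen as words.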

