Results 1 - 10 of 10
Iterative language model estimation: Efficient data structure & algorithms
- in Proc. Interspeech, 2008
- Cited by 6 (2 self)
Despite the availability of better performing techniques, most language models are trained using popular toolkits that do not support perplexity optimization. In this work, we present an efficient data structure and optimized algorithms specifically designed for iterative parameter tuning. With the resulting implementation, we demonstrate the feasibility and effectiveness of such iterative techniques in language model estimation. Index Terms: language modeling, smoothing, interpolation
Generalized linear interpolation of language models
- IEEE Workshop on ASRU, 2007
- Cited by 5 (2 self)
Despite the prevalent use of model combination techniques to improve speech recognition performance on domains with limited data, little prior research has focused on the choice of the actual interpolation model. For merging language models, the most popular approach has been simple linear interpolation. In this work, we propose a generalization of linear interpolation that computes context-dependent mixture weights from arbitrary features. Results on a lecture transcription task yield up to a 1.0% absolute improvement in recognition word error rate (WER). Index Terms: language modeling, interpolation, adaptation, mixture models
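As a point of reference for the abstract above, the baseline it generalizes (simple linear interpolation with fixed mixture weights) can be sketched as follows. This is only an illustrative sketch, not the paper's method: the function names, the two component models, and all probabilities are hypothetical, and the context-dependent weights the paper proposes are not modeled here.

```python
import math

def interpolate(probs, weights):
    """Linearly interpolate component probabilities with fixed mixture weights."""
    assert abs(sum(weights) - 1.0) < 1e-9, "mixture weights must sum to 1"
    return sum(w * p for w, p in zip(weights, probs))

def perplexity(word_probs):
    """Perplexity of a text given the probability assigned to each word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# Hypothetical per-word probabilities from two component models on a 3-word text.
lecture_lm = [0.20, 0.05, 0.10]
general_lm = [0.10, 0.20, 0.10]
weights = [0.5, 0.5]  # assumed fixed weights; in practice tuned on held-out data

mixed = [interpolate(pair, weights) for pair in zip(lecture_lm, general_lm)]
mixed_ppl = perplexity(mixed)
```

A generalized scheme in the spirit of the abstract would replace the fixed `weights` with values computed per context from features; that step is deliberately left out because the listing does not specify it.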
Automatic prediction of parser accuracy
- In EMNLP, 2008
- Cited by 5 (0 self)
Statistical parsers have become increasingly accurate, to the point where they are useful in many natural language applications. However, estimating parsing accuracy on a wide variety of domains and genres is still a challenge in the absence of gold-standard parse trees. In this paper, we propose a technique that automatically takes into account certain characteristics of the domains of interest, and accurately predicts parser performance on data from these new domains. As a result, we have a cheap (no annotation involved) and effective recipe for measuring the performance of a statistical parser on any given domain.
A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation
- Cited by 4 (0 self)
In this paper we present a doubly hierarchical Pitman-Yor process language model. Its bottom layer of hierarchy consists of multiple hierarchical Pitman-Yor process language models, one each for some number of domains. The novel top layer of hierarchy consists of a mechanism to couple together multiple language models such that they share statistical strength. Intuitively this sharing results in the “adaptation” of a latent shared language model to each domain. We introduce a general formalism capable of describing the overall model, which we call the graphical Pitman-Yor process, and explain how to perform Bayesian inference in it. We present encouraging language model domain adaptation results that both illustrate the potential benefits of our new model and suggest new avenues of inquiry.
N-gram weighting: Reducing training data mismatch in cross-domain language model estimation
- in Proc. EMNLP, 2008
- Cited by 2 (1 self)
In domains with insufficient matched training data, language models are often constructed by interpolating component models trained from partially matched corpora. Since the n-grams from such corpora may not be of equal relevance to the target domain, we propose an n-gram weighting technique to adjust the component n-gram probabilities based on features derived from readily available segmentation and metadata information for each corpus. Using a log-linear combination of such features, the resulting model achieves up to a 1.2% absolute word error rate reduction over a linearly interpolated baseline language model on a lecture transcription task.
AUTOMATICALLY DERIVED SPOKEN LANGUAGE MARKERS FOR DETECTING MILD COGNITIVE IMPAIRMENT
- Cited by 1 (1 self)
Speech produced by subjects during neuropsychological exams can provide markers other than test performance, via spoken language characteristics that discriminate between subject groups. We present preliminary results on the utility of such markers, automatically derived from spoken responses to narrative recall tests, in discriminating between healthy elderly and subjects with Mild Cognitive Impairment (MCI). Given the audio and transcript of the retellings, a range of markers were automatically derived, including (among others) pause frequency and grammatical complexity. Certain spoken language derived markers, which do not measure the fidelity of the retelling to the original narrative, show statistically significant differences between the group means, when calculated either manually or automatically.
Language Model Adaptation for a Speech to Sign Language Translation System using Web Frequencies and a MAP Framework
This paper presents a successful technique for creating a new language model (LM) that adapts the original target LM used by a machine translation (MT) system. This technique is especially useful when resources for training the target side (Spanish Sign Language (LSE) in our case) are too scarce to properly estimate the target LM, the Sign Language Model (SLM), used by the MT system. The technique uses information from the source language, Spanish in our task, and from the phrase-based translation matrix to create a new LM, estimated using web frequencies, which adapts the counts of the SLM through the Maximum A Posteriori (MAP) method. The corpus consists of commonly used sentences spoken by an officer when assisting people in applying for, or renewing, the National Identification Document. The proposed technique allows relative reductions of 15.5% in perplexity and 2.7% in WER for translation, which are close to half the maximum performance obtainable when only the LM is optimized. Index Terms: language model adaptation, machine translation, sign language, web counts.
An Empirical Investigation of Discounting in Cross-Domain Language Models
We investigate the empirical behavior of n-gram discounts within and across domains. When a language model is trained and evaluated on two corpora from exactly the same domain, discounts are roughly constant, matching the assumptions of modified Kneser-Ney LMs. However, when training and test corpora diverge, the empirical discount grows essentially as a linear function of the n-gram count. We adapt a Kneser-Ney language model to incorporate such growing discounts, resulting in perplexity improvements over modified Kneser-Ney and Jelinek-Mercer baselines.
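To make the contrast between a constant discount and a discount that grows linearly with the n-gram count concrete, here is a minimal sketch. It assumes plain interpolated absolute discounting over bigrams with a maximum-likelihood unigram back-off, which is a simplification and not the adapted Kneser-Ney model the abstract describes. The function name, the toy data, and the coefficients 0.75 and 0.1 are all hypothetical.

```python
from collections import Counter

def discounted_bigram_model(bigrams, discount):
    """Return prob(h, w) using interpolated absolute discounting.

    `discount` maps a bigram count to the amount subtracted from that count.
    """
    bigram_counts = Counter(bigrams)
    history_counts = Counter(h for h, _ in bigrams)
    unigram_counts = Counter(w for _, w in bigrams)
    total = sum(unigram_counts.values())

    def prob(h, w):
        c_h = history_counts[h]
        unigram = unigram_counts[w] / total
        if c_h == 0:
            return unigram  # unseen history: fall back to the unigram estimate
        # Total mass removed from the seen bigrams that share this history.
        removed = sum(min(discount(c), c)
                      for (h2, _), c in bigram_counts.items() if h2 == h)
        c = bigram_counts[(h, w)]
        kept = c - min(discount(c), c) if c > 0 else 0.0
        # Redistribute the removed mass in proportion to the unigram estimate.
        return kept / c_h + (removed / c_h) * unigram

    return prob

toy = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "a"), ("c", "a")]
constant = discounted_bigram_model(toy, lambda c: 0.75)            # fixed discount
growing = discounted_bigram_model(toy, lambda c: 0.75 + 0.1 * c)   # linear in count
```

Both variants remain proper distributions over the vocabulary for a seen history; the growing discount simply moves more mass from frequent bigrams to the back-off estimate.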
Adapting Translation Models to Translationese Improves SMT
Translation models used for statistical machine translation are compiled from parallel corpora; such corpora are manually translated, but the direction of translation is usually unknown, and is consequently ignored. However, much research in Translation Studies indicates that the direction of translation matters, as translated language (translationese) has many unique properties. Specifically, phrase tables constructed from parallel corpora translated in the same direction as the translation task perform better than ones constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the ‘wrong’ direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables, by adapting the translation model to the special properties of translationese. We define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
unknown title
- 2009

