Moses: Open source toolkit for statistical machine translation
2007
Cited by 434 (26 self)
We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.
Morphology-Based Language Modeling for Arabic Speech Recognition
In Proc. of ICSLP, 2004
Cited by 21 (5 self)
Language modeling is a difficult problem for languages with rich morphology. In this paper we investigate the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic. Class-based and single-stream factored language models using morphological word representations are applied within an N-best list rescoring framework. In addition, we explore the use of factored language models in first-pass recognition, which is facilitated by two novel procedures: the data-driven optimization of a multi-stream language model structure, and the conversion of a factored language model to a standard word-based model. We evaluate these techniques on a large-vocabulary recognition task and demonstrate that they lead to perplexity and word error rate reductions.
Improved language modeling for statistical machine translation
2005
Cited by 12 (2 self)
Statistical machine translation systems use a combination of one or more translation models and a language model. While there is a significant body of research addressing the improvement of translation models, the problem of optimizing language models for a specific translation task has not received much attention. Typically, standard word trigram models are used as an out-of-the-box component in a statistical machine translation system. In this paper we apply language modeling techniques that have proved beneficial in automatic speech recognition to the ACL05 machine translation shared data task and demonstrate improvements over a baseline system with a standard language model.
A joint language model with fine-grain syntactic tags
In EMNLP, 2009
Cited by 12 (6 self)
We present a scalable joint language model designed to utilize fine-grain syntactic tags. We discuss challenges such a design faces and describe our solutions that scale well to large tagsets and corpora. We advocate the use of relatively simple tags that do not require deep linguistic knowledge of the language but provide more structural information than POS tags and can be derived from automatically generated parse trees, a combination of properties that allows easy adoption of this model for new languages. We propose two fine-grain tagsets and evaluate our model using these tags, as well as POS tags and SuperARV tags, in a speech recognition task, and discuss future directions.
Towards Broad Coverage Surface Realization with CCG
In Proceedings of the Workshop on Using Corpora for NLG: Language Generation and Machine Translation (UCNLG+MT), 2007
Cited by 10 (3 self)
This paper reports on progress towards developing the first broad coverage English surface realizer for Combinatory Categorial Grammar (CCG). The paper provides initial automatic evaluation results which are roughly comparable to those reported with other formalisms when using a (nonblind) grammar derived from the development section of the CCGbank; the results are worse, though still respectable, when using the standard dev/train/test splits, highlighting the need for better lexical smoothing and more focused search. The paper also shows that factored language models that interpolate word-level n-grams with n-grams over POS tags and supertags provide similar absolute performance improvements over word-level n-grams as have been observed with parsing-inspired log-linear models.
Recent innovations in speech-to-text transcription at SRI-ICSI-UW
IEEE Transactions on Audio, Speech & Language Processing, 2006
Cited by 10 (2 self)
We summarize recent progress in automatic speech-to-text ...
Hypertagging: Supertagging for Surface Realization with CCG
Cited by 9 (5 self)
In lexicalized grammatical formalisms, it is possible to separate lexical category assignment from the combinatory processes that make use of such categories, such as parsing and realization. We adapt techniques from supertagging, a relatively recent technique that performs complex lexical tagging before full parsing (Bangalore and Joshi, 1999; Clark, 2002), for chart realization in OpenCCG, an open-source NLP toolkit for CCG. We call this approach hypertagging, as it operates at a level “above” the syntax, tagging semantic representations with syntactic lexical categories. Our results demonstrate that a hypertagger-informed chart realizer can achieve substantial improvements in realization speed (being approximately twice as fast) with superior realization quality.
A Bayesian interpretation of interpolated Kneser-Ney
2006
Cited by 8 (2 self)
Interpolated Kneser-Ney is one of the best smoothing methods for n-gram language models. Previous explanations for its superiority have been based on intuitive and empirical justifications of specific properties of the method. We propose a novel interpretation of interpolated Kneser-Ney as approximate inference in a hierarchical Bayesian model consisting of Pitman-Yor processes. As opposed to past explanations, our interpretation can recover exactly the formulation of interpolated Kneser-Ney, and performs better than interpolated Kneser-Ney when a better inference procedure is used.
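The abstract above does not spell out the smoothing formula it refers to. As a rough illustration only, the Python sketch below implements the standard bigram form of interpolated Kneser-Ney with a single fixed discount. It is not the hierarchical Pitman-Yor inference the paper proposes, and the function name `train_kn_bigram`, the default discount of 0.75, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_kn_bigram(tokens, discount=0.75):
    """Return prob(word, context) for an interpolated Kneser-Ney bigram model.

    Hypothetical helper for illustration; assumes 0 < discount <= 1.
    """
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_totals = Counter()      # c(v): how often v occurs as a context
    followers = defaultdict(set)    # distinct words seen after v
    preceders = defaultdict(set)    # distinct words seen before w
    for (v, w), count in bigram_counts.items():
        context_totals[v] += count
        followers[v].add(w)
        preceders[w].add(v)
    num_bigram_types = len(bigram_counts)

    def prob(word, context):
        # Continuation probability: share of bigram types that end in `word`.
        p_cont = len(preceders.get(word, ())) / num_bigram_types
        total = context_totals.get(context, 0)
        if total == 0:
            # Unseen context: fall back to the lower-order estimate.
            return p_cont
        # Subtract a fixed discount from every observed bigram count.
        count = bigram_counts.get((context, word), 0)
        discounted = max(count - discount, 0.0) / total
        # The mass removed by discounting is handed to the lower-order model.
        backoff_weight = discount * len(followers[context]) / total
        return discounted + backoff_weight * p_cont

    return prob


# Toy corpus (assumed for the example).
corpus = "the cat sat on the mat the cat ate".split()
prob = train_kn_bigram(corpus)
vocab = sorted(set(corpus))
total_mass = sum(prob(w, "the") for w in vocab)
```

In this toy corpus every word occurs as the second element of some bigram, so the discounted mass is fully redistributed through the continuation probability and `total_mass` comes out as 1 up to floating-point error.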
Hierarchical Pitman-Yor language models for ASR in meetings
In Proceedings of IEEE ASRU International Conference, 2007
Cited by 6 (3 self)
In this paper we investigate the application of a hierarchical Bayesian language model (LM) based on the Pitman-Yor process for automatic speech recognition (ASR) of multiparty meetings. The hierarchical Pitman-Yor language model (HPYLM) provides a Bayesian interpretation of LM smoothing. An approximation to the HPYLM recovers the exact formulation of the interpolated Kneser-Ney smoothing method in n-gram models. This paper focuses on the application and scalability of the HPYLM on a practical large vocabulary ASR system. Experimental results on NIST RT06s evaluation meeting data verify that the HPYLM is a competitive and promising language modeling technique, which consistently performs better than interpolated Kneser-Ney and modified Kneser-Ney n-gram LMs in terms of both perplexity and word error rate.
Phrase-based Statistical Language Generation using Graphical Models and Active Learning
Cited by 6 (2 self)
Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.

