An Unsupervised Model for Joint Phrase Alignment and Extraction
Cited by 2 (1 self)
We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
Handling phrase reorderings for machine translation
Cited by 1 (0 self)
We propose a distance phrase reordering model (DPR) for statistical machine translation (SMT), where the aim is to capture phrase reorderings using a structure learning framework. On both the reordering classification and a Chinese-to-English translation task, we show improved performance over a baseline SMT system.
Automatic Sentence Structure Annotation for Spoken Language Processing
, 2008
Cited by 1 (0 self)
Increasing amounts of easily available electronic data are precipitating a need for automatic processing
that can aid humans in digesting large amounts of data. Speech and video are becoming
an increasingly significant portion of on-line information, from news and television broadcasts, to
oral histories, on-line lectures, or user generated content. Automatic processing of audio and video
sources requires automatic speech recognition (ASR) in order to provide transcripts. Typical ASR
generates only words, without punctuation, capitalization, or further structure. Many techniques
available from natural language processing therefore suffer when applied to speech recognition output,
because they assume the presence of reliable punctuation and structure. In addition, errors from
automatic transcription also degrade the performance of downstream processing such as machine
translation, name detection, or information retrieval. We develop approaches for automatically
annotating structure in speech, including sentence and sub-sentence segmentation, and then turn
towards optimizing ASR and annotation for downstream applications.
The University of Edinburgh System Description for IWSLT 2007
Cited by 1 (0 self)
We present the University of Edinburgh’s submission for the IWSLT 2007 shared task. Our efforts focused on adapting our statistical machine translation system to the open data conditions for the Italian-English task of the evaluation campaign. We examine the challenges of building a system with a limited set of in-domain development data (SITAL), a small training corpus in a related but distinct domain (BTEC), and a large out-of-domain corpus (Europarl). We concentrated on the corrected text track, and present additional results of our experiments using the open-source Moses MT system with speech input.
OpenMaTrEx: A Free/Open-Source Marker-Driven Example-Based Machine Translation System
Cited by 1 (1 self)
We describe OpenMaTrEx, a free/open-source example-based machine translation (EBMT) system based on the marker hypothesis, comprising a marker-driven chunker, a collection of chunk aligners, and two engines: one based on a simple proof-of-concept monotone EBMT recombinator and a Moses-based statistical decoder. OpenMaTrEx is a free/open-source release of the basic components of MaTrEx, the Dublin City University machine translation system.
Linguistically Annotated Reordering: Evaluation and Analysis
Cited by 1 (0 self)
Linguistic knowledge plays an important role in phrase movement in statistical machine translation. To efficiently incorporate linguistic knowledge into phrase reordering, we propose a new approach: Linguistically Annotated Reordering (LAR). In LAR, we build hard hierarchical skeletons and inject soft linguistic knowledge from source parse trees to nodes of hard skeletons during translation. The experimental results on large-scale training data show that LAR is comparable with boundary word based reordering (BWR) (Xiong, Liu, and Lin 2006), which is a very competitive lexicalized reordering approach. When combined with BWR, LAR provides complementary information for phrase reordering, which collectively improves the BLEU score significantly. To further understand the contribution of linguistic knowledge in LAR to phrase reordering, we introduce a syntax-based analysis method to automatically detect constituent movement in both reference and system translations, and summarize syntactic reordering patterns that are captured by reordering models. With the proposed analysis method, we conduct a comparative analysis that not only provides insight into how linguistic knowledge affects phrase movement but also reveals new challenges in phrase reordering.
Source-side Dependency Tree Reordering Models with Subtree Movements and Constraints
Cited by 1 (0 self)
We propose a novel source-side dependency tree reordering model for statistical machine translation, in which subtree movements and constraints are represented as reordering events associated with the widely used lexicalized reordering models. This model allows us to not only efficiently capture the statistical distribution of the subtree-to-subtree transitions in training data, but also utilize it directly at decoding time to guide the search process. Using subtree movements and constraints as features in a log-linear model, we are able to help the reordering models make better selections. It also allows the subtle importance of monolingual syntactic movements to be learned alongside other reordering features. We show improvements in translation quality in English→Spanish and English→Iraqi translation tasks.
MACHINE TRANSLATION BY PATTERN MATCHING
, 2008
Cited by 1 (0 self)
The best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call translation by pattern matching, which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but leave some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively. The main
Proceedings of the Workshop on Statistical Machine Translation, pages 154-157, 2006
The joint probability model proposed by Marcu and Wong (2002) provides a strong probabilistic framework for phrase-based statistical machine translation (SMT). The model's usefulness is, however, limited by the computational complexity of estimating parameters at the phrase level. We present the first model to use word alignments for constraining the space of phrasal alignments searched during Expectation Maximization (EM) training. Constraining the joint model improves performance, showing results that are very close to state-of-the-art phrase-based models. It also allows the model to scale up to larger corpora and therefore be more widely applicable.
An Improved Statistical Transfer System for French–English Machine Translation
This paper presents the Carnegie Mellon University statistical transfer MT system submitted to the 2009 WMT shared task in French-to-English translation. We describe a syntax-based approach that incorporates both syntactic and non-syntactic phrase pairs in addition to a syntactic grammar. After reporting development test results, we conduct a preliminary analysis of the coverage and effectiveness of the system’s components.

