A Systematic Comparison of Various Statistical Alignment Models
- Computational Linguistics, 2003
Cited by 805 (22 self)
... this article the problem of finding the word alignment of a bilingual sentence-aligned corpus by using language-independent statistical methods. There is a vast literature on this topic, and many different systems have been suggested to solve this problem. Our work follows and extends the methods introduced by Brown, Della Pietra, Della Pietra, and Mercer (1993) by using refined statistical models for the translation process. The basic idea of this approach is to develop a model of the translation process with the word alignment as a hidden variable of this process, to apply statistical estimation theory to compute the "optimal" model parameters, and to perform alignment search to compute the best word alignment.
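The three steps named in this abstract (a translation model with a hidden alignment, parameter estimation, and alignment search) can be illustrated with a minimal sketch in the style of IBM Model 1, the simplest of the Brown et al. (1993) models. This is not the authors' implementation: the function names, the toy corpus, and the omission of the NULL word are simplifying assumptions made here for illustration only.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """EM training of lexical probabilities t[(f, e)] = p(f | e).

    pairs: list of (source_tokens, target_tokens).
    Simplification (assumption): no NULL word, no distortion model.
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        for fs, es in pairs:
            for f in fs:
                # E-step: posterior over which e generated f
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t

def viterbi_align(fs, es, t):
    """Alignment search: for each source word, the index of its best target word."""
    return [max(range(len(es)), key=lambda i: t[(f, es[i])]) for f in fs]

# Hypothetical toy corpus (not from the article).
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm1(corpus)
alignment = viterbi_align(["das", "haus"], ["the", "house"], t)
```

Because each source word is scored independently in this simplified model, the alignment search reduces to a per-word argmax; the refined models compared in the article need a more elaborate search.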
Identifying idiomatic expressions using automatic word alignment
- Proceedings of the EACL 2006 Workshop on Multiword Expressions in ..., 2006
Cited by 15 (1 self)
For NLP applications that require some sort of semantic interpretation, it would be helpful to know which expressions exhibit an idiomatic meaning and which exhibit a literal meaning. We investigate whether automatic word alignment in existing parallel corpora facilitates the classification of candidate expressions along a continuum ranging from literal and transparent expressions to idiomatic and opaque expressions. Our method relies on two criteria: (i) meaning predictability, measured as semantic entropy, and (ii) the overlap between the meaning of an expression and the meaning of its component words. We approximate this overlap as the proportion of default alignments. We obtain a significant improvement over the baseline with both measures.
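The two criteria can be sketched as follows. The abstract does not give the exact definitions, so this is one plausible reading under stated assumptions: semantic entropy is taken as the Shannon entropy of the translations aligned to an expression, and a "default alignment" is taken to mean a component word being aligned to its most frequent translation corpus-wide. All names and the toy data are hypothetical.

```python
import math
from collections import Counter

def semantic_entropy(translations):
    """Shannon entropy (in bits) of the translations aligned to one expression.

    Assumption: higher entropy means a less predictable meaning.
    """
    counts = Counter(translations)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def default_alignment_proportion(observed, default_translation):
    """Share of component-word alignments that match the word's default translation.

    observed: list of (word, aligned_translation) pairs seen inside the expression.
    default_translation: dict mapping a word to its most frequent translation
    in the whole corpus (assumed to be computed elsewhere).
    """
    if not observed:
        return 0.0
    hits = sum(1 for w, tr in observed if default_translation.get(w) == tr)
    return hits / len(observed)

# Hypothetical toy data, not taken from the paper.
idiom_translations = ["revelar", "contar", "descubrir", "revelar"]
literal_translations = ["derramar"] * 4
observed = [("spill", "derramar"), ("beans", "secreto"),
            ("spill", "revelar"), ("beans", "judias")]
defaults = {"spill": "derramar", "beans": "judias"}

idiom_entropy = semantic_entropy(idiom_translations)
literal_entropy = semantic_entropy(literal_translations)
overlap = default_alignment_proportion(observed, defaults)
```

Under this reading, an idiomatic expression would tend to show higher entropy and a lower proportion of default alignments than a literal one.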
Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity
Cited by 7 (0 self)
There have been many proposals to extract semantically related words using measures of distributional similarity, but these typically cannot distinguish synonyms from other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents translated into multiple languages, and compare it with a monolingual syntax-based method. The approach that uses aligned multilingual data shows much higher precision and recall for the task of synonym extraction than the monolingual syntax-based approach.
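A rough sketch of the alignment-based idea: represent each word by the translations it is aligned to across several languages, then compare words by the similarity of those vectors. The abstract does not say which similarity measure or weighting is used, so cosine over raw counts is an assumption here, as are the function names and the toy alignment data.

```python
import math
from collections import Counter

def translation_vector(word, aligned_pairs):
    """Count the (language, translation) features aligned to `word`.

    aligned_pairs: list of (word, (language, translation)) tuples, assumed to
    come from automatic word alignment of a multilingual parallel corpus.
    """
    return Counter(feat for w, feat in aligned_pairs if w == word)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy alignments, not taken from the paper.
pairs = (
    [("car", ("de", "Auto"))] * 3 + [("car", ("fr", "voiture"))] * 2
    + [("automobile", ("de", "Auto")), ("automobile", ("fr", "voiture")),
       ("automobile", ("fr", "automobile"))]
    + [("bus", ("de", "Bus"))] * 2 + [("bus", ("fr", "bus"))] * 2
)
car = translation_vector("car", pairs)
sim_synonym = cosine(car, translation_vector("automobile", pairs))
sim_related = cosine(car, translation_vector("bus", pairs))
```

The intuition is that synonyms tend to share translations, while merely related words such as co-hyponyms usually do not.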
Word Alignment Annotation in a Japanese-Chinese Parallel Corpus
Parallel corpora are critical resources for machine translation research and development because they contain translation equivalences of various granularities. Manual annotation of word alignment is important for providing a gold standard for developing and evaluating both example-based and statistical machine translation models. This paper presents the work of word alignment annotation in the NICT Japanese-Chinese parallel corpus, which is constructed at the National Institute of Information and Communications Technology (NICT). We describe the specification of word alignment annotation and the tools specially developed for the manual annotation. The manual annotation of 17,000 sentence pairs has been completed. We examined the manually annotated word alignment data and extracted translation knowledge from the word-aligned corpus.
• Identification of Dutch mwes • Discussion
• Automated method to extract multiword expressions (mwes) from large corpora.
• Ideally, a technique applicable to all subtypes of mwes.
• Lists of mwes together with syntactic frame, modifiability information and frequency.
Talk outline
• What are Multiword Expressions (mwes)?
• The landscape of mwes
Defining the problem
- 2006
• We investigate automated methods to rank a list of candidate expressions in terms of their idiomaticity.
• We explore whether automatic word alignment can be useful to identify idiomatic expressions.
• Compile a lexicon of idiomatic multiword expressions to be used in nlp systems.
Idiomatic expressions...
• constitute a subset of multiword expressions [Sag et al., 2001].
• show idiosyncratic behavior at various levels of analysis
  ⋆ rigidity in syntax and morphology
  ⋆ strong lexical affinity
  ⋆ meaning is conventionalized
• have a non-compositional meaning.
How can we capture non-compositional meaning?
compositional: spill + soup → spill(soup)
non-compositional: spill + beans ↛ reveal(secret)
• Approximate meaning by looking up translation in a foreign language.
literal: spill + soup; Sp: derramar + sopa; Sp: derramar(sopa)
idiomatic: spill + beans; Sp: derramar + judias
Efficient Combination of Confidence Measures for Machine Translation
- 10th Annual Conference of the International Speech Communication Association (INTERSPEECH 2009), 2009
We present in this paper a twofold contribution to Confidence Measures for Machine Translation. First, in order to train and test confidence measures, we present a method to automatically build corpora containing realistic errors. Errors introduced into reference translations simulate classical machine translation errors (word deletion and word substitution), and are supervised by WordNet. Second, we use SVM to combine original and classical confidence measures at both word level and sentence level. We show that the obtained combination outperforms our best single word-level confidence measure by 14% (absolute), and that the combination of sentence-level confidence measures produces meaningful scores.
Index Terms: statistical machine translation systems, confidence measures, support vector machine, support vector regression
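The corpus-building step can be sketched as follows: word deletions and substitutions are injected into a reference translation, and each surviving token is labeled correct or erroneous. This is only an illustration under stated assumptions. The `substitutes` dictionary is a hypothetical stand-in for the WordNet lookup, and the error rates, labeling scheme, and function name are not taken from the paper.

```python
import random

def inject_errors(reference, substitutes, p_delete=0.1, p_substitute=0.1, seed=0):
    """Corrupt a reference translation with simulated MT errors.

    reference: list of tokens.
    substitutes: dict word -> list of replacement candidates
                 (hypothetical stand-in for a WordNet-supervised lookup).
    Returns (corrupted_tokens, labels) with label 1 = correct, 0 = substituted.
    Deleted words leave no token behind, so they carry no word-level label.
    """
    rng = random.Random(seed)  # seeded for reproducible corpora
    corrupted, labels = [], []
    for word in reference:
        r = rng.random()
        if r < p_delete:
            continue  # word deletion
        if r < p_delete + p_substitute and substitutes.get(word):
            corrupted.append(rng.choice(substitutes[word]))  # word substitution
            labels.append(0)
        else:
            corrupted.append(word)
            labels.append(1)
    return corrupted, labels

# Hypothetical usage.
reference = ["the", "cat", "sat", "on", "the", "mat"]
substitutes = {"cat": ["dog"], "sat": ["stood"], "mat": ["rug"]}
corrupted, labels = inject_errors(reference, substitutes, p_delete=0.2, p_substitute=0.3)
```

Labeled pairs like these could then serve as training and test data for a word-level confidence classifier such as the SVM combination the abstract describes.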

