利用依存關係之辭彙翻譯 Word Translation Disambiguation via Dependency
Abstract - Cited by 1 (0 self)
We introduce a new method for automatically disambiguating word translations using dependency relationships. In our approach, we learn the associations between translations and dependency relationships from a parallel corpus. The method consists of a training stage and a runtime stage. During the training stage, the system automatically learns a translation decision list from source sentences and their dependency relationships. At runtime, for each content word in the given sentence, the decision list selects the Chinese translation most appropriate to the sentence context. We also describe an implementation of the proposed method using the bilingual Hong Kong News and Hong Kong Hansard corpora. In the experiment, we translate the content words in the test data in five different ways and evaluate the results with an automatic BLEU-like evaluation methodology. Experimental results indicate that dependency relations clearly help to disambiguate word translations and that some kinds of dependency are more effective than others. Keywords: translation selection, statistical machine translation, parallel corpus, decision list, dependency relations
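The two stages described in this abstract can be illustrated with a minimal sketch. The abstract does not say how decision-list rules are scored, so the smoothed log-odds ranking below is an assumption, and the function names, feature strings, and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, smoothing=0.1):
    """Build a ranked rule list from (word, dependency_feature, translation) triples.

    Assumption: rules are ranked by smoothed log-odds of the translation
    given the (word, feature) pair. The abstract does not specify the score.
    """
    counts = defaultdict(Counter)
    translations = defaultdict(set)
    for word, feature, translation in examples:
        counts[(word, feature)][translation] += 1
        translations[word].add(translation)

    rules = []
    for (word, feature), counter in counts.items():
        total = sum(counter.values())
        k = len(translations[word])  # number of candidate translations for the word
        for translation, c in counter.items():
            p = (c + smoothing) / (total + smoothing * k)
            rest = 1.0 - p
            score = math.log(p / rest) if rest > 0 else float("inf")
            rules.append((score, word, feature, translation))
    # Strongest evidence first, as in a decision list.
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def translate(word, features, rules, default=None):
    """Return the translation of the first (highest-ranked) rule that fires."""
    for _score, rule_word, feature, translation in rules:
        if rule_word == word and feature in features:
            return translation
    return default
```

At runtime, `translate("bank", {"mod:river"}, rules)` would return whichever translation the highest-ranked matching rule proposes. Only the first matching rule is used, which is the defining property of a decision list.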
Semantic Evidence for Automatic Identification of Cognates
Abstract - Cited by 1 (0 self)
The identification of cognate word pairs has recently started to attract the attention of NLP research, but it is still a rather unexplored area requiring more focused attention. This paper builds on a purely orthographic approach to this task by introducing semantic evidence in the form of monolingual thesauri and corpora to support the identification process. The proposed method is easily portable between languages and specialisation domains, since it does not depend on the availability of parallel texts or extensive knowledge resources, requiring only monolingual corpora and a bilingual dictionary encoding correspondences only for the core vocabularies of both languages. Our evaluation of the method on four different language pairs suggests that the introduction of semantic evidence in cognate detection helps to substantially increase the precision of cognate identification.
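One way to picture how semantic evidence can filter orthographic candidates is the sketch below. It is not the paper's method: the string-similarity measure (`difflib.SequenceMatcher`), the Jaccard overlap, both thresholds, and the tiny dictionary are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def orthographic_similarity(a, b):
    """Surface similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_similarity(related_a, related_b, dictionary):
    """Jaccard overlap between translated related words of `a` and related words of `b`.

    `related_a` / `related_b` stand in for words drawn from a monolingual
    thesaurus or corpus; `dictionary` is a small core-vocabulary bilingual dictionary.
    """
    translated = {t for w in related_a for t in dictionary.get(w, ())}
    target = set(related_b)
    if not translated or not target:
        return 0.0
    return len(translated & target) / len(translated | target)


def is_cognate(a, b, related_a, related_b, dictionary,
               ortho_threshold=0.7, sem_threshold=0.2):
    """Accept a pair only if both the orthographic and the semantic test pass."""
    return (orthographic_similarity(a, b) >= ortho_threshold
            and semantic_similarity(related_a, related_b, dictionary) >= sem_threshold)
```

Requiring both tests is what lets semantic evidence raise precision: a false friend such as English "actual" and Spanish "actual" passes the orthographic test but fails when the related words do not translate into each other.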
Paraphrase Fragment Extraction from Monolingual Comparable Corpora
Abstract - Cited by 1 (1 self)
We present a novel paraphrase fragment pair extraction method that uses a monolingual comparable corpus containing different articles about the same topics or events. The procedure consists of document pair extraction, sentence pair extraction, and fragment pair extraction. At each stage, we evaluate the intermediate results manually, and tune the later stages accordingly. With this minimally supervised approach, we achieve 62% accuracy on the paraphrase fragment pairs we collected and 67% on those extracted from the MSR corpus. The results look promising, given the minimal supervision of the approach, which can be further scaled up.
An Iterative Algorithm for Translation Acquisition of Adpositions
2002
Abstract
This paper describes an algorithm which acquires prepositions for translation from large corpora. Corpora of both the source language and the target language are used, but they can be independent of each other. Moreover, the algorithm does not require any type of manual tagging. Using an iterative algorithm, the system selects preferred prepositions between specific verbs and nouns in the target language, and simultaneously detects compound verbs which may be obstacles to the proper selection of prepositions. This algorithm is applied to the translation of the Japanese postposition 'de' into English.
A New Measure for Extracting Semantically Related Words
2004
Abstract
The identification of semantically related terms for a given word is an important problem. A number of statistical approaches have been proposed to address this problem. Most approaches draw their statistics from a large general corpus. In this paper, we propose to use specialized corpora which focus strongly on the individual words of interest. We propose to collect such corpora through targeted queries to Internet search engines. Furthermore, we introduce a new statistical measure, Relative Frequency Ratio, tailored specifically for such specialized corpora. We evaluated our approach by using the extracted related terms to attack the target word selection problem in machine translation. We use this indirect evaluation because a direct evaluation of the extracted related terms relies heavily on human judgment and is not quantitatively comparable to others' results. Our experimental results so far are very encouraging.
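The abstract names the Relative Frequency Ratio without defining it. The sketch below assumes the usual reading of such a measure: a term's relative frequency in the word-focused corpus divided by its relative frequency in a general corpus. The add-one smoothing and the function name are also assumptions.

```python
from collections import Counter


def relative_frequency_ratio(focused_tokens, general_tokens, smoothing=1.0):
    """Score each term in the focused corpus by how over-represented it is.

    Assumed definition: (relative frequency in focused corpus) divided by
    (smoothed relative frequency in general corpus). Ratios above 1 mark
    terms that are more typical of the focused corpus.
    """
    focused = Counter(focused_tokens)
    general = Counter(general_tokens)
    n_focused = len(focused_tokens)
    n_general = len(general_tokens)
    vocab_size = len(set(focused) | set(general))

    ratios = {}
    for term, count in focused.items():
        rel_focused = count / n_focused
        # Smoothing keeps the ratio finite for terms unseen in the general corpus.
        rel_general = (general[term] + smoothing) / (n_general + smoothing * vocab_size)
        ratios[term] = rel_focused / rel_general
    return ratios
```

Sorting the returned dictionary by value in descending order gives a ranked list of candidate related terms for the word the focused corpus was collected around.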
Using Comparable Corpora to Adapt a Translation Model to Domains
Abstract
Statistical machine translation (SMT) requires a large parallel corpus, which is available only for restricted language pairs and domains. To expand the language pairs and domains to which SMT is applicable, we created a method for estimating translation pseudo-probabilities from bilingual comparable corpora. The essence of our method is to calculate pairwise correlations between the words associated with a source-language word, presently restricted to a noun, and its translations; word translation pseudo-probabilities are calculated based on the assumption that the more associated words a translation is correlated with, the higher its translation probability. We also describe a method we created for calculating noun-sequence translation pseudo-probabilities based on occurrence frequencies of noun sequences and constituent-word translation pseudo-probabilities. Then, we present a framework for merging the translation pseudo-probabilities estimated from in-domain comparable corpora with a translation model learned from an out-of-domain parallel corpus. Experiments using Japanese and English comparable corpora of scientific paper abstracts and a Japanese-English parallel corpus of patent abstracts showed promising results; the BLEU score was improved to some degree by incorporating the pseudo-probabilities estimated from the in-domain comparable corpora. Future work includes an optimization of the parameters and an extension to estimate translation pseudo-probabilities for verbs.
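The stated assumption (a translation correlated with more associated words receives a higher probability) can be sketched as simple counting followed by normalization. The paper's actual estimator is not given in the abstract, so the threshold, the normalization, the `correlation` callback, and the toy data are hypothetical.

```python
def translation_pseudo_probabilities(associated_words, candidates, correlation,
                                     threshold=0.0):
    """Turn correlation support counts into pseudo-probabilities.

    associated_words: words associated with the source-language noun.
    candidates: candidate translations of that noun.
    correlation: callable (associated_word, translation) -> float (hypothetical).
    A translation's support is the number of associated words whose correlation
    with it exceeds `threshold`; supports are normalized to sum to 1.
    """
    support = {
        t: sum(1 for a in associated_words if correlation(a, t) > threshold)
        for t in candidates
    }
    total = sum(support.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {t: 1.0 / len(candidates) for t in candidates} if candidates else {}
    return {t: s / total for t, s in support.items()}
```

With three associated words of which two correlate with one translation and one with another, the sketch yields pseudo-probabilities of 2/3 and 1/3.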
Learning a Translation Lexicon from Monolingual Corpora
ACL Special Interest Group on the Lexicon (SIGLEX), Philadelphia, July 2002, pp. 9-16. Association for Computational Linguistics.
Abstract
This paper presents work on the task of constructing a word-level translation lexicon purely from unrelated monolingual corpora. We combine various clues such as cognates, similar context, preservation of word similarity, and word frequency. Experimental results for the construction of a German-English noun lexicon are reported. A noun translation accuracy of 39%, scored against a parallel test corpus, was achieved.
Paraphrasing for Style
Abstract
We present an initial investigation into the task of paraphrasing language while targeting a particular writing style. The plays of William Shakespeare and their modern translations are used as a testbed for evaluating paraphrase systems targeting a specific style of writing. We show that even with a relatively small amount of parallel training data, it is possible to learn paraphrase models which capture stylistic phenomena, and these models outperform baselines based on dictionaries and out-of-domain parallel text. In addition, we present an initial investigation into automatic evaluation metrics for paraphrasing writing style. To the best of our knowledge, this is the first work to investigate the task of paraphrasing text with the goal of targeting a specific style of writing.

