Improving statistical translation through editing. European Association for Machine Translation (EAMT-04) Workshop
- In European Association for Machine Translation
, 2004
Cited by 8 (0 self)
In this paper we introduce Linear B’s statistical machine translation system. We describe how Linear B’s phrase-based translation models are learned from a parallel corpus, and show how the quality of the translations produced by our system can be improved over time through editing. There are two levels at which our translations can be edited. The first is through a simple correction of the text that is produced by our system. The second is through a mechanism which allows an advanced user to examine the sentences that a particular translation was learned from. The learning process can be improved by correcting which phrases in the sentence should be considered translations of each other.
A Part-of-Speech-Based Alignment Algorithm
, 1994
Cited by 5 (3 self)
To align bilingual texts becomes a crucial issue recently. Rather than using length-based or translation-based criterion, a part-of-speech-based criterion is proposed. We postulate that source texts and target texts should share the same concepts, ideas, entities, and events. Simulated annealing approach is used to implement this alignment algorithm. The preliminary experiments show good performance. Most importantly, the experimental objects are Chinese-English texts, which are selected from different language families.
An Exploration of Data-driven Machine Translation for Sign Languages
, 2008
Cited by 5 (4 self)
A dissertation submitted in fulfilment of the requirements for the award of
Does o-substitution preserve recognizability?
- IN PROC. 11TH INT. CONF. IMPLEM. AND APPL. OF AUTOMATA
, 2006
Cited by 4 (3 self)
Substitution operations on tree series are at the basis of systems of equations (over tree series) and tree series transducers. Tree series transducers seem to be an interesting transformation device in syntactic pattern matching. In this contribution, it is shown that o-substitution preserves recognizable tree series provided that the target tree series is linear and the semiring is idempotent, commutative, and continuous. This result is applied to prove that the range of the o-t-ts transformation computed by a linear recognizable tree series transducer is pointwise recognizable.
Latest developments in machine translation technology
- In: MT Summit
, 1993
Cited by 3 (2 self)
which had been established in the late 1970s. These were the systems which had built upon experience gained in what may be called the 'quiet' decade of machine translation, the ten years after the publication of the ALPAC report in 1966 had brought to an end MT research in the United States and had profoundly affected its support elsewhere. Throughout the 1980s, it can be asserted without contradiction, the dominant framework of MT research was the essentially syntax-oriented 'transfer' approach exemplified by such systems as ARIANE at Grenoble University, METAL at Texas, SUSY at Saarbrücken, the Mu system at Kyoto University, and of course the multilingual Eurotra project of the European Communities. In addition, many of the commercial systems which appeared at this time were based on the same principles. For some time it appeared as if the 'interlingua' approach was not viable. Earlier efforts in the 1970s had been unsuccessful at Grenoble (the CETA system) and at the University of Texas. These were, however, essentially syntax-oriented approaches: while structural transfer was via interlingual ('universal') tree representations, lexical transfer was still via bilingual dictionary substitution. During the 1980s, new approaches to the interlingua model appeared. Some remained essentially lin-
Improved word alignments using the web as a corpus
- In Proceedings of RANLP’07
, 2007
Cited by 3 (3 self)
We propose a novel method for improving word alignments in a parallel sentence-aligned bilingual corpus based on the idea that if two words are translations of each other then so should be many words in their local contexts. The idea is formalised using the Web as a corpus, a glossary of known word translations (dynamically augmented from the Web using bootstrapping), the vector space model, linguistically motivated weighted minimum edit distance, competitive linking, and the IBM models. Evaluation results on a Bulgarian-Russian corpus show a sizable improvement both in word alignment and in translation quality.
Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Cited by 3 (2 self)
Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization. We discuss how our model can be adapted to many text generation tasks by augmenting its feature set, development data, and parameter estimation routine. We illustrate this adaptation by using our paraphrase model for the task of sentence compression and achieve results competitive with state-of-the-art compression systems.
Term-list translation using monolingual word co-occurrence vectors
- In Proceedings of the 17th International Conference on Computational Linguistics
, 1998
Cited by 2 (0 self)
A term-list is a list of content words that characterize a consistent text or a concept. This paper presents a new method for translating a term-list by using a corpus in the target language. The method first retrieves alternative translations for each input word from a bilingual dictionary. It then determines the most 'coherent' combination of alternative translations, where the coherence of a set of words is defined as the proximity among multi-dimensional vectors produced from the words on the basis of co-occurrence statistics. The method was applied to term-lists extracted from newspaper articles and achieved 81% translation accuracy for ambiguous words (i.e., words with multiple translations).
Segmenting Sentences into Linky Strings Using D-bigram Statistics
- In COLING-96: Proceedings of the 16th International Conference on Computational Linguistics
, 1996
Cited by 2 (0 self)
It is obvious that segmentation takes an important role in natural language processing (NLP), especially for the languages whose sentences are not easily separated into morphemes. In this study we propose a method of segmenting a sentence. The system described in this paper does not use any grammatical information or knowledge in processing. Instead, it uses statistical information drawn from non-tagged corpus of the target language. Most of the segmenting systems are to pick out conventional morphemes which is defined for human use. However, we still do not know whether those conventional morphemes are good units for computational processing.
Adaptive Sentence Alignment based on Length and Lexical Information
- In Proceedings of the 40th Annual Meeting of Association for Computational Linguistics, Comp. Volume
, 2002
Cited by 1 (1 self)
This prototype system demonstrates a novel sentence alignment method for bilingual texts based on adaptive learning and lexical information. The system aligns bilingual text at the paragraph level first and acquires length related statistics for the subsequent sentence alignment process. In addition to lengths, a probabilistic translation lexicon is utilized to further enhance the precision. The system is especially effective in the case of noisy translations produced in either translation direction that may involve different domains.

