Improved word alignments using the web as a corpus (2007)
| Venue: | In Proceedings of RANLP’07 |
| Citations: | 3 - 3 self |
BibTeX
@INPROCEEDINGS{Nakov07improvedword,
author = {Preslav Nakov and Svetlin Nakov},
title = {Improved word alignments using the web as a corpus},
booktitle = {Proceedings of RANLP’07},
year = {2007}
}
Abstract
We propose a novel method for improving word alignments in a parallel sentence-aligned bilingual corpus based on the idea that if two words are translations of each other then so should be many words in their local contexts. The idea is formalised using the Web as a corpus, a glossary of known word translations (dynamically augmented from the Web using bootstrapping), the vector space model, linguistically motivated weighted minimum edit distance, competitive linking, and the IBM models. Evaluation results on a Bulgarian-Russian corpus show a sizable improvement both in word alignment and in translation quality.
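
Two of the ingredients named in the abstract, glossary-based context similarity in a vector space and competitive linking, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the raw-count weighting, and the toy tokens are all assumptions made for illustration, and the Web queries, bootstrapping, weighted edit distance, and IBM models are left out entirely.

```python
import math
from collections import Counter


def context_vector(contexts, vocabulary):
    """Count how often each vocabulary word occurs in a word's local contexts."""
    return Counter(tok for ctx in contexts for tok in ctx if tok in vocabulary)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cross_lingual_similarity(src_contexts, tgt_contexts, glossary):
    """Compare contexts across languages through a glossary (assumed: source word -> target word)."""
    src_vec = context_vector(src_contexts, set(glossary))
    # Project the source-side counts into the target vocabulary.
    projected = Counter()
    for word, count in src_vec.items():
        projected[glossary[word]] += count
    tgt_vec = context_vector(tgt_contexts, set(glossary.values()))
    return cosine(projected, tgt_vec)


def competitive_linking(scores):
    """Greedy one-to-one linking: repeatedly take the best-scoring free pair."""
    links, used_src, used_tgt = [], set(), set()
    for (i, j), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score <= 0:
            break
        if i in used_src or j in used_tgt:
            continue
        links.append((i, j))
        used_src.add(i)
        used_tgt.add(j)
    return links


# Toy data (hypothetical tokens, not taken from the paper's corpus).
glossary = {"a": "x", "b": "y"}
sim = cross_lingual_similarity([["a", "b", "z"], ["a"]], [["x", "y"], ["x", "q"]], glossary)
links = competitive_linking({(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.3})
print(sim, links)
```

In this sketch, a candidate word pair scores highly when glossary translations of the source word's neighbours also appear around the target word, and competitive linking then keeps at most one link per word on each side.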







