Bilingual Lexicon Generation Using Non-Aligned Signatures
SVM HeaderParse 0.2
AUTHOR NAME
Daphna Shezaf
AUTHOR AFFIL
Institute of Computer Science; Hebrew University of Jerusalem
ABSTRACT
Bilingual lexicons are fundamental resources. Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs. Lexicons can be generated using non-parallel corpora or a pivot language, but such lexicons are noisy. We present an algorithm for generating a high-quality lexicon from a noisy one, which requires only an independent corpus for each language. Our algorithm introduces non-aligned signatures (NAS), a cross-lingual word context similarity score that avoids the over-constrained and inefficient nature of alignment-based methods. We use NAS to eliminate incorrect translations from the generated lexicon. We evaluate our method by improving the quality of noisy Spanish-Hebrew lexicons generated from two pivot English lexicons. Our algorithm substantially outperforms other lexicon generation methods.
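The pipeline outlined in the abstract can be illustrated with a minimal sketch. It composes two lexicons through a pivot language to obtain noisy candidates, then keeps, for each source word, the candidate whose context words best match without any one-to-one alignment. The abstract does not define NAS, so everything here is an assumption made for illustration: the function names, the representation of a signature as a set of context words, the overlap count used as a score, and the toy vocabulary (the `he_` tokens are placeholders, not real Hebrew entries).

```python
from collections import defaultdict


def pivot_lexicon(src_to_pivot, pivot_to_tgt):
    """Compose two lexicons through a pivot language.

    Every target translation of every pivot translation becomes a
    candidate, so the result is noisy by construction.
    """
    noisy = defaultdict(set)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            noisy[src].update(pivot_to_tgt.get(pivot, ()))
    return dict(noisy)


def overlap_score(src_sig, tgt_sig, lexicon):
    """Assumed stand-in for a non-aligned context similarity.

    Counts source context words that have at least one candidate
    translation in the target context set. No word is forced into a
    one-to-one alignment with another.
    """
    return sum(1 for word in src_sig if lexicon.get(word, set()) & tgt_sig)


def filter_lexicon(noisy, src_sigs, tgt_sigs):
    """Keep the highest-scoring candidate for each source word."""
    best = {}
    for src, candidates in noisy.items():
        scored = [
            (overlap_score(src_sigs.get(src, set()),
                           tgt_sigs.get(tgt, set()), noisy), tgt)
            for tgt in sorted(candidates)  # sorted for deterministic ties
        ]
        if scored:
            best[src] = max(scored, key=lambda pair: pair[0])[1]
    return best


# Toy data (hypothetical): Spanish -> English -> placeholder Hebrew tokens.
es_en = {"banco": {"bank", "bench"}, "dinero": {"money"},
         "cuenta": {"account", "bill"}}
en_he = {"bank": {"he_bank", "he_riverbank"}, "bench": {"he_bench"},
         "money": {"he_money"}, "account": {"he_account"},
         "bill": {"he_bill"}}

# Context signatures, assumed to come from an independent corpus per language.
es_sigs = {"banco": {"dinero", "cuenta"}}
he_sigs = {"he_bank": {"he_money", "he_account"},
           "he_riverbank": {"he_river"},
           "he_bench": {"he_park"}}

noisy = pivot_lexicon(es_en, en_he)
clean = filter_lexicon(noisy, es_sigs, he_sigs)
```

In this toy run, `noisy["banco"]` holds three candidates because the pivot word "bank" is ambiguous, and the filter keeps `he_bank` because both of its context words are reachable from the context of "banco" through the noisy lexicon itself. Note that the noisy lexicon serves both as the thing being cleaned and as the bridge between the two monolingual context sets.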