Alignment of bilingual named entities in parallel corpora using statistical model (2005)
| Venue: | Lecture Notes in Artificial Intelligence |
| Citations: | 1 - 0 self |
BibTeX
@INPROCEEDINGS{Lee052004b.alignment,
author = {Chun-jen Lee},
title = {Alignment of bilingual named entities in parallel corpora using statistical model},
booktitle = {Lecture Notes in Artificial Intelligence},
year = {2005},
pages = {144--153}
}
Abstract
Named entity (NE) extraction is one of the fundamental tasks in natural language processing (NLP). Although many studies have focused on identifying NEs within monolingual documents, aligning NEs in bilingual documents has not been investigated extensively due to the complexity of the task. In this article, we introduce a new approach to aligning bilingual NEs in parallel corpora by incorporating statistical models with multiple knowledge sources. In our approach, we model the process of translating an English NE phrase into a Chinese equivalent using lexical translation/transliteration probabilities for word translation and alignment probabilities for word reordering. The method involves automatically learning phrase alignment and acquiring word translations from a bilingual phrase dictionary and parallel corpora, and automatically discovering transliteration transformations from a training set of name-transliteration pairs. The method also involves language-specific knowledge functions, including abbreviation handling, Chinese person name recognition, and acronym expansion. At run time, the proposed models are applied to each source NE in a pair of bilingual sentences to generate and evaluate the target NE candidates, and the source and target NEs are aligned based on the computed probabilities. Experi-







