Automatic Identification of Cognates, False Friends, and Partial Cognates, 2006
Abstract
Cognates are words in different languages that have similar spelling and meaning. They can help second-language learners with vocabulary expansion and reading comprehension tasks. Special attention needs to be paid to pairs of words that appear similar but are in fact false friends: they have different meanings in all contexts. Partial cognates are pairs of words in two languages that have the same meaning in some, but not all, contexts. Detecting the actual meaning of a partial cognate in context can be useful for Machine Translation and Computer-Assisted Language Learning tools. Our research on cognates and false friends between a pair of languages (French and English in our case) consists of automatically classifying a pair of words from the two languages as cognates or false friends. We use Machine Learning techniques with several measures of orthographic similarity as features for classification. We study the impact of selecting different features, averaging them, and combining them through Machine Learning techniques. The methods can be applied to other language pairs, provided that a small set of annotated word pairs is available as training data. In addition to the work done on cognate and false-friend identification, we propose a ...
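The feature-based classification described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the abstract does not list the orthographic measures used, so three common ones are assumed here (edit-distance similarity, the longest common subsequence ratio, and bigram Dice overlap). Averaging them and learning a single decision threshold from a few labelled pairs stands in for the full Machine Learning classifiers; the helper names and the toy French-English pairs are hypothetical.

```python
# Minimal sketch (assumption): three orthographic similarity measures used as
# features, averaged into one score and thresholded. Not the paper's exact
# feature set or classifier; all names and data here are hypothetical.


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def bigrams(word: str) -> set:
    return {word[i:i + 2] for i in range(len(word) - 1)}


def features(w1: str, w2: str) -> list:
    """[edit-distance similarity, LCSR, bigram Dice], each in [0, 1].
    Assumes both words are non-empty."""
    longest = max(len(w1), len(w2))
    ned = 1 - levenshtein(w1, w2) / longest
    lcsr = lcs_length(w1, w2) / longest
    b1, b2 = bigrams(w1), bigrams(w2)
    dice = 2 * len(b1 & b2) / (len(b1) + len(b2)) if b1 or b2 else 0.0
    return [ned, lcsr, dice]


def average_score(w1: str, w2: str) -> float:
    f = features(w1, w2)
    return sum(f) / len(f)


def learn_threshold(labelled):
    """Pick the threshold (among observed scores) that maximizes training accuracy.
    labelled: list of (word1, word2, is_cognate) triples."""
    scored = [(average_score(a, b), y) for a, b, y in labelled]
    best_t, best_acc = 0.5, -1
    for t, _ in scored:
        acc = sum((s >= t) == y for s, y in scored)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Hypothetical toy training data: (French, English, is_cognate).
train = [("nation", "nation", True), ("université", "university", True),
         ("librairie", "library", False), ("blessé", "blessed", False)]
threshold = learn_threshold(train)
print(features("librairie", "library"), threshold)
```

With only a handful of training pairs the learned threshold is fragile; the point of the sketch is the pipeline (per-pair feature vector, combination, threshold fitted on annotated data), which is what lets the approach move to another language pair.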
Combining Evidence in Cognate Identification, 2004
Abstract
Cognates are words of the same origin that belong to distinct languages. The problem of automatic identification of cognates arises in language reconstruction and bitext-related tasks. The evidence of cognation may come from various information sources, such as phonetic similarity, semantic similarity, and recurrent sound correspondences. I discuss ways of defining the measures of the various types of similarity and propose a method of combining them into an integrated cognate identification program. The new method requires no manual parameter tuning and performs well when tested on Indo-European and Algonquian lexical data.
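The idea of combining several kinds of evidence can be illustrated with a small sketch. It is not the method of the paper, whose combination scheme the abstract does not describe. Here phonetic similarity is approximated by normalized edit distance over phoneme strings, semantic similarity by word overlap between English glosses, and the two scores are combined by a logistic-regression model whose weights are learned from labelled pairs instead of being tuned by hand. Recurrent sound correspondences are left out, and the word forms are invented.

```python
import math

# Sketch (assumption): phonetic + semantic evidence combined by logistic
# regression learned from labelled pairs. Recurrent sound correspondences
# are omitted; all forms and glosses below are made up.


def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def phonetic_sim(p1, p2):
    """1 - normalized edit distance over phoneme strings (a crude stand-in)."""
    return 1 - edit_distance(p1, p2) / max(len(p1), len(p2))


def semantic_sim(gloss1, gloss2):
    """Jaccard overlap of the words in two English glosses."""
    s1, s2 = set(gloss1.lower().split()), set(gloss2.lower().split())
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0


def evidence(pair):
    form1, gloss1, form2, gloss2 = pair
    return [phonetic_sim(form1, form2), semantic_sim(gloss1, gloss2)]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on log-loss; returns weights with the bias last."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def cognate_prob(w, x):
    return sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x)))


# Made-up pairs: (form1, gloss1, form2, gloss2); label 1 = cognate.
pairs = [("kata", "stone", "kada", "stone"), ("miru", "see look", "mira", "look"),
         ("tonu", "fire", "tono", "fire flame"), ("kata", "stone", "pilo", "bird"),
         ("miru", "see look", "sabe", "river"), ("tonu", "fire", "rika", "hand")]
labels = [1, 1, 1, 0, 0, 0]
weights = fit_logistic([evidence(p) for p in pairs], labels)
print(cognate_prob(weights, evidence(("sela", "salt", "sila", "salt"))))
```

Learning the weights from data is one simple way to avoid hand-set parameters; any other evidence score (for example a sound-correspondence score) would enter as one more feature.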
Revealing Phonological Similarities between German and Dutch, 2005
Abstract
In this paper, we present an approach to automatically revealing phonological classes within historically related languages. A newly created bilingual German-Dutch pronunciation dictionary is used to learn phonological similarities between the onsets, nuclei, and codas of the two languages via EM-based clustering. Our evaluation is twofold. First, we apply the models to predict the phonemes of a Dutch cognate from a German word; the results show that the nucleus and the coda are harder to predict than the onset. Second, we evaluate our approach qualitatively and find meaningful classes that are attributable to historical sound changes.
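EM-based clustering of this kind can be illustrated with a two-dimensional latent-class model. The sketch assumes that each observation is a pair of aligned German and Dutch onsets and that P(g, d) = Σ_c P(c) P(g|c) P(d|c); the paper's exact model, its number of classes, and its treatment of nuclei and codas are not given in the abstract. The onset correspondences (Stein/steen, Spiel/spel, Zeit/tijd, Pfeffer/peper) are real, but the counts are invented for the example, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

# Sketch (assumption): latent-class model P(g, d) = sum_c P(c) P(g|c) P(d|c)
# over aligned German/Dutch onset pairs, fitted with EM. Counts are made up.


def _normalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def em_cluster(pairs, n_classes=2, n_iter=50, seed=0):
    """Fit the latent-class model by EM; returns (p_c, p_g, p_d, log-likelihoods)."""
    counts = Counter(pairs)
    g_vals = sorted({g for g, _ in counts})
    d_vals = sorted({d for _, d in counts})
    n_total = sum(counts.values())
    rng = random.Random(seed)
    # Seeded random initialization breaks the symmetry between classes.
    p_c = [1.0 / n_classes] * n_classes
    p_g = [_normalize({g: rng.random() + 0.1 for g in g_vals}) for _ in range(n_classes)]
    p_d = [_normalize({d: rng.random() + 0.1 for d in d_vals}) for _ in range(n_classes)]
    log_liks = []
    for _ in range(n_iter):
        # E-step: expected class counts, weighting each pair type by its frequency.
        exp_c = [0.0] * n_classes
        exp_g = [dict.fromkeys(g_vals, 0.0) for _ in range(n_classes)]
        exp_d = [dict.fromkeys(d_vals, 0.0) for _ in range(n_classes)]
        ll = 0.0
        for (g, d), n in counts.items():
            joint = [p_c[c] * p_g[c][g] * p_d[c][d] for c in range(n_classes)]
            total = sum(joint)
            ll += n * math.log(total)
            for c in range(n_classes):
                post = n * joint[c] / total
                exp_c[c] += post
                exp_g[c][g] += post
                exp_d[c][d] += post
        log_liks.append(ll)
        # M-step: re-estimate all distributions from the expected counts.
        p_c = [e / n_total for e in exp_c]
        p_g = [_normalize(e) for e in exp_g]
        p_d = [_normalize(e) for e in exp_d]
    return p_c, p_g, p_d, log_liks


def predict_dutch(g, p_c, p_g, p_d):
    """Most probable Dutch unit given German unit g: argmax_d sum_c P(c) P(g|c) P(d|c)."""
    scores = {d: sum(p_c[c] * p_g[c][g] * p_d[c][d] for c in range(len(p_c)))
              for d in p_d[0]}
    return max(scores, key=scores.get)


pairs = ([("ʃt", "st")] * 4 + [("ʃp", "sp")] * 3 +
         [("ts", "t")] * 4 + [("pf", "p")] * 2)
p_c, p_g, p_d, log_liks = em_cluster(pairs, n_classes=4)
print(predict_dutch("ʃt", p_c, p_g, p_d))
```

EM only guarantees a non-decreasing likelihood and may stop at a local optimum, so in practice one would run several seeds and keep the best fit.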
A Knowledge-Rich Approach to Measuring the Similarity between Bulgarian and Russian Words
Abstract
nakov@comp.nus.edu.sg
We propose a novel knowledge-rich approach to measuring the similarity between a pair of words. The algorithm is tailored to Bulgarian and Russian and takes into account the orthographic and phonetic correspondences between these two Slavic languages: it combines lemmatization, hand-crafted transformation rules, and a weighted Levenshtein distance. The experimental results show an 11-point interpolated average precision of 90.58%, a sizeable improvement over two classic competing approaches.
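Two technical pieces of this abstract can be made concrete: a weighted Levenshtein distance, where regular letter correspondences between the languages are given cheaper substitution costs, and the 11-point interpolated average precision used for evaluation. The sketch assumes a hypothetical cost table with a single entry (Bulgarian ъ vs Russian о, as in сън / сон "sleep") and omits lemmatization and the hand-crafted transformation rules; the paper's actual costs are not given in the abstract. Interpolated precision at recall level r is taken as the highest precision at any recall ≥ r, averaged over r = 0.0, 0.1, ..., 1.0.

```python
# Sketch (assumption): weighted Levenshtein distance with a hypothetical
# substitution-cost table, plus 11-point interpolated average precision.

# Hypothetical cost: Bulgarian "ъ" vs Russian "о" (e.g. сън / сон) made cheaper.
SUB_COSTS = {("ъ", "о"): 0.5}


def weighted_levenshtein(a, b, sub_costs=None, indel=1.0):
    """Edit distance where substituting x by y costs sub_costs.get((x, y), 1.0)."""
    sub_costs = sub_costs or {}
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * indel]
        for j, cb in enumerate(b, 1):
            sub = 0.0 if ca == cb else sub_costs.get((ca, cb), 1.0)
            cur.append(min(prev[j] + indel,     # delete ca
                           cur[j - 1] + indel,  # insert cb
                           prev[j - 1] + sub))  # substitute or match
        prev = cur
    return prev[-1]


def eleven_point_iap(ranked_relevance):
    """11-point interpolated average precision of a ranked list.
    ranked_relevance: booleans (True = correct pair), best-scoring first;
    assumes at least one True."""
    total_relevant = sum(ranked_relevance)
    points, hits = [], 0
    for k, relevant in enumerate(ranked_relevance, 1):
        hits += relevant
        points.append((hits / total_relevant, hits / k))  # (recall, precision)
    levels = [i / 10 for i in range(11)]
    # Interpolated precision at level r = best precision at any recall >= r.
    interpolated = [max((p for r, p in points if r >= lvl), default=0.0)
                    for lvl in levels]
    return sum(interpolated) / len(levels)


print(weighted_levenshtein("сън", "сон", SUB_COSTS))  # 0.5
print(eleven_point_iap([True, False, True, False]))
```

Without the cost table the same pair is at distance 1.0; lowering the cost of known correspondences is what lets such pairs rank above unrelated words that happen to differ in only one letter.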

