Identification of transliterated foreign words in Hebrew script
In Proc. CICLing, volume LNCS 4919, 2008
Cited by 7 (0 self)
We present a loosely-supervised method for context-free identification of transliterated foreign names and borrowed words in Hebrew text. The method is purely statistical and does not require the use of any lexicons or linguistic analysis tool for the source languages (Hebrew, in our case). It also does not require any manually annotated data for training; we learn from noisy data acquired by over-generation. We report precision/recall results of 80/82 for a corpus of 4044 unique words, containing 368 foreign words.
Improved word alignments using the web as a corpus
In Proceedings of RANLP’07, 2007
Cited by 3 (3 self)
We propose a novel method for improving word alignments in a parallel sentence-aligned bilingual corpus based on the idea that if two words are translations of each other then so should be many words in their local contexts. The idea is formalised using the Web as a corpus, a glossary of known word translations (dynamically augmented from the Web using bootstrapping), the vector space model, linguistically motivated weighted minimum edit distance, competitive linking, and the IBM models. Evaluation results on a Bulgarian-Russian corpus show a sizable improvement both in word alignment and in translation quality.
Learning bilingual lexicons using the visual similarity of labeled web images
In Proceedings of the International Joint Conference on Artificial Intelligence, 2011
Cited by 3 (2 self)
Speakers of many different languages use the Internet. A common activity among these users is uploading images and associating these images with words (in their own language) as captions, filenames, or surrounding text. We use these explicit, monolingual, image-to-word connections to successfully learn implicit, bilingual, word-to-word translations. Bilingual pairs of words are proposed as translations if their corresponding images have similar visual features. We generate bilingual lexicons in 15 language pairs, focusing on words that have been automatically identified as physical objects. The use of visual similarity substantially improves performance over standard approaches based on string similarity: for generated lexicons with 1000 translations, including visual information leads to an absolute improvement in accuracy of 8-12% over string edit distance alone.
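The string edit distance baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain unit-cost Levenshtein distance normalised by word length; the abstract does not say which variant the authors used, and the function names and example words are hypothetical.

```python
def edit_distance(s, t):
    """Unit-cost Levenshtein distance between strings s and t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_by_string_similarity(word, candidates):
    """Order candidate translations by length-normalised edit distance.

    Assumption: normalising by the longer word is one common choice,
    not necessarily the one used in the paper.
    """
    def score(c):
        return edit_distance(word, c) / max(len(word), len(c), 1)
    return sorted(candidates, key=score)
```

For instance, rank_by_string_similarity("tomato", ["casa", "tomate"]) places "tomate" first, which shows why such a baseline favours cognates and loanwords and struggles with unrelated spellings.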
A PREDICTIVE MODEL OF PROSODY THROUGH GRAMMATICAL INTERFACE: A COMPUTATIONAL APPROACH
2007
Cited by 2 (1 self)
Speech prosody is manifest in the acoustic signal through the modulation of pitch, loudness, duration, and source characteristics (voice quality), which combine to encode the prosodic structure of an utterance. Prosodic structure defines the location of prominent words and syllables, and the grouping of words into phonological phrases. Prosodic structure, in turn, relates the phonological form of an utterance to its morphological, syntactic, semantic, and pragmatic context. The listener’s task in comprehending speech includes decoding prosodic structure to aid in identifying the morphological, syntactic, semantic, and pragmatic contexts that comprise the meaning of the utterance. The research reported in this dissertation focuses on acoustic and perceptual evidence for prosody in spoken language, and the relationship between prosodic structure and higher levels of linguistic organization. The study adopts a computational approach that employs natural language processing tools, machine learning algorithms, and speech and signal processing techniques to investigate prosody in speech corpus data. In this study, I show that prosodic features of an utterance can be reliably predicted from a set of features that en-
Cognate or false friend? Ask the Web
In Proceedings of the RANLP’2007 workshop: Acquisition and management of multilingual lexicons, 2007
Cited by 2 (2 self)
We propose a novel unsupervised semantic method for distinguishing cognates from false friends. The basic intuition is that if two words are cognates, then most of the words in their respective local contexts should be translations of each other. The idea is formalised using the Web as a corpus, a glossary of known word translations used as cross-linguistic “bridges”, and the vector space model. Unlike traditional orthographic similarity measures, our method can easily handle words with identical spelling. The evaluation on 200 Bulgarian-Russian word pairs shows this is a very promising approach.
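The intuition in this abstract (compare the local contexts of two words through a glossary of known translations) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the authors' implementation: contexts are given as token lists rather than collected from the Web, the glossary is a one-to-one dictionary, and all function names are hypothetical.

```python
import math
from collections import Counter


def context_vector(contexts, vocabulary):
    """Count how often each vocabulary word occurs in a word's local contexts."""
    return Counter(w for ctx in contexts for w in ctx if w in vocabulary)


def bridged_cosine(vec_a, vec_b, glossary):
    """Cosine similarity between a language-A and a language-B context vector.

    glossary maps A-words to their B-translations (the cross-linguistic
    "bridges"); A-words without a glossary entry are ignored.
    """
    # Project the A-side vector into B-space through the glossary.
    projected = Counter()
    for word, count in vec_a.items():
        if word in glossary:
            projected[glossary[word]] += count
    dot = sum(count * vec_b.get(word, 0) for word, count in projected.items())
    norm_a = math.sqrt(sum(c * c for c in projected.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this sketch, a pair whose bridged contexts overlap heavily scores close to 1 and would be treated as likely cognates, while an identically spelled pair with unrelated contexts scores close to 0, which is how a context-based measure can separate false friends that orthographic measures cannot.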
Discriminative Substring Decoding for Transliteration
Cited by 2 (2 self)
We present a discriminative substring decoder for transliteration. This decoder extends recent approaches for discriminative character transduction by allowing for a list of known target-language words, an important resource for transliteration. Our approach improves upon Sherif and Kondrak’s (2007b) state-of-the-art decoder, creating a 28.5% relative improvement in transliteration accuracy on a Japanese katakana-to-English task. We also conduct a controlled comparison of two feature paradigms for discriminative training: indicators and hybrid generative features. Surprisingly, the generative hybrid outperforms its purely discriminative counterpart, despite losing access to rich source-context features. Finally, we show that machine transliterations have a positive impact on machine translation quality, improving human judgments by 0.5 on a 4-point scale.
Automatic Transliteration of Proper Nouns from Arabic to English. The Challenge of Arabic For NLP/MT
2006
Multilingual Cognate Identification using Integer Linear Programming
Cited by 1 (0 self)
The identification of cognates in natural languages is a crucial part of automatic translation lexicon construction and other multilingual lexical tasks. We present new methods for multilingual cognate identification using the global inference framework of Integer Linear Programming. While previous approaches to cognate identification have focused on pairs of natural languages, we provide a methodology that directly forms sets of cognates across groups of languages. We show improvements over simple clustering techniques that do not inherently consider the transitivity of cognate relations. Furthermore, we show that formulations that jointly link cognates across groups of natural languages achieve higher performance than traditional pairwise approaches. We also describe applications of our technique to other important problems in multilingual natural language processing.
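The transitivity issue raised in this abstract can be made concrete with a small sketch. It does not reproduce the paper's Integer Linear Programming formulation; it only shows, under the assumption that pairwise cognate decisions arrive as a set of unordered pairs, how independent pairwise decisions can be inconsistent. The function name and the language-tagged example words are hypothetical.

```python
from itertools import permutations


def transitivity_violations(words, linked):
    """Find triples (a, b, c) where a~b and b~c are linked but a~c is not.

    linked is a set of frozenset pairs that a pairwise classifier judged
    to be cognates. Each violating triple is reported once, with a < c.
    """
    def is_linked(x, y):
        return frozenset((x, y)) in linked

    violations = []
    for a, b, c in permutations(words, 3):
        if a < c and is_linked(a, b) and is_linked(b, c) and not is_linked(a, c):
            violations.append((a, b, c))
    return violations


words = ["de:Nacht", "en:night", "nl:nacht"]
pairwise = {frozenset(("de:Nacht", "en:night")),
            frozenset(("en:night", "nl:nacht"))}
# The German and Dutch words are each linked to the English one but not to
# each other, so the pairwise decisions do not form a consistent cognate set.
inconsistent = transitivity_violations(words, pairwise)
```

A global formulation that forms cognate sets directly, as the abstract describes, would rule out such triples by construction instead of detecting them after the fact.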
Mining Name Translations from Comparable Corpora by Creating Bilingual Information Networks
Cited by 1 (1 self)
This paper describes a new task to extract and align information networks from comparable corpora. As a case study we demonstrate the effectiveness of this task on automatically mining name translation pairs. Starting from a small set of seeds, we design a novel approach to acquire name translation pairs in a bootstrapping framework. The experimental results show this approach can generate highly accurate name translation pairs for persons, geopolitical and organization entities.
Transliteration Generation and Mining with Limited Training Resources
Cited by 1 (0 self)
We present DIRECTL+: an online discriminative sequence prediction model based on many-to-many alignments, which is further augmented by the incorporation of joint n-gram features. Experimental results show improvement over the results achieved by DIRECTL in 2009. We also explore a number of diverse resource-free and language-independent approaches to transliteration mining, which range from simple to sophisticated.

