Toward Statistical Machine Translation without Parallel Corpora
Cited by 1 (1 self)
We estimate the parameters of a phrase-based statistical machine translation system from monolingual corpora instead of a bilingual parallel corpus. We extend existing research on bilingual lexicon induction to estimate both lexical and phrasal translation probabilities for MT-scale phrase tables. We propose a novel algorithm to estimate reordering probabilities from monolingual data. We report translation results for an end-to-end translation system using these monolingual features alone. Our method requires only monolingual corpora in the source and target languages, a small bilingual dictionary, and a small bitext for tuning feature weights. In this paper, we examine an idealization in which a phrase table is given. We examine the degradation in translation performance when bilingually estimated translation probabilities are removed, and show that 80%+ of the loss can be recovered with monolingually estimated features alone. We further show that our monolingual features add 1.5 BLEU points when combined with standard bilingually estimated phrase table features.
Efficient Online Locality Sensitive Hashing via Reservoir Counting
We describe a novel mechanism called Reservoir Counting for application in online Locality Sensitive Hashing. In the streaming setting, this technique yields significant savings, making it possible to maintain a larger number of signatures, or to reach a higher level of approximation accuracy, at a similar memory footprint.
Using Visual Information to Predict Lexical Preference
Most NLP systems make predictions based solely on linguistic (textual or spoken) input. We show how to use visual information to make better linguistic predictions. We focus on selectional preference; specifically, determining the plausible noun arguments for particular verb predicates. For each argument noun, we extract visual features from corresponding images on the web. For each verb predicate, we train a classifier to select the visual features that are indicative of its preferred arguments. We show that for certain verbs, using visual information can significantly improve performance over a baseline. For the successful cases, visual information is useful even in the presence of co-occurrence information derived from web-scale text. We assess a variety of training configurations, which vary over classes of visual features, methods of image acquisition, and numbers of images.

