Results 11 - 20
of
737
A Phrase-Based, Joint Probability Model for Statistical Machine Translation
- In Proceedings of EMNLP
, 2002
"... We present a joint probability model for statistical machine translation, which automatically learns word and phrase equivalents from bilingual corpora. Translations produced with parameters estimated using the joint model are more accurate than translations produced using IBM Model 4. ..."
Abstract
-
Cited by 135 (2 self)
- Add to MetaCart
We present a joint probability model for statistical machine translation, which automatically learns word and phrase equivalents from bilingual corpora. Translations produced with parameters estimated using the joint model are more accurate than translations produced using IBM Model 4.
Summarization beyond sentence extraction: A probabilistic approach to sentence compression
, 2002
"... When humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. Given that large collections ..."
Abstract
-
Cited by 104 (11 self)
- Add to MetaCart
When humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. Given that large collections of text/abstract pairs are available online, it is now possible to envision algorithms that are trained to mimic this process. In this paper, we focus on sentence compression, a simpler version of this larger challenge. We aim to achieve two goals simultaneously: our compressions should be grammatical, and they should retain the most important pieces of information. These two goals can conflict. We devise both a noisy-channel and a decision-tree approach to the problem, and we evaluate results against manual compressions and a simple baseline. 2002 Elsevier Science B.V. All rights reserved.
A Smorgasbord of Features for Statistical Machine Translation
, 2004
"... We describe a methodology for rapid experimentation in statistical machine translation which we use to add a large number of features to a baseline system exploiting features from a wide range of levels of syntactic representation. Feature ..."
Abstract
-
Cited by 93 (3 self)
- Add to MetaCart
We describe a methodology for rapid experimentation in statistical machine translation which we use to add a large number of features to a baseline system exploiting features from a wide range of levels of syntactic representation. Feature
Decoding Complexity in Word-Replacement Translation Models
- Computational Linguistics
, 1999
"... This paper looks at decoding complexity. ..."
IBM's Statistical Question Answering System
- In Proceedings of the Tenth Text REtrieval Conference (TREC
, 2000
"... We describe the IBM Statistical Question Answering for TREC-9 system in detail and look at several examples and errors. The system is an application of maximum entropy classification for question/answer type prediction and named entity marking. We describe our system for information retrieval whi ..."
Abstract
-
Cited by 81 (0 self)
- Add to MetaCart
We describe the IBM Statistical Question Answering for TREC-9 system in detail and look at several examples and errors. The system is an application of maximum entropy classification for question/answer type prediction and named entity marking. We describe our system for information retrieval which in the first step did document retrieval from a local encyclopedia, and in the second step performed an expansion of the query words and finally did passage retrieval from the TREC collection. We will also discuss the answer selection algorithm which determines the best sentence given both the question and the occurrence of a phrase belonging to the answer class desired by the question. Results at the 250 byte and 50 byte levels for the overall system as well as results on each subcomponent are presented.
Termight: Identifying and Translating Technical Terminology
, 1994
"... We propose a semi-automatic tool, termight, that helps professional translators and terminologists identify technical terms and their translations. The tool makes use of part-of-speech tagging and word-alignment programs to extract candidate terms and their translations. Although the extraction prog ..."
Abstract
-
Cited by 80 (1 self)
- Add to MetaCart
We propose a semi-automatic tool, termight, that helps professional translators and terminologists identify technical terms and their translations. The tool makes use of part-of-speech tagging and word-alignment programs to extract candidate terms and their translations. Although the extraction programs are far from perfect, it isn't too hard for the user to filter out the wheat from the chaff. The extraction algorithms emphasize completeness. Alter-native proposals are likely to miss important but infrequent terms/translations. To reduce the burden on the user during the filtering phase, candidates are presented in a convenient order, along with some useful concordance evidence, in an interface that is designed to minimize keystrokes. Termight is currently being used by the trans-
Large language models in machine translation
- In EMNLP
, 2007
"... This paper reports on the benefits of largescale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 trillion tokens, resulting in language models having up to 300 billion n-grams. It is capable of providing smoothed probabi ..."
Abstract
-
Cited by 78 (2 self)
- Add to MetaCart
This paper reports on the benefits of largescale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 trillion tokens, resulting in language models having up to 300 billion n-grams. It is capable of providing smoothed probabilities for fast, single-pass decoding. We introduce a new smoothing method, dubbed Stupid Backoff, that is inexpensive to train on large data sets and approaches the quality of Kneser-Ney Smoothing as the amount of training data increases. 1
Statistics-based summarization – Step one: Sentence compression
- In Proceedings of AAAI
, 2000
"... When humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. pairs are available online, i ..."
Abstract
-
Cited by 77 (2 self)
- Add to MetaCart
When humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. pairs are available online, it is now possible to envision algorithms that are trained to mimic this process. In this paper, we focus on sentence compression, a simpler version of this larger challenge. We aim to achieve two goals simultaneously:our compressions should be grammatical, and they should retain the most important pieces of information. These two goals can conflict. We devise both noisy-channel and decision-tree approaches to the problem, and we evaluate results against manual compressions and a simple baseline.
A Statistical Model for General Contextual Object Recognition
- IN ECCV
, 2004
"... We consider object recognition as the process of attaching meaningful labels to specific regions of an image, and propose a model that learns spatial relationships between objects. Given a set of images and their associated text (e.g. keywords, captions, descriptions), the objective is to segmen ..."
Abstract
-
Cited by 73 (7 self)
- Add to MetaCart
We consider object recognition as the process of attaching meaningful labels to specific regions of an image, and propose a model that learns spatial relationships between objects. Given a set of images and their associated text (e.g. keywords, captions, descriptions), the objective is to segment an image, in either a crude or sophisticated fashion, then to find the proper associations between words and regions. Previous models are limited by the scope of the representation. In particular, they fail to exploit spatial context in the images and words. We develop a more expressive model that takes this into account. We formulate a spatially consistent probabilistic mapping between continuous image feature vectors and the supplied word tokens. By learning both word-to-region associations and object relations, the proposed model augments scene segmentations due to smoothing implicit in spatial consistency. Context introduces cycles to the undirected graph, so we cannot rely on a straightforward implementation of the EM algorithm for estimating the model parameters and densities of the unknown alignment variables. Instead, we develop an approximate EM algorithm that uses loopy belief propagation in the inference step and iterative scaling on the pseudo-likelihood approximation in the parameter update step. The experiments indicate that our approximate inference and learning algorithm converges to good local solutions. Experiments on a diverse array of images show that spatial context considerably improves the accuracy of object recognition. Most

