Results 1 - 10
of
18
Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
"... ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alo ..."
Abstract
-
Cited by 26 (6 self)
- Add to MetaCart
ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs. In order to retain broad coverage of non-constituent phrases, complex syntactic labels are introduced. A manual evaluation indicates a 19% absolute improvement in paraphrase quality over the baseline method. 1
Combining Multiple Resources to Improve SMT-based Paraphrasing Model
- IN PROCEEDINGS OF ACL-08:HLT
, 2008
"... This paper proposes a novel method that exploits multiple resources to improve statistical machine translation (SMT) based paraphrasing. In detail, a phrasal paraphrase table and a feature function are derived from each resource, which are then combined in a log-linear SMT model for sentence-level p ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
This paper proposes a novel method that exploits multiple resources to improve statistical machine translation (SMT) based paraphrasing. In detail, a phrasal paraphrase table and a feature function are derived from each resource, which are then combined in a log-linear SMT model for sentence-level paraphrase generation. Experimental results show that the SMT-based paraphrasing model can be enhanced using multiple resources. The phrase-level and sentence-level precision of the generated paraphrases are above 60 % and 55%, respectively. In addition, the contribution of each resource is evaluated, which indicates that all the exploited resources are useful for generating paraphrases of high quality.
Hitting the right paraphrases in good time
- In Proc. NAACL’2010
, 2010
"... We present a random-walk-based approach to learning paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which a node corresponds to a phrase, and an edge exists between two nodes if their corresponding phrases are aligned in a phrase table. We sample random walks t ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We present a random-walk-based approach to learning paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which a node corresponds to a phrase, and an edge exists between two nodes if their corresponding phrases are aligned in a phrase table. We sample random walks to compute the average number of steps it takes to reach a ranking of paraphrases with better ones being “closer ” to a phrase of interest. This approach allows “feature ” nodes that represent domain knowledge to be built into the graph, and incorporates truncation techniques to prevent the graph from growing too large for efficiency. Current approaches, by contrast, implicitly presuppose the graph to be bipartite, are limited to finding paraphrases that are of length two away from a phrase, and do not generally permit easy incorporation of domain knowledge. Manual evaluation of generated output shows that our approach outperforms the state-of-the-art system of Callison-Burch (2008). 1
Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases
"... Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and current SMT systems are limited by the quantity of parallel training texts. Augmenting the training data with paraphrases generated by pivoting through other languages alleviates this problem, especial ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and current SMT systems are limited by the quantity of parallel training texts. Augmenting the training data with paraphrases generated by pivoting through other languages alleviates this problem, especially for the so-called “low density ” languages. But pivoting requires additional parallel texts. We address this problem by deriving paraphrases monolingually, using distributional semantic similarity measures, thus providing access to larger training resources, such as comparable and unrelated monolingual corpora. We present what is to our knowledge the first successful integration of a collocational approach to untranslated words with an end-to-end, state of the art SMT system demonstrating significant translation improvements in a low-resource setting.
Error driven paraphrase annotation using mechanical turk
, 2010
"... The source text provided to a machine translation system is typically only one of many ways the input sentence could have been expressed, and alternative forms of expression can often produce a better translation. We introduce here error driven paraphrasing of source sentences: instead of paraphrasi ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
The source text provided to a machine translation system is typically only one of many ways the input sentence could have been expressed, and alternative forms of expression can often produce a better translation. We introduce here error driven paraphrasing of source sentences: instead of paraphrasing a source sentence exhaustively, we obtain paraphrases for only the parts that are predicted to be problematic for the translation system. We report on an Amazon Mechanical Turk study that explores this idea, and establishes via an oracle evaluation that it holds the potential to substantially improve translation quality. 1
A Survey of Paraphrasing and Textual Entailment Methods
, 2010
"... Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads ( ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
Are Multiple Reference Translations Necessary? Investigating the Value of Paraphrased Reference Translations in Parameter Optimization
"... Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using heldout test sentences and their ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using heldout test sentences and their corresponding reference translations. However, obtaining reference translations is expensive. In our earlier work (Madnani et al., 2007), we introduced a new full-sentence paraphrase technique, based on English-to-English decoding with an MT system, and demonstrated that the resulting paraphrases can be used to cut the number of human reference translations needed in half. In this paper, we take the idea a step further, asking how far it is possible to get with just a single good reference translation for each item in the development set. Our analysis suggests that it is necessary to invest in four or more human translations in order to significantly improve on a single translation augmented by monolingual paraphrases. 1
Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
"... Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization. We discuss how our model can be adapted to many text generation tasks by augmenting its feature set, development data, and parameter estimation routine. We illustrate this adaptation by using our paraphrase model for the task of sentence compression and achieve results competitive with state-of-the-art compression systems.
Unsupervised discriminative language model training for machine translation using simulated confusion sets
- in Proc. Coling
, 2010
"... An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free grammar is derived from a baseline MT system to capture translation alternatives: pairs of words, phrases or other sentence ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free grammar is derived from a baseline MT system to capture translation alternatives: pairs of words, phrases or other sentence fragments that potentially compete to be the translation of the same source-language fragment. Using this grammar, a set of impostor sentences is then created for each English sentence to simulate confusions that would arise if the system were to process
PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts
"... We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic simi ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic similarity measure based on pivot language N-grams that allows us to approximate adequacy independently of lexical similarity. Human evaluation shows that PEM achieves high correlation with human judgments. 1

