Results 1 - 10
of
26
Paraphrasing with Bilingual Parallel Corpora
- In ACL-2005
, 2005
"... Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrasebased statistical machine translation, we show how paraphrases ..."
Abstract
-
Cited by 97 (10 self)
- Add to MetaCart
Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrasebased statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. We evaluate our paraphrase extraction and ranking methods using a set of manual word alignments, and contrast the quality with paraphrases extracted from automatic alignments. 1
Monolingual machine translation for paraphrase generation
- In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing
, 2004
"... We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is mea ..."
Abstract
-
Cited by 51 (4 self)
- Add to MetaCart
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextual replacements. Human evaluation shows that this system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches. 1
Question answering from the web using knowledge annotation and knowledge mining techniques
- In Proc. CIKM
, 2003
"... We present a strategy for answering fact-based natural language questions that is guided by a characterization of realworld user queries. Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining. Know ..."
Abstract
-
Cited by 40 (6 self)
- Add to MetaCart
We present a strategy for answering fact-based natural language questions that is guided by a characterization of realworld user queries. Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining. Knowledge annotation is an approach to answering large classes of frequently occurring questions by utilizing semistructured and structured Web sources. Knowledge mining is a statistical approach that leverages massive amounts of Web data to overcome many natural language processing challenges. We have integrated these two different paradigms into a question answering system capable of providing users with concise answers that directly address their information needs.
Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
"... ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alo ..."
Abstract
-
Cited by 26 (6 self)
- Add to MetaCart
ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs. In order to retain broad coverage of non-constituent phrases, complex syntactic labels are introduced. A manual evaluation indicates a 19% absolute improvement in paraphrase quality over the baseline method. 1
Using Paraphrases for Parameter Tuning in Statistical Machine Translation
"... Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using held-out test sentences and their ..."
Abstract
-
Cited by 18 (6 self)
- Add to MetaCart
Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using held-out test sentences and their corresponding reference translations. However, obtaining reference translations is expensive. In this paper, we introduce a new full-sentence paraphrase technique, based on English-to-English decoding with an MT system, and we demonstrate that the resulting paraphrases can be used to drastically reduce the number of human reference translations needed for parameter tuning, without a significant decrease in translation quality. 1
Learning Entailment Rules for Unary Templates
"... Most work on unsupervised entailment rule acquisition focused on rules between templates with two variables, ignoring unary rules- entailment rules between templates with a single variable. In this paper we investigate two approaches for unsupervised learning of such rules and compare the proposed m ..."
Abstract
-
Cited by 16 (10 self)
- Add to MetaCart
Most work on unsupervised entailment rule acquisition focused on rules between templates with two variables, ignoring unary rules- entailment rules between templates with a single variable. In this paper we investigate two approaches for unsupervised learning of such rules and compare the proposed methods with a binary rule learning method. The results show that the learned unary rule-sets outperform the binary rule-set. In addition, a novel directional similarity measure for learning entailment, termed Balanced-Inclusion, is the best performing measure. 1
Citances: Citation Sentences for Semantic Analysis of Bioscience Text
- In Proceedings of the SIGIR’04 workshop on Search and Discovery in Bioinformatics
, 2004
"... We propose the use of the text of the sentences surrounding citations as an important tool for semantic interpretation of bioscience text. We hypothesize several di#erent uses of citation sentences (which we call citances), including the creation of training and testing data for semantic analysis (e ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
We propose the use of the text of the sentences surrounding citations as an important tool for semantic interpretation of bioscience text. We hypothesize several di#erent uses of citation sentences (which we call citances), including the creation of training and testing data for semantic analysis (especially for entity and relation recognition), synonym set creation, database curation, document summarization, and information retrieval generally. We illustrate some of these ideas, showing that citations to one document in particular align well with what a hand-built curator extracted. We also show preliminary results on the problem of normalizing the di#erent ways that the same concepts are expressed within a set of citances, using and improving on existing techniques in automatic paraphrase generation.
2008b. Pivot Approach for Extracting Paraphrase Patterns from Bilingual Corpora
- In Proceedings of ACL-08:HLT
"... Paraphrase patterns are useful in paraphrase recognition and generation. In this paper, we present a pivot approach for extracting paraphrase patterns from bilingual parallel corpora, whereby the English paraphrase patterns are extracted using the sentences in a foreign language as pivots. We propos ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
Paraphrase patterns are useful in paraphrase recognition and generation. In this paper, we present a pivot approach for extracting paraphrase patterns from bilingual parallel corpora, whereby the English paraphrase patterns are extracted using the sentences in a foreign language as pivots. We propose a loglinear model to compute the paraphrase likelihood of two patterns and exploit feature functions based on maximum likelihood estimation (MLE) and lexical weighting (LW). Using the presented method, we extract over 1,000,000 pairs of paraphrase patterns from 2M bilingual sentence pairs, the precision of which exceeds 67%. The evaluation results show that: (1) The pivot approach is effective in extracting paraphrase patterns, which significantly outperforms the conventional method DIRT. Especially, the log-linear model with the proposed feature functions achieves high performance. (2) The coverage of the extracted paraphrase patterns is high, which is above 84%. (3) The extracted paraphrase patterns can be classified into 5 types, which are useful in various applications. 1
Evaluating evaluation methods for generation in the presence of variation
- in Proceedings of CICLing 2005
, 2005
"... Abstract. Recent years have seen increasing interest in automatic metrics for the evaluation of generation systems. When a system can generate syntactic variation, automatic evaluation becomes more difficult. In this paper, we compare the performance of several automatic evaluation metrics using a c ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Abstract. Recent years have seen increasing interest in automatic metrics for the evaluation of generation systems. When a system can generate syntactic variation, automatic evaluation becomes more difficult. In this paper, we compare the performance of several automatic evaluation metrics using a corpus of automatically generated paraphrases. We show that these evaluation metrics can at least partially measure adequacy (similarity in meaning), but are not good measures of fluency (syntactic correctness). We make several proposals for improving the evaluation of generation systems that produce variation. 1
Hitting the right paraphrases in good time
- In Proc. NAACL’2010
, 2010
"... We present a random-walk-based approach to learning paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which a node corresponds to a phrase, and an edge exists between two nodes if their corresponding phrases are aligned in a phrase table. We sample random walks t ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We present a random-walk-based approach to learning paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which a node corresponds to a phrase, and an edge exists between two nodes if their corresponding phrases are aligned in a phrase table. We sample random walks to compute the average number of steps it takes to reach a ranking of paraphrases with better ones being “closer ” to a phrase of interest. This approach allows “feature ” nodes that represent domain knowledge to be built into the graph, and incorporates truncation techniques to prevent the graph from growing too large for efficiency. Current approaches, by contrast, implicitly presuppose the graph to be bipartite, are limited to finding paraphrases that are of length two away from a phrase, and do not generally permit easy incorporation of domain knowledge. Manual evaluation of generated output shows that our approach outperforms the state-of-the-art system of Callison-Burch (2008). 1

