Results 1 - 10
of
115
Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
, 2003
"... We address the text-to-text generation problem of sentence-level paraphrasing --- a phenomenon distinct from and more difficult than word- or phrase-level paraphrasing. Our approach applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of para ..."
Abstract
-
Cited by 147 (2 self)
- Add to MetaCart
We address the text-to-text generation problem of sentence-level paraphrasing --- a phenomenon distinct from and more difficult than word- or phrase-level paraphrasing. Our approach applies multiple-sequence alignment to sentences gathered from unannotated comparable corpora: it learns a set of paraphrasing patterns represented by word lattice pairs and automatically determines how to apply these patterns to rewrite new sentences. The results of our evaluation experiments show that the system derives accurate paraphrases, outperforming baseline systems.
Paraphrasing with Bilingual Parallel Corpora
- In ACL-2005
, 2005
"... Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrasebased statistical machine translation, we show how paraphrases ..."
Abstract
-
Cited by 97 (10 self)
- Add to MetaCart
Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrasebased statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. We evaluate our paraphrase extraction and ranking methods using a set of manual word alignments, and contrast the quality with paraphrases extracted from automatic alignments. 1
Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences
- In Proceedings of HLT/NAACL
, 2003
"... We describe a syntax-based algorithm that automatically builds Finite State Automata (word lattices) from semantically equivalent translation sets. These FSAs are good representations of paraphrases. They can be used to extract lexical and syntactic paraphrase pairs and to generate new, unseen sente ..."
Abstract
-
Cited by 96 (4 self)
- Add to MetaCart
We describe a syntax-based algorithm that automatically builds Finite State Automata (word lattices) from semantically equivalent translation sets. These FSAs are good representations of paraphrases. They can be used to extract lexical and syntactic paraphrase pairs and to generate new, unseen sentences that express the same meaning as the sentences in the input sets. Our FSAs can also predict the correctness of alternative semantic renderings, which may be used to evaluate the quality of translations.
Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources
- In Proceedings of the 20th International Conference on Computational Linguistics
, 2004
"... We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy ..."
Abstract
-
Cited by 89 (1 self)
- Add to MetaCart
We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluster. We evaluate both datasets using a word alignment algorithm and a metric borrowed from machine translation. Results show that edit distance data is cleaner and more easily-aligned than the heuristic data, with an overall alignment error rate (AER) of 11.58 % on a similarly-extracted test set. On test data extracted by the heuristic strategy, however, performance of the two training sets is similar, with AERs of 13.2% and 14.7 % respectively. Analysis of 100 pairs of sentences from each set reveals that the edit distance data lacks many of the complex lexical and syntactic alternations that characterize monolingual paraphrase. The summary sentences, while less readily alignable, retain more of the non-trivial alternations that are of greatest interest learning paraphrase relationships. 1
Automatic Paraphrase Acquisition from News Articles
, 2002
"... Paraphrases play an important role in the variety and complexity of natural language documents. However they adds to the difficulty of natural language processing. Here we describe a procedure for obtaining paraphrases from news article. A set of paraphrases can be useful for various kinds of applic ..."
Abstract
-
Cited by 59 (3 self)
- Add to MetaCart
Paraphrases play an important role in the variety and complexity of natural language documents. However they adds to the difficulty of natural language processing. Here we describe a procedure for obtaining paraphrases from news article. A set of paraphrases can be useful for various kinds of applications. Articles derived from different newspapers can contain paraphrases if they report the same event of the same day. We exploit this feature by using Named Entity recognition. Our basic approach is based on the assumption that Named Entities are preserved across paraphrases. We applied our method to articles of two domains and obtained notable examples. Although this is our initial attempt to automatically extracting paraphrases from a corpus, the results are promising. 1.
Monolingual machine translation for paraphrase generation
- In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing
, 2004
"... We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is mea ..."
Abstract
-
Cited by 51 (4 self)
- Add to MetaCart
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextual replacements. Human evaluation shows that this system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches. 1
Question answering from the web using knowledge annotation and knowledge mining techniques
- In Proc. CIKM
, 2003
"... We present a strategy for answering fact-based natural language questions that is guided by a characterization of realworld user queries. Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining. Know ..."
Abstract
-
Cited by 40 (6 self)
- Add to MetaCart
We present a strategy for answering fact-based natural language questions that is guided by a characterization of realworld user queries. Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining. Knowledge annotation is an approach to answering large classes of frequently occurring questions by utilizing semistructured and structured Web sources. Knowledge mining is a statistical approach that leverages massive amounts of Web data to overcome many natural language processing challenges. We have integrated these two different paradigms into a question answering system capable of providing users with concise answers that directly address their information needs.
Bootstrapping Lexical Choice via Multiple-Sequence Alignment
- IN PROCEEDINGS OF THE 2002 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP
, 2002
"... An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, ..."
Abstract
-
Cited by 36 (1 self)
- Add to MetaCart
An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically,
Measuring semantic similarity by latent relational analysis
- In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI05
, 2005
"... (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from co ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
(LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends
Improved statistical machine translation using paraphrases
- In Proceedings of HLT/NAACL-2006
, 2006
"... Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with th ..."
Abstract
-
Cited by 35 (1 self)
- Add to MetaCart
Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a stateof-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48 % to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches. 1

