Unsupervised Learning of Paraphrases (2007)
| Venue: | Research in Computer Science. National Polytechnic Institute, Mexico. ISSN 1870-4069 |
| Citations: | 1 - 1 self |
BibTeX
@INPROCEEDINGS{Cordeiro07unsupervisedlearning,
author = {João Cordeiro and Gaël Dias and Pavel Brazdil},
title = {Unsupervised Learning of Paraphrases},
booktitle = {Research in Computer Science},
publisher = {National Polytechnic Institute, Mexico},
issn = {1870-4069},
year = {2007}
}
Abstract
Paraphrasing is a cornerstone of many Natural Language Processing fields, such as monolingual text-to-text generation and automatic text summarization. Indeed, aligned monolingual corpora are likely to boost the learning process of text-to-text generation models. A paraphrase learning strategy can be defined as a two-step process: (1) identifying and extracting related sentence pairs from on-line comparable corpora (for example, sentences that convey the same information yet are written in different forms) and (2) applying learning methodologies to the extracted material to induce text-to-text rewriting rules. In this paper, we compare different lexical distance metrics for the identification of related sentences, i.e. paraphrase candidates. In particular, we discuss how different metrics lead to the identification of different types of paraphrases. Finally, the comparisons and discussions give relevant insights towards the automatic generation of paraphrase corpora.
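As an illustration of step (1), the following is a minimal sketch of how a lexical metric could be used to select paraphrase candidates. The abstract does not name the metrics compared in the paper, so the Jaccard word overlap used here is an assumption chosen for simplicity, not the authors' method. The function names and the `low`/`high` thresholds are likewise hypothetical.

```python
import re
from itertools import combinations


def tokens(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def word_overlap(a, b):
    """Jaccard overlap of the word sets of two sentences (0.0 to 1.0).

    Illustrative stand-in for a lexical metric; not taken from the paper.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(sentences, low=0.4, high=0.9):
    """Return (a, b, score) for sentence pairs with low <= score <= high.

    The lower bound drops unrelated pairs; the upper bound (an assumption
    of this sketch) drops near-identical pairs that show little rewriting.
    """
    pairs = []
    for a, b in combinations(sentences, 2):
        score = word_overlap(a, b)
        if low <= score <= high:
            pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    corpus = [
        "The company reported record profits on Monday.",
        "Record profits were reported by the company on Monday.",
        "Heavy rain is expected tomorrow.",
    ]
    for a, b, score in candidate_pairs(corpus):
        print(f"{score:.2f}\t{a}\t{b}")
```

Swapping `word_overlap` for another metric changes which pairs pass the thresholds, which is the kind of effect the abstract says the paper examines.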







