Results 11 - 20
of
115
Extracting Structural Paraphrases from Aligned Monolingual Corpora
, 2003
"... We present an approach for automatically learning paraphrases from aligned monolingual corpora. Our algorithm works by generalizing the syntactic paths between corresponding anchors in aligned sentence pairs. Compared to previous work, structural paraphrases generated by our algorithm tend to be muc ..."
Abstract
-
Cited by 34 (1 self)
- Add to MetaCart
We present an approach for automatically learning paraphrases from aligned monolingual corpora. Our algorithm works by generalizing the syntactic paths between corresponding anchors in aligned sentence pairs. Compared to previous work, structural paraphrases generated by our algorithm tend to be much longer on average, and are capable of capturing long-distance dependencies. In addition to a standalone evaluation of our paraphrases, we also describe a question answering application currently under development that could immensely benefit from automatically-learned structural paraphrases.
Acquiring Correct Knowledge for Natural Language Generation
, 2003
"... Natural language generation (NLG) systems are computer software systems that pro- duce texts in English and other human languages, often from non-linguistic input data. ..."
Abstract
-
Cited by 34 (18 self)
- Add to MetaCart
Natural language generation (NLG) systems are computer software systems that pro- duce texts in English and other human languages, often from non-linguistic input data.
Should Corpora Texts Be Gold Standards for NLG?
- In Proceedings of the Second International Conference on Natural Language Generation
, 2002
"... There is increasing interest in using corpora in NLG, perhaps because of the success of corpus-based techniques in other areas of speech and language processing. Many uses of corpora in NLG implicitly assume that the human-authored texts in a corpora are a 'gold standard', in other words that ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
There is increasing interest in using corpora in NLG, perhaps because of the success of corpus-based techniques in other areas of speech and language processing. Many uses of corpora in NLG implicitly assume that the human-authored texts in a corpora are a 'gold standard', in other words that the NLG system should produce texts similar to the corpora texts. However, our experience with several corpora raises questions about this assumption, because human au- thors make mistakes and because different people write differently.
Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
"... ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alo ..."
Abstract
-
Cited by 26 (6 self)
- Add to MetaCart
ccb cs jhu edu We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs. In order to retain broad coverage of non-constituent phrases, complex syntactic labels are introduced. A manual evaluation indicates a 19% absolute improvement in paraphrase quality over the baseline method. 1
Empirical lower bounds on the complexity of translational equivalence
- In Proceedings of ACL 2006
, 2006
"... This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why “syntactic ” constraints have not helpe ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why “syntactic ” constraints have not helped to improve statistical translation models, including finitestate phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at
Support Vector Machines for Paraphrase Identification and Corpus Construction
- In Proceedings of The Third International Workshop on Paraphrasing (IWP2005), Jeju, Republic of Korea
, 2005
"... The lack of readily-available large corpora of aligned monolingual sentence pairs is a major obstacle to the development of Statistical Machine Translation-based paraphrase models. In this paper, we describe the use of annotated datasets and Support Vector Machines to induce larger monolingual parap ..."
Abstract
-
Cited by 21 (1 self)
- Add to MetaCart
The lack of readily-available large corpora of aligned monolingual sentence pairs is a major obstacle to the development of Statistical Machine Translation-based paraphrase models. In this paper, we describe the use of annotated datasets and Support Vector Machines to induce larger monolingual paraphrase corpora from a comparable corpus of news clusters found on the World Wide Web. Features include: morphological variants; WordNet synonyms and hypernyms; loglikelihood-based word pairings dynamically obtained from baseline sentence alignments; and formal string features such as word-based edit distance. Use of this technique dramatically reduces the Alignment Error Rate of the extracted corpora over heuristic methods based on position of the sentences in the text. 1
ISP: Learning inferential selectional preferences
- In Proceedings of NAACL 2007
, 2007
"... Semantic inference is a key component for advanced natural language understanding. However, existing collections of automatically acquired inference rules have shown disappointing results when used in applications such as textual entailment and question answering. This paper presents ISP, a collecti ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
Semantic inference is a key component for advanced natural language understanding. However, existing collections of automatically acquired inference rules have shown disappointing results when used in applications such as textual entailment and question answering. This paper presents ISP, a collection of methods for automatically learning admissible argument values to which an inference rule can be applied, which we call inferential selectional preferences, and methods for filtering out incorrect inferences. We evaluate ISP and present empirical evidence of its effectiveness. 1
A Probabilistic Classification Approach for Lexical Textual Entailment
, 2005
"... The textual entailment task -- determining if a given text entails a given hypothesis -- provides an abstraction of applied semantic inference. This paper describes first a general generative probabilistic setting for textual entailment. We then focus on the sub-task of recognizing whether the ..."
Abstract
-
Cited by 19 (4 self)
- Add to MetaCart
The textual entailment task -- determining if a given text entails a given hypothesis -- provides an abstraction of applied semantic inference. This paper describes first a general generative probabilistic setting for textual entailment. We then focus on the sub-task of recognizing whether the lexical concepts present in the hypothesis are entailed from the text. This problem is recast as one of text categorization in which the classes are the vocabulary words. We make novel use of Nave Bayes to model the problem in an entirely unsupervised fashion. Empirical tests suggest that the method is effective and compares favorably with state-of-the-art heuristic scoring approaches.
Learning Document-Level Semantic Properties from Free-text Annotations
"... This paper demonstrates a new method for leveraging unstructured annotations to infer semantic document properties. We consider the domain of product reviews, which are often annotated by their authors with free-text keyphrases, such as “a real bargain ” or “good value. ” We leverage these unstructu ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
This paper demonstrates a new method for leveraging unstructured annotations to infer semantic document properties. We consider the domain of product reviews, which are often annotated by their authors with free-text keyphrases, such as “a real bargain ” or “good value. ” We leverage these unstructured annotations by clustering them into semantic properties, and then tying the induced clusters to hidden topics in the document text. This allows us to predict relevant properties of unannotated documents. Our approach is implemented in a hierarchical Bayesian model with joint inference, which increases the robustness of the keyphrase clustering and encourages document topics to correlate with semantically meaningful properties. We perform several evaluations of our model, and find that it substantially outperforms alternative approaches. 1
Text Simplification for Reading Assistance: A Project Note
- In Proceedings of the 2nd International Workshop on Paraphrasing: Paraphrase Acquisition and Applications (IWP
, 2003
"... This paper describes our ongoing research project on text simplification for congenitally deaf people. Text simplification we are aiming at is the task of offering a deaf reader a syntactic and lexical paraphrase of a given text for assisting her/him to understand what it means. ..."
Abstract
-
Cited by 16 (4 self)
- Add to MetaCart
This paper describes our ongoing research project on text simplification for congenitally deaf people. Text simplification we are aiming at is the task of offering a deaf reader a syntactic and lexical paraphrase of a given text for assisting her/him to understand what it means.

