Results 1 - 9 of 9
Modeling local coherence: An entity-based approach
- In Proceedings of ACL 2005, 2005
Cited by 70 (5 self)
This paper considers the problem of automatic assessment of local coherence. We present a novel entity-based representation of discourse which is inspired by Centering Theory and can be computed automatically from raw text. We view coherence assessment as a ranking learning problem and show that the proposed discourse representation supports the effective learning of a ranking function. Our experiments demonstrate that the induced model achieves significantly higher accuracy than a state-of-the-art coherence model.
Towards robust context-sensitive sentence alignment for monolingual corpora
- In Proc. EACL, 2006
Cited by 16 (0 self)
Aligning sentences belonging to comparable monolingual corpora has been suggested as a first step towards training text rewriting algorithms, for tasks such as summarization or paraphrasing. We present here a new monolingual sentence alignment algorithm, combining a sentence-based TF*IDF score, turned into a probability distribution using logistic regression, with a global alignment dynamic programming algorithm. Our approach provides a simpler and more robust solution achieving a substantial improvement in accuracy over existing systems.
The Distributional Similarity of Sub-Parses
- 2005
Cited by 7 (0 self)
This work explores computing distributional similarity between sub-parses, i.e., fragments of a parse tree, as an extension to general lexical distributional similarity techniques. In the same way that lexical distributional similarity is used to estimate lexical semantic similarity, we propose using distributional similarity between sub-parses to estimate the semantic similarity of phrases. Such a technique will allow us to identify paraphrases where the component words are not semantically similar. We demonstrate the potential of the method by applying it to a small number of examples and showing that the paraphrases are more similar than the non-paraphrases.
Constructing Corpora for the Development and Evaluation of Paraphrase Systems
Cited by 4 (0 self)
Automatic paraphrasing is an important component in many natural language processing tasks. In this paper we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word-alignments and show that it yields high inter-annotator agreement. As Kappa is suited to nominal data, we employ an alternative agreement statistic which is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall and F1) and also in developing linguistically rich paraphrase models based on syntactic structure.
Clustering and Matching Headlines for Automatic Paraphrase Acquisition
Cited by 3 (1 self)
For developing a data-driven text rewriting algorithm for paraphrasing, it is essential to have a monolingual corpus of aligned paraphrased sentences. News article headlines are a rich source of paraphrases; they tend to describe the same event in various different ways, and can easily be obtained from the web. We compare two methods of aligning headlines to construct such an aligned corpus of paraphrases, one based on clustering, and the other on pairwise similarity-based matching. We show that the latter performs best on the task of aligning paraphrastic headlines.
User-Sensitive Text Summarization: Application to the Medical Domain
- 2006
Cited by 1 (0 self)
In this thesis, we present a user-sensitive approach to text summarization. One domain which would highly benefit from tailoring summaries to both individual and class-based user characteristics is the medical domain, where physicians and patients access similar information, each with their own needs and abilities. Our framework is a medical digital library for physicians and patients. We describe a summarizer, which generates summaries of findings in an input set of clinical studies. When a physician is treating a specific patient, he’s looking for information relevant to the patient’s history and problems. The summarizer takes the user’s interests into account and presents only the findings pertaining to a user model, as approximated by an existing patient record. The same synthesis of information can also be of interest to the patient. The summarizer predicts which medical terms used in a text will be too technical for patients, and augments it with appropriate definitions when necessary. We adopt a generation-like architecture for our summarizer. However, because our input is textual and not semantic, new challenges arise. We operate over
Putting it Simply: a Context-Aware Approach to Lexical Simplification
We present a method for lexical simplification. Simplification rules are learned from a comparable corpus, and the rules are applied in a context-aware fashion to input sentences. Our method is unsupervised. Furthermore, it does not require any alignment or correspondence among the complex and simple corpora. We evaluate the simplification according to three criteria: preservation of grammaticality, preservation of meaning, and degree of simplification. Results show that our method outperforms an established simplification baseline for both meaning preservation and simplification, while maintaining a high level of grammaticality.
Failing to Find Paraphrases Using PNrule
- 2007
In this paper, we attempt to detect clause-level paraphrases in cases where they are extremely rare, using a combination of lexical and syntactic measures along with a machine learning algorithm designed specifically for detecting rare classes: PNrule. When our method fails, we examine the probable causes of this failure, and what they mean for future work.
A Monolingual Tree-based Translation Model for Sentence Simplification
In this paper, we consider sentence simplification as a special form of translation with the complex sentence as the source and the simple sentence as the target. We propose a Tree-based Simplification Model (TSM), which, to our knowledge, is the first statistical simplification model covering splitting, dropping, reordering and substitution integrally. We also describe an efficient method to train our model with a large-scale parallel dataset obtained from the Wikipedia and Simple Wikipedia. The evaluation shows that our model achieves better readability scores than a set of baseline systems.

