Results 1 -
7 of
7
An extractive supervised two-stage method for sentence compression
"... We present a new method that compresses sentences by removing words. In a first stage, it generates candidate compressions by removing branches from the source sentence’s dependency tree using a Maximum Entropy classifier. In a second stage, it chooses the best among the candidate compressions using ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
We present a new method that compresses sentences by removing words. In a first stage, it generates candidate compressions by removing branches from the source sentence’s dependency tree using a Maximum Entropy classifier. In a second stage, it chooses the best among the candidate compressions using a Support Vector Machine Regression model. Experimental results show that our method achieves state-of-the-art performance without requiring any manually written rules. 1
Single-document and Multidocument Summarization Techniques for Email Threads Using Sentence Compression
- In Information Processing and Management: an International Journal, Volume 44, Issue 4
, 2008
"... We present two approaches to email thread summarization: Collective Message Summarization (CMS) applies a multi-document summarization approach, while Individual Message Summarization (IMS) treats the problem as a sequence of single-document summarization tasks. Both approaches are implemented in ou ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We present two approaches to email thread summarization: Collective Message Summarization (CMS) applies a multi-document summarization approach, while Individual Message Summarization (IMS) treats the problem as a sequence of single-document summarization tasks. Both approaches are implemented in our general framework driven by sentence compression. Instead of a purely extractive approach, we employ linguistic and statistical methods to generate multiple compressions, and then select from those candidates to produce a final summary. We demonstrate these ideas on the Enron collection—a very challenging corpus because of the highly technical language. Experimental results point to two findings: that CMS represents a better approach to email thread summarization, and that current sentence compression techniques do not improve summarization performance in this genre. 1
Text relatedness based on a word thesaurus
- Artificial Intelligence Research
, 2010
"... The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments m ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts which capitalizes on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy; then, we proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based and hybrid approaches. 1.
Supervised Machine Learning for Email Thread Summarization
, 2008
"... Email has become a part of most people’s lives, and the ever increasing amount of messages people receive can lead to email overload. We attempt to mitigate this problem using email thread summarization. Summaries can be used for things other than just replacing an incoming email message. They can b ..."
Abstract
- Add to MetaCart
Email has become a part of most people’s lives, and the ever increasing amount of messages people receive can lead to email overload. We attempt to mitigate this problem using email thread summarization. Summaries can be used for things other than just replacing an incoming email message. They can be used in the business world as a form of corporate memory, or to allow a new team member an easy way to catch up on an ongoing conversation. Email threads are of particular interest to summarization because they contain much structural redundancy due to their conversational nature. Our email thread summarization approach uses machine learning to pick which sentences from the email thread to use in the summary. A machine learning summarizer must be trained using previously labeled data, i.e. manually created summaries. After being trained our summarization algorithm can generate summaries that on average contain over 70 % of the same sentences as human annotators. We show that labeling some key features such as speech acts, meta sentences, and subjectivity can improve performance to over 80 % weighted recall.
A New Sentence Compression Dataset and Its Use in an Abstractive Generate-and-Rank Sentence Compressor
"... Sentence compression has attracted much interest in recent years, but most sentence compressors are extractive, i.e., they only delete words. There is a lack of appropriate datasets to train and evaluate abstractive sentence compressors, i.e., methods that apart from deleting words can also rephrase ..."
Abstract
- Add to MetaCart
Sentence compression has attracted much interest in recent years, but most sentence compressors are extractive, i.e., they only delete words. There is a lack of appropriate datasets to train and evaluate abstractive sentence compressors, i.e., methods that apart from deleting words can also rephrase expressions. We present a new dataset that contains candidate extractive and abstractive compressions of source sentences. The candidate compressions are annotated with human judgements for grammaticality and meaning preservation. We discuss how the dataset was created, and how it can be used in generate-and-rank abstractive sentence compressors. We also report experimental results with a novel abstractive sentence compressor that uses the dataset. 1

