Results 1 - 10
of
16
Generating Impact-Based Summaries for Scientific Literature
"... In this paper, we present a study of a novel summarization problem, i.e., summarizing the impact of a scientific publication. Given a paper and its citation context, we study how to extract sentences that can represent the most influential content of the paper. We propose language modeling methods f ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
In this paper, we present a study of a novel summarization problem, i.e., summarizing the impact of a scientific publication. Given a paper and its citation context, we study how to extract sentences that can represent the most influential content of the paper. We propose language modeling methods for solving this problem, and study how to incorporate features such as authority and proximity to accurately estimate the impact language model. Experiment results on a SIGIR publication collection show that the proposed methods are effective for generating impact-based summaries. 1
Enhancing Single-document Summarization by Combining RankNet and Third-party Sources
"... We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the document. We apply novel features based on news search query logs and Wikipedia entities. Using the RankNet learning alg ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the document. We apply novel features based on news search query logs and Wikipedia entities. Using the RankNet learning algorithm, we train a pair-based sentence ranker to score every sentence in the document and identify the most important sentences. We apply our system to documents gathered from CNN.com, where each document includes highlights and an article. Our system significantly outperforms the standard baseline in the ROUGE-1 measure on over 70 % of our document set. 1
Comments-oriented document summarization: Understanding documents with readers’ feedback
- In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. SIGIR ’08. ACM
, 2008
"... Comments left by readers on Web documents contain valuable information that can be utilized in different information retrieval tasks including document search, visualization, and summarization. In this paper, we study the problem of comments-oriented document summarization and aim to summarize a Web ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Comments left by readers on Web documents contain valuable information that can be utilized in different information retrieval tasks including document search, visualization, and summarization. In this paper, we study the problem of comments-oriented document summarization and aim to summarize a Web document (e.g., a blog post) by considering not only its content, but also the comments left by its readers. We identify three relations (namely, topic, quotation, andmention) by which comments can be linked to one another, and model the relations in three graphs. The importance of each comment is then scored by: (i) graph-based method, where the three graphs are merged into a multirelation graph; (ii) tensor-based method, where the three graphs are used to construct a 3rd-order tensor. To generate a comments-oriented summary, we extract sentences from the given Web document using either feature-biased approach or uniform-document approach. The former scores sentences to bias keywords derived from comments; while the latter scores sentences uniformly with comments. In our experiments using a set of blog posts with manually labeled sentences, our proposed summarization methods utilizing comments showed significant improvement over those not using comments. The methods using feature-biased sentence extraction approach were observed to outperform that using uniform-document approach.
Comment-Oriented Blog Summarization by Sentence Extraction
- In Proceedings of 6 th ACM CIKM
, 2007
"... Much existing research on blogs focused on posts only, ignoring their comments. Our user study conducted on summarizing blog posts, however, showed that reading comments does change one’s understanding about blog posts. In this research, we aim to extract representative sentences from ablogpostthatb ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Much existing research on blogs focused on posts only, ignoring their comments. Our user study conducted on summarizing blog posts, however, showed that reading comments does change one’s understanding about blog posts. In this research, we aim to extract representative sentences from ablogpostthatbestrepresentthetopicsdiscussedamong its comments. The proposed solution first derives representative words from comments and then selects sentences containing representative words. The representativeness of words is measured using ReQuT (i.e., Reader, Quotation, and Topic). Evaluated on human labeled sentences, ReQuT together with summation-based sentence selection showed promising results.
Learning Latent Semantic Relations from Clickthrough Data for Query Suggestion
, 2008
"... For a given query raised by a specific user, the Query Suggestion technique aims to recommend relevant queries which potentially suit the information needs of that user. Due to the complexity of the Web structure and the ambiguity of users ’ inputs, most of the suggestion algorithms suffer from the ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
For a given query raised by a specific user, the Query Suggestion technique aims to recommend relevant queries which potentially suit the information needs of that user. Due to the complexity of the Web structure and the ambiguity of users ’ inputs, most of the suggestion algorithms suffer from the problem of poor recommendation accuracy. In this paper, aiming at providing semantically relevant queries for users, we develop a novel, effective and efficient two-level query suggestion model by mining clickthrough data, in the form of two bipartite graphs (user-query and query-URL bipartite graphs) extracted from the clickthrough data. Based on this, we first propose a joint matrix factorization method which utilizes two bipartite graphs to learn the low-rank query latent feature space, and then build a query similarity graph based on the features. After that, we design an online ranking algorithm to propagate similarities on the query similarity graph, and finally recommend latent semantically relevant queries to users. Experimental analysis on the clickthrough data of a commercial search engine shows the effectiveness and the efficiency of our method.
Generating Succinct Titles for Web URLs
"... How can a search engine automatically provide the best and most appropriate title for a result URL (link-title) so that users will be persuaded to click on the URL? We consider the problem of automatically generating link-titles for URLs and propose a general statistical framework for solving this p ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
How can a search engine automatically provide the best and most appropriate title for a result URL (link-title) so that users will be persuaded to click on the URL? We consider the problem of automatically generating link-titles for URLs and propose a general statistical framework for solving this problem. The framework is based on using information from a diverse collection of sources, each of which can be thought of as contributing one or more candidate link-titles for the URL. It can also incorporate the context in which the link-title will be used, along with constraints on its length. Our framework is applicable to several scenarios: obtaining succinct titles for displaying quicklinks, obtaining titles for URLs that lack a good title, constructing succinct sitemaps, etc. Extensive experiments show that our method is very effective, producing results that are at least 20 % better than non-trivial baselines.
A Hierarchical Model of Web Summaries
"... We investigate the relevance of hierarchical topic models to represent the content of Web gists. We focus our attention on DMOZ, a popular Web directory, and propose two algorithms to infer such a model from its manually-curated hierarchy of categories. Our first approach, based on information-theor ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
We investigate the relevance of hierarchical topic models to represent the content of Web gists. We focus our attention on DMOZ, a popular Web directory, and propose two algorithms to infer such a model from its manually-curated hierarchy of categories. Our first approach, based on information-theoretic grounds, uses an algorithm similar to recursive feature selection. Our second approach is fully Bayesian and derived from the more general model, hierarchical LDA. We evaluate the performance of both models against a flat 1-gram baseline and show improvements in terms of perplexity over held-out data. 1
Contextual ranking of keywords using click data
- In Proc. of the 25th Intl. Conf. on Data Engineering (ICDE
, 2009
"... Abstract — The problem of automatically extracting the most interesting and relevant keyword phrases in a document has been studied extensively as it is crucial for a number of applications. These applications include contextual advertising, automatic text summarization, and user-centric entity dete ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract — The problem of automatically extracting the most interesting and relevant keyword phrases in a document has been studied extensively as it is crucial for a number of applications. These applications include contextual advertising, automatic text summarization, and user-centric entity detection systems. All these applications can potentially benefit from a successful solution as it enables computational efficiency (by decreasing the input size), noise reduction, or overall improved user satisfaction. In this paper, we study this problem and focus on improving the overall quality of user-centric entity detection systems. First, we review our concept extraction technique, which relies on search engine query logs. We then define a new feature space to represent the interestingness of concepts, and describe a new approach to estimate their relevancy for a given context. We utilize click through data obtained from a large scale user-centric entity detection system – Contextual Shortcuts – to train a model to rank the extracted concepts, and evaluate the resulting model extensively again based on their click through data. Our results show that the learned model outperforms the baseline model, which employs similar features but whose weights are tuned carefully based on empirical observations, and reduces the error rate from 30.22 % to 18.66%. I.
CLUSTERING AND FEATURE SPECIFIC SENTENCE EXTRACTION BASED SUMMARIZATION OF MULTIPLE DOCUMENTS
"... This paper presents an approach to cluster multiple documents by using document clustering approach and to produce cluster wise summary based on feature profile oriented sentence extraction strategy. Related documents are grouped into same cluster using document clustering algorithm. Feature profile ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
This paper presents an approach to cluster multiple documents by using document clustering approach and to produce cluster wise summary based on feature profile oriented sentence extraction strategy. Related documents are grouped into same cluster using document clustering algorithm. Feature profile is generated by considering word weight, sentence position, sentence length, sentence centrality, proper nouns in the sentence and numerical data in the sentence. Based on the feature profile sentence score is calculated for each sentence. According to different compression rates sentences are extracted from each cluster and ranked in order of importance based on sentence score. Extracted sentences are arranged in chronological order as in original documents and from this, cluster wise summary can be generated. Experimental results show that the proposed clustering algorithm is efficient and feature profile is used to extract most important sentences from multiple documents.
WWW 2009 MADRID! Track: Data Mining / Session: Text Mining Enhancing Diversity, Coverage and Balance for Summarization through Structure Learning
"... Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom simultaneously consider summary diversity, coverage, ..."
Abstract
- Add to MetaCart
Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom simultaneously consider summary diversity, coverage, and balance issues which to a large extent determine the quality of summaries. In this paper, we consider extract-based summarization emphasizing the following three requirements: 1) diversity in summarization, which seeks to reduce redundancy among sentences in the summary; 2) sufficient coverage, which focuses on avoiding the loss of the document’s main information when generating the summary; and 3) balance, which demands that different aspects of the document need to have about the same relative importance in the summary. We formulate the extract-based summarization problem as learning a mapping from a set of sentences of a given document to a subset of the sentences that satisfies the above three requirements. The mapping is learned by incorporating several constraints in a structure learning framework, and we explore the graph structure of the output variables and employ structural SVM for solving the resulted optimization problem. Experiments on the DUC2001 data sets demonstrate significant performance improvements in terms of F1 and ROUGE metrics.

