Web-page summarization using clickthrough data
- In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’05), 2005
Cited by 20 (1 self)
Most previous Web-page summarization methods treat a Web page as plain text. However, such methods fail to uncover the full knowledge associated with a Web page to build a high-quality summary, because the Web contains many hidden relationships that are not used in these methods. Uncovering the inherent knowledge is important to building good Web-page summarizers. In this paper, we extract the extra knowledge from the clickthrough data of a Web search engine to improve Web-page summarization. We first analyze the feasibility of utilizing clickthrough data in text summarization, and then propose two adapted summarization methods that take advantage of the relationships discovered from the clickthrough data. For those pages not covered by the clickthrough data, we put forward a thematic lexicon approach to generate implicit knowledge for them. Our methods are evaluated on a relatively small dataset consisting of manually annotated pages as well as a large dataset that is crawled from the Open Directory Project website. The experimental results indicate that significant improvements can be achieved through our proposed summarizer as compared with summarizers that do not use the clickthrough data. Categories and Subject Descriptors: H.4 [Information Systems Applications]: Miscellaneous; I.5.4 [Pattern Recognition]: Applications - Text processing
Csurf: A context-driven non-visual web-browser
- In Proc. of the Intl. Conf. on the World Wide Web (WWW ’07), 2007
Cited by 14 (3 self)
Web sites are designed for a graphical mode of interaction. Sighted users can “cut to the chase” and quickly identify relevant information in Web pages. In contrast, individuals with visual disabilities have to use screen-readers to browse the Web. As screen-readers process pages sequentially and read through everything, Web browsing can become strenuous and time-consuming. Although the use of shortcuts and searching offers some improvements, the problem still remains. In this paper, we address the problem of information overload in non-visual Web access using the notion of context. Our prototype system, CSurf, embodying our approach, provides the usual features of a screen-reader. However, when a user follows a link, CSurf captures the context of the link using a simple topic-boundary detection technique, and uses it to identify relevant information on the next page with the help of a Support Vector Machine, a statistical machine-learning model. Then, CSurf reads the Web page starting from the most relevant section, identified by the model. We conducted a series of experiments to evaluate the performance of CSurf against the state-of-the-art screen-reader, JAWS. Our results show that the use of context can potentially save browsing time and substantially improve the browsing experience of visually disabled people.
Generating Impact-Based Summaries for Scientific Literature
Cited by 10 (1 self)
In this paper, we present a study of a novel summarization problem, i.e., summarizing the impact of a scientific publication. Given a paper and its citation context, we study how to extract sentences that can represent the most influential content of the paper. We propose language modeling methods for solving this problem, and study how to incorporate features such as authority and proximity to accurately estimate the impact language model. Experimental results on a SIGIR publication collection show that the proposed methods are effective for generating impact-based summaries.
World Wide Web Site Summarization
- Web Intelligence and Agent Systems: An International Journal, 2002
Cited by 7 (6 self)
Summaries of Web sites help Web users get an idea of the site contents without having to spend time browsing the sites. Currently, manually constructed summaries of Web sites by volunteer experts are available, such as the DMOZ Open Directory Project. This research is directed towards automating the Web site summarization task. To achieve this objective, an approach which applies machine learning and natural language processing techniques is developed to summarize a Web site automatically. The information content of the automatically generated summaries is compared, via a formal evaluation process involving human subjects, to DMOZ summaries, home page browsing and time-limited site browsing, for a number of academic and commercial Web sites. Statistical evaluation of the scores of the answers to a list of questions about the sites demonstrates that the automatically generated summaries convey the same information to the reader as DMOZ summaries do, and more information than the two browsing options.
Temporal Multi-Page Summarization
- Web Intelligence and Agent Systems, 2006
Cited by 4 (0 self)
With the increasing popularity of the Web, efficient approaches to information overload are becoming more necessary. Summarization of web pages aims at detecting the most important contents from pages so that a user can obtain a compact version of a web document or a group of pages. Traditionally, summaries are constructed on static snapshots of web pages. However, web pages are dynamic objects whose contents can change at any time. In this paper, we discuss the research on temporal multi-document summarization in the Web. We analyze the temporal contents of topically related collections of web pages monitored for certain time intervals. The contents derived from the temporal versions of web documents are summarized to provide information on hot topics and popular events in the collection. We propose two summarization methods that use changing and static contents of web pages downloaded at defined time intervals. The first uses a sliding window mechanism and the second is based on analyzing the time series of the document frequencies of terms. Additionally, we introduce a novel sentence selection algorithm designed for time-dependent scenarios such as temporal summarization.
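The abstract above mentions analyzing the time series of the document frequencies of terms but does not say how. The following is only a minimal sketch of what such a series could look like, assuming whitespace tokenization and a list of page snapshots per time step; the function names `df_time_series` and `burst_score`, and the peak-minus-mean score itself, are illustrative assumptions and not the authors' method.

```python
from collections import Counter

def df_time_series(snapshots):
    """Build {term: [document frequency at each time step]}.

    snapshots: one list of document strings per time step (assumed input shape).
    """
    per_step = []
    for docs in snapshots:
        df = Counter()
        for doc in docs:
            # Count each term at most once per document (document frequency).
            df.update(set(doc.lower().split()))
        per_step.append(df)
    vocab = set().union(*per_step)
    return {term: [df[term] for df in per_step] for term in vocab}

def burst_score(series):
    """Hypothetical score: how far the peak rises above the series mean."""
    return max(series) - sum(series) / len(series)

snapshots = [
    ["election news today", "sports news"],
    ["election results", "election news", "sports news"],
]
series = df_time_series(snapshots)
ranked = sorted(series, key=lambda t: burst_score(series[t]), reverse=True)
```

Terms whose document frequency is flat across time steps (here `news` and `sports`) receive a score of zero under this assumed measure, while terms that appear or grow between snapshots score higher.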
A Comparison of Keyword- and Keyterm-Based Methods for Automatic Web Site Summarization
- In Technical Report WS-04-01, Papers from the AAAI’04 Workshop on Adaptive Text Extraction and Mining, 2004
A Comparative Study on Key Phrase Extraction Methods in Automatic Web Site Summarization
- Journal of Digital Information Management, Special Issue on Web Information Retrieval, 2007
Cited by 2 (2 self)
Web Site Summarization is the process of automatically generating a concise and informative summary for a given Web site. It has gained more and more attention in recent years as effective summarization could lead to enhanced Web information retrieval systems such as searching for Web sites. Extraction-based approaches to Web site summarization rely on the extraction of the most significant sentences from the target Web site based on the density of a list of key phrases that best describe the entire Web site. In this work, we benchmark five alternative key phrase extraction methods, TFIDF, KEA, Keyword, Keyterm, and Mixture, in an automatic Web site summarization framework we previously developed. We investigate the performance of these underlying methods via a formal user study and demonstrate that Keyterm is the best choice for key phrase extraction while Mixture should be used to obtain key sentences. We also discuss why one method performs better than another and what could be done to further improve the summarization system.
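The idea of ranking sentences by the density of key phrases can be illustrated with a short sketch. The abstract does not define the density measure, so the definition below (share of a sentence's words covered by key-phrase occurrences) and the names `phrase_density` and `top_sentences` are assumptions made for illustration; none of the five benchmarked extraction methods is implemented here, and the key-phrase list is simply given as input.

```python
import re

def phrase_density(sentence, key_phrases):
    """Assumed measure: fraction of the sentence's words covered by key phrases."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    if not words:
        return 0.0
    text = " ".join(words)
    covered = 0
    for phrase in key_phrases:
        p_words = re.findall(r"[a-z0-9]+", phrase.lower())
        if not p_words:
            continue
        # Match the phrase as a run of whole words in the normalized sentence.
        pattern = r"\b" + r"\s".join(map(re.escape, p_words)) + r"\b"
        covered += len(re.findall(pattern, text)) * len(p_words)
    return min(covered / len(words), 1.0)

def top_sentences(sentences, key_phrases, k=2):
    """Return the k sentences with the highest key-phrase density."""
    return sorted(sentences,
                  key=lambda s: phrase_density(s, key_phrases),
                  reverse=True)[:k]

sentences = [
    "Machine learning improves web search.",
    "The weather is nice today.",
    "Web search engines use machine learning and web search logs.",
]
key_phrases = ["machine learning", "web search"]
summary = top_sentences(sentences, key_phrases, k=2)
```

With this input the sentence about the weather contains no key phrase and is dropped, so the two remaining sentences form the extract.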
A Hierarchical Model of Web Summaries
Cited by 2 (0 self)
We investigate the relevance of hierarchical topic models to represent the content of Web gists. We focus our attention on DMOZ, a popular Web directory, and propose two algorithms to infer such a model from its manually-curated hierarchy of categories. Our first approach, based on information-theoretic grounds, uses an algorithm similar to recursive feature selection. Our second approach is fully Bayesian and derived from the more general model, hierarchical LDA. We evaluate the performance of both models against a flat 1-gram baseline and show improvements in terms of perplexity over held-out data.
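For readers unfamiliar with the evaluation measure, the sketch below computes the held-out perplexity of a flat unigram (1-gram) model. It is an illustration only: the additive smoothing with parameter `alpha`, the choice to include held-out words in the vocabulary, and the name `unigram_perplexity` are assumptions, since the abstract does not describe how the baseline was smoothed.

```python
import math
from collections import Counter

def unigram_perplexity(train_tokens, heldout_tokens, alpha=1.0):
    """Perplexity of an additively smoothed unigram model on held-out tokens."""
    counts = Counter(train_tokens)
    # Assumption: the vocabulary covers both training and held-out words.
    vocab = set(train_tokens) | set(heldout_tokens)
    total = len(train_tokens) + alpha * len(vocab)
    log_prob = sum(math.log((counts[w] + alpha) / total) for w in heldout_tokens)
    # Perplexity is the exponential of the negative mean log-likelihood.
    return math.exp(-log_prob / len(heldout_tokens))

ppl = unigram_perplexity(["a", "a", "b", "b"], ["a", "b"])
```

Lower perplexity means the model assigns higher probability to the unseen text, which is the sense in which the abstract reports improvements over the baseline.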
Journal of Digital Information Management
ABSTRACT: Web Site Summarization is the process of automatically generating a concise and informative summary for a given Web site. It has gained more and more attention in recent years as effective summarization could lead to enhanced Web information retrieval systems such as searching for Web sites. Extraction-based approaches to Web site summarization rely on the extraction of the most significant sentences from the target Web site based on the density of a list of key phrases that best describe the entire Web site. In this work, we benchmark five alternative key phrase extraction methods, TFIDF, KEA, Keyword, Keyterm, and Mixture, in an automatic Web site summarization framework we previously developed. We investigate the performance of these underlying methods via a formal user study and demonstrate that Keyterm is the best choice for key phrase extraction while Mixture should be used to obtain key sentences. We also discuss why one method performs better than another and what could be done to further improve the summarization system. Categories and Subject Descriptors: H.3.1 [Content Analysis and Indexing]: Linguistic processing; I.2.7
Mind Maps and Information Retrieval
Mind maps are used by millions of people. In this paper we present how information retrieval on mind maps could be used to enhance expert search, document summarization, keyword-based search engines, document recommender systems and determining word relatedness. For instance, words in a mind map could be used for creating a skill profile of the mind map's author and hence enhance expert search. This paper is a research-in-progress paper, which means that no research results are presented, only ideas. Keywords: data mining, information retrieval, mind maps, expert search, document clustering, document classification

