A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering (2005)

by Paolo Ferragina, Antonio Gulli
Results 1 - 10 of 33

Term Ranking for Clustering Web Search Results

by Fatih Gelgi
Cited by 7 (2 self)
... searches poses unique challenges. First, we show that one cannot readily import the frequency-based feature ranking to cluster the web search results as in the text document clustering. Next, we present TermRank, a variation of the PageRank algorithm based on a relational graph representation of the content of web document collections. TermRank achieves desirable ranking of discriminative terms higher than the ambiguous terms, and ranking ambiguous terms higher than common terms. We experiment with two clustering algorithms to demonstrate the efficacy of TermRank. TermRank is shown to perform substantially better than frequency-based classical methods.

Mobile Information Retrieval with Search Results Clustering: Prototypes and Evaluations

by Claudio Carpineto, Stefano Mizzaro, Giovanni Romano, Matteo Snidero - Journal of the American Society for Information Science and Technology (JASIST), 2009
Cited by 6 (3 self)
Web searches from mobile devices such as PDAs and cell phones are becoming increasingly popular. However, the traditional list-based search interface paradigm does not scale well to mobile devices due to their inherent limitations. In this article, we investigate the application of search results clustering, used with some success for desktop computer searches, to the mobile scenario. Building on CREDO (Conceptual Reorganization of Documents), a Web clustering engine based on concept lattices, we present its mobile versions Credino and SmartCREDO, for PDAs and cell phones, respectively. Next, we evaluate the retrieval performance of the three prototype systems. We measure the effectiveness of their clustered results compared to a ranked list of results on a subtopic retrieval task, by means of the device-independent notion of subtopic reach time together with a reusable test collection built from Wikipedia ambiguous entries. Then, we make a cross-comparison of methods (i.e., clustering and ranked list) and devices (i.e., desktop, PDA, and cell phone), using an interactive information-finding task performed by external participants. The main finding is that clustering engines are a viable complementary approach to plain search engines both for desktop and mobile searches, especially, but not only, for multitopic informational queries.

Privacy-enhancing personalized web search

by Yabo Xu, 2007
Cited by 5 (0 self)
Personalized web search is a promising way to improve search quality by customizing search results for people with individual information goals. However, users are uncomfortable with exposing private preference information to search engines. On the other hand, privacy is not absolute, and often can be compromised if there is a gain in service or profitability to the user. Thus, a balance must be struck between search quality and privacy protection. This paper presents a scalable way for users to automatically build rich user profiles. These profiles summarize a user’s interests into a hierarchical organization according to specific interests. Two parameters for specifying privacy requirements are proposed to help the user to choose the content and degree of detail of the profile information that is exposed to the search engine. Experiments showed that the user profile improved search quality when compared to standard MSN rankings. More importantly, results verified our hypothesis that a significant improvement on search quality can be achieved by only sharing some higher-level user profile information, which is potentially less sensitive than detailed personal information.

PolyNews: Delivering Multiple Aspects of News to Mitigate Media Bias

by Souneil Park, Seungwoo Kang, Junehwa Song, 2006
Cited by 5 (0 self)
The bias of news media is an inherent flaw of the news production process, spanning news gathering, writing, and editing stages. Producer’s subjective valuation, wittingly or unwittingly, takes place during the daily production process. The resulting bias often causes a sharp increase in political polarization and in the cost of conflict on social issues such as the Iraq war [7]. With the rapid growth of the Internet news media, it gets very difficult, if not impossible, for readers to have penetrating views on realities against such bias. We present PolyNews, a novel Internet news service framework aiming at mitigating the effect of media bias. PolyNews automatically creates and promptly provides readers with multiple classified viewpoints on a news event of interest. As such, it effectively helps readers understand a fact from a plurality of viewpoints and formulate their own, more balanced viewpoints free from specific biased views. The proposed focus-based clustering is realized through two important clustering steps, i.e., news structure-based clustering and collaborative clustering. We [...] the focus of an article.

Building an Open Source Meta-Search Engine

by A. Gulli, A. Signorini, 2005
Cited by 4 (1 self)
In this short paper we introduce Helios, a flexible and efficient open source meta-search engine. Helios currently runs on top of 18 search engines (in Web, Books, News, and Academic publication domains), but additional search engines can be easily plugged in. We also report some performance figures measured during its development.

The Anatomy of a News Search Engine

by A. Gulli, 2005
Cited by 4 (0 self)
Today, news browsing and searching is one of the most important Internet activities. This paper introduces a general framework to build a News search engine by describing Velthune, an academic News search engine available online.

Cluster generation and cluster labelling for web snippets: A fast and accurate hierarchical solution

by Marco Pellegrini, Marco Maggini, Fabrizio Sebastiani - In Proceedings of the 13th Symposium on String Processing and Information Retrieval (SPIRE 2006), 2006
Cited by 4 (1 self)
This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted “external” metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms.

CatS: A classification-powered meta-search engine

by Mirjana Ivanović - In: Advances in Web Intelligence and Data Mining. Studies in Computational Intelligence 23, Springer-Verlag, 2006
Cited by 3 (0 self)
CatS is a meta-search engine that utilizes text classification techniques to improve the presentation of search results. After posting a query, the user is offered an opportunity to refine the results by browsing through a category tree derived from the dmoz Open Directory topic hierarchy. This paper describes some key aspects of the system (including HTML parsing, classification and displaying of results), outlines the text categorization experiments performed in order to choose the right parameters for classification, and puts the system into the context of related work on (meta-)search engines. The approach of using a separate category tree represents an extension of the standard relevance list, and provides a way to refine the search on need, offering the user a non-imposing, but potentially powerful tool for locating needed information quickly and efficiently. The current implementation of CatS may be considered a baseline, on top of which many enhancements are possible.

Topical Query Decomposition

by Francesco Bonchi, Debora Donato, Carlos Castillo, Aristides Gionis
Cited by 3 (0 self)
We introduce the problem of query decomposition, where we are given a query and a document retrieval system, and we want to produce a small set of queries whose union of resulting documents corresponds approximately to that of the original query. Ideally, these queries should represent coherent, conceptually well-separated topics. We provide an abstract formulation of the query decomposition problem, and we tackle it from two different perspectives. We first show how the problem can be instantiated as a specific variant of a set cover problem, for which we provide an efficient greedy algorithm. Next, we show how the same problem can be seen as a constrained clustering problem, with a very particular kind of constraint, i.e., clustering with predefined clusters. We develop a two-phase algorithm based on hierarchical agglomerative clustering followed by dynamic programming. Our experiments, conducted on a set of actual queries in a Web scale search engine, confirm the effectiveness of the proposed solutions.

MakeMyPage: Social Media Meets Automatic Content Generation

by Francisco Iacobelli, Kristian Hammond, Larry Birnbaum
Cited by 2 (2 self)
Finding out about a topic online can be time consuming. It involves visiting multiple news sites, encyclopedia entries, video repositories and other resources while discarding irrelevant information. MakeMyPage aims to speed this process by combining automatic aggregation of information with social media to build web pages with images, videos and links to important information about a topic. MakeMyPage uses automatic aggregation to provide the initial content of the web pages. This content is organized by type: blogs, news, web links, images, video and a main article. MakeMyPage creates a web page by selecting a few items from each category, plus links to more resources within it. Users can vote on the links and media they like best for a given topic and, based on these votes, the system promotes them to and within the main web page. MakeMyPage can be thought of as a collection of wiki pages where people enhance automatically generated content not by editing the text in it, but by voting and suggesting new links. The system’s focus is on the organization of content that is genuinely useful and on point. MakeMyPage continuously tracks popular search queries and maintains a database of web pages about these topics.