Web search results clustering in Polish: experimental evaluation of Carrot (2003)

by Dawid Weiss, Jerzy Stefanowski
Venue: In IIS03

Results 1 - 8 of 8

A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering

by Paolo Ferragina, Antonio Gulli, 2005
Cited by 54 (3 self)
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hierarchy of labeled folders. The hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can navigate through the hierarchy driven by their search needs. This is especially useful for informative, polysemous and poor queries.

Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition

by Stanislaw Osinski, Jerzy Stefanowski, Dawid Weiss, 2004
Cited by 20 (3 self)
The search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo, a novel algorithm for clustering search results, which emphasizes cluster description quality. We describe methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phrase extraction using suffix arrays. Finally, we discuss results acquired from an empirical evaluation of the algorithm.

A topology-driven approach to the design of web meta-search clustering engines

by Emilio Di Giacomo, Walter Didimo, Luca Grilli - In Theory and Practice of Computer Science (SOFSEM ’05), volume 3381 of Lecture Notes in Computer Science, 2005
Cited by 2 (1 self)
The paradigm adopted by classical Web search engines to output the results of a query is often inadequate. It typically consists of a ranked list of URLs, which may be very long and difficult to browse for the interested user. Recently, a lot of attention has been devoted to the design of Web meta-search clustering engines. These systems support the user by grouping the URLs returned by a search engine into distinct semantic categories, which are organized in a hierarchy; each category is properly labeled with a sentence that reflects its topics. However, even the most effective Web meta-search engines usually end up presenting many “meaningful” categories together with a few “inexpressive” categories on some specific queries. In this paper we describe a novel topology-driven approach to the design of a Web meta-search clustering engine. By this approach the set of URLs is modeled as a suitable graph and the hierarchy of categories is obtained by variants of classical graph-clustering algorithms. The topology-driven approach turns out to be comparable with traditional text-based strategies for the definition of the cluster hierarchy. In addition, our approach makes it natural to use graph visualization techniques to support the user in handling inexpressive labels. Namely, categories with inexpressive labels can be visually related to more meaningful ones.

Inter-document similarity in web searches

by Bruno Martins, Bruno Emanuel, Da Graça Martins, Mestre Em Informática, Mário Gaspar, Da Silva, José Luís, Cabral De, Moura Borges, André Osório, E Cruz, De Azevedo Falcão, Thibault Nicolas Langlois, 2004
Cited by 1 (1 self)
are stored in PDF, with the report number as filename. Alternatively, reports are available by post from the above address. Supervisor: Jury:

Carrot 2 and Language Properties

by Jerzy Stefanowski, Dawid Weiss - In Proceedings of AWIC-2003, First International Atlantic Web Intelligence Conference, 2003
This paper relates to a technique for improving results visualization in Web search engines known as search results clustering. We introduce Carrot 2, an open, extensible research system for the examination and development of search results clustering algorithms. We also discuss attempts to measure the quality of discovered clusters and demonstrate the results of our experiments with quality assessment when an inflectionally rich language (Polish) is clustered using a representative algorithm, Suffix Tree Clustering.

The Anatomy of a Hierarchical Clustering Engine for Web-Page, News and Book Snippets

by Paolo Ferragina, Antonio Gullì
In this paper, we investigate the web snippet hierarchical clustering problem in its full extent by devising an algorithmic solution, and a software prototype called SnakeT (accessible at http://roquefort.di.unipi.it/), that: (1) draws the snippets from 16 Web search engines, the Amazon collection of books a9.com, the news of Google News and the blogs of Blogline; (2) builds the clusters on-the-fly (ephemeral clustering [10]) in response to a user query without adopting any pre-defined organization in categories; (3) labels the clusters with sentences of variable length, drawn from the snippets and possibly missing some terms, provided they are not too many;
Footnote: Partially supported by the Italian MIUR projects ALINWEB, ECD, the "Italian Grid Project", "Distributed high-performance platform", and by the Italian Registry of ccTLD.it. Contact: {ferragina,gulli}@di.unipi.it
Figure 1. SnakeT's Clusters for "Java"

STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching

by Stella Kopidaki, Panagiotis Papadakos, Yannis Tzitzikas
Results clustering in Web searching is useful for providing users with overviews of the results, thus allowing them to restrict their focus to the desired parts. However, the task of deriving single-word or multiple-word names for the clusters (usually referred to as cluster labeling) is difficult, because they have to be syntactically correct and predictive. Moreover, efficiency is an important requirement since results clustering is an online task. Suffix Tree Clustering (STC) is a clustering technique where search results (mainly snippets) can be clustered fast (in linear time) and incrementally, and each cluster is labeled with a phrase. In this paper we introduce: (a) a variation of STC, called STC+, with a scoring formula that favors phrases that occur in document titles and that differs in the way base clusters are merged, and (b) a novel algorithm called NM-STC that results in hierarchically organized clusters. The comparative user evaluation showed that both STC+ and NM-STC are significantly preferred over STC, and that NM-STC is about two times faster than STC and STC+.

Carrot Search

by Claudio Carpineto, Stanislaw Osiński, Giovanni Romano, Dawid Weiss, 2009
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Some directions for future research are finally presented.