Results 1 - 8 of 8
A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering
2005
Cited by 54 (3 self)
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hierarchy of labeled folders. The hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can navigate through the hierarchy driven by their search needs. This is especially useful for informative, polysemous and poor queries.
Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition
2004
Cited by 20 (3 self)
The search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo, a novel algorithm for clustering search results, which emphasizes cluster description quality. We describe methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phrase extraction using suffix arrays. Finally, we discuss results acquired from an empirical evaluation of the algorithm.
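As an illustration of the frequent phrase extraction step mentioned in this abstract, the following is a minimal sketch of finding repeated word sequences with a suffix array. It is not Lingo's implementation: the function name `frequent_phrases`, the parameters `min_count` and `max_len`, and the whitespace tokenization are assumptions made for this example, and the suffix array is built by naive sorting rather than by a linear-time construction.

```python
def frequent_phrases(snippets, min_count=2, max_len=3):
    """Return {phrase (tuple of words): occurrences} for repeated phrases.

    Hypothetical helper for illustration only; not taken from Lingo.
    """
    sentinel = "\x00"
    tokens = []
    for i, text in enumerate(snippets):
        tokens.extend(text.lower().split())
        # A unique sentinel per snippet keeps phrases from crossing snippets.
        tokens.append(f"{sentinel}{i}")

    # Suffix array: start positions ordered by the token sequence that follows.
    # Naive construction by sorting slices; fine for a handful of snippets.
    suffix_array = sorted(range(len(tokens)), key=lambda i: tokens[i:])

    result = {}
    for k in range(1, max_len + 1):
        prev, run = None, 0
        # Suffixes sharing the same first k tokens are adjacent in the array,
        # so each phrase is counted as one contiguous run. The trailing None
        # flushes the last run.
        for start in suffix_array + [None]:
            phrase = None
            if start is not None:
                cand = tuple(tokens[start:start + k])
                if len(cand) == k and not any(t.startswith(sentinel) for t in cand):
                    phrase = cand
            if phrase is not None and phrase == prev:
                run += 1
            else:
                if prev is not None and run >= min_count:
                    result[prev] = run
                prev, run = phrase, (1 if phrase is not None else 0)
    return result
```

Calling `frequent_phrases` on a few snippets returns every phrase of up to `max_len` words that occurs at least `min_count` times; a real system would additionally filter stop words and incomplete phrases before using them as label candidates.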
A topology-driven approach to the design of web meta-search clustering engines
In Theory and Practice of Computer Science (SOFSEM ’05), volume 3381 of Lecture Notes in Computer Science, 2005
Cited by 2 (1 self)
The paradigm adopted by classical Web search engines to output the results of a query is often inadequate. It typically consists of a ranked list of URLs, which may be very long and difficult to browse for the interested user. Recently, a lot of attention has been devoted to the design of Web meta-search clustering engines. These systems support the user by grouping the URLs returned by a search engine into distinct semantic categories, which are organized in a hierarchy; each category is properly labeled with a sentence that reflects its topics. However, even the most effective Web meta-search engines usually end up presenting many “meaningful” categories together with a few “inexpressive” categories on some specific queries. In this paper we describe a novel topology-driven approach to the design of a Web meta-search clustering engine. By this approach the set of URLs is modeled as a suitable graph and the hierarchy of categories is obtained by variants of classical graph-clustering algorithms. The topology-driven approach turns out to be comparable with traditional text-based strategies for the definition of the cluster hierarchy. In addition, our approach makes it natural to use graph visualization techniques to support the user in handling inexpressive labels. Namely, categories with inexpressive labels can be visually related to more meaningful ones.
Inter-document similarity in web searches
2004
Cited by 1 (1 self)
are stored in PDF, with the report number as filename. Alternatively, reports are available by post from the above address. Supervisor: Jury:
Carrot 2 and Language Properties
In Proceedings of AWIC-2003, First International Atlantic Web Intelligence Conference, 2003
This paper relates to a technique of improving results visualization in Web search engines known as search results clustering. We introduce an open, extensible research system for the examination and development of search results clustering algorithms, Carrot 2. We also discuss attempts to measure the quality of discovered clusters and demonstrate results of our experiments with quality assessment when an inflectionally rich language (Polish) is clustered using a representative algorithm, Suffix Tree Clustering.
The Anatomy of a Hierarchical Clustering Engine for Web-Page, News and Book Snippets
In this paper, we investigate the web snippet hierarchical clustering problem in its full extent by devising an algorithmic solution, and a software prototype called SnakeT (accessible at http://roquefort.di.unipi.it/), that: (1) draws the snippets from 16 Web search engines, the Amazon collection of books a9.com, the news of Google News and the blogs of Blogline; (2) builds the clusters on-the-fly (ephemeral clustering [10]) in response to a user query without adopting any pre-defined organization in categories; (3) labels the clusters with sentences of variable length, drawn from the snippets and possibly missing some terms, provided they are not too many;
Footnote: Partially supported by the Italian MIUR projects ALINWEB, ECD, the "Italian Grid Project", "Distributed high-performance platform", and by the Italian Registry of ccTLD.it. Contact: {ferragina,gulli}@di.unipi.it
Figure 1. SnakeT's Clusters for "Java"
STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching
Results clustering in Web Searching is useful for providing users with overviews of the results and thus allowing them to restrict their focus to the desired parts. However, the task of deriving single-word or multiple-word names for the clusters (usually referred to as cluster labeling) is difficult, because they have to be syntactically correct and predictive. Moreover, efficiency is an important requirement since results clustering is an online task. Suffix Tree Clustering (STC) is a clustering technique where search results (mainly snippets) can be clustered fast (in linear time), incrementally, and each cluster is labeled with a phrase. In this paper we introduce: (a) a variation of the STC, called STC+, with a scoring formula that favors phrases that occur in document titles and differs in the way base clusters are merged, and (b) a novel algorithm called NM-STC that results in hierarchically organized clusters. The comparative user evaluation showed that both STC+ and NM-STC are significantly more preferred than STC, and that NM-STC is about two times faster than STC and STC+.
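To make the phrase-labeled clusters described in this abstract concrete, here is a rough sketch of the two core steps of baseline STC as it is commonly described: collecting base clusters (a phrase plus the snippets that contain it) and merging base clusters whose snippet sets overlap by more than half in both directions. It is an assumption-laden simplification: a plain dictionary of word n-grams stands in for the suffix tree, so it is not linear-time or incremental, and it shows neither the STC+ scoring and merging variants nor NM-STC. The names `base_clusters` and `merge_clusters` and their parameters are hypothetical.

```python
from itertools import combinations


def base_clusters(snippets, max_len=3, min_docs=2):
    """Map each phrase (tuple of words) to the set of snippet ids containing it.

    An n-gram dictionary replaces the suffix tree here (assumption for brevity).
    """
    index = {}
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for k in range(1, max_len + 1):
            for i in range(len(words) - k + 1):
                index.setdefault(tuple(words[i:i + k]), set()).add(doc_id)
    # Keep only phrases shared by at least min_docs snippets.
    return {p: docs for p, docs in index.items() if len(docs) >= min_docs}


def merge_clusters(base, threshold=0.5):
    """Merge base clusters that overlap by more than `threshold` both ways."""
    phrases = list(base)
    parent = list(range(len(phrases)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(phrases)), 2):
        docs_a, docs_b = base[phrases[a]], base[phrases[b]]
        shared = len(docs_a & docs_b)
        if shared / len(docs_a) > threshold and shared / len(docs_b) > threshold:
            parent[find(a)] = find(b)

    # Each connected component becomes one final cluster.
    groups = {}
    for i, phrase in enumerate(phrases):
        entry = groups.setdefault(find(i), {"labels": [], "docs": set()})
        entry["labels"].append(phrase)
        entry["docs"] |= base[phrase]
    return list(groups.values())
```

Each returned cluster carries the phrases that produced it in `labels`, which is where a labeling step would pick the phrase to display; snippets may belong to more than one cluster.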
Carrot Search
2009
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Some directions for future research are finally presented.

