A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering
, 2005
Cited by 54 (3 self)
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hierarchy of labeled folders. The hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can navigate through the hierarchy driven by their search needs. This is especially useful for informative, polysemous and poor queries.
Web search results clustering in Polish: experimental evaluation of Carrot
- In IIS03
, 2003
Cited by 10 (2 self)
In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system named Carrot. The algorithm under consideration, Suffix Tree Clustering (STC), has been acknowledged as very efficient when applied to English. We present conclusions from its experimental application to Polish, indicating fragile areas where the algorithm seems to fail due to specific properties of the input data. We indicate that the characteristics of the produced clusters (number, value), unlike in English, strongly depend on the pre-processing phase. We also attempt to investigate the influence of two primary STC parameters, merge threshold and minimum base cluster score, on the number and quality of results. Finally, we introduce two approaches to efficient, approximate stemming of Polish words: a quasi-stemmer and an automaton-based method.
Mobile Information Retrieval with Search Results Clustering: Prototypes and Evaluations
- Journal of American Society for Information Science and Technology (JASIST)
, 2009
Cited by 6 (3 self)
Web searches from mobile devices such as PDAs and cell phones are becoming increasingly popular. However, the traditional list-based search interface paradigm does not scale well to mobile devices due to their inherent limitations. In this article, we investigate the application of search results clustering, used with some success for desktop computer searches, to the mobile scenario. Building on CREDO (Conceptual Reorganization of Documents), a Web clustering engine based on concept lattices, we present its mobile versions Credino and SmartCREDO, for PDAs and cell phones, respectively. Next, we evaluate the retrieval performance of the three prototype systems. We measure the effectiveness of their clustered results compared to a ranked list of results on a subtopic retrieval task, by means of the device-independent notion of subtopic reach time together with a reusable test collection built from Wikipedia ambiguous entries. Then, we make a cross-comparison of methods (i.e., clustering and ranked list) and devices (i.e., desktop, PDA, and cell phone), using an interactive information-finding task performed by external participants. The main finding is that clustering engines are a viable complementary approach to plain search engines both for desktop and mobile searches especially, but not only, for multitopic informational queries.
Cluster generation and cluster labelling for web snippets: A fast and accurate hierarchical solution
- In Proceedings of the 13th Symposium on String Processing and Information Retrieval (SPIRE 2006)
, 2006
Cited by 4 (1 self)
This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted “external” metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms.
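For readers unfamiliar with the furthest-point-first heuristic for k-center clustering named in this abstract, the following is a minimal sketch of the textbook greedy procedure. It is not Armil's "fast version"; the function name `furthest_point_first`, the Euclidean metric, and the choice of index 0 as the first center are assumptions made purely for illustration.

```python
from math import dist


def furthest_point_first(points, k):
    """Greedy k-center sketch: returns (center indices, owner index per point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [0]  # assumption: start from an arbitrary point (index 0)
    nearest = [dist(p, points[0]) for p in points]  # distance to closest center
    owner = [0] * len(points)  # index of the closest center for each point
    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        new = max(range(len(points)), key=nearest.__getitem__)
        centers.append(new)
        # Reassign every point that is closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[new])
            if d < nearest[i]:
                nearest[i] = d
                owner[i] = new
    return centers, owner
```

Each round costs one distance computation per point, so the whole run needs on the order of n·k distance evaluations for n points.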
Conceptual Clustering Using Lingo Algorithm: Evaluation on Open Directory Project Data
- In IIPWM04
, 2004
Cited by 3 (0 self)
The search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search hits list returned from a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use the Open Directory Project (ODP) as a source of high-quality narrow-topic document references and mix them into several multi-topic test sets for the algorithm. We then compare the clusters acquired from Lingo to the expected set of ODP categories mixed in the input. Finally, we discuss observations from the experiment, highlighting the algorithm's strengths and weaknesses, and conclude with research directions for the future.
Combining Statistics and Semantics for Word and Document Clustering
- In Ontology Learning Workshop, IJCAI’01
, 2001
Cited by 2 (1 self)
A new approach for constructing pseudo-keywords, referred to as Sense Units, is proposed. Sense Units are obtained by a word clustering process, where the underlying similarity reflects both statistical and semantic properties, respectively detected through Latent Semantic Analysis and WordNet.
Inter-document similarity in web searches
, 2004
Cited by 1 (1 self)
Self-organising maps for hierarchical tree view document clustering using contextual information
- PROCEEDINGS OF THE
, 2002
Cited by 1 (1 self)
In this paper we propose an effective method to cluster documents into a dynamically built taxonomy of topics, directly extracted from the documents. We take into account short contextual information within the text corpus, which is weighted by importance and used as input to a set of independently spun growing Self-Organising Maps (SOM). This work shows an increase in precision and labelling quality in the hierarchy of topics, using these indexing units. The use of the tree structure over sets of conventional two-dimensional maps creates topic hierarchies that are easy to browse and understand, in which the documents are stored based on their content similarity.
Clustering Web Search Results with Maximum Spanning Trees
Cited by 1 (0 self)
We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves classical search result clustering methods in terms of both clustering quality and degree of diversification.
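As a rough illustration of the maximum spanning tree step mentioned in this abstract, the sketch below runs Kruskal's algorithm with edges taken in decreasing weight order. The function name `maximum_spanning_tree`, the `(weight, word, word)` edge format, and the toy co-occurrence weights are hypothetical; how the paper builds its co-occurrence graph and turns the tree into word senses is not shown here.

```python
def maximum_spanning_tree(nodes, edges):
    """Kruskal's algorithm over edges sorted by decreasing weight.

    edges: iterable of (weight, u, v) tuples; returns a list of (u, v, weight).
    """
    parent = {n: n for n in nodes}  # union-find forest, one tree per node

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it cannot close a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Toy co-occurrence graph (hypothetical weights, for illustration only).
words = ["dog", "breed", "puppy", "linux", "kernel"]
cooccurrence = [
    (0.9, "dog", "breed"),
    (0.8, "dog", "puppy"),
    (0.7, "breed", "puppy"),
    (0.6, "linux", "kernel"),
    (0.1, "puppy", "linux"),
]
mst = maximum_spanning_tree(words, cooccurrence)
```

In this toy graph the tree keeps the strong links inside each word group and only one weak link between the groups, which is the intuition behind using such a tree to separate senses.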
A Scalable Algorithm for High-Quality Clustering of Web Snippets
We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical k-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data makes the poorly scalable, traditional clustering methods unsuitable.
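One common way to realise a triangle-inequality filter on top of furthest-point-first is sketched below; it is an assumption about the general idea, not a reproduction of the paper's scheme. If a point p is at distance d(p, c) from its current center c, and the new center c' satisfies d(c, c') >= 2·d(p, c), then d(p, c') >= d(c, c') - d(p, c) >= d(p, c), so p cannot move and its distance to c' need not be computed. The name `fpf_filtered`, the Euclidean metric, and the `skipped` counter are illustrative.

```python
from math import dist


def fpf_filtered(points, k):
    """Furthest-point-first with a triangle-inequality skip test.

    Returns (center indices, owner index per point, skipped distance count).
    """
    n = len(points)
    centers = [0]  # assumption: arbitrary first center (index 0)
    owner = [0] * n
    nearest = [dist(p, points[0]) for p in points]
    skipped = 0
    while len(centers) < k:
        new = max(range(n), key=nearest.__getitem__)
        # Distances from the new center to every existing center.
        to_center = {c: dist(points[c], points[new]) for c in centers}
        centers.append(new)
        for i, p in enumerate(points):
            # d(c, new) >= 2 * d(p, c) implies d(p, new) >= d(p, c): skip p.
            if to_center[owner[i]] >= 2 * nearest[i]:
                skipped += 1
                continue
            d = dist(p, points[new])
            if d < nearest[i]:
                nearest[i] = d
                owner[i] = new
    return centers, owner, skipped
```

The filter never changes the resulting clusters; it only avoids distance computations whose outcome is already determined, at the cost of one extra distance per existing center in each round.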

