Collecting community wisdom: integrating social search & social navigation
- In IUI ’07: Proceedings of the 12th international conference on Intelligent user interfaces, 2007
Cited by 13 (5 self)
The goal of this paper is to detail the integration of two “social Web” technologies – social search and social navigation – and to highlight the benefits of such integration on two levels. Firstly, both technologies harvest and harness “community wisdom” and in an integrated system each of the search and navigation components can benefit from the additional community wisdom gathered by the other when assisting users to locate relevant information. Secondly, by integrating search and browsing we facilitate the development of a unique interface that effectively blends search and browsing functionality as part of a seamless social information access service. This service allows users to effectively combine their search and browsing behaviors. In this paper we will argue that this integration provides significantly more than the simple sum of the parts. ACM Classification: H.3.1 [Content Analysis and Indexing]: Indexing method; H.3.7 [Digital Libraries]: User issues;
Web search results clustering in Polish: experimental evaluation of Carrot
- In IIS03, 2003
Cited by 10 (2 self)
In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system named Carrot. The algorithm we consider, Suffix Tree Clustering (STC), has been acknowledged as being very efficient when applied to English. We present conclusions from its experimental application to Polish, indicating fragile areas where the algorithm seems to fail due to specific properties of the input data. We indicate that the characteristics of the produced clusters (number, value), unlike in English, strongly depend on the pre-processing phase. We also attempt to investigate the influence of two primary STC parameters, merge threshold and minimum base cluster score, on the number and quality of results. Finally, we introduce two approaches to efficient, approximate stemming of Polish words: a quasi-stemmer and an automaton-based method.
M.: Cluster generation and cluster labelling for web snippets: A fast and accurate hierarchical solution
- In Proceedings of the 13th Symposium on String Processing and Information Retrieval (SPIRE 2006), 2006
Cited by 4 (1 self)
This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted “external” metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms.
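For readers unfamiliar with the furthest-point-first heuristic named in this abstract, the following is a minimal sketch of the basic greedy version for k-center clustering. It is not Armil's "fast version", whose details are not given here. The function names, the choice of the first point as the initial center, and the use of Euclidean distance on plain numeric tuples are all assumptions made for illustration; a snippet clusterer would use a distance suited to text.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance (an assumed metric, chosen only for illustration)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def furthest_point_first(points: List[Point], k: int) -> Tuple[List[int], List[int]]:
    """Basic greedy furthest-point-first heuristic for k-center.

    Returns (centers, assign): centers holds indices into `points`, and
    assign[i] is the position in `centers` of the center nearest to point i.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [0]  # assumption: start from the first point (any point works)
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [euclidean(p, points[0]) for p in points]
    assign = [0] * len(points)

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        # Re-assign any point that is now closer to the new center.
        for i, p in enumerate(points):
            d = euclidean(p, points[nxt])
            if d < dist[i]:
                dist[i] = d
                assign[i] = len(centers) - 1
    return centers, assign


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
    print(furthest_point_first(pts, 3))
```

Each round costs one pass over the points, so the sketch runs in O(nk) distance computations. It always produces disjoint clusters, since every point is assigned to exactly one center.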
on Computational Linguistics Contents
- 2004
Workshop organization
Workshop Timetable
Introduction to ROMAND 2004, Vincenzo Pallotta and Amalia Todirascu
Robust models of human parsing, Frank Keller
Document retro-conversion for personalized electronic reedition
In this paper, we propose a generic framework to store, retrieve, transform and present mixed sets of native and virtual documents. We intend to use or develop specific tools organized in a global architecture, from document analysis and capture, through document retrieval and classification-categorization, to full generation of personal sets of documents corresponding to the user’s specific needs and profile. The first step concerns document preparation and formal analysis. The second step adds semantic metadata, content indexing, and structure-semantic analysis. The third step helps the user assemble personalized documents. The research is based on large domain-specific sets of documents, for example European Union law documents (many millions, many file formats, in twenty official languages).
An Integrated System for Building Enterprise Taxonomies
- 2007
Although considerable research has been conducted in the field of hierarchical text categorization, little has been done on automatically collecting a labeled corpus for building hierarchical taxonomies. In this paper, we propose an automatic method of collecting training samples to build hierarchical taxonomies. In our method, the category node is initially defined by some keywords, the web search engine is then used to construct a small set of labeled documents, and a topic tracking algorithm with keyword-based content normalization is applied to enlarge the training corpus on the basis of the seed documents. We also design a method to check the consistency of the collected corpus. The above steps produce a flat category structure which contains all the categories for building the hierarchical taxonomy. Next, a linear discriminant projection approach is used to construct more meaningful intermediate levels of hierarchies in the generated flat set of categories. Experimental results show that the training corpus is good enough for statistical classification methods.
Carrot Search
- 2009
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Some directions for future research are finally presented.

