Web search results clustering in Polish: experimental evaluation of Carrot
In IIS03, 2003
Abstract - Cited by 10 (2 self)
In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system named Carrot. The algorithm under consideration, Suffix Tree Clustering (STC), has been acknowledged as being very efficient when applied to English. We present conclusions from its experimental application to Polish, indicating fragile areas where the algorithm seems to fail due to specific properties of the input data. We indicate that the characteristics of the produced clusters (number, value), unlike in English, strongly depend on the pre-processing phase. We also attempt to investigate the influence of two primary STC parameters, merge threshold and minimum base cluster score, on the number and quality of results. Finally, we introduce two approaches to efficient, approximate stemming of Polish words: a quasi-stemmer and an automaton-based method.

