Carrot²: Design of a Flexible and Efficient Web Information Retrieval Framework
In: Proceedings of the Third International Atlantic Web Intelligence Conference, AWIC-2005, Łódź, Poland. Volume 3528 of Lecture Notes in Computer Science, 2005
Cited by 6 (1 self)
Abstract. In this paper we present the design goals and implementation outline of Carrot², an open source framework for rapid development of applications dealing with Web Information Retrieval and Web Mining. The framework has been written from scratch keeping in mind flexibility and efficiency of processing. We show two software architectures that meet the requirements of these two aspects and provide evidence of their use in clustering of search results. We also discuss the importance and advantages of contributing and integrating the results of scientific projects with the open source community.
Keywords: Information Retrieval, Clustering, Systems Design.

Improving Quality of Search Results Clustering with Approximate Matrix Factorisations
Cited by 5 (1 self)
Abstract. In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We compare four different factorisations (SVD, NMF, LNMF and K-Means/Concept Decomposition) with respect to topic separation capability, outlier detection and label quality. We also compare our approach with two other clustering algorithms: Suffix Tree Clustering (STC) and Tolerance Rough Set Clustering (TRC). For our experiments we use the standard merge-then-cluster approach based on the Open Directory Project web catalogue as a source of human-clustered document summaries.

