Results 1 
4 of
4
Incremental Clustering and Dynamic Information Retrieval
, 1997
"... Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retri ..."
Abstract

Cited by 188 (4 self)
 Add to MetaCart
(Show Context)
Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have a provably good performance. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters.
Enhancing an Incremental Clustering Algorithm for Web Page Collections, Honours dissertation
, 2006
"... With the size and state of the Internet today, a good quality approach to organizing this mass of information is of great importance. Clustering web pages into groups of similar documents is one approach, but relies heavily on good feature extraction and document representation as well as a good clu ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
With the size and state of the Internet today, a good quality approach to organizing this mass of information is of great importance. Clustering web pages into groups of similar documents is one approach, but relies heavily on good feature extraction and document representation as well as a good clustering approach and algorithm. Due to the changing nature of the Internet, resulting in a dynamic dataset, an incremental approach is preferred. In this work we propose an enhanced incremental clustering approach to develop a better clustering algorithm that can help to better organize the information available on the Internet in an incremental fashion. Experiments show that the enhanced algorithm outperforms the original histogram based algorithm by up to 7.5%. 1.
Clustering for Dynamic Processing
"... Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. An algorithm for incremental clustering is introduced. The complexity and cost analysis of the algorithm together with an inves ..."
Abstract
 Add to MetaCart
(Show Context)
Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. An algorithm for incremental clustering is introduced. The complexity and cost analysis of the algorithm together with an investigation of its expected behavior are presented. Through empirical testing it is shown that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
Experiments on Incremental Clustering Fazli Can
"... Clustering of very large document databases is essential to reduce the spacehime complexity of information retrieval. The periodic updating of clusters is required due to the dynamic nature of databases. An algorithm for incremental clustering at discrete times is introduced, Its complexity and cost ..."
Abstract
 Add to MetaCart
(Show Context)
Clustering of very large document databases is essential to reduce the spacehime complexity of information retrieval. The periodic updating of clusters is required due to the dynamic nature of databases. An algorithm for incremental clustering at discrete times is introduced, Its complexity and cost analysis and an investigation of the expected behavior of the algorithm are provided. Through empirical testing, it is shown that the algorithm is achieving its purpose in terms of being cost effective, generating statistically valid clusters that are compatible with those of reclustering, and providing effective information retrieval.