FCM-Based Model Selection Algorithms for Determining the Number of Clusters
2004
Cited by 8 (1 self)
Clustering is an important research topic that has practical applications in many fields. It has been demonstrated that fuzzy clustering, using algorithms such as the fuzzy C-means (FCM), has clear advantages over crisp and probabilistic clustering methods. Like most clustering algorithms, however, FCM and its derivatives need the number of clusters in the given data set as one of their initializing parameters. The main goal of this paper is to develop an effective fuzzy algorithm for automatically determining the number of clusters. After a brief review of the relevant literature, we present a new algorithm for determining the number of clusters in a given data set and a new validity index for measuring the “goodness” of clustering. Experimental results and comparisons are given to illustrate the performance of the new algorithm.
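The fuzzy C-means iteration that this line of work builds on can be sketched as follows. This is the standard FCM alternating update (memberships, then weighted centers), not the paper's model-selection algorithm; the fuzzifier m = 2 and the optional `init` parameter are common illustrative choices:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, init=None, seed=0):
    """Standard fuzzy C-means: alternate membership and center updates.

    m > 1 is the fuzzifier (m = 2 is a common default). `init` optionally
    fixes the starting centers; otherwise c data points are drawn at random.
    """
    rng = np.random.default_rng(seed)
    centers = (np.asarray(init, dtype=float) if init is not None
               else X[rng.choice(len(X), size=c, replace=False)].astype(float))
    for _ in range(n_iter):
        # squared distances from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero at a center
        # membership update: u_ik proportional to (d2_ik)^(-1/(m-1)), rows sum to 1
        inv = d2 ** (-1.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
        # center update: mean of the data weighted by u_ik^m
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U
```

The model-selection problem the paper addresses enters precisely here: `c` must be supplied up front, and a validity index is needed to compare runs with different values of `c`.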
Self-organizing Maps as Substitutes for K-Means Clustering
2005
Cited by 8 (3 self)
One of the most widely used clustering techniques in GISc problems is the k-means algorithm. One of the most important issues in the correct use of k-means is the initialization procedure, which ultimately determines which part of the solution space will be searched. In this paper we briefly review different initialization procedures, and propose Kohonen's Self-Organizing Maps as the most convenient method, given the proper training parameters. Furthermore, we show that in the final stages of its training procedure the Self-Organizing Map algorithm is rigorously the same as the k-means algorithm. Thus we propose the use of Self-Organizing Maps as possible substitutes for the more classical k-means clustering algorithms.
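The claimed equivalence is easy to see in code: an online SOM step moves every code vector toward the input by an amount weighted by a neighborhood function around the winner, and once the neighborhood has shrunk to the winner alone it coincides with a MacQueen-style online k-means step. A minimal sketch (the 1-D map grid, Gaussian neighborhood, and learning-rate handling are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def som_update(weights, x, lr, sigma):
    """One online SOM step on a 1-D map of code vectors (rows of `weights`)."""
    d2 = ((weights - x) ** 2).sum(axis=1)
    win = int(d2.argmin())
    idx = np.arange(len(weights))
    # Gaussian neighborhood around the winner on the map grid
    h = np.exp(-((idx - win) ** 2) / (2 * sigma ** 2 + 1e-12))
    weights += lr * h[:, None] * (x - weights)
    return win

def online_kmeans_update(centers, x, lr):
    """One online (MacQueen-style) k-means step: move only the nearest center."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    win = int(d2.argmin())
    centers[win] += lr * (x - centers[win])
    return win
```

As sigma tends to zero, `h` becomes an indicator of the winning unit and the two updates are identical; this is the "final stages of training" regime the abstract refers to.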
Improvements to the scalability of multiobjective clustering
In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, IEEE, 2005
Cited by 8 (4 self)
In previous work, we have introduced a novel and highly effective approach to data clustering, based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [4, 5, 6]. In this paper, we make three modifications to the algorithm that improve its scalability to large data sets with high dimensionality and large numbers of clusters. Specifically, we introduce new initialization and mutation schemes that enable a more efficient exploration of the search space, and modify the null data model that is used as a basis for selecting the most significant solution from the Pareto front. The high performance of the resulting algorithm is demonstrated on a newly developed clustering test suite.
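Selecting a solution "from the Pareto front" presupposes extracting the non-dominated set of candidate partitionings. A minimal dominance filter for two minimization objectives (the pair-of-scores representation is an assumption for illustration, not the paper's encoding):

```python
def pareto_front(solutions):
    """Keep the solutions not dominated under two minimization objectives.

    A solution s is dominated if some t is no worse in both objectives
    and strictly better in at least one.
    """
    return [
        s for s in solutions
        if not any(
            t[0] <= s[0] and t[1] <= s[1] and (t[0] < s[0] or t[1] < s[1])
            for t in solutions
        )
    ]
```

The paper's null data model then supplies a baseline front against which the most significant member of this set is chosen.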
A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering
IEEE Trans. on Pattern Analysis and Machine Intelligence, 2006
Cited by 7 (1 self)
The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k cluster centers so as to minimize the sum of the squared Euclidean distances between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.
Index Terms: k-means algorithm, clustering, genetic algorithms, quadtrees, optimal partition, center selection.
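The objective the GA's evolved centers are evaluated against, and the Lloyd iteration k-means uses to descend it, follow directly from the description above. A generic sketch (not the paper's hyper-quadtree GA itself):

```python
import numpy as np

def wcss(X, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def lloyd_step(X, centers):
    """One Lloyd iteration: assign each point to its nearest center,
    then recompute each center as the mean of its assigned points."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new_centers = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
        for j in range(len(centers))
    ])
    return new_centers, labels
```

Each Lloyd step never increases the objective, which is exactly why the final partition depends so strongly on the initial centers: the iteration only descends into whatever local basin the initialization selects.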
Kboost: A Scalable Algorithm for High Quality Clustering of Microarray Gene Expression Data
TR IIT-2007-015, Istituto di Informatica e Telematica del CNR, 2007
Cited by 7 (3 self)
We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical k-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data makes the poorly scalable, traditional clustering methods unsuitable.
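The furthest-point-first primitive being augmented is Gonzalez's greedy heuristic for k-center: repeatedly pick the point furthest from the centers chosen so far. A minimal version without the paper's triangle-inequality filtering, on Euclidean data for simplicity:

```python
import numpy as np

def furthest_point_first(X, k, seed=0):
    """Gonzalez's furthest-point-first heuristic for k-center clustering.

    Greedily picks the point furthest from the centers chosen so far;
    this is a 2-approximation for the k-center objective in metric spaces.
    Returns the indices of the chosen centers.
    """
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]
    # distance from every point to its nearest chosen center
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    while len(centers) < k:
        nxt = int(dist.argmax())  # furthest point becomes the next center
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers
```

The `dist` array maintained here is also what the paper's filtering scheme exploits: the triangular inequality lets many of the distance recomputations in the `np.minimum` step be skipped entirely.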
Real-time compression for dynamic 3D environments
In MULTIMEDIA ’03: Proceedings of the eleventh ACM international conference on Multimedia, 2003
Cited by 7 (1 self)
The goal of tele-immersion has long been to enable people at remote locations to share a sense of presence. A tele-immersion system acquires the 3D representation of a collaborator’s environment remotely and sends it over the network, where it is rendered in the user’s environment. Acquisition, reconstruction, transmission, and rendering all have to be done in real time to create a sense of presence. With added commodity hardware resources, parallelism can increase the acquisition volume and reconstruction data quality while maintaining real-time performance. However, this is not as easy for rendering, since all of the data need to be combined into a single display. In this paper we present an algorithm to compress data from such 3D environments in real time to solve this imbalance. We expect the compression algorithm to scale comparably to the acquisition and reconstruction, reduce network transmission bandwidth, and reduce the rendering requirement for real-time performance. We have tested the algorithm using a synthetic office data set and have achieved a 5-to-1 compression for 22 depth streams.
A Fuzzy Clustering and Fuzzy Merging Algorithm
1999
Cited by 4 (0 self)
Some major problems in clustering are: i) finding the optimal number K of clusters; ii) assessing the validity of a given clustering; iii) permitting the classes to form natural shapes rather than forcing them into normed balls of the distance function; iv) preventing the order in which the feature vectors are read in from affecting the clustering; and v) preventing the order of merging from affecting the clustering. The k-means algorithm is the most efficient and easiest to implement and has known convergence, but it suffers from all of the above deficiencies. We employ a relatively large number K of uniformly randomly distributed initial prototypes and then thin them by deleting any prototype that is too close to another, in a manner that leaves fewer uniformly distributed prototypes. We then employ the k-means algorithm, eliminate empty and very small clusters, and iterate a process of computing a new type of fuzzy prototypes and reassigning the feature vectors until the prototypes become fixed. At that poi...
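The prototype-thinning step described above amounts to a greedy minimum-separation filter over the initial prototypes. A sketch (the distance threshold is an assumed parameter, and the greedy first-come-first-kept order is one simple way to realize "deleting any prototype that is too close to another"):

```python
import numpy as np

def thin_prototypes(prototypes, min_dist):
    """Greedily drop any prototype closer than min_dist to one already kept,
    leaving a sparser, roughly uniformly spread set of prototypes."""
    kept = []
    for p in prototypes:
        if all(np.linalg.norm(p - q) >= min_dist for q in kept):
            kept.append(p)
    return np.array(kept)
```

The surviving prototypes then seed the k-means stage, after which the small and empty clusters are pruned as the abstract describes.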
An Iterated Local Search Approach for Minimum Sum-of-Squares Clustering
Cited by 4 (0 self)
Since minimum sum-of-squares clustering (MSSC) is an NP-hard combinatorial optimization problem, applying techniques from global optimization appears to be promising for reliably clustering numerical data. In this paper, concepts of combinatorial heuristic optimization are considered for approaching the MSSC: an iterated local search (ILS) approach is proposed which is capable of finding (near-)optimum solutions very quickly. On gene expression data resulting from biological microarray experiments, it is shown that ILS outperforms multi-start k-means as well as three other clustering heuristics combined with k-means.
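An ILS for MSSC can be sketched as Lloyd's algorithm for the local-search component plus a perturb-and-accept loop around it. The center-relocation perturbation and improve-only acceptance rule below are generic ILS choices for illustration, not necessarily the paper's:

```python
import numpy as np

def kmeans_local_search(X, centers, n_iter=50):
    """Lloyd's algorithm as the local-search component; returns centers and MSSC cost."""
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, float(d2.min(axis=1).sum())

def iterated_local_search(X, k, rounds=20, seed=0):
    """ILS for MSSC: perturb the incumbent centers, re-run local search,
    and keep the perturbed solution only if it improves the objective."""
    rng = np.random.default_rng(seed)
    init = X[rng.choice(len(X), k, replace=False)].copy()
    centers, cost = kmeans_local_search(X, init)
    for _ in range(rounds):
        trial = centers.copy()
        # perturbation: relocate one random center onto a random data point
        trial[rng.integers(k)] = X[rng.integers(len(X))]
        trial, trial_cost = kmeans_local_search(X, trial)
        if trial_cost < cost:
            centers, cost = trial, trial_cost
    return centers, cost
```

The point of the outer loop is precisely the weakness noted for multi-start k-means: instead of restarting from scratch, each perturbation reuses most of the incumbent solution, so good partial structure survives between local searches.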
Intelligent choice of the number of clusters in k-means clustering: An experimental study with different cluster spreads
J. Classification, 2010
Cited by 4 (0 self)
The issue of determining “the right number of clusters” in K-Means has attracted considerable interest, especially in recent years. Cluster overlap appears to be the factor most affecting the clustering results. This paper proposes an experimental setting for comparing different approaches on data generated from Gaussian clusters with controlled parameters of between- and within-cluster spread to model different cluster overlaps. The setting allows for evaluating the centroid recovery on par with the conventional evaluation of cluster recovery. The subjects of our interest are two versions of the “intelligent” K-Means method, iK-Means, which find the right number of clusters one by one, extracting “anomalous patterns” from the data. We compare them with seven other methods, including Hartigan’s rule, the averaged Silhouette width, and the Gap statistic, under six different between- and within-cluster spread-shape conditions. There are several consistent patterns in the results of our experiments, such as that the right K is reproduced best by Hartigan’s rule – but not the clusters or their centroids. This leads us to propose an adjusted version of iK-Means, which performs well in the current experimental setting.
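The "anomalous pattern" extraction at the heart of iK-Means grows one cluster at a time around the point furthest from the fixed grand mean, removes it, and repeats; the number of sizable patterns suggests K. A sketch of that idea only (the convergence cap, singleton fallback, and `min_size` threshold are simplifying assumptions, not the paper's exact rules):

```python
import numpy as np

def anomalous_patterns(X, min_size=1):
    """One-by-one anomalous-pattern extraction in the spirit of iK-Means.

    The grand mean stays fixed as a reference; each pattern is seeded at
    the point furthest from it, grown by alternating assignment and mean
    updates against the reference, then removed from the data.
    """
    ref = X.mean(axis=0)                     # reference center, never updated
    remaining = X.copy()
    patterns = []
    while len(remaining):
        d_ref = ((remaining - ref) ** 2).sum(axis=1)
        c = remaining[int(d_ref.argmax())]   # furthest point seeds the pattern
        for _ in range(100):                 # alternate assignment / mean update
            member = ((remaining - c) ** 2).sum(axis=1) < d_ref
            new_c = remaining[member].mean(axis=0) if member.any() else c
            if np.allclose(new_c, c):
                break
            c = new_c
        member = ((remaining - c) ** 2).sum(axis=1) < d_ref
        if not member.any():                 # lone furthest point: singleton pattern
            member = d_ref == d_ref.max()
        patterns.append(remaining[member])
        remaining = remaining[~member]
    return [p for p in patterns if len(p) >= min_size]
```

The extracted pattern centroids can then initialize an ordinary k-means run with K set to the number of patterns, which is how the two iK-Means versions in the study obtain both K and the starting centroids.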
Weighted k-means for density-biased clustering
In DaWaK, 2005
Cited by 3 (0 self)
Clustering is a task of grouping data based on similarity. The popular k-means algorithm groups data by first assigning all data points to the closest clusters, then determining the cluster means. The algorithm repeats these two steps until it has converged. We propose a variation called weighted k-means to improve the clustering scalability. To speed up the clustering process, we develop reservoir-biased sampling as an efficient data reduction technique, since it performs a single scan over a data set. Our algorithm has been designed to group data of mixture models. We present an experimental evaluation of the proposed method.
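The single-scan data-reduction step rests on reservoir sampling. Classic Algorithm R, shown below, keeps a uniform sample of k items from a stream in one pass; the density biasing and point weighting the paper layers on top are not shown:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Algorithm R: uniform sample of k items from a stream in a single pass.

    After seeing i+1 items, each item is in the reservoir with
    probability k / (i + 1).
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = rng.randrange(i + 1)     # replace a random slot with prob k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Running k-means on such a sample, with each sampled point weighted by how much data it stands in for, is the general shape of the scalability idea the abstract describes.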