Results 11–20 of 432
Trajectory Clustering with Mixtures of Regression Models
 Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999
"... In this paper we address the problem of clustering trajectories, namely sets of short sequences of data measured as a function of a dependent variable such as time. Examples include storm path trajectories, longitudinal data such as drug therapy response, functional expression data in computational ..."
Abstract

Cited by 122 (8 self)
In this paper we address the problem of clustering trajectories, namely sets of short sequences of data measured as a function of a dependent variable such as time. Examples include storm path trajectories, longitudinal data such as drug therapy response, functional expression data in computational biology, and movements of objects or individuals in video sequences. Our clustering algorithm is based on a principled method for probabilistic modelling of a set of trajectories as individual sequences of points generated from a finite mixture model consisting of regression model components. Unsupervised learning is carried out using maximum likelihood principles. Specifically, the EM algorithm is used to cope with the hidden data problem (i.e., the cluster memberships). We also develop generalizations of the method to handle nonparametric (kernel) regression components as well as multidimensional outputs. Simulation results comparing our method with other clustering methods such as K-means...
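The EM scheme this abstract describes can be illustrated with a minimal numpy sketch (a toy example under assumed data, not the authors' implementation): trajectories are generated from two linear-regression components, and EM alternates between trajectory-level responsibilities and weighted least squares. All parameter values and variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups of synthetic trajectories observed on a shared time grid.
t = np.linspace(0, 1, 10)
X = np.column_stack([np.ones_like(t), t])        # design matrix [1, t]
true_betas = np.array([[0.0, 2.0], [1.0, -2.0]]) # (intercept, slope) per cluster
labels = np.repeat([0, 1], 20)
Y = true_betas[labels] @ X.T + rng.normal(0, 0.1, (40, 10))

K, n = 2, Y.shape[0]
mix = np.full(K, 1.0 / K)                        # mixing weights
betas = np.array([[0.0, 1.0], [1.0, 0.0]])       # rough initial regression lines
sigma2 = np.full(K, 0.25)                        # per-component noise variance

for _ in range(100):
    # E-step: log-likelihood of each whole trajectory under each component.
    resid = Y[:, None, :] - (betas @ X.T)[None, :, :]            # (n, K, T)
    loglik = -0.5 * (resid ** 2 / sigma2[None, :, None]
                     + np.log(2 * np.pi * sigma2)[None, :, None]).sum(axis=2)
    logr = np.log(mix) + loglik
    logr -= logr.max(axis=1, keepdims=True)
    R = np.exp(logr)
    R /= R.sum(axis=1, keepdims=True)                            # responsibilities
    # M-step: weighted least squares per regression component.
    Xs = np.tile(X, (n, 1))
    ys = Y.ravel()
    for k in range(K):
        w = np.repeat(R[:, k], len(t))           # trajectory weight per time point
        betas[k] = np.linalg.lstsq(Xs * np.sqrt(w)[:, None],
                                   ys * np.sqrt(w), rcond=None)[0]
        sigma2[k] = (w * (ys - Xs @ betas[k]) ** 2).sum() / w.sum()
    mix = R.mean(axis=0)

hard = R.argmax(axis=1)   # hard cluster memberships
```

With well-separated slopes, the hard memberships recover the generating groups up to label permutation.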
A SAS procedure based on mixture models for estimating developmental trajectories
 Sociological Methods & Research 29:374–393, 2001
"... This article introduces a new SAS procedure written by the authors that analyzes longitudinal data (developmental trajectories) by fitting a mixture model. The TRAJ procedure fits semiparametric (discrete) mixtures of censored normal, Poisson, zeroinflated Poisson, and Bernoulli distributions to ..."
Abstract

Cited by 107 (10 self)
This article introduces a new SAS procedure written by the authors that analyzes longitudinal data (developmental trajectories) by fitting a mixture model. The TRAJ procedure fits semiparametric (discrete) mixtures of censored normal, Poisson, zero-inflated Poisson, and Bernoulli distributions to longitudinal data. Applications to psychometric scale data, offense counts, and a dichotomous prevalence measure in violence research are illustrated. In addition, the use of the Bayesian information criterion to address the problem of model selection, including the estimation of the number of components in the mixture, is demonstrated.
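TRAJ itself is a SAS procedure; as a language-neutral illustration of the BIC-based choice of the number of mixture components that the abstract mentions, here is a hedged numpy sketch using a plain (uncensored) Poisson mixture. The initialization and the data rates (2 and 20) are assumptions for the example, not taken from the paper.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(1)
# Counts drawn from two Poisson components (rates 2 and 20).
x = np.concatenate([rng.poisson(2, 200), rng.poisson(20, 200)])
logfact = np.array([lgamma(v + 1.0) for v in x])   # log(x!) per observation

def fit_poisson_mixture(x, K, iters=200):
    """EM for a K-component Poisson mixture; returns final log-likelihood."""
    lam = np.quantile(x, (np.arange(K) + 1.0) / (K + 1)) + 0.5  # deterministic init
    mix = np.full(K, 1.0 / K)
    for _ in range(iters):
        logp = np.log(mix) + x[:, None] * np.log(lam) - lam - logfact[:, None]
        m = logp.max(axis=1, keepdims=True)
        R = np.exp(logp - m)
        R /= R.sum(axis=1, keepdims=True)            # responsibilities
        lam = (R * x[:, None]).sum(axis=0) / np.maximum(R.sum(axis=0), 1e-12)
        lam = np.maximum(lam, 1e-6)                  # guard against empty components
        mix = np.maximum(R.mean(axis=0), 1e-12)
    logp = np.log(mix) + x[:, None] * np.log(lam) - lam - logfact[:, None]
    m = logp.max(axis=1, keepdims=True)
    return (m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))).sum()

def best_K(x, Ks=range(1, 5)):
    """Pick the number of components by minimizing BIC (lower is better)."""
    bic = {K: -2 * fit_poisson_mixture(x, K) + (2 * K - 1) * np.log(len(x))
           for K in Ks}
    return min(bic, key=bic.get)
```

The BIC penalty (2K-1 free parameters: K rates plus K-1 weights) discourages the extra components that barely improve the likelihood.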
Combining multiple clusterings using evidence accumulation
 IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
"... We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: (1) applying differ ..."
Abstract

Cited by 105 (7 self)
We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: (1) applying different clustering algorithms, and (2) applying the same clustering algorithm with different values of parameters or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization, individual data partitions being combined, based on a voting mechanism, to generate a new n × n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm on this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split-and-merge strategy based on the K-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.
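The voting mechanism described above, multiple partitions averaged into an n × n co-association matrix and then cut with agglomerative clustering, can be sketched as follows. This is a minimal illustration with an assumed toy data set and a bare-bones k-means, not the paper's algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
# Two well-separated blobs in 2-D.
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
n = len(X)

def kmeans(X, k, seed):
    """Plain Lloyd's k-means, just enough to generate ensemble partitions."""
    r = np.random.default_rng(seed)
    C = X[r.choice(len(X), k, replace=False)]
    for _ in range(25):
        lab = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return lab

# Evidence accumulation: average co-occurrence over many partitions
# produced with different k and different initializations.
co = np.zeros((n, n))
runs = [(k, s) for k in (2, 3, 4, 5) for s in range(5)]
for k, s in runs:
    lab = kmeans(X, k, s)
    co += (lab[:, None] == lab[None, :]).astype(float)
co /= len(runs)

# Final partition: average-link clustering of the co-association distances.
D = 1.0 - co
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method='average')
final = fcluster(Z, t=2, criterion='maxclust')
```

Pairs that co-occur in most partitions end up with distance near 0 and are merged first, so the final cut recovers the shared structure of the ensemble.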
Performance Evaluation of Some Clustering Algorithms and Validity Indices
 IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
"... Abstract—In this article, we evaluate the performance of three clustering algorithms, hard KMeans, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely DaviesBouldin index, Dunn’s index, CalinskiHarabasz index, and a recently de ..."
Abstract

Cited by 104 (2 self)
In this article, we evaluate the performance of three clustering algorithms, hard K-means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and Dunn's index, a lower bound of the value of the former is theoretically estimated in order to get a unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into that number of clusters.
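To make concrete what one of these validity indices measures, here is a small numpy implementation of the Davies-Bouldin index on an assumed toy data set (the definition is standard; the data and labels are invented for the example):

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: for each cluster take the worst
    (scatter_i + scatter_j) / dist(centroid_i, centroid_j) ratio,
    then average over clusters; lower values mean better-separated clusters."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
                        for i, k in enumerate(ks)])
    total = 0.0
    for i in range(len(ks)):
        total += max((scatter[i] + scatter[j])
                     / np.linalg.norm(cents[i] - cents[j])
                     for j in range(len(ks)) if j != i)
    return total / len(ks)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.2, (25, 2)), rng.normal(2, 0.2, (25, 2))])
good = np.repeat([0, 1], 25)   # labels that follow the two blobs
bad = np.tile([0, 1], 25)      # labels that ignore the blob structure
```

A partition aligned with the true blobs yields a much lower index than one that cuts across them, which is what makes such indices usable for choosing the number of clusters.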
Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms
, 2003
"... Many clustering and segmentation algorithms both suffer from the limitation that the number of clusters/segments are specified by a human user. It is often impractical to expect a human with sufficient domain knowledge to be available to select the number of clusters/segments to return. In this pape ..."
Abstract

Cited by 101 (2 self)
Many clustering and segmentation algorithms suffer from the limitation that the number of clusters or segments must be specified by a human user. It is often impractical to expect a human with sufficient domain knowledge to be available to select the number of clusters/segments to return. In this paper, we investigate techniques to determine the number of clusters or segments to return from hierarchical clustering and segmentation algorithms. We propose an efficient algorithm, the L method, that finds the "knee" in a 'number of clusters vs. clustering evaluation metric' graph. Using the knee is a well-known, but not particularly well-understood, method to determine the number of clusters. We explore the feasibility of this method, and attempt to determine in which situations it will and will not work. We also compare the L method to existing methods based on the accuracy of the number of clusters determined and on efficiency. Our results show favorable performance for these criteria compared to the existing methods that were evaluated.
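The L method's core step, fitting two straight lines to the left and right parts of the evaluation-metric curve and taking the split that fits best as the knee, can be sketched as below. The curve values are an assumed example, and this simplified version shares the split point between both segments.

```python
import numpy as np

def l_method(y):
    """Return the knee index: the split point where two straight lines,
    fitted to the left and right parts of the curve, explain it best
    (total RMSE weighted by each side's share of the points)."""
    x = np.arange(len(y), dtype=float)
    best, best_err = None, np.inf
    for c in range(2, len(y) - 2):
        err = 0.0
        for s in (slice(0, c + 1), slice(c, len(y))):
            coef = np.polyfit(x[s], y[s], 1)          # straight-line fit
            r = y[s] - np.polyval(coef, x[s])
            err += (s.stop - s.start) / len(y) * np.sqrt((r ** 2).mean())
        if err < best_err:
            best, best_err = c, err
    return best

# Evaluation-metric curve with a sharp knee at index 4:
# a steep descent followed by a nearly flat tail.
y = np.array([10.0, 8.0, 6.0, 4.0, 2.0, 1.9, 1.8, 1.7, 1.6, 1.5])
```

On this curve the two lines fit exactly only when the split sits at the knee, so the minimum of the weighted RMSE lands there.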
Stability-Based Validation of Clustering Solutions
, 2004
"... Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract “natural” group structure in data. Such groupings need to be validated to separate the signal in the data from spurious structure. In this context, finding an appropriate number of clusters is a ..."
Abstract

Cited by 94 (7 self)
Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract "natural" group structure in data. Such groupings need to be validated to separate the signal in the data from spurious structure. In this context, finding an appropriate number of clusters is a particularly important model selection question. We introduce a measure of cluster stability to assess the validity of a cluster model. This stability measure quantifies the reproducibility of clustering solutions on a second sample, and it can be interpreted as a classification risk with regard to class labels produced by a clustering algorithm. The preferred number of clusters is determined by minimizing this classification risk as a function of the number of clusters. Convincing results are achieved on simulated as well as gene expression data sets. Comparisons to other methods demonstrate the competitive performance of our method and its suitability as a general validation tool for clustering solutions in real-world problems.
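The reproducibility idea above can be sketched in a simplified form: cluster two halves of the data, transfer labels from one half to the other via nearest centroids, and score the disagreement under the best label matching. This is a loose illustration of the stability principle, not the paper's risk estimator; the data, the split scheme, and the k-means variant are all assumptions of the example.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(4)
# Three well-separated blobs; the "stable" number of clusters should be 3.
centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 2.0 * np.sqrt(3))]
X = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in centers])

def kmeans(X, k, iters=30):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    C = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None] - np.array(C)[None]) ** 2).sum(-1).min(1)
        C.append(X[d.argmax()])
    C = np.array(C)
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return lab, C

def instability(X, k, splits=10):
    """Mean disagreement between a clustering of one half of the data and
    labels transferred from a clustering of the other half."""
    n, scores = len(X), []
    for s in range(splits):
        idx = np.random.default_rng(100 + s).permutation(n)
        A, B = idx[: n // 2], idx[n // 2:]
        labA, CA = kmeans(X[A], k)
        labB, _ = kmeans(X[B], k)
        # classify half B with the centroids learned on half A ...
        transfer = ((X[B][:, None] - CA[None]) ** 2).sum(-1).argmin(1)
        # ... and score disagreement under the best matching of cluster labels
        scores.append(min((np.array(p)[transfer] != labB).mean()
                          for p in permutations(range(k))))
    return float(np.mean(scores))

chosen = min(range(2, 6), key=lambda k: instability(X, k))
```

The correct number of clusters reproduces across subsamples, while too few or too many clusters force arbitrary merges or splits that vary from sample to sample.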
Example-Based Photometric Stereo: Shape Reconstruction with General ...
, 2005
"... This paper presents a technique for computing the geometry of objects with general reflectance properties from images. For surfaces ..."
Abstract

Cited by 89 (2 self)
This paper presents a technique for computing the geometry of objects with general reflectance properties from images. For surfaces ...
An Analysis of Recent Work on Clustering Algorithms
, 1999
"... This paper describes four recent papers on clustering, each of which approaches the clustering problem from a different perspective and with different goals. It analyzes the strengths and weaknesses of each approach and describes how a user could could decide which algorithm to use for a given clust ..."
Abstract

Cited by 84 (0 self)
This paper describes four recent papers on clustering, each of which approaches the clustering problem from a different perspective and with different goals. It analyzes the strengths and weaknesses of each approach and describes how a user could decide which algorithm to use for a given clustering application. Finally, it concludes with ideas that could make the selection and use of clustering algorithms for data analysis less difficult.
Model Selection for Probabilistic Clustering Using Cross-Validated Likelihood
 Statistics and Computing, 2002
"... ..."
MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering
 Department of Statistics, University of Washington, 2006
"... MCLUST is a contributed R package for normal mixture modeling and modelbased clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions ..."
Abstract

Cited by 81 (1 self)
MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation, and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation, and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering and classification results. A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior. MCLUST is licensed by the University of Washington and distributed through ...