Detecting Features in Spatial Point Processes with . . .
, 1995
Cited by 81 (31 self)
We consider the problem of detecting features in spatial point processes in the presence of substantial clutter. One example is the detection of minefields using reconnaissance aircraft images that erroneously identify many objects that are not mines. Another is the detection of seismic faults on the basis of earthquake catalogs: earthquakes tend to be clustered close to the faults, but there are many that are farther away. Our solution uses model-based clustering based on a mixture model for the process, in which features are assumed to generate points according to highly linear multivariate normal densities, and the clutter arises according to a spatial Poisson process. Very nonlinear features are represented by several highly linear multivariate normal densities, giving a piecewise linear representation. The model is estimated in two stages. In the first stage, hierarchical model-based clustering is used to provide a first estimate of the features. In the second stage, this clustering is refined using the EM algorithm. The number of features is found using an approximation to the posterior probability of each number of features. For the minefield
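The feature-plus-clutter mixture described in this abstract can be illustrated with a minimal sketch of a single E-step computation. It assumes one bivariate normal feature and a homogeneous Poisson clutter process, which is uniform over the observation region once the number of points is fixed. The function names, the mixing weight `weight`, and the region size `area` are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def gaussian_pdf_2d(x, y, mean, cov):
    """Density of a bivariate normal with covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form via the closed-form inverse of a 2x2 matrix.
    quad = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def feature_responsibility(point, weight, mean, cov, area):
    """Posterior probability that a point belongs to the feature, not clutter."""
    feature = weight * gaussian_pdf_2d(point[0], point[1], mean, cov)
    # Assumption: clutter is uniform over a region of the given area.
    clutter = (1.0 - weight) / area
    return feature / (feature + clutter)

# A highly linear feature: long along x, very thin along y.
mean, cov = (5.0, 5.0), [[1.0, 0.0], [0.0, 0.01]]
r_on = feature_responsibility((5.0, 5.0), 0.3, mean, cov, area=100.0)
r_off = feature_responsibility((5.0, 8.0), 0.3, mean, cov, area=100.0)
```

In a full EM fit these responsibilities would be recomputed for every point and used to update the weight, mean, and covariance; the sketch stops at the E-step.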
An experimental comparison of several clustering and initialization methods
, 1998
Cited by 78 (0 self)
We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation–Maximization (EM) algorithm, a “winner take all” version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative clustering. Although the methods are substantially different, they lead to learned models that are strikingly similar in quality.
Algorithms for model-based Gaussian hierarchical clustering
 SIAM Journal on Scientific Computing
, 1998
Cited by 54 (11 self)
Funded by the Office of Naval Research under contracts N000149610192 and N00014961
Model-based clustering and visualization of navigation patterns on a web site
 Data Mining and Knowledge Discovery
, 2003
Cited by 53 (0 self)
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com. Keywords: Model-based clustering, sequence clustering, data visualization, Internet, web
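As a rough illustration of clustering sessions with a mixture of first-order Markov models, the sketch below computes the E-step cluster posteriors for one session. It assumes categories are encoded as integer indices and that each cluster is a pair of (initial probabilities, transition matrix). All names and the toy parameters are hypothetical and are not taken from WebCANVAS.

```python
import math

def sequence_log_likelihood(seq, initial, transition):
    """Log-probability of a category sequence under a first-order Markov model."""
    logp = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(transition[prev][cur])
    return logp

def cluster_posteriors(seq, weights, models):
    """E-step: posterior probability of each cluster for one user session."""
    logs = [math.log(w) + sequence_log_likelihood(seq, init, trans)
            for w, (init, trans) in zip(weights, models)]
    top = max(logs)  # subtract the max before exponentiating for stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two toy clusters over two URL categories: "sticky" users and "switching" users.
sticky = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
switching = ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]])
post = cluster_posteriors([0, 0, 0, 0], [0.5, 0.5], [sticky, switching])
```

Each session costs one pass over its transitions per cluster, which is consistent with the linear scaling in the number of clusters and the data size that the abstract reports.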
MCLUST: Software for Model-based Cluster Analysis
 Journal of Classification
, 1999
Cited by 52 (16 self)
MCLUST is a software package for cluster analysis written in Fortran and interfaced to the S-PLUS commercial software package. It implements parameterized Gaussian hierarchical clustering algorithms [16, 1, 7] and the EM algorithm for parameterized Gaussian mixture models [5, 13, 3, 14] with the possible addition of a Poisson noise term. MCLUST also includes functions that combine hierarchical clustering, EM and the Bayesian Information Criterion (BIC) in a comprehensive clustering strategy [4, 8]. Methods of this type have shown promise in a number of practical applications, including character recognition [16], tissue segmentation [1], minefield and seismic fault detection [4], identification of textile flaws from images [2], and classification of astronomical data [3, 15]. A web page with related links can be found at
Regularized Gaussian Discriminant Analysis Through Eigenvalue Decomposition
 Journal of the American Statistical Association
, 1996
Cited by 40 (6 self)
Friedman (1989) has proposed a regularization technique (RDA) of discriminant analysis in the Gaussian framework. RDA makes use of two regularization parameters to design an intermediate classification rule between linear and quadratic discriminant analysis. In this paper, we propose an alternative approach to design classification rules which have also a median position between linear and quadratic discriminant analysis. Our approach is based on the reparametrization of the covariance matrix $\Sigma_k$ of a group $G_k$ in terms of its eigenvalue decomposition, $\Sigma_k = \lambda_k D_k A_k D_k'$, where $\lambda_k$ specifies the volume of $G_k$, $A_k$ its shape, and $D_k$ its orientation. Variations on constraints concerning $\lambda_k$, $A_k$ and $D_k$ lead to 14 discrimination models of interest. For each model, we derived the maximum likelihood parameter estimates and our approach consists in selecting the model among the 14 possible models by minimizing the sample-based estimate of future misclassification risk by cross-validation. Numerical experiments show favorable behavior of this approach as compared to RDA.
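The volume, shape, and orientation reparametrization can be made concrete with a small sketch for the two-dimensional case. It assumes the common normalization in which the volume is the determinant raised to the power 1/d and the shape matrix is diagonal with determinant 1; that convention, the function names, and the use of a rotation angle to represent the orientation matrix are assumptions for illustration, not details confirmed by the abstract.

```python
import math

def decompose_cov_2d(a, b, c):
    """Split a positive-definite covariance [[a, b], [b, c]] into (volume, shape, angle)."""
    mid = 0.5 * (a + c)
    half_gap = math.hypot(0.5 * (a - c), b)
    l1, l2 = mid + half_gap, mid - half_gap   # eigenvalues, l1 >= l2 > 0
    volume = math.sqrt(l1 * l2)                # det(Sigma)^(1/d) with d = 2
    shape = (l1 / volume, l2 / volume)         # diagonal of A; product is 1
    angle = 0.5 * math.atan2(2.0 * b, a - c)   # rotation angle that defines D
    return volume, shape, angle

def rebuild_cov_2d(volume, shape, angle):
    """Recompose volume * D * A * D' and return the entries (a, b, c)."""
    co, si = math.cos(angle), math.sin(angle)
    s1, s2 = volume * shape[0], volume * shape[1]
    a = s1 * co * co + s2 * si * si
    b = (s1 - s2) * co * si
    c = s1 * si * si + s2 * co * co
    return a, b, c

volume, shape, angle = decompose_cov_2d(4.0, 1.5, 2.0)
rebuilt = rebuild_cov_2d(volume, shape, angle)
```

Constraining any of the three pieces to be equal across groups, while leaving the others free, is what produces a family of intermediate models between a common covariance and fully group-specific covariances.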
Linear Flaw Detection in Woven Textiles Using Model-Based Clustering
 Pattern Recognition Letters
, 1997
Cited by 30 (13 self)
We combine image-processing techniques with a powerful new statistical methodology to test for and find the location of linear production faults in woven textiles. Our approach detects an alignment pattern in preprocessed images via model-based clustering and uses an approximate Bayes factor to assess the evidence for the presence of a defect. Results are shown for some representative examples, and the associated software has been made available on the Internet. Keywords: Model-based clustering, pattern recognition, Bayesian cluster analysis, machine vision, industrial inspection. Supported by the Office of Naval Research under contracts N000149610192 and N000149610330.
Non Parametric Maximum Likelihood Estimation of Features in . . .
, 1996
Cited by 27 (8 self)
This paper addresses the problem of estimating the support domain of a bounded point process in the presence of background noise. This situation occurs for example in the detection of a mine field from aerial observations. A Maximum Likelihood Estimator for a mixture of uniform point processes is derived using a natural partition of the space defined by the data themselves: the Voronoï tessellation. The methodology is tested on simulations and compared to a model-based clustering technique.
THE EMMIX SOFTWARE FOR THE FITTING OF MIXTURES OF NORMAL AND t-COMPONENTS
 JOURNAL OF STATISTICAL SOFTWARE
, 1999
Cited by 22 (2 self)
We consider the fitting of normal or t-component mixture models to multivariate data, using maximum likelihood via the EM algorithm. This approach requires the initial specification of an initial estimate of the vector of unknown parameters, or equivalently, of an initial classification of the data with respect to the components of the mixture model under fit. We describe an algorithm called EMMIX that automatically undertakes this fitting, including the provision of suitable initial values if not supplied by the user. The EMMIX algorithm has several options, including the option to carry out a resampling-based test for the number of components in the mixture model.