Results 1 -
6 of
6
Resampling Method For Unsupervised Estimation Of Cluster Validity
- Neural Computation
, 2001
"... We introduce a method for validation of results obtained by clustering analysis of data. The method is based on resampling the available data. A figure of merit that measures the stability of clustering solutions against resampling is introduced. Clusters which are stable against resampling give ris ..."
Abstract
-
Cited by 56 (3 self)
- Add to MetaCart
We introduce a method for validation of results obtained by clustering analysis of data. The method is based on resampling the available data. A figure of merit that measures the stability of clustering solutions against resampling is introduced. Clusters which are stable against resampling give rise to local maxima of this figure of merit. This is presented first for a one-dimensional data set, for which an analytic approximation for the figure of merit is derived and compared with numerical measurements. Next, the applicability of the method is demonstrated for higher dimensional data, including gene microarray expression data.
Unsupervised Learning Using MML
- IN MACHINE LEARNING: PROCEEDINGS OF THE THIRTEENTH INTERNATIONAL CONFERENCE (ICML 96
, 1996
"... This paper discusses the unsupervised learning problem. An important part of the unsupervised learning problem is determining the number of constituent groups (components or classes) which best describes some data. We apply the Minimum Message Length (MML) criterion to the unsupervised learning prob ..."
Abstract
-
Cited by 37 (5 self)
- Add to MetaCart
This paper discusses the unsupervised learning problem. An important part of the unsupervised learning problem is determining the number of constituent groups (components or classes) which best describes some data. We apply the Minimum Message Length (MML) criterion to the unsupervised learning problem, modifying an earlier such MML application. We give an empirical comparison of criteria prominent in the literature for estimating the number of components in a data set. We conclude that the Minimum Message Length criterion performs better than the alternatives on the data considered here for unsupervised learning tasks.
An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity
- Journal of Marketing Research
, 2002
"... Currently, there is an important debate about the relative merits of models with discrete and continuous representations of consumer heterogeneity. In a recent JMR study, Andrews, Ansari, and Currim (2002; hereafter AAC) compared metric conjoint analysis models with discrete and continuous represent ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Currently, there is an important debate about the relative merits of models with discrete and continuous representations of consumer heterogeneity. In a recent JMR study, Andrews, Ansari, and Currim (2002; hereafter AAC) compared metric conjoint analysis models with discrete and continuous representations of heterogeneity and found no differences between the two models with respect to parameter recovery and prediction of ratings for holdout profiles. Models with continuous representations of heterogeneity fit the data better than models with discrete representations of heterogeneity. The goal of the current study is to compare the relative performance of logit choice models with discrete versus continuous representations of heterogeneity in terms of the accuracy of household-level parameters, fit, and forecasting accuracy. To accomplish this goal, the authors conduct an extensive simulation experiment with logit models in a scanner data context, using an experimental design based on AAC and other recent simulation studies. One of the main findings is that models with continuous and discrete representations of heterogeneity recover household-level parameter estimates and predict holdout choices about equally well except when the number of purchases per household is small, in which case the models with continuous representations perform very poorly. As in the AAC study, models with continuous representations of heterogeneity fit the data better.
Finding Overlapping Components with MML
- Statistics and Computing
, 2000
"... We use minimum message length (MML) estimation for mixture modelling. MML estimates are derived to choose the number of components in the mixture model to best describe the data and to estimate the parameters of the component densities for Gaussian mixture models. An empirical comparison of criteria ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
We use minimum message length (MML) estimation for mixture modelling. MML estimates are derived to choose the number of components in the mixture model to best describe the data and to estimate the parameters of the component densities for Gaussian mixture models. An empirical comparison of criteria prominent in the literature for estimating the number of components in a data set is performed.
Fitting mixtures of Kent distributions to aid in joint set identification
- JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2001
"... When examining a rock mass, joint sets and their orientations can play a significant role with regard to how the rock mass will behave. To identify joint sets present in the rock mass, the orientation of individual fracture planes can be measured on exposed rock faces and the resulting data can be e ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
When examining a rock mass, joint sets and their orientations can play a significant role with regard to how the rock mass will behave. To identify joint sets present in the rock mass, the orientation of individual fracture planes can be measured on exposed rock faces and the resulting data can be examined for heterogeneity. In this article, the expectation–maximization algorithm is used to fit mixtures of Kent component distributions to the fracture data to aid in the identification of joint sets. An additional uniform component is also included in the model to accommodate the noise present in the data.
ASSESSING THE NUMBER OF COMPONENTS IN MIXTURE MODELS: A REVIEW. ABSTRACT
"... Despite the widespread application of finite mixture models, the decision of how many classes are required to adequately represent the data is, according to many authors, an important, but unsolved issue. This work aims to review, describe and organize the available approaches designed to help the s ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Despite the widespread application of finite mixture models, the decision of how many classes are required to adequately represent the data is, according to many authors, an important, but unsolved issue. This work aims to review, describe and organize the available approaches designed to help the selection of the adequate number of mixture components (including Monte Carlo test procedures, information criteria and classification-based criteria); we also provide some published simulation results about their relative performance, with the purpose of identifying the scenarios where each criterion is more effective (adequate). Key words: Finite mixture; number of mixture components; information criteria; simulation studies.

