Results 1–9 of 9
Mixtures of skew-t factor analyzers, 2013
Abstract

Cited by 2 (1 self)
In this paper, we introduce a mixture of skew-t factor analyzers as well as a family of mixture models based thereon. The mixture of skew-t distributions model that we use arises as a limiting case of the mixture of generalized hyperbolic distributions. Like their Gaussian and t-distribution analogues, our mixtures of skew-t factor analyzers are very well-suited to the model-based clustering of high-dimensional data. Imposing constraints on components of the decomposed covariance parameter results in the development of eight flexible models. The alternating expectation-conditional maximization (AECM) algorithm is used for model parameter estimation, and the Bayesian information criterion (BIC) is used for model selection. The models are applied to both real and simulated data, giving superior clustering results compared to a well-established family of Gaussian mixture models.
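The model-selection step described in this abstract — fitting a family of covariance-constrained mixtures and choosing among them by BIC — can be sketched in its simpler Gaussian analogue with scikit-learn. This is only an illustrative stand-in, not the authors' AECM implementation or their eight-model family; the simulated data and component count are invented for the example:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated simulated clusters in 3 dimensions
X = np.vstack([rng.normal(0, 1, size=(100, 3)),
               rng.normal(5, 1, size=(100, 3))])

# Fit a small family of covariance-constrained Gaussian mixtures
# and select the model with the lowest BIC.
fits = {}
for cov in ("full", "tied", "diag", "spherical"):
    gm = GaussianMixture(n_components=2, covariance_type=cov,
                         random_state=0).fit(X)
    fits[cov] = gm.bic(X)

best = min(fits, key=fits.get)
print(best)
```

The constrained covariance types here play a role loosely analogous to the constrained decompositions of the covariance parameter mentioned in the abstract.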
The skew-t factor analysis model
Abstract

Cited by 2 (0 self)
Factor analysis is a classical data-reduction technique that seeks a potentially smaller number of unobserved variables that can account for the correlations among the observed variables. This paper presents an extension of the factor analysis model by jointly assuming a restricted version of the multivariate skew-t distribution for the latent factors and unobservable errors, called the skew-t factor analysis model. The proposed model is robust to violations of the normality assumption on the underlying latent factors and provides flexibility in capturing extra skewness as well as the heavier tails of the observed data. A computationally feasible ECM algorithm is developed for computing maximum likelihood estimates of the parameters. The usefulness of the proposed methodology is illustrated by a real-life example, and the results also demonstrate its better performance over various existing methods. Key words: ECM algorithm; ML estimation; SNFA model; STFA model; rMSN distribution; rMST distribution
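The ordinary (Gaussian) special case of the factor analysis model described above can be sketched with scikit-learn's `FactorAnalysis`, which also estimates loadings by an EM-type algorithm. This is only a stand-in for intuition, not the paper's skew-t ECM algorithm; the simulated loadings and noise level are assumptions made for the example:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
# Simulate p = 6 observed variables driven by q = 2 latent factors
n, p, q = 500, 6, 2
Lambda = rng.normal(size=(p, q))                   # true factor loadings
F = rng.normal(size=(n, q))                        # latent factors
X = F @ Lambda.T + 0.1 * rng.normal(size=(n, p))   # observed data + noise

# Fit the Gaussian factor analysis model and recover factor scores
fa = FactorAnalysis(n_components=q).fit(X)
scores = fa.transform(X)
print(scores.shape)  # one q-dimensional score vector per observation
```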
Mixtures of common skew-t factor analyzers, 2013
Abstract

Cited by 1 (0 self)
A mixture of common skew-t factor analyzers model is introduced for model-based clustering of high-dimensional data. By assuming common component factor loadings, this model allows clustering to be performed in the presence of a large number of mixture components or when the number of dimensions is too large to be well modelled by the mixture of factor analyzers model or a variant thereof. Furthermore, assuming that the component densities follow a skew-t distribution allows robust clustering of skewed data. This paper marks the first use of skewed common factors, an important step in the robust clustering and classification of high-dimensional data. The alternating expectation-conditional maximization algorithm is employed for parameter estimation. We demonstrate excellent clustering performance when our mixture of common skew-t factor analyzers model is applied to real and simulated data.
Variational Bayes Approximations for Clustering via Mixtures of Normal Inverse Gaussian Distributions
Abstract

Cited by 1 (1 self)
Parameter estimation for model-based clustering using a finite mixture of normal inverse Gaussian (NIG) distributions is achieved through variational Bayes approximations. Univariate NIG mixtures and multivariate NIG mixtures are considered. The use of variational Bayes approximations here is a substantial departure from the traditional EM approach and alleviates some of the associated computational complexities and uncertainties. Our variational algorithm is applied to simulated and real data. The paper concludes with discussion and suggestions for future work.
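A rough sketch of the variational Bayes clustering idea: scipy provides the NIG distribution for simulating skewed, heavy-tailed data, while scikit-learn's variational Gaussian mixture serves as an off-the-shelf stand-in for the NIG-specific variational algorithm of the paper (which has no implementation in these libraries). The component parameters below are invented for illustration:

```python
import numpy as np
from scipy.stats import norminvgauss
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(2)
# Two univariate NIG components with opposite skew and distant locations
x1 = norminvgauss.rvs(a=2.0, b=0.5, loc=-4.0, scale=1.0,
                      size=200, random_state=rng)
x2 = norminvgauss.rvs(a=2.0, b=-0.5, loc=4.0, scale=1.0,
                      size=200, random_state=rng)
X = np.concatenate([x1, x2]).reshape(-1, 1)

# Variational Bayes fit: surplus components are shrunk away rather
# than selected by an explicit criterion, one appeal of the VB approach.
vb = BayesianGaussianMixture(n_components=5, random_state=0).fit(X)
labels = vb.predict(X)
```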
Non-Gaussian Mixtures for Dimension Reduction, Clustering, Classification, and Discriminant Analysis
Abstract
We introduce a method for dimension reduction with clustering, classification, or discriminant analysis. This mixture model-based approach is based on fitting generalized hyperbolic mixtures on a reduced subspace within the paradigm of model-based clustering, classification, or discriminant analysis. A reduced subspace of the data is derived by considering the extent to which group means and group covariances vary. The members of the subspace arise through linear combinations of the original data, and are ordered by importance via the associated eigenvalues. The observations can be projected onto the subspace, resulting in a set of variables that captures most of the clustering information available. The use of generalized hyperbolic mixtures gives a robust framework capable of dealing with skewed clusters. Although dimension reduction is increasingly in demand across many application areas, the authors are most familiar with biological applications and so two of the three real data examples are within that sphere. Simulated data are also used for illustration. The approach introduced herein can be considered the most general such approach available, and so we compare results to three special and limiting cases. We also compare with several well-established techniques. Across all comparisons, our approach performs remarkably well.
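The subspace construction described above — directions ordered by eigenvalues according to how much group means and covariances vary — resembles a generalized eigendecomposition of between-group versus within-group scatter. A minimal numpy/scipy sketch under that assumption (this is not the authors' exact estimator, and the simulated groups are invented):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
# Three simulated groups of 50 points each in p = 5 dimensions
means = np.array([[0, 0, 0, 0, 0],
                  [4, 0, 0, 0, 0],
                  [0, 4, 0, 0, 0]], dtype=float)
X = np.vstack([m + rng.normal(size=(50, 5)) for m in means])
y = np.repeat([0, 1, 2], 50)

# Between-group and within-group scatter matrices
xbar = X.mean(axis=0)
Sb = sum(50 * np.outer(X[y == g].mean(0) - xbar,
                       X[y == g].mean(0) - xbar) for g in range(3))
Sw = sum((X[y == g] - X[y == g].mean(0)).T
         @ (X[y == g] - X[y == g].mean(0)) for g in range(3))

# Generalized eigenproblem Sb v = lambda Sw v; keep the leading directions
evals, evecs = eigh(Sb, Sw)
order = np.argsort(evals)[::-1]
Z = X @ evecs[:, order[:2]]   # project onto the 2-D reduced subspace
print(Z.shape)
```

Only the leading directions carry group-separation information here, since the between-group scatter has rank at most (number of groups − 1).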
Model-Based Clustering of High-Dimensional Binary Data
Abstract
We propose a mixture of latent trait models with common slope parameters (MCLT) for model-based clustering of high-dimensional binary data, a data type for which few established methods exist. Recent work on clustering of binary data, based on a d-dimensional Gaussian latent variable, is extended by incorporating common factor analyzers. Accordingly, our approach facilitates a low-dimensional visual representation of the clusters. We extend the model further by the incorporation of random block effects. The dependencies in each block are taken into account through block-specific parameters that are considered to be random variables. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. Our approach is demonstrated on real and simulated data.
Comment on “Comparing two formulations of skew distributions with special reference to model-based ...”
Abstract
In this paper, we comment on the recent comparison in Azzalini et al. (2014) of two different distributions proposed in the literature for the modelling of data that have asymmetric and possibly long-tailed clusters. They are referred to as the restricted and unrestricted skew t-distributions by Lee and McLachlan (2013a). Firstly, we wish to point out that in Lee and McLachlan (2014b), which preceded this comparison, it is shown how a distribution belonging to the broader class, the canonical fundamental skew t (CFUST) class, can be fitted with essentially no additional computational effort than for the unrestricted distribution. The CFUST class includes the restricted and unrestricted distributions as special cases. Thus the user now has the option of letting the data decide which model is appropriate for their particular dataset. Secondly, we wish to identify several statements in the comparison by Azzalini et al. (2014) that demonstrate a serious misunderstanding of the reporting of results in Lee and McLachlan (2014a) on the relative performance of these two skew t-distributions. In particular, there is an apparent misunderstanding of the nomenclature that has been adopted to distinguish between these two models. Thirdly, we take the opportunity to report here that we have obtained improved fits, in some cases a marked improvement, for the unrestricted model for various cases corresponding to different combinations of the variables in the two real datasets that were used in Azzalini et al. (2014) to mount their claims on the relative superiority of the restricted and unrestricted models. For one case the misclassification rate of our fit under the unrestricted model is less than one third of their reported error rate. Our results thus reverse their claims on the ranking of the restricted and unrestricted models in such cases.