## Wallace’s Approach to Unsupervised Learning: The Snob Program (2008)

Citations: 2 (0 self)

### BibTeX

@MISC{Jorgensen08wallace’sapproach,

author = {Murray A. Jorgensen and Geoffrey J. McLachlan},

title = {Wallace’s Approach to Unsupervised Learning: The Snob Program},

year = {2008}

}

### Citations

8089 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

1500 |
A k-means clustering algorithm
- Hartigan, Wong
Citation Context ...rse, once such a criterion is proposed, it is a small step to seek a clustering that optimizes it. That there was and is a need for such a criterion cannot be denied. Consulting works such as [12] or [13] reveals an embarrassment of clustering methods and the alternatives have only increased with the passage of time. Consider, for example, traditional hierarchical clustering methods based on a distanc... |

942 |
The EM Algorithm and Extensions
- McLachlan, Krishnan
- 1996
Citation Context ...vation (as that is how the classes are encoded). If the above procedure were used directly to assign observations to clusters, there would be some similarity between Snob and the stochastic EM method [19, 20]. However instead the $\{q_j\}$ for the $i$th thing are used to define weights $w_{ij}$ and the ‘distribution adjustment’ for the $j$th class is carried out with all data but with weights $w_{ij}$. (This is not entirely... |

879 |
Mixture Models
- McLachlan, Basford
- 1988
Citation Context ...18]. 6. THE EM ALGORITHM FOR MIXTURE MODELS Outside the MML community mixture models such as $f(y_i) = \sum_{j=1}^{T} \pi_j f(y_i; \phi_j)$ are commonly fitted by maximum likelihood using the EM algorithm (see, e.g. [22]). Here we seek to maximize the likelihood $L(\theta) = \prod_{i=1}^{S} \left[ \sum_{j=1}^{T} \pi_j f(y_i; \phi_j) \right]$, where $\theta$ is the vector of unknown parameters, containing the mixing proportions $\pi_j$ and the component parameters $\phi_j$ fo... |

710 |
Cluster Analysis for Applications
- Anderberg
- 1973
Citation Context ...N Of course, once such a criterion is proposed, it is a small step to seek a clustering that optimizes it. That there was and is a need for such a criterion cannot be denied. Consulting works such as [12] or [13] reveals an embarrassment of clustering methods and the alternatives have only increased with the passage of time. Consider, for example, traditional hierarchical clustering methods based on a... |

466 | Mixture Models: Inference and Applications to Clustering - McLachlan, Basford - 1988 |

312 |
Model-based Gaussian and non-Gaussian clustering
- Banfield, Raftery
- 1993
Citation Context ...meters. Scott and Symons [11] note that many classical cluster analysis methods for observations with continuous attributes can be seen as mixture modelling with full assignment. Banfield and Raftery [16] describe a program for clustering with full assignment that builds on the work of Scott and Symons [11]. McLachlan and Basford [17, p. 31–35] discuss maximum likelihood estimation under the full assi... |

311 |
An Information Measure for Classification
- Wallace, Boulton
- 1968
Citation Context ...nalysis or numerical taxonomy. We focus our attention on the pioneering Snob program, wryly so-called because it places individuals in classes (C.S. Wallace, personal communication). The Snob program [1, 2] represents a pioneer contribution to a model-based approach to unsupervised learning. At the same time, as Snob made an early contribution to model-based clustering (as unsupervised learning based on... |

187 |
Estimation and inference by compact coding
- Wallace, Freeman
- 1987
Citation Context ...ix $F_C(\theta)$, in the usual way. We can also define the complete-data conditional expected information matrix $\mathcal{I}_C(\theta; y) = E_\theta[I_C(\theta; y, z) \mid y]$. 7. THE WALLACE–FREEMAN APPROACH TO INFERENCE Wallace and Freeman [23] present what seems to be the most comprehensive approach to MML inference published prior to [18]. They motivate and present the following estimate of message length: $-\log h(\theta) + \tfrac{1}{2} \log |F(\theta)| - \log f(y;$... |

86 |
Statistical and Inductive Inference by Minimum Message Length
- Wallace
- 2005
Citation Context ...mated. As the early version of Snob fully assigned observations to classes, it is subject to this sort of bias. For this reason, Snob was revised to work under partial assignment. In section 6.8.2 of [18] Wallace also considers a mixture of two univariate normal distributions to show that full assignment leads to inconsistent parameter estimates. 5. PARTIAL ASSIGNMENT FOR SNOB Wallace [14] describes s... |

68 | Pattern Clustering by Multivariate Mixture Analysis - Wolfe - 1970 |

63 | Clustering Methods Based on Likelihood Ratio Criteria
- Scott, Symons
- 1971
Citation Context ...9] also considered a mixture model-based approach, primarily focussing on the assumption of normality for the component distributions. In a related approach, Hartley and Rao [10] and Scott and Symons [11] considered the so-called classification-likelihood method of clustering. As to be discussed later in more detail, the distinction between the mixture and classification approaches to clustering is on... |

30 | A Program for Numerical Classification - Boulton, Wallace - 1970 |

24 | Estimation of parameters for a mixture of normal distributions - Hasselblad - 1966 |

20 |
An Improved Program for Classification
- Wallace
- 1984
Citation Context ...tial classification, which will then be improved in terms of Snob’s message length criterion, or by a random start with a given number of classes. A feature of Snob added at the revision described in [14] is similar to attribute selection, but more flexible. A facility is provided whereby an attribute may be declared ‘significant for a class’ and distributional parameters estimated specifically for th... |

15 |
Mixture model clustering using the MULTIMIX program
- Hunt, Jorgensen
- 1999
Citation Context ...nob’s model search strategy may have to be rethought as the possibility of groups of multivariate normal vectors of attributes opens up a much larger model space to be searched in. Hunt and Jorgensen [33] consider the maximum likelihood fitting of mixture models similar to Snob for continuous and categorical variables. The continuous variables may be assumed to have block-diagonal covariance structure... |

12 | A Monte Carlo study of the sampling distributions of the likelihood ratio for mixtures of multinormal distributions - Wolfe - 1971 |

12 | Estimation of finite mixture of distributions from the exponential family - Hasselblad - 1969 |

11 |
High-dimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length
- Bouguila, Ziou
Citation Context ... $\hat{\theta}$. The right-hand side of Equation (3) is termed the Laplace empirical criterion (LEC) by McLachlan and Peel [22]. The apparent similarity of the LEC and MML criteria is confirmed by a recent study [25], which found for a simulation study involving generalized Dirichlet mixtures that MML and LEC performed similarly in determining the number of clusters and better than the other alternatives consider... |

10 |
Unsupervised Learning of Correlated Multivariate Gaussian Mixture Models Using MML
- Dowe
- 2003
Citation Context ...of successful MML approaches to genuine multivariate distributions, so extending Snob to have the ability to fit more complicated component distributions is a harder problem. However, Agusta and Dowe [32] have succeeded in developing an MML approach to fitting mixtures of multivariate normal distributions. It appears that Snob’s model search strategy may have to be rethought as the possibility of grou... |

9 | Estimating the components of a mixture of two normal distributions - Day - 1969 |

8 | MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions
- Wallace, Dowe
- 2000
Citation Context ...ibution to the research programme envisaged in [29] appears in [30] which gives more information about the message length approximations used. The most recent reference on the Snob program as such is [31]. Snob has been extended to allow univariate Poisson and von Mises circular variables (attributes) in addition to the normal and discrete variables originally allowed. Because Snob assumes that all va... |

7 |
Intrinsic classification of spatially correlated data
- Wallace
- 1998
Citation Context ...id Dowe for many helpful conversations about minimum message length inference and the Snob program. SNOB TODAY Another strand in MML work in unsupervised learning began in 1998 with Wallace’s article [29] in which Wallace considers strategies for incorporating spatial information into mixture model clustering. The basic setup is in terms of Markov random fields. A contribution to the research programm... |

6 | Finding Overlapping Components with MML
- Baxter, Oliver
Citation Context ...mixture model, but, in a weighted form, to the individual models for each component. This is similar to approximating the Fisher information matrix $F(\theta)$ by the complete-data information matrix $F_C(\theta)$ [26]. We note that the determinants of the complete-data expected information matrix $F_C(\theta)$ and the (incomplete data) information matrix $F(\theta)$ can be quite different. To see this note the rate of converg... |

5 | Minimum message length clustering of spatially-correlated data with varying inter-class penalties
- Visser, Dowe
- 2007
Citation Context ...ategies for incorporating spatial information into mixture model clustering. The basic setup is in terms of Markov random fields. A contribution to the research programme envisaged in [29] appears in [30] which gives more information about the message length approximations used. The most recent reference on the Snob program as such is [31]. Snob has been extended to allow univariate Poisson and von Mi... |

4 |
Classification and estimation in analysis of variance problems
- Hartley, Rao
- 1968
Citation Context ...statistical literature [3–9] also considered a mixture model-based approach, primarily focussing on the assumption of normality for the component distributions. In a related approach, Hartley and Rao [10] and Scott and Symons [11] considered the so-called classification-likelihood method of clustering. As to be discussed later in more detail, the distinction between the mixture and classification appr... |

2 | A Computer Program for the Computation of Maximum Likelihood Analysis of Types - Wolfe - 1965 |

1 | NORMIX: Computations for Estimating the Parameters of Multivariate Normal Mixtures of Distributions - Wolfe - 1967 |

1 |
EM algorithm
- Jorgensen
- 2001
Citation Context ...vation (as that is how the classes are encoded). If the above procedure were used directly to assign observations to clusters, there would be some similarity between Snob and the stochastic EM method [19, 20]. However instead the $\{q_j\}$ for the $i$th thing are used to define weights $w_{ij}$ and the ‘distribution adjustment’ for the $j$th class is carried out with all data but with weights $w_{ij}$. (This is not entirely... |

1 |
Minimum message length estimation using EM methods: a case study
- Jorgensen
- 2005
Citation Context ...$\hat{\theta}$) can be quite different. Despite this there are situations in which the determinant of $F_C(\hat{\theta})$ may replace the determinant of $F(\hat{\theta})$ in (1) with little effect on the estimated parameters. Jorgensen [27] shows that, in the case of the single-factor analysis model studied (THE COMPUTER JOURNAL, Vol. 51 No. 5, 2008; THE SNOB PROGRAM, 577) by Wallace and Freeman [28], using an EM algorithm to implement a ve... |

1 |
Single factor analysis by minimum message length estimation
- Wallace, Freeman
- 1992
Citation Context ...t on the estimated parameters. Jorgensen [27] shows that, in the case of the single-factor analysis model studied by Wallace and Freeman [28], using an EM algorithm to implement a version of MML in which the determinant of $F(\hat{\theta})$ is replaced by that of $F_C(\hat{\theta})$ yields very similar results to those of [28]. Jorgensen [27] also shows that the... |