## Hierarchical Latent Class Models for Cluster Analysis (2002)

### Download Links

- [www.cs.ust.hk]
- [www.aaai.org]
- [www.jmlr.org]
- [www.ai.mit.edu]
- [jmlr.org]
- DBLP

Venue: Journal of Machine Learning Research

Citations: 48 (12 self-citations)

### BibTeX

    @ARTICLE{Zhang02hierarchicallatent,
      author = {Nevin L. Zhang},
      title = {Hierarchical Latent Class Models for Cluster Analysis},
      journal = {Journal of Machine Learning Research},
      year = {2002},
      volume = {5},
      pages = {230--237}
    }

### Abstract

Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
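The local independence assumption stated in the abstract can be made concrete. A latent class (LC) model with class variable C and manifest variables X1, ..., Xn assigns each record the probability `P(x) = sum_c P(c) * prod_j P(x_j | c)`. The Python sketch below evaluates this for categorical data; the parameter layout (`prior`, `cond`), the function names, and the numbers are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence

# Illustrative parameter layout (an assumption, not the paper's notation):
#   prior[c]      = P(C = c)
#   cond[c][j][v] = P(X_j = v | C = c)


def lc_probability(x: Sequence[int], prior: Sequence[float],
                   cond: Sequence[Sequence[Sequence[float]]]) -> float:
    """Marginal probability P(x) of one record under an LC model."""
    total = 0.0
    for c, p_c in enumerate(prior):
        # Local independence: given C = c, the per-variable terms multiply.
        term = p_c
        for j, v in enumerate(x):
            term *= cond[c][j][v]
        total += term
    return total


def lc_loglikelihood(data: Sequence[Sequence[int]], prior: Sequence[float],
                     cond: Sequence[Sequence[Sequence[float]]]) -> float:
    """Log-likelihood of i.i.d. records under the LC model."""
    return sum(math.log(lc_probability(x, prior, cond)) for x in data)


# Made-up parameters: two classes, two binary manifest variables.
PRIOR = [0.6, 0.4]
COND = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]

if __name__ == "__main__":
    print(lc_probability([0, 0], PRIOR, COND))  # about 0.456
```

The product over j is exactly where local dependence bites: any association between two manifest variables that remains after conditioning on C cannot be represented, which is what HLC models address by allowing additional latent variables.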

### Citations

8618 | Elements of Information Theory - Cover, Thomas - 1983 |

8166 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

2335 | Estimating the dimension of a model - Schwarz - 1978 |

Citation Context: ...lgorithm for learning HLC models. Hill-climbing requires a scoring metric for comparing candidate models. In this work we experiment with four existing scoring metrics, namely AIC (Akaike 1974), BIC (**Schwarz 1978**), the Cheeseman-Stutz (CS) score (Cheeseman and Stutz 1995), and the holdout logarithmic score (LS) (Cowell et al. 1999). Hill-climbing also requires the specification of a search space and search o...

1870 | A new look at the statistical model identification - Akaike - 1974 |

Citation Context: ...t a hill-climbing algorithm for learning HLC models. Hill-climbing requires a scoring metric for comparing candidate models. In this work we experiment with four existing scoring metrics, namely AIC (**Akaike 1974**), BIC (Schwarz 1978), the Cheeseman-Stutz (CS) score (Cheeseman and Stutz 1995), and the holdout logarithmic score (LS) (Cowell et al. 1999). Hill-climbing also requires the specification of a searc...
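AIC and BIC both penalise a model's maximised log-likelihood by a function of its dimension d. The sketch below writes them in the "higher is better" form common in the Bayesian-network literature. Using the plain count of free parameters as d is an assumption of this sketch (the Kočka and Zhang entry listed on this page concerns dimension corrections for HLC models), and the function and variable names are hypothetical.

```python
import math
from typing import Dict, Optional


def num_free_parameters(cardinality: Dict[str, int],
                        parent: Dict[str, Optional[str]]) -> int:
    """Standard parameter count of a tree-structured Bayesian network.

    A node X with parent P contributes (|X| - 1) * |P| free parameters;
    the root (parent None) contributes |X| - 1.
    """
    d = 0
    for node, card in cardinality.items():
        p = parent[node]
        parent_card = 1 if p is None else cardinality[p]
        d += (card - 1) * parent_card
    return d


def aic_score(loglik: float, d: int) -> float:
    """AIC in 'higher is better' form: log P(D | m, theta*) - d."""
    return loglik - d


def bic_score(loglik: float, d: int, n: int) -> float:
    """BIC in 'higher is better' form: log P(D | m, theta*) - (d / 2) log N."""
    return loglik - 0.5 * d * math.log(n)


# LC model: a 2-state latent class Y with four binary manifest children.
CARD = {"Y": 2, "A": 2, "B": 2, "C": 2, "D": 2}
PARENT = {"Y": None, "A": "Y", "B": "Y", "C": "Y", "D": "Y"}

if __name__ == "__main__":
    print(num_free_parameters(CARD, PARENT))  # 9 = (2 - 1) + 4 * 2 * (2 - 1)
```

The log-likelihood argument would come from fitting the model (for example with EM); it is left as an input here so the scoring step stays separate from estimation.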

1346 | Finding groups in data: an introduction to cluster analysis. Wiley series in probability and mathematical statistics - Kaufman, Rousseeuw - 1990 |

Citation Context: ...works, learning. Introduction. Cluster analysis is the partitioning of similar objects into meaningful classes, when both the number of classes and the composition of the classes are to be determined (**Kaufman and Rousseeuw 1990**; Everitt 1993). In model-based clustering, it is assumed that the objects under study are generated by a mixture of probability distributions, with one component corresponding to each class. When the...
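In model-based clustering with an LC model, each mixture component is one state of the class variable, and an object is commonly assigned to the class with the highest posterior probability. A minimal sketch, assuming the layout `prior[c] = P(C = c)` and `cond[c][j][v] = P(X_j = v | C = c)`; the names and numbers are hypothetical.

```python
from typing import List, Sequence

Cond = Sequence[Sequence[Sequence[float]]]


def class_posterior(x: Sequence[int], prior: Sequence[float],
                    cond: Cond) -> List[float]:
    """P(C = c | x) by Bayes' rule under local independence."""
    joint = []
    for c, p_c in enumerate(prior):
        term = p_c
        for j, v in enumerate(x):
            term *= cond[c][j][v]
        joint.append(term)
    z = sum(joint)  # this normaliser is P(x)
    return [t / z for t in joint]


def assign_cluster(x: Sequence[int], prior: Sequence[float],
                   cond: Cond) -> int:
    """Hard assignment: the class with the largest posterior probability."""
    post = class_posterior(x, prior, cond)
    return max(range(len(post)), key=post.__getitem__)


# Made-up parameters: two classes, two binary manifest variables.
PRIOR = [0.6, 0.4]
COND = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]

if __name__ == "__main__":
    print(assign_cluster([0, 0], PRIOR, COND))  # 0
    print(assign_cluster([1, 1], PRIOR, COND))  # 1
```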

832 | Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids - Durbin, Eddy, et al. - 1998 |

Citation Context: ...namely (some approximation) of the marginal likelihood. The task of learning HLC models is similar to the reconstruction of phylogenetic trees, which is a major topic in biological sequence analysis (**Durbin et al. 1998**). As a matter of fact, phylogenetic trees are special HLC models where the model structures are binary (bifurcating) trees and all the variables share the same set of possible states. However, phylog...
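The two restrictions that make a phylogenetic tree a special HLC model can be checked mechanically. A sketch for a rooted tree, assuming a hypothetical `children` map (nodes absent from it are leaves) and simplifying "the same set of possible states" to "the same number of states":

```python
from typing import Dict, List


def is_phylogenetic_style(children: Dict[str, List[str]],
                          cardinality: Dict[str, int]) -> bool:
    """True if the tree is bifurcating and all variables share one cardinality."""
    if len(set(cardinality.values())) != 1:
        return False
    # Bifurcating: every node has either 0 (leaf) or exactly 2 children.
    return all(len(children.get(node, [])) in (0, 2) for node in cardinality)


# Rooted binary tree over four observed sequences, each site with 4 states.
CHILDREN = {"root": ["u", "v"], "u": ["s1", "s2"], "v": ["s3", "s4"]}
CARD = {name: 4 for name in ["root", "u", "v", "s1", "s2", "s3", "s4"]}

if __name__ == "__main__":
    print(is_phylogenetic_style(CHILDREN, CARD))  # True
```

A general HLC model fails this check as soon as a latent node has three or more children or the variables differ in cardinality, which is why learning HLC models is the broader problem.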

641 | Approximating discrete probability distributions with dependence trees - Chow, Liu - 1968 |

630 | Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks - Cowell, Dawid, et al. - 1999 |

Citation Context: ...work we experiment with four existing scoring metrics, namely AIC (Akaike 1974), BIC (Schwarz 1978), the Cheeseman-Stutz (CS) score (Cheeseman and Stutz 1995), and the holdout logarithmic score (LS) (**Cowell et al. 1999**). Hill-climbing also requires the specification of a search space and search operators. According to Theorem 3, a natural search space for our task is the set of all regular (unparameterized) HLC mo...
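A holdout logarithmic score fits a model on one part of the data and scores it by the log-probability it assigns to the remaining records (higher is better in this sign convention). A sketch under stated assumptions: the 75/25 split, the seed, and the stand-in model (independent categorical marginals with add-one smoothing, used only so the example runs without an EM implementation) are choices of this example; in the paper's setting `fit` would estimate an HLC model's parameters.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")
Record = Sequence[int]


def holdout_log_score(data: Sequence[Record],
                      fit: Callable[[List[Record]], Model],
                      log_prob: Callable[[Model, Record], float],
                      train_fraction: float = 0.75,
                      seed: int = 0) -> float:
    """Fit on a random training split; return the summed log-probability
    of the held-out records."""
    records = list(data)
    random.Random(seed).shuffle(records)
    cut = int(len(records) * train_fraction)
    train, test = records[:cut], records[cut:]
    model = fit(train)
    return sum(log_prob(model, x) for x in test)


# Stand-in model (NOT an HLC model): independent categorical marginals
# with add-one (Laplace) smoothing.
def fit_independent(train: List[Record], cards: Sequence[int]) -> List[List[float]]:
    probs = []
    for j, r in enumerate(cards):
        counts = [1] * r
        for x in train:
            counts[x[j]] += 1
        total = sum(counts)
        probs.append([c / total for c in counts])
    return probs


def log_prob_independent(model: List[List[float]], x: Record) -> float:
    return sum(math.log(model[j][v]) for j, v in enumerate(x))


if __name__ == "__main__":
    data = [[0, 1], [0, 0], [1, 1], [0, 1], [1, 1], [0, 1], [1, 0], [0, 1]]
    print(holdout_log_score(data, lambda tr: fit_independent(tr, [2, 2]),
                            log_prob_independent))
```

When comparing candidate models, passing the same `seed` to every call keeps them scored on the same held-out records.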

481 | Bayesian Classification (AutoClass): Theory and Results - Cheeseman, Stutz - 1995 |

Citation Context: ...requires a scoring metric for comparing candidate models. In this work we experiment with four existing scoring metrics, namely AIC (Akaike 1974), BIC (Schwarz 1978), the Cheeseman-Stutz (CS) score (**Cheeseman and Stutz 1995**), and the holdout logarithmic score (LS) (Cowell et al. 1999). Hill-climbing also requires the specification of a search space and search operators. According to Theorem 3, a natural search space fo...
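The context refers to restricting the search to regular HLC models. A simplified sketch of the cardinality bound behind regularity, as we understand it: every latent node Z with neighbours Z1, ..., Zk must satisfy |Z| ≤ (|Z1| · ... · |Zk|) / max_i |Zi|. The paper's full definition may add conditions (for instance, for latent nodes with only two neighbours) that this sketch omits; the adjacency-map representation and function names are hypothetical.

```python
from math import prod
from typing import Dict, Iterable, List


def violates_bound(z: str, adjacency: Dict[str, List[str]],
                   cardinality: Dict[str, int]) -> bool:
    """True if latent node z breaks |Z| <= prod(|Zi|) / max(|Zi|)."""
    neigh = [cardinality[n] for n in adjacency[z]]
    # Rewritten as |Z| * max <= prod to stay in integer arithmetic.
    return cardinality[z] * max(neigh) > prod(neigh)


def is_regular(adjacency: Dict[str, List[str]], cardinality: Dict[str, int],
               latent: Iterable[str]) -> bool:
    """Simplified regularity check: the bound must hold at every latent node."""
    return not any(violates_bound(z, adjacency, cardinality) for z in latent)


# Star-shaped LC model: latent Y adjacent to four binary manifest variables.
ADJ = {"Y": ["A", "B", "C", "D"], "A": ["Y"], "B": ["Y"], "C": ["Y"], "D": ["Y"]}

if __name__ == "__main__":
    card = {"Y": 3, "A": 2, "B": 2, "C": 2, "D": 2}
    print(is_regular(ADJ, card, ["Y"]))               # True: 3 <= 16 / 2
    print(is_regular(ADJ, {**card, "Y": 9}, ["Y"]))   # False: 9 > 8
```

Intuitively, a latent node with more states than this bound carries states that its neighbours cannot distinguish, so restricting the search to models that satisfy it removes redundant candidates.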

474 | Cluster Analysis - Everitt - 1993 |

Citation Context: ...Cluster analysis is the partitioning of similar objects into meaningful classes, when both the number of classes and the composition of the classes are to be determined (Kaufman and Rousseeuw 1990; **Everitt 1993**). In model-based clustering, it is assumed that the objects under study are generated by a mixture of probability distributions, with one component corresponding to each class. When the attributes of...

231 | Latent Variable Models and Factor Analysis - Bartholomew - 1987 |

Citation Context: ...s, with one component corresponding to each class. When the attributes of objects are continuous, cluster analysis is sometimes called latent profile analysis (Gibson 1959; Lazarsfeld and Henry 1968; **Bartholomew and Knott 1999**; Vermunt and Magidson 2002). When the attributes are categorical, cluster analysis is sometimes called latent class analysis (LCA) (Lazarsfeld and Henry 1968; Goodman 1974b; Bartholomew and Knott 199...

177 | Efficient Approximations for the Marginal Likelihood of Incomplete Data Given a Bayesian Network. UAI-96 - Chickering, Heckerman - 1996 |

141 | Latent Structure Analysis - Lazarsfeld, Henry - 1968 |

Citation Context: ...of probability distributions, with one component corresponding to each class. When the attributes of objects are continuous, cluster analysis is sometimes called latent profile analysis (Gibson 1959; **Lazarsfeld and Henry 1968**; Bartholomew and Knott 1999; Vermunt and Magidson 2002). When the attributes are categorical, cluster analysis is sometimes called latent class analysis (LCA) (Lazarsfeld and Henry 1968; Goodman 1974...

112 | Exploratory Latent Structure Analysis Using Both Identifiable and Unidentifiable Models - Goodman - 1974 |

Citation Context: ...d Henry 1968; Bartholomew and Knott 1999; Vermunt and Magidson 2002). When the attributes are categorical, cluster analysis is sometimes called latent class analysis (LCA) (Lazarsfeld and Henry 1968; **Goodman 1974b**; Bartholomew and Knott 1999; Uebersax 2001). There is also cluster analysis of mixed-mode data (Everitt 1993) where some attributes are continuous while others are categorical. This paper is concern...

72 | Maximum likelihood inference of protein phylogeny and the origin of chloroplasts - Kishino, Miyata, et al. - 1990 |

Citation Context: ...structure in Figure 1. The car... (footnote 5) Node-introduction is similar to an operator that PROTML, a system for inferring phylogenetic trees, uses to search for optimal tree topologies via star decomposition (**Kishino et al. 1990**). The former is slightly less constrained than the latter in that it is allowed to create singly connected nodes as by-products. (footnote 6) Neighbor relocation is related to, but significantly different from, a...
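Node introduction can be sketched on an undirected adjacency map. As we understand the operator, it takes a latent node Z and two of its neighbours and places a new latent node between Z and those two neighbours; giving the new node Z's cardinality is an assumption of this sketch, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

Adjacency = Dict[str, Set[str]]


def introduce_node(adj: Adjacency, card: Dict[str, int], z: str,
                   n1: str, n2: str, new: str) -> Tuple[Adjacency, Dict[str, int]]:
    """Insert latent node `new` between latent node z and its neighbours
    n1 and n2. Returns new structures; the inputs are not modified."""
    if n1 == n2 or n1 not in adj[z] or n2 not in adj[z]:
        raise ValueError("n1 and n2 must be two distinct neighbours of z")
    if new in adj:
        raise ValueError("node name already in use")
    out = {k: set(v) for k, v in adj.items()}
    for n in (n1, n2):
        # Reroute the edge z - n through the new node.
        out[z].discard(n)
        out[n].discard(z)
        out[n].add(new)
    out[z].add(new)
    out[new] = {z, n1, n2}
    new_card = dict(card)
    new_card[new] = card[z]  # assumption: copy z's cardinality
    return out, new_card


if __name__ == "__main__":
    adj = {"Y": {"A", "B", "C", "D"}, "A": {"Y"}, "B": {"Y"},
           "C": {"Y"}, "D": {"Y"}}
    card = {"Y": 2, "A": 2, "B": 2, "C": 2, "D": 2}
    new_adj, new_card = introduce_node(adj, card, "Y", "A", "B", "Y1")
    print(sorted(new_adj["Y"]), sorted(new_adj["Y1"]))
```

Applied repeatedly to a star-shaped LC model, steps like this resolve the star into a deeper tree, which is the sense in which the operator resembles star decomposition.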

56 | Algorithms for model-based Gaussian hierarchical clustering - Fraley - 1997 |

Citation Context: ...C models because we do not know, a priori, the number of latent variables and their cardinalities. HLC models should not be confused with model-based hierarchical clustering (e.g. Hanson et al. 1991, **Fraley 1998**). In an LC model (or similar models with continuous manifest variables), there is only one latent variable and each state of the variable corresponds to a class. HLC models generalize LC models by al...

54 | Stratified exponential families: graphical models and model selection, Ann - Geiger, Heckerman, et al. |

50 | A structural EM algorithm for phylogenetic inference - Friedman, Ninio, et al. |

45 | An introduction to mathematical sociology - Coleman - 1964 |

Citation Context: ...AIC, our algorithm discovered exactly the same model. When LS was used, however, it computed a very different model which does not fit the data well. The second data set is known as the Coleman Data (**Coleman 1964**). It involves four binary manifest variables named A, B, C, and D. There are 3,398 records. This data set has been previously analyzed by Goodman (1974) and Hagenaars (1988). Goodman started with a 2...

43 | PAUP* 4.0: Phylogenetic analysis using parsimony (*and other methods), version 4.0b10 - Swofford - 2003 |

Citation Context: ...Neighbor relocation is related to, but significantly different from, an operator called branch swapping that PAUP, a system for inferring phylogenetic trees, uses to search for optimal tree topologies (**Swofford 1998**). The latter includes what are called nearest neighbor interchange; subtree pruning and regrafting; and tree bisection/reconnection. [AAAI-02, p. 234. Figure: Logarithmic Score]...
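Neighbor relocation, as we read it, moves one neighbour W of a latent node Z over to another latent node adjacent to Z. The sketch below uses a hypothetical adjacency-map representation and does not attempt PAUP's branch-swapping moves.

```python
from typing import Dict, Set

Adjacency = Dict[str, Set[str]]


def relocate_neighbor(adj: Adjacency, latent: Set[str], z: str, w: str,
                      target: str) -> Adjacency:
    """Detach neighbour w from latent node z and attach it to `target`,
    a latent neighbour of z. Returns a new adjacency map."""
    if z not in latent or target not in latent or target not in adj[z]:
        raise ValueError("z and target must be adjacent latent nodes")
    if w not in adj[z] or w == target:
        raise ValueError("w must be a neighbour of z other than target")
    out = {k: set(v) for k, v in adj.items()}
    out[z].discard(w)
    out[w].discard(z)
    out[target].add(w)
    out[w].add(target)
    return out


if __name__ == "__main__":
    adj = {"Y": {"C", "D", "Y1"}, "Y1": {"Y", "A", "B"},
           "A": {"Y1"}, "B": {"Y1"}, "C": {"Y"}, "D": {"Y"}}
    out = relocate_neighbor(adj, {"Y", "Y1"}, "Y", "C", "Y1")
    print(sorted(out["Y"]), sorted(out["Y1"]))
```

Because `target` is adjacent to Z, swapping the edge Z-W for target-W keeps the structure a connected tree, so the move stays inside the space of tree-shaped models.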

41 | Discovering hidden variables: A structure-based approach - Elidan, Lotner, et al. - 2001 |

32 | Bayesian classification with correlation and inheritance - Cheeseman, Hanson, et al. - 1991 |

Citation Context: ...does not work for HLC models because we do not know, a priori, the number of latent variables and their cardinalities. HLC models should not be confused with model-based hierarchical clustering (e.g. **Hanson et al. 1991**, Fraley 1998). In an LC model (or similar models with continuous manifest variables), there is only one latent variable and each state of the variable corresponds to a class. HLC models generalize LC...

26 | Schwarz, Wallace, and Rissanen: Intertwining themes in theories of model order estimation - Lanterman - 2001 |

26 | Latent class cluster analysis - Vermunt, Magidson - 2002 |

Citation Context: ...ponding to each class. When the attributes of objects are continuous, cluster analysis is sometimes called latent profile analysis (Gibson 1959; Lazarsfeld and Henry 1968; Bartholomew and Knott 1999; **Vermunt and Magidson 2002**). When the attributes are categorical, cluster analysis is sometimes called latent class analysis (LCA) (Lazarsfeld and Henry 1968; Goodman 1974b; Bartholomew and Knott 1999; Uebersax 2001). There is...

25 | Learning the dimensionality of hidden variables - Elidan, Friedman - 2001 |

22 | The analysis of systems of qualitative variables when some of the variables are unobservable: Part I: A modified latent structure approach - Goodman - 1974 |

Citation Context: ...d Henry 1968; Bartholomew and Knott 1999; Vermunt and Magidson 2002). When the attributes are categorical, cluster analysis is sometimes called latent class analysis (LCA) (Lazarsfeld and Henry 1968; **Goodman 1974b**; Bartholomew and Knott 1999; Uebersax 2001). There is also cluster analysis of mixed-mode data (Everitt 1993) where some attributes are continuous while others are categorical. This paper is concern...

22 | Discrete factor analysis: Learning hidden variables in Bayesian networks - Martin, VanLehn - 1994 |

17 | Learning with Mixtures of Trees - Meilă-Predoviciu - 1999 |

16 | Dimension correction for hierarchical latent class models - Kočka, Zhang - 2002 |

14 | Constructing Hidden Variables in Bayesian Networks via Conceptual Clustering - Connolly - 1993 |

13 | Penalized likelihood - Green - 1999 |

Citation Context: ...ous. Let M be a parameterized HLC model and D be a set of i.i.d. samples generated by M. If M is not parsimonious, then there must exist another HLC model whose penalized loglikelihood score given D (**Green 1998**, Lanterman 2001) is greater than that of M. This means that, if one uses penalized loglikelihood for model selection, one would prefer this other, parsimonious model over the nonparsimonious model M...
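The parsimony argument rests on a simple fact: when two models reach the same maximised log-likelihood, any penalty that grows with the number of free parameters favours the smaller one. A hypothetical illustration; the model names, log-likelihoods, and parameter counts are made up.

```python
import math
from typing import Callable, Dict, Tuple


def select_model(candidates: Dict[str, Tuple[float, int]],
                 penalty: Callable[[int], float]) -> str:
    """Return the candidate with the highest penalised log-likelihood.

    candidates maps a model name to (maximised log-likelihood, dimension d).
    """
    return max(candidates,
               key=lambda name: candidates[name][0] - penalty(candidates[name][1]))


# Two hypothetical models that fit equally well; the second carries
# redundant parameters.
CANDIDATES = {"parsimonious": (-1000.0, 9), "redundant": (-1000.0, 15)}

if __name__ == "__main__":
    print(select_model(CANDIDATES, lambda d: d))                          # AIC-style
    print(select_model(CANDIDATES, lambda d: 0.5 * d * math.log(1000)))   # BIC-style, N = 1000
```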

10 | Latent structure models with direct effects between indicators: local dependence models - Hagenaars - 1988 |

Citation Context: ...ariables contain overlapping information (Vermunt and Magidson 2002). The local dependence problem has attracted some attention in the LCA literature (Espeland & Handelman 1989; Garrett & Zeger 2000; **Hagenaars 1988**; Vermunt & Magidson 2000). Methods for detecting and modeling local dependence have been proposed. To detect local dependence, one typically compares observed and expected cross-classification frequen...
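One simple way to compare observed and expected cross-classification frequencies for a pair of manifest variables is a Pearson-type statistic over that pair's two-way table, with expected counts taken from a fitted LC model. This is an illustrative sketch: the `prior`/`cond` layout is assumed, and the diagnostics in the cited literature (for example, those in Latent GOLD) may differ in detail. Large values relative to the size of the table suggest the two variables are locally dependent.

```python
from collections import Counter
from typing import Sequence


def pairwise_pearson(data: Sequence[Sequence[int]], prior: Sequence[float],
                     cond: Sequence[Sequence[Sequence[float]]],
                     i: int, j: int) -> float:
    """Pearson-type statistic for the (X_i, X_j) two-way table.

    Expected count of cell (a, b): N * sum_c P(c) P(X_i = a | c) P(X_j = b | c).
    Assumed layout: prior[c] = P(C = c), cond[c][k][v] = P(X_k = v | C = c).
    """
    n = len(data)
    observed = Counter((x[i], x[j]) for x in data)
    r_i, r_j = len(cond[0][i]), len(cond[0][j])
    stat = 0.0
    for a in range(r_i):
        for b in range(r_j):
            p_ab = sum(p_c * cond[c][i][a] * cond[c][j][b]
                       for c, p_c in enumerate(prior))
            expected = n * p_ab
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # A one-class model treats X_0 and X_1 as independent fair coins.
    prior = [1.0]
    cond = [[[0.5, 0.5], [0.5, 0.5]]]
    # These records always agree, so the pair is clearly associated.
    data = [[0, 0], [0, 0], [1, 1], [1, 1]]
    print(pairwise_pearson(data, prior, cond, 0, 1))  # 4.0
```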

9 | Using latent class models to characterize and assess relative error in discrete measurements - Espeland, Handelman - 1989 |

Citation Context: ...assification because locally dependent manifest variables contain overlapping information (Vermunt and Magidson 2002). The local dependence problem has attracted some attention in the LCA literature (**Espeland & Handelman 1989**; Garrett & Zeger 2000; Hagenaars 1988; Vermunt & Magidson 2000). Methods for detecting and modeling local dependence have been proposed. To detect local dependence, one typically compares observed an...

8 | Three multivariate models: factor analysis, latent structure analysis and latent profile analysis - Gibson - 1959 |

Citation Context: ...by a mixture of probability distributions, with one component corresponding to each class. When the attributes of objects are continuous, cluster analysis is sometimes called latent profile analysis (**Gibson 1959**; Lazarsfeld and Henry 1968; Bartholomew and Knott 1999; Vermunt and Magidson 2002). When the attributes are categorical, cluster analysis is sometimes called latent class analysis (LCA) (Lazarsfeld a...

7 | Latent class model diagnosis - Garrett, Zeger - 2000 |

Citation Context: ...y dependent manifest variables contain overlapping information (Vermunt and Magidson 2002). The local dependence problem has attracted some attention in the LCA literature (Espeland & Handelman 1989; **Garrett & Zeger 2000**; Hagenaars 1988; Vermunt & Magidson 2000). Methods for detecting and modeling local dependence have been proposed. To detect local dependence, one typically compares observed and expected cross-classi...

7 | Latent GOLD User's Guide - Vermunt, Magidson - 2000 |

Citation Context: ...overlapping information (Vermunt and Magidson 2002). The local dependence problem has attracted some attention in the LCA literature (Espeland & Handelman 1989; Garrett & Zeger 2000; Hagenaars 1988; **Vermunt & Magidson 2000**). Methods for detecting and modeling local dependence have been proposed. To detect local dependence, one typically compares observed and expected cross-classification frequencies for pairs of manifes...

3 | DSM-III major depressive disorder in the community: a latent class analysis of data from the NIMH epidemiologic catchment area programme - Eaton, Dryman, et al. - 1989 |

2 | A method for predicting individual HIV infection status in the absence of clinical information - Biggar, J, et al. - 1988 |

2 | Statistics for social data analysis (3rd edition). F.E. Peacock - Bohrnstedt, Knoke - 1994 |

2 | Using Latent Class Models to analyze Response Patterns in Epidemiologic Mail Surveys - Kohlmann, Formann - 1997 |

2 | A practical guide to local dependence in latent class models. http://ourworld.compuserve.com/homepages/jsuebersax/condep.htm - Uebersax - 2000 |

2 | Activity and severity of rheumatoid arthritis in Hannover/FRG and in one regional referral center - Wasmus, Kindel, et al. - 1989 |

2 | Part I: A modified latent structure approach |