## Model-Based Hierarchical Clustering (2000)

Venue: Proc. 16th Conf. Uncertainty in Artificial Intelligence (UAI 2000)

Citations: 21 (0 self)

### BibTeX

    @INPROCEEDINGS{Vaithyanathan00model-basedhierarchical,
      author    = {Shivakumar Vaithyanathan and Byron Dom},
      title     = {Model-Based Hierarchical Clustering},
      booktitle = {Proc. 16th Conf. Uncertainty in Artificial Intelligence},
      year      = {2000},
      pages     = {599--608},
      publisher = {UAI}
    }

### Abstract

We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm consists of a two-stage process wherein we first perform a flat clustering followed by a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced...
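The two-stage process the abstract describes can be sketched in Python under simplifying assumptions: each cluster is represented only by its aggregated term-count vector, the Dirichlet prior is symmetric, and stage 1's flat clustering is taken as given. The paper's model additionally chooses, at each merge, which features receive a common distribution over the merged node; that feature-set partitioning is omitted here, and all function names are hypothetical.

```python
import math

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of a cluster's aggregated term counts
    under a multinomial likelihood with a symmetric Dirichlet(alpha)
    prior (a special case of the paper's Eq. 10)."""
    n, m = sum(counts), len(counts)
    return (math.lgamma(alpha * m) - math.lgamma(alpha * m + n)
            + sum(math.lgamma(alpha + t) - math.lgamma(alpha) for t in counts))

def greedy_merge_order(clusters, alpha=1.0):
    """Stage 2 sketch: hierarchical agglomerative merging. Repeatedly
    merge the pair of clusters whose merge yields the largest change
    in total log marginal likelihood; return the list of merges."""
    nodes = [list(c) for c in clusters]
    labels = list(range(len(clusters)))
    merges = []
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                merged = [a + b for a, b in zip(nodes[i], nodes[j])]
                # change in objective from replacing clusters i, j by their merge
                delta = (log_marginal(merged, alpha)
                         - log_marginal(nodes[i], alpha)
                         - log_marginal(nodes[j], alpha))
                if best is None or delta > best[0]:
                    best = (delta, i, j, merged)
        _, i, j, merged = best
        merges.append((labels[i], labels[j]))
        labels = [l for k, l in enumerate(labels) if k not in (i, j)]
        labels.append(f"({merges[-1][0]}+{merges[-1][1]})")
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return merges
```

With three flat clusters whose first two have similar term distributions, e.g. `greedy_merge_order([[10, 0], [8, 1], [0, 10]])`, the first recorded merge is the similar pair `(0, 1)`.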

### Citations

1039 | Bayesian Theory
- Bernardo, Smith
- 1994
Citation Context: ...on recursively until the leaf clusters are reached. A simple example of such a cluster hierarchy is diagrammed in Figure 1. We note here that a model similar to ours has been discussed in [7]. See [3] for an in-depth treatment of the Bayesian statistical paradigm. [Figure 1 tree-diagram residue omitted]

622 | Scatter/Gather: A cluster-based approach to browsing large document collections
- Cutting, Karger, et al.
- 1992
Citation Context: ...of documents. A more formal definition of these noise and useful features is provided in subsequent sections. In passing we note that document clustering is a rich area with several approaches such as [12, 5]. The next section describes this hierarchical model and also derives the objective function, based on marginal likelihood, that describes such a model. In Section 3 we describe some approximate scheme...

126 | Reference Posterior Distributions for Bayesian Inference (with discussion)
- Bernardo
- 1979
Citation Context: ...the $\{\alpha_i\}$ are referred to as hyperparameters. This integration results in the following expression, which is a product of terms that are instances of the Multinomial-Dirichlet distribution [2]. The marginal likelihood can be written as:

$$p(D) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_0 + n)} \prod_{j=1}^{M} \frac{\Gamma(\alpha_j + t_j)}{\Gamma(\alpha_j)}, \quad (10)$$

where $t_j = \sum_{i=1}^{s} t_{i,j}$ and $n = \sum_{j=1}^{M} t_j$. In the document cl...
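The expression in the excerpt above appears to be the Multinomial-Dirichlet (Polya) marginal likelihood. A small numeric check of the reconstructed Eq. (10), computed in log space for stability (the function name is ours):

```python
import math

def log_marginal_likelihood(t, alpha):
    """Reconstructed Eq. (10): log p(D) = log G(a0) - log G(a0 + n)
    + sum_j [log G(a_j + t_j) - log G(a_j)], where G is the gamma
    function, a0 = sum_j a_j, and n = sum_j t_j."""
    a0, n = sum(alpha), sum(t)
    return (math.lgamma(a0) - math.lgamma(a0 + n)
            + sum(math.lgamma(a + tj) - math.lgamma(a)
                  for a, tj in zip(alpha, t)))

# Hand check with alpha = (1, 1) and counts t = (2, 1):
# p(D) = Gamma(2)/Gamma(5) * Gamma(3)*Gamma(2) = (1/24) * 2 = 1/12
print(math.exp(log_marginal_likelihood([2, 1], [1, 1])))  # -> 0.0833...
```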

78 | An Experimental Comparison of Several Clustering and Initialization Methods
- Meila, Heckerman
- 1998
Citation Context: ...ased hierarchical clustering of special cases of Gaussian generative models are described in Fraley [6]. In addition a model-based HAC algorithm based on a multinomial mixture model has been developed [9]. In the rest of the paper our references to HAC will be to the version of HAC used in a likelihood setting as described above. In particular we will be concentrating on multinomial mixture models. Ot...

55 | Algorithms for Model-Based Gaussian Hierarchical Clustering
- Fraley
- 1998
Citation Context: ...s a cluster. A complete description of such model-based clustering can be found in [1]. Extensions to these generative models incorporating hierarchical agglomerative algorithms have also been studied [6]. These algorithms operate by merging clusters such that the resulting likelihood is maximized. Efficient algorithms for model-based hierarchical clustering of special cases of Gaussian generative mod...

51 | The cluster-abstraction model: Unsupervised learning of topic hierarchies from text data
- Hofmann
- 1999
Citation Context: ...ersion of HAC used in a likelihood setting as described above. In particular we will be concentrating on multinomial mixture models. Other hierarchical clustering algorithms in the literature include [8], which describes a scheme to characterize text collections hierarchically based on a deterministic annealing algorithm. In this model, besides the latent variables used for clustering the documents a...

40 | Using Machine Learning to Improve Information Access
- Sahami
- 1998
Citation Context: ...of documents. A more formal definition of these noise and useful features is provided in subsequent sections. In passing we note that document clustering is a rich area with several approaches such as [12, 5]. The next section describes this hierarchical model and also derives the objective function, based on marginal likelihood, that describes such a model. In Section 3 we describe some approximate scheme...

32 | Bayesian classification with correlation and inheritance
- Cheeseman, Hanson, et al.
- 1991
Citation Context: ...ters and so on recursively until the leaf clusters are reached. A simple example of such a cluster hierarchy is diagrammed in Figure 1. We note here that a model similar to ours has been discussed in [7]. See [3] for an in-depth treatment of the Bayesian statistical paradigm. [Figure 1 tree-diagram residue omitted]

26 | Extracting Names from Natural-Language Text (IBM Research Report 20338)
- Ravin, Wacholder
- 1996
Citation Context: ...data corresponding to Figure 3. 4.3.2 Results on a Real Document Collection: For the real-world data set we first performed a feature extraction step where the features were extracted as described in [4, 11]. The feature selection was performed using a distributional clustering algorithm as described in [14]. After feature selection we were left with a total of 14772 tokens. To enable appropriate compari...

24 | Model selection in unsupervised learning with applications to document clustering
- Vaithyanathan, Dom
- 1999
Citation Context: ...objective function. Section 4 discusses the experimental set-up, results and evaluation. Finally Section 5 describes some future work in this area. 2 Models for Unsupervised Learning: In previous work [14, 13] we addressed the problem of flat (non-hierarchical) partitional (each data element belongs to one and only one cluster) clustering. Here we first describe that model in general terms and then extend...

23 | Hierarchical model-based clustering for large datasets
- Posse
Citation Context: ...hm, which we denote by MHAC. The HAC algorithm starts with singleton clusters (i.e., each data point is present in its own cluster) and is thus a computationally very expensive algorithm. As noted in [10], the lower parts of the dendrogram, created thus, provide no useful information. To overcome this problem Posse [10] generated an initial set of clusters using a minimum spanning tree which was the...

16 | Generalized model selection for unsupervised learning in high dimensions
- Vaithyanathan, Dom
- 1999
Citation Context: ...objective function. Section 4 discusses the experimental set-up, results and evaluation. Finally Section 5 describes some future work in this area. 2 Models for Unsupervised Learning: In previous work [14, 13] we addressed the problem of flat (non-hierarchical) partitional (each data element belongs to one and only one cluster) clustering. Here we first describe that model in general terms and then extend...

6 | Lexical Assistance at the Information Retrieval User Interface
- Byrd, Ravin
- 1995
Citation Context: ...data corresponding to Figure 3. 4.3.2 Results on a Real Document Collection: For the real-world data set we first performed a feature extraction step where the features were extracted as described in [4, 11]. The feature selection was performed using a distributional clustering algorithm as described in [14]. After feature selection we were left with a total of 14772 tokens. To enable appropriate compari...

1 | Model-Based Gaussian and Non-Gaussian Clustering
- Banfield, Raftery
- 1993
Citation Context: ...a is generated by a mixture of underlying probability distributions where each of the components can be interpreted as a cluster. A complete description of such model-based clustering can be found in [1]. Extensions to these generative models incorporating hierarchical agglomerative algorithms have also been studied [6]. These algorithms operate by merging clusters such that the resulting likelihood i...