## Likelihood based hierarchical clustering (2004)

Venue: IEEE Trans. on Signal Processing

Citations: 14 (5 self)

### BibTeX

@ARTICLE{Castro04likelihoodbased,
author = {R. M. Castro and M. J. Coates and R. D. Nowak},
title = {Likelihood based hierarchical clustering},
journal = {IEEE Trans. on Signal Processing},
year = {2004},
volume = {52},
pages = {2308--2321}
}

### Abstract

This paper develops a new method for hierarchical clustering. Unlike other existing clustering schemes, our method is based on a generative, tree-structured model that represents relationships between the objects to be clustered, rather than directly modeling properties of objects themselves. In certain problems, this generative model naturally captures the physical mechanisms responsible for relationships among objects, for example, in certain evolutionary tree problems in genetics and communication network topology identification. The paper examines the networking problem in some detail, to illustrate the new clustering method. More broadly, the generative model may not reflect actual physical mechanisms, but it nonetheless provides a means for dealing with errors in the similarity matrix, simultaneously promoting two desirable features in clustering: intra-class similarity and inter-class dissimilarity.

### Citations

2307 |
Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ... do not reflect the underlying model we wish to identify. The usual strategy for tackling this issue is to weight the complexity of the models involved, penalizing models that are more complex [22], [23]. The basic idea behind penalized estimators is to find the “best” simple model. In this approach, instead of maximizing the profile likelihood we will maximize the functional L_λ(x|T) = L(x|T)exp(−λn(... |

2243 |
Equation of state calculations by fast computing machines
- Metropolis, Rosenbluth, et al.
- 1953
Citation Context ...g [18]. We do not pursue such enhancements in this paper, but believe that it is a fertile avenue for future work. One possible way to perform the sampling is to use the Metropolis-Hastings algorithm [19], [20]. For this we need to construct an irreducible Markov chain with state space F (note that in this case the state space is finite), so that each state corresponds to a tree. We allow only certain ... |

1300 | Data clustering: A review - Jain, Murty, et al. - 1999 |

1217 |
Monte Carlo sampling methods using Markov chains and their applications
- Hastings
Citation Context .... We do not pursue such enhancements in this paper, but believe that it is a fertile avenue for future work. One possible way to perform the sampling is to use the Metropolis-Hastings algorithm [19], [20]. For this we need to construct an irreducible Markov chain with state space F (note that in this case the state space is finite), so that each state corresponds to a tree. We allow only certain transi... |

546 |
Markov Chain Monte Carlo in practice
- Gilks, Richardson, et al.
- 1996
Citation Context ...m we present is relatively simple and serves mainly to illustrate the basic methodology. There are many strategies that can be applied to improve the performance, such as smart restarts and annealing [18]. We do not pursue such enhancements in this paper, but believe that it is a fertile avenue for future work. One possible way to perform the sampling is to use the Metropolis-Hastings algorithm [19], ... |

498 |
Stochastic Complexity
- Rissanen
- 1989
Citation Context ...y, and do not reflect the underlying model we wish to identify. The usual strategy for tackling this issue is to weight the complexity of the models involved, penalizing models that are more complex [22], [23]. The basic idea behind penalized estimators is to find the “best” simple model. In this approach, instead of maximizing the profile likelihood we will maximize the functional L_λ(x|T) = L(x|T)ex... |

444 | Unsupervised learning by probabilistic latent semantic analysis
- Hofmann
Citation Context ...ovariance matrix σ²I) this leads essentially to the well known Unweighted Pair-Group Average Method (UPGMA) [10]. Other model-based approaches have been used for hierarchical clustering in the past [11]–[14]. Such methods usually model the clusters directly, as Gaussian components, for example. The cluster models induce a distance measure. The work of Banfield and Raftery [5] is representative of th... |

312 |
Model-based Gaussian and non-Gaussian clustering
- Banfield, Raftery
- 1993
Citation Context ...ction of a tree that fits the particular realization of the measurements instead of the true underlying tree structure. Hierarchical clustering is a widely used approach, which has a long history [1]–[5] and is especially popular for document clustering [6]–[9]. Most approaches to hierarchical clustering are agglomerative algorithms that follow a simple methodology [10], and proceed by repeatedly app... |

216 |
BIRCH: An efficient data clustering method for very large databases
- Zhang, Ramakrishnan, et al.
- 1996
Citation Context ...e measurements instead of the true underlying tree structure. Hierarchical clustering is a widely used approach, which has a long history [1]–[5] and is especially popular for document clustering [6]–[9]. Most approaches to hierarchical clustering are agglomerative algorithms that follow a simple methodology [10], and proceed by repeatedly applying four steps: (i) choose the pair of nodes with the hi... |

182 |
Finding Groups in Data
- Kaufman, Rousseeuw
- 1990
Citation Context ...h, which has a long history [1]–[5] and is especially popular for document clustering [6]–[9]. Most approaches to hierarchical clustering are agglomerative algorithms that follow a simple methodology [10], and proceed by repeatedly applying four steps: (i) choose the pair of nodes with the highest similarity; (ii) merge the pair into a new node/cluster; (iii) update the similarities between the new no... |

134 |
Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions
- Tanner
- 1996
Citation Context ...ee in the feasible set F′ is equally likely, and trees outside this set have probability zero. There are numerous ways of drawing samples from a distribution that is known up to a normalizing factor [17]. Here we outline an MCMC approach. The algorithm we present is relatively simple and serves mainly to illustrate the basic methodology. There are many strategies that can be applied to improve the pe... |

106 |
A survey of recent advances in hierarchical clustering algorithms
- Murtagh
- 1983
Citation Context ..., revealing that nodes 8 and 9 have a common parent in the tree. Using a simple agglomerative bottom-up procedure, following the same conceptual framework as many hierarchical clustering methods [1], [4], one can recover the underlying tree. The following result, appearing in a networking context in [15], ensures that for a similarity matrix γ satisfying the monotonicity property, the set of pairwise... |

105 | Optimization and simplification of hierarchical clusterings - Fisher - 1995 |

91 | Inference of multicast routing trees and bottleneck bandwidths using end-to-end measurements
- Ratnasamy, McCanne
- 1999
Citation Context ... connections between these. The problem we address is the identification of the network topology based on end-to-end measurements, which is a practical problem relevant in the networking community [25]–[28]. Knowledge of the network topology is essential for tasks like monitoring and provisioning a network. There are tools, such as traceroute, that rely on close cooperation from the network interna... |

85 | Network tomography: recent developments
- Castro, Coates, et al.
Citation Context ...onnections between these. The problem we address is the identification of the network topology based on end-to-end measurements, which is a practical problem relevant in the networking community [25]–[28]. Knowledge of the network topology is essential for tasks like monitoring and provisioning a network. There are tools, such as traceroute, that rely on close cooperation from the network internal dev... |

79 | Implementing agglomerative hierarchic clustering algorithms for use in document retrieval - Voorhees - 1986 |

75 |
Recent trends in hierarchic document clustering: A critical review
- Willett
- 1988
Citation Context ...f the measurements instead of the true underlying tree structure. Hierarchical clustering is a widely used approach, which has a long history [1]–[5] and is especially popular for document clustering [6]–[9]. Most approaches to hierarchical clustering are agglomerative algorithms that follow a simple methodology [10], and proceed by repeatedly applying four steps: (i) choose the pair of nodes with th... |

74 | An Analysis of Recent Work on Clustering Algorithms
- Fasulo
- 1999
Citation Context ...selection of a tree that fits the particular realization of the measurements instead of the true underlying tree structure. Hierarchical clustering is a widely used approach, which has a long history [1]–[5] and is especially popular for document clustering [6]–[9]. Most approaches to hierarchical clustering are agglomerative algorithms that follow a simple methodology [10], and proceed by repeatedly... |

73 | Maximum Likelihood Network Topology Identification from Edge-Based Unicast Measurements
- Coates, Castro, et al.
Citation Context ... the measurement is based on active probing, involving the transmission of probe packets from the source to the receivers. In earlier work, we proposed a metric based on delay difference measurements [27]. The delay difference measurements provide (noisy) versions of a metric related to the number of shared queues in the paths to two receivers. More precisely, if the constituent links of the shared pa... |

60 | Multicast topology inference from measured end-to-end loss
- Duffield, Horowitz, et al.
- 2002
Citation Context ...up procedure, following the same conceptual framework as many hierarchical clustering methods [1], [4], one can recover the underlying tree. The following result, appearing in a networking context in [15], ensures that for a similarity matrix γ satisfying the monotonicity property, the set of pairwise similarities completely determines the tree. Proposition 1: Let T be a tree topology with object set ... |

56 | Algorithms for model-based Gaussian hierarchical clustering - Fraley - 1997 |

49 |
Integrated likelihood methods for eliminating nuisance parameters. Stat. Sci.
- Berger, Liseo, et al.
- 1999
Citation Context ...t primarily interested in γ̂(x), an estimate of γ from the measurements, hence we can regard γ as a nuisance parameter. In that case (3) can be interpreted as a maximization of the profile likelihood [16] L(x|T) ≡ sup_{γ∈G(T)} p(x|γ). (4) The solution of (3) is referred to as the Maximum Likelihood Tree (MLT). Searching for this tree is our attempt to find the tree associated with the unknown pairwise ... |

37 | Interpreting and Extending Classical Agglomerative Clustering Algorithms Using a Model-Based Approach
- Kamvar, Klein, et al.
Citation Context ...ance matrix σ²I) this leads essentially to the well known Unweighted Pair-Group Average Method (UPGMA) [10]. Other model-based approaches have been used for hierarchical clustering in the past [11]–[14]. Such methods usually model the clusters directly, as Gaussian components, for example. The cluster models induce a distance measure. The work of Banfield and Raftery [5] is representative of this st... |

21 | Hierarchic Document Clustering Using Ward's Method - El-Hamdouchi, Willett - 1986 |

21 | Model-based hierarchical clustering - Vaithyanathan, Dom - 2000 |

21 | Multicast topology inference from end-to-end measurements - Duffield, Horowitz, et al. - 2000 |

2 |
Learning in Graphical Models
- Jordan
- 1998
Citation Context ...he choice of penalty parameter for larger numbers of objects, under the Gaussian model. As a closing remark on this Section, notice that a simple procedure, inspired by Simulated Annealing techniques [24], can be used in the context of the MCMC method developed. The idea is to start simulating the Markov chain with a large penalty parameter λ and reduce it gradually (according to some “cooling schedul... |
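
Several of the citation contexts above quote the generic agglomerative procedure that the paper contrasts itself with: (i) choose the pair of nodes with the highest similarity, (ii) merge the pair into a new node, (iii) update the similarities to the new node. As a rough illustration only, and not the likelihood-based method of the paper, that loop might be sketched in Python as below. The function name `agglomerate`, the dense similarity-matrix input, and the size-weighted average (UPGMA-style) update are assumptions made for this sketch; the quoted text is truncated before it states an update rule, and the sketch simply repeats until a single cluster remains.

```python
def agglomerate(labels, sim):
    """Greedy bottom-up clustering on a symmetric similarity matrix.

    labels: non-empty list of object names.
    sim:    square list of lists; sim[i][j] is the similarity of objects i, j.
    Returns the merge tree as nested 2-tuples of labels.
    """
    n = len(labels)
    # Each active cluster id maps to (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pairwise similarities keyed by (smaller id, larger id).
    s = {(i, j): sim[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # (i) pick the most similar pair of active clusters.
        a, b = max(s, key=s.get)
        # (ii) merge the pair into a new node.
        (tree_a, size_a) = clusters.pop(a)
        (tree_b, size_b) = clusters.pop(b)
        del s[(a, b)]
        # (iii) update similarities to the new node.
        # ASSUMPTION: size-weighted average, as in UPGMA-style linkage.
        for c in clusters:
            s_ac = s.pop((min(a, c), max(a, c)))
            s_bc = s.pop((min(b, c), max(b, c)))
            s[(c, next_id)] = (size_a * s_ac + size_b * s_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    # Toy similarity matrix: A and B are close, C and D are close.
    toy = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.8],
        [0.1, 0.2, 0.8, 1.0],
    ]
    print(agglomerate(["A", "B", "C", "D"], toy))
```

Because each merge is chosen greedily from the observed similarities, a noisy matrix can lock in a wrong merge early; the abstract's point is that a generative tree model offers a way to deal with such errors in the similarity matrix.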