Results 1 - 3 of 3
Agglomerative Hierarchical Clustering with Constraints: Theoretical and Empirical Results
- Lecture Notes in Computer Science, 2005
Cited by 16 (0 self)
Abstract. We explore the use of instance and cluster-level constraints with agglomerative hierarchical clustering. Though previous work has illustrated the benefits of using constraints for non-hierarchical clustering, their application to hierarchical clustering is not straightforward for two primary reasons. First, some constraint combinations make the feasibility problem (Does there exist a single feasible solution?) NP-complete. Second, some constraint combinations, when used with traditional agglomerative algorithms, can cause the dendrogram to stop prematurely in a dead-end solution even though there exist other feasible solutions with a significantly smaller number of clusters. When constraints lead to efficiently solvable feasibility problems and standard agglomerative algorithms do not give rise to dead-end solutions, we empirically illustrate the benefits of using constraints to improve cluster purity and average distortion. Furthermore, we introduce the new γ constraint and use it in conjunction with the triangle inequality to considerably improve the efficiency of agglomerative clustering.
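The abstract does not define the γ constraint, so the following is only a rough sketch of how a distance threshold and the triangle inequality can combine to skip distance computations. It assumes, as a labeled assumption, that γ acts as an upper limit on the distance at which two items may be joined. The function name `pairs_within_gamma` and the single-pivot scheme are hypothetical illustrations, not the authors' algorithm.

```python
import math


def pairs_within_gamma(points, gamma, pivot_index=0):
    """Return (kept, skipped): pairs with distance <= gamma, and the number
    of pairs whose exact distance never had to be computed.

    ASSUMPTION: gamma is treated as a maximum join distance. The source
    abstract names the gamma constraint but does not define it.
    """
    pivot = points[pivot_index]
    # One pass of n distance computations to the pivot.
    to_pivot = [math.dist(p, pivot) for p in points]

    kept, skipped = [], 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Triangle inequality: |d(i, pivot) - d(j, pivot)| <= d(i, j).
            lower_bound = abs(to_pivot[i] - to_pivot[j])
            if lower_bound > gamma:
                skipped += 1  # d(i, j) must exceed gamma; skip the exact distance
                continue
            d = math.dist(points[i], points[j])
            if d <= gamma:
                kept.append((i, j, d))
    return kept, skipped


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(pairs_within_gamma(pts, gamma=2.0))
```

The lower bound is cheap because it reuses the n pivot distances, so any pair it rules out costs no further distance evaluation. How much is saved depends on the data and on the choice of pivot.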
Towards Efficient and Improved Hierarchical Clustering with Instance and Cluster Level Constraints
Cited by 5 (1 self)
Many clustering applications use computationally efficient non-hierarchical clustering techniques such as k-means. However, less efficient hierarchical clustering is desirable because, by creating a dendrogram, the user can choose an appropriate value of k (the number of clusters), and in some domains cluster hierarchies (i.e., clusters within other clusters) naturally exist. In many situations a priori constraints or information are available, for example in the form of a small amount of labeled data. In this paper we explore using constraints to improve the efficiency of agglomerative clustering algorithms. We show that just finding feasible (satisfying all constraints) solutions for some constraint combinations is NP-complete and should be avoided. For a given set of constraints we derive upper (kmax) and lower (kmin) bounds on the value of k where feasible solutions exist. This allows a restricted dendrogram to be created, but its creation is not straightforward. For some combinations of constraints, starting with a feasible clustering solution (k = r) and joining the two closest clusters results in a "dead-end" feasible solution which cannot be further refined to create a feasible solution with r - 1 clusters, even though r - 1 lies between kmin and kmax. For such situations we introduce constraint-driven hierarchical clustering algorithms that will create a complete dendrogram. When traditional algorithms can be used, we illustrate the use of the triangle inequality and a newly defined γ constraint to further improve performance, and use the Markov inequality to bound the expected performance improvement. Preliminary results indicate that using constraints can improve the dendrogram quality.
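The dead-end behaviour described above can be reproduced with a small sketch. It uses plain greedy single-linkage merging that refuses any merge violating a cannot-link constraint; this is a naive baseline chosen for illustration, not the constraint-driven algorithm the paper introduces. The function name `greedy_constrained_merge` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def greedy_constrained_merge(points, cannot_link):
    """Greedy closest-pair agglomeration that never merges two clusters
    containing a cannot-link pair. Stops when no feasible merge remains.

    NOTE: naive illustrative baseline (single linkage), not the paper's method.
    """
    clusters = [{i} for i in range(len(points))]

    def violates(a, b):
        # A merge is infeasible if any cannot-link pair would share a cluster.
        return any((i in a and j in b) or (i in b and j in a)
                   for i, j in cannot_link)

    def linkage(a, b):
        # Single linkage: distance between the closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        candidates = [
            (linkage(a, b), x, y)
            for (x, a), (y, b) in combinations(enumerate(clusters), 2)
            if not violates(a, b)
        ]
        if not candidates:
            break  # dead end: every remaining merge breaks a constraint
        _, x, y = min(candidates)
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    # Four points on a line: indices 0..3 at positions 0, 2, 3, 5.
    pts = [(0.0,), (2.0,), (3.0,), (5.0,)]
    cl = [(0, 2), (1, 3), (0, 3)]
    print(greedy_constrained_merge(pts, cl))
```

With these toy constraints, the partition {0, 1}, {2, 3} is feasible with two clusters. The greedy procedure first joins the closest pair (indices 1 and 2), after which every remaining merge violates a constraint, so it stops at three clusters.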
SPECTRAL AND PROBABILISTIC APPROACHES
This dissertation was produced in accordance with guidelines which permit the inclusion as part of the dissertation the text of an original paper or papers submitted for publication. The dissertation must still conform to all other requirements explained in the “Guide for the Preparation of Master’s Theses and Doctoral Dissertations at The University of Texas at Dallas.” It must include a comprehensive abstract, a full introduction and literature review, and a final overall conclusion. Additional material (procedural and design data as well as descriptions of equipment) must be provided in sufficient detail to allow a clear and precise judgment to be made of the importance and originality of the research reported. It is acceptable for this dissertation to include as chapters authentic copies of papers already published, provided these meet type size, margin and legibility requirements. In such cases, connecting texts which provide logical bridges between different manuscripts are mandatory. Where the student is not the sole author of a manuscript, the student is required to make an explicit statement in the introductory material to that manuscript describing the student’s contribution to the work and acknowledging the contribution of the other authors. The signatures of the Supervising Committee which precede all other material in the dissertation

