## Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms (2003)


### Download Links

- [cs.fit.edu]
- [www.cs.fit.edu]
- [www.sclrules.com]
- DBLP


Citations: 52 (2 self)

### BibTeX

@TECHREPORT{Salvador03determiningthe,
  author      = {Stan Salvador and Philip Chan},
  title       = {Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms},
  institution = {},
  year        = {2003}
}



### Abstract

Many clustering and segmentation algorithms suffer from the limitation that the number of clusters/segments must be specified by a human user. It is often impractical to expect a human with sufficient domain knowledge to be available to select the number of clusters/segments to return. In this paper, we investigate techniques to determine the number of clusters or segments to return from hierarchical clustering and segmentation algorithms. We propose an efficient algorithm, the L method, that finds the “knee” in a ‘number of clusters vs. clustering evaluation metric’ graph. Using the knee is a well-known, but not particularly well-understood, method of determining the number of clusters. We explore the feasibility of this method and attempt to determine the situations in which it will and will not work. We also compare the L method to existing methods in terms of both the accuracy of the determined number of clusters and efficiency. Our results show favorable performance on these criteria compared to the existing methods that were evaluated.
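The L method described in the abstract can be sketched as follows. This is an illustrative reconstruction from the summary above, not the authors' reference implementation: the length-weighted RMSE criterion and the choice of returned split point are assumptions based on the stated idea of fitting a pair of straight lines to the evaluation curve.

```python
import numpy as np

def l_method_knee(x, y):
    """Locate the knee of a '# of clusters vs. evaluation metric' curve:
    test every split point, fit one straight line to the points on each
    side, and keep the split whose two fits have the lowest
    length-weighted total RMSE."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    best_c, best_err = None, np.inf
    for c in range(2, n - 2):              # each line needs >= 2 points
        total = 0.0
        for xs, ys in ((x[:c], y[:c]), (x[c:], y[c:])):
            slope, intercept = np.polyfit(xs, ys, 1)
            resid = ys - (slope * xs + intercept)
            total += len(xs) / n * np.sqrt(np.mean(resid ** 2))
        if total < best_err:
            best_err, best_c = total, c
    return x[best_c]                       # x-value at the junction of the two lines
```

On an idealized curve that drops steeply and then flattens, the returned value lands at (or immediately after) the junction of the two regimes.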

### Citations

1178 | A density-based algorithm for discovering clusters in large spatial databases with noise
- Ester, Kriegel, et al.
- 1996
Citation Context: ...sed, and hierarchical. Partitioning algorithms, such as K-means and PAM [14], iteratively refine a set of k clusters and do not scale well for larger data sets. Density-based algorithms, e.g., DBSCAN [3] and DENCLUE [9], are able to efficiently produce clusters of arbitrary shape, and are also able to handle outliers. If the density of a region is above a specified threshold, those points are assigne...

614 | Efficient and effective clustering methods for spatial data mining
- Ng, Han
- 1994
Citation Context: ...rom data points in other clusters. There are four main categories of clustering algorithms: partitioning, density-based, grid-based, and hierarchical. Partitioning algorithms, such as K-means and PAM [14], iteratively refine a set of k clusters and do not scale well for larger data sets. Density-based algorithms, e.g., DBSCAN [3] and DENCLUE [9], are able to efficiently produce clusters of arbitrary s...

575 | Cure: an efficient clustering algorithm for large databases
- Guha, Rastogi, et al.
- 1998
Citation Context: ...ithms can be either agglomerative or divisive. The agglomerative (bottom-up) approach repeatedly merges two clusters, while the divisive (top-down) approach repeatedly splits a cluster into two. CURE [6] and Chameleon [10] are examples of two hierarchical clustering algorithms. Hierarchical algorithms differ chiefly in the criteria that they use to determine similarity between clusters. Segmentation ...

298 | How many clusters? Which clustering method? Answers Via Model Based Cluster Analysis
- Fraley, Raftery
- 1998
Citation Context: ... to find a model that fits the data as accurately as possible, but also minimizes the complexity of the model. Specific methods to penalize models based on their complexity are: MML [1], MDL [8], BIC [5], AIC, and SIC [19]. Permutation tests [22] attempt to prevent the creation of a PLA that over-fits the data by comparing the relative change in approximation error to the relative change of a ‘random...

277 | Estimating the Number of Clusters in a Data Set via the Gap Statistic
- Tibshirani, Walther, et al.
- 2001
Citation Context: ...tically evaluate each point in the error curve, and use the point that either minimizes or maximizes some function as the number of clusters/segments to return. Such methods include the Gap statistic [21] and prediction strength [20]. These methods generally (with the exception of hierarchical algorithms) require the entire clustering or segmentation algorithm to be run for each potential value of k...

221 | BIRCH: An Efficient Data Clustering Method For Very Large Databases
- Zhang, Ramakrishnan, et al.
- 1996
Citation Context: ...g the number of clusters. The TURN* [4][2] algorithm locates the knee of a curve by locating the point where the 2nd derivative increases above a user-specified threshold. A variant [2] of the BIRCH [23] algorithm uses a mixture of the Bayesian Information Criterion (BIC) and the ratio-change between inter-cluster distance and the number of clusters. Finding the Knee of a Curve. The knee of a curve i...

215 | CHAMELEON: A hierarchical clustering algorithm using dynamic modeling
- Karypis, Han, et al.
Citation Context: ...r agglomerative or divisive. The agglomerative (bottom-up) approach repeatedly merges two clusters, while the divisive (top-down) approach repeatedly splits a cluster into two. CURE [6] and Chameleon [10] are examples of two hierarchical clustering algorithms. Hierarchical algorithms differ chiefly in the criteria that they use to determine similarity between clusters. Segmentation Algorithms. Segment...

177 | WaveCluster: A multiresolution clustering approach for very large spatial databases
- Sheikholeslami, Chatterjee, et al.
- 1998
Citation Context: ...ndle outliers. If the density of a region is above a specified threshold, those points are assigned to a cluster; otherwise they are considered to be noise. Grid-based algorithms, such as WaveCluster [17], reduce the clustering space into a grid of cells, enabling efficient clustering of very large datasets. Hierarchical algorithms can be either agglomerative or divisive. The agglomerative (bottom-up)...

167 | Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data
- Monti
Citation Context: ... a ‘random’ time series. If the errors are changing at a similar rate, then more segments would fit noise and not the underlying structure of the time series. Resampling [15] and Consensus Clustering [13] attempt to find the correct number of clusters by clustering many samples of the data set, and determining the number of clusters where clusterings of the various samples are the most “stable.” Locat...

153 | An Online Algorithm for Segmenting Time Series
- Keogh, Chu, et al.
- 2001
Citation Context: ... segment can be thought of as a cluster. However, segmentation algorithms typically create a finer-grained partitioning than clustering algorithms. There are three approaches to time series segmentation [11]. First, in the sliding window approach, a segment is grown until the error of the line is above a specified threshold, then a new segment is started. Second, in the top-down approach, a segment repre...
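The sliding-window approach described in this excerpt can be sketched as below. The RMSE-of-a-fitted-line error measure and the `max_error` threshold are illustrative assumptions; the survey in [11] covers several possible error measures.

```python
import numpy as np

def sliding_window_segments(series, max_error):
    """Sliding-window time-series segmentation: grow a segment until the
    RMSE of its best-fit line exceeds max_error, then start a new
    segment at the point that broke the fit.  Returns (start, end)
    index pairs, inclusive."""
    series = np.asarray(series, float)
    segments = []
    start, end = 0, 2                      # a line needs >= 2 points
    while end <= len(series):
        x = np.arange(start, end)
        slope, intercept = np.polyfit(x, series[start:end], 1)
        err = np.sqrt(np.mean((series[start:end] - (slope * x + intercept)) ** 2))
        if err > max_error:
            segments.append((start, end - 2))  # close at the last good point
            start = end - 1                    # restart at the offending point
            end = start + 2
        else:
            end += 1
    segments.append((start, len(series) - 1))  # final (still growing) segment
    return segments
```

For example, a step series that is flat at 0 for ten points and flat at 100 for ten more splits cleanly into two segments at the jump.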

150 | Model selection and the principle of minimum description length
- Hansen, Yu
- 2001
Citation Context: ... attempts to find a model that fits the data as accurately as possible, but also minimizes the complexity of the model. Specific methods to penalize models based on their complexity are: MML [1], MDL [8], BIC [5], AIC, and SIC [19]. Permutation tests [22] attempt to prevent the creation of a PLA that over-fits the data by comparing the relative change in approximation error to the relative change of ...

62 | Cluster validation by Prediction Strength
- Tibshirani, Walther
- 2005
Citation Context: ...n the error curve, and use the point that either minimizes or maximizes some function as the number of clusters/segments to return. Such methods include the Gap statistic [21] and prediction strength [20]. These methods generally (with the exception of hierarchical algorithms) require the entire clustering or segmentation algorithm to be run for each potential value of k. The majority of these methods...

60 | The UCR Time Series Data Mining Archive (http://www.cs.ucr.edu/~eamonn/TSDMA/index.html)
- Keogh, Folias
- 2002
Citation Context: ...om noise added (2,000 pts, not in Figure 7). 3. The same as #1, but with a substantial amount of random noise added (2,000 pts). 4. An ECG of a pregnant woman from the Time Series Data Mining Archive [12]. It contains a recurring pattern (a heart beat) that is repeated 13 times (2,500 pts). 5. Measurements from a sensor in an industrial dryer (from the Time Series Data Mining Archive [12]). The time s...

45 | Subspace information criterion for model selection
- Sugiyama, Ogawa
- 2001
Citation Context: ...hat fits the data as accurately as possible, but also minimizes the complexity of the model. Specific methods to penalize models based on their complexity are: MML [1], MDL [8], BIC [5], AIC, and SIC [19]. Permutation tests [22] attempt to prevent the creation of a PLA that over-fits the data by comparing the relative change in approximation error to the relative change of a ‘random’ time series. If t...

37 | A Robust and Scalable Clustering Algorithm for Mixed Type Attributes
- Chiu, Fang, et al.
- 2001
Citation Context: ...t find the knee of a curve also only work well when the clusters are well separated. A few existing clustering algorithms have built-in mechanisms for determining the number of clusters. The TURN* [4][2] algorithm locates the knee of a curve by locating the point where the 2nd derivative increases above a user-specified threshold. A variant [2] of the BIRCH [23] algorithm uses a mixture of the Bayes...

32 | An Efficient Approach to Clustering in Large Multimedia Databases with Noise
- Hinneburg, Keim
- 1998
Citation Context: ...hical. Partitioning algorithms, such as K-means and PAM [14], iteratively refine a set of k clusters and do not scale well for larger data sets. Density-based algorithms, e.g., DBSCAN [3] and DENCLUE [9], are able to efficiently produce clusters of arbitrary shape, and are also able to handle outliers. If the density of a region is above a specified threshold, those points are assigned to a cluster; ...

23 | A resampling approach to cluster validation
- Roth, Lange, et al.
- 2002
Citation Context: ...rror to the relative change of a ‘random’ time series. If the errors are changing at a similar rate, then more segments would fit noise and not the underlying structure of the time series. Resampling [15] and Consensus Clustering [13] attempt to find the correct number of clusters by clustering many samples of the data set, and determining the number of clusters where clusterings of the various sample...

23 | Learning states and rules for time series anomaly detection
- Salvador, Chan, et al.
- 2004
Citation Context: ...ird, the bottom-up approach typically starts off with n/2 segments, and the two most similar adjacent segments are merged until the desired number of segments or the error threshold is reached. Gecko [16] is a bottom-up segmentation algorithm that merges segments based on slope and creates an initial fine-grain approximation by first performing a top-down pass. Determining the Number of Clusters/Segme...

23 | Clustering using Monte Carlo cross-validation
- Smyth
- 1996
Citation Context: ...n, permutation tests, resampling, and finding the knee of an error curve. Cross-validation techniques create models that attempt to fit the data as accurately as possible. Monte Carlo cross-validation [18][17] has been successfully used to prevent over-fitting (too many clusters/segments). Penalized likelihood estimation also attempts to find a model that fits the data as accurately as possible, but al...

15 | Estimating the number of segments in time series data using permutation tests
- Vasko, Toivonen
Citation Context: ...urately as possible, but also minimizes the complexity of the model. Specific methods to penalize models based on their complexity are: MML [1], MDL [8], BIC [5], AIC, and SIC [19]. Permutation tests [22] attempt to prevent the creation of a PLA that over-fits the data by comparing the relative change in approximation error to the relative change of a ‘random’ time series. If the errors are changing a...

13 | A Parameterless Method for Efficiently Discovering Clusters of Arbitrary Shape in Large Datasets
- Foss, Zaiane
- 2002
Citation Context: ...that find the knee of a curve also only work well when the clusters are well separated. A few existing clustering algorithms have built-in mechanisms for determining the number of clusters. The TURN* [4][2] algorithm locates the knee of a curve by locating the point where the 2nd derivative increases above a user-specified threshold. A variant [2] of the BIRCH [23] algorithm uses a mixture of the Ba...

9 | The Kindest Cut: Minimum Message Length Segmentation
- Baxter, Oliver
- 1996
Citation Context: ...tion also attempts to find a model that fits the data as accurately as possible, but also minimizes the complexity of the model. Specific methods to penalize models based on their complexity are: MML [1], MDL [8], BIC [5], AIC, and SIC [19]. Permutation tests [22] attempt to prevent the creation of a PLA that over-fits the data by comparing the relative change in approximation error to the relative c...

7 | An objective analysis of the pressure-volume curve in the acute respiratory distress syndrome
- Harris, Hess, et al.
- 2000
Citation Context: ...oints. 2. The largest ratio difference between two points [2]. 3. The first data point with a second derivative above some threshold value [3][4]. 4. The data point with the largest second derivative [7]. 5. The point on the curve that is furthest from a line fitted to the entire curve. 6. Our L-method, which finds the boundary between the pair of straight lines that most closely fit the curve. This ...
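Two of the simpler knee-locating heuristics in this list can be sketched as follows: #4 (the data point with the largest second derivative) and #5 (the point furthest from a line through the curve, here approximated by the chord between the curve's endpoints, a common simplification). Both functions are illustrative; the second-derivative version assumes evenly spaced evaluation points.

```python
import numpy as np

def knee_by_second_derivative(y):
    """Heuristic #4: index of the point with the largest (absolute)
    discrete second derivative, assuming unit spacing."""
    y = np.asarray(y, float)
    d2 = y[2:] - 2 * y[1:-1] + y[:-2]      # central difference, points 1..n-2
    return int(np.argmax(np.abs(d2))) + 1  # +1: d2[0] belongs to point 1

def knee_by_distance_from_chord(x, y):
    """Heuristic #5 (chord variant): index of the point furthest, in
    perpendicular distance, from the straight line joining the curve's
    first and last points."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    dist = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
    dist /= np.hypot(y1 - y0, x1 - x0)
    return int(np.argmax(dist))
```

On a curve that falls steeply and then flattens, both heuristics pick the elbow point, though the second-derivative rule is noticeably more sensitive to noise than the chord-distance rule.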