Rearrangement clustering: Pitfalls, remedies, and applications
- Journal of Machine Learning Research, 2006
Cited by 5 (0 self)
Given a matrix of values in which the rows correspond to objects and the columns correspond to features of the objects, rearrangement clustering is the problem of rearranging the rows of the matrix such that the sum of the similarities between adjacent rows is maximized. Referred to by various names and reinvented several times, this clustering technique has been extensively used in many fields over the last three decades. In this paper, we point out two critical pitfalls that have been previously overlooked. The first pitfall is deleterious when rearrangement clustering is applied to objects that form natural clusters. The second concerns a similarity metric that is commonly used. We present an algorithm that overcomes these pitfalls. This algorithm is based on a variation of the Traveling Salesman Problem. It offers an extra benefit as it automatically determines cluster boundaries. Using this algorithm, we optimally solve four benchmark problems and a 2,467-gene expression data clustering problem. As expected, our new algorithm identifies better clusters than those found by previous approaches in all five cases. Overall, our results demonstrate the benefits of rectifying the pitfalls and exemplify the usefulness of this clustering technique. Our code is available at our websites.
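The objective stated in the abstract (maximize the sum of similarities between adjacent rows) can be illustrated with a minimal sketch. This is not the authors' algorithm: it searches all row orderings by brute force, which is only practical for a handful of rows, and the similarity measure (negative Euclidean distance) and the function names `similarity`, `adjacency_score`, and `best_order` are assumptions chosen for illustration.

```python
import math
from itertools import permutations


def similarity(a, b):
    # Assumed similarity: negative Euclidean distance (closer rows score higher).
    return -math.dist(a, b)


def adjacency_score(rows, order):
    # Sum of similarities between each pair of neighbouring rows in the ordering.
    return sum(
        similarity(rows[order[i]], rows[order[i + 1]])
        for i in range(len(order) - 1)
    )


def best_order(rows):
    # Exhaustive search over all orderings; feasible only for very small inputs.
    best = max(permutations(range(len(rows))),
               key=lambda p: adjacency_score(rows, p))
    return list(best)


if __name__ == "__main__":
    rows = [[0, 0], [10, 10], [1, 1], [11, 11]]
    order = best_order(rows)
    print(order, [rows[i] for i in order])
```

Exhaustive search grows factorially with the number of rows, which is why the problem is naturally cast as a Traveling Salesman Problem variant and handed to a dedicated solver.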
Supplementary information:
Given a matrix of values, rearrangement clustering involves rearranging the rows of the matrix and identifying cluster boundaries within the linear ordering of the rows. The TSP+k algorithm for rearrangement clustering was presented in [3] and its implementation is described in this note. Using this code, we solve a 2,467-gene expression data clustering problem and identify “good” clusters that contain close to eight times the number of genes that were clustered by Eisen et al. (1998). Furthermore, we identify 106 functional groups that were overlooked in that paper. We make our implementation available to the general public for applications of gene expression data analysis. Availability: C++ source code is freely available at
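To make "identifying cluster boundaries within the linear ordering" concrete, here is a minimal sketch of one simple way to cut an already ordered list of rows into k groups. It is a stand-in heuristic, not the TSP+k algorithm described in the note: it places boundaries at the k-1 largest gaps between neighbouring rows. The function name `cut_into_clusters` and the use of Euclidean distance are assumptions for illustration.

```python
import math


def cut_into_clusters(ordered_rows, k):
    # Split an ordered list of rows into k contiguous clusters by cutting
    # at the k-1 largest distances between neighbouring rows.
    if not 1 <= k <= len(ordered_rows):
        raise ValueError("k must be between 1 and the number of rows")

    # Distance between each row and its successor in the ordering.
    gaps = [
        math.dist(ordered_rows[i], ordered_rows[i + 1])
        for i in range(len(ordered_rows) - 1)
    ]

    # Positions of the k-1 largest gaps, restored to ordering sequence.
    largest = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    cut_after = sorted(largest[: k - 1])

    clusters, start = [], 0
    for i in cut_after:
        clusters.append(ordered_rows[start : i + 1])
        start = i + 1
    clusters.append(ordered_rows[start:])
    return clusters


if __name__ == "__main__":
    ordered = [[0, 0], [1, 1], [10, 10], [11, 11]]
    print(cut_into_clusters(ordered, 2))
```

Unlike this two-step heuristic, the note describes ordering and boundary detection as parts of a single rearrangement clustering problem.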

