Results 1 - 2 of 2
Overcoming the Curse of Dimensionality in Clustering by means of the Wavelet Transform
The Computer Journal, 2000
Cited by 10 (3 self)
We use a redundant wavelet transform analysis to detect clusters in high-dimensional data spaces. We overcome Bellman's "curse of dimensionality" in such problems by (i) using some canonical ordering of observation and variable (document and term) dimensions in our data, (ii) applying a wavelet transform to such canonically ordered data, (iii) modeling the noise in wavelet space, (iv) defining significant component parts of the data as opposed to insignificant or noisy component parts, and (v) reading off the resultant clusters. The overall complexity of this innovative approach is linear in the data dimensionality. We describe a number of examples and test cases, including the clustering of high-dimensional hypertext data.

1 Introduction

Bellman's (1961) [1] "curse of dimensionality" refers to the exponential growth of hypervolume as a function of dimensionality. All problems become tougher as the dimensionality increases. Nowhere is this more evident than in problems related to ...
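Steps (ii) to (iv) of the abstract above can be illustrated on a one-dimensional toy signal. The abstract does not say which redundant transform or noise model the authors use, so everything specific here is an assumption: an à trous-style transform with a B3-spline kernel, periodic boundaries, and a k-sigma threshold with sigma estimated from the median absolute deviation. The names `a_trous` and `significant` are hypothetical, and the canonical ordering of step (i) and the cluster read-off of step (v) are not covered.

```python
import statistics

# B3-spline smoothing kernel (an assumed choice; the abstract names no filter).
KERNEL = (1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16)


def a_trous(signal, levels):
    """Redundant (undecimated) transform: returns (wavelet_planes, smooth).

    Every plane has the same length as the input, and the planes plus the
    final smooth array sum back to the original signal.
    """
    n = len(signal)
    current = list(signal)
    planes = []
    for j in range(levels):
        step = 2 ** j  # the "holes" between kernel taps double at each level
        smooth = [
            sum(h * current[(k + (tap - 2) * step) % n]  # periodic boundary
                for tap, h in enumerate(KERNEL))
            for k in range(n)
        ]
        planes.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return planes, current


def significant(plane, k=3.0):
    """Flag coefficients above k * sigma, with sigma estimated from the MAD."""
    med = statistics.median(plane)
    mad = statistics.median(abs(w - med) for w in plane)
    sigma = mad / 0.6745  # MAD-to-sigma factor, assuming Gaussian noise
    return [abs(w) > k * sigma for w in plane]


# Toy ordered data: one dense run on an otherwise empty axis.
signal = [0.0] * 8 + [5.0] * 4 + [0.0] * 8
planes, smooth = a_trous(signal, levels=3)
masks = [significant(p) for p in planes]
reconstructed = [sum(vals) + s for vals, s in zip(zip(*planes), smooth)]
```

Each level touches every sample a fixed number of times, so the cost of this sketch grows linearly with the signal length for a fixed number of levels. Because the transform is redundant, the significance masks line up index by index with the ordered input.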
Rearrangement clustering: Pitfalls, remedies, and applications
Journal of Machine Learning Research, 2006
Cited by 6 (0 self)
Given a matrix of values in which the rows correspond to objects and the columns correspond to features of the objects, rearrangement clustering is the problem of rearranging the rows of the matrix such that the sum of the similarities between adjacent rows is maximized. Referred to by various names and reinvented several times, this clustering technique has been extensively used in many fields over the last three decades. In this paper, we point out two critical pitfalls that have been previously overlooked. The first pitfall is deleterious when rearrangement clustering is applied to objects that form natural clusters. The second concerns a similarity metric that is commonly used. We present an algorithm that overcomes these pitfalls. This algorithm is based on a variation of the Traveling Salesman Problem. It offers an extra benefit as it automatically determines cluster boundaries. Using this algorithm, we optimally solve four benchmark problems and a 2,467-gene expression data clustering problem. As expected, our new algorithm identifies better clusters than those found by previous approaches in all five cases. Overall, our results demonstrate the benefits of rectifying the pitfalls and exemplify the usefulness of this clustering technique. Our code is available at our websites.
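The objective stated at the start of this abstract, maximizing the summed similarity of adjacent rows, can be made concrete with a tiny brute-force sketch. This is not the authors' TSP-based algorithm. It enumerates every row order, so it is only usable for a handful of rows. The similarity function (negative squared Euclidean distance) and the names `similarity`, `adjacency_score`, and `best_order` are assumptions made for illustration.

```python
from itertools import permutations


def similarity(a, b):
    # Assumed metric: negative squared Euclidean distance (closer = larger).
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def adjacency_score(rows, order):
    """Sum of similarities between consecutive rows in the given order."""
    return sum(
        similarity(rows[order[i]], rows[order[i + 1]])
        for i in range(len(order) - 1)
    )


def best_order(rows):
    """Exhaustive search over all row orders; factorial cost, tiny inputs only."""
    return max(
        permutations(range(len(rows))),
        key=lambda order: adjacency_score(rows, order),
    )


# Two natural groups, interleaved in the input matrix.
rows = [[0, 0], [10, 10], [0, 1], [10, 11]]
order = best_order(rows)
score = adjacency_score(rows, order)
```

On this input the best order places rows 0 and 2 next to each other and rows 1 and 3 next to each other. The single large drop in adjacent similarity between the two pairs is the kind of gap a boundary-detection step could use.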