## Relationship-Based Clustering and Visualization for High-Dimensional Data Mining (2002)

### Download Links

- [strehl.com]
- [www.lans.ece.utexas.edu]
- DBLP

### Other Repositories/Bibliography

Venue: INFORMS Journal on Computing

Citations: 40 (10 self)

### BibTeX

@ARTICLE{Strehl02relationship-basedclustering,
  author  = {Alexander Strehl and Joydeep Ghosh},
  title   = {Relationship-Based Clustering and Visualization for High-Dimensional Data Mining},
  journal = {INFORMS Journal on Computing},
  year    = {2002},
  volume  = {15}
}

### Abstract

In several real-life data-mining... This paper proposes a relationship-based approach that alleviates both problems, side-stepping the "curse-of-dimensionality" issue by working in a suitable similarity space instead of the original high-dimensional attribute space. This intermediary similarity space can be suitably tailored to satisfy business criteria such as requiring customer clusters to represent comparable amounts of revenue. We apply efficient and scalable graph-partitioning-based clustering techniques in this space. The output from the clustering algorithm is used to re-order the data points so that the resulting permuted similarity matrix can be readily visualized in two dimensions, with clusters showing up as bands. While two-dimensional visualization of a similarity matrix is by itself not novel, its combination with the order-sensitive partitioning of a graph that captures the relevant similarity measure between objects provides three powerful properties: (i) the high dimensionality of the data does not affect further processing once the similarity space is formed; (ii) it leads to clusters of (approximately) equal importance; and (iii) related clusters show up adjacent to one another, further facilitating the visualization of results. The visualization is very helpful for assessing and improving clustering. For example, actionable recommendations for splitting or merging clusters can be easily derived, and it also guides the user toward the right number of clusters.
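The pipeline the abstract describes can be illustrated with a small numpy sketch. Here cosine similarity and pre-assigned cluster labels are stand-ins for the paper's domain-tailored similarity measure and its Metis-based graph partitioning; the function names and toy data are assumptions for illustration only.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarities: the 'relationship space' that replaces
    the original high-dimensional attribute space."""
    Xn = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    return Xn @ Xn.T

def reorder_by_cluster(S, labels):
    """Permute rows/columns of S so same-cluster points are contiguous;
    clusters then appear as bright diagonal blocks when S is shown as an image."""
    order = np.argsort(labels, kind="stable")
    return S[np.ix_(order, order)], order

# toy stand-in for clustered high-dimensional data: two groups, shuffled
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(5, 1, (10, 50)), rng.normal(-5, 1, (10, 50))])
labels = np.array([0] * 10 + [1] * 10)
shuffle = rng.permutation(20)
S = cosine_similarity_matrix(X[shuffle])
S_perm, order = reorder_by_cluster(S, labels[shuffle])
```

Once the similarity matrix is formed, the original 50 dimensions play no further role, which is the first of the three properties claimed above.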

### Citations

10958 |
Computers and Intractability: A Guide to the Theory of NP-completeness
- Garey, Johnson
- 1990
Citation Context: ...riments which allows at most 5% of imbalance). Thus, in graph partitioning one has essentially to solve a constrained optimization problem. Finding such an optimal partitioning is an NP-hard problem (Garey and Johnson 1979). However, there are fast, heuristic algorithms for this widely studied problem. We experimented with the Kernighan-Lin (KL) algorithm, recursive spectral bisection, and multi-level k-way partitionin...

9002 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context: ...ensions are not encountered again. This suggests a connection of our approach with kernel-based methods, such as support-vector machines, which are currently very popular for classification problems (Vapnik 1995, Joachims 1998). A kernel function of two vectors is a generalized inner product between the corresponding mappings of these vectors into a derived (and typically very high-dimensional) feature space...

3645 |
Neural Networks: A Comprehensive Foundation
- Haykin
- 1999
Citation Context: ...in the feature space. A popular choice for the logical ordering is a two-dimensional lattice that allows all the data points to be projected onto a two-dimensional plane for convenient visualization (Haykin 1999). While clustering is a classical and well-studied area, it turns out that several data-mining applications pose some unique challenges that severely test traditional techniques for clustering and clu...

3259 |
The self-organizing map
- Kohonen
- 1990
Citation Context: ...similarity between any two different groups. Besides producing good clusters, certain clustering methods provide additional useful benefits. For example, Kohonen's self-organizing feature map (SOM) (Kohonen 1990) imposes a logical, "topographic" ordering on the cluster centers such that centers that are nearby in the logical ordering represent...

2158 |
Algorithms for Clustering Data
- Jain, Dubes
- 1988
Citation Context: ...ion about the life-time customer value, and we have recently shown that the extended Jaccard similarity measure is more appropriate (Strehl et al. 2000). For binary features, the Jaccard coefficient (Jain and Dubes 1988) measures the ratio of the intersection of the product sets to the union of the product sets corresponding to transactions xa and xb, each having binary (0/1) elements: s(J)(xa, xb) = (xa · xb) / (‖xa‖² + ‖xb‖² − xa · xb) (2)...
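The extended Jaccard measure mentioned in this context can be sketched in a few lines; the function name and toy vectors below are illustrative, but the formula is the standard extended Jaccard, which reduces to the binary Jaccard coefficient (|intersection| / |union|) on 0/1 vectors.

```python
import numpy as np

def extended_jaccard(x, y):
    # s(x, y) = x.y / (||x||^2 + ||y||^2 - x.y); on binary vectors this equals
    # the classic Jaccard coefficient |intersection| / |union|.
    dot = float(np.dot(x, y))
    return dot / (np.dot(x, x) + np.dot(y, y) - dot)

a = np.array([1, 1, 0, 1])
b = np.array([1, 0, 0, 1])
# intersection = 2, union = 3, so s = 2 / (3 + 2 - 2) = 2/3
s = extended_jaccard(a, b)
```

Unlike cosine similarity, this measure is sensitive to vector magnitude, which is why the paper argues it suits transaction data where quantities matter.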

1822 |
Cluster analysis and display of genome-wide expression patterns
- Eisen, Spellman, et al.
- 1998
Citation Context: ...e used in disciplines such as anthropology and archaeology to describe the reordering of the primary data matrix so that similar structures (e.g., genetic sequences) are brought closer (Murtagh 1985, Eisen et al. 1998). 4.2. Visualization The seriation of the similarity matrix, S', is very useful for visualization. Since the similarity matrix is two-dimensional, it can be readily visualized as a gray-level image whe...

1703 | Text categorization with support vector machines: Learning with many relevant features
- Joachims
- 1998
Citation Context: ...ot encountered again. This suggests a connection of our approach with kernel-based methods, such as support-vector machines, which are currently very popular for classification problems (Vapnik 1995, Joachims 1998). A kernel function of two vectors is a generalized inner product between the corresponding mappings of these vectors into a derived (and typically very high-dimensional) feature space. Thus, one can...

1506 |
A k-means clustering algorithm
- Hartigan, Wong
- 1979
Citation Context: ...ther details, see Strehl and Ghosh (2000). 7. Related Work 7.1. Clustering and Indexing Clustering has been widely studied in several disciplines, especially since the late 1960s (Jain and Dubes 1988, Hartigan 1975). Classic approaches include partitional methods such as k-means and k-medoids, bottom-up hierarchical approaches such as single link or complete link agglomerative clustering (Murtagh 1983), soft-p...

1110 | The visual display of quantitative information - Tufte - 1983 |

1046 |
An efficient heuristic procedure for partitioning graphs
- Kernighan, Lin
- 1970
Citation Context: ...stic algorithms for this widely studied problem. We experimented with the Kernighan-Lin (KL) algorithm, recursive spectral bisection, and multi-level k-way partitioning (Metis). The basic idea in KL (Kernighan and Lin 1970) for graph partitioning is to construct an initial partition of the vertices either randomly or according to some problem-specific strategy. Then the algorithm sweeps through the vertices...
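The KL idea described in this context, evaluating the gain of swapping a vertex pair across the cut, can be written compactly. This is a minimal sketch, not the full KL algorithm (no sweep, locking, or rollback); the helper names and the two-triangle toy graph are illustrative assumptions.

```python
import numpy as np

def cut_weight(W, part):
    """Total weight of edges crossing the two-way partition."""
    mask = part[:, None] != part[None, :]
    return W[mask].sum() / 2.0

def swap_gain(W, part, a, b):
    """KL gain of swapping vertices a and b across the cut (zero-diagonal W).
    D(v) = external - internal cost; gain = D(a) + D(b) - 2*W[a, b]."""
    ext_a = W[a, part != part[a]].sum(); int_a = W[a, part == part[a]].sum()
    ext_b = W[b, part != part[b]].sum(); int_b = W[b, part == part[b]].sum()
    return (ext_a - int_a) + (ext_b - int_b) - 2 * W[a, b]

# two triangles {0,1,2} and {3,4,5} joined by one weak edge 2-3
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

bad = np.array([0, 0, 1, 1, 1, 0])   # vertices 2 and 5 on the wrong side
g = swap_gain(W, bad, 2, 5)          # swapping them back recovers the good cut
```

A full KL pass would compute such gains for all candidate pairs, greedily apply the best swap, lock the swapped vertices, and repeat.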

793 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
- 1998
Citation Context: ...by considering domain-specific transformations into similarity space in Section 2. Section 3 describes a specific clustering technique (OPOSSUM), based on a multi-level graph partitioning algorithm (Karypis and Kumar 1998). In Section 4, we describe a simple but effective visualization technique that is applicable to similarity spaces (CLUSION). Clustering and visualization results are presented in Section 5. In Secti...

761 | A comparison of event models for naive bayes text classification - McCallum, Nigam - 1998 |

620 | Scatter/gather: A cluster-based approach to browsing large document collections
- Cutting, Pedersen, et al.
- 1992
Citation Context: ...faster algorithm at the cost of some possible loss in quality, and is employed, for example, in the buckshot algorithm for the scatter/gather approach to iterative clustering for interactive browsing (Cutting et al. 1992). If the sample is O(√n), and "nearest cluster center" is used to allocate the remaining points, one obtains an O(kn) algorithm. Also related are randomized approaches that can partition a set of p...
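The buckshot idea in this context can be sketched as: run an expensive clustering only on an O(√n) sample, then allocate every point to its nearest sample-derived center. `tiny_kmeans` below is an illustrative stand-in for the expensive step, and the toy two-blob dataset is an assumption for demonstration.

```python
import numpy as np

def tiny_kmeans(X, k, iters=10):
    """Deterministic farthest-point init followed by Lloyd iterations;
    stands in for the 'expensive' clustering run on the small sample."""
    C = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None] - np.array(C)[None]) ** 2).sum(-1).min(1)
        C.append(X[d.argmax()])
    C = np.array(C)
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return C

def buckshot(X, k, rng):
    """Cluster an O(sqrt(n)) sample, then assign all n points to the nearest
    sample-derived center: O(kn) overall allocation cost."""
    n = len(X)
    m = max(k, int(np.ceil(np.sqrt(n))))
    centers = tiny_kmeans(X[rng.choice(n, size=m, replace=False)], k)
    return ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-10, 0.5, (100, 2)), rng.normal(10, 0.5, (100, 2))])
labels = buckshot(X, 2, rng)
```

The expensive step touches only √n points, so overall cost is dominated by the final O(kn) nearest-center allocation.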

487 |
Partitioning Sparse Matrices with Eigenvectors of Graphs
- Pothen, Simon, et al.
- 1990
Citation Context: ...hip-Based Clustering and Visualization for High-Dimensional Data Mining In recursive bisection, a k-way split is obtained by recursively partitioning the graph into two subgraphs. Spectral bisection (Pothen et al. 1990, Hendrickson and Leland 1995) uses the eigenvector associated with the second smallest eigenvalue of the graph's Laplacian (Fiedler vector) (Fiedler 1975) for splitting. Metis (Karypis and Kumar 199...

411 | FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets
- Faloutsos, Lin
- 1995
Citation Context: ...nsity modeling with EM can be fruitfully applied. For situation (ii), nonlinear PCA, self-organizing map (SOM), multi-dimensional scaling (MDS), or more efficient custom formulations such as FASTMAP (Faloutsos and Lin 1995) can be effectively applied. For further description of these methods, see Section 7 on related work. This paper primarily addresses the second aspect by describing an alternate way of clustering an...

350 | OPTICS: Ordering points to identify the clustering structure
- Ankerst, Breunig, et al.
- 1999
Citation Context: ...involves a smart reordering of the similarity matrix. Ordering of data points for visualization has previously been used in conjunction with clustering in different contexts. For example, in OPTICS (Ankerst et al. 1999), instead of producing an explicit clustering, an augmented ordering of the database is produced. Subsequently, this ordering is used to display various metrics such as reachability values. In cluste...

335 | ROCK: a robust clustering algorithm for categorical attributes
- Guha, Rastogi, et al.
- 1999
Citation Context: ...ive for gene-expression analysis (Ben-Dor et al. 1999). Graphical methods also have emerged in the data-mining literature to tackle high-dimensional data analysis. ROCK (Robust Clustering using linKs, Guha et al. 1999) is an agglomerative hierarchical clustering technique for categorical attributes. It uses the binary Jaccard coefficient and a thresholding criterion to establish links between samples. Common neigh...

333 | Clustering gene expression patterns - Ben-Dor, Shamir, et al. - 1999 |

302 | Concept decompositions for large sparse text data using clustering - Dhillon, Modha |

246 | Scaling clustering algorithms to large databases
- Bradley, Fayyad, et al.
- 1998
Citation Context: ...uential building is especially popular for out-of-core methods, the idea being to scan the database once to form a summarized model (for instance, the size, sum, and sum-squared values of each cluster, Bradley et al. 1998) in main memory. Subsequent refinement based on summarized information is then restricted to main-memory operations without resorting to further disk scans. 3. Representatives: Compare with representa...

234 |
The Stable Marriage Problem: Structure and Algorithms
- Gusfield, Irving
- 1989
Citation Context: ...ote that the above algorithm may not result in balanced clusters. We can enforce balancing by allocating the remaining points to the k clusters in groups, each time solving a stable-marriage problem (Gusfield and Irving 1989), but this will increase the computation time. Figure 7 illustrates the behavior of FASTOPOSSUM for the drugstore customer dataset from Section 5.1. Using all 2466 customers as the boot-sample (i.e.,...

224 |
A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory
- Fiedler
- 1975
Citation Context: ...into two subgraphs. Spectral bisection (Pothen et al. 1990, Hendrickson and Leland 1995) uses the eigenvector associated with the second smallest eigenvalue of the graph's Laplacian (Fiedler vector) (Fiedler 1975) for splitting. Metis (Karypis and Kumar 1998) handles multi-constraint multi-objective graph partitioning in three phases: coarsening, initial partitioning, and refining. First a sequence of successi...
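Spectral bisection as described here, splitting by the sign of the Fiedler vector, can be sketched in a few lines of numpy. The two-triangle toy graph is an illustrative assumption; the Laplacian construction and eigenvector choice follow the standard definition (L = D − W, second-smallest eigenvalue).

```python
import numpy as np

def spectral_bisect(W):
    """Split a weighted graph by the sign of the Fiedler vector, i.e. the
    eigenvector of the Laplacian L = D - W with second-smallest eigenvalue."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)   # eigh returns ascending eigenvalues
    fiedler = vecs[:, 1]
    return (fiedler >= 0).astype(int)

# two triangles joined by a single weak edge: the Fiedler vector changes
# sign exactly across that weak edge
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
part = spectral_bisect(W)
```

Recursive application of this bisection yields the k-way split mentioned in the context.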

210 |
Multidimensional scaling: I. Theory and method
- Torgerson
- 1952
Citation Context: ...d utility. Nonlinear projections have also been studied (Chang and Ghosh 2001). Recreating a two- or three-dimensional space from a similarity graph can also be done through multi-dimensional scaling (Torgerson 1952). 2. Parallel-axis plots show each object as a line along d parallel axes. However, this technique is rendered ineffective if the number of dimensions d or the number of objects gets too high. 3. Koh...

179 |
An Improved Spectral Graph Partitioning Algorithm for Mapping Parallel Computations
- Hendrickson, Leland
- 1995
Citation Context: ...and Visualization for High-Dimensional Data Mining In recursive bisection, a k-way split is obtained by recursively partitioning the graph into two subgraphs. Spectral bisection (Pothen et al. 1990, Hendrickson and Leland 1995) uses the eigenvector associated with the second smallest eigenvalue of the graph's Laplacian (Fiedler vector) (Fiedler 1975) for splitting. Metis (Karypis and Kumar 1998) handles multi-constraint mu...

154 | Impact of similarity measures on web-page clustering - Strehl, Ghosh, et al. - 2000

149 | Multilevel algorithms for multi-constraint graph partitioning
- Karypis, Kumar
- 1998
Citation Context: ...in a systolic or block-systolic manner with essentially no overhead. Frameworks such as MPI also provide native primitives for such computations. Parallelization of Metis is also very efficient, and Schloegel et al. (1999) report partitioning of graphs with over 7 million vertices in 7 seconds into 128 clusters on a 128-processor Cray T3E. For further details, see Strehl and Ghosh (2000). 7. Related Work 7.1. Cluster...

141 |
Chameleon: Hierarchical Clustering Using Dynamic Modeling
- Karypis, Han, et al.
- 1999
Citation Context: ...thod based on spatial density, unless the data follow certain simple distributions as described in the introduction. Certain other limitations of popular clustering methods are nicely illustrated in Karypis et al. (1999). In Aggarwal (2001), the authors recognize that one way of tackling high-dimensional data is to change the distance function in an application-specific way. They suggest some possible modified funct...

139 |
Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
- Berry, Linoff
- 2004
Citation Context: ...the purchased products. Thus, one can use techniques such as the Apriori algorithm to determine associations or rules. In fact, this is currently the most popular approach to market-basket analysis (Berry and Linoff 1997, Chapter 8). Unfortunately, this results in loss of vital information: one cannot differentiate between buying one gallon of milk and 100 gallons of milk, or one cannot weight importance between buyi...

126 |
Artificial Neural Networks for Feature Extraction and Multivariate Data Projection
- Mao, Jain
- 1995
Citation Context: ...gh non-linear (Chang and Ghosh 2001) means. Extensive approaches for feature selection or extraction have been long studied, particularly in the pattern-recognition community (Young and Calvert 1974, Mao and Jain 1995, Duda et al. 2001). If these techniques succeed in reducing the number of (derived) features to the order of 10 or less without much loss of information, then a variety of clustering and visualizatio...

124 | Pattern Classification, 2nd ed
- Duda, Hart, et al.
- 2000
Citation Context: ...g and Ghosh 2001) means. Extensive approaches for feature selection or extraction have been long studied, particularly in the pattern-recognition community (Young and Calvert 1974, Mao and Jain 1995, Duda et al. 2001). If these techniques succeed in reducing the number of (derived) features to the order of 10 or less without much loss of information, then a variety of clustering and visualization methods can be a...

109 | Learning segmentation by random walks
- Meila, Shi
- 2000
Citation Context: ...oning methods (Pothen et al. 1990, Miller et al. 1997) can be applied to similarity graphs. A probabilistic foundation for spectral methods for clustering and segmentation has been recently proposed (Meila and Shi 2001). Related work on scalability issues of clustering is discussed in Section 6.2. 7.2. Visualization Visualization of high-dimensional data clusters can be largely divided into three popular approache...

107 |
Stemming algorithms
- Frakes
- 1992
Citation Context: ...tegory labels 1, ..., 20, respectively. The raw 21839 x 2340 word-by-document matrix consists of the non-normalized occurrence frequencies of stemmed words, using Porter's suffix-stripping algorithm (Frakes 1992). Pruning all words that occur less than 0.01 or more than 0.10 times on average, because they are insignificant (e.g., haruspex) or too generic (e.g., new), respectively, results in d = 2903. Let us...

107 |
A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4):354–359
- Murtagh
- 1983
Citation Context: ...s 1988, Hartigan 1975). Classic approaches include partitional methods such as k-means and k-medoids, bottom-up hierarchical approaches such as single link or complete link agglomerative clustering (Murtagh 1983), soft-partitioning approaches such as fuzzy clustering, EM-based techniques, and methods motivated by statistical mechanics (Chakravarthy and Ghosh 1996). While several methods of clustering data def...

100 | The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces
- Chakrabarti, Mehrotra
- 1999
Citation Context: ...uch as KDB-trees) exist. These methods are typically tractable for up to 10- to 15-dimensional data, and by a judicious hybrid of these two approaches, data with tens of attributes may be partitioned (Chakrabarti and Mehrotra 1999). Significant overlaps among the hyper-rectangles and the occurrences of several empty areas become increasingly problematic as the dimensionality is further increased (see Chakrabarti and Mehrotra 1...

83 | Visualising semantic spaces and author co-citation networks in digital libraries - Chen - 1999 |

75 | Visualization techniques for mining large databases: A comparison - Keim, Kriegel - 1996

72 | Partitioning-based clustering for web document categorization. Decision Support Systems (accepted for publication
- Boley, Gini, et al.
- 1999
Citation Context: ...for high-dimensional data and is linear in d. Subsequently, this and another graph-partitioning algorithm called principal direction divisive partitioning were applied for web-document categorization (Boley et al. 1999). These two algorithms are the closest in spirit to our approach. Finally, spectral partitioning methods (Pothen et al. 1990, Miller et al. 1997) can be applied to similarity graphs. A probabilistic ...

72 | Spatial clustering methods in data mining: A survey - Han, Kamber, et al. - 2001 |

71 | Separators for sphere-packings and nearest neighbor graphs
- Miller, Teng, et al.
- 1997
Citation Context: ...rtitioning was applied for web-document categorization (Boley et al. 1999). These two algorithms are the closest in spirit to our approach. Finally, spectral partitioning methods (Pothen et al. 1990, Miller et al. 1997) can be applied to similarity graphs. A probabilistic foundation for spectral methods for clustering and segmentation has been recently proposed (Meila and Shi 2001). Related work on scalability issu...

67 |
Multidimensional clustering algorithms. Compstat lectures
- Murtagh
- 1985
Citation Context: ...ation, a phrase used in disciplines such as anthropology and archaeology to describe the reordering of the primary data matrix so that similar structures (e.g., genetic sequences) are brought closer (Murtagh 1985, Eisen et al. 1998). 4.2. Visualization The seriation of the similarity matrix, S', is very useful for visualization. Since the similarity matrix is two-dimensional, it can be readily visualized as a ...

65 | BIRCH: A New Data Clustering Algorithm and Its Applications
- Zhang, Ramakrishnan, et al.
- 1998
Citation Context: ...of elements, and then sequentially scan the data to allocate the remaining inputs, creating new clusters (and optionally adjusting existing centers) as needed. Such an approach is seen, e.g., in BIRCH (Zhang et al. 1997). This style compromises balancing to some extent, and the threshold determining when a new cluster is formed has to be experimented with to bring the number of clusters obtained to the desired range...

58 |
Relative frequency as a determinant of phonetic change, reprinted from the Harvard studies in classical philology
- Zipf
- 1929
Citation Context: ...nsider the most dominant products (attribute selection), but in practice this may still leave hundreds of products to be considered. And since product popularity tends to follow a Zipf distribution (Zipf 1929), the tail is "heavy," meaning that revenue contribution from the less-popular products is significant for certain customers. Moreover, in retail the higher profit margins are often associated with l...
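The heavy-tail claim in this context can be checked numerically under an idealized Zipf law (frequency proportional to 1/rank). The 1000-product catalog and the top-100 cutoff below are made-up illustrations, not figures from the paper.

```python
import numpy as np

# idealized Zipf popularity over a hypothetical 1000-product catalog
ranks = np.arange(1, 1001)
popularity = 1.0 / ranks              # Zipf's law: frequency ~ 1/rank
share = popularity / popularity.sum()

head = share[:100].sum()              # the 100 most popular products
tail = share[100:].sum()              # the 900 less-popular products
```

Even though the tail products are individually rare, together they carry roughly 30% of the total under this model, which is why truncating to the dominant products loses real revenue signal.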

52 | Clickstream clustering using weighted longest common subsequences - Banerjee, Ghosh - 2001 |

48 | Hypergraph based clustering in high-dimensional data sets: A summary of results - Han, Karypis, et al. - 1998 |

42 | A unified model for probabilistic principal surfaces
- Chang, Ghosh
Citation Context: ...a subset based on a suitable criterion, or by transforming the original set of attributes into a smaller one using linear projections (e.g., principal component analysis (PCA)) or through non-linear (Chang and Ghosh 2001) means. Extensive approaches for feature selection or extraction have been long studied, particularly in the pattern-recognition community (Young and Calvert 1974, Mao and Jain 1995, Duda et al. 2001...

36 | Personalization of supermarket product recommendations. Data Mining and Knowledge Discovery
- Lawrence, Almasi, et al.
- 2001
Citation Context: ...usiness applications where electronically observed behavioral data are readily available. Customer clusters can be used to identify up-selling and cross-selling opportunities with existing customers (Lawrence et al. 2001). 2. Facilitating efficient browsing and searching of the web by hierarchically clustering web pages. The challenges in both of these applications mainly arise from two aspects: (a) large sample size...

34 | Re-designing distance functions and distance-based applications for high dimensional data - Aggarwal - 2001 |


28 | Visualizing Class Structure of Multidimensional Data
- Dhillon, Modha, et al.
- 1998
Citation Context: ...le approximation thereof (e.g., FASTMAP, Faloutsos and Lin 1995). Chen (1999), for example, creates a browsable 2-dimensional space of authors through co-citations. Another noteworthy method is CViz (Dhillon et al. 1998), which projects onto the plane that passes through three selected cluster centroids to yield a "discrimination optimal" two-dimensional projection. These projections are useful for a medium number of...

24 |
Sparse matrix reordering schemes for browsing hypertext
- Berry, Hendrickson, et al.
- 1996
Citation Context: ...been explored. This visualization takes place in the primary data space rather than in the relationship-space. Sparse primary data-matrix reorderings have also been considered for browsing hypertext (Berry et al. 1996). A useful survey of visualization methods for data mining in general can be found in Keim and Kriegel (1996). The popular book by Tufte (1983) on visualizing information is also recommended. 8. Conc...