Results 1-10 of 291
A fast and high quality multilevel scheme for partitioning irregular graphs
SIAM Journal on Scientific Computing, 1998
Cited by 1173 (16 self)
Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph [Bui and Jones, Proc.
Consistency of spectral clustering
2004
Cited by 567 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
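As a rough illustration of the two graph Laplacians contrasted in this abstract, the sketch below builds the unnormalized matrix L = D - W and the symmetric normalized matrix D^{-1/2} L D^{-1/2} for a small weighted graph. The function name `laplacians`, the example matrix `W`, and the choice of the symmetric normalization are assumptions made for illustration, not definitions taken from the paper.

```python
import math

def laplacians(W):
    """Return (L, L_sym) for a symmetric weight matrix W (list of lists).

    L     = D - W                      (unnormalized Laplacian)
    L_sym = D^{-1/2} (D - W) D^{-1/2}  (symmetric normalized Laplacian)
    Assumes every vertex has positive degree.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)]
         for i in range(n)]
    L_sym = [[L[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    return L, L_sym

# Hypothetical 4-vertex graph: a triangle (0, 1, 2) with a pendant vertex 3.
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
L, L_sym = laplacians(W)
```

Spectral clustering then works with the eigenvectors of one of these matrices; which one is used is exactly the normalized versus unnormalized distinction the abstract discusses.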
Co-clustering documents and words using Bipartite Spectral Graph Partitioning
2001
Survey of clustering data mining techniques
2002
Cited by 400 (0 self)
Accrue Software, Inc.
Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
Data Clustering: 50 Years Beyond K-Means
2008
Cited by 274 (6 self)
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of algorithms and methods for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is exploratory in nature: to find structure in data. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty of designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.
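For readers unfamiliar with K-means, the following is a minimal sketch of the standard Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its points). The function name `kmeans`, the 2-D restriction, the explicit initial centers, and the sample points are all assumptions chosen for illustration; the abstract itself does not specify an implementation.

```python
def kmeans(points, centers, iters=100):
    """Lloyd's iteration on 2-D points, starting from the given centers.

    Returns (centers, clusters). An empty cluster keeps its old center.
    """
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # Assignment step: each point goes to the nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # assignments are stable, so stop
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```

The result depends on the initial centers, which is one reason the clustering problem is described above as ill-posed.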
Evaluation of Hierarchical Clustering Algorithms for Document Datasets
Data Mining and Knowledge Discovery, 2002
Cited by 252 (6 self)
Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering solutions provide a view of the data at different levels of granularity, making them ideal for people to visualize and interactively explore large document collections.
Transductive Learning via Spectral Graph Partitioning
In ICML, 2003
Cited by 243 (0 self)
We present a new method for transductive learning, which can be seen as a transductive version of the k nearest-neighbor classifier.
A Min-max Cut Algorithm for Graph Partitioning and Data Clustering
2001
Cited by 211 (15 self)
An important application of graph partitioning is data clustering using a graph model: the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. Here we propose a new algorithm for graph partition with an objective function that follows the min-max clustering principle. The relaxed version of the optimization of the min-max cut objective function leads to the Fiedler vector in spectral graph partition. Theoretical analyses of min-max cut indicate that it leads to balanced partitions, and lower bounds are derived. The min-max cut algorithm is tested on newsgroup datasets and is found to outperform other current popular partitioning/clustering methods. The linkage-based refinements in the algorithm further improve the quality of clustering substantially. We also demonstrate that the linearized search order based on linkage differential is better than that based on the Fiedler vector, providing another effective partition method.
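The idea of searching along a Fiedler-vector ordering can be sketched as follows. This is only an illustration under stated assumptions: the objective is written as Mcut(A, B) = s(A,B)/s(A,A) + s(A,B)/s(B,B), where s(X,Y) sums the similarities between X and Y (our reading of the min-max principle: small between-cluster and large within-cluster similarity); the Fiedler vector is approximated by power iteration on the unnormalized Laplacian D - W, whereas the paper's relaxation may use a degree-weighted variant; and the names `mcut`, `fiedler_vector`, `best_split` and the example graph are hypothetical. The linkage-based refinement mentioned in the abstract is not shown.

```python
import math

def mcut(W, in_a):
    """Assumed objective: s(A,B)/s(A,A) + s(A,B)/s(B,B)."""
    n = len(W)
    s_ab = s_aa = s_bb = 0.0
    for i in range(n):
        for j in range(n):
            if in_a[i] and in_a[j]:
                s_aa += W[i][j]
            elif not in_a[i] and not in_a[j]:
                s_bb += W[i][j]
            elif in_a[i]:
                s_ab += W[i][j]  # pairs with i in A and j in B
    if s_aa == 0 or s_bb == 0:
        return float("inf")  # a side with no internal similarity is rejected
    return s_ab / s_aa + s_ab / s_bb

def fiedler_vector(W, iters=500):
    """Approximate the second-smallest eigenvector of L = D - W.

    Power iteration on (c*I - L), re-centering each step to project out
    the constant eigenvector. Assumes a connected graph.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # upper bound on the eigenvalues of L
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]
        y = [c * x[i] - (deg[i] * x[i] - sum(W[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

def best_split(W):
    """Sweep split points along the Fiedler order; keep the lowest Mcut."""
    x = fiedler_vector(W)
    order = sorted(range(len(W)), key=lambda i: x[i])
    best = (float("inf"), None)
    for k in range(1, len(order)):
        in_a = [False] * len(W)
        for i in order[:k]:
            in_a[i] = True
        score = mcut(W, in_a)
        if score < best[0]:
            best = (score, sorted(order[:k]))
    return best

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i][j] = W[j][i] = 1
score, part = best_split(W)
```

On this toy graph the sweep separates the two triangles, cutting only the bridging edge.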