Results 11–20 of 176
Clustering categorical data: An approach based on dynamical systems
, 1998
Cited by 176 (1 self)
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric, e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of nonlinear dynamical systems. We discuss experiments on a variety of tables of synthetic and real data; we find that our iterative methods converge quickly to prominently correlated values of various categorical fields.
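The weight-propagation idea described above can be sketched as a power iteration on a value co-occurrence matrix. This is a simplified illustration, not the paper's exact dynamical system; the toy two-field table and its value names are made up:

```python
import numpy as np

# Simplified sketch of iterative weight propagation over categorical values:
# each value's weight becomes the summed weights of the values it co-occurs
# with, followed by renormalization -- i.e., power iteration on the
# co-occurrence matrix. (Illustrative only; not the paper's exact iteration.)

rows = [("honda", "tokyo")] * 6 + [("fiat", "rome")]   # toy two-field table
values = sorted({v for row in rows for v in row})
idx = {v: i for i, v in enumerate(values)}

n = len(values)
M = np.zeros((n, n))
for a, b in rows:                        # co-occurrence counts across fields
    M[idx[a], idx[b]] += 1
    M[idx[b], idx[a]] += 1

w = np.ones(n)
for _ in range(100):                     # propagate weights, then renormalize
    w = M @ w
    w /= np.linalg.norm(w)

# The heavily co-occurring pair ("honda", "tokyo") dominates the final weights.
```

The iteration converges to the principal eigenvector of the co-occurrence matrix, so weight concentrates on the most strongly correlated values, matching the convergence behavior the abstract reports.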
Weighted graph cuts without eigenvectors: A multilevel approach
 IEEE Trans. Pattern Anal. Mach. Intell
, 2007
Cited by 165 (22 self)
Abstract—A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods—in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria. This eliminates the need for any eigenvector computation for graph clustering problems, which can be prohibitive for very large graphs. Previous multilevel graph partitioning methods such as Metis have suffered from the restriction of equal-sized clusters; our multilevel algorithm removes this restriction by using kernel k-means to optimize weighted graph cuts. Experimental results show that our multilevel algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of speed, memory usage, and quality. We demonstrate that our algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis, and gene network analysis. Index Terms—Clustering, data mining, segmentation, kernel k-means, spectral clustering, graph partitioning.
A Nonlinear Programming Algorithm for Solving Semidefinite Programs via Low-Rank Factorization
 Mathematical Programming (Series B)
, 2001
Cited by 153 (10 self)
In this paper, we present a nonlinear programming algorithm for solving semidefinite programs (SDPs) in standard form. The algorithm's distinguishing feature is a change of variables that replaces the symmetric, positive semidefinite variable X of the SDP with a rectangular variable R according to the factorization X = RR^T. The rank of the factorization, i.e., the number of columns of R, is chosen minimally so as to enhance computational speed while maintaining equivalence with the SDP. Fundamental results concerning the convergence of the algorithm are derived, and encouraging computational results on some large-scale test problems are also presented. Keywords: semidefinite programming, low-rank factorization, nonlinear programming, augmented Lagrangian, limited-memory BFGS.
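The X = RR^T change of variables can be illustrated on a toy problem: minimizing ⟨C, X⟩ over trace-one PSD matrices. This is a minimal sketch of the low-rank idea only — projected gradient descent with a made-up problem, rank, and step size, not the paper's augmented-Lagrangian algorithm:

```python
import numpy as np

# Toy illustration of the X = R R^T change of variables: minimize <C, X>
# over PSD X with trace(X) = 1. Since trace(R R^T) = ||R||_F^2, the trace
# constraint becomes a Frobenius-norm projection of R. (Projected gradient
# descent here; the paper uses an augmented-Lagrangian method instead.)
rng = np.random.default_rng(0)
n, r = 8, 2                          # problem size and factorization rank
A = rng.standard_normal((n, n))
C = (A + A.T) / 2                    # random symmetric cost matrix

R = rng.standard_normal((n, r))
R /= np.linalg.norm(R)               # project onto ||R||_F = 1
for _ in range(5000):
    R -= 0.05 * (2 * C @ R)          # gradient of <C, R R^T> w.r.t. R
    R /= np.linalg.norm(R)           # re-project after each step

X = R @ R.T                          # PSD and trace-one by construction
# For this toy problem the optimal value is the smallest eigenvalue of C.
```

Note how feasibility (positive semidefiniteness and the trace constraint) is automatic in the R variable, which is exactly what makes the factorized formulation attractive for nonlinear programming methods.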
Semidefinite optimization
 Acta Numerica
, 2001
Cited by 152 (2 self)
Optimization problems in which the variable is not a vector but a symmetric matrix which is required to be positive semidefinite have been intensely studied in the last ten years. Part of the reason for the interest stems from the applicability of such problems to such diverse areas as designing the strongest column, checking the stability of a differential inclusion, and obtaining tight bounds for hard combinatorial optimization problems. Part also derives from great advances in our ability to solve such problems efficiently in theory and in practice (perhaps “or” would be more appropriate: the most effective computational methods are not always provably efficient in theory, and vice versa). Here we describe this class of optimization problems, give a number of examples demonstrating its significance, outline its duality theory, and discuss algorithms for solving such problems.
Some Applications of Laplace Eigenvalues of Graphs
 GRAPH SYMMETRY: ALGEBRAIC METHODS AND APPLICATIONS, VOLUME 497 OF NATO ASI SERIES C
, 1997
Cited by 129 (0 self)
In the last decade, important relations between Laplace eigenvalues and eigenvectors of graphs and several other graph parameters were discovered. In these notes we present some of these results and discuss their consequences. Attention is given to the partition and isoperimetric properties of graphs, the max-cut problem and its relation to semidefinite programming, rapid mixing of Markov chains, and extensions of the results to infinite graphs.
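The connection between Laplace eigenvectors and graph partitions can be illustrated by spectral bisection with the Fiedler vector on a toy graph (an illustrative sketch; the graph here is made up):

```python
import numpy as np

# Sketch: spectral bisection via the Fiedler vector (the eigenvector of the
# second-smallest Laplacian eigenvalue). Toy graph: two triangles joined by
# a single bridge edge, so the natural cut severs the bridge.
edges = [(0, 1), (1, 2), (0, 2),   # triangle A
         (3, 4), (4, 5), (3, 5),   # triangle B
         (2, 3)]                   # bridge edge
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A     # combinatorial Laplacian L = D - A

vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
# lambda_1 = 0 always; lambda_2 > 0 exactly when the graph is connected.
fiedler = vecs[:, 1]
cut = fiedler > 0                  # the sign pattern gives the bisection
```

For this graph the sign pattern of the Fiedler vector separates the two triangles, cutting only the bridge edge — the prototypical link between the algebraic connectivity and the partition properties the notes discuss.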
Semidefinite Programming and Combinatorial Optimization
 DOC. MATH. J. DMV
, 1998
Cited by 109 (1 self)
We describe a few applications of semidefinite programming in combinatorial optimization.
SPECTRAL CLUSTERING AND THE HIGH-DIMENSIONAL STOCHASTIC BLOCKMODEL
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 98 (7 self)
Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The Stochastic Blockmodel (Holland, Laskey and Leinhardt, 1983) is a social network model with well-defined communities; each node is a member of one community. For a network generated from the Stochastic Blockmodel, we bound the number of nodes “misclustered” by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. In order to study spectral clustering under the Stochastic Blockmodel, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a “population” normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.
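A minimal sketch of this setting, with illustrative parameter values: sample a two-block Stochastic Blockmodel, form the normalized graph Laplacian, and read the communities off the sign pattern of its second eigenvector:

```python
import numpy as np

# Sketch: spectral clustering on a graph drawn from a two-block Stochastic
# Blockmodel. The block sizes and edge probabilities are illustrative.
rng = np.random.default_rng(0)
n_per, p_in, p_out = 40, 0.9, 0.02
n = 2 * n_per
truth = np.repeat([0, 1], n_per)   # planted community labels

# Sample a symmetric Bernoulli adjacency matrix with block probabilities.
P = np.where(truth[:, None] == truth[None, :], p_in, p_out)
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T

# Normalized graph Laplacian: L = I - D^{-1/2} A D^{-1/2}.
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# The sign pattern of the second eigenvector estimates the two communities.
vals, vecs = np.linalg.eigh(L)
pred = (vecs[:, 1] > 0).astype(int)
accuracy = max((pred == truth).mean(), ((1 - pred) == truth).mean())
```

With a strong in/out probability gap like this one, the sign split recovers nearly all nodes; the paper's contribution is bounding the number of misclustered nodes in regimes where the number of blocks grows.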
A survey of kernel and spectral methods for clustering
, 2008
Cited by 88 (5 self)
Clustering algorithms are a useful tool for exploring data structures and have been employed in many disciplines. The focus of this paper is the partitional clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The kernel clustering methods presented are kernel versions of many classical clustering algorithms, e.g., K-means, SOM, and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph-cut problem in which an appropriate objective function has to be optimized. An explicit proof that these two seemingly different paradigms optimize the same objective is reported, showing that they share the same mathematical foundation. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.
Spectral Partitioning: The More Eigenvectors, the Better
 PROC. ACM/IEEE DESIGN AUTOMATION CONF
, 1995
Cited by 76 (3 self)
The graph partitioning problem is to divide the vertices of a graph into disjoint clusters so as to minimize the total cost of the edges cut by the clusters. A spectral partitioning heuristic uses the graph's eigenvectors to construct a geometric representation of the graph (e.g., linear orderings), which is subsequently partitioned. Our main result shows that when all the eigenvectors are used, graph partitioning reduces to a new vector partitioning problem. This result implies that as many eigenvectors as are practically possible should be used to construct a solution. This philosophy is in contrast to that of the widely used spectral bipartitioning (SB) heuristic (which uses a single eigenvector to construct a 2-way partitioning) and several previous multi-way partitioning heuristics [7][10][16][26][37] (which use k eigenvectors to construct a k-way partitioning). Our result motivates a simple ordering heuristic that is a multiple-eigenvector extension of SB. This heuristic not only significantly outperforms SB, but can also yield excellent multi-way VLSI circuit partitionings compared to [1][10]. Our experiments suggest that the vector partitioning perspective opens the door to new and effective heuristics.
A unified view of kernel k-means, spectral clustering and graph cuts
, 2004
Cited by 73 (6 self)
Recently, a variety of clustering algorithms have been proposed to handle data that is not linearly separable. Spectral clustering and kernel k-means are two such methods that are seemingly quite different. In this paper, we show that a general weighted kernel k-means objective is mathematically equivalent to a weighted graph partitioning objective. Special cases of this graph partitioning objective include ratio cut, normalized cut, and ratio association. Our equivalence has important consequences: the weighted kernel k-means algorithm may be used to directly optimize the graph partitioning objectives, and conversely, spectral methods may be used to optimize the weighted kernel k-means objective. Hence, in cases where eigenvector computation is prohibitive, we eliminate the need for any eigenvector computation for graph partitioning. Moreover, we show that the Kernighan-Lin objective can also be incorporated into our framework, leading to an incremental weighted kernel k-means algorithm for local optimization of the objective. We further discuss the issue of convergence of weighted kernel k-means for an arbitrary graph affinity matrix and provide a number of experimental results. These results show that non-spectral methods for graph partitioning are as effective as spectral methods and can be used for problems such as image segmentation in addition to data clustering.
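The kernel k-means side of this equivalence, in its unweighted form, can be sketched as follows. The anchor-based initialization and the toy data are illustrative choices, not taken from the paper:

```python
import numpy as np

# Sketch of unweighted kernel k-means: Lloyd-style iteration using only the
# Gram (kernel) matrix K, via the identity
#   ||phi(x_i) - mean_c||^2
#     = K_ii - (2/|c|) * sum_{j in c} K_ij + (1/|c|^2) * sum_{j,l in c} K_jl.
def kernel_kmeans(K, k, n_iter=100):
    n = K.shape[0]
    # Initialize by assigning each point to the nearest of k anchor points
    # (a simple deterministic choice; real use would try random restarts).
    anchors = np.linspace(0, n - 1, k).astype(int)
    diag = K.diagonal()
    labels = (diag[:, None] - 2 * K[:, anchors] + diag[anchors]).argmin(axis=1)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue
            # distance to cluster mean in feature space, dropping constant K_ii
            dist[:, c] = (-2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                  # assignments stable: converged
        labels = new_labels
    return labels

# Two well-separated Gaussian blobs, made separable through an RBF kernel.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq / 2.0)              # RBF kernel with sigma^2 = 1
labels = kernel_kmeans(K, 2)
```

Because the algorithm touches only K, replacing K with a (suitably shifted) graph affinity matrix is what lets the same iteration optimize the weighted graph-cut objectives described in the abstract, with no eigenvector computation.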