## Information Theoretical Clustering via Semidefinite Programming

### BibTeX

@MISC{Wang_informationtheoretical,

author = {Meihong Wang and Fei Sha},

title = {Information Theoretical Clustering via Semidefinite Programming},

year = {}

}

### Abstract

We propose techniques of convex optimization for information theoretical clustering. The clustering objective is to maximize the mutual information between data points and cluster assignments. We formulate this problem first as an instance of max k cut on weighted graphs. We then apply the technique of semidefinite programming (SDP) relaxation to obtain a convex SDP problem. We show how the solution of the SDP problem can be further improved with a low-rank refinement heuristic. The low-rank solution reveals more clearly the cluster structure of the data. Empirical studies on several datasets demonstrate the effectiveness of our approach. In particular, the approach outperforms several other clustering algorithms when compared on standard evaluation metrics.
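To make the max k cut formulation concrete, the sketch below evaluates the cut objective (the sum of weights on edges whose endpoints fall in different partitions) and finds the best labeling by exhaustive search on a tiny graph. It is only an illustration under stated assumptions: the function names and the 4-node weight matrix are hypothetical, the weights are not derived from the paper's mutual information estimate, cluster-size constraints are ignored, and brute force stands in for the SDP relaxation, which is what the paper actually uses.

```python
from itertools import product


def cut_weight(weights, labels):
    """Sum of weights on edges whose two endpoints have different labels."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                total += weights[i][j]
    return total


def brute_force_max_k_cut(weights, k):
    """Try all k**n labelings; feasible only for very small graphs."""
    n = len(weights)
    best_labels, best_value = None, float("-inf")
    for labels in product(range(k), repeat=n):
        value = cut_weight(weights, labels)
        if value > best_value:  # strict: keep the first maximizer found
            best_labels, best_value = labels, value
    return best_labels, best_value


# Hypothetical symmetric weights: large values mean "dissimilar".
# Nodes 0 and 1 are similar to each other, as are nodes 2 and 3.
weights = [
    [0, 1, 5, 5],
    [1, 0, 5, 5],
    [5, 5, 0, 1],
    [5, 5, 1, 0],
]

labels, value = brute_force_max_k_cut(weights, k=2)
print(labels, value)
```

The exhaustive search grows as k**n, which is why a convex relaxation is attractive for realistic data sizes.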

### Citations

3666 | Convex Optimization - Boyd, Vandenberghe - 2004 |

1097 | On spectral clustering: Analysis and an algorithm
- Ng, Jordan, et al.
- 2001
Citation Context ...ta lie on a low-dimensional submanifold, then we can use (geodesic) distances on the manifold instead of Euclidean distances in the embedding space. This leads to the technique of spectral clustering (Ng et al., 2001). It is also easy to see how kernel tricks can be applied to formulate distances with inner products in nonlinear feature spaces, resulting in kernelized K-means. Information theoretic clustering (ITC) h... |

937 | Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming
- Goemans, Williamson
- 1995
Citation Context ... It is easy to see that the integer programming implements such a cut with pairwise weights given by −L. Max k cut has recently been attacked with semidefinite programming (SDP) relaxation with great success (Goemans and Williamson, 1995; Frieze and Jerrum, 1997). We adopt the same strategy here. 3.2 SDP relaxation We first relax the constraint that G needs to be a binary matrix. Instead, we constrain G to be a positive semidefinite ... |

895 | Approximation Algorithms
- Vazirani
- 2001
Citation Context ... each cluster has N/K data points and then each data point needs to be assigned to a cluster. The integer programming problem is NP-hard to solve. In fact, it is an instance of max k cut on weighted graphs (Vazirani, 2001; Hochbaum, 1997). In max k cut, we seek K disjoint partitions that maximize the sum of the weights on the edges which have two vertices in different partitions. It is easy to see that the integer programming i... |

640 | UCI machine learning repository - Asuncion, Newman - 2007 |

572 | Approximation algorithms for NP-hard problems
- Hochbaum
- 1997
Citation Context ...s N/K data points and then each data point needs to be assigned to a cluster. The integer programming problem is NP-hard to solve. In fact, it is an instance of max k cut on weighted graphs (Vazirani, 2001; Hochbaum, 1997). In max k cut, we seek K disjoint partitions that maximize the sum of the weights on the edges which have two vertices in different partitions. It is easy to see that the integer programming implements such c... |

455 | Objective criteria for the evaluation of clustering methods
- Rand
- 1971
Citation Context ...: 59, 71, and 48. The glass data set is even more skewed, having 70, 76, 17, 13, 9, and 29 data points in its six classes. Evaluation metric. We evaluate clustering results with the RAND index score (Rand, 1971), a standard nonparametric measure of clustering quality. RAND computes the agreements between two sets of different partitions, P1 and P2, of the same data set. Each partition is viewed as a collect... |

161 | Improved Approximation Algorithms for Max k-cut and Max Bisection. Integer Programming and Combinatorial Optimization
- Frieze, Jerrum
- 1995
Citation Context ...rogramming implements such cut with pairwise weights given by −L. Max k cut has recently been attacked with semidefinite programming (SDP) relaxation with great success (Goemans and Williamson, 1995; Frieze and Jerrum, 1997). We adopt the same strategy here. 3.2 SDP relaxation We first relax the constraint that G needs to be a binary matrix. Instead, we constrain G to be a positive semidefinite matrix whose elements are... |

73 | Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices - Fazel, Hindi, et al. - 2003 |

60 | Sample estimate of the entropy of a random vector
- Kozachenko, Leonenko
- 1987
Citation Context ...ta points and their cluster memberships. To overcome the difficulty of estimating MI between high-dimensional variables, ITC uses nonparametric statistics based on pairwise distances (Wang et al., 2009; Kozachenko and Leonenko, 1987). Maximizing the mutual information criterion, however, still remains challenging as it is an NP-hard combinatorial optimization. The earlier work uses a local search procedure, sequentially and greedily... |

30 | A dependence maximization view of clustering
- Song, Smola, et al.
- 2007
Citation Context ... conditional entropy) as a clustering criterion was first reported in (Faivishevsky and Goldberger, 2010). The idea of using information theoretical measures for clustering can also be traced back to (Song et al., 2007), where they have employed a different estimation technique called Hilbert-Schmidt Independence Criterion (HSIC) to measure (in)dependency between random variables. Specifically, they map random vari... |

28 | Derandomized dimensionality reduction with applications - Engebretsen, Indyk, et al. - 2002 |

21 | Fast SDP relaxations of graph cut clustering, transduction, and other combinatorial problems - Bie, Cristianini |

18 | On semidefinite relaxation of normalized k-cut and connections to spectral clustering
- Xing, Jordan
- 2003
Citation Context ...ustering and max k cut has long been noted. Various spectral and SDP relaxation techniques have been developed in a similar vein to ours to solve K-means clustering as a combinatorial optimization (Zha et al., 2002; Xing and Jordan, 2003; Sugiyama et al., 2010; Bie and Cristianini, 2006). In our experiments, we explore this strategy by replacing the objective function for information theoretical clustering from Trace[GL] to that for... |

17 | Divergence estimation for multidimensional densities via k-nearest-neighbor distances - Wang, Kulkarni, et al. |

11 | Discriminative clustering by regularized information maximization - Gomes, Krause, et al. - 2010 |

4 | ICA based on a smooth estimation of the differential entropy
- Faivishevsky, Goldberger
Citation Context ...g $\hat{H}_k(X)$ over all possible $k$ from 1 to $(N-1)$ leads to a simplified estimator, $\hat{H}(X) = \frac{1}{N-1}\sum_{k=1}^{N-1}\hat{H}_k(X) = \frac{D}{N(N-1)}\sum_{i\neq j}\log\|x_i - x_j\|_2^2 + \text{const}$ (3). This estimator was first investigated in (Faivishevsky and Goldberger, 2009, 2010) and can be understood intuitively as follows. To estimate the entropy, one would need to obtain an unbiased estimator of $-\log p(x_i)$ such that $H(X) \approx \frac{1}{N}\sum_i -\log p(x_i)$ (4). For one-dimensional X... |

1 | Fei Sha the 27th international conference on - Wang - 2010 |

1 | A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization - Monteiro |

1 | Implementation of a primal-dual method for SDP on a shared memory parallel architecture - ISBN |
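
The citation context for Rand (1971) above describes the RAND index as a measure of agreement between two partitions of the same data set. The snippet is truncated before the exact definition, so the sketch below assumes the standard pair-counting form: the fraction of point pairs on which the two partitions agree (both place the pair together, or both place it apart). The function name and the example labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs treated consistently by two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    if len(labels_a) < 2:
        raise ValueError("need at least two points to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:  # together in both, or apart in both
            agree += 1
        total += 1
    return agree / total


# Hypothetical ground truth and clustering output for six points.
truth = [0, 0, 0, 1, 1, 1]
predicted = [1, 1, 0, 0, 0, 0]
score = rand_index(truth, predicted)
print(score)
```

Because only pairwise co-membership is compared, the score does not depend on how the cluster labels are numbered.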