## Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Citations: 1 (1 self)

### BibTeX

@MISC{Wu_learningbregman,
  author = {Lei Wu and Rong Jin and Steven C. H. Hoi and Jianke Zhu and Nenghai Yu},
  title = {Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering},
  year = {}
}

### Abstract

Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high-dimensional data because the size of the metric grows quadratically with the dimensionality; (ii) they assume a fixed metric for the entire input space and are therefore unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach similar to support vector machines. The proposed scheme avoids the assumption of a fixed metric by implicitly deriving a local distance from the Hessian matrix of the convex function that generates the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme. Extensive experiments with semi-supervised clustering show that the proposed technique (i) outperforms state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high-dimensional data.

### Citations

504 | Distance metric learning with application to clustering with side-information
- Xing, Ng, et al.
- 2003
Citation Context ...provided in the form of pairwise constraints, i.e., must-link constraints for pairs of similar data points and cannot-link constraints for pairs of dissimilar data points. Example algorithms include [14, 2, 7, 10]. Most distance learning methods assume a Mahalanobis distance. Given two data points x and x′, the distance between x and x′ is calculated by d(x, x′) = (x − x′)⊤A(x − x′), where A is the dista...
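The Mahalanobis form quoted in this context is simple to compute directly. A minimal sketch (an illustration, not code from the paper) using NumPy, where A = I recovers the squared Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(x, x_prime, A):
    """Squared Mahalanobis distance d(x, x') = (x - x')^T A (x - x'),
    where A is a positive semi-definite matrix."""
    diff = x - x_prime
    return float(diff @ A @ diff)

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

# With A = I this is the plain squared Euclidean distance: (-2)^2 + 1^2 = 5
print(mahalanobis_sq(x, y, np.eye(2)))  # 5.0

# A learned (here: diagonal) metric reweights each feature dimension
A = np.diag([2.0, 0.5])
print(mahalanobis_sq(x, y, A))  # 2*4 + 0.5*1 = 8.5
```

Metric-learning methods in the surveyed family differ mainly in how they fit A to the must-link and cannot-link constraints.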

326 | Constrained k-means clustering with background knowledge
- Wagstaff, Cardie, et al.
- 1999
Citation Context ...roposed Bregman distance learning method using the k-means algorithm for semi-supervised clustering, termed Bk-means, with the following approaches: (1) a standard k-means, (2) the constrained k-means [12] (Ck-means), (3) Ck-means with distance learned by RCA [2], (4) Ck-means with distance learned by DCA [7], (5) Ck-means with distance learned by Xing's algorithm [14] (Xing), (6) Ck-means with inf...

326 | Distance Metric Learning for Large Margin Nearest Neighbor Classification
- Weinberger, Blitzer, et al.
- 2006
Citation Context ...ween two multivariate Gaussians. Neighborhood Component Analysis (NCA) [5] learns a distance metric by extending the nearest neighbor classifier. The maximum-margin nearest neighbor (LMNN) classifier [13] extends NCA through a maximum margin framework. Yang et al. [15] propose a Local Distance Metric (LDM) that addresses multimodal data distributions. In addition to learning a distance metric, several...

310 | Clustering with bregman divergences
- Banerjee, Merugu, et al.
- 2005
Citation Context ...en convex function φ(x). Since the local distance metric can be derived from the local Hessian matrix of φ(x), the Bregman distance function avoids the assumption of a fixed distance metric. Recent studies [1] also reveal the connection between Bregman distances and exponential families of distributions. For example, Kullback-Leibler divergence is a special Bregman distance when choosing the negative entro...
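The KL-divergence special case mentioned here follows directly from the general definition D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩ with φ chosen as the negative entropy. A small numerical check (an illustration, not code from the paper):

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Negative entropy: phi(x) = sum_i x_i log x_i, with gradient log(x) + 1
neg_entropy = lambda x: np.sum(x * np.log(x))
grad_neg_entropy = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

kl = np.sum(p * np.log(p / q))  # Kullback-Leibler divergence
d = bregman(neg_entropy, grad_neg_entropy, p, q)
print(np.isclose(d, kl))  # True: for probability vectors the two coincide
```

Choosing φ(x) = ‖x‖² instead recovers the squared Euclidean distance, which is the sense in which Bregman distances generalize fixed-metric distances.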

280 | Pegasos: Primal estimated sub-gradient solver for SVM
- Shalev-Shwartz, Singer, et al.
Citation Context ...[∇αL]k, b = b − γt∇bL (17) where α^{t+1}_k is the k-th element of the vector α^{t+1}, πG(x) projects x into the domain G, and γt is the step size, set to γt = C/t following the Pegasos algorithm [9] for solving SVMs. The pseudo-code of the proposed algorithm is summarized in Algorithm 1. Algorithm 1 Algorithm of Learning Bregman Distance Functions INPUT: • data matrix: X ∈ R^{N×d} • pair-wise cons...
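The update quoted above is a projected (sub)gradient step with the Pegasos-style decaying step size γt = C/t. A generic sketch on a toy objective (the nonnegative-orthant projection and the quadratic loss are illustrative choices, not the paper's actual domain G or objective L):

```python
import numpy as np

def project_nonneg(v):
    """pi_G: projection onto the nonnegative orthant (an illustrative choice of G)."""
    return np.maximum(v, 0.0)

def projected_subgradient(grad, x0, C=1.0, T=500):
    """Projected (sub)gradient descent with Pegasos-style step size gamma_t = C / t."""
    x = x0.copy()
    for t in range(1, T + 1):
        x = project_nonneg(x - (C / t) * grad(x))
    return x

# Toy objective: f(x) = 0.5 * ||x - target||^2 with a partly negative target;
# the projection clips the unconstrained minimizer back into G = {x >= 0}.
target = np.array([2.0, -1.0])
x_star = projected_subgradient(lambda x: x - target, np.zeros(2))
print(np.round(x_star, 2))  # [2. 0.]
```

The C/t decay is what gives Pegasos-style solvers their convergence guarantee for strongly convex objectives; the same schedule is reused here for the Bregman learning problem.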

259 | The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming
- Bregman
- 1967
Citation Context ...ead to suboptimal solutions. To address these two limitations, we propose a novel scheme that learns Bregman distance functions from the given side information. Bregman distance, or Bregman divergence, [3] has several salient properties as a distance measure. Bregman distance generalizes the class of Mahalanobis distances by deriving a distance function from a given convex function φ(x). Since the local ...

149 | Information-theoretic metric learning
- Davis, Kulis, et al.
- 2007
Citation Context ...alysis (DCA) in [7], which learns a distance metric by minimizing the distance between similar data points while maximizing the distance between dissimilar data points. The authors in [4] proposed an information-theoretic metric learning (ITML) approach that learns the Mahalanobis distance by minimizing the differential relative entropy between two multivariate Gauss...

108 | Learning a Mahalanobis metric from equivalence constraints
- BAR-HILLEL, SHENTAL, et al.
- 2005
Citation Context ...provided in the form of pairwise constraints, i.e., must-link constraints for pairs of similar data points and cannot-link constraints for pairs of dissimilar data points. Example algorithms include [14, 2, 7, 10]. Most distance learning methods assume a Mahalanobis distance. Given two data points x and x′, the distance between x and x′ is calculated by d(x, x′) = (x − x′)⊤A(x − x′), where A is the dista...

41 | Learning distance metrics with contextual constraints for image retrieval
- Hoi, Liu, et al.
- 2006
Citation Context ...provided in the form of pairwise constraints, i.e., must-link constraints for pairs of similar data points and cannot-link constraints for pairs of dissimilar data points. Example algorithms include [14, 2, 7, 10]. Most distance learning methods assume a Mahalanobis distance. Given two data points x and x′, the distance between x and x′ is calculated by d(x, x′) = (x − x′)⊤A(x − x′), where A is the dista...

29 | An efficient algorithm for local distance metric learning
- Yang, Jin, et al.
Citation Context ...(NCA) [5] learns a distance metric by extending the nearest neighbor classifier. The maximum-margin nearest neighbor (LMNN) classifier [13] extends NCA through a maximum margin framework. Yang et al. [15] propose a Local Distance Metric (LDM) that addresses multimodal data distributions. In addition to learning a distance metric, several studies [11, 6] are devoted to learning a distance function, mos...

25 | Learning a kernel function for classification with small training samples
- Hertz, Hillel, et al.
- 2006
Citation Context ...NCA through a maximum margin framework. Yang et al. [15] propose a Local Distance Metric (LDM) that addresses multimodal data distributions. In addition to learning a distance metric, several studies [11, 6] are devoted to learning a distance function, mostly non-metric, from the side information. Despite the success, the existing approaches for distance metric learning are limited in two aspects. First,...

17 | Semi-supervised distance metric learning for collaborative image retrieval and clustering
- Hoi, Liu, et al.
Citation Context ...provided in the form of pairwise constraints, i.e., must-link constraints for pairs of similar data points and cannot-link constraints for pairs of dissimilar data points. Example algorithms include [16, 2, 8, 11, 7, 15]. Most distance learning methods assume a Mahalanobis distance. Given two data points x and x′, the distance between x and x′ is calculated by d(x, x′) = (x − x′)⊤A(x − x′), where A is the ...

16 | Neighborhood component analysis
- Goldberger, Roweis, et al.
- 2004
Citation Context ...oach for metric learning (ITML) that learns the Mahalanobis distance by minimizing the differential relative entropy between two multivariate Gaussians. Neighborhood Component Analysis (NCA) [5] learns a distance metric by extending the nearest neighbor classifier. The maximum-margin nearest neighbor (LMNN) classifier [13] extends NCA through a maximum margin framework. Yang et al. [15] prop...

15 | Boostcluster: boosting clustering by pairwise constraints
- Liu, Jin, et al.
- 2007
Citation Context ...by a boosting algorithm (DistBoost) [11]. To evaluate the clustering performance, we use several standard performance metrics, including pairwise Precision, pairwise Recall, and pairwise F1 measures [8], which are evaluated based on the pairwise results. Specifically, pairwise precision is the ratio of the number of correct pairs placed in the same cluster over the total number of pairs placed in the...
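The pairwise metrics described in this context can be sketched directly from their definitions: every pair of points that a clustering places in the same cluster counts as a predicted positive, and a pair is correct when the two points also share a ground-truth class. A minimal illustration (not code from the paper):

```python
from itertools import combinations

def pairwise_prf1(true_labels, pred_clusters):
    """Pairwise Precision/Recall/F1 over all point pairs:
    a pair is predicted positive when both points land in the same cluster,
    and truly positive when both points share a ground-truth label."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_clusters[i] == pred_clusters[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: one point of the second true cluster is misassigned
p, r, f1 = pairwise_prf1([0, 0, 1, 1], [0, 0, 0, 1])
print(p, r, f1)  # precision 1/3, recall 1/2, F1 0.4
```

Because the metrics are defined over pairs rather than cluster labels, they are invariant to how clusters are numbered, which is why they are convenient for comparing clusterings against constraints.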

14 | Distance metric learning from uncertain side information with application to automated photo tagging
- Wu, Hoi, et al.
- 2009
(Show Context)
Citation Context ... provided in the form of pairwise constraints, i.e., must-link constraints for pairs of similar data points and cannot-link constraints for pairs of dissimilar data points. Example algorithms include =-=[16, 2, 8, 11, 7, 15]-=-. Most distance learning methods assume a Mahalanobis distance. Given two data points x and x ′ , the distance between x and x ′ is calculated by d(x, x ′ ) = (x − x ′ ) ⊤ A(x − x ′ ), where A is the ... |

12 | Collaborative image retrieval via regularized metric learning
- Si, Jin, et al.

2 | Boosting margin based distance functions for clustering
- Tomboy, Bar-hillel, et al.
- 2004
Citation Context ...NCA through a maximum margin framework. Yang et al. [15] propose a Local Distance Metric (LDM) that addresses multimodal data distributions. In addition to learning a distance metric, several studies [11, 6] are devoted to learning a distance function, mostly non-metric, from the side information. Despite the success, the existing approaches for distance metric learning are limited in two aspects. First,...