## Dissimilarity in graph-based semisupervised classification (2007)


Venue: | Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS) |

Citations: | 18 - 2 self |

### BibTeX

@INPROCEEDINGS{Goldberg07dissimilarityin,

author = {Andrew B. Goldberg},

title = {Dissimilarity in graph-based semisupervised classification},

booktitle = {Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS)},

year = {2007}

}


### Abstract

Label dissimilarity specifies that a pair of examples probably have different class labels. We present a semi-supervised classification algorithm that learns from dissimilarity and similarity information on labeled and unlabeled data. Our approach uses a novel graph-based encoding of dissimilarity that results in a convex problem, and can handle both binary and multiclass classification. Experiments on several tasks are promising.
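The similarity-only graph-based setting that the abstract builds on can be sketched concretely. Below is a minimal, hypothetical harmonic-function example on a toy similarity graph: the weights and labels are invented for illustration, and this shows the standard setup the paper starts from, not the paper's mixed-graph method itself.

```python
import numpy as np

# Toy similarity graph: 4 points, first 2 labeled (+1, -1), last 2 unlabeled.
# W[i, j] is the (nonnegative, made-up) similarity weight between points i and j.
W = np.array([
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.1, 0.9],
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.9, 0.2, 0.0],
])
y_l = np.array([1.0, -1.0])   # labels of the labeled points
l = 2                         # number of labeled points

D = np.diag(W.sum(axis=1))    # degree matrix
L = D - W                     # combinatorial graph Laplacian

# Harmonic solution: clamp f on labeled points, solve for the unlabeled ones.
f_u = np.linalg.solve(L[l:, l:], -L[l:, :l] @ y_l)

print(np.sign(f_u))           # → [ 1. -1.]
```

Point 2 attaches mostly to the +1 labeled point and point 3 to the -1 point, so the harmonic solution propagates those labels along the strong edges.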

### Citations

504 | Distance metric learning with application to clustering with side-information
- Xing, Ng, et al.
- 2003
Citation Context: ...ge that A and B probably have different labels (political views). Such dissimilarity knowledge has been extensively studied in semi-supervised clustering, where such pairs are known as “cannot-links” [1, 6, 13, 14, 18], meaning they cannot be in the same cluster. These methods either directly modify the clustering algorithm, or change the underlying distance metric. ...

490 | Semi-supervised learning using Gaussian fields and harmonic functions
- Zhu, Ghahramani, et al.
- 2003
Citation Context: ...e classes. Our contribution is a convex method that incorporates both similarity and dissimilarity in semi-supervised learning. We start with graph-based semi-supervised classification methods (e.g., [2, 20]), which allow a natural combination of similarity and dissimilarity. Existing graph-based semi-supervised learning methods encode label similarity knowledge, but they cannot handle dissimilarity eas...

451 | Semi-supervised learning literature survey
- Zhu
- 2007
Citation Context: ...ents on several tasks are promising. 1 INTRODUCTION Semi-supervised classification learns a classifier from both labeled and unlabeled data by encoding domain knowledge on unlabeled data in the model [3, 11, 19]. In this paper we focus on a particular form of domain knowledge: the label dissimilarity between examples. We assume we are given a set of dissimilarity pairs D = {(i,j)}. For (i,j) ∈ D, the two poi...

365 | On the algorithmic implementation of multiclass kernel-based vector machines
- Crammer, Singer, et al.
- 2001
Citation Context: ...bjective in order to incorporate dissimilarity. For simplicity we focus on multiclass SVMs, but our method works for other loss functions too. There are several formulations of multiclass SVMs, e.g., [5, 7, 17]. For our purpose it is important to anchor the discriminant functions around zero. For this reason we start with the formulation in [7]. A k-class SVM is defined as the optimization problem of findin...

326 | Constrained k-means clustering with background knowledge
- Wagstaff, Cardie, et al.
- 1999
Citation Context: ...ge that A and B probably have different labels (political views). Such dissimilarity knowledge has been extensively studied in semi-supervised clustering, where such pairs are known as “cannot-links” [1, 6, 13, 14, 18], meaning they cannot be in the same cluster. These methods either directly modify the clustering algorithm, or change the underlying distance metric. ...

217 | Multi-class support vector machines
- Weston, Watkins
- 1998
Citation Context: ...bjective in order to incorporate dissimilarity. For simplicity we focus on multiclass SVMs, but our method works for other loss functions too. There are several formulations of multiclass SVMs, e.g., [5, 7, 17]. For our purpose it is important to anchor the discriminant functions around zero. For this reason we start with the formulation in [7]. A k-class SVM is defined as the optimization problem of findin...

185 | On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs
- Weiss, Freeman
Citation Context: ...to be bounded or {−∞, ∞} will be a trivial minimizer. Second, any negative weight in W will make (1), and ultimately the whole semi-supervised problem, non-convex. One has to resort to approximations [10, 15, 16]. It is highly desirable to keep the optimization problem convex. 2.1 MIXED GRAPHS Let us assume y ∈ {−1,1} for binary classification. Our key idea is to encode dissimilarity between i,j as wij(f(xi) ...
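The convexity point in this context can be checked numerically. Below is a hedged sketch with made-up edges: encoding a dissimilarity edge as w_ij (f(x_i) + f(x_j))^2 (note the plus sign) adds a positive-semidefinite rank-1 term to the quadratic penalty matrix, so the overall objective stays convex, unlike flipping the sign of a weight.

```python
import numpy as np

n = 3
M = np.zeros((n, n))

# Similarity edge (i, j, w): penalty w * (f_i - f_j)^2.
# Dissimilarity edge (i, j, w): penalty w * (f_i + f_j)^2 -- the plus sign
# pushes f_i and f_j toward opposite signs while keeping the term convex.
sim_edges = [(0, 1, 1.0)]   # toy edges, invented for illustration
dis_edges = [(1, 2, 1.0)]

for i, j, w in sim_edges:
    M[i, i] += w; M[j, j] += w
    M[i, j] -= w; M[j, i] -= w
for i, j, w in dis_edges:
    M[i, i] += w; M[j, j] += w
    M[i, j] += w; M[j, i] += w

# M is a sum of PSD rank-1 terms, so f^T M f is convex in f.
print(np.linalg.eigvalsh(M).min() >= -1e-12)  # → True
```

By contrast, putting a negative weight directly into W would make the Laplacian quadratic form indefinite, which is exactly the non-convexity the context warns about.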

176 | Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data
- Lee, Lee, et al.
- 2004
Citation Context: ...bjective in order to incorporate dissimilarity. For simplicity we focus on multiclass SVMs, but our method works for other loss functions too. There are several formulations of multiclass SVMs, e.g., [5, 7, 17]. For our purpose it is important to anchor the discriminant functions around zero. For this reason we start with the formulation in [7]. A k-class SVM is defined as the optimization problem of findin...

165 | Learning with labeled and unlabeled data
- Seeger
- 2002
Citation Context: ...ents on several tasks are promising. 1 INTRODUCTION Semi-supervised classification learns a classifier from both labeled and unlabeled data by encoding domain knowledge on unlabeled data in the model [3, 11, 19]. In this paper we focus on a particular form of domain knowledge: the label dissimilarity between examples. We assume we are given a set of dissimilarity pairs D = {(i,j)}. For (i,j) ∈ D, the two poi...

117 | Foundations of Statistical Natural Language Processing
- Manning, Schütze
- 1999
Citation Context: ...l posts (excluding quoted text) written by a user. We removed punctuation and common English words, and applied stemming. We then formed term frequency-inverse document frequency (TF-IDF) vectors (see [8]) for each user using word types occurring 10 or more times, which resulted in 8656 unique terms. We created dissimilarity edges by the quoting behavior between users. In political discussion boards, ...
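The TF-IDF weighting described in this context can be sketched with a toy corpus. The documents below are invented, and a real pipeline would also stem, filter stopwords, and threshold on word-type counts as the paper does; this only shows the core tf × log(N/df) computation.

```python
import math
from collections import Counter

# Toy corpus standing in for per-user post collections (made-up text).
docs = [
    "tax cuts tax policy economy".split(),
    "health care policy reform".split(),
    "tax reform economy growth".split(),
]

n_docs = len(docs)
# Document frequency: number of documents containing each word type.
df = Counter(w for doc in docs for w in set(doc))

def tfidf(doc):
    tf = Counter(doc)
    # Plain tf * log(N / df) weighting; real systems add smoothing/normalization.
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

vec = tfidf(docs[0])
# "tax" (tf=2) outweighs "policy" (tf=1); both appear in 2 of 3 documents.
print(vec["tax"] > vec["policy"])  # → True
```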

112 | Beyond the point cloud: from transductive to semi-supervised learning
- Sindhwani, Niyogi, et al.
Citation Context: ...dge, but they cannot handle dissimilarity easily, as we show in Section 2. We define a mixed graph to accommodate both, and define the analog of the graph Laplacian. We then adapt manifold regularization [12] to the mixed graph. We extend our method to multiclass classification in Section 3, and present experimental results in Section 4. 2 DISSIMILARITY IN BINARY CLASSIFICATION Let there be n items, of wh...

108 | MAP Estimation via Agreement on (Hyper)Trees: Message-Passing and Linear-Programming Approaches
- Wainwright, Jaakkola, et al.
- 2005
Citation Context: ...to be bounded or {−∞, ∞} will be a trivial minimizer. Second, any negative weight in W will make (1), and ultimately the whole semi-supervised problem, non-convex. One has to resort to approximations [10, 15, 16]. It is highly desirable to keep the optimization problem convex. 2.1 MIXED GRAPHS Let us assume y ∈ {−1,1} for binary classification. Our key idea is to encode dissimilarity between i,j as wij(f(xi) ...

42 | Manifold regularization: A geometric framework for learning from labeled and unlabeled examples
- Belkin, Niyogi, et al.
- 2006
Citation Context: ...e classes. Our contribution is a convex method that incorporates both similarity and dissimilarity in semi-supervised learning. We start with graph-based semi-supervised classification methods (e.g., [2, 20]), which allow a natural combination of similarity and dissimilarity. Existing graph-based semi-supervised learning methods encode label similarity knowledge, but they cannot handle dissimilarity eas...

32 | Quadratic programming relaxations for metric labeling and Markov random field MAP estimation
- Ravikumar, Lafferty
- 2006
Citation Context: ...to be bounded or {−∞, ∞} will be a trivial minimizer. Second, any negative weight in W will make (1), and ultimately the whole semi-supervised problem, non-convex. One has to resort to approximations [10, 15, 16]. It is highly desirable to keep the optimization problem convex. 2.1 MIXED GRAPHS Let us assume y ∈ {−1,1} for binary classification. Our key idea is to encode dissimilarity between i,j as wij(f(xi) ...

30 | Relational learning with Gaussian processes
- Chu, Sindhwani, et al.
- 2006
Citation Context: ...cally applies to classification, and works on discriminant functions. Dissimilarity as negative correlation on discriminant functions has been discussed in relational learning with Gaussian processes [4], but their formulation is non-convex and applies only to binary classification. In contrast, our formulation is convex and applicable to multiple classes. Our contribution is a convex method that inco...

23 | Probabilistic semi-supervised clustering with constraints (in Semi-Supervised Learning)
- Basu, Bilenko, et al.
- 2006
Citation Context: ...ge that A and B probably have different labels (political views). Such dissimilarity knowledge has been extensively studied in semi-supervised clustering, where such pairs are known as “cannot-links” [1, 6, 13, 14, 18], meaning they cannot be in the same cluster. These methods either directly modify the clustering algorithm, or change the underlying distance metric. ...

20 | A preliminary investigation into sentiment analysis of informal political discourse
- Mullen, Malouf
- 2006
Citation Context: ...a person’s political view (left, right) from his/her postings to online blogs. The fact that person B quotes person A and uses expletives near the quote is a strong indication that B disagrees with A [9]. Simple text processing thus allows us to create a dissimilarity pair (A,B) to reflect our knowledge that A and B probably have different labels (political views). Such dissimilarity knowledge has be...

9 | Kernel regression with order preferences
- Zhu, Goldberg
- 2007
Citation Context: ...the number of unlabeled points that are involved in any dissimilarity edge, plus the number of labeled points l. The Representer Theorem in [7] needs to be extended to include these unlabeled points [21]. In particular, the minimizing functions for (8) have the form fj(x) = Σ_{i=1}^{n} cij K(xi, x) + bj, for j = 1, · · · , k (9). The essential difference to supervised learning is that we now have n rather than...
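The kernel expansion in (9) can be illustrated with made-up numbers. The points, coefficients, and RBF kernel below are hypothetical (the paper learns the c_ij and b_j by solving its optimization problem (8)); this only shows how discriminant values f_j(x) = Σ_i c_ij K(x_i, x) + b_j are evaluated.

```python
import numpy as np

# Hypothetical learned quantities for a k=2 class problem over n=3 points,
# illustrating the expansion f_j(x) = sum_i c_ij K(x_i, x) + b_j.
X = np.array([[0.0], [1.0], [2.0]])   # the n expansion points
C = np.array([[ 1.0, -1.0],
              [ 0.5, -0.5],
              [-1.0,  1.0]])          # c_ij, one column per class j
b = np.array([0.1, -0.1])             # per-class offsets b_j

def rbf(A, x, gamma=1.0):
    # RBF kernel K(a, x) = exp(-gamma * ||a - x||^2), evaluated for each row of A.
    return np.exp(-gamma * np.sum((A - x) ** 2, axis=-1))

def f(x):
    K = rbf(X, x)                      # K(x_i, x) for all expansion points x_i
    return K @ C + b                   # discriminant value f_j(x) for each class j

print(int(np.argmax(f(np.array([0.2])))))  # → 0 (nearest the class-0 points)
```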

8 | Unsupervised and Semi-supervised Clustering: A Brief Survey
- Grira, Crucianu, et al.

1 | Correlation clustering for crosslingual link detection
- Gael, Zhu
- 2007
Citation Context: ...ge that A and B probably have different labels (political views). Such dissimilarity knowledge has been extensively studied in semi-supervised clustering, where such pairs are known as “cannot-links” [1, 6, 13, 14, 18], meaning they cannot be in the same cluster. These methods either directly modify the clustering algorithm, or change the underlying distance metric. Our method is different in that it specifically a...