
## 3 Probabilistic Semi-Supervised Clustering with Constraints

### Citations

11696 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...model parameters are unknown in a clustering setting, minimizing (3.9) is an "incomplete-data problem". A popular solution technique for such problems is the Expectation Maximization (EM) algorithm [Dempster et al., 1977]. The K-Means algorithm [MacQueen, 1967] is known to be equivalent to the EM algorithm with hard clustering assignments, under certain assumptions [Kearns et al., 1997, Basu et al., 2002, Banerjee et...
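The K-Means/EM equivalence this context mentions can be illustrated with a minimal sketch: a hard E-step assigns each point to its nearest centroid, and the M-step recomputes each centroid as the mean of its assigned points. The function name and parameters below are hypothetical; this is a plain illustration, not the chapter's algorithm.

```python
import numpy as np

def kmeans_hard_em(X, K, n_iter=100, seed=0):
    """K-Means viewed as EM with hard cluster assignments (a sketch)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Hard E-step: assign each point to the nearest centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # M-step: recompute each centroid as the mean of its assigned points.
        new = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                        else centroids[k] for k in range(K)])
        if np.allclose(new, centroids):  # converged
            break
        centroids = new
    return labels, centroids
```

With a soft E-step (posterior responsibilities instead of argmin) this becomes standard EM for a spherical Gaussian mixture, which is the equivalence the cited works formalize.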

8753 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context: ...computing cluster assignments that approximate the optimal solution in this framework, e.g., iterated conditional modes (ICM) [Besag, 1986, Zhang et al., 2001], belief propagation [Pearl, 1988, Segal et al., 2003], and linear programming relaxation [Kleinberg and Tardos, 1999]. ICM is a greedy strategy that sequentially updates the cluster assignment of each point, keeping the assignments...
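The greedy ICM strategy described here, re-assigning one point at a time while holding the other assignments fixed, can be sketched as follows. The local cost (distance to centroid plus a penalty per violated constraint) and all names are assumptions for illustration, not the chapter's exact objective.

```python
import numpy as np

def icm_assign(X, centroids, must, cannot, w=1.0, n_sweeps=10):
    """Greedy ICM: sequentially update each point's cluster, others fixed (a sketch)."""
    n, K = len(X), len(centroids)
    # Initialize with nearest-centroid assignments.
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    def local_cost(i, k):
        cost = ((X[i] - centroids[k]) ** 2).sum()
        cost += w * sum(1 for j in must.get(i, []) if labels[j] != k)     # violated must-links
        cost += w * sum(1 for j in cannot.get(i, []) if labels[j] == k)   # violated cannot-links
        return cost

    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            best = min(range(K), key=lambda k: local_cost(i, k))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # reached a local minimum of the objective
            break
    return labels
```

Each sweep can only decrease the objective, so the procedure converges to a local minimum, which is why ICM is fast but not globally optimal.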

5036 | Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context: ...or cannot-linked to x_i. Then, by the Hammersley-Clifford theorem [Hammersley and Clifford, 1971], the prior probability of a particular label configuration Y can be expressed as a Gibbs distribution [Geman and Geman, 1984], so that P(Y | Θ, C) = (1/Z) exp(−v(Y)) = (1/Z) exp(−Σ_{N_i ∈ N} v_{N_i}(Y)), (3.2) where N is the set of all neighborhoods, Z is the partition function (normalizing term), and v(Y) is the overall lab...
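The Gibbs prior in (3.2) can be sketched up to the partition function Z: the potential v(Y) sums a penalty over each constrained pair that the labeling violates. The function name and the simple constant penalty w are assumptions for illustration; Z itself would require summing over all label configurations.

```python
import math

def gibbs_prior_unnormalized(labels, must_links, cannot_links, w=1.0):
    """Unnormalized Gibbs prior exp(-v(Y)) for a label configuration (a sketch).

    v(Y) sums constraint-violation potentials; the partition function Z is omitted.
    """
    v = 0.0
    for i, j in must_links:
        if labels[i] != labels[j]:
            v += w  # penalty for a violated must-link
    for i, j in cannot_links:
        if labels[i] == labels[j]:
            v += w  # penalty for a violated cannot-link
    return math.exp(-v)
```

A configuration satisfying all constraints gets the maximal unnormalized prior exp(0) = 1; each violation multiplies the prior by exp(−w).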

3724 | Normalized cuts and image segmentation - Shi, Malik - 2000

2968 | Some methods for classification and analysis of multivariate observations
- MacQueen
- 1967
Citation Context: ...cost function, involving a distortion measure between the points and the cluster representatives, is minimized. A well-known unsupervised clustering algorithm that follows this framework is K-Means [MacQueen, 1967]. Our semi-supervised clustering model considers a sample of n data points X = (x_1, . . . , x_n), each x_i ∈ R^d being a d-dimensional vector, with x_im representing its m-th component. Th...

1673 | On spectral clustering: Analysis and an algorithm - Ng, Jordan, et al. - 2001

796 | Distance metric learning with application to clustering with side-information
- Xing, Ng, et al.
- 2003
Citation Context: ...is parameterized and the parameter values are learned to bring must-linked points together and push cannot-linked points further apart [Bilenko and Mooney, 2003, Cohn et al., 2003, Klein et al., 2002, Xing et al., 2003]. This chapter describes an approach to semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that combines the constraint-based and distance-based approaches in a unified probabili...

785 | Graphical models, exponential families, and variational inference
- Wainwright, Jordan
- 2008
Citation Context: ...Z_Θ in (3.9). Estimation of the partition function cannot be performed in closed form for most non-trivial dependency structures, and approximate inference methods must be employed for computing it [Wainwright and Jordan, 2003]. Estimation of the distortion normalizer log Z_Θ depends on the distortion measure d_A used by the model. This chapter considers three parameterized distortion measures: p...

649 | Divergence measures based on the Shannon entropy
- Lin
- 1991
Citation Context: ...(d_max^IMA − d_IMA(x_i, x_j)) ... − log P(A) (3.21) The upper bound d_max^IMA can be initialized as d_max^IMA = Σ_{m=1}^d a_m, which follows from the fact that unweighted Jensen-Shannon divergence is bounded above by 1 [Lin, 1991]. Note that as discussed in Section 3.3.1, it is difficult to compute the log Z_Θ term in closed form for parameterized KL distance. So, analogously to the parameterized cosine distance case, the...

625 | Distributional clustering of English words
- Pereira, Tishby, et al.
- 1993
Citation Context: ...ized KL-Divergence. In certain domains, data is described by probability distributions; e.g., text documents can be represented as probability distributions over words generated by a multinomial model [Pereira et al., 1993]. KL-divergence is a widely used distance measure for such data: d_KL(x_i, x_j) = Σ_{m=1}^d x_im log(x_im / x_jm), where x_i and x_j are probability distributions over d events: Σ_{m=1}^d x_im = Σ_{m=1}^d x_jm = 1. In previo...
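The KL-divergence formula in this context can be written directly as code. The `eps` smoothing guard against zero components is a common convention added here for robustness, not something taken from the chapter.

```python
import math

def d_kl(x, y, eps=1e-12):
    """KL-divergence d_KL(x, y) = sum_m x_m * log(x_m / y_m) between discrete
    distributions; eps guards against zero components (an added convention)."""
    assert abs(sum(x) - 1.0) < 1e-9 and abs(sum(y) - 1.0) < 1e-9, "inputs must be distributions"
    return sum(xm * math.log((xm + eps) / (ym + eps)) for xm, ym in zip(x, y))
```

Note that d_KL is not symmetric (d_KL(x, y) ≠ d_KL(y, x) in general), which is one reason the chapter also discusses the symmetric Jensen-Shannon divergence.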

473 | Constrained K-means clustering with background knowledge
- Wagstaff, Cardie, et al.
- 2001
Citation Context: ...vision. For example, complete class labels may be unknown in the context of clustering for speaker identification in a conversation [Bar-Hillel et al., 2003], or clustering GPS data for lane-finding [Wagstaff et al., 2001]. In some domains, pairwise constraints occur naturally, e.g., the Database of Interacting Proteins (DIP) data set in biology contains information about proteins co-occurring in processes, which can...

376 | Segmentation using eigenvectors: a unifying view
- Weiss
- 1999
Citation Context: ...ated within the HMRF framework. Spectral clustering methods--algorithms that perform clustering by decomposing the pairwise affinity matrix derived from data--have become increasingly popular [Weiss, 1999, Ng et al., 2002], and several semi-supervised approaches have been developed within the spectral clustering framework. Kamvar et al. [2003] have proposed directly injecting the constraints into the...

281 | On the hardness of approximate reasoning
- Roth
- 1996
Citation Context: ...ives to find the global minimum of the objective function, given the cluster centroids, is NP-hard in any non-trivial HMRF model, similarly to other graphical models such as MRFs and belief networks [Roth, 1996]. There exist several techniques for computing cluster assignments that approximate the optimal solution in this framework, e.g., iterated conditional modes (ICM) [Besag, 1986, Z...

247 | A best possible heuristic for the k-center problem
- Hochbaum, Shmoys
- 1985
Citation Context: ...he neighborhoods, and the remaining K − λ clusters are initialized with points obtained by random perturbations of the global centroid of X. If λ > K, a weighted variant of farthest-first traversal [Hochbaum and Shmoys, 1985] is applied to the centroids of the λ neighborhoods, where the weight of each centroid is proportional to the size of the corresponding neighborhood. Weighted farthest-first traversal selects neighb...
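The weighted farthest-first traversal this context describes can be sketched as below: start from the heaviest neighborhood centroid, then repeatedly pick the centroid maximizing a weight-scaled distance to the already-selected set. The exact scoring rule (weight times squared distance to the nearest chosen centroid) is an assumption for illustration, not necessarily the chapter's precise criterion.

```python
import numpy as np

def weighted_farthest_first(centroids, weights, K):
    """Pick K of the given neighborhood centroids by weighted farthest-first
    traversal (a sketch; the scoring rule is an assumption)."""
    centroids = np.asarray(centroids, dtype=float)
    weights = np.asarray(weights, dtype=float)
    chosen = [int(weights.argmax())]  # start from the heaviest neighborhood
    while len(chosen) < K:
        # Squared distance from each centroid to its nearest already-chosen centroid.
        d = np.min(((centroids[:, None, :] - centroids[chosen][None, :, :]) ** 2)
                   .sum(axis=2), axis=1)
        score = weights * d        # favor heavy, far-away neighborhoods
        score[chosen] = -1.0       # never re-pick a chosen centroid
        chosen.append(int(score.argmax()))
    return chosen
```

This biases initialization toward large, mutually distant constraint neighborhoods, which is the stated goal of the weighting.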

202 | Clustering with instance-level constraints
- Wagstaff, Cardie
- 2000
Citation Context: ...erings in which constraints are not satisfied. COP-KMeans is one such method, where constraint violations are explicitly avoided in the assignment step of the K-Means algorithm [Wagstaff et al., 2001, Wagstaff, 2002]. Another method, proposed by Demiriz et al. [1999], utilizes genetic algorithms to optimize an objective function that combines cluster compactness and cluster purity and that decreases with constrain...

200 | Impact of similarity measures on webpage clustering
- Strehl, Ghosh, et al.
- 2000
Citation Context: ...lity of the clustering with respect to a given underlying class labeling of the data: it measures how closely the clustering algorithm could reconstruct the underlying label distribution in the data [Strehl et al., 2000]. If Ŷ is the random variable denoting the cluster assignments of the points and Y is the random variable denoting the underlying class labels on the points, then the NMI measure is defined as: NM...
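The NMI measure this context introduces can be sketched from empirical counts: mutual information between cluster assignments and class labels, normalized here by the geometric mean of the two entropies (several normalizations exist in the literature; this choice is an assumption, since the definition in the snippet is truncated).

```python
import math
from collections import Counter

def nmi(pred, true):
    """Normalized mutual information between cluster assignments and class labels
    (geometric-mean normalization; other variants exist)."""
    n = len(pred)
    pc, tc, joint = Counter(pred), Counter(true), Counter(zip(pred, true))
    # Empirical entropy of a label distribution given its counts.
    h = lambda counts: -sum(c / n * math.log(c / n) for c in counts.values())
    # Empirical mutual information from the joint and marginal counts.
    mi = sum(c / n * math.log((c / n) / ((pc[a] / n) * (tc[b] / n)))
             for (a, b), c in joint.items())
    denom = math.sqrt(h(pc) * h(tc))
    return mi / denom if denom > 0 else 0.0
```

NMI is invariant to permutations of cluster labels, which is why it is preferred over raw accuracy for evaluating clusterings against class labels.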

198 | From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering
- Klein, Kamvar, et al.
- 2002
Citation Context: ...e distance function is parameterized and the parameter values are learned to bring must-linked points together and push cannot-linked points further apart [Bilenko and Mooney, 2003, Cohn et al., 2003, Klein et al., 2002, Xing et al., 2003]. This chapter describes an approach to semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that combines the constraint-based and distance-based approaches in...

194 | Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields
- Kleinberg, Tardos
- 1999
Citation Context: ...this framework, e.g., iterated conditional modes (ICM) [Besag, 1986, Zhang et al., 2001], belief propagation [Pearl, 1988, Segal et al., 2003], and linear programming relaxation [Kleinberg and Tardos, 1999]. ICM is a greedy strategy that sequentially updates the cluster assignment of each point, keeping the assignments for the other points fixed. In many settings it has comparable performance to more e...

169 | Markov fields on finite graphs and lattices. Unpublished manuscript
- Hammersley, Clifford
- 1971
Citation Context: ...en the model parameters and the set of constraints, depends only on the cluster labels of the observed variables that are must-linked or cannot-linked to x_i. Then, by the Hammersley-Clifford theorem [Hammersley and Clifford, 1971], the prior probability of a particular label configuration Y can be expressed as a Gibbs distribution [Geman and Geman, 1984], so that P(Y | Θ, C) = (1/Z) exp(−v(Y)) = (1/Z) exp(−Σ_{N_i ∈ N} v_{N_i}(Y))...

152 | Discovering molecular pathways from protein interaction and gene expression data
- Segal, Wang, et al.
- 2003
Citation Context: ...er assignments that approximate the optimal solution in this framework, e.g., iterated conditional modes (ICM) [Besag, 1986, Zhang et al., 2001], belief propagation [Pearl, 1988, Segal et al., 2003], and linear programming relaxation [Kleinberg and Tardos, 1999]. ICM is a greedy strategy that sequentially updates the cluster assignment of each point, keeping the assignments for the other points...

138 | Random projection for high dimensional data clustering: A cluster ensemble approach
- Fern, Brodley
- 2003
Citation Context: ...gnments of the data points. Though various clustering evaluation measures have been used in the literature, NMI and its variants have become popular lately among clustering practitioners [Dom, 2001, Fern and Brodley, 2003, Meila, 2003]. 3.5.3 Methodology. Learning curves were generated using two-fold cross-validation performed over 20 runs on each dataset. In every trial, 50% of the dataset was set aside as the trainin...

125 | Comparing clusterings by the variation of information
- Meilă
Citation Context: ...ts. Though various clustering evaluation measures have been used in the literature, NMI and its variants have become popular lately among clustering practitioners [Dom, 2001, Fern and Brodley, 2003, Meila, 2003]. 3.5.3 Methodology. Learning curves were generated using two-fold cross-validation performed over 20 runs on each dataset. In every trial, 50% of the dataset was set aside as the training fold. Every...

104 | Agreement-based learning - Liang, Klein, et al. - 2008

103 | Computing Gaussian mixture models with EM using equivalence constraints - Shental, Bar-Hillel, et al. - 2003

99 | An information-theoretic analysis of hard and soft assignment methods for clustering
- Kearns, Mansour, et al.
- 1998
Citation Context: ...square of the L2 distance parameterized by a positive semidefinite weight matrix A (d_A(x_i, μ_yi) = ||x_i − μ_yi||²_A), then the cluster conditional probability is a Gaussian with covariance encoded by A^{-1} [Kearns et al., 1997]; if x_i and μ_yi are probability distributions and d_A is the KL-divergence (d_A(x_i, μ_yi) = Σ_{m=1}^d x_im log(x_im / μ_yim)), then the cluster conditional probability is a multinomial distribution [Dhillon an...

72 | An information-theoretic external cluster-validity measure
- Dom
Citation Context: ...class assignments of the data points. Though various clustering evaluation measures have been used in the literature, NMI and its variants have become popular lately among clustering practitioners [Dom, 2001, Fern and Brodley, 2003, Meila, 2003]. 3.5.3 Methodology. Learning curves were generated using two-fold cross-validation performed over 20 runs on each dataset. In every trial, 50% of the dataset was...

34 | Information theoretic clustering of sparse co-occurrence data
- Dhillon, Guan
- 2003
Citation Context: ...al., 1997]; if x_i and μ_yi are probability distributions and d_A is the KL-divergence (d_A(x_i, μ_yi) = Σ_{m=1}^d x_im log(x_im / μ_yim)), then the cluster conditional probability is a multinomial distribution [Dhillon and Guan, 2003]. The relation in (3.8) holds even if d_A is not a Bregman divergence but a directional distance measure like cosine distance. For example, if x_i and μ_yi are vectors of unit length and d_A is one minus...
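The parameterized squared L2 distance ||x − μ||²_A appearing in this and the preceding context can be written as a one-line quadratic form. The function name is hypothetical, and this sketches only the distance itself, not the procedure for learning A.

```python
import numpy as np

def d_A(x, mu, A):
    """Parameterized squared L2 distance ||x - mu||_A^2 = (x - mu)^T A (x - mu),
    with A a positive semidefinite weight matrix (a sketch)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(diff @ A @ diff)
```

With A equal to the identity this reduces to the ordinary squared Euclidean distance; learning a non-identity A is what lets the model stretch or shrink directions of the space to respect the pairwise constraints.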

28 | Learning with constrained and unlabelled data - Lange - 2005

4 | Learning to recognize patterns without a teacher - Fralick - 1967

1 | Employing EM and pool-based active learning for text classification - McCallum, Nigam - 1998