## A Constrained Semi-Supervised Learning Approach to Data Association (2004)


### Download Links

- [www.cs.ubc.ca]
- DBLP

### Other Repositories/Bibliography

Venue: European Conference on Computer Vision (ECCV)

Citations: 11 (3 self)

### BibTeX

@INPROCEEDINGS{Kück04aconstrained,

author = {Hendrik Kück and Peter Carbonetto and Nando de Freitas},

title = {A Constrained Semi-Supervised Learning Approach to Data Association},

booktitle = {European Conference on Computer Vision (ECCV)},

year = {2004},

pages = {1--12},

publisher = {Springer}

}


### Abstract

Data association (obtaining correspondences) is a ubiquitous problem in computer vision. It appears when matching image features across multiple images, matching image features to object recognition models and matching image features to semantic concepts. In this paper, we show how a wide class of data association tasks arising in computer vision can be interpreted as a constrained semi-supervised learning problem. This interpretation opens up room for the development of new, more efficient data association methods. In particular, it leads to the formulation of a new principled probabilistic model for constrained semi-supervised learning that accounts for uncertainty in the parameters and missing data. By adopting an ingenious data augmentation strategy, it becomes possible to develop an efficient MCMC algorithm where the high-dimensional variables in the model can be sampled efficiently and directly from their posterior distributions. We demonstrate the new model and algorithm on synthetic data and the complex problem of matching image features to words in the image captions.
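The abstract's central computational idea (augment the model with latent variables so that the high-dimensional unknowns can be sampled directly from closed-form conditionals, rather than explored with a generic Markov chain) can be illustrated with the classic Albert–Chib construction for probit regression. This is a simplified sketch of the same family of technique, not the paper's actual model; the function names and toy data are made up.

```python
import numpy as np

def probit_gibbs(X, y, n_iter=500, seed=0):
    """Gibbs sampling for Bayesian probit regression via data augmentation
    (Albert-Chib style): introduce latents z_i ~ N(x_i' beta, 1) with
    y_i = 1[z_i > 0]. Both full conditionals are then exact:
    z | beta, y is a truncated normal and beta | z is Gaussian."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = np.linalg.inv(X.T @ X + np.eye(d))   # posterior covariance, N(0, I) prior
    L = np.linalg.cholesky(V)
    beta = np.zeros(d)
    z = np.empty(n)
    draws = []
    for _ in range(n_iter):
        mu = X @ beta
        for i in range(n):
            # naive sign rejection for the truncated-normal draw;
            # adequate here since |mu_i| stays moderate
            while True:
                cand = mu[i] + rng.standard_normal()
                if (cand > 0) == bool(y[i]):
                    z[i] = cand
                    break
        m = V @ (X.T @ z)                     # beta | z ~ N(m, V)
        beta = m + L @ rng.standard_normal(d)
        draws.append(beta.copy())
    return np.array(draws)

# toy data with a positive effect of the single feature
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 1))
y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(int)
post = probit_gibbs(X, y)
print(post[100:].mean(axis=0))   # posterior mean after burn-in (clearly positive)
```

Because every conditional is sampled exactly, the chain mixes without any tuning of proposal distributions, which is the practical payoff the abstract claims for its augmentation strategy.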

### Citations

2590 | Normalized cuts and image segmentation
- Shi, Malik
- 1997
Citation Context ..., we used a set of 300 annotated images from the Corel database. The images in this set were annotated with a total of 38 different words and each image was segmented into regions using normalised cuts [19]. Each of the regions is described by a 6-dimensional feature vector (CIE-Lab colour, y position in the image, boundary-to-area ratio and standard deviation of brightness). The data set was split in...

1039 | Bayesian Theory
- Bernardo, Smith
- 1994

871 | Object class recognition by unsupervised scale-invariant learning
- Fergus, Perona, et al.
- 2003
Citation Context ...troduction Data association is a ubiquitous problem in computer vision. It manifests itself when matching images (e.g. stereo and motion data [1]), matching image features to object recognition models [2] and matching image features to language descriptions [3]. The data association task is commonly mapped to an unsupervised probabilistic mixture model [4, 1, 5]. The parameters of this model are typic...

553 | Sparse bayesian learning and the relevance vector machine. Journal of machine learning research
- Tipping
- 2001
Citation Context ...view [11]. From an object recognition perspective, we would like to adopt multicategorical classifiers. Here, we opt for a simple solution by combining the responses of the various binary classifiers [15]. In more precise terms, given the training data D (a collection of images with captions) the goal is then to learn the predictive distribution p(y = 1 | x), where y is a binary indicator variable tha...

490 | Semisupervised learning using gaussian fields and harmonic functions
- Zhu, Ghahramani, et al.
- 2003
Citation Context ... The problem of semi-supervised learning has received great attention in the recent machine learning literature. In particular, very efficient kernel methods have been proposed to attack this problem [12, 13]. Our approach, still based on kernel expansions, favours sparse solutions. Moreover, it does not require supervised samples from each category and, in addition, it is probabilistic. The most importan...

443 | Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary
- Duygulu, Barnard, et al.
- 2002
Citation Context ...omputer vision. It manifests itself when matching images (e.g. stereo and motion data [1]), matching image features to object recognition models [2] and matching image features to language descriptions [3]. The data association task is commonly mapped to an unsupervised probabilistic mixture model [4, 1, 5]. The parameters of this model are typically learned with the EM algorithm or approximate variant...

331 | Modeling annotated data
- Blei, Jordan
- 2003
Citation Context ...g image features to object recognition models [2] and matching image features to language descriptions [3]. The data association task is commonly mapped to an unsupervised probabilistic mixture model [4, 1, 5]. The parameters of this model are typically learned with the EM algorithm or approximate variants. This approach is fraught with difficulties. EM often gets stuck in local minima and is highly depend...

246 | A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration
- McFadden
- 1989
Citation Context ... construct efficient MCMC algorithms in this new setting. Efficiency here is a result of using a data augmentation method, first introduced in econometrics by economics Nobel laureate Daniel McFadden [8], which enables us to compute the distribution of the high-dimensional variables analytically. That is, instead of sampling in high dimensions with a Markov chain, we sample directly from the posterio...

187 | Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes
- Liu, Wong, et al.
- 1994
Citation Context ...rior distribution of the high-dimensional variables. This so-called Rao-Blackwellised sampler achieves an important decrease in variance, as predicted by well-known theorems from Markov chain theory [9]. Our approach is similar in spirit to the multiple instance learning paradigm of Dietterich et al. [10]. This approach is expanded in [11], where the authors adopt support vector machines to deal with ...
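The variance decrease from Rao-Blackwellisation that this excerpt refers to is easy to demonstrate in a toy setting (unrelated to the paper's model): to estimate E[X] with Z ~ N(0, 1) and X | Z ~ N(Z, 1), averaging the conditional means E[X | Z] = Z halves the estimator variance relative to averaging raw X draws (1/n versus 2/n).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 500
crude, rb = [], []
for _ in range(reps):
    z = rng.standard_normal(n)          # Z ~ N(0, 1)
    x = z + rng.standard_normal(n)      # X | Z ~ N(Z, 1)
    crude.append(x.mean())              # plain Monte Carlo estimate of E[X]
    rb.append(z.mean())                 # Rao-Blackwellised: average E[X | Z] = Z
crude, rb = np.array(crude), np.array(rb)
print(crude.var(), rb.var())            # roughly 2/n versus 1/n
```

The same principle explains the paper's speedup: averaging analytically computed conditional expectations is never worse, and often much better, than averaging raw samples.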

186 | Support vector machines for multiple-instance learning
- Andrews, Tsochantaridis, et al.
- 2003
Citation Context ...e as predicted by well-known theorems from Markov chain theory [9]. Our approach is similar in spirit to the multiple instance learning paradigm of Dietterich et al. [10]. This approach is expanded in [11], where the authors adopt support vector machines to deal with the supervised part of the model and integer programming constraints to handle the missing labels. This optimisation approach suffers from...

130 | Efficient Simulation from the Multivariate Normal and Student-t Distributions Subject to Linear Constraints
- Geweke
- 1991
Citation Context ... in equation (10), while the zₖ are sampled from the truncated distributions given by equation (13). To sample from the truncated Gaussian distributions, we use the specialised routines described in [18]. These routines, based on results from large deviation theory, are essential in order to achieve good acceptance rates. We found in our experiments that the acceptance rate was satisfactory (70% to 80%...
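The kind of specialised routine the excerpt alludes to can be sketched with an exponential-proposal rejection sampler for a one-sided truncated Gaussian (Robert's method). This is a standard technique of the same family, shown for illustration; it is not necessarily the exact routine of [18].

```python
import math, random

def trunc_norm_lower(mu, sigma, a, rng=random):
    # Draw from N(mu, sigma^2) conditioned on x >= a.
    # For a cut point deep in the upper tail, naive rejection from
    # N(mu, sigma^2) almost never accepts; a shifted-exponential
    # proposal keeps the acceptance rate high.
    t = (a - mu) / sigma                      # standardised cut point
    if t < 0.5:                               # easy region: naive rejection is fine
        while True:
            x = rng.gauss(0.0, 1.0)
            if x >= t:
                return mu + sigma * x
    lam = (t + math.sqrt(t * t + 4.0)) / 2.0  # optimal exponential rate
    while True:
        x = t + rng.expovariate(lam)          # shifted-exponential proposal
        if rng.random() <= math.exp(-(x - lam) ** 2 / 2.0):
            return mu + sigma * x

random.seed(0)
draws = [trunc_norm_lower(0.0, 1.0, 4.0) for _ in range(2000)]
print(min(draws), sum(draws) / len(draws))
```

Even at a cut point four standard deviations into the tail, where naive rejection would accept about 3 draws per 100,000 attempts, the tilted proposal accepts the large majority of candidates.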

111 | Computational and inferential difficulties with mixture posterior distributions
- Celeux, Hurn, et al.
- 2000
Citation Context ...ulties. EM often gets stuck in local minima and is highly dependent on the initial values of the parameters. Markov chain Monte Carlo (MCMC) methods also perform poorly in this mixture model scenario [6]. The reason for this failure is that the number of modes in the posterior distribution of the parameters is factorial in the number of mixture components [7]. Maximisation in such a highly peaked spa...

57 | Problems of Learning on Manifolds
- Belkin
- 2003
Citation Context ... The problem of semi-supervised learning has received great attention in the recent machine learning literature. In particular, very efficient kernel methods have been proposed to attack this problem [12, 13]. Our approach, still based on kernel expansions, favours sparse solutions. Moreover, it does not require supervised samples from each category and, in addition, it is probabilistic. The most importan...

53 | Bayesian methods for mixtures of normal distributions
- Stephens
- 1997
Citation Context ...orm poorly in this mixture model scenario [6]. The reason for this failure is that the number of modes in the posterior distribution of the parameters is factorial in the number of mixture components [7]. Maximisation in such a highly peaked space is a formidable task and likely to fail in high dimensions. This is unfortunate as it is becoming clear that effective learning techniques for computer vis...

25 | Chain Flipping for Structure from Motion with Unknown Correspondence
- Dellaert, Seitz, et al.
- 2003
Citation Context ...ching image features to words in the image captions. 1 Introduction Data association is a ubiquitous problem in computer vision. It manifests itself when matching images (e.g. stereo and motion data [1]), matching image features to object recognition models [2] and matching image features to language descriptions [3]. The data association task is commonly mapped to an unsupervised probabilistic mixt...

22 | Bayesian feature weighting for unsupervised learning with application to object recognition
- Carbonetto, Freitas, et al.
- 2003
Citation Context ...suming. This data association problem can be formulated as a mixture model similar to the ones used in statistical machine translation. This is the approach originally proposed in [3] and extended in [14] to handle continuous image features. The parameters in both cases were learned with EM. The problem with this approach is that the posterior over parameters of the mixture model has a factorial numbe...

12 | Sparse Bayesian learning for regression and classification using Markov chain Monte Carlo
- Tham, Doucet, et al.
- 2002
Citation Context ...he logistic link function ϕ(u) = (1 + exp(−u))⁻¹. However, from a Bayesian computational point of view, the probit link has many advantages and is equally valid. Following Tham, Doucet and Kotagiri [16], the unknown function is represented with a sparse kernel machine with kernels centered at the data points x1:N: f(x, β, γ) = β₀ + Σᵢ₌₁ᴺ γᵢ βᵢ K(x, xᵢ). (2) Here β is an N-dimensional parameter vector a...
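The sparse kernel expansion in equation (2) is straightforward to evaluate. The sketch below uses a hypothetical RBF kernel and made-up parameter values purely to show how the binary vector γ prunes kernels from the prediction; none of these values come from the paper.

```python
import numpy as np

def rbf_kernel(x, xi, width=1.0):
    # A Gaussian RBF kernel, chosen here only for illustration.
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * width ** 2))

def sparse_kernel_f(x, beta0, beta, gamma, X_train, width=1.0):
    # f(x) = beta_0 + sum_i gamma_i * beta_i * K(x, x_i):
    # gamma_i in {0, 1} switches kernel i on or off, so only the
    # "active" training points contribute to the prediction.
    return beta0 + sum(
        g * b * rbf_kernel(x, xi, width)
        for g, b, xi in zip(gamma, beta, X_train)
    )

X_train = np.array([[0.0], [1.0], [2.0]])
beta = np.array([1.0, -2.0, 0.5])
gamma = np.array([1, 0, 1])          # middle kernel pruned by gamma_2 = 0
val = sparse_kernel_f(np.array([0.0]), 0.1, beta, gamma, X_train)
print(val)
```

Setting a component of γ to zero removes that training point's kernel entirely, which is what makes the representation sparse: inference over γ selects which basis functions survive.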

7 | Solving the multiple instance learning with axis-parallel rectangles
- Dietterich, Lathrop, et al.
- 1997
Citation Context ...es an important decrease in variance as predicted by well-known theorems from Markov chain theory [9]. Our approach is similar in spirit to the multiple instance learning paradigm of Dietterich et al. [10]. This approach is expanded in [11], where the authors adopt support vector machines to deal with the supervised part of the model and integer programming constraints to handle the missing labels. This...

6 | A maximum likelihood approach to data association
- Avitzour
- 1992
Citation Context ... we instead choose to put a conjugate Beta prior on τ, which allows the user to exert as much control as desired over the percentage of active kernels: p(τ) = [Γ(a + b) / (Γ(a) Γ(b))] τᵃ⁻¹ (1 − τ)ᵇ⁻¹. (4) For the choice a = b = 1.0, we get the uninformative uniform distribution. We obtain the prior on the binary vector γ by integrating over τ: p(γ) = ∫ p(γ | τ) p(τ) dτ = Γ(Σγ + a) Γ(N − Σγ + b), (...
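Integrating the Bernoulli likelihood of γ against the Beta(a, b) prior in equation (4) gives the standard Beta-Bernoulli marginal. The excerpt's formula for p(γ) is cut off, so the normalising constant below is the usual completion (an assumption, not taken from the excerpt); its correctness can be checked numerically by summing over all 2^N configurations.

```python
import math
from itertools import product

def log_p_gamma(gamma, a, b):
    # Marginal prior over the binary vector gamma after integrating out
    # tau ~ Beta(a, b); k = sum(gamma). The normalising constant used
    # here is the standard Beta-Bernoulli completion of the excerpt's
    # truncated formula.
    N, k = len(gamma), sum(gamma)
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + math.lgamma(k + a) + math.lgamma(N - k + b)
            - math.lgamma(N + a + b))

a, b, N = 1.0, 1.0, 6
total = sum(math.exp(log_p_gamma(g, a, b))
            for g in product([0, 1], repeat=N))
print(total)   # sums to 1 over all 2^N configurations
```

For a = b = 1 this reduces to p(γ) = k!(N − k)! / (N + 1)!, which gives every possible number of active kernels k equal total probability 1/(N + 1), matching the "uninformative" reading in the excerpt.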