## Definition

### BibTeX

```bibtex
@misc{Zhu_definition,
  author = {Xiaojin Zhu},
  title  = {Definition},
  year   = {}
}
```

### Abstract

Synonyms: Learning from labeled and unlabeled data, transductive learning

### Citations

8952 | The Nature of Statistical Learning Theory
- Vapnik
- 1995

Citation context: ...Vector Machines. This semi-supervised learning method assumes that the decision boundary f(x) = 0 is situated in a low-density region (in terms of unlabeled data) between the two classes y ∈ {−1, 1} [12, 8]. Consider the following hat loss function on an unlabeled instance x: max(1 − |f(x)|, 0), which is positive when −1 < f(x) < 1 and zero outside. The hat loss thus measures the violation in (unlabeled...
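The hat loss quoted in this context is simple to state in code. The sketch below (plain Python; the sample decision values are hypothetical) just evaluates max(1 − |f(x)|, 0):

```python
def hat_loss(f_x):
    """Hat loss on an unlabeled instance: max(1 - |f(x)|, 0).

    Positive when -1 < f(x) < 1, i.e. when the decision boundary
    passes too close to the unlabeled point; zero outside that band.
    """
    return max(1.0 - abs(f_x), 0.0)

# Hypothetical decision values for a few unlabeled points:
# hat_loss(0.0) == 1.0, hat_loss(0.5) == 0.5, hat_loss(2.0) == 0.0
for f_x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f_x, hat_loss(f_x))
```

Minimizing this loss over unlabeled data pushes the boundary away from them, which is exactly the low-density preference the context describes.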

1241 | Combining labeled and unlabeled data with co-training
- Blum, Mitchell
- 1998

Citation context: ...semi-supervised learning method assumes that there are multiple, different learners trained on the same labeled data, and these learners agree on the unlabeled data. A classic algorithm is co-training [5]. Take the example of web page classification, where each web page x is represented by two subsets of features, or “views”, x = ⟨x(1), x(2)⟩. For instance, x(1) can represent the words on the page...
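The co-training loop this context describes can be sketched in a few lines. Here `train`, `predict`, `rounds`, and `k` (confident examples promoted per view per round) are illustrative placeholders, not part of the original algorithm's specification:

```python
def co_train(labeled, unlabeled, train, predict, rounds=10, k=1):
    """Co-training sketch: two views teach each other.

    labeled   : list of ((x1, x2), y) pairs, one feature view per slot
    unlabeled : list of (x1, x2) pairs
    train     : function(list of (x, y)) -> model
    predict   : function(model, x) -> (label, confidence)
    """
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Train one classifier per view on the current labeled set.
        h1 = train([(x1, y) for (x1, _), y in labeled])
        h2 = train([(x2, y) for (_, x2), y in labeled])
        # Each view's k most confident predictions on the pool become
        # new labeled data, so each view teaches the other.
        top1 = sorted(pool, key=lambda x: -predict(h1, x[0])[1])[:k]
        top2 = sorted(pool, key=lambda x: -predict(h2, x[1])[1])[:k]
        for x in top1:
            labeled.append((x, predict(h1, x[0])[0]))
        for x in top2:
            if x not in top1:
                labeled.append((x, predict(h2, x[1])[0]))
        pool = [x for x in pool if x not in top1 and x not in top2]
    return labeled
```

Any per-view classifier with a confidence score fits this interface; the loop itself is agnostic to the choice.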

679 | Transductive inference for text classification using support vector machines
- Joachims
- 1999

Citation context: ...Vector Machines. This semi-supervised learning method assumes that the decision boundary f(x) = 0 is situated in a low-density region (in terms of unlabeled data) between the two classes y ∈ {−1, 1} [12, 8]. Consider the following hat loss function on an unlabeled instance x: max(1 − |f(x)|, 0), which is positive when −1 < f(x) < 1 and zero outside. The hat loss thus measures the violation in (unlabeled...

490 | Unsupervised word sense disambiguation rivaling supervised methods
- Yarowsky
- 1995

Citation context: ...it is applicable to essentially any problem where supervised learning can be applied. For example, semi-supervised learning has been applied to natural language processing (word sense disambiguation [13], document categorization, named entity classification, sentiment analysis, machine translation), computer vision (object recognition, image segmentation), bioinformatics (protein function prediction)...

488 | Semi-supervised learning using Gaussian fields and harmonic functions
- Zhu, Ghahramani, et al.
- 2003

Citation context: ...semi-supervised learning method assumes that there is a graph G = {V, E} such that the vertices V are the labeled and unlabeled training instances, and the undirected edges E connect instances i, j with weight wij [4, 14, 3]. The graph is sometimes assumed to be a random instantiation of an underlying manifold structure that supports p(x). Typically, wij reflects the proximity of xi and xj. For example, the Gaussian edge we...
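The Gaussian edge weight this context begins to describe is commonly written wij = exp(−‖xi − xj‖² / σ²); the exact normalization of σ varies across papers, so treat this form and the bandwidth below as one illustrative choice:

```python
import math

def gaussian_weight(xi, xj, sigma=1.0):
    """Gaussian edge weight: w_ij = exp(-||xi - xj||^2 / sigma^2).

    Nearby instances get weight close to 1, distant ones close to 0,
    so the graph encodes the proximity structure of the data.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / sigma ** 2)

# Weight matrix over labeled + unlabeled instances (toy 2-D points).
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
W = [[gaussian_weight(xi, xj) for xj in X] for xi in X]
```

Graph-based methods then use W (e.g. via the graph Laplacian) to encourage predictions that vary smoothly over strongly connected instances.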

328 | Manifold regularization: A geometric framework for learning from examples
- Belkin, Niyogi, et al.
- 2006

Citation context: ...semi-supervised learning method assumes that there is a graph G = {V, E} such that the vertices V are the labeled and unlabeled training instances, and the undirected edges E connect instances i, j with weight wij [4, 14, 3]. The graph is sometimes assumed to be a random instantiation of an underlying manifold structure that supports p(x). Typically, wij reflects the proximity of xi and xj. For example, the Gaussian edge we...

279 | Semi-Supervised Learning
- Chapelle, Schölkopf, et al.
- 2006

Citation context: ...training and transductive support vector machines, new applications in natural language processing and computer vision, and new theoretical analyses. More discussion can be found in Section 1.1.3 of [7]. Theory. It is obvious that unlabeled data {xi}, i = l+1, ..., l+u, by itself does not carry any information on the mapping X ↦ Y. How can it help us learn a better predictor f : X ↦ Y? Balcan and Blum pointe...

268 | Learning from labeled and unlabeled data using graph mincuts
- Blum, Chawla

Citation context: ...semi-supervised learning method assumes that there is a graph G = {V, E} such that the vertices V are the labeled and unlabeled training instances, and the undirected edges E connect instances i, j with weight wij [4, 14, 3]. The graph is sometimes assumed to be a random instantiation of an underlying manifold structure that supports p(x). Typically, wij reflects the proximity of xi and xj. For example, the Gaussian edge we...

165 | Learning with labeled and unlabeled data
- Seeger
- 2002

Citation context: ...algorithmically find a predictor that is near the top of this implicit ordering and fits the labeled data well. Many semi-supervised learning methods have been proposed, with different answers to these two questions [15, 7, 1, 10]. It is impossible to enumerate all methods in this entry. Instead, we present a few representative methods. Generative Models. This semi-supervised learning method assumes the form of joint probabilit...

74 | Introduction to Semi-supervised Learning
- Zhu, Goldberg
- 2009

Citation context: ...algorithmically find a predictor that is near the top of this implicit ordering and fits the labeled data well. Many semi-supervised learning methods have been proposed, with different answers to these two questions [15, 7, 1, 10]. It is impossible to enumerate all methods in this entry. Instead, we present a few representative methods. Generative Models. This semi-supervised learning method assumes the form of joint probabilit...

73 | On the exponential value of labeled samples
- Castelli, Cover
- 1994

Citation context: ...p(x, y | θ) = p(y | θ) p(x | y, θ). For example, the class prior distribution p(y | θ) can be a multinomial over Y, while the class-conditional distribution p(x | y, θ) can be a multivariate Gaussian in X [6, 9]. We use θ ∈ Θ to denote the parameters of the joint probability. Each θ corresponds to a predictor fθ via Bayes rule: fθ(x) ≡ argmax_y p(y | x, θ) = argmax_y p(x, y | θ) / Σ_y′ p(x, y′ | θ). Therefor...
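The Bayes-rule predictor in this context translates directly to code. In this sketch, `prior` and `likelihood` are assumed callables standing in for p(y | θ) and p(x | y, θ); the normalizing sum over y′ can be dropped because it does not depend on y:

```python
def bayes_predict(x, classes, prior, likelihood):
    """Bayes-rule predictor: f_theta(x) = argmax_y p(y | x, theta).

    Since p(y | x, theta) = p(x, y | theta) / sum_y' p(x, y' | theta)
    and the denominator is constant in y, it suffices to maximize
    the joint p(y | theta) * p(x | y, theta).
    """
    return max(classes, key=lambda y: prior(y) * likelihood(x, y))
```

In the semi-supervised setting, θ is typically fit by maximizing the likelihood of both the labeled pairs and the unlabeled marginals, e.g. with EM; the predictor above is then applied with the fitted θ.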

57 | A co-regularization approach to semi-supervised learning with multiple views
- Sindhwani, Niyogi, et al.
- 2005

Citation context: ...t the same on x. This repeats so that each view teaches the other. Multiview models generalize co-training by utilizing more than two predictors and relaxing the requirement of having separate views [11]. In either case, the final prediction is obtained from a (confidence-weighted) average or vote among the predictors. To define the implicit ordering on the hypothesis space, we need a slight extensio...
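The confidence-weighted vote mentioned at the end of this context might look like the following sketch; the (label, confidence) pair interface is an assumption, not something the cited work prescribes:

```python
def weighted_vote(predictions):
    """Combine (label, confidence) pairs from several predictors.

    Each predictor's confidence counts as its voting weight; the
    label with the largest total weight wins.
    """
    totals = {}
    for label, conf in predictions:
        totals[label] = totals.get(label, 0.0) + conf
    return max(totals, key=totals.get)
```

An unweighted majority vote is the special case where every confidence is 1.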

40 | Semisupervised learning for computational linguistics
- Abney
- 2008

Citation context: ...algorithmically find a predictor that is near the top of this implicit ordering and fits the labeled data well. Many semi-supervised learning methods have been proposed, with different answers to these two questions [15, 7, 1, 10]. It is impossible to enumerate all methods in this entry. Instead, we present a few representative methods. Generative Models. This semi-supervised learning method assumes the form of joint probabilit...

9 | A discriminative model for semi-supervised learning
- Balcan, Blum

Citation context: ...It is obvious that unlabeled data {xi}, i = l+1, ..., l+u, by itself does not carry any information on the mapping X ↦ Y. How can it help us learn a better predictor f : X ↦ Y? Balcan and Blum pointed out in [2] that the key lies in an implicit ordering of f ∈ F induced by the unlabeled data. Informally, if the implicit ordering happens to rank the target predictor f∗ near the top, then one needs less label...