## Manifold regularization: A geometric framework for learning from labeled and unlabeled examples (2006)

Venue: Journal of Machine Learning Research

Citations: 337 (13 self)

### BibTeX

@ARTICLE{Belkin06manifold,

author = {Mikhail Belkin and Partha Niyogi and Vikas Sindhwani},

title = {Manifold regularization: A geometric framework for learning from labeled and unlabeled examples},

journal = {Journal of Machine Learning Research},

year = {2006}

}

### Abstract

We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner. Some transductive graph learning algorithms and standard methods, including Support Vector Machines and Regularized Least Squares, can be obtained as special cases. We use properties of Reproducing Kernel Hilbert Spaces to prove new Representer theorems that provide a theoretical basis for the algorithms. As a result (in contrast to purely graph-based approaches), we obtain a natural out-of-sample extension to novel examples and so are able to handle both transductive and truly semi-supervised settings. We present experimental evidence suggesting that our semi-supervised algorithms are able to use unlabeled data effectively. Finally, we briefly discuss unsupervised and fully supervised learning within our general framework.
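The abstract's core idea, an ambient RKHS regularizer combined with an intrinsic graph-Laplacian penalty estimated from labeled and unlabeled points together, can be illustrated with a minimal NumPy sketch of Laplacian Regularized Least Squares. This is not the authors' implementation: the RBF kernel, the binary kNN adjacency graph, and all parameter values (`gamma_A`, `gamma_I`, `sigma`, `k`) are illustrative assumptions.

```python
import numpy as np

def lap_rls(X, y_labeled, gamma_A=1e-2, gamma_I=1e-1, sigma=1.0, k=5):
    """Laplacian Regularized Least Squares (sketch).

    X: (l+u, d) array with the l labeled points first; y_labeled: (l,) targets.
    Returns the kernel matrix over X and expansion coefficients alpha, so that
    by the Representer theorem f(x) = sum_i alpha_i * k(x, X[i]).
    """
    n, l = X.shape[0], len(y_labeled)
    u = n - l
    # RBF kernel over all labeled + unlabeled points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    # unnormalized graph Laplacian of a symmetrized binary kNN graph,
    # an empirical stand-in for the intrinsic (manifold) penalty
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[1:k + 1]] = 1.0
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(axis=1)) - W
    # J zeroes out the squared loss on unlabeled points; Y pads them with 0
    J = np.diag(np.r_[np.ones(l), np.zeros(u)])
    Y = np.r_[y_labeled, np.zeros(u)]
    # linear system: (J K + gamma_A*l*I + gamma_I*l/(l+u)^2 * L K) alpha = Y
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / (l + u) ** 2) * (L @ K)
    return K, np.linalg.solve(A, Y)
```

Predictions at the training inputs (labeled and unlabeled) are `K @ alpha`; a novel point x is scored by summing `alpha[i] * exp(-||x - X[i]||^2 / (2 sigma^2))`, which is the out-of-sample extension that purely graph-based methods lack.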

### Citations

9048 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...learning from labeled and unlabeled data (semi-supervised and transductive learning) has attracted considerable attention in recent years. Some recently proposed methods include Transductive SVM [35, 22], Co-training [13], and a variety of graph-based methods [12, 14, 32, 37, 38, 24, 23, 4]. We also note two regularization-based techniques [16, 7]. The latter reference is closest in spirit to the intuitions of our paper...

1699 |
A global geometric framework for nonlinear dimensionality reduction
- Tenenbaum, Silva, et al.
Citation Context: ...must attempt to get empirical estimates of these quantities. Note that in order to get such empirical estimates it is sufficient to have unlabeled examples. A case of particular recent interest (e.g., see [27, 33, 5, 19] for a discussion of dimensionality reduction) is when the support of the marginal distribution is a compact submanifold...

1630 | Nonlinear dimensionality reduction by locally linear embedding
- Roweis, Saul
Citation Context: ...must attempt to get empirical estimates of these quantities. Note that in order to get such empirical estimates it is sufficient to have unlabeled examples. A case of particular recent interest (e.g., see [27, 33, 5, 19] for a discussion of dimensionality reduction) is when the support of the marginal distribution is a compact submanifold...

1507 |
Sobolev Spaces
- Adams
- 1975
Citation Context: ...the solution to Eqn. 3 exists and by Lemma 3.3 belongs to H. It follows easily from Cor. 3.7 and standard results about compact embeddings of Sobolev spaces (e.g., [1]) that balls in the relevant space are compact; therefore, for any such ball, the minimizer exists and belongs to the ball. On the other hand, by substituting the zero function...

1284 |
Spline models for observational data
- Wahba
- 1990
Citation Context: ...The idea of regularization has a rich mathematical history going back to [34], where it is used for solving ill-posed inverse problems. Regularization is a key idea in the theory of splines (e.g., [36]) and is widely used in machine learning (e.g., [20]). Many machine learning algorithms, including Support Vector Machines, can be interpreted as instances of regularization. Our framework exploits the...

1252 | Combining labeled and unlabeled data with cotraining
- Blum, Mitchell
- 1998
Citation Context: ...labeled and unlabeled data (semi-supervised and transductive learning) has attracted considerable attention in recent years. Some recently proposed methods include Transductive SVM [35, 22], Co-training [13], and a variety of graph-based methods [12, 14, 32, 37, 38, 24, 23, 4]. We also note two regularization-based techniques [16, 7]. The latter reference is closest in spirit to the intuitions of our paper...

803 | Text classification from labeled and unlabeled documents using EM
- Nigam, McCallum, et al.
Citation Context: ...and the training set. Note that this setting may not be applicable in several cases of practical interest where one does not have access to multiple information sources. Bayesian techniques: see e.g., [25, 28, 16]. An early application of semi-supervised learning to text classification appeared in [25], where a combination of the EM algorithm and Naive Bayes classification is proposed to incorporate unlabeled data...

786 |
Theory of reproducing kernels
- Aronszajn
- 1950
Citation Context: ...the Representer theorems from the previous section. 3.1 General Theory of RKHS: We start by recalling some basic properties of Reproducing Kernel Hilbert Spaces (see the original work [2] and also [17] for a nice discussion in the context of learning theory) and their connections to integral operators. We say that a Hilbert space of functions has the reproducing property if the evaluation...

741 | Laplacian eigenmaps for dimensionality reduction and data representation
- Belkin, Niyogi
- 2003
Citation Context: ...must attempt to get empirical estimates of these quantities. Note that in order to get such empirical estimates it is sufficient to have unlabeled examples. A case of particular recent interest (e.g., see [27, 33, 5, 19] for a discussion of dimensionality reduction) is when the support of the marginal distribution is a compact submanifold...

685 | Transductive inference for text classification using support vector machines
- Joachims
- 1999
Citation Context: ...learning from labeled and unlabeled data (semi-supervised and transductive learning) has attracted considerable attention in recent years. Some recently proposed methods include Transductive SVM [35, 22], Co-training [13], and a variety of graph-based methods [12, 14, 32, 37, 38, 24, 23, 4]. We also note two regularization-based techniques [16, 7]. The latter reference is closest in spirit to the intuitions of our paper...

497 | Semi-supervised learning using gaussian fields and harmonic functions
- Zhu, Ghahramani, et al.
Citation Context: ...(semi-supervised and transductive learning) has attracted considerable attention in recent years. Some recently proposed methods include Transductive SVM [35, 22], Co-training [13], and a variety of graph-based methods [12, 14, 32, 37, 38, 24, 23, 4]. We also note two regularization-based techniques [16, 7]. The latter reference is closest in spirit to the intuitions of our paper. The idea of regularization has a rich mathematical history going back to [34]...

267 | Learning from labeled and unlabeled data using graph mincuts
- Blum, Chawla
- 2001
Citation Context: ...(semi-supervised and transductive learning) has attracted considerable attention in recent years. Some recently proposed methods include Transductive SVM [35, 22], Co-training [13], and a variety of graph-based methods [12, 14, 32, 37, 38, 24, 23, 4]. We also note two regularization-based techniques [16, 7]. The latter reference is closest in spirit to the intuitions of our paper. The idea of regularization has a rich mathematical history going back to [34]...

189 | Transductive learning via spectral graph partitioning
- Joachims
- 2003

183 | Extracting support data for a given task
- Schölkopf, Burges, et al.
- 1995
Citation Context: ...set. The remaining images formed the test set. Two images for each class were randomly labeled and the rest were left unlabeled. Following [30], we chose to train classifiers with polynomial kernels of degree 3, and set the weight on the regularization term for inductive methods... For manifold regularization, we chose...

167 | Learning with labeled and unlabeled data
- Seeger
- 2001
Citation Context: ...and the training set. Note that this setting may not be applicable in several cases of practical interest where one does not have access to multiple information sources. Bayesian techniques: see e.g., [25, 28, 16]. An early application of semi-supervised learning to text classification appeared in [25], where a combination of the EM algorithm and Naive Bayes classification is proposed to incorporate unlabeled data...

99 | Hessian Eigenmaps: New Locally Linear Embedding Techniques for High-Dimensional Data
- Donoho, Grimes
- 2003

88 | Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning
- Rifkin
- 2002
Citation Context: ...problem). The parameter appears as the upper bound on the values in the quadratic program. For additional details on the derivation and alternative formulations of SVMs, see [31], [26]. ... 4.4 Laplacian Support Vector Machines: By including the...

78 |
Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
- Bengio, Paiement, et al.
Citation Context: ...also obtain a natural out-of-sample extension for clustering points not in the original data set. Figures 9 and 10 show results of this method on two two-dimensional clustering problems. Unlike recent work [9, 11] on out-of-sample extensions, our method is based on a Representer theorem for RKHS. Remark 2: By taking multiple eigenvectors of the system in Eqn. 28 we obtain a natural regularized out-of-sample extension...

76 |
Using manifold structure for partially labeled classification
- Belkin, Niyogi
Citation Context: ...(semi-supervised and transductive learning) has attracted considerable attention in recent years. Some recently proposed methods include Transductive SVM [35, 22], Co-training [13], and a variety of graph-based methods [12, 14, 32, 37, 38, 24, 23, 4]. We also note two regularization-based techniques [16, 7]. The latter reference is closest in spirit to the intuitions of our paper...

66 |
Regularization of incorrectly posed problems
- Tikhonov
- 1963
Citation Context: ...also note two regularization-based techniques [16, 7]. The latter reference is closest in spirit to the intuitions of our paper. The idea of regularization has a rich mathematical history going back to [34], where it is used for solving ill-posed inverse problems. Regularization is a key idea in the theory of splines (e.g., [36]) and is widely used in machine learning...

57 |
Problems of Learning on Manifolds
- Belkin
- 2003
Citation Context: ...The Empirical Case: In the case when the marginal distribution is unknown and sampled via labeled and unlabeled examples, the Laplace-Beltrami operator may be approximated by the Laplacian of the data adjacency graph (see [3, 7] for some discussion). A regularizer based on the graph Laplacian leads to the optimization problem posed in Eqn. 5. We now provide a proof of Theorem 2.2, which states that the solution to this problem...

50 | Semi-supervised support vector machines for unlabeled data classification
- Fung, Mangasarian
- 2001
Citation Context: ...very large number of label switches before converging. Note that even though TSVMs were inspired by transductive inference, they do provide an out-of-sample extension. Semi-Supervised SVMs (S3VM) [10], [21]: S3VMs incorporate unlabeled data by including the minimum hinge loss over the two choices of labels for each unlabeled example. This is formulated as a mixed-integer program for linear SVMs in [10] and...

8 |
Semi-Supervised Support Vector Machines
- Bennett, Demiriz
- 1998
Citation Context: ...possibly very large number of label switches before converging. Note that even though TSVMs were inspired by transductive inference, they do provide an out-of-sample extension. Semi-Supervised SVMs (S3VM) [10], [21]: S3VMs incorporate unlabeled data by including the minimum hinge loss over the two choices of labels for each unlabeled example. This is formulated as a mixed-integer program for linear SVMs in [10] and...

8 |
Nonlinear dimensionality reduction by kernel eigenmaps
- Brand
- 2003
Citation Context: ...the last few years. Such graph-based approaches work in a transductive setting and do not naturally extend to the semi-supervised case where novel test examples need to be classified (predicted). Also see [8, 11] for some recent related work on out-of-sample extensions. 1.1 The Significance of Semi-Supervised Learning: From an engineering standpoint, it is clear that collecting labeled data is generally more...

7 |
Partially labeled classification with Markov random walks
- Szummer, Jaakkola
- 2001

6 |
On the mathematical foundations of learning
- Cucker, Smale
Citation Context: ...theorems from the previous section. 3.1 General Theory of RKHS: We start by recalling some basic properties of Reproducing Kernel Hilbert Spaces (see the original work [2] and also [17] for a nice discussion in the context of learning theory) and their connections to integral operators. We say that a Hilbert space of functions has the reproducing property if the evaluation function...

3 |
Efficient Non-Parametric Function Induction in Semi-Supervised Learning
- Bengio, Delalleau, Le Roux
- 2004
Citation Context: ...induced by the kernel. Manifold regularization provides natural out-of-sample extensions to several graph-based approaches. These connections are summarized in Table 2. We also note the very recent work [8] on out-of-sample extensions for semi-supervised learning. For Graph Regularization and Label Propagation see [29, 6, 38]. Co-training [13]: The Co-training algorithm was developed to integrate...

3 |
Inductive learning algorithms and representations for text categorization
- Dumais, Platt, et al.
- 1998
Citation Context: ...grid search for best performance over the first 5 realizations of the data. Linear kernels and cosine distances were used since these have found widespread application in text classification problems [18]. Since the exact datasets on which these algorithms were run somewhat differ in preprocessing, preparation, and experimental protocol, these results are only meant to suggest that Manifold Regularization...

3 |
Learning with Kernels
- Schölkopf, Smola
- 2002
Citation Context: ...conceptual framework is the set of ideas surrounding regularization in Reproducing Kernel Hilbert Spaces. This leads to the class of kernel-based algorithms for classification and regression (e.g., see [31], [36], [20]). We show how to bring these ideas together in a coherent and natural way to incorporate geometric structure in a kernel-based regularization framework. As far as we know, these ideas have...