## Semi-supervised learning on Riemannian manifolds (2004)

### Download Links

- [www.cs.uchicago.edu]
- [people.cs.uchicago.edu]
- [www2.imm.dtu.dk]
- DBLP

### Other Repositories/Bibliography

Venue: Machine Learning

Citations: 156 (8 self)

### BibTeX

@ARTICLE{Belkin04semi-supervisedlearning,
  author  = {Mikhail Belkin and Partha Niyogi},
  title   = {Semi-supervised learning on {Riemannian} manifolds},
  journal = {Machine Learning},
  volume  = {56},
  year    = {2004},
  pages   = {209--239}
}

### Abstract

We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the Laplace-Beltrami operator one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification.
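The pipeline the abstract describes (adjacency graph, graph Laplacian, eigenvector basis from unlabeled data, least-squares fit on the labeled points) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name, the unweighted kNN graph, and the defaults `n_neighbors=5`, `p=2` are assumptions made for the sketch:

```python
import numpy as np

def laplacian_eigenmap_classify(X, y_labeled, labeled_idx, n_neighbors=5, p=2):
    """Sketch of the paper's framework: build a k-nearest-neighbor adjacency
    graph over ALL points (labeled and unlabeled), take the p eigenvectors of
    the graph Laplacian with smallest eigenvalues as a basis, least-squares
    fit the known +1/-1 labels in that basis, and classify every point by
    the sign of the fitted function."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Symmetric, unweighted kNN adjacency matrix (simplest variant).
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[1:n_neighbors + 1]:
            W[i, j] = W[j, i] = 1.0
    # Graph Laplacian L = D - W approximates the Laplace-Beltrami operator.
    L = np.diag(W.sum(axis=1)) - W
    # Eigenvectors for the p smallest eigenvalues: the Laplacian Eigenmaps basis.
    _, eigvecs = np.linalg.eigh(L)
    E = eigvecs[:, :p]
    # Fit the labels by least squares on the labeled rows only.
    a, *_ = np.linalg.lstsq(E[labeled_idx], y_labeled, rcond=None)
    return np.sign(E @ a)

# Toy usage: two well-separated 1-D clusters, two labeled points per cluster.
X = np.array([[0.1 * i, 0.0] for i in range(20)]
             + [[10.0 + 0.1 * i, 0.0] for i in range(20)])
labeled_idx = np.array([0, 5, 20, 25])
y_labeled = np.array([1.0, 1.0, -1.0, -1.0])
pred = laplacian_eigenmap_classify(X, y_labeled, labeled_idx)
```

In this toy case the kNN graph splits into two connected components, so the first two eigenvectors span the component indicator functions and the fit propagates the four labels to all forty points.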

### Citations

3629 | Neural Networks: A Comprehensive Foundation (2nd ed.) - Haykin - 1999 |

2590 | Normalized cuts and image segmentation
- Shi, Malik
- 1997
Citation Context: ...we obtain: the quantity f^T L f / f^T D f = (1/vol(G1) + 1/vol(G − G1)) · vol(δG1) < 2h_G, which was introduced as the Normalized Cut in (Shi and Malik, 2000) in the context of image segmentation and provides a lower bound (and an approximation) for the Cheeger constant. Similarly to the manifold case, while the direc... |
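The identity quoted in the Shi–Malik excerpt, (f^T L f)/(f^T D f) = (1/vol(G1) + 1/vol(G − G1)) · vol(δG1) for the piecewise-constant f used in the Cheeger-constant argument, can be checked numerically; the 5-node adjacency matrix below is an arbitrary example chosen only for the check:

```python
import numpy as np

# Arbitrary small undirected graph for the check (not from the paper).
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)              # vertex degrees
D = np.diag(d)
L = D - W                      # graph Laplacian

G1 = [0, 1, 2]                 # one side of the cut
G2 = [3, 4]                    # its complement
vol1, vol2 = d[G1].sum(), d[G2].sum()
cut = W[np.ix_(G1, G2)].sum()  # vol(dG1): total weight crossing the cut

# f is 1/vol(G1) on G1 and -1/vol(G-G1) on the complement.
f = np.where(np.isin(np.arange(5), G1), 1.0 / vol1, -1.0 / vol2)
ratio = (f @ L @ f) / (f @ D @ f)
ncut = cut * (1.0 / vol1 + 1.0 / vol2)
print(np.isclose(ratio, ncut))  # True
```

Since vol(δG1) · (1/vol(G1) + 1/vol(G − G1)) ≤ 2 · vol(δG1)/min(vol(G1), vol(G − G1)) = 2h_G, this quantity sits below twice the graph Cheeger constant, which is the bound the excerpt invokes.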

1614 | Nonlinear dimensionality reduction by locally linear embedding
- Roweis, Saul
- 2000
Citation Context: ...by vectors in R^n, the natural distance is often different from the distance induced by the ambient space R^n. While there has been recent work on using manifold structure for data representation (Roweis and Saul, 2000; Tenenbaum, et al, 2000), the only other application to classification, that we are aware of, was in (Szummer and Jaakkola, 2002), where the authors use a random walk on the adjacency graph for parti... |

1273 | Spline models for observational data
- Wahba
- 1990
Citation Context: ...yields a measure of smoothness for functions on the manifold. 5.2. The Laplacian as a smoothness functional. A simple measure of the degree of smoothness (following the theory of splines, for example, (Wahba, 1990)) for a function f on the unit circle S^1 is the “smoothness functional” S(f) = ∫_{S^1} |f′(φ)|² dφ. If S(f) is close to zero, we think of f as being “smooth”. Naturally, constant functions are the most... |
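To see the functional in the Wahba excerpt at work, one can evaluate it on the Fourier modes of the circle (a standard computation, not taken from the excerpt itself):

```latex
S(f) = \int_{S^1} |f'(\varphi)|^2 \, d\varphi, \qquad
f_k(\varphi) = \sin(k\varphi) \;\Longrightarrow\;
S(f_k) = \int_0^{2\pi} k^2 \cos^2(k\varphi)\, d\varphi = \pi k^2 .
```

Constant functions give S(f) = 0 and the penalty grows quadratically with frequency; the quadratic form f^T L f of the graph Laplacian plays exactly this role in the discrete setting.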

1049 | Nonlinear component analysis as a kernel eigenvalue problem - Schölkopf, Smola, et al. - 1996 |

734 | Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
- Belkin, Niyogi
- 2003
Citation Context: ...and especially the diffusion kernel (Kondor and Lafferty, 2002; Smola and Kondor, 2003). In this paper we address the problem of classifying a partially labeled set by developing the ideas proposed in (Belkin and Niyogi, 2003) for data representation. In particular, we exploit the intrinsic structure of the data to improve classification with unlabeled examples under the assumption that the data resides on a low-dimension... |

267 | Learning from labeled and unlabeled data using graph mincuts
- Blum, Chawla
Citation Context: ...data for classification and other purposes. Although the area of partially labeled classification is fairly new, a considerable amount of work has been done in that field since the early 90’s (e.g., (Blum and Chawla, 2001; Castelli and Cover, 1995; Nigam, et al, 2000; Szummer and Jaakkola, 2002)). In particular, there has been a lot of recent interest in semi-supervised learning and graphs, including (Zhu, et al, 2003... |

206 | “A Lower Bound for the Smallest Eigenvalue of the Laplacian,” Problems in Analysis
- Cheeger
- 1970
Citation Context: ...To construct this function is not hard. We put f̃(x) = 1/vol(M1) for x ∈ M1 and f̃(x) = −1/vol(M − M1) for x ∈ M − M1, appropriately smoothing it at the boundary. It is clear that ∫_M f̃ = 0, and it can be shown (see (Cheeger, 1970) for the details) that ∫_M |∇f̃|² / ∫_M |f̃|² is closely related to the Cheeger constant. On the other hand, the second (first non-constant) eigenfunction of ∆ is equal to argmin_{f ⊥ const} ∫_M |∇f|² / ∫_M |f... |

206 | Partially labeled classification with Markov random walks
- Szummer, Jaakkola
- 2002
Citation Context: ...partially labeled classification is fairly new, a considerable amount of work has been done in that field since the early 90’s (e.g., (Blum and Chawla, 2001; Castelli and Cover, 1995; Nigam, et al, 2000; Szummer and Jaakkola, 2002)). In particular, there has been a lot of recent interest in semi-supervised learning and graphs, including (Zhu, et al, 2003; Zhou, et al, 2003; Chapelle, et al, 2003; Joachims, 2003; Belkin, et al,... |

193 | Transductive learning via spectral graph partitioning
- Joachims
- 2003
Citation Context: ...Szummer and Jaakkola, 2002)). In particular, there has been a lot of recent interest in semi-supervised learning and graphs, including (Zhu, et al, 2003; Zhou, et al, 2003; Chapelle, et al, 2003; Joachims, 2003; Belkin, et al, 2003) and closely related graph kernels... |

186 | Diffusion kernels on graphs and other discrete input spaces
- Kondor, Lafferty
- 2002
Citation Context: ...graph kernels, and especially the diffusion kernel (Kondor and Lafferty, 2002; Smola and Kondor, 2003). In this paper we address the problem of classifying a partially labeled set by developing the ideas proposed in (Belkin and Niyogi, 2003) for data representation. In particu... |

165 | Stability and generalization
- Bousquet, Elisseeff
Citation Context: ...Σ_{i=1}^n (y_i − f(x_i))² + λ||f||²_H. Now one might ask, how far is Ê_{λ,n} from Ê_λ? In order to get a handle on this question, one may proceed by building on the techniques described in (Cucker and Smale, 2001), (Bousquet and Elisseeff, 2001) and (Kutin and Niyogi, 2002). We only provide a flavor of the kinds of results that may be obtained. Let f_opt = argmin R(f) and f̂_n = argmin R_emp(f) wher... |

73 | On the exponential value of labeled samples
- Castelli, Cover
- 1994
Citation Context: ...and other purposes. Although the area of partially labeled classification is fairly new, a considerable amount of work has been done in that field since the early 90’s (e.g., (Blum and Chawla, 2001; Castelli and Cover, 1995; Nigam, et al, 2000; Szummer and Jaakkola, 2002)). In particular, there has been a lot of recent interest in semi-supervised learning and graphs, including (Zhu, et al, 2003; Zhou, et al, 2003; Chape... |

43 | Almost-everywhere algorithmic stability and generalization error
- Kutin, Niyogi
- 2002
Citation Context: ...In order to get a handle on this question, one may proceed by building on the techniques described in (Cucker and Smale, 2001), (Bousquet and Elisseeff, 2001) and (Kutin and Niyogi, 2002). We only provide a flavor of the kinds of results that may be obtained. Let f_opt = argmin R(f) and f̂_n = argmin R_emp(f), where R(f) = E[(y − f(x))²] + λ||f||²_H and R_emp(f) = (1/n) Σ_{i=1}^n (y_i − f(x... |

24 | Higher eigenvalues and isoperimetric inequalities on Riemannian manifolds and graphs - Chung, Grigor’yan, et al. |

11 | A note on the isoperimetric constant
- Buser
- 1982
Citation Context: ...the eigenfunction e1 and the clustering function f̃ are close. We note that several upper and lower bounds for h_M in terms of the smallest nonzero eigenvalue of the Laplacian λ_1 are known, e.g., (Cheeger, 1970; Buser, 1982). Thus an approximation to the optimal clustering is provided by M1 = {x | e1(x) > 0} and M − M1 = {x | e1(x) ≤ 0}, cutting the manifold along the zero set of e1. Thus the first nontrivial eigenfunction... |

10 | Regression and regularization on large graphs
- Niyogi, Matveeva, et al.
- 2003
Citation Context: ...Jaakkola, 2002)). In particular, there has been a lot of recent interest in semi-supervised learning and graphs, including (Zhu, et al, 2003; Zhou, et al, 2003; Chapelle, et al, 2003; Joachims, 2003; Belkin, et al, 2003) and closely related graph kernels, and especiall... |

10 | The Laplacian on a Riemannian Manifold
- Rosenberg
- 1997
Citation Context: ...known as the Laplace-Beltrami operator, or the Laplacian. (Footnote: There is an extensive literature on the connection between the geometric properties of the manifold and the Laplace-Beltrami operator. See (Rosenberg, 1997) for an introduction to the subject.) In the case of R^n the Laplace-Beltrami operator is simply ∆ = −Σ_i ∂²/∂x_i². Note that we adopt the geometric convention of writing it... |

1 | On the mathematical foundations of learning - Cucker, Smale - 2001 |