Results 1 - 10
of
87
Online Learning with Kernels
, 2003
"... Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little u ..."
Abstract
-
Cited by 1512 (112 self)
- Add to MetaCart
Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst case loss bounds and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection. In addition
Multicategory Support Vector Machines, theory, and application to the classification of microarray data and satellite radiance data
- Journal of the American Statistical Association
, 2004
"... Two-category support vector machines (SVM) have been very popular in the machine learning community for classi � cation problems. Solving multicategory problems by a series of binary classi � ers is quite common in the SVM paradigm; however, this approach may fail under various circumstances. We pro ..."
Abstract
-
Cited by 116 (10 self)
- Add to MetaCart
Two-category support vector machines (SVM) have been very popular in the machine learning community for classi � cation problems. Solving multicategory problems by a series of binary classi � ers is quite common in the SVM paradigm; however, this approach may fail under various circumstances. We propose the multicategory support vector machine (MSVM), which extends the binary SVM to the multicategory case and has good theoretical properties. The proposed method provides a unifying framework when there are either equal or unequal misclassi � cation costs. As a tuning criterion for the MSVM, an approximate leave-one-out cross-validation function, called Generalized Approximate Cross Validation, is derived, analogous to the binary case. The effectiveness of the MSVM is demonstrated through the applications to cancer classi � cation using microarray data and cloud classi � cation with satellite radiance pro � les.
Correcting sample selection bias by unlabeled data
"... We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We prese ..."
Abstract
-
Cited by 69 (5 self)
- Add to MetaCart
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
Measuring statistical dependence with Hilbert-Schmidt norms
- PROCEEDINGS ALGORITHMIC LEARNING THEORY
, 2005
"... We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). Th ..."
Abstract
-
Cited by 61 (34 self)
- Add to MetaCart
We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
Gaussian Processes for Machine Learning
- International Journal of Neural Systems
, 2004
"... Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to in nite (countably or continuous) index sets. GPs have been applied in a large number of elds to a diverse range of ends, and very many deep theoretical analyses of various properties are available ..."
Abstract
-
Cited by 49 (13 self)
- Add to MetaCart
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to in nite (countably or continuous) index sets. GPs have been applied in a large number of elds to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated.
A classification framework for anomaly detection
- J. Machine Learning Research
, 2005
"... One way to describe anomalies is by saying that anomalies are not concentrated. This leads to the problem of finding level sets for the data generating density. We interpret this learning problem as a binary classification problem and compare the corresponding classification risk with the standard p ..."
Abstract
-
Cited by 34 (5 self)
- Add to MetaCart
One way to describe anomalies is by saying that anomalies are not concentrated. This leads to the problem of finding level sets for the data generating density. We interpret this learning problem as a binary classification problem and compare the corresponding classification risk with the standard performance measure for the density level problem. In particular it turns out that the empirical classification risk can serve as an empirical performance measure for the anomaly detection problem. This allows us to compare different anomaly detection algorithms empirically, i.e. with the help of a test set. Based on the above interpretation we then propose a support vector machine (SVM) for anomaly detection. Finally, we establish universal consistency for this SVM and report some experiments which compare our SVM to other commonly used methods including the standard one-class SVM. 1
Integrating structured biological data by kernel maximum mean discrepancy
- IN ISMB
, 2006
"... Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if the ..."
Abstract
-
Cited by 33 (13 self)
- Add to MetaCart
Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but strings, sequences, graphs, and other common structured data types arising in molecular biology. Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: Testing cross-platform comparability of microarray data, cancer diagnosis, and data-content based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments.
Fast rates for support vector machines using gaussian kernels
- Ann. Statist
, 2004
"... We establish learning rates up to the order of n −1 for support vector machines with hinge loss (L1-SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore we ..."
Abstract
-
Cited by 31 (7 self)
- Add to MetaCart
We establish learning rates up to the order of n −1 for support vector machines with hinge loss (L1-SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore we introduce a new geometric noise condition for distributions that is used to bound the approximation error of Gaussian kernels in terms of their widths. 1
Kernel measures of conditional dependence
- In Adv. NIPS
, 2008
"... We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a ..."
Abstract
-
Cited by 31 (24 self)
- Add to MetaCart
We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments. 1

