Results 1  10
of
272
Random features for largescale kernel machines
 In Neural Infomration Processing Systems
, 2007
"... To accelerate the training of kernel machines, we propose to map the input data to a randomized lowdimensional feature space and then apply existing fast linear methods. Our randomized features are designed so that the inner products of the transformed data are approximately equal to those in the f ..."
Abstract

Cited by 193 (3 self)
 Add to MetaCart
To accelerate the training of kernel machines, we propose to map the input data to a randomized lowdimensional feature space and then apply existing fast linear methods. Our randomized features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shiftinvariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in largescale classification and regression tasks linear machine learning algorithms that use these features outperform stateoftheart largescale kernel machines. 1
Consistency of the group lasso and multiple kernel learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2007
"... We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it ..."
Abstract

Cited by 192 (30 self)
 Add to MetaCart
We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for non linear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non adaptive scheme is not satisfied.
The mathematics of learning: Dealing with data
 Notices of the American Mathematical Society
, 2003
"... Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1 ..."
Abstract

Cited by 145 (16 self)
 Add to MetaCart
(Show Context)
Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1
Learning the kernel function via regularization
 Journal of Machine Learning Research
, 2005
"... We study the problem of finding an optimal kernel from a prescribed convex set of kernels K for learning a realvalued function by regularization. We establish for a wide variety of regularization functionals that this leads to a convex optimization problem and, for square loss regularization, we ch ..."
Abstract

Cited by 131 (7 self)
 Add to MetaCart
(Show Context)
We study the problem of finding an optimal kernel from a prescribed convex set of kernels K for learning a realvalued function by regularization. We establish for a wide variety of regularization functionals that this leads to a convex optimization problem and, for square loss regularization, we characterize the solution of this problem. We show that, although K may be an uncountable set, the optimal kernel is always obtained as a convex combination of at most m+2 basic kernels, where m is the number of data examples. In particular, our results apply to learning the optimal radial kernel or the optimal dot product kernel. 1.
Everything Old Is New Again: A Fresh Look at Historical Approaches
 in Machine Learning. PhD thesis, MIT
, 2002
"... 2 Everything Old Is New Again: A Fresh Look at Historical ..."
Abstract

Cited by 100 (7 self)
 Add to MetaCart
(Show Context)
2 Everything Old Is New Again: A Fresh Look at Historical
Regularized LeastSquares Classification
"... We consider the solution of binary classification problems via Tikhonov regularization in a Reproducing Kernel Hilbert Space using the square loss, and denote the resulting algorithm Regularized LeastSquares Classification (RLSC). We sketch ..."
Abstract

Cited by 87 (1 self)
 Add to MetaCart
We consider the solution of binary classification problems via Tikhonov regularization in a Reproducing Kernel Hilbert Space using the square loss, and denote the resulting algorithm Regularized LeastSquares Classification (RLSC). We sketch
Theory of classification: A survey of some recent advances
, 2005
"... The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results. ..."
Abstract

Cited by 76 (3 self)
 Add to MetaCart
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
Consistency of support vector machines and other regularized kernel classifiers
, 2002
"... ..."
(Show Context)