Results 1–10 of 697
Regularization and variable selection via the Elastic Net
Journal of the Royal Statistical Society, Series B, 2005
Cited by 362 (8 self)
Summary. We propose the elastic net, a new regularization and variable selection method. Real-world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much as the LARS algorithm does for the lasso.
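To make the combined penalty concrete, here is a minimal NumPy sketch of elastic-net coordinate descent (not the paper's LARS-EN path algorithm; all data and settings are illustrative):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam1=1.0, lam2=1.0, n_iter=200):
    """Coordinate descent for 0.5*||y - X b||^2 + lam1*||b||_1 + 0.5*lam2*||b||^2."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # L1 soft-thresholds the coordinate; L2 adds lam2 to the denominator,
            # which is what produces the grouping/shrinkage effect.
            beta[j] = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
true = np.array([2.0, 0, 0, -1.5, 0, 0, 0, 0])
y = X @ true + 0.1 * rng.normal(size=50)
beta = elastic_net_cd(X, y, lam1=2.0, lam2=1.0)
```

The recovered `beta` keeps the two informative coefficients and drives the noise coordinates to (or very near) zero, illustrating the sparsity the abstract describes.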
Locality Preserving Projections
2002
Cited by 205 (15 self)
Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA), a classical linear technique that projects the data along the directions of maximal variance. When the high-dimensional data lie on a low-dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace–Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and, more crucially, is defined everywhere in the ambient space rather than just on the training data points. This is borne out by illustrative examples on some high-dimensional data sets.
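The variational problem reduces to a generalized eigenproblem on the graph Laplacian. A minimal sketch, with an illustrative heat-kernel kNN graph and arbitrary parameter choices:

```python
import numpy as np

def lpp(X, n_components=2, sigma=1.0, k=5):
    """LPP sketch: build a heat-kernel kNN affinity W, form the Laplacian
    L = D - W, then solve X^T L X a = lam X^T D X a for the smallest lam."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(sq[i])[1:k + 1]                  # k nearest neighbours
        W[i, idx] = np.exp(-sq[i, idx] / sigma)
    W = np.maximum(W, W.T)                                # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                             # graph Laplacian
    A = X.T @ D @ X + 1e-8 * np.eye(X.shape[1])           # ridge for stability
    B = X.T @ L @ X
    vals, vecs = np.linalg.eig(np.linalg.solve(A, B))
    order = np.argsort(vals.real)                         # smallest first
    return vecs[:, order[:n_components]].real

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
P = lpp(X, n_components=2)   # a 3x2 linear map, defined for any new point
Y = X @ P
```

Because `P` is an explicit linear map, out-of-sample points are projected by a plain matrix product, which is the practical advantage over Laplacian Eigenmaps noted in the abstract.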
A Survey of Robot Learning from Demonstration
Cited by 153 (18 self)
We present a comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state-to-action mappings. We introduce the LfD design choices in terms of demonstrator, problem space, policy derivation and performance, and contribute the foundations for a structure in which to categorize LfD research. Specifically, we analyze and categorize the multiple ways in which examples are gathered, ranging from teleoperation to imitation, as well as the various techniques for policy derivation, including matching functions, dynamics models and plans. To conclude, we discuss LfD limitations and related promising areas for future research.
The Entire Regularization Path for the Support Vector Machine
2004
Cited by 147 (9 self)
In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
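The path algorithm itself is intricate; the sketch below instead shows the naive alternative it replaces, refitting a small subgradient-descent linear SVM once per value of the cost parameter C (all data and settings are illustrative):

```python
import numpy as np

def train_linear_svm(X, y, C, n_epochs=200, lr=0.01):
    """Subgradient descent on the primal hinge-loss objective
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b))."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                              # points inside the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.5, size=(30, 2)),
               rng.normal(1, 0.5, size=(30, 2))])
y = np.array([-1] * 30 + [1] * 30)
# Naive grid: one full fit per C value. The paper's path algorithm recovers
# the solution for *every* C at roughly the cost of a single fit.
models = {C: train_linear_svm(X, y, C) for C in [0.01, 0.1, 1.0]}
w, b = models[1.0]
acc = (np.sign(X @ w + b) == y).mean()
```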
Convex multitask feature learning
Machine Learning, 2007
Cited by 141 (15 self)
Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select, not learn, a few common variables across the tasks.
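A common convex surrogate for "few features shared across tasks" is the L2,1 norm, which penalizes whole rows of the coefficient matrix. A hedged sketch via proximal gradient descent (this is a simplified stand-in, not the paper's alternating algorithm; all settings are illustrative):

```python
import numpy as np

def group_soft_threshold(W, t):
    """Row-wise proximal map of the L2,1 norm: whole rows shrink toward zero,
    so a feature is kept or dropped jointly across all tasks."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale

def multitask_l21(Xs, ys, lam=1.0, lr=0.001, n_iter=500):
    """Proximal gradient on sum_t 0.5*||y_t - X_t w_t||^2 + lam * L2,1(W)."""
    p, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((p, T))
    for _ in range(n_iter):
        G = np.zeros_like(W)
        for t in range(T):
            G[:, t] = Xs[t].T @ (Xs[t] @ W[:, t] - ys[t])
        W = group_soft_threshold(W - lr * G, lr * lam)
    return W

rng = np.random.default_rng(0)
p, T, n = 6, 3, 40
shared = np.array([1.5, -2.0, 0, 0, 0, 0])     # two features shared by all tasks
Xs = [rng.normal(size=(n, p)) for _ in range(T)]
ys = [X @ (shared + 0.1 * rng.normal(size=p)) for X in Xs]
W = multitask_l21(Xs, ys, lam=5.0)
```

Rows 0 and 1 of `W` stay large across every task while the noise rows shrink together, which is the group-selection behavior the abstract describes.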
Sparse Principal Component Analysis
Journal of Computational and Graphical Statistics, 2004
Cited by 131 (4 self)
Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the original variables, so it is often difficult to interpret the results. We introduce a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings. We show that PCA can be formulated as a regression-type optimization problem; sparse loadings are then obtained by imposing the lasso (elastic net) constraint on the regression coefficients. Efficient algorithms are proposed to realize SPCA for both regular multivariate data and gene expression arrays. We also give a new formula to compute the total variance of modified principal components. As illustrations, SPCA is applied to real and simulated data, and the results are encouraging.
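A minimal sketch of the idea of lasso-shrunk loadings, using thresholded power iteration rather than the paper's alternating elastic-net regressions (data and penalty are illustrative):

```python
import numpy as np

def sparse_pc(X, lam=0.3, n_iter=100):
    """Sparse leading loading vector via power iteration with soft-thresholding.
    NOTE: a simplified stand-in for SPCA, not the paper's exact algorithm."""
    X = X - X.mean(axis=0)
    C = X.T @ X / len(X)                       # sample covariance
    v = np.linalg.eigh(C)[1][:, -1]            # start from the dense leading PC
    for _ in range(n_iter):
        v = C @ v                              # power-iteration step
        v = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)   # lasso-style shrinkage
        nrm = np.linalg.norm(v)
        if nrm == 0:
            break
        v /= nrm
    return v

rng = np.random.default_rng(0)
# Two variables carry the leading direction; three are low-variance noise.
z = rng.normal(size=200)
X = np.column_stack([z, z,
                     0.1 * rng.normal(size=200),
                     0.1 * rng.normal(size=200),
                     0.1 * rng.normal(size=200)])
v = sparse_pc(X, lam=0.3)
```

Unlike the dense PCA loading, the sparse loading is exactly zero on the noise variables, which is what makes the component interpretable.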
Sparsity and smoothness via the fused lasso
Journal of the Royal Statistical Society, Series B, 2005
Cited by 129 (11 self)
The lasso (Tibshirani 1996) penalizes a least squares regression by the sum of the absolute values (L1 norm) of the coefficients. The form of this penalty encourages sparse solutions, that is, having many coefficients equal to zero. Here we propose the “fused lasso”, a generalization of the lasso designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes both the L1 norm of the coefficients and their successive differences. Thus it encourages both sparsity of the coefficients and sparsity of their differences, that is, local constancy of the coefficient profile.
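The criterion itself is easy to write down; a small sketch with illustrative data, showing the two penalty terms:

```python
import numpy as np

def fused_lasso_objective(beta, X, y, lam1, lam2):
    """Fused-lasso criterion: squared error plus an L1 penalty on the
    coefficients (sparsity) and on their successive differences (fusion)."""
    fit = 0.5 * np.sum((y - X @ beta) ** 2)
    sparsity = lam1 * np.sum(np.abs(beta))
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))
    return fit + sparsity + fusion

X = np.eye(4)
y = np.array([0.0, 1.0, 1.0, 0.0])
flat = np.array([0.0, 1.0, 1.0, 0.0])      # piecewise constant: coefs 2 and 3 fuse
wiggly = np.array([0.0, 1.0, 0.5, 0.0])    # same support, no fused plateau
flat_val = fused_lasso_objective(flat, X, y, lam1=0.1, lam2=1.0)
wiggly_val = fused_lasso_objective(wiggly, X, y, lam1=0.1, lam2=1.0)
```

With a strong fusion penalty, the piecewise-constant vector scores lower than the wiggly one even though both fit comparably, which is exactly the preference for ordered, locally constant coefficients.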
Logistic Model Trees
2006
Cited by 96 (2 self)
Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there has been work on combining these two schemes into ‘model trees’, i.e. trees that contain linear regression functions at the leaves. In this paper, we present an algorithm that adapts this idea for classification problems, using logistic regression instead of linear regression. We use a stagewise fitting process to construct the logistic regression models that can select relevant attributes in the data in a natural way, and show how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree. We compare the performance of our algorithm to several other state-of-the-art learning schemes on 36 benchmark UCI datasets, and show that it produces accurate and compact classifiers.
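The structural idea, a tree whose leaves hold logistic models, can be sketched with a single hand-chosen split and plain gradient-descent logistic regression (a toy stand-in for the paper's stagewise LogitBoost fitting; all names and data are illustrative):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Gradient descent on the mean logistic loss, with a bias column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_logistic(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (1.0 / (1.0 + np.exp(-Xb @ w)) > 0.5).astype(int)

class TinyLMT:
    """One axis-aligned split with a logistic model in each leaf."""
    def fit(self, X, y, feature=0, threshold=0.0):
        self.feature, self.threshold = feature, threshold
        left = X[:, feature] <= threshold
        self.w_left = fit_logistic(X[left], y[left])
        self.w_right = fit_logistic(X[~left], y[~left])
        return self
    def predict(self, X):
        left = X[:, self.feature] <= self.threshold
        out = np.empty(len(X), dtype=int)
        out[left] = predict_logistic(self.w_left, X[left])
        out[~left] = predict_logistic(self.w_right, X[~left])
        return out

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
# XOR-like labels: not linearly separable globally, but linear in each half.
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)
tree = TinyLMT().fit(X, y, feature=0, threshold=0.0)
acc = (tree.predict(X) == y).mean()
```

A single logistic model fails on this XOR-like problem, but one split plus a logistic model per leaf handles it, which is the complementary strength the abstract points to.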
Learning the kernel function via regularization
Journal of Machine Learning Research, 2005
Cited by 96 (7 self)
We study the problem of finding an optimal kernel from a prescribed convex set of kernels K for learning a real-valued function by regularization. We establish for a wide variety of regularization functionals that this leads to a convex optimization problem and, for square loss regularization, we characterize the solution of this problem. We show that, although K may be an uncountable set, the optimal kernel is always obtained as a convex combination of at most m+2 basic kernels, where m is the number of data examples. In particular, our results apply to learning the optimal radial kernel or the optimal dot product kernel.
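To illustrate the "convex combination of basic kernels" object, here is a sketch that evaluates a few fixed convex combinations of RBF kernels inside kernel ridge regression; the paper optimizes the combination jointly, whereas this toy just scans a grid (all data and settings are illustrative):

```python
import numpy as np

def rbf(X, Z, gamma):
    """Gaussian RBF kernel matrix between row sets X and Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(K, y, reg=0.1):
    """Kernel ridge regression: alpha = (K + reg*I)^{-1} y."""
    return np.linalg.solve(K + reg * np.eye(len(K)), y)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(60, 1))
y = np.sin(2 * X[:, 0])
bases = [rbf(X, X, g) for g in (0.1, 1.0, 10.0)]   # the prescribed kernel set
best = None
# Naive grid over a few convex combinations of the base kernels.
for w in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1/3, 1/3, 1/3)]:
    K = sum(wi * B for wi, B in zip(w, bases))      # still a valid PSD kernel
    alpha = kernel_ridge_fit(K, y)
    err = np.mean((K @ alpha - y) ** 2)
    if best is None or err < best[0]:
        best = (err, w)
```

Any convex combination of PSD kernel matrices is again PSD, which is why the search space in the paper is itself convex.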
Overview of record linkage and current research directions
Bureau of the Census, 2006
Cited by 85 (1 self)
This paper provides background on record linkage methods that can be used in combining data from a variety of sources, such as person lists and business lists. It also describes some areas of current research.
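Classical record linkage scores candidate pairs with a Fellegi–Sunter-style log-likelihood ratio over field agreements. A minimal sketch with made-up m/u probabilities (all field names and values are illustrative):

```python
import math

# Illustrative (made-up) probabilities per field:
# m = P(field agrees | records match), u = P(field agrees | non-match).
FIELDS = {
    "surname":    {"m": 0.95, "u": 0.01},
    "birth_year": {"m": 0.90, "u": 0.05},
    "zip":        {"m": 0.85, "u": 0.10},
}

def match_weight(rec_a, rec_b):
    """Sum log(m/u) over agreeing fields and log((1-m)/(1-u)) over
    disagreeing ones; positive totals favor a match, negative a non-match."""
    score = 0.0
    for field, p in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            score += math.log(p["m"] / p["u"])
        else:
            score += math.log((1 - p["m"]) / (1 - p["u"]))
    return score

a = {"surname": "smith", "birth_year": 1970, "zip": "10001"}
b = {"surname": "smith", "birth_year": 1970, "zip": "10001"}
c = {"surname": "jones", "birth_year": 1980, "zip": "94110"}
```

In practice the score is compared against two thresholds, declaring a match above the upper one, a non-match below the lower one, and sending the ambiguous middle band to clerical review.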