Results 1 - 10 of 126
High dimensional graphs and variable selection with the Lasso
- Annals of Statistics, 2006
Cited by 232 (17 self)
The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.
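As a rough illustration of neighborhood selection, the Python sketch below regresses each variable on all the others with a Lasso fitted by cyclic coordinate descent. It then reads each neighborhood off the non-zero coefficients. Several choices here are assumptions made for the example, not the authors' code: the coordinate-descent solver, the (1/2n) loss scaling behind the penalty `lam`, the helper names and the simulated chain data. The `rule` option combines the two regressions for each pair by requiring that both endpoints (`"and"`) or at least one endpoint (`"or"`) select the edge.

```python
import random


def soft_threshold(z, gamma):
    """Return sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # current residual y - X b
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) x_j^T (residual with coordinate j's contribution added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            step = new_bj - b[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                b[j] = new_bj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def neighborhood_selection(X, lam, rule="and"):
    """Return the estimated edge set {(a, b): a < b} of a Gaussian graphical model.

    Each variable is Lasso-regressed on all the others; its neighborhood is the
    set of variables with non-zero coefficients. rule="and" keeps an edge only if
    both endpoints select each other, rule="or" if at least one does.
    """
    p = len(X[0])
    nbrs = []
    for a in range(p):
        others = [k for k in range(p) if k != a]
        X_a = [[row[k] for k in others] for row in X]
        y_a = [row[a] for row in X]
        coef = lasso_cd(X_a, y_a, lam)
        nbrs.append({others[i] for i, c in enumerate(coef) if c != 0.0})
    edges = set()
    for a in range(p):
        for b in range(a + 1, p):
            a_picks_b, b_picks_a = b in nbrs[a], a in nbrs[b]
            keep = (a_picks_b and b_picks_a) if rule == "and" else (a_picks_b or b_picks_a)
            if keep:
                edges.add((a, b))
    return edges


def simulate_chain(n, seed=0):
    """Hypothetical demo data: a chain X0 - X1 - X2 plus an independent X3."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        x1 = x0 + 0.5 * rng.gauss(0.0, 1.0)
        x2 = x1 + 0.5 * rng.gauss(0.0, 1.0)
        rows.append([x0, x1, x2, rng.gauss(0.0, 1.0)])
    return rows
```

On data from `simulate_chain`, a moderate penalty such as `lam=0.1` should recover the chain edges (0, 1) and (1, 2). A very large penalty returns an empty edge set. As the abstract stresses, which weaker edges survive depends on how the penalty is chosen.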
Regularization and variable selection via the Elastic Net
- Journal of the Royal Statistical Society, Series B, 2005
Cited by 159 (5 self)
Summary. We propose the elastic net, a new regularization and variable selection method. Real-world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much as the LARS algorithm does for the lasso.
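The grouping effect can be seen in a small experiment. The sketch below minimizes the uncorrected ("naive") elastic net criterion by coordinate descent rather than by LARS-EN, and fits it to data in which two predictors are identical copies of each other. The objective scaling, the function names and the demo data are assumptions for illustration. With `lam2 > 0` the ridge term makes the objective strictly convex, so the unique solution must give identical columns identical coefficients. A pure lasso (`lam2 = 0`) has no such guarantee.

```python
import random


def soft_threshold(z, gamma):
    """Return sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def naive_elastic_net_cd(X, y, lam1, lam2, n_iter=1000, tol=1e-10):
    """Minimise (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2.

    Cyclic coordinate descent; X is a list of n rows, y a list of n floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # current residual y - X b
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            denom = col_sq[j] + lam2
            if denom == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            # Same as the Lasso update, except the ridge term adds lam2 to the denominator.
            new_bj = soft_threshold(rho, lam1) / denom
            step = new_bj - b[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                b[j] = new_bj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def duplicated_predictor_data(n, seed=0):
    """Hypothetical demo data: columns 0 and 1 are identical copies of x."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        X.append([x, x, z])
        y.append(2.0 * x + 0.1 * rng.gauss(0.0, 1.0))
    return X, y
```

For example, `naive_elastic_net_cd(*duplicated_predictor_data(200), lam1=0.1, lam2=0.5)` should return (numerically) equal positive coefficients for the first two columns. That result illustrates the sense in which correlated predictors enter the model "together."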
Asymptotics for Lasso-type estimators
, 2000
Cited by 95 (3 self)
In this paper, we consider the asymptotic behaviour of regression estimators that minimize the residual sum of squares plus a penalty proportional to ...
On the LASSO and Its Dual
- Journal of Computational and Graphical Statistics, 1999
Cited by 89 (2 self)
Proposed by Tibshirani (1996), the LASSO (least absolute shrinkage and selection operator) estimates a vector of regression coefficients by minimising the residual sum of squares subject to a constraint on the ℓ1-norm of the coefficient vector. The LASSO estimator typically has one or more zero elements and thus shares characteristics of both shrinkage estimation and variable selection. In this paper we treat the LASSO as a convex programming problem and derive its dual. Consideration of the primal and dual problems together leads to important new insights into the characteristics of the LASSO estimator and to an improved method for estimating its covariance matrix. Using these results we also develop an efficient algorithm for computing LASSO estimates which is usable even in cases where the number of regressors exceeds the number of observations. KEY WORDS AND PHRASES. Convex Programming, Dual Problem, Partial Least Squares, Quadratic Programming, Penalised Regression, Regression, Shrinkag...
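The next sketch makes the primal/dual pairing concrete by computing a Lasso duality gap. It uses the penalised (Lagrangian) form, minimising 0.5*||y - Xb||^2 + lam*||b||_1, rather than the ℓ1-constrained form treated in the paper. The dual below is the standard one for the penalised form, and the function name is hypothetical. The residual, rescaled so that ||X^T theta||_inf <= lam, is always dual feasible. The gap is therefore non-negative, and a gap of zero certifies that the candidate coefficient vector is optimal.

```python
def lasso_duality_gap(X, y, beta, lam):
    """Duality gap for the penalised Lasso  min_b 0.5*||y - Xb||^2 + lam*||b||_1.

    Dual: max_theta 0.5*||y||^2 - 0.5*||y - theta||^2  s.t. ||X^T theta||_inf <= lam.
    The residual, rescaled into the dual feasible set, gives a dual point, so the
    returned gap is >= 0 and equals 0 exactly when beta solves the Lasso.
    """
    n, p = len(X), len(X[0])
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    primal = 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(bj) for bj in beta)
    # Largest absolute correlation between a column of X and the residual.
    max_corr = max(abs(sum(X[i][j] * r[i] for i in range(n))) for j in range(p))
    scale = 1.0 if max_corr <= lam else lam / max_corr
    theta = [scale * ri for ri in r]  # dual-feasible point
    dual = 0.5 * sum(yi * yi for yi in y) - 0.5 * sum((y[i] - theta[i]) ** 2 for i in range(n))
    return primal - dual
```

With an identity design the Lasso solution is the soft-thresholded response. So for `y = [3.0, 0.5]` and `lam = 1.0`, the vector `[2.0, 0.0]` gives a gap of zero, while `[0.0, 0.0]` gives a strictly positive gap.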
Kernel partial least squares regression in reproducing kernel Hilbert space
- Journal of Machine Learning Research, 2001
Cited by 74 (5 self)
A family of regularized least squares regression models in a Reproducing Kernel Hilbert Space is extended by the kernel partial least squares (PLS) regression model. Similar to principal components regression (PCR), PLS is a method based on the projection of input (explanatory) variables onto latent variables (components). However, in contrast to PCR, PLS creates the components by modeling the relationship between input and output variables while maintaining most of the information in the input variables. PLS is useful in situations where the number of explanatory variables exceeds the number of observations and/or a high level of multicollinearity among those variables is assumed. Motivated by this fact, we provide a kernel PLS algorithm for the construction of nonlinear regression models in possibly high-dimensional feature spaces. We give the theoretical description of the kernel PLS algorithm and experimentally compare the algorithm with the existing kernel PCR and kernel ridge regression techniques. We demonstrate that, on the data sets employed, kernel PLS achieves the same results as kernel PCR but uses significantly fewer, qualitatively different components.
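Below is a minimal single-response sketch of a NIPALS-style kernel PLS iteration. It centres the kernel matrix, takes the score t ∝ K u with u the normalised (deflated) response, and then deflates both K and y. It assumes one output variable, for which the inner iteration for u converges in a single step. The Gaussian kernel is chosen only for the demo. This is not the authors' implementation, and it omits prediction on new points. The returned scores are orthonormal, and the training fit is the projection of the centred y onto their span, plus the mean.

```python
import math


def rbf_kernel(xs, width):
    """Gaussian kernel matrix for 1-D inputs (a hypothetical choice of kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * width ** 2)) for b in xs] for a in xs]


def center_kernel(K):
    """Centre the data in feature space: (I - 11^T/n) K (I - 11^T/n)."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n)] for i in range(n)]


def kernel_pls(K, y, n_components):
    """Single-response kernel PLS; returns (score vectors t_k, fitted training values)."""
    n = len(K)
    K = center_kernel(K)
    y_mean = sum(y) / n
    y_res = [v - y_mean for v in y]
    scores = []
    fitted = [y_mean] * n
    for _ in range(n_components):
        norm_y = math.sqrt(sum(v * v for v in y_res))
        if norm_y < 1e-12:
            break
        # With one response the inner u/t iteration converges at once: u = y / ||y||.
        u = [v / norm_y for v in y_res]
        t = [sum(K[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm_t = math.sqrt(sum(v * v for v in t))
        if norm_t < 1e-12:
            break
        t = [v / norm_t for v in t]
        scores.append(t)
        # Regress y on the unit-norm score, then deflate y.
        c = sum(t[i] * y_res[i] for i in range(n))
        fitted = [fitted[i] + c * t[i] for i in range(n)]
        y_res = [y_res[i] - c * t[i] for i in range(n)]
        # Deflate the kernel: K <- (I - t t^T) K (I - t t^T), using K symmetric.
        Kt = [sum(K[i][j] * t[j] for j in range(n)) for i in range(n)]
        tKt = sum(t[i] * Kt[i] for i in range(n))
        K = [[K[i][j] - t[i] * Kt[j] - Kt[i] * t[j] + t[i] * t[j] * tKt
              for j in range(n)] for i in range(n)]
    return scores, fitted
```

Each deflation removes the direction of the previous score from the kernel matrix, which is why later scores are orthogonal to earlier ones. Adding components can only lower the training error.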
Reinforcement learning for humanoid robotics
- Autonomous Robot, 2003
Cited by 69 (19 self)
Abstract. The complexity of the kinematic and dynamic structure of humanoid robots makes conventional analytical approaches to control increasingly unsuitable for such systems. Learning techniques offer a possible way to aid controller design if insufficient analytical knowledge is available, and learning approaches seem mandatory when humanoid systems are supposed to become completely autonomous. While recent research in neural networks and statistical learning has focused mostly on learning from finite data sets without stringent constraints on computational efficiency, learning for humanoid robots requires a different setting, characterized by the need for real-time learning performance from an essentially infinite stream of incrementally arriving data. This paper demonstrates how even high-dimensional learning problems of this kind can successfully be dealt with by techniques from nonparametric regression and locally weighted learning. As an example, we describe the application of one of the most advanced of such algorithms, Locally Weighted Projection Regression (LWPR), to the on-line learning of three problems in humanoid motor control: the learning of inverse dynamics models for model-based control, the learning of inverse kinematics of redundant manipulators, and the learning of oculomotor reflexes. All these examples demonstrate fast learning convergence (within seconds or minutes) with highly accurate final performance. We conclude that real-time learning for complex motor systems like humanoid robots is possible with appropriately tailored algorithms, such that increasingly autonomous robots with massive learning abilities should be achievable in the near future.
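Locally weighted learning builds each prediction from a model fitted to the neighborhood of the query. The sketch below is a minimal batch illustration, not LWPR itself, which is incremental and works with projections. It fits a kernel-weighted straight line around a 1-D query point. The Gaussian weighting and the `bandwidth` parameter are assumptions for the example.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth):
    """Locally weighted linear regression at one query point (1-D input).

    Training points are weighted by a Gaussian kernel centred on the query,
    a weighted least-squares line is fitted, and the line is evaluated there.
    """
    w = [math.exp(-((x - x_query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    if sxx < 1e-12:
        return my  # degenerate neighborhood: fall back to the weighted mean
    slope = sxy / sxx
    return my + slope * (x_query - mx)
```

Because the local model is linear, it reproduces a linear target exactly at any query point. On curved targets, a smaller `bandwidth` reduces bias but uses fewer effective data points, which raises variance on noisy data.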
Incremental Online Learning in High Dimensions
- Neural Computation, 2005
Cited by 67 (12 self)
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it i) learns rapidly with second-order learning methods based on incremental training, ii) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, iii) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of (possibly redundant) inputs, as shown in various empirical evaluations with up to 90-dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
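The abstract describes each local model as a few univariate regressions along selected input directions, in the spirit of PLS. That idea can be sketched in batch form as a weighted PLS fit. Each projection direction is the weighted correlation of the (deflated) inputs with the current residual, and a single univariate regression is run along it. Several parts are simplifying assumptions: the Gaussian receptive field with a diagonal distance metric, the function names, and batch rather than incremental fitting. LWPR itself updates these statistics online and adapts its weighting kernels, and neither step is shown here.

```python
import math


def receptive_field_weight(x, center, d_diag):
    """Gaussian receptive field w = exp(-0.5 (x-c)^T D (x-c)) with diagonal D."""
    q = sum(dk * (xk - ck) ** 2 for xk, ck, dk in zip(x, center, d_diag))
    return math.exp(-0.5 * q)


def fit_weighted_pls(X, y, w, n_proj):
    """Batch weighted PLS: n_proj univariate regressions along projection directions."""
    n, d = len(X), len(X[0])
    sw = sum(w)
    x_mean = [sum(w[i] * X[i][k] for i in range(n)) / sw for k in range(d)]
    y_mean = sum(w[i] * y[i] for i in range(n)) / sw
    Z = [[X[i][k] - x_mean[k] for k in range(d)] for i in range(n)]
    res = [y[i] - y_mean for i in range(n)]
    projections = []  # (direction u, slope beta, input loading p) per projection
    for _ in range(n_proj):
        # Direction: weighted correlation of the current inputs with the residual.
        u = [sum(w[i] * Z[i][k] * res[i] for i in range(n)) for k in range(d)]
        s = [sum(Z[i][k] * u[k] for k in range(d)) for i in range(n)]
        ss = sum(w[i] * s[i] * s[i] for i in range(n))
        if ss < 1e-12:
            break  # nothing left to explain along any direction
        beta = sum(w[i] * s[i] * res[i] for i in range(n)) / ss  # univariate regression
        p = [sum(w[i] * Z[i][k] * s[i] for i in range(n)) / ss for k in range(d)]
        res = [res[i] - beta * s[i] for i in range(n)]
        Z = [[Z[i][k] - s[i] * p[k] for k in range(d)] for i in range(n)]  # deflate inputs
        projections.append((u, beta, p))
    return x_mean, y_mean, projections


def predict_weighted_pls(model, x):
    """Apply the stored projections and input deflations to a new input x."""
    x_mean, y_mean, projections = model
    z = [xk - mk for xk, mk in zip(x, x_mean)]
    y_hat = y_mean
    for u, beta, p in projections:
        s = sum(zk * uk for zk, uk in zip(z, u))
        y_hat += beta * s
        z = [zk - s * pk for zk, pk in zip(z, p)]
    return y_hat
```

When there are as many projections as input dimensions, the fit coincides with weighted least squares, so it reproduces exactly linear data. The efficiency argument in the abstract rests on needing only a few projections when inputs are redundant.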
Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds
- In Proceedings of ICDM’03, 2003
Cited by 65 (3 self)
In this paper we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that during classification model construction, all relevant sub-structures are available, allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and outperforms existing schemes by 10% to 35% on average.
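The decoupled pipeline mines frequent substructures first and then builds features for any classifier. It can be illustrated with a deliberately tiny pattern language in which each substructure is a single labelled bond. Real frequent subgraph discovery handles connected subgraphs of arbitrary size, and the paper also mines geometric substructures. The molecule encoding, the function names and the support threshold used here are assumptions for the example.

```python
from collections import Counter


def bond_patterns(molecule):
    """Return the set of labelled single-bond patterns in a molecule.

    A molecule is (atom_labels, bonds), each bond being (i, j, bond_type).
    Atom labels are sorted so that C-O and O-C count as the same pattern.
    """
    atoms, bonds = molecule
    patterns = set()
    for i, j, bond_type in bonds:
        a, b = sorted((atoms[i], atoms[j]))
        patterns.add((a, bond_type, b))
    return patterns


def frequent_patterns(molecules, min_support):
    """Patterns present in at least a min_support fraction of the molecules."""
    counts = Counter()
    for mol in molecules:
        counts.update(bond_patterns(mol))  # each molecule counts a pattern once
    n = len(molecules)
    return sorted(p for p, c in counts.items() if c / n >= min_support)


def feature_vectors(molecules, patterns):
    """Binary presence/absence vector for each molecule over the mined patterns."""
    vectors = []
    for mol in molecules:
        present = bond_patterns(mol)
        vectors.append([1 if p in present else 0 for p in patterns])
    return vectors
```

Because discovery is separate from model construction, the binary vectors could be passed, after a feature-selection step, to any classifier. Neither feature selection nor the classifier is shown.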
Feature construction with Inductive Logic Programming: a study of quantitative predictions of chemical activity aided by structural attributes
- Data Mining and Knowledge Discovery, 1996
Cited by 62 (9 self)
Recently, computer programs developed within the field of Inductive Logic Programming have received some attention for their ability to construct restricted first-order logic solutions using problem-specific background knowledge. Prominent applications of such programs have been concerned with determining "structure-activity" relationships in the areas of molecular biology and chemistry. Typically the task here is to predict the "activity" of a compound, like toxicity, from its chemical structure.
Grouped and hierarchical model selection through composite absolute penalties
- Annals of Statistics, 2006
Cited by 60 (2 self)
Extracting useful information from high-dimensional data is an important part of the focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the Lasso (L1-penalized L2 minimization) has been popular in regression models. In this paper, we combine different norms, including L1, to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows the grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for non-overlapping groups. In that case, we give a Bayesian interpretation for CAP penalties. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. On the computational side, we propose using the BLASSO and cross-validation to obtain CAP estimates. For a subfamily of CAP estimates involving only the L1 and L∞ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived, allowing the regularization parameter to be selected without cross-validation. CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with p ≫ n and mis-specified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments.
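The following short sketch shows how a CAP penalty might be evaluated. It assumes the form T(β) = Σ_k ||β_{G_k}||_{γ_k}^{γ_0}, that is, within-group ℓ_{γ_k} norms combined across groups by an ℓ_{γ_0}-type sum, which matches the abstract's description of norms at two levels. Under that assumption, singleton groups with all norms equal to 1 recover the Lasso penalty. Taking γ_0 = 1 with γ_k = ∞ gives the L1/L∞ subfamily that iCAP targets. Group indices are zero-based, and the function names are hypothetical.

```python
import math


def lp_norm(values, gamma):
    """The l_gamma norm of a list of numbers; gamma may be math.inf."""
    if math.isinf(gamma):
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** gamma for v in values) ** (1.0 / gamma)


def cap_penalty(beta, groups, gammas, gamma0=1.0):
    """Assumed CAP form: sum over groups k of ||beta[G_k]||_{gamma_k} ** gamma0.

    groups is a list of index lists (they may overlap); gammas gives the
    within-group norm for each group.
    """
    total = 0.0
    for group, gamma_k in zip(groups, gammas):
        total += lp_norm([beta[j] for j in group], gamma_k) ** gamma0
    return total
```

With overlapping groups such as `[[0, 1], [1]]`, coefficient 1 is penalised in two groups while coefficient 0 is penalised in only one. This is the kind of overlap pattern the abstract refers to for encoding a hierarchy.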

