Results 1–10 of 18
High-dimensional semiparametric Gaussian copula graphical models
 The Annals of Statistics
, 2012
Abstract
Cited by 48 (19 self)
We propose a semiparametric approach called the nonparanormal SKEPTIC for efficiently and robustly estimating high-dimensional undirected graphical models. To achieve modeling flexibility, we consider the nonparanormal graphical models proposed by Liu, Lafferty and Wasserman [J. Mach. Learn. Res. 10 (2009) 2295–2328]. To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman’s rho and Kendall’s tau. We prove that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergence for both graph recovery and parameter estimation. This result suggests that the nonparanormal graphical models can be used as a safe replacement for the popular Gaussian graphical models, even when the data are truly Gaussian. Besides theoretical analysis, we also conduct thorough numerical simulations to compare the graph recovery performance of different estimators under both ideal and noisy settings. The proposed methods are then applied to a large-scale genomic data set to illustrate their empirical usefulness. The R package huge implementing the proposed methods is available on the Comprehensive R Archive Network (CRAN).
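The rank-based correlation step this abstract describes can be sketched as follows (a minimal NumPy illustration, not the authors' huge implementation; the function names are hypothetical). The Kendall's tau estimate for each pair of variables is mapped through the sine transform, and the resulting matrix can then be passed to any Gaussian graphical model estimator such as the graphical lasso:

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau via pairwise sign agreement (no tie handling)."""
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    return (dx * dy).sum() / (n * (n - 1))

def skeptic_correlation(X):
    """Rank-based correlation estimate S_jk = sin(pi/2 * tau_jk),
    used in place of the sample correlation matrix."""
    n, d = X.shape
    S = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau = kendall_tau(X[:, j], X[:, k])
            S[j, k] = S[k, j] = np.sin(np.pi / 2 * tau)
    return S
```

Because the estimate depends on the data only through ranks, it is invariant to monotone marginal transformations, which is what makes it robust for nonparanormal data while losing essentially nothing when the data are truly Gaussian.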
Simultaneous support recovery in high dimensions: Benefits and perils of block ℓ1,∞-regularization
, 2009
High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
, 2013
Abstract
Cited by 9 (2 self)
The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing nonlinear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion (HSIC). We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments for classification and regression with thousands of features.
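The kernel-based independence measure named above can be illustrated with a small empirical HSIC estimator (a sketch under simplified assumptions — Gaussian kernels, a biased estimator, and 1-D features — not the paper's HSIC Lasso solver):

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for a 1-D sample."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n-1)^2,
    where H is the centering matrix. Zero iff (asymptotically)
    x and y are independent, for a characteristic kernel."""
    n = len(x)
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

A feature nonlinearly related to the output (e.g. y = x²) scores markedly higher than an independent feature even though its Pearson correlation with y is near zero, which is the property that lets a feature-wise kernelized Lasso detect nonlinear dependencies.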
ℓp−ℓq Penalty for Sparse Linear and Sparse Multiple Kernel Multi-Task Learning
Abstract
Cited by 6 (1 self)
Abstract—Recently, there has been a lot of interest in the multi-task learning (MTL) problem with the constraint that tasks should share a common sparsity profile. Such a problem can be addressed through a regularization framework where the regularizer induces a joint-sparsity pattern between task decision functions. We follow this principled framework and focus on ℓp−ℓq (with 0 ≤ p ≤ 1 and 1 ≤ q ≤ 2) mixed-norms as sparsity-inducing penalties. Our motivation for addressing such a large class of penalties is to adapt the penalty to the problem at hand, thus leading to better performance and better sparsity patterns. For solving the problem in the general multiple kernel case, we first derive a variational formulation of the ℓ1−ℓq penalty which helps us propose an alternating optimization algorithm. Although very simple, the latter algorithm provably converges to the global minimum of the ℓ1−ℓq penalized problem. For the linear case, we extend existing works on accelerated proximal gradient methods to this penalty. Our contribution in this context is to provide an efficient scheme for computing the ℓ1−ℓq proximal operator. Then, for the more general case when 0 < p < 1, we solve the resulting nonconvex problem through a majorization-minimization approach. The resulting algorithm is an iterative scheme which, at each iteration, solves a weighted ℓ1−ℓq sparse MTL problem. Empirical evidence from toy datasets and real-world datasets dealing with BCI single-trial EEG classification and protein subcellular localization shows the benefit of the proposed approaches and algorithms. Index Terms—Multi-task learning, multiple kernel learning, sparsity, mixed-norm, Support Vector Machines
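For the q = 2 special case of the ℓ1−ℓq penalty (the group-lasso mixed-norm), the proximal operator has a well-known closed form, row-wise shrinkage; a minimal sketch (the general-q operator treated in the paper requires more machinery):

```python
import numpy as np

def prox_l1_l2(W, lam):
    """Proximal operator of lam * sum_j ||W_j||_2 over the rows W_j of W:
    each row is shrunk toward zero by lam in Euclidean norm, and rows
    with norm <= lam are set exactly to zero (joint sparsity)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale
```

Each row of W here plays the role of one feature's weights across tasks, so zeroing a whole row is exactly the shared sparsity profile the penalty is designed to induce.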
The Landmark Selection Method for Multiple Output Prediction
Abstract
Cited by 5 (0 self)
Conditional modeling x ↦→ y is a central problem in machine learning. A substantial research effort is devoted to such modeling when x is high dimensional. We consider, instead, the case of a high-dimensional y, where x is either low dimensional or high dimensional. Our approach is based on selecting a small subset yL of the dimensions of y, and proceeds by modeling (i) x ↦→ yL and (ii) yL ↦→ y. Composing these two models, we obtain a conditional model x ↦→ y that possesses convenient statistical properties. Multi-label classification and multivariate regression experiments on several datasets show that this method outperforms the one-vs.-all approach as well as several sophisticated multiple output prediction methods.
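The two-stage composition can be sketched on synthetic data in which the first L output dimensions act as the landmark subset yL (a toy illustration using plain least squares for both stages, not the paper's landmark selection procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
n, dx, L, dy = 200, 5, 3, 50
X = rng.normal(size=(n, dx))
YL = X @ rng.normal(size=(dx, L))                   # landmark dimensions
Y = np.hstack([YL, YL @ rng.normal(size=(L, dy - L))])
Y += 0.01 * rng.normal(size=Y.shape)                # small observation noise

# Stage (i): x -> yL and stage (ii): yL -> y, both by least squares.
W1, *_ = np.linalg.lstsq(X, Y[:, :L], rcond=None)
W2, *_ = np.linalg.lstsq(Y[:, :L], Y, rcond=None)
Y_hat = (X @ W1) @ W2                               # composed model x -> y
rel_err = np.linalg.norm(Y_hat - Y) / np.linalg.norm(Y)
```

The point of the composition is that only L small models need to consult x directly; the remaining dy − L output dimensions are predicted from the landmarks, which keeps the method cheap when dy is large.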
Sparse Additive Machine
Abstract
Cited by 2 (1 self)
We develop a high-dimensional nonparametric classification method named the sparse additive machine (SAM), which can be viewed as a functional version of the support vector machine (SVM) combined with sparse additive modeling. The SAM is related to multiple kernel learning (MKL), but is computationally more efficient and amenable to theoretical analysis. In terms of computation, we develop an efficient accelerated proximal gradient descent algorithm which is scalable to large datasets with a provable O(1/k²) convergence rate, where k is the number of iterations. In terms of theory, we provide the oracle properties of the SAM under asymptotic frameworks. Empirical results on both synthetic and real data are reported to back up our theory.
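An accelerated proximal gradient scheme of the kind mentioned can be sketched generically for an ℓ1-penalized least-squares objective (a standard FISTA-style iteration with the momentum sequence that gives the O(1/k²) rate in objective value; this is a stand-in illustration, not the SAM solver itself):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def accelerated_prox_grad(A, b, lam, steps=200):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    Lip = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = z = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(steps):
        # Gradient step at the extrapolated point z, then proximal step.
        x_new = soft_threshold(z - A.T @ (A @ z - b) / Lip, lam / Lip)
        # Momentum update responsible for the O(1/k^2) rate.
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

The same pattern, gradient step on the smooth loss followed by a proximal step on the sparsity penalty plus a momentum extrapolation, carries over when the penalty is a functional group-sparsity norm as in the SAM.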
Multivariate dyadic regression trees for sparse learning problems
 In NIPS
, 2010
Abstract
Cited by 1 (1 self)
We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smoothness of the true regression functions, and achieve the nearly optimal rates of convergence (in a minimax sense) for the class of (α, C)-smooth functions. Empirically, MDRTs can simultaneously conduct function estimation and variable selection in high dimensions. To make MDRTs applicable to large-scale learning problems, we propose a greedy heuristic. The superior performance of MDRTs is demonstrated on both synthetic and real datasets.
Sparse Bayesian structure learning with dependent relevance determination prior
Abstract
In many problem settings, parameter vectors are not merely sparse, but dependent in such a way that nonzero coefficients tend to cluster together. We refer to this form of dependency as “region sparsity”. Classical sparse regression methods, such as the lasso and automatic relevance determination (ARD), model parameters as independent a priori, and therefore do not exploit such dependencies. Here we introduce a hierarchical model for smooth, region-sparse weight vectors and tensors in a linear regression setting. Our approach represents a hierarchical extension of the relevance determination framework, where we add a transformed Gaussian process to model the dependencies between the prior variances of regression weights. We combine this with a structured model of the prior variances of Fourier coefficients, which eliminates unnecessary high frequencies. The resulting prior encourages weights to be region-sparse in two different bases simultaneously. We develop efficient approximate inference methods and show substantial improvements over comparable methods (e.g., group lasso and smooth RVM) for both simulated and real datasets from brain imaging.