Results 11 - 20
of
41
Collaborative hierarchical sparse modeling
, 2010
"... Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is done by solving an ℓ1-regularized linear regression problem, usually called Lasso. In this work we first combine the sparsityinducing property of the Lasso model, at the individual ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is done by solving an ℓ1-regularized linear regression problem, usually called Lasso. In this work we first combine the sparsityinducing property of the Lasso model, at the individual feature level, with the block-sparsity property of the group Lasso model, where sparse groups of features are jointly encoded, obtaining a sparsity pattern hierarchically structured. This results in the hierarchical Lasso, which shows important practical modeling advantages. We then extend this approach to the collaborative case, where a set of simultaneously coded signals share the same sparsity pattern at the higher (group) level but not necessarily at the lower one. Signals then share the same active groups, or classes, but not necessarily the same active set. This is very well suited for applications such as source separation. An efficient optimization procedure, which guarantees convergence to the global optimum, is developed for these new models. The underlying presentation of the new framework and optimization approach is complemented with experimental examples and preliminary theoretical results.
Input selection and shrinkage in multiresponse linear regression
- Computational Statistics and Data Analysis
, 2007
"... The regression problem of modeling several response variables using the same set of input variables is considered. The model is linearly parameterized and the parameters are estimated by minimizing the error sum of squares subject to a sparsity constraint. The constraint has the effect of eliminatin ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
The regression problem of modeling several response variables using the same set of input variables is considered. The model is linearly parameterized and the parameters are estimated by minimizing the error sum of squares subject to a sparsity constraint. The constraint has the effect of eliminating useless inputs and constraining the parameters of the remaining inputs in the model. Two algorithms for solving the resulting convex cone programming problem are proposed. The first algorithm gives a pointwise solution, while the second one computes the entire path of solutions as a function of the constraint parameter. Based on experiments with real data sets, the proposed method has a similar performance to existing methods. In simulation experiments, the proposed method is competitive both in terms of prediction accuracy and correctness of input selection. The advantages become more apparent when many correlated inputs are available for model construction. © 2007 Elsevier B.V. All rights reserved.
Nonparametric Regression and Classification with Joint Sparsity Constraints
"... We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. The methods are illustrated with experiments on synthetic data and gene microarray data. 1
Group Sparse Priors for Covariance Estimation
- Proc. of the Conf. on Uncertainty in AI
, 2009
"... Recently it has become popular to learn sparse Gaussian graphical models (GGMs) by imposing ℓ1 or group ℓ1,2 penalties on the elements of the precision matrix. This penalized likelihood approach results in a tractable convex optimization problem. In this paper, we reinterpret these results as perfor ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
Recently it has become popular to learn sparse Gaussian graphical models (GGMs) by imposing ℓ1 or group ℓ1,2 penalties on the elements of the precision matrix. This penalized likelihood approach results in a tractable convex optimization problem. In this paper, we reinterpret these results as performing MAP estimation under a novel prior which we call the group ℓ1 and ℓ1,2 positivedefinite matrix distributions. This enables us to build a hierarchical model in which the ℓ1 regularization terms vary depending on which group the entries are assigned to, which in turn allows us to learn block structured sparse GGMs with unknown group assignments. Exact inference in this hierarchical model is intractable, due to the need to compute the normalization constant of these matrix distributions. However, we derive upper bounds on the partition functions, which lets us use fast variational inference (optimizing a lower bound on the joint posterior). We show that on two real world data sets (motion capture and financial data), our method which infers the block structure outperforms a method that uses a fixed block structure, which in turn outperforms baseline methods that ignore block structure. 1
SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION
- SUBMITTED TO THE ANNALS OF STATISTICS
, 2010
"... In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B ∗ ∈ R p×K of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the ℓ1/ℓ2 norm is used for support union re ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B ∗ ∈ R p×K of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the ℓ1/ℓ2 norm is used for support union recovery, or recovery of the set of s rows for which B ∗ is non-zero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter θ(n, p, s) : = n/[2ψ(B ∗ ) log(p − s)]. Here n is the sample size, and ψ(B ∗ ) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K-regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that θ(n, p, s) exceeds a critical level θu, and fails for sequences such that θ(n, p, s) lies below a critical level θℓ. For the special case of the standard Gaussian ensemble, we show that θℓ = θu so that the characterization is sharp. The sparsity-overlap function ψ(B ∗ ) reveals that, if the design is uncorrelated on the active rows, ℓ1/ℓ2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
Graph-Structured Multi-task Regression and an Efficient Optimization Method for General Fused Lasso
, 1005
"... We consider the problem of learning a structured multi-task regression, where the output consists of multiple responses that are related by a graph and the correlated response variables are dependent on the common inputs in a sparse but synergistic manner. Previous methods such as ℓ1/ℓ2-regularized ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
We consider the problem of learning a structured multi-task regression, where the output consists of multiple responses that are related by a graph and the correlated response variables are dependent on the common inputs in a sparse but synergistic manner. Previous methods such as ℓ1/ℓ2-regularized multi-task regression assume that all of the output variables are equally related to the inputs, although in many real-world problems, outputs are related in a complex manner. In this paper, we propose graph-guided fused lasso (GFlasso) for structured multi-task regression that exploits the graph structure over the output variables. We introduce a novel penalty function based on fusion penalty to encourage highly correlated outputs to share a common set of relevant inputs. In addition, we propose a simple yet efficient proximal-gradient method for optimizing GFlasso that can also be applied to any optimization problems with a convex smooth loss and the general class of fusion penalty defined on arbitrary graph structures. By exploiting the structure of the non-smooth “fusion penalty”, our method achieves a faster convergence rate than the standard first-order method, sub-gradient method, and is significantly more scalable than the widely adopted second-order cone-programming and quadratic-programming formulations. In addition, we provide an analysis of the consistency property of the GFlasso model. Experimental results not only demonstrate the superiority of GFlasso over the standard lasso but also show the efficiency and scalability of our proximal-gradient method.
Accelerated gradient method for multi-task sparse learning problem
- in Proceedings of the International Conference on Data Mining
, 2009
"... Abstract—Many real world learning problems can be recast as multi-task learning problems which utilize correlations among different tasks to obtain better generalization performance than learning each task individually. The feature selection problem in multi-task setting has many applications in fie ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Abstract—Many real world learning problems can be recast as multi-task learning problems which utilize correlations among different tasks to obtain better generalization performance than learning each task individually. The feature selection problem in multi-task setting has many applications in fields of computer vision, text classification and bio-informatics. Generally, it can be realized by solving a L-1-infinity regularized optimization problem. And the solution automatically yields the joint sparsity among different tasks. However, due to the nonsmooth nature of the L-1-infinity norm, there lacks an efficient training algorithm for solving such problem with general convex loss functions. In this paper, we propose an accelerated gradient method based on an “optimal ” first order black-box method named after Nesterov and provide the convergence rate for smooth convex loss functions. For nonsmooth convex loss functions, such as hinge loss, our method still has fast convergence rate empirically. Moreover, by exploiting the structure of the L-1-infinity ball, we solve the black-box oracle in Nesterov’s method by a simple sorting scheme. Our method is suitable for large-scale multi-task learning problem since it only utilizes the first order information and is very easy to implement. Experimental results show that our method significantly outperforms the most state-of-the-art methods in both convergence speed and learning accuracy. Keywords-multi-task learning; L-1-infinity regularization; optimal method; gradient descend I.
F∞ norm support vector machine
- Statistica Sinica
"... In this paper we propose a new support vector machine (SVM), the F∞-norm SVM, to perform automatic factor selection in classification. The F∞-norm SVM methodology is motivated by the feature selection problem in cases where the input features are generated by factors, and the model is best interpret ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
In this paper we propose a new support vector machine (SVM), the F∞-norm SVM, to perform automatic factor selection in classification. The F∞-norm SVM methodology is motivated by the feature selection problem in cases where the input features are generated by factors, and the model is best interpreted in terms of significant factors. This type of problem arises naturally when a set of dummy variables are used to represent a categorical factor and/or a set of basis functions of a continuous variable are included in the predictor set. In problems without such obvious group information, we propose to first create groups among features by clustering, and then apply the F∞-norm SVM. We show that the F∞-norm SVM is equivalent to a linear programming problem and can be efficiently solved using the standard linear programming technique. Analysis on simulated and real world data shows that the F∞-norm SVM enjoys competitive performance when compared with the 1-norm and 2-norm SVMs.
Convex and network flow optimization for structured sparsity
- JMLR
"... We consider a class of learning problems regularized by a structured sparsity-inducing norm defined as the sum of ℓ2- or ℓ∞-norms over groups of variables. Whereas much effort has been put in developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
We consider a class of learning problems regularized by a structured sparsity-inducing norm defined as the sum of ℓ2- or ℓ∞-norms over groups of variables. Whereas much effort has been put in developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address here the case of general overlapping groups. To this end, we present two different strategies: On the one hand, we show that the proximal operator associated with a sum of ℓ∞norms can be computed exactly in polynomial time by solving a quadratic min-cost flow problem, allowing the use of accelerated proximal gradient methods. On the other hand, we use proximal splitting techniques, and address an equivalent formulation with non-overlapping groups, but in higher dimension and with additional constraints. We propose efficient and scalable algorithms exploiting these two strategies, which are significantly faster than alternative approaches. We illustrate these methods with several problems such as CUR matrix factorization, multi-task learning of tree-structured dictionaries, background subtraction in video sequences, image denoising with wavelets, and topographic dictionary learning of natural image patches.
A scalable trust-region algorithm with application to mixed-norm regression
"... We present a new algorithm for minimizing a convex loss-function subject to regularization. Our framework applies to numerous problems in machine learning and statistics; notably, for sparsity-promoting regularizers such as ℓ1 or ℓ1, ∞ norms, it enables efficient computation of sparse solutions. Our ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
We present a new algorithm for minimizing a convex loss-function subject to regularization. Our framework applies to numerous problems in machine learning and statistics; notably, for sparsity-promoting regularizers such as ℓ1 or ℓ1, ∞ norms, it enables efficient computation of sparse solutions. Our approach is based on the trust-region framework with nonsmooth objectives, which allows us to build on known results to provide convergence analysis. We avoid the computational overheads associated with the conventional Hessian approximation used by trust-region methods by instead using a simple separable quadratic approximation. This approximation also enables use of proximity operators for tackling nonsmooth regularizers. We illustrate the versatility of our resulting algorithm by specializing it to three mixed-norm regression problems: group lasso [36], group logistic regression [21], and multi-task lasso [19]. We experiment with both synthetic and real-world large-scale data—our method is seen to be competitive, robust, and scalable. 1.

