Joint covariate selection for grouped classification (2007)

by G Obozinski, B Taskar, M Jordan

Results 1 - 10 of 15

Efficient Online and Batch Learning using Forward Backward Splitting

by John Duchi, Yoram Singer
Cited by 20 (1 self)
We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This view yields a simple yet effective algorithm that can be used for batch penalized risk minimization and online learning. Furthermore, the two-phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as ℓ1. We derive concrete and very simple algorithms for minimization of loss functions with ℓ1, ℓ2, ℓ2², and ℓ∞ regularization. We also show how to construct efficient algorithms for mixed-norm ℓ1/ℓq regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in a series of experiments with synthetic and natural datasets.
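
The two-phase update described in this abstract can be sketched for the ℓ1 case as follows. This is a minimal illustration, not the authors' implementation: the function names `soft_threshold` and `fobos_l1_step`, the step size `eta`, and the regularization weight `lam` are assumptions introduced here. The sketch relies on the standard fact that minimizing 0.5·‖w − v‖² + τ‖w‖1 over w has a closed-form solution, coordinate-wise soft-thresholding.

```python
import math


def soft_threshold(v, tau):
    """Shrink every coordinate of v toward zero by tau; clip at zero.

    This is the closed-form minimizer of 0.5*||w - v||^2 + tau*||w||_1.
    """
    out = []
    for x in v:
        magnitude = max(abs(x) - tau, 0.0)
        out.append(0.0 if magnitude == 0.0 else math.copysign(magnitude, x))
    return out


def fobos_l1_step(w, grad, eta, lam):
    """One two-phase update with l1 regularization (hypothetical helper).

    w    : current weight vector (list of floats)
    grad : (sub)gradient of the loss at w
    eta  : step size (assumed the same in both phases)
    lam  : l1 regularization weight
    """
    # Phase 1: unconstrained gradient descent step on the loss alone.
    w_half = [wi - eta * gi for wi, gi in zip(w, grad)]
    # Phase 2: stay close to w_half while paying eta*lam*||w||_1.
    return soft_threshold(w_half, eta * lam)
```

Coordinates whose magnitude after the gradient step falls below `eta * lam` are set exactly to zero, which is how the second phase produces sparse solutions rather than merely small weights.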

High-dimensional union support recovery in multivariate regression

by Guillaume Obozinski, Martin J. Wainwright, Michael I. Jordan
Cited by 19 (0 self)

Union support recovery in high-dimensional multivariate regression

by Guillaume Obozinski, Martin J. Wainwright, Michael I. Jordan, 2008
Cited by 14 (6 self)

SLEP: Sparse Learning with Efficient Projections, Arizona State University, 2009. [Online]. Available: http://www.public.asu.edu/~jye02/Software/SLEP

by Jun Liu, Shuiwang Ji, Jieping Ye - Annals of Applied Statistics, 2007
Cited by 13 (5 self)
Abstract not found

Multi-Task Feature Learning Via Efficient L2,1-Norm Minimization

by Jun Liu, Shuiwang Ji, Jieping Ye, 2009
Cited by 11 (1 self)
The problem of joint feature selection across a group of related tasks has applications in many areas including biomedical informatics and computer vision. We consider the ℓ2,1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of the ℓ2,1-norm regularization is that it encourages multiple predictors to share similar sparsity patterns. However, the resulting optimization problem is challenging to solve due to the non-smoothness of the ℓ2,1-norm regularization. In this paper, we propose to accelerate the computation by reformulating it as two equivalent smooth convex optimization problems, which are then solved via Nesterov's method, an optimal first-order black-box method for smooth convex optimization. A key building block in solving the reformulations is the Euclidean projection. We show that the Euclidean projection for the first reformulation can be analytically computed, while the Euclidean projection for the second one can be computed in linear time. Empirical evaluations on several data sets verify the efficiency of the proposed algorithms.
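
To make the shared-sparsity effect of the ℓ2,1 norm concrete, the sketch below computes the norm of a weight matrix and applies row-wise (group) soft-thresholding, which is the standard proximal operator of the ℓ2,1 norm. It is an illustration only and is not the paper's reformulation or either of its Euclidean projections. The layout (one row per feature, one column per task) and the names `l21_norm` and `group_soft_threshold` are assumptions made here.

```python
import math


def l21_norm(W):
    """Sum over rows of the Euclidean norm of each row.

    W is a list of rows; row j holds the weights of feature j across tasks
    (an assumed layout for this sketch).
    """
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)


def group_soft_threshold(W, tau):
    """Row-wise shrinkage: the proximal operator of tau * l21_norm.

    Each row is scaled by max(0, 1 - tau / ||row||_2), so a feature whose
    weights are jointly small is removed from every task at once.
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        out.append([scale * x for x in row])
    return out
```

Because the shrinkage factor is computed per row rather than per entry, a row is either kept (scaled down) for all tasks or zeroed for all tasks, which is the sense in which the predictors share a sparsity pattern.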

Simultaneous support recovery in high-dimensional regression: Benefits and perils of ℓ1,∞-regularization

by Sahand N. Negahban, Martin J. Wainwright, 2009
Cited by 8 (1 self)
Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports of size at most s. This set-up suggests the use of ℓ1/ℓ∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of ℓ1/ℓ∞-regularized quadratic programming, considering both consistency rates in ℓ∞-norm, and how the minimal sample size n required for consistent variable selection scales with model dimension, sparsity, and overlap between the supports. We first establish bounds on the ℓ∞-error as well as sufficient conditions for exact variable selection for fixed design matrices, as well as for designs drawn randomly from general Gaussian distributions. Specializing to the case of r = 2 linear regression problems with standard Gaussian designs whose supports overlap in a fraction α ∈ [0, 1] of their entries, we prove that the ℓ1/ℓ∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n/{(4 − 3α)s log(p − (2 − α)s)}. An implication is that the use of ℓ1/ℓ∞-regularization yields improved statistical efficiency if the overlap parameter is large enough (α > 2/3), but has worse statistical efficiency than a naive Lasso-based approach for moderate to small overlap (α < 2/3). Empirical simulations illustrate the close agreement between theory and actual behavior in practice. These results show that caution must be exercised in applying ℓ1/ℓ∞ block regularization: if the data does not match its structure very closely, it can impair statistical performance relative to computationally less expensive schemes. Index Terms: ℓ1-constraints, compressed sensing, convex relaxation, group Lasso, high-dimensional inference, model selection, phase transitions, sparse approximation, subset selection.

Composite Objective Mirror Descent

by John Duchi, Shai Shalev-Shwartz, Yoram Singer, Ambuj Tewari
Cited by 8 (3 self)
We present a new method for regularized convex optimization and analyze it under both online and stochastic optimization settings. In addition to unifying previously known first-order algorithms, such as the projected gradient method, mirror descent, and forward-backward splitting, our method yields new analysis and algorithms. We also derive specific instantiations of our method for commonly used regularization functions, such as ℓ1, mixed norm, and trace-norm.

Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ1,∞-regularization

by Sahand Negahban, Martin J. Wainwright
Cited by 7 (1 self)
Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports. This set-up suggests the use of ℓ1/ℓ∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of ℓ1/ℓ∞-regularized quadratic programming, considering both consistency rates in ℓ∞-norm, and also how the minimal sample size n required for performing variable selection grows as a function of the model dimension, sparsity, and overlap between the supports. We begin by establishing bounds on the ℓ∞-error as well as sufficient conditions for exact variable selection for fixed design matrices, as well as designs drawn randomly from general Gaussian matrices. These results show that the high-dimensional scaling of ℓ1/ℓ∞-regularization is qualitatively similar to that of ordinary ℓ1-regularization. Our second set of results applies to design matrices drawn from standard Gaussian ensembles, for which we provide a sharp set of necessary and sufficient conditions: the ℓ1/ℓ∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n/{(4 − 3α)s log(p − (2 − α)s)}. More precisely, for any δ > 0, the probability of successfully recovering both supports converges to 1 for scalings such that θ1,∞ ≥ 1 + δ, and converges to 0 for scalings for which θ1,∞ ≤ 1 − δ. An implication of this threshold is that use of ℓ1,∞-regularization yields improved statistical efficiency if the overlap parameter is large enough (α > 2/3), but performs worse than a naive Lasso-based approach for moderate to small overlap (α < 2/3). We illustrate the close agreement between these theoretical predictions and the actual behavior in simulations.
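
The rescaled sample size in this abstract is a plain arithmetic formula, so it can be evaluated directly. The sketch below does only that; the function name `rescaled_sample_size` and the argument checks are assumptions added here, and the result is an asymptotic quantity, so values near 1 should not be read as a guarantee for any finite problem.

```python
import math


def rescaled_sample_size(n, p, s, alpha):
    """Evaluate theta_{1,inf}(n, p, s, alpha) = n / ((4 - 3a) s log(p - (2 - a) s)).

    n     : number of samples
    p     : model dimension
    s     : support size of each regression problem
    alpha : fraction of the supports that overlap, in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    inner = p - (2 - alpha) * s
    if inner <= 1:
        # log(inner) would be non-positive, so the formula is not meaningful.
        raise ValueError("need p - (2 - alpha) * s > 1")
    return n / ((4 - 3 * alpha) * s * math.log(inner))
```

Per the abstract, support recovery succeeds with probability tending to 1 when this value stays above 1 + δ and fails when it stays below 1 − δ. The factor 4 − 3α equals 2 at α = 2/3, which is the overlap level the abstract identifies as the crossover against the Lasso-based approach.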

Group Sparse Coding

by Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Cited by 7 (0 self)
Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods do not, however, ensure that images have a sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification.

Boosting with Structural Sparsity

by John Duchi, Yoram Singer
Cited by 6 (1 self)
We derive generalizations of AdaBoost and related gradient-based coordinate descent methods that incorporate sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and back-pruning through regularization and give an automatic stopping criterion for feature induction. We study penalties based on the ℓ1, ℓ2, and ℓ∞ norms of the predictor and introduce mixed-norm penalties that build upon the initial penalties. The mixed-norm regularizers facilitate structural sparsity in parameter space, which is a useful property in multiclass prediction and other related tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models.