Adaptive sparseness for supervised learning (2003)

by M Figueiredo
Venue: PAMI

Results 1 - 10 of 43

Sparse multinomial logistic regression: Fast algorithms and generalization bounds

by Balaji Krishnapuram, Lawrence Carin, Mário A. T. Figueiredo, Alexander J. Hartemink - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
Cited by 67 (1 self)
Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
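The abstract above combines a bound optimization approach with a component-wise update. A minimal sketch of that idea for the binary case only (the paper itself treats the multinomial case) is shown below. It assumes labels in {0, 1}, a Laplacian (ℓ1) prior with weight lam, and the standard bound that the logistic Hessian is no more negative than -(1/4)XᵀX, which gives each coordinate a fixed curvature and a closed-form soft-thresholding update. The function names are hypothetical; this is an illustration, not the authors' code.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft(x, t):
    # Soft-thresholding: shrink x toward 0 by t, returning exactly 0 if |x| <= t.
    return math.copysign(max(abs(x) - t, 0.0), x)


def log_posterior(w, X, y, lam):
    # Binary log-likelihood (labels in {0, 1}) minus the Laplacian penalty lam * ||w||_1.
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        # log p(y | x) = y*z - log(1 + e^z), with log(1 + e^z) computed stably.
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total - lam * sum(abs(wj) for wj in w)


def sparse_logreg(X, y, lam, sweeps=100):
    d = len(X[0])
    w = [0.0] * d
    # The logistic Hessian is bounded by (1/4) X^T X, so a_k = (1/4) * sum_i x_ik^2
    # is a fixed curvature bound for coordinate k (no line search needed).
    a = [0.25 * sum(xi[k] ** 2 for xi in X) for k in range(d)]
    z = [0.0] * len(X)  # cached linear predictors X w
    for _ in range(sweeps):
        for k in range(d):
            if a[k] == 0.0:
                continue
            # Gradient of the log-likelihood with respect to w_k.
            g = sum((yi - sigmoid(zi)) * xi[k] for xi, yi, zi in zip(X, y, z))
            # Maximize the quadratic lower bound minus lam*|w_k| in closed form.
            new = soft(w[k] + g / a[k], lam / a[k])
            delta = new - w[k]
            if delta != 0.0:
                for i, xi in enumerate(X):
                    z[i] += delta * xi[k]
                w[k] = new
    return w
```

Because each coordinate step maximizes a lower bound that touches the objective at the current point, the penalized log-posterior never decreases, and a large enough lam leaves weights at exactly zero.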

A New TwIST: Two-Step Iterative Shrinkage/Thresholding Algorithms for Image Restoration

by José M. Bioucas-Dias, Mário A. T. Figueiredo - IEEE Transactions on Image Processing, 2007
Cited by 41 (7 self)
Iterative shrinkage/thresholding (IST) algorithms have been recently proposed to handle a class of convex unconstrained optimization problems arising in image restoration and other linear inverse problems. This class of problems results from combining a linear observation model with a nonquadratic regularizer (e.g., total variation or wavelet-based regularization). It happens that the convergence rate of these IST algorithms depends heavily on the linear observation operator, becoming very slow when this operator is ill-conditioned or ill-posed. In this paper, we introduce two-step IST (TwIST) algorithms, exhibiting a much faster convergence rate than IST for ill-conditioned problems. For a vast class of nonquadratic convex regularizers (ℓp norms, some Besov norms, and total variation), we show that TwIST converges to a minimizer of the objective function, for a given range of values of its parameters. For noninvertible observation operators, we introduce a monotonic version of TwIST (MTwIST); although the convergence proof does not apply to this scenario, we give experimental evidence that MTwIST exhibits similar speed gains over IST. The effectiveness of the new methods is experimentally confirmed on problems of image deconvolution and of restoration with missing samples.
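The IST/TwIST distinction above can be sketched for the ℓ1 regularizer, where the denoising step is soft-thresholding. This minimal pure-Python sketch assumes the objective 0.5·||y - Kx||² + λ||x||₁ and ||K||₂ ≤ 1, so a unit Landweber step is safe; the helper names (gamma, ist, twist) are hypothetical. The two-step update is x_{t+1} = (1 - α)x_{t-1} + (α - β)x_t + βΓ(x_t), where Γ is one IST step, so α = β = 1 recovers plain IST.

```python
import math


def soft(x, t):
    # Soft-thresholding, the denoising step for the l1 regularizer.
    return math.copysign(max(abs(x) - t, 0.0), x)


def matvec(K, x):
    return [sum(kij * xj for kij, xj in zip(row, x)) for row in K]


def matTvec(K, r):
    # K^T r for a list-of-rows matrix K.
    return [sum(K[i][j] * r[i] for i in range(len(K))) for j in range(len(K[0]))]


def gamma(K, y, x, lam):
    # One IST step: soft-threshold the Landweber update x + K^T (y - K x).
    # Assumes ||K||_2 <= 1 so that the unit step size is safe.
    r = [yi - ki for yi, ki in zip(y, matvec(K, x))]
    g = matTvec(K, r)
    return [soft(xj + gj, lam) for xj, gj in zip(x, g)]


def ist(K, y, lam, iters=500):
    x = [0.0] * len(K[0])
    for _ in range(iters):
        x = gamma(K, y, x, lam)
    return x


def twist(K, y, lam, alpha, beta, iters=500):
    x_prev = [0.0] * len(K[0])
    x = gamma(K, y, x_prev, lam)  # the first step is a plain IST step
    for _ in range(iters):
        g = gamma(K, y, x, lam)
        # Two-step update: combine the two previous iterates with the IST step.
        x_next = [(1 - alpha) * xp + (alpha - beta) * xc + beta * gc
                  for xp, xc, gc in zip(x_prev, x, g)]
        x_prev, x = x, x_next
    return x
```

As a toy check, with K = diag(1, 0.5), y = (2.0, 0.1) and λ = 0.3 the problem is separable and its minimizer is x = (1.7, 0). The values α = 10/9 and β = 16/9 used in that check are hand-picked for this example; the paper discusses how to choose them in general.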

Bayesian inference and optimal design in the sparse linear model

by Matthias W. Seeger, Martin Wainwright - Workshop on Artificial Intelligence and Statistics
Cited by 29 (8 self)
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).

Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches

by Mark Schmidt, Glenn Fung, Rómer Rosales
Cited by 23 (1 self)
L1 regularization is effective for feature selection, but the resulting optimization is challenging due to the non-differentiability of the 1-norm. In this paper we compare state-of-the-art optimization techniques to solve this problem across several loss functions. Furthermore, we propose two new techniques. The first is based on a smooth (differentiable) convex approximation for the L1 regularizer that does not depend on any assumptions about the loss function used. The other technique is a new strategy that addresses the non-differentiability of the L1-regularizer by casting the problem as a constrained optimization problem that is then solved using a specialized gradient projection method. Extensive comparisons show that our newly proposed approaches consistently rank among the best in terms of convergence speed and efficiency, as measured by the number of function evaluations required.
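Both ideas in the abstract above can be illustrated on L1-regularized least squares. The sketch below assumes one standard smooth convex surrogate, |x| ≈ (1/α)[log(1 + e^(-αx)) + log(1 + e^(αx))], chosen here for illustration, and a plain fixed-step projected gradient method on the split w = u - v with u, v ≥ 0. The specialized projection method from the paper is not reproduced, and all function names are hypothetical.

```python
import math


def smooth_abs(x, alpha):
    # (1/alpha) * [log(1 + e^{-alpha x}) + log(1 + e^{alpha x})], written stably
    # as |x| + (2/alpha) * log1p(e^{-alpha |x|}); the error is at most 2*log(2)/alpha.
    return abs(x) + 2.0 * math.log1p(math.exp(-alpha * abs(x))) / alpha


def smooth_abs_grad(x, alpha):
    # Derivative of smooth_abs: tanh(alpha * x / 2), which tends to sign(x).
    return math.tanh(alpha * x / 2.0)


def l1_ls_projected_gradient(A, b, lam, iters=2000):
    # Minimize 0.5*||A w - b||^2 + lam*||w||_1 via the constrained reformulation
    #   minimize 0.5*||A (u - v) - b||^2 + lam * sum(u + v)  subject to u, v >= 0.
    m, n = len(A), len(A[0])
    # The split objective has a Lipschitz gradient with constant 2*||A||_2^2;
    # the squared Frobenius norm is a cheap upper bound on ||A||_2^2.
    fro2 = sum(a * a for row in A for a in row)
    step = 1.0 / (2.0 * fro2)
    u = [0.0] * n
    v = [0.0] * n
    for _ in range(iters):
        w = [ui - vi for ui, vi in zip(u, v)]
        r = [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]  # A^T r
        # Gradient step on (u, v), then project onto the nonnegative orthant.
        u = [max(0.0, ui - step * (gj + lam)) for ui, gj in zip(u, g)]
        v = [max(0.0, vi - step * (-gj + lam)) for vi, gj in zip(v, g)]
    return [ui - vi for ui, vi in zip(u, v)]
```

With A equal to the identity the minimizer is the soft-thresholded b, so for b = (2.0, 0.1) and λ = 0.3 the projected gradient iterates should approach (1.7, 0).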

Sparse and shift-invariant representations of music

by Thomas Blumensath, Michael E. Davies - IEEE Transactions on Speech and Audio Processing, 2006
Cited by 20 (6 self)

A Bayesian Approach to Joint Feature Selection and Classifier Design

by Balaji Krishnapuram, Alexander J. Hartemink, Lawrence Carin, Mário A. T. Figueiredo - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 13 (1 self)
This paper adopts a Bayesian approach to simultaneously learn both an optimal nonlinear classifier and a subset of predictor variables (or features) that are most relevant to the classification task. The approach uses heavy-tailed priors to promote sparsity in the utilization of both basis functions and features; these priors act as regularizers for the likelihood function that rewards good classification on the training data. We derive an ...

Bayesian Multinomial Logistic Regression for Author Identification

by David Madigan, Alexander Genkin, David D. Lewis, Dmitriy Fradkin - In Maxent Conference, 2005
Cited by 11 (0 self)
Motivated by high-dimensional applications in authorship attribution, we describe a Bayesian multinomial logistic regression model together with an associated learning algorithm.

Author Identification on the Large Scale

by David Madigan, Alexander Genkin, David D. Lewis, Shlomo Argamon, Dmitriy Fradkin, Li Ye - In Proc. of the Meeting of the Classification Society of North America, 2005
Cited by 9 (0 self)
... this paper is on techniques for identifying authors in large collections of textual artifacts (e-mails, communiques, transcribed speech, etc.). Our approach focuses on very high-dimensional, topic-free document representations and particular attribution problems, such as: (1) Which one of these K authors wrote this particular document? (2) Did any of these K authors write this particular document? Scientific investigation into measuring style and authorship of texts goes back to the late nineteenth century, with the pioneering studies of Mendenhall [36] and Mascol [34, 35] on distributions of sentence and word lengths in works of literature and the gospels of the New Testament. The underlying notion was that works by different authors are strongly distinguished by quantifiable features of the text. By the mid-twentieth century, this line of research had grown into what became known as "stylometrics", and a variety of textual statistics had been proposed to quantify textual style. The style of early work was characterized by a search for invariant properties of textual statistics, such as Zipf's distribution and Yule's K statistic ...

The Horseshoe Estimator for Sparse Signals

by Carlos M. Carvalho, Nicholas G. Polson, James G. Scott, 2008
Cited by 8 (4 self)
This paper proposes a new approach to sparsity called the horseshoe estimator. The horseshoe is a close cousin of other widely used Bayes rules arising from, for example, double-exponential and Cauchy priors, in that it is a member of the same family of multivariate scale mixtures of normals. But the horseshoe enjoys a number of advantages over existing approaches, including its robustness, its adaptivity to different sparsity patterns, and its analytical tractability. We prove two theorems that formally characterize both the horseshoe’s adeptness at handling large outlying signals and its super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. Finally, using a combination of real and simulated data, we show that the horseshoe estimator corresponds quite closely to the answers one would get by pursuing a full Bayesian model-averaging approach using a discrete mixture prior to model signals and noise.
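The shrinkage behaviour described above can be checked numerically in the simplest normal-means setting. This sketch assumes y ~ N(β, 1), β ~ N(0, λ²) and a half-Cauchy C⁺(0, 1) prior on λ, with the global scale and noise variance both fixed to 1 (assumptions made for illustration). Under these assumptions the shrinkage weight κ = 1/(1 + λ²) has a Beta(1/2, 1/2) prior, the horseshoe shape, and the posterior mean is (1 - E[κ | y])·y. The function name is hypothetical.

```python
import math


def horseshoe_posterior_mean(y, n=20000):
    # Assumed model (global scale = noise variance = 1):
    #   y ~ N(beta, 1), beta ~ N(0, lam^2), lam ~ half-Cauchy(0, 1).
    # Then kappa = 1 / (1 + lam^2) ~ Beta(1/2, 1/2), y | kappa ~ N(0, 1/kappa), and
    #   p(kappa | y) is proportional to (1 - kappa)^(-1/2) * exp(-kappa * y^2 / 2).
    # The posterior mean of beta is (1 - E[kappa | y]) * y.
    # Substituting kappa = 1 - s^2 removes the endpoint singularity, giving
    #   p(s | y) proportional to exp(-(1 - s^2) * y^2 / 2) on 0 < s < 1.
    num = den = 0.0
    h = 1.0 / n
    for i in range(n):
        s = (i + 0.5) * h  # midpoint rule on (0, 1)
        wgt = math.exp(-(1.0 - s * s) * y * y / 2.0)
        num += (1.0 - s * s) * wgt
        den += wgt
    e_kappa = num / den
    return (1.0 - e_kappa) * y
```

Small observations are shrunk strongly toward zero (for y = 0.5 the posterior mean is roughly 0.17), while a large observation such as y = 10 is left nearly untouched (roughly 9.8), which is the robustness to large signals that the abstract refers to.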

On Bayesian Classification with Laplace Priors

by Ata Kabán
Cited by 6 (0 self)
We present a new classification approach, using a variational Bayesian estimation of probit regression with Laplace priors. Laplace priors have been previously used extensively as a sparsity-inducing mechanism to perform feature selection simultaneously with classification or regression. However, contrary to the ‘myth’ of sparse Bayesian learning with Laplace priors, we find that the sparsity effect is due to a property of the maximum a posteriori (MAP) parameter estimates only. The Bayesian estimates, in turn, induce a posterior weighting rather than a hard selection of features, and have several advantageous properties: (1) they provide better estimates of the prediction uncertainty; (2) they can retain correlated features, favouring generalisation; (3) they are more stable with respect to the choice of hyperparameters; and (4) they produce a weight-based ranking of the features, suited for interpretation. We analyse the behaviour of the Bayesian estimate in comparison with its MAP counterpart, as well as other related models, (a) through a graphical interpretation of the associated shrinkage and (b) by controlled numerical simulations in a range of testing conditions. The results pinpoint the situations in which the advantages of Bayesian estimates can be exploited. Finally, we demonstrate the working of our method in a gene expression classification task.
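The claim above that sparsity comes from the MAP estimate only can be illustrated in a much simpler setting than the paper's probit model: a single Gaussian observation y ~ N(θ, 1) with a Laplace prior of rate λ on θ (an assumption made for illustration; no variational approximation is involved). The MAP estimate is soft-thresholding and returns exact zeros, whereas the posterior mean, computed here by brute-force numerical integration, shrinks but never reaches zero. The function names are hypothetical.

```python
import math


def laplace_map(y, lam):
    # argmin over theta of (y - theta)^2 / 2 + lam * |theta|: soft-thresholding,
    # which returns exactly 0 whenever |y| <= lam.
    return math.copysign(max(abs(y) - lam, 0.0), y)


def laplace_posterior_mean(y, lam, lo=-30.0, hi=30.0, n=60000):
    # Posterior proportional to exp(-(y - theta)^2 / 2 - lam * |theta|),
    # integrated with the midpoint rule on [lo, hi].
    h = (hi - lo) / n
    num = den = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        w = math.exp(-(y - t) ** 2 / 2.0 - lam * abs(t))
        num += t * w
        den += w
    return num / den
```

For y = 0.5 and λ = 1 the MAP estimate is exactly 0, while the posterior mean is about 0.24: the posterior weights the coefficient down rather than removing it.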