Results 1  10
of
248
Structured variable selection with sparsityinducing norms
, 2011
"... We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to ov ..."
Abstract

Cited by 186 (27 self)
 Add to MetaCart
(Show Context)
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for leastsquares linear regression in low and highdimensional settings.
Online Learning for Latent Dirichlet Allocation
"... We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collection ..."
Abstract

Cited by 173 (19 self)
 Add to MetaCart
(Show Context)
We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1
F.: Proximal methods for sparse hierarchical dictionary learning
 In: ICML
"... This paper proposes to combine two approaches for modeling data admitting sparse representations: On the one hand, dictionary learning has proven very effective for various signal restoration and representation tasks. On the other hand, recent work on structured sparsity provides a natural framework ..."
Abstract

Cited by 123 (22 self)
 Add to MetaCart
This paper proposes to combine two approaches for modeling data admitting sparse representations: On the one hand, dictionary learning has proven very effective for various signal restoration and representation tasks. On the other hand, recent work on structured sparsity provides a natural framework for modeling dependencies between dictionary elements. We propose to combine these approaches to learn dictionaries embedded in a hierarchy. We show that the proximal operator for the treestructured sparse regularization that we consider can be computed exactly in linear time with a primaldual approach, allowing the use of accelerated gradient methods. Experiments show that for natural image patches, learned dictionary elements organize themselves naturally in such a hierarchical structure, leading to an improved performance for restoration tasks. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models. Learned sparse representations, initially introduced by Olshausen and Field [1997], have been the focus of much research in machine learning, signal processing and neuroscience, leading to stateoftheart algorithms for several problems in image processing. Modeling signals as a linear combination of a
Dictionaries for Sparse Representation Modeling
"... Sparse and redundant representation modeling of data assumes an ability to describe signals as linear combinations of a few atoms from a prespecified dictionary. As such, the choice of the dictionary that sparsifies the signals is crucial for the success of this model. In general, the choice of a p ..."
Abstract

Cited by 101 (3 self)
 Add to MetaCart
Sparse and redundant representation modeling of data assumes an ability to describe signals as linear combinations of a few atoms from a prespecified dictionary. As such, the choice of the dictionary that sparsifies the signals is crucial for the success of this model. In general, the choice of a proper dictionary can be done using one of two ways: (i) building a sparsifying dictionary based on a mathematical model of the data, or (ii) learning a dictionary to perform best on a training set. In this paper we describe the evolution of these two paradigms. As manifestations of the first approach, we cover topics such as wavelets, wavelet packets, contourlets, and curvelets, all aiming to exploit 1D and 2D mathematical models for constructing effective dictionaries for signals and images. Dictionary learning takes a different route, attaching the dictionary to a set of examples it is supposed to serve. From the seminal work of Field and Olshausen, through the MOD, the KSVD, the Generalized PCA and others, this paper surveys the various options such training has to offer, up to the most recent contributions and structures.
Stochastic Variational Inference
 JOURNAL OF MACHINE LEARNING RESEARCH (2013, IN PRESS)
, 2013
"... We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet proce ..."
Abstract

Cited by 88 (22 self)
 Add to MetaCart
(Show Context)
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
TaskDriven Dictionary Learning
"... Abstract—Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that ..."
Abstract

Cited by 85 (3 self)
 Add to MetaCart
Abstract—Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a largescale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in largescale settings, and is well suited to supervised and semisupervised classification, as well as regression tasks for data that admit sparse representations. Index Terms—Basis pursuit, Lasso, dictionary learning, matrix factorization, semisupervised learning, compressed sensing. Ç 1
Proximal Methods for Hierarchical Sparse Coding
, 2010
"... Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced treestructured sparse regularizatio ..."
Abstract

Cited by 83 (20 self)
 Add to MetaCart
Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced treestructured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and we propose in this paper efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm is computable exactly via a dual approach that can be viewed as the composition of elementary proximal operators. Our procedure has a complexity linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the treestructured sparse approximation problem at the same computational cost as traditional ones using the ℓ1norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we consider fixed hierarchical dictionaries of wavelets to denoise natural images. Then, we apply our optimization tools in the context of dictionary learning, where learned dictionary elements naturally organize in a prespecified arborescent structure, leading to a better performance in reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
Learning A Discriminative Dictionary for Sparse Coding via Label Consistent KSVD
"... A label consistent KSVD (LCKSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse ..."
Abstract

Cited by 80 (8 self)
 Add to MetaCart
(Show Context)
A label consistent KSVD (LCKSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistent constraint called ‘discriminative sparsecode error ’ and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the KSVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse coding techniques for face and object category recognition under the same learning conditions. 1.
Nonuniform deblurring for shaken images
 In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2010. 8. Image taken with a Canon 1D Mark III, at 35mm f/4.5. Images
"... Blur from camera shake is mostly due to the 3D rotation of the camera, resulting in a blur kernel that can be significantly nonuniform across the image. However, most current deblurring methods model the observed image as a convolution of a sharp image with a uniform blur kernel. We propose a new p ..."
Abstract

Cited by 65 (4 self)
 Add to MetaCart
(Show Context)
Blur from camera shake is mostly due to the 3D rotation of the camera, resulting in a blur kernel that can be significantly nonuniform across the image. However, most current deblurring methods model the observed image as a convolution of a sharp image with a uniform blur kernel. We propose a new parametrized geometric model of the blurring process in terms of the rotational velocity of the camera during exposure. We apply this model to two different algorithms for camera shake removal: the first one uses a single blurry image (blind deblurring), while the second one uses both a blurry image and a sharp but noisy image of the same scene. We show that our approach makes it possible to model and remove a wider class of blurs than previous approaches, including uniform blur as a special case, and demonstrate its effectiveness with experiments on real images. 1.
Structured sparsityinducing norms through submodular functions
 IN ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2010
"... Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turnedinto a convex optimization problem byreplacing the cardinality function by its convex en ..."
Abstract

Cited by 56 (13 self)
 Add to MetaCart
Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turnedinto a convex optimization problem byreplacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the ℓ1norm. In this paper, we investigate more general setfunctions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nonincreasing submodular setfunctions, the corresponding convex envelope can be obtained from its Lovász extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or highdimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rankstatistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as nonfactorial priors for supervised learning.