Results 1–10 of 13
Y.: Sparse feature learning for deep belief networks
In: Advances in Neural Information Processing Systems (NIPS 2007), 2007
Cited by 70 (11 self)
Abstract:
Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g., low dimension, sparsity). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the tradeoff between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the observed input variables can be captured.
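The reconstruction-error/sparsity tradeoff that the abstract's selection criterion quantifies can be sketched numerically. This is a minimal, hypothetical NumPy example (an orthonormal dictionary and a soft-threshold sparsifier, chosen for simplicity; not the authors' actual encoder-decoder architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)                       # a toy input vector

# orthonormal "dictionary": reconstruction is exact when no sparsity is enforced
D, _ = np.linalg.qr(rng.normal(size=(20, 20)))

def soft(v, t):
    """Soft-thresholding: the standard shrinkage operator for an L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def encode_decode(alpha):
    z = soft(D.T @ x, alpha)                  # sparse code
    recon = D @ z                             # reconstruction from the code
    return np.sum((x - recon) ** 2), np.count_nonzero(z)

# sweeping the sparsity level traces out the tradeoff curve
results = [encode_decode(a) for a in (0.0, 0.5, 2.0)]
```

A larger threshold leaves fewer active code units but increases the reconstruction error; comparing machines along this curve is the idea behind the proposed criterion.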
Learning Invariant Features through Topographic Filter Maps
Cited by 70 (12 self)
Abstract:
Several recently proposed architectures for high-performance object recognition are composed of two main stages: a feature extraction stage that extracts locally invariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a nonlinear transform, such as a pointwise squashing function, quantization, or normalization; (3) a spatial pooling operation which combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together. The method automatically generates topographic maps of similar filters that extract features at different orientations, scales, and positions. These similar filters are pooled together, producing locally invariant outputs. The learned feature descriptors give results comparable to SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which SIFT is less well suited.
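The pooling idea behind module (3), and the topographic pooling the paper learns, can be sketched as L2 pooling over neighboring filters on a filter map. A hedged NumPy sketch with made-up sizes (a 1-D map of 16 filters and a window of 3 are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(1)
responses = rng.normal(size=16)   # outputs of 16 filters arranged on a 1-D map

def topographic_pool(r, width=3):
    """L2-pool each filter's output with its neighbors on the filter map."""
    half = width // 2
    pooled = np.empty_like(r)
    for i in range(len(r)):
        lo, hi = max(0, i - half), min(len(r), i + half + 1)
        pooled[i] = np.sqrt(np.sum(r[lo:hi] ** 2))
    return pooled

pooled = topographic_pool(responses)
```

Because neighboring filters on the learned map are similar, a small change in the input moves energy between neighbors while leaving their pooled L2 value nearly unchanged, which is the source of the local invariance.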
Fast inference in sparse coding algorithms with applications to object recognition
 Technical report, Computational and Biological Learning Lab, Courant Institute, NYU
Cited by 36 (11 self)
Abstract:
Adaptive sparse coding methods learn a possibly overcomplete set of basis functions, such that natural image patches can be reconstructed by linearly combining a small subset of these bases. The applicability of these methods to visual object recognition tasks has been limited because of the prohibitive cost of the optimization algorithms required to compute the sparse representation. In this work we propose a simple and efficient algorithm to learn basis functions. After training, this model also provides a fast and smooth approximator to the optimal representation, achieving even better accuracy than exact sparse coding algorithms on visual object recognition tasks.
Large-scale deep unsupervised learning using graphics processors
In: International Conference on Machine Learning, 2009
Cited by 29 (7 self)
Abstract:
The promise of unsupervised learning methods lies in their potential to use vast amounts of unlabeled data to learn complex, highly nonlinear models with millions of free parameters. We consider two well-known unsupervised learning models, deep belief networks (DBNs) and sparse coding, that have recently been applied to a flurry of machine learning applications (Hinton & Salakhutdinov, 2006; Raina et al., 2007). Unfortunately, current learning algorithms for both models are too slow for large-scale applications, forcing researchers to focus on smaller-scale models, or to use fewer training examples. In this paper, we suggest massively parallel methods to help resolve these problems. We argue that modern graphics processors far surpass the computational capabilities of multicore CPUs, and have the potential to revolutionize the applicability of deep unsupervised learning methods. We develop general principles for massively parallelizing unsupervised learning tasks using graphics processors, and show that these principles can be applied to successfully scale up learning algorithms for both DBNs and sparse coding. Our implementation of DBN learning is up to 70 times faster than a dual-core CPU implementation for large models. For example, we are able to reduce the time required to learn a four-layer DBN with 100 million free parameters from several weeks to around a single day. For sparse coding, we develop a simple, inherently parallel algorithm that leads to a 5- to 15-fold speedup over previous methods.
Clustering-Based Denoising With Locally Learned Dictionaries
, 2009
Cited by 25 (9 self)
Abstract:
In this paper, we propose K-LLD: a patch-based, locally adaptive denoising method based on clustering the given noisy image into regions of similar geometric structure. In order to effectively perform such clustering, we employ as features the local weight functions derived from our earlier work on steering kernel regression [1]. These weights are exceedingly informative and robust in conveying reliable local structural information about the image, even in the presence of significant amounts of noise. Next, we model each region (or cluster), which may not be spatially contiguous, by “learning” a best basis describing the patches within that cluster using principal component analysis. This learned basis (or “dictionary”) is then employed to optimally estimate the underlying pixel values using a kernel regression framework. An iterated version of the proposed algorithm is also presented, which leads to further performance enhancements. We also introduce a novel mechanism for optimally choosing the local patch size for each cluster using Stein’s unbiased risk estimator (SURE). We illustrate the overall algorithm’s capabilities with several examples, which indicate that the proposed method is competitive with some of the most recently published state-of-the-art denoising methods.
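The cluster-then-learn-a-local-basis idea can be sketched on synthetic patches. This is a simplified stand-in, not the paper's method: plain k-means on raw patches replaces the steering-kernel weight features, and PCA projection replaces the full kernel regression estimate; all data and sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8                                     # toy "patch" dimension

# two clusters of clean patches, each on its own 2-D subspace with its own offset
def make_cluster(seed, offset, n=60):
    r = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(r.normal(size=(d, 2)))
    return offset + r.normal(size=(n, 2)) @ basis.T

off = np.zeros(d)
off[0] = 3.0
clean = np.vstack([make_cluster(10, off), make_cluster(11, -off)])
noisy = clean + 0.1 * rng.normal(size=clean.shape)

# step 1: cluster the noisy patches (plain 2-means, seeded with one point per blob)
k = 2
centers = noisy[[0, 60]].copy()
for _ in range(20):
    labels = np.argmin(((noisy[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        pts = noisy[labels == c]
        if len(pts):
            centers[c] = pts.mean(0)

# steps 2-3: learn a PCA basis ("dictionary") per cluster, project patches onto it
denoised = np.empty_like(noisy)
for c in range(k):
    idx = labels == c
    mu = noisy[idx].mean(0)
    _, _, Vt = np.linalg.svd(noisy[idx] - mu, full_matrices=False)
    P = Vt[:2]                            # top-2 locally learned components
    denoised[idx] = mu + (noisy[idx] - mu) @ P.T @ P

mse = lambda a, b: float(np.mean((a - b) ** 2))
```

Projecting each patch onto a basis learned from structurally similar patches suppresses the noise components that fall outside the cluster's local subspace.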
Hierarchical ALS Algorithms for Nonnegative Matrix and 3D Tensor Factorization
 In: Independent Component Analysis, ICA07
Cited by 20 (5 self)
Abstract:
In this paper we present new Alternating Least Squares (ALS) algorithms for Nonnegative Matrix Factorization (NMF) and their extensions to 3D Nonnegative Tensor Factorization (NTF) that are robust in the presence of noise and have many potential applications, including multi-way Blind Source Separation (BSS), multi-sensory or multi-dimensional data analysis, and nonnegative neural sparse coding. We propose to use local cost functions whose simultaneous or sequential (one-by-one) minimization leads to a very simple ALS algorithm which works, under some sparsity constraints, for both underdetermined (fewer sensors than sources) and overdetermined models. Extensive experimental results confirm the validity and high performance of the developed algorithms, especially with the multilayer hierarchical NMF. Extension of the proposed algorithm to
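For orientation, the basic (global, unregularized) ALS iteration for NMF can be sketched as follows; the paper's algorithms add local cost functions, sparsity constraints, and hierarchical multilayer refinement on top of this scheme. Sizes and data here are made up:

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic nonnegative data with an exact rank-4 factorization X = W_true @ H_true
W_true = rng.random((30, 4))
H_true = rng.random((4, 20))
X = W_true @ H_true

# ALS: alternately solve an unconstrained least-squares problem for one factor,
# then project the result back onto the nonnegative orthant
W = rng.random((30, 4))
H = rng.random((4, 20))
for _ in range(200):
    H = np.maximum(np.linalg.lstsq(W, X, rcond=None)[0], 1e-12)
    W = np.maximum(np.linalg.lstsq(H.T, X.T, rcond=None)[0].T, 1e-12)

rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Each half-step is a cheap linear least-squares solve, which is why ALS variants scale well; the clipping step is the crudest possible way to enforce nonnegativity and is where more careful NMF algorithms differ.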
Improved M-FOCUSS algorithm with overlapping blocks for locally smooth sparse signals
 IEEE Trans. Signal Process
Cited by 2 (0 self)
Abstract:
The FOCal Underdetermined System Solver (FOCUSS) algorithm has already found many applications in signal processing and data analysis, whereas the regularized M-FOCUSS algorithm has been recently proposed by Cotter et al. for finding sparse solutions to an underdetermined system of linear equations with multiple measurement vectors. In this paper, we propose three modifications to the M-FOCUSS algorithm to make it more efficient for sparse and locally smooth solutions. First, motivated by the simultaneously autoregressive (SAR) model, we incorporate an additional weighting (smoothing) matrix into the Tikhonov regularization term. Next, the entire set of measurement vectors is divided into blocks, and the solution is updated sequentially, based on overlapping data blocks. The last modification is based on an alternating minimization technique that provides data-driven (simultaneous) estimation of the regularization parameter with the generalized cross-validation (GCV) approach. Finally, simulation results demonstrate the benefits of the proposed modifications.
Index Terms: FOCal Underdetermined System Solver (FOCUSS), generalized cross-validation (GCV), smooth signals, sparse solutions, underdetermined systems.
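The overlapping-blocks modification can be illustrated with a plain Tikhonov-regularized solve per block of measurement vectors, averaging the estimates where blocks overlap. This is a hedged stand-in for the full M-FOCUSS update: the SAR smoothing matrix and the GCV parameter selection are omitted, and all sizes, the regularization parameter, and the source pattern are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, T = 15, 20, 40                 # sensors, sources, measurement vectors
A = rng.normal(size=(m, n))

# locally smooth sources: two active rows with slowly varying amplitudes
X = np.zeros((n, T))
t = np.linspace(0, np.pi, T)
X[3], X[11] = np.sin(t), np.cos(t)
Y = A @ X + 0.01 * rng.normal(size=(m, T))

lam, block, step = 0.01, 10, 5       # block length 10 with 50% overlap
G = A.T @ A + lam * np.eye(n)        # Tikhonov-regularized normal matrix
Xhat = np.zeros((n, T))
counts = np.zeros(T)
for s in range(0, T - block + 1, step):
    sl = slice(s, s + block)
    Xhat[:, sl] += np.linalg.solve(G, A.T @ Y[:, sl])
    counts[sl] += 1
Xhat /= counts                       # average the overlapping estimates
```

The overlap averaging is what smooths the estimate across block boundaries; without it, independent per-block solves produce visible seams in locally smooth solutions.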
Sensitivity computation of the ℓ1 minimization problem and its application to dictionary design
Inverse Problems, 2009
Cited by 1 (1 self)
Abstract:
The ℓ1 minimization problem has been studied extensively in the past few years. Recently, there has been a growing interest in its application to inverse problems. Most studies have concentrated on devising ways for sparse representation of a solution using a given prototype dictionary. Very few studies have addressed the more challenging problem of optimal dictionary construction, and even these were primarily devoted to the simplistic sparse coding application. In this paper, a sensitivity analysis of the inverse solution with respect to the dictionary is presented. This analysis reveals some of the salient features and intrinsic difficulties associated with the dictionary design problem. Equipped with these insights, we propose an optimization strategy that alleviates these hurdles while utilizing the derived sensitivity relations for the design of a locally optimal dictionary. Our optimality criterion is based on local minimization of the Bayesian risk, given a set of training models. We present a mathematical formulation and an algorithmic framework to achieve this goal. The proposed framework offers the design of dictionaries for inverse problems that incorporate nontrivial, noninjective observation operators, where the data and the recovered parameters may reside in different spaces. We test our algorithm and show that it yields improved dictionaries for a diverse set of inverse problems in geophysics and medical imaging.
Efficient Learning and Inference of Sparse Overcomplete Representations
Abstract:
Sparse overcomplete representations are useful in many vision applications, such as feature extraction [7], recognition [6], and denoising and inpainting of natural images [2]. Many such algorithms have focused on learning a dictionary of basis functions such that any image patch can be reconstructed as a linear combination of a small subset of them. Unfortunately, finding the coefficients corresponding to a new image patch is expensive because it involves optimizing the reconstruction error under non-quadratic sparsity constraints (such as an L1 penalty) [5, 3, 4]. We propose an algorithm that learns basis functions in such a way that the optimal set of coefficients representing the input patch can be efficiently computed by a feedforward pass through a simple parametrized function (an encoder), without requiring any optimization. The coefficient vector z ∈ R^n is used to model the input y ∈ R^m in terms of a basis set Φ ∈ R^{m×n}. We use the ℓ2 norm to penalize the reconstruction error and the ℓ1 norm to enforce the sparsity of the coefficient vector:

‖y − Φz‖²₂ + α‖z‖₁,

which is equivalent to the approximate decomposition of input signals given by the Basis Pursuit algorithm [1]. Based on this formulation we propose to learn the basis set and an encoder for
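The energy above can be minimized directly with iterative shrinkage (ISTA), which is exactly the kind of slow, per-patch optimization the proposed feedforward encoder is meant to replace. A minimal NumPy sketch with made-up sizes (the dictionary here is random, not learned):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 10, 25                          # overcomplete: n > m
Phi = rng.normal(size=(m, n))
Phi /= np.linalg.norm(Phi, axis=0)     # unit-norm basis functions
y = rng.normal(size=m)
alpha = 0.1

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def energy(z):
    """The objective: ||y - Phi z||_2^2 + alpha * ||z||_1."""
    return np.sum((y - Phi @ z) ** 2) + alpha * np.sum(np.abs(z))

# ISTA: gradient step on the quadratic term, then soft-threshold for the L1 term
L = 2.0 * np.linalg.eigvalsh(Phi.T @ Phi).max()   # Lipschitz constant of the gradient
z = np.zeros(n)
energies = [energy(z)]
for _ in range(100):
    z = soft(z - 2.0 * Phi.T @ (Phi @ z - y) / L, alpha / L)
    energies.append(energy(z))
```

Each iteration costs two matrix-vector products; the paper's point is that a trained encoder replaces the whole loop with a single feedforward pass.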