Results 1–10 of 22
Representation Learning: A Review and New Perspectives
, 2012
Abstract

Cited by 159 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep architectures. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
The Manifold Tangent Classifier
Abstract

Cited by 31 (10 self)
We combine three important ideas present in previous work for building classifiers: the semi-supervised hypothesis (the input distribution contains information about the classifier), the unsupervised manifold hypothesis (data density concentrates near low-dimensional manifolds), and the manifold hypothesis for classification (different classes correspond to disjoint manifolds separated by low density). We exploit a novel algorithm for capturing manifold structure (high-order contractive autoencoders) and we show how it builds a topological atlas of charts, each chart being characterized by the principal singular vectors of the Jacobian of a representation mapping. This representation-learning algorithm can be stacked to yield a deep architecture, and we combine it with a domain-knowledge-free version of the TangentProp algorithm to encourage the classifier to be insensitive to local direction changes along the manifold. Record-breaking classification results are obtained.
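The chart construction the abstract describes can be sketched numerically: the tangent directions at a point are the leading right-singular vectors of the Jacobian of the representation mapping. A minimal sketch, with a hand-built `encoder` standing in for a trained contractive autoencoder (all names here are hypothetical, not the paper's code):

```python
import numpy as np

# Hypothetical smooth encoder h: R^3 -> R^2, a stand-in for a trained
# contractive autoencoder's representation mapping.
def encoder(x):
    W = np.array([[0.5, -0.2, 0.1],
                  [0.3, 0.8, -0.4]])
    return np.tanh(W @ x)

def jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian of f at x."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (f(xp) - fx) / eps
    return J

x = np.array([0.2, -0.1, 0.5])
J = jacobian(encoder, x)
# The chart at x is spanned by the leading right-singular vectors of J:
_, S, Vt = np.linalg.svd(J, full_matrices=False)
tangent_basis = Vt  # rows = principal tangent directions in input space
```

In the paper these directions then feed a TangentProp-style penalty that makes the classifier insensitive to movements along them.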
What regularized autoencoders learn from the data generating distribution
, 2012
Abstract

Cited by 16 (7 self)
What do autoencoders learn about the underlying data-generating distribution? Recent work suggests that some autoencoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data-generating density. We show that the autoencoder captures the score (derivative of the log-density with respect to the input). This contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the autoencoder: they show what the autoencoder would tend to if given enough capacity and examples. These results hold for a contractive training criterion that we show to be similar to the denoising autoencoder training criterion with small corruption noise, but with contraction applied to the whole reconstruction function rather than just the encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be set up to recover samples from the estimated distribution, and this is confirmed in sampling experiments.
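The key identity can be illustrated in one dimension: for small corruption noise sigma, a well-trained reconstruction function r satisfies r(x) ≈ x + sigma² · d/dx log p(x), so (r(x) − x)/sigma² estimates the score. A minimal sketch, assuming N(0, 1) data (whose score is −x) and writing down the optimal `reconstruction` analytically rather than learning it:

```python
import numpy as np

# For data drawn from N(0, 1) the true score is d/dx log p(x) = -x.
# The paper's result: r(x) ≈ x + sigma^2 * score(x) for small noise sigma,
# so (r(x) - x) / sigma^2 recovers the score.
sigma = 0.1

def reconstruction(x):
    # Hypothetical optimal reconstruction function for N(0, 1) data,
    # written analytically, not the output of a trained model.
    return x + sigma**2 * (-x)

xs = np.linspace(-2.0, 2.0, 9)
score_estimate = (reconstruction(xs) - xs) / sigma**2
# score_estimate matches the true score -xs
```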
Nonlinear dimensionality reduction with local spline embedding
 IEEE Trans. Knowl. Data Eng
, 2009
Abstract

Cited by 12 (7 self)
Abstract—This paper presents a new algorithm for Nonlinear Dimensionality Reduction (NLDR). Our algorithm is developed under the conceptual framework of compatible mapping. Each such mapping is a compound of a tangent space projection and a group of splines. The tangent space projection is estimated at each data point on the manifold, through which the data point itself and its neighbors are represented in the tangent space with local coordinates. Splines are then constructed to guarantee that each of the local coordinates can be mapped to its own single global coordinate with respect to the underlying manifold. Thus, the compatibility between local alignments is ensured. In this setting, we develop an optimization framework based on reconstruction error analysis, which can yield a global optimum. The proposed algorithm is also extended to embed out-of-sample points via spline interpolation. Experiments on toy data sets and real-world data sets illustrate the validity of our method. Index Terms—Nonlinear dimensionality reduction, compatible mapping, local spline embedding, out-of-sample extension.
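The first step of such methods, estimating a tangent space at each point and expressing the neighborhood in local coordinates, is commonly done with local PCA. A minimal sketch of that step only (the spline construction and global alignment are not shown; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Noisy samples near a one-dimensional curve (a circle arc) in R^2.
t = rng.uniform(0.0, np.pi / 4, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.001 * rng.normal(size=(200, 2))

def local_tangent_coords(X, i, k=15, d=1):
    """Estimate the tangent space at point i by local PCA and return the
    local coordinates of its k nearest neighbors in that tangent space,
    plus the fraction of local spread the tangent directions capture."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dists)[:k]
    local = X[nbrs] - X[nbrs].mean(axis=0)
    _, S, Vt = np.linalg.svd(local, full_matrices=False)
    coords = local @ Vt[:d].T      # coordinates in the tangent space
    explained = S[:d].sum() / S.sum()
    return coords, explained

coords, explained = local_tangent_coords(X, 0)
```

Because the data concentrate near a 1-D manifold, a single tangent direction captures most of the local spread.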
Better Mixing via Deep Representations
, 2014
Abstract

Cited by 7 (5 self)
It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation. We study the following related conjecture: better representations, in the sense of better disentangling, can be exploited to produce faster-mixing Markov chains. Consequently, mixing would be more efficient at higher levels of representation. To better understand why and how this happens, we propose a secondary conjecture: the higher-level samples fill the space they occupy more uniformly, and the high-density manifolds tend to unfold when represented at higher levels. The paper discusses these hypotheses and tests them experimentally through visualization and through measurements of mixing and of interpolation between samples.
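The "manifolds unfold at higher levels" idea can be illustrated with a toy manifold: data on the unit circle, with the angle as an idealized disentangled representation. Interpolating in the representation stays on the manifold, while interpolating raw inputs cuts through the low-density interior. A sketch under those assumptions (the `encode`/`decode` pair is a hand-built ideal, not a trained network):

```python
import numpy as np

# Toy data manifold: the unit circle.  The angle is a hypothetical ideal
# "deep" representation that disentangles the one factor of variation.
def encode(x):
    return np.arctan2(x[1], x[0])

def decode(theta):
    return np.array([np.cos(theta), np.sin(theta)])

a, b = decode(0.0), decode(np.pi / 2)
alphas = np.linspace(0.0, 1.0, 11)
# Interpolating raw inputs cuts through the low-density interior ...
raw_interp = [(1 - t) * a + t * b for t in alphas]
# ... while interpolating in the representation stays on the manifold.
rep_interp = [decode((1 - t) * encode(a) + t * encode(b)) for t in alphas]

raw_norms = [np.linalg.norm(p) for p in raw_interp]  # dips below 1
rep_norms = [np.linalg.norm(p) for p in rep_interp]  # stays at 1
```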
Implicit density estimation by local moment matching to sample from autoencoders
, 2012
Abstract

Cited by 3 (3 self)
Recent work suggests that some autoencoder variants do a good job of capturing the local manifold structure of the unknown data-generating density. This paper contributes to the mathematical understanding of this phenomenon and helps define better-justified sampling algorithms for deep learning based on autoencoder variants. We consider an MCMC in which each step samples from a Gaussian whose mean and covariance matrix depend on the previous state; this chain defines, through its asymptotic distribution, a target density. First, we show that good choices (in the sense of consistency) for these mean and covariance functions are the local expected value and local covariance under that target density. Then we show that an autoencoder with a contractive penalty captures estimators of these local moments in its reconstruction function and its Jacobian. A contribution of this work is thus a novel alternative to maximum-likelihood density estimation, which we call local moment matching. It also justifies a recently proposed sampling algorithm for the Contractive Auto-Encoder and extends it to the Denoising Auto-Encoder.
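The chain described above, each step drawing from a Gaussian whose mean and variance depend on the current state, can be sketched in one dimension. Here `mu` and `sigma2` are chosen by hand so the stationary distribution is N(0, 1); in the paper these roles are played by the autoencoder's learned reconstruction and Jacobian, so this is only an illustration of the MCMC itself:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hand-built local moment functions (stand-ins for a trained autoencoder):
# mu contracts toward the mode, like the reconstruction of N(0, 1) data.
def mu(x):
    return 0.5 * x

def sigma2(x):
    return 0.75   # chosen so the AR(1) chain's stationary variance is 1

x = 5.0
samples = []
for _ in range(20000):
    x = mu(x) + np.sqrt(sigma2(x)) * rng.normal()
    samples.append(x)
samples = np.array(samples[1000:])   # discard burn-in
# samples are approximately distributed as N(0, 1)
```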
Prognostic Physiology: Modeling Patient Severity in Intensive Care Units Using Radial Domain Folding
Abstract

Cited by 3 (2 self)
Real-time scalable predictive algorithms that can mine big health data as care is happening can become the new "medical tests" in critical care. This work describes a new unsupervised learning approach, radial domain folding, to scale and summarize the enormous amount of data collected and to visualize the degradations or improvements in multiple organ systems in real time. Our proposed system is based on learning multilayer lower-dimensional abstractions from routinely generated patient data in modern Intensive Care Units (ICUs), and is dramatically different from most of the current work in ICU data mining, which relies on building supervised predictive models using commonly measured clinical observations. We learn patient states that summarize a patient's physiology. Further, we show that a logistic regression model trained exclusively on our learned layer outperforms a customized SAPS II score on the mortality prediction task.
Kernel Discriminant Analysis for Regression Problems
Abstract

Cited by 3 (1 self)
In this paper, we propose a nonlinear feature extraction method for regression problems to reduce the dimensionality of the input space. Previously, a feature extraction method LDAr, a regressional version of linear discriminant analysis, was proposed. In this paper, LDAr is generalized to a nonlinear discriminant analysis by using the so-called kernel trick. The basic idea is to map the input space into a high-dimensional feature space where the variables are nonlinear transformations of the input variables. We then try to maximize the ratio of the distances between samples with large differences in the target value to the distances between samples with small differences in the target value, measured in the feature space. It is well known that the distribution of face images, under a perceivable variation in translation, rotation, and scaling, is highly nonlinear, and that face alignment is a complex regression problem. We have applied the proposed method to various regression problems, including face alignment, and achieved better performance than conventional linear feature extraction methods.
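The LDAr-style criterion can be sketched as follows: split sample pairs by the size of their target difference, accumulate "between" and "within" scatter matrices from the two groups, and take the leading generalized eigenvector. This sketch uses an explicit nonlinear feature map in place of the kernel trick (a real kernel version would work with the Gram matrix); the data, feature map, and threshold are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy regression data: the target depends nonlinearly on x[0] only.
X = rng.normal(size=(60, 2))
y = np.sin(X[:, 0])

# Explicit nonlinear feature map standing in for the kernel-induced space.
def phi(X):
    return np.c_[X, X**2, np.sin(X)]

F = phi(X)
dim = F.shape[1]

# Pairs with a large target difference feed S_b, the rest feed S_w.
thresh = np.median(np.abs(y[:, None] - y[None, :]))
Sb = np.zeros((dim, dim))
Sw = np.zeros((dim, dim))
for i in range(len(y)):
    for j in range(i + 1, len(y)):
        d = np.outer(F[i] - F[j], F[i] - F[j])
        if abs(y[i] - y[j]) > thresh:
            Sb += d
        else:
            Sw += d

# Direction maximizing the between/within distance ratio.
reg = 1e-6 * np.eye(dim)
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw + reg, Sb))
w = eigvecs[:, np.argmax(eigvals.real)].real
lam = eigvals.real.max()   # achieved ratio; > 1 when a direction separates
```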
Human body shape estimation using a multiresolution manifold forest
 In IEEE Conference on Computer Vision and Pattern Recognition
, 2014
Abstract

Cited by 3 (1 self)
This paper proposes a method for estimating the 3D body shape of a person with robustness to clothing. We formulate the problem as optimization over the manifold of valid depth maps of body shapes learned from synthetic training data. The manifold itself is represented using a novel data structure, a Multi-Resolution Manifold Forest (MRMF), which contains vertical edges between tree nodes as well as horizontal edges between nodes across trees that correspond to overlapping partitions. We show that this data structure allows both efficient localization and navigation on the manifold for on-the-fly building of local linear models (manifold charting). We demonstrate shape estimation of clothed users, showing significant improvement in accuracy over global shape models and models using precomputed clusters. We further compare the MRMF with alternative manifold charting methods on a public dataset for estimating 3D motion from noisy 2D marker observations, obtaining state-of-the-art results.
Multimodal transitions for generative stochastic networks. arXiv preprint arXiv:1312.5578
, 2013
Abstract

Cited by 2 (1 self)
Generative Stochastic Networks (GSNs) have recently been introduced as an alternative to traditional probabilistic modeling: instead of parametrizing the data distribution directly, one parametrizes a transition operator for a Markov chain whose stationary distribution is an estimator of the data-generating distribution. The result of training is therefore a machine that generates samples through this Markov chain. However, the previously introduced GSN consistency theorems suggest that, in order to capture a wide class of distributions, the transition operator should in general be multimodal, something that had not been done before this paper. We introduce for the first time multimodal transition distributions for GSNs, in particular using models in the NADE family (Neural Autoregressive Density Estimator) as output distributions of the transition operator. A NADE model is related to an RBM (and can thus model multimodal distributions), but its likelihood (and likelihood gradient) can be computed easily. The parameters of the NADE are obtained as a learned function of the previous state of the learned Markov chain. Experiments clearly illustrate the advantage of such multimodal transition distributions over unimodal GSNs.
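Why a multimodal transition operator helps can be shown with a toy chain: when the conditional given the previous state can place mass near either of two separated modes, the chain hops between them freely, whereas a unimodal local operator started at one mode would rarely reach the other. A hand-built sketch (the mixture conditional below is an illustrative stand-in for the learned NADE conditional, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(3)
# Target: two well-separated modes at -3 and +3.
modes = np.array([-3.0, 3.0])

def transition(x):
    """Multimodal conditional: from state x, jump near either mode,
    with a mild preference for staying on the same side."""
    p_right = 1.0 / (1.0 + np.exp(-0.5 * x))
    m = modes[1] if rng.random() < p_right else modes[0]
    return m + 0.3 * rng.normal()

x = -3.0
samples = [x]
for _ in range(5000):
    x = transition(x)
    samples.append(x)
samples = np.array(samples)
frac_right = (samples > 0).mean()   # both modes are visited often
```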