Results 1–10 of 13
Expressive Power and Approximation Errors of Restricted Boltzmann Machines
, 2011
Abstract

Cited by 13 (10 self)
We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative of the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with n visible and m hidden units is bounded from above by (n−1)−log(m+1). In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.
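As an illustration of how such a bound can guide model sizing, the sketch below inverts the stated bound (n−1)−log(m+1) to pick a number of hidden units for a given error tolerance. It assumes the logarithm is base 2 (KL divergence measured in bits); the function names are ours, not the paper's.

```python
import math

def rbm_kl_upper_bound(n, m):
    """Upper bound (n-1) - log2(m+1) on the maximal KL divergence
    from any distribution on {0,1}^n to an RBM with n visible and
    m hidden units, clipped at 0 (assumes logs base 2)."""
    return max(0.0, (n - 1) - math.log2(m + 1))

def hidden_units_for_tolerance(n, eps):
    """Smallest m with (n-1) - log2(m+1) <= eps,
    i.e. m >= 2**(n - 1 - eps) - 1."""
    return max(0, math.ceil(2 ** (n - 1 - eps) - 1))
```

For example, with n = 4 visible units, m = 7 hidden units already drive the bound to zero, matching the known fact that 2^(n−1) − 1 hidden units suffice for universal approximation.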
An implicitization challenge for binary factor analysis
 J. Symb. Comput
Abstract

Cited by 9 (4 self)
We use tropical geometry to compute the multidegree and Newton polytope of the hypersurface of a statistical model with two hidden and four observed binary random variables, solving an open question stated by Drton, Sturmfels and Sullivant in [6, Problem 7.7]. The model is obtained from the undirected graphical model of the complete bipartite graph K2,4 by marginalizing two of the six binary random variables. We present algorithms for computing the Newton polytope of its defining equation by parallel walks along the polytope and its normal fan. In this way we compute the vertices of the polytope. Finally, we also compute and certify its facets by studying tangent cones of the polytope at the symmetry classes of vertices. The Newton polytope has 17 214 912 vertices in 44 938 symmetry classes and 70 646 facets in 246 symmetry classes.
When does a mixture of products contain a product of mixtures?
, 2014
Abstract

Cited by 8 (3 self)
We derive relations between theoretical properties of restricted Boltzmann machines (RBMs), popular machine learning models which form the building blocks of deep learning models, and several natural notions from discrete mathematics and convex geometry. We give implications and equivalences relating RBM-representable probability distributions, perfectly reconstructible inputs, Hamming modes, zonotopes and zonosets, point configurations in hyperplane arrangements, linear threshold codes, and multicovering numbers of hypercubes. As a motivating application, we prove results on the relative representational power of mixtures of product distributions and products of mixtures of pairs of product distributions (RBMs) that formally justify widely held intuitions about distributed representations. In particular, we show that an exponentially larger mixture of products, requiring an exponentially larger number of parameters, is required to represent the probability distributions that can be represented as products of mixtures.
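A quick parameter count makes the gap concrete. Using the standard parametrizations (these counting functions are our own illustration, not from the paper), a mixture of k product distributions over n binary variables has kn + (k−1) free parameters, while an RBM with n visible and m hidden units, which is a product of m mixtures of pairs of products, has nm + n + m:

```python
def mixture_of_products_params(n, k):
    # k product distributions over n binary variables (n params each)
    # plus k - 1 free mixture weights
    return k * n + (k - 1)

def rbm_params(n, m):
    # n*m interaction weights plus n + m biases
    return n * m + n + m
```

If representing an RBM with m hidden units as a plain mixture can require on the order of 2^m components, then for n = m = 10 the RBM has 120 parameters while the equivalent mixture would need over eleven thousand.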
Expressive Power of Conditional Restricted Boltzmann Machines
Abstract

Cited by 4 (4 self)
Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results on the minimal size of universal approximators of conditional distributions.
Centering SVDD for unsupervised feature representation in object classification
 in Neural Information Processing
, 2013
Abstract

Cited by 2 (2 self)
Learning good feature representation from unlabeled data has attracted great attention from researchers recently. Among others, the K-means clustering algorithm is popularly used to map the input data into a feature representation, by finding the nearest centroid for each input point. However, this ignores the density information of each cluster completely, and the resulting representation may be too terse. In this paper, we propose an SVDD (Support Vector Data Description) based method to address these issues. The key idea of our method is to use SVDD to measure the density of each cluster resulting from K-means clustering, based on which a robust feature representation can be derived. For this purpose, we add a new constraint to the original SVDD objective function to make the model align better with the data. In addition, we show that our modified SVDD can be solved very efficiently as a linear programming problem, instead of as a quadratic one. The effectiveness and feasibility of the proposed method are verified on two object classification databases with promising results.
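For context, the nearest-centroid feature mapping that this abstract takes as its starting point can be sketched as a one-hot encoding (a minimal illustration of the K-means baseline, not of the proposed SVDD-based method; the names are ours):

```python
import numpy as np

def kmeans_feature_map(X, centroids):
    """One-hot 'nearest centroid' encoding: each input point is
    mapped to an indicator of its closest centroid. This discards
    the per-cluster density information the paper aims to recover."""
    # pairwise squared distances between points and centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    features = np.zeros((X.shape[0], centroids.shape[0]))
    features[np.arange(X.shape[0]), nearest] = 1.0
    return features
```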
Tropical cycles and Chow polytopes
 Beitr. Algebra Geom
Abstract

Cited by 1 (0 self)
The Chow polytope of an algebraic cycle in a torus depends only on its tropicalisation. Generalising this, we associate a Chow polytope to any abstract tropical variety in a tropicalised toric variety. Several significant polyhedra associated to tropical varieties are special cases of our Chow polytope. The Chow polytope of a tropical variety X is given by a simple combinatorial construction: its normal subdivision is the Minkowski sum of X and a reflected skeleton of the fan of the ambient toric variety.
What is Sparse Coding? (Deep Learning Kickoff Meeting)
, 2011
Abstract
There are many formulations and algorithms for sparse coding in the literature. We isolate the basic mathematical idea in order to study its properties. Let y ∈ Rn be a data point in a data set Y. Assume that y can be expressed as y = Aa, where A ∈ Rn×m is a dictionary matrix and a ∈ Rm is a sparse vector. (Ignore noise in the data y for now.) Assume a has at most k nonzero entries (k-sparse). Usually, the dictionary A is overcomplete: there are fewer rows than columns (n < m). The columns of A are the atoms of the dictionary. For example, y may be a small image patch, A a dictionary of features, and a a sparse representation of y. In compressed sensing, we attempt to recover a, assuming we know the dictionary A. In sparse coding, we attempt to find a dictionary A so that each y ∈ Y has a sparse representation. When A is overcomplete, both problems are ill-posed without the sparsity condition. In deep learning, sparse coding is implemented on each layer of the autoencoder, where each layer is a Restricted Boltzmann Machine (RBM); without sparse coding, the autoencoder learns a low-dimensional representation similar to Principal Component Analysis. The aim is to find a code map b(y) and a dictionary B such that ŷ = Bb(y) is as close to y as possible. The method is to relax the problem and optimize: min …
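The generative model y = Aa described above can be written out in a few lines (a toy sketch with an arbitrary random dictionary; the dimensions and the support of a are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 8, 16, 2                 # n < m: overcomplete dictionary
A = rng.standard_normal((n, m))    # dictionary of m atoms in R^n

a = np.zeros(m)                    # k-sparse code
a[[3, 11]] = [2.0, -1.5]           # at most k nonzero entries

y = A @ a                          # data point: sparse combination of atoms
```

Compressed sensing would try to recover a from y given A; sparse coding would try to learn A itself from many such y.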
Deep Narrow Boltzmann Machines are Universal Approximators
, 2014
Abstract
We show that deep narrow Boltzmann machines are universal approximators of probability distributions on the activities of their visible units, provided they have sufficiently many hidden layers, each containing the same number of units as the visible layer. Beyond this existence statement, we provide upper and lower bounds on the sufficient number of layers and parameters. These bounds show that deep narrow Boltzmann machines are at least as compact universal approximators as restricted Boltzmann machines and narrow sigmoid belief networks, with respect to the currently available bounds for those models.
CSVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning
Abstract
In this paper, we investigate the problem of learning feature representation from unlabeled data using a single-layer K-means network. A K-means network maps the input data into a feature representation by finding the nearest centroid for each input point, and has attracted great attention from researchers recently due to its simplicity, effectiveness, and scalability. However, one drawback of this feature mapping is that it tends to be unreliable when the training data contains noise. To address this issue, we propose an SVDD-based feature learning algorithm that describes the density and distribution of each cluster from K-means with an SVDD ball for more robust feature representation. For this purpose, we present a new SVDD algorithm called CSVDD that centers the SVDD ball towards the mode of the local density of each cluster, and we show that the objective of CSVDD can be solved very efficiently as a linear programming problem. Additionally, previous single-layer networks favor a large number of centroids but a crude pooling size, resulting in a representation that highlights the global aspects of the object. Here we explore an alternative network architecture with a much smaller number of nodes but a much finer pooling size, hence emphasizing the local details of the object. The architecture is also extended with multiple receptive field scales and multiple pooling sizes. Extensive experiments on several popular object recognition benchmarks, such as MNIST, NORB, CIFAR-10 and STL-10, show that the proposed CSVDDNet method yields comparable or better performance than the previous state-of-the-art methods.
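The trade-off between pooling size and local detail mentioned above can be illustrated with a plain average-pooling operation (a generic sketch, not the paper's implementation; the function name is ours):

```python
import numpy as np

def average_pool(fmap, pool):
    """Average pooling over non-overlapping pool x pool blocks of a
    2-D feature map. A finer (smaller) pool size keeps more local
    detail; a crude (larger) one summarizes global structure."""
    h, w = fmap.shape
    h2, w2 = h // pool, w // pool
    blocks = fmap[:h2 * pool, :w2 * pool].reshape(h2, pool, w2, pool)
    return blocks.mean(axis=(1, 3))
```

On a 4×4 map, pooling with size 2 keeps a 2×2 summary, while pooling with size 4 collapses everything to a single global average.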