Results 1–10 of 45
Tensor decompositions for learning latent variable models
, 2014
Abstract

Cited by 83 (7 self)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin’s perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
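The core step of the tensor power method is easy to state concretely. The following is a minimal numpy sketch of plain tensor power iteration on an orthogonally decomposable third-order tensor, without the robustness modifications (deflation, restarts) analyzed in the paper:

```python
import numpy as np

def tensor_power_iteration(T, n_iters=100, seed=0):
    """Basic tensor power iteration v <- T(I, v, v) / ||T(I, v, v)|| for a
    symmetric third-order tensor T; returns an (eigenvalue, eigenvector)
    pair. A sketch of the core step only, not the paper's robust variant."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        v = np.einsum('ijk,j,k->i', T, v, v)   # the multilinear map T(I, v, v)
        v /= np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # eigenvalue T(v, v, v)
    return lam, v

# An orthogonally decomposable tensor T = sum_i lam_i * (v_i outer v_i outer v_i)
d, lams = 5, np.array([3.0, 2.0])
V = np.linalg.qr(np.random.default_rng(1).standard_normal((d, 2)))[0]
T = sum(l * np.einsum('i,j,k->ijk', v, v, v) for l, v in zip(lams, V.T))

lam, v = tensor_power_iteration(T)  # converges to one of the pairs (lam_i, v_i)
```

Run to convergence from a generic start, the iterate lands on one of the true components, mirroring the fixed-point behavior the paper's perturbation analysis makes robust to noise.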
Square deal: Lower bounds and improved relaxations for tensor recovery
 CoRR
Abstract

Cited by 22 (0 self)
Recovering a low-rank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a K-way tensor of length n and Tucker rank r from Gaussian measurements requires Ω(rn^(K−1)) observations. In contrast, a certain (intractable) non-convex formulation needs only O(r^K + nrK) observations. We introduce a very simple, new convex relaxation, which partially bridges this gap. Our new formulation succeeds with O(r^⌊K/2⌋ n^⌈K/2⌉) observations. While these results pertain to Gaussian measurements, simulations strongly suggest that the new norm also outperforms the sum of nuclear norms for tensor completion from a random subset of entries. Our lower bound for the sum-of-nuclear-norms model follows from a new result on recovering signals with multiple sparse structures (e.g. sparse, low rank), which perhaps surprisingly demonstrates the significant suboptimality of the commonly used recovery approach via minimizing the sum of individual sparsity-inducing norms (e.g. ℓ1, nuclear norm). Our new formulation for low-rank tensor recovery, however, opens the possibility of reducing the sample complexity by exploiting several structures jointly.
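The two norms being compared are simple to compute. Below is a numpy sketch of the mode unfoldings, the sum-of-nuclear-norms objective, and the balanced "square" reshaping behind the improved relaxation (function names are ours):

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def sum_of_nuclear_norms(T):
    """The standard relaxation: sum of nuclear norms of all mode unfoldings."""
    return sum(np.linalg.norm(unfold(T, k), 'nuc') for k in range(T.ndim))

def square_norm(T):
    """Nuclear norm of a balanced ('square') matrix reshaping that groups
    roughly half the modes as rows -- the idea behind the new relaxation."""
    rows = int(np.prod(T.shape[: (T.ndim + 1) // 2]))
    return np.linalg.norm(T.reshape(rows, -1), 'nuc')

# On a rank-1 tensor with unit-norm factors, every unfolding has nuclear
# norm 1, so the sum over K = 4 modes is 4 while the square reshaping gives 1.
rng = np.random.default_rng(0)
factors = [u / np.linalg.norm(u) for u in rng.standard_normal((4, 3))]
T = np.einsum('i,j,k,l->ijkl', *factors)
```

Both are one SVD away from being used as regularizers inside a convex solver; the gap in the sample-complexity exponents comes from how balanced the resulting matrix is.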
Statistical Algorithms and a Lower Bound for Detecting Planted Cliques
Abstract

Cited by 12 (2 self)
We introduce a framework for proving lower bounds on computational problems over distributions, based on defining a restricted class of algorithms called statistical algorithms. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than directly accessing samples. Our definition captures most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC, and simulated annealing. Our definition and techniques are inspired by and generalize the statistical query model in learning theory [35]. For well-known problems over distributions, we give lower bounds on the complexity of any statistical algorithm. These include an exponential lower bound for moment maximization in R^n, and a nearly optimal lower bound for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size O(n^(1/2−δ)) for any constant δ > 0. Variants of the latter problem have been assumed to be hard, in order to prove hardness for other problems and for cryptographic applications. Our lower bounds provide concrete evidence
Tensors: a Brief Introduction
, 2014
Abstract

Cited by 11 (3 self)
Tensor decompositions are at the core of many Blind Source Separation (BSS) algorithms, either explicitly or implicitly. In particular, the Canonical Polyadic (CP) tensor
A Tensor Approach to Learning Mixed Membership Community Models
Abstract

Cited by 6 (0 self)
Community detection is the task of detecting hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities, such as the stochastic block model. In this paper, we remove this restriction and provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed the mixed membership Dirichlet model, first introduced by Airoldi et al. (2008). This model allows nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case. We propose a unified approach to learning these models via a tensor spectral decomposition method. Our estimator is based on low-order moment tensors of the observed network, consisting of 3-star counts. Our learning method is fast and is based on simple linear algebraic operations, e.g., singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters, and present a careful finite sample analysis of our learning method. As an important special case, our results match the best known scaling requirements for the (homogeneous) stochastic block model.
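The 3-star moment underlying the estimator is just an average of common-neighbor indicators, so it can be formed directly from the adjacency matrix. A small sketch (the particular partition into center and leaf nodes is our simplification):

```python
import numpy as np

def three_star_tensor(A, X, Y):
    """Empirical 3-star count tensor: T[a, b, c] is the average, over 'center'
    nodes x in X, of A[x, a] * A[x, b] * A[x, c] for leaf nodes a, b, c in Y.
    This is the kind of low-order moment the spectral method decomposes."""
    B = A[np.ix_(X, Y)].astype(float)
    return np.einsum('xa,xb,xc->abc', B, B, B) / len(X)

# Tiny example: centers {0, 1}, leaves {2, 3}
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
T = three_star_tensor(A, [0, 1], [2, 3])
```

Under the mixed membership model this tensor concentrates around a low-rank symmetric tensor whose components encode the community membership vectors, which is what makes the tensor power iterations of the previous entries applicable.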
Low-rank Tensor Recovery via Iterative Hard Thresholding
Abstract

Cited by 4 (1 self)
Abstract—We study recovery of low-rank tensors from a small number of measurements. A version of the iterative hard thresholding algorithm (TIHT) for the higher-order singular value decomposition (HOSVD) is introduced. As a first step towards the analysis of the algorithm, we define a corresponding tensor restricted isometry property (HOSVD-TRIP) and show that Gaussian and Bernoulli random measurement ensembles satisfy it with high probability.
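The thresholding operator at the heart of TIHT is a truncated HOSVD: each mode unfolding is projected onto its top singular subspace. A numpy sketch of that projection follows; a full TIHT loop would alternate it with a gradient step on the measurement residual, which we omit here:

```python
import numpy as np

def hosvd_truncate(T, ranks):
    """Hard-threshold T toward Tucker rank <= ranks: for each mode k, project
    the mode-k unfolding onto the span of its top ranks[k] left singular
    vectors. This is the truncated-HOSVD step used as TIHT's thresholder."""
    for k, r in enumerate(ranks):
        M = np.moveaxis(T, k, 0).reshape(T.shape[k], -1)   # mode-k unfolding
        U = np.linalg.svd(M, full_matrices=False)[0][:, :r]
        M = U @ (U.T @ M)                                  # rank-r projection
        T = np.moveaxis(M.reshape((T.shape[k],) + T.shape[:k] + T.shape[k + 1:]), 0, k)
    return T

# A rank-(1,1,1) tensor is (up to rounding) a fixed point of the thresholder.
rng = np.random.default_rng(0)
a, b, c = rng.standard_normal((3, 4))
T1 = np.einsum('i,j,k->ijk', a, b, c)
P = hosvd_truncate(T1, (1, 1, 1))
```

Each TIHT iteration would then be `X = hosvd_truncate(X + A_adjoint(y - A(X)), ranks)`; the HOSVD-TRIP condition in the abstract is what controls the contraction of that map.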
Semidefinite relaxations for best rank1 tensor approximations
 SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS
, 2014
Blind Multilinear Identification
Abstract

Cited by 4 (1 self)
Abstract — We discuss a technique that allows blind recovery of signals or blind identification of mixtures in instances where such recovery or identification were previously thought to be impossible. These instances include: 1) closely located or highly correlated sources in antenna array processing; 2) highly correlated spreading codes in code division multiple access (CDMA) radio communication; and 3) nearly dependent spectra in fluorescence spectroscopy. These have important implications. In the case of antenna array processing, it allows for joint localization and extraction of multiple sources from the measurement of a noisy mixture recorded on multiple sensors in an entirely deterministic manner. In the case of CDMA, it allows the possibility of having a number of users larger than the spreading gain. In the case of fluorescence spectroscopy, it allows for detection of nearly identical chemical constituents. The proposed technique involves the solution of a bounded coherence low-rank multilinear approximation problem. We show that bounded coherence allows us to establish existence and uniqueness of the recovered solution. We provide some statistical motivation for the approximation problem and discuss greedy approximation bounds. To provide the theoretical underpinnings for this technique, we develop a corresponding theory of sparse separable decompositions of functions, including notions of rank and nuclear norm that can be specialized to the usual ones for matrices and operators and also be applied to hypermatrices and tensors. Index Terms — Source separation, array signal processing, system identification, channel estimation, remote sensing, fluorescence, function approximation, harmonic analysis, greedy algorithms, inverse problems.
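The coherence quantity being bounded is concrete and cheap to check on a candidate factor matrix. A sketch (the column-normalization convention is our assumption):

```python
import numpy as np

def coherence(A):
    """Coherence of a factor matrix: the largest absolute inner product
    between two distinct unit-normalized columns. Values near 0 mean nearly
    orthogonal factors; values near 1 mean nearly collinear (hard) ones."""
    Q = A / np.linalg.norm(A, axis=0)
    G = np.abs(Q.T @ Q)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

# Columns (1, 0) and (1, 1)/sqrt(2): coherence 1/sqrt(2), i.e. correlated
# but still separable under a bounded-coherence condition.
mu = coherence(np.array([[1.0, 1.0],
                         [0.0, 1.0]]))
```

The paper's existence and uniqueness guarantees kick in when this quantity stays below a threshold, which is exactly what lets it handle the "highly correlated sources" regimes listed above.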
Characterizing Algebraic Invariants by Differential Radical Invariants
Abstract

Cited by 4 (2 self)
We prove that any invariant algebraic set of a given polynomial vector field can be algebraically represented by one polynomial and a finite set of its successive Lie derivatives. This so-called differential radical characterization relies on a sound abstraction of the reachable set of solutions by the smallest variety that contains it. The characterization leads to a differential radical invariant proof rule that is sound and complete, which implies that invariance of algebraic equations over real-closed fields is decidable. Furthermore, the problem of generating invariant varieties is shown to be as hard as minimizing the rank of a symbolic matrix, and is therefore NP-hard. We investigate symbolic linear algebra tools based on Gaussian elimination to efficiently automate the generation. The approach can, e.g., generate nontrivial algebraic invariant equations capturing the airplane behavior during takeoff or landing in longitudinal motion.
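The building block of the characterization, the Lie derivative of a polynomial along the vector field, is straightforward to evaluate. A minimal numeric sketch (the example system is ours): for p(x, y) = x² + y² − 1 and the rotation field f(x, y) = (−y, x), the Lie derivative vanishes identically, witnessing that the unit circle is an invariant algebraic set.

```python
import numpy as np

def lie_derivative(grad_p, f, x):
    """Lie derivative L_f p(x) = <grad p(x), f(x)>: the rate of change of p
    along trajectories of x' = f(x). Successive Lie derivatives of p are the
    polynomials appearing in the differential radical characterization."""
    return float(np.dot(grad_p(x), f(x)))

grad_p = lambda z: np.array([2 * z[0], 2 * z[1]])  # gradient of x^2 + y^2 - 1
f = lambda z: np.array([-z[1], z[0]])              # rotation vector field

val = lie_derivative(grad_p, f, np.array([0.3, -1.7]))  # -2xy + 2xy = 0
```

Here the first Lie derivative is already zero, so the characterization needs no higher-order terms; in general the proof rule accumulates successive Lie derivatives until the chain stabilizes.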
Low Rank Language Models for Small Training Sets
Abstract

Cited by 3 (1 self)
Abstract—Several language model smoothing techniques are available that are effective for a variety of tasks; however, training with small data sets is still difficult. This letter introduces the low rank language model, which uses a low rank tensor representation of joint probability distributions for parameter-tying and optimizes likelihood under a rank constraint. It obtains lower perplexity than standard smoothing techniques when the training set is small, and also leads to perplexity reduction when used in domain adaptation via interpolation with a general, out-of-domain model. Index Terms—Language model, low rank tensor.
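The parameter-tying can be pictured as a latent mixture: a rank-m joint trigram distribution needs on the order of m·d parameters instead of d³. The sketch below uses a symmetric form with shared word-given-topic factors, which is our simplification of the model in the letter:

```python
import numpy as np

# Rank-m joint over word triples:
#   P[i, j, k] = sum_z prior[z] * cond[i, z] * cond[j, z] * cond[k, z]
# (a simplified symmetric low rank tensor parameterization).
rng = np.random.default_rng(0)
vocab, m = 5, 2
prior = np.array([0.6, 0.4])        # mixture weights over latent components z
cond = rng.random((vocab, m))
cond /= cond.sum(axis=0)            # each column: a distribution over words
P = np.einsum('z,iz,jz,kz->ijk', prior, cond, cond, cond)
```

Because each factor column is itself a distribution, the low rank tensor is automatically a valid joint distribution, which is what makes likelihood optimization under the rank constraint well-posed.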