Results 1–10 of 29
Tensor decompositions for learning latent variable models, 2014
Cited by 83 (7 self)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin’s perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
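The tensor power method at the heart of this abstract admits a compact sketch. Below is an illustrative NumPy version (the function names and the simple deflation scheme are ours, not the paper's exact robust algorithm): it repeatedly applies the map u ↦ T(I, u, u) to recover one component of an orthogonally decomposable symmetric tensor, then deflates and repeats.

```python
import numpy as np

def tensor_apply(T, u):
    # Contract the symmetric 3rd-order tensor T with u twice: T(I, u, u)
    return np.einsum('ijk,j,k->i', T, u, u)

def tensor_power_method(T, n_restarts=10, n_iters=100, seed=0):
    """Recover one (eigenvector, eigenvalue) pair of an orthogonally
    decomposable symmetric tensor via power iteration with restarts."""
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    best_u, best_lam = None, -np.inf
    for _ in range(n_restarts):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = tensor_apply(T, u)
            u /= np.linalg.norm(u)
        lam = tensor_apply(T, u) @ u  # eigenvalue estimate T(u, u, u)
        if lam > best_lam:
            best_u, best_lam = u, lam
    return best_u, best_lam

def decompose(T, k):
    """Deflation: extract k eigenpairs, subtracting each recovered component."""
    T = T.copy()
    pairs = []
    for _ in range(k):
        v, lam = tensor_power_method(T)
        pairs.append((lam, v))
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)
    return pairs
```

On an exactly orthogonal tensor such as T = 2·e1⊗e1⊗e1 + 1·e2⊗e2⊗e2, this recovers the eigenvalues 2 and 1; the paper's contribution is the perturbation analysis showing the method remains stable when T is only an empirical moment estimate.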
Experiments with Spectral Learning of Latent-Variable PCFGs
Cited by 22 (8 self)
Latent-variable PCFGs (L-PCFGs) are a highly successful model for natural language parsing. Recent work (Cohen et al., 2012) has introduced a spectral algorithm for parameter estimation of L-PCFGs, which—unlike the EM algorithm—is guaranteed to give consistent parameter estimates (it has PAC-style guarantees of sample complexity). This paper describes experiments using the spectral algorithm. We show that the algorithm provides models with the same accuracy as EM, but is an order of magnitude more efficient. We describe a number of key steps used to obtain this level of performance; these should be relevant to other work on the application of spectral learning algorithms. We view our results as strong empirical evidence for the viability of spectral methods as an alternative to EM.
Low-Rank Tensors for Scoring Dependency Structures
Cited by 19 (5 self)
Accurate scoring of syntactic structures such as head-modifier arcs in dependency parsing typically requires rich, high-dimensional feature representations. A small subset of such features is often selected manually. This is problematic when features lack clear linguistic meaning, as in embeddings, or when the information is blended across features. In this paper, we use tensors to map high-dimensional feature vectors into low-dimensional representations. We explicitly maintain the parameters as a low-rank tensor to obtain low-dimensional representations of words in their syntactic roles, and to leverage modularity in the tensor for easy training with online algorithms. Our parser consistently outperforms the Turbo and MST parsers across 14 different languages. We also obtain the best published UAS results on 5 languages.
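The low-rank idea above can be made concrete with a small NumPy sketch (our own toy construction, not the paper's feature set or parser): the arc score is a trilinear function of head, modifier, and arc feature vectors, and keeping the tensor in factored rank-r form means the full d×d×d parameter tensor is never materialized.

```python
import numpy as np

def lowrank_arc_score(U, V, W, phi_h, phi_m, phi_a):
    """Score a head-modifier arc with a rank-r tensor held in factored form.

    U, V, W are r x d factor matrices; the implicit parameter tensor is
    T = sum_j U[j] (x) V[j] (x) W[j], but only r-dimensional projections
    of the feature vectors are ever computed."""
    return float(np.sum((U @ phi_h) * (V @ phi_m) * (W @ phi_a)))
```

The factored score agrees exactly with contracting the materialized tensor T = Σ_j U[j] ⊗ V[j] ⊗ W[j] against the three feature vectors, at O(rd) cost instead of O(d³).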
Spectral Learning of General Weighted Automata via Constrained Matrix Completion
Cited by 15 (4 self)
Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have recently been proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and the spectral method, yields a whole new family of algorithms for learning general weighted automata. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning.
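For intuition, the basic spectral step this paper builds on (before its matrix-completion extension) can be sketched as follows. This is a simplified illustration assuming a fully observed, complete Hankel sub-block; the function and variable names are ours.

```python
import numpy as np

def spectral_wfa(H, H_sigmas, h_P, h_S, rank):
    """Learn a weighted automaton from Hankel blocks.

    H[p, s] = f(ps), H_sigmas[sig][p, s] = f(p sig s),
    h_P[p] = f(p), h_S[s] = f(s), for prefixes p and suffixes s."""
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    V = Vt[:rank].T                      # top right singular vectors
    P_fac = H @ V                        # rank factorization H ~ P_fac @ V.T
    P_pinv = np.linalg.pinv(P_fac)
    alpha = h_S @ V                      # initial weight vector
    omega = P_pinv @ h_P                 # final weight vector
    A = {s: P_pinv @ Hs @ V for s, Hs in H_sigmas.items()}
    return alpha, A, omega

def evaluate(alpha, A, omega, word):
    """Compute f(word) = alpha^T A_{w1} ... A_{wn} omega."""
    v = alpha
    for sym in word:
        v = v @ A[sym]
    return float(v @ omega)
```

For example, the function f(x) = 0.5^|x| over the one-letter alphabet {a} has a rank-1 Hankel matrix over prefixes/suffixes {ε, a}, and the recovered one-state automaton reproduces f on longer strings such as "aaa".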
Spectral Experts for Estimating Mixtures of Linear Regressions
Cited by 15 (2 self)
Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of convergence for our estimator and provide an empirical evaluation illustrating its strengths relative to local optimization (EM).
Spectral dependency parsing with latent variables
In EMNLP-CoNLL, 2012
Cited by 10 (3 self)
Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn dependency tree structures. We propose a simple yet powerful latent-variable generative model for dependency parsing, and a spectral learning method to efficiently estimate it. As a pilot experimental evaluation, we use the spectral tree probabilities estimated by our model to rerank the outputs of a near state-of-the-art parser. Our approach gives us a moderate reduction in error of up to 4.6% over the baseline reranker.
Unsupervised spectral learning of WCFG as low-rank matrix completion
In Proceedings of EMNLP, 2013
Cited by 7 (2 self)
We derive a spectral method for unsupervised learning of Weighted Context-Free Grammars. We frame WCFG induction as finding a Hankel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. The proposed algorithm picks the grammar that agrees with a sample and is the simplest with respect to the nuclear norm of the Hankel matrix.
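To make the low-rank Hankel-completion ingredient concrete, here is a toy sketch. Note the substitution: the paper minimizes the nuclear norm under linear constraints, whereas this stand-in imposes a hard rank constraint and alternates SVD truncation with re-imposing the observed entries (a simple "hard impute" heuristic, with none of the inside-outside constraints).

```python
import numpy as np

def complete_lowrank(M_obs, mask, rank, n_iters=500):
    """Fill missing entries (mask == 0) of a matrix assumed to have the
    given rank, by alternating projections: truncate to the target rank,
    then restore the observed entries."""
    X = M_obs * mask
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
        X = mask * M_obs + (1 - mask) * X          # keep observed entries fixed
    return X
```

On a consistent rank-1 example, e.g. the outer product of [1, 2] and [1, 2, 3] with one entry hidden, the iteration converges to the unique rank-1 completion; the nuclear-norm relaxation used in the paper additionally makes the problem convex.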
Nonconvex Global Optimization for Latent-Variable Models, 2013
Cited by 6 (0 self)
Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ε) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
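The branch-and-bound pattern this abstract relies on (maintain an incumbent, prune any subproblem whose optimistic upper bound cannot beat it) is generic. As a self-contained illustration on a far simpler problem than the paper's MIQP, here is a 0/1 knapsack solver with an LP-relaxation upper bound; the example problem is entirely our choice, not the paper's.

```python
import heapq

def knapsack_bb(values, weights, capacity):
    """Best-first branch-and-bound maximization: the fractional (LP
    relaxation) bound is admissible, so the returned value is globally optimal."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)

    def bound(idx, value, room):
        # Optimistic bound: fill remaining room greedily, allowing fractions.
        b = value
        for i in order[idx:]:
            if weights[i] <= room:
                room -= weights[i]
                b += values[i]
            else:
                b += values[i] * room / weights[i]
                break
        return b

    best = 0
    # Max-heap on the upper bound, stored as (-bound, idx, value, room).
    heap = [(-bound(0, 0, capacity), 0, 0, capacity)]
    while heap:
        neg_ub, idx, value, room = heapq.heappop(heap)
        if -neg_ub <= best:      # prune: no remaining node can beat the incumbent
            break
        if idx == n:
            continue
        i = order[idx]
        if weights[i] <= room:   # branch 1: take item i
            take = value + values[i]
            best = max(best, take)
            heapq.heappush(heap, (-bound(idx + 1, take, room - weights[i]),
                                  idx + 1, take, room - weights[i]))
        # branch 2: skip item i
        heapq.heappush(heap, (-bound(idx + 1, value, room), idx + 1, value, room))
    return best
```

The key property mirrors the abstract: at any point, the largest bound left on the heap is a certificate on the global maximum, so the search can be stopped early with an incumbent plus an upper bound.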
Methods of Moments for Learning Stochastic Languages: Unified Presentation and Empirical Comparison
Cited by 5 (1 self)
Probabilistic latent-variable models are a powerful tool for modelling structured data. However, traditional expectation-maximization methods of learning such models are both computationally expensive and prone to local minima. In contrast to these traditional methods, recently developed learning algorithms based upon the method of moments are both computationally efficient and provide strong statistical guarantees. In this work we provide a unified presentation and empirical comparison of three general moment-based methods in the context of modelling stochastic languages. By rephrasing these methods upon a common theoretical ground, introducing novel theoretical results where necessary, we provide a clear comparison, making explicit the statistical assumptions upon which each method relies. With this theoretical grounding, we then provide an in-depth empirical analysis of the methods on both real and synthetic data with the goal of elucidating performance trends and highlighting important implementation details.
Spectral Learning of Refinement HMMs, 2013
Cited by 4 (1 self)
We derive a spectral algorithm for learning the parameters of a refinement HMM. This method is simple, efficient, and can be applied to a wide range of supervised sequence labeling tasks. Like other spectral methods, it avoids the problem of local optima and provides a consistent estimate of the parameters. Our experiments on a phoneme recognition task show that when equipped with informative feature functions, it performs significantly better than a supervised HMM and competitively with EM.