Results 1 - 10 of 29
Tensor decompositions for learning latent variable models, 2014
"... This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable mo ..."
Abstract
-
Cited by 83 (7 self)
- Add to MetaCart
(Show Context)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
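The core numerical routine the abstract refers to, power iteration on a symmetric third-order tensor with deflation, is compact enough to sketch. The following numpy snippet is a minimal illustration of the basic method under the assumption of an (approximately) orthogonally decomposable input tensor; it omits the robustness analysis that is the paper's main contribution, and all function names are ours.

```python
import numpy as np

def tensor_apply(T, u):
    # Contract a symmetric 3-way tensor with u along modes 2 and 3: T(I, u, u).
    return np.einsum('ijk,j,k->i', T, u, u)

def tensor_power_method(T, k, n_restarts=10, n_iters=100, seed=None):
    # Extract k (eigenvalue, eigenvector) pairs of an (approximately)
    # orthogonally decomposable symmetric tensor by power iteration with
    # deflation. A minimal sketch only: the paper's robust variant adds
    # careful restart schedules and a perturbation analysis.
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    T = T.copy()
    pairs = []
    for _ in range(k):
        best_u, best_lam = None, -np.inf
        for _ in range(n_restarts):            # random restarts
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            for _ in range(n_iters):           # power iterations
                u = tensor_apply(T, u)
                u /= np.linalg.norm(u)
            lam = np.einsum('ijk,i,j,k->', T, u, u, u)
            if lam > best_lam:
                best_u, best_lam = u, lam
        pairs.append((best_lam, best_u))
        # Deflate: remove the recovered rank-one component.
        T -= best_lam * np.einsum('i,j,k->ijk', best_u, best_u, best_u)
    return pairs
```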
Experiments with Spectral Learning of Latent-Variable PCFGs
"... Latent-variable PCFGs (L-PCFGs) are a highly successful model for natural language parsing. Recent work (Cohen et al., 2012) has introduced a spectral algorithm for parameter estimation of L-PCFGs, which—unlike the EM algorithm—is guaranteed to give consistent parameter estimates (it has PAC-style g ..."
Abstract
-
Cited by 22 (8 self)
- Add to MetaCart
(Show Context)
Latent-variable PCFGs (L-PCFGs) are a highly successful model for natural language parsing. Recent work (Cohen et al., 2012) has introduced a spectral algorithm for parameter estimation of L-PCFGs, which, unlike the EM algorithm, is guaranteed to give consistent parameter estimates (it has PAC-style guarantees of sample complexity). This paper describes experiments using the spectral algorithm. We show that the algorithm provides models with the same accuracy as EM, but is an order of magnitude more efficient. We describe a number of key steps used to obtain this level of performance; these should be relevant to other work on the application of spectral learning algorithms. We view our results as strong empirical evidence for the viability of spectral methods as an alternative to EM.
Low-Rank Tensors for Scoring Dependency Structures
"... Accurate scoring of syntactic structures such as head-modifier arcs in dependency parsing typically requires rich, high-dimensional feature representations. A small subset of such features is often se-lected manually. This is problematic when features lack clear linguistic meaning as in embeddings o ..."
Abstract
-
Cited by 19 (5 self)
- Add to MetaCart
(Show Context)
Accurate scoring of syntactic structures such as head-modifier arcs in dependency parsing typically requires rich, high-dimensional feature representations. A small subset of such features is often selected manually. This is problematic when features lack clear linguistic meaning, as in embeddings, or when the information is blended across features. In this paper, we use tensors to map high-dimensional feature vectors into low-dimensional representations. We explicitly maintain the parameters as a low-rank tensor to obtain low-dimensional representations of words in their syntactic roles, and to leverage modularity in the tensor for easy training with online algorithms. Our parser consistently outperforms the Turbo and MST parsers across 14 different languages. We also obtain the best published UAS results on 5 languages.
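The factored parameterization the abstract describes can be illustrated concretely: if the scoring tensor is kept in rank-r Kruskal form, an arc score reduces to an elementwise product of three low-dimensional projections, and the full tensor is never materialized. A minimal sketch, with shapes and component names as our own assumptions rather than the paper's exact parameterization:

```python
import numpy as np

def arc_score(U, V, W, phi_head, phi_mod, phi_arc):
    # Score a head-modifier arc with a rank-r tensor kept in factored
    # (Kruskal) form: the full tensor would be sum_r u_r (x) v_r (x) w_r,
    # but it is never built. U, V, W have shapes (r, d_head), (r, d_mod),
    # (r, d_arc); the phi_* arguments are the head, modifier, and arc
    # feature vectors. Shapes and names are illustrative assumptions.
    return np.sum((U @ phi_head) * (V @ phi_mod) * (W @ phi_arc))
```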
Spectral Learning of General Weighted Automata via Constrained Matrix Completion
"... Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently propo ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have recently been proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and the spectral method, yields a whole new family of algorithms for learning general weighted automata. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning.
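For orientation, the SVD-based recovery step that this line of work extends looks roughly as follows in numpy. The sketch assumes the Hankel blocks are fully observed; in this paper's setting, the missing entries would first be filled in by the constrained matrix completion step, which is not shown. The basis layout and function names are our assumptions.

```python
import numpy as np

def spectral_wfa(H, H_sigma, h_prefix, h_suffix, n):
    # Rank-n recovery of a weighted automaton from a (fully observed)
    # Hankel block. H[u, v] = f(uv) over chosen prefixes u and suffixes v;
    # H_sigma[s][u, v] = f(u s v) for each symbol s; h_prefix[u] = f(u)
    # (empty suffix); h_suffix[v] = f(v) (empty prefix).
    U, D, Vt = np.linalg.svd(H, full_matrices=False)
    V = Vt[:n].T                       # top-n right singular vectors
    P = H @ V                          # forward factor, (|prefixes|, n)
    P_pinv = np.linalg.pinv(P)
    alpha = V.T @ h_suffix             # initial weights
    beta = P_pinv @ h_prefix           # final weights
    A = {s: P_pinv @ Hs @ V for s, Hs in H_sigma.items()}
    return alpha, A, beta

def evaluate(alpha, A, beta, word):
    # f(x1..xt) = alpha^T A_{x1} ... A_{xt} beta
    state = alpha
    for s in word:
        state = A[s].T @ state
    return state @ beta
```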
Spectral Experts for Estimating Mixtures of Linear Regressions
"... Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
(Show Context)
Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of convergence for our estimator and provide an empirical evaluation illustrating its strengths relative to local optimization (EM).
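The moment-recovery idea is easiest to see in the matrix (second-order) case: since E[y^2 | x] = x^T M2 x + sigma^2 with M2 = sum_k pi_k beta_k beta_k^T, the matrix M2 is identified by an ordinary linear regression of y^2 on the entries of x x^T. The sketch below shows only this plain least-squares illustration; the paper's estimator instead uses a low-rank (regularized) regression and continues to the third-order moment tensor, which it factorizes with the tensor power method.

```python
import numpy as np

def second_moment_regression(X, y):
    # For a mixture of linear regressions y = beta_h . x + noise,
    # E[y^2 | x] = x^T M2 x + sigma^2 with M2 = sum_k pi_k beta_k beta_k^T,
    # so M2 is identified by a linear regression of y^2 on the entries of
    # x x^T (the intercept absorbs the noise variance). Plain least squares
    # only, as an illustration of the moment-recovery step.
    n, d = X.shape
    Z = np.hstack([np.einsum('ni,nj->nij', X, X).reshape(n, d * d),
                   np.ones((n, 1))])
    w, *_ = np.linalg.lstsq(Z, y ** 2, rcond=None)
    M2 = w[:-1].reshape(d, d)
    return 0.5 * (M2 + M2.T)           # symmetrize
```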
Spectral dependency parsing with latent variables
In EMNLP-CoNLL, 2012
"... Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn dependency tree structures. We propose a simple yet powerful latent variable generative model for dependency parsing, and a spectral learning method to efficiently estimate it. As a pilot experimental evaluation, we use the spectral tree probabilities estimated by our model to re-rank the outputs of a near state-of-the-art parser. Our approach gives us a moderate reduction in error of up to 4.6% over the baseline re-ranker.
Unsupervised spectral learning of WCFG as low-rank matrix completion
In Proceedings of EMNLP, 2013
"... We derive a spectral method for unsupervised learning of Weighted Context Free Grammars. We frame WCFG induction as finding a Han-kel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. The proposed al-gorithm picks the grammar that agr ..."
Cited by 7 (2 self)
Abstract:
We derive a spectral method for unsupervised learning of Weighted Context-Free Grammars. We frame WCFG induction as finding a Hankel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. The proposed algorithm picks the grammar that agrees with a sample and is the simplest with respect to the nuclear norm of the Hankel matrix.
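The optimization the abstract describes, choosing the simplest Hankel matrix consistent with the sample, has a natural convex formulation: minimize the nuclear norm (a convex surrogate for rank) subject to agreement with the observed values and the linear consistency constraints. A hypothetical cvxpy sketch, not the paper's exact formulation:

```python
import cvxpy as cp

def complete_hankel(shape, observed, extra_constraints=()):
    # Choose the Hankel matrix that agrees with the sample and has minimal
    # nuclear norm. `observed` maps (i, j) index pairs to observed values;
    # `extra_constraints` stands in for the linear inside-outside
    # consistency constraints the abstract mentions, supplied as callables
    # mapping the variable to lists of cvxpy constraints. Names and the
    # exact constraint interface are our assumptions.
    H = cp.Variable(shape)
    cons = [H[i, j] == v for (i, j), v in observed.items()]
    for make in extra_constraints:
        cons += make(H)
    cp.Problem(cp.Minimize(cp.norm(H, "nuc")), cons).solve()
    return H.value
```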
Nonconvex Global Optimization for Latent-Variable Models, 2013
"... Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
(Show Context)
Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ε) if run for long enough, but at any point it can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, given the same running time.
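The anytime behavior described here, a feasible incumbent plus a certified upper bound at every point, is the defining property of best-first branch-and-bound. Below is a generic sketch of that search loop, with the bounding and branching routines left abstract (in the paper they would come from the RLT convex relaxations); the interface is our assumption.

```python
import heapq

def branch_and_bound(root, upper_bound, feasible_value, branch, eps=1e-6):
    # Best-first branch-and-bound for maximization. upper_bound(region)
    # bounds the objective over a region (e.g. via a convex relaxation),
    # feasible_value(region) returns (value, solution) for some feasible
    # point in it, and branch(region) splits a region. At any point,
    # best_val is a valid suboptimal answer and the largest bound still
    # on the heap caps how much better the global maximum can be.
    best_val, best_sol = float('-inf'), None
    heap = [(-upper_bound(root), 0, root)]      # max-heap via negation
    counter = 1                                 # tiebreaker for heapq
    while heap:
        neg_ub, _, region = heapq.heappop(heap)
        if -neg_ub <= best_val + eps:
            break                               # nothing left can beat the incumbent by > eps
        val, sol = feasible_value(region)
        if val > best_val:
            best_val, best_sol = val, sol
        for child in branch(region):
            ub = upper_bound(child)
            if ub > best_val + eps:             # prune hopeless regions
                heapq.heappush(heap, (-ub, counter, child))
                counter += 1
    return best_val, best_sol
```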
Methods of Moments for Learning Stochastic Languages: Unified Presentation and Empirical Comparison
"... Probabilistic latent-variable models are a power-ful tool for modelling structured data. However, traditional expectation-maximization methods of learning such models are both computationally expensive and prone to local-minima. In contrast to these traditional methods, recently developed learning a ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Probabilistic latent-variable models are a powerful tool for modelling structured data. However, traditional expectation-maximization methods of learning such models are both computationally expensive and prone to local minima. In contrast to these traditional methods, recently developed learning algorithms based upon the method of moments are both computationally efficient and provide strong statistical guarantees. In this work we provide a unified presentation and empirical comparison of three general moment-based methods in the context of modelling stochastic languages. By rephrasing these methods upon a common theoretical ground, introducing novel theoretical results where necessary, we provide a clear comparison, making explicit the statistical assumptions upon which each method relies. With this theoretical grounding, we then provide an in-depth empirical analysis of the methods on both real and synthetic data, with the goal of elucidating performance trends and highlighting important implementation details.
Spectral Learning of Refinement HMMs, 2013
"... We derive a spectral algorithm for learning the parameters of a refinement HMM. This method is simple, efficient, and can be applied to a wide range of supervised sequence labeling tasks. Like other spectral methods, it avoids the problem of local optima and provides a consistent estimate of the par ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
We derive a spectral algorithm for learning the parameters of a refinement HMM. This method is simple, efficient, and can be applied to a wide range of supervised sequence labeling tasks. Like other spectral methods, it avoids the problem of local optima and provides a consistent estimate of the parameters. Our experiments on a phoneme recognition task show that when equipped with informative feature functions, it performs significantly better than a supervised HMM and competitively with EM.
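The observable-operator recipe that this family of spectral methods builds on (Hsu, Kakade, and Zhang, 2009) can be sketched in a few lines of numpy; the refinement-HMM algorithm itself differs in its handling of feature functions and latent refinements, so treat this only as the underlying template. Input names and layouts are our assumptions.

```python
import numpy as np

def spectral_hmm(P1, P21, P3x1, m):
    # Observable-operator recovery for an m-state HMM from low-order
    # moments: P1[x] = Pr(x1 = x), P21[i, j] = Pr(x2 = i, x1 = j), and
    # P3x1[x][i, j] = Pr(x3 = i, x2 = x, x1 = j).
    U, _, _ = np.linalg.svd(P21)
    U = U[:, :m]                             # top-m left singular vectors
    b1 = U.T @ P1                            # initial state
    binf = np.linalg.pinv(P21.T @ U) @ P1    # normalization vector
    right = np.linalg.pinv(U.T @ P21)
    B = {x: U.T @ P3 @ right for x, P3 in P3x1.items()}
    return b1, B, binf

def sequence_prob(b1, B, binf, seq):
    # Pr(x1..xt) = binf^T B_{xt} ... B_{x1} b1
    state = b1
    for x in seq:
        state = B[x] @ state
    return binf @ state
```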