Results 1–10 of 30
A Spectral Algorithm for Learning Hidden Markov Models
Abstract

Cited by 56 (3 self)
Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard; practitioners typically resort to search heuristics (such as the Baum-Welch / EM algorithm) which suffer from the usual local optima issues. We prove that under a natural separation condition (roughly analogous to those considered for learning mixture models), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations; it depends on this number only implicitly, through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as natural language processing, where the observation space can be the entire vocabulary of a language. The algorithm is also simple: it employs only a singular value decomposition and matrix multiplications.
Learning Latent Tree Graphical Models
 J. of Machine Learning Research, 2011
Abstract

Cited by 19 (6 self)
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables, and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a preprocessing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare ...
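The property behind sibling identification via information distances can be shown with a toy example. In a scalar Gaussian latent tree, correlations multiply along paths, so d(i, j) = -log |corr(i, j)| is additive on the tree, and a four-point comparison recovers the correct sibling split. The tree shape and edge correlations below are made-up illustrative values, not taken from the paper.

```python
import numpy as np

# Hypothetical scalar Gaussian latent tree: hidden h1 -- h2,
# with observed a, b attached to h1 and c, d attached to h2.
edge = {('a', 'h1'): 0.9, ('b', 'h1'): 0.8, ('h1', 'h2'): 0.7,
        ('c', 'h2'): 0.85, ('d', 'h2'): 0.75}

# Correlations multiply along the unique tree path between two nodes.
corr = {
    ('a', 'b'): edge[('a', 'h1')] * edge[('b', 'h1')],
    ('c', 'd'): edge[('c', 'h2')] * edge[('d', 'h2')],
    ('a', 'c'): edge[('a', 'h1')] * edge[('h1', 'h2')] * edge[('c', 'h2')],
    ('a', 'd'): edge[('a', 'h1')] * edge[('h1', 'h2')] * edge[('d', 'h2')],
    ('b', 'c'): edge[('b', 'h1')] * edge[('h1', 'h2')] * edge[('c', 'h2')],
    ('b', 'd'): edge[('b', 'h1')] * edge[('h1', 'h2')] * edge[('d', 'h2')],
}
d = {pair: -np.log(r) for pair, r in corr.items()}   # additive information distance

# Four-point test: the true split {a,b}|{c,d} gives the smallest pairing sum,
# and the two wrong pairings tie (both cross the h1--h2 edge twice).
sums = {'ab|cd': d[('a', 'b')] + d[('c', 'd')],
        'ac|bd': d[('a', 'c')] + d[('b', 'd')],
        'ad|bc': d[('a', 'd')] + d[('b', 'c')]}
best = min(sums, key=sums.get)
```

With estimated correlations the same comparison is done up to a statistical tolerance, which is where the sample complexity analysis enters.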
Multi-view learning of word embeddings via CCA
 In Proc. of NIPS, 2011
Abstract

Cited by 15 (4 self)
Recently, there has been substantial interest in using large amounts of unlabeled data to learn word representations which can then be used as features in supervised classifiers for NLP tasks. However, most current approaches are slow to train, do not model the context of the word, and lack theoretical grounding. In this paper, we present a new learning method, Low Rank Multi-View Learning (LR-MVL), which uses a fast spectral method to estimate low-dimensional, context-specific word representations from unlabeled data. These representation features can then be used with any supervised learner. LR-MVL is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.
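The spectral core of such multi-view methods, CCA computed via an SVD of the whitened cross-covariance, can be sketched as follows. The two feature matrices are synthetic stand-ins for left- and right-context views of word occurrences, not the paper's actual features.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for two views of word occurrences (e.g. left-context
# and right-context feature matrices).
L = rng.normal(size=(500, 6))
R = L @ rng.normal(size=(6, 5)) + 0.1 * rng.normal(size=(500, 5))

# CCA via SVD of the whitened cross-covariance.
Lc, Rc = L - L.mean(axis=0), R - R.mean(axis=0)
Cll, Crr = Lc.T @ Lc / len(L), Rc.T @ Rc / len(R)
Clr = Lc.T @ Rc / len(L)

def invsqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

U, s, Vt = np.linalg.svd(invsqrt(Cll) @ Clr @ invsqrt(Crr))
k = 3
A = invsqrt(Cll) @ U[:, :k]      # CCA projection for view 1
emb = Lc @ A                     # k-dimensional embedding of each occurrence
```

The singular values s are the canonical correlations between the two views; the projections, not the raw features, are what get handed to the downstream supervised learner.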
Kernel Embeddings of Latent Tree Graphical Models
Abstract

Cited by 8 (4 self)
Latent tree graphical models are natural tools for expressing long-range and hierarchical dependencies among many variables, which are common in computer vision, bioinformatics, and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum-free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach.
Spectral Methods for Learning Multivariate Latent Tree Structure
Abstract

Cited by 8 (1 self)
This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables, such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite-sample bounds for exact recovery of the tree structure reveal certain natural dependencies on statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
Spectral dimensionality reduction for HMMs
 CoRR
Abstract

Cited by 4 (2 self)
Hidden Markov Models (HMMs) can be accurately approximated using co-occurrence frequencies of pairs and triples of observations by using a fast spectral method (Hsu et al., 2009), in contrast to the usual slow methods like EM or Gibbs sampling. We provide a new spectral method which significantly reduces the number of model parameters that need to be estimated, and yields a sample complexity that does not depend on the size of the observation vocabulary. We present an elementary proof giving bounds on the relative accuracy of probability estimates from our model. (Corollaries show our bounds can be weakened to provide either L1 bounds or KL bounds, which allow easier direct comparisons to previous work.) Our theorem uses conditions that are checkable from the data, instead of putting conditions on the unobservable Markov transition matrix.
Two-manifold problems with applications to nonlinear system identification
 In Proc. 29th Intl. Conf. on Machine Learning (ICML)
Abstract

Cited by 3 (3 self)
Recently, there has been much interest in spectral approaches to learning manifolds, the so-called kernel eigenmap methods. These methods have had some successes, but their applicability is limited because they are not robust to noise. To address this limitation, we look at two-manifold problems, in which we simultaneously reconstruct two related manifolds, each representing a different view of the same data. By solving these interconnected learning problems together, two-manifold algorithms are able to succeed where a non-integrated approach would fail: each view allows us to suppress noise in the other, reducing bias. We propose a class of algorithms for two-manifold problems, based on spectral decomposition of cross-covariance operators in Hilbert space, and discuss when two-manifold problems are useful. Finally, we demonstrate that solving a two-manifold problem can aid in learning a nonlinear dynamical system from limited data.
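A minimal two-view illustration of why cross-covariances help: with independent per-view noise, the noise terms average out of the cross-covariance, so an SVD of the empirical cross-covariance recovers the shared direction even though each single view is noisy. The data below are synthetic and purely illustrative, not from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=2000)     # shared latent coordinate
# Two synthetic views of the same latent, each with independent noise.
X = np.outer(t, [1.0, 0.5]) + 0.5 * rng.normal(size=(2000, 2))
Y = np.outer(t, [0.3, 1.0]) + 0.5 * rng.normal(size=(2000, 2))

# Independent per-view noise averages out of the cross-covariance E[X^T Y],
# so its SVD recovers the shared signal direction; single-view PCA would
# instead absorb the noise variance into its leading component.
Cxy = (X - X.mean(axis=0)).T @ (Y - Y.mean(axis=0)) / len(t)
U, s, Vt = np.linalg.svd(Cxy)
dir_x = U[:, 0]                           # estimated signal direction in view X
```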
Spectral Learning of General Weighted Automata via Constrained Matrix Completion
Abstract

Cited by 3 (0 self)
Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have recently been proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and the spectral method, yields a whole new family of algorithms for learning general weighted automata. We present generalization bounds for a particular algorithm in this family; the proofs rely on a joint stability analysis of matrix completion and spectral learning.
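With no entries missing, the spectral half of this recipe can be sketched directly: build Hankel blocks from a small toy weighted automaton, truncate the SVD of the Hankel matrix, and recover operators that compute the same function up to a change of basis. The automaton's weights below are arbitrary made-up values, and the constrained-completion step for missing entries is omitted.

```python
import numpy as np

# Hypothetical 2-state weighted automaton over {a, b}:
# f(x) = a1 . A[x1] ... A[xt] . ainf
a1 = np.array([1.0, 0.0])
ainf = np.array([0.3, 0.4])
A = {'a': np.array([[0.5, 0.1], [0.2, 0.3]]),
     'b': np.array([[0.1, 0.3], [0.4, 0.2]])}

def f(x):
    v = a1
    for c in x:
        v = v @ A[c]
    return float(v @ ainf)

# Hankel blocks over the basis prefixes/suffixes P = S = {'', 'a', 'b'}.
P = S = ['', 'a', 'b']
H = np.array([[f(p + s) for s in S] for p in P])
Hs = {c: np.array([[f(p + c + s) for s in S] for p in P]) for c in 'ab'}
hS = np.array([f(s) for s in S])          # first Hankel row
hP = np.array([f(p) for p in P])          # first Hankel column

# Spectral step: truncated SVD of H, then operators via pseudo-inverses.
n = 2                                      # number of states (rank of H)
V = np.linalg.svd(H)[2][:n].T              # top-n right singular vectors
Q = H @ V
B = {c: np.linalg.pinv(Q) @ Hs[c] @ V for c in 'ab'}
b1 = hS @ V
binf = np.linalg.pinv(Q) @ hP

def f_hat(x):
    """Recovered automaton; agrees with f up to a similarity transform."""
    v = b1
    for c in x:
        v = v @ B[c]
    return float(v @ binf)
```

The recovered operators B are similarity transforms of the true A, so f_hat reproduces f on all strings; when Hankel entries are missing, the paper's matrix-completion step fills them in before this spectral step runs.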
A spectral approach for probabilistic grammatical inference on trees
 In: Conference on Algorithmic Learning Theory (ALT), 2010
Abstract

Cited by 2 (0 self)
We focus on the estimation of a probability distribution over a set of trees. We consider here the class of distributions computed by weighted automata, a strict generalization of probabilistic tree automata. This class of distributions (called rational distributions, or rational stochastic tree languages, RSTL) has an algebraic characterization: all the residuals (conditionals) of such distributions lie in a finite-dimensional vector subspace. We propose a methodology based on Principal Components Analysis to identify this vector subspace. We provide an algorithm that computes an estimate of the target residuals' vector subspace and builds a model which computes an estimate of the target distribution.