Results 1–10 of 280
A Unifying Review of Linear Gaussian Models
, 1999
"... Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observa ..."
Abstract

Cited by 348 (18 self)
 Add to MetaCart
(Show Context)
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
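Since the abstract turns on a single basic generative model, a minimal sampling sketch may help fix ideas. It assumes the conventional linear Gaussian state-space form x[t+1] = A x[t] + w[t], y[t] = C x[t] + v[t]; the symbols A, C, Q, R follow common usage rather than text from this listing.

```python
import numpy as np

def sample_lgm(A, C, Q, R, x0, T, rng=None):
    """Draw a trajectory from the basic linear Gaussian generative model:
        x[t+1] = A @ x[t] + w[t],  w ~ N(0, Q)   (hidden state dynamics)
        y[t]   = C @ x[t] + v[t],  v ~ N(0, R)   (noisy observation)
    """
    rng = rng or np.random.default_rng(0)
    k, p = A.shape[0], C.shape[0]
    xs, ys = np.empty((T, k)), np.empty((T, p))
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        xs[t] = x
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
        x = A @ x + rng.multivariate_normal(np.zeros(k), Q)
    return xs, ys
```

Setting A = 0 freezes the dynamics and recovers the static cases (factor analysis, PCA), while a nontrivial A gives the Kalman filter model, mirroring the unification the paper describes.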
Tagging English Text with a Probabilistic Model
, 1994
"... In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used ..."
Abstract

Cited by 300 (0 self)
 Add to MetaCart
In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: using text that has been tagged by hand and computing relative frequency counts, and using text without tags and training the model as a hidden Markov process according to a maximum likelihood principle.
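A minimal sketch of the first, hand-tagged route may clarify it: relative-frequency counts for a triclass (tag-trigram) model. The factorization P(t_i | t_{i-2}, t_{i-1}) P(w_i | t_i) is assumed from the abstract's description, and all names are illustrative; the second route would instead run Baum-Welch over the same parameterization on untagged text.

```python
from collections import Counter

def relative_frequency_estimates(tagged_sents):
    """Estimate a triclass tagging model from hand-tagged sentences:
    P(t3 | t1, t2) ~ c(t1, t2, t3) / c(t1, t2)  (tag-trigram transitions)
    P(w | t)       ~ c(w, t) / c(t)             (word emissions)
    tagged_sents: iterable of [(word, tag), ...] lists."""
    tri, bi, emit, uni = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sents:
        tags = ["<s>", "<s>"] + [t for _, t in sent]  # pad sentence start
        for i in range(2, len(tags)):
            tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
            bi[(tags[i - 2], tags[i - 1])] += 1
        for w, t in sent:
            emit[(w, t)] += 1
            uni[t] += 1
    trans = {k: v / bi[k[:2]] for k, v in tri.items()}
    emis = {k: v / uni[k[1]] for k, v in emit.items()}
    return trans, emis
```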
Hidden Markov processes
 IEEE Trans. Inform. Theory
, 2002
"... Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finite ..."
Abstract

Cited by 259 (5 self)
 Add to MetaCart
(Show Context)
An overview of statistical and information-theoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discrete-time finite-state homogeneous Markov chain observed through a discrete-time memoryless invariant channel. In recent years, the work of Baum and Petrie on finite-state finite-alphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximum-likelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes. These processes generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper. Index Terms: Baum–Petrie algorithm, entropy ergodic theorems, finite-state channels, hidden Markov models, identifiability, Kalman filter, maximum-likelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality.
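The ML-estimation results surveyed here rest on evaluating the likelihood of an observed sequence under an HMP. A minimal sketch of that computation by the standard forward recursion, with scaling for numerical stability (an illustration, not code from the paper):

```python
import numpy as np

def hmp_log_likelihood(pi, A, B, obs):
    """log Pr[obs] for a finite-state HMP: a Markov chain with initial
    distribution pi and transition matrix A, observed through a memoryless
    channel with emission matrix B[state, symbol]."""
    alpha = pi * B[:, obs[0]]          # joint of state and first symbol
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()               # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
        s = alpha.sum()
        ll += np.log(s)
        alpha /= s
    return ll
```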
Learning String Edit Distance
, 1997
"... In many applications, it is necessary to determine the similarity of two strings. A widelyused notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic mo ..."
Abstract

Cited by 248 (2 self)
 Add to MetaCart
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with nearly one fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
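For reference, a minimal implementation of the untrained baseline named in the abstract: the Levenshtein distance, computed by the standard dynamic program. The paper's contribution is, in effect, to replace these fixed unit costs with learned edit probabilities.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    transforming string a into string b (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute / match
        prev = cur
    return prev[-1]

# Example: levenshtein("kitten", "sitting") == 3
```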
Matching Hierarchical Structures Using Association Graphs
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
"... this article, please send email to: tpami@computer.org, and reference IEEECS Log Number 108453 ..."
Abstract

Cited by 217 (27 self)
 Add to MetaCart
(Show Context)
this article, please send email to: tpami@computer.org, and reference IEEECS Log Number 108453
The Maximum Clique Problem
, 1999
"... Contents 1 Introduction 2 1.1 Notations and Definitions . . . . . . . . . . . . . . . . . . . . . . . . 3 2 Problem Formulations 4 2.1 Integer Programming Formulations . . . . . . . . . . . . . . . . . . . 5 2.2 Continuous Formulations . . . . . . . . . . . . . . . . . . . . . . . . 8 3 Computation ..."
Abstract

Cited by 198 (21 self)
 Add to MetaCart
(Show Context)
Contents:
1 Introduction (1.1 Notations and Definitions)
2 Problem Formulations (2.1 Integer Programming Formulations; 2.2 Continuous Formulations)
3 Computational Complexity
4 Bounds and Estimates
5 Exact Algorithms (5.1 Enumerative Algorithms; 5.2 Exact Algorithms for the Unweighted Case; 5.3 Exact Algorithms for the Weighted Case)
6 Heuristics (6.1 Sequential Greedy Heuristics; 6.2 Local Search Heuristics; 6.3 Advanced Search Heuristics: 6.3.1 Simulated annealing; 6.3.2 Neural networks ...)
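To make the heuristics part of the survey concrete, here is a minimal sketch of a sequential greedy heuristic in the spirit of its Section 6.1 (an illustration under the usual adjacency-set representation, not the survey's own pseudocode):

```python
def greedy_clique(adj):
    """Grow a clique by repeatedly taking the candidate vertex with the
    most neighbours among the remaining candidates. Returns a maximal
    clique, not necessarily a maximum one.
    adj: dict mapping each vertex to the set of its neighbours (no self-loops)."""
    clique, candidates = [], set(adj)
    while candidates:
        v = max(candidates, key=lambda u: len(adj[u] & candidates))
        clique.append(v)
        candidates &= adj[v]  # keep only vertices adjacent to everything chosen
    return clique

# Example: greedy_clique({1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}})
# finds the triangle {1, 2, 3} (here also the maximum clique).
```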
Training Tree Transducers
 In HLT-NAACL
, 2004
"... Many probabilistic models for natural language are now written in terms of hierarchical tree structure. Treebased modeling still lacks many of the standard tools taken for granted in (finitestate) stringbased modeling. The theory of tree transducer automata provides a possible framework to ..."
Abstract

Cited by 130 (11 self)
 Add to MetaCart
(Show Context)
Many probabilistic models for natural language are now written in terms of hierarchical tree structure. Tree-based modeling still lacks many of the standard tools taken for granted in (finite-state) string-based modeling. The theory of tree transducer automata provides a possible framework to draw on, as it has been worked out in an extensive literature. We motivate the use of tree transducers for natural language and address the training problem for probabilistic tree-to-tree and tree-to-string transducers.
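A toy illustration of the tree-to-string case may help fix ideas. This unweighted sketch is a deliberate simplification (the paper's transducers are probabilistic and trained from data), and every name in it is invented for the example:

```python
def transduce(tree, rules):
    """Top-down tree-to-string transduction: the rule for a node label is a
    list interleaving literal output tokens with integer indices, each index
    meaning 'recursively transduce that child'."""
    label, children = tree
    if not children:                  # leaf: emit its label as a token
        return [label]
    out = []
    for item in rules[label]:
        if isinstance(item, int):
            out.extend(transduce(children[item], rules))
        else:
            out.append(item)
    return out

# Example: emit the verb phrase before the subject.
tree = ("S", [("NP", [("John", [])]),
              ("VP", [("V", [("saw", [])]), ("NP", [("Mary", [])])])])
rules = {"S": [1, 0], "VP": [0, 1], "V": [0], "NP": [0]}
print(transduce(tree, rules))  # ['saw', 'Mary', 'John']
```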
A Spectral Algorithm for Learning Hidden Markov Models
"... Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard; practitioners typically resort to search heuristics (such as the BaumWelch / EM algorithm) which suffer from ..."
Abstract

Cited by 120 (8 self)
 Add to MetaCart
(Show Context)
Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard; practitioners typically resort to search heuristics (such as the Baum-Welch / EM algorithm) which suffer from the usual local optima issues. We prove that under a natural separation condition (roughly analogous to those considered for learning mixture models), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations; it implicitly depends on this number through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as those in natural language processing, where the observation space is sometimes the words of a language. The algorithm is also simple: it employs only a singular value decomposition and matrix multiplications.
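Since the abstract notes that the algorithm needs only an SVD and matrix products, a compact sketch of the usual observable-operator construction is given below. The empirical statistics P_1, P_{2,1}, P_{3,x,1} and the recovery formulas follow the common presentation of spectral HMM learning; this is an assumption-laden sketch, not the authors' implementation.

```python
import numpy as np

def spectral_hmm(triples, n_symbols, m):
    """Learn observable operators from observation triples (x1, x2, x3)
    of an HMM with m hidden states. Returns (b1, binf, B) such that
    Pr[x_1..x_t] is approximated by binf @ B[x_t] @ ... @ B[x_1] @ b1."""
    P1 = np.zeros(n_symbols)                           # P1[i]    ~ Pr[x1 = i]
    P21 = np.zeros((n_symbols, n_symbols))             # P21[i,j] ~ Pr[x2 = i, x1 = j]
    P31 = np.zeros((n_symbols, n_symbols, n_symbols))  # P31[x][i,j] ~ Pr[x3 = i, x2 = x, x1 = j]
    for x1, x2, x3 in triples:
        P1[x1] += 1
        P21[x2, x1] += 1
        P31[x2][x3, x1] += 1
    n = len(triples)
    P1, P21, P31 = P1 / n, P21 / n, P31 / n
    U = np.linalg.svd(P21)[0][:, :m]                   # top-m left singular vectors
    pinv = np.linalg.pinv(U.T @ P21)
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    B = [U.T @ P31[x] @ pinv for x in range(n_symbols)]
    return b1, binf, B

def seq_prob(b1, binf, B, seq):
    """Estimated probability of an observation sequence."""
    v = b1
    for x in seq:
        v = B[x] @ v
    return float(binf @ v)
```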