Results 1  10
of
115
The Infinite Hidden Markov Model
 Machine Learning
, 2002
"... We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. Th ..."
Abstract

Cited by 488 (33 self)
 Add to MetaCart
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying statetransition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infiniteconsider, for example, symbols being possible words appearing in English text.
Coupled hidden Markov models for complex action recognition
, 1996
"... We present algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying twohanded actions. HMMs are perhaps the most successful framework in perceptual computing for modeling and ..."
Abstract

Cited by 368 (17 self)
 Add to MetaCart
We present algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrate their superiority to conventional HMMs in a vision task classifying twohanded actions. HMMs are perhaps the most successful framework in perceptual computing for modeling and classifying dynamic behaviors, popular because they offer dynamic time warping, a training algorithm, and a clear Bayesian semantics. However, the Markovian framework makes strong restrictive assumptions about the system generating the signalthat it is a single process having a small number of states and an extremely limited state memory. The singleprocess model is often inappropriate for vision (and speech) applications, resulting in low ceilings on model performance. Coupled HMMs provide an efficient way to resolve many of these problems, and offer superior training speeds, model likelihoods, and robustness to initial conditions. 1. Introduction Computer vision is turning to problems...
A Bayesian computer vision system for modeling human interactions
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... We describe a realtime computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task [1]. The system is particularly concerned with detecting when interactions between people occur and classifying the type of interaction. Examples of interes ..."
Abstract

Cited by 353 (6 self)
 Add to MetaCart
We describe a realtime computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task [1]. The system is particularly concerned with detecting when interactions between people occur and classifying the type of interaction. Examples of interesting interaction behaviors include following another person, altering one's path to meet another, and so forth. Our system combines topdown with bottomup information in a closed feedback loop, with both components employing a statistical Bayesian approach [2]. We propose and compare two different statebased learning architectures, namely, HMMs and CHMMs for modeling behaviors and interactions. The CHMM model is shown to work much more efficiently and accurately. Finally, to deal with the problem of limited training data, a synthetic ªAlifestyleº training system is used to develop flexible prior models for recognizing human interactions. We demonstrate the ability to use these a priori models to accurately classify real human behaviors and interactions with no additional tuning or training.
Turbo decoding as an instance of Pearl’s belief propagation algorithm
 IEEE Journal on Selected Areas in Communications
, 1998
"... Abstract—In this paper, we will describe the close connection between the now celebrated iterative turbo decoding algorithm of Berrou et al. and an algorithm that has been well known in the artificial intelligence community for a decade, but which is relatively unknown to information theorists: Pear ..."
Abstract

Cited by 309 (15 self)
 Add to MetaCart
Abstract—In this paper, we will describe the close connection between the now celebrated iterative turbo decoding algorithm of Berrou et al. and an algorithm that has been well known in the artificial intelligence community for a decade, but which is relatively unknown to information theorists: Pearl’s belief propagation algorithm. We shall see that if Pearl’s algorithm is applied to the “belief network ” of a parallel concatenation of two or more codes, the turbo decoding algorithm immediately results. Unfortunately, however, this belief diagram has loops, and Pearl only proved that his algorithm works when there are no loops, so an explanation of the excellent experimental performance of turbo decoding is still lacking. However, we shall also show that Pearl’s algorithm can be used to routinely derive previously known iterative, but suboptimal, decoding algorithms for a number of other errorcontrol systems, including Gallager’s
Bayesian Networks Without Tears
 AI MAGAZINE
, 1991
"... I give an introduction to Bayesian networks for AI researchers with a limited grounding in probability theory. Over the last few years, this method of reasoning using probabilities has become popular within the AI probability and uncertainty community. Indeed, it is probably fair to say that Bayesia ..."
Abstract

Cited by 236 (2 self)
 Add to MetaCart
I give an introduction to Bayesian networks for AI researchers with a limited grounding in probability theory. Over the last few years, this method of reasoning using probabilities has become popular within the AI probability and uncertainty community. Indeed, it is probably fair to say that Bayesian networks are to a large segment of the AIuncertainty community what resolution theorem proving is to the AIlogic community. Nevertheless, despite what seems to be their obvious importance, the ideas and techniques have not spread much beyond the research community responsible for them. This is probably because the ideas and techniques are not that easy to understand. I hope to rectify this situation by making Bayesian networks more accessible to the probabilistically unsophisticated.
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
 Machine Learning
, 1997
"... We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MD ..."
Abstract

Cited by 178 (10 self)
 Add to MetaCart
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naiveBayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate. 1
Probabilistic independence networks for hidden Markov probability models
, 1996
"... Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been develop ..."
Abstract

Cited by 167 (12 self)
 Add to MetaCart
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a selfcontained review of the basic principles of PINs. It is shown that the wellknown forwardbackward (FB) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
Dependency networks for inference, collaborative filtering, and data visualization
 Journal of Machine Learning Research
"... We describe a graphical model for probabilistic relationshipsan alternative tothe Bayesian networkcalled a dependency network. The graph of a dependency network, unlike aBayesian network, is potentially cyclic. The probability component of a dependency network, like aBayesian network, is a set of ..."
Abstract

Cited by 159 (10 self)
 Add to MetaCart
We describe a graphical model for probabilistic relationshipsan alternative tothe Bayesian networkcalled a dependency network. The graph of a dependency network, unlike aBayesian network, is potentially cyclic. The probability component of a dependency network, like aBayesian network, is a set of conditional distributions, one for each nodegiven its parents. We identify several basic properties of this representation and describe a computationally e cient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative ltering (the task of predicting preferences), and the visualization of acausal predictive relationships.
Learning dynamic Bayesian networks
 Adaptive Processing of Sequences and Data Structures
, 1998
"... Bayesian networks are directed acyclic graphs that represent dependencies between variables in a probabilistic model. Many time series models, including the hidden Markov models (HMMs) used in speech recognition and Kalman filter models used in filtering and control applications, can be viewed as ex ..."
Abstract

Cited by 124 (0 self)
 Add to MetaCart
Bayesian networks are directed acyclic graphs that represent dependencies between variables in a probabilistic model. Many time series models, including the hidden Markov models (HMMs) used in speech recognition and Kalman filter models used in filtering and control applications, can be viewed as examples of dynamic Bayesian networks. We first provide a brief tutorial on learning and Bayesian networks. We then present some dynamic Bayesian networks that can capture much richer structure than HMMs and Kalman filters, including spatial and temporal multiresolution structure, distributed hidden state representations, and multiple switching linear regimes. While exact probabilistic inference is intractable in these networks, one can obtain tractable variational approximations which call as subroutines the forwardbackward and Kalman filter recursions. These approximations can be used to learn the model parameters...