Results 11  20
of
624
Hierarchical topic models and the nested Chinese restaurant process
 Advances in Neural Information Processing Systems
, 2004
"... We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested ..."
Abstract

Cited by 279 (32 self)
 Add to MetaCart
(Show Context)
We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1
Probabilistic independence networks for hidden Markov probability models
, 1996
"... Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been develop ..."
Abstract

Cited by 191 (13 self)
 Add to MetaCart
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a selfcontained review of the basic principles of PINs. It is shown that the wellknown forwardbackward (FB) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
Adaptive Probabilistic Networks with Hidden Variables
 Machine Learning
, 1997
"... . Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are rapidly becoming the tool of choice for uncertain reasoning in artificial intelligence. In this paper, we investigate the problem ..."
Abstract

Cited by 185 (10 self)
 Add to MetaCart
. Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are rapidly becoming the tool of choice for uncertain reasoning in artificial intelligence. In this paper, we investigate the problem of learning probabilistic networks with known structure and hidden variables. This is an important problem, because structure is much easier to elicit from experts than numbers, and the world is rarely fully observable. We present a gradientbased algorithmand show that the gradient can be computed locally, using information that is available as a byproduct of standard probabilistic network inference algorithms. Our experimental results demonstrate that using prior knowledge about the structure, even with hidden variables, can significantly improve the learning rate of probabilistic networks. We extend the method to networks in which the conditional probability tables are described using a ...
Probability product kernels
 Journal of Machine Learning Research
, 2004
"... The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions. In the probability product kernel, data points in the input space are mapped to distributions over the sample space and a general inner product i ..."
Abstract

Cited by 179 (9 self)
 Add to MetaCart
(Show Context)
The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions. In the probability product kernel, data points in the input space are mapped to distributions over the sample space and a general inner product is then evaluated as the integral of the product of pairs of distributions. The kernel is straightforward to evaluate for all exponential family models such as multinomials and Gaussians and yields interesting nonlinear kernels. Furthermore, the kernel is computable in closed form for latent distributions such as mixture models, hidden Markov models and linear dynamical systems. For intractable models, such as switching linear dynamical systems, structured meanfield approximations can be brought to bear on the kernel evaluation. For general distributions, even if an analytic expression for the kernel is not feasible, we show a straightforward sampling method to evaluate it. Thus, the kernel permits discriminative learning methods, including support vector machines, to exploit the properties, metrics and invariances of the generative models we infer from each datum. Experiments are shown using multinomial models for text, hidden Markov models for biological data sets and linear dynamical systems for time series data.
Variational learning for switching statespace models
 Neural Computation
, 1998
"... We introduce a new statistical model for time series which iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time series models  hidden Ma ..."
Abstract

Cited by 170 (5 self)
 Add to MetaCart
(Show Context)
We introduce a new statistical model for time series which iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time series models  hidden Markov models and linear dynamical systems  and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs et al., 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact Expectation Maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log likelihood and makes use of both the forwardbackward recursions for hidden Markov models and the Kalman lter recursions for linear dynamical systems. We tested the algorithm both on artificial data sets and on a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching statespace models.
Learning Dynamic Bayesian Networks
 In Adaptive Processing of Sequences and Data Structures, Lecture Notes in Artificial Intelligence
, 1998
"... Suppose we wish to build a model of data from a finite sequence of ordered observations, {Y1, Y2,..., Yt}. In most realistic scenarios, from modeling stock prices to physiological data, the observations are not related deterministically. Furthermore, there is added uncertainty resulting from the li ..."
Abstract

Cited by 168 (0 self)
 Add to MetaCart
(Show Context)
Suppose we wish to build a model of data from a finite sequence of ordered observations, {Y1, Y2,..., Yt}. In most realistic scenarios, from modeling stock prices to physiological data, the observations are not related deterministically. Furthermore, there is added uncertainty resulting from the limited size of our data set and any mismatch between our model and the true process. Probability theory provides a powerful tool for expressing both randomness and uncertainty in our model [23]. We can express the uncertainty in our prediction of the future outcome Yt+l via a probability density P(Yt+llY1,..., Yt). Such a probability density can then be used to make point predictions, define error bars, or make decisions that are expected to minimize some loss function. This chapter presents a probabilistic framework for learning models of temporal data. We express these models using the Bayesian network formalism (a.k.a. probabilistic graphical models or belief networks)a marriage of probability theory and graph theory in which dependencies between variables are expressed graphically. The graph not only allows the user to understand which variables
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
 IN ICML
, 2004
"... In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain cond ..."
Abstract

Cited by 167 (13 self)
 Add to MetaCart
(Show Context)
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain conditional random fields (CRFs) in which each time slice contains a set of state variables and edgesa distributed state representation as in dynamic Bayesian networks (DBNs)and parameters are tied across slices. Since exact
Policy Recognition in the Abstract Hidden Markov Model
 Journal of Artificial Intelligence Research
, 2002
"... In this paper, we present a method for recognising an agent's behaviour in dynamic, noisy, uncertain domains, and across multiple levels of abstraction. We term this problem online plan recognition under uncertainty and view it generally as probabilistic inference on the stochastic process rep ..."
Abstract

Cited by 161 (25 self)
 Add to MetaCart
(Show Context)
In this paper, we present a method for recognising an agent's behaviour in dynamic, noisy, uncertain domains, and across multiple levels of abstraction. We term this problem online plan recognition under uncertainty and view it generally as probabilistic inference on the stochastic process representing the execution of the agent's plan. Our contributions in this paper are twofold. In terms of probabilistic inference, we introduce the Abstract Hidden Markov Model (AHMM), a novel type of stochastic processes, provide its dynamic Bayesian network (DBN) structure and analyse the properties of this network. We then describe an application of the RaoBlackwellised Particle Filter to the AHMM which allows us to construct an ecient, hybrid inference method for this model. In terms of plan recognition, we propose a novel plan recognition framework based on the AHMM as the plan execution model. The RaoBlackwellised hybrid inference for AHMM can take advantage of the independence properties inherent in a model of plan execution, leading to an algorithm for online probabilistic plan recognition that scales well with the number of levels in the plan hierarchy. This illustrates that while stochastic models for plan execution can be complex, they exhibit special structures which, if exploited, can lead to efficient plan recognition algorithms. We demonstrate the usefulness of the AHMM framework via a behaviour recognition system in a complex spatial environment using distributed video surveillance data.
Multiresolution markov models for signal and image processing
 Proceedings of the IEEE
, 2002
"... This paper reviews a significant component of the rich field of statistical multiresolution (MR) modeling and processing. These MR methods have found application and permeated the literature of a widely scattered set of disciplines, and one of our principal objectives is to present a single, coheren ..."
Abstract

Cited by 154 (19 self)
 Add to MetaCart
This paper reviews a significant component of the rich field of statistical multiresolution (MR) modeling and processing. These MR methods have found application and permeated the literature of a widely scattered set of disciplines, and one of our principal objectives is to present a single, coherent picture of this framework. A second goal is to describe how this topic fits into the even larger field of MR methods and concepts–in particular making ties to topics such as wavelets and multigrid methods. A third is to provide several alternate viewpoints for this body of work, as the methods and concepts we describe intersect with a number of other fields. The principle focus of our presentation is the class of MR Markov processes defined on pyramidally organized trees. The attractiveness of these models stems from both the very efficient algorithms they admit and their expressive power and broad applicability. We show how a variety of methods and models relate to this framework including models for selfsimilar and 1/f processes. We also illustrate how these methods have been used in practice. We discuss the construction of MR models on trees and show how questions that arise in this context make contact with wavelets, state space modeling of time series, system and parameter identification, and hidden
MachineLearning Research  Four Current Directions
"... Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up super ..."
Abstract

Cited by 144 (1 self)
 Add to MetaCart
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.