Results 1  10
of
54
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 598 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
The Bayes Net Toolbox for MATLAB
 Computing Science and Statistics
, 2001
"... The Bayes Net Toolbox (BNT) is an opensource Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the ..."
Abstract

Cited by 186 (1 self)
 Add to MetaCart
(Show Context)
The Bayes Net Toolbox (BNT) is an opensource Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the web page has received over 28,000 hits since May 2000. In this paper, we discuss a broad spectrum of issues related to graphical models (directed and undirected), and describe, at a highlevel, how BNT was designed to cope with them all. We also compare BNT to other software packages for graphical models, and to the nascent OpenBayes effort.
SpaceAlternating Generalized ExpectationMaximization Algorithm
 IEEE Trans. Signal Processing
, 1994
"... The expectationmaximization (EM) method can facilitate maximizing likelihood functions that arise in statistical estimation problems. In the classical EM paradigm, one iteratively maximizes the conditional loglikelihood of a single unobservable complete data space, rather than maximizing the intra ..."
Abstract

Cited by 137 (26 self)
 Add to MetaCart
(Show Context)
The expectationmaximization (EM) method can facilitate maximizing likelihood functions that arise in statistical estimation problems. In the classical EM paradigm, one iteratively maximizes the conditional loglikelihood of a single unobservable complete data space, rather than maximizing the intractable likelihood function for the measured or incomplete data. EM algorithms update all parameters simultaneously, which has two drawbacks: 1) slow convergence, and 2) difficult maximization steps due to coupling when smoothness penalties are used. This paper describes the spacealternating generalized EM (SAGE) method, which updates the parameters sequentially by alternating between several small hiddendata spaces defined by the algorithm designer. We prove that the sequence of estimates monotonically increases the penalizedlikelihood objective, we derive asymptotic convergence rates, and we provide sufficient conditions for monotone convergence in norm. Two signal processing applicatio...
Parameter expansion to accelerate EM: The PXEM algorithm
, 1998
"... The EM algorithm and its extensions are popular tools for modal estimation but are often criticised for their slow convergence. We propose a new method that can often make EM much faster. The intuitive idea is to use a 'covariance adjustment ' to correct the analysis of the M step, capital ..."
Abstract

Cited by 46 (7 self)
 Add to MetaCart
(Show Context)
The EM algorithm and its extensions are popular tools for modal estimation but are often criticised for their slow convergence. We propose a new method that can often make EM much faster. The intuitive idea is to use a 'covariance adjustment ' to correct the analysis of the M step, capitalising on extra information captured in the imputed complete data. The way we accomplish this is by parameter expansion; we expand the completedata model while preserving the observeddata model and use the expanded completedata model to generate EM. This parameterexpanded EM, PXEM, algorithm shares the simplicity and stability of ordinary EM, but has a faster rate of convergence since its M step performs a more efficient analysis. The PXEM algorithm is illustrated for the multivariate t distribution, a random effects model, factor analysis, probit regression and a Poisson imaging model.
Optimization with EM and ExpectationConjugateGradient
, 2003
"... We show a close relationship between the Expectation  Maximization (EM) algorithm and direct optimization algorithms such as gradientbased methods for parameter learning. ..."
Abstract

Cited by 39 (1 self)
 Add to MetaCart
(Show Context)
We show a close relationship between the Expectation  Maximization (EM) algorithm and direct optimization algorithms such as gradientbased methods for parameter learning.
A Componentwise EM Algorithm for Mixtures
, 1999
"... In some situations, EM algorithm shows slow convergence problems. One possible reason is that standard procedures update the parameters simultaneously. In this paper we focus on nite mixture estimation. In this framework, we propose a componentwise EM, which updates the parameters sequentially. We ..."
Abstract

Cited by 38 (2 self)
 Add to MetaCart
In some situations, EM algorithm shows slow convergence problems. One possible reason is that standard procedures update the parameters simultaneously. In this paper we focus on nite mixture estimation. In this framework, we propose a componentwise EM, which updates the parameters sequentially. We give an interpretation of this procedure as a proximal point algorithm and use it to prove the convergence. Illustrative numerical experiments show how our algorithm compares to EM and a version of the SAGE algorithm.
Accelerating EM for large databases
 Machine Learning
, 2001
"... The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires signi cant computational resources and has been dismissed as impractical for large databases. We presenttwo approaches that signi cantly reduce the ..."
Abstract

Cited by 34 (1 self)
 Add to MetaCart
The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires signi cant computational resources and has been dismissed as impractical for large databases. We presenttwo approaches that signi cantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial Esteps for which we can use the results of Neal and Hinton (1998) to obtain the standard convergence guarantees of EM. The rst approach is a version of the incremental EM, described in Neal and Hinton (1998), which cycles through data cases in blocks. The number of cases in each block dramatically e ects the e ciency of the algorithm. We provide a method for selecting a near optimal block size. The second approach, which we call lazy EM, will, at scheduled iterations, evaluate the signi cance of each data case and then proceed for several iterations actively using only the signi cant cases. We demonstrate that both methods can signi cantly reduce computational costs through their application to highdimensional realworld and synthetic mixture modeling problems for large databases. Keywords: Expectation Maximization Algorithm, incremental EM, lazy EM, online EM, data blocking, mixture models, clustering.
Adaptive Overrelaxed Bound Optimization Methods
 In Proceedings of International Conference on Machine Learning, ICML. International Conference on Machine Learning, ICML
, 2003
"... We study a class of overrelaxed bound optimization algorithms, and their relationship to standard bound optimizers, such as ExpectationMaximization, Iterative Scaling, CCCP and NonNegative Matrix Factorization. We provide a theoretical analysis of the convergence properties of these optimizer ..."
Abstract

Cited by 31 (0 self)
 Add to MetaCart
(Show Context)
We study a class of overrelaxed bound optimization algorithms, and their relationship to standard bound optimizers, such as ExpectationMaximization, Iterative Scaling, CCCP and NonNegative Matrix Factorization. We provide a theoretical analysis of the convergence properties of these optimizers and identify analytic conditions under which they are expected to outperform the standard versions. Based on this analysis, we propose a novel, simple adaptive overrelaxed scheme for practical optimization and report empirical results on several synthetic and realworld data sets showing that these new adaptive methods exhibit superior performance (in certain cases by several orders of magnitude) compared to their traditional counterparts. Our "dropin" extensions are simple to implement, apply to a wide variety of algorithms, almost always give a substantial speedup, and do not require any theoretical analysis of the underlying algorithm.
Accelerated Quantification of Bayesian Networks with Incomplete Data
 In Proceedings of First International Conference on Knowledge Discovery and Data Mining
, 1995
"... Probabilistic expert systems based on Bayesian networks (BNs) require initial specification of both a qualitative graphical structure and quantitative assessment of conditional probability tables. This paper considers statistical batch learning of the probability tables on the basis of incomple ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
Probabilistic expert systems based on Bayesian networks (BNs) require initial specification of both a qualitative graphical structure and quantitative assessment of conditional probability tables. This paper considers statistical batch learning of the probability tables on the basis of incomplete data and expert knowledge. The EM algorithm with a generalized conjugate gradient acceleration method has been dedicated to quantification of BNs by maximum posterior likelihood estimation for a superclass of the recursive graphical models. This new class of models allows a great variety of local functional restrictions to be imposed on the statistical model, which hereby extents the control and applicability of the constructed method for quantifying BNs. Introduction The construction of probabilistic expert systems (Pearl 1988, Andreassen et al. 1989) based on Bayesian networks (BNs) is often a challenging process. It is typically divided into two parts: First the constructi...
Accelerating cyclic update algorithms for parameter estimation by pattern searches
 Neural Processing Letters
"... Abstract. A popular strategy for dealing with large parameter estimation problems is to split the problem into manageable subproblems and solve them cyclically one by one until convergence. A wellknown drawback of this strategy is slow convergence in low noise conditions. We propose using socalled ..."
Abstract

Cited by 16 (9 self)
 Add to MetaCart
(Show Context)
Abstract. A popular strategy for dealing with large parameter estimation problems is to split the problem into manageable subproblems and solve them cyclically one by one until convergence. A wellknown drawback of this strategy is slow convergence in low noise conditions. We propose using socalled pattern searches which consist of an exploratory phase followed by a line search. During the exploratory phase, a search direction is determined by combining the individual updates of all subproblems. The approach can be used to speed up several wellknown learning methods such as variational Bayesian learning (ensemble learning) and expectationmaximization algorithm with modest algorithmic modifications. Experimental results show that the proposed method is able to reduce the required convergence time by 60–85 % in realistic variational Bayesian learning problems.