Results 1  10
of
36
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 579 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Orderingbased search: A simple and effective algorithm for learning Bayesian networks
 In UAI
, 2005
"... One of the basic tasks for Bayesian networks (BNs) is that of learning a network structure from data. The BNlearning problem is NPhard, so the standard solution is heuristic search. Many approaches have been proposed for this task, but only a very small number outperform the baseline of greedy hill ..."
Abstract

Cited by 48 (0 self)
 Add to MetaCart
One of the basic tasks for Bayesian networks (BNs) is that of learning a network structure from data. The BNlearning problem is NPhard, so the standard solution is heuristic search. Many approaches have been proposed for this task, but only a very small number outperform the baseline of greedy hillclimbing with tabu lists; moreover, many of the proposed algorithms are quite complex and hard to implement. In this paper, we propose a very simple and easytoimplement method for addressing this task. Our approach is based on the wellknown fact that the best network (of bounded indegree) consistent with a given node ordering can be found very efficiently. We therefore propose a search not over the space of structures, but over the space of orderings, selecting for each ordering the best network consistent with it. This search space is much smaller, makes more global search steps, has a lower branching factor, and avoids costly acyclicity checks. We present results for this algorithm on both synthetic and real data sets, evaluating both the score of the network found and in the running time. We show that orderingbased search outperforms the standard baseline, and is competitive with recent algorithms that are much harder to implement. 1
Optimal reinsertion: A new search operator for accelerated and more accurate bayesian network structure learning
 In Proceedings of the 20th Intl. Conf. on Machine Learning
, 2003
"... We show how a conceptually simple search operator called Optimal Reinsertion can be applied to learning Bayesian Network structure from data. On each step we pick a node called the target. We delete all arcs entering or exiting the target. We then find, subject to some constraints, the globally opti ..."
Abstract

Cited by 41 (6 self)
 Add to MetaCart
We show how a conceptually simple search operator called Optimal Reinsertion can be applied to learning Bayesian Network structure from data. On each step we pick a node called the target. We delete all arcs entering or exiting the target. We then find, subject to some constraints, the globally optimal combination of inarcs and outarcs with which to reinsert it. The heart of the paper is a new algorithm called ORSearch which allows each optimal reinsertion step to be computed efficiently on large datasets. Our empirical results compare Optimal Reinsertion against a highly tuned implementation of multirestart hill climbing. The results typically show one to two orders of magnitude speedup on a variety of datasets. They usually show better final results, both in terms of BDEU score and in modeling of future data drawn from the same distribution. 1. Bayesian Network Structure Search Given a dataset of R records and m categorical attributes, how can we find a Bayesian network structure that provides a good model of the data? Happily, the formulation of this question into a welldefined optimization problem is now fairly well understood (Heckerman et al., 1995; Cooper & Herskovits, 1992). However, finding the optimal solution is an NPcomplete problem (Chickering, 1996a). The computational issues in performing heuristic search in this space are also severe, even taking into account the numerous ingenious and effective innovations in recent years (e.g.
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
"... This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likel ..."
Abstract

Cited by 30 (8 self)
 Add to MetaCart
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
Bayesian Network Analysis of Signaling Networks: A Primer
, 2005
"... Highthroughput proteomic data can be used to reveal the connectivity of signaling networks and the influences between signaling molecules. We present a primer on the use of Bayesian networks for this task. Bayesian networks have been successfully used to derive causal influences among biological si ..."
Abstract

Cited by 22 (0 self)
 Add to MetaCart
Highthroughput proteomic data can be used to reveal the connectivity of signaling networks and the influences between signaling molecules. We present a primer on the use of Bayesian networks for this task. Bayesian networks have been successfully used to derive causal influences among biological signaling molecules (for example, in the analysis of intracellular multicolor flow cytometry). We discuss ways to automatically derive a Bayesian network model from proteomic data and to interpret the resulting model.
The Information Bottleneck EM Algorithm
 In Proceedings of UAI
, 2003
"... Learning with hidden variables is a central challenge in probabilistic graphical models that has important implications for many reallife problems. The classical approach is using the Expectation Maximization (EM) algorithm. This algorithm, however, can get trapped in local maxima. In this pap ..."
Abstract

Cited by 17 (2 self)
 Add to MetaCart
Learning with hidden variables is a central challenge in probabilistic graphical models that has important implications for many reallife problems. The classical approach is using the Expectation Maximization (EM) algorithm. This algorithm, however, can get trapped in local maxima. In this paper we explore a new approach that is based on the Information Bottleneck principle.
The Latent Maximum Entropy Principle
 In Proc. of ISIT
, 2002
"... We present an extension to Jaynes' maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is di#erent from both Jaynes' maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of h ..."
Abstract

Cited by 17 (3 self)
 Add to MetaCart
We present an extension to Jaynes' maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is di#erent from both Jaynes' maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained e#ciently for the special case of loglinear modelswhich forms the basis for an e#cient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectationmaximization with iterative scaling to produce feasible loglinear solutions. This algorithm can be interpreted as an alternating minimization algorithm in the information divergence, and reveals an intimate connection between the latent maximum entropy and maximum likelihood principles.
Latent Tree Models and Approximate Inference in Bayesian Networks
 TO APPEAR, AAAI08
, 2008
"... We propose a novel method for approximate inference in Bayesian networks (BNs). The idea is to sample data from a BN, learn a latent tree model (LTM) from the data offline, and when online, make inference with the LTM instead of the original BN. Because LTMs are treestructured, inference takes line ..."
Abstract

Cited by 13 (4 self)
 Add to MetaCart
We propose a novel method for approximate inference in Bayesian networks (BNs). The idea is to sample data from a BN, learn a latent tree model (LTM) from the data offline, and when online, make inference with the LTM instead of the original BN. Because LTMs are treestructured, inference takes linear time. In the meantime, they can represent complex relationship among leaf nodes and hence the approximation accuracy is often good. Empirical evidence shows that our method can achieve good approximation accuracy at low online computational cost.
Convex Structure Learning for Bayesian Networks: Polynomial Feature Selection and Approximate Ordering
, 2006
"... We present a new approach to learning the structure and parameters of a Bayesian network based on regularized estimation in an exponential family representation. Here we show that, given a fixed variable order, the optimal structure and parameters can be learned efficiently, even without restricting ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
We present a new approach to learning the structure and parameters of a Bayesian network based on regularized estimation in an exponential family representation. Here we show that, given a fixed variable order, the optimal structure and parameters can be learned efficiently, even without restricting the size of the parent variable sets. We then consider the problem of optimizing the variable order for a given set of features. This is still a computationally hard problem, but we present a convex relaxation that yields an optimal “soft” ordering in polynomial time. One novel aspect of the approach is that we do not perform a discrete search over DAG structures, nor over variable orders, but instead solve a continuous convex relaxation that can then be rounded to obtain a valid network structure. We conduct an experimental comparison against standard structure search procedures over standard objectives, which cope with local minima, and evaluate the advantages of using convex relaxations that reduce the effects of local minima.
Comparing and Evaluating HMM Ensemble Training Algorithms Using Train and Test and Condition Number Criteria.
 Pattern Analysis and Applications
, 2004
"... Hidden Markov Models have many applications in signal processing and pattern recognition, but their convergencebased training algorithms are known to suffer from oversensitivity to the initial random model choice. This paper describes the boundary between regions in which ensemble learning is supe ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
Hidden Markov Models have many applications in signal processing and pattern recognition, but their convergencebased training algorithms are known to suffer from oversensitivity to the initial random model choice. This paper describes the boundary between regions in which ensemble learning is superior to Rabiner’s multiplesequence BaumWelch training method, and proposes techniques for determining the best method in any arbitrary situation. It also studies the suitability of the training methods using the condition number, a recently proposed diagnostic tool for testing the quality of the model. A new method for training Hidden Markov Models Correspondence to: