Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
Abstract

Cited by 564 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
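The factored-state idea at the heart of this abstract can be sketched in a few lines: when two state variables evolve with the conditional independences a DBN encodes, the flat HMM transition matrix is recoverable as a product of the per-variable tables, at a fraction of the parameter count. The variables, tables, and numbers below are illustrative, not taken from the thesis.

```python
# Sketch: a factored transition model (the core DBN idea) vs. a flat HMM.
# Two hypothetical binary state variables X and Y; each next-slice
# distribution depends only on the previous slice.

from itertools import product

# Per-variable transition tables (6 parameters total, vs. 12 free
# parameters for the flat 4x4 joint transition matrix).
p_x = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.2, (1, 1): 0.5}  # P(x'=1 | x, y)
p_y = {0: 0.7, 1: 0.3}                                       # P(y'=1 | y)

def flat_transition_matrix():
    """Expand the factored model into the 4x4 matrix a flat HMM would store."""
    states = list(product([0, 1], repeat=2))  # joint states (x, y)
    T = {}
    for (x, y) in states:
        for (x2, y2) in states:
            px = p_x[(x, y)] if x2 == 1 else 1.0 - p_x[(x, y)]
            py = p_y[y] if y2 == 1 else 1.0 - p_y[y]
            # Conditional independence given the previous slice lets the
            # joint transition probability factor into a product.
            T[(x, y), (x2, y2)] = px * py
    return T

T = flat_transition_matrix()
# Each row of the expanded matrix is a proper distribution.
for s in product([0, 1], repeat=2):
    assert abs(sum(T[s, s2] for s2 in product([0, 1], repeat=2)) - 1.0) < 1e-12
```

The saving is small here, but with n binary variables the flat matrix grows as 4^n while the factored tables grow only with each variable's parent count, which is what makes DBN representations tractable.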
Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control
 In OSDI
, 2004
Abstract

Cited by 190 (16 self)
building block for automated diagnosis and control
Selectivity Estimation using Probabilistic Models
, 2001
Abstract

Cited by 80 (3 self)
Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, and approximate query answering. In this paper, we show how probabilistic graphical models can be effectively used for this task as an accurate and compact approximation of the joint frequency distribution of multiple attributes across multiple relations. Probabilistic Relational Models (PRMs) are a recent development that extends graphical statistical models such as Bayesian Networks to relational domains. They represent the statistical dependencies between attributes within a table, and between attributes across foreign-key joins. We provide an efficient algorithm for constructing a PRM from a database, and show how a PRM can be used to compute selectivity estimates for a broad class of queries. One of the major contributions of this work is a unified framework for the estimation of queries involving both select and foreign-key join operations. Furthermore, our approach is not limited to answering a small set of predetermined queries; a single model can be used to effectively estimate the sizes of a wide collection of potential queries across multiple tables. We present results for our approach on several real-world databases. For both single-table multi-attribute queries and a general class of select-join queries, our approach produces more accurate estimates than standard approaches to selectivity estimation, using comparable space and time.
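The basic mechanism this abstract describes, using a graphical-model factorization as a compact stand-in for the joint frequency distribution, can be illustrated for a single table. This is a toy sketch of the idea, not the paper's PRM construction algorithm; the table, attributes, and probabilities are invented for the example.

```python
# Sketch: selectivity estimation from a learned factorization
# P(A, B) = P(A) * P(B | A) over a hypothetical 10,000-row table.

n_rows = 10_000
p_a = {"x": 0.3, "y": 0.7}                 # marginal P(A)
p_b_given_a = {"x": {0: 0.8, 1: 0.2},      # conditional P(B | A)
               "y": {0: 0.5, 1: 0.5}}

def estimated_size(a_val, b_val):
    """Estimated result size of SELECT * WHERE A = a_val AND B = b_val."""
    selectivity = p_a[a_val] * p_b_given_a[a_val][b_val]
    return n_rows * selectivity

# Predicate A = "x" AND B = 0: 10000 * 0.3 * 0.8, i.e. about 2400 rows.
size = estimated_size("x", 0)
```

The model stores 5 free parameters instead of the full 4-cell joint histogram per table, and, unlike a fixed set of precomputed histograms, the same model answers any conjunctive predicate over its attributes; the PRM machinery in the paper extends this across foreign-key joins.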
Optimal Reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning
 In Proceedings of the 20th Intl. Conf. on Machine Learning
, 2003
Abstract

Cited by 40 (6 self)
We show how a conceptually simple search operator called Optimal Reinsertion can be applied to learning Bayesian network structure from data. On each step we pick a node called the target. We delete all arcs entering or exiting the target. We then find, subject to some constraints, the globally optimal combination of in-arcs and out-arcs with which to reinsert it. The heart of the paper is a new algorithm called OR-Search which allows each optimal reinsertion step to be computed efficiently on large datasets. Our empirical results compare Optimal Reinsertion against a highly tuned implementation of multi-restart hill climbing. The results typically show one to two orders of magnitude speedup on a variety of datasets. They usually show better final results, both in terms of BDeu score and in modeling of future data drawn from the same distribution.
1. Bayesian Network Structure Search
Given a dataset of R records and m categorical attributes, how can we find a Bayesian network structure that provides a good model of the data? Happily, the formulation of this question into a well-defined optimization problem is now fairly well understood (Heckerman et al., 1995; Cooper & Herskovits, 1992). However, finding the optimal solution is an NP-complete problem (Chickering, 1996a). The computational issues in performing heuristic search in this space are also severe, even taking into account the numerous ingenious and effective innovations in recent years (e.g. ...
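The pick-delete-reinsert loop described above can be sketched directly. This is a heavily simplified illustration: it searches only over the target's parent sets (the paper's operator also optimizes out-arcs and relies on OR-Search for efficiency on large datasets), and the score table is a hypothetical stand-in for a decomposable score such as BDeu.

```python
# Sketch of one Optimal Reinsertion step, simplified to parent-set search.

from itertools import combinations

nodes = ["A", "B", "C", "D"]

# Hypothetical per-family scores for the target node: score(parents).
# In a real learner these would come from a decomposable scoring function.
toy_scores = {
    frozenset(): -10.0,
    frozenset({"A"}): -8.0,
    frozenset({"B"}): -7.5,
    frozenset({"A", "B"}): -6.0,
    frozenset({"D"}): -9.0,
    frozenset({"A", "D"}): -6.5,
}

def optimal_reinsertion(target, parents, max_parents=2):
    """Delete the target's in-arcs, then reinsert the best-scoring parent set."""
    candidates = [n for n in nodes if n != target]
    best_set, best_score = frozenset(), toy_scores[frozenset()]
    for k in range(1, max_parents + 1):
        for ps in combinations(candidates, k):
            s = toy_scores.get(frozenset(ps), float("-inf"))
            if s > best_score:
                best_set, best_score = frozenset(ps), s
    parents[target] = best_set  # acyclicity checks omitted in this sketch
    return best_set, best_score

parents = {n: frozenset() for n in nodes}
best_set, best_score = optimal_reinsertion("C", parents)
```

The point of the operator is that each step makes a globally optimal move for one node rather than a single-arc change, which is why it can escape local optima that trap plain hill climbing.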
Active Learning of Causal Bayes Net Structure
, 2001
Abstract

Cited by 37 (2 self)
We propose a decision-theoretic approach for deciding which interventions to perform so as to learn the causal structure of a model as quickly as possible. Without such interventions, it is impossible to distinguish between Markov-equivalent models, even given infinite data. We perform online MCMC to estimate the posterior over graph structures, and use importance sampling to find the best action to perform at each step. We assume the data is discrete-valued and fully observed.
Generalized Prioritized Sweeping
 Advances in Neural Information Processing Systems
, 1998
Abstract

Cited by 27 (5 self)
Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent's limited computational resources to achieve a good estimate of the value of environment states. To choose effectively where to spend a costly planning step, classic prioritized sweeping uses a simple heuristic to focus computation on the states that are likely to have the largest errors. In this paper, we introduce generalized prioritized sweeping, a principled method for generating such estimates in a representation-specific manner. This allows us to extend prioritized sweeping beyond an explicit, state-based representation to deal with compact representations that are necessary for dealing with large state spaces. We apply this method to generalized model approximators (such as Bayesian networks), and describe preliminary experiments that compare our approach with classical prioritized sweeping.
1 Introduction
In reinforcement learning, there is a trade-off between spending time act...
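The classic heuristic this paper generalizes can be sketched concretely: keep a priority queue of states, back up the highest-priority state first, and when a state's value changes, push its predecessors with priority proportional to the change. The toy deterministic MDP below is invented for the example.

```python
# Sketch of classic prioritized sweeping (the baseline being generalized).

import heapq

gamma = 0.9
# Deterministic toy model: transitions[s] = (reward, next_state).
transitions = {"s0": (0.0, "s1"), "s1": (0.0, "s2"), "s2": (1.0, "s2")}
preds = {"s0": [], "s1": ["s0"], "s2": ["s1", "s2"]}
V = {s: 0.0 for s in transitions}

def backup(s):
    """One Bellman backup under the (deterministic) model."""
    r, s_next = transitions[s]
    return r + gamma * V[s_next]

# Seed the queue with the state where a reward was just observed;
# heapq is a min-heap, so priorities are stored negated.
queue = [(-1.0, "s2")]
for _ in range(20):          # bounded planning budget
    if not queue:
        break
    _, s = heapq.heappop(queue)
    old = V[s]
    V[s] = backup(s)
    delta = abs(V[s] - old)
    if delta > 1e-6:
        # Predecessors now likely have large Bellman error: requeue them
        # with priority proportional to the observed change.
        for p in preds[s]:
            heapq.heappush(queue, (-delta * gamma, p))
```

Computation is spent where value estimates are changing fastest, which is exactly the budget-allocation problem the generalized method recasts for compact (e.g. Bayesian-network) representations.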
Parameter priors for directed acyclic graphical models and the characterization of several probability distributions
 Microsoft Research, Advanced Technology Division
, 1999
Abstract

Cited by 26 (1 self)
We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions, is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let W be an n × n, n ≥ 3, positive-definite symmetric matrix of random variables and f(W) be a pdf of W. Then, f(W) is a Wishart distribution if and only if W11 − W12W22⁻¹W′12 is independent of {W12, W22} for every block partitioning ...
Parallel estimation of distribution algorithms
, 2002
Abstract

Cited by 22 (3 self)
The thesis deals with a new evolutionary paradigm based on the concept of Estimation of Distribution Algorithms (EDAs), which use a probabilistic model of the promising solutions found so far to obtain new candidate solutions to the optimized problem. There are six primary goals of this thesis: 1. Suggestion of a new formal description of the EDA algorithm. This high-level concept can be used to compare the generality of various probabilistic models by comparing the properties of the underlying mappings. Also, some convergence issues are discussed and theoretical ways for further improvements are proposed. 2. Development of a new probabilistic model and methods capable of dealing with continuous parameters. The resulting Mixed Bayesian Optimization Algorithm (MBOA) uses a set of decision trees to express the probability model. Its main advantage over the widely used IDEA and EGNA approaches is its backward compatibility with discrete domains, so it is uniquely capable of learning linkage between mixed continuous-discrete genes. MBOA handles the discretization of continuous parameters as an integral part of the learning process, which outperforms the histogram-based ...
Collective Mining of Bayesian Networks from Distributed Heterogeneous Data
, 2002
Abstract

Cited by 15 (6 self)
We present a collective approach to learning a Bayesian network from distributed heterogeneous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local and non-local variables and transmits a subset of these observations to a central site. Another Bayesian network is learnt at the central site using the data transmitted from the local sites. The local and central Bayesian networks are combined to obtain a collective Bayesian network that models the entire data. Experimental results and theoretical justification that demonstrate the feasibility of our approach are presented.
Aggregating Learned Probabilistic Beliefs
, 2001
Abstract

Cited by 14 (0 self)
We consider the task of aggregating the beliefs of several experts. We assume that these beliefs are represented as probability distributions. We argue that the evaluation of any aggregation technique depends on the semantic context of this task. We propose a framework in which we assume that nature generates samples from a 'true' distribution and different experts form their beliefs based on the subsets of the data they have a chance to observe. Naturally, the optimal aggregate distribution would be the one learned from the combined sample sets. Such a formulation leads to a natural way to measure the accuracy of the aggregation mechanism. We show that the well-known aggregation operator LinOP is ideally suited for that task. We propose a LinOP-based learning algorithm, inspired by the techniques developed for Bayesian learning, which aggregates the experts' distributions represented as Bayesian networks. We show experimentally that this algorithm performs well in practice.
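The LinOP operator the abstract refers to is the linear opinion pool: the aggregate is a weighted average of the experts' distributions. The weighting by sample-set size below is one natural choice under the paper's "experts see subsets of one dataset" framing; the experts and numbers are illustrative.

```python
# Sketch of LinOP (linear opinion pool) aggregation over discrete distributions.

def linop(distributions, weights):
    """Weighted average of distributions sharing the same support."""
    total = sum(weights)
    support = distributions[0].keys()
    return {x: sum(w * d[x] for d, w in zip(distributions, weights)) / total
            for x in support}

expert_a = {"heads": 0.8, "tails": 0.2}   # formed from 30 samples (illustrative)
expert_b = {"heads": 0.4, "tails": 0.6}   # formed from 10 samples

agg = linop([expert_a, expert_b], weights=[30, 10])
# agg["heads"] = (30*0.8 + 10*0.4) / 40, i.e. 0.7
```

Because it is a convex combination, the result is always a valid distribution, and with sample-count weights it coincides with what pooling the raw counts would give for simple frequency estimates, which is one intuition behind the paper's claim that LinOP suits this learning-based setting.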