Results 1-10 of 33
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
Cited by 759 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
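To make the "factored form" point in the abstract above concrete, here is a minimal sketch that counts free transition parameters for a state made of binary variables. It is an illustration, not taken from the thesis: the function names and the assumption that every variable has the same number of binary parents in the previous time slice are hypothetical.

```python
def flat_hmm_transition_params(num_binary_vars: int) -> int:
    """Free parameters when the whole state is one discrete variable (an HMM)."""
    num_states = 2 ** num_binary_vars
    # Each of the num_states rows is a distribution with num_states - 1 free entries.
    return num_states * (num_states - 1)


def factored_dbn_transition_params(num_binary_vars: int, max_parents: int) -> int:
    """Free parameters when each binary variable depends on `max_parents`
    binary variables in the previous slice (an assumed sparsity pattern)."""
    # One free probability per parent configuration, per variable.
    return num_binary_vars * (2 ** max_parents)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, flat_hmm_transition_params(n), factored_dbn_transition_params(n, 3))
```

Under these assumptions the flat count grows exponentially in the number of variables, while the factored count grows linearly for a fixed number of parents.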
A Guide to the Literature on Learning Probabilistic Networks From Data
, 1996
Cited by 203 (0 self)
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples. Keywords: Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery. I. Introduction Probabilistic networks or probabilistic gra...
An experimental comparison of several clustering and initialization methods
, 1998
Cited by 99 (1 self)
We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation–Maximization (EM) algorithm, a “winner take all” version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative clustering. Although the methods are substantially different, they lead to learned models that are strikingly similar in quality.
An experimental comparison of model-based clustering methods
, 2001
Cited by 62 (1 self)
We compare the three basic algorithms for model-based clustering on high-dimensional discrete-variable datasets. All three algorithms use the same underlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an experimental comparison between three batch algorithms that learn the parameters of this model: the Expectation–Maximization (EM) algorithm, a “winner take all” version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.
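The difference between EM and its "winner take all" variant can be sketched for a naive-Bayes model with a hidden root and binary features. This is a minimal illustration, not the authors' code: the function names, the add-one smoothing in the M-step, the perturbation range, and the synthetic data are all assumptions. The initialization loosely follows scheme (2), perturbing the data marginals.

```python
import math
import random


def log_joint(x, weight, theta):
    """log p(cluster, x) for one cluster of a Bernoulli naive-Bayes model."""
    total = math.log(weight)
    for xi, p in zip(x, theta):
        total += math.log(p if xi else 1.0 - p)
    return total


def responsibilities(x, weights, thetas, hard):
    """E-step for one case: cluster posterior (soft EM) or its argmax (hard)."""
    logs = [log_joint(x, w, t) for w, t in zip(weights, thetas)]
    if hard:
        best = max(range(len(logs)), key=logs.__getitem__)
        return [1.0 if k == best else 0.0 for k in range(len(logs))]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def log_likelihood(data, weights, thetas):
    """Log-likelihood of the data under the mixture (log-sum-exp per case)."""
    total = 0.0
    for x in data:
        logs = [log_joint(x, w, t) for w, t in zip(weights, thetas)]
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total


def fit(data, num_clusters, iterations=20, hard=False, seed=0, smooth=1.0):
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [1.0 / num_clusters] * num_clusters
    # Assumed initialization: randomly perturbed marginals, kept away from 0 and 1.
    marg = [sum(x[i] for x in data) / len(data) for i in range(dim)]
    thetas = [[min(0.95, max(0.05, m + rng.uniform(-0.1, 0.1))) for m in marg]
              for _ in range(num_clusters)]
    history = []
    for _ in range(iterations):
        resp = [responsibilities(x, weights, thetas, hard) for x in data]
        # M-step with additive smoothing so no probability reaches 0 or 1.
        for k in range(num_clusters):
            nk = sum(r[k] for r in resp)
            weights[k] = (nk + smooth) / (len(data) + smooth * num_clusters)
            thetas[k] = [
                (sum(r[k] * x[i] for r, x in zip(resp, data)) + smooth)
                / (nk + 2.0 * smooth)
                for i in range(dim)
            ]
        history.append(log_likelihood(data, weights, thetas))
    return weights, thetas, history


if __name__ == "__main__":
    demo = [(1, 1, 1, 0, 0, 0)] * 30 + [(0, 0, 0, 1, 1, 1)] * 30
    for hard in (False, True):
        _, _, hist = fit(demo, 2, hard=hard)
        print("hard" if hard else "soft", round(hist[-1], 3))
```

Setting `hard=True` replaces each posterior with a one-hot assignment, which is what makes that variant resemble K-means. Because of the smoothing, each soft step increases a penalized objective, so the raw log-likelihood in `history` is not guaranteed to be monotone.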
Update rules for parameter estimation in Bayesian networks
, 1997
Cited by 59 (2 self)
This paper re-examines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in online learning [12]. We provide a unified framework for parameter estimation that encompasses both online learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm [2, 3] and the EM algorithm [14] for Bayesian networks. The framework also leads to new online and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM. 1 Introduction Over the past few years, there has been a growing interest in the problem of le...
Accelerating EM for large databases
 Machine Learning
, 2001
Cited by 44 (1 self)
The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires significant computational resources and has been dismissed as impractical for large databases. We present two approaches that significantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial E-steps for which we can use the results of Neal and Hinton (1998) to obtain the standard convergence guarantees of EM. The first approach is a version of the incremental EM, described in Neal and Hinton (1998), which cycles through data cases in blocks. The number of cases in each block dramatically affects the efficiency of the algorithm. We provide a method for selecting a near-optimal block size. The second approach, which we call lazy EM, will, at scheduled iterations, evaluate the significance of each data case and then proceed for several iterations actively using only the significant cases. We demonstrate that both methods can significantly reduce computational costs through their application to high-dimensional real-world and synthetic mixture modeling problems for large databases. Keywords: Expectation Maximization Algorithm, incremental EM, lazy EM, online EM, data blocking, mixture models, clustering.
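The block-wise partial E-step described above can be sketched on a toy model: a mixture of biased coins, where each data case is the number of heads in a fixed number of flips. This is an illustration under stated assumptions, not the paper's implementation: the model, the function names, and the clamping constant `eps` are hypothetical, `block_size` is simply passed in (the paper's block-size selection method is not reproduced), and lazy EM is not shown.

```python
import math


def posterior(heads, flips, weights, probs):
    """Responsibility of each coin for one case (binomial coefficient cancels)."""
    logs = [math.log(w) + heads * math.log(p) + (flips - heads) * math.log(1.0 - p)
            for w, p in zip(weights, probs)]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def block_stats(block, flips, weights, probs):
    """Partial E-step: expected sufficient statistics for one block of cases."""
    k = len(weights)
    resp_sum, head_sum = [0.0] * k, [0.0] * k
    for heads in block:
        r = posterior(heads, flips, weights, probs)
        for j in range(k):
            resp_sum[j] += r[j]
            head_sum[j] += r[j] * heads
    return resp_sum, head_sum


def m_step(resp_sum, head_sum, flips, num_cases, eps=1e-6):
    """Re-estimate parameters from the running totals, clamped away from 0 and 1."""
    weights = [max(eps, r / num_cases) for r in resp_sum]
    total = sum(weights)
    weights = [w / total for w in weights]
    probs = [min(1.0 - eps, max(eps, h / (flips * r))) if r > 0 else 0.5
             for h, r in zip(head_sum, resp_sum)]
    return weights, probs


def incremental_em(data, flips, block_size, weights, probs, passes=10):
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    k = len(weights)
    # One full pass so that every block has stored statistics to replace later.
    stored = [block_stats(b, flips, weights, probs) for b in blocks]
    resp_sum = [sum(s[0][j] for s in stored) for j in range(k)]
    head_sum = [sum(s[1][j] for s in stored) for j in range(k)]
    weights, probs = m_step(resp_sum, head_sum, flips, len(data))
    for _ in range(passes):
        for b, block in enumerate(blocks):
            new = block_stats(block, flips, weights, probs)
            old = stored[b]
            # Swap this block's old contribution for its new one, then update.
            for j in range(k):
                resp_sum[j] += new[0][j] - old[0][j]
                head_sum[j] += new[1][j] - old[1][j]
            stored[b] = new
            weights, probs = m_step(resp_sum, head_sum, flips, len(data))
    return weights, probs


if __name__ == "__main__":
    cases = [1, 2, 1, 0, 2, 1] * 10 + [9, 8, 10, 9, 8, 9] * 10
    print(incremental_em(cases, 10, 7, [0.5, 0.5], [0.4, 0.6]))
```

The parameters are refreshed after every block rather than after every full pass, which is the source of the speed-up, at the cost of storing one set of statistics per block.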
On discriminative Bayesian network classifiers and logistic regression
 Machine Learning
Cited by 24 (1 self)
Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graph-theoretic property. The property holds for naive Bayes but also for more complex structures such as tree-augmented naive Bayes (TAN) as well as for mixed diagnostic-discriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima so that the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, non-global maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
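The starting point of the abstract above, that a naive Bayes classifier's posterior has the form of a logistic regression, can be checked numerically for binary features and two classes. This is a minimal sketch of that well-known correspondence only, not of the paper's results on general network structures; the function names and parameter values are hypothetical.

```python
import itertools
import math


def naive_bayes_posterior(x, prior1, p_given_1, p_given_0):
    """P(class = 1 | x) computed directly from naive-Bayes parameters."""
    joint1, joint0 = prior1, 1.0 - prior1
    for xi, p1, p0 in zip(x, p_given_1, p_given_0):
        joint1 *= p1 if xi else 1.0 - p1
        joint0 *= p0 if xi else 1.0 - p0
    return joint1 / (joint1 + joint0)


def to_logistic(prior1, p_given_1, p_given_0):
    """Bias and weights of the logistic model that yields the same posterior."""
    bias = math.log(prior1 / (1.0 - prior1))
    weights = []
    for p1, p0 in zip(p_given_1, p_given_0):
        off = math.log((1.0 - p1) / (1.0 - p0))  # log-odds term when x_i = 0
        bias += off
        weights.append(math.log(p1 / p0) - off)  # extra log-odds when x_i = 1
    return bias, weights


def logistic_posterior(x, bias, weights):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    prior1, p1, p0 = 0.3, [0.9, 0.2, 0.6], [0.4, 0.5, 0.1]
    bias, weights = to_logistic(prior1, p1, p0)
    for x in itertools.product((0, 1), repeat=3):
        print(x, naive_bayes_posterior(x, prior1, p1, p0),
              logistic_posterior(x, bias, weights))
```

The mapping goes from generative parameters to a linear log-odds function. Discriminative learning, as discussed in the abstract, instead optimizes the conditional likelihood over such parameters directly.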
Collective Mining of Bayesian Networks from Distributed Heterogeneous Data
, 2002
Cited by 24 (7 self)
We present a collective approach to learning a Bayesian network from distributed heterogeneous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local and non-local variables and transmits a subset of these observations to a central site. Another Bayesian network is learnt at the central site using the data transmitted from the local sites. The local and central Bayesian networks are combined to obtain a collective Bayesian network that models the entire data. Experimental results and theoretical justification that demonstrate the feasibility of our approach are presented.
Data Mining: Research Trends, Challenges, and Applications
 in Rough Sets and Data Mining: Analysis of Imprecise Data
, 1997
Cited by 17 (7 self)
Data mining is an interdisciplinary research area spanning several disciplines such as database systems, machine learning, intelligent information systems, statistics, and expert systems. Data mining has evolved into an important and active area of research because of theoretical challenges and practical applications associated with the problem of discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Many aspects of data mining have been investigated in several related fields. A unique but important aspect of the problem lies in the need to extend these studies to account for the nature of the contents of real-world databases. In this chapter, we discuss the theory and foundational issues in data mining, describe data mining methods and algorithms, and review data mining applications. Since a major focus of this book is on rough sets and its applications to database mining, one full section is devoted to summari...