Results 1 
9 of
9
On discriminative Bayesian network classifiers and logistic regression
 Machine Learning
"... Abstract. Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graphtheor ..."
Abstract

Cited by 24 (1 self)
 Add to MetaCart
(Show Context)
Abstract. Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graphtheoretic property. The property holds for naive Bayes but also for more complex structures such as treeaugmented naive Bayes (TAN) as well as for mixed diagnosticdiscriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima so that the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, nonglobal maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
Graphical models and exponential families
 In Proceedings of the 14th Annual Conference on Uncertainty in Arti cial Intelligence (UAI98
, 1998
"... We provide a classification of graphical models according to their representation as subfamilies of exponential families. Undirected graphical models with no hidden variables are linear exponential families (LEFs), directed acyclic graphical models and chain graphs with no hidden variables, includin ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
We provide a classification of graphical models according to their representation as subfamilies of exponential families. Undirected graphical models with no hidden variables are linear exponential families (LEFs), directed acyclic graphical models and chain graphs with no hidden variables, including Bayesian networks with several families of local distributions, are curved exponential families (CEFs) and graphical models with hidden variables are stratified exponential families (SEFs). An SEF is a finite union of CEFs satisfying a frontier condition. In addition, we illustrate how one can automatically generate independence and nonindependence constraints on the distributions over the observable variables implied by a Bayesian network with hidden variables. The relevance of these results for model selection is examined. 1
Mixnets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous And Discrete Variables
, 2000
"... Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kdtrees ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
(Show Context)
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kdtrees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which lowdimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications.
When discriminative learning of Bayesian network parameters is easy
 In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence
, 2003
"... Bayesian network models are widely used for discriminative prediction tasks such as classification. Usually their parameters are determined using 'unsupervised' methods such as maximization of the joint likelihood. The reason is often that it is unclear how to find the parameters maximizin ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
Bayesian network models are widely used for discriminative prediction tasks such as classification. Usually their parameters are determined using 'unsupervised' methods such as maximization of the joint likelihood. The reason is often that it is unclear how to find the parameters maximizing the conditional (supervised) likelihood. We show how the discriminative learning problem can be solved efficiently for a large class of Bayesian network models, including the Naive Bayes (NB) and treeaugmented Naive Bayes (TAN) models. We do this by showing that under a certain general condition on the network structure, the discriminative learning problem is exactly equivalent to logistic regression with unconstrained convex parameter spaces. Hitherto this was known only for Naive Bayes models. Since logistic regression models have a concave loglikelihood surface, the global maximum can be easily found by local optimization methods. 1
Supervised Learning of Bayesian Network Parameters Made Easy
 Level Perspective on Branch Architecture Performance, IEEE Micro28
, 2002
"... Bayesian network models are widely used for supervised prediction tasks such as classification. Usually the parameters of such models are determined using `unsupervised' methods such as maximization of the joint likelihood. In many cases, the reason is that it is not clear how to find the param ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
Bayesian network models are widely used for supervised prediction tasks such as classification. Usually the parameters of such models are determined using `unsupervised' methods such as maximization of the joint likelihood. In many cases, the reason is that it is not clear how to find the parameters maximizing the supervised (conditional) likelihood. We show how the supervised learning problem can be solved e#ciently for a large class of Bayesian network models, including the Naive Bayes (NB) and treeaugmented NB (TAN) classifiers. We do this by showing that under a certain general condition on the network structure, the supervised learning problem is exactly equivalent to logistic regression. Hitherto this was known only for Naive Bayes models. Since logistic regression models have a concave loglikelihood surface, the global maximum can be easily found by local optimization methods.
Fast Factored Density Estimation and Compression with Bayesian Networks
"... Gaussian mixture models, compression, interpolating density trees, conditional density estimation To my family  especially my father, Donald. iv Many important data analysis tasks can be addressed by formulating them as probability estimation problems. For example, a popular general approach to au ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
Gaussian mixture models, compression, interpolating density trees, conditional density estimation To my family  especially my father, Donald. iv Many important data analysis tasks can be addressed by formulating them as probability estimation problems. For example, a popular general approach to automatic classication problems is to learn a probabilistic model of each class from data in which the classes are known, and then use Bayes's rule with these models to predict the correct classes of other data for which they are not known. Anomaly detection and scientic discovery tasks can often be addressed by learning probability models over possible events and then looking for events to which these models assign low probabilities. Many data compression algorithms such as Human coding and arithmetic coding rely on probabilistic models of the data
DRAFT DRAFT Tied and Regularized Conditional Gaussian Graphical Models for Acoustic Modeling in ASR
, 2006
"... Most automatic speech recognition (ASR) systems express probability densities over sequences of acoustic feature vectors using Gaussian or Gaussianmixture hidden Markov models. In this chapter, we explore how graphical models can help describe a variety of tied (i.e., parameter shared) and regulari ..."
Abstract
 Add to MetaCart
Most automatic speech recognition (ASR) systems express probability densities over sequences of acoustic feature vectors using Gaussian or Gaussianmixture hidden Markov models. In this chapter, we explore how graphical models can help describe a variety of tied (i.e., parameter shared) and regularized Gaussian mixture systems. Unlike many previous such tied systems, however, here we allow subportions of the Gaussians to be tied in arbitrary ways. The space of such models includes regularized, tied, and adaptive versions of mixture conditional Gaussian models and also a regularized version of maximumlikelihood linear regression (MLLR). We derive expectationmaximization (EM) update equations and explore consequences to the training algorithm under relevant variants of the equations. In particular, we find that for certain combinations of regularization and/or tying, it is no longer the case that we may achieve a closedform analytic solution to the EM update equations. We describe, however, a generalized EM (GEM) procedure that will still increase the likelihood and has the same fixedpoints as the standard EM algorithm.
Mixnets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous And Discrete Variables
"... Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution�dtrees ( ..."
Abstract
 Add to MetaCart
(Show Context)
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution�dtrees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which lowdimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications. 1
DISTRIBUTION STATEMENT A Approved for Public Release Distribution Unlimited DTIC QUALITY INSPECTED 2
, 2000
"... Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kdtree ..."
Abstract
 Add to MetaCart
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kdtrees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which lowdimensional mixtures of Gaussians over different subsets of the domain's variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modelling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as their possible application to anomaly detection, classification, probabilistic inference, and compression. 1