Results 1  10
of
18
Dependency networks for inference, collaborative filtering, and data visualization
 Journal of Machine Learning Research
"... We describe a graphical model for probabilistic relationshipsan alternative tothe Bayesian networkcalled a dependency network. The graph of a dependency network, unlike aBayesian network, is potentially cyclic. The probability component of a dependency network, like aBayesian network, is a set of ..."
Abstract

Cited by 157 (10 self)
 Add to MetaCart
We describe a graphical model for probabilistic relationshipsan alternative tothe Bayesian networkcalled a dependency network. The graph of a dependency network, unlike aBayesian network, is potentially cyclic. The probability component of a dependency network, like aBayesian network, is a set of conditional distributions, one for each nodegiven its parents. We identify several basic properties of this representation and describe a computationally e cient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative ltering (the task of predicting preferences), and the visualization of acausal predictive relationships.
Maximum Entropy Discrimination
, 1999
"... We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is ..."
Abstract

Cited by 124 (20 self)
 Add to MetaCart
We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under this class and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of classconditional models within this framework. Preliminary experimental results are indicative of the potential in these techniques.
On supervised selection of Bayesian networks
 In UAI99
, 1999
"... Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast to this, in supervised model selection it is a priori ..."
Abstract

Cited by 19 (6 self)
 Add to MetaCart
Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast to this, in supervised model selection it is a priori known that the chosen model will be used in the future for prediction tasks involving more \focused " predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task, and, although frequently used for supervised model selection also, does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classi cation data sets, and compare the results to those obtained by alternative model selection criteria, including empirical crossvalidation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.
On discriminative Bayesian network classifiers and logistic regression
 Machine Learning
"... Abstract. Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graphtheor ..."
Abstract

Cited by 15 (1 self)
 Add to MetaCart
Abstract. Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graphtheoretic property. The property holds for naive Bayes but also for more complex structures such as treeaugmented naive Bayes (TAN) as well as for mixed diagnosticdiscriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima so that the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, nonglobal maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
Classifier Learning with Supervised Marginal Likelihood
"... It has been argued that in supervised classification tasks it may be more sensible to perform model selection with respect to a more focused model selection score, like the supervised (conditional) marginal likelihood, than with respect to the standard unsupervised marginal likelihood criterion ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
It has been argued that in supervised classification tasks it may be more sensible to perform model selection with respect to a more focused model selection score, like the supervised (conditional) marginal likelihood, than with respect to the standard unsupervised marginal likelihood criterion. However, for most Bayesian network models, computing the supervised marginal likelihood score takes exponential time with respect to the amount of observed data. In this paper, we consider diagnostic Bayesian network classifiers where the significant model parameters represent conditional distributions for the class variable, given the values of the predictor variables, in which case the supervised marginal likelihood can be computed in linear time with respect to the data. As the number of model parameters grows in this case exponentially with respect to the number of predictors, we focus on simple diagnostic models where the number of relevant predictors is small, and suggest two approaches for applying this type of models in classification. The first approach is based on mixtures of simple diagnostic models, while in the second approach we apply the small predictor sets of the simple diagnostic models for augmenting the Naive Bayes classifier.
Towards Real Time Discovery from Distributed Information Sources
 In Second pacificasia conference,PAKKD98
, 1998
"... Many successful knowledge discovery or data mining techniques and systems have been developed. These techniques usually apply to centralized databases with less restricted requirements on learning and response time. Not so much effort yet has been put into mining of distributed databases and realti ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
Many successful knowledge discovery or data mining techniques and systems have been developed. These techniques usually apply to centralized databases with less restricted requirements on learning and response time. Not so much effort yet has been put into mining of distributed databases and realtime issues. In this paper, we investigate issues of fast distributed data mining. We assume that merging the distributed database into a single one would be either be too costly (distributed case); or, the individual fragments are nonuniform so that mining only one fragment would bias the result (fragmented case). The goal is to classify objects O of the database into one of several mutually exclusive classes C i . Our approach to make mining fast and feasible is as follows. From each data site or fragment db k , only a single rule r ik is generated for each category C . A small subset {r i1 , ...,r ih } of these individual rules is selected to form a rule set R for each category C i . Thes...
Embedded Bayesian Network Classifiers
, 1997
"... Lowdimensional probability models for local distribution functions in a Bayesian network include decision trees, decision graphs, and causal independence models. We describe a new probability model for discrete Bayesian networks, which we call an embedded Bayesian network classifier or EBNC. The mo ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
Lowdimensional probability models for local distribution functions in a Bayesian network include decision trees, decision graphs, and causal independence models. We describe a new probability model for discrete Bayesian networks, which we call an embedded Bayesian network classifier or EBNC. The model for a node Y given parents X is obtained from a (usually different) Bayesian network for Y and X in which X need not be the parents of Y . We show that an EBNC is a special case of a softmax polynomial regression model. Also, we show how to identify a nonredundant set of parameters for an EBNC, and describe an asymptotic approximation for learning the structure of Bayesian networks that contain EBNCs. Unlike the decision tree, decision graph, and causal independence models, we are unaware of a semantic justification for the use of these models. Experiments are needed to determine whether the models presented in this paper are useful in practice. Keywords: Bayesian networks, model dimen...
Mixnets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous And Discrete Variables
, 2000
"... Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kdtrees ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in lowdimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kdtrees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which lowdimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications.
Comparing Prequential Model Selection Criteria in Supervised Learning of Mixture Models
 Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics
, 2001
"... In this paper we study prequential model selection criteria in supervised learning domains. The main problem with this approach is the fact that the criterion is sensitive to the ordering the data is processed with. We discuss several approaches for addressing the ordering problem, and compare ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
In this paper we study prequential model selection criteria in supervised learning domains. The main problem with this approach is the fact that the criterion is sensitive to the ordering the data is processed with. We discuss several approaches for addressing the ordering problem, and compare empirically their performance in realworld supervised model selection tasks. The empirical results demonstrate that with the prequential approach it is quite easy to find predictive models that are significantly more accurate classifiers than the models found by the standard unsupervised marginal likelihood criterion. The results also suggest that averaging over random orderings may be a more sensible strategy for solving the ordering problem than trying to find the ordering optimizing the prequential model selection criterion. 1
When discriminative learning of Bayesian network parameters is easy
 In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence
, 2003
"... Bayesian network models are widely used for discriminative prediction tasks such as classification. Usually their parameters are determined using 'unsupervised' methods such as maximization of the joint likelihood. The reason is often that it is unclear how to find the parameters maximizing the cond ..."
Abstract

Cited by 6 (1 self)
 Add to MetaCart
Bayesian network models are widely used for discriminative prediction tasks such as classification. Usually their parameters are determined using 'unsupervised' methods such as maximization of the joint likelihood. The reason is often that it is unclear how to find the parameters maximizing the conditional (supervised) likelihood. We show how the discriminative learning problem can be solved efficiently for a large class of Bayesian network models, including the Naive Bayes (NB) and treeaugmented Naive Bayes (TAN) models. We do this by showing that under a certain general condition on the network structure, the discriminative learning problem is exactly equivalent to logistic regression with unconstrained convex parameter spaces. Hitherto this was known only for Naive Bayes models. Since logistic regression models have a concave loglikelihood surface, the global maximum can be easily found by local optimization methods. 1