Results 1–10 of 11
A tutorial introduction to the minimum description length principle
In Advances in Minimum Description Length: Theory and Applications, 2005
On Discriminative Bayesian Network Classifiers and Logistic Regression
Machine Learning, 2005
Abstract

Cited by 15 (1 self)
Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graph-theoretic property. The property holds for naive Bayes but also for more complex structures such as tree-augmented naive Bayes (TAN) as well as for mixed diagnostic-discriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima, so the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, non-global maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
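The equivalence the abstract states for the naive Bayes case can be checked numerically: the posterior log-odds of a naive Bayes model over binary features are linear in the features, which is exactly the form of a logistic regression. A minimal sketch (the prior and conditional probabilities below are made-up illustration values, not from the paper):

```python
import math

# Hypothetical binary naive Bayes: class prior and P(x_i = 1 | y) for 3 features.
prior = [0.4, 0.6]
theta = [[0.2, 0.7, 0.5],   # P(x_i = 1 | y = 0)
         [0.8, 0.3, 0.6]]   # P(x_i = 1 | y = 1)

def nb_posterior(x):
    """P(y=1 | x) computed directly from the naive Bayes model."""
    joint = [prior[y] * math.prod(theta[y][i] if xi else 1 - theta[y][i]
                                  for i, xi in enumerate(x))
             for y in (0, 1)]
    return joint[1] / (joint[0] + joint[1])

# The identical posterior as a logistic regression: log-odds linear in x.
w = [math.log(theta[1][i] / theta[0][i])
     - math.log((1 - theta[1][i]) / (1 - theta[0][i])) for i in range(3)]
b = math.log(prior[1] / prior[0]) + sum(
    math.log((1 - theta[1][i]) / (1 - theta[0][i])) for i in range(3))

def lr_posterior(x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))

x = [1, 0, 1]
assert abs(nb_posterior(x) - lr_posterior(x)) < 1e-12
```

Discriminative training then means maximizing the conditional likelihood over (w, b) directly, as in ordinary logistic regression, rather than estimating the theta tables generatively.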
Classifier Learning with Supervised Marginal Likelihood
Abstract

Cited by 9 (4 self)
It has been argued that in supervised classification tasks it may be more sensible to perform model selection with respect to a more focused model selection score, like the supervised (conditional) marginal likelihood, than with respect to the standard unsupervised marginal likelihood criterion. However, for most Bayesian network models, computing the supervised marginal likelihood score takes exponential time with respect to the amount of observed data. In this paper, we consider diagnostic Bayesian network classifiers where the significant model parameters represent conditional distributions for the class variable, given the values of the predictor variables, in which case the supervised marginal likelihood can be computed in linear time with respect to the data. As the number of model parameters in this case grows exponentially with respect to the number of predictors, we focus on simple diagnostic models where the number of relevant predictors is small, and suggest two approaches for applying this type of model in classification. The first approach is based on mixtures of simple diagnostic models, while in the second approach we apply the small predictor sets of the simple diagnostic models to augment the Naive Bayes classifier.
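The linear-time computation referred to above can be sketched as follows, assuming (as an illustration, not necessarily the paper's exact formulation) a symmetric Dirichlet prior on the class distribution for each predictor configuration. The supervised marginal likelihood then factorizes into one Dirichlet-multinomial term per observed configuration, computable in a single pass over the data:

```python
import math
from collections import Counter

def log_supervised_marginal_likelihood(data, alpha=1.0, n_classes=2):
    """Supervised (conditional) marginal likelihood of a diagnostic model
    with a Dirichlet(alpha) prior on P(y | x-configuration).
    data: iterable of (x_config, y) pairs, x_config hashable, y in 0..n_classes-1.
    One pass over the data, hence linear time as the abstract states."""
    counts = Counter()                    # (x_config, y) -> occurrences
    totals = Counter()                    # x_config -> occurrences
    for x, y in data:
        counts[(x, y)] += 1
        totals[x] += 1
    logml = 0.0
    for x, n in totals.items():
        # Dirichlet-multinomial marginal for the class counts seen at x.
        logml += math.lgamma(n_classes * alpha) - math.lgamma(n_classes * alpha + n)
        for y in range(n_classes):
            logml += math.lgamma(alpha + counts[(x, y)]) - math.lgamma(alpha)
    return logml

# Sanity check: two data points with the same predictor configuration and
# different classes, uniform prior -> marginal likelihood 1/2 * 1/3 = 1/6.
data = [((0,), 0), ((0,), 1)]
assert abs(log_supervised_marginal_likelihood(data) + math.log(6)) < 1e-12
```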
Discriminative model selection for belief net structures
In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), 2005
Abstract

Cited by 8 (4 self)
Bayesian belief nets (BNs) are often used for classification tasks, typically to return the most likely class label for a specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function — viz., likelihood, rather than classification accuracy — typically by first using some model selection criterion to identify an appropriate graphical structure, then finding good parameters for that structure. This paper considers a number of possible criteria for selecting the best structure, both generative (i.e., based on likelihood; BIC, BDe) and discriminative (i.e., Conditional BIC (CBIC), resubstitution Classification Error (CE) and Bias²+Variance (BV)). We empirically compare these criteria against a variety of different “correct BN structures”, both real-world and synthetic, over a range of complexities. We also explore different ways to set the parameters, dealing with two issues: (1) Should we seek the parameters that maximize likelihood versus the ones that maximize conditional likelihood? (2) Should we use (i) the entire training sample first to learn the best parameters and then to evaluate the models, versus (ii) only a partition for parameter estimation and another partition for evaluation (cross-validation)? Our results show that the discriminative BV model selection criterion is one of the best measures for identifying the optimal structure, while the discriminative CBIC performs poorly; that we should use the parameters that maximize likelihood; and that it is typically better to use cross-validation here.
Comparing Prequential Model Selection Criteria in Supervised Learning of Mixture Models
In Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics, 2001
Abstract

Cited by 6 (3 self)
In this paper we study prequential model selection criteria in supervised learning domains. The main problem with this approach is that the criterion is sensitive to the order in which the data is processed. We discuss several approaches for addressing the ordering problem, and empirically compare their performance in real-world supervised model selection tasks. The empirical results demonstrate that with the prequential approach it is quite easy to find predictive models that are significantly more accurate classifiers than the models found by the standard unsupervised marginal likelihood criterion. The results also suggest that averaging over random orderings may be a more sensible strategy for solving the ordering problem than trying to find the ordering that optimizes the prequential model selection criterion.
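The prequential idea, and the order-averaging remedy, can be sketched with a toy Bayesian naive-Bayes-style classifier whose conditional predictive score depends on the processing order. This is a simplified stand-in for the models used in the paper, not its actual implementation:

```python
import math
import random
from collections import defaultdict

def preq_score(data, alpha=1.0):
    """Prequential conditional log-score: for each (x, y) in order,
    predict P(y | x, past data) with Laplace-smoothed counts, score the
    observed label, then update the counts with the new example."""
    class_cnt = defaultdict(lambda: alpha)
    feat_cnt = defaultdict(lambda: alpha)      # (class, feature index, value) -> count
    score = 0.0
    for x, y in data:
        post = {}
        for c in (0, 1):
            p = class_cnt[c]
            for i, xi in enumerate(x):
                den = feat_cnt[(c, i, 0)] + feat_cnt[(c, i, 1)]
                p *= feat_cnt[(c, i, xi)] / den
            post[c] = p
        score += math.log(post[y] / (post[0] + post[1]))
        class_cnt[y] += 1
        for i, xi in enumerate(x):
            feat_cnt[(y, i, xi)] += 1
    return score

def averaged_preq_score(data, n_orderings=20, seed=0):
    """Average the prequential score over random orderings, the
    order-averaging strategy the abstract recommends."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_orderings):
        perm = list(data)
        rng.shuffle(perm)
        total += preq_score(perm)
    return total / n_orderings
```

Since each prediction conditions on the current example's features but only the *past* labels and features, the product of predictives is order-dependent, which is exactly why the averaging (or ordering-search) strategies discussed above are needed.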
Unsupervised Bayesian Visualization of High-Dimensional Data
2000
Abstract

Cited by 3 (0 self)
We propose a data reduction method based on a probabilistic similarity framework where two vectors are considered similar if they lead to similar predictions. We show how this type of probabilistic similarity metric can be defined both in a supervised and an unsupervised manner. As a concrete application of the suggested multidimensional scaling scheme, we describe how the method can be used for producing visual images of high-dimensional data, and give several examples of visualizations obtained by using the suggested scheme with probabilistic Bayesian network models.
1. INTRODUCTION. Multidimensional scaling (see, e.g., [3, 2]) is a data compression or data reduction task where the goal is to replace the original high-dimensional data vectors with much shorter vectors, while losing as little information as possible. Intuitively speaking, it can be argued that a pragmatically sensible data reduction scheme is such that two vectors close to each other in the original multidimensional s...
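A rough sketch of the idea: below, total variation between predictive distributions stands in for the paper's probabilistic similarity metric, and a crude stress-minimizing gradient descent stands in for its multidimensional scaling scheme. Both choices are illustrative assumptions, not the paper's method:

```python
import math
import random

def predictive_distance(p, q):
    """Total variation between two predictive distributions: two data
    vectors count as 'similar' if their model predictions are similar."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mds_2d(dist, n_steps=200, lr=0.05, seed=0):
    """Embed n items in 2-D so that embedded distances roughly match the
    target distance matrix `dist`, by gradient descent on squared stress."""
    n = len(dist)
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(n_steps):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                g = (d - dist[i][j]) / d     # gradient of (d - target)^2 / 2
                pos[i][0] -= lr * g * dx
                pos[i][1] -= lr * g * dy
    return pos

# Usage: three items whose model predictions over two classes are known.
preds = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
dist = [[predictive_distance(p, q) for q in preds] for p in preds]
coords = mds_2d(dist)   # 2-D points suitable for plotting
```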
On Text-Based Estimation of Document Relevance
, 2004
Abstract

Cited by 3 (3 self)
This work is part of a proactive information retrieval project that aims at estimating relevance from implicit user feedback. The noisy feedback signal needs to be complemented with all available information, and textual content is one of the natural sources. Here we take the first steps by investigating whether this source is at all useful in the challenging setting of estimating the relevance of a new document based on only a few samples with known relevance. It turns out that even sophisticated unsupervised methods like multinomial PCA (or Latent Dirichlet Allocation) cannot help much. By contrast, feature extraction supervised by relevant auxiliary data may help.
Instance-specific Bayesian model averaging for classification
In Advances in Neural Information Processing Systems 17, 2005
Abstract

Cited by 3 (0 self)
Classification algorithms typically induce population-wide models that are trained to perform well on average on expected future instances. We introduce a Bayesian framework for learning instance-specific models from data that are optimized to predict well for a particular instance. Based on this framework, we present a lazy instance-specific algorithm called ISA that performs selective model averaging over a restricted class of Bayesian networks. On experimental evaluation, this algorithm shows superior performance over model selection. We intend to apply such instance-specific algorithms to improve the performance of patient-specific predictive models induced from medical data.
Bayesian Network Structure Learning by Recursive Autonomy Identification
Raanan Yehezkel, Video Analytics Group
Abstract

Cited by 3 (2 self)
We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction, and structure decomposition into autonomous substructures. The sequence of operations is performed recursively for each autonomous substructure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulting undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means, and due to structure decomposition, learning a structure using RAI requires a smaller number of high-order CI tests. This reduces the complexity and runtime of the algorithm and increases the accuracy by diminishing the curse of dimensionality. When the RAI algorithm learned structures from databases representing synthetic problems, known networks and natural problems, it demonstrated superiority with respect to computational complexity, runtime, structural correctness and classification accuracy over the
Bias Management of Bayesian Network Classifiers
Abstract

Cited by 1 (1 self)
The purpose of this paper is to describe an adaptive algorithm for improving the performance of Bayesian Network Classifiers (BNCs) in an online learning framework. Instead of choosing a priori a particular model class of BNCs, our adaptive algorithm scales up the model’s complexity by gradually increasing the number of allowable dependencies among features. Starting with the simple Naïve Bayes structure, it uses simple decision rules based on qualitative information about the performance dynamics to decide when it makes sense to take the next step in the spectrum of feature dependencies and to start searching for a more complex classifier. Results from experiments using the class of Dependence Bayesian Classifiers on three large datasets show that our algorithm is able to select a model with the appropriate complexity for the current amount of training data, thus balancing the computational cost of updating a model against the benefits of increased accuracy.