Results 1–10 of 24
The Learnability of Naive Bayes
 In: Proceedings of the Canadian Artificial Intelligence Conference, 2005
Abstract

Cited by 156 (0 self)
Naive Bayes is an efficient and effective learning algorithm, but previous results show that its representation ability is severely limited, since it can only represent certain linearly separable functions in the binary domain. We give necessary and sufficient conditions for linearly separable functions in the binary domain to be learnable by Naive Bayes under uniform representation. We then show that the learnability (and error rates) of Naive Bayes can be affected dramatically by sampling distributions. Our results help us to gain a much deeper understanding of this seemingly simple, yet powerful learning algorithm.
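The abstract's central claim — that Naive Bayes over binary features realizes a linear decision boundary — can be checked numerically. The sketch below uses invented conditional probabilities, not any dataset from the paper:

```python
import math

# Hypothetical conditional probabilities for two binary features:
# p[y][i] = P(x_i = 1 | class y)
p = {0: [0.2, 0.7], 1: [0.8, 0.4]}
prior = {0: 0.5, 1: 0.5}

def nb_log_odds(x):
    """Naive Bayes log-odds computed directly from the factored model."""
    s = math.log(prior[1] / prior[0])
    for i, xi in enumerate(x):
        s += math.log((p[1][i] if xi else 1 - p[1][i]) /
                      (p[0][i] if xi else 1 - p[0][i]))
    return s

# The same log-odds as an explicit linear function b + sum_i w_i * x_i.
b = math.log(prior[1] / prior[0]) + sum(
    math.log((1 - p[1][i]) / (1 - p[0][i])) for i in range(2))
w = [math.log(p[1][i] / p[0][i]) - math.log((1 - p[1][i]) / (1 - p[0][i]))
     for i in range(2)]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    linear = b + sum(wi * xi for wi, xi in zip(w, x))
    assert abs(nb_log_odds(x) - linear) < 1e-9
```

Because the log-odds is linear in the binary inputs, any function Naive Bayes represents must be linearly separable — which is exactly why its representation ability is limited.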
A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome
 J. Mol. Biol., 2000
Abstract

Cited by 96 (23 self)
We develop a probabilistic system for predicting the subcellular localization of proteins and estimating the relative population of the various compartments in yeast. Our system employs a Bayesian approach, updating a protein's probability of being in a compartment based on a diverse range of 30 features. These range from specific motifs (e.g. signal sequences or HDEL) to overall properties of a sequence (e.g. surface composition or isoelectric point) to whole-genome data (e.g. absolute mRNA expression levels or their fluctuations). The strength of our approach is the easy integration of many features, particularly the whole-genome expression data. We construct a training and testing set of ~1300 yeast proteins with an experimentally known localization by merging, filtering, and standardizing the annotation in the MIPS, Swiss-Prot and YPD databases, and we achieve 75% accuracy on individual protein predictions using this dataset. Moreover, we are able to estimate the relative protein population of the various compartments without requiring a definite localization for every protein. This approach, which is based on an …
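A minimal sketch of the kind of Bayesian update the abstract describes, with invented compartments, features, and likelihood values (the real system integrates 30 features):

```python
# Hypothetical prior compartment proportions and feature likelihoods.
priors = {"cytoplasm": 0.5, "nucleus": 0.3, "secretory": 0.2}
# likelihood[feature][compartment] = P(feature observed | compartment)
likelihood = {
    "signal_sequence": {"cytoplasm": 0.02, "nucleus": 0.01, "secretory": 0.60},
    "high_mRNA":       {"cytoplasm": 0.40, "nucleus": 0.20, "secretory": 0.25},
}

def posterior(observed_features):
    """Multiply the prior by each observed feature's likelihood, then normalize."""
    post = dict(priors)
    for f in observed_features:
        for c in post:
            post[c] *= likelihood[f][c]
    z = sum(post.values())
    return {c: v / z for c, v in post.items()}

post = posterior(["signal_sequence", "high_mRNA"])
assert abs(sum(post.values()) - 1.0) < 1e-9
# The strongly compartment-specific signal-sequence feature dominates:
assert max(post, key=post.get) == "secretory"
```

Adding a new feature only requires adding one likelihood row, which is the "easy integration of many features" the abstract highlights.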
On supervised selection of Bayesian networks
 In: UAI '99, 1999
Abstract

Cited by 22 (6 self)
Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast to this, in supervised model selection it is a priori known that the chosen model will be used in the future for prediction tasks involving more "focused" predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task and, although frequently used for supervised model selection as well, does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical cross-validation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.
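Dawid's prequential principle mentioned in the abstract scores a model by sequentially predicting each case from only the cases seen before it. A toy sketch with a trivial conditional model (not the paper's Bayesian-network setting):

```python
import math

# Toy data: (x, y) pairs with a binary input and a binary class.
data = [(0, 0), (0, 0), (1, 1), (0, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

def predict(prefix, x):
    """P(y=1 | x) from Laplace-smoothed counts over the prefix (a toy 'model')."""
    n1 = sum(1 for xi, yi in prefix if xi == x and yi == 1)
    n = sum(1 for xi, yi in prefix if xi == x)
    return (n1 + 1) / (n + 2)

# Prequential log-loss: each case is predicted from all earlier cases only,
# so the score directly measures supervised (predictive) performance.
score = 0.0
for t, (x, y) in enumerate(data):
    p1 = predict(data[:t], x)
    score -= math.log(p1 if y == 1 else 1 - p1)
assert score > 0
```

Lower prequential log-loss means better sequential predictions; comparing candidate model structures by this score is the supervised alternative the paper finds most effective.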
Maximum entropy and the glasses you are looking through
 In: Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-2000), 2000
Abstract

Cited by 14 (6 self)
We give an interpretation of the Maximum Entropy (MaxEnt) Principle in game-theoretic terms. Based on this interpretation, we make a formal distinction between different ways of applying Maximum Entropy distributions. MaxEnt has frequently been criticized on the grounds that it leads to highly representation-dependent results. Our distinction allows us to avoid this problem in many cases.
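The representation-dependence criticism the abstract refers to can be shown with a toy calculation: the unconstrained MaxEnt distribution is uniform, so the probability assigned to an outcome depends on how the outcome space is carved up. This is only an illustration, not the paper's game-theoretic analysis:

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

# With no constraints, MaxEnt over three outcomes {A, B, C} is uniform:
p3 = [1/3, 1/3, 1/3]
assert entropy(p3) > entropy([0.5, 0.3, 0.2])  # uniform maximizes entropy

# Describe the SAME situation with two outcomes {A, not-A}; MaxEnt is again
# uniform, but now P(A) = 1/2 instead of 1/3 — a different answer from a
# different representation of the same problem.
p2 = [1/2, 1/2]
assert p3[0] != p2[0]
```

The paper's contribution is a distinction between uses of MaxEnt under which such dependence is harmless and uses under which it is not.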
Mixnets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous and Discrete Variables
2000
Abstract

Cited by 12 (2 self)
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in low-dimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kd-trees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which low-dimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications.
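A bare-bones version of the EM fitting the abstract builds on: a two-component, unit-variance Gaussian mixture in one dimension, with hand-picked toy data. The paper's accelerated, kd-tree-based variant is far more elaborate:

```python
import math

# Toy 1-D data from two well-separated clusters near -2 and +2.
data = [-2.1, -1.9, -2.3, -1.8, -2.0, 2.0, 2.2, 1.8, 2.1, 1.9]

def em(data, mu, steps=50):
    """EM for a 2-component GMM; fixed unit variances keep the sketch short."""
    w = [0.5, 0.5]
    for _ in range(steps):
        # E-step: responsibilities under unit-variance Gaussians.
        resp = []
        for x in data:
            d = [w[k] * math.exp(-0.5 * (x - mu[k]) ** 2) for k in range(2)]
            z = sum(d)
            resp.append([dk / z for dk in d])
        # M-step: re-estimate mixture weights and component means.
        n = [sum(r[k] for r in resp) for k in range(2)]
        w = [nk / len(data) for nk in n]
        mu = [sum(r[k] * x for r, x in zip(resp, data)) / n[k]
              for k in range(2)]
    return w, mu

w, mu = em(data, mu=[-1.0, 1.0])
assert abs(mu[0] + 2.0) < 0.1 and abs(mu[1] - 2.0) < 0.1
```

Moore's acceleration replaces the per-point E-step with cached sufficient statistics over kd-tree nodes, which is what makes fitting fast enough to use inside network-structure search.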
The Representational Power of Discrete Bayesian Networks
 Journal of Machine Learning Research, 2002
Abstract

Cited by 6 (0 self)
One of the most important fundamental properties of Bayesian networks is their representational power, reflecting what kind of functions they can or cannot represent. In this paper, we establish an association between the structural complexity of Bayesian networks and their representational power. We use the maximum number of a node's parents as the measure of the structural complexity of a Bayesian network, and the maximum XOR contained in a target function as the measure of function complexity. A representational upper bound is established and proved. Roughly speaking, discrete Bayesian networks in which each node has at most k parents cannot represent any function containing a (k + 1)-XOR. Our theoretical results help us to gain a deeper understanding of the capacities and limitations of Bayesian networks.
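A small case of the stated bound: in a Naive Bayes structure each feature node has a single parent (the class), and a function containing a 2-XOR is therefore out of reach. A toy check on the four XOR points:

```python
from itertools import product

# All four points of XOR, each once: y = x1 ^ x2.
data = [((x1, x2), x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

def cond(i, v, y):
    """P(x_i = v | y) estimated by counting (no smoothing needed here)."""
    rows = [x for x, yy in data if yy == y]
    return sum(1 for x in rows if x[i] == v) / len(rows)

def nb_score(x, y):
    """Unnormalized Naive Bayes joint P(y) * prod_i P(x_i | y)."""
    p = 0.5  # P(y) is uniform on this data
    for i, v in enumerate(x):
        p *= cond(i, v, y)
    return p

# Every conditional works out to 0.5, so both classes get identical
# scores on every input: the Naive Bayes structure cannot separate XOR.
for x, _ in data:
    assert nb_score(x, 0) == nb_score(x, 1)
```

Raising k (allowing more parents per node) is exactly what buys the capacity to represent higher-order XORs, which is the association the paper formalizes.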
Integrating learning from examples into the search for diagnostic policies
 Journal of Artificial Intelligence Research 24: 263–303, 2005
Abstract

Cited by 4 (0 self)
This paper studies the problem of learning diagnostic policies from training examples. A diagnostic policy is a complete description of the decision-making actions of a diagnostician (i.e., tests followed by a diagnostic decision) for all possible combinations of test results. An optimal diagnostic policy is one that minimizes the expected total cost, which is the sum of measurement costs and misdiagnosis costs. In most diagnostic settings, there is a tradeoff between these two kinds of costs. This paper formalizes diagnostic decision making as a Markov Decision Process (MDP). The paper introduces a new family of systematic search algorithms based on the AO* algorithm to solve this MDP. To make AO* efficient, the paper describes an admissible heuristic that enables AO* to prune large parts of the search space. The paper also introduces several greedy algorithms, including some improvements over previously published methods. The paper then addresses the question of learning diagnostic policies from examples. When the probabilities of diseases and test results are computed from training data, there is a great danger of overfitting. To reduce overfitting, regularizers are integrated into the search algorithms. Finally, the paper compares the proposed methods on five benchmark diagnostic data sets. The studies show that in most cases the systematic search methods produce better diagnostic policies than the greedy methods. In addition, the studies show that for training sets of realistic size, the systematic search algorithms are practical on today’s desktop computers.
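The expected-total-cost tradeoff the abstract describes can be illustrated with a one-test value-of-information calculation (all numbers invented; the paper's AO*-based search handles full multi-test policies):

```python
# Hypothetical setting: diagnose now, or buy one imperfect test first.
p_disease = 0.3                 # prior P(disease)
cost_misdiagnosis = 100.0       # symmetric misdiagnosis cost
cost_test = 5.0
sensitivity, specificity = 0.9, 0.8

def diagnose_now(p):
    """Best immediate diagnosis: you misdiagnose the less likely side."""
    return cost_misdiagnosis * min(p, 1 - p)

# Expected cost of testing first, then diagnosing optimally on each outcome.
p_pos = sensitivity * p_disease + (1 - specificity) * (1 - p_disease)
p_given_pos = sensitivity * p_disease / p_pos
p_given_neg = (1 - sensitivity) * p_disease / (1 - p_pos)
test_then_diagnose = (cost_test
                      + p_pos * diagnose_now(p_given_pos)
                      + (1 - p_pos) * diagnose_now(p_given_neg))

# Here the measurement cost is worth paying:
assert test_then_diagnose < diagnose_now(p_disease)
```

An optimal policy makes this comparison at every belief state over every remaining test, which is the MDP the paper solves with AO*.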
Local sparsity control for Naive Bayes with extreme misclassification costs
Abstract

Cited by 4 (2 self)
In applications of data mining characterized by highly skewed misclassification costs, certain types of errors become virtually unacceptable. This limits the utility of a classifier to a range in which such constraints can be met. Naive Bayes, which has proven to be very useful in text mining applications due to its high scalability, can be particularly affected. Although its 0/1 loss tends to be small, its misclassifications are often made with apparently high confidence. Aside from efforts to better calibrate Naive Bayes scores, it has been shown that its accuracy depends on document sparsity and that feature selection can lead to marked improvement in classification performance. Traditionally, sparsity is controlled globally, and the result for any particular document may vary. In this work we examine the merits of local sparsity control for Naive Bayes in the context of highly asymmetric misclassification costs. In experiments with three benchmark document collections we demonstrate clear advantages of document-level feature selection. In the extreme cost setting, multinomial Naive Bayes with local sparsity control is able to outperform even some of the recently proposed effective improvements to the Naive Bayes classifier. There are also indications that local feature selection may be preferable in different cost settings.
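A minimal sketch of what document-level ("local") sparsity control could look like for multinomial Naive Bayes: score each document with only its own top-k highest-impact terms. The vocabulary, probabilities, and ranking criterion here are invented; the paper's actual selection method may differ:

```python
import math

# Hypothetical per-class term probabilities for a tiny vocabulary.
vocab = ["cash", "meeting", "free", "report", "win"]
p_spam = [0.30, 0.05, 0.30, 0.05, 0.30]
p_ham  = [0.05, 0.40, 0.05, 0.45, 0.05]

def log_odds(doc_counts, keep=None):
    """Multinomial NB log-odds; with `keep`, use only the document's top-k
    terms ranked by |per-term log-likelihood ratio| (local sparsity control)."""
    terms = [(i, c) for i, c in enumerate(doc_counts) if c > 0]
    if keep is not None:
        terms.sort(key=lambda ic: -abs(math.log(p_spam[ic[0]] / p_ham[ic[0]])))
        terms = terms[:keep]
    return sum(c * math.log(p_spam[i] / p_ham[i]) for i, c in terms)

doc = [2, 1, 1, 0, 1]            # term counts over vocab
full = log_odds(doc)             # global model: every present term votes
local = log_odds(doc, keep=2)    # local model: only this doc's top-2 terms
assert full > 0 and local > 0
```

The point of the local variant is that every document is scored at the same, controlled sparsity level, rather than inheriting whatever sparsity a global feature-selection pass happens to leave it with.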
Performance Comparison of Item-to-Item Skills Models with the IRT Single Latent Trait Model
Abstract

Cited by 2 (0 self)
Assessing a learner's mastery of a set of skills is a fundamental issue in intelligent learning environments. We compare the predictive performance of two approaches for training a learner model with domain data. One is based on the principle of building the model solely from observable data items, such as exercises or test items. Skills modelling is not part of the training phase, but is instead dealt with at a later stage. The other approach incorporates a single latent skill in the model. We compare the capacity of both approaches to accurately predict item outcome (binary success or failure) from a subset of item outcomes. Three types of item-to-item models based on standard Bayesian modeling algorithms are tested: (1) Naive Bayes, (2) Tree-Augmented Naive Bayes (TAN), and (3) a K2 Bayesian Classifier. Their performance is compared to the widely used IRT-2PL approach, which incorporates a single latent skill. The results show that the item-to-item approaches perform as well as, or better than, the IRT-2PL approach over 4 widely different data sets, but the differences vary considerably among the data sets. We discuss the implications of these results and the issues relating to the practical use of item-to-item models.
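A sketch of the item-to-item idea: predict one item's outcome from the other items' outcomes with a Naive Bayes-style model built directly on observable items, no latent skill involved. The response matrix below is an invented toy example:

```python
# Toy response matrix: rows = learners, columns = test items (1 = success).
R = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]

def predict_item(target, observed, rows):
    """P(item `target` = 1 | observed item outcomes), Naive Bayes style:
    the other items are treated as independent given the target item."""
    def smoothed(rows_y, i, v):
        hits = sum(1 for r in rows_y if r[i] == v)
        return (hits + 1) / (len(rows_y) + 2)   # Laplace smoothing
    scores = {}
    for y in (0, 1):
        rows_y = [r for r in rows if r[target] == y]
        s = (len(rows_y) + 1) / (len(rows) + 2)
        for i, v in observed.items():
            s *= smoothed(rows_y, i, v)
        scores[y] = s
    return scores[1] / (scores[0] + scores[1])

# A learner who succeeded at items 0-2 is likely to succeed at item 3:
p = predict_item(target=3, observed={0: 1, 1: 1, 2: 1}, rows=R)
assert p > 0.5
```

The TAN and K2 variants tested in the paper relax the independence assumption by allowing extra edges between items, but keep the same observable-items-only training principle.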
Advances in Bayesian Learning
Abstract

Cited by 2 (0 self)
Bayesian learning is a probabilistic approach to building models that combine prior knowledge with new information extracted from data. In the past few years, significant progress has been made in learning graphical models such as Bayesian networks. Bayesian networks provide a compact representation for complex multivariate distributions and accommodate efficient inference algorithms. Bayesian networks have been successfully used in many practical applications, including medical diagnosis, troubleshooting in computer systems, traffic control, signal processing, bioinformatics and web data analysis. This paper provides a brief overview of state-of-the-art approaches to inference and learning in Bayesian networks and discusses further research opportunities.