Results 1–10 of 35
On the optimality of the simple Bayesian classifier under zero-one loss
Machine Learning, 1997
Abstract

Cited by 601 (25 self)
The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependencies suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier’s probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article’s results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
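The conjunction case mentioned in the abstract can be checked directly. The following sketch (illustrative, not taken from the article) estimates naive Bayes parameters from the truth table of c = x1 AND x2 and shows that every input is classified correctly, even though the attributes are dependent given the class and the product-form probability estimates themselves are miscalibrated:

```python
from collections import defaultdict

# Truth table for the conjunction c = x1 AND x2 (uniform over inputs).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# Estimate P(c) and P(x_i | c) by counting (no smoothing, for clarity).
prior = defaultdict(float)
cond = defaultdict(float)  # cond[(i, value, c)] = P(x_i = value | c)
class_count = defaultdict(int)
for x, c in data:
    class_count[c] += 1
for c, n in class_count.items():
    prior[c] = n / len(data)
    for i in range(2):
        for v in (0, 1):
            cond[(i, v, c)] = sum(1 for x, cc in data
                                  if cc == c and x[i] == v) / n

def predict(x):
    # Naive Bayes decision rule: argmax_c P(c) * prod_i P(x_i | c)
    def score(c):
        s = prior[c]
        for i, v in enumerate(x):
            s *= cond[(i, v, c)]
        return s
    return max(prior, key=score)

# Every input of the conjunction is classified correctly despite the
# violated independence assumption (zero-one optimality).
print([predict(x) == c for x, c in data])  # → [True, True, True, True]
```

This matches the abstract's point: zero-one optimality only requires the argmax over classes to be correct, not the probability estimates themselves.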
A Bayesian approach to filtering junk e-mail. In: Learning for Text Categorization: Papers from the 1998 Workshop, AAAI, 1998
Abstract

Cited by 386 (6 self)
In addressing the growing problem of junk e-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user’s mail stream. By casting this problem in a decision-theoretic framework, we are able to make use of probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters which are especially appropriate for the nuances of this task. While this may appear, at first, to be a straightforward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of e-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such filters in a real-world usage scenario, arguing that this technology is mature enough for deployment.
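The differential-misclassification-cost idea can be sketched as a simple decision rule: classify a message as junk only when the expected cost of keeping it exceeds the expected cost of filtering it. The cost values below are illustrative assumptions, not the figures used by the authors:

```python
# Hypothetical asymmetric costs: losing a legitimate message is assumed
# far worse than letting one junk message through.
COST_FP = 100.0  # cost of filtering a legitimate message (assumed)
COST_FN = 1.0    # cost of letting junk through (assumed)

def filter_as_junk(p_junk: float) -> bool:
    # Expected cost of keeping   = P(junk)  * COST_FN
    # Expected cost of filtering = P(legit) * COST_FP
    # Filter only when keeping is the costlier option.
    return p_junk * COST_FN > (1.0 - p_junk) * COST_FP

# With these costs the decision threshold is COST_FP / (COST_FP + COST_FN),
# i.e. roughly 0.990 on the posterior junk probability:
print(filter_as_junk(0.95))   # → False
print(filter_as_junk(0.999))  # → True
```

The asymmetric costs push the posterior threshold far above 0.5, which is the decision-theoretic way of saying that false positives (lost legitimate mail) matter much more than false negatives.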
Parametric inference for biological sequence analysis
 In: Proceedings of the National Academy of Sciences, 2004
Abstract

Cited by 35 (3 self)
One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied towards these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems associated with different statistical models. This paper introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models. The paper develops this new algorithm based on the mathematical foundation for statistical models proposed in [18]. Its relevance for computational biology can be summarized as follows: (a) graphical models are a unifying statistical framework for biological sequence analysis; (b) parametric inference is important for obtaining biologically meaningful results.
BAYDA: Software for Bayesian Classification and Feature Selection
 Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998
Abstract

Cited by 13 (9 self)
BAYDA is a software package for flexible data analysis in predictive data mining tasks. The mathematical model underlying the program is based on a simple Bayesian network, the Naive Bayes classifier. It is well known that the Naive Bayes classifier performs well in predictive data mining tasks, when compared to approaches using more complex models. However, the model makes strong independence assumptions that are frequently violated in practice. For this reason, the BAYDA software also provides a feature selection scheme which can be used for analyzing the problem domain, and for improving the prediction accuracy of the models constructed by BAYDA. The scheme is based on a novel Bayesian feature selection criterion introduced in this paper. The suggested criterion is inspired by the Cheeseman-Stutz approximation for computing the marginal likelihood of Bayesian networks with hidden variables. The empirical results with several widely used data sets demonstrate that the automated Bayesian...
Parameter Learning in Object Oriented Bayesian Networks
2001
Abstract

Cited by 13 (5 self)
This paper describes a method for parameter learning in Object-Oriented Bayesian Networks (OOBNs). We propose a methodology for learning parameters in OOBNs, and prove that maintaining the object orientation imposed by the prior model will increase the learning speed in object-oriented domains. We also propose a method to efficiently estimate the probability parameters in domains that are not strictly object-oriented. Finally, we attack type uncertainty, a special case of model uncertainty typical of object-oriented domains.
Latent Variable Discovery in Classification Models
2004
Abstract

Cited by 10 (2 self)
The naive Bayes model makes the often unrealistic assumption that feature variables are mutually independent given the class variable. We interpret the violation of this assumption as an indication of the presence of latent variables and show how latent variables can be detected. Latent variable discovery is interesting, especially for medical applications, because it can lead to better understanding of application domains. It can also improve classification accuracy and boost user confidence in classification models.
Learning Bayesian networks for clustering by means of constructive induction
 Pattern Recognition Letters, 1999
Abstract

Cited by 7 (4 self)
The purpose of this paper is to present and evaluate a heuristic algorithm for learning Bayesian networks for clustering. Our approach is based upon improving the Naive Bayes model by means of constructive induction. A key idea in this approach is to treat expected data as real data. This allows us to complete the database and to take advantage of factorable closed forms for the marginal likelihood. In order to get such an advantage, we search for parameter values using the EM algorithm or another alternative approach that we have developed: a hybridization of the Bound and Collapse method and the EM algorithm, which results in a method that exhibits a faster convergence rate and more effective behaviour than the EM algorithm. Also, we consider the possibility of interleaving runs of these two methods after each structural change. We evaluate our approach on synthetic and real-world databases. Key words: clustering, Bayesian networks, learning from incomplete data, constructive ind...
Bayesian fluorescence in situ hybridisation signal classification
2004
Abstract

Cited by 5 (0 self)
Research has indicated the significance of accurate classification of fluorescence in situ hybridisation (FISH) signals for the detection of genetic abnormalities. Based on well-discriminating features and a trainable neural network (NN) classifier, a previous system enabled highly accurate classification of valid signals and artefacts of two fluorophores. However, since this system employed several features that are considered independent, the naive Bayesian classifier (NBC) is suggested here as an alternative to the NN. The NBC independence assumption permits the decomposition of the high-dimensional likelihood of the model for the data into a product of one-dimensional probability densities. The naive independence assumption together with the Bayesian methodology allow the NBC to predict a posteriori probabilities of class membership using estimated class-conditional densities in a closed and simple form. Since the probability densities are the only parameters of the NBC, the misclassification rate of the model is determined exclusively by the quality of density estimation. Densities are evaluated by three methods: single Gaussian estimation (SGE; parametric method), Gaussian mixture model assuming spherical covariance matrices (GMM; semi-parametric method) and kernel density estimation (KDE; non-parametric method). For low-dimensional densities, the GMM generally outperforms the KDE, which tends to overfit the training set.
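As a minimal sketch of the parametric (SGE) option described above, a naive Bayes classifier with single-Gaussian one-dimensional class-conditional densities fits in a few lines. The two-feature data and class labels below are invented for illustration and are not the FISH features used in the paper:

```python
import math

def gaussian_pdf(x, mean, var):
    # One-dimensional Gaussian density N(x; mean, var).
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(samples):
    # samples: list of (feature_vector, class_label).
    # Returns per-class (prior, per-feature means, per-feature variances).
    by_class = {}
    for x, c in samples:
        by_class.setdefault(c, []).append(x)
    params = {}
    for c, xs in by_class.items():
        n, dims = len(xs), len(xs[0])
        means = [sum(x[i] for x in xs) / n for i in range(dims)]
        vars_ = [max(sum((x[i] - means[i]) ** 2 for x in xs) / n, 1e-9)
                 for i in range(dims)]  # floor avoids zero variance
        params[c] = (n / len(samples), means, vars_)
    return params

def predict(params, x):
    # Posterior ∝ prior * product of per-feature Gaussian densities
    # (computed in log space for numerical stability).
    def score(c):
        prior, means, vars_ = params[c]
        s = math.log(prior)
        for i, v in enumerate(x):
            s += math.log(gaussian_pdf(v, means[i], vars_[i]))
        return s
    return max(params, key=score)

# Invented toy data: two well-separated classes in two features.
train = [((1.0, 2.0), "signal"), ((1.2, 1.8), "signal"),
         ((4.0, 5.0), "artefact"), ((4.2, 5.3), "artefact")]
params = fit(train)
print(predict(params, (1.1, 2.1)))  # → signal
```

Swapping `gaussian_pdf` for a mixture or a kernel density estimate would give the GMM and KDE variants; in each case only the density estimator changes, which is exactly why the abstract ties misclassification rate to the quality of density estimation.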
Bayesian Networks with Imprecise Probabilities: Theory and Application to Classification
2010
Abstract

Cited by 5 (2 self)
Bayesian networks are powerful probabilistic graphical models for modelling uncertainty. Among others, classification represents an important application: some of the most widely used classifiers are based on Bayesian networks. Bayesian networks are precise models: exact numeric values should be provided for quantification. This requirement is sometimes too narrow. Sets of distributions, instead of single distributions, can provide a more realistic description in these cases. Bayesian networks can be generalized to cope with sets of distributions. This leads to a novel class of imprecise probabilistic graphical models, called credal networks. In particular, classifiers based on Bayesian networks are generalized to so-called credal classifiers. Unlike Bayesian classifiers, which always detect a single class as the one maximizing the posterior class probability, a credal classifier may be unable to single out one class. In other words, if the available information is not sufficient, credal classifiers allow for indecision between two or more classes, thus providing a less informative but more robust conclusion than Bayesian classifiers.
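One simple way to realize the indecision behaviour described here is interval dominance over posterior bounds. The sketch below is a generic illustration under invented numbers and a deliberately simplified credal set (independent prior intervals), not the authors' construction:

```python
def credal_predict(likelihoods, prior_intervals):
    # likelihoods: {class: P(x | class)}
    # prior_intervals: {class: (lower prior, upper prior)}
    # Unnormalised posterior bounds: likelihood times prior bound.
    lower = {c: likelihoods[c] * prior_intervals[c][0] for c in likelihoods}
    upper = {c: likelihoods[c] * prior_intervals[c][1] for c in likelihoods}
    # Interval dominance: drop a class only if some other class beats it
    # under every prior in the set; otherwise it stays in the answer.
    return {c for c in likelihoods
            if not any(lower[d] > upper[c] for d in likelihoods if d != c)}

# Strong evidence: a single class survives, as a Bayesian classifier would return.
print(credal_predict({"a": 0.9, "b": 0.1},
                     {"a": (0.4, 0.6), "b": (0.4, 0.6)}))  # → {'a'}
# Weak evidence: the posterior intervals overlap, so the classifier
# suspends judgement and returns both classes.
print(credal_predict({"a": 0.6, "b": 0.4},
                     {"a": (0.4, 0.6), "b": (0.4, 0.6)}))
```

Returning a set of classes rather than a forced single label is precisely the "less informative but more robust" conclusion the abstract refers to.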
Feature Based Representation and Detection of Transcription Factor Binding Sites
Abstract

Cited by 4 (0 self)
Abstract: The prediction of transcription factor binding sites is an important problem, since it reveals information about the transcriptional regulation of genes. A commonly used representation of these sites is position-specific weight matrices, which show weak predictive power. We introduce a feature-based modelling approach, which is able to deal with various kinds of biological properties of binding sites and models them via Bayesian belief networks. The presented results imply higher model accuracy compared to the PSSM approach.