Results 1-10 of 22
Bayesian Network Classifiers
, 1997
Cited by 607 (22 self)
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
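As a point of reference for the baseline this abstract mentions, the following is a minimal sketch of a naive Bayes classifier over categorical features. It is not the paper's TAN method and not the authors' code; the function names, the toy data, and the add-one (Laplace) smoothing parameter `alpha` are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Collect the counts needed for naive Bayes on categorical features."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values, alpha


def predict(model, row):
    """Return the class with the highest smoothed log posterior score."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        # log prior plus a sum of per-feature log likelihoods: this sum is
        # exactly the independence assumption that makes the model "naive"
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            count = feature_counts[(i, label)][value]
            score += math.log((count + alpha) / (n_c + alpha * len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this sketch
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
```

Calling `predict(model, ("rainy", "cool"))` scores each class independently per feature. TAN, as described in the abstract, would additionally let each feature depend on one other feature through a tree structure.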
Learning with Labeled and Unlabeled Data
, 2001
Cited by 170 (3 self)
In this paper, on the one hand, we aim to give a review of the literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first-year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of input-dependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as a basis for a genuine method. In the literature revi...
On discriminative Bayesian network classifiers and logistic regression
 Machine Learning
Cited by 15 (1 self)
Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graph-theoretic property. The property holds for naive Bayes but also for more complex structures such as tree-augmented naive Bayes (TAN) as well as for mixed diagnostic-discriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima so that the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, non-global maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
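The naive Bayes case mentioned at the start of this abstract can be checked numerically. The sketch below assumes binary features and two classes, with made-up probability values. It maps naive Bayes parameters to a bias and weight vector and confirms that the resulting logistic function reproduces the naive Bayes posterior. It illustrates only this well-known special case, not the paper's general graph-theoretic result.

```python
import math
from itertools import product

# Hypothetical naive Bayes parameters for three binary features
prior1 = 0.3          # P(c = 1)
p1 = [0.9, 0.2, 0.6]  # P(x_i = 1 | c = 1)
p0 = [0.4, 0.5, 0.1]  # P(x_i = 1 | c = 0)


def nb_posterior(x):
    """P(c = 1 | x) computed directly from the naive Bayes factorization."""
    def joint(prior, probs):
        out = prior
        for xi, pi in zip(x, probs):
            out *= pi if xi == 1 else 1.0 - pi
        return out
    j1 = joint(prior1, p1)
    j0 = joint(1.0 - prior1, p0)
    return j1 / (j1 + j0)


# The class log-odds are linear in x, so they can be written as bias + w . x
bias = math.log(prior1 / (1.0 - prior1)) + sum(
    math.log((1.0 - a) / (1.0 - b)) for a, b in zip(p1, p0)
)
weights = [math.log(a * (1.0 - b) / (b * (1.0 - a))) for a, b in zip(p1, p0)]


def logistic_posterior(x):
    """P(c = 1 | x) from the logistic form with the mapped parameters."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Largest disagreement between the two forms over all 8 binary inputs
max_gap = max(
    abs(nb_posterior(x) - logistic_posterior(x))
    for x in product([0, 1], repeat=3)
)
```

Here `max_gap` is zero up to floating-point rounding, since both functions compute the same quantity in different parameterizations.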
Classifier Learning with Supervised Marginal Likelihood
Cited by 9 (4 self)
It has been argued that in supervised classification tasks it may be more sensible to perform model selection with respect to a more focused model selection score, like the supervised (conditional) marginal likelihood, than with respect to the standard unsupervised marginal likelihood criterion. However, for most Bayesian network models, computing the supervised marginal likelihood score takes exponential time with respect to the amount of observed data. In this paper, we consider diagnostic Bayesian network classifiers where the significant model parameters represent conditional distributions for the class variable, given the values of the predictor variables, in which case the supervised marginal likelihood can be computed in linear time with respect to the data. As the number of model parameters in this case grows exponentially with respect to the number of predictors, we focus on simple diagnostic models where the number of relevant predictors is small, and suggest two approaches for applying this type of model in classification. The first approach is based on mixtures of simple diagnostic models, while in the second approach we apply the small predictor sets of the simple diagnostic models for augmenting the Naive Bayes classifier.
Probabilistic Models for Bacterial Taxonomy
 INTERNATIONAL STATISTICAL REVIEW
, 2000
Cited by 8 (3 self)
We give a survey of different probabilistic partitioning methods that have been applied to bacterial taxonomy. We introduce a theoretical framework, which makes it possible to treat the various models in a unified way. The key concepts of our approach are prediction and storing of microbiological information in a Bayesian forecasting setting. We show that there is a close connection between classification and probabilistic identification and that, in fact, our approach ties these two concepts together in a coherent way.
Statistical challenges of high-dimensional data
, 1906
Cited by 8 (0 self)
Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this theme issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data, and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue.
Semi-supervised Learning of Classifiers with Application to Human-Computer Interaction
, 2003
Cited by 5 (5 self)
With the growing use of computers and computing objects in the design of many of the day-to-day tools that humans use, human-computer intelligent interaction is seen as a necessary step toward making computers better aid the human user. There are many tasks involved in designing good interaction between humans and machines. One basic task, related to many such applications, is automatic classification by the machine. Designing a classifier can be done by domain experts or by learning from training data. Training data can be labeled with the different classes or unlabeled. In this work I focus on training probabilistic classifiers with labeled and unlabeled data. I show under what conditions unlabeled data can be used to improve classification performance. I also show that, if these conditions are violated, using unlabeled data can often be detrimental to classification performance. I discuss the implications of this analysis when learning a specific type of probabilistic classifier, namely Bayesian networks, and propose structure learning algorithms that can potentially utilize unlabeled data to improve classification. I show how the theory and algorithms are successfully applied in two applications related to human-computer interaction: facial expression recognition and face detection.
Supervised Learning of Bayesian Network Parameters Made Easy
, 2002
Cited by 4 (1 self)
Bayesian network models are widely used for supervised prediction tasks such as classification. Usually the parameters of such models are determined using 'unsupervised' methods such as maximization of the joint likelihood. In many cases, the reason is that it is not clear how to find the parameters maximizing the supervised (conditional) likelihood. We show how the supervised learning problem can be solved efficiently for a large class of Bayesian network models, including the Naive Bayes (NB) and tree-augmented NB (TAN) classifiers. We do this by showing that under a certain general condition on the network structure, the supervised learning problem is exactly equivalent to logistic regression. Hitherto this was known only for Naive Bayes models. Since logistic regression models have a concave log-likelihood surface, the global maximum can be easily found by local optimization methods.
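The practical consequence of a concave log-likelihood surface can be illustrated with plain gradient ascent on a small logistic regression problem. This is a sketch under stated assumptions, not the procedure used in the paper: the toy data set, the step size `rate`, and the iteration count are invented for illustration. The data are chosen to be non-separable so that a finite maximum exists.

```python
import math

# Hypothetical toy data: ((x1, x2), class label). Two inputs appear with both
# labels, so the classes are not separable and the maximum is finite.
data = [
    ((0.0, 1.0), 0), ((1.0, 0.0), 1), ((1.0, 1.0), 1),
    ((0.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cond_log_lik(w):
    """Conditional log-likelihood of the labels given the inputs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])
        total += math.log(p if y == 1 else 1.0 - p)
    return total


def gradient_ascent(start, rate=0.5, steps=5000):
    """Fixed-step gradient ascent on the averaged conditional log-likelihood."""
    w = list(start)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            err = y - sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])
            # (1.0,) + x prepends the constant input that pairs with the bias
            for j, f in enumerate((1.0,) + x):
                grad[j] += err * f
        w = [wj + rate * g / len(data) for wj, g in zip(w, grad)]
    return w


# Two very different starting points
w_a = gradient_ascent([0.0, 0.0, 0.0])
w_b = gradient_ascent([3.0, -3.0, 2.0])
```

Because the surface is concave, `w_a` and `w_b` end up at the same parameter vector despite the different starting points. With a likelihood that has local, non-global maxima, the result could depend on where the search starts.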
Discrimination and Classification
, 1995
Cited by 4 (3 self)
The aim of this report is to present methods from statistics, neural networks, nonparametric regression and pattern recognition to perform discrimination and classification. The methods are compared on theoretical and empirical grounds to highlight strengths and weaknesses. A common platform for classification is also outlined. The emphasis is on multiple (more than two) classes. Keywords: Supervised Classification; Discriminant Analysis; Multiple Classes; Multilayer Perceptrons (MLP). Contents: 1 Introduction; 2 Classification; 2.1 Decision Theoretic Framework; 2.2 Allocation Principles; 2.3 Discriminant Functions; 2.4 Constructing Classifiers; 2.5 Evaluation Principles ...
Supervised Naive Bayes Parameters
, 2002
Cited by 2 (1 self)
In this paper we show how this supervised learning problem can be solved efficiently. We introduce an alternative parametrization in which the supervised likelihood becomes concave. From this result it follows that there can be at most one maximum, easily found by local optimization methods. We present test results that show this is feasible and highly beneficial.