Learning Bayesian network classifiers by maximizing conditional likelihood
In ICML 2004, 2004
Cited by 63 (0 self)
Bayesian networks are a powerful probabilistic representation, and their use for classification has received considerable attention. However, they tend to perform poorly when learned in the standard way. This is attributable to a mismatch between the objective function used (likelihood or a function thereof) and the goal of classification (maximizing accuracy or conditional likelihood). Unfortunately, the computational cost of optimizing structure and parameters for conditional likelihood is prohibitive. In this paper we show that a simple approximation (choosing structures by maximizing conditional likelihood while setting parameters by maximum likelihood) yields good results. On a large suite of benchmark datasets, this approach produces better class probability estimates than naive Bayes, TAN, and generatively trained Bayesian networks.
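The mismatch this abstract describes can be written out explicitly. The block below is a sketch using notation that is assumed here and does not appear in the listing: a network B, training data D of n pairs (x_i, y_i), where x_i is the attribute vector and y_i the class label.

```latex
% Generative objective: log-likelihood of the full joint distribution
LL(B \mid D) = \sum_{i=1}^{n} \log P_B(x_i, y_i)

% Discriminative objective: conditional log-likelihood of the class
CLL(B \mid D) = \sum_{i=1}^{n} \log P_B(y_i \mid x_i)

% The two are related by the chain rule
LL(B \mid D) = CLL(B \mid D) + \sum_{i=1}^{n} \log P_B(x_i)
```

Only the first term of the decomposition bears on class prediction. A structure can therefore score well on LL by modeling the attributes accurately while still estimating P(y | x) poorly.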
Recognizing Planned, Multiperson Action
Computer Vision and Image Understanding, 2001
Cited by 48 (2 self)
This paper demonstrates how highly structured, multiperson action can be recognized from noisy perceptual data using visually grounded goal-based primitives and low-order temporal relationships that are integrated in a probabilistic framework. The representation, which is motivated by work in model-based object recognition and probabilistic plan recognition, makes four principal assumptions: (1) the goals of individual agents are natural atomic representational units for specifying the temporal relationships between agents engaged in group activities, (2) a high-level description of temporal structure of the action using a small set of low-order temporal and logical constraints is adequate for representing the relationships between the agent goals for highly structured, multiagent action recognition, (3) Bayesian networks provide a suitable mechanism for integrating multiple sources of uncertain visual perceptual feature evidence, and (4) an automatically generated Bayesian ...
Combining Naive Bayes and n-Gram Language Models for Text Classification
In 25th European Conference on Information Retrieval Research (ECIR), 2003
Cited by 32 (2 self)
We augment the naive Bayes model with an n-gram language model to address two shortcomings of naive Bayes text classifiers.
Discretization for naive-Bayes learning: managing discretization bias and variance
2003
Cited by 17 (5 self)
Quantitative attributes are usually discretized in naive-Bayes learning. We prove a theorem that explains why discretization can be effective for naive-Bayes learning. The choice of discretization technique can be expected to affect the classification bias and variance of the generated naive-Bayes classifiers, effects we name discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error. In particular, we propose proportional k-interval discretization and equal size discretization, two efficient heuristic discretization methods that manage discretization bias and variance by tuning the discretized interval size and interval number. We empirically evaluate our new techniques against five key discretization methods for naive-Bayes classifiers. The experimental results support our theoretical arguments by showing that naive-Bayes classifiers trained on data discretized by our new methods achieve lower classification error than those trained on data discretized by alternative discretization methods.
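The trade-off between interval size and interval number can be illustrated with a short sketch. It assumes a rule that the abstract does not state: for n training values, use roughly sqrt(n) equal-frequency intervals, so that each interval also holds roughly sqrt(n) values. The function names `pkid_cut_points` and `discretize` are hypothetical and are not taken from the paper.

```python
import bisect
import math


def pkid_cut_points(values):
    """Cut points for equal-frequency intervals, using about sqrt(n) intervals.

    Assumption (not stated in the abstract): interval number and interval
    size are both set to roughly sqrt(n), so both grow with the data.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        return []
    t = max(1, math.isqrt(n))  # assumed number of intervals
    cuts = []
    for k in range(1, t):
        idx = k * n // t  # index of the first value in interval k
        lo, hi = xs[idx - 1], xs[idx]
        # Skip boundaries that fall between identical values.
        if lo != hi:
            cut = (lo + hi) / 2.0
            if not cuts or cut > cuts[-1]:
                cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Map a numeric value to the index of the interval that contains it."""
    return bisect.bisect_left(cuts, value)


if __name__ == "__main__":
    training_values = list(range(16))
    cuts = pkid_cut_points(training_values)
    print(cuts)
    print([discretize(v, cuts) for v in (0, 5, 8, 15)])
```

With more training data the sketch produces both more intervals (intended to lower discretization bias) and more values per interval (intended to lower discretization variance), which is the balance the abstract refers to.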
BNT structure learning package: documentation and experiments
Technical Report FRE CNRS 2645, Laboratoire PSI, Université et INSA de Rouen, 2004
Cited by 16 (1 self)
Bayesian networks are a formalism for probabilistic reasoning that is increasingly used for classification tasks in data mining. In some situations the network structure is given by an expert; otherwise, retrieving it from a database is an NP-hard problem, notably because of the complexity of the search space. In the last decade, many methods have been introduced to learn the network structure automatically, either by simplifying the search space (augmented naive Bayes, K2) or by using a heuristic in this search space (greedy search). Most of these methods deal with completely observed data, but some can deal with incomplete data (SEM, MWST-EM). The Bayes Net Toolbox for Matlab, introduced by [Murphy, 2001a], allows us to use Bayesian networks or to learn them. However, this toolbox is not state of the art for structure learning, which is why we propose this package.
On Discriminative Bayesian Network Classifiers and Logistic Regression
Machine Learning, 2005
Cited by 15 (1 self)
Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graph-theoretic property. The property holds for naive Bayes but also for more complex structures such as tree-augmented naive Bayes (TAN) as well as for mixed diagnostic-discriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima so that the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, non-global maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
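The equivalence mentioned in the first sentence of this abstract can be sketched for the naive Bayes case. The notation is assumed here and is not taken from the paper: a class y, discrete attributes x_1, ..., x_m, and weights w defined as log-probabilities.

```latex
% Naive Bayes posterior, from Bayes' rule and conditional independence
P(y \mid x) = \frac{P(y) \prod_{j=1}^{m} P(x_j \mid y)}
                   {\sum_{y'} P(y') \prod_{j=1}^{m} P(x_j \mid y')}

% Define weights as log-parameters
w_{y} = \log P(y), \qquad w_{y,j,v} = \log P(x_j = v \mid y)

% The posterior is then a softmax of a function that is linear
% in indicator features of the attribute values
P(y \mid x) = \frac{\exp\bigl(w_{y} + \sum_{j=1}^{m} w_{y,j,x_j}\bigr)}
                   {\sum_{y'} \exp\bigl(w_{y'} + \sum_{j=1}^{m} w_{y',j,x_j}\bigr)}
```

The last line has the form of a multinomial logistic regression model over one-hot encoded attributes. Maximizing conditional likelihood over the weights is then a logistic regression fit, which is the sense in which the two problems coincide for naive Bayes.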
METIORE: A Personalized Information Retrieval System
8th International Conference on User Modeling (UM'2001), 2001
Cited by 14 (6 self)
The idea of personalizing the interactions of a system is not new.
An improved Bayesian Structural EM algorithm for learning Bayesian networks for clustering
 Pattern Recognition Letters
Cited by 11 (6 self)
The application of the Bayesian Structural EM algorithm to learn Bayesian networks for clustering implies a search over the space of Bayesian network structures alternating between two steps: an optimization of the Bayesian network parameters (usually by means of the EM algorithm) and a structural search for model selection. In this paper, we propose to perform the optimization of the Bayesian network parameters using an alternative approach to the EM algorithm: the BC+EM method. We provide experimental results to show that our proposal results in a more effective and efficient version of the Bayesian Structural EM algorithm for learning Bayesian networks for clustering.
Key words: clustering, Bayesian networks, EM algorithm, Bayesian Structural EM algorithm, Bound and Collapse method.
1 Introduction. One of the basic problems that arises in a great variety of fields, including pattern recognition, machine learning and statistics, is the so-called data clustering problem [1,5,6,10,1...
Averaged One-Dependence Estimators: Preliminary Results
University of Technology Sydney, 2002
Cited by 10 (2 self)
Naive Bayes is a simple, computationally efficient and remarkably accurate approach to classification learning. These properties have led to its wide deployment in many online applications. However, it is based on an assumption that all attributes are conditionally independent given the class. This assumption leads to decreased accuracy in some applications. AODE overcomes the attribute independence assumption of naive Bayes by averaging over all models in which all attributes depend upon the class and a single other attribute. The resulting classification learning algorithm for nominal data is computationally efficient and achieves very low error rates.
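The averaging step this abstract describes can be sketched as follows. For each attribute i taken in turn as the single extra parent, the sketch estimates P(y, x_i) times the product over the other attributes j of P(x_j | y, x_i), then averages these estimates over all i. The class name `AODESketch`, the add-one (Laplace) smoothing, and the omission of any minimum-frequency filter on parent values are assumptions made for illustration and are not details given in the abstract.

```python
from collections import Counter


class AODESketch:
    """Minimal averaged one-dependence estimator for nominal attributes."""

    def fit(self, X, y):
        self.n = len(X)
        self.d = len(X[0])
        self.classes = sorted(set(y))
        # Number of distinct values seen for each attribute (for smoothing).
        self.n_vals = [len({row[j] for row in X}) for j in range(self.d)]
        self.pair = Counter()    # counts of (class, i, x_i)
        self.triple = Counter()  # counts of (class, i, x_i, j, x_j)
        for row, label in zip(X, y):
            for i, xi in enumerate(row):
                self.pair[(label, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.triple[(label, i, xi, j, xj)] += 1
        return self

    def predict_proba(self, row):
        scores = {}
        for c in self.classes:
            total = 0.0
            for i, xi in enumerate(row):
                n_ci = self.pair[(c, i, xi)]
                # Smoothed estimate of P(y = c, x_i) (assumed add-one smoothing).
                term = (n_ci + 1) / (self.n + len(self.classes) * self.n_vals[i])
                for j, xj in enumerate(row):
                    if j != i:
                        # Smoothed estimate of P(x_j | y = c, x_i).
                        n_cij = self.triple[(c, i, xi, j, xj)]
                        term *= (n_cij + 1) / (n_ci + self.n_vals[j])
                total += term
            # Average over the d one-dependence models.
            scores[c] = total / self.d
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}


if __name__ == "__main__":
    # XOR-style data: the class depends on an interaction between attributes,
    # which violates the naive Bayes independence assumption.
    X = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
    y = [a ^ b for a, b in X]
    model = AODESketch().fit(X, y)
    print(model.predict_proba((0, 1)))
```

Because every model in the average conditions on the class plus one other attribute, the sketch can represent pairwise interactions such as the XOR pattern in the usage example while still needing only counting passes over the data.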
Robust Bayesian linear classifier ensembles
Proc. 16th European Conf. Machine Learning, Lecture Notes in Computer Science, 2005
Cited by 10 (0 self)
Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Probabilistic classifiers usually restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging (BMA). Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. After that we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.