Learning Augmented Bayesian Classifiers: A Comparison of Distribution-based and Classification-based Approaches
, 1999
Cited by 57 (0 self)
The naïve Bayes classifier is built on the assumption of conditional independence between the attributes given the class. The algorithm has been shown to be surprisingly robust to obvious violations of this condition, but it is natural to ask whether accuracy can be improved further by relaxing this assumption. We examine an approach in which naïve Bayes is augmented by adding correlation arcs between attributes. We explore two methods for finding the set of augmenting arcs: a greedy hill-climbing search, and a novel, more computationally efficient algorithm that we call SuperParent. We compare these methods to TAN, a state-of-the-art distribution-based approach to finding the augmenting arcs.
1 INTRODUCTION The Bayesian classifier (Duda & Hart, 1973) is a simple classification method, which classifies an instance j by determining the probability of it belonging to class $C_i$. These probabilities are calculated as:
$$P(C_i \mid A_1 = V_{1j} \,\&\, \cdots \,\&\, A_N = V_{Nj}) \qquad (1)$$
where an exam...
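Under the independence assumption, equation (1) is usually evaluated through Bayes' theorem as $P(C_i) \prod_k P(A_k = V_{kj} \mid C_i)$, normalised over classes. A minimal Python sketch, assuming discrete attributes and Laplace (add-one) smoothing, neither of which is specified in the abstract; the function names are hypothetical. An augmented classifier (TAN, SuperParent) would additionally condition each attribute on at most one other attribute, which is not shown here.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = Counter()            # (attr_index, value, class) -> count
    values_per_attr = defaultdict(set)  # attr_index -> values seen in training
    for row, cls in zip(rows, labels):
        for k, v in enumerate(row):
            value_counts[(k, v, cls)] += 1
            values_per_attr[k].add(v)
    return class_counts, value_counts, values_per_attr

def posterior(model, row):
    """P(C_i | A_1=V_1j, ..., A_N=V_Nj) for each class, assuming the
    attributes are conditionally independent given the class."""
    class_counts, value_counts, values_per_attr = model
    total = sum(class_counts.values())
    log_scores = {}
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)  # log prior P(C_i)
        for k, v in enumerate(row):
            # Laplace-smoothed estimate of P(A_k = v | C_i)
            num = value_counts[(k, v, cls)] + 1
            den = n_cls + len(values_per_attr[k])
            score += math.log(num / den)
        log_scores[cls] = score
    # Normalise in log space for numerical stability.
    m = max(log_scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: s / z for c, s in exp_scores.items()}
```

For example, training on five weather-style rows and querying `("rain", "mild")` gives class `"yes"` a posterior of about 0.76 in the test data below.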
Lazy Learning of Bayesian Rules
 Machine Learning
, 2000
Cited by 39 (8 self)
The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute interdependencies for the local naive Bayesian classifier. However, Bayesian tree learning still suffers from the small disjunct problem of tree learning. While inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called Lbr. This algorithm can be justified by a variant of Bayes' theorem which supports a weaker conditional attribute independence assumption than is required by naive Bayes. For each test example, it builds a most appropriate rule with a local naive Bayesian classifier as its consequent. It is demonstrated that the computational requirements of Lbr are reasonable in a wide cross-section of natural domains. Experiments with these domains show that, on average, this new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms, although the result is not statisticall...
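One way to write the kind of Bayes' theorem variant referred to above is the identity below, sketched with notation introduced here: $V_1$ stands for the attribute values tested by the rule antecedent and $V_2$ for the remaining attribute values of the test example. The identity itself holds exactly; the local naive Bayesian classifier then only needs the values in $V_2$ to be independent given $C_i \wedge V_1$, rather than given $C_i$ alone.

```latex
P(C_i \mid V_1 \wedge V_2)
  = \frac{P(C_i \mid V_1)\, P(V_2 \mid C_i \wedge V_1)}{P(V_2 \mid V_1)},
\qquad
P(V_2 \mid C_i \wedge V_1) \approx \prod_{v \in V_2} P(v \mid C_i \wedge V_1)
```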
On predictive distributions and Bayesian networks
 Statistics and Computing
, 2000
Cited by 38 (29 self)
In this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the ...
Learning Probabilistic Networks
 The Knowledge Engineering Review
, 1998
Cited by 36 (1 self)
A probabilistic network is a graphical model that encodes probabilistic relationships between variables of interest. Such a model records qualitative influences between variables in addition to the numerical parameters of the probability distribution. As such, it provides an ideal form for combining data with prior knowledge, which might be limited solely to experience of the influences between some of the variables of interest. In this paper, we first show how data can be used to revise initial estimates of the parameters of a model. We then show how the structure of the model can be revised as data are obtained. Techniques for learning with incomplete data are also covered.
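A common way to revise initial parameter estimates with complete data is conjugate Dirichlet updating: the prior is expressed as pseudo-counts for each parent configuration of a node, observed cases are added to those counts, and the posterior mean gives the revised conditional probabilities. The sketch below assumes discrete variables and complete data; it is not necessarily the paper's exact presentation, and the function names and the `cloudy`/`rain` variables are hypothetical.

```python
def update_cpt(prior, data, child, parents):
    """Posterior Dirichlet pseudo-counts for P(child | parents), complete data.

    prior: {parent_config_tuple: {child_value: alpha}}
    data:  list of dicts mapping variable name -> observed value
    """
    # Copy so the caller's prior counts are left untouched.
    post = {cfg: dict(alphas) for cfg, alphas in prior.items()}
    for case in data:
        cfg = tuple(case[p] for p in parents)
        row = post.setdefault(cfg, {})
        row[case[child]] = row.get(case[child], 0) + 1
    return post

def cpt_means(post):
    """Posterior-mean estimate of each conditional distribution."""
    means = {}
    for cfg, alphas in post.items():
        total = sum(alphas.values())
        means[cfg] = {v: a / total for v, a in alphas.items()}
    return means
```

With a uniform prior of one pseudo-count per value, three rainy and one dry cloudy day move P(rain | cloudy) from 0.5 to 4/6.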
Analyzing Attribute Dependencies
 PKDD 2003, volume 2838 of LNAI
, 2003
Cited by 30 (12 self)
Many effective and efficient learning algorithms assume independence of attributes. They often perform well even in domains where this assumption does not strictly hold. However, they may fail badly when the degree of attribute dependency becomes critical. In this paper, we examine methods for detecting deviations from independence. These dependencies give rise to "interactions" between attributes that affect the performance of learning algorithms. We first formally define the degree of interaction between attributes through the deviation of the best possible "voting" classifier from the true relation between the class and the attributes in a domain. Then we propose a practical heuristic for detecting attribute interactions, called interaction gain. We experimentally investigate the suitability of interaction gain for handling attribute interactions in machine learning. We also propose visualization methods for graphical exploration of interactions in a domain.
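A sketch of interaction gain, assuming the information-theoretic form IG(A, B, C) = I(A,B; C) − I(A; C) − I(B; C) estimated from empirical frequencies (the abstract names the heuristic but does not give its formula). A positive value means the pair of attributes tells more about the class jointly than separately; a negative value indicates redundancy. Function names are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Empirical Shannon entropy (bits) of a list of hashable values."""
    counts = Counter(items)
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def interaction_gain(a, b, c):
    """I(A,B; C) - I(A; C) - I(B; C) for attribute columns a, b and class c."""
    joint_ab = list(zip(a, b))  # treat the attribute pair as one variable
    return (mutual_information(joint_ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))
```

For an XOR class (c = a xor b), each attribute alone carries no information, so the interaction gain is a full bit; for three identical columns it is −1 bit.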
Estimating dependency structure as a hidden variable
 In NIPS
, 1998
Cited by 26 (6 self)
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. This paper introduces a probability model, the mixture of trees, which can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms based on the EM and the Minimum Spanning Tree algorithms that learn mixtures of trees in the maximum likelihood (ML) framework. The method can be extended to take priors into account and, for a wide class of priors that includes the Dirichlet and the MDL priors, it preserves its computational efficiency. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification. Finally, we show that a single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes.
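A single tree can be fit by the classic Chow-Liu procedure: weight each pair of variables by their mutual information and keep a maximum-weight spanning tree (equivalently, a minimum spanning tree on negated weights). Within EM for a mixture, each component would be refit this way using the posterior responsibilities as case weights. The sketch below, assuming discrete variables, shows only that per-component step with Kruskal's algorithm; the EM loop is omitted and the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def weighted_mi(data, weights, i, j):
    """Mutual information (nats) between columns i and j under case weights."""
    total = sum(weights)
    pij, pi, pj = defaultdict(float), defaultdict(float), defaultdict(float)
    for row, w in zip(data, weights):
        pij[(row[i], row[j])] += w / total
        pi[row[i]] += w / total
        pj[row[j]] += w / total
    return sum(p * math.log(p / (pi[a] * pj[b]))
               for (a, b), p in pij.items() if p > 0)

def chow_liu_tree(data, weights=None):
    """Edges of a maximum-weight spanning tree over pairwise mutual information."""
    n_vars = len(data[0])
    if weights is None:
        weights = [1.0] * len(data)
    # Heaviest edges first (Kruskal on descending weights = maximum spanning tree).
    edges = sorted(((weighted_mi(data, weights, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))  # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

When column 1 copies column 0 and column 2 is independent of both, the strongest edge (0, 1) is chosen first and the tree spans all three variables.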
Ontology-Based Web Site Mapping for Information Exploration
 In Proceedings of the 8th International Conference on Information and Knowledge Management (CIKM)
, 1999
Cited by 25 (5 self)
A centralized search process requires that the whole collection reside at a single site. This imposes a burden on both the site's storage and the network traffic near the site, which motivates distributing the search process. Recently, more and more Web sites provide the ability to search their local collections of Web pages. Query brokering systems are used to direct queries to promising sites and to merge the results from these sites. Creating meta-information about the sites plays an important role in such systems. In this article, we introduce the Vector Space approach, an ontology-based web site mapping method for producing conceptual meta-information, and present a series of experiments comparing it with the Naïve Bayes approach. We found that the Vector Space approach produces better accuracy in ontology-based web site mapping.
Keywords: distributed collections, information brokers, text categorization, IR agents.
1. INTRODUCTION The World Wide Web (WWW)...
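The abstract does not specify the term weighting, so the following is only a sketch of a generic vector space mapping, assuming raw term-frequency vectors and cosine similarity: a site's text is compared with a short description of each ontology concept and mapped to the most similar one. The concept descriptions and function names are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Raw term-frequency vector from lower-cased, whitespace-split text."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def map_site(site_text, concepts):
    """Return the best-matching concept name and all similarity scores."""
    site_vec = tf_vector(site_text)
    scores = {name: cosine(site_vec, tf_vector(desc))
              for name, desc in concepts.items()}
    return max(scores, key=scores.get), scores
```

A practical system would likely add TF-IDF weighting, stemming, and stop-word removal; these are omitted to keep the comparison itself visible.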
Models and Selection Criteria for Regression and Classification
 Uncertainty in Artificial Intelligence 13
, 1997
Cited by 23 (2 self)
When performing regression or classification, we are interested in the conditional probability distribution for an outcome or class variable Y given a set of explanatory or input variables X. We consider Bayesian models for this task. In particular, we examine a special class of models, which we call Bayesian regression/classification (BRC) models, that can be factored into independent conditional (y|x) and input (x) models. These models are convenient because the conditional model (the portion of the full model that we care about) can be analyzed by itself. We examine the practice of transforming arbitrary Bayesian models to BRC models, and argue that this practice is often inappropriate because it ignores prior knowledge that may be important for learning. In addition, we examine Bayesian methods for learning models from data. We discuss two criteria for Bayesian model selection that are appropriate for regression/classification: one described by Spiegelhalter et al. (1993), and an...
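To see why such a factorization lets the conditional model be analyzed by itself, here is a sketch assuming parameters $\theta_{y|x}$ and $\theta_x$ with independent priors and i.i.d. cases $(x_l, y_l)$, $l = 1, \dots, n$ (notation introduced here). The posterior splits into two brackets, and the posterior over $\theta_{y|x}$ depends only on the first.

```latex
p(\theta_{y|x}, \theta_x \mid D)
  \propto \Big[ p(\theta_{y|x}) \prod_{l=1}^{n} p(y_l \mid x_l, \theta_{y|x}) \Big]
          \Big[ p(\theta_x) \prod_{l=1}^{n} p(x_l \mid \theta_x) \Big]
```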
Lazy Bayesian Rules: A Lazy Semi-Naive Bayesian Learning Technique Competitive to Boosting Decision Trees
 In Proc. 16th International Conf. on Machine Learning
, 1999
Cited by 18 (6 self)
Lbr is a lazy semi-naive Bayesian classifier learning technique designed to alleviate the attribute interdependence problem of naive Bayesian classification. To classify a test example, it creates a conjunctive rule that selects a most appropriate subset of training examples and induces a local naive Bayesian classifier using this subset. Lbr can significantly improve the performance of the naive Bayesian classifier. A bias and variance analysis of Lbr reveals that it significantly reduces the bias of naive Bayesian classification at the cost of a slight increase in variance. It is interesting to compare this lazy technique with boosting and bagging, two well-known state-of-the-art non-lazy learning techniques. An empirical comparison of Lbr with boosting decision trees on discrete-valued data shows that Lbr has, on average, significantly lower variance and higher bias. As a result of the interaction of these effects, the average prediction error of Lbr over a range of learning tasks is at...
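The lazy rule-building step described above can be sketched as a greedy loop: for one test example, repeatedly fix an attribute to the test example's value (adding a condition to the rule antecedent), keep only the matching training examples, and drop that attribute from the local naive Bayes, as long as this lowers leave-one-out error on the examples still covered. This is a simplified illustration, not the published Lbr procedure: it compares raw error counts and omits the significance test used to accept a condition. `min_cover` and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def nb_classify(rows, labels, attrs, x):
    """Laplace-smoothed naive Bayes using only the attribute indices in attrs."""
    class_counts = Counter(labels)
    counts, values = Counter(), defaultdict(set)
    for r, c in zip(rows, labels):
        for k in attrs:
            counts[(k, r[k], c)] += 1
            values[k].add(r[k])
    best_cls, best_score = None, -math.inf
    for c in sorted(class_counts):  # sorted order -> deterministic tie-breaking
        n = class_counts[c]
        score = math.log(n / len(labels))
        for k in attrs:
            n_vals = len(values[k] | {x[k]})
            score += math.log((counts[(k, x[k], c)] + 1) / (n + n_vals))
        if score > best_score:
            best_cls, best_score = c, score
    return best_cls

def loo_errors(rows, labels, attrs, eval_idx):
    """Leave-one-out errors of the local NB, counted on indices in eval_idx."""
    errors = 0
    for i in eval_idx:
        rest_rows, rest_labels = rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:]
        if nb_classify(rest_rows, rest_labels, attrs, rows[i]) != labels[i]:
            errors += 1
    return errors

def lbr_classify(rows, labels, x, min_cover=2):
    """Greedy lazy rule for test example x; returns (prediction, antecedent attrs)."""
    rows, labels = list(rows), list(labels)
    attrs, antecedent = list(range(len(x))), []
    while len(attrs) > 1:
        best = None
        for k in attrs:
            keep = [i for i, r in enumerate(rows) if r[k] == x[k]]
            if len(keep) < min_cover or len(keep) == len(rows):
                continue
            old_err = loo_errors(rows, labels, attrs, keep)
            sub_rows = [rows[i] for i in keep]
            sub_labels = [labels[i] for i in keep]
            new_attrs = [a for a in attrs if a != k]
            new_err = loo_errors(sub_rows, sub_labels, new_attrs, range(len(keep)))
            gain = old_err - new_err
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, k, sub_rows, sub_labels, new_attrs)
        if best is None:
            break
        _, k, rows, labels, attrs = best
        antecedent.append(k)
    return nb_classify(rows, labels, attrs, x), antecedent
```

On an XOR-labelled toy set, plain naive Bayes cannot separate the classes, while the lazy rule conditions on the first attribute and then classifies correctly with a one-attribute local classifier.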