A Bayesian Approach to Causal Discovery
1997
Cited by 79 (1 self)
We examine the Bayesian approach to the discovery of directed acyclic causal models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov assumption, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraint-based counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structures (both quantitative and qualitative) can be made. Three, information from several models can be combined to make better inferences and to better ...
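To make the contrast concrete, here is a minimal sketch (not the paper's own procedure) of how a Bayesian comparison weighs an independence constraint instead of accepting or rejecting it. It assumes two binary variables, uniform Dirichlet parameter priors, and equal prior probability for the structures "no edge" and "X → Y"; the function names are illustrative.

```python
from math import exp, lgamma

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of a sequence with these category counts under a
    symmetric Dirichlet(alpha, ..., alpha) prior; alpha=1 gives a uniform prior."""
    r = len(counts)
    n = sum(counts)
    result = lgamma(r * alpha) - lgamma(n + r * alpha)
    for c in counts:
        result += lgamma(c + alpha) - lgamma(alpha)
    return result

def posterior_dependent(table):
    """table[x][y] counts joint observations of binary X and Y.
    Returns P(X -> Y | D), assuming the structures 'no edge' and 'X -> Y'
    are equally probable a priori."""
    x_counts = [sum(table[0]), sum(table[1])]
    y_counts = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    # No edge: X and Y each get a single distribution.
    log_indep = log_marginal(x_counts) + log_marginal(y_counts)
    # X -> Y: Y gets a separate distribution for each value of X.
    log_dep = log_marginal(x_counts) + log_marginal(table[0]) + log_marginal(table[1])
    # Normalize in log space to avoid underflow.
    top = max(log_indep, log_dep)
    w_indep, w_dep = exp(log_indep - top), exp(log_dep - top)
    return w_dep / (w_indep + w_dep)
```

With a strongly associated table such as `[[40, 10], [10, 40]]` the posterior for the edge is close to 1, while a balanced table `[[25, 25], [25, 25]]` gives a value below 0.5, because the marginal likelihood penalizes the extra parameters of the dependent structure. A constraint-based learner would instead reduce each case to a yes/no independence decision at some significance level.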
Knowledge Acquisition from Examples Via Multiple Models
In Proceedings of the Fourteenth International Conference on Machine Learning, 1997
Cited by 55 (7 self)
If it is to qualify as knowledge, a learner's output should be accurate, stable and comprehensible. Learning multiple models can improve significantly on the accuracy and stability of single models, but at the cost of losing their comprehensibility (when they possess it, as do, for example, simple decision trees and rule sets). This paper proposes and evaluates CMM, a meta-learner that seeks to retain most of the accuracy gains of multiple-model approaches, while still producing a single comprehensible model. CMM is based on reapplying the base learner to recover the frontiers implicit in the multiple-model ensemble. This is done by giving the base learner a new training set, composed of a large number of examples generated and classified according to the ensemble, plus the original examples. CMM is evaluated using C4.5RULES as the base learner, and bagging as the multiple-model methodology. On 26 benchmark datasets, CMM retains on average 60% of the accuracy gains obtained by bagging ...
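The CMM loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: a one-dimensional threshold rule (`train_stump`, hypothetical) stands in for C4.5RULES as the base learner, bagging supplies the ensemble, and new inputs are drawn uniformly over the observed range, which is an assumption because the abstract does not say how the extra examples are generated.

```python
import random

def train_stump(xs, ys):
    """Hypothetical base learner standing in for C4.5RULES: a 1-D rule
    'predict 1 if polarity * (x - t) > 0 else 0' that minimizes training errors."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            errors = sum((1 if polarity * (x - t) > 0 else 0) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda x: 1 if polarity * (x - t) > 0 else 0

def bagged_ensemble(xs, ys, n_models, rng):
    """Bagging: train on bootstrap samples, combine by uniform (majority) voting."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: 1 if 2 * sum(m(x) for m in models) > len(models) else 0

def cmm(xs, ys, n_models=25, n_generated=500, seed=0):
    """Return a single base-learner model that mimics the bagged ensemble."""
    rng = random.Random(seed)
    ensemble = bagged_ensemble(xs, ys, n_models, rng)
    lo, hi = min(xs), max(xs)
    # Generate new inputs (uniform over the observed range: an assumption)
    # and label them with the ensemble's vote.
    gen_x = [rng.uniform(lo, hi) for _ in range(n_generated)]
    gen_y = [ensemble(x) for x in gen_x]
    # Reapply the base learner to the generated examples plus the originals.
    return train_stump(xs + gen_x, ys + gen_y)
```

Calling `cmm(xs, ys)` returns one threshold rule; the point of the design is that this single rule can be read directly, unlike the 25-model vote whose decision frontier it imitates.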
Feature Subset Selection by Bayesian networks: a comparison with genetic and sequential algorithms
Cited by 42 (15 self)
In this paper we compare FSS-EBNA, a randomized, population-based, evolutionary algorithm, with two genetic and two sequential search approaches on the well-known Feature Subset Selection (FSS) problem. In FSS-EBNA, the FSS problem, stated as a search problem, uses the EBNA (Estimation of Bayesian Network Algorithm) search engine, an algorithm within the EDA (Estimation of Distribution Algorithm) approach. The EDA paradigm grew out of the GA community with the aim of explicitly discovering the relationships among the features of the problem rather than disrupting them with genetic recombination operators. The EDA paradigm avoids the use of recombination operators and guarantees the evolution of the population of solutions, and the discovery of these relationships, through the factorization of the probability distribution of the best individuals in each generation of the search. In EBNA, this factorization is carried out by a Bayesian network induced by a chea...
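EBNA learns a Bayesian network over the selected individuals in every generation. As a much simpler stand-in, the sketch below uses UMDA (the univariate marginal distribution algorithm), the EDA whose factorization treats every feature as independent, so it shows the sample, select, re-estimate loop but not EBNA's dependency modeling. Subsets are 0/1 lists; the fitness function in the usage lines is a hypothetical toy score, where a real FSS wrapper would use the estimated accuracy of a classifier trained on the selected features.

```python
import random

def umda_feature_selection(fitness, n_features, pop_size=50, n_select=25,
                           generations=30, seed=0):
    """Univariate EDA (UMDA) over feature subsets encoded as 0/1 lists.
    Not EBNA: each feature's inclusion is modeled independently."""
    rng = random.Random(seed)
    probs = [0.5] * n_features  # P(feature i is included)
    best = None
    for _ in range(generations):
        # Sample a population of candidate subsets from the current model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        if best is None or fitness(population[0]) > fitness(best):
            best = population[0]
        # Re-estimate the factorized distribution from the best individuals.
        selected = population[:n_select]
        probs = [sum(ind[i] for ind in selected) / n_select
                 for i in range(n_features)]
        # Keep probabilities away from 0 and 1 so sampling can still explore.
        probs = [min(max(p, 0.05), 0.95) for p in probs]
    return best

# Usage with a hypothetical toy fitness: features 0, 2 and 5 are "relevant";
# each relevant feature selected scores +1, each irrelevant one -1.
RELEVANT = {0, 2, 5}

def toy_fitness(bits):
    return sum(b if i in RELEVANT else -b for i, b in enumerate(bits))

best_subset = umda_feature_selection(toy_fitness, n_features=8)
```

Replacing the per-feature marginals with a learned Bayesian network is exactly the step that distinguishes EBNA from this simpler scheme.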
Learning Probabilistic Networks
The Knowledge Engineering Review, 1998
Cited by 36 (1 self)
A probabilistic network is a graphical model that encodes probabilistic relationships between variables of interest. Such a model records qualitative influences between variables in addition to the numerical parameters of the probability distribution. As such it provides an ideal form for combining prior knowledge, which might be limited solely to experience of the influences between some of the variables of interest, and data. In this paper, we first show how data can be used to revise initial estimates of the parameters of a model. We then progress to showing how the structure of the model can be revised as data is obtained. Techniques for learning with incomplete data are also covered.
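Revising parameter estimates from data has a simple closed form when the prior for each conditional distribution is a Dirichlet (a Beta, for a binary variable): add the observed counts to the prior pseudo-counts. The sketch below does this for one node with one parent, assuming complete data; the variable names and the one-pseudo-count-per-cell prior are illustrative choices, not taken from the paper.

```python
def update_cpt(prior_counts, data, child, parent):
    """prior_counts[parent_value][child_value] holds Dirichlet pseudo-counts.
    Each row of complete data adds one count to the matching cell."""
    posterior = {pv: dict(cells) for pv, cells in prior_counts.items()}
    for row in data:
        posterior[row[parent]][row[child]] += 1
    return posterior

def cpt_estimates(counts):
    """Posterior mean of P(child | parent) under each Dirichlet."""
    return {pv: {cv: c / sum(cells.values()) for cv, c in cells.items()}
            for pv, cells in counts.items()}

# Usage: Rain -> WetGrass, starting from one pseudo-count per cell.
prior = {True: {True: 1, False: 1}, False: {True: 1, False: 1}}
data = ([{"rain": True, "wet": True}] * 3 + [{"rain": True, "wet": False}]
        + [{"rain": False, "wet": False}] * 2)
estimates = cpt_estimates(update_cpt(prior, data, child="wet", parent="rain"))
```

Here the estimate of P(wet | rain) moves from the prior 1/2 to 4/6, and P(not wet | no rain) moves to 3/4. The prior pseudo-counts act like previously seen cases, which is one way to combine prior knowledge with data.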
Why Does Bagging Work? A Bayesian Account and its Implications
In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, 1997
Cited by 32 (7 self)
The error rate of decision-tree and other classification learners can often be much reduced by bagging: learning multiple models from bootstrap samples of the database, and combining them by uniform voting. In this paper we empirically test two alternative explanations for this, both based on Bayesian learning theory: (1) bagging works because it is an approximation to the optimal procedure of Bayesian model averaging, with an appropriate implicit prior; (2) bagging works because it effectively shifts the prior to a more appropriate region of model space. All the experimental evidence contradicts the first hypothesis, and confirms the second. Bagging (Breiman 1996a) is a simple and effective way to reduce the error rate of many classification learning algorithms. For example, in the empirical study described below, it reduces the error of a decision-tree learner in 19 of 26 databases, by 4% on average. In the bagging procedure, given a training set of size s, a "bootstrap" re...
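The two ingredients named above, bootstrap replicates and uniform voting, are small enough to sketch directly (a minimal illustration, not the paper's code). A bootstrap replicate draws s examples with replacement from a training set of size s, so on average only a fraction 1 - (1 - 1/s)^s of the original examples appears in it, roughly 63.2% for large s; this is one reason the bagged models differ from one another.

```python
import random
from collections import Counter

def bootstrap_replicate(training_set, rng):
    """Draw len(training_set) examples uniformly at random, with replacement."""
    return [rng.choice(training_set) for _ in training_set]

def uniform_vote(predictions):
    """Combine class predictions with equal weight; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

def mean_distinct_fraction(s, n_replicates, seed=0):
    """Average fraction of distinct original examples in a bootstrap replicate."""
    rng = random.Random(seed)
    data = list(range(s))
    total = 0.0
    for _ in range(n_replicates):
        total += len(set(bootstrap_replicate(data, rng))) / s
    return total / n_replicates
```

In a full bagging run, one model is trained per replicate and `uniform_vote` is applied to their predictions for each test example.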
Knowledge Discovery Via Multiple Models
Intelligent Data Analysis, 1998
Cited by 29 (0 self)
If it is to qualify as knowledge, a learner's output should be accurate, stable and comprehensible. Learning multiple models can improve significantly on the accuracy and stability of single models, but at the cost of losing their comprehensibility (when they possess it, as do, for example, simple decision trees and rule sets). This article proposes and evaluates CMM, a meta-learner that seeks to retain most of the accuracy gains of multiple-model approaches, while still producing a single comprehensible model. CMM is based on reapplying the base learner to recover the frontiers implicit in the multiple-model ensemble. This is done by giving the base learner a new training set, composed of a large number of examples generated and classified according to the ensemble, plus the original examples. CMM is evaluated using C4.5RULES as the base learner, and bagging as the multiple-model methodology. On 26 benchmark datasets, CMM retains on average 60% of the accuracy gains obtained by baggin...
The State of Boosting
1999
Cited by 26 (2 self)
In many problem domains, combining the predictions of several models often results in a model with improved predictive performance. Boosting is one such method that has shown great promise. On the applied side, empirical studies have shown that combining models using boosting methods produces more accurate classification and regression models. These methods extend to exponential-family models as well as to proportional hazards regression models. This article shows that boosting, which is still new to statistics, is widely applicable. I will introduce boosting, discuss the current state of boosting, and show how these methods connect to more standard statistical practice.
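The abstract does not fix a particular algorithm; as one concrete instance, the sketch below implements discrete AdaBoost (Freund and Schapire's boosting algorithm for labels in {-1, +1}) with one-dimensional threshold rules as the weak learner. Each round fits the weak learner to reweighted data, gives it a vote alpha = 1/2 log((1 - err) / err), and up-weights the examples it misclassified. The function names are illustrative.

```python
import math

def train_weighted_stump(xs, ys, weights):
    """Weak learner: the 1-D rule '+1 if polarity * (x - t) > 0 else -1'
    with the smallest weighted error. Returns (error, t, polarity)."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [1 if polarity * (x - t) > 0 else -1 for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, t, polarity)
    for _ in range(n_rounds):
        err, t, polarity = train_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Up-weight misclassified examples, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * y * (1 if polarity * (x - t) > 0 else -1))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the weak rules."""
    score = sum(alpha * (1 if polarity * (x - t) > 0 else -1)
                for alpha, t, polarity in ensemble)
    return 1 if score > 0 else -1
```

On the toy interval `xs = list(range(20))` with label +1 only for 6 < x < 14, no single threshold rule fits the data (the best one misclassifies 6 of 20 points), but three boosting rounds combine three such rules into a classifier that fits all 20 training points.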
A Comparison of Scientific and Engineering Criteria for Bayesian Model Selection
Statistics and Computing, 1996
Cited by 19 (0 self)
In this paper, we assume that there is a finite number of possible true models. For each possible model $m$, we define the random (vector) variable $\Theta_m$ whose values correspond to the possible values of the parameters for $m$. We encode our uncertainty about $\Theta_m$ using the probability distribution $p(\Theta_m \mid m)$. In this paper, we assume that $p(\Theta_m \mid m)$ is a probability density function. Given a random sample $D$, we compute the posterior distributions for $M$ and each $\Theta_m$ ...
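Written out, the computation this passage refers to is the standard application of Bayes' rule at both levels, with $p(m)$ denoting the prior probability of model $m$ (a symbol introduced here for illustration):

```latex
\begin{align}
p(\Theta_m \mid D, m) &= \frac{p(D \mid \Theta_m, m)\, p(\Theta_m \mid m)}{p(D \mid m)}, \\
p(D \mid m) &= \int p(D \mid \Theta_m, m)\, p(\Theta_m \mid m)\, d\Theta_m, \\
p(m \mid D) &= \frac{p(m)\, p(D \mid m)}{\sum_{m'} p(m')\, p(D \mid m')}.
\end{align}
```

Because the number of possible true models is finite, the normalizing sum in the last line runs over a finite set; the integral in the second line is where the assumption that $p(\Theta_m \mid m)$ is a density is used.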
Structured priors for structure learning
In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), 2006
Cited by 19 (8 self)
Traditional approaches to Bayes net structure learning typically assume little regularity in graph structure other than sparseness. However, in many cases, we expect more systematicity: variables in real-world systems often group into classes that predict the kinds of probabilistic dependencies they participate in. Here we capture this form of prior knowledge in a hierarchical Bayesian framework, and exploit it to enable structure learning and type discovery from small datasets. Specifically, we present a nonparametric generative model for directed acyclic graphs as a prior for Bayes net structure learning. Our model assumes that variables come in one or more classes and that the prior probability of an edge existing between two variables is a function only of their classes. We derive an MCMC algorithm for simultaneous inference of the number of classes, the class assignments of variables, and the Bayes net structure over variables. For several realistic, sparse datasets, we show that the bias towards systematicity of connections provided by our model can yield more accurate learned networks than the traditional approach of using a uniform prior, and that the classes found by our model are appropriate.
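The core assumption, that the prior probability of an edge depends only on the classes of its endpoints, can be sketched as follows. This simplification assumes the class assignments and the class-to-class edge probabilities `eta` are already given (the paper infers the number of classes and the assignments nonparametrically, which is omitted here), and it guarantees acyclicity by allowing edges only from lower-indexed to higher-indexed variables, an ordering assumption made for this sketch rather than the paper's construction.

```python
import math
import random

def sample_dag(classes, eta, rng):
    """classes[i]: class of variable i; eta[a][b]: P(edge from a class-a
    variable to a class-b variable). Edges only go from i to j with i < j,
    so every sampled graph is acyclic (an assumption of this sketch)."""
    n = len(classes)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < eta[classes[i]][classes[j]]}

def log_prior(edges, classes, eta):
    """Log prior probability of an edge set under the class-based model.
    Assumes every eta entry lies strictly between 0 and 1."""
    n = len(classes)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = eta[classes[i]][classes[j]]
            total += math.log(p) if (i, j) in edges else math.log(1 - p)
    return total
```

In structure learning, `log_prior` would be added to a data log-likelihood score, so that graphs whose edges follow the class pattern (for example, class-0 variables pointing at class-1 variables when `eta[0][1]` is high) are favored over graphs that a uniform prior would rate equally.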