## Learning Bayesian Networks Using Feature Selection (1995)

Venue: D. Fisher & H. Lenz, eds, Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, FL

Citations: 19 (2 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Provan95learningbayesian,
  author    = {Gregory M. Provan and Moninder Singh},
  title     = {Learning Bayesian Networks Using Feature Selection},
  booktitle = {D. Fisher \& H. Lenz, eds, Proceedings of the Fifth
               International Workshop on Artificial Intelligence and
               Statistics, Ft. Lauderdale, FL},
  year      = {1995},
  pages     = {450--456}
}
```

### Abstract

This paper introduces a novel enhancement for learning Bayesian networks with a bias for small, high-predictive-accuracy networks. The new approach selects a subset of features which maximizes predictive accuracy prior to the network learning phase. We examine explicitly the effects of two aspects of the algorithm, feature selection and node ordering. Our approach generates networks which are computationally simpler to evaluate and which display predictive accuracy comparable to that of Bayesian networks which model all attributes.

**1 INTRODUCTION** Bayesian networks are being increasingly recognized as an important representation for probabilistic reasoning. For many domains, the need to specify the probability distributions for a Bayesian network is considerable, and learning these probabilities from data using an algorithm like K2 [8] could alleviate such specification difficulties. We describe an extension to the Bayesian network learning approaches introduced in K2. Rather than ...
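The two-phase idea in the abstract — first pick a feature subset that maximizes predictive accuracy, then learn the network over that subset — can be sketched as a greedy "wrapper" search. The function names and the toy scoring function below are illustrative assumptions, not the paper's actual K2-AS implementation.

```python
# Sketch of wrapper-style feature selection: greedily grow a feature
# subset while the subset's accuracy score keeps improving.

def greedy_forward_select(features, score):
    """Add one feature at a time as long as the score improves."""
    selected = []
    best = score(selected)
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best:
                best, selected = s, selected + [f]
                improved = True
    return selected, best

# Hypothetical score: features "a" and "b" are predictive, the rest
# are noise that slightly hurts accuracy (stands in for cross-validated
# predictive accuracy of a network learned over the subset).
def toy_score(subset):
    gains = {"a": 0.2, "b": 0.15}
    return 0.5 + sum(gains.get(f, -0.01) for f in subset)

subset, acc = greedy_forward_select(["a", "b", "c", "d"], toy_score)
```

In the real algorithm the score would be the predictive accuracy of a network learned from the candidate subset, which is exactly why the selected networks end up smaller but comparably accurate.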

### Citations

1075 | A Bayesian Method for the Induction
- Cooper, Herskovits
- 1992

Citation Context: ...babilistic reasoning. For many domains, the need to specify the probability distributions for a Bayesian network is considerable, and learning these probabilities from data using an algorithm like K2 [8] could alleviate such specification difficulties. We describe an extension to the Bayesian network learning approaches introduced in K2. Rather than use all database features (or attributes) for con...

741 | UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html
- Murphy, Aha
- 1992

Citation Context: ...re the networks generated by our approach with those created by CB. We tested this method on four databases acquired from the University of California, Irvine Repository of Machine Learning databases [21], namely Michalski's Soybean database, Slate's Letter Recognition database, the Gene-Splicing database due to Towell, Noordewier, and Shavlik, and Shapiro's Chess Endgame database. Table 2: Comparis...

645 | Pattern Recognition: A Statistical Approach
- Devijver, Kittler
- 1982

Citation Context: ...imality, like MDL. For example, Dawid discusses the close relation between subset selection and the MDL principle in [9]. The computer vision community has studied feature selection for over 20 years [10]...

595 | Irrelevant Features and the Subset Selection Problem
- John, Kohavi, et al.
- 1994

Citation Context: ...aluate and which display predictive accuracy comparable to that of Bayesian networks which model all features. Our results, similar to those observed by other studies of feature selection in learning [6, 13, 17, 18], demonstrate that feature selection provides comparable predictive accuracy using smaller networks. For example, by selecting as few as 15% of the features for the gene-splice domain, we obtained a p...

355 | A Practical Approach to Feature Selection
- Kira, Rendell
- 1992

Citation Context: ...used for the learning, and a wrapper model uses the induction algorithm itself for feature selection. Three filter-model approaches that have been taken are: the FOCUS algorithm [2], the Relief algorithm [14, 15] (which Kononenko has extended in [16]), and an extended nearest-neighbor algorithm [5]. Wrapper-based approaches have been studied in [13, 6, 18], among others. A growing consensus in this research...
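The Relief algorithm named in this context weights each feature by contrasting every sampled instance with its nearest same-class neighbour (the "hit") and nearest different-class neighbour (the "miss"). A minimal sketch, with illustrative names and a toy dataset (not the cited authors' implementation):

```python
import random

def relief(X, y, n_iters=100, seed=0):
    """Relief sketch: features that separate classes get positive weight,
    features that vary within a class get negative weight."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features

    def dist(a, b):
        # Manhattan distance between two feature vectors.
        return sum(abs(ai - bi) for ai, bi in zip(a, b))

    for _ in range(n_iters):
        i = rng.randrange(len(X))
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest hit
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest miss
        for k in range(n_features):
            # Reward separation from the miss, penalize spread vs the hit.
            w[k] += abs(X[i][k] - X[m][k]) - abs(X[i][k] - X[h][k])
    return w

# Toy data: feature 0 determines the class, feature 1 is noise.
w = relief([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 1, 1], n_iters=50)
```

Features whose weight stays above a threshold are kept; this makes Relief a "filter" method, since the weights are computed from the data alone, before any induction algorithm runs.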

297 | Estimating attributes: analysis and extensions of RELIEF
- Kononenko
- 1994

Citation Context: ...l uses the induction algorithm itself for feature selection. Three filter-model approaches that have been taken are: the FOCUS algorithm [2], the Relief algorithm [14, 15] (which Kononenko has extended in [16]), and an extended nearest-neighbor algorithm [5]. Wrapper-based approaches have been studied in [13, 6, 18], among others. A growing consensus in this research is that the success of feature select...

258 | The feature selection problem: Traditional methods and a new algorithm
- Kira, Rendell
- 1992

Citation Context: ...used for the learning, and a wrapper model uses the induction algorithm itself for feature selection. Three filter-model approaches that have been taken are: the FOCUS algorithm [2], the Relief algorithm [14, 15] (which Kononenko has extended in [16]), and an extended nearest-neighbor algorithm [5]. Wrapper-based approaches have been studied in [13, 6, 18], among others. A growing consensus in this research...

212 | Learning with Many Irrelevant Features
- Almuallim, Dietterich
- 1991

Citation Context: ...m the induction algorithm used for the learning, and a wrapper model uses the induction algorithm itself for feature selection. Three filter-model approaches that have been taken are: the FOCUS algorithm [2], the Relief algorithm [14, 15] (which Kononenko has extended in [16]), and an extended nearest-neighbor algorithm [5]. Wrapper-based approaches have been studied in [13, 6, 18], among others. A grow...

211 | Induction of selective Bayesian classifiers
- Langley, Sage
- 1994

Citation Context: ...aluate and which display predictive accuracy comparable to that of Bayesian networks which model all features. Our results, similar to those observed by other studies of feature selection in learning [6, 13, 17, 18], demonstrate that feature selection provides comparable predictive accuracy using smaller networks. For example, by selecting as few as 15% of the features for the gene-splice domain, we obtained a p...

186 | Greedy attribute selection
- Caruana, Freitag
- 1994

Citation Context: ...aluate and which display predictive accuracy comparable to that of Bayesian networks which model all features. Our results, similar to those observed by other studies of feature selection in learning [6, 13, 17, 18], demonstrate that feature selection provides comparable predictive accuracy using smaller networks. For example, by selecting as few as 15% of the features for the gene-splice domain, we obtained a p...

183 | A Branch and Bound Algorithm for Feature Subset Selection
- Narendra, Fukunaga
- 1977

Citation Context: ...atistics, research on feature selection has focused primarily on selecting a subset of features within linear regression. Techniques developed include sequential backward selection [20], branch & bound [22], and search algorithms [23, 25]. A 1993 meeting of the Society of AI and Statistics was dedicated to papers on "Selecting Models from Data" [7], and contains a large number of papers on feature select...

151 | HUGIN: A Shell for Building Bayesian Belief Universes for Expert Systems
- Andersen, Olesen, et al.
- 1989

Citation Context: ...redictive accuracy of the network derived from the network construction phase. We performed inference on the networks using the Lauritzen-Spiegelhalter inference algorithm as implemented in the HUGIN [Andersen89] system. The K2-AS approach trades off the time required to construct a network from the full feature set (as done in K2) with precomputing a feature subset and subsequently constructing a network wit...

129 | Selection of relevant features in machine learning
- Langley, Sage
- 1994

100 | On Automatic Feature Selection
- Siedlecki, Sklansky
- 1988

Citation Context: ...e selection has focused primarily on selecting a subset of features within linear regression. Techniques developed include sequential backward selection [20], branch & bound [22], and search algorithms [23, 25]. A 1993 meeting of the Society of AI and Statistics was dedicated to papers on "Selecting Models from Data" [7], and contains a large number of papers on feature selection. This statistical approach t...

97 | A further comparison of splitting rules for decision-tree induction
- Buntine, Niblett
- 1992

Citation Context: ...network with the highest predictive accuracy, but to identify a parsimonious model with good predictive accuracy. It is possible to compute multiple models and average over them (e.g. as proposed in [19, 4]) to obtain the best predictive accuracy, and we hope to take this approach in future work. In addition, we restrict our attention to Bayesian networks. To fairly compare the best possible predictive...

92 | Using Decision Trees to Improve Case-Based Learning
- Cardie
- 1993

Citation Context: ...ction. Three filter-model approaches that have been taken are: the FOCUS algorithm [2], the Relief algorithm [14, 15] (which Kononenko has extended in [16]), and an extended nearest-neighbor algorithm [5]. Wrapper-based approaches have been studied in [13, 6, 18], among others. A growing consensus in this research is that the success of feature selection is strongly correlated to the data itself, as...

77 | An algorithm for the construction of Bayesian network structures from data
- Singh, Valtorta
- 1993

Citation Context: ...ond phase computes the network (from the set of features Δ) which maximizes the predictive accuracy over the test data. The learning algorithm that we use, called CB, is a modified version of K2 [24]. Whereas K2 assumes a node ordering, CB uses conditional independence (CI) tests to generate a "good" node ordering, and then uses the K2 algorithm to generate the Bayesian network from the database...
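The context above describes CB as K2 plus an induced node ordering. The greedy parent search at K2's core — walk the ordering and, for each node, keep adding the predecessor that most improves a scoring metric — can be sketched as follows, with a hypothetical toy score standing in for the Bayesian (K2) metric:

```python
def k2_structure(order, max_parents, score):
    """K2-style greedy parent search (sketch): for each node in a fixed
    ordering, repeatedly add the predecessor that most improves the score
    until no addition helps or max_parents is reached."""
    parents = {v: [] for v in order}
    for i, v in enumerate(order):
        candidates = list(order[:i])  # only predecessors may be parents
        best = score(v, parents[v])
        while len(parents[v]) < max_parents:
            gains = [(score(v, parents[v] + [c]), c)
                     for c in candidates if c not in parents[v]]
            if not gains:
                break
            s, c = max(gains)
            if s <= best:
                break  # no candidate improves the score
            best = s
            parents[v].append(c)
    return parents

def toy_score(node, ps):
    # Hypothetical score: "C" genuinely depends on "A"; every extra
    # parent pays a small complexity penalty.
    base = 1.0 if (node == "C" and "A" in ps) else 0.5
    return base - 0.01 * len(ps)

net = k2_structure(["A", "B", "C"], max_parents=2, score=toy_score)
```

The fixed ordering is exactly what CB removes from the user's burden: it runs CI tests to produce a "good" ordering first, then applies this greedy search.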

73 | Feature selection for case-based classification of cloud types: an empirical comparison
- Aha, Bankert

Citation Context: ...e been pre-selected for their relevance. It is expected that in such domains feature selection may not make a significant impact. One exception is the study of cloud classification by Aha and Bankert [1], in which a set of 204 attributes was significantly pruned, leading to greatly improved performance. Better understanding of data sets and of domains may lead to a deeper understanding of the ro...

59 | KUTATO: An Entropy-Driven System for Construction of Probabilistic Expert Systems from Databases
- Herskovits, Cooper
- 1990

Citation Context: ...odel all features. We examine explicitly the effects of two aspects of the algorithm: (a) feature selection, and (b) node ordering. (This work was supported by NSF grant #IRI9210030, and NLM grant #BLR 3 RO1 LMO5217-02S1. K2 is a Bayesian reformulation of the Kutato learning algorithm [12].) Our experimental results verify that this approach generates networks which are computationally simpler to eval...

50 | Prequential Analysis, Stochastic Complexity and Bayesian Inference
- Dawid
- 1992

Citation Context: ...selection shares many principles with other statistical notions of information minimality, like MDL. For example, Dawid discusses the close relation between subset selection and the MDL principle in [9]. The computer vision community has studied feature selection for over 20 years [10]...

43 | On the effectiveness of receptors in recognition systems
- Marill, Green
- 1963

Citation Context: ...st few years. In statistics, research on feature selection has focused primarily on selecting a subset of features within linear regression. Techniques developed include sequential backward selection [20], branch & bound [22], and search algorithms [23, 25]. A 1993 meeting of the Society of AI and Statistics was dedicated to papers on "Selecting Models from Data" [7], and contains a large number of paper...

26 | Computer-Based Probabilistic-Network Construction, Doctoral Dissertation
- Herskovits
- 1991

Citation Context: ...the network for the chess domain, which is very densely connected. Figures 1 and 2 show the learning curves for the ... The best results we ever got for the Soybean domain with CB were 86%. Herskovits [11], even with his multiscore algorithm (using multiple networks for inference), got about 86%. As a point of comparison, in the chess endgame domain decision trees are able to obtain 99% accuracy. Ches...

23 | Strategies for graphical model selection
- Madigan, Raftery
- 1994

Citation Context: ...network with the highest predictive accuracy, but to identify a parsimonious model with good predictive accuracy. It is possible to compute multiple models and average over them (e.g. as proposed in [19, 4]) to obtain the best predictive accuracy, and we hope to take this approach in future work. In addition, we restrict our attention to Bayesian networks. To fairly compare the best possible predictive...

17 | HUGIN - a shell for building belief universes for expert systems
- Andersen, Olesen, et al.
- 1990

Citation Context: ...redictive accuracy of the network derived from the network construction phase. We performed inference on the networks using the Lauritzen-Spiegelhalter inference algorithm as implemented in the HUGIN [3] system. The K2-AS approach trades off the time required to construct a network from the full feature set (as done in K2) with precomputing a feature subset and subsequently constructing a network w...

11 | Best first strategy for feature selection
- Xu, Yan, et al.
- 1988

Citation Context: ...e selection has focused primarily on selecting a subset of features within linear regression. Techniques developed include sequential backward selection [20], branch & bound [22], and search algorithms [23, 25]. A 1993 meeting of the Society of AI and Statistics was dedicated to papers on "Selecting Models from Data" [7], and contains a large number of papers on feature selection. This statistical approach t...

8 | Selecting Models from Data
- Cheeseman, Oldford
- 1994

Citation Context: ...nclude sequential backward selection [20], branch & bound [22], and search algorithms [23, 25]. A 1993 meeting of the Society of AI and Statistics was dedicated to papers on "Selecting Models from Data" [7], and contains a large number of papers on feature selection. This statistical approach to subset selection shares many principles with other statistical notions of information minimality, like MDL. F...