Machine Learning Research: Four Current Directions
Cited by 114 (1 self)
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.
Learning Limited Dependence Bayesian Classifiers
In KDD-96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996
Cited by 108 (5 self)
We present a framework for characterizing Bayesian classification methods. This framework can be thought of as a spectrum of allowable dependence in a given probabilistic model, with the Naive Bayes algorithm at the most restrictive end and the learning of full Bayesian networks at the most general extreme. While much work has been carried out along the two ends of this spectrum, there has been surprisingly little done along the middle. We analyze the assumptions made as one moves along this spectrum and show the tradeoffs between model accuracy and learning speed, which become critical to consider in a variety of data mining domains. We then present a general induction algorithm that allows for traversal of this spectrum depending on the available computational power for carrying out induction and show its application in a number of domains with different properties. Introduction Recently, work in Bayesian methods for classification has grown enormously (Cooper & Herskovits 1992) (Buntin...
Improving Simple Bayes
1997
Cited by 59 (1 self)
The simple Bayesian classifier (SBC), sometimes called Naive Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. We examine different approaches for handling unknowns and zero counts when estimating probabilities. Large-scale experiments on 37 datasets were conducted to determine the effects of these approaches, and several interesting insights are given, including a new variant of the Laplace estimator that outperforms other methods for dealing with zero counts. Using the bias-variance decomposition [15, 10], we show that while the SBC has performed well on common benchmark datasets, its accuracy will not scale up as the dataset sizes grow. Even with these limitations in mind, the SBC can serve as an excellent tool for initial exp...
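The zero-count problem mentioned in this abstract can be illustrated with the standard Laplace (add-alpha) correction. This is only a minimal sketch of the textbook estimator, not the new variant the paper proposes; the function names `laplace_estimate` and `conditional_probs` and the toy data are hypothetical.

```python
from collections import Counter


def laplace_estimate(count, total, num_values, alpha=1.0):
    """Smoothed estimate (count + alpha) / (total + alpha * num_values).

    With alpha = 1 this is the classic add-one Laplace correction.
    """
    return (count + alpha) / (total + alpha * num_values)


def conditional_probs(values, domain, alpha=1.0):
    """Estimate P(value | class) from the attribute values observed in one class.

    `values` are the observations for a single class; `domain` lists every
    value the attribute can take, including values never observed.
    """
    counts = Counter(values)
    total = len(values)
    return {
        v: laplace_estimate(counts[v], total, len(domain), alpha)
        for v in domain
    }


# Toy data (assumed): value "c" never occurs in this class.
probs = conditional_probs(["a", "a", "b"], ["a", "b", "c"])
print(probs)
```

Without smoothing, the unseen value "c" would get probability zero and wipe out the whole Naive Bayes product for any instance containing it; with the correction it gets a small positive probability and the estimates still sum to one.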
Estimating dependency structure as a hidden variable
In NIPS, 1998
Cited by 27 (6 self)
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms based on the EM and the Minimum Spanning Tree algorithms that learn mixtures of trees in the ML framework. The method can be extended to take into account priors and, for a wide class of priors that includes the Dirichlet and the MDL priors, it preserves its computational efficiency. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification. Finally, we show that a single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes.
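The spanning-tree step mentioned in this abstract can be sketched for a single tree. The sketch below assumes the classic Chow-Liu procedure (pairwise mutual information as edge weights, then a maximum-weight spanning tree via Kruskal's algorithm); the abstract does not spell out the procedure, so this is an assumption, and it omits the EM loop that would fit a mixture. All names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_edges(data):
    """Return the edges of a maximum-weight spanning tree over the variables."""
    num_vars = len(data[0])
    weighted = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(num_vars), 2)),
        reverse=True,
    )
    parent = list(range(num_vars))  # union-find forest for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


# Toy data (assumed): columns 0 and 1 are identical, column 2 is independent.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
edges = chow_liu_edges(data)
print(edges)
```

In a mixture of trees, a step of this kind would be run once per mixture component inside each EM iteration, with the counts replaced by posterior-weighted counts.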
Theory refinement of Bayesian networks with hidden variables
In Machine Learning: Proceedings of the International Conference, 1998
Integrating learning from examples into the search for diagnostic policies
Artificial Intelligence, 1998
Cited by 4 (0 self)
This paper studies the problem of learning diagnostic policies from training examples. A diagnostic policy is a complete description of the decision-making actions of a diagnostician (i.e., tests followed by a diagnostic decision) for all possible combinations of test results. An optimal diagnostic policy is one that minimizes the expected total cost, which is the sum of measurement costs and misdiagnosis costs. In most diagnostic settings, there is a tradeoff between these two kinds of costs. This paper formalizes diagnostic decision making as a Markov Decision Process (MDP). The paper introduces a new family of systematic search algorithms based on the AO* algorithm to solve this MDP. To make AO* efficient, the paper describes an admissible heuristic that enables AO* to prune large parts of the search space. The paper also introduces several greedy algorithms, including some improvements over previously published methods. The paper then addresses the question of learning diagnostic policies from examples. When the probabilities of diseases and test results are computed from training data, there is a great danger of overfitting. To reduce overfitting, regularizers are integrated into the search algorithms. Finally, the paper compares the proposed methods on five benchmark diagnostic data sets. The studies show that in most cases the systematic search methods produce better diagnostic policies than the greedy methods. In addition, the studies show that for training sets of realistic size, the systematic search algorithms are practical on today's desktop computers.
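The objective described in this abstract, expected total cost as measurement costs plus misdiagnosis costs, can be made concrete with a small evaluator. This is a sketch under stated assumptions, not the paper's AO*-based search: it only scores a given policy. The policy encoding (nested tuples), the function names, and all costs and probabilities are hypothetical.

```python
def policy_cost(policy, disease, results, test_cost, misdiag_cost):
    """Total cost of following `policy` on one fully specified case.

    A policy is either ("diagnose", label) or
    ("test", test_name, {outcome: sub_policy}).
    """
    total = 0.0
    node = policy
    while node[0] == "test":
        _, test, branches = node
        total += test_cost[test]          # pay for the measurement
        node = branches[results[test]]    # follow the observed outcome
    _, diagnosis = node
    # misdiag_cost is keyed by (predicted diagnosis, true disease).
    return total + misdiag_cost[(diagnosis, disease)]


def expected_total_cost(policy, cases, test_cost, misdiag_cost):
    """Expectation over (probability, disease, test results) triples."""
    return sum(
        p * policy_cost(policy, disease, results, test_cost, misdiag_cost)
        for p, disease, results in cases
    )


# Assumed toy joint distribution over one disease and one binary test "t".
cases = [
    (0.3, "sick", {"t": "pos"}),
    (0.1, "sick", {"t": "neg"}),
    (0.1, "healthy", {"t": "pos"}),
    (0.5, "healthy", {"t": "neg"}),
]
test_cost = {"t": 1.0}
misdiag_cost = {
    ("sick", "sick"): 0.0,
    ("healthy", "healthy"): 0.0,
    ("sick", "healthy"): 5.0,    # false alarm
    ("healthy", "sick"): 20.0,   # missed disease
}

test_first = ("test", "t", {"pos": ("diagnose", "sick"),
                            "neg": ("diagnose", "healthy")})
always_sick = ("diagnose", "sick")
always_healthy = ("diagnose", "healthy")

cost_test_first = expected_total_cost(test_first, cases, test_cost, misdiag_cost)
cost_always_sick = expected_total_cost(always_sick, cases, test_cost, misdiag_cost)
cost_always_healthy = expected_total_cost(always_healthy, cases, test_cost, misdiag_cost)
print(cost_test_first, cost_always_sick, cost_always_healthy)
```

With these made-up numbers, diagnosing "sick" without testing is slightly cheaper than testing first, which illustrates the tradeoff between the two kinds of cost: a test is only worth its price if it reduces expected misdiagnosis cost by more than it costs to run.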