Results 1 - 10
of
20
Toward Optimal Active Learning through Sampling Estimation of Error Reduction
- In Proc. 18th International Conf. on Machine Learning
, 2001
"... This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These other methods are popular because for many learning models, closed form calculation of the expec ..."
Abstract
-
Cited by 183 (2 self)
- Add to MetaCart
This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These other methods are popular because for many learning models, closed form calculation of the expected future error is intractable. Our approach is made feasible by taking a sampling approach to estimating the expected reduction in error due to the labeling of a query. In experimental results on two real-world data sets we reach high accuracy very quickly, sometimes with four times fewer labeled examples than competing methods. 1.
The CEO Problem
- IEEE Trans. Inform. Theory
, 1996
"... automated diagnosis, selfhealing and selfmonitoring systems, statistical induction and ..."
Abstract
-
Cited by 62 (3 self)
- Add to MetaCart
automated diagnosis, selfhealing and selfmonitoring systems, statistical induction and
Ensembles of models for automated diagnosis of system performance problems
- In DSN
, 2005
"... Violations of service level objectives (SLO) in Internet services are urgent conditions requiring immediate attention. Previously we showed [1] that Tree-Augmented Bayesian Networks or TAN models are effective at identifying which low-level system properties were correlated to high-level SLO violati ..."
Abstract
-
Cited by 27 (8 self)
- Add to MetaCart
Violations of service level objectives (SLO) in Internet services are urgent conditions requiring immediate attention. Previously we showed [1] that Tree-Augmented Bayesian Networks or TAN models are effective at identifying which low-level system properties were correlated to high-level SLO violations (the metric attribution problem) under stable workloads. In this paper we extend our approach to adapt to changing workloads and external disturbances by maintaining an ensemble of probabilistic models, adding new models when existing ones do not accurately capture current system behavior. Using realistic workloads on an implemented prototype system, we show that the ensemble of TAN models captures the performance behavior of the system accurately under changing workloads and conditions. We fuse diagnoses from the ensemble of models to identify likely causes of the performance problem, with results comparable to those produced by an oracle that continuously changes the model based on advance knowledge of the workload. The cost of inducing new models and managing the ensembles is negligible, making our approach both immediately practical and theoretically appealing.
Model averaging for prediction with discrete Bayesian networks
- Journal of Machine Learning Research
, 1177
"... In this paper1 we consider the problem of performing Bayesian model-averaging over a class of discrete Bayesian network structures consistent with a partial ordering and with bounded in-degree k. We show that for N nodes this class contains in the worst-case at least Ω ( �N/2�N/2 k) distinct network ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
In this paper1 we consider the problem of performing Bayesian model-averaging over a class of discrete Bayesian network structures consistent with a partial ordering and with bounded in-degree k. We show that for N nodes this class contains in the worst-case at least Ω ( �N/2�N/2 k) distinct network structures, and yet model averaging over these structures can be performed using O ( �N � k · N) operations. Furthermore we show that there exists a single Bayesian network that defines a joint distribution over the variables that is equivalent to model averaging over these structures. Although constructing this network is computationally prohibitive, we show that it can be approximated by a tractable network, allowing approximate model-averaged probability calculations to be performed in O(N) time. Our result also leads to an exact and linear-time solution to the problem of averaging over the 2N possible feature sets in a naïve Bayes model, providing an exact Bayesian solution to the troublesome feature-selection problem for naïve Bayes classifiers. We demonstrate the utility of these techniques in the context of supervised classification, showing empirically that model averaging consistently beats other generative Bayesian-network-based models, even when the generating model is not guaranteed to be a member of the class being averaged over. We characterize the performance over several parameters on simulated and real-world data.
Getting the most out of ensemble selection
- In ICDM ’06: Proceedings of the Sixth International Conference on Data Mining
, 2006
"... We investigate four previously unexplored aspects of ensemble selection, a procedure for building ensembles of classifiers. First we test whether adjusting model predictions to put them on a canonical scale makes the ensembles more effective. Second, we explore the performance of ensemble selection ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
We investigate four previously unexplored aspects of ensemble selection, a procedure for building ensembles of classifiers. First we test whether adjusting model predictions to put them on a canonical scale makes the ensembles more effective. Second, we explore the performance of ensemble selection when different amounts of data are available for ensemble hillclimbing. Third, we quantify the benefit of ensemble selection’s ability to optimize to arbitrary metrics. Fourth, we study the performance impact of pruning the number of models available for ensemble selection. Based on our results we present improved ensemble selection methods that double the benefit of the original method. 1
To Select or To Weigh: A Comparative Study of Linear Combination Schemes for SuperParent-One-Dependence Estimators
"... We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of semi-naive Bayesian classifiers. Altogether 16 model selection and weighing schemes, 58 benchmark data sets, as well as various statistical tests are employed. This p ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of semi-naive Bayesian classifiers. Altogether 16 model selection and weighing schemes, 58 benchmark data sets, as well as various statistical tests are employed. This paper’s main contributions are three-fold. First, it formally presents each scheme’s definition, rationale and time complexity; and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers bias-variance analysis for each scheme’s classification error performance. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms with immediate and significant impact on real-world applications. Another important feature of our study is using a variety of statistical tests to evaluate multiple learning methods across multiple data sets.
Robust bayesian linear classifier ensembles
- Proc. 16th European Conf. Machine Learning, Lecture Notes in Computer Science
, 2005
"... Abstract. Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Usually probabilistic classifiers restrict the set of possible models ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Abstract. Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Usually probabilistic classifiers restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than bayesian model averaging. Linear mixtures over sets of models provide an space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimizition. We provide a nontrivial example of the utility of these two algorithms by applying them for one dependence estimators. We develop the conjugate distribution for one dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. After that we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
Toward optimal active learning through monte carlo estimation of error reduction
- in Proceedings of the International Conference on Machine Learning
, 2001
"... This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These methods are popular because for many learning models, closed form calculation of the expected future ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These methods are popular because for many learning models, closed form calculation of the expected future error is intractable. Our approach is made feasible by taking a Monte Carlo approach to estimating the expected reduction in error due to the labeling of a query. In experimental results on three real-world data sets we reach high accuracy with four times fewer labelled examples than competing methods. 1.
An Ensemble Technique for Stable Learners with Performance
, 2004
"... Ensemble techniques such as bagging and DECORATE exploit the "instability" of learners, such as decision trees, to create a diverse set of models. However their application to stable learners such as nave Bayes, does not yield as much improvement and can sometimes degrade performance a claim we ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Ensemble techniques such as bagging and DECORATE exploit the "instability" of learners, such as decision trees, to create a diverse set of models. However their application to stable learners such as nave Bayes, does not yield as much improvement and can sometimes degrade performance a claim we empirically verify in this paper. But stable learners are desirable for their superior performance on some problems and algorithmic simplicity. We show that many learners commonly referred to as stable have Gaussian posterior distributions. Given such a well defined posterior distribution we can use both parametric and nonparametric bootstrapping to create a process that approximates taking draws from their posterior. By summing the joint distribution of the instance and the class we are approximating posterior model averaging a.k.a. the optimal Bayes classifier (OBC) which is known to minimize the Bayes error. We refer to our approach as bootstrap model averaging. Since model averaging removes the model uncertainty it works best when there is much model uncertainty and does no harm when there is little, a claim we empirically verify. Since the Gaussian distribution is mathematically well understood we can bound the increase over the OBC error using a Chebychev bound as a function of the number of models built. We empirically illustrate our approach's usefulness and verify our bound's correctness.
A.: The Bayesian Decision Tree Technique with a Sweeping Strategy
- In: Advances in Intelligent Systems - Theory and Applications, In cooperation with the IEEE Computer Society
, 2004
"... Abstract--The uncertainty of classification outcomes is of crucial importance for many safety critical applications including, for example, medical diagnostics. In such applications the uncertainty of classification can be reliably estimated within a Bayesian model averaging technique that allows th ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract--The uncertainty of classification outcomes is of crucial importance for many safety critical applications including, for example, medical diagnostics. In such applications the uncertainty of classification can be reliably estimated within a Bayesian model averaging technique that allows the use of prior information. Decision Tree (DT) classification models used within such a technique gives experts additional information by making this classification scheme observable. The use of the Markov Chain Monte Carlo (MCMC) methodology of stochastic sampling makes the Bayesian DT technique feasible to perform. However, in practice, the MCMC technique may become stuck in a particular DT which is far away from a region with a maximal posterior. Sampling such DTs causes bias in the posterior estimates, and as a result the evaluation of classification uncertainty may be incorrect. In a particular case, the negative effect of such sampling may be reduced by giving additional prior information on the shape of DTs. In this paper we describe a new approach based on sweeping the DTs without additional priors on the favorite shape of DTs. The performances of Bayesian DT techniques with the standard and sweeping strategies are compared on a synthetic data as well as on real datasets. Quantitatively evaluating the uncertainty in terms of entropy of class posterior probabilities, we found that the sweeping strategy is superior to the standard strategy.

