Results 1-10 of 117
Cost-Sensitive Learning by Cost-Proportionate Example Weighting
, 2003
Cited by 105 (13 self)
We propose and evaluate a family of methods for converting classifier learning algorithms and classification theory into cost-sensitive algorithms and theory. The proposed conversion is based on cost-proportionate weighting of the training examples, which can be realized either by feeding the weights to the classification algorithm (as often done in boosting), or by careful subsampling. We give some theoretical performance guarantees on the proposed methods, as well as empirical evidence that they are practical alternatives to existing approaches. In particular, we propose costing, a method based on cost-proportionate rejection sampling and ensemble aggregation, which achieves excellent predictive performance on two publicly available datasets, while drastically reducing the computation required by other methods.
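The abstract above names cost-proportionate rejection sampling but does not spell out the procedure. The following is a minimal sketch under one common reading: each training example is kept independently with probability cost / z, where z is an upper bound on the costs. The function name, the `z` parameter, and the default of using the largest observed cost are assumptions made for illustration, not details taken from the abstract.

```python
import random


def cost_proportionate_sample(examples, costs, z=None, rng=None):
    """Keep each example independently with probability cost / z.

    Assumption: z is any upper bound on the costs; by default the
    largest observed cost is used.
    """
    if len(examples) != len(costs):
        raise ValueError("examples and costs must have the same length")
    rng = rng or random.Random(0)
    z = max(costs) if z is None else z
    if z <= 0:
        raise ValueError("z must be positive")
    # rng.random() lies in [0, 1), so cost == z is always kept
    # and cost == 0 is never kept.
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]
```

An ensemble in the spirit of the abstract could draw several such samples, train one classifier per sample with an unmodified learner, and average their predictions. That aggregation step is not shown here.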
Incremental Online Learning in High Dimensions
 Neural Computation
, 2005
Cited by 104 (15 self)
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, iii) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of (possibly redundant) inputs, as shown in various empirical evaluations with up to 90-dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high dimensional spaces.
Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule
, 2003
Cited by 103 (4 self)
Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
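The nested loop with random ordering and pruning described in the abstract above can be sketched roughly as follows. The abstract does not fix the outlier definition or the exact pruning rule, so this sketch assumes that an example's score is its distance to its k-th nearest neighbor, that the n highest-scoring examples are wanted, and that an example is abandoned as soon as its running k-th nearest distance drops below the weakest score in the current top n. The function name and parameters are hypothetical.

```python
import heapq
import math
import random


def top_outliers(points, k, n, seed=0):
    """Return up to n (score, index) pairs, highest score first.

    Assumed score: distance to the k-th nearest neighbor.
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random order drives the pruning
    top = []      # min-heap of (score, index) for the best n so far
    cutoff = 0.0  # weakest score currently in the top n
    for i in order:
        nearest = []  # negated distances: a max-heap of the k smallest
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # The running k-th nearest distance only shrinks, so once it
            # is below the cutoff this example cannot reach the top n.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        heapq.heappush(top, (-nearest[0], i))
        if len(top) > n:
            heapq.heappop(top)
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)
```

For most non-outliers the inner loop stops after a handful of comparisons once the cutoff has grown, which is the intuition behind the near linear behavior reported in the abstract. The worst case remains quadratic.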
Learning and Making Decisions When Costs and Probabilities are Both Unknown
 In Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining
, 2001
Cited by 96 (9 self)
In many machine learning domains, misclassification costs are different for different examples, in the same way that class membership probabilities are example-dependent. In these domains, both costs and probabilities are unknown for test examples, so both cost estimators and probability estimators must be learned. This paper first discusses how to make optimal decisions given cost and probability estimates, and then presents decision tree learning methods for obtaining well-calibrated probability estimates. The paper then explains how to obtain unbiased estimators for example-dependent costs, taking into account the difficulty that in general, probabilities and costs are not independent random variables, and the training examples for which costs are known are not representative of all examples. The latter problem is called sample selection bias in econometrics. Our solution to it is based on Nobel prize-winning work due to the economist James Heckman. We show that the methods we propose are s...
Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers
 In Proceedings of the Eighteenth International Conference on Machine Learning
, 2001
Cited by 95 (4 self)
Accurate, well-calibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a cost-sensitive decision must be made about examples with example-dependent costs. This paper presents simple but successful methods for obtaining calibrated probability estimates from decision tree and naive Bayesian classifiers. Using the large and challenging KDD'98 contest dataset as a testbed, we report the results of a detailed experimental comparison of ten methods, according to four evaluation measures. We conclude that binning succeeds in significantly improving naive Bayesian probability estimates, while for improving decision tree probability estimates, we recommend smoothing by m-estimation and a new variant of pruning that we call curtailment.
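The abstract above reports that binning improves naive Bayesian probability estimates without describing the procedure. One plausible reading, sketched here, sorts the training scores, splits them into bins of roughly equal size, and uses the fraction of positive labels in each bin as the calibrated probability. The equal-frequency split, the function names, and the handling of scores beyond the last bin are assumptions made for illustration.

```python
from bisect import bisect_left


def fit_bins(scores, labels, n_bins):
    """Return (upper_edges, positive_rates) for equal-frequency bins.

    Assumption: labels are 0 or 1 and bins hold roughly equal counts.
    """
    if n_bins < 1 or not scores:
        raise ValueError("need at least one bin and one example")
    pairs = sorted(zip(scores, labels))
    size = len(pairs) / n_bins
    edges, rates = [], []
    for b in range(n_bins):
        chunk = pairs[int(b * size):int((b + 1) * size)]
        if not chunk:
            continue  # more bins than examples
        edges.append(chunk[-1][0])  # largest score in the bin
        rates.append(sum(label for _, label in chunk) / len(chunk))
    return edges, rates


def calibrate(score, edges, rates):
    """Map a raw score to the positive rate of the bin it falls in."""
    # Scores above the last edge are assigned to the last bin.
    idx = min(bisect_left(edges, score), len(rates) - 1)
    return rates[idx]
```

A usage sketch: call `fit_bins` on held-out classifier scores with their true labels, then pass each new score through `calibrate` to obtain a probability estimate.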
Learning and Evaluating Classifiers under Sample Selection Bias
 In International Conference on Machine Learning ICML’04
, 2004
Cited by 79 (2 self)
Classifier learning methods commonly assume that the training data consist of randomly drawn examples from the same distribution as the test examples about which the learned model is expected to make predictions.
Transforming Classifier Scores into Accurate Multiclass Probability Estimates
, 2002
Cited by 67 (5 self)
Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs of other classifiers, or domain knowledge. Previous calibration methods apply only to two-class problems. Here, we show how to obtain accurate probability estimates for multiclass problems by combining calibrated binary probability estimates. We also propose a new method for obtaining calibrated two-class probability estimates that can be applied to any classifier that produces a ranking of examples. Using naive Bayes and support vector machine classifiers, we give experimental results from a variety of two-class and multiclass domains, including direct marketing, text categorization and digit recognition.
Pattern Extraction for Time Series Classification
, 2001
Cited by 53 (2 self)
In this paper, we propose some new tools to allow machine learning classifiers to cope with time series data. We first argue that many time-series classification problems can be solved by detecting and combining local properties or patterns in time series. Then, a technique is proposed to find patterns which are useful for classification. These patterns are combined to build interpretable classification rules. Experiments, carried out on several artificial and real problems, highlight the interest of the approach both in terms of interpretability and accuracy of the induced classifiers.
Experimental Comparisons of Online and Batch Versions of Bagging and Boosting
 In ACM SIGKDD
Cited by 37 (0 self)
Bagging and boosting are well-known ensemble learning methods. They combine multiple learned base models with the aim of improving generalization performance. To date, they have been used primarily in batch mode, i.e., they require multiple passes through the training data. In previous work, we presented online bagging and boosting algorithms that only require one pass through the training data and presented experimental results on some relatively small datasets. Through additional experiments on a variety of larger synthetic and real datasets, this paper demonstrates that our online versions perform comparably to their batch counterparts in terms of classification accuracy. We also demonstrate the substantial reduction in running time we obtain with our online algorithms because they require fewer passes through the training data.
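The abstract above does not describe how the single-pass bagging works. A widely used formulation, assumed here rather than taken from the abstract, replaces bootstrap resampling by showing each arriving example to every base model k times, with k drawn from a Poisson(1) distribution. The `NearestMean` base learner and all names below are hypothetical stand-ins chosen to keep the sketch self-contained.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw from Poisson(1) with Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class NearestMean:
    """Toy incremental learner: predicts the class with the closest mean."""

    def __init__(self):
        self.sums = Counter()
        self.counts = Counter()

    def update(self, x, y):
        self.sums[y] += x
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None  # no data seen yet
        return min(self.counts,
                   key=lambda y: abs(x - self.sums[y] / self.counts[y]))


class OnlineBagger:
    """Single-pass bagging: Poisson(1) repeats stand in for resampling."""

    def __init__(self, make_model, n_models, seed=0):
        self.rng = random.Random(seed)
        self.models = [make_model() for _ in range(n_models)]

    def update(self, x, y):
        for model in self.models:
            for _ in range(poisson1(self.rng)):
                model.update(x, y)

    def predict(self, x):
        votes = Counter(model.predict(x) for model in self.models)
        return votes.most_common(1)[0][0]
```

Each example is processed once and then discarded, so memory use does not grow with the stream. Any learner exposing `update` and `predict` in this shape could replace the toy base model.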