Results 11  20
of
404
Mining ConceptDrifting Data Streams Using Ensemble Classifiers
, 2003
"... Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications including credit card fraud protection, target marketing, network intrusion detection, etc. Conventional knowledge discovery tools are facing two ch ..."
Abstract

Cited by 176 (31 self)
 Add to MetaCart
Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications including credit card fraud protection, target marketing, network intrusion detection, etc. Conventional knowledge discovery tools are facing two challenges, the overwhelming volume of the streaming data, and the concept drifts. In this paper, we propose a general framework for mining conceptdrifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the timeevolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. Our empirical study shows that the proposed methods have substantial advantage over singleclassifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.
Prediction Games and Arcing Algorithms
, 1997
"... The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund and Schapire [1995].[1996]) and others in reducing generalization error has not been well understood. By formulating prediction, both classification and regression, as a game where one pl ..."
Abstract

Cited by 138 (0 self)
 Add to MetaCart
The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund and Schapire [1995].[1996]) and others in reducing generalization error has not been well understood. By formulating prediction, both classification and regression, as a game where one player makes a selection from instances in the training set and the other a convex linear combination of predictors from a finite set, existing arcing algorithms are shown to be algorithms for finding good game strategies. An optimal game strategy finds a combined predictor that minimizes the maximum of the error over the training set. A bound on the generalization error for the combined predictors in terms of their maximum error is proven that is sharper than bounds to date. Arcing algorithms are described that converge to the optimal strategy. Schapire et.al. [1997] offered an explanation of why Adaboost works in terms of its ability to reduce the margin. Comparing Adaboost to our optimal ar...
Tree Induction for Probabilitybased Ranking
, 2002
"... Tree induction is one of the most effective and widely used methods for building classification models. However, many applications require cases to be ranked by the probability of class membership. Probability estimation trees (PETs) have the same attractive features as classification trees (e.g., c ..."
Abstract

Cited by 130 (4 self)
 Add to MetaCart
Tree induction is one of the most effective and widely used methods for building classification models. However, many applications require cases to be ranked by the probability of class membership. Probability estimation trees (PETs) have the same attractive features as classification trees (e.g., comprehensibility, accuracy and efficiency in high dimensions and on large data sets). Unfortunately, decision trees have been found to provide poor probability estimates. Several techniques have been proposed to build more accurate PETs, but, to our knowledge, there has not been a systematic experimental analysis of which techniques actually improve the probabilitybased rankings, and by how much. In this paper we first discuss why the decisiontree representation is not intrinsically inadequate for probability estimation. Inaccurate probabilities are partially the result of decisiontree induction algorithms that focus on maximizing classification accuracy and minimizing tree size (for example via reducederror pruning). Larger trees can be better for probability estimation, even if the extra size is superfluous for accuracy maximization. We then present the results of a comprehensive set of experiments, testing some straghtforward methods for improving probabilitybased rankings. We show that using a simple, common smoothing methodthe Laplace correctionuniformly improves probabilitybased rankings. In addition, bagging substantioJly improves the rankings, and is even more effective for this purpose than for improving accuracy. We conclude that PETs, with these simple modifications, should be considered when rankings based on classmembership probability are required.
Boosting Algorithms as Gradient Descent
, 2000
"... Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier h ..."
Abstract

Cited by 115 (2 self)
 Add to MetaCart
Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinations of functions that minimize arbitrary cost functionals (i.e functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of this abstract algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
"... ..."
Learning when Training Data are Costly: The Effect of Class Distribution on Tree Induction
, 2002
"... For large, realworld inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the data and/or the computational costs associated with learning from the data. One question of practical importance is: if n ..."
Abstract

Cited by 109 (9 self)
 Add to MetaCart
For large, realworld inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the data and/or the computational costs associated with learning from the data. One question of practical importance is: if n training examples are going to be selected, in what proportion should the classes be represented? In this article we analyze the relationship between the marginal class distribution of training data and the performance of classification trees induced from these data, when the size of the training set is fixed. We study twentysix data sets and, for each, determine the best class distribution for learning. Our results show that, for a fixed number of training examples, it is often possible to obtain improved classifier performance by training with a class distribution other than the naturally occurring class distribution. For example, we show that to build a classifier robust to different misclassification costs, a balanced class distribution generally performs quite well. We also describe and evaluate a budgetsensitive progressivesampling algorithm that selects training examples such that the resulting training set has a good (nearoptimal) class distribution for learning.
Linear programming boosting via column generation
 Machine Learning
, 2002
"... 1 Introduction Recent papers [20] have shown that boosting, arcing, and related ensemble methods (hereafter summarized asboosting) can be viewed as margin maximization in function space. By changing the cost function, different ..."
Abstract

Cited by 101 (3 self)
 Add to MetaCart
1 Introduction Recent papers [20] have shown that boosting, arcing, and related ensemble methods (hereafter summarized asboosting) can be viewed as margin maximization in function space. By changing the cost function, different
Fast discriminative visual codebooks using randomized clustering forests
 In NIPS
, 2007
"... Some of the most effective recent methods for contentbased image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as kmeans vector quantization, accumulating histograms of the resulting “visual word ” codes over the image, a ..."
Abstract

Cited by 101 (4 self)
 Add to MetaCart
Some of the most effective recent methods for contentbased image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as kmeans vector quantization, accumulating histograms of the resulting “visual word ” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using kmeans. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several stateoftheart image classification tasks. 1
Boosting in the limit: Maximizing the margin of learned ensembles
 In Proceedings of the Fifteenth National Conference on Artificial Intelligence
, 1998
"... The "minimum margin" of an ensemble classifier on a given training set is, roughly speaking, the smallest vote it gives to any correct training label. Recent work has shown that the Adaboost algorithm is particularly effective at producing ensembles with large minimum margins, and theory suggests th ..."
Abstract

Cited by 97 (0 self)
 Add to MetaCart
The "minimum margin" of an ensemble classifier on a given training set is, roughly speaking, the smallest vote it gives to any correct training label. Recent work has shown that the Adaboost algorithm is particularly effective at producing ensembles with large minimum margins, and theory suggests that this may account for its success at reducing generalization error. We note, however, that the problem of finding good margins is closely related to linear programming, and we use this connection to derive and test new "LPboosting" algorithms that achieve better minimum margins than Adaboost. However, these algorithms do not always yield better generalization performance. In fact, more often the opposite is true. We report on a series of controlled experiments which show that no simple version of the minimummargin story can be complete. We conclude that the crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open. Some of our ...