Results 1 - 10
of
70
Bagging, Boosting, and C4.5
- In Proceedings of the Thirteenth National Conference on Artificial Intelligence
, 1996
"... Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weight ..."
Abstract
-
Cited by 251 (1 self)
- Add to MetaCart
Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weights of training instances. This paper reports results of applying both techniques to a system that learns decision trees and testing on a representative collection of datasets. While both approaches substantially improve predictive accuracy, boosting shows the greater benefit. On the other hand, boosting also produces severe degradation on some datasets. A small change to the way that boosting combines the votes of learned classifiers reduces this downside and also leads to slightly better results on most of the datasets considered. Introduction Designers of empirical machine learning systems are concerned with such issues as the computational cost of the learning method and the accuracy and ...
JAM: Java Agents for Meta-Learning over Distributed Databases
- In Proc. 3rd Intl. Conf. Knowledge Discovery and Data Mining
, 1997
"... In this paper, we describe the JAM system, a distributed, scalable and portable agent-based data mining system that employs a general approach to scaling data mining applications that we have come to call meta-learning. JAM provides a set of learning programs, implemented either as JAVA applets or a ..."
Abstract
-
Cited by 115 (23 self)
- Add to MetaCart
In this paper, we describe the JAM system, a distributed, scalable and portable agent-based data mining system that employs a general approach to scaling data mining applications that we have come to call meta-learning. JAM provides a set of learning programs, implemented either as JAVA applets or applications, that compute models over data stored locally at a site. JAM also provides a set of meta-learning agents for combining multiple models that were learned (perhaps) at different sites. It employs a special distribution mechanism which allows the migration of the derived models or classifier agents to other remote sites. We describe the overall architecture of the JAM system and the specific implementation currently under development at Columbia University. One of JAM's target applications is fraud and intrusion detection in financial information systems. A brief description of this learning task and JAM's applicability are also described. Interested users may download JAM from http...
Meta-Learning in Distributed Data Mining Systems: Issues and Approaches
- Advances of Distributed Data Mining
, 2000
"... Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach to this objective is to apply various machine learning algorithms to compute descriptive models of the available data. Here, we explore one of the main challeng ..."
Abstract
-
Cited by 71 (0 self)
- Add to MetaCart
Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach to this objective is to apply various machine learning algorithms to compute descriptive models of the available data. Here, we explore one of the main challenges in this research area, the development of techniques that scale up to large and possibly physically distributed databases. Meta-learning is a technique that seeks to compute higher-level classifiers (or classification models), called meta-classifiers, that integrate in some principled fashion multiple classifiers computed separately over different databases. This study, describes meta-learning and presents the JAM system (Java Agents for Meta-learning), an agent-based meta-learning system for large-scale data mining applications. Specifically, it identifies and addresses several important desiderata for distributed data mining systems that stem from their additional complexity co...
Issues in Stacked Generalization
- Journal of Artificial Intelligence Research
, 1999
"... Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked gener ..."
Abstract
-
Cited by 71 (1 self)
- Add to MetaCart
Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones.
Improving data driven wordclass tagging by system combination
, 1998
"... In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best indi-vidua | system. We do this by means of an ex-periment involving the task of morpho-syntactic wordclass tagging. ..."
Abstract
-
Cited by 58 (8 self)
- Add to MetaCart
In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best indi-vidua | system. We do this by means of an ex-periment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger gen-erators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy)
Linear and Order Statistics Combiners for Pattern Classification
- Combining Artificial Neural Nets
, 1999
"... Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification resul ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that to a first order approximation, the error rate obtained over and above the Bayes error rate, is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
Knowledge Acquisition from Examples Via Multiple Models
- In Proceedings of the Fourteenth International Conference on Machine Learning
, 1997
"... If it is to qualify as knowledge, a learner's output should be accurate, stable and comprehensible. Learning multiple models can improve significantly on the accuracy and stability of single models, but at the cost of losing their comprehensibility (when they possess it, as do, for example, simple d ..."
Abstract
-
Cited by 51 (7 self)
- Add to MetaCart
If it is to qualify as knowledge, a learner's output should be accurate, stable and comprehensible. Learning multiple models can improve significantly on the accuracy and stability of single models, but at the cost of losing their comprehensibility (when they possess it, as do, for example, simple decision trees and rule sets). This paper proposes and evaluates CMM, a meta-learner that seeks to retain most of the accuracy gains of multiple model approaches, while still producing a single comprehensible model. CMM is based on reapplying the base learner to recover the frontiers implicit in the multiple model ensemble. This is done by giving the base learner a new training set, composed of a large number of examples generated and classified according to the ensemble, plus the original examples. CMM is evaluated using C4.5RULES as the base learner, and bagging as the multiple-model methodology. On 26 benchmark datasets, CMM retains on average 60% of the accuracy gains obtained by bagging ...
A Non-Invasive Learning Approach to Building Web User Profiles
, 1999
"... Introduction Recently researchers have started to make web browsers more adaptive and personalized. A personalized web browser caters to the user's interests and an adaptive one learns from the users' (potentially changing) access behavior. The goal is to help the user navigate the web. Lieberman's ..."
Abstract
-
Cited by 46 (4 self)
- Add to MetaCart
Introduction Recently researchers have started to make web browsers more adaptive and personalized. A personalized web browser caters to the user's interests and an adaptive one learns from the users' (potentially changing) access behavior. The goal is to help the user navigate the web. Lieberman's Letizia [13] monitors the user's browsing behavior, develops a user profile, and searches for potentially interesting pages for recommendations. The user profile is developed without intervention from the user (but the details of how that is performed is not clear in [13]). While the user is reading a page, Letizia searches, in a breadth-first manner, from that location, pages that could be of interest to the user. Pazzani et al.'s Syskill & Webert [18, 19] asks the user to rank pages in a specific topic. Based on the content and ratings of pages, the system learns a user profile that predicts if pages are of interest to th
Using Correspondence Analysis to Combine Classifiers
- Machine Learning
, 1998
"... . Several effective methods have been developed recently for improving predictive performance by generating and combining multiple learned models. The general approach is to create a set of learned models either by applying an algorithm repeatedly to different versions of the training data, or by ap ..."
Abstract
-
Cited by 44 (0 self)
- Add to MetaCart
. Several effective methods have been developed recently for improving predictive performance by generating and combining multiple learned models. The general approach is to create a set of learned models either by applying an algorithm repeatedly to different versions of the training data, or by applying different learning algorithms to the same data. The predictions of the models are then combined according to a voting scheme. This paper focuses on the task of combining the predictions of a set of learned models. The method described uses the strategies of stacking and Correspondence Analysis to model the relationship between the learning examples and their classification by a collection of learned models. A nearest neighbor method is then applied within the resulting representation to classify previously unseen examples. The new algorithm does not perform worse than, and frequently performs significantly better than other combining techniques on a suite of data sets. Keywords: Clas...
An Extensible Meta-Learning Approach for Scalable and Accurate Inductive Learning
, 1996
"... Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Som ..."
Abstract
-
Cited by 42 (8 self)
- Add to MetaCart
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data, especially for applications in data mining. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. Moreover, data can be inherently distributed across multiple sites on the network and merging all the data in one location can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a meta-learning approach to integrating the results of mul...

