Results 1 - 10 of 2,288

An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization

by Thomas G. Dietterich - Machine Learning, 2000
"... Abstract. Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base ” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approac ..."
Abstract - Cited by 610 (6 self)
of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than
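The bagging procedure this entry studies — draw bootstrap replicates of the training data, run the base learner on each, and combine by plurality vote — can be sketched in a few lines. This is a toy illustration only: the 1-D threshold "stump" learner below is an invented stand-in for C4.5, and the data are made up.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw a bootstrap replicate: len(data) points sampled with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(data):
    """Toy 1-D base learner: threshold halfway between the class means.
    (An illustrative stand-in for C4.5, not the paper's learner.)"""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    if not xs0 or not xs1:          # degenerate replicate with one class:
        label = data[0][1]          # predict that class constantly
        return lambda x: label
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    thresh, flip = (m0 + m1) / 2, m1 < m0
    return lambda x: int((x > thresh) != flip)

def bagged_predict(stumps, x):
    """Plurality vote over the ensemble members."""
    return Counter(s(x) for s in stumps).most_common(1)[0][0]

rng = random.Random(0)
data = [(x, 0) for x in (1.0, 1.2, 0.8, 1.1)] + [(x, 1) for x in (3.0, 3.2, 2.8, 3.1)]
stumps = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagged_predict(stumps, 0.9), bagged_predict(stumps, 3.0))  # -> 0 1
```

Randomization, the third method compared, instead perturbs the base learner's internal split choices and needs no resampling at all.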

An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.

by Eric Bauer, Ron Kohavi - Machine Learning, 1999
"... Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several vari ..."
Abstract - Cited by 707 (2 self)
variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias

Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning

by Usama M. Fayyad, Keki B. Irani - IJCAI, 1993
"... Abstract Since most real-world applications of classification learning involve continuous-valued attributes, properly addressing the discretization process is an important problem. This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued a ..."
Abstract - Cited by 832 (7 self)
formally derive a criterion based on the minimum description length principle for deciding the partitioning of intervals. We demonstrate via empirical evaluation on several real-world data sets that better decision trees are obtained using the new multi-interval algorithm.
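The entropy-minimization heuristic the snippet refers to picks, among all candidate boundaries, the cut that minimizes the weighted class-information entropy of the two induced intervals; the full method then recurses on each interval and stops via the MDL criterion. A minimal sketch of the single-cut step (function names and data are mine, not the paper's; the recursion and MDL stopping test are omitted):

```python
import math

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_cut(points):
    """Find the boundary minimizing the weighted entropy of the two
    induced intervals, over (value, label) pairs."""
    points = sorted(points)
    n = len(points)
    best = None
    for i in range(1, n):
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        cut = (points[i - 1][0] + points[i][0]) / 2
        if best is None or e < best[0]:
            best = (e, cut)
    return best[1]

data = [(0.5, 'a'), (1.0, 'a'), (1.5, 'a'), (4.0, 'b'), (4.5, 'b'), (5.0, 'b')]
print(best_cut(data))  # midpoint between 1.5 and 4.0 -> 2.75
```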

Support vector machines for spam categorization

by Harris Drucker, Donghui Wu, Vladimir N. Vapnik - IEEE Transactions on Neural Networks, 1999
"... We study the use of support vector machines (SVM’s) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features ..."
Abstract - Cited by 342 (2 self)
We study the use of support vector machines (SVM’s) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number
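For flavor, here is a minimal linear classifier over bag-of-words features, trained with Pegasos-style sub-gradient steps on the SVM hinge loss. This is a stdlib-only stand-in for the SVM training the paper benchmarks, not their implementation; the vocabulary and toy messages are invented.

```python
import random

def featurize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent
    on the hinge loss; labels are +1 (spam) / -1 (non-spam)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # random pass order
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]    # regularization shrink
            if margin < 1:                            # hinge-loss step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

vocab = ["free", "winner", "cash", "meeting", "report", "project"]
train = [("free cash winner", 1), ("winner free prize", 1),
         ("project meeting report", -1), ("report for the project", -1)]
X = [featurize(t, vocab) for t, _ in train]
y = [lab for _, lab in train]
w = train_linear_svm(X, y)
score = sum(wj * xj for wj, xj in zip(w, featurize("free cash offer", vocab)))
print("spam" if score > 0 else "nonspam")  # -> spam
```

The paper's actual experiments compare feature weightings (binary vs. TF) and feature-set sizes as well, which this sketch ignores.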

Database Mining: A Performance Perspective

by Rakesh Agrawal, Tomasz Imielinski, Arun Swami - IEEE Transactions on Knowledge and Data Engineering, 1993
"... We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be unifor ..."
Abstract - Cited by 345 (13 self)
, classification, associations, sequences, decision trees Current address: Computer Science De...

SPRINT: A scalable parallel classifier for data mining

by John Shafer, Rakesh Agrawal, Manish Mehta, 1996
"... Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. ..."
Abstract - Cited by 312 (8 self)
. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the memory restrictions and is fast and scalable. The algorithm has also been designed to be easily parallelized, allowing many processors to work together to build a single consistent model
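SPRINT's per-attribute work amounts to a scan like the following: walk a pre-sorted (value, class) attribute list once, maintaining class counts on each side of the candidate cut, and score each cut with the gini index. This in-memory sketch (names and data invented) only illustrates that one scan; SPRINT's contribution is streaming such lists from disk and sharding them across processors.

```python
def gini(counts):
    """Gini index of a class-count vector; 0 means a pure node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_gini_split(attr_list, n_classes=2):
    """One scan over a pre-sorted (value, class_id) attribute list,
    updating running class counts below/above each candidate cut."""
    below = [0] * n_classes
    above = [0] * n_classes
    for _, c in attr_list:
        above[c] += 1
    n = len(attr_list)
    best = (float("inf"), None)
    for i in range(n - 1):
        _, c = attr_list[i]
        below[c] += 1           # move one record across the cut
        above[c] -= 1
        split_gini = ((i + 1) * gini(below) + (n - i - 1) * gini(above)) / n
        cut = (attr_list[i][0] + attr_list[i + 1][0]) / 2
        if split_gini < best[0]:
            best = (split_gini, cut)
    return best

attr_list = [(17, 0), (20, 0), (23, 0), (32, 1), (43, 1), (68, 1)]
print(best_gini_split(attr_list))  # -> (0.0, 27.5): a pure split at 27.5
```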

Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey

by Sreerama K. Murthy - Data Mining and Knowledge Discovery, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract - Cited by 224 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial

Popular ensemble methods: an empirical study

by David Opitz, Richard Maclin - Journal of Artificial Intelligence Research, 1999
"... An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Baggi ..."
Abstract - Cited by 296 (4 self)
. Bagging (Breiman, 1996c) and Boosting (Freund & Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithm. Our results clearly
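Where bagging resamples the data independently each round, boosting reweights the training set toward the examples the previous weak learner misclassified. A compact AdaBoost.M1 sketch over a fixed pool of threshold stumps (the 1-D data and stump pool are invented; real boosting retrains the base learner on the reweighted data rather than selecting from a pool):

```python
import math

def train_adaboost(points, stumps, rounds=10):
    """AdaBoost.M1: labels are +1/-1; each round picks the stump with
    lowest weighted error, then upweights the examples it got wrong."""
    n = len(points)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        errs = [sum(wi for wi, (x, y) in zip(w, points) if s(x) != y)
                for s in stumps]
        best = min(range(len(stumps)), key=lambda i: errs[i])
        err = max(errs[best], 1e-10)
        if err >= 0.5:                       # weak learner no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stumps[best]))
        # upweight mistakes, downweight correct answers, renormalize
        w = [wi * math.exp(-alpha * y * stumps[best](x))
             for wi, (x, y) in zip(w, points)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    return 1 if sum(a * s(x) for a, s in ensemble) > 0 else -1

# 1-D data no single threshold separates, but a weighted vote does
points = [(0, -1), (1, 1), (2, 1), (3, -1)]
stumps = [lambda x, t=t: 1 if x > t else -1 for t in (-0.5, 0.5, 1.5, 2.5)] \
       + [lambda x, t=t: -1 if x > t else 1 for t in (-0.5, 0.5, 1.5, 2.5)]
ens = train_adaboost(points, stumps)
print([predict(ens, x) for x, _ in points])  # -> [-1, 1, 1, -1]
```

No single stump classifies this "interval" concept correctly, but three boosted stumps do, which is the kind of accuracy gain the paper measures empirically.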

SLIQ: A Fast Scalable Classifier for Data Mining

by Manish Mehta, Rakesh Agrawal, Jorma Rissanen, 1996
"... . Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This pap ..."
Abstract - Cited by 240 (9 self)
is integrated with a breadth-first tree growing strategy to enable classification of disk-resident datasets. SLIQ also uses a new tree-pruning algorithm that is inexpensive, and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale for large data sets and classify data

Exemplar-based accounts of relations between classification, recognition, and typicality

by Robert M. Nosofsky - Journal of Experimental Psychology: Learning, Memory, and Cognition, 1988
"... Previously published sets of classification and old-new recognition memory data are reanalyzed within the framework of an exemplar-based generalization model. The key assumption in the model is that, whereas classification decisions are based on the similarity of a probe to exemplars of a target cat ..."
Abstract - Cited by 179 (15 self)
Previously published sets of classification and old-new recognition memory data are reanalyzed within the framework of an exemplar-based generalization model. The key assumption in the model is that, whereas classification decisions are based on the similarity of a probe to exemplars of a target
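The exemplar-based rule the snippet describes can be made concrete: a probe's classification probability is its summed similarity to each category's stored exemplars, normalized across categories, with similarity decaying exponentially in psychological distance. A minimal sketch in that spirit (stimulus coordinates and category names are invented; the paper's full model also handles recognition via summed similarity against a criterion):

```python
import math

def similarity(probe, exemplar, c=1.0):
    """Exponential-decay similarity over city-block psychological distance."""
    d = sum(abs(p - e) for p, e in zip(probe, exemplar))
    return math.exp(-c * d)

def classify(probe, categories):
    """P(category | probe): summed similarity to each category's stored
    exemplars, normalized over categories. `categories` maps
    name -> list of exemplar coordinate tuples."""
    sums = {name: sum(similarity(probe, ex) for ex in exs)
            for name, exs in categories.items()}
    total = sum(sums.values())
    return {name: s / total for name, s in sums.items()}

cats = {"A": [(1.0, 1.0), (1.2, 0.8)], "B": [(3.0, 3.0), (2.8, 3.2)]}
probs = classify((1.1, 0.9), cats)
print(max(probs, key=probs.get))  # -> A
```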
