Results 1–10 of 10
The Power of Decision Tables
 Proceedings of the European Conference on Machine Learning
, 1995
Abstract

Cited by 100 (5 self)
We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and usually they are easy to understand. Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm inducing decision tables, can sometimes outperform state-of-the-art algorithms such as C4.5. Surprisingly, performance is quite good on some datasets with continuous features, indicating that many datasets used in machine learning either do not require these features, or that these features have few values. We also describe an incremental method for performing cross-validation that is applicable to incremental learning algorithms including IDTM. Using incremental cross-validation, it is possible to cross-validate a given dataset and IDTM in time that is linear in the number of instances, the number of features, and the number of label values. The time for incre...
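The incremental cross-validation idea above can be sketched as follows. This is a hedged toy, not Kohavi's IDTM: the dict-of-Counters table and the `incremental_loocv` name are illustrative. Because deleting and re-adding one instance only touches one table cell, a full leave-one-out pass stays linear in instances, features, and label values.

```python
# Toy decision-table learner with incremental leave-one-out CV.
# A cell maps a tuple of discrete feature values to per-label counts.
from collections import Counter, defaultdict

def incremental_loocv(instances):
    """instances: list of (feature_tuple, label); returns LOOCV accuracy."""
    table = defaultdict(Counter)
    overall = Counter()
    for x, y in instances:                 # single training pass
        table[x][y] += 1
        overall[y] += 1
    correct = 0
    for x, y in instances:
        table[x][y] -= 1                   # remove the instance in O(1)
        overall[y] -= 1
        if sum(table[x].values()) > 0:     # majority label within the cell
            pred = table[x].most_common(1)[0][0]
        else:                              # empty cell: fall back to global majority
            pred = overall.most_common(1)[0][0]
        correct += pred == y
        table[x][y] += 1                   # re-add: no retraining from scratch
        overall[y] += 1
    return correct / len(instances)
```

For example, three instances of cell `("a",)` labelled 0 and one labelled 1 give a LOOCV accuracy of 0.75: each 0 is predicted from the remaining {0: 2, 1: 1}, while the lone 1 is outvoted.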
Spike and slab variable selection: frequentist and Bayesian strategies
 The Annals of Statistics
Abstract

Cited by 40 (7 self)
Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization. Several model selection strategies, some frequentist and some Bayesian in nature, are developed and studied theoretically. We demonstrate the importance of selective shrinkage for effective variable selection in terms of risk misclassification, and show this is achieved using the posterior from a rescaled spike and slab model. We also show how to verify a procedure’s ability to reduce model uncertainty in finite samples using a specialized forward selection strategy. Using this tool, we illustrate the effectiveness of rescaled spike and slab models in reducing model uncertainty.
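For orientation, a generic spike-and-slab hierarchy (the textbook form, not the paper's rescaled variant; the symbols $\tau^2$ and $w$ are illustrative) places a point mass at zero alongside a diffuse slab:

```latex
\begin{aligned}
y \mid X, \beta, \sigma^2 &\sim \mathcal{N}(X\beta,\, \sigma^2 I),\\
\beta_j \mid \gamma_j &\sim (1-\gamma_j)\,\delta_0 + \gamma_j\,\mathcal{N}(0, \tau^2),
  \qquad j = 1, \dots, p,\\
\gamma_j &\sim \mathrm{Bernoulli}(w).
\end{aligned}
```

Continuous-spike variants replace the point mass $\delta_0$ with a narrow normal $\mathcal{N}(0, v_0)$, $v_0$ small, which makes the hypervariance bimodal in the way the abstract describes.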
Model selection in electromagnetic source analysis with an application to VEF’s
 IEEE Transactions on Biomedical Engineering
, 2002
Abstract

Cited by 7 (4 self)
Abstract — In electromagnetic source analysis it is necessary to determine how many sources are required to describe the EEG or MEG adequately. Model selection procedures (MSP’s, or goodness-of-fit procedures) give an estimate of the required number of sources. Existing and new MSP’s are evaluated in different source and noise settings: two sources which are close or distant, and noise which is uncorrelated or correlated. The commonly used MSP residual variance is seen to be ineffective; that is, it often selects too many sources. Alternatives like the adjusted Hotelling’s test, the Bayes information criterion, and the Wald test on source amplitudes are seen to be effective. The adjusted Hotelling’s test is recommended if a conservative approach is taken, and MSP’s such as the Bayes information criterion or the Wald test on source amplitudes are recommended if a more liberal approach is desirable. The MSP’s are applied to empirical data (visual evoked fields).
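One of the effective procedures named above, the Bayes information criterion, can be sketched for this source-counting task. This is a hedged generic sketch, not the paper's implementation: `rss_by_k` holds residual sums of squares of k-source fits, and the six free parameters per dipole source is a common convention assumed here, not taken from the paper.

```python
# Pick the number of sources minimizing BIC = n*ln(RSS/n) + k*p*ln(n),
# where k*p is the total free-parameter count of a k-source model.
import math

def bic_select(rss_by_k, n, params_per_source=6):
    """rss_by_k: dict {k: residual sum of squares}; n: number of data points."""
    def bic(k, rss):
        return n * math.log(rss / n) + k * params_per_source * math.log(n)
    return min(rss_by_k, key=lambda k: bic(k, rss_by_k[k]))
```

With, say, n = 300 and residuals that stop improving after two sources (90.0, 30.0, 28.5), the ln(n) complexity penalty rejects the third source; residual variance alone, which lacks that penalty, would keep adding sources, which is the over-selection the abstract reports.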
Constructing New Attributes for Decision Tree Learning
, 1996
Abstract

Cited by 7 (3 self)
A well-known fundamental limitation of selective induction algorithms is that when task-supplied attributes are not adequate for, or directly relevant to, describing hypotheses, their performance in terms of prediction accuracy and/or theory complexity is poor. One solution to this problem is constructive induction. It constructs, by using task-supplied attributes, new attributes that are expected to be more appropriate than the task-supplied attributes for describing the target concepts. This thesis focuses on constructive induction with decision trees as the theory description language. It explores: (1) novel approaches to constructing new binary attributes using existing constructive operators, and (2) novel methods of constructing new nominal and new continuous-valued attributes based on a newly proposed constructive operator. The thesis investigates a fixed rule-based approach to constructing new binary attributes for decision tree learning. It generates conjunctions from producti...
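A single constructive-induction step of the kind described, building a new binary attribute as a conjunction of attribute-value tests, might look like the sketch below. The operator and the example attribute values are illustrative, not the thesis's actual rule set.

```python
# Construct a new binary attribute as the conjunction of two
# attribute-value tests, then append it as an extra column.
def conjoin(test_a, test_b):
    """Each test is (attribute_index, value); returns a feature function."""
    (i, v), (j, w) = test_a, test_b
    return lambda row: int(row[i] == v and row[j] == w)

def augment(rows, new_feature):
    """Append the constructed attribute's value to every instance tuple."""
    return [row + (new_feature(row),) for row in rows]
```

For instance, `conjoin((0, "sunny"), (1, "hot"))` applied to rows like `("sunny", "hot")` yields a column that is 1 exactly when both tests hold, a boundary a single-attribute decision-tree split could not express in one node.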
Unbiased Assessment of Learning Algorithms
 In IJCAI97
, 1997
Abstract

Cited by 7 (1 self)
In order to rank the performance of machine learning algorithms, many researchers conduct experiments on benchmark data sets. Since most learning algorithms have domain-specific parameters, it is a popular custom to adapt these parameters to obtain a minimal error rate on the test set. The same rate is then used to rank the algorithm, which causes an optimistic bias. We quantify this bias, showing, in particular, that an algorithm with more parameters will probably be ranked higher than an equally good algorithm with fewer parameters. We demonstrate this result, showing the number of parameters and trials required in order to pretend to outperform C4.5 or FOIL, respectively, for various benchmark problems. We then describe how unbiased ranking experiments should be conducted. Estimating the accuracy of a classifier is a topic that has experienced much attention in the ML community. One of the main results is that N-fold cross-validation provides a bias-free [Sto74...
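The optimistic bias the abstract quantifies is easy to reproduce in a toy simulation (this is an illustrative sketch of the selection effect, not the paper's analysis): report the minimum test-set error over k parameter settings of an algorithm that is in truth exactly chance-level, and the reported error drops below the true error as k grows.

```python
# Each "parameter setting" yields an independent test-error estimate on
# n_test examples; reporting min over k settings is biased low.
import random

def best_of_k_error(k, n_test=100, true_error=0.5, trials=2000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = [sum(rng.random() < true_error for _ in range(n_test)) / n_test
                     for _ in range(k)]
        total += min(estimates)            # the "best" setting is what gets reported
    return total / trials
```

With one setting the reported error averages the true 0.5; with sixteen settings the reported minimum sits several standard deviations lower, even though every setting is equally (un)informative, which is exactly how more parameters let one "pretend to outperform" a fixed competitor.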
On the Development of Inductive Learning Algorithms: Generating Flexible and Adaptable Concept Representations
, 1998
"... ..."
Consistent Model Selection Based on Parameter Estimates
, 2002
Abstract

Cited by 2 (0 self)
We consider model selection based on estimators that are asymptotically normal. Such a method can be applied to the context of estimating equations, since a complete specification of the probability model or likelihood function is not required. We construct a cost function for the models in consideration, and show that the minimizer of the cost function is a consistent estimator of the model. Despite the absence of a likelihood function, the cost function is shown to be related to an approximate posterior probability conditional on the parameter estimates, which enables a Bayesian-type evaluation of all candidate models rather than presenting just one best choice. The proposed method is modular and easily adapted to different problems, since only one set of estimates of the parameters and asymptotic variance is needed as the input, which can be obtained from very different estimation techniques for very different models, both linear and nonlinear. We also show that by ranking Z-statistics, the scope of model searching can be reduced to achieve computing efficiency. We provide data analysis examples from two clinical trials and illustrate these variable selection techniques in the contexts of partial likelihood analysis and generalized estimating equations. A third example of used automobile prices illustrates an application of the methodology in selecting graphical models.
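The Z-statistic ranking step can be sketched as follows. This is a hedged illustration of the search-reduction idea only, with made-up covariate names: given one full-model fit (estimates and standard errors from any estimating-equation method), ranking covariates by |z| restricts the search to a nested sequence of p + 1 candidate models instead of all 2^p subsets.

```python
# Reduce model search to a nested sequence ordered by |estimate / se|.
def nested_candidates(names, estimates, std_errors):
    """Returns the p + 1 nested models induced by ranking Z-statistics."""
    z = {n: abs(b / s) for n, b, s in zip(names, estimates, std_errors)}
    order = sorted(names, key=z.get, reverse=True)
    return [order[:k] for k in range(len(names) + 1)]
```

Each nested model is then scored with the cost function; the point of the ranking is purely computational, replacing exponential enumeration with a single ordered path.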
Performance of Cross Validation in Tree-Based Models
, 2005
Abstract
Cross validation (CV) is widely used to measure the performance of a classifier. The main purpose of this study is to explore the behavior of CV in tree-based models. We report experimental studies that compare a cross-validated tree classifier with an oracle classifier that is ideally derived from knowledge of the underlying distributions. The main observation of this study indicates that the difference between the testing and training error from a cross-validated tree classifier and an oracle classifier empirically has a linear regression relation. The “slope” and the “R²” of regression models are employed as the performance measures of a cross-validated tree classifier. Moreover, simulation reveals that the performance of a cross-validated tree classifier depends on the geometry, the parameters of the underlying distributions, and sample size. Such observations can explain and justify the behavior of CV in tree-based models. KEY WORDS: Cross validation; Data mining; Oracle property; Tree-based models
Cross-validation: the illusion of reliable performance estimation
Abstract
Abstract. In data mining, we are often faced with the task of estimating model performance from training data. This estimation is supposed to express the expectation of the performance on future, previously unseen data, and it is very much needed for business decisions and also for the analyst to compare different models. One of the most widely used performance estimation techniques is cross-validation, which is increasingly misused. This paper describes common mistakes in using cross-validation that significantly obfuscate the estimations, presents several numerical examples of how misleading the estimation can be, and proposes a data mining process for ensuring valid performance estimations.
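One of the classic mistakes this abstract alludes to, selecting features on all of the data before splitting, can be demonstrated with a toy simulation (an illustrative sketch, not one of the paper's numerical examples). Labels and features here are pure noise, so the honest accuracy is 0.5, yet the leaky protocol reports clearly more.

```python
# Leaky vs. proper feature selection under a train/test split,
# with labels that are independent of every feature.
import random

def one_run(rng, n=100, p=100):
    y = [rng.randrange(2) for _ in range(n)]
    X = [[rng.randrange(2) for _ in range(p)] for _ in range(n)]
    agree = lambda j, rows: sum(X[i][j] == y[i] for i in rows) / len(rows)
    half = n // 2
    train, test = list(range(half)), list(range(half, n))
    # leaky: pick the best feature using EVERY row, then "validate" on the split
    j_leak = max(range(p), key=lambda j: agree(j, train + test))
    # proper: pick the feature inside the training half only
    j_fair = max(range(p), key=lambda j: agree(j, train))
    return agree(j_leak, test), agree(j_fair, test)

def compare(repeats=30, seed=1):
    rng = random.Random(seed)
    runs = [one_run(rng) for _ in range(repeats)]
    leaky = sum(l for l, _ in runs) / repeats
    proper = sum(p for _, p in runs) / repeats
    return leaky, proper
```

Averaged over repetitions, the proper estimate hovers around chance while the leaky one sits noticeably above it; the "data mining process" the paper proposes is precisely about keeping every model-building decision inside the training folds.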
Cross-validated stepwise regression for identification of novel non-nucleoside reverse transcriptase inhibitor resistance associated mutations
Abstract