Results 1–10 of 57
Statistical Comparisons of Classifiers over Multiple Data Sets, 2006
Abstract

Cited by 716 (0 self)
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust nonparametric tests for statistical comparisons of classifiers: the Wilcoxon signed-rank test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
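The two recommended tests are available in SciPy; a minimal sketch, assuming hypothetical per-data-set accuracy scores for three classifiers (all data below is synthetic):

```python
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare

# Hypothetical accuracies of three classifiers on ten data sets (synthetic).
rng = np.random.default_rng(0)
acc_a = rng.uniform(0.70, 0.90, size=10)
acc_b = acc_a + rng.normal(0.02, 0.01, size=10)  # shifted variant of A
acc_c = acc_a + rng.normal(0.00, 0.02, size=10)

# Two classifiers: Wilcoxon signed-rank test on paired per-data-set scores.
stat, p_two = wilcoxon(acc_a, acc_b)

# More than two classifiers: Friedman test over the same data sets.
chi2, p_many = friedmanchisquare(acc_a, acc_b, acc_c)
print(f"Wilcoxon p={p_two:.4f}, Friedman p={p_many:.4f}")
```

If the Friedman test rejects, the post-hoc comparisons and CD diagrams described in the paper would follow; they are not part of this sketch.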
Composite kernel learning, in Proc. ICML, 2008
Abstract

Cited by 35 (5 self)
The Support Vector Machine (SVM) is an acknowledged powerful tool for building classifiers, but it lacks flexibility, in the sense that the kernel is chosen prior to learning. Multiple Kernel Learning (MKL) enables learning the kernel from an ensemble of basis kernels, whose combination is optimized in the learning process. Here, we propose Composite Kernel Learning to address the situation where distinct components give rise to a group structure among kernels. Our formulation of the learning problem encompasses several setups, putting more or less emphasis on the group structure. We characterize the convexity of the learning problem, and provide a general wrapper algorithm for computing solutions. Finally, we illustrate the behavior of our method on multichannel data where groups correspond to channels.
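The MKL starting point, a combination of basis kernels fed to an SVM, can be sketched with scikit-learn; the weights are fixed by hand here rather than learned as in MKL or CKL, and all data and values are illustrative:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

# Synthetic data; in CKL each group of basis kernels could map to a channel.
X = np.random.default_rng(1).normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)

K1 = rbf_kernel(X, X, gamma=0.5)         # basis kernel 1
K2 = polynomial_kernel(X, X, degree=2)   # basis kernel 2

# Convex combination with hand-picked weights (MKL/CKL would optimize these).
w = np.array([0.7, 0.3])
K = w[0] * K1 + w[1] * K2

# An SVM trained on the combined, precomputed Gram matrix.
clf = SVC(kernel="precomputed").fit(K, y)
train_acc = clf.score(K, y)
```

Learning the weights jointly with the classifier, possibly with group-structured sparsity over them, is the subject of the paper itself.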
Predicting failures with developer networks and social network analysis, in SIGSOFT '08/FSE-16: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2008
Abstract

Cited by 33 (1 self)
Software fails and fixing it is expensive. Research in failure prediction has been highly successful at modeling software failures. Few models, however, consider the key cause of failures in software: people. Understanding the structure of developer collaboration could explain a lot about the reliability of the final product. We examine this collaboration structure with the developer network derived from code churn information that can predict failures at the file level. We conducted a case study involving a mature Nortel networking product of over three million lines of code. Failure prediction models were developed using test and post-release failure data from two releases, then validated against a subsequent release. One model's prioritization revealed 58% of the failures in 20% of the files, compared with the optimal prioritization that would have found 61% in 20% of the files, indicating that a significant correlation exists between file-based developer network metrics and failures.
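One way to sketch such a developer network is to link developers who changed the same file and compute a centrality metric with networkx; the edge rule, the churn records, and the developer names below are illustrative assumptions, not the paper's construction:

```python
import itertools
import networkx as nx

# Hypothetical churn records: (developer, file changed).
churn = [("alice", "net.c"), ("bob", "net.c"), ("bob", "ui.c"),
         ("carol", "ui.c"), ("alice", "core.c"), ("carol", "core.c")]

# Group developers by the files they touched.
files = {}
for dev, f in churn:
    files.setdefault(f, []).append(dev)

# Developers who touched the same file are connected.
G = nx.Graph()
for devs in files.values():
    G.add_edges_from(itertools.combinations(devs, 2))

# Degree centrality per developer; file-level metrics can aggregate
# the centralities of each file's contributors.
centrality = nx.degree_centrality(G)
```

A failure prediction model would then use such per-file network metrics as features alongside churn counts.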
Automatic transcription of drum sequences using audiovisual features, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2005
Abstract

Cited by 16 (5 self)
The transcription of a music performance from the audio signal is often problematic, either because it requires the separation of complex sources, or simply because some important high-level music information cannot be directly extracted from the audio signal. In this paper, we propose a novel multimodal approach for the transcription of drum sequences using audiovisual features. The transcription is performed by Support Vector Machines (SVM) classifiers, and three different information fusion strategies are evaluated. A correct recognition rate of 85.8% can be achieved for a detailed taxonomy and a fully automated transcription.
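Of the possible fusion strategies, the simplest to sketch is early (feature-level) fusion, where audio and video feature vectors are concatenated before a single SVM; the feature dimensions and data below are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
audio = rng.normal(size=(60, 13))      # e.g. spectral features per drum event
video = rng.normal(size=(60, 4))       # e.g. motion features per drum event
labels = rng.integers(0, 3, size=60)   # synthetic drum-class labels

# Early fusion: concatenate the modalities, then train one classifier.
fused = np.hstack([audio, video])
clf = SVC().fit(fused, labels)
```

Late fusion would instead train one SVM per modality and combine their decisions, e.g. by voting or score averaging.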
Analysis of variance of cross-validation estimators of the generalization error, Journal of Machine Learning Research, 2005
Abstract

Cited by 15 (0 self)
This paper brings together methods from two different disciplines: statistics and machine learning. We address the problem of estimating the variance of cross-validation (CV) estimators of the generalization error. In particular, we approach the problem of variance estimation of the CV estimators of generalization error as a problem in approximating the moments of a statistic. The approximation illustrates the role of training and test sets in the performance of the algorithm. It provides a unifying approach to evaluation of various methods used in obtaining training and test sets and it takes into account the variability due to different training and test sets. For the simple problem of predicting the sample mean and in the case of smooth loss functions, we show that the variance of the CV estimator of the generalization error is a function of the moments of the random variables Y = Card(S_j ∩ S_j′) and Y* = Card(S_j^c ∩ S_j′^c), where S_j, S_j′ are two training sets, and S_j^c, S_j′^c are the corresponding test sets. We prove that the distribution of Y and Y* is hypergeometric and we compare our estimator with the one proposed by Nadeau and Bengio (2003). We extend these results to the regression case and the case of absolute error loss, and indicate how the methods can be extended to the classification case. We illustrate the results through simulation.
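The hypergeometric distribution of the training-set overlap Y is easy to examine numerically with SciPy; the sizes below are illustrative, and the second training set is modeled as a uniformly random size-m subset of the n points:

```python
from scipy.stats import hypergeom

n, m = 100, 70   # n data points; two training sets of size m each
# Overlap Y of two independent uniformly random size-m subsets of n items
# follows a hypergeometric distribution.
Y = hypergeom(M=n, n=m, N=m)
expected_overlap = Y.mean()   # m * m / n = 49.0 for these sizes
```

The moments of Y (and of Y* for the test sets) are what feed the variance approximation described in the abstract.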
Exploration in model-based reinforcement learning by empirically estimating learning progress, in Neural Information Processing Systems (NIPS), 2012
Abstract

Cited by 14 (4 self)
Formal exploration approaches in model-based reinforcement learning estimate the accuracy of the currently learned model without consideration of the empirical prediction error. For example, PAC-MDP approaches such as R-MAX base their model certainty on the amount of collected data, while Bayesian approaches assume a prior over the transition dynamics. We propose extensions to such approaches which drive exploration solely based on empirical estimates of the learner's accuracy and learning progress. We provide a "sanity check" theoretical analysis, discussing the behavior of our extensions in the standard stationary finite state-action case. We then provide experimental studies demonstrating the robustness of these exploration measures in cases of non-stationary environments or where original approaches are misled by wrong domain assumptions.
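A minimal sketch of the general idea of driving exploration by empirical learning progress: keep recent prediction errors per state-action pair and treat the recent drop in error as estimated progress. The class name, the window scheme, and the progress definition are illustrative assumptions, not the paper's exact measures:

```python
from collections import defaultdict

class EmpiricalProgress:
    """Tracks per-state-action prediction errors (illustrative sketch)."""

    def __init__(self, window=10):
        self.errors = defaultdict(list)
        self.window = window

    def update(self, sa, prediction_error):
        self.errors[sa].append(prediction_error)

    def progress(self, sa):
        """Recent decrease in mean error = estimated learning progress."""
        e = self.errors[sa]
        if len(e) < 2 * self.window:
            return float("inf")   # young pairs look maximally promising
        old = sum(e[-2 * self.window:-self.window]) / self.window
        new = sum(e[-self.window:]) / self.window
        return old - new
```

A planner could then direct exploration toward pairs with high `progress(sa)`, i.e. where the model is still measurably improving.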
Semi-analytical method for analyzing models and model selection measures based on moment analysis, ACM Transactions on Knowledge Discovery from Data, 2009
Abstract

Cited by 7 (7 self)
In this article we propose a moment-based method for studying models and model selection measures. By focusing on the probabilistic space of classifiers induced by the classification algorithm rather than on that of datasets, we obtain efficient characterizations for computing the moments, which is followed by visualization of the resulting formulae that are too complicated for direct interpretation. By assuming the data to be drawn independently and identically distributed from the underlying probability distribution, and by going over the space of all possible datasets, we establish general relationships between the generalization error, holdout-set error, cross-validation error, and leave-one-out error. We later exemplify the method and the results by studying the behavior of the errors for the naive Bayes classifier.
Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets
Abstract

Cited by 6 (2 self)
We propose a new method for comparing learning algorithms on multiple tasks which is based on a novel nonparametric test that we call the Poisson binomial test. The key aspect of this work is that we provide a formal definition of what it means for one algorithm to be better than another. Also, we are able to take into account the dependencies induced when evaluating classifiers on the same test set. Finally, we make optimal use (in the Bayesian sense) of all the testing data we have. We demonstrate empirically that our approach is more reliable than the sign test and the Wilcoxon signed-rank test, the current state of the art for algorithm comparisons.
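The distribution behind the test's name, the Poisson binomial (a sum of independent Bernoulli trials with unequal success probabilities), can be computed exactly by dynamic programming; the probabilities below are illustrative, and this sketch is only the distribution, not the paper's full Bayesian test:

```python
def poisson_binomial_pmf(probs):
    """P(K = k) for K = sum of independent Bernoulli(p_i) trials."""
    pmf = [1.0]                      # with no trials, P(K = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, v in enumerate(pmf):
            nxt[k] += v * (1.0 - p)  # this trial fails: count unchanged
            nxt[k + 1] += v * p      # this trial succeeds: count + 1
        pmf = nxt
    return pmf

pmf = poisson_binomial_pmf([0.2, 0.5, 0.8])
```

With equal probabilities this reduces to the ordinary binomial, which is the setting of the plain sign test the paper compares against.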
Co-validation: Using model disagreement on unlabeled data to validate classification algorithms, in Proceedings of NIPS, 2004
Abstract

Cited by 6 (1 self)
In the context of binary classification, we define disagreement as a measure of how often two independently trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the "variance of prediction error", or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in active learning settings, where training sets are not drawn at random and cross-validation overestimates error.
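The disagreement statistic itself is a single comparison once two models exist; a sketch with scikit-learn, using synthetic data and arbitrary model choices rather than the paper's experimental setup:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_lab = rng.normal(size=(100, 2))
y_lab = (X_lab[:, 0] > 0).astype(int)   # synthetic binary labels
X_unlab = rng.normal(size=(1000, 2))    # cheap, plentiful unlabeled pool

# Two independently trained models on the same labeled data.
m1 = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_lab, y_lab)
m2 = LogisticRegression().fit(X_lab, y_lab)

# Fraction of unlabeled instances on which the models disagree.
disagreement = np.mean(m1.predict(X_unlab) != m2.predict(X_unlab))
```

Under the co-validation framing, this quantity bounds the generalization error from below, so a large disagreement flags at least one unreliable model without any extra labels.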