Statistical Comparisons of Classifiers over Multiple Data Sets
2006
Cited by 120 (0 self)
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
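Both recommended procedures are simple enough to sketch directly. The dependency-free Python below is a minimal illustration, not the article's code. It ranks classifiers on each data set (rank 1 = best, ties averaged), then computes the Friedman statistic, its Iman-Davenport F correction, and a critical difference at α = 0.05. It also computes the Wilcoxon signed-ranks statistic with its normal approximation. Several details are our own assumptions: the Nemenyi test as the post-hoc test behind the CD, the `Q_ALPHA_005` table (standard Nemenyi values for k ≤ 10), the convention of splitting zero differences evenly between R+ and R- (dropping one if their count is odd), and all function names.

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05, keyed by the
# number of classifiers k (studentized range statistic divided by sqrt(2)).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(scores):
    """Rank scores so that the highest gets rank 1; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of scores tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks


def friedman_nemenyi(table):
    """table[d][j] is the score (higher = better) of classifier j on data set d."""
    n = len(table)      # number of data sets, N
    k = len(table[0])   # number of classifiers, k
    rows = [rank_row(row) for row in table]
    avg_ranks = [sum(r[j] for r in rows) / n for j in range(k)]
    # Friedman statistic: 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    # Iman-Davenport F statistic with (k-1, (k-1)(N-1)) degrees of freedom.
    denom = n * (k - 1) - chi2
    f_stat = math.inf if denom == 0 else (n - 1) * chi2 / denom
    # Nemenyi critical difference for average ranks.
    cd = Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))
    return avg_ranks, chi2, f_stat, cd


def wilcoxon_signed_ranks(a, b):
    """Return (T, z) for paired scores a, b measured on the same data sets."""
    diffs = [x - y for x, y in zip(a, b)]
    if sum(1 for d in diffs if d == 0) % 2 == 1:
        diffs.remove(0)  # drop one zero difference so the rest split evenly
    n = len(diffs)
    ranks = rank_row([-abs(d) for d in diffs])  # smallest |d| gets rank 1
    zero_half = 0.5 * sum(r for r, d in zip(ranks, diffs) if d == 0)
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0) + zero_half
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0) + zero_half
    t = min(r_plus, r_minus)
    # Normal approximation of T under the null hypothesis.
    z = (t - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t, z
```

Under the Nemenyi assumption, two classifiers whose average ranks differ by at least `cd` would be reported as significantly different. This is the quantity a CD diagram draws as a bar. With only a handful of data sets, the normal approximation in `wilcoxon_signed_ranks` is rough, and exact critical values of T are preferable.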
On the Statistical Comparison of Inductive Learning Methods
In D. Fisher & H.-J. Lenz (Eds.), Learning from Data: Artificial Intelligence and Statistics V, 1996
Cited by 5 (0 self)
Experimental comparisons between statistical and machine learning methods appear with increasing frequency in the literature. However, there does not seem to be a consensus on how such a comparison should be performed in a methodologically sound way. In particular, the effect of testing multiple hypotheses on the probability of producing a "false alarm" is often ignored. We transfer multiple comparison procedures from the statistical literature to the type of study discussed in this paper. These testing procedures take the number of tests performed into account, thereby controlling the probability of generating "false alarms". The selected multiple comparison procedures are illustrated on well-known regression and classification data sets.
26.1 Introduction
Recent interactions between the statistical and artificial intelligence communities (see e.g. [Han93, CO94]) have led to many studies that compare the performance of empirical statistical and machine learning methods on real-life data sets; ...
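The abstract does not name the procedures it transfers. As an illustration only, here is Holm's step-down procedure, one standard way to control the family-wise probability of at least one "false alarm" across m tests. It sorts the p-values, compares the i-th smallest with α/(m − i + 1), and stops at the first failure. This minimal Python sketch uses a hypothetical function name and interface.

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure; returns one boolean per test (True = reject H0)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        # Compare the (step+1)-th smallest p-value with alpha / (m - step).
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; larger p-values are retained
    return reject
```

For p-values 0.01 and 0.02 at α = 0.05, a plain Bonferroni correction (level 0.025 per test) rejects only the first. Holm rejects both while still controlling the family-wise error rate.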
Regularized Negative Correlation Learning for Neural Network Ensembles
Cited by 4 (3 self)
Abstract—Negative correlation learning (NCL) is a neural network ensemble learning algorithm that introduces a correlation penalty term to the cost function of each individual network so that each neural network minimizes its mean square error (MSE) together with the correlation of the ensemble. This paper analyzes NCL and reveals that training NCL (when λ = 1) corresponds to training the entire ensemble as a single learning machine that only minimizes the MSE without regularization. This analysis explains why NCL is prone to overfitting the noise in the training set. This paper also demonstrates that tuning the correlation parameter in NCL by cross validation cannot overcome the overfitting problem. The paper analyzes this problem and proposes the regularized negative correlation learning (RNCL) algorithm, which incorporates an additional regularization term for the whole ensemble. RNCL decomposes the ensemble’s training objectives, including MSE and regularization, into a set of sub-objectives, each implemented by an individual neural network. We also provide a Bayesian interpretation of RNCL and an automatic algorithm to optimize the regularization parameters based on Bayesian inference. The RNCL formulation is applicable to any nonlinear estimator minimizing the MSE. Experiments on synthetic as well as real-world data sets demonstrate that RNCL achieves better performance than NCL, especially when the noise level in the data set is nontrivial.
Index Terms—Ensembles, negative correlation learning (NCL), neural network ensembles, neural networks, probabilistic model, regularization.
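The λ = 1 claim can be checked numerically. The sketch assumes the per-network gradient from the original NCL derivation, ∂E_i/∂F_i = (F_i - d) - λ(F_i - F̄). Here F_i is network i's output, F̄ the mean ensemble output over M networks, and d the target. This gradient form is our assumption, not taken from the paper. With λ = 1, every network receives F̄ - d, which is M times the gradient of the ensemble error ½(F̄ - d)² with respect to F_i. The members are therefore jointly fitting the ensemble's unregularized MSE. The helper names are hypothetical.

```python
def ncl_gradients(outputs, target, lam):
    """Per-network NCL gradients, assuming dE_i/dF_i = (F_i - d) - lam * (F_i - F_bar)."""
    f_bar = sum(outputs) / len(outputs)
    return [(f - target) - lam * (f - f_bar) for f in outputs]


def ensemble_mse_gradients(outputs, target):
    """Gradients of 0.5 * (F_bar - d)^2 with respect to each member output F_i."""
    m = len(outputs)
    f_bar = sum(outputs) / m
    return [(f_bar - target) / m for _ in outputs]


def penalty_sum(outputs, i):
    """sum_{j != i} (F_j - F_bar); equals -(F_i - F_bar) because deviations from the mean sum to zero."""
    f_bar = sum(outputs) / len(outputs)
    return sum(f - f_bar for j, f in enumerate(outputs) if j != i)
```

Setting `lam=0` recovers independent training, where each network sees only its own error F_i - d. Intermediate values interpolate between the two regimes.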
Learning Bayesian Networks for Solving Real-World Problems
1998
Cited by 3 (0 self)
Bayesian networks, which provide a compact graphical way to express complex probabilistic relationships among several random variables, are rapidly becoming the tool of choice for dealing with uncertainty in knowledge-based systems. However, approaches based on Bayesian networks have often been dismissed as unfit for many real-world applications, since probabilistic inference is intractable for most problems of realistic size and algorithms for learning Bayesian networks impose the unrealistic requirement that datasets be complete. In this thesis, I present practical solutions to these two problems and demonstrate their effectiveness on several real-world problems. The solution proposed to the first problem is to learn selective Bayesian networks, i.e., ones that use only a subset of the given attributes to model a domain. The aim is to learn networks that are smaller, and henc...
Predictive ensemble pruning by expectation propagation
School of Computer Science, University College, University of New
2009
Cited by 2 (1 self)
Abstract—An ensemble is a group of learners that work together as a committee to solve a problem. Existing ensemble learning algorithms often generate unnecessarily large ensembles, which consume extra computational resources and may degrade generalization performance. Ensemble pruning algorithms aim to find a good subset of ensemble members to constitute a small ensemble, which saves computational resources and performs as well as, or better than, the unpruned ensemble. This paper introduces a probabilistic ensemble pruning algorithm that prunes the ensemble by choosing a set of “sparse” combination weights, most of which are zero. To obtain sparse combination weights that satisfy the nonnegativity constraint, a left-truncated, nonnegative Gaussian prior is placed over every combination weight. The expectation propagation (EP) algorithm is employed to approximate the posterior distribution of the weight vector. The leave-one-out (LOO) error can be obtained as a by-product of EP training without extra computation and is a good indicator of the generalization error. Therefore, the LOO error is used together with the Bayesian evidence for model selection in this algorithm. An empirical study on several regression and classification benchmark data sets shows that our algorithm uses far fewer component learners but performs as well as, or better than, the unpruned ensemble. Our results are very competitive with those of other ensemble pruning algorithms.
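In an EP treatment of a nonnegative, left-truncated prior, the non-Gaussian ingredient is the restriction of each weight to w ≥ 0. A moment-matching step then needs the mass, mean and variance of a Gaussian restricted to that half-line. The helper below computes these standard closed-form moments. It is a building block we assume such an algorithm would use, not the paper's EP update, and the function name is hypothetical.

```python
import math


def truncated_gaussian_moments(mu, sigma):
    """Mass, mean and variance of N(mu, sigma^2) restricted to w >= 0."""
    alpha = -mu / sigma                               # standardized truncation point
    mass = 0.5 * math.erfc(alpha / math.sqrt(2.0))    # P(w >= 0) = 1 - Phi(alpha)
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    ratio = pdf / mass                                # inverse Mills ratio phi(alpha) / (1 - Phi(alpha))
    mean = mu + sigma * ratio
    variance = sigma ** 2 * (1.0 - ratio * (ratio - alpha))
    return mass, mean, variance
```

For μ = 0, σ = 1, this reduces to the half-normal distribution (mass 0.5, mean √(2/π), variance 1 - 2/π). For strongly negative μ the mass underflows, and a production version would work with log-probabilities instead.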
Effects of electrode design and configuration on channel interactions
2006
Cited by 1 (0 self)
A potential shortcoming of existing multichannel cochlear implants is electrical-field summation during simultaneous electrode stimulation. Electrical-field interactions can disrupt the stimulus waveform prior to neural activation. To test whether speech intelligibility can be degraded by electrical-field interaction, speech recognition performance and interaction were examined for three Clarion electrode arrays: the pre-curved, enhanced bipolar electrode array; the enhanced bipolar electrode with an electrode positioner; and the Hi-Focus electrode with a positioner. Channel interaction was measured by comparing stimulus detection thresholds for a probe signal in the presence of a sub-threshold perturbation signal, as a function of the separation between the two simultaneously stimulated electrodes. Correct identification of vowels, consonants, and words in sentences was measured with two speech strategies: one that used simultaneous stimulation and another that used sequential stimulation. Speech recognition scores were correlated with measured electrical-field interaction for the strategy that used simultaneous stimulation but not for the strategy that used sequential stimulation. Higher speech recognition scores with the simultaneous strategy were generally associated with lower levels of electrical-field interaction. Electrical-field interaction accounted for as much as 70% of the variance in speech recognition scores, suggesting that electrical-field interaction is a
Probabilistic Classification Vector Machines
Cited by 1 (1 self)
Abstract—In this paper, a sparse learning algorithm, probabilistic classification vector machines (PCVMs), is proposed. We analyze relevance vector machines (RVMs) for classification problems and observe that adopting the same prior for different classes may lead to unstable solutions. To tackle this problem, a signed and truncated Gaussian prior is adopted over every weight in PCVMs, where the sign of the prior is determined by the class label, i.e., +1 or -1. The truncated Gaussian prior not only restricts the sign of the weights but also leads to a sparse estimate of the weight vector, and thus controls the complexity of the model. In PCVMs, the kernel parameters can be optimized simultaneously within the training algorithm. The performance of PCVMs is extensively evaluated on four synthetic data sets and 13 benchmark data sets using three performance metrics: error rate (ERR), area under the receiver operating characteristic curve (AUC), and root mean squared error (RMSE). We compare PCVMs with soft-margin support vector machines (SVMSoft), hard-margin support vector machines (SVMHard), SVM with the kernel parameters optimized by PCVMs (SVMPCVM), relevance vector machines (RVMs), and some other baseline classifiers. Using five replications of twofold cross-validation (the 5×2 cross-validation test) on single data sets, and the Friedman test with the corresponding post-hoc test to compare these algorithms over multiple data sets, we find that PCVMs outperform the other algorithms, including SVMSoft, SVMHard, RVM, and SVMPCVM, on most of the data sets under all three metrics, especially under AUC. Our results also reveal that the performance of SVMPCVM is slightly better than that of SVMSoft, implying that the parameter optimization algorithm in PCVMs is better than cross validation in terms of performance and computational complexity. We also discuss the superiority of the PCVM formulation using maximum a posteriori (MAP) analysis and margin analysis, which explain the empirical success of PCVMs.
Index Terms—Bayesian classification, machine learning, probabilistic classification model, support vector machine.
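The abstract evaluates each data set with five replications of twofold cross-validation. One well-known test built on this protocol is Dietterich's 5×2cv paired t-test. Assuming that is the test meant here (the abstract does not say), the statistic can be sketched as follows. Here p_i^(j) is the difference in error rate between two classifiers on fold j of replication i. The statistic is compared against a t distribution with 5 degrees of freedom. The function names are hypothetical.

```python
import math

# Two-sided 5% critical value of Student's t with 5 degrees of freedom.
T_CRIT_5DF = 2.571


def five_by_two_cv_t(diffs):
    """Dietterich's 5x2cv paired t statistic.

    diffs[i] = (p_i1, p_i2): difference in error rate (A minus B) on the two
    folds of replication i, for i = 1..5.
    """
    if len(diffs) != 5:
        raise ValueError("expected 5 replications of 2-fold cross-validation")
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        # s_i^2: variance estimate from the two folds of replication i
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    # The numerator uses only the first fold of the first replication.
    return diffs[0][0] / math.sqrt(variance_sum / 5)


def differ_significantly(t_stat):
    """Two-sided decision at alpha = 0.05 against t with 5 df."""
    return abs(t_stat) > T_CRIT_5DF
```

Across several data sets, the per-data-set scores would then feed a rank-based test such as the Friedman test, as the abstract describes.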
Teachers' Tools for the 21st Century
2000
this report, write: U.S. Department of Education, ED Pubs, P.O. Box 1398, Jessup, MD 20794-1398, or call toll free 1-877-4ED-Pubs. Contact: Bernie Greene, (202) 502-7348.
Acknowledgments
This report involved a great deal of work on the part of many. The authors of this report are very grateful to the people listed below, without whom this report could not have been completed. At the Education Statistics Services Institute of the American Institutes for Research, Yann-Yann Shieh and Mary Ann Wiehe wrote many of the computer programs that generated the estimates presented in this report and created the output from which the tables and figures were constructed. Melisa Doherty, Rachel Firestone, Christina Kary, and Kate Lavanga assisted in the development of the report. David Hurst, Douglas Levin, Vicki Lundmark, David Miller, and Mary McLaughlin reviewed various chapters of the report prior to submission to NCES. Supervised by Qiwu Liu, the ESSI Communications Design Team designed and implemented the cover and page layout. Design Team members who contributed to this aspect of the report are Mariel Escudero, Elina Hartwell, Qiwu Liu, and Jennifer Thompson. Experts within and outside of NCES provided helpful suggestions at all stages of the report production. Serving as consultant to the authors, Edith McArthur reviewed the outline, provided suggestions, and reviewed earlier drafts of the report. At various stages, a number of NCES staff members read and commented on the report, including Ellen Bradburn, Shelley Burns, Bernie Greene, Gerald Malitz, Marilyn McMillan, Larry Ogle, Valena Plisko, Carl Schmitt, and John Ralph. Outside NCES, David Malouf of the Office of Special Educa...
Typical structure of a local group RSS talk
• Introduction to the problem (3 overheads)
• Some notation (2 overheads)
• Some mathematics (26 overheads)
• Return to the problem (2 overheads)
• Triumphant dénouement (1 overhead)
February 14, 2001
Structure of this talk
• Bonferroni adjustment and the inequalities
• My interest and the early search
• His life
• His other work
• The context
The method of adjustment named after Bonferroni is based firmly within the classical (frequentist) tradition in applied statistics. In simultaneous inference, if we are to make $k$ tests and desire an overall test of size $\alpha$, we make each test at size $\alpha/k$, or more generally at sizes $\alpha_i$ such that $\sum_i \alpha_i = \alpha$. In fact, nobody ever uses the more general form.
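The adjustment on the slide is easy to state in code. By Boole's inequality, the probability that at least one of the k tests falsely rejects is at most the sum of the per-test sizes. Choosing sizes α_i that sum to α therefore bounds the overall size by α. The Python sketch below, with hypothetical function names, implements both the usual equal split α/k and the weighted general form.

```python
def bonferroni_levels(alpha, k, weights=None):
    """Per-test significance levels whose sum equals the overall size alpha."""
    if weights is None:
        return [alpha / k] * k                     # the usual equal split alpha / k
    total = sum(weights)
    return [alpha * w / total for w in weights]    # general form: sum_i alpha_i = alpha


def bonferroni_reject(p_values, alpha):
    """Reject H0 for each test whose p-value is at most its equal-split level."""
    levels = bonferroni_levels(alpha, len(p_values))
    return [p <= a for p, a in zip(p_values, levels)]
```

The weighted form lets a study spend more of its error budget on the comparisons it cares most about, though, as the slide remarks, the equal split is what is used in practice.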

