Results 1 - 6 of 6
Evaluating Misclassifications in Imbalanced Data
Cited by 2 (0 self)
Abstract. Evaluating classifier performance with ROC curves is popular in the machine learning community. To date, the only method to assess confidence of ROC curves is to construct ROC bands. In the case of severe class imbalance with few instances of the minority class, ROC bands become unreliable. We propose a generic framework for classifier evaluation to identify a segment of an ROC curve in which misclassifications are balanced. Confidence is measured by Tango’s 95% confidence interval for the difference in misclassification in both classes. We test our method with severe class imbalance in a two-class problem. Our evaluation favors classifiers with low numbers of misclassifications in both classes. Our results show that the proposed evaluation method is more confident than ROC bands.
¤ These authors contributed equally to this work.
, 2003
"... POCUS: mining genomic sequence annotation to predict disease genes ..."
Biometrics DOI: 10.1111/j.1541-0420.2006.00525.x Multivariate Extensions of McNemar’s Test
Summary. This article considers global tests of differences between paired vectors of binomial probabilities, based on data from two dependent multivariate binary samples. Difference is defined as either an inhomogeneity in the marginal distributions or asymmetry in the joint distribution. For detecting the first type of difference, we propose a multivariate extension of McNemar’s test and show that it is a generalized score test under a GEE approach. Univariate features such as the relationship between the Wald and score test and the dropout of pairs with the same response carry over to the multivariate case and the test does not depend on the working correlation assumption among the components of the multivariate response. For sparse or imbalanced data, such as occurs when the number of variables is large or the proportions are close to zero, the test is best implemented using a bootstrap, and if this is computationally too complex, a permutation distribution. We apply the test to safety data for a drug, in which two doses are evaluated by comparing multiple responses by the same subjects to each one of them.
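For orientation, the univariate McNemar test that this summary extends can be sketched as follows. This is the standard textbook test, not the multivariate extension the article proposes; the function name `mcnemar_test` and the example counts are illustrative assumptions. It shows the "dropout of pairs with the same response" mentioned above: only the discordant counts enter the statistic.

```python
import math


def mcnemar_test(b, c):
    """Standard univariate McNemar test (no continuity correction).

    b: number of pairs with response (1, 0)
    c: number of pairs with response (0, 1)
    Concordant pairs, (0, 0) and (1, 1), are not arguments at all:
    they drop out of the statistic.
    """
    n_discordant = b + c
    if n_discordant == 0:
        raise ValueError("the test is undefined without discordant pairs")
    # Asymptotically chi-square with 1 degree of freedom under the null.
    stat = (b - c) ** 2 / n_discordant
    # Survival function of a chi-square(1) variable: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, for illustration only.
stat, p_value = mcnemar_test(b=15, c=5)
print(f"statistic = {stat:.3f}, p-value = {p_value:.4f}")
```

The chi-square approximation used here is the one the summary warns about for sparse data, where a bootstrap or permutation distribution is recommended instead.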
1 Senior Researcher, Logistics and Quantitative Methods, CSIR
, 2008
statistical power, proportion, φ-divergence. We consider the problem of statistical inference of binomial proportions for nonmatched, correlated samples, under the Bayesian framework. Such inference can arise when the same group is observed a different number of times on two or more inference occasions, with the aim of testing the proportion of some trait. These scenarios can occur when we are interested in inferring the proportion of extreme wave heights per year at a certain measuring station, where measurements are made every hour. Gaps in measurements, whether due to a malfunction of the measuring instrument or another reason, can result in an unequal number of observations in different years. For such scenarios, we develop an adaptive Bayesian method, and suggest a heuristic decision procedure to conduct statistical inference. We use the φ-divergence measure to quantify the perturbation of the posterior distribution of the proportion at different time points. We present a simulation study, using the Monte Carlo technique, to investigate frequentist power for both the adaptive Bayesian method and the regular frequentist method. Based on the simulation
RESEARCH ARTICLE Open Access Perceptions of and willingness to engage in
"... influenza transmission ..."