Statistical Comparisons of Classifiers over Multiple Data Sets
2006
Cited by 120 (0 self)
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
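The Friedman test recommended in this abstract works on the ranks of the classifiers within each data set. The sketch below is illustrative and is not code from the paper. It assumes higher scores are better, uses the textbook Friedman chi-square statistic without a tie correction, and uses the Nemenyi critical difference as the post-hoc yardstick. The function names are hypothetical, and the critical value `q_alpha` must be looked up by the caller.

```python
import math


def average_ranks(scores):
    """Mean rank of each classifier over all data sets (rank 1 = best score).

    scores: list of rows, one row per data set, one score per classifier;
    higher scores are assumed to be better. Tied scores share the average rank.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])  # best first
        i = 0
        while i < k:
            j = i
            # Extend j over the run of classifiers tied with position i.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            for pos in range(i, j + 1):
                totals[order[pos]] += shared_rank
            i = j + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic from average ranks (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


def critical_difference(q_alpha, k, n):
    """Nemenyi critical difference; q_alpha is supplied by the caller."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))
```

Two classifiers would then be considered significantly different when their average ranks differ by at least the critical difference, which is the quantity a CD diagram draws.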
The psychometric function: I. Fitting, sampling, and goodness of fit
2001
Cited by 38 (10 self)
The psychometric function relates an observer’s performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function’s parameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses). We show that failure to account for this can lead to serious biases in estimates of the psychometric function’s parameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional χ² methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods. The performance of an observer on a psychophysical
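To make the role of the lapse rate concrete, here is a minimal sketch of a psychometric function with guess and lapse parameters, together with the binomial log-likelihood that a maximum-likelihood fit would maximize. It is an illustration only, not the authors' software. The logistic shape for F, the function names, and the parameter names `alpha`, `beta`, `gamma`, and `lam` are assumptions, and no constraints or optimizer are included.

```python
import math


def psychometric(x, alpha, beta, gamma, lam):
    """psi(x) = gamma + (1 - gamma - lam) * F(x; alpha, beta).

    F is a logistic function here (an assumption for illustration);
    gamma is the guess rate and lam the lapse rate.
    """
    f = 1.0 / (1.0 + math.exp(-(x - alpha) / beta))
    return gamma + (1.0 - gamma - lam) * f


def log_likelihood(params, levels, n_trials, n_correct):
    """Binomial log-likelihood of blocked data, dropping the constant term.

    Assumes 0 < psi(x) < 1 at every level, which holds when gamma > 0
    and lam > 0.
    """
    alpha, beta, gamma, lam = params
    total = 0.0
    for x, n, c in zip(levels, n_trials, n_correct):
        p = psychometric(x, alpha, beta, gamma, lam)
        total += c * math.log(p) + (n - c) * math.log(1.0 - p)
    return total
```

With `lam` greater than zero the upper asymptote is `1 - lam` rather than 1, so an occasional stimulus-independent error at a high stimulus level does not force the fitted slope or threshold to shift.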
Sample size planning for the standardized mean difference: Accuracy in parameter estimation via narrow confidence intervals
Psychological Methods, 2006
Cited by 9 (8 self)
Methods for planning sample size (SS) for the standardized mean difference so that a narrow confidence interval (CI) can be obtained via the accuracy in parameter estimation (AIPE) approach are developed. One method plans SS so that the expected width of the CI is sufficiently narrow. A modification adjusts the SS so that the obtained CI is no wider than desired with some specified degree of certainty (e.g., 99% certain the 95% CI will be no wider than ω). The rationale of the AIPE approach to SS planning is given, as is a discussion of the analytic approach to CI formation for the population standardized mean difference. Tables with values of necessary SS are provided. The freely available Methods for the Behavioral, Educational, and Social Sciences (K. Kelley, 2006a) R (R Development Core Team, 2006) software package easily implements the methods discussed.
Methods for the Behavioral, Educational, and Social Sciences (MBESS) [Computer software and manual]. Retrievable from www.cran.r-project.org
2007
Cited by 9 (8 self)
package for R (R Development Core Team, 2007b), an open source statistical programming language and environment. MBESS implements methods that are not widely available elsewhere, yet are especially helpful for the idiosyncratic techniques used within the behavioral, educational, and social sciences. The major categories of functions are those that relate to confidence interval formation for noncentral t, F, and χ² parameters, confidence intervals for standardized effect sizes (which require noncentral distributions), and sample size planning issues from the power analytic and accuracy in parameter estimation perspectives. In addition, MBESS contains collections of other functions that should be helpful to substantive researchers and methodologists. MBESS is a long-term project that will continue to be updated and expanded so that important methods can continue to be made available to researchers in the behavioral, educational, and social sciences. R is an open source statistical programming language and environment for (essentially) all operating systems that has gained a widespread following in quantitative disciplines (R Development Core Team, 2007b). This following is perhaps most prevalent in the statistical sciences, where many published works now provide R routines
Statistical significance testing: a historical overview of misuse and misinterpretation with implication for the editorial policies of educational journals
Research in the Schools, 1998
Cited by 8 (0 self)
Statistical significance tests (SSTs) have been the object of much controversy among social scientists. Proponents have hailed SSTs as an objective means for minimizing the likelihood that chance factors have contributed to research results; critics have both questioned the logic underlying SSTs and bemoaned the widespread misapplication and misinterpretation of the results of these tests. The present paper offers a framework for remedying some of the common problems associated with SSTs via modification of journal editorial policies. The controversy surrounding SSTs is overviewed, with attention given to both historical and more contemporary criticisms of bad practices associated with misuse of SSTs. Examples from the editorial policies of Educational and Psychological Measurement and several other journals that have established guidelines for reporting results of SSTs are overviewed, and suggestions are provided regarding additional ways that educational journals may address the problem. Statistical significance testing has existed in some form for approximately 300 years (Huberty, 1993) and has served an important purpose in the advancement of inquiry in the social sciences. However, there has been much controversy over the misuse and misinterpretation of statistical significance testing (Daniel, 1992b).
Sample size planning for the coefficient of variation from the accuracy in parameter estimation approach
2007
Statistical methods in psychology journals: guidelines and explanations
American Psychologist, 1999
Cited by 3 (0 self)
In the light of continuing debate over the applications of significance testing in psychology journals and following the publication of Cohen (1994), the Board of Scientific Affairs (BSA) of the APA convened a committee called the Task Force on Statistical Inference (TFSI) whose charge was “to elucidate some of the controversial issues surrounding applications
RISK PROPENSITY DIFFERENCES BETWEEN ENTREPRENEURS AND MANAGERS: A META-ANALYTIC REVIEW
Cited by 3 (1 self)
After decades of study, there is no consensus as to whether entrepreneurs have a higher risk propensity than do managers. We overcome a variety of limitations in narrative reviews by using psychometric meta-analysis to mathematically cumulate the literature on entrepreneurial risk propensity. The results indicate that entrepreneurs have at least a moderately higher level of risk propensity than do managers, a finding with important implications for further research. Entrepreneurial dispositions are a fundamental element in the development of a theory of the entrepreneur (Carland, Hoy, Boulton, & Carland, 1984; Johnson, 1990). Accordingly, inquiry has attempted to isolate and explain the psychological antecedents of entrepreneurial behavior. One of the most prominent themes is the entrepreneur’s propensity for risk taking (Carland et al., 1984; Long, 1983), or risk propensity, an individual’s willingness to take or avoid risk. While rich conceptualizations of entrepreneurial risk propensity have permeated the literature since Cantillon’s (circa 1700) description of the entrepreneur as a bearer of risk (Kilby, 1971), empirical evidence concerning the distinctiveness of the entrepreneur’s risk propensity appears inconsistent. Reviews of the literature (e.g., Brockhaus & Horwitz, 1986; Chell, 1985; Perry, 1990) have described the contradictory results in individual studies concerning hypothesized differences in the risk propensities of entrepreneurs and managers. As a result, reviewers have often concluded that entrepreneurs do not have a distinctive
The Cognitive Processes by which Perceived Locus of Causality Predicts Participation in Physical Activity
2002
Cited by 2 (2 self)
The present study examined the cognitive processes by which perceived locus of causality influences participation in leisure time physical activity.
Using Graphs Instead of Tables in Political Science
2007
Cited by 2 (0 self)
When political scientists present empirical results, they are much more likely to use tables than graphs, despite the fact that graphs greatly increase the clarity of presentation and make it easier for a reader to understand the data being used and to draw clear and correct inferences. Using a sample of leading journals, we document this tendency and suggest reasons why researchers prefer tables. We argue that the extra work required in producing graphs is rewarded by greatly enhanced presentation and communication of empirical results. We illustrate their benefits by turning several published tables into graphs, including tables that present descriptive data and regression results. We show that regression graphs emphasize point estimates and confidence intervals and that they can successfully present the results of regression models. A move away from tables towards graphs would improve the discipline’s communicative output and make empirical findings more accessible to every type of audience.

