Results 1–10 of 29
The earth is round (p < .05)
American Psychologist, 1994
Cited by 113 (0 self)
Abstract
After 4 decades of severe criticism, the ritual of null hypothesis significance testing—mechanical dichotomous decisions around a sacred .05 criterion—still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H0 one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences,
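Cohen's central complaint above—that p is routinely misread as the probability that H0 is false—can be illustrated with a small simulation. Everything below (the 80% base rate of true nulls, the 0.5-sigma effect, n = 25) is an illustrative assumption, not a figure from the article; the point is only that the share of "significant" results that are false alarms depends on the base rate, which p alone does not encode.

```python
import math
import random

random.seed(0)

def two_sided_p(sample):
    """p-value of a two-sided z-test of mean 0 with known sigma = 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Illustrative assumptions: 80% of tested hypotheses are truly null;
# the rest have a modest true effect of 0.5 sigma; n = 25 per study.
results = []
for _ in range(4000):
    null_true = random.random() < 0.8
    mu = 0.0 if null_true else 0.5
    sample = [random.gauss(mu, 1) for _ in range(25)]
    results.append((null_true, two_sided_p(sample)))

sig = [null_true for null_true, p in results if p < 0.05]
share_null = sum(sig) / len(sig)
print(f"share of p < .05 results where H0 was in fact true: {share_null:.2f}")
```

Under these assumptions roughly a fifth of the "significant" results come from true nulls, even though every one of them has p < .05—so p < .05 cannot itself be read as "H0 has less than a 5% chance of being true."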
Consequences of prejudice against the null hypothesis
Psychological Bulletin, 1975
Cited by 36 (8 self)
Abstract
The consequences of prejudice against accepting the null hypothesis were examined through (a) a mathematical model intended to simulate the research-publication process and (b) case studies of apparent erroneous rejections of the null hypothesis in published psychological research. The input parameters for the model characterize investigators' probabilities of selecting a problem for which the null hypothesis is true, of reporting, following up on, or abandoning research when data do or do not reject the null hypothesis, and they characterize editors' probabilities of publishing manuscripts concluding in favor of or against the null hypothesis. With estimates of the input parameters based on a questionnaire survey of a sample of social psychologists, the model output indicates a dysfunctional research-publication system. Particularly, the model indicates that there may be relatively few publications on problems for which the null hypothesis is (at least to a reasonable approximation) true, and of these, a high proportion will erroneously reject the null hypothesis. The case studies provide additional support for this conclusion. Accordingly, it is
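A research-publication model of the kind the abstract describes can be sketched in a few lines. The parameter values below are invented for illustration—they are not Greenwald's survey estimates—but the qualitative output matches the abstract's conclusion: few published papers concern true-null problems, and an outsized share of those papers erroneously reject H0.

```python
import random

random.seed(1)

# Illustrative input parameters (assumed, not taken from the article):
ALPHA = 0.05          # Type I error rate of the significance test
POWER = 0.50          # chance of rejecting H0 when it is false
P_NULL_TRUE = 0.30    # share of studied problems where H0 really holds
PUB_IF_REJECT = 0.80  # editors' acceptance rate for rejections of H0
PUB_IF_RETAIN = 0.10  # editors' acceptance rate for null results

published = []
for _ in range(100_000):
    null_true = random.random() < P_NULL_TRUE
    rejected = random.random() < (ALPHA if null_true else POWER)
    if random.random() < (PUB_IF_REJECT if rejected else PUB_IF_RETAIN):
        published.append((null_true, rejected))

on_true_null = [rejected for null_true, rejected in published if null_true]
share_null_problems = len(on_true_null) / len(published)
share_erroneous = sum(on_true_null) / len(on_true_null)
print(f"published papers on null-true problems: {share_null_problems:.2f}")
print(f"of those, erroneous rejections of H0:   {share_erroneous:.2f}")
```

With these assumed parameters, only about a tenth of the literature concerns true-null problems, yet within that slice the erroneous-rejection rate is several times the nominal 5%—the asymmetric publication filter, not the test itself, produces the distortion.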
The case against statistical significance testing
Harvard Educational Review, 1978
Cited by 29 (0 self)
Abstract
In recent years the use of traditional statistical methods in educational research has increasingly come under attack. In this article, Ronald P. Carver exposes the fantasies often entertained by researchers about the meaning of statistical significance. The author recommends abandoning all statistical significance testing and suggests other ways of evaluating research results. Carver concludes that we should return to the scientific method of examining data and replicating results rather than relying on statistical significance testing to provide equivalent information. Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance over scientific significance in educational research represents a corrupt form of the scientific method. Educational research would be better off if it stopped testing its results for statistical significance. The case against statistical significance testing has been developed by many critics (see Morrison & Henkel, 1970b). For example, after a detailed analysis Bakan (1966) concluded that "the test of statistical significance in psychological research may be taken as an instance of a kind of essential mindlessness in the conduct of research" (p. 436); and as early as 1963
Psychology will be a much better science when we change the way we analyze data
Current Directions in Psychological Science, 1996
Cited by 22 (2 self)
Abstract
because I believed that within it dwelt some of the most fundamental and challenging problems of the extant sciences. Who could not be intrigued, for example, by the relation between consciousness and behavior, or the rules guiding interactions in social situations, or the processes that underlie development from infancy to maturity? Today, in 1996, my fascination with these problems is undiminished. But I've developed a certain angst over the intervening thirty-something years—a constant, nagging feeling that our field spends a lot of time spinning its wheels without really making all that much progress. This problem shows up in obvious ways—for instance, in the regularity with which findings seem not to replicate. It also shows up in subtler ways—for instance, one doesn't often hear psychologists saying, "Well, this problem is solved now; let's move on to the next one" (as, for example, Johannes Kepler must have said over three centuries ago, after he had cracked the problem of describing planetary motion). I've come to believe that at least part of this problem revolves around our tools—particularly the tools that we use in the critical domains of data analysis and data interpretation. What we do, I sometimes feel, is akin to trying to build a violin using a stone mallet and a chainsaw. The tool-to-task fit is not all that good, and as a result, we wind up building a lot of poor-quality violins. My purpose here is to elaborate on these issues. In what follows, I will summarize our major data-analysis and data-interpretation tools, and describe what I believe to be amiss with them. I will then offer some suggestions for change.
Statistical significance testing: a historical overview of misuse and misinterpretation with implications for the editorial policies of educational journals
Research in the Schools, 1998
Cited by 11 (0 self)
Abstract
Statistical significance tests (SSTs) have been the object of much controversy among social scientists. Proponents have hailed SSTs as an objective means for minimizing the likelihood that chance factors have contributed to research results; critics have both questioned the logic underlying SSTs and bemoaned the widespread misapplication and misinterpretation of the results of these tests. The present paper offers a framework for remedying some of the common problems associated with SSTs via modification of journal editorial policies. The controversy surrounding SSTs is overviewed, with attention given to both historical and more contemporary criticisms of bad practices associated with misuse of SSTs. Examples from the editorial policies of Educational and Psychological Measurement and several other journals that have established guidelines for reporting results of SSTs are overviewed, and suggestions are provided regarding additional ways that educational journals may address the problem. Statistical significance testing has existed in some form for approximately 300 years (Huberty, 1993) and has served an important purpose in the advancement of inquiry in the social sciences. However, there has been much controversy over the misuse and misinterpretation of statistical significance testing (Daniel, 1992b).
Experimental Comparison of the Comprehensibility of a UML-based Formal Specification versus a Textual One
Proceedings of the 11th International Conference on Evaluation and Assessment in Software Engineering (EASE), 2007
Cited by 9 (4 self)
Abstract
The authors wish to acknowledge the support of UK EPSRC, which has funded the
Effect sizes and p values: What should be reported . . . ?
1996
Cited by 8 (0 self)
Abstract
Despite publication of many well-argued critiques of null hypothesis testing (NHT), behavioral science researchers continue to rely heavily on this set of practices. Although we agree with most critics' catalogs of NHT's flaws, this article also takes the unusual stance of identifying virtues that may explain why NHT continues to be so extensively used. These virtues include providing results in the form of a dichotomous (yes/no) hypothesis evaluation and providing an index (p value) that has a justifiable mapping onto confidence in repeatability of a null hypothesis rejection. The most-criticized flaws of NHT can be avoided when the importance of a hypothesis, rather than the p value of its test, is used to determine that a finding is worthy of report, and when p = .05 is treated as insufficient basis for confidence in the replicability of an isolated non-null finding. Together with many recent critics of NHT, we also urge reporting of important hypothesis tests in enough descriptive detail to permit secondary uses such as meta-analysis.
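The abstract's warning that an isolated p = .05 is "insufficient basis for confidence in replicability" has a well-known quantitative core, which a few lines of simulation can show. The setup below is a standard idealization, not the authors' own calculation: if a first study lands exactly at the two-sided .05 boundary and the observed effect happens to equal the true effect, an exact replication has only about a coin-flip's chance of reaching p < .05 again.

```python
import random

random.seed(2)

# Suppose an initial study just reached two-sided p = .05, i.e. |z| = 1.96,
# and suppose (optimistically) the observed effect equals the true effect.
# The z statistic of an exact, same-size replication is then Normal(1.96, 1).
CRIT = 1.96
TRIALS = 100_000
hits = sum(random.gauss(CRIT, 1) > CRIT for _ in range(TRIALS))
share = hits / TRIALS
print(f"share of replications significant in the same direction: {share:.2f}")
```

The replication's z is centered exactly on the criterion, so it exceeds it about half the time—and this is the optimistic case, since publication selection typically means the true effect is smaller than the one observed.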
Cumulative research knowledge and social policy formulation: the critical role of meta-analysis
Psychology, Public Policy, and Law, 1996
Cited by 5 (0 self)
Abstract
For many years, policymakers expressed increasing frustration with social science research. On every issue there were studies arguing for diametrically opposed conclusions. Methods of meta-analysis that correct for the effects of sampling error have shown that almost all such conflicting results were caused by sampling error. Furthermore, the effects of sampling error are greatly exaggerated by using significance test methodology. In many areas, meta-analysis has now provided dependable answers to the original research questions. Meta-analysis is now increasingly being used by policymakers, by textbook writers, and by theorists to provide the basic facts needed to draw both practical and explanatory conclusions. Sophisticated meta-analysis procedures are now used to correct for the effects of other study imperfections, such as measurement error, range restriction, and artificial dichotomization. In domains where the data on artifacts are available, the effect sizes in necessarily imperfect studies have been found to be considerably understated. Path analysis can be applied to the findings from meta-analysis to yield improved causal analyses that result in both explanation of results and improved generalization of
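The claim that "conflicting results" are largely sampling error can be demonstrated with a toy fixed-effect meta-analysis. The numbers below (a single true effect of 0.3, 50 studies, sample sizes of 20 to 80) are illustrative assumptions: every study measures the same effect, yet significance tests split them into apparent "successes" and "failures," while the inverse-variance pooled estimate recovers the common effect.

```python
import math
import random

random.seed(3)

TRUE_D = 0.3  # one common effect underlies every study (assumption)
studies = []
for _ in range(50):
    n = random.randint(20, 80)   # per-study sample size
    se = 1 / math.sqrt(n)        # standard error of the observed effect
    studies.append((random.gauss(TRUE_D, se), se))

# The studies appear to "conflict": only some reach p < .05.
n_sig = sum(abs(d / se) > 1.96 for d, se in studies)

# Fixed-effect meta-analysis: inverse-variance weighted mean effect.
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / sum(weights)
print(f"studies with p < .05: {n_sig}/50; pooled effect estimate: {pooled:.2f}")
```

Vote-counting by significance would call this literature "mixed," but the disagreement is pure sampling error; the weighted pooled estimate sits close to the single true effect.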
Law and the Fireside Inductions (with Postscript): Some Reflections of a Clinical Psychologist
1989
Cited by 2 (0 self)
Abstract
Legislators and judges have relied upon the “fireside inductions” (commonsense, anecdotal, introspective, and culturally transmitted beliefs about human behavior) in making and enforcing law as a mode of social control. The behavior sciences conflict at times with the fireside inductions. While the sources of error in “common knowledge” about behavior are considerable, the behavior sciences are plagued with methodological problems that often render their generalized conclusions equally dubious. Legal applications of generalizations from experimental research on humans and animals in laboratory contexts often involve risky parametric and population extrapolations. Statistical analysis of file data suffers from inherent interpretative ambiguities as to causal inference from correlations. Quasi-experiments in the “real-life” setting may often be the methodologically optimal data source. A postscript updates the original text and addresses seven additional topics: (1) abuse of significance tests, (2) failure to report overlap, (3) causal inference from correlation, (4) immediate transitions from group differences on psychological tests to “unfairness,” (5) double standard of proof of generalizability,
Alphabet Soup: Blurring the Distinctions Between p’s and α’s in Psychological Research
Cited by 2 (0 self)
Abstract
Confusion over the reporting and interpretation of results of commonly employed classical statistical tests is recorded in a sample of 1,645 papers from 12 psychology journals for the period 1990 through 2002. The confusion arises because researchers mistakenly believe that their interpretation is guided by a single unified theory of statistical inference. But this is not so: classical statistical testing is a nameless amalgamation of the rival and often contradictory approaches developed by Ronald Fisher, on the one hand, and Jerzy Neyman and Egon Pearson, on the other. In particular, there is extensive failure to acknowledge the incompatibility of Fisher’s evidential p value with the Type I error rate, α, of Neyman–Pearson statistical orthodoxy. The distinction between evidence (p’s) and errors (α’s) is not trivial. Rather, it reveals the basic differences underlying Fisher’s ideas on significance testing and inductive inference, and Neyman–Pearson views on hypothesis testing and inductive behavior. So complete is this misunderstanding over measures of evidence