The earth is round (p < .05)
American Psychologist, 1994
Cited by 63 (0 self)
After 4 decades of severe criticism, the ritual of null hypothesis significance testing (mechanical dichotomous decisions around a sacred .05 criterion) still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H0 one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences, ...
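As a rough illustration of "estimating effect sizes using confidence intervals", the sketch below reports a standardized mean difference (Cohen's d) with an interval instead of a reject/retain decision. It assumes two independent samples and uses the common large-sample normal approximation to the standard error of d; the function name and the example data are hypothetical, and the interval is approximate rather than exact.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d_with_ci(x, y, level=0.95):
    """Cohen's d for two independent samples, with an approximate CI.

    Assumption: the standard error of d uses the large-sample normal
    approximation, so the interval is only approximate for small samples.
    """
    n1, n2 = len(x), len(y)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    d = (mean(x) - mean(y)) / math.sqrt(pooled_var)
    # Approximate standard error of d.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    # Two-sided critical value, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d, (d - z * se, d + z * se)


# Hypothetical data for a treatment and a control group.
treatment = [5.1, 6.0, 5.8, 6.4, 5.5, 6.2]
control = [4.9, 5.2, 5.0, 5.6, 4.8, 5.3]
d, (lower, upper) = cohens_d_with_ci(treatment, control)
print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

The width of the interval carries the information a single p value hides: a wide interval around a large d signals an imprecise estimate, whatever side of .05 the test falls on.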
Consequences of prejudice against the null hypothesis
Psychological Bulletin, 1975
Cited by 19 (6 self)
The consequences of prejudice against accepting the null hypothesis were examined through (a) a mathematical model intended to simulate the research-publication process and (b) case studies of apparent erroneous rejections of the null hypothesis in published psychological research. The input parameters for the model characterize investigators' probabilities of selecting a problem for which the null hypothesis is true, of reporting, following up on, or abandoning research when data do or do not reject the null hypothesis, and they characterize editors' probabilities of publishing manuscripts concluding in favor of or against the null hypothesis. With estimates of the input parameters based on a questionnaire survey of a sample of social psychologists, the model output indicates a dysfunctional research-publication system. In particular, the model indicates that there may be relatively few publications on problems for which the null hypothesis is (at least to a reasonable approximation) true, and of these, a high proportion will erroneously reject the null hypothesis. The case studies provide additional support for this conclusion. Accordingly, it is ...
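The two conclusions of the model can be reproduced in a heavily simplified, hypothetical form. The sketch below is not the article's model: it drops follow-up studies and abandonment, and keeps only the probability that a problem has a true null (p_null), the test's alpha and power, and editors' probabilities of publishing a rejection or a non-rejection. All parameter values in the example are invented for illustration.

```python
def share_of_false_rejections(alpha, p_publish_reject, p_publish_accept):
    """Among published studies of true-null problems, the fraction that
    (erroneously) reject H0. Hypothetical one-stage model: no follow-ups."""
    published_rejections = alpha * p_publish_reject
    published_acceptances = (1 - alpha) * p_publish_accept
    return published_rejections / (published_rejections + published_acceptances)


def share_on_true_null(p_null, alpha, power, p_publish_reject, p_publish_accept):
    """Fraction of all published studies that concern true-null problems,
    under the same simplified, hypothetical model."""
    null_pubs = p_null * (alpha * p_publish_reject + (1 - alpha) * p_publish_accept)
    alt_pubs = (1 - p_null) * (power * p_publish_reject + (1 - power) * p_publish_accept)
    return null_pubs / (null_pubs + alt_pubs)


# Unbiased editors: false rejections stay at the nominal alpha.
print(share_of_false_rejections(0.05, 0.5, 0.5))
# Invented bias against the null: rejections published far more often.
print(share_of_false_rejections(0.05, 0.6, 0.03))
print(share_on_true_null(0.5, 0.05, 0.8, 0.6, 0.03))
```

With equal publication probabilities the first call returns alpha (.05). With the invented biased values, just over half (about .51) of the published true-null studies are false rejections, and only about 11% of all publications concern true-null problems even though half of the problems studied are true-null, matching the qualitative pattern the abstract describes.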
The case against statistical significance testing
Harvard Educational Review, 1978
Cited by 17 (0 self)
In recent years the use of traditional statistical methods in educational research has increasingly come under attack. In this article, Ronald P. Carver exposes the fantasies often entertained by researchers about the meaning of statistical significance. The author recommends abandoning all statistical significance testing and suggests other ways of evaluating research results. Carver concludes that we should return to the scientific method of examining data and replicating results rather than relying on statistical significance testing to provide equivalent information. Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance over scientific significance in educational research represents a corrupt form of the scientific method. Educational research would be better off if it stopped testing its results for statistical significance. The case against statistical significance testing has been developed by many critics (see Morrison & Henkel, 1970b). For example, after a detailed analysis Bakan (1966) concluded that "the test of statistical significance in psychological research may be taken as an instance of a kind of essential mindlessness in the conduct of research" (p. 436); and as early as 1963 ...
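The gap between statistical and scientific significance can be shown with one line of arithmetic: for a fixed, practically negligible effect, the p value is driven almost entirely by sample size. The sketch below assumes a two-sample z test on a standardized mean difference d with known unit variance (a simplification); the function name and numbers are illustrative, not taken from the article.

```python
import math
from statistics import NormalDist


def two_sample_z_p_value(d, n_per_group):
    """Two-sided p value for a standardized mean difference d, with
    n_per_group subjects per group. Assumes a z test with known unit
    variance, so the standard error of the difference is sqrt(2 / n)."""
    z = d * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


tiny_effect = 0.02  # a difference of 1/50 of a standard deviation
print(two_sample_z_p_value(tiny_effect, 50))      # about .92
print(two_sample_z_p_value(tiny_effect, 50_000))  # about .002
```

The effect is identical in both calls; only n changes. A reader who equates "p < .05" with "important" would call the second study a finding and the first a failure, which is the confusion Carver criticizes.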
Statistical significance testing: A historical overview of misuse and misinterpretation with implications for the editorial policies of educational journals
Research in the Schools, 1998
Cited by 8 (0 self)
Statistical significance tests (SSTs) have been the object of much controversy among social scientists. Proponents have hailed SSTs as an objective means for minimizing the likelihood that chance factors have contributed to research results; critics have both questioned the logic underlying SSTs and bemoaned the widespread misapplication and misinterpretation of the results of these tests. The present paper offers a framework for remedying some of the common problems associated with SSTs via modification of journal editorial policies. The controversy surrounding SSTs is overviewed, with attention given to both historical and more contemporary criticisms of bad practices associated with misuse of SSTs. Examples from the editorial policies of Educational and Psychological Measurement and several other journals that have established guidelines for reporting results of SSTs are overviewed, and suggestions are provided regarding additional ways that educational journals may address the problem. Statistical significance testing has existed in some form for approximately 300 years (Huberty, 1993) and has served an important purpose in the advancement of inquiry in the social sciences. However, there has been much controversy over the misuse and misinterpretation of statistical significance testing (Daniel, 1992b).
Experimental Comparison of the Comprehensibility of a UML-based Formal Specification versus a Textual One
Proceedings of the 11th International Conference on Evaluation and Assessment in Software Engineering (EASE), 2007
Cited by 7 (4 self)
The authors wish to acknowledge the support of UK EPSRC, which has funded the ...
Effect sizes and p values: What should be reported . . . ?
1996
Cited by 2 (0 self)
Despite publication of many well-argued critiques of null hypothesis testing (NHT), behavioral science researchers continue to rely heavily on this set of practices. Although we agree with most critics' catalogs of NHT's flaws, this article also takes the unusual stance of identifying virtues that may explain why NHT continues to be so extensively used. These virtues include providing results in the form of a dichotomous (yes/no) hypothesis evaluation and providing an index (p value) that has a justifiable mapping onto confidence in repeatability of a null hypothesis rejection. The most-criticized flaws of NHT can be avoided when the importance of a hypothesis, rather than the p value of its test, is used to determine that a finding is worthy of report, and when p = .05 is treated as insufficient basis for confidence in the replicability of an isolated non-null finding. Together with many recent critics of NHT, we also urge reporting of important hypothesis tests in enough descriptive detail to permit secondary uses such as meta-analysis.
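One common way to see why p = .05 gives little confidence in replication (not necessarily the mapping the authors propose) is to ask how likely an exact, same-sized replication is to reach significance again, under the optimistic assumption that the true effect equals the observed one. The sketch below assumes a two-sided z test and ignores the negligible chance of a significant result in the opposite direction; the function name is hypothetical.

```python
from statistics import NormalDist


def replication_power(p_observed, alpha=0.05):
    """Probability that an exact, same-sized replication is significant in
    the same direction, assuming the true effect equals the observed one.
    Two-sided z test; opposite-direction rejections are ignored."""
    std = NormalDist()
    z_obs = std.inv_cdf(1 - p_observed / 2)  # observed test statistic
    z_crit = std.inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = .05
    # The replication statistic is centred on z_obs with unit variance.
    return std.cdf(z_obs - z_crit)


print(replication_power(0.05))   # 0.5
print(replication_power(0.005))  # about .80
```

A result sitting exactly at p = .05 replicates only half the time even under this favourable assumption, which supports treating it as "insufficient basis for confidence in the replicability of an isolated non-null finding"; a much smaller p is needed before replication becomes likely.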
Alphabet Soup: Blurring the Distinctions Between p’s and α’s in Psychological Research
Cited by 1 (0 self)
Abstract. Confusion over the reporting and interpretation of results of commonly employed classical statistical tests is recorded in a sample of 1,645 papers from 12 psychology journals for the period 1990 through 2002. The confusion arises because researchers mistakenly believe that their interpretation is guided by a single unified theory of statistical inference. But this is not so: classical statistical testing is a nameless amalgamation of the rival and often contradictory approaches developed by Ronald Fisher, on the one hand, and Jerzy Neyman and Egon Pearson, on the other. In particular, there is extensive failure to acknowledge the incompatibility of Fisher’s evidential p value with the Type I error rate, α, of Neyman–Pearson statistical orthodoxy. The distinction between evidence (p’s) and errors (α’s) is not trivial. Rather, it reveals the basic differences underlying Fisher’s ideas on significance testing and inductive inference, and Neyman–Pearson views on hypothesis testing and inductive behavior. So complete is this misunderstanding over measures of evidence ...
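The difference between a p value and α can be made concrete with a small simulation (an illustration of the distinction, not the article's analysis). Every simulated study below tests a true H0 with a two-sided z test: each study's p value is a data-dependent quantity that varies from study to study, while α is fixed before any data are seen and only describes the long-run rate at which the procedure rejects a true H0. The function name, seed, and study count are arbitrary choices.

```python
import random
from statistics import NormalDist


def simulate_type_i_rate(alpha=0.05, n_studies=20_000, seed=1):
    """Long-run rejection rate of a two-sided z test when H0 is true.

    Each simulated study draws its z statistic from N(0, 1), i.e. H0 holds.
    The p value (Fisher's evidential measure) differs per study; alpha
    (the Neyman-Pearson error rate) is a fixed property of the procedure.
    """
    rng = random.Random(seed)
    std = NormalDist()
    rejections = 0
    for _ in range(n_studies):
        z = rng.gauss(0.0, 1.0)
        p = 2 * (1 - std.cdf(abs(z)))  # this study's p value
        if p <= alpha:                 # decision rule fixed in advance
            rejections += 1
    return rejections / n_studies


print(simulate_type_i_rate(0.05))  # close to .05
print(simulate_type_i_rate(0.01))  # close to .01
```

The simulated rejection rate converges on α, but no individual p value is an error rate: reporting "p = .003" as if it were the probability of a Type I error for that study is exactly the blurring of p's and α's the abstract describes.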
Psychological Bulletin
... this report was facilitated by grants from National Science Foundation (GS-3050) and U.S. Public Health Service (MH-20527-02). Although they should not be held responsible for positions espoused herein, I am very grateful to the following for providing comments on earlier drafts: Mari R. Jones, Paul Isaac, David Bakan, Timothy C. Brock, Bibb Latané, Thomas M. Ostrom, Hanan C. Selvin, Martin Fishbein, Zick Rubin, and Richard A. Zeller.
Effect sizes and p values: What should be reported and what should be replicated?
Despite publication of many well-argued critiques of null hypothesis testing (NHT), behavioral science researchers continue to rely heavily on this set of practices. Although we agree with most critics' catalogs of NHT's flaws, this article also takes the unusual stance of identifying virtues that may explain why NHT continues to be so extensively used. These virtues include providing results in the form of a dichotomous (yes/no) hypothesis evaluation and providing an index (p value) that has a justifiable mapping onto confidence in repeatability of a null hypothesis rejection. The most-criticized flaws of NHT can be avoided when the importance of a hypothesis, rather than the p value of its test, is used to determine that a finding is worthy of report, and when p = .05 is treated as insufficient basis for confidence in the replicability of an isolated non-null finding. Together with many recent critics of NHT, we also urge reporting of important hypothesis tests in enough descriptive detail to permit secondary uses such as meta-analysis.
Descriptors: Replication, Statistical significance, Null hypothesis testing, Methodology
To demonstrate that a natural phenomenon is experimentally demonstrable, we need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon ...

