Results 1 - 3 of 3
The earth is round (p < .05)
American Psychologist, 1994
Abstract

Cited by 129 (0 self)
After 4 decades of severe criticism, the ritual of null hypothesis significance testing (mechanical dichotomous decisions around a sacred .05 criterion) still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H0 one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences,
Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests
Psychological Methods, 2001
Abstract

Cited by 17 (0 self)
Null hypothesis statistical testing (NHST) has been debated extensively but always successfully defended. The technical merits of NHST are not disputed in this article. The widespread misuse of NHST has created a human factors problem that this article intends to ameliorate. This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard. The combined numeric and graphic tests of statistical difference, equivalence, and indeterminacy are designed to avoid common interpretive problems associated with NHST procedures. Multiple comparisons, power, sample size, test reliability, effect size, and cause-effect ratio are discussed. A section on the proper interpretation of confidence intervals is followed by a decision rule summary and caveats. The longstanding controversy surrounding null hypothesis statistical testing (NHST) has typically been argued on its technical merits, and they are not dis
Alphabet Soup: Blurring the Distinctions Between p’s and α’s in Psychological Research
Abstract

Cited by 2 (0 self)
Abstract. Confusion over the reporting and interpretation of results of commonly employed classical statistical tests is recorded in a sample of 1,645 papers from 12 psychology journals for the period 1990 through 2002. The confusion arises because researchers mistakenly believe that their interpretation is guided by a single unified theory of statistical inference. But this is not so: classical statistical testing is a nameless amalgamation of the rival and often contradictory approaches developed by Ronald Fisher, on the one hand, and Jerzy Neyman and Egon Pearson, on the other. In particular, there is extensive failure to acknowledge the incompatibility of Fisher’s evidential p value with the Type I error rate, α, of Neyman–Pearson statistical orthodoxy. The distinction between evidence (p’s) and errors (α’s) is not trivial. Rather, it reveals the basic differences underlying Fisher’s ideas on significance testing and inductive inference, and Neyman–Pearson views on hypothesis testing and inductive behavior. So complete is this misunderstanding over measures of evidence