The earth is round (p < .05)
American Psychologist, 1994
Abstract

After 4 decades of severe criticism, the ritual of null hypothesis significance testing, mechanical dichotomous decisions around a sacred .05 criterion, still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H0 one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences, on replication.
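As a rough illustration of the effect-size-and-interval reporting recommended above, the Python sketch below reports a mean difference with a 95% confidence interval and Cohen's d instead of a reject/retain verdict. It is not taken from the article: the function name `effect_size_with_ci` is hypothetical, the two groups are assumed independent, and the interval uses a large-sample normal approximation rather than the t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def effect_size_with_ci(group_a, group_b, level=0.95):
    """Return (mean difference, (lower, upper) CI, Cohen's d).

    Assumes two independent groups. The interval uses a large-sample
    normal (z) approximation, so it is only a rough guide for small samples.
    """
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    # Unpooled (Welch-style) standard error of the difference in means.
    se = sqrt(var_a / n_a + var_b / n_b)
    # Two-sided critical value, about 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (diff - z * se, diff + z * se)
    # Cohen's d: the difference scaled by the pooled standard deviation.
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return diff, ci, diff / pooled_sd


# Hypothetical scores, for illustration only.
diff, ci, d = effect_size_with_ci([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), d = {d:.2f}")
```

With these made-up scores the difference is 2.00 with an interval of roughly (0.04, 3.96) and d of about 1.26; the interval conveys how precisely the effect is estimated, which a bare "p < .05" does not. For small samples a t-based interval would be wider; the normal approximation is used only to keep the sketch within the standard library.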
What if there were no more bickering about statistical significance tests?
Research in the Schools, 1998
Abstract

Questions and concerns are directed to those who advocate replacing statistical hypothesis testing with alternative data-analysis strategies. It is further suggested that: (1) commonly recommended hypothesis-testing alternatives are anything but perfect, especially when allowed to stand alone without an accompanying inferential filtering device; (2) various hypothesis-testing modifications can be implemented to make the hypothesis-testing process and its associated conclusions more credible; and (3) hypothesis testing, when implemented intelligently, adds importantly to the storytelling function of a published empirical research investigation. From the local pubs to our professional "pubs," everyone in social-science academic circles seems to be talking about it these days. Not that there's anything wrong with talking about it, mind you, even to a more practically oriented crowd such as the readership of this journal. But as with the "gates" of Washington politics on the one coast and the Gates of Washington state on the other, when do we stand up and say "Enough already!"? When do we decide that ample arguments...
A challenge for statistical instructors: Teaching Bayesian inference without discarding the “official” significance tests. Bayesian Methods with applications to science, policy and official statistics, 301-310. Luxembourg: Office for Official Publications
Methods of Psychological Research, 2001
Testing the Hypothesis That Treatments Have Negligible Effects: Minimum-Effect Tests in the General Linear Model
Abstract

Researchers are often interested in testing the hypothesis that the effects of treatments, interventions, and so on are negligibly small rather than testing the hypothesis that treatments have no effect whatsoever. A number of procedures for conducting such tests have been suggested but have yet to be widely adopted. In this article, simple methods of testing such minimum-effect hypotheses are illustrated in a variety of applications of the general linear model. Tables and computational routines that can be used in conjunction with the familiar F test to evaluate the hypothesis that the effects of treatments or interventions exceed some minimum level are also provided. One of the most common statistical procedures in the behavioral and social sciences is to test the hypothesis that treatments or interventions have no effect, or that the correlation between two variables is equal to zero, and so on. Cohen (1994) referred to these procedures as "nil hypothesis" tests, a label that differentiates them from the more general category of null hypothesis tests (which allow researchers to test the hypothesis that the difference between two treatments is equal to any specific figure, including but not limited to zero) and that makes it explicit that this particular class of tests is used to evaluate the plausibility of the hypothesis that treatments or interventions have no true effect whatsoever. Although nil hypothesis tests are extremely common, there is substantial controversy about their value and meaning (Chow, 1988; Co...
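The contrast between a nil hypothesis and a minimum-effect hypothesis can be sketched with a one-sided test against a nonzero threshold. This is a swapped-in large-sample z test, not the F-based tables and routines the article provides; `minimum_effect_z_test` and its parameters are hypothetical names, and the estimate and its standard error are assumed to come from an analysis already run.

```python
from statistics import NormalDist


def minimum_effect_z_test(estimate, std_error, minimum_effect=0.0):
    """One-sided test of H0: effect <= minimum_effect vs H1: effect > minimum_effect.

    With minimum_effect = 0 this is the ordinary nil-hypothesis test.
    Uses a large-sample normal approximation, not the F-based tables
    and routines described in the article.
    """
    z = (estimate - minimum_effect) / std_error
    # Upper-tail probability of a standard normal beyond z.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical estimate and standard error, for illustration only.
print(minimum_effect_z_test(0.30, 0.10))        # nil hypothesis: effect <= 0
print(minimum_effect_z_test(0.30, 0.10, 0.20))  # minimum effect of 0.20
```

With a hypothetical estimate of 0.30 and standard error of 0.10, the nil test gives z of about 3.0 (one-sided p of about 0.0013), while testing against a minimum effect of 0.20 gives z of about 1.0 (p of about 0.16). The same estimate is clearly nonzero but not clearly larger than the chosen threshold, which is the distinction minimum-effect tests are meant to capture.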
Testing
Abstract
The current, nearly omnipresent, approach to hypothesis testing in all of the social sciences is a synthesis of the Fisher test of significance and the Neyman-Pearson hypothesis test. In this “modern” procedure, two hypotheses are posited: a null or restricted hypothesis ($H_0$) and a competing alternative or research hypothesis ($H_1$), describing two complementary notions about some phenomenon. The research hypothesis is the probability model that describes the author’s belief about some underlying aspect of the data and operationalizes this belief through a parameter, $\theta$. In the simplest case, described in every introductory text, a null hypothesis asserts that $\theta = 0$ and a complementary research hypothesis asserts that $\theta \neq 0$. More generally, the test evaluates a parameter vector $\theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$, and the null hypothesis places restrictions on some subset ($\ell \le m$) of the theta vector, such as $\theta_i = k_1\theta_j + k_2$ with constants $k_1$ and $k_2$. A test statistic ($T$), some function of $\theta$ and the data, is calculated and compared with its known distribution under the assumption that $H_0$ is true. Commonly used test statistics are sample means ($\bar{X}$), chi-square statistics ($\chi^2$), and t-statistics in linear (OLS) regression analysis. The test procedure assigns one of two decisions ($D_0$, $D_1$) to all possible values in...
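The hybrid procedure described in this abstract (compute T, compare it with its distribution under H0, assign D0 or D1) can be sketched for the simplest case, a two-sided one-sample z test of H0: θ = θ0 against H1: θ ≠ θ0. The sketch assumes normally distributed data with a known standard deviation, and `hybrid_z_test` is a hypothetical name. It returns both Fisher's p-value and the Neyman-Pearson decision, which is the mixing of the two frameworks the abstract describes.

```python
from math import sqrt
from statistics import NormalDist, mean


def hybrid_z_test(data, sigma, theta0=0.0, alpha=0.05):
    """Two-sided one-sample z test of H0: theta = theta0 vs H1: theta != theta0.

    Assumes normally distributed data with known standard deviation sigma.
    Returns the statistic T, Fisher's p-value, and the Neyman-Pearson
    decision: "D1" (reject H0) or "D0" (do not reject H0).
    """
    n = len(data)
    t_stat = (mean(data) - theta0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Fisher: probability under H0 of a statistic at least this extreme.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))
    # Neyman-Pearson: rejection region |T| > z_(1 - alpha/2), fixed in advance.
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    decision = "D1" if abs(t_stat) > critical else "D0"
    return t_stat, p_value, decision


# Hypothetical observations, for illustration only.
print(hybrid_z_test([1.0, 2.0, 3.0, 2.0], sigma=2.0))
print(hybrid_z_test([1.0, 2.0, 3.0, 2.0], sigma=2.0, alpha=0.01))
```

For these made-up observations with σ = 2, T = 2.0 and p is about 0.0455, so the decision is D1 at α = 0.05 but D0 at α = 0.01: the p-value stays the same while the dichotomous decision depends entirely on the preselected α.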
unknown title
Abstract
About some misconceptions and the discontent with statistical tests in psychology
PacifiCare Behavioral Health
Abstract
This preliminary study evaluated the effectiveness of psychotherapy treatment for adult clinical depression provided in a natural setting by benchmarking the clinical outcomes in a managed care environment against effect size estimates observed in published clinical trials. Overall results suggest that effect size estimates of effectiveness in a managed care context were comparable to effect size estimates of efficacy observed in clinical trials. Relative to the one-tailed 95th-percentile critical effect size estimates, the effectiveness of treatment provided in this setting was between 80% (patients with comorbidity and without antidepressants) and 112% (patients without comorbidity who were concurrently on antidepressants) of the benchmarks. Because the nature of the treatments delivered in the managed care environment was unknown, it was not possible to draw conclusions about specific treatments. However, while replications are warranted, concerns that psychotherapy delivered in a naturalistic setting is inferior to treatments delivered in clinical trials appear unjustified.
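One plausible reading of the benchmarking step is sketched below: compute a standardized pre–post change and express it as a percentage of a benchmark effect size drawn from clinical trials. The abstract does not give the exact formula, so scaling by the pretreatment standard deviation is an assumption, as are the hypothetical function names; real benchmark values would come from the published trials, not from this code.

```python
from statistics import mean, stdev


def pre_post_effect_size(pre_scores, post_scores):
    """Standardized pre-post change, scaled by the pretreatment SD.

    Assumes higher scores mean more symptoms, so improvement is positive.
    """
    return (mean(pre_scores) - mean(post_scores)) / stdev(pre_scores)


def percent_of_benchmark(observed_d, benchmark_d):
    """Express an observed effect size as a percentage of a benchmark effect size."""
    return 100.0 * observed_d / benchmark_d


# Hypothetical scores and benchmark, for illustration only.
observed = pre_post_effect_size([20, 24, 28], [14, 18, 22])
print(observed, percent_of_benchmark(observed, benchmark_d=1.25))
```

Under this reading, a result of 80 would mean the observed effect reached 80% of the benchmark, and a result above 100 would mean it exceeded the benchmark.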
Preliminary Evidence on the Effectiveness of Psychological Treatments Delivered at a University Counseling Center
Abstract
Treatment data from a university counseling center (UCC) that used the Outcome Questionnaire–45.2 (OQ-45; M. J. Lambert et al., 2004), a self-report general clinical symptom measure, were compared against treatment efficacy benchmarks from clinical trials of adult major depression that used similar measures. Statistical analyses suggested that the treatment effect size estimate obtained at this counseling center with clients whose level of psychological distress was above the OQ-45 clinical cutoff score was similar to the treatment efficacy observed in clinical trials. Analyses of OQ-45 items suggested that clients with elevated scores on 3 items indicating problematic substance use had poorer treatment outcomes. In addition, clients who reported their relational status as separated or divorced had poorer outcomes than did those who reported being partnered or married, and clients reporting intimacy issues had more sessions. Although a differential treatment effect by training level was found, with interns and other trainees showing better pre–post outcomes than staff, this result must be interpreted with great caution because clients perceived to have complicated issues are actively reassigned to staff. More effectiveness investigations at UCCs are warranted.
Abstract
On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence