Results 1 – 9 of 9
The earth is round (p < .05)
American Psychologist, 1994
Abstract

Cited by 113 (0 self)
After 4 decades of severe criticism, the ritual of null hypothesis significance testing—mechanical dichotomous decisions around a sacred .05 criterion—still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H0 one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences, ...
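The abstract's central point, that p is not the probability that H0 is false, can be illustrated with a small simulation. The setup is hypothetical: assume half of all studied effects are real, with an assumed standardized effect size of d = 0.3 and n = 20 per study.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_sims, n = 20_000, 20

sig, sig_null_true = 0, 0
for _ in range(n_sims):
    h0_true = rng.random() < 0.5        # half the experiments have no real effect
    d = 0.0 if h0_true else 0.3         # assumed effect size when H0 is false
    x = rng.normal(d, 1.0, n)
    z = x.mean() * math.sqrt(n) / x.std(ddof=1)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p, normal approximation
    if p < 0.05:
        sig += 1
        sig_null_true += h0_true

# Among results with p < .05, the share where H0 was actually true
# is well above 5%: p < .05 does not mean P(H0 | data) < .05.
print(sig_null_true / sig)
```

The exact fraction depends on the assumed base rate and effect size, but under any such setup it differs from the p-value threshold, which is the misinterpretation the article targets.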
Could Fisher, Jeffreys, and Neyman Have Agreed on Testing?
, 2002
Abstract

Cited by 29 (2 self)
Ronald Fisher advocated testing using p-values; Harold Jeffreys proposed use of objective posterior probabilities of hypotheses; and Jerzy Neyman recommended testing with fixed error probabilities. Each was quite critical of the other approaches.
Alphabet Soup: Blurring the Distinctions Between p’s and α’s in Psychological Research
Abstract

Cited by 2 (0 self)
Confusion over the reporting and interpretation of results of commonly employed classical statistical tests is recorded in a sample of 1,645 papers from 12 psychology journals for the period 1990 through 2002. The confusion arises because researchers mistakenly believe that their interpretation is guided by a single unified theory of statistical inference. But this is not so: classical statistical testing is a nameless amalgamation of the rival and often contradictory approaches developed by Ronald Fisher, on the one hand, and Jerzy Neyman and Egon Pearson, on the other. In particular, there is extensive failure to acknowledge the incompatibility of Fisher’s evidential p value with the Type I error rate, α, of Neyman–Pearson statistical orthodoxy. The distinction between evidence (p’s) and errors (α’s) is not trivial. Rather, it reveals the basic differences underlying Fisher’s ideas on significance testing and inductive inference, and Neyman–Pearson views on hypothesis testing and inductive behavior. So complete is this misunderstanding over measures of evidence ...
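The incompatibility described here can be made concrete with a toy sketch: a fixed-α Neyman–Pearson procedure reports only a binary decision, so p = .049 and p = .001 are treated identically, even though on Fisher's evidential reading they differ greatly. The function name is illustrative, not from the paper.

```python
def neyman_pearson_decision(p: float, alpha: float = 0.05) -> str:
    # Neyman-Pearson orthodoxy: alpha is fixed in advance, and only the
    # binary decision is reported, not the strength of evidence.
    return "reject H0" if p < alpha else "fail to reject H0"

for p in (0.049, 0.001):
    print(p, "->", neyman_pearson_decision(p))
# Both p-values yield the identical decision, though a Fisherian reading
# would take p = .001 as far stronger evidence against H0 than p = .049.
```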
Copyright 1998 by The Journal of Bone and Joint Surgery, Incorporated. Current Concepts Review: Principles of Epidemiology for the Orthopaedic Surgeon
Abstract
It has been stated that “‘the object of any science is the accumulation of systematized verifiable knowledge,’ and that this is to be achieved through ‘observation, experiment and thought’” [12]. Orthopaedists are concerned primarily with individual patients; epidemiologists study the occurrence of disease or other health-related conditions or events in defined populations [26]. Epidemiological research is based on the systematic collection of observations related to the phenomenon of interest in a defined population. These data then are subjected to quantification, which includes the measurement of random variables, the estimation of population parameters, and the statistical testing of hypotheses [22]. The changing profile of health-care delivery systems requires orthopaedists to go beyond the individual ...
Preprint of the Book Chapter: “Bayesian Versus Frequentist Inference”
Abstract
Throughout this book, the topic of order-restricted inference is dealt with almost exclusively from a Bayesian perspective. Some readers may wonder why the other main school for statistical inference – frequentist inference – has received so little attention here. Isn’t it true that in the field of psychology, almost all inference is frequentist inference? The first goal of this chapter is to highlight why frequentist inference is a less-than-ideal method for statistical inference. The most fundamental limitation of standard frequentist inference is that it does not condition on the observed data. The resulting paradoxes have sparked a philosophical debate that statistical practitioners have conveniently ignored. What cannot be so easily ignored are the practical limitations of frequentist inference, such as its restriction to nested model comparisons. The second goal of this chapter is to highlight the theoretical and practical advantages of a Bayesian analysis. From a theoretical perspective, Bayesian inference is principled and prescriptive, and – in contrast to frequentist inference – a method that does condition on the observed data. From a practical perspective, Bayesian inference ...
Data Analysis Considerations in Producing ‘Comparable’ Information for Water Quality Management Purposes
Abstract
Water quality monitoring is used at local, regional, and national scales to measure how water quality variables behave in the natural environment. A common problem arising from monitoring is how to relate the information contained in data to the information needed by water resource management for decision-making. This is generally attempted through statistical analysis of the monitoring data. However, how the selection of methods with which to routinely analyze the data affects the quality and comparability of the information produced is not as well understood as may first appear. To help understand the connection between the selection of methods for routine data analysis and the information produced to support management, the following three tasks were performed: (1) an examination of the methods that are currently being used to analyze water quality monitoring data, including published criticisms of them; (2) an exploration of how the selection of methods to analyze water quality data can impact the comparability of information used for water quality management purposes; and (3) development of options by which data analysis methods employed in water quality ...
Section on Statistical Education – JSM 2010. The Undetectable Difference: An Experimental Look at the “Problem” of p-Values
Abstract
In the face of continuing assumptions by many scientists and journal editors that p-values provide a gold standard for inference, counter-warnings are published periodically. But the core problem is not with p-values, per se. A finding that “p-value is less than α” could merely signal that a critical value has been exceeded. The question is why, when estimating a parameter, we provide a range (a confidence interval), but when testing a hypothesis about a parameter (e.g., µ = x) we proceed as if “=” entails exact equality of the parameter with x. That standard is hard to meet, and is not a standard expected for power calculations, where we are satisfied to reject H0 if the result is merely “detectably” different from (exact) H0. This paper explores, with resampling methods, the impacts on p-values, and alternatives, if the null hypothesis is defined as a thick or thin range of values. It also examines, empirically, the extent to which the p-value may or may not be a good predictor of the probability that H0 is true, given the distribution of the data.
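The “thick null” idea can be sketched with a bootstrap. This is a minimal sketch, not the authors' exact procedure: it assumes a one-sample setting where H0 is the interval |µ| ≤ δ, and it centers the resampling distribution at the null point nearest the observed mean (a least-favorable choice; the function name is illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)

def thick_null_p(x, delta, n_boot=10_000, rng=rng):
    """Resampling p-value for the interval null H0: |mu| <= delta.

    Sketch only: recenter the sample at the point of the thick null
    nearest the observed mean, then bootstrap the sample mean there.
    """
    m = x.mean()
    mu0 = np.clip(m, -delta, delta)      # nearest point inside the null region
    centered = x - m + mu0               # recenter the sample at that point
    boots = rng.choice(centered, (n_boot, len(x)), replace=True).mean(axis=1)
    # Two-sided: how often a resampled mean is at least as far from mu0
    # as the observed mean is.
    return np.mean(np.abs(boots - mu0) >= abs(m - mu0))

x = rng.normal(0.1, 1.0, 50)             # data close to the null region
print(thick_null_p(x, delta=0.0))        # thin (point) null
print(thick_null_p(x, delta=0.2))        # thick null is harder to reject
```

Widening the null from a point to an interval can only raise the p-value, which is the qualitative effect the paper investigates.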
Testing University Rankings Statistically: Why this Perhaps is not such a Good Idea after All. Some Reflections on Statistical Power, Effect Size, Random Sampling and Imaginary Populations
Abstract
In this paper we discuss and question the use of statistical significance tests in relation to university rankings, as recently suggested. We outline the assumptions behind, and interpretations of, statistical significance tests and relate this to examples from the recent SCImago Institutions Ranking. By use of statistical power analyses and demonstration of effect sizes, we emphasize that the importance of empirical findings lies in “differences that make a difference” and not in statistical significance tests per se. Finally, we discuss the crucial assumption of randomness and question the presumption that randomness is present in the university ranking data. We conclude that the application of statistical significance tests in relation to university rankings, as recently advocated, is problematic and can be misleading.
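The role of power and effect size invoked here can be illustrated with a normal-approximation power function: with a large enough n, even a trivial standardized effect is detected almost surely, so significance alone says little about whether a ranking difference “makes a difference.” The effect size d = 0.05 below is an arbitrary choice for illustration.

```python
import math

def power_two_sided_z(d, n, alpha=0.05):
    """Approximate power of a one-sample two-sided z-test for standardized
    effect size d (mean/sd) with n observations. Normal approximation."""
    z_alpha = 1.959963984540054            # Phi^{-1}(1 - alpha/2) for alpha = .05
    nc = d * math.sqrt(n)                  # noncentrality of the test statistic
    Phi = lambda z: 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF
    return (1 - Phi(z_alpha - nc)) + Phi(-z_alpha - nc)

# A trivial effect (d = 0.05) becomes "statistically significant" almost
# surely once n is large enough; significance is not importance.
for n in (100, 1_000, 10_000):
    print(n, round(power_two_sided_z(0.05, n), 3))
```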