Results 1–10 of 11
Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy
 Psychological Methods
, 2000
"... Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other ..."
Abstract

Cited by 88 (0 self)
Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data. Null hypothesis statistical testing (NHST) is arguably the most widely used method of analysis of data collected in psychological experiments and has been so for about 70 years. One might think that a method that had been embraced by an entire research community would be well understood and noncontroversial after many decades of constant use. However, NHST is very controversial. Criticism of the method, which essentially began with the introduction of the technique (Pearce, 1992), has waxed and waned over the years; it has been intense in the recent past. Apparently, controversy regarding the idea of NHST more generally extends back more than two and a half
Bayesian inference procedures derived via the concept of relative surprise
 Communications in Statistics
, 1997
"... of least relative surprise; model checking; change of variable problem; cross-validation. We consider the problem of deriving Bayesian inference procedures via the concept of relative surprise. The mathematical concept of surprise has been developed by I.J. Good in a long sequence of papers. We make ..."
Abstract

Cited by 19 (4 self)
of least relative surprise; model checking; change of variable problem; cross-validation. We consider the problem of deriving Bayesian inference procedures via the concept of relative surprise. The mathematical concept of surprise has been developed by I.J. Good in a long sequence of papers. We make a modification to this development that permits the avoidance of a serious defect; namely, the change of variable problem. We apply relative surprise to the development of estimation, hypothesis testing and model checking procedures. Important advantages of the relative surprise approach to inference include the lack of dependence on a particular loss function and complete freedom to the statistician in the choice of prior for hypothesis testing problems. Links are established with common Bayesian inference procedures such as highest posterior density regions, modal estimates and Bayes factors. From a practical perspective new inference
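As a minimal illustration of the relative surprise idea in the abstract above (a sketch with a simple conjugate normal model and hypothetical numbers, not anything taken from the paper): relative surprise compares the posterior density to the prior density, and the least relative surprise estimate is the parameter value maximizing that ratio.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical setup.  Prior: theta ~ N(0, 2^2).
# One observation x = 1.5 from N(theta, 1).
prior_mu, prior_sigma = 0.0, 2.0
x, data_sigma = 1.5, 1.0

# Standard conjugate normal-normal update for the posterior.
post_var = 1.0 / (1.0 / prior_sigma**2 + 1.0 / data_sigma**2)
post_mu = post_var * (prior_mu / prior_sigma**2 + x / data_sigma**2)
post_sigma = math.sqrt(post_var)

# The relative surprise at theta is posterior(theta) / prior(theta);
# the least relative surprise estimate maximizes it over a grid.
grid = [i / 1000.0 for i in range(-5000, 5001)]
best_ratio, best_theta = max(
    (normal_pdf(t, post_mu, post_sigma) / normal_pdf(t, prior_mu, prior_sigma), t)
    for t in grid
)
print(f"least relative surprise estimate ≈ {best_theta:.3f}")
# → least relative surprise estimate ≈ 1.500
```

Since posterior/prior is proportional to the likelihood by Bayes' theorem, the maximizer here coincides with the maximum likelihood estimate x = 1.5, which also shows why this construction is invariant to the choice of prior mean in a way a posterior mode is not.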
Measures of Surprise in Bayesian Analysis
 Duke University
, 1997
"... Measures of surprise refer to quantifications of the degree of incompatibility of data with some hypothesized model H0 without any reference to alternative models. Traditional measures of surprise have been the p-values, which are however known to grossly overestimate the evidence against H0. Str ..."
Abstract

Cited by 2 (2 self)
Measures of surprise refer to quantifications of the degree of incompatibility of data with some hypothesized model H0 without any reference to alternative models. Traditional measures of surprise have been the p-values, which are however known to grossly overestimate the evidence against H0. Strict Bayesian analysis calls for an explicit specification of all possible alternatives to H0, so Bayesians have not made routine use of measures of surprise. In this report we critically review the proposals that have been made in this regard. We propose new modifications, stress the connections with robust Bayesian analysis and discuss the choice of suitable predictive distributions which allow surprise measures to play their intended role in the presence of nuisance parameters. We recommend either the use of appropriate likelihood-ratio type measures or else the careful calibration of p-values so that they are closer to Bayesian answers. Key words and phrases. Bayes factors; Bayesian p-values; Bayesian robustness; Conditioning; Model checking; Predictive distributions. 1.
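The "careful calibration of p-values" mentioned in the abstract above can be illustrated with the well-known -e·p·ln(p) lower bound on the Bayes factor in favor of H0 (the Sellke-Bayarri-Berger calibration); whether this is the specific calibration the report recommends is an assumption. A minimal sketch:

```python
import math

def min_bayes_factor(p):
    """Lower bound -e * p * ln(p) on the Bayes factor in favor of H0,
    valid for p < 1/e (Sellke-Bayarri-Berger calibration)."""
    assert 0 < p < 1 / math.e
    return -math.e * p * math.log(p)

def calibrated_posterior_h0(p):
    """Corresponding lower bound on P(H0 | data) under equal prior odds."""
    b = min_bayes_factor(p)
    return b / (1 + b)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p:5}: min BF for H0 = {min_bayes_factor(p):.3f}, "
          f"P(H0|data) >= {calibrated_posterior_h0(p):.3f}")
```

For p = 0.05 the calibrated posterior probability of H0 is still about 0.29 under equal prior odds, which is the quantitative sense in which p-values "grossly overestimate the evidence against H0".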
The Journal of Socio-Economics 33 (2004) 527–546 Size matters: the standard error of regressions in the American Economic Review
"... Significance testing as used has no theoretical justification. Our article in the Journal of Economic Literature (1996) showed that of the 182 full-length papers published in the 1980s in the American Economic Review 70% did not distinguish economic from statistical significance. Since 1996 man ..."
Abstract
Significance testing as used has no theoretical justification. Our article in the Journal of Economic Literature (1996) showed that of the 182 full-length papers published in the 1980s in the American Economic Review 70% did not distinguish economic from statistical significance. Since 1996 many colleagues have told us that practice has improved. We interpret their response as an empirical claim, a judgment about a fact. Our colleagues, unhappily, are mistaken: significance testing is getting worse. We find here that in the next decade, the 1990s, of the 137 papers using a test of statistical significance in the AER fully 82% mistook a merely statistically significant finding for an economically significant finding. A supermajority (81%) believed that looking at the sign of a coefficient sufficed for science, ignoring size. The mistake is causing economic damage: losses of jobs and justice, and indeed of human lives (especially in, to mention another field enchanted with statistical significance as against substantive significance, medical science). The confusion between fit and importance is causing false hypotheses to be accepted and true hypotheses to be rejected. We propose a publication standard for the future: "Tell me the oomph of your coefficient; and do not confuse it with merely statistical significance."
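A toy numerical sketch of the distinction the abstract above draws (all coefficients, standard errors, and the economic threshold are hypothetical): a coefficient can be statistically significant yet economically trivial, and vice versa.

```python
# Two hypothetical regression coefficients for, say, a wage effect in dollars:
#   name -> (estimate, standard error, sample size)
results = {
    "A: large sample": (0.0003, 0.00005, 1_000_000),  # t = 6, but $0.0003
    "B: small sample": (1.50,   1.00,    40),         # t = 1.5, but $1.50
}

# Hypothetical substantive threshold: effects below $0.50 don't matter.
ECONOMIC_THRESHOLD = 0.50

for name, (beta, se, n) in results.items():
    t = beta / se
    stat_sig = abs(t) > 1.96           # conventional 5% two-sided test
    econ_sig = abs(beta) > ECONOMIC_THRESHOLD  # "oomph": size of the effect
    print(f"{name}: beta = {beta}, t = {t:.1f}, "
          f"statistically significant = {stat_sig}, "
          f"economically significant = {econ_sig}")
```

Case A is the mistake the authors count: a huge sample makes a negligible effect "significant"; case B is the reverse error, where a substantively large effect is dismissed because the sample is small.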
Types of Coherence and Coherence among Types
"... Abstract. Recent works on the notion of coherence and the coherence theory of justification begin with the assumption that a notion of coherence that exploits deductive relationships between two hypotheses is well-understood. As a result, such works highlight the study of a weaker notion of coherenc ..."
Abstract
Abstract. Recent works on the notion of coherence and the coherence theory of justification begin with the assumption that a notion of coherence that exploits deductive relationships between two hypotheses is well understood. As a result, such works highlight the study of a weaker notion of coherence that relies on some sort of mutual support of two hypotheses and their inductive relationships. Epistemologists of this variety hope that this approach will ultimately lead to a better understanding of both notions of coherence and the justification at the core of the coherence theory of justification. In contrast, this paper, by adopting a Bayesian stance toward epistemological issues, argues that coherence that manifests in deductive relations between two hypotheses is far from being well understood. After distinguishing among three types of coherence between two hypotheses when one hypothesis entails the other, the paper spells out several implications of these distinctions in both epistemology of science and statistical inference that both include and go
Comparing Holistic and Atomistic Evaluation of Evidence
, 2012
"... Fact finders in legal trials often need to evaluate a mass of weak, contradictory and ambiguous evidence. There are two general ways to accomplish this task: by holistically forming a coherent mental representation of the case, or by atomistically assessing the probative value of each item of eviden ..."
Abstract
Fact finders in legal trials often need to evaluate a mass of weak, contradictory and ambiguous evidence. There are two general ways to accomplish this task: by holistically forming a coherent mental representation of the case, or by atomistically assessing the probative value of each item of evidence and integrating the values according to an algorithm. Parallel constraint satisfaction (PCS) models of cognitive coherence posit that a coherent mental representation is created by discounting contradicting evidence, inflating supporting evidence and interpreting ambivalent evidence in a way coherent with the emerging decision. This leads to inflated support for whichever hypothesis the fact finder accepts as true. Using a Bayesian network to model the direct dependencies between the evidence, the intermediate hypotheses and the main hypothesis, parameterised with (conditional) subjective probabilities elicited from the subjects, I demonstrate experimentally how an atomistic evaluation of evidence leads to a convergence of the computed posterior degrees of belief in the guilt of the defendant of those who convict and those who acquit. The atomistic evaluation preserves the inherent uncertainty that largely disappears in a holistic evaluation. Since the fact finders' posterior degree of belief in the guilt of the defendant is the relevant standard of proof in many legal systems, this result implies that using an atomistic evaluation of evidence, the threshold level of posterior belief in guilt required for a conviction may often not be reached. Max Planck Institute for Research on Collective Goods, Bonn
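The atomistic, Bayesian-network style of evaluation described in the abstract above can be sketched with a minimal two-item example (all probabilities are hypothetical, and the evidence items are assumed conditionally independent given guilt, a simplification the paper's full network does not require):

```python
# Hypothetical prior P(G) for guilt and likelihoods P(E_i | G), P(E_i | ~G)
# for two items of evidence, assumed conditionally independent given G.
prior_g = 0.5
likelihoods = [
    (0.8, 0.3),   # e.g. an eyewitness identification
    (0.6, 0.4),   # e.g. a weak forensic trace
]

# Atomistic evaluation: update the odds item by item via likelihood ratios,
# with no item being discounted or inflated to fit an emerging verdict.
odds = prior_g / (1 - prior_g)
for p_given_g, p_given_not_g in likelihoods:
    odds *= p_given_g / p_given_not_g
posterior_g = odds / (1 + odds)
print(f"P(guilt | evidence) = {posterior_g:.3f}")
# → P(guilt | evidence) = 0.800
```

A posterior of 0.8 falls short of, say, a 0.95 "beyond reasonable doubt" threshold; this is the sense in which the atomistic computation preserves residual uncertainty that a coherence-shifted holistic judgment would suppress.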
EIGHT YEARS AGO, IN "THE STANDARD ERROR OF REGRESSIONS,"
"... Sophisticated, hurried readers continue to judge works on the sophistication of their surfaces....I mean only to utter darkly that in the present confusion of technical sophistication and significance, an emperor or two might slip by with no clothes. Annie Dillard, Living by Fiction ..."
Abstract
Sophisticated, hurried readers continue to judge works on the sophistication of their surfaces.... I mean only to utter darkly that in the present confusion of technical sophistication and significance, an emperor or two might slip by with no clothes. Annie Dillard, Living by Fiction
The p-value, the Bayes/Neyman-Pearson Compromise and the Teaching of Statistical Inference in Introductory Business Statistics
"... Traditionally the Neyman-Pearson approach to hypothesis testing has been presented in introductory business statistics courses. However, many students as well as researchers find the decisions reached by this approach, i.e., reject/fail-to-reject, inconsistent with their understanding of the scienti ..."
Abstract
Traditionally the Neyman-Pearson approach to hypothesis testing has been presented in introductory business statistics courses. However, many students as well as researchers find the decisions reached by this approach, i.e., reject/fail-to-reject, inconsistent with their understanding of the scientific process, namely accumulating evidence in support of a hypothesis. The proposed framework provides an easily understood rationale for introducing the student to I.J. Good's Bayes/Neyman-Pearson compromise as represented by Good's standardized p-values. Standardized p-values are a useful and practical tool for the evidentialist interpretation of data within the context of Neyman-Pearson hypothesis testing, something
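A minimal sketch of Good's standardized p-values as mentioned in the abstract above, assuming the commonly cited form p_s = min(1/2, p·√(n/100)), which rescales an observed p-value to a reference sample size of 100 (the exact form the paper uses is not shown in the snippet):

```python
import math

def standardized_p(p, n):
    """Good's standardized p-value, in one commonly cited form:
    rescale p to a reference sample size of 100, capped at 1/2."""
    return min(0.5, p * math.sqrt(n / 100))

# The same nominal p = 0.05 carries different evidential weight
# at different sample sizes:
for n in (25, 100, 400, 10_000):
    print(f"n = {n:6}: standardized p = {standardized_p(0.05, n):.3f}")
```

The point for teaching is that a p of 0.05 obtained from n = 10,000 standardizes to 0.5, i.e. essentially no evidence, while the same p from a small sample is treated as stronger evidence, which connects the Neyman-Pearson decision rule to an evidential reading.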
Frequentist and Bayesian confidence intervals
"... Abstract. Frequentist (classical) and Bayesian approaches to the construction of confidence limits are compared. Various examples which illustrate specific problems are presented. The Likelihood Principle and the Stopping Rule Paradox are discussed. The performance of the different methods is invest ..."
Abstract
Abstract. Frequentist (classical) and Bayesian approaches to the construction of confidence limits are compared. Various examples which illustrate specific problems are presented. The Likelihood Principle and the Stopping Rule Paradox are discussed. The performance of the different methods is investigated with respect to the properties of coherence, precision, bias, universality, and simplicity. A proposal on how to define error limits in various cases is derived from the comparison; it is based on the likelihood function only and follows in most cases the general practice in high energy physics. Classical methods are not recommended because they violate the Likelihood Principle, can produce inconsistent results, and suffer from a lack of precision and generality. Also the extreme Bayesian approach with arbitrary choice of the prior probability density or priors deduced from scaling laws is rejected.
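A minimal sketch of an error interval "based on the likelihood function only", in the spirit of the abstract above (the binomial model and the 0.5 log-likelihood drop are illustrative choices, not necessarily the paper's):

```python
import math

def binom_loglik(theta, k, n):
    """Log-likelihood of success probability theta for k successes in n trials
    (binomial coefficient dropped: it is constant in theta)."""
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

def likelihood_interval(k, n, drop=0.5):
    """Interval of theta where the log-likelihood is within `drop` of its
    maximum; drop = 0.5 mimics a one-standard-deviation error bar."""
    mle = k / n
    max_ll = binom_loglik(mle, k, n)
    grid = [i / 10000 for i in range(1, 10000)]
    inside = [t for t in grid if binom_loglik(t, k, n) >= max_ll - drop]
    return min(inside), max(inside)

lo, hi = likelihood_interval(k=7, n=20)
print(f"MLE = {7 / 20}, likelihood interval ≈ ({lo:.3f}, {hi:.3f})")
```

Unlike a classical interval, this construction depends only on the observed likelihood, so it is unchanged by the stopping rule that produced the data, which is exactly the Likelihood Principle point the abstract invokes against classical methods.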