Results 1–10 of 53
The earth is round (p < .05)
 American Psychologist
, 1994
Abstract

Cited by 180 (0 self)
After 4 decades of severe criticism, the ritual of null hypothesis significance testing—mechanical dichotomous decisions around a sacred .05 criterion—still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H0 one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences,
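The abstract's central recommendation is to report effect sizes with confidence intervals rather than a bare reject/fail-to-reject decision. A minimal sketch of what that looks like in practice, using Cohen's d with a normal-approximation CI (the data and the helper function are illustrative, not from the paper):

```python
import math

def cohens_d_with_ci(x, y, z=1.96):
    """Cohen's d for two independent samples, with an approximate 95% CI."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    # Pooled standard deviation across the two groups.
    sp = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    d = (mx - my) / sp
    # Approximate standard error of d (large-sample normal approximation).
    se = math.sqrt((nx + ny) / (nx * ny) + d ** 2 / (2 * (nx + ny)))
    return d, (d - z * se, d + z * se)

# Made-up example data for illustration only.
group_a = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.4, 5.2]
group_b = [4.6, 4.4, 5.0, 4.7, 4.5, 4.9, 4.3, 4.8]
d, (lo, hi) = cohens_d_with_ci(group_a, group_b)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The interval conveys both the magnitude of the effect and the precision of the estimate, which a lone p-value does not.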
Consequences of prejudice against the null hypothesis
 Psychological Bulletin
, 1975
Abstract

Cited by 60 (9 self)
The consequences of prejudice against accepting the null hypothesis were examined through (a) a mathematical model intended to simulate the research-publication process and (b) case studies of apparent erroneous rejections of the null hypothesis in published psychological research. The input parameters for the model characterize investigators' probabilities of selecting a problem for which the null hypothesis is true and of reporting, following up on, or abandoning research when data do or do not reject the null hypothesis, and they characterize editors' probabilities of publishing manuscripts concluding in favor of or against the null hypothesis. With estimates of the input parameters based on a questionnaire survey of a sample of social psychologists, the model output indicates a dysfunctional research-publication system. In particular, the model indicates that there may be relatively few publications on problems for which the null hypothesis is (at least to a reasonable approximation) true, and of these, a high proportion will erroneously reject the null hypothesis. The case studies provide additional support for this conclusion. Accordingly, it is
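The mechanism the abstract describes can be made concrete with a toy Monte Carlo in the same spirit (this is not Greenwald's actual model or his survey-based parameter values; the publication probabilities below are assumptions for illustration). When the null hypothesis is true, p-values are uniform, so rejections at alpha = .05 occur 5% of the time; if editors publish rejections far more readily than non-rejections, the published record on true-null problems becomes dominated by erroneous rejections:

```python
import random

random.seed(0)
ALPHA = 0.05
P_PUBLISH_REJECT = 0.80   # assumed editorial bias toward "significant" results
P_PUBLISH_ACCEPT = 0.10   # assumed low publication rate for null results

published_rejections = published_nonrejections = 0
for _ in range(100_000):
    # The null hypothesis is true, so the p-value is uniform on [0, 1].
    p_value = random.random()
    rejected = p_value < ALPHA
    prob_publish = P_PUBLISH_REJECT if rejected else P_PUBLISH_ACCEPT
    if random.random() < prob_publish:
        if rejected:
            published_rejections += 1
        else:
            published_nonrejections += 1

total = published_rejections + published_nonrejections
print(f"Share of published true-null studies that wrongly reject H0: "
      f"{published_rejections / total:.0%}")
```

With these assumed parameters, roughly 30% of published true-null studies are false rejections, against a 5% base rate among all studies conducted, which is the qualitative point the abstract makes.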
The case against statistical significance testing
 Harvard Educational Review
, 1978
Abstract

Cited by 53 (0 self)
In recent years the use of traditional statistical methods in educational research has increasingly come under attack. In this article, Ronald P. Carver exposes the fantasies often entertained by researchers about the meaning of statistical significance. The author recommends abandoning all statistical significance testing and suggests other ways of evaluating research results. Carver concludes that we should return to the scientific method of examining data and replicating results rather than relying on statistical significance testing to provide equivalent information. Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance over scientific significance in educational research represents a corrupt form of the scientific method. Educational research would be better off if it stopped testing its results for statistical significance. The case against statistical significance testing has been developed by many critics (see Morrison & Henkel, 1970b). For example, after a detailed analysis Bakan (1966) concluded that "the test of statistical significance in psychological research may be taken as an instance of a kind of essential mindlessness in the conduct of research" (p. 436); and as early as 1963
Psychology will be a much better science when we change the way we analyze data
 Current Directions in Psychological Science
, 1996
Abstract

Cited by 41 (3 self)
because I believed that within it dwelt some of the most fundamental and challenging problems of the extant sciences. Who could not be intrigued, for example, by the relation between consciousness and behavior, or the rules guiding interactions in social situations, or the processes that underlie development from infancy to maturity? Today, in 1996, my fascination with these problems is undiminished. But I've developed a certain angst over the intervening thirty-something years—a constant, nagging feeling that our field spends a lot of time spinning its wheels without really making all that much progress. This problem shows up in obvious ways—for instance, in the regularity with which findings seem not to replicate. It also shows up in subtler ways—for instance, one doesn't often hear psychologists saying, "Well, this problem is solved now; let's move on to the next one" (as, for example, Johannes Kepler must have said over three centuries ago, after he had cracked the problem of describing planetary motion). I've come to believe that at least part of this problem revolves around our tools—particularly the tools that we use in the critical domains of data analysis and data interpretation. What we do, I sometimes feel, is akin to trying to build a violin using a stone mallet and a chainsaw. The tool-to-task fit is not all that good, and as a result, we wind up building a lot of poor-quality violins. My purpose here is to elaborate on these issues. In what follows, I will summarize our major data-analysis and data-interpretation tools and describe what I believe to be amiss with them. I will then offer some suggestions for change.
The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing
 Research in the Schools
, 1998
Abstract

Cited by 16 (1 self)
Null Hypothesis Significance Testing (NHST) is reviewed in a historical context. The most vocal criticisms of NHST that have appeared in the literature over the past 50 years are outlined. The authors conclude, based on the criticism of NHST and the alternative methods that have been proposed, that viable alternatives to NHST are currently available. The use of effect magnitude measures with surrounding confidence intervals, together with indications of the reliability of the study, is recommended for individual research studies. Advances in the use of meta-analytic techniques provide us with opportunities to advance cumulative knowledge, and all research should be aimed at this goal. The authors provide discussions of, and references to more information on, effect magnitude measures, replication techniques, and meta-analytic techniques. A brief situational assessment of the research landscape and strategies for change are offered. It is generally accepted that the purpose of scientific inquiry is to advance the knowledge base of humankind by seeking evidence of a phenomenon via valid experiments. In the educational arena, the confirmation of a phenomenon should give teachers confidence in their methods and policy makers confidence that their policies will lead to better education for children and adults. We
Statistical significance testing: a historical overview of misuse and misinterpretation with implications for the editorial policies of educational journals
 Research in the Schools
, 1998
Abstract

Cited by 16 (1 self)
Statistical significance tests (SSTs) have been the object of much controversy among social scientists. Proponents have hailed SSTs as an objective means for minimizing the likelihood that chance factors have contributed to research results; critics have both questioned the logic underlying SSTs and bemoaned the widespread misapplication and misinterpretation of the results of these tests. The present paper offers a framework for remedying some of the common problems associated with SSTs via modification of journal editorial policies. The controversy surrounding SSTs is overviewed, with attention given to both historical and more contemporary criticisms of bad practices associated with misuse of SSTs. Examples from the editorial policies of Educational and Psychological Measurement and several other journals that have established guidelines for reporting results of SSTs are reviewed, and suggestions are provided regarding additional ways that educational journals may address the problem. Statistical significance testing has existed in some form for approximately 300 years (Huberty, 1993) and has served an important purpose in the advancement of inquiry in the social sciences. However, there has been much controversy over the misuse and misinterpretation of statistical significance testing (Daniel, 1992b).
Rationality in psychological research: The good-enough principle
 American Psychologist
, 1985
Abstract

Cited by 14 (2 self)
This article reexamines a number of methodological and procedural issues raised by Meehl (1967, 1978) that seem to question the rationality of psychological inquiry. The first issue concerns the asymmetry in theory testing between psychology and physics and the resulting paradox that, because the psychological null hypothesis is always false, increases in precision in psychology always lead to weaker tests of a theory, whereas the converse is true in physics. The second issue, related to the first, regards the slow progress observed in psychological research and the seeming unwillingness of social scientists to take seriously the Popperian requirements for intellectual honesty. We propose a good-enough principle to resolve Meehl's methodological paradox and appeal
Effect sizes and p values: What should be reported . . . ?
, 1996
Abstract

Cited by 13 (0 self)
Despite publication of many well-argued critiques of null hypothesis testing (NHT), behavioral science researchers continue to rely heavily on this set of practices. Although we agree with most critics' catalogs of NHT's flaws, this article also takes the unusual stance of identifying virtues that may explain why NHT continues to be so extensively used. These virtues include providing results in the form of a dichotomous (yes/no) hypothesis evaluation and providing an index (the p value) that has a justifiable mapping onto confidence in the repeatability of a null hypothesis rejection. The most-criticized flaws of NHT can be avoided when the importance of a hypothesis, rather than the p value of its test, is used to determine that a finding is worthy of report, and when p = .05 is treated as an insufficient basis for confidence in the replicability of an isolated non-null finding. Together with many recent critics of NHT, we also urge reporting of important hypothesis tests in enough descriptive detail to permit secondary uses such as meta-analysis.
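The claim that p = .05 gives little confidence in replicability can be checked by simulation (this is a generic sketch of the standard argument, not the authors' own analysis). If a first study lands exactly at the two-sided .05 criterion (z = 1.96) and the true effect happens to equal the observed one, an exact replication's test statistic is normal around 1.96 with unit standard error, so it clears the criterion only about half the time:

```python
import random

random.seed(1)
Z_CRIT = 1.96        # two-sided .05 criterion
observed_z = 1.96    # first study landed exactly at p = .05

successes = 0
n_reps = 100_000
for _ in range(n_reps):
    # Replication test statistic: centered on the observed effect, unit SE.
    z = random.gauss(observed_z, 1.0)
    if abs(z) > Z_CRIT:
        successes += 1

print(f"Estimated replication rate: {successes / n_reps:.2f}")  # near 0.50
```

An isolated p = .05 result is therefore roughly a coin flip to replicate under these favorable assumptions, which motivates the abstract's warning.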
Experimental comparison of the comprehensibility of a UML-based formal specification versus a textual one
in Proceedings of the 11th International Conference on Evaluation and Assessment in Software Engineering (EASE)
Abstract

Cited by 9 (4 self)
The primary objective of software specification is to promote understanding of the system properties between stakeholders. Specification comprehensibility is essential, particularly during software validation and maintenance, as it permits the system properties to be understood more easily and quickly prior to the required tasks. Formal notation such as B increases a specification's precision and consistency. However, the notation is regarded as difficult to comprehend due to its unfamiliar symbols and rules of interpretation. Semi-formal notation such as the Unified Modelling Language (UML) is perceived as more accessible, but it cannot be verified systematically to ensure a specification's accuracy. Integrating the UML and B could perhaps produce an accurate and approachable specification. This paper presents an experimental comparison of the comprehensibility of a UML-based graphical formal specification versus a purely textual formal specification. The measurement focused on efficiency in performing the comprehension tasks. The experiment employed a crossover design and was conducted on forty-one third-year and master's students. The results show that the integration of semi-formal and formal notations expedites the subjects' comprehension tasks with accuracy, even with limited hours of training.
Cumulative research knowledge and social policy formulation: the critical role of meta-analysis
 Psychology, Public Policy, and Law
, 1996
Abstract

Cited by 8 (0 self)
For many years, policymakers expressed increasing frustration with social science research. On every issue there were studies arguing for diametrically opposed conclusions. Methods of meta-analysis that correct for the effects of sampling error have shown that almost all such conflicting results were caused by sampling error. Furthermore, the effects of sampling error are greatly exaggerated by significance test methodology. In many areas, meta-analysis has now provided dependable answers to the original research questions. Meta-analysis is now increasingly being used by policymakers, by textbook writers, and by theorists to provide the basic facts needed to draw both practical and explanatory conclusions. Sophisticated meta-analysis procedures are now used to correct for the effects of other study imperfections, such as measurement error, range restriction, and artificial dichotomization. In domains where the data on artifacts are available, the effect sizes in necessarily imperfect studies have been found to be considerably understated. Path analysis can be applied to the findings from meta-analysis to yield improved causal analyses that result in both explanation of results and improved generalization of
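The sampling-error point in the abstract can be illustrated with the simplest form of meta-analytic pooling (a fixed-effect, inverse-variance-weighted average; the simulated studies below are made up and this is not Hunter and Schmidt's actual procedure, which also corrects for measurement artifacts). Small studies of the same true effect appear to "conflict", with some significant and some not, yet the pooled estimate is both more precise and close to the common effect:

```python
import math
import random

random.seed(2)
TRUE_EFFECT = 0.3

# Simulate study-level effect estimates with small samples, so sampling
# error is large relative to the effect being studied.
studies = []
for n in [20, 35, 25, 50, 30]:
    se = 1.0 / math.sqrt(n)                   # standard error of the estimate
    estimate = random.gauss(TRUE_EFFECT, se)  # observed effect = truth + noise
    studies.append((estimate, se))

# Inverse-variance weighted (fixed-effect) pooled estimate.
weights = [1.0 / se ** 2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

for est, se in studies:
    flag = "significant" if abs(est / se) > 1.96 else "n.s."
    print(f"study estimate {est:+.2f} (SE {se:.2f})  {flag}")
print(f"pooled estimate {pooled:+.2f} (SE {pooled_se:.2f})")
```

The individual significance verdicts disagree purely because of sampling error, while the pooled estimate's standard error is smaller than any single study's, which is the sense in which meta-analysis "provides dependable answers" where vote-counting of significance tests does not.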