Results 1-10 of 67
The earth is round (p < .05)
American Psychologist, 1994
Abstract

Cited by 346 (0 self)
After 4 decades of severe criticism, the ritual of null hypothesis significance testing (mechanical dichotomous decisions around a sacred .05 criterion) still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p as the probability that H0 is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H0 one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences, ...
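The abstract above warns against reading p as the probability that H0 is false. Bayes' rule makes the gap concrete: the probability that H0 is true given a significant result depends on how often tested nulls are true and on power, not on alpha alone. This is a minimal sketch under stated assumptions; the prior share of true nulls and the power are hypothetical inputs, not figures from the article, and `prob_null_given_significant` is an illustrative name.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """Return P(H0 true | significant result) by Bayes' rule.

    P(sig | H0) = alpha (the false-positive rate) and P(sig | H1) = power,
    so P(H0 | sig) = alpha * P(H0) / [alpha * P(H0) + power * (1 - P(H0))].
    """
    if not (0.0 <= prior_null <= 1.0):
        raise ValueError("prior_null must be a probability")
    sig_and_null = alpha * prior_null          # significant AND H0 true
    sig_and_alt = power * (1.0 - prior_null)   # significant AND H1 true
    return sig_and_null / (sig_and_null + sig_and_alt)


if __name__ == "__main__":
    # Hypothetical inputs: alpha = .05 and power = .50 in both scenarios;
    # only the share of tested nulls that are actually true changes.
    for prior in (0.5, 0.9):
        print(prior, round(prob_null_given_significant(prior, 0.05, 0.50), 3))
```

With these made-up inputs, a significant result leaves the probability that H0 is true at about .09 when half of the tested nulls are true and about .47 when 90% are; in neither case is it .05.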
Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy
Psychology, Public Policy, and Law, 1996
Abstract

Cited by 120 (3 self)
Given a data set about an individual or group (e.g., interviewer ratings, life history or demographic facts, test results, self-descriptions), there are two modes of data combination for a predictive or diagnostic purpose. The clinical method relies on human judgment that is based on informal contemplation and, sometimes, discussion with others (e.g., case conferences). The mechanical method involves a formal, algorithmic, objective procedure (e.g., an equation) to reach the decision. Empirical comparisons of the accuracy of the two methods (136 studies over a wide range of predictands) show that the mechanical method is almost invariably equal to or superior to the clinical method. Common anti-actuarial arguments are rebutted, possible causes of widespread resistance to the comparative research are offered, and policy implications of the statistical method’s superiority are discussed. In 1928, the Illinois State Board of Parole published a study by sociologist Burgess of the parole outcome for 3,000 criminal offenders, an exhaustive sample of parolees in a period of years preceding. (In Meehl 1954/1996, this number is erroneously reported as 1,000, a slip probably arising from the fact that 1,000 cases came from each of three Illinois prisons.) Burgess combined 21 objective factors (e.g., nature of crime, ...
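A mechanical procedure in the sense described above can be as simple as a fixed, unit-weighted tally of favorable factors compared against a cutoff: the same inputs always produce the same prediction. This is a minimal sketch assuming binary-coded factors; the factor names, the cutoff, and the helper names are hypothetical and are not the 21 factors or the rule used in the 1928 parole study.

```python
from typing import Mapping, Sequence


def unit_weighted_score(case: Mapping[str, int], factors: Sequence[str]) -> int:
    """Count how many of the listed factors are coded favorable (1) for this case."""
    return sum(1 for f in factors if case.get(f, 0) == 1)


def mechanical_prediction(case: Mapping[str, int], factors: Sequence[str], cutoff: int) -> str:
    """Apply a fixed rule: predict 'success' when the score reaches the cutoff."""
    return "success" if unit_weighted_score(case, factors) >= cutoff else "failure"


if __name__ == "__main__":
    # Hypothetical binary factors and cutoff, invented for illustration only.
    factors = ["no_prior_record", "employed_at_arrest", "first_offense"]
    case = {"no_prior_record": 1, "employed_at_arrest": 0, "first_offense": 1}
    print(unit_weighted_score(case, factors), mechanical_prediction(case, factors, cutoff=2))
```

Unit weights are used here only because they are the simplest formal rule; a regression equation would be an equally mechanical alternative.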
Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy
Psychological Methods, 2000
Abstract

Cited by 88 (0 self)
Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data. Null hypothesis statistical testing (NHST) is arguably the most widely used method of analysis of data collected in psychological experiments and has been so for about 70 years. One might think that a method that had been embraced by an entire research community would be well understood and noncontroversial after many decades of constant use. However, NHST is very controversial. Criticism of the method, which essentially began with the introduction of the technique (Pearce, 1992), has waxed and waned over the years; it has been intense in the recent past. Apparently, controversy regarding the idea of NHST more generally extends back more than two and a half ...
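The abstract above mentions alternatives and supplements to NHST without naming them in this excerpt. One frequently recommended supplement is to report the estimated effect and an interval around it alongside the p value. This is a minimal sketch assuming two independent groups and a large-sample normal approximation (a t-based interval would be more accurate for small samples); `summarize_two_groups` is a hypothetical helper, not a method taken from the article.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize_two_groups(a, b, level=0.95):
    """Mean difference with a normal-approximation CI, two-sided p, and Cohen's d."""
    na, nb = len(a), len(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2          # sample variances
    diff = mean(a) - mean(b)
    se = math.sqrt(va / na + vb / nb)              # unpooled standard error
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return {"diff": diff, "ci": ci, "p": p, "d": diff / pooled_sd}


if __name__ == "__main__":
    # Made-up data for illustration.
    print(summarize_two_groups([1, 2, 3, 4, 5], [0, 1, 2, 3, 4]))
```

For these made-up data the difference of 1 comes with an interval of roughly -0.96 to 2.96 and d of about 0.63; the interval shows how little the data constrain the effect, which a bare "p = .32, not significant" does not.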
Psychology will be a much better science when we change the way we analyze data
Current Directions in Psychological Science, 1996
Abstract

Cited by 73 (3 self)
... because I believed that within it dwelt some of the most fundamental and challenging problems of the extant sciences. Who could not be intrigued, for example, by the relation between consciousness and behavior, or the rules guiding interactions in social situations, or the processes that underlie development from infancy to maturity? Today, in 1996, my fascination with these problems is undiminished. But I've developed a certain angst over the intervening thirty-something years: a constant, nagging feeling that our field spends a lot of time spinning its wheels without really making all that much progress. This problem shows up in obvious ways, for instance in the regularity with which findings seem not to replicate. It also shows up in subtler ways; for instance, one doesn't often hear psychologists saying, "Well, this problem is solved now; let's move on to the next one" (as, for example, Johannes Kepler must have said over three centuries ago, after he had cracked the problem of describing planetary motion). I've come to believe that at least part of this problem revolves around our tools, particularly the tools that we use in the critical domains of data analysis and data interpretation. What we do, I sometimes feel, is akin to trying to build a violin using a stone mallet and a chainsaw. The tool-to-task fit is not all that good, and as a result, we wind up building a lot of poor-quality violins. My purpose here is to elaborate on these issues. In what follows, I will summarize our major data-analysis and data-interpretation tools, and describe what I believe to be amiss with them. I will then offer some suggestions for change.
Bootstraps taxometrics: Solving the classification problem in psychopathology
American Psychologist, 1995
Abstract

Cited by 73 (6 self)
Classification in psychopathology is a problem in applied mathematics; it answers the empirical question “Is the latent structure of these phenotypic indicator correlations taxonic (categories) or nontaxonic (dimensions, factors)?” It is not a matter of convention or preference. Two taxometric procedures, MAMBAC and MAXCOV–HITMAX, provide independent tests of the taxonic conjecture and satisfactorily accurate estimates of the taxon base rate, the latent means, and the valid and false-positive rates achievable by various cuts. The method requires no gold-standard criterion, applying crude fallible diagnostic “criteria” only in the phase of discovery to identify plausible candidate indicators. Confidence in the inference to taxonic structure and numerical accuracy of latent values is provided by multiple consistency tests, hence the term ...
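MAMBAC (Mean Above Minus Below A Cut) can be sketched in a few lines: sort cases on one indicator, slide a cut along that ordering, and at each cut take the mean of a second indicator above the cut minus its mean below. A taxonic latent structure tends to give a curve that peaks in the interior, while a single continuous factor tends to give a dish-shaped curve. This is a bare-bones sketch of the curve only, under stated assumptions; it omits base-rate estimation and the consistency tests the abstract emphasizes, and the simulation settings (sample size, base rate, separation, tail size) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def mambac_curve(x: Sequence[float], y: Sequence[float], n_cuts: int = 49,
                 min_tail: int = 100) -> Tuple[List[int], List[float]]:
    """Mean Above Minus Below A Cut on output indicator y, sorting cases by input x.

    Returns the cut positions (number of cases below each cut) and, for each,
    mean(y above cut) - mean(y below cut).
    """
    ys = [yv for _, yv in sorted(zip(x, y))]   # y values ordered by x
    n = len(ys)
    prefix = [0.0]
    for v in ys:
        prefix.append(prefix[-1] + v)          # running sums for fast means
    step = (n - 2 * min_tail) / (n_cuts - 1)
    cuts = [round(min_tail + i * step) for i in range(n_cuts)]
    diffs = [(prefix[n] - prefix[c]) / (n - c) - prefix[c] / c for c in cuts]
    return cuts, diffs


def simulate_taxonic(n, base_rate, separation, rng):
    """Two latent classes; indicators are independent within each class."""
    x, y = [], []
    for _ in range(n):
        shift = separation if rng.random() < base_rate else 0.0
        x.append(rng.gauss(shift, 1.0))
        y.append(rng.gauss(shift, 1.0))
    return x, y


def simulate_dimensional(n, rng):
    """One continuous latent factor plus independent noise on each indicator."""
    x, y = [], []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)
        x.append(f + rng.gauss(0.0, 1.0))
        y.append(f + rng.gauss(0.0, 1.0))
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    for label, (x, y) in [("taxonic", simulate_taxonic(5000, 0.5, 3.0, rng)),
                          ("dimensional", simulate_dimensional(5000, rng))]:
        _, d = mambac_curve(x, y)
        mid = len(d) // 2
        print(label, round(d[0], 2), round(d[mid], 2), round(d[-1], 2))
```

Comparing the two ends of the curve with its middle is a crude way to read the shape; in practice the curves are inspected across several indicator pairings.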
Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests
Psychological Methods, 2001
Abstract

Cited by 37 (0 self)
Null hypothesis statistical testing (NHST) has been debated extensively but always successfully defended. The technical merits of NHST are not disputed in this article. The widespread misuse of NHST has created a human factors problem that this article intends to ameliorate. This article describes an integrated, alternative inferential confidence interval approach to testing for statistical difference, equivalence, and indeterminacy that is algebraically equivalent to standard NHST procedures and therefore exacts the same evidential standard. The combined numeric and graphic tests of statistical difference, equivalence, and indeterminacy are designed to avoid common interpretive problems associated with NHST procedures. Multiple comparisons, power, sample size, test reliability, effect size, and cause-effect ratio are discussed. A section on the proper interpretation of confidence intervals is followed by a decision rule summary and caveats. The longstanding controversy surrounding null hypothesis statistical testing (NHST) has typically been argued on its technical merits, and they are not dis
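The claim of algebraic equivalence can be checked with a short sketch. Assume each group's interval is shrunk by a reduction factor E = sqrt(SE1² + SE2²) / (SE1 + SE2); then the two shrunken intervals fail to overlap exactly when |M1 - M2| > z·sqrt(SE1² + SE2²), which is the ordinary two-sided test. The factor, the use of a normal rather than t critical value, and the equivalence rule (the span covered by both intervals must be narrower than a chosen margin delta) are assumptions of this sketch rather than a transcription of the article's procedure; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def reduction_factor(se1: float, se2: float) -> float:
    """Shrink factor that makes non-overlap equivalent to a two-sided test."""
    return math.sqrt(se1 ** 2 + se2 ** 2) / (se1 + se2)


def inferential_ci(mean: float, se: float, e: float, alpha: float = 0.05):
    """Interval mean +/- e * z * se, using a normal critical value (assumption)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = e * z * se
    return mean - half, mean + half


def classify(m1: float, se1: float, m2: float, se2: float, delta: float,
             alpha: float = 0.05) -> str:
    """Return 'different', 'equivalent', or 'indeterminate'."""
    e = reduction_factor(se1, se2)
    lo1, hi1 = inferential_ci(m1, se1, e, alpha)
    lo2, hi2 = inferential_ci(m2, se2, e, alpha)
    if hi1 < lo2 or hi2 < lo1:             # no overlap: statistical difference
        return "different"
    if max(hi1, hi2) - min(lo1, lo2) < delta:  # both intervals fit within delta
        return "equivalent"
    return "indeterminate"


if __name__ == "__main__":
    # Made-up summary statistics and equivalence margin.
    print(classify(0.0, 1.0, 3.0, 1.0, delta=1.0))
    print(classify(0.0, 1.0, 2.5, 1.0, delta=1.0))
```

Because E times (SE1 + SE2) equals sqrt(SE1² + SE2²), the overlap check and the conventional z test always reach the same verdict on difference; only the presentation changes.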
The presence of something or the absence of nothing: Increasing theoretical precision in management research
Organizational Research Methods, 2010
Abstract

Cited by 19 (1 self)
In management research, theory testing confronts a paradox described by Meehl in which designing studies with greater methodological rigor puts theories at less risk of falsification. This paradox exists because most management theories make predictions that are merely directional, such as stating that two variables will be positively or negatively related. As methodological rigor increases, the probability that an estimated effect will differ from zero likewise increases, and the likelihood of finding support for a directional prediction boils down to a coin toss. This paradox can be resolved by developing theories with greater precision, such that their propositions predict something more meaningful than deviations from zero. This article evaluates the precision of theories in management research, offers guidelines for making theories more precise, and discusses ways to overcome barriers to the pursuit of theoretical precision.
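The first half of the paradox is easy to quantify. This is a minimal sketch assuming a true correlation that is tiny but not exactly zero (rho = .05, a hypothetical value) and the usual Fisher-z approximation for a two-sided test of rho = 0: as the sample grows, rejecting zero becomes nearly certain. If, as a further assumption, the sign of such small background correlations is unrelated to the theory, a significant result in the predicted direction then occurs about half the time, which is the coin toss described above. `power_nonzero_correlation` is an illustrative name.

```python
import math
from statistics import NormalDist


def power_nonzero_correlation(rho: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of H0: rho = 0 via Fisher's z."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = math.atanh(rho) * math.sqrt(n - 3)   # expected z statistic
    # Probability the statistic lands in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical tiny effect; only the sample size changes.
    for n in (50, 500, 5000, 50000):
        print(n, round(power_nonzero_correlation(0.05, n), 3))
```

Under these assumptions power rises from roughly .06 at n = 50 to essentially 1 at n = 50,000, so greater rigor in the form of larger samples makes a merely directional prediction easier, not harder, to "confirm".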
On construct validity: Issues of method and measurement
Psychological Assessment, 2005
Abstract

Cited by 18 (0 self)
... noting that psychologists study hypothetical, inferred entities and that validating measures of such entities involves basic theory testing. Three important developments in clinical assessment following that seminal article are noteworthy. First, clinical research has benefited from greater theoretical integration and subsequent differentiation among related constructs. Second, implementation of ongoing, critical evaluation of all aspects of the construct validity process, including theory development, hypothesis specification, research design, and empirical evaluation, has improved clinical assessment. Third, improvement in evaluating fit between hypotheses and observations has been sought. Improved means of evaluating multitrait, multimethod designs, and ways to increase their clinical representativeness, are one encouraging development. Ongoing efforts to improve the construct validity process reflect the legacy of L. J. Cronbach and P. E. Meehl.
Typology of analytical and interpretational errors in quantitative and qualitative educational research
2003
Abstract

Cited by 17 (6 self)
The purpose of this paper is to identify and to discuss major analytical and interpretational errors that occur regularly in quantitative and qualitative educational research. A comprehensive review of the literature discussing various problems was conducted. With respect to quantitative data analyses, common analytical and interpretational misconceptions are presented for data-analytic techniques representing each major member of the general linear model, including hierarchical linear modeling. Common errors associated with many of these approaches include (a) no evidence provided that statistical assumptions were checked; (b) no power/sample size considerations discussed; (c) inappropriate treatment of multivariate data; (d) use of stepwise procedures; (e) failure to report reliability indices for either previous or present samples; (f) no control for Type I error rate; and (g) failure to report effect sizes. With respect to qualitative research studies, the most common errors are failure to provide evidence for judging the dependability (i.e., reliability) and credibility (i.e., validity) of findings, generalizing findings beyond the sample, and failure to ...
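Error (f) above, no control for the Type I error rate across several tests, has standard remedies. This excerpt does not say which one the paper recommends, so the sketch below swaps in Holm's step-down procedure as one common option; it controls the familywise error rate at alpha and never rejects fewer hypotheses than a plain Bonferroni correction. `holm_reject` is a hypothetical helper name.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure: controls the familywise Type I error rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):   # thresholds alpha/m, alpha/(m-1), ...
            reject[i] = True
        else:
            break  # once one hypothesis is retained, all larger p values are retained
    return reject


if __name__ == "__main__":
    p = [0.001, 0.04]
    print(holm_reject(p))                      # [True, True]
    print([pv <= 0.05 / len(p) for pv in p])   # Bonferroni: [True, False]
```

The example shows the practical difference: with two tests, Bonferroni compares both p values to .025, whereas Holm relaxes the second threshold to .05 after the first rejection.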