Results 1-10 of 370
Statistical Comparisons of Classifiers over Multiple Data Sets
, 2006
Cited by 744 (0 self)
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust nonparametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
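As a rough illustration of the Friedman test and the critical difference mentioned in this abstract, the sketch below ranks classifiers on each data set, averages the ranks, and computes the Friedman chi-square statistic. The function names, the toy accuracy table, and the choice of a Nemenyi-style critical difference are assumptions made for illustration; the abstract does not spell out the formulas. The critical value `q_alpha` has to be looked up in a table for the chosen significance level and number of classifiers, so it is passed in rather than computed.

```python
import math

def average_ranks(scores):
    """scores[i][j] is the score of classifier j on data set i (higher is better).
    Returns the average rank of each classifier; tied scores share the mean rank."""
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])  # best score first
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1..end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n for t in totals]

def friedman_statistic(avg_ranks, n):
    """Friedman chi-square statistic for k classifiers over n data sets."""
    k = len(avg_ranks)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )

def critical_difference(q_alpha, k, n):
    """Nemenyi-style critical difference between average ranks.
    q_alpha is a tabulated critical value supplied by the caller."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Hypothetical accuracies: 4 data sets (rows) by 3 classifiers (columns).
toy_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.70, 0.75, 0.65],
    [0.95, 0.90, 0.90],
]
ranks = average_ranks(toy_scores)
chi2 = friedman_statistic(ranks, len(toy_scores))
```

Two classifiers would then be treated as differing when their average ranks are further apart than the critical difference. Four data sets are far too few for a real comparison; the table is only there to make the arithmetic easy to follow.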
Mediation in experimental and nonexperimental studies: new procedures and recommendations
 PSYCHOLOGICAL METHODS
, 2002
Cited by 696 (4 self)
Mediation is said to occur when a causal effect of some variable X on an outcome Y is explained by some intervening variable M. The authors recommend that with small to moderate samples, bootstrap methods (B. Efron & R. Tibshirani, 1993) be used to assess mediation. Bootstrap tests are powerful because they detect that the sampling distribution of the mediated effect is skewed away from 0. They argue that R. M. Baron and D. A. Kenny’s (1986) recommendation of first testing the X → Y association for statistical significance should not be a requirement when there is a priori belief that the effect size is small or suppression is a possibility. Empirical examples and computer setups for bootstrap analyses are provided. Mediation models of psychological processes are popular because they allow interesting associations to be decomposed into components that reveal possible causal mechanisms. These models are useful for theory development and testing as well as for the identification of possible points of intervention in applied work. Mediation is equally of interest to experimental psychologists as it is to those who study naturally occurring processes through nonexperimental studies. For example, social–cognitive psychologists are interested in showing that the effects of cognitive priming on attitude change are mediated by the accessibility of certain beliefs (Eagly & Chaiken, 1993). Developmental psychologists use longitudinal methods to study how parental unemployment can have adverse effects on child behavior through its intervening effect on quality of parenting (Conger et al., 1990). Mediation analysis is also used in organizational
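The bootstrap approach to mediation described in this abstract can be sketched as follows. This is a minimal percentile bootstrap for the indirect effect a*b, where a is the slope of M on X and b is the coefficient of M when Y is regressed on X and M. The function names, defaults, and the choice of the plain percentile interval are illustrative assumptions, not the authors' own computer setups.

```python
import random

def _cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def indirect_effect(x, m, y):
    """Estimate a*b: a from M ~ X, b from Y ~ X + M (ordinary least squares)."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with no variance
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi
```

An interval that excludes 0 would be read as evidence of mediation. Because the interval is taken from the empirical quantiles, it can be asymmetric around the point estimate, which is how it accommodates a skewed sampling distribution.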
Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers
 Psychological Methods
, 1996
Cited by 206 (0 self)
Data analysis methods in psychology still emphasize statistical significance testing, despite numerous articles demonstrating its severe deficiencies. It is now possible to use meta-analysis to show that reliance on significance testing retards the development of cumulative knowledge. But reform of teaching and practice will also require that researchers learn that the benefits that they believe flow from use of significance testing are illusory. Teachers must revamp their courses to bring students to understand that (a) reliance on significance testing retards the growth of cumulative research knowledge; (b) benefits widely believed to flow from significance testing do not in fact exist; and (c) significance testing methods must be replaced with point estimates and confidence intervals in individual studies and with meta-analyses in the integration of multiple studies. This reform is essential to the future progress of cumulative knowledge in psychological research. In 1990, Aiken, West, Sechrest, and Reno published an important article surveying the teaching of quantitative methods in graduate psychology programs. They were concerned about what was not being taught or was being inadequately taught to future researchers and the harm this might cause to research progress in psychology. For example, they found that new and important quantitative methods such as causal modeling, confirmatory factor analysis, and meta-analysis were not being taught in the majority of graduate programs. This is indeed a legitimate cause for concern. But in this article, I am concerned about the opposite: An earlier version of this article was presented as the presidential address to the Division of Evaluation,
Principles of Marketing
, 1999
Cited by 194 (1 self)
Research on forecasting is extensive and includes many studies that have tested alternative methods in order to determine which ones are most effective. We review this evidence in order to provide guidelines for forecasting for marketing. The coverage includes intentions, Delphi, role playing, conjoint analysis, judgmental bootstrapping, analogies, extrapolation, rule-based forecasting, expert systems, and econometric methods. We discuss research about which methods are most appropriate to forecast market size, actions of decision makers, market share, sales, and financial outcomes. In general, there is a need for statistical methods that incorporate the manager's domain knowledge. This includes rule-based forecasting, expert systems, and econometric methods. We describe how to choose a forecasting method and provide guidelines for the effective use of forecasts including such procedures as scenarios.
Using confidence intervals for graphically based data interpretation
 CANADIAN JOURNAL OF EXPERIMENTAL PSYCHOLOGY
, 2003
Cited by 150 (19 self)
As a potential alternative to standard null hypothesis significance testing, we describe methods for graphical presentation of data – particularly condition means and their corresponding confidence intervals – for a wide range of factorial designs used in experimental psychology. We describe and illustrate confidence intervals specifically appropriate for between-subject versus within-subject factors. For designs involving more than two levels of a factor, we describe the use of contrasts for graphical illustration of theoretically meaningful components of main effects and interactions. These graphical techniques lend themselves to a natural and straightforward assessment of statistical power.
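The abstract does not give formulas, so the following is only a sketch of one common construction of a confidence interval for a within-subject factor: the half-width is based on the mean square of the subject-by-condition interaction rather than on between-subject variability. The function name and the toy data are hypothetical, and the critical t value (for the interaction degrees of freedom) is assumed to come from a table and is passed in by the caller.

```python
import math

def within_subject_ci(data, t_crit):
    """data[i][j] is the score of subject i in condition j.
    Returns (condition_means, half_width, df) where the half-width is
    t_crit * sqrt(MS_interaction / n) and df = (n - 1) * (k - 1)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Residual after removing subject and condition effects.
    ss_inter = sum(
        (data[i][j] - subj_means[i] - cond_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    df = (n - 1) * (k - 1)
    ms_inter = ss_inter / df
    return cond_means, t_crit * math.sqrt(ms_inter / n), df

# Hypothetical scores: 3 subjects (rows) by 2 conditions (columns).
toy = [[1, 2], [2, 4], [3, 3]]
means, half_width, df = within_subject_ci(toy, t_crit=4.303)
```

Each condition mean would be plotted with the same half-width. Because stable differences between subjects are removed before the error term is computed, the interval reflects the variability relevant to comparing conditions within subjects rather than the precision of each mean on its own.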
Writing meta-analytic reviews
 Psychological Bulletin
, 1995
Cited by 122 (1 self)
This article describes what should typically be included in the introduction, method, results, and discussion sections of a meta-analytic review. Method sections include information on literature searches, criteria for inclusion of studies, and a listing of the characteristics recorded for each study. Results sections include information describing the distribution of obtained effect sizes, central tendencies, variability, tests of significance, confidence intervals, tests for heterogeneity, and contrasts (univariate or multivariate). The interpretation of meta-analytic results is often facilitated by the inclusion of the binomial effect size display procedure, the coefficient of robustness, file drawer analysis, and, where overall results are not significant, the counternull value of the obtained effect size and power analysis. The purpose of this article is to provide some guidelines for the preparation of meta-analytic reviews of literature. Meta-analytic reviews are quantitative summaries of research domains that describe the typical strength of the effect or phenomenon, its variability, its statistical significance, and the nature of the moderator variables from which one can predict the relative strength of the effect or phenomenon (Cooper, 1989; Glass,
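Three of the interpretive aids named in this abstract reduce to short formulas, sketched below under their usual textbook definitions. The function names are hypothetical, and the article itself may present refinements (for example, computing the counternull of a correlation on a transformed scale) that this sketch does not attempt.

```python
def binomial_effect_size_display(r):
    """Recast a correlation r as two 'success rates' that differ by r."""
    return 0.5 + r / 2, 0.5 - r / 2

def counternull(effect, null_value=0.0):
    """Effect size as well supported by the data as the null value,
    assuming a symmetric reference distribution."""
    return 2 * effect - null_value

def fail_safe_n(z_scores, z_alpha=1.645):
    """File drawer analysis: number of unretrieved studies averaging zero
    effect needed to bring the combined one-tailed result down to z_alpha."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k

# Hypothetical inputs for illustration only.
treated_rate, control_rate = binomial_effect_size_display(0.32)
hidden_studies = fail_safe_n([2.0, 2.5, 1.5])
```

With r = 0.32 the display reads as success rates of 66% versus 34%, which is often easier to interpret than the correlation itself. The fail-safe number is a rough robustness indicator rather than a formal test.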
AERA editorial policies regarding statistical significance testing: Three suggested reforms
 Educational Researcher
, 1996
Cited by 118 (7 self)
comments on Thompson (1996), it is argued that describing results as "significant" rather than "statistically significant" is confusing to those persons most susceptible to misinterpreting this telegraphic wording. Contrary to Robinson and Levin's view, it is noted that the utility of the characterization of results as being due to "nonchance" is limited by the nature of the null hypothesis assumed to be true. It is suggested that effect sizes are important to interpret, even though they too can be misinterpreted; recent empirical studies of publications indicate that effect sizes are still too rarely reported. Finally, the value of "external" replicability analyses is acknowledged, but it is argued that "internal" replicability analyses can also be useful, and certainly are superior to statistical significance tests regarding evaluating result replicability, because statistical significance tests do not evaluate replicability.
Inference by eye: Confidence intervals and how to read pictures of data
 American Psychologist
, 2005
Cited by 118 (14 self)
Wider use in psychology of confidence intervals (CIs), especially as error bars in figures, is a desirable development. However, psychologists seldom use CIs and may not understand them well. The authors discuss the interpretation of figures with error bars and analyze the relationship between CIs and statistical significance testing. They propose 7 rules of eye to guide the inferential use of figures with error bars. These include general principles: Seek bars that relate directly to effects of interest, be sensitive to experimental design, and interpret the intervals. They also include guidelines for inferential interpretation of the overlap of CIs on independent group means. Wider use of interval estimation in psychology has the potential to improve research communication substantially. Inference by eye is the interpretation of graphically presented data. On first seeing Figure 1, what questions should spring to mind and what inferences are justified? We discuss figures with means and confidence intervals (CIs), and propose rules of eye to guide the interpretation of such figures. We believe it is timely to consider inference by eye because psychologists are now being encouraged to make greater use of CIs. Many who seek reform of psychologists’ statistical practices advocate a change in emphasis from null hypothesis significance testing (NHST) to CIs, among other techniques
Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy
 Psychological Methods
, 2000
Cited by 97 (0 self)
Null hypothesis significance testing (NHST) is arguably the most widely used approach to hypothesis evaluation among behavioral and social scientists. It is also very controversial. A major concern expressed by critics is that such testing is misunderstood by many of those who use it. Several other objections to its use have also been raised. In this article the author reviews and comments on the claimed misunderstandings as well as on other criticisms of the approach, and he notes arguments that have been advanced in support of NHST. Alternatives and supplements to NHST are considered, as are several related recommendations regarding the interpretation of experimental data. The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data. Null hypothesis statistical testing (NHST) is arguably the most widely used method of analysis of data collected in psychological experiments and has been so for about 70 years. One might think that a method that had been embraced by an entire research community would be well understood and noncontroversial after many decades of constant use. However, NHST is very controversial. Criticism of the method, which essentially began with the introduction of the technique (Pearce, 1992), has waxed and waned over the years; it has been intense in the recent past. Apparently, controversy regarding the idea of NHST more generally extends back more than two and a half
What future quantitative social science research could look like: Confidence intervals for effect sizes
 Educational Researcher
, 2002
Cited by 94 (2 self)
presents a self-canceling mixed-message. To present an “encouragement” in the context of strict absolute standards regarding the esoterics of author note placement, pagination, and margins is to send the message, “these myriad requirements count, this encouragement doesn’t.”