Statistical Themes and Lessons for Data Mining
, 1997
Cited by 30 (3 self)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
Consequences of prejudice against the null hypothesis
- Psychological Bulletin
, 1975
Cited by 19 (6 self)
The consequences of prejudice against accepting the null hypothesis were examined through (a) a mathematical model intended to simulate the research-publication process and (b) case studies of apparent erroneous rejections of the null hypothesis in published psychological research. The input parameters for the model characterize investigators' probabilities of selecting a problem for which the null hypothesis is true, of reporting, following up on, or abandoning research when data do or do not reject the null hypothesis, and they characterize editors' probabilities of publishing manuscripts concluding in favor of or against the null hypothesis. With estimates of the input parameters based on a questionnaire survey of a sample of social psychologists, the model output indicates a dysfunctional research-publication system. Particularly, the model indicates that there may be relatively few publications on problems for which the null hypothesis is (at least to a reasonable approximation) true, and of these, a high proportion will erroneously reject the null hypothesis. The case studies provide additional support for this conclusion. Accordingly, it is
Under What Conditions Does Theory Obstruct Research Progress?
- PSYCHOLOGICAL REVIEW
, 1986
Data Mining At The Interface Of Computer Science And Statistics
, 2001
Cited by 6 (0 self)
This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining. Data mining has attracted considerable attention both in the research and commercial arenas in recent years, involving the application of a variety of techniques from both computer science and statistics. The chapter discusses how computer scientists and statisticians approach data from different but complementary viewpoints and highlights the fundamental differences between statistical and computational views of data mining. In doing so we review the historical importance of statistical contributions to machine learning and data mining, including neural networks, graphical models, and flexible predictive modeling. The primary conclusion is that closer integration of computational methods with statistical thinking is likely to become increasingly important in data mining applications.
Keywords: Data mining, statistics, pattern recognition, transaction data, correlation.
Effect sizes and p values: What should be reported . . . ?
, 1996
Cited by 2 (0 self)
Despite publication of many well-argued critiques of null hypothesis testing (NHT), behavioral science researchers continue to rely heavily on this set of practices. Although we agree with most critics' catalogs of NHT's flaws, this article also takes the unusual stance of identifying virtues that may explain why NHT continues to be so extensively used. These virtues include providing results in the form of a dichotomous (yes/no) hypothesis evaluation and providing an index (p value) that has a justifiable mapping onto confidence in repeatability of a null hypothesis rejection. The most-criticized flaws of NHT can be avoided when the importance of a hypothesis, rather than the p value of its test, is used to determine that a finding is worthy of report, and when p = .05 is treated as insufficient basis for confidence in the replicability of an isolated non-null finding. Together with many recent critics of NHT, we also urge reporting of important hypothesis tests in enough descriptive detail to permit secondary uses such as meta-analysis.
Data Mining: Data Analysis on a Grand Scale?
- Statistical Methods in Medical Research
, 2000
Modern data mining has evolved largely as a result of efforts by computer scientists to address the needs of "data owners" in extracting useful information from massive observational data sets. Because of this historical context, data mining to date has largely focused on computational and algorithmic issues rather than the more traditional statistical aspects of data analysis. This paper provides a brief review of the origins of data mining as well as discussing some of the primary themes in current research in data mining, including scalable algorithms for massive data sets, discovering novel patterns in data, and analysis of text, Web, and related multi-media data sets.
1 Introduction
The phrase "data mining" has had a varied history within the past 30 to 40 years. In the 1960's, as digital computers were beginning to be applied to data analysis problems, it was noticed that if one searched long enough (using the computer) that one could always find some relatively complex ...
Psychological Bulletin
this report was facilitated by grants from National Science Foundation (GS-3050) and U.S. Public Health Service (MH-20527-02). Although they should not be held responsible for positions espoused herein, I am very grateful to the following for providing comments on earlier drafts: Marl R. Jones, Paul Isaac, David Bakan, Timothy C. Brock, Bibb Latané, Thomas M. Ostrom, Hanan C. Selvin, Martin Fishbein, Zick Rubin, and Richard A. Zeller

