Results 1-7 of 7
Multiple Comparisons in Induction Algorithms
Machine Learning, 1998
"... Keywords Running Head multiple comparison procedure Multiple Comparisons in Induction Algorithms David Jensen and Paul R. Cohen Experimental Knowledge Systems Laboratory Department of Computer Science Box 34610 LGRC University of Massachusetts Amherst, MA 010034610 4135453613 A single ..."
Cited by 82 (10 self)

Abstract:
David Jensen and Paul R. Cohen, Experimental Knowledge Systems Laboratory, Department of Computer Science, Box 34610 LGRC, University of Massachusetts, Amherst, MA 01003-4610.

A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.

Keywords: inductive learning, overfitting, oversearching, attribute selection, hypothesis testing, parameter estimation.
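The selection bias the abstract describes, scoring many candidate items and keeping the maximum, can be illustrated with a short simulation. The choices of k = 10 candidates and alpha = 0.05 below are illustrative assumptions, not values taken from the paper:

```python
import random
from statistics import NormalDist

# k pure-noise candidate "attributes": every score is N(0, 1), so any
# apparent winner is an artifact of taking the maximum of k comparisons.
random.seed(0)
k = 10
alpha = 0.05
trials = 20_000
z = NormalDist().inv_cdf(1 - alpha)            # naive per-comparison cutoff
z_bonf = NormalDist().inv_cdf(1 - alpha / k)   # Bonferroni-adjusted cutoff

naive = bonf = 0
for _ in range(trials):
    best = max(random.gauss(0.0, 1.0) for _ in range(k))
    naive += best > z        # maximum compared to the unadjusted cutoff
    bonf += best > z_bonf    # maximum compared to the adjusted cutoff

print(f"naive false-selection rate:      {naive / trials:.3f}")   # ~0.40, not 0.05
print(f"Bonferroni false-selection rate: {bonf / trials:.3f}")    # ~0.05
```

With ten pure-noise candidates, the maximum score exceeds the unadjusted cutoff about 40% of the time (1 - 0.95^10 ≈ 0.40) rather than the nominal 5%; this is the inflation behind attribute selection errors and oversearching, and dividing alpha by the number of comparisons restores roughly the nominal rate.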
Data Mining At The Interface Of Computer Science And Statistics
2001
"... This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining. Data mining has attracted considerable attention both in the research and commercial arenas in recent years, i ..."
Cited by 7 (0 self)

Abstract:
This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining. Data mining has attracted considerable attention in both the research and commercial arenas in recent years, involving the application of a variety of techniques from both computer science and statistics. The chapter discusses how computer scientists and statisticians approach data from different but complementary viewpoints and highlights the fundamental differences between statistical and computational views of data mining. In doing so we review the historical importance of statistical contributions to machine learning and data mining, including neural networks, graphical models, and flexible predictive modeling. The primary conclusion is that closer integration of computational methods with statistical thinking is likely to become increasingly important in data mining applications.

Keywords: data mining, statistics, pattern recognition, transaction data, correlation.
Illusions in Regression Analysis
International Journal of Forecasting (forthcoming), 2012
"... Illusions in Regression Analysis Soyer and Hogarth’s article, “The Illusion of Predictability, ” shows that diagnostic statistics that are commonly provided with regression analysis lead to confusion, reduced accuracy, and overconfidence. Even highly competent researchers are subject to these proble ..."
Cited by 1 (0 self)

Abstract:
Soyer and Hogarth’s article, “The Illusion of Predictability,” shows that the diagnostic statistics commonly provided with regression analysis lead to confusion, reduced accuracy, and overconfidence. Even highly competent researchers are subject to these problems. This overview examines the Soyer-Hogarth findings in light of prior research on illusions associated with regression analysis. It also summarizes solutions that have been proposed over the past century. These solutions would enhance the value of regression analysis.
Tree Structured Data Analysis: AID, CHAID and CART
Paper presented at the 1992 Sawtooth/SYSTAT Joint Software Conference, Sun Valley, ID
University of Ostrava
"... Classification and regression trees are becoming increasingly popular for partitioning data and identifying local structure in small and large datasets. Classification trees include those models in which the dependent variable (the predicted variable) is categorical. Regression trees include those i ..."
Cited by 1 (0 self)

Abstract:
Classification and regression trees are becoming increasingly popular for partitioning data and identifying local structure in small and large datasets. Classification trees include those models in which the dependent variable (the predicted variable) is categorical. Regression trees include those in which it is continuous. This paper discusses pitfalls in the use of these methods and highlights where they are especially suitable. Paper presented at the 1992 Sawtooth/SYSTAT Joint Software Conference, Sun Valley, ID.
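As a concrete illustration of the regression-tree case (continuous dependent variable), a single split can be chosen to minimize the summed squared error of the two resulting leaf means. This toy stump on hypothetical data is a sketch of that idea, not the AID, CHAID, or CART algorithms themselves:

```python
# A regression-tree "stump": one split on x chosen to minimize the
# summed squared error (SSE) around the two leaf means. Data are made up.
def best_split(xs, ys):
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs))[1:]:          # candidate thresholds between points
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            best = (t, err)
    return best                            # (threshold, total leaf SSE)

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
t, err = best_split(xs, ys)
print(t)   # 10 -- the split separating the two regimes
```

For a classification tree (categorical dependent variable) the same exhaustive search over splits would score candidates with an impurity measure such as Gini or entropy instead of squared error; that per-split search over many candidates is also where the multiple-comparison pathologies discussed above arise.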
The Forecasting Dictionary
2000
"... "But ‘glory ’ doesn't mean "a nice knockdown argument, " Alice objected. "When I use a word, " Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean—neither more nor less." "The question is, " said Alice, "wheth ..."
Abstract:
"But ‘glory ’ doesn't mean "a nice knockdown argument, " Alice objected. "When I use a word, " Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean—neither more nor less." "The question is, " said Alice, "whether you can make words mean so many different things." "The question is, " said Humpty Dumpty, "which is to be master—that's all." Through the Looking Glass
al
"... amp less information. Because the cost of processing power and storage has been falling, data have become very cheap. This has opened a new chaleasy to access the data, it becomes increasingly difficult to access the desired information. Many data are left unexplored because of the lack of sufficie ..."
Abstract:
Due to the ease of generating, collecting and storing data, we live in an expanding universe of too much data. At the same time, we are confronted with the paradox that more data means less information. Because the cost of processing power and storage has been falling, data have become very cheap. This has opened a new chal... easy to access the data, it becomes increasingly difficult to access the desired information. Many data are left unexplored because of the lack of sufficiently powerful tools and techniques to turn the data into information and knowledge. The analysis of data on e.g. a business should ...
Analyzing company growth data using genetic algorithms on binary trees
2003