Results 1-10 of 10
Multiple Comparisons in Induction Algorithms
Machine Learning, 1998
Cited by 94 (10 self)
A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
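The selection effect this abstract describes is easy to demonstrate numerically. The sketch below is my own illustration, not code from the paper: it shows that the maximum of several pure-noise scores grows with the number of items compared, and how a Bonferroni-style adjustment tightens the significance threshold as the number of comparisons grows. All names are invented for the example.

```python
import random
import statistics

random.seed(0)

def max_of_scores(n_items, n_trials=10000):
    """Mean of the maximum of n_items pure-noise scores (standard normal).

    Even when every item is worthless, the winning score grows with the
    number of items compared -- the selection bias behind attribute
    selection errors, overfitting, and oversearching."""
    maxima = [max(random.gauss(0, 1) for _ in range(n_items))
              for _ in range(n_trials)]
    return statistics.mean(maxima)

# The expected maximum rises as more items are compared:
for n in (1, 10, 100):
    print(n, round(max_of_scores(n), 2))

def bonferroni_alpha(alpha, n_comparisons):
    """Bonferroni adjustment: divide the significance level by the
    number of comparisons, so evidence must be stronger when more
    items compete."""
    return alpha / n_comparisons
```

Randomization testing and cross-validation, also mentioned in the abstract, attack the same bias empirically rather than analytically.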
Illusions in regression analysis
International Journal of Forecasting (forthcoming), 2012
Cited by 14 (6 self)
Soyer and Hogarth’s article, “The Illusion of Predictability,” shows that diagnostic statistics that are commonly provided with regression analysis lead to confusion, reduced accuracy, and overconfidence. Even highly competent researchers are subject to these problems. This overview examines the Soyer-Hogarth findings in light of prior research on illusions associated with regression analysis. It also summarizes solutions that have been proposed over the past century. These solutions would enhance the value of regression analysis.
Data Mining At The Interface Of Computer Science And Statistics
2001
Cited by 7 (0 self)
This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining. Data mining has attracted considerable attention both in the research and commercial arenas in recent years, involving the application of a variety of techniques from both computer science and statistics. The chapter discusses how computer scientists and statisticians approach data from different but complementary viewpoints and highlights the fundamental differences between statistical and computational views of data mining. In doing so we review the historical importance of statistical contributions to machine learning and data mining, including neural networks, graphical models, and flexible predictive modeling. The primary conclusion is that closer integration of computational methods with statistical thinking is likely to become increasingly important in data mining applications.
Keywords: Data mining, statistics, pattern recognition, transaction data, correlation.
The Forecasting Dictionary
2000
Cited by 2 (0 self)
"But 'glory' doesn't mean 'a nice knockdown argument,'" Alice objected. "When I use a word," Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean—neither more nor less." "The question is," said Alice, "whether you can make words mean so many different things." "The question is," said Humpty Dumpty, "which is to be master—that's all." (Through the Looking Glass)
Tree Structured Data Analysis: AID, CHAID and CART
Paper presented at the 1992 Sawtooth/SYSTAT Joint Software Conference, Sun Valley, ID; University of Ostrava
Cited by 2 (0 self)
Classification and regression trees are becoming increasingly popular for partitioning data and identifying local structure in small and large datasets. Classification trees include those models in which the dependent variable (the predicted variable) is categorical. Regression trees include those in which it is continuous. This paper discusses pitfalls in the use of these methods and highlights where they are especially suitable. Paper presented at the 1992 Sawtooth/SYSTAT Joint Software Conference, Sun Valley, ID.
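To make the regression-tree case concrete, the core operation a regression tree repeats at every node is a search for the split that most reduces squared error. The following is a minimal sketch of that single step, not the CART algorithm itself (no pruning, one variable only), and the data are invented:

```python
def best_split(xs, ys):
    """Exhaustively search one variable for the split threshold that
    minimizes the total sum of squared errors of the two resulting
    groups -- the basic node-splitting step of a regression tree."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # (threshold, resulting total SSE)
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            # Place the threshold midway between adjacent x values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best

# Two clearly separated groups in y, driven by x:
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
threshold, err = best_split(xs, ys)
print(threshold)  # → 6.5 (splits between x=3 and x=10)
```

A classification tree replaces the squared-error score with an impurity measure such as Gini or entropy, but the exhaustive split search is the same.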
Working Paper Draft, 2013
We propose the Golden Rule of Forecasting: Be Conservative when Forecasting. A conservative forecast is consistent with cumulative knowledge about the present and the past. Forecasters should incorporate all knowledge relevant to the forecasting problem and use forecasting methods that have been validated for the type of situation. Guidelines, in the form of a Golden Rule Checklist, are provided as an aid to following the Golden Rule. The guidelines are the product of a review of experimental evidence. In all of the prior studies the authors found, forecasts derived in conservative ways were more accurate than forecasts derived in less conservative ways under all conditions. Such forecasts also reduce the risk of large errors. Gains from conservatism are greater when the situation is uncertain and complex, and when bias is likely. Conservative procedures are simple to understand and to implement. Those who are not forecasting experts should be able to understand the guidelines well enough to identify doubtful forecasts. The guidelines can help forecasters avoid the temptations to violate the Golden Rule presented by access to large databases and complex statistical analyses. Given that all forecasting involves uncertainty, the Golden Rule is applicable to all forecasting.
al
Due to the ease of generating, collecting and storing data, we live in an expanding universe of too much data. At the same time, we are confronted with the paradox that more data means less information. Because the cost of processing power and storage has been falling, data have become very cheap. This has opened a new challenge: while it is easy to access the data, it becomes increasingly difficult to access the desired information. Many data are left unexplored because of the lack of sufficiently powerful tools and techniques to turn the data into information and knowledge.
Analyzing company growth data using genetic algorithms on binary trees
2003
Fiendishly Difficult Questions: Possible Limits and Aesthetic Pleasures of Simulation
Simulations facilitate the construction of increasingly complex theories, which, in turn, suffer three attendant curses. By definition, complex theories have many variables, decreasing the chances that all of them will be clear and measurable. This makes complex theories difficult to test, decreasing the chances they will be adequately tested, and difficult to understand, decreasing the chances of attracting or sustaining an audience. However, constructing simulations that embody complex theories provides an opportunity for new perspectives and insights, and an occasion to renew respect for the complexities of the phenomena that they simulate. These are aesthetic benefits concordant with the definition of science as art with numbers.
Instability In A Tree Approach To Regression
One of the major problems that a tree approach to data analysis often encounters is the instability of tree structures. The instability issue must be dealt with before data can be interpreted by this method. Examining instability at a node of a tree provides insight into the instability of the whole tree, because the same theory of instability applies to all the nodes. This paper deals with the instability issue at a single node of a tree. It is assumed that the data are from a regression model, and the factors in that model that affect the instability are examined. Squared-error loss is considered as a criterion for tree construction (the "LS" criterion in the CART program). The selection rate of a regressor variable at a node of a tree is used as a measure of instability. The selection rate mainly depends on: (1) the regression coefficients; (2) the (conditional) variance-covariance structure of the regressor variables; (3) the sample size; and (4) the noise in the response variable. Simulation results are reported that show patterns of instability for several different settings of regression models. Three figures and six tables illustrate the analysis. (Contains 10 references.)
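The notion of a selection rate at a single node can be illustrated with a small simulation. This is my own sketch with an invented two-regressor setup, not the paper's experimental design: the response depends only on x1, and we measure how often x1 actually wins the root-node split over an irrelevant x2 as the noise level changes.

```python
import random

random.seed(1)

def selection_rate(n_obs, noise_sd, n_sims=500):
    """Fraction of simulated datasets in which the truly relevant
    regressor x1 wins the node split over an irrelevant x2, scored by
    the total squared error remaining after each variable's best split."""
    def best_sse(xs, ys):
        # Smallest total SSE achievable by any single split on xs
        # (or by not splitting at all).
        def sse(vals):
            if not vals:
                return 0.0
            m = sum(vals) / len(vals)
            return sum((v - m) ** 2 for v in vals)
        pairs = sorted(zip(xs, ys))
        best = sse(ys)
        for i in range(1, len(pairs)):
            left = [y for _, y in pairs[:i]]
            right = [y for _, y in pairs[i:]]
            best = min(best, sse(left) + sse(right))
        return best

    wins = 0
    for _ in range(n_sims):
        x1 = [random.gauss(0, 1) for _ in range(n_obs)]
        x2 = [random.gauss(0, 1) for _ in range(n_obs)]
        ys = [a + random.gauss(0, noise_sd) for a in x1]  # y depends on x1 only
        if best_sse(x1, ys) < best_sse(x2, ys):
            wins += 1
    return wins / n_sims

# More noise in the response -> the relevant variable is selected
# less reliably, i.e. the node becomes less stable:
print(selection_rate(30, 0.5))
print(selection_rate(30, 5.0))
```

This reproduces factor (4) from the list above in miniature; varying `n_obs` would illustrate factor (3) in the same way.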