Results 1–10 of 12
On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach
Data Mining and Knowledge Discovery, 1997
Cited by 155 (0 self)
Abstract. An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in statistically invalid conclusions. This is especially true when one is using data mining techniques to analyze very large databases, which inevitably contain some statistically unlikely data. This paper describes several phenomena that can, if ignored, invalidate an experimental comparison. These phenomena and the conclusions that follow apply not only to classification, but to computational experiments in almost any aspect of data mining. The paper also discusses why comparative analysis is more important in evaluating some types of algorithms than for others, and provides some suggestions about how to avoid the pitfalls suffered by many experimental studies.
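As a concrete illustration of the statistical care this abstract calls for, the sketch below runs a paired t-test on per-fold accuracies from a shared cross-validation split. It is not from the paper; the fold accuracies and the two classifiers are hypothetical, and the critical value 2.262 is the standard two-tailed 5% threshold for 9 degrees of freedom.

```python
# Paired t-test on per-fold accuracy differences (all numbers hypothetical).
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-fold differences between two classifiers
    evaluated on the same cross-validation folds."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(len(diffs)))

# Hypothetical 10-fold accuracies for two classifiers on identical folds.
acc_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.80, 0.85, 0.81]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.76, 0.79, 0.78, 0.81, 0.79]

t = paired_t_statistic(acc_a, acc_b)
# Two-tailed critical value for df = 9 at alpha = 0.05 is about 2.262.
print(f"t = {t:.2f}, significant at 5%: {abs(t) > 2.262}")
```

Pairing by fold removes the fold-to-fold variance that would otherwise swamp a small but consistent difference between the two classifiers.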
Statistical Evaluation of Neural Network Experiments: Minimum Requirements and Current Practice
The Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010, 1994
Cited by 35 (0 self)
This work concerns the necessity of statistical evaluation of neural network experiments. This necessity is motivated by applying fundamental notions of statistical hypothesis testing to neural network research. Minimum requirements concerning statistical evaluation are developed and the appropriate statistical techniques are introduced. Articles from two leading neural network journals are examined and criticized for their lack of statistical evaluation.

1 Introduction

There are only a few papers that discuss the foundations of the role of experimentation in neural network research, although for the general field of artificial intelligence a whole textbook has recently been devoted to this problem [Cohen 95]. However, it has already been recognized that the quality of neural network research practice definitely needs improvement. [Flexer 95] emphasizes the fact that statistical evaluation is necessary for neural network experiments as for any other empirical scienc...
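One minimum requirement in this spirit is to report a mean and an uncertainty estimate over repeated runs rather than a single best run. The sketch below does that for a set of hypothetical test errors from eight random initializations; the 1.96 multiplier is the usual normal-approximation 95% factor, which is only a rough choice for so few runs.

```python
# Summarize repeated neural-network runs instead of reporting one best score.
import math
import statistics

def summarize_runs(errors):
    """Mean test error and a rough 95% interval (normal approximation)
    over independent runs."""
    mean = statistics.mean(errors)
    sem = statistics.stdev(errors) / math.sqrt(len(errors))
    return mean, mean - 1.96 * sem, mean + 1.96 * sem

# Hypothetical test errors from 8 runs with different random initializations.
errors = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.13]
mean, lo, hi = summarize_runs(errors)
print(f"mean error {mean:.3f}, ~95% CI [{lo:.3f}, {hi:.3f}]")
```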
Evaluating Machine Learning Models for Engineering Problems
Artificial Intelligence in Engineering, 1999
Cited by 15 (6 self)
The use of machine learning (ML), and in particular artificial neural networks (ANN), in engineering applications has increased dramatically over the last few years. However, by and large, the development of such applications, or the reporting of them, lacks proper evaluation. Deficient evaluation practice was observed in the general neural networks community, and again in engineering applications, through a survey we conducted of articles published in AI in Engineering and elsewhere. This deficient status hinders understanding and prevents progress. This paper's goal is to remedy this situation. First, several evaluation methods are discussed along with their relative qualities. Second, these qualities are illustrated by using the methods to evaluate ANN performance in two engineering problems. Third, a systematic evaluation procedure for ML is discussed. This procedure will lead to better evaluation of studies, and consequently to improved research and practice in the area of ML in engineering applications...
Fast Subsampling Performance Estimates for Classification Algorithm Selection
Proceedings of the ECML-00 Workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination, 2000
Cited by 12 (3 self)
The typical data mining process is characterized by the prospective and iterative application of a variety of different data mining algorithms from an algorithm toolbox. While it would be desirable to check many different algorithms and algorithm combinations for their performance on a database, this is often not feasible because of time and other resource constraints. This paper investigates the effectiveness of simple and fast subsampling strategies for algorithm selection. We show that even such simple strategies perform quite well in many cases and propose to use them as a baseline for comparison with meta-learning and other advanced algorithm selection strategies.

1 Introduction

With the availability of a wide range of different classification algorithms, strategies for selecting the most adequate one in a particular data mining situation become more crucial. Many characteristics of both the learning algorithm and the kind of model generated by the algorithm potentially i...
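The baseline idea the abstract describes can be sketched as follows: score every candidate algorithm on one small random subsample and rank them by that cheap estimate. This is a minimal illustration, not the paper's procedure; the two toy "candidates" are stand-ins for full learners, and all names and data are hypothetical.

```python
# Rank candidate algorithms by their score on one small random subsample.
import random

def subsample_indices(n_total, frac, rng):
    """Pick a random fraction of example indices without replacement."""
    n = max(1, int(n_total * frac))
    return rng.sample(range(n_total), n)

def rank_by_subsample(candidates, X, y, frac=0.2, seed=0):
    """Evaluate each candidate scorer on one subsample; return names best-first."""
    rng = random.Random(seed)
    idx = subsample_indices(len(X), frac, rng)
    Xs = [X[i] for i in idx]
    ys = [y[i] for i in idx]
    scores = {name: score(Xs, ys) for name, score in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: the label is 1 exactly when the single feature is non-negative.
X = [[v] for v in range(-50, 50)]
y = [1 if v >= 0 else 0 for v in range(-50, 50)]

candidates = {
    # Hypothetical stand-ins for real learners: each maps a subsample
    # to an accuracy estimate.
    "sign_rule": lambda Xs, ys: sum((x[0] >= 0) == (t == 1)
                                    for x, t in zip(Xs, ys)) / len(ys),
    "always_one": lambda Xs, ys: sum(t == 1 for t in ys) / len(ys),
}
ranking = rank_by_subsample(candidates, X, y)
print(ranking)
```

The subsample estimate is noisy, which is exactly why the paper treats it as a baseline rather than a final selection criterion.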
Machine Learning Techniques for Civil Engineering Problems
1997
Cited by 6 (3 self)
The growing volume of information databases presents opportunities for advanced data analysis techniques from machine learning (ML) research. Practical applications of ML are very different from theoretical or empirical studies, involving organizational and human aspects and various other constraints. Despite the importance of applied ML, little has been discussed in the general ML literature on this topic. In order to remedy this situation, we studied practical applications of ML and developed a proposal for a seven-step process that can guide practical applications of ML in engineering. The process is illustrated by relevant applications of ML in civil engineering. This illustration shows that the potential of ML has only begun to be explored, but also cautions that, in order to be successful, the application process must carefully address the issues related to the seven-step process.

1 Introduction

Over the last several decades we have witnessed an explosion in information generat...
Learning Bayesian Networks for Solving Real-World Problems
1998
Cited by 3 (0 self)
Bayesian networks, which provide a compact graphical way to express complex probabilistic relationships among several random variables, are rapidly becoming the tool of choice for dealing with uncertainty in knowledge-based systems. However, approaches based on Bayesian networks have often been dismissed as unfit for many real-world applications, since probabilistic inference is intractable for most problems of realistic size, and algorithms for learning Bayesian networks impose the unrealistic requirement that datasets be complete. In this thesis, I present practical solutions to these two problems and demonstrate their effectiveness on several real-world problems. The solution proposed to the first problem is to learn selective Bayesian networks, i.e., ones that use only a subset of the given attributes to model a domain. The aim is to learn networks that are smaller, and henc...
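The "subset of the given attributes" idea is commonly implemented with a greedy wrapper that grows the attribute set while a model-quality score improves. The sketch below shows that wrapper shape only; it is not the thesis's algorithm, and the toy scorer (where attributes "a" and "b" help and "c" is noise) is entirely hypothetical.

```python
# Greedy forward attribute selection: grow the subset while the score improves.
def forward_select(attrs, score):
    """score(subset) -> quality estimate; returns the selected attribute subset."""
    selected = []
    best = score(selected)
    improved = True
    while improved:
        improved = False
        for a in attrs:
            if a in selected:
                continue
            s = score(selected + [a])
            if s > best:  # keep the attribute only if it strictly helps
                best, selected = s, selected + [a]
                improved = True
    return selected

# Hypothetical scorer: "a" and "b" add accuracy, "c" adds nothing.
useful = {"a": 0.2, "b": 0.15, "c": 0.0}
score = lambda subset: 0.5 + sum(useful[a] for a in subset)

selected = forward_select(["a", "b", "c"], score)
print(selected)
```

In a real selective network, `score` would be an estimate such as cross-validated accuracy of the network restricted to those attributes.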
Ensemble modeling or selecting the best model: Many could be better than one
1999
Cited by 3 (1 self)
In the course of data modeling, many models may be created. Much work has been done on formulating guidelines for model selection. However, by and large, these guidelines are either conservative or too specific. Instead of using general guidelines, models could be selected for a particular task based on statistical tests. When one model is selected, the others are discarded; instead of losing these potential sources of information, models could be combined to yield better performance. We review the basics of model selection and combination and discuss their differences. Two examples of opportunistic and principled combinations are presented. The first demonstrates that mediocre-quality models can be combined to yield significantly better performance. The second is the main contribution of the paper; it describes and illustrates a novel heuristic approach called the SG(kNN) ensemble for the generation of good-quality and diverse models that can improve even excellent-quality models. Key...
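The "many could be better than one" effect is easy to see when models' errors partly cancel. The sketch below averages three hypothetical regressors' predictions; it illustrates the general principle of simple averaging, not the paper's SG(kNN) method, and all numbers are invented.

```python
# Averaging several models' predictions vs. keeping only the single best model.
import statistics

def ensemble_mean(predictions):
    """Average the models' predictions pointwise."""
    return [statistics.mean(p) for p in zip(*predictions)]

def mse(pred, truth):
    """Mean squared error of one prediction vector."""
    return statistics.mean((p - t) ** 2 for p, t in zip(pred, truth))

truth = [1.0, 2.0, 3.0, 4.0]
# Three hypothetical models whose individual errors partly cancel.
m1 = [1.3, 1.7, 3.3, 3.7]
m2 = [0.7, 2.3, 2.7, 4.3]
m3 = [1.1, 2.1, 2.9, 4.1]

best_single = min(mse(m, truth) for m in (m1, m2, m3))
combined = mse(ensemble_mean([m1, m2, m3]), truth)
print(f"best single MSE {best_single:.4f}, ensemble MSE {combined:.4f}")
```

The averaged predictor beats even the best individual model here because the three error patterns are negatively correlated; diversity among the combined models is what makes this work.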
Machine Learning as Massive Search
1997
Cited by 2 (0 self)
Machine Learning as Massive Search
by Richard B. Segal
Chairperson of Supervisory Committee: Associate Professor Oren Etzioni, Department of Computer Science and Engineering

Machine learning is the inference of general patterns from data. Machine-learning algorithms search large spaces of potential hypotheses for the hypothesis that best fits the data. Since the search space for most induction problems grows exponentially in the number of features used to describe the data, most induction algorithms use greedy search to minimize search cost. Greedy search is a polynomial-time algorithm that achieves its efficiency by exploring only a tiny fraction of all hypotheses. While greedy search has good performance, it often misses the best hypotheses. This thesis proposes massive search as an alternative to greedy search. Massive search aggressively searches as many hypotheses as possible in the time available. Since massive search explores a larger portion of the hypothesis space, it is less ...
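The way greedy search "misses the best hypotheses" can be shown with a score that contains an interaction: two attributes that help only together. The sketch below contrasts greedy forward search with exhaustive subset search on such a toy score; it is a generic illustration, not the thesis's massive-search algorithm, and the score function is hypothetical.

```python
# Greedy vs. exhaustive subset search on a score with an interaction effect.
from itertools import combinations

def greedy(attrs, score):
    """Add one attribute at a time while the score strictly improves."""
    selected, best = [], score(())
    improved = True
    while improved:
        improved = False
        for a in attrs:
            if a not in selected and score(tuple(selected + [a])) > best:
                best = score(tuple(selected + [a]))
                selected.append(a)
                improved = True
    return tuple(selected), best

def exhaustive(attrs, score):
    """Score every subset and return the best one (exponential cost)."""
    subsets = (tuple(c) for k in range(len(attrs) + 1)
               for c in combinations(attrs, k))
    best = max(subsets, key=score)
    return best, score(best)

# Hypothetical score: "a" helps alone; "x" and "y" help only together.
def score(subset):
    s = set(subset)
    val = 0.5
    if "a" in s:
        val += 0.1
    if {"x", "y"} <= s:
        val += 0.3
    return val

g = greedy(["a", "x", "y"], score)
e = exhaustive(["a", "x", "y"], score)
print(g)  # greedy stops after picking "a": neither "x" nor "y" helps alone
print(e)  # exhaustive search finds the {"x", "y"} interaction
```

Searching more of the space, as massive search does, trades computation for a better chance of finding hypotheses that greedy search cannot reach one step at a time.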
A Stratified Methodology for Classifier and Recognizer Evaluation
Cited by 1 (0 self)
In this companion paper, we formally introduce STRAT, a stratification-centric methodology for the empirical evaluation of classification systems. The motivating criteria for STRAT's development are discussed, as well as the potential consequences of departing from some common statistical assumptions made when applying more traditional methods. STRAT uses an established replication-based statistical technique called balanced repeated replication, or BRR, which does not require the i.i.d. assumption needed for bootstrapping, jackknifing, or binomial techniques.
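The core mechanics of balanced repeated replication can be sketched for the simplest case of two units per stratum: each replicate keeps one unit from every stratum, the pattern of picks is balanced across replicates (a Hadamard-style sign matrix), and variance is estimated from the spread of the replicate estimates around the full-sample estimate. This is a textbook-style illustration of BRR, not STRAT itself, and all scores are hypothetical.

```python
# Half-sample replication variance estimate (BRR-style sketch).
import statistics

def brr_variance(strata, replicates):
    """strata: list of (unit_a, unit_b) pairs, one pair per stratum.
    replicates: list of +1/-1 pick patterns, one pick per stratum.
    Returns the mean squared deviation of replicate means from the
    full-sample mean."""
    full = statistics.mean(v for pair in strata for v in pair)
    reps = []
    for picks in replicates:
        half = [pair[0] if p == +1 else pair[1]
                for pair, p in zip(strata, picks)]
        reps.append(statistics.mean(half))
    return statistics.mean((r - full) ** 2 for r in reps)

# Four strata of paired scores (hypothetical) and four balanced replicates
# taken from a 4x4 Hadamard sign pattern.
strata = [(0.80, 0.84), (0.78, 0.82), (0.86, 0.90), (0.75, 0.79)]
signs = [
    (+1, +1, +1, +1),
    (+1, -1, +1, -1),
    (+1, +1, -1, -1),
    (+1, -1, -1, +1),
]
v = brr_variance(strata, signs)
print(f"BRR variance estimate: {v:.6f}")
```

Because each replicate is itself a legitimate stratified sample, no independence assumption across the pooled observations is needed, which is the property the abstract contrasts with bootstrapping and jackknifing.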
Methodological Note On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach
1996
Cited by 1 (0 self)
Abstract. An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in statistically invalid conclusions. This is especially true when one is using data mining techniques to analyze very large databases, which inevitably contain some statistically unlikely data. This paper describes several phenomena that can, if ignored, invalidate an experimental comparison. These phenomena and the conclusions that follow apply not only to classification, but to computational experiments in almost any aspect of data mining. The paper also discusses why comparative analysis is more important in evaluating some types of algorithms than for others, and provides some suggestions about how to avoid the pitfalls suffered by many experimental studies.