From data mining to knowledge discovery in databases
- AI Magazine
, 1996
Cited by 215 (0 self)
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field. Across a wide variety of fields, data are...
Knowledge Discovery and Data Mining: Towards a Unifying Framework
, 1996
Cited by 108 (0 self)
This paper presents a first step towards a unifying framework for Knowledge Discovery in Databases. We describe links between data mining, knowledge discovery, and other related fields. We then define the KDD process and basic data mining algorithms, discuss application issues, and conclude with an analysis of challenges facing practitioners in the field.
Keywords: Knowledge Discovery in Databases (KDD), data mining, overview article, large databases, automated analysis, issues and challenges in data mining.
To appear: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, August 2-4, 1996, AAAI Press. http://wwwaig.jpl.nasa.gov/kdd96
Authors: Usama Fayyad, Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA, fayyad@microsoft.com; Gregory Piatetsky-Shapiro, GTE Laboratories, MS 44, Waltham, MA 02154, USA, gps@gte.com; Padhraic Smyth, Information and Computer S...
The role of Occam’s Razor in knowledge discovery
- Data Mining and Knowledge Discovery
, 1999
Cited by 70 (1 self)
Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.
Bayesian Classification Theory
, 1991
Cited by 41 (1 self)
The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework, and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit, or share, model parameters through a class hierarchy. In this paper we summarize the mathematical foundations of AutoClass.
1 Introduction
The task of supervised classification - i.e., learning to predict class memberships of test cases given labeled training cases - is a familia...
Compression-Based Discretization of Continuous Attributes
- Proceedings of the 12th International Conference on Machine Learning
, 1995
Cited by 38 (0 self)
Discretization of continuous attributes into ordered discrete attributes can be beneficial even for propositional induction algorithms that are capable of handling continuous attributes directly. Benefits include possibly large improvements in induction time, smaller sizes of induced trees or rule sets, and even improved predictive accuracy. We define a global evaluation measure for discretizations based on the so-called Minimum Description Length (MDL) principle from information theory. Furthermore, we describe the efficient algorithmic usage of this measure in the MDL-Disc algorithm. The new method solves some problems of alternative local measures used for discretization. Empirical results in a few natural domains and extensive experiments in an artificial domain show that MDL-Disc scales up well to large learning problems involving noise.
Footnote 1: Financial support for the Austrian Research Institute for Artificial Intelligence is provided by the Austrian Federal Ministry of Science and...
Why Does Bagging Work? A Bayesian Account and its Implications
- In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining
, 1997
Cited by 27 (7 self)
The error rate of decision-tree and other classification learners can often be much reduced by bagging: learning multiple models from bootstrap samples of the database, and combining them by uniform voting. In this paper we empirically test two alternative explanations for this, both based on Bayesian learning theory: (1) bagging works because it is an approximation to the optimal procedure of Bayesian model averaging, with an appropriate implicit prior; (2) bagging works because it effectively shifts the prior to a more appropriate region of model space. All the experimental evidence contradicts the first hypothesis, and confirms the second.
Bagging
Bagging (Breiman 1996a) is a simple and effective way to reduce the error rate of many classification learning algorithms. For example, in the empirical study described below, it reduces the error of a decision-tree learner in 19 of 26 databases, by 4% on average. In the bagging procedure, given a training set of size s, a "bootstrap" re...
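The bagging procedure summarized in this abstract (bootstrap samples of the training set, one model per sample, uniform voting) can be sketched roughly as follows. This is a minimal illustration and not the paper's experimental setup: the base learner is a one-feature decision stump chosen here for brevity, and the names `train_stump`, `bagging`, `n_models`, and the toy data set are all assumptions.

```python
import random
from collections import Counter


def train_stump(data):
    """Fit a one-feature threshold classifier to (x, label) pairs.

    Hypothetical base learner used only for illustration; labels are 0 or 1.
    """
    best = None
    for threshold in sorted({x for x, _ in data}):
        # Try both orientations: (label at or below threshold, label above it).
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right


def bagging(data, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples and combine them by uniform voting."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the training set, drawn with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))

    def predict(x):
        # Uniform voting: every model gets one vote, majority wins.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (assumed): label is 1 when x >= 0.5.
data = [(x / 10, int(x >= 5)) for x in range(10)]
predict = bagging(data)
print(predict(0.0), predict(0.9))
```

An odd `n_models` avoids tied votes in the two-class case; a real study would substitute a full decision-tree learner for the stump.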
Occam's Two Razors: The Sharp and the Blunt
- In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining
, 1998
Cited by 23 (3 self)
Occam's razor has been the subject of much controversy. This paper argues that this is partly because it has been interpreted in two quite different ways, the first of which (simplicity is a goal in itself) is essentially correct, while the second (simplicity leads to greater accuracy) is not. The paper reviews the large variety of theoretical arguments and empirical evidence for and against the "second razor," and concludes that the balance is strongly against it. In particular, it builds on the case of (Schaffer, 1993) and (Webb, 1996) by considering additional theoretical arguments and recent empirical evidence that the second razor fails in most domains. A version of the first razor more appropriate to KDD is proposed, and we argue that continuing to apply the second razor risks causing significant opportunities to be missed.
1 Occam's Two Razors
William of Occam's famous razor states that "Nunquam ponenda est pluralitas sine necessitate," which, approximately translated, means "En...
Doppelgänger Goes To School: Machine Learning for User Modeling
, 1993
Cited by 19 (0 self)
One characteristic of intelligence is adaptation. Computers should adapt to who is using them, how, why, when and where. The computer's representation of the user is called a user model; user modeling is concerned with developing techniques for representing the user and acting upon this information. The Doppelgänger system consists of a set of techniques for gathering, maintaining, and acting upon information about individuals, and illustrates my approach to user modeling. Work on Doppelgänger has been heavily influenced by the field of machine learning. This thesis has a twofold purpose: first, to set forth guidelines for the integration of machine learning techniques into user modeling, and second, to identify particular user modeling tasks for which machine learning is useful.
A Process-Oriented Heuristic for Model Selection
, 1998
Cited by 15 (5 self)
Current methods to avoid overfitting are either data-oriented (using separate data for validation) or representation-oriented (penalizing complexity in the model). This paper proposes process-oriented evaluation, where a model's expected generalization error is computed as a function of the search process that led to it. The paper develops the necessary theoretical framework, and applies it to one type of learning: rule induction. A process-oriented version of the CN2 rule learner is empirically compared with the default CN2. The process-oriented version is more accurate in a large majority of the datasets, with high significance, and also produces simpler models. Experiments in artificial domains suggest that process-oriented evaluation is particularly useful in high-dimensional domains.
1 Introduction
Overfitting avoidance is often considered the central problem of machine learning (e.g., (Cheeseman & Oldford, 1994)). If a learner is sufficiently powerful, it must guard against selec...
Computing, Cognition and Information Compression
- AI Communications
, 1993
Cited by 15 (12 self)
This article develops the idea that the storage and processing of information in computers and in brains may often be understood as information compression. The article first reviews what is meant by information and, in particular, what is meant by redundancy, a concept which is fundamental in all methods for information compression. Principles of information compression are described. The major part of the article describes how these principles may be seen in a range of observations and ideas in computing and cognition: the phenomena of adaptation and inhibition in nervous systems; 'neural' computing; the creation and recognition of 'objects' and 'classes' in perception and cognition; stereoscopic vision and random-dot stereograms; the organisation of natural languages; the organisation of grammars; the organisation of functional, structured, logic and object-oriented computer programs; the application and de-referencing of identifiers in computing; retrieval of information from dat...
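The abstract above does not give the article's own definition of redundancy. As a rough illustration only, the sketch below uses the standard Shannon formulation, an assumption on our part: redundancy is one minus the ratio of a message's per-symbol entropy to the maximum entropy possible for its alphabet. The function names `entropy_bits` and `redundancy` are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(text):
    """Per-symbol Shannon entropy of text, in bits, from observed symbol frequencies."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def redundancy(text, alphabet_size):
    """Shannon redundancy: 1 - H / H_max, where H_max = log2(alphabet_size).

    alphabet_size must be at least 2 so that H_max is non-zero.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    return 1.0 - entropy_bits(text) / math.log2(alphabet_size)


# A message that uses both symbols equally often carries no redundancy;
# a message that repeats one symbol is fully redundant.
print(redundancy("abab", 2))
print(redundancy("aaaa", 2))
print(redundancy("aaab", 2))
```

This frequency-based measure ignores ordering, so it treats "abab" and "aabb" alike; a compressor that exploits repeated patterns would capture redundancy that this simple measure misses.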

