Induction of Decision Trees
- Machine Learning, 1986
Cited by 2888 (3 self)
The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
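At each node, ID3 picks the attribute whose split yields the highest information gain, meaning the largest expected reduction in class entropy. The following is a minimal Python sketch of that selection step. It assumes each example is a dictionary of nominal attribute values. The function names and the toy data are hypothetical, and the noise handling and other refinements discussed in the paper are omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Expected reduction in entropy of `target` after splitting `rows` on `attribute`."""
    labels = [r[target] for r in rows]
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, attributes, target):
    """ID3-style choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data: "outlook" determines the class, "windy" is irrelevant.
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rain", "windy": False, "play": "yes"},
    {"outlook": "rain", "windy": True, "play": "yes"},
]
```

On this data, `best_attribute(rows, ["outlook", "windy"], "play")` selects `"outlook"`, which removes all 1.0 bit of class entropy, while `"windy"` removes none. A full ID3 would repeat this choice recursively on each subset.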
Learning logical definitions from relations
- Machine Learning, 1990
Cited by 784 (9 self)
This paper describes FOIL, a system that learns Horn clauses from data expressed as relations. FOIL is based on ideas that have proved effective in attribute-value learning systems, but extends them to a first-order formalism. This new system has been applied successfully to several tasks taken from the machine learning literature.
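The abstract does not spell out FOIL's search heuristic. As FOIL is usually described, it grows a clause one literal at a time and scores each candidate literal with an information-based gain, carrying the attribute-value idea over to relations. The following is a hedged Python sketch of that commonly cited gain formula. The argument names are illustrative, and it assumes at least one positive binding before the literal is added.

```python
from math import log2


def foil_gain(p0, n0, p1, n1, t):
    """FOIL-style gain for adding one literal to a clause.

    p0, n0: positive/negative bindings covered before adding the literal.
    p1, n1: positive/negative bindings covered after adding it.
    t: positive bindings from before that still have an extension afterwards.
    """
    if p1 == 0:
        return 0.0  # the literal excludes every positive: treat as useless (assumption)
    info_before = -log2(p0 / (p0 + n0))  # bits needed to signal "positive" before
    info_after = -log2(p1 / (p1 + n1))   # bits needed after adding the literal
    return t * (info_before - info_after)
```

For example, a literal that narrows 10 positive and 10 negative bindings to 8 positive and 2 negative, keeping all 8 of those positives, scores `8 * log2(1.6)`, roughly 5.4 bits. A literal that leaves the positive fraction unchanged scores 0.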
Multiple Comparisons in Induction Algorithms
- Machine Learning, 1998
Cited by 67 (9 self)
David Jensen and Paul R. Cohen, Experimental Knowledge Systems Laboratory, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610
A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
Keywords: inductive learning, overfitting, oversearching, attribute selection, hypothesis testing, parameter estimation
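The multiple-comparison effect can be shown with a simple scenario. Suppose each of k candidate items (say, attributes) has a score that is pure noise. The best of the k scores then clears a per-test significance threshold far more often than the nominal level. A Bonferroni adjustment compensates by dividing that level by k. The following is a hedged Python sketch that assumes independent standard-normal scores as a stand-in for real evaluation-function scores. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def family_wise_error(alpha, k):
    """P(at least one of k independent null scores passes a per-test level alpha)."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-comparison level that keeps the family-wise error at or below alpha."""
    return alpha / k


def simulate_max_selection(k, alpha, trials=20000, seed=0):
    """Fraction of trials in which the best of k pure-noise scores looks 'significant'.

    Each score is a standard normal draw, standing in for the evaluation-function
    score of an item that is actually unrelated to the class (an assumption).
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    hits = 0
    for _ in range(trials):
        best = max(rng.gauss(0.0, 1.0) for _ in range(k))  # the MCP picks the maximum
        if best > critical:
            hits += 1
    return hits / trials
```

With k = 10 and alpha = 0.05, `family_wise_error` gives about 0.40, and `simulate_max_selection(10, 0.05)` should land close to that. At the Bonferroni level `bonferroni_alpha(0.05, 10)`, the family-wise error drops to about 0.049.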
General and Efficient Multisplitting of Numerical Attributes
- 1999
Cited by 31 (7 self)
Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multi-partition of an arbitrary numerical domain to be defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined in multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary point combinations that is required to produce partitions which are optimal wit...
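To see why boundary points reduce the search, sort the examples by value and group equal values into bins. A cut between two adjacent bins cannot be optimal for a well-behaved function if both bins contain only examples of the same single class. The remaining cuts are the boundary points. The following is a hedged Python sketch for the binary (two-interval) case with Information Gain. It assumes numeric values and nominal labels, and the helper names are hypothetical. It checks that the best cut among boundary points scores as well as the best cut among all candidates.

```python
from collections import Counter
from itertools import groupby
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def value_bins(values, labels):
    """Sort examples by value and group equal values: [(value, Counter of classes), ...]."""
    pairs = sorted(zip(values, labels))
    return [(v, Counter(lab for _, lab in grp)) for v, grp in groupby(pairs, key=lambda p: p[0])]


def all_cut_points(values, labels):
    bins = value_bins(values, labels)
    return [(a[0] + b[0]) / 2 for a, b in zip(bins, bins[1:])]


def boundary_points(values, labels):
    """Cuts between adjacent bins, except where both bins are pure in the same class."""
    bins = value_bins(values, labels)
    cuts = []
    for (v1, c1), (v2, c2) in zip(bins, bins[1:]):
        same_pure_class = len(c1) == 1 and len(c2) == 1 and c1.keys() == c2.keys()
        if not same_pure_class:
            cuts.append((v1 + v2) / 2)
    return cuts


def split_gain(values, labels, cut):
    """Information Gain of the binary partition (value <= cut, value > cut)."""
    left = [lab for v, lab in zip(values, labels) if v <= cut]
    right = [lab for v, lab in zip(values, labels) if v > cut]
    n = len(labels)
    return entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def best_cut(values, labels, candidates):
    return max(candidates, key=lambda c: split_gain(values, labels, c))


# Hypothetical example data.
values = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
labels = ["a", "a", "b", "a", "b", "b", "b", "a", "a", "b"]
```

Here `boundary_points(values, labels)` keeps 5 of the 7 candidate cuts, dropping 4.5 (between two pure "b" bins) and 6.5 (between two pure "a" bins). The best cut is 3.5 in both searches.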
Feature Selection from Huge Feature Sets
- 2001
Cited by 20 (1 self)
The number of features that can be computed over an image is, for practical purposes, limitless. Unfortunately, the number of features that most computer vision systems can compute and exploit is considerably smaller. As a result, it is important to develop techniques for selecting features from very large data sets that include many irrelevant or redundant features. This work addresses the feature selection problem by proposing a three-step algorithm. The first step uses a variation of the well-known Relief algorithm [11] to remove irrelevant features; the second step clusters features using K-means to remove redundancy; and the third step is a standard combinatorial feature selection algorithm. This three-step combination is shown to be more effective than standard feature selection algorithms for large data sets with many irrelevant and redundant features. It is also shown to be no worse than standard techniques for data sets that do not have these properties. Finally, in a third experiment, a data set with 4096 features is reduced to 5% of its original size with very little information loss.
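The paper's first step uses a variation of Relief [11], and that variation is not described here. The following is therefore a hedged Python sketch of a simplified two-class Relief weighting, not the authors' method. It assumes numeric features, range-normalized per-feature differences, and Manhattan distance for finding the nearest hit (same class) and nearest miss (other class). Each class needs at least two instances. Features whose weights stay near or below zero are candidates for removal as irrelevant.

```python
import random


def relief(X, y, m, seed=0):
    """Simplified two-class Relief: returns one relevance weight per feature.

    X: list of numeric feature vectors; y: class labels (two classes, each with
    at least two instances); m: number of randomly sampled instances.
    """
    n_features = len(X[0])
    lows = [min(row[f] for row in X) for f in range(n_features)]
    highs = [max(row[f] for row in X) for f in range(n_features)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]  # avoid division by zero

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]  # normalized to [0, 1]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(n_features))

    rng = random.Random(seed)
    weights = [0.0] * n_features
    for _ in range(m):
        i = rng.randrange(len(X))
        r = X[i]
        hits = [X[j] for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(len(X)) if y[j] != y[i]]
        near_hit = min(hits, key=lambda row: distance(r, row))
        near_miss = min(misses, key=lambda row: distance(r, row))
        for f in range(n_features):
            # Reward features that separate r from its nearest miss,
            # penalize features that separate it from its nearest hit.
            weights[f] += (diff(f, r, near_miss) - diff(f, r, near_hit)) / m
    return weights


# Hypothetical data: feature 0 determines the class, feature 1 is noise.
X = [[0, 0.3], [0, 0.9], [0, 0.1], [1, 0.2], [1, 0.8], [1, 0.5]]
y = [0, 0, 0, 1, 1, 1]
w = relief(X, y, m=30)
```

In this example the relevant feature gets weight 1.0 (every nearest miss differs on it and no nearest hit does), and the noise feature scores lower.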
Constructive Induction by Incremental Concept Formation
- 1990
Cited by 7 (4 self)
This paper describes a framework that generates constructive induction schemes for the concept formation system COBWEB. The basis of this framework, context-dependent bias of multi-valued properties, provides a way to allow COBWEB to deal with continuous and hierarchical property types as a special case of constructive induction. The constructive induction scheme does not introduce learning bias and does not require major modification to the original concept formation mechanisms. Bridger, a system that partially implements the constructive induction scheme as well as other extensions, is one of the first incremental concept formation programs with a general constructive induction ability.
1. INTRODUCTION TO CONCEPT FORMATION AND CONSTRUCTIVE INDUCTION
Concept formation is a fundamental activity. It structures observations (also called examples) into a concise form of knowledge that can be efficiently used in the future. Two such uses include the classification of unseen observations ...
Constructing New Attributes for Decision Tree Learning
- 1996
Cited by 7 (3 self)
A well-known fundamental limitation of selective induction algorithms is that when task-supplied attributes are not adequate for, or directly relevant to, describing hypotheses, their performance in terms of prediction accuracy and/or theory complexity is poor. One solution to this problem is constructive induction, which uses the task-supplied attributes to construct new attributes that are expected to be more appropriate for describing the target concepts. This thesis focuses on constructive induction with decision trees as the theory description language. It explores: (1) novel approaches to constructing new binary attributes using existing constructive operators, and (2) novel methods of constructing new nominal and new continuous-valued attributes based on a newly proposed constructive operator. The thesis investigates a fixed rule-based approach to constructing new binary attributes for decision tree learning. It generates conjunctions from producti...
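Conjunction is one of the constructive operators mentioned in this abstract. The following is a hedged Python sketch of why a constructed conjunction can help a selective learner. It assumes a toy target concept "a AND b", which neither task-supplied attribute describes on its own, and it uses information gain as the (assumed) attribute evaluation function. The helper names and data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(feature, labels):
    """Information gain of splitting `labels` on the values of one attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def conjunction(*features):
    """Construct a new binary attribute that is true where all inputs are true."""
    return [all(vals) for vals in zip(*features)]


# Hypothetical data: the target concept is a AND b.
a = [0, 0, 1, 1] * 5
b = [0, 1, 0, 1] * 5
target = [int(x and y) for x, y in zip(a, b)]
new_attr = conjunction(a, b)
```

Here `info_gain(a, target)` is only about 0.31 bits, while `info_gain(new_attr, target)` equals the full class entropy of about 0.81 bits. A single test on the constructed attribute therefore classifies every example.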
Top-Down Induction of Decision Trees Classifiers -- A Survey
- 2002
Cited by 7 (2 self)
Decision trees are considered one of the most popular approaches for representing classifiers. Researchers from disciplines such as statistics, machine learning, pattern recognition, and data mining have studied the problem of growing a decision tree from available data. This paper presents an updated survey of current methods for constructing decision tree classifiers in a top-down manner. It proposes a unified algorithmic framework for presenting these algorithms and provides detailed descriptions of the various splitting criteria and pruning methodologies.
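A unified top-down framework can be read as one recursive procedure, parameterized by a splitting criterion and a stopping rule. The following is a minimal Python sketch of that idea, not the survey's own pseudocode. It assumes nominal attributes, uses the Gini index as a swappable splitting criterion, and stops on pure nodes, exhausted attributes, or a minimum node size. Pruning is omitted, and all names are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(rows, attr, target, impurity=gini):
    """Splitting criterion: parent impurity minus weighted child impurity."""
    labels = [r[target] for r in rows]
    total = impurity(labels)
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        total -= len(subset) / len(rows) * impurity(subset)
    return total


def grow(rows, attrs, target, impurity=gini, min_size=2):
    """Generic top-down induction: return a leaf label or (attr, {value: subtree})."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs or len(rows) < min_size:
        return majority  # stopping rule: make a leaf
    best = max(attrs, key=lambda a: impurity_decrease(rows, a, target, impurity))
    rest = [a for a in attrs if a != best]
    children = {
        v: grow([r for r in rows if r[best] == v], rest, target, impurity, min_size)
        for v in {r[best] for r in rows}
    }
    return (best, children)


def classify(tree, row):
    """Follow tests down to a leaf; raises KeyError on a value unseen in training."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree


# Hypothetical XOR-like data.
rows = [
    {"x": "p", "y": "u", "c": "yes"},
    {"x": "p", "y": "v", "c": "no"},
    {"x": "q", "y": "u", "c": "no"},
    {"x": "q", "y": "v", "c": "yes"},
]
tree = grow(rows, ["x", "y"], "c")
```

Swapping `impurity=gini` for an entropy function turns the same skeleton into an ID3-like grower. In the survey's terms, that swap changes only the splitting criterion.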
Treatment Learning: Implementation and Application
- 2003
Cited by 6 (1 self)
Data mining and machine learning focus on inducing previously unknown, potentially useful, and ultimately understandable information from data. In this master's thesis, we propose a new learning approach called treatment learning. Treatment learning aims to find a small number of control variables in a large option space that can lead to better system behavior. It addresses two central issues in data mining: (1) the understandability of learnt theories; and (2) how the learnt theories can benefit decision making. We design and implement a novel mining algorithm and deliver two treatment learners that are freely downloadable from an online distribution. We describe the implementation details of both learners and compare them through algorithmic performance analysis. We conduct extensive experiments and case studies to demonstrate the effectiveness of using a treatment learner to find a small number of control variables that constrain the option space to a tight, near-optimal convergence. We compare treatment learning with other learning schemes in the framework of feature subset selection for supervised classification. Our treatment learner selects smaller feature subsets than most other methods with minimal or no loss in classification accuracy. The treatment learner has been successfully applied to various research domains through collaborations with other researchers. Through four examples, we show the general paradigms for using it in decision making.
On the Well-Behavedness of Important Attribute Evaluation Functions
- In G. Grahne (Ed.), Proceedings of the Sixth Scandinavian Conference on Artificial Intelligence (pp. 95–106). Frontiers in Artificial Intelligence and Applications (Vol...), 1997
Cited by 4 (2 self)
The class of well-behaved evaluation functions simplifies and makes efficient the handling of numerical attributes: for these functions it suffices to consider only boundary points when searching for the optimal partition. This always holds for binary partitions, and it also holds for multisplits provided the function is cumulative in addition to being well-behaved. The class of well-behaved evaluation functions is a proper superclass of the convex evaluation functions. Thus, a large proportion of the most important attribute evaluation functions are well-behaved. This paper explores the extent and boundaries of well-behaved functions. In particular, we examine gain ratio, C4.5's default attribute evaluation function, which has been known to have problems with numerical attributes. We show that gain ratio is not convex but is still well-behaved with respect to binary partitioning. However, it cannot handle higher-arity partitioning well. Our empirical experiments show that a very simple cumulative rectifi...
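Gain ratio divides the information gain of a partition by its split information, which is the entropy of the partition's interval sizes. Because of this ratio form, it is not a simple cumulative sum over intervals. The following is a hedged Python sketch that scores binary cuts of a numeric attribute by gain ratio. It assumes numeric values and nominal labels, and the function names and data are hypothetical. For simplicity it scans every midpoint. Per the abstract, for binary partitions, scanning only the boundary points would find an equally good cut.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, cut):
    """Information gain of a binary cut divided by the split information of that cut."""
    left = [lab for v, lab in zip(values, labels) if v <= cut]
    right = [lab for v, lab in zip(values, labels) if v > cut]
    n = len(labels)
    gain = entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))  # entropy of interval sizes
    return gain / split_info


def best_binary_cut(values, labels):
    """Try every midpoint between adjacent distinct values; return (cut, gain ratio)."""
    distinct = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = max(cuts, key=lambda c: gain_ratio(values, labels, c))
    return best, gain_ratio(values, labels, best)
```

For example, with values 1 to 6 labelled `a, a, b, b, b, b`, the best cut is 2.5 with a gain ratio of 1.0. There the gain equals the split information because the cut separates the classes exactly. That cut is also the only boundary point.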

