Results 1  10
of
24
A System for Induction of Oblique Decision Trees
 Journal of Artificial Intelligence Research
, 1994
"... This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hillclimbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned espe ..."
Abstract

Cited by 251 (13 self)
 Add to MetaCart
This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hillclimbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axisparallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. 1. Introduction Current data collection technology provides a unique challenge and opportunity for automated machine learning techniques. The advent of major scientific projects such as the Human Genome Project, the Hubble Space Telescope, and the human brain mappi...
Induction of Selective Bayesian Classifiers
 CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 1994
"... In this paper, we examine previous work on the naive Bayesian classifier and review its limitations, which include a sensitivity to correlated features. We respond to this problem by embedding the naive Bayesian induction scheme within an algorithm that carries out a greedy search through the space ..."
Abstract

Cited by 208 (7 self)
 Add to MetaCart
In this paper, we examine previous work on the naive Bayesian classifier and review its limitations, which include a sensitivity to correlated features. We respond to this problem by embedding the naive Bayesian induction scheme within an algorithm that carries out a greedy search through the space of features. We hypothesize that this approach will improve asymptotic accuracy in domains that involve correlated features without reducing the rate of learning in ones that do not. We report experimental results on six natural domains, including comparisons with decisiontree induction, that support these hypotheses. In closing, we discuss other approaches to extending naive Bayesian classifiers and outline some directions for future research.
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirtythree Old and New Classification Algorithms
, 2000
"... . Twentytwo decision tree, nine statistical, and two neural network algorithms are compared on thirtytwo datasets in terms of classication accuracy, training time, and (in the case of trees) number of leaves. Classication accuracy is measured by mean error rate and mean rank of error rate. Both cr ..."
Abstract

Cited by 167 (7 self)
 Add to MetaCart
. Twentytwo decision tree, nine statistical, and two neural network algorithms are compared on thirtytwo datasets in terms of classication accuracy, training time, and (in the case of trees) number of leaves. Classication accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, splinebased, algorithm called Polyclass at the top, although it is not statistically signicantly dierent from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fth, respectively. Although splinebased statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
Multivariate Decision Trees
, 1992
"... Multivariate decision trees overcome a representational limitation of univariate decision trees: univariate decision trees are restricted to splits of the instance space that are orthogonal to the feature's axis. This paper discusses the following issues for constructing multivariate decision trees: ..."
Abstract

Cited by 119 (6 self)
 Add to MetaCart
Multivariate decision trees overcome a representational limitation of univariate decision trees: univariate decision trees are restricted to splits of the instance space that are orthogonal to the feature's axis. This paper discusses the following issues for constructing multivariate decision trees: representing a multivariate test, including symbolic and numeric features, learning the coefficients of a multivariate test, selecting the features to include in a test, and pruning of multivariate decision trees. We present some new and review some wellknown methods for forming multivariate decision trees. The methods are compared across a variety of learning tasks to assess each method's ability to find concise, accurate decision trees. The results demonstrate that some multivariate methods are more effective than others. In addition, the experiments confirm that allowing multivariate tests improves the accuracy of the resulting decision tree over univariate trees. Contents 1 Introduc...
RainForest  a Framework for Fast Decision Tree Construction of Large Datasets
 In VLDB
, 1998
"... Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework fo ..."
Abstract

Cited by 95 (9 self)
 Add to MetaCart
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework for decision tree classifiers that separates the scalability aspects of algorithms for constructing a decision tree from the central features that determine the quality of the tree. This generic algorithm is easy to instantiate with specific algorithms from the literature (including C4.5, CART,
Addressing the Selective Superiority Problem: Automatic Algorithm/Model Class Selection
, 1993
"... The results of empirical comparisons of existing learning algorithms illustrate that each algorithm has a selective superiority; it is best for some but not all tasks. Given a data set, it is often not clear beforehand which algorithm will yield the best performance. In such cases one must search th ..."
Abstract

Cited by 63 (2 self)
 Add to MetaCart
The results of empirical comparisons of existing learning algorithms illustrate that each algorithm has a selective superiority; it is best for some but not all tasks. Given a data set, it is often not clear beforehand which algorithm will yield the best performance. In such cases one must search the space of available algorithms to find the one that produces the best classifier. In this paper we present an approach that applies knowledge about the representational biases of a set of learning algorithms to conduct this search automatically. In addition, the approach permits the available algorithms' model classes to be mixed in a recursive treestructured hybrid. We describe an implementation of the approach, MCS, that performs a heuristic bestfirst search for the best hybrid classifier for a set of data. An empirical comparison of MCS to each of its primitive learning algorithms, and to the computationally intensive method of crossvalidation, illustrates that automatic selection of l...
Constructing XofN Attributes for Decision Tree Learning
 Machine Learning
, 1998
"... . While many constructive induction algorithms focus on generating new binary attributes, this paper explores novel methods of constructing nominal and numeric attributes. We propose a new constructive operator, XofN. An XofN representation is a set containing one or more attributevalue pairs. ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
. While many constructive induction algorithms focus on generating new binary attributes, this paper explores novel methods of constructing nominal and numeric attributes. We propose a new constructive operator, XofN. An XofN representation is a set containing one or more attributevalue pairs. For a given instance, the value of an XofN representation corresponds to the number of its attributevalue pairs that are true of the instance. A single XofN representation can directly and simply represent any concept that can be represented by a single conjunctive, a single disjunctive, or a single MofN representation commonly used for constructive induction, and the reverse is not true. In this paper, we describe a constructive decision tree learning algorithm, called XofN. When building decision trees, this algorithm creates one XofN representation, either as a nominal attribute or as a numeric attribute, at each decision node. The construction of XofN representations is carrie...
Constructing Nominal XofN Attributes
 Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence
, 1995
"... Most constructive induction researchers focus only on new boolean attributes. This paper reports a new constructive induction algorithm, called XofN, that constructs new nominal attributes in the form of XofN representations. An XofN is a set containing one or more attributevalue pairs. For a g ..."
Abstract

Cited by 14 (6 self)
 Add to MetaCart
Most constructive induction researchers focus only on new boolean attributes. This paper reports a new constructive induction algorithm, called XofN, that constructs new nominal attributes in the form of XofN representations. An XofN is a set containing one or more attributevalue pairs. For a given instance, its value corresponds to the number of its attributevalue pairs that are true. The promising preliminary experimental results, on both artificial and realworld domains, show that constructing new nominal attributes in the form of XofN representations can significantly improve the performance of selective induction in terms of both higher prediction accuracy and lower theory complexity. 1 Introduction A wellknown elementary limitation of selective induction algorithms is that when tasksupplied attributes are not adequate for describing hypotheses, their performance in terms of prediction accuracy and/or theory complexity is poor. To overcome this limitation, constructiv...
An Empirical Comparison of Decision Trees and Other Classification Methods
, 1998
"... Twenty two decision tree, nine statistical, and two neural network classifiers are compared on thirtytwo datasets in terms of classification error rate, computational time, and (in the case of trees) number of terminal nodes. It is found that the average error rates for a majority of the classifiers ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
Twenty two decision tree, nine statistical, and two neural network classifiers are compared on thirtytwo datasets in terms of classification error rate, computational time, and (in the case of trees) number of terminal nodes. It is found that the average error rates for a majority of the classifiers are not statistically significant but the computational times of the classifiers differ over a wide range. The statistical POLYCLASS classifier based on a logistic regression spline algorithm has the lowest average error rate. However, it is also one of the most computationally intensive. The classifier based on standard polytomous logistic regression and a decision tree classifier using the QUEST algorithm with linear splits have the second lowest average error rates and are about 50 times faster than POLYCLASS. Among decision tree classifiers with univariate splits, the classifiers based on the C4.5, INDCART, and QUEST algorithms have the best combination of error rate and speed, althoug...
Learning NonLinearly Separable Boolean Functions With Linear Threshold Unit Trees and MadalineStyle Networks
 In AAAI93 [2
"... This paper investigates an algorithm for the construction of decisions trees comprised of linear threshold units and also presents a novel algorithm for the learning of nonlinearly separable boolean functions using Madalinestyle networks which are isomorphic to decision trees. The construction of su ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
This paper investigates an algorithm for the construction of decisions trees comprised of linear threshold units and also presents a novel algorithm for the learning of nonlinearly separable boolean functions using Madalinestyle networks which are isomorphic to decision trees. The construction of such networks is discussed, and their performance in learning is compared with standard BackPropagation on a sample problem in which many irrelevant attributes are introduced. Littlestone's Winnow algorithm is also explored within this architecture as a means of learning in the presence of many irrelevant attributes. The learning ability of this Madalinestyle architecture on nonoptimal (larger than necessary) networks is also explored. Introduction We initially examine a nonincremental algorithm that learns binary classification tasks by producing decision trees of linear threshold units (LTU trees). This decision tree bears some similarity to the decision trees produced by ID3 (Quinlan 19...