## Multivariate versus Univariate Decision Trees (1992)

Citations: 31 (3 self)

### BibTeX

@TECHREPORT{Brodley92multivariateversus,
  author      = {Carla E. Brodley and Paul E. Utgoff},
  title       = {Multivariate versus Univariate Decision Trees},
  institution = {},
  year        = {1992}
}

### Abstract

In this paper we present a new multivariate decision tree algorithm, LMDT, which combines linear machines with decision trees. LMDT constructs each test in a decision tree by training a linear machine and then eliminating irrelevant and noisy variables in a controlled manner. To examine LMDT's ability to find good generalizations we present results for a variety of domains. We compare LMDT empirically to a univariate decision tree algorithm and observe that when multivariate tests are the appropriate bias for a given data set, LMDT finds small, accurate trees.

**1 Introduction.** One commonly used approach for learning from examples is to induce a univariate decision tree (Hunt, Marin & Stone, 1966; Breiman, Friedman, Olshen & Stone, 1984; Quinlan, 1986). Each test in a univariate tree is based on one of the input variables and is therefore restricted to representing a split through the instance space that is orthogonal to that variable's axis. Such a bias may be inappropriate for problems...
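The contrast the abstract draws can be sketched in a few lines: a univariate test compares a single variable against a threshold (an axis-orthogonal split), while a multivariate test such as an LMDT node thresholds a linear combination of variables (an oblique split). The function names and the toy numbers below are illustrative, not taken from the paper.

```python
def univariate_test(x, feature_index, threshold):
    """Axis-orthogonal split: compares one input variable to a threshold."""
    return x[feature_index] <= threshold

def multivariate_test(x, weights, bias):
    """Oblique split: thresholds a linear combination of the variables."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias <= 0.0

# A point near the oblique boundary x0 + x1 = 1 shows how the two tests
# can disagree:
point = [0.3, 0.9]
print(univariate_test(point, 0, 0.5))              # True: 0.3 <= 0.5
print(multivariate_test(point, [1.0, 1.0], -1.0))  # False: 0.3 + 0.9 - 1.0 > 0
```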

### Citations

4315 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984
Citation context: ...me required to search for a set of features by a factor of n; instead of comparing n linear machines, where n is the number of features in the linear machine, LMDT compares two linear machines. CART (Breiman, et al. 1984) and PT2 (Utgoff & Brodley, 1990) both perform an SBS search for the best set of features to use as a test in the decision tree. CART searches at each node for the linear discriminant that maximizes t...
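The SBS search mentioned in this context can be sketched as a greedy loop, assuming SBS denotes sequential backward selection: repeatedly drop the feature whose removal hurts a quality criterion least, keeping the best subset seen. The `score` callable below stands in for training and evaluating a linear discriminant on a feature subset; all names are mine.

```python
def sequential_backward_selection(features, score, min_size=1):
    """Greedy backward selection: at each step, remove the feature whose
    absence damages the criterion least; remember the best subset seen."""
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > min_size:
        # Try removing each remaining feature; keep the least-damaging removal.
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=score)
        s = score(current)
        if s >= best_score:
            best_subset, best_score = list(current), s
    return best_subset

# Toy criterion: features 0 and 2 are useful, feature 1 is pure noise.
useful = {0: 2.0, 1: 0.0, 2: 1.5}
print(sequential_backward_selection([0, 1, 2],
                                    lambda s: sum(useful[f] for f in s)))  # [0, 2]
```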

4120 | Pattern Classification and Scene Analysis - Duda, Hart - 1973

3591 | Induction of Decision Trees - Quinlan - 1986
Citation context: ...small accurate trees. 1 Introduction One commonly used approach for learning from examples is to induce a univariate decision tree (Hunt, Marin & Stone, 1966; Breiman, Friedman, Olshen & Stone, 1984; Quinlan, 1986). Each test in a univariate tree is based on one of the input variables and therefore is restricted to representing a split through the instance space that is orthogonal to the variable's axis. Such...

352 | Universal Codeword Sets and Representations of the Integers - Elias - 1975

324 | Stochastic Complexity in Statistical Inquiry - Rissanen - 1989
Citation context: ...ber of nodes or the number of leaves; the size of an LMDT node can be of greater complexity than a C4.5 node. To compare the size of the trees, we use the Minimum Description Length Principle (MDLP) (Rissanen, 1989), which states that the best "hypothesis" to induce from a data set is the one that minimizes the length of the hypothesis plus the length of the data when coded using the hypothesis to predict the d...
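The two-part MDL comparison described in this context can be illustrated numerically. This is a simplified sketch of the principle, not the paper's actual coding scheme: the cost of a tree is the bits needed to encode the tree itself plus the bits needed to encode each training label given the probabilities the tree assigns.

```python
import math

def description_length(num_tree_bits, label_probs):
    """Two-part MDL code length in bits: hypothesis cost plus data cost,
    where each label costs -log2 of the probability the tree assigns to it.
    `num_tree_bits` is a stand-in for an actual tree-encoding scheme."""
    data_bits = sum(-math.log2(p) for p in label_probs)
    return num_tree_bits + data_bits

# A larger tree that fits the data much better can still win under MDL:
small_tree = description_length(10, [0.5] * 8)  # 10 + 8 bits = 18.0
big_tree   = description_length(16, [0.9] * 8)  # 16 + ~1.2 bits
print(small_tree, round(big_tree, 1))
```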

155 | Learning Machines - Nilsson - 1965
Citation context: ...approach. For each decision node in the tree, LMDT trains a linear machine, based on a subset of the input variables, which then serves as a multivariate test for the decision node. A linear machine (Nilsson, 1965; Duda & Hart, 1973) is a multiclass linear discriminant, which itself classifies an instance. The class name is the result of the linear machine test with one branch for each possible class at the no...
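A linear machine in the sense quoted above is straightforward to sketch: one weight vector per class over the bias-augmented instance, with the predicted class the one whose discriminant value is largest. The weights below are hand-picked for illustration only.

```python
def linear_machine_classify(x, weight_vectors):
    """Multiclass linear discriminant: the predicted class index is the
    argmax over g_i(x) = w_i . (x, 1), one weight vector w_i per class."""
    augmented = list(x) + [1.0]  # append a constant 1 for the bias weight
    scores = [sum(w * a for w, a in zip(wv, augmented)) for wv in weight_vectors]
    return max(range(len(scores)), key=scores.__getitem__)

# Three classes in two dimensions:
W = [[ 1.0,  0.0, 0.0],   # class 0: favours large x0
     [ 0.0,  1.0, 0.0],   # class 1: favours large x1
     [-1.0, -1.0, 1.0]]   # class 2: favours the region near the origin
print(linear_machine_classify([2.0, 0.5], W))  # 0
```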

127 | Experiments in Induction - Hunt, Marin, et al. - 1966

80 | Letter Recognition Using Holland-Style Adaptive Classifiers - Frey, Slate - 1991

75 | Feature Selection and Extraction - Kittler - 1985
Citation context: ...een the weights of each pair of classes and then eliminates the variable that has the smallest dispersion. This measure is analogous to the Euclidean interclass distance measure for estimating error (Kittler, 1986). A thermal linear machine has converged when the magnitude of each correction to the linear machine is larger than the amount permitted by the thermal training rule for each instance in the training...
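The pairwise weight-dispersion measure described in this context might be sketched as follows. Summing squared weight differences over all class pairs is my reading of the measure, not necessarily LMDT's exact formula.

```python
from itertools import combinations

def least_dispersed_variable(weight_vectors):
    """For each variable, sum the squared weight differences over all pairs
    of classes; the variable with the smallest total contributes least to
    separating any pair of classes, making it the elimination candidate."""
    n_vars = len(weight_vectors[0])
    dispersion = [
        sum((wi[v] - wj[v]) ** 2 for wi, wj in combinations(weight_vectors, 2))
        for v in range(n_vars)
    ]
    return min(range(n_vars), key=dispersion.__getitem__)

# Variable 1 has nearly identical weights across classes -> eliminated first.
W = [[ 2.0, 0.10, -1.0],
     [-1.0, 0.11,  2.0],
     [ 0.5, 0.09,  0.5]]
print(least_dispersed_variable(W))  # 1
```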

59 | Perceptron Trees: A Case Study in Hybrid Concept Representations - Utgoff - 1989
Citation context: ...a test from an inappropriate part of the hypothesis space. A solution to this problem would be to determine the appropriate bias dynamically for each test in the tree. The perceptron tree algorithm (Utgoff, 1989) is one example of a system that tries to determine the appropriate representational bias for the instances automatically. Specifically, the algorithm first tries to fit a linear threshold unit (LTU)...
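The dynamic-bias idea attributed to perceptron trees in this context can be caricatured as a two-step fallback: try the richer test first, and retreat to the simpler one if it does not fit. The callables and the accuracy threshold below are stand-ins, not Utgoff's exact procedure.

```python
def choose_test(train_ltu, univariate_split, examples, accuracy_threshold=0.95):
    """Pick a representational bias per node: first try a linear threshold
    unit (LTU); if it cannot fit the node's examples well enough, fall back
    to a univariate test."""
    ltu, accuracy = train_ltu(examples)
    if accuracy >= accuracy_threshold:
        return ("ltu", ltu)
    return ("univariate", univariate_split(examples))

# Fake trainers for illustration:
good_fit = lambda ex: ("weights", 0.98)
poor_fit = lambda ex: ("weights", 0.60)
split_on_first = lambda ex: "x0 <= 0.5"
print(choose_test(good_fit, split_on_first, [])[0])  # ltu
print(choose_test(poor_fit, split_on_first, [])[0])  # univariate
```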

53 | International Application of a New Probability Algorithm for the Diagnosis of Coronary Artery Disease - Detrano, Janosi, et al. - 1989
Citation context: ...be found in Table 2. The Cleveland data set consists of 303 patient diagnoses (presence or absence of heart disease) described by 13 attributes (Detrano, et al. 1989). The Glass domain involves identifying glass samples taken from the scene of an accident as one of six classes. The Iris data set, Fisher's classic data set, contains 50 examples of three different...

47 | Enumerative Source Coding - Cover - 1973

37 | Optimal Linear Discriminants - Gallant - 1986
Citation context: ...raining procedure for finding weights can be computationally prohibitive because if one is using the absolute error correction rule without thermal training, in conjunction with the Pocket Algorithm (Gallant, 1986), it is uncertain how long it will take to find the optimal weight vector or even a good weight vector. 3 An Empirical Comparison of LMDT to C4.5 To examine LMDT's ability to find a good generalizati...
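The Pocket Algorithm referenced in this context is easy to sketch: run ordinary perceptron error-correction updates, but keep ("pocket") the best weight vector seen so far, so a usable discriminant is returned even when the data are not linearly separable and the updates never settle. This is a generic rendition, not Gallant's exact formulation.

```python
import random

def pocket_perceptron(data, n_features, epochs=50, seed=0):
    """Perceptron updates with a 'pocket': after each correction, keep the
    weight vector that classifies the most training examples correctly."""
    rng = random.Random(seed)
    w = [0.0] * (n_features + 1)  # last weight is the bias
    predict = lambda wv, x: 1 if sum(a * b for a, b in zip(wv, x + [1.0])) > 0 else -1
    accuracy = lambda wv: sum(predict(wv, x) == y for x, y in data)
    pocket, pocket_acc = list(w), accuracy(w)
    for _ in range(epochs):
        x, y = rng.choice(data)
        if predict(w, x) != y:              # perceptron error-correction step
            w = [wi + y * xi for wi, xi in zip(w, x + [1.0])]
            acc = accuracy(w)
            if acc > pocket_acc:            # keep the best weights seen so far
                pocket, pocket_acc = list(w), acc
    return pocket, pocket_acc

# AND-like data, labels in {-1, +1}:
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, acc = pocket_perceptron(data, 2)
print(acc)  # number of correctly classified examples (out of 4)
```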

36 | Decision Trees as Probabilistic Classifiers - Quinlan - 1987
Citation context: ...of a multivariate tree (and LMDT's search bias for finding such a tree) is more appropriate than the bias of a univariate decision tree, we compare LMDT to a univariate decision tree algorithm, C4.5 (Quinlan, 1987), across these tasks. The results of this comparison show that each approach has a selective superiority; for some of the tasks LMDT finds significantly more accurate trees than C4.5 and for others t...

30 | An Incremental Method for Finding Multivariate Splits for Decision Trees - Utgoff, Brodley - 1990

16 | Small Nets and Short Paths: Optimising Neural Computation - Frean - 1990

11 | Pattern Classification by Iteratively Determined Linear and Piecewise Linear Discriminant Functions - Duda, Fossum - 1966