## A System for Induction of Oblique Decision Trees (1994)

Venue: Journal of Artificial Intelligence Research

Citations: 264 (13 self)

### BibTeX

```bibtex
@article{Murthy94asystem,
  author  = {Sreerama K. Murthy and Simon Kasif and Steven Salzberg},
  title   = {A System for Induction of Oblique Decision Trees},
  journal = {Journal of Artificial Intelligence Research},
  year    = {1994},
  volume  = {2},
  pages   = {1--32}
}
```

### Abstract

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.

**1. Introduction** Current data collection technology provides a unique challenge and opportunity for automated machine learning techniques. The advent of major scientific projects such as the Human Genome Project, the Hubble Space Telescope, and the human brain mappi...
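
The abstract's key object, an oblique split, is a linear test over several attributes at once; an axis-parallel split is the special case with a single non-zero coefficient. A minimal sketch of how such a split can be scored with Gini impurity (an illustrative reconstruction, not OC1's actual code; the data set and hyperplanes below are invented for the example):

```python
def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(X, y, w, b):
    """Weighted Gini impurity of the split induced by the hyperplane w.x + b = 0.

    An axis-parallel split is the special case where w has a single
    non-zero coefficient.
    """
    left, right = [], []
    for xi, yi in zip(X, y):
        side = sum(wj * xj for wj, xj in zip(w, xi)) + b
        (left if side <= 0 else right).append(yi)
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Six points straddling the diagonal x2 = x1: class 'a' above it, 'b' below.
X = [(0, 1), (1, 2), (2, 3), (1, 0), (2, 1), (3, 2)]
y = ['a', 'a', 'a', 'b', 'b', 'b']

# The oblique hyperplane x1 - x2 = 0 separates the classes perfectly,
# while any single-attribute (axis-parallel) test cannot.
oblique = split_impurity(X, y, (1, -1), 0)   # 0.0
axis = split_impurity(X, y, (1, 0), -1.5)    # > 0
```

This is the sense in which oblique trees can be smaller: one hyperplane node here does the work that would otherwise require several axis-parallel nodes approximating the diagonal with a staircase.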

### Citations

5326 | C4.5: programs for machine learning - Quinlan - 1993

Citation Context: ...ble ways. Figure 3 illustrates these upper limits for two points in two dimensions. For axis-parallel splits, there are only n · d distinct possibilities, and axis-parallel methods such as C4.5 (Quinlan, 1993a) and CART (Breiman et al., 1984) can exhaustively search for the best split at each node. The problem of searching for the best oblique split is therefore much more difficult than that of searching ...
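
The context above notes that with n points and d attributes there are at most n · d distinct axis-parallel splits, so methods like C4.5 and CART can search them exhaustively. A small illustrative sketch of that brute-force search (not the C4.5/CART implementation; function names and data are invented):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_axis_parallel_split(X, y):
    """Exhaustively search every (attribute, threshold) pair.

    With n points and d attributes there are at most n * d distinct
    axis-parallel splits; candidate thresholds are midpoints between
    consecutive distinct values. Returns (impurity, attribute, threshold).
    """
    n, d = len(X), len(X[0])
    best = (float('inf'), None, None)
    for a in range(d):
        values = sorted({x[a] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for x, yi in zip(X, y) if x[a] <= t]
            right = [yi for x, yi in zip(X, y) if x[a] > t]
            cost = len(left) / n * gini(left) + len(right) / n * gini(right)
            if cost < best[0]:
                best = (cost, a, t)
    return best

# Attribute 1 separates the classes perfectly; attribute 0 does not.
X = [(5, 0), (3, 1), (4, 0), (6, 1)]
y = ['a', 'b', 'a', 'b']
```

For oblique splits no such small candidate set exists, which is why the paper turns to hill-climbing and randomization instead of enumeration.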

4311 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984

Citation Context: ...ssarily restrict oblique decision trees to numeric domains. Several researchers have studied the problem of converting symbolic (unordered) domains to numeric (ordered) domains and vice versa; e.g., (Breiman et al., 1984; Hampson & Volper, 1986; Utgoff & Brodley, 1990; Van de Merckt, 1992, 1993). To keep the discussion simple, however, we will assume that all attributes have numeric values. Induction of Oblique Decis...

3854 | Optimization by simulated annealing - Kirkpatrick, Gelatt, et al. - 1983

3586 | Induction of decision trees - Quinlan - 1986

Citation Context: ...thms have been introduced in the last decade. Much of this work has concentrated on decision trees in which each node checks the value of a single attribute (Breiman, Friedman, Olshen, & Stone, 1984; Quinlan, 1986, 1993a). Quinlan initially proposed decision trees for classification in domains with symbolic-valued attributes (1986), and later extended them to numeric domains (1987). When the attributes are num...

770 | UCI Repository of Machine Learning Databases. Available at: http://www.ics.uci.edu/~mlearn/MLRepository.html - Murphy, Aha - 1992

468 | Very simple classification rules perform well on most commonly used datasets - Holte - 1993

Citation Context: ..., and therefore the advantages of randomization may not be detectable. It is known that many of the commonly-used data sets from the UCI repository are easy to learn with very simple representations (Holte, 1993); therefore those data sets may not be ideal for our purposes. Thus we created a number of artificial data sets that present different problems for learning, and for which we know the "correct" conce...

386 | A practical approach to feature selection - Kira, Rendell - 1992

327 | Learning efficient classification procedures and their application to chess end games - Quinlan - 1983

Citation Context: ...classification and other tasks since the 1960s (Moret, 1982; Safavian & Landgrebe, 1991). In the 1980s, Breiman et al.'s book on classification and regression trees (CART) and Quinlan's work on ID3 (Quinlan, 1983, 1986) provided the foundations for what has become a large body of research on one of the central techniques of experimental machine learning. Many variants of decision tree (DT) algorithms have bee...

314 | Regression diagnostics: Identifying influential data and sources of collinearity - Belsley, Kuh, et al. - 1980

Citation Context: .... The category variable (median value of owner-occupied homes) is actually continuous, but we discretized it so that category = 1 if value < $21000, and 2 otherwise. For other uses of this data, see (Belsley, 1980; Quinlan, 1993b). Diabetes diagnosis. This data catalogs the presence or absence of diabetes among Pima Indian females, 21 years or older, as a function of eight numeric-valued attributes. The origin...

223 | Robust linear programming discrimination of two linearly inseparable sets - Bennett, Mangasarian - 1992

207 | Training a 3-node neural network is NP-complete - Blum, Rivest - 1992

204 | Boolean feature discovery and empirical learning - Pagallo, Haussler - 1990

Citation Context: ... The right side shows the partitioning that this tree creates in the attribute space. Researchers have also studied decision trees in which the test at a node uses boolean combinations of attributes (Pagallo, 1990; Pagallo & Haussler, 1990; Sahami, 1993) and linear combinations of attributes (see Section 2). Different methods for measuring the goodness of decision tree nodes, as well as techniques for pruning ...

192 | Constructing optimal binary decision trees is NP-complete - Hyafil, Rivest - 1976

Citation Context: ...e optimal test (with respect to an impurity measure) for each node of a tree, the complete tree may not be optimal: as is well known, the problem of finding the smallest tree is NP-complete (Hyafil & Rivest, 1976). Thus even axis-parallel decision tree methods do not produce "ideal" decision trees. Quinlan has suggested that his windowing algorithm might be used as a way of introducing randomization into C4.5...

181 | Second order derivatives for network pruning: Optimal brain surgeon - Hassibi, Stork - 1993

179 | An empirical comparison of selection measures for decision-tree induction - Mingers - 1989

Citation Context: ...impurity, the difference being that goodness values should be maximized while impurity should be minimized. Many different measures of impurity have been studied (Breiman et al., 1984; Quinlan, 1986; Mingers, 1989b; Buntine & Niblett, 1992; Fayyad & Irani, 1992; Heath et al., 1993b). The OC1 system is designed to work with a large class of impurity measures. Stated simply, if the impurity measure uses only the...

177 | A nearest hyperrectangle learning method - Salzberg - 1991

172 | An empirical comparison of pruning methods for decision-tree induction - Mingers - 1989

Citation Context: ...impurity, the difference being that goodness values should be maximized while impurity should be minimized. Many different measures of impurity have been studied (Breiman et al., 1984; Quinlan, 1986; Mingers, 1989b; Buntine & Niblett, 1992; Fayyad & Irani, 1992; Heath et al., 1993b). The OC1 system is designed to work with a large class of impurity measures. Stated simply, if the impurity measure uses only the...

155 | Learning machines - Nilsson - 1965

Citation Context: ...em (Utgoff & Brodley, 1991; Brodley & Utgoff, 1992), which is a successor to the Perceptron Tree method (Utgoff, 1989; Utgoff & Brodley, 1990). Each internal node in an LMDT tree is a Linear Machine (Nilsson, 1990). The training algorithm presents examples repeatedly at each node until the linear machine converges. Because convergence cannot be guaranteed, LMDT uses heuristics to determine when the node has st...

129 | An empirical comparison of pattern recognition, neural nets, and machine learning classification methods - Weiss, Kapouleas

126 | Overfitting Avoidance as Bias - Schaffer - 1993

Citation Context: ...rees as well as other types of machine learning systems (Quinlan, 1987; Niblett, 1986; Cestnik, Kononenko, & Bratko, 1987; Kodratoff & Manago, 1987; Cohen, 1993; Hassibi & Stork, 1993; Wolpert, 1992; Schaffer, 1993). For the OC1 system we implemented an existing pruning method, but note that any tree pruning method will work fine within OC1. Based on the experimental evaluations of Mingers (1989a) and other wor...

122 | Multivariate Decision Trees - Brodley, Utgoff - 1995 |

120 | Combining instance-based and model-based learning - Quinlan - 1993

Citation Context: ...ble ways. Figure 3 illustrates these upper limits for two points in two dimensions. For axis-parallel splits, there are only n · d distinct possibilities, and axis-parallel methods such as C4.5 (Quinlan, 1993a) and CART (Breiman et al., 1984) can exhaustively search for the best split at each node. The problem of searching for the best oblique split is therefore much more difficult than that of searching ...

104 | A further comparison of splitting rules for decision-tree induction - Buntine, Niblett - 1992

Citation Context: ...urbations before forcing the perturbation algorithm to halt. We also included axis-parallel CART and C4.5 in our comparisons. We used the implementations of these algorithms from the IND 2.1 package (Buntine, 1992). The default cart0 and c4.5 "styles" defined in the package were used, without altering any parameter settings. The cart0 style uses the Twoing Rule and 0-SE cost complexity pruning with 10-fold cro...

98 | Using decision trees to improve case-based learning - Cardie - 1994

Citation Context: ...ant attributes. Irrelevant attributes pose a significant problem for most machine learning methods (Breiman et al., 1984; Aha, 1990; Almuallim & Dietterich, 1991; Kira & Rendell, 1992; Salzberg, 1992; Cardie, 1993; Schlimmer, 1993; Langley & Sage, 1993; Brodley & Utgoff, 1994). Decision tree algorithms, even axis-parallel ones, can be confused by too many irrelevant attributes. Because oblique decision trees l...

94 | Hedonic prices and the demand for clean air - Harrison, Rubinfeld - 1978

92 | Using the ADAP learning algorithm to forecast the onset of diabetes mellitus - Smith, Everhart, et al. - 1988

90 | A system for induction of oblique decision trees - Murthy, Kasif, et al. - 1994

76 | A Study of Instance-Based Algorithms for Supervised Learning Tasks: Mathematical, Empirical, and Psychological Observations - Aha - 1990

Citation Context: ...Complexity pruning, see Breiman et al. (1984) or Mingers (1989a). 3.3.3 Irrelevant attributes. Irrelevant attributes pose a significant problem for most machine learning methods (Breiman et al., 1984; Aha, 1990; Almuallim & Dietterich, 1991; Kira & Rendell, 1992; Salzberg, 1992; Cardie, 1993; Schlimmer, 1993; Langley & Sage, 1993; Brodley & Utgoff, 1994). Decision tree algorithms, even axis-parallel ones, c...

73 | Pattern recognition via linear programming: Theory and application to medical diagnosis - Mangasarian, Setiono, et al. - 1990

69 | Decision trees and diagrams - Moret - 1982

Citation Context: ...ssification by a sequence of simple, easy-to-understand tests whose semantics is intuitively clear to domain experts. Decision trees have been used for classification and other tasks since the 1960s (Moret, 1982; Safavian & Landgrebe, 1991). In the 1980s, Breiman et al.'s book on classification and regression trees (CART) and Quinlan's work on ID3 (Quinlan, 1983, 1986) provided the foundations for what has ...

68 | Efficiently Inducing Determinations: A Complete and Systematic Search Algorithm that Uses Optimal Pruning - Schlimmer - 1993

Citation Context: ... Irrelevant attributes pose a significant problem for most machine learning methods (Breiman et al., 1984; Aha, 1990; Almuallim & Dietterich, 1991; Kira & Rendell, 1992; Salzberg, 1992; Cardie, 1993; Schlimmer, 1993; Langley & Sage, 1993; Brodley & Utgoff, 1994). Decision tree algorithms, even axis-parallel ones, can be confused by too many irrelevant attributes. Because oblique decision trees learn the coeffici...

67 | Constructing Decision Trees in Noisy Domains - Niblett - 1987

Citation Context: ...g the data. Many studies have found that judicious pruning results in both smaller and more accurate classifiers, for decision trees as well as other types of machine learning systems (Quinlan, 1987; Niblett, 1986; Cestnik, Kononenko, & Bratko, 1987; Kodratoff & Manago, 1987; Cohen, 1993; Hassibi & Stork, 1993; Wolpert, 1992; Schaffer, 1993). For the OC1 system we implemented an existing pruning method, but no...

59 | Perceptron trees: A case study in hybrid concept representations - Utgoff - 1989

Citation Context: ...t uses a very different approach from CART-LC, is the Linear Machine Decision Trees (LMDT) system (Utgoff & Brodley, 1991; Brodley & Utgoff, 1992), which is a successor to the Perceptron Tree method (Utgoff, 1989; Utgoff & Brodley, 1990). Each internal node in an LMDT tree is a Linear Machine (Nilsson, 1990). The training algorithm presents examples repeatedly at each node until the linear machine converges. ...

57 | OC1: Randomized induction of oblique decision trees - Murthy, Kasif, et al. - 1993

Citation Context: ...perturbations will occur at each node. This constant can be set by the user. Pstag is reset to 1 every time the global impurity measure is improved. Our previous experiments (Murthy et al., 1993) indicated that the order of perturbation of the coefficients does not affect the classification accuracy as much as other parameters, especially the randomization parameters (see below). Since none ...
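
The context above refers to perturbing hyperplane coefficients. OC1 actually solves each single-coefficient subproblem deterministically and layers randomization on top; the sketch below replaces that with a simpler accept-if-not-worse random perturbation loop, so it is only a loose illustration of the idea (all names and the data set are invented for the example):

```python
import random

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def impurity(X, y, w):
    """Weighted Gini impurity of the hyperplane w[0]*x0 + ... + w[d-1]*x(d-1) + w[d] = 0."""
    left, right = [], []
    for xi, yi in zip(X, y):
        side = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
        (left if side <= 0 else right).append(yi)
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def perturb_search(X, y, iters=500, seed=1):
    """Hill-climb by perturbing one randomly chosen coefficient at a time,
    keeping a perturbation only if impurity does not get worse."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [rng.uniform(-1, 1) for _ in range(d + 1)]
    best = impurity(X, y, w)
    for _ in range(iters):
        i = rng.randrange(d + 1)
        old = w[i]
        w[i] += rng.gauss(0, 0.5)
        cand = impurity(X, y, w)
        if cand <= best:
            best = cand
        else:
            w[i] = old  # revert a perturbation that made the split worse
    return w, best

# Example: six points separable only by an oblique hyperplane.
X = [(0, 1), (1, 2), (2, 3), (1, 0), (2, 1), (3, 2)]
y = ['a', 'a', 'a', 'b', 'b', 'b']
w, best = perturb_search(X, y)
```

Because impurity is piecewise constant in the coefficients, such a climber stagnates easily; this is the motivation for the stagnation probability Pstag and the random restarts and random jumps the paper describes.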

38 | Linear machine decision trees - Utgoff, Brodley - 1991

35 | Linear function neurons: structure and training - Hampson, Volper - 1986

32 | Fast training algorithms for multilayer neural nets - Brent - 1991

31 | Multivariate versus univariate decision trees - Brodley, Utgoff - 1992

30 | An Incremental Method for Finding Multivariate Splits for Decision Trees - Utgoff, Brodley - 1990

24 | Multicategory discrimination via linear programming - Bennett, Mangasarian

24 | Adaptive Decision Tree Algorithms for Learning from Examples - Pagallo - 1990

Citation Context: ... The right side shows the partitioning that this tree creates in the attribute space. Researchers have also studied decision trees in which the test at a node uses boolean combinations of attributes (Pagallo, 1990; Pagallo & Haussler, 1990; Sahami, 1993) and linear combinations of attributes (see Section 2). Different methods for measuring the goodness of decision tree nodes, as well as techniques for pruning ...

23 | A machine learning method for generation of a neural network architecture: A continuous ID3 algorithm - Cios, Ning - 1992

21 | On Randomization in Sequential and Distributed Algorithms - Gupta, Smolka, et al. - 1994

20 | A geometric framework for machine learning - Heath - 1992

16 | Small nets and short paths: Optimising neural computation - Frean - 1990

Citation Context: ...guaranteed, LMDT uses heuristics to determine when the node has stabilized. To make the training stable even when the set of training instances is not linearly separable, a "thermal training" method (Frean, 1990) is used, similar to simulated annealing. A third system that creates oblique trees is Simulated Annealing of Decision Trees (SADT) (Heath et al., 1993b) which, like OC1, uses randomization. SADT use...

14 | Learning with Many Irrelevant Features - Almuallim, Dietterich - 1991

13 | A survey of decision tree classifier methodology - Safavian, Landgrebe - 1991