## Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules (1996)

Venue: Proceedings of the 22nd VLDB Conference

Citations: 27 (6 self)

### BibTeX

@INPROCEEDINGS{Fukuda96constructingefficient,

author = {Takeshi Fukuda and Yasuhiko Morimoto and Shinichi Morishita and Takeshi Tokuyama},

title = {Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules},

booktitle = {Proceedings of the 22nd VLDB Conference},

year = {1996},

pages = {146--155}

}

### Abstract

We propose an extension of an entropy-based heuristic of Quinlan [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family $\mathcal{R}$ of grid-regions in the plane associated with the pair of attributes. For $R \in \mathcal{R}$, the data can be split into two classes: data inside $R$ and data outside $R$. We compute the region $R_{\mathrm{opt}} \in \mathcal{R}$ that minimizes the entropy of the splitting, and add the splitting associated with $R_{\mathrm{opt}}$ (for each pair of strongly correlated attributes) to the set of candidate tests in Quinlan's entropy-based heuristic. We give efficient alg...
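The region split described in the abstract can be illustrated with a small sketch. It assumes a two-class objective attribute, per-cell tuple counts on a bucketed grid, and that "entropy of the splitting" means the size-weighted average of the entropies of the inside and outside parts, as in Quinlan's heuristic. The search below is a brute force over axis-aligned rectangles only; it is not the paper's algorithm, and the names `entropy`, `split_entropy` and `best_rectangle` are hypothetical.

```python
import math


def entropy(pos, neg):
    """Entropy in bits of a set holding `pos` positive and `neg` negative tuples."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def split_entropy(inside, outside):
    """Size-weighted average entropy of a two-way split (assumed definition).

    `inside` and `outside` are (pos, neg) count pairs.
    """
    n_in, n_out = sum(inside), sum(outside)
    n = n_in + n_out
    if n == 0:
        return 0.0
    return (n_in / n) * entropy(*inside) + (n_out / n) * entropy(*outside)


def best_rectangle(pos_grid, neg_grid):
    """Brute-force search for the rectangle R minimizing the split entropy.

    Returns (entropy, (row_lo, row_hi, col_lo, col_hi)) with inclusive bounds.
    This enumerates every rectangle and is only meant for tiny grids.
    """
    rows, cols = len(pos_grid), len(pos_grid[0])
    total_pos = sum(map(sum, pos_grid))
    total_neg = sum(map(sum, neg_grid))
    best = (float("inf"), None)
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    cells = [(r, c)
                             for r in range(r1, r2 + 1)
                             for c in range(c1, c2 + 1)]
                    p = sum(pos_grid[r][c] for r, c in cells)
                    q = sum(neg_grid[r][c] for r, c in cells)
                    e = split_entropy((p, q), (total_pos - p, total_neg - q))
                    if e < best[0]:
                        best = (e, (r1, r2, c1, c2))
    return best


# Toy data: positives sit in the centre cell, negatives everywhere else.
pos_grid = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
neg_grid = [[2, 2, 2], [2, 0, 2], [2, 2, 2]]
result = best_rectangle(pos_grid, neg_grid)
```

On this toy grid the centre cell separates the two classes perfectly, so the search selects it. No single-attribute threshold on the rows or the columns alone can isolate that cell, which is the kind of case where a region test adds a useful candidate.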

### Citations

5368 | C4.5: Programs for Machine Learning - Quinlan - 1992 |

Citation context: ...shi Tokuyama ttoku@trl.ibm.co.jp IBM Tokyo Research Laboratory 1623-14, Shimo-tsuruma, Yamato City, Kanagawa Pref, 242, JAPAN Abstract We propose an extension of an entropy-based heuristic of Quinlan [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric...

4360 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984 |

3610 | Induction of Decision Trees - Quinlan - 1986 |

305 | Inferring Decision Trees Using the Minimum Description Length Principle - Quinlan, Rivest - 1989 |

290 | Database mining: A performance perspective - Agrawal, Imielinski, et al. - 1993 |

205 | SLIQ: A fast scalable classifier for data mining - Mehta, Agrawal, et al. - 1996 |

Citation context: ... one for which the associated splitting of the set of tuples attains the minimum entropy value. If each test attribute is Boolean or categorical, Quinlan's method works well, and SLIQ of Mehta et al. [MAR96] gives an efficient scalable implementation, which can handle a database with 10 million tuples and 400 attributes. SLIQ uses the GINI function instead of entropy. Handling Numeric Attributes To handl...

192 | Constructing optimal binary decision trees is NP-complete - Hyafil, Rivest - 1976 |

155 | Fractional cascading: I. A data structure technique - Chazelle, Guibas - 1986 |

Citation context: ...N) if preprocessing takes $O(N^2)$ time. We can reduce the $O(N \log N)$ computing time to $O(N)$ by applying the fractional cascading data structure [CG86] (omitted in this version of the paper). [Figure 4: Hand Probe] We have the following similar results for the family of rectangles and the family of rectilinear convex regions, although the time complexity is increased (we ...

139 | Computers and Intractability: A Guide to NP-Completeness - Garey, Johnson - 1979 |

121 | Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization - Fukuda, Morimoto, et al. - 1996 |

Citation context: ... . However, it does not necessarily give a better system for users, since a region rule is more complicated than a one-dimensional rule. Indeed, some technique (for example, a visualization technique [FMMT96b]) is necessary to explain a region rule. Hence, it is desirable that a region rule should only be considered for a pair of strongly correlated conditional attributes. We use the entropy value again to...

114 | An interval classifier for database mining applications. VLDB - Agrawal, Ghosh, et al. - 1992 |

89 | Mining optimized association rules for numeric attributes - Fukuda, Morimoto, et al. - 1996 |

Citation context: ...for a real number $Z$. Compute the value $Z_{\mathrm{opt}}$ of $Z$ that minimizes $\mathrm{Ent}(S(A > Z); S(A \le Z))$, and consider the splitting of $S$ into $S(A > Z_{\mathrm{opt}})$ and $S(A \le Z_{\mathrm{opt}})$. By applying the algorithms of Fukuda et al. [FMMT96a], we can extend the above splitting (1) to the following, which is also considered in our decision tree subsystem of SONAR: 2. For an interval $I$, let $S(A \in I) = \{t \in S : t[A] \in I\}$ and $S(A \in \bar{I}) = \{t \in$...

26 | Probing convex polytopes - Dobkin, Edelsbrunner, et al. - 1986 |

Citation context: ...vertex of conv($P$) belongs to the upper (resp. lower) chain. Our algorithm is based on the use of what is known in computational geometry as "hand probing" to compute the vertices of a convex polygon [DEY86]. Hand probing is based on the touching oracle: "Given a slope $\ell$, compute the tangent line with slope $\ell$ to the upper (resp. lower) chain of the convex polygon together with the tangent point $v^{+}(\ell)$ ...

24 | Computing the discrepancy - Dobkin, Eppstein - 1993 |

23 | Polynomial-time solutions to image segmentation - Asano, Chen, et al. - 1996 |

Citation context: ...s In this paper, we propose the following scheme, applying the two-dimensional association rules (region rules) of Fukuda et al. [FMMT96a, FMMT96b] and an image segmentation algorithm of Asano et al. [ACKT96]. The scheme has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules) developed by the authors [FMMT96c]. Let n be the number of tuples in the database. First, for...

3 | SONAR: System for optimized numeric association rules - Fukuda, Morimoto, et al. - 1996 |

Citation context: ...MMT96b] and an image segmentation algorithm of Asano et al. [ACKT96]. The scheme has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules) developed by the authors [FMMT96c]. Let $n$ be the number of tuples in the database. First, for each numeric attribute, we create an equidepth bucketing so that tuples are uniformly distributed into $N \le \sqrt{n}$ ordered buckets according to th...

2 | Constructing optimal binary decision trees is NP-complete - Hyafil, Rivest - 1976 |

1 | Partial Construction of an Arrangement of Lines and Its Application to Optimal Partitioning of Bichromatic Point Set. IEICE Transactions E-77-A: 595--600 - Asano, Tokuyama - 1994 |

1 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984 |

1 | Computers and Intractability: A Guide to NP-Completeness - Garey, Johnson - 1979 |