## Hyper-rectangle-based discriminative data generalization and applications in data mining (2007)

Citations: 5 (2 self)

### BibTeX

```bibtex
@TECHREPORT{Gao07hyper-rectangle-baseddiscriminative,
  author      = {Byron Ju Gao},
  title       = {Hyper-rectangle-based discriminative data generalization and applications in data mining},
  institution = {},
  year        = {2007}
}
```

### Abstract

The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Axis-parallel hyper-rectangles provide interpretable generalizations for multi-dimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-based discriminative data generalization in the context of several useful data mining applications: cluster description, rule learning, and Nearest Rectangle classification. Clustering is one of the most important data mining tasks. However, most clustering methods output sets of points as clusters and do not generalize them into interpretable patterns. We perform a systematic study of cluster description, where we propose novel description formats leading to enhanced expressive power and introduce novel description problems specifying different trade-offs between interpretability and accuracy. We also present efficient heuristic algorithms for the introduced problems in the proposed formats. If-then rules are …

### Citations

11605 | Computers and Intractability: A Guide to the Theory of NP-Completeness - Garey, Johnson - 1979 |

5494 | C4.5: Programs for machine learning - Quinlan - 1993 |

4517 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984 |
Citation Context: ... “secondary” data type in knowledge-based systems. An attribute is called numerical if it takes values from an ordered set, and categorical if it takes values from a set not having a natural ordering [BFOS84]. Numerical attributes can have either discrete or continuous values. There are numerous application domains that heavily involve numerical data analyses such as pattern recognition, remote sensing an...

3679 | Simplifying decision trees - Quinlan - 1987 |

3110 | UCI repository of machine learning databases - Blake, Keogh, et al. - 1998 |
Citation Context: ...2Cover, DesTree, FindClans, and BP. For BP, we also implemented Greedy Growth [AGDP98] and a synthetic grid data generator. To make our experiments reproducible, real datasets from the UCI repository [BM98] with numerical attributes and without missing values were used, where data records with the same class label were treated as a cluster. Note that in the broad sense, a cluster can be used to represen...

1559 | Finding Groups in Data: An Introduction to Cluster Analysis - Kaufman, Rousseeuw - 1990 |

1261 | Modeling by shortest data description - Rissanen - 1978 |

1067 | Fast effective rule induction - Cohen - 1995 |
Citation Context: ...lasses simultaneously. Unlike many existing methods that can only learn either perfect models such as early members of the AQ family [Mic69, ML86], or approximate models such as CN2 [CN89] and RIPPER [Coh95], RGB has the flexibility to learn both without resorting to post-pruning. Finally, although rectangle rules provide good generalization on numerical attributes, they can be adapted to categorical att...

808 | The CN2 induction algorithm - Clark, Niblett - 1989 |
Citation Context: ...of rules for all classes simultaneously. Unlike many existing methods that can only learn either perfect models such as early members of the AQ family [Mic69, ML86], or approximate models such as CN2 [CN89] and RIPPER [Coh95], RGB has the flexibility to learn both without resorting to post-pruning. Finally, although rectangle rules provide good generalization on numerical attributes, they can be adapted...

660 | A threshold of ln n for approximating set cover - Feige - 1998 |
Citation Context: ...ber of uncovered elements, approximates the two NP-hard problems within (1 + ln n) [Joh74a] and (1 − 1/e) [Hoc97] respectively. The ratios are optimal unless NP is contained in quasi-polynomial time [Fei98]. The minimum set cover and maximum coverage problems are related to the MDL and MDA (with recall at fixed precision of 1 as the accuracy measure) problems respectively except that they are given an a...
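The greedy heuristic this context refers to, which repeatedly picks the subset covering the most still-uncovered elements and achieves the (1 + ln n) bound for set cover, can be sketched as follows. This is a minimal illustration; the function name and the toy instance are not from the dissertation.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset covering the most
    still-uncovered elements; guarantees a (1 + ln n) approximation."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Pick the subset with maximum gain on uncovered elements.
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("subsets do not cover the universe")
        cover.append(best)
        uncovered -= best
    return cover

# Toy instance: two greedy picks cover the universe.
print(greedy_set_cover({1, 2, 3, 4, 5},
                       [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]))
# [{1, 2, 3}, {4, 5}]
```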

647 | Learnability and the VapnikChervonenkis Dimension - Blumer, Ehrenfeucht, et al. - 1989 |

605 | Automatic subspace clustering of high dimensional data for data mining applications - Agrawal, Gehrke, et al. - 1998 |
Citation Context: ...ng literatures. 2.2.1 Directly related research Currently in the database and data mining literature, there are only a few algorithms that explicitly work on description problems, i.e., Greedy Growth [AGDP98] and BP [LNW+02]. Both of them have exclusively focused on the MDL problem with SOR as the description format. In the following, we review these two algorithms and then discuss a theoretical study o...

591 | Stacked generalization - Wolpert - 1992 |

550 | The X-tree: An index structure for high-dimensional data - Berchtold, Keim, et al. - 1996 |
Citation Context: ...certain enumeration ordering of the elements in the rectangle pool. To facilitate the testing of red cell violation, all the red cells are built into a lean-tree index, which is a variant of X-trees [BKK96]. Greedy Growth and BP explicitly work on cluster description problems. However, despite the grid data limitation, they address the MDL problem solely while our focus is on the more useful and practic...

408 | Efficiently Mining Long Patterns from Databases - Bayardo |

395 | Learning decision lists - Rivest - 1987 |

375 | New methods to color the vertices of a graph - Brélaz - 1979 |

346 | Rule Induction with CN2: Some Recent Improvements - Clark, Boswell - 1991 |
Citation Context: ...example-driven, i.e., it does not depend on specific examples during search. It also uses different rule evaluation measures, e.g., entropy as in early versions of CN2 [CN89], or the Laplace estimate [CB91] that penalizes rules with low coverage. In CN2, specializations of rules halt when no further specializations are statistically significant. While rules learned in CN2 admit inconsistency during the ...

344 | The multi-purpose incremental learning system AQ15 and its testing application to three medical domains - Michalski, Mozetič, et al. - 1986 |

307 | Inferring decision tree using the minimum description length principle - Quinlan, Rivest - 1989 |

301 | X-means: Extending k-means with efficient estimation of the number of clusters - Pelleg, Moore - 2000 |

292 | Nearest neighbor (NN) norms: NN pattern classification techniques - Dasarathy - 1991 |
Citation Context: ...ng instance) is classified according to its nearest rectangle. NR learners belong to the class of hybrid lazy-eager learning algorithms. Lazy algorithms such as k-Nearest Neighbor (kNN) classifiers [Das91] are instance-based and nonparametric, whereas eager algorithms such as decision trees [BFOS84, Mur98] are model-based and parametric. Lazy algorithms are generally more advantageous in terms of predic...
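The classification step described here, labeling a query by its nearest axis-parallel rectangle, can be sketched with the standard point-to-box distance (clamp each coordinate to the rectangle's interval). The function names and toy rectangles are illustrative; the dissertation's NR learner also addresses how the rectangles themselves are constructed, which this sketch does not.

```python
import math

def dist_to_rect(q, lo, hi):
    """Euclidean distance from query point q to the axis-parallel
    hyper-rectangle [lo, hi]; zero when q lies inside it."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))

def nr_classify(q, rectangles):
    """Label q by its nearest rectangle; each rectangle is (lo, hi, label)."""
    return min(rectangles, key=lambda r: dist_to_rect(q, r[0], r[1]))[2]

rects = [((0, 0), (2, 2), "A"), ((4, 0), (6, 2), "B")]
print(nr_classify((1, 1), rects))  # A (query lies inside the first rectangle)
print(nr_classify((5, 3), rects))  # B (second rectangle is closer)
```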

259 | A survey and critique of techniques for extracting rules from trained artificial neural networks. In Konnektionismus und Neuronale Netze, 187-210, Sankt Augustin: GMD - Andrews, Diederich, et al. - 1995 |
Citation Context: ...duction of scientific theories and study of the generalization behavior of an underlying model are several other reasons, beyond explanation capacity, that underline the importance of rule extraction [ADT95]. We note that the above arguments motivating rule extraction can also serve the purpose of motivating interpretable data mining in general, which is the center of our study in the dissertation. Rule ...

229 | FOIL: A Midterm Report - Quinlan, Cameron-Jones - 1993 |

217 | Generating accurate rule sets without global optimization - Frank, Witten - 1998 |
Citation Context: ...les are covered. Besides the AQ family, some other separate-and-conquer rule learners are: PRISM [Cen87], CN2 [CN89, CB91], SWAP-1 [WI93], GROW [Coh93], IREP [FW94], RIPPER [Coh95], BEXA [TC96], PART [FW98], IREP++ [DB04], and TRIPPER [VH06]. [Fur99] provides a good survey on separate-and-conquer rule learning. Among these algorithms, AQ, CN2, and RIPPER are t...

212 | The extraction of refined rules from knowledge-based neural networks - Towell, Shavlik - 1991 |

206 | Boolean feature discovery in empirical learning - Pagallo, Haussler - 1990 |

197 | Constructing optimal binary decision trees is NP-complete - Hyafil, Rivest - 1976 |

166 | On the handling of continuous-valued attributes in decision tree generation - Fayyad, Irani - 1992 |

165 | Automatic construction of decision trees from data: A multi-disciplinary survey - Murthy - 1998 |

159 | Incremental clustering and dynamic information retrieval - Charikar, Chekuri, et al. - 1997 |
Citation Context: ...easures use the sum or average of (squared) distances as in k-means and k-medoid [KR90], some measures use a single distance value, radius or diameter, as in k-center [TSRB71] and pairwise clustering [CCFM97]. The radius of a cluster is the maximum distance between a fixed point (center) and any point in the cluster, and the diameter is the maximum distance between any two points in the cluster. A known l...
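The radius and diameter definitions quoted in this context translate directly into code; a minimal sketch, with function names and the toy cluster chosen for illustration:

```python
import math
from itertools import combinations

def radius(center, points):
    """Radius: maximum distance from a fixed center to any point."""
    return max(math.dist(center, p) for p in points)

def diameter(points):
    """Diameter: maximum distance between any two points in the cluster."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

cluster = [(0, 0), (3, 0), (0, 4)]
print(radius((0, 0), cluster))  # 4.0, to the farthest point (0, 4)
print(diameter(cluster))        # 5.0, between (3, 0) and (0, 4)
```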

159 | A Conservation Law for Generalization Performance - Schaffer - 1994 |

146 | Approximation algorithms for directed Steiner problems - Charikar, Chekuri, et al. - 1999 |

143 | Separate-and-conquer rule learning - Furnkranz - 1999 |
Citation Context: ... these techniques. 3.3.3 Rules vs. trees Despite the popularity of decision tree learning partly due to its efficiency, many researchers consider decision rules more advantageous than decision trees. [Fur99] argues that decision trees are often quite complex and hard to understand. [Qui93a] has noted that even pruned decision trees may be too cumbersome, complex, and inscrutable to provide insight into t...

134 | Incremental reduced error pruning - Furnkranz, Widmer - 1994 |
Citation Context: ...ule at a time until all the positive examples are covered. Besides the AQ family, some other separate-and-conquer rule learners are: PRISM [Cen87], CN2 [CN89, CB91], SWAP-1 [WI93], GROW [Coh93], IREP [FW94], RIPPER [Coh95], BEXA [TC96], PART [FW98], IREP++ [DB04], and TRIPPER [VH06]. [Fur99] provides a good survey on separate-and-conquer rule learning. Among t...

133 | A polylogarithmic approximation algorithm for the group steiner tree problem - Garg, Konjevod, et al. - 1998 |

126 | Overfitting avoidance as bias - Schaffer - 1993 |

123 | Combining instance-based and model-based learning - Quinlan - 1993 |

105 | BOATOptimistic Decision Tree Construction - Gehrke, Ganti, et al. - 1999 |

100 | An experimental comparison of the nearest-neighbor and nearesthyperrectangle algorithms - Wettschereck, Dietterich - 1995 |

96 | An algorithm for point clustering and grid generation - Berger, Rigoutsos - 1991 |

95 | Lazy learning - Aha - 1997 |
Citation Context: ...eager methods try to make predictions that are good on average using a single global model. The characteristics of eager and lazy learners were identified in [Aha97]. Both types of learners have their own desirable properties. To compromise on some of the distinguishing characteristics of purely lazy or eager methods, varied hybrid lazy-eager algorithms are studi...

93 | Extracting tree-structured representations of trained networks - Craven, Shavlik |

92 | The Location of Emergency Service Facilities - Toregas, Swain, et al. - 1971 |

85 | The role of Occam's razor in knowledge discovery - Domingos - 1999 |
Citation Context: ...93], many researchers claim that no inductive bias can be validly preferred over another without making certain a priori assumptions. Some researchers claim that the opposite of Occam’s Razor is true [Dom99]. The RISE rule learning system, going against Occam’s Razor, is biased toward complex models, inducing substantially more complex rule sets than other rule learners. In our study, we pay special attention to t...

80 | Using sampling and queries to extract rules from trained neural networks, in machine learning - Craven, Shavlik - 1994 |

76 | Extracting Comprehensible Models from Trained Neural Networks - Craven - 1996 |

71 | Optimal partitioning for classification and regression trees - Chou - 1991 |

69 | Mining Top-K Frequent Closed Patterns without Minimum Support - Han, Wang, et al. - 2002 |

66 | On approximating the depth and related problems - Aronov, Har-Peled - 2005 |
Citation Context: ...cible to the classical set cover problem [Joh74a] and NP-hard. For the maximal box problem, the NP-hardness proof is given in [EHL+02]. [LN03] and [Seg04] study approximation algorithms on a plane. [AHP05] studies a similar problem with “maximum disk” instead of “maximum box”. [DbSS+05] studies a similar problem of using two, instead of one, either boxes or disks for the same objective. The red blue ...