Results 1-10 of 76
A System for Induction of Oblique Decision Trees
Journal of Artificial Intelligence Research, 1994
Cited by 250 (13 self)
This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. 1. Introduction Current data collection technology provides a unique challenge and opportunity for automated machine learning techniques. The advent of major scientific projects such as the Human Genome Project, the Hubble Space Telescope, and the human brain mappi...
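To make the difference between an oblique and an axis-parallel split concrete, here is a minimal Python sketch. It only evaluates a given hyperplane test; it does not reproduce OC1's hill-climbing or randomization, and the data, weights, and function names are illustrative assumptions rather than anything taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def on_positive_side(x: Point, weights: Sequence[float], bias: float) -> bool:
    """True when x satisfies the linear test w . x + b > 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0


def split(points: List[Point], weights: Sequence[float],
          bias: float) -> Tuple[List[Point], List[Point]]:
    """Partition points by the hyperplane test; returns (negative, positive)."""
    negative = [p for p in points if not on_positive_side(p, weights, bias)]
    positive = [p for p in points if on_positive_side(p, weights, bias)]
    return negative, positive


# Hypothetical toy data in which the class depends on whether x1 > x2.
points = [(1.0, 3.0), (2.0, 5.0), (5.0, 6.0), (4.0, 1.0), (6.0, 2.0)]

# Oblique test x1 - x2 > 0: both attributes have nonzero weight.
below, above = split(points, weights=(1.0, -1.0), bias=0.0)

# Axis-parallel test x1 > 3: only one attribute has nonzero weight.
left, right = split(points, weights=(1.0, 0.0), bias=-3.0)
```

On this toy data the single oblique test recovers the x1 > x2 grouping, while the single axis-parallel test places (5.0, 6.0) with the other group.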
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
Proceedings of the IEEE, 1998
Cited by 248 (11 self)
this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the ill-defined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
Exploratory projection pursuit
Journal of the American Statistical Association, 1987
Cited by 247 (0 self)
Exploratory projection pursuit is concerned with finding relatively highly revealing lower-dimensional projections of high-dimensional data. The intent is to discover views of the multivariate data set that exhibit nonlinear effects (clustering, concentrations near nonlinear manifolds) that are not captured by the linear correlation structure. This paper presents a new algorithm for this purpose that has both statistical and computational advantages over previous methods. A connection to density estimation is established. Examples are presented and issues related to practical application are discussed.
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms
2000
Cited by 168 (7 self)
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fifth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
Dimensionality Reduction via Sparse Support Vector Machines
Journal of Machine Learning Research, 2003
Cited by 68 (13 self)
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with ℓ1-norm regularization inherently performs variable selection as a side-effect of minimizing capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables.
Predictive learning via rule ensembles
2005
Cited by 54 (2 self)
General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predictive accuracy comparable to the best methods. However, their principal advantage lies in interpretation. Because of its simple form, each rule is easy to understand, as is its influence on individual predictions, selected subsets of predictions, or globally over the entire space of joint input variable values. Similarly, the degree of relevance of the respective input variables can be assessed globally, locally in different regions of the input space, or at individual prediction points. Techniques are presented for automatically identifying those variables that are involved in interactions with other variables, the strength and degree of those interactions, as well as the identities of the other variables with which they interact. Graphical representations are used to visualize both main and interaction effects. 1. Introduction. Predictive
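The model form described in this abstract, a linear combination of rules where each rule is a conjunction of simple statements about individual input variables, can be sketched in a few lines of Python. This is only an illustration of how such a model is evaluated: the variable names, thresholds, and coefficients are hypothetical, and the sketch says nothing about how the paper derives or fits the rules.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Condition = Callable[[Row], bool]
Rule = Callable[[Row], int]


def make_rule(conditions: List[Condition]) -> Rule:
    """A rule is 1 when every simple statement holds for the row, else 0."""
    def rule(row: Row) -> int:
        return int(all(cond(row) for cond in conditions))
    return rule


def predict(row: Row, intercept: float,
            weighted_rules: List[Tuple[float, Rule]]) -> float:
    """Linear combination: intercept plus the sum of coefficient * rule output."""
    return intercept + sum(coef * rule(row) for coef, rule in weighted_rules)


# Hypothetical rules over two made-up input variables.
rule_a = make_rule([lambda r: r["age"] > 30, lambda r: r["income"] < 50])
rule_b = make_rule([lambda r: r["income"] >= 50])

# Hypothetical coefficients; a real model would estimate these from data.
model = [(2.0, rule_a), (-0.5, rule_b)]

pred_1 = predict({"age": 40, "income": 30}, intercept=1.0, weighted_rules=model)
pred_2 = predict({"age": 20, "income": 60}, intercept=1.0, weighted_rules=model)
```

Because each rule is a readable conjunction with a single coefficient, its contribution to any one prediction can be read off directly, which is the interpretability advantage the abstract emphasizes.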
Clustering objects on subsets of attributes
Journal of the Royal Statistical Society, 2004
Cited by 40 (1 self)
Support Vector Regression with ANOVA Decomposition Kernels
1997
Cited by 34 (1 self)
Support Vector Machines using ANOVA Decomposition Kernels (SVAD) [Vapng] are a way of imposing a structure on multi-dimensional kernels which are generated as the tensor product of one-dimensional kernels. This gives more accurate control over the capacity of the learning machine (VC-dimension). SVAD uses ideas from ANOVA decomposition methods and extends them to generate kernels which directly implement these ideas. SVAD is used with spline kernels and results show that SVAD performs better than the respective non-ANOVA decomposition kernel. The Boston housing data set from UCI has been tested on Bagging [Bre94] and Support Vector methods before [DBK+97] and these results are compared to the SVAD method. 1 Introduction In this paper we will introduce ANOVA kernels for support vector machines. We first introduce multiplicative kernels, which form the basis of the ANOVA kernels, then we introduce the general ANOVA decomposition idea. From this we derive ANOVA kernels and lastly sh...
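As a rough illustration of how a kernel can be built from one-dimensional kernels, the Python sketch below computes an order-p ANOVA kernel in its commonly stated form: the sum, over all sets of p distinct coordinates, of the product of the one-dimensional kernel values on those coordinates. The Gaussian base kernel, the `gamma` parameter, and the function names are assumptions made for the example (the abstract mentions spline kernels), so this should not be read as the paper's implementation.

```python
import math
from itertools import combinations
from typing import Sequence


def base_kernel(u: float, v: float, gamma: float = 1.0) -> float:
    """One-dimensional Gaussian kernel (an assumed stand-in base kernel)."""
    return math.exp(-gamma * (u - v) ** 2)


def anova_kernel(x: Sequence[float], y: Sequence[float], order: int) -> float:
    """Order-p ANOVA kernel via the elementary symmetric polynomial recurrence."""
    factors = [base_kernel(u, v) for u, v in zip(x, y)]
    # e[p] holds the degree-p elementary symmetric polynomial of the
    # factors processed so far; e[0] is 1 by convention.
    e = [1.0] + [0.0] * order
    for f in factors:
        # Update from high degree to low so each factor is used at most once.
        for p in range(order, 0, -1):
            e[p] += f * e[p - 1]
    return e[order]


def anova_kernel_bruteforce(x: Sequence[float], y: Sequence[float],
                            order: int) -> float:
    """Reference version: explicit sum over all coordinate subsets of size p."""
    factors = [base_kernel(u, v) for u, v in zip(x, y)]
    return sum(math.prod(factors[i] for i in idx)
               for idx in combinations(range(len(factors)), order))


x = (0.0, 1.0, 2.0)
y = (0.5, 1.0, 1.5)
k2 = anova_kernel(x, y, order=2)
```

With `order` equal to the input dimension, only one subset remains and the result reduces to the plain product (tensor product) of the one-dimensional kernels; lower orders restrict which interaction terms are included.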
Bayesian Treed Models
Machine Learning, 2000
Cited by 32 (1 self)
When simple parametric models such as linear regression fail to adequately approximate a function across an entire set of data, an alternative may be to consider a partition of the data, and then use a separate simple model within each subset of the partition. Such an alternative is provided by a treed model which uses a binary tree to identify such a partition. However, treed models go further than conventional trees (e.g. CART, C4.5) by fitting models rather than simple means or proportions across the partition. In this paper, we propose a Bayesian approach for finding and fitting parametric treed models, in particular focusing on Bayesian treed regression. The potential of this approach is illustrated by a cross-validation comparison of predictive performance with neural nets, MARS, and conventional trees on simulated and real data sets. Keywords: binary trees, Markov chain Monte Carlo, model selection, stochastic search. 1 Hugh Chipman is Associate Professor of Statistics...
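The basic idea of a treed model, partitioning the data and fitting a separate simple model in each subset, can be shown with a one-split Python sketch. It is deliberately not Bayesian: the split point is fixed by hand and each side is fit by ordinary least squares, whereas the paper searches over trees stochastically. The data and all names are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_treed(xs: List[float], ys: List[float], threshold: float):
    """One binary split at a given threshold, with a linear model per side."""
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    left_model = fit_line([x for x, _ in left], [y for _, y in left])
    right_model = fit_line([x for x, _ in right], [y for _, y in right])
    return threshold, left_model, right_model


def predict_treed(model, x: float) -> float:
    """Route x to its side of the split, then apply that side's line."""
    threshold, left_model, right_model = model
    slope, intercept = left_model if x < threshold else right_model
    return slope * x + intercept


# Hypothetical piecewise-linear data: y = -x for x < 0, y = 2x + 1 for x > 0.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [3.0, 2.0, 1.0, 3.0, 5.0, 7.0]
model = fit_treed(xs, ys, threshold=0.0)
```

A single global line fits this data poorly, while the two per-subset lines recover each regime, which is the motivation the abstract gives for fitting models rather than simple means within the partition.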