Results 1 - 10
of
61
A System for Induction of Oblique Decision Trees
- Journal of Artificial Intelligence Research
, 1994
"... This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned espe ..."
Abstract
-
Cited by 222 (11 self)
- Add to MetaCart
This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. 1. Introduction Current data collection technology provides a unique challenge and opportunity for automated machine learning techniques. The advent of major scientific projects such as the Human Genome Project, the Hubble Space Telescope, and the human brain mappi...
Exploratory projection pursuit
- Journal of the American Statistical Association
, 1987
"... Exploratory projection pursuit is concerned with finding relatively highly revealing lower dimensional projections of high dimensional data. The intent is to discover views of the multivariate data set that exhibit nonlinear effects-clustering, concentrations near nonlinear manifolds- that are not c ..."
Abstract
-
Cited by 206 (0 self)
- Add to MetaCart
Exploratory projection pursuit is concerned with finding relatively highly revealing lower dimensional projections of high dimensional data. The intent is to discover views of the multivariate data set that exhibit nonlinear effects-clustering, concentrations near nonlinear manifolds- that are not captured by the linear correlation structure. This paper presents a new algorithm for this purpose that has both statistical and computational advantages over previous methods. A connection to density estimation is established. Examples are presented and issues related to practical application are discussed.
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
- Proceedings of the IEEE
, 1998
"... this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, ph ..."
Abstract
-
Cited by 193 (4 self)
- Add to MetaCart
this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the ill-defined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms
, 2000
"... . Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classication accuracy, training time, and (in the case of trees) number of leaves. Classication accuracy is measured by mean error rate and mean rank of error rate. Both cr ..."
Abstract
-
Cited by 134 (6 self)
- Add to MetaCart
. Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classication accuracy, training time, and (in the case of trees) number of leaves. Classication accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based, algorithm called Polyclass at the top, although it is not statistically signicantly dierent from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
Dimensionality Reduction via Sparse Support Vector Machines
- Journal of Machine Learning Research
, 2003
"... We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to prod ..."
Abstract
-
Cited by 45 (12 self)
- Add to MetaCart
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with # 1 -norm regularization inherently performs variable selection as a side-e#ect of minimizing capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the e#ects of variables.
Predictive learning via rule ensembles
, 2005
"... General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predict ..."
Abstract
-
Cited by 35 (1 self)
- Add to MetaCart
General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predictive accuracy comparable to the best methods. However, their principal advantage lies in interpretation. Because of its simple form, each rule is easy to understand, as is its influence on individual predictions, selected subsets of predictions, or globally over the entire space of joint input variable values. Similarly, the degree of relevance of the respective input variables can be assessed globally, locally in different regions of the input space, or at individual prediction points. Techniques are presented for automatically identifying those variables that are involved in interactions with other variables, the strength and degree of those interactions, as well as the identities of the other variables with which they interact. Graphical representations are used to visualize both main and interaction effects. 1. Introduction. Predictive
Clustering objects on subsets of attributes
- Journal of the Royal Statistical Society
, 2004
"... Proofs subject to correction. Not to be reproduced without permission. Confidential until read to the Society. Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor. ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
Proofs subject to correction. Not to be reproduced without permission. Confidential until read to the Society. Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor.
Support Vector Regression with ANOVA Decomposition Kernels
, 1997
"... Support Vector Machines using ANOVA Decomposition Kernels (SVAD) [Vapng] are a way of imposing a structure on multi-dimensional kernels which are generated as the tensor product of one-dimensional kernels. This gives more accurate control over the capacity of the learning machine (VCdimension) . SVA ..."
Abstract
-
Cited by 26 (1 self)
- Add to MetaCart
Support Vector Machines using ANOVA Decomposition Kernels (SVAD) [Vapng] are a way of imposing a structure on multi-dimensional kernels which are generated as the tensor product of one-dimensional kernels. This gives more accurate control over the capacity of the learning machine (VCdimension) . SVAD uses ideas from ANOVA decomposition methods and extends them to generate kernels which directly implement these ideas. SVAD is used with spline kernels and results show that SVAD performs better than the respective non ANOVA decomposition kernel. The Boston housing data set from UCI has been tested on Bagging [Bre94] and Support Vector methods before [DBK + 97] and these results are compared to the SVAD method. 1 Introduction In this paper we will introduce ANOVA kernels for support vector machines. We firstly introduce multiplicative kernels, which form the basis of the ANOVA kernels, then we introduce the general ANOVA decomposition idea. From this we derive ANOVA kernels and lastly sh...
Distributed Multivariate Regression Using Wavelet-based Collective Data Mining
- Journal of Parallel and Distributed Computing
, 1999
"... This paper presents a method for distributed multivariate regression using wavelet-based Collective Data Mining (CDM). The method seamlessly blends machine learning and information theory with the statistical methods employed in multivariate regression to provide an effective data mining technique f ..."
Abstract
-
Cited by 22 (7 self)
- Add to MetaCart
This paper presents a method for distributed multivariate regression using wavelet-based Collective Data Mining (CDM). The method seamlessly blends machine learning and information theory with the statistical methods employed in multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. Evaluation of the method in terms of model accuracy as a function of appropriateness of the selected wavelet function, relative number of non-linear cross-terms, and sample size demonstrates that accurate multivariate regression models can be generated from distributed, heterogeneous, data sets with minimal data communication overhead compared to that required to aggregate a centralized data set. Application of this method to Linear Discriminant Analysis, which is closely related to multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis. 1 Intr...

