Results 1-10 of 119
A System for Induction of Oblique Decision Trees
Journal of Artificial Intelligence Research, 1994
Cited by 254 (13 self)
This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. 1. Introduction Current data collection technology provides a unique challenge and opportunity for automated machine learning techniques. The advent of major scientific projects such as the Human Genome Project, the Hubble Space Telescope, and the human brain mappi...
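To make the contrast between an oblique split and an axis-parallel split concrete, here is a minimal sketch. It is not OC1 itself: the Gini impurity measure, the toy data, and the helper names (`gini`, `split_impurity`) are assumptions chosen for illustration, and the hyperplane coefficients are set by hand rather than found by hill-climbing with randomization.

```python
def gini(group):
    """Gini impurity of a list of class labels (0.0 for an empty or pure group)."""
    if not group:
        return 0.0
    n = len(group)
    return 1.0 - sum((group.count(c) / n) ** 2 for c in set(group))


def split_impurity(points, labels, weights, bias):
    """Weighted Gini impurity of the split  sum(w_i * x_i) + bias <= 0  vs  > 0.

    An axis-parallel split is the special case where only one weight is nonzero.
    """
    left, right = [], []
    for point, label in zip(points, labels):
        side = sum(w * x for w, x in zip(weights, point)) + bias
        (left if side <= 0 else right).append(label)
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


# Hypothetical toy data: the class is 1 exactly when x + y > 1.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2)]
labels = [0, 0, 0, 1, 1, 1]

oblique = split_impurity(points, labels, weights=(1, 1), bias=-1.5)        # x + y = 1.5
axis_parallel = split_impurity(points, labels, weights=(1, 0), bias=-0.5)  # x = 0.5
```

On this toy data the single oblique hyperplane separates the two classes, while the axis-parallel split on the first attribute leaves both sides mixed, which is the situation in which an oblique tree can be smaller than an axis-parallel one.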
Locally Weighted Learning for Control
1996
Cited by 165 (17 self)
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control.
Error-Correcting Output Coding Corrects Bias and Variance
In Proceedings of the Twelfth International Conference on Machine Learning, 1995
Cited by 150 (5 self)
Previous research has shown that a technique called error-correcting output coding (ECOC) can dramatically improve the classification accuracy of supervised learning algorithms that learn to classify data points into one of k ≫ 2 classes. This paper presents an investigation of why the ECOC technique works, particularly when employed with decision-tree learning algorithms. It shows that the ECOC method, like any form of voting or committee, can reduce the variance of the learning algorithm. Furthermore, unlike methods that simply combine multiple runs of the same learning algorithm, ECOC can correct for errors caused by the bias of the learning algorithm. Experiments show that this bias correction ability relies on the nonlocal behavior of C4.5. 1 Introduction Error-correcting output coding (ECOC) is a method for applying binary (two-class) learning algorithms to solve k-class supervised learning problems. It works by converting the k-class supervised learning problem into a la...
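The conversion of a k-class problem into binary problems can be sketched as follows. This is an illustrative assumption, not code from the paper: the 4-class, 7-bit code matrix and the function names (`binary_target`, `decode`, `hamming`) are hypothetical, and the binary learners themselves are omitted. Each column of the matrix defines one two-class problem, and a new example is assigned to the class whose codeword is nearest in Hamming distance to the vector of binary predictions.

```python
# Hypothetical code matrix: one 7-bit codeword per class (rows), one binary
# learning problem per bit position (columns). Every pair of rows differs in
# 4 positions, so any single wrong binary prediction can be corrected.
CODE_MATRIX = {
    0: (1, 1, 1, 1, 1, 1, 1),
    1: (0, 0, 0, 0, 1, 1, 1),
    2: (0, 0, 1, 1, 0, 0, 1),
    3: (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def binary_target(class_label, bit):
    """Training label that binary learner number `bit` sees for this class."""
    return CODE_MATRIX[class_label][bit]


def decode(predicted_bits):
    """Class whose codeword is closest to the binary learners' outputs."""
    return min(CODE_MATRIX, key=lambda c: hamming(CODE_MATRIX[c], predicted_bits))


# Suppose the true class is 2 and the first binary learner makes a mistake.
noisy_prediction = (1, 0, 1, 1, 0, 0, 1)
recovered = decode(noisy_prediction)
```

Here `recovered` is class 2 even though one of the seven binary predictions is wrong, which is the error-correcting behavior the name refers to.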
Evaluation Of Gaussian Processes And Other Methods For Non-Linear Regression
1996
Cited by 140 (16 self)
This thesis develops two Bayesian learning methods relying on Gaussian processes and a rigorous statistical approach for evaluating such methods. In these experimental designs the sources of uncertainty in the estimated generalisation performances due to both variation in training and test sets are accounted for. The framework allows for estimation of generalisation performance as well as statistical tests of significance for pairwise comparisons. Two experimental designs are recommended and supported by the DELVE software environment. Two new nonparametric Bayesian learning methods relying on Gaussian process priors over functions are developed. These priors are controlled by hyperparameters which set the characteristic length scale for each input dimension. In the simplest method, these parameters are fit from the data using optimization. In the second, fully Bayesian method, a Markov chain Monte Carlo technique is used to integrate over the hyperparameters. One advantage of these G...
A tutorial on the cross-entropy method
Annals of Operations Research, 2005
Cited by 111 (15 self)
Abstract: The cross-entropy method is a recent versatile Monte Carlo technique. This article provides a brief introduction to the cross-entropy method and discusses how it can be used for rare-event probability estimation and for solving combinatorial, continuous, constrained and noisy optimization problems. A comprehensive list of references on cross-entropy methods and applications is included.
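For the continuous optimization case, the idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tutorial's own code: it uses a one-dimensional Gaussian sampling distribution, and the function name `cross_entropy_minimize`, the objective, and all parameter values are hypothetical. Each iteration draws samples, keeps the best-scoring "elite" fraction, and refits the sampling distribution to those elite samples.

```python
import random
import statistics


def cross_entropy_minimize(f, mu=0.0, sigma=5.0, n_samples=100, n_elite=10,
                           n_iters=30, seed=0):
    """Minimize a 1-D function f by repeatedly refitting a Gaussian to elite samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    for _ in range(n_iters):
        # 1. Sample candidates from the current sampling distribution.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # 2. Keep the n_elite candidates with the lowest objective value.
        elite = sorted(samples, key=f)[:n_elite]
        # 3. Refit the Gaussian's mean and standard deviation to the elite set.
        mu = statistics.mean(elite)
        sigma = statistics.pstdev(elite)
    return mu


# Hypothetical objective with its minimum at x = 2.
best = cross_entropy_minimize(lambda x: (x - 2.0) ** 2)
```

The sampling distribution contracts around the minimizer as the iterations proceed. Practical variants often smooth the parameter updates to avoid premature collapse of `sigma`; that refinement is left out here for brevity.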
Unifying instance–based and rule–based induction
Machine Learning 24, 1996
Cited by 87 (6 self)
Abstract. Several well-developed approaches to inductive learning now exist, but each has specific limitations that are hard to overcome. Multistrategy learning attempts to tackle this problem by combining multiple methods in one algorithm. This article describes a unification of two widely used empirical approaches: rule induction and instance-based learning. In the new algorithm, instances are treated as maximally specific rules, and classification is performed using a best-match strategy. Rules are learned by gradually generalizing instances until no improvement in apparent accuracy is obtained. Theoretical analysis shows this approach to be efficient. It is implemented in the RISE 3.1 system. In an extensive empirical study, RISE consistently achieves higher accuracies than state-of-the-art representatives of both its parent approaches (PEBLS and CN2), as well as a decision tree learner (C4.5). Lesion studies show that each of RISE’s components is essential to this performance. Most significantly, in 14 of the 30 domains studied, RISE is more accurate than the best of PEBLS and CN2, showing that a significant synergy can be obtained by combining multiple empirical methods.
Efficient Locally Weighted Polynomial Regression Predictions
In Proceedings of the 1997 International Machine Learning Conference
Cited by 82 (11 self)
Locally weighted polynomial regression (LWPR) is a popular instance-based algorithm for learning continuous nonlinear mappings. For more than two or three inputs and for more than a few thousand datapoints the computational expense of predictions is daunting. We discuss drawbacks with previous approaches to dealing with this problem, and present a new algorithm based on a multiresolution search of a quickly constructible augmented kd-tree. Without needing to rebuild the tree, we can make fast predictions with arbitrary local weighting functions, arbitrary kernel widths and arbitrary queries. The paper begins with a new, faster, algorithm for exact LWPR predictions. Next we introduce an approximation that achieves up to a two-orders-of-magnitude speedup with negligible accuracy losses. Increasing a certain approximation parameter achieves greater speedups still, but with a correspondingly larger accuracy degradation. This is nevertheless useful during operations such as the early stages...
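For orientation, the baseline prediction that such methods accelerate can be sketched as a brute-force locally weighted linear fit. This is not the paper's kd-tree algorithm: it visits every datapoint for each query, which is exactly the cost the abstract calls daunting. The one-dimensional setting, the Gaussian kernel, and the name `lwr_predict` are assumptions made for illustration.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Brute-force locally weighted linear regression prediction at `query` (1-D inputs)."""
    # Gaussian kernel weights: datapoints near the query count the most.
    w = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    total = sum(w)
    # Weighted means of inputs and outputs.
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / total
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / total
    # Weighted least-squares slope of the local line.
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (query - mean_x)


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
prediction = lwr_predict(xs, ys, query=3.5)
```

Because the toy data are exactly linear, the local fit recovers the line and the prediction at 3.5 matches 2 * 3.5 + 1 up to floating-point rounding. Each call costs time linear in the number of datapoints.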
Rule Induction and Instance-Based Learning: A Unified Approach
1995
Cited by 60 (5 self)
This paper presents a new approach to inductive learning that combines aspects of instance-based learning and rule induction in a single simple algorithm. The RISE system searches for rules in a specific-to-general fashion, starting with one rule per training example, and avoids some of the difficulties of separate-and-conquer approaches by evaluating each proposed induction step globally, i.e., through an efficient procedure that is equivalent to checking the accuracy of the rule set as a whole on every training example. Classification is performed using a best-match strategy, and reduces to nearest-neighbor if all generalizations of instances were rejected. An extensive empirical study shows that RISE consistently achieves higher accuracies than state-of-the-art representatives of its "parent" paradigms (PEBLS and CN2), and also outperforms a decision-tree learner (C4.5) in 13 out of 15 test domains (in 10 with 95% confidence). 1 Introduction Several well-developed approaches to indu...
Error-Correcting Output Coding for Text Classification
1999
Cited by 56 (0 self)
This paper applies error-correcting output coding (ECOC) to the task of document categorization. ECOC, of recent vintage in the AI literature, is a method for decomposing a multiway classification problem into many binary classification tasks, and then combining the results of the subtasks into a hypothesized solution to the original problem. There has been much recent interest in the machine learning community about algorithms which integrate "advice" from many subordinate predictors into a single classifier, and error-correcting output coding is one such technique. We provide experimental results on several real-world datasets, extracted from the Internet, which demonstrate that ECOC can offer significant improvements in accuracy over conventional classification algorithms.
Finding the number of clusters in a data set: An information theoretic approach
Journal of the American Statistical Association, 2003
Cited by 54 (1 self)
One of the most difficult problems in cluster analysis is the identification of the number of groups in a data set. Most previously suggested approaches to this problem are either somewhat ad hoc or require parametric assumptions and complicated calculations. In this paper we develop a simple yet powerful nonparametric method for choosing the number of clusters based on distortion, a quantity that measures the average distance, per dimension, between each observation and its closest cluster center. Our technique is computationally efficient and straightforward to implement. We demonstrate empirically its effectiveness, not only for choosing the number of clusters but also for identifying underlying structure, on a wide range of simulated and real world data sets. In addition, we give a rigorous theoretical justification for the method based on information theoretic ideas. Specifically, results from the subfield of electrical engineering known as rate distortion theory allow us to describe the behavior of the distortion in both the presence and absence of clustering. Finally, we note that these ideas potentially can be extended to a wide range of other statistical model selection problems.