Results 1 - 10 of 240
Online Learning with Kernels
, 2003
"... Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little u ..."
Abstract
-
Cited by 2831 (123 self)
- Add to MetaCart
(Show Context)
Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst case loss bounds and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection. In addition
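The core recipe in the abstract — stochastic gradient descent on a regularized loss, with the hypothesis kept as a kernel expansion — fits in a few lines. A minimal sketch, assuming a Gaussian kernel, hinge loss, and illustrative learning-rate and regularization constants (none of these choices are prescribed by the abstract):

```python
import numpy as np

def gaussian_kernel(x, z, gamma=1.0):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def online_kernel_sgd(stream, eta=0.1, lam=0.01, gamma=1.0):
    """Online kernel classification via stochastic gradient descent on the
    regularized hinge loss.  `stream` yields (x, y) pairs with y in {-1, +1};
    the hypothesis is kept as an expansion f(x) = sum_i alpha_i k(x_i, x)."""
    support, alphas = [], []
    for x, y in stream:
        # Current prediction f(x) from the kernel expansion.
        f = sum(a * gaussian_kernel(xi, x, gamma) for a, xi in zip(alphas, support))
        # Shrink all coefficients: the gradient step on the regularizer.
        alphas = [(1.0 - eta * lam) * a for a in alphas]
        # Hinge-loss subgradient: add a new support point on a margin error.
        if y * f < 1.0:
            support.append(x)
            alphas.append(eta * y)
    return support, alphas
```

Because every coefficient shrinks by a factor (1 - eta * lam) per step, old terms decay geometrically and can be truncated to bound memory, which is the kind of straightforward trick that makes the approach viable in real time.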
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract
-
Cited by 865 (3 self)
- Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
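The piece of SV regression most worth seeing concretely is the epsilon-insensitive loss: deviations inside an epsilon-tube around the prediction are free, and only larger errors are penalized. A minimal sketch (function and parameter names are my own):

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """SVR's epsilon-insensitive loss: errors inside the epsilon-tube
    around the prediction cost nothing; beyond it, cost grows linearly."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# For an off-the-shelf trainer, scikit-learn exposes the same tube width:
#   from sklearn.svm import SVR
#   model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_train, y_train)
```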
An introduction to kernel-based learning algorithms
- IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract
-
Cited by 598 (55 self)
- Add to MetaCart
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
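All of the methods named here share one mechanism: they access the data only through pairwise kernel evaluations. A small illustrative sketch of building an RBF Gram matrix, the object such algorithms actually consume (the choice of gamma is arbitrary):

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    Kernel methods (SVMs, kernel Fisher discriminant, ...) only ever
    touch the data through such inner products."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))
```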
On kernel target alignment
- ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14
, 2002
"... Kernel based methods are increasingly being used for data modeling because of their conceptual simplicity and outstanding performance on many tasks. However, the kernel function is often chosen using trial-and-error heuristics. In this paper we address the problem of measuring the degree of agreem ..."
Abstract
-
Cited by 298 (8 self)
- Add to MetaCart
Kernel based methods are increasingly being used for data modeling because of their conceptual simplicity and outstanding performance on many tasks. However, the kernel function is often chosen using trial-and-error heuristics. In this paper we address the problem of measuring the degree of agreement between a kernel and a learning task. A quantitative measure of agreement is important from both a theoretical and practical point of view. We propose a quantity to capture this notion, which we call Alignment. We study its theoretical properties, and derive a series of simple algorithms for adapting a kernel to the labels and vice versa. This produces a series of novel methods for clustering and transduction, kernel combination and kernel selection. The algorithms are tested on two publicly available datasets and are shown to exhibit good performance.
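The alignment the abstract proposes is the Frobenius inner product between two Gram matrices, normalized so that against the ideal target yy^T it measures how well a kernel matches the labels. A minimal sketch of that formula:

```python
import numpy as np

def alignment(K1, K2):
    """Empirical alignment between two Gram matrices:
    A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def alignment_to_labels(K, y):
    """Alignment of kernel K with the ideal target yy^T, for labels y in {-1, +1}."""
    return alignment(K, np.outer(y, y))
```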
A System for Induction of Oblique Decision Trees
- Journal of Artificial Intelligence Research
, 1994
"... This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned espe ..."
Abstract
-
Cited by 292 (14 self)
- Add to MetaCart
(Show Context)
This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. 1. Introduction Current data collection technology provides a unique challenge and opportunity for automated machine learning techniques. The advent of major scientific projects such as the Human Genome Project, the Hubble Space Telescope, and the human brain mappi...
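To make "a good oblique split (in the form of a hyperplane)" concrete: each candidate hyperplane is scored by the impurity of the two sides it induces, and hill-climbing perturbs its coefficients to lower that score. A minimal sketch of the scoring step, using Gini impurity as an illustrative criterion (the names and the choice of impurity measure are mine, not OC1's exact procedure):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def oblique_split_impurity(X, y, w, b):
    """Weighted impurity of the split induced by the hyperplane w.x + b = 0.
    OC1-style hill-climbing perturbs one coefficient of (w, b) at a time
    (plus random restarts) to drive this score down."""
    left = (X @ w + b) <= 0.0
    n = len(y)
    return (left.sum() / n) * gini(y[left]) + ((~left).sum() / n) * gini(y[~left])
```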
Improvements to Platt’s SMO Algorithm for SVM Classifier Design
, 2001
"... This article points out an important source of inefficiency in Platt’s sequential minimal optimization (SMO) algorithm that is caused by the use of a single threshold value. Using clues from the KKT conditions for the dual problem, two threshold parameters are employed to derive modifications of SMO ..."
Abstract
-
Cited by 273 (11 self)
- Add to MetaCart
This article points out an important source of inefficiency in Platt’s sequential minimal optimization (SMO) algorithm that is caused by the use of a single threshold value. Using clues from the KKT conditions for the dual problem, two threshold parameters are employed to derive modifications of SMO. These modified algorithms perform significantly faster than the original SMO on all benchmark data sets tried.
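The abstract's idea can be sketched directly: instead of one threshold b, track a pair of thresholds derived from the KKT conditions and declare optimality when they nearly cross. A simplified sketch from my reading of the approach (variable names and the tolerance are illustrative):

```python
import numpy as np

def two_threshold_kkt(alpha, y, F, C, tau=1e-3):
    """Simplified dual-threshold optimality check for SMO, where
    F[i] = f(x_i) - y[i].  Rather than a single threshold b, track
    b_up = min F over I_up and b_low = max F over I_low; the KKT
    conditions hold (to tolerance tau) when b_low <= b_up + 2 * tau."""
    # I_up: points whose multiplier is free to move one way...
    i_up = ((y == +1) & (alpha < C)) | ((y == -1) & (alpha > 0))
    # ...and I_low: points whose multiplier is free to move the other way.
    i_low = ((y == +1) & (alpha > 0)) | ((y == -1) & (alpha < C))
    b_up, b_low = F[i_up].min(), F[i_low].max()
    return b_up, b_low, (b_low <= b_up + 2 * tau)
```

A maximal violator of this condition is also a natural pair of indices for SMO's next joint optimization step.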
Feature Selection via Concave Minimization and Support Vector Machines
- Machine Learning: Proceedings of the Fifteenth International Conference (ICML ’98)
, 1998
"... Computational comparison is made between two feature selection approaches for finding a separating plane that discriminates between two point sets in an n-dimensional feature space that utilizes as few of the n features (dimensions) as possible. In the concave minimization approach [19, 5] a separat ..."
Abstract
-
Cited by 263 (23 self)
- Add to MetaCart
(Show Context)
Computational comparison is made between two feature selection approaches for finding a separating plane that discriminates between two point sets in an n-dimensional feature space that utilizes as few of the n features (dimensions) as possible. In the concave minimization approach [19, 5] a separating plane is generated by minimizing a weighted sum of distances of misclassified points to two parallel planes that bound the sets and which determine the separating plane midway between them. Furthermore, the number of dimensions of the space used to determine the plane is minimized. In the support vector machine approach [27, 7, 1, 10, 24, 28], in addition to minimizing the weighted sum of distances of misclassified points to the bounding planes, we also maximize the distance between the two bounding planes that generate the separating plane. Computational results show that feature suppression is an indirect consequence of the support vector machine approach when an appropriate norm is us...
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
- Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract
-
Cited by 224 (1 self)
- Add to MetaCart
(Show Context)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Semi-supervised support vector machines
- In Proc. NIPS
, 1998
"... We introduce a semi-supervised support vector machine (S3yM) method. Given a training set of labeled data and a working set of unlabeled data, S3YM constructs a support vector machine us-ing both the training and working sets. We use S3YM to solve the transduction problem using overall risk minimiza ..."
Abstract
-
Cited by 223 (6 self)
- Add to MetaCart
(Show Context)
We introduce a semi-supervised support vector machine (S3yM) method. Given a training set of labeled data and a working set of unlabeled data, S3YM constructs a support vector machine us-ing both the training and working sets. We use S3YM to solve the transduction problem using overall risk minimization (ORM) posed by Yapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data. We propose a general S3YM model that minimizes both the misclassification error and the function capacity based on all the available data. We show how the S3YM model for I-norm lin-ear support vector machines can be converted to a mixed-integer program and then solved exactly using integer programming. Re-sults of S3YM and the standard I-norm support vector machine approach are compared on ten data sets. Our computational re-sults support the statistical learning theory results showing that incorporating working data improves generalization when insuffi-cient training information is available. In every case, S3YM either improved or showed no significant difference in generalization com-pared to the traditional approach. Semi-Supervised Support Vector Machines 369 1
A New Evolutionary System for Evolving Artificial Neural Networks
- IEEE TRANSACTIONS ON NEURAL NETWORKS
, 1996
"... This paper presents a new evolutionary system, i.e., EPNet, for evolving artificial neural networks (ANNs). The evolutionary algorithm used in EPNet is based on Fogel's evolutionary programming (EP) [1], [2], [3]. Unlike most previous studies on evolving ANNs, this paper puts its emphasis on ev ..."
Abstract
-
Cited by 202 (35 self)
- Add to MetaCart
This paper presents a new evolutionary system, i.e., EPNet, for evolving artificial neural networks (ANNs). The evolutionary algorithm used in EPNet is based on Fogel's evolutionary programming (EP) [1], [2], [3]. Unlike most previous studies on evolving ANNs, this paper puts its emphasis on evolving ANN's behaviours. This is one of the primary reasons why EP is adopted. Five mutation operators proposed in EPNet reflect such an emphasis on evolving behaviours. Close behavioural links between parents and their offspring are maintained by various mutations, such as partial training and node splitting. EPNet evolves ANN's architectures and connection weights (including biases 1 ) simultaneously in order to reduce the noise in fitness evaluation. The parsimony of evolved ANNs is encouraged by preferring node/connection deletion to addition. EPNet has been tested on a number of benchmark problems in machine learning and ANNs, such as the parity problem, the medical diagnosis problems (bre...