Results 1 – 9 of 9
Core vector machines: Fast SVM training on very large data sets
 Journal of Machine Learning Research
, 2005
"... Standard SVM training has O(m 3) time and O(m 2) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel met ..."
Abstract

Cited by 81 (13 self)
Standard SVM training has O(m^3) time and O(m^2) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel methods by exploiting such "approximateness" in this paper. We first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, we obtain provably approximately optimal solutions with the idea of core sets. Our proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data, which has about five million training patterns, in only 1.4 seconds on a 3.2GHz Pentium-4 PC.
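The core-set idea behind the CVM can be illustrated with the classical Bădoiu–Clarkson iteration for the (1+ε)-approximate MEB. A minimal sketch in plain Euclidean space, assuming the standard iteration (the CVM applies the same scheme in kernel-induced feature space; function and parameter names here are illustrative):

```python
import numpy as np

def minimum_enclosing_ball(points, eps=0.05):
    """(1+eps)-approximate MEB via the Badoiu-Clarkson core-set iteration.

    Start from an arbitrary point as the center and repeatedly pull the
    center a fraction 1/(t+1) toward the current farthest point; the
    touched points form the core set, whose size is O(1/eps^2) and thus
    independent of the number of input points.
    """
    pts = np.asarray(points, dtype=float)
    center = pts[0].copy()
    core_set = {0}
    max_iter = int(np.ceil(1.0 / eps**2))        # core-set size bound O(1/eps^2)
    for t in range(1, max_iter + 1):
        d = np.linalg.norm(pts - center, axis=1)
        far = int(np.argmax(d))                  # farthest point from current center
        core_set.add(far)
        center += (pts[far] - center) / (t + 1)  # shrinking step size
    radius = np.linalg.norm(pts - center, axis=1).max()
    return center, radius, sorted(core_set)
```

On points spread over a circle of radius 1, the returned radius is within a (1+ε) factor of 1 while only the core-set points ever drive the update.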
Cancer classification using gene expression data
 Information Systems
, 2003
"... The classification of different tumor types is of great importance in cancer diagnosis and drug discovery. However, most previous cancer classification studies are clinicalbased and have limited diagnostic ability. Cancer classification using gene expression data is known to contain the keys for ad ..."
Abstract

Cited by 22 (0 self)
The classification of different tumor types is of great importance in cancer diagnosis and drug discovery. However, most previous cancer classification studies are clinical-based and have limited diagnostic ability. Cancer classification using gene expression data is known to contain the keys for addressing the fundamental problems relating to cancer diagnosis and drug discovery. The recent advent of DNA microarray technology has made simultaneous monitoring of thousands of gene expressions possible. With this abundance of gene expression data, researchers have started to explore the possibilities of cancer classification using gene expression data. Quite a number of methods have been proposed in recent years with promising results, but many issues still need to be addressed and understood. In order to gain deep insight into the cancer classification problem, it is necessary to take a closer look at the problem, the proposed solutions and the related issues all together. In this survey paper, we present a comprehensive overview of various proposed cancer classification methods and evaluate them based on their computation time, classification accuracy and ability to reveal biologically meaningful gene information. We also introduce and evaluate various proposed gene selection methods, which we believe should be an integral preprocessing step for cancer classification. In order to obtain a full picture of cancer classification, we also discuss several issues related to cancer classification, including the biological significance vs. statistical significance of a cancer classifier, the asymmetrical classification errors for cancer classifiers, and the gene contamination problem.
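As a concrete example of the filter-style gene selection the survey treats as a preprocessing step, here is a sketch of signal-to-noise-ratio ranking between two tumor classes; the criterion and all names are illustrative, not the survey's prescribed method:

```python
import numpy as np

def rank_genes_snr(X, y, top_k=10):
    """Rank genes by the signal-to-noise ratio |mu1 - mu0| / (s0 + s1).

    X: (samples, genes) expression matrix; y: binary class labels (0/1).
    Genes whose mean expression separates the classes relative to the
    within-class spread are ranked first.
    """
    X = np.asarray(X, float)
    y = np.asarray(y)
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    s0, s1 = X[y == 0].std(0), X[y == 1].std(0)
    snr = np.abs(mu1 - mu0) / (s0 + s1 + 1e-12)  # avoid division by zero
    order = np.argsort(snr)[::-1]                # best-separating genes first
    return order[:top_k], snr[order[:top_k]]
```

A classifier would then be trained on the top-k columns only, which both reduces dimensionality and highlights candidate marker genes.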
Generalized core vector machines
 IEEE Transactions on Neural Networks
, 2006
"... Abstract—Kernel methods, such as the support vector machine (SVM), are often formulated as quadratic programming (QP) problems. However, given training patterns, a naive implementation of the QP solver takes @ Q A training time and at least P A space. Hence, scaling up these QPs is a major stumbling ..."
Abstract

Cited by 5 (2 self)
Abstract—Kernel methods, such as the support vector machine (SVM), are often formulated as quadratic programming (QP) problems. However, given m training patterns, a naive implementation of the QP solver takes O(m^3) training time and at least O(m^2) space. Hence, scaling up these QPs is a major stumbling block in applying kernel methods on very large data sets, and a replacement of the naive method for finding the QP solutions is highly desirable. Recently, by using approximation algorithms for the minimum enclosing ball (MEB) problem, we proposed the core vector machine (CVM) algorithm that is much faster and can handle much larger data sets than existing SVM implementations. However, the CVM can only be used with certain kernel functions and kernel methods. For example, the very popular support vector regression (SVR) cannot be used with the CVM. In this paper, we introduce the center-constrained MEB problem and subsequently extend the CVM algorithm. The generalized CVM algorithm can now be used with any linear/nonlinear kernel and can also be applied to kernel methods such as SVR and the ranking SVM. Moreover, like the original CVM, its asymptotic time complexity is again linear in m and its space complexity is independent of m. Experiments show that the generalized CVM has comparable performance with state-of-the-art SVM and SVR implementations, but is faster and produces fewer support vectors on very large data sets. Index Terms—Approximation algorithms, core vector machines (CVMs), kernel methods, minimum enclosing ball (MEB), quadratic programming, support vector machines (SVMs).
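A key ingredient of any MEB-based kernel method is evaluating distances to the ball center purely through kernel values, since the center lives in feature space. A minimal sketch, assuming a Gaussian kernel and given expansion coefficients alpha (which the CVM iteration would supply; all names are illustrative):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2), pairwise."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d)

def dist2_to_center(x_new, core, alpha, gamma=1.0):
    """Squared feature-space distance to the center c = sum_i alpha_i phi(z_i):

        ||phi(x) - c||^2
          = k(x,x) - 2 sum_i alpha_i k(z_i,x) + sum_ij alpha_i alpha_j k(z_i,z_j)

    so no explicit feature map is ever needed.
    """
    K_cc = rbf(core, core, gamma)
    K_xc = rbf(x_new, core, gamma)
    k_xx = np.ones(len(x_new))  # k(x, x) = 1 for the Gaussian kernel
    return k_xx - 2.0 * K_xc @ alpha + alpha @ K_cc @ alpha
```

The "find the farthest point" step of the MEB iteration is then just an argmax over these kernelized distances.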
Scalable Kernel Systems
"... Kernelbased systems are currently very popular approaches to supervised learning. Unfortunately, the computational load for training kernelbased systems increases drastically with the number of training data points. Recently, a number of approximate methods for scaling kernelbased systems to larg ..."
Abstract

Cited by 5 (2 self)
Kernel-based systems are currently very popular approaches to supervised learning. Unfortunately, the computational load for training kernel-based systems increases drastically with the number of training data points. Recently, a number of approximate methods for scaling kernel-based systems to large data sets have been introduced. In this paper we investigate the relationship between three of those approaches and compare their performances experimentally.
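The abstract does not name the three approaches it compares; as one representative approximate method for scaling kernel systems, here is a sketch of the Nyström low-rank kernel approximation (an assumption for illustration, not necessarily among the paper's three):

```python
import numpy as np

def nystrom_features(X, landmarks, gamma=1.0):
    """Nystrom approximation: map each point to explicit features Z such
    that Z @ Z.T approximates the full n x n kernel matrix, using only
    the m << n landmark columns. Cost drops from O(n^2) kernel entries
    to O(n m) plus an m x m eigendecomposition.
    """
    def rbf(a, b):
        d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-gamma * d)
    W = rbf(landmarks, landmarks)          # m x m landmark kernel block
    C = rbf(X, landmarks)                  # n x m cross-kernel block
    vals, vecs = np.linalg.eigh(W)
    vals = np.clip(vals, 1e-12, None)      # guard tiny/negative eigenvalues
    return C @ vecs / np.sqrt(vals)        # features with Z @ Z.T ~= K
```

With the landmarks equal to the full data set the approximation is exact, which makes a convenient sanity check; in practice one samples a small landmark subset.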
Kernel rewards regression: An information efficient batch policy iteration approach
 In Proc. of the IASTED Conference on Artificial Intelligence and Applications
, 2006
"... We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method is able to obtain very useful policies observing just a few state action transitions. It considers the Reinforcement Learning problem as a regression tas ..."
Abstract

Cited by 4 (2 self)
We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method is able to obtain very useful policies by observing just a few state-action transitions. It considers the Reinforcement Learning problem as a regression task to which any appropriate technique may be applied. The use of kernel methods, e.g. the Support Vector Machine, enables the user to incorporate different types of structural prior knowledge about the state space by redefining the inner product. Furthermore, KRR is a completely off-policy method: the observations may be generated by any sufficiently exploring policy, even a fully random one. We tested the algorithm on three typical Reinforcement Learning benchmarks. Moreover, we give a proof of the correctness of our model and an error bound for estimating the Q-functions.
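KRR reduces policy iteration to a regression task into which any kernel regression engine can be plugged. A minimal kernel ridge solver as such an engine, assuming a precomputed kernel matrix (the actual KRR design couples successive state-action pairs through the Bellman relation r_t ≈ Q(s_t, a_t) − γ Q(s_{t+1}, a_{t+1}); those details are in the paper, and all names here are illustrative):

```python
import numpy as np

def kernel_ridge_fit(K, y, lam=1e-2):
    """Solve (K + lam * I) alpha = y for the expansion coefficients.

    K is the n x n kernel matrix over training inputs, y the regression
    targets (rewards, in KRR's construction), lam a ridge regularizer.
    Predictions at new inputs are K_test @ alpha.
    """
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)
```

The learned coefficients define Q as a kernel expansion over the observed transitions, so structural prior knowledge enters solely through the choice of kernel.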
GECKO: A Method to Optimize Composite Kernels for Web Page Classification
 In Proc. of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI2004), pp.299–306
, 2004
"... Most of current researches on Web page classification focus on leveraging heterogeneous features such as plain text, hyperlinks and anchor texts in an effective and efficient way. Composite kernel method is one topic of interest among them. It first selects a bunch of initial kernels, each of which ..."
Abstract

Cited by 3 (0 self)
Most current research on Web page classification focuses on leveraging heterogeneous features such as plain text, hyperlinks and anchor texts in an effective and efficient way. The composite kernel method is one topic of interest among them. It first selects a set of initial kernels, each of which is determined separately by a certain type of feature. Then a classifier is trained based on a linear combination of these kernels. In this paper, we propose an effective way to optimize the linear combination of kernels. We prove that this problem is equivalent to solving a generalized eigenvalue problem, and that the weight vector of the kernels is the eigenvector associated with the largest eigenvalue. A support vector machine (SVM) classifier is then trained based on this optimized combination of kernels. Our experiment on the WebKB dataset shows the effectiveness of the proposed method.
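The optimization step described above, taking the kernel weights as the top eigenvector of a generalized eigenvalue problem A w = λ B w, can be sketched as follows. How GECKO builds A and B from the initial kernels is specific to the paper and assumed given here; this only shows the solve:

```python
import numpy as np

def optimize_kernel_weights(A, B):
    """Return the eigenvector of A w = lam B w with the largest eigenvalue.

    A, B: symmetric matrices built from the initial kernels (construction
    assumed given); B must be positive definite. Reduces to a standard
    symmetric eigenproblem via the Cholesky factor of B.
    """
    L = np.linalg.cholesky(B)
    Li = np.linalg.inv(L)
    M = Li @ A @ Li.T                      # M v = lam v  <=>  A w = lam B w
    vals, vecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    w = Li.T @ vecs[:, -1]                 # map back: w = L^{-T} v
    return w / np.linalg.norm(w), vals[-1]
```

The normalized weight vector then defines the combined kernel K = Σ_i w_i K_i on which the SVM is trained.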
Concept boundary detection for speeding up SVMs
 in ICML '06: Proceedings of the 23rd international conference on Machine learning
, 2006
"... Support Vector Machines (SVMs) suffer from an O(n 2) training cost, where n denotes the number of training instances. In this paper, we propose an algorithm to select boundary instances as training data to substantially reduce n. Our proposed algorithm is motivated by the result of (Burges, 1999) th ..."
Abstract

Cited by 3 (0 self)
Support Vector Machines (SVMs) suffer from an O(n^2) training cost, where n denotes the number of training instances. In this paper, we propose an algorithm to select boundary instances as training data to substantially reduce n. Our proposed algorithm is motivated by the result of (Burges, 1999) that removing non-support vectors from the training set does not change SVM training results. Our algorithm eliminates instances that are likely to be non-support vectors. In the concept-independent preprocessing step of our algorithm, we prepare nearest-neighbor lists for training instances. In the concept-specific sampling step, we can then effectively select useful training data for each target concept. Empirical studies show our algorithm to be effective in reducing n, outperforming other competing down-sampling algorithms without significantly compromising testing accuracy.
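The neighbor-list intuition can be sketched as follows: an instance whose nearest neighbors include opposite-class points is likely near the decision boundary and hence a candidate support vector. This is a simplified proxy for illustration, not the paper's two-step algorithm:

```python
import numpy as np

def boundary_instances(X, y, k=5):
    """Return indices of instances whose k nearest neighbors contain at
    least one opposite-class point. Such instances sit near the class
    boundary; instances with purely same-class neighborhoods are likely
    non-support vectors and can be dropped before SVM training.
    """
    X = np.asarray(X, float)
    y = np.asarray(y)
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self from neighbor lists
    nn = np.argsort(d, axis=1)[:, :k]      # k nearest neighbors per instance
    mixed = (y[nn] != y[:, None]).any(axis=1)
    return np.where(mixed)[0]
```

On well-separated data this keeps almost nothing; as class overlap grows, the kept set grows toward the full training set, mirroring how the number of support vectors behaves.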
Tree Decomposition for Large-Scale SVM Problems: Experimental and Theoretical Results
, 2009
"... To handle problems created by large data sets, we propose a method that uses a decision tree to decompose a data space and trains SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for largescale SVM training ..."
Abstract

Cited by 1 (0 self)
To handle problems created by large data sets, we propose a method that uses a decision tree to decompose a data space and trains SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training applied to the remaining data points. Second, it is efficient for seeking the parameter values that maximize the validation accuracy, which helps maintain good test accuracy. Third, we can provide a generalization error bound for the classifier derived by the tree decomposition method. For experiment data sets whose size can be handled by current nonlinear, kernel-based SVM training techniques, the proposed method can speed up the training by a factor of thousands and still achieve comparable test accuracy.
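A toy version of the decomposition scheme: split the data space with a one-level decision stump, then fit a separate classifier per region so each sub-problem is small. A nearest-centroid rule stands in for the per-region SVM to keep the sketch dependency-free; all names are illustrative:

```python
import numpy as np

def train_decomposed(X, y, feature=0):
    """Partition the space at the median of one feature (a depth-1
    'decision tree'), then fit a nearest-centroid classifier per region.
    Returns a predict function that routes each query point through the
    split to its region's classifier.
    """
    X = np.asarray(X, float)
    thr = np.median(X[:, feature])
    regions = {}
    for side in (0, 1):
        mask = (X[:, feature] > thr) == bool(side)
        Xi, yi = X[mask], y[mask]
        # one centroid per class present in this region
        regions[side] = {c: Xi[yi == c].mean(0) for c in np.unique(yi)}

    def predict(Xq):
        out = []
        for x in np.asarray(Xq, float):
            cents = regions[int(x[feature] > thr)]      # route to region
            out.append(min(cents, key=lambda c: np.linalg.norm(x - cents[c])))
        return np.array(out)

    return predict
```

In the paper's full method the tree is deeper, pure leaves are classified by the tree itself, and an SVM is trained only on the remaining mixed regions.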