The analysis of decomposition methods for support vector machines
- IEEE Transactions on Neural Networks
, 1999
Cited by 79 (17 self)
The decomposition method is currently one of the major methods for solving support vector machines. An important issue of this method is the selection of working sets. In this paper, through the design of decomposition methods for bound-constrained SVM formulations, we demonstrate that working set selection is not a trivial task. Then, from the experimental analysis, we propose a simple selection of the working set which leads to faster convergence for difficult cases. Numerical experiments on different types of problems are conducted to demonstrate the viability of the proposed method.
Projected gradient methods for non-negative matrix factorization
- Neural Computation
, 2007
Cited by 76 (1 self)
Non-negative matrix factorization (NMF) can be formulated as a minimization problem with bound constraints. Although bound-constrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two projected gradient methods for NMF, both of which exhibit strong optimization properties. We discuss efficient implementations and demonstrate that one of the proposed methods converges faster than the popular multiplicative update approach. A simple MATLAB code is also provided.
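To make the bound-constrained view concrete, here is a minimal sketch of alternating projected gradient steps for NMF, minimizing 0.5 * ||V - WH||_F^2 subject to W >= 0 and H >= 0. It is not the paper's method or its MATLAB code: the step size rule (one over the Frobenius norm of H H^T or W^T W, a conservative bound on the gradient's Lipschitz constant) is an assumption chosen for simplicity, and all function names are hypothetical.

```python
import random

def matmul(A, B):
    # Plain list-of-lists matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frob(A):
    return sum(x * x for row in A for x in row) ** 0.5

def objective(V, W, H):
    # 0.5 * ||V - W H||_F^2
    WH = matmul(W, H)
    return 0.5 * sum((v - p) ** 2 for rv, rp in zip(V, WH) for v, p in zip(rv, rp))

def projected_step(X, grad, step):
    # Gradient step followed by projection onto the bound constraint X >= 0.
    return [[max(x - step * g, 0.0) for x, g in zip(rx, rg)] for rx, rg in zip(X, grad)]

def nmf_projected_gradient(V, rank, iters=100, seed=0):
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    history = [objective(V, W, H)]
    for _ in range(iters):
        # W-step: the gradient with respect to W is (W H - V) H^T.
        Ht = transpose(H)
        HHt = matmul(H, Ht)
        grad_w = sub(matmul(W, HHt), matmul(V, Ht))
        W = projected_step(W, grad_w, 1.0 / max(frob(HHt), 1e-12))
        # H-step: the gradient with respect to H is W^T (W H - V).
        Wt = transpose(W)
        WtW = matmul(Wt, W)
        grad_h = sub(matmul(WtW, H), matmul(Wt, V))
        H = projected_step(H, grad_h, 1.0 / max(frob(WtW), 1e-12))
        history.append(objective(V, W, H))
    return W, H, history

V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]
W, H, history = nmf_projected_gradient(V, rank=2, iters=50)
```

With this conservative step the recorded objective values should not increase from one iteration to the next; a line search would normally take larger steps and converge faster.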
A Study on Sigmoid Kernels for SVM and the Training of non-PSD Kernels by SMO-type Methods
, 2003
Cited by 41 (4 self)
The sigmoid kernel was quite popular for support vector machines due to its origin from neural networks. However, as the kernel matrix may not be positive semidefinite (PSD), it is not widely used and the behavior is unknown. In this paper, we analyze such non-PSD kernels through the point of view of separability. Based on the investigation of parameters in different ranges, we show that for some parameters, the kernel matrix is conditionally positive definite (CPD), a property which explains its practical viability. Experiments are given to illustrate our analysis. Finally, we discuss how to solve the non-convex dual problems by SMO-type decomposition methods. Suitable modifications for any symmetric non-PSD kernel matrices are proposed with convergence proofs.
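A small numerical check illustrates why the sigmoid kernel matrix may fail to be PSD. The sketch below assumes the usual form K(x, y) = tanh(a * x^T y + r); the parameter names `a` and `r`, the sample points, and the helper functions are illustrative choices, not taken from the abstract.

```python
import math

def sigmoid_kernel(x, y, a, r):
    # K(x, y) = tanh(a * <x, y> + r)
    return math.tanh(a * sum(xi * yi for xi, yi in zip(x, y)) + r)

def gram_matrix(points, a, r):
    return [[sigmoid_kernel(p, q, a, r) for q in points] for p in points]

def quadratic_form(K, z):
    # z^T K z; a negative value for any z shows that K is not PSD.
    n = len(z)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))

# Two points close to the origin and a negative offset r (assumed values).
points = [(0.1, 0.0), (0.0, 0.2)]
K = gram_matrix(points, a=1.0, r=-1.0)

# With z = e_1 the quadratic form equals K[0][0] = tanh(0.01 - 1.0), which is negative.
z = [1.0, 0.0]
value = quadratic_form(K, z)
print(value < 0)
```

A PSD matrix cannot have a negative diagonal entry, so a negative `r` combined with points of small norm is already enough to break positive semidefiniteness in this example.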
Trust region Newton method for large-scale logistic regression
- In Proceedings of the 24th International Conference on Machine Learning (ICML)
, 2007
Cited by 35 (5 self)
Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also compare it with existing linear SVM implementations.
A Study on Reduced Support Vector Machines
- IEEE Transactions on Neural Networks
, 2003
Cited by 30 (5 self)
Recently the Reduced Support Vector Machine (RSVM) was proposed as an alternative to the standard SVM. Motivated by the difficulty of handling large data sets using SVM with nonlinear kernels, it preselects a subset of data as support vectors and solves a smaller optimization problem. However, several issues of its practical use have not been fully discussed yet. For example, we do not know if it possesses generalization ability comparable to that of the standard SVM. In addition, we would like to see for how large a problem RSVM outperforms SVM on training time. In this paper we show that the RSVM formulation is already in the form of a linear SVM and discuss four RSVM implementations. Experiments indicate that in general the test accuracy of RSVM is a little lower than that of the standard SVM. In addition, for problems with up to tens of thousands of data points, if the percentage of support vectors is not high, existing implementations for SVM are quite competitive on training time. Thus, from this empirical study, RSVM will be mainly useful for either larger problems or those with many support vectors. Experiments in this paper also serve as comparisons of (1) different implementations for linear SVM; and (2) standard SVM using linear and quadratic cost functions.
Analysis of Nonstationary Time Series Using Support Vector Machines
- Pattern Recognition with Support Vector Machines, First International Workshop, SVM 2002, LNCS 2388
, 2002
Cited by 4 (0 self)
Time series from alternating dynamics have many important applications. In [5], the authors propose an approach to solve the drifting dynamics. Their method directly solves a non-convex optimization problem. In this paper, we propose a strategy which solves a sequence of convex optimization problems by using modified support vector regression.
A Comparison of Optimization Methods for Large-scale L1-regularized Linear Classification
Cited by 3 (0 self)
Large-scale linear classification is widely used in many areas. The L1-regularized form can be applied for feature selection, but its non-differentiability causes more difficulties in training. Various optimization methods have been proposed in recent years, but no serious comparison among them has been made. In this paper, we discuss several state-of-the-art methods and propose two new implementations. We then conduct a comprehensive comparison. Results show that decomposition methods, in particular coordinate descent methods, are very suitable for training large document data.
Reduction techniques for training support vector machines
, 2002
Cited by 1 (0 self)
Recently two kinds of reduction techniques aimed at saving training time for SVM problems with nonlinear kernels were proposed. Instead of solving the standard SVM formulation, these methods explicitly alter the SVM formulation, and solutions for them are used to classify data. The first approach, reduced support vector machine (RSVM) [21], preselects a subset of data as support vectors and solves a smaller optimization problem. The second approach [11] uses incomplete Cholesky factorization (ICF) to obtain a low-rank approximation of the kernel matrix. Therefore, an easier optimization problem is obtained. We find that several issues of their practical use have not been fully discussed yet. For example, we do not know if they possess generalization ability comparable to that of the standard SVM. In addition, we would like to see for how large a problem they outperform SVM on training time. In this thesis we show that the formulation of each technique is already in the form of a linear SVM and discuss several suitable implementations. Experiments indicate that in general the test accuracy of both techniques is a little lower than that of the standard SVM. In addition, for problems with up to tens of thousands of data points, if the percentage of support vectors is not high, existing implementations for SVM are quite competitive on training time. Thus, the two techniques will be mainly useful for either larger problems or those with many support vectors. Experiments in this thesis also serve as comparisons of (1) different implementations for linear SVM; (2) standard SVM using linear and quadratic cost functions; and (3) two ICF algorithms for positive definite dense matrices.
Radius Margin Bounds for Support Vector . . .
- Neural Computation
, 2003
An important approach for efficient support vector machine (SVM) model selection is to use differentiable bounds of the leave-one-out (loo) error. Past efforts focused on finding tight bounds of loo, for example, radius margin bounds, span bounds, etc. However, their practical viability is still not very satisfactory. In (Duan et al., 2003), it has been shown that the radius margin bound gives good prediction for L2-SVM. In this paper, by analyzing why this bound performs well for L2-SVM, we show that finding a bound whose minima are in a region with small loo values may be more important than its tightness. Based on this principle we propose modified radius margin bounds for L1-SVM, where the original bound is only applicable to the hard-margin case. Our modification for L1-SVM achieves comparable performance to L2-SVM. To study whether L1- or L2-SVM should be used, we further analyze other properties such as their differentiability, number of support vectors, and number of free support vectors. In this aspect, L1-SVM possesses the advantage of having fewer support vectors. Their implementations are also different, so we discuss related issues in detail.

