Use of the Zero-Norm With Linear Models and Kernel Methods
2002
Cited by 85 (4 self)
We explore the use of the so-called zero-norm of the parameters of linear models in learning.
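For readers unfamiliar with the term, the zero-norm is usually defined as the number of non-zero entries of a vector. The block below states that standard definition; the symbols w (parameter vector) and n (its dimension) are notation assumed here, not taken from the abstract.

```latex
\|w\|_0 \;=\; \bigl|\{\, i : w_i \neq 0 \,\}\bigr| \;=\; \lim_{p \to 0^{+}} \sum_{i=1}^{n} |w_i|^{p}
```

It is called "so-called" because it is not a true norm: scaling w by a non-zero constant leaves the count unchanged, so homogeneity fails.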
Proximal support vector machine classifiers
Proceedings KDD-2001: Knowledge Discovery and Data Mining, 2001
Cited by 81 (11 self)
A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set. Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to a smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness. Index Terms: Support vector machines, proximal classification, generalized eigenvalues.
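One way to read the "closest to one set, farthest from the other" criterion is as a ratio of squared distances, which is a Rayleigh quotient. The sketch below is an assumed formulation, not quoted from the abstract: A and B are the matrices whose rows are the points of the two data sets, e is a vector of ones, and the plane is x'w = γ. Any regularization term the authors may use is omitted.

```latex
\min_{(w,\gamma) \neq 0} \frac{\|Aw - e\gamma\|^2}{\|Bw - e\gamma\|^2}
\;=\; \min_{z \neq 0} \frac{z^\top G z}{z^\top H z},
\qquad
G = [A \;\; -e]^\top [A \;\; -e], \quad
H = [B \;\; -e]^\top [B \;\; -e], \quad
z = \begin{bmatrix} w \\ \gamma \end{bmatrix}
```

Under this reading, stationary points satisfy the generalized eigenvalue problem Gz = λHz, and when H is positive definite the minimum is attained at an eigenvector of the smallest eigenvalue. The second plane would follow by swapping the roles of A and B.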
Training Support Vector Machine using Adaptive Clustering
In Proc. of the 4th SIAM International Conference on Data Mining, Lake Buena, 2004
Cited by 17 (2 self)
Training support vector machines involves a huge optimization problem, and many specially designed algorithms have been proposed. In this paper, we propose an algorithm called ClusterSVM that accelerates the training process by exploiting the distributional properties of the training data, that is, the natural clustering of the training data and the overall layout of these clusters relative to the decision boundary of support vector machines. The proposed algorithm first partitions the training data into several pairwise disjoint clusters. Then, the representatives of these clusters are used to train an initial support vector machine, based on which we can approximately identify the support vectors and non-support vectors. After replacing each cluster containing only non-support vectors with its representative, the number of training data can be significantly reduced, thereby speeding up the training process. The proposed ClusterSVM has been tested against the popular training algorithm SMO on both artificial and real data, and a significant speedup was observed. The complexity of ClusterSVM scales with the square of the number of support vectors and, after a further improvement, it is expected to scale with the square of the number of non-boundary support vectors.
Nonparametric quantile estimation
Journal of Machine Learning Research, 2006
Cited by 14 (2 self)
In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem, and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and the competitiveness of our method with existing ones. We discuss several types of extensions, including an approach to solve the quantile crossing problem, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints.
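The quantile property described above can be illustrated with the pinball loss, the standard loss whose minimizer is a τ-quantile. The sketch below is a minimal illustration, not the paper's nonparametric estimator: it fits only a constant (no dependence on x), and the function names and sample data are hypothetical.

```python
def pinball_loss(y, estimate, tau):
    """Pinball (quantile) loss of one estimate against one observation."""
    residual = y - estimate
    # Under-estimates are weighted by tau, over-estimates by (1 - tau).
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant_quantile(ys, tau):
    """Return the sample value that minimizes the total pinball loss."""
    return min(ys, key=lambda c: sum(pinball_loss(y, c, tau) for y in ys))


# Hypothetical sample with one large outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_estimate = best_constant_quantile(sample, 0.5)
upper_estimate = best_constant_quantile(sample, 0.9)
```

With τ = 0.5 the minimizer is the sample median, which the outlier does not move; with τ = 0.9 the minimizer shifts to the largest value, since only four of the five points lie below any smaller candidate.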
Multiclass Classification with Multi-Prototype Support Vector Machines
Journal of Machine Learning Research, 2005
Cited by 7 (0 self)
Winner-take-all multiclass classifiers are built on top of a set of prototypes, each representing one of the available classes. A pattern is then classified with the label associated with the most 'similar' prototype. Recent proposals of SVM extensions to multiclass can be considered instances of the same strategy with one prototype per class.
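The winner-take-all rule can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's training method: prototypes are given rather than learned, similarity is taken to be the dot product, and all names and values are hypothetical.

```python
def dot(u, v):
    """Dot product, used here as the (assumed) similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def classify(pattern, prototypes):
    """Return the label of the prototype most similar to the pattern.

    prototypes is a list of (label, vector) pairs; a class may own
    several prototypes.
    """
    best_label, _ = max(prototypes, key=lambda p: dot(p[1], pattern))
    return best_label


# Hypothetical prototypes: class "a" has two, class "b" has one.
prototypes = [
    ("a", [1.0, 0.0]),
    ("a", [-1.0, 0.0]),
    ("b", [0.0, 1.0]),
]
labels = [classify(x, prototypes) for x in ([0.9, 0.1], [-0.8, 0.2], [0.1, 0.7])]
```

The two "a" prototypes let a single class cover two opposite regions of the input space, which a one-prototype-per-class scheme with this similarity could not do.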
Simpler knowledge-based support vector machines
In ICML, 2006
Cited by 6 (0 self)
If appropriately used, prior knowledge can significantly improve the predictive accuracy of learning algorithms or reduce the amount of training data needed. In this paper we introduce a simple method to incorporate prior knowledge in support vector machines by modifying the hypothesis space rather than the optimization problem. The optimization problem is amenable to solution by the constrained concave convex procedure, which finds a local optimum. The paper discusses different kinds of prior knowledge and demonstrates the applicability of the approach in some characteristic experiments.
Model Building with Likelihood Basis Pursuit
2004
Cited by 1 (0 self)
We consider a non-parametric penalized likelihood approach for model building called likelihood basis pursuit (LBP) that determines the probabilities of binary outcomes given explanatory vectors while automatically selecting important features. The LBP model involves parameters that balance the competing goals of maximizing the log-likelihood and minimizing the penalized basis pursuit terms. These parameters are selected to minimize a proxy of misclassification error, namely, the randomized, generalized approximate cross validation (ranGACV) function. The ranGACV function is not easily represented in compact form; its functional values can only be obtained by solving two instances of the LBP model, which may be computationally expensive. A grid search is typically used to find appropriate parameters, requiring the solutions to hundreds or thousands of instances of the LBP model. Since only parameters (data) are changed between solves, the resulting problem is a nonlinear slice model in the parameter space. We show how slice-modeling techniques significantly improve the efficiency of individual solves and thus speed up the grid search. In addition, we consider using derivative-free optimization algorithms for parameter selection, replacing the grid search. We show how, by seeding the derivative-free algorithms with a coarse grid search, these algorithms can find better solutions with fewer function evaluations. Our interest in this area comes directly from the seminal work that Olvi and his collaborators have carried out designing and applying optimization techniques to problems in machine learning and data mining.
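The idea of seeding a derivative-free method with a coarse grid search can be sketched as follows. This is an illustration only: the objective is a hypothetical smooth stand-in for the ranGACV function (which in the paper requires solving LBP instances), and compass search is one simple derivative-free method chosen here, not necessarily the one the authors use.

```python
def coarse_grid_search(f, lo, hi, steps):
    """Evaluate f on a steps x steps grid over [lo, hi]^2; return the best point."""
    width = (hi - lo) / (steps - 1)
    grid = [lo + i * width for i in range(steps)]
    return min(((a, b) for a in grid for b in grid), key=lambda p: f(*p))


def compass_search(f, start, step, tol=1e-6):
    """Derivative-free search: poll +/- step along each axis, halve step on failure."""
    x, fx = list(start), f(*start)
    evals = 1
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                ft = f(*trial)
                evals += 1
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2.0
    return tuple(x), fx, evals


def toy_objective(a, b):
    """Hypothetical stand-in for an expensive model-selection criterion."""
    return (a - 1.3) ** 2 + (b + 0.7) ** 2 + 0.5


seed = coarse_grid_search(toy_objective, -4.0, 4.0, 5)   # 25 evaluations
best, best_value, evals = compass_search(toy_objective, seed, step=1.0)
```

On this toy problem the 5 x 5 grid picks the seed (2.0, 0.0), and the local search then refines it toward the minimizer near (1.3, -0.7). Whether the same saving in function evaluations carries over to a real criterion depends on how smooth that criterion is.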
Sparse Metric Learning via Smooth Optimization
Cited by 1 (0 self)
In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
2010
A Support Vector Machine (SVM) is a powerful classification method, already used in many problems, which can be viewed as a convex optimization problem. In recent years, considerable attention has been given to semi-supervised learning, which differs from traditional supervised learning by making use of unlabelled data. In many applications, such as text categorization, collecting labelled examples may require considerable human effort, while vast amounts of unlabelled data are often readily available and offer some additional information. Because labelling all the data is expensive, Transductive Support Vector Machines (TSVM) were introduced for the case where only a small fraction of the labels is available to the learner. One difficulty with the TSVM formulation is that it turns into a non-convex optimization problem. Hence, several techniques have been proposed to solve it. Each technique has its own disadvantages, the most important one being sensitivity to local minima. As a result, TSVM sometimes performs worse than SVM in experiments. The performance of a classification algorithm can be measured through the error rate on the unlabelled points, and evidence shows that an algorithm's performance depends on the selected data sets. Conversely, the globally optimal solution can be found with a Branch and Bound (BB) technique, which solves the non-convex problem through an implicit enumeration process. However, with this method the time complexity increases exponentially in the number of instances. In this paper, an original theoretical representation of a TSVM in terms of an LP-type problem and a violator space is given. An appealing randomized method is then presented, with some experimental evidence, that efficiently overcomes some limitations of BB and extends its use to solving a TSVM. The method can exploit the sparsity property of a TSVM, so that the time complexity scales with, and is entirely determined by, the number of support vectors.

