Results 1 - 10 of 13
Use of the Zero-Norm With Linear Models and Kernel Methods
, 2002
Cited by 115 (4 self)
We explore the use of the so-called zero-norm of the parameters of linear models in learning.
Proximal support vector machine classifiers
 Proceedings KDD-2001: Knowledge Discovery and Data Mining
, 2001
Cited by 109 (14 self)
A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set. Each of the two non-parallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to the smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness. Index Terms—Support vector machines, proximal classification, generalized eigenvalues.
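The generalized-eigenvalue construction described in this abstract can be sketched in a few lines. This is a minimal NumPy/SciPy illustration rather than the paper's single MATLAB command; the toy data, the `delta` Tikhonov regularizer, and the helper name `gepsvm_plane` are assumptions of the sketch:

```python
import numpy as np
from scipy.linalg import eigh

def gepsvm_plane(A, B, delta=1e-4):
    """Fit a plane w.x = gamma that is close to the rows of A and far from
    the rows of B, via a generalized eigenvalue problem (with a small
    Tikhonov term delta added for numerical stability)."""
    MA = np.hstack([A, -np.ones((len(A), 1))])
    MB = np.hstack([B, -np.ones((len(B), 1))])
    G = MA.T @ MA + delta * np.eye(MA.shape[1])   # "close to A" matrix
    H = MB.T @ MB                                 # "far from B" matrix
    vals, vecs = eigh(G, H)       # eigenvalues in ascending order
    z = vecs[:, 0]                # eigenvector of the smallest eigenvalue
    return z[:-1], z[-1]          # (w, gamma)

# classify a point by its distance to the two proximal planes
A = np.array([[0.0, 0.0], [1, 0], [2, 0], [3, 0.1]])
B = np.array([[0.0, 2.0], [1, 2], [2, 2.1], [3, 2]])
wA, gA = gepsvm_plane(A, B)
wB, gB = gepsvm_plane(B, A)
p = np.array([1.5, 0.2])
dA = abs(p @ wA - gA) / np.linalg.norm(wA)
dB = abs(p @ wB - gB) / np.linalg.norm(wB)
# p lies near the first data set, so it is closer to the A-plane
```

A point is assigned to the class whose plane it is nearest; the nonlinear-kernel variant mentioned in the abstract replaces the data matrices with kernel matrices but leads to the same kind of eigenproblem.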
Training Support Vector Machine using Adaptive Clustering
 in Proc. of the 4th SIAM International Conference on Data Mining, Lake Buena
, 2004
Cited by 22 (3 self)
Training support vector machines involves a huge optimization problem, and many specially designed algorithms have been proposed. In this paper, we propose an algorithm called ClusterSVM that accelerates the training process by exploiting the distributional properties of the training data, that is, the natural clustering of the training data and the overall layout of these clusters relative to the decision boundary of the support vector machine. The proposed algorithm first partitions the training data into several pairwise-disjoint clusters. Then, the representatives of these clusters are used to train an initial support vector machine, from which we can approximately identify the support vectors and non-support vectors. After replacing each cluster containing only non-support vectors with its representative, the number of training data points can be significantly reduced, thereby speeding up the training process. The proposed ClusterSVM has been tested against the popular training algorithm SMO on both artificial and real data, and a significant speedup was observed. The complexity of ClusterSVM scales with the square of the number of support vectors and, after a further improvement, it is expected to scale with the square of the number of non-boundary support vectors.
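The idea in this abstract can be roughed out with scikit-learn. This is a simplified, single-pass sketch of the approach, not the paper's exact procedure: the cluster counts, the linear kernel, and the rule "expand a cluster only if its representative is a support vector" are assumptions of the sketch:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def cluster_svm(X, y, n_clusters=4, C=1.0, random_state=0):
    """Train an SVM on cluster representatives first, then expand only the
    clusters whose representatives look like support vectors (a simplified
    one-pass version of the ClusterSVM idea described above)."""
    reps, rep_labels, members = [], [], []
    for label in np.unique(y):
        Xc = X[y == label]
        km = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit(Xc)
        for j in range(n_clusters):
            reps.append(km.cluster_centers_[j])
            rep_labels.append(label)
            members.append(Xc[km.labels_ == j])
    reps, rep_labels = np.array(reps), np.array(rep_labels)
    initial = SVC(kernel="linear", C=C).fit(reps, rep_labels)
    sv = set(initial.support_)
    # keep full clusters near the boundary, a single representative otherwise
    Xr, yr = [], []
    for j in range(len(reps)):
        pts = members[j] if j in sv else reps[j][None, :]
        Xr.append(pts)
        yr.extend([rep_labels[j]] * len(pts))
    Xr, yr = np.vstack(Xr), np.array(yr)
    return SVC(kernel="linear", C=C).fit(Xr, yr), len(yr)

# toy demo: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal((0, 0), 0.5, (60, 2)),
               rng.normal((4, 4), 0.5, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
model, n_used = cluster_svm(X, y)  # n_used ends up far below len(X)
```

The speedup comes from the last fit seeing only boundary clusters in full; interior clusters contribute a single representative each.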
Nonparametric quantile estimation
, 2006
Cited by 22 (4 self)
In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem, and we provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and the competitiveness of our method with existing ones. We discuss several types of extensions, including an approach to solve the quantile crossing problem, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints.
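The quantile property mentioned above can be illustrated without the kernel machinery: minimizing the pinball (quantile) loss over even a constant estimate recovers the empirical τ-quantile. The paper itself solves a quadratic program for a nonparametric estimator; this is only a minimal sketch of the loss that drives it:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def pinball_loss(c, y, tau):
    """Average pinball loss of the constant estimate c at quantile level tau:
    residuals above c are weighted by tau, those below by (1 - tau)."""
    r = y - c
    return np.mean(np.where(r >= 0, tau * r, (tau - 1.0) * r))

y = np.arange(1.0, 101.0)   # samples 1..100
tau = 0.9
res = minimize_scalar(lambda c: pinball_loss(c, y, tau),
                      bounds=(0.0, 101.0), method="bounded")
# the minimizer lies between the 90th and 91st order statistics,
# so roughly a fraction tau of the samples falls below it
```

Replacing the constant by a function in a reproducing kernel Hilbert space, plus a norm penalty, gives the kind of quadratic program the abstract refers to.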
Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering
 In Proceedings of the fourth ACM conference on Recommender systems
, 2010
Cited by 21 (3 self)
Context has been recognized as an important factor to consider in personalized Recommender Systems. However, most model-based Collaborative Filtering approaches such as Matrix Factorization do not provide a straightforward way of integrating context information into the model. In this work, we introduce a Collaborative Filtering method based on Tensor Factorization, a generalization of Matrix Factorization that allows for a flexible and generic integration of contextual information by modeling the data as a User-Item-Context N-dimensional tensor instead of the traditional 2D User-Item matrix. In the proposed model, called Multiverse Recommendation, different types of context are considered as additional dimensions in the representation of the data as a tensor. The factorization of this tensor leads to a compact model of the data which can be used to provide context-aware recommendations. We provide an algorithm to address the N-dimensional factorization, and show that Multiverse Recommendation improves upon non-contextual Matrix Factorization by up to 30% in terms of the Mean Absolute Error (MAE). We also compare to two state-of-the-art context-aware methods and show that Tensor Factorization consistently outperforms them on both semi-synthetic and real-world data – improvements range from 2.5% to more than 12% depending on the data. Noticeably, our approach outperforms other methods by a wider margin whenever more contextual information is available.
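The user-item-context tensor view can be sketched with a simple factorization. Multiverse Recommendation uses a richer Tucker-style model; the CP (rank-k) factorization below, trained by plain SGD on synthetic data, is only a stand-in to show how a rating becomes a function of user, item, and context factors:

```python
import numpy as np

def cp_factorize(ratings, shape, k=2, lr=0.01, epochs=300, seed=0):
    """Fit a CP (rank-k) factorization of a sparse user x item x context
    tensor by SGD on squared error; ratings is a list of (u, i, c, r)."""
    rng = np.random.default_rng(seed)
    U, V, C = (0.1 * rng.standard_normal((n, k)) for n in shape)
    for _ in range(epochs):
        for u, i, c, r in ratings:
            pred = np.sum(U[u] * V[i] * C[c])   # elementwise product, summed
            err = pred - r
            gu, gv, gc = err * V[i] * C[c], err * U[u] * C[c], err * U[u] * V[i]
            U[u] -= lr * gu
            V[i] -= lr * gv
            C[c] -= lr * gc
    return U, V, C

# synthetic demo: observations drawn from a true rank-2 model
rng = np.random.default_rng(1)
shape = (8, 10, 3)
Ut, Vt, Ct = (rng.standard_normal((n, 2)) for n in shape)
ratings = [(u, i, c, float(np.sum(Ut[u] * Vt[i] * Ct[c])))
           for u, i, c in zip(rng.integers(0, 8, 200),
                              rng.integers(0, 10, 200),
                              rng.integers(0, 3, 200))]
U, V, C = cp_factorize(ratings, shape)
mse_before = np.mean([r ** 2 for *_, r in ratings])  # predictions start near 0
mse_after = np.mean([(np.sum(U[u] * V[i] * C[c]) - r) ** 2
                     for u, i, c, r in ratings])
```

Each context type simply adds another factor matrix, which is the "additional dimensions" idea in the abstract.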
Sparse Metric Learning via Smooth Optimization
Cited by 14 (1 self)
In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a non-differentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
Multiclass Classification with Multi-Prototype Support Vector Machines
 Journal of Machine Learning Research
, 2005
Cited by 11 (0 self)
Winner-take-all multiclass classifiers are built on top of a set of prototypes, each representing one of the available classes. A pattern is then classified with the label associated with the most 'similar' prototype. Recent proposals of SVM extensions to multiclass classification can be considered instances of the same strategy with one prototype per class.
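The winner-take-all scheme described above is straightforward to state in code. In this sketch the prototypes are just given points and "similarity" is negative Euclidean distance; in the paper the prototypes are learned by the SVM formulation and similarity can be kernel-based:

```python
import numpy as np

def wta_classify(x, prototypes, labels):
    """Winner-take-all: return the label of the most similar prototype,
    here the one at the smallest Euclidean distance from x."""
    d = np.linalg.norm(prototypes - x, axis=1)
    return labels[int(np.argmin(d))]

# three classes, one prototype each (multiple prototypes per class
# would simply add more rows sharing a label)
protos = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
labels = np.array([0, 1, 2])
```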
Simpler knowledge-based support vector machines
 In ICML
, 2006
Cited by 8 (0 self)
If appropriately used, prior knowledge can significantly improve the predictive accuracy of learning algorithms or reduce the amount of training data needed. In this paper we introduce a simple method to incorporate prior knowledge in support vector machines by modifying the hypothesis space rather than the optimization problem. The optimization problem is amenable to solution by the constrained concave-convex procedure, which finds a local optimum. The paper discusses different kinds of prior knowledge and demonstrates the applicability of the approach in some characteristic experiments.
Model Building with Likelihood Basis Pursuit
, 2004
Cited by 1 (0 self)
We consider a nonparametric penalized likelihood approach for model building called likelihood basis pursuit (LBP) that determines the probabilities of binary outcomes given explanatory vectors while automatically selecting important features. The LBP model involves parameters that balance the competing goals of maximizing the log-likelihood and minimizing the penalized basis pursuit terms. These parameters are selected to minimize a proxy of misclassification error, namely, the randomized, generalized approximate cross-validation (ranGACV) function. The ranGACV function is not easily represented in compact form; its functional values can only be obtained by solving two instances of the LBP model, which may be computationally expensive. A grid search is typically used to find appropriate parameters, requiring the solutions to hundreds or thousands of instances of the LBP model. Since only parameters (data) are changed between solves, the resulting problem is a nonlinear slice model in the parameter space. We show how slice-modeling techniques significantly improve the efficiency of individual solves and thus speed up the grid search. In addition, we consider using derivative-free optimization algorithms for parameter selection, replacing the grid search. We show how, by seeding the derivative-free algorithms with a coarse grid search, these algorithms can find better solutions with fewer function evaluations. Our interest in this area comes directly from the seminal work that Olvi and his collaborators have carried out designing and applying optimization techniques to problems in machine learning and data mining.
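The seeding strategy from the last part of this abstract — start a derivative-free search from the best point of a coarse grid — can be sketched generically. The quadratic `proxy_error` below is a hypothetical stand-in for the expensive ranGACV function (each real evaluation would solve two LBP instances), and the search runs over log10-transformed tuning parameters:

```python
import numpy as np
from scipy.optimize import minimize

def proxy_error(v):
    """Hypothetical smooth stand-in for the ranGACV proxy; v holds the
    log10 values of two tuning parameters, with an off-grid optimum."""
    return (v[0] + 2.3) ** 2 + (v[1] - 0.7) ** 2

# coarse grid search in log-parameter space
grid = [(a, b) for a in np.linspace(-4, 0, 5) for b in np.linspace(-1, 3, 5)]
seed = min(grid, key=proxy_error)
grid_best = proxy_error(seed)

# derivative-free refinement (Nelder-Mead) started from the grid winner
res = minimize(proxy_error, x0=np.array(seed), method="Nelder-Mead")
```

The grid alone can only be as good as its spacing; the seeded simplex search improves on the grid winner with relatively few extra function evaluations, which is the point the abstract makes.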
and
, 2010
A Support Vector Machine (SVM) is a powerful classification method, already used in many problems, which can be viewed as a convex optimization. In recent years, considerable attention has been given to semi-supervised learning, which differs from traditional supervised learning by making use of unlabelled data. In fact, in many applications such as text categorization, collecting labelled examples may require large human effort, while vast amounts of unlabelled data are often readily available, offering additional information. However, since labelling all the data is expensive, Transductive Support Vector Machines (TSVM) were introduced for the setting where only a small fraction of the labels may be considered available to the learner. One difficulty with the TSVM formulation is that it turns into a non-convex optimization problem. Hence, several techniques have been proposed to solve it. Each technique has its own disadvantages, the most important being sensitivity to local minima. As a result, experiments on TSVM sometimes perform worse than SVM. The performance of a classification algorithm can be measured through the error rate on the unlabelled points, and evidence shows that an algorithm's achievements depend on the selected data sets. Conversely, the globally optimal solution may be obtained with a Branch and Bound (BB) technique able to solve the non-convex problem through an implicit enumeration process. However, with this method the time complexity increases exponentially in the number of instances. In this paper, an original theoretical representation of a TSVM in terms of an LP-type problem and violator spaces is given. Hence, an appealing randomized method is presented, with some experimental evidence, able to efficiently overcome some limitations of BB and extend its use to solving a TSVM. The method exploits the sparsity of a TSVM, making it possible for the time complexity to scale with the number of support vectors.